Optional requirements? #793

Closed
jsw-fnal opened this issue Jul 1, 2014 · 24 comments

@jsw-fnal

jsw-fnal commented Jul 1, 2014

Sounds like an oxymoron. I started thinking about this just now when I did conda install matplotlib. I'm planning to use matplotlib exclusively in the IPython notebook, so I won't be using the Qt backend for matplotlib. However, Qt is installed because it is a dependency of matplotlib. Since Qt is so large, downloading the package took the bulk of the total install time.

Is there a way to mark that some dependencies will only be needed by some users? I'm not sure I can think of a good way to handle this sort of situation. Maybe a separate package called matplotlib-Qtbackend would be sufficient, but then some users will be quite surprised to find that installing matplotlib and Qt is not sufficient to get them a Qt backend. What if the matplotlib and Qt packages both knew to look for the other when they were installed, and if both are present, to install the backend package?

I'm just brainstorming.

@asmeurer
Contributor

asmeurer commented Jul 1, 2014

What would the backend package be beyond matplotlib and qt?

@jsw-fnal
Author

jsw-fnal commented Jul 1, 2014

Hmm. Good question. I guess it wouldn't be anything. Still, my gut says that I shouldn't wait for Qt to install if I'm never going to use Qt. Nor should it take up space on my disks if it won't be used. I'm just not sure how to accomplish that without breaking things for people who expect Qt to be installed with matplotlib.

@jsw-fnal
Author

jsw-fnal commented Jul 1, 2014

I'm also not sure whether this is a conda issue or an issue for the matplotlib conda package. But I'm guessing that some possible solutions include changes to conda, so I put it here.

@jsw-fnal
Author

jsw-fnal commented Jul 1, 2014

Another example: IPython works without the notebook stuff installed, so ipython-notebook is a separate package. And of course the notebook works fine without nbconvert. But, nbconvert needs pandoc and pygments and maybe a few other things, which are not installed as dependencies of any ipython* package. So if you want nbconvert, you have to know to get pandoc and pygments. Those could be added to the ipython-notebook package, but given how troublesome pandoc is, that's probably a poor plan. You could make an ipython-nbconvert package, but there wouldn't be any actual code in it, only a couple of dependencies.

@asmeurer
Contributor

asmeurer commented Jul 2, 2014

What you are suggesting is what we call metapackages (a package with no code, only metadata). The ipython-notebook is a metapackage. It only exists for the dependencies (and for the "app" entry point for the launcher).

@asmeurer
Contributor

asmeurer commented Jul 2, 2014

I'm personally a fan of using metapackages to solve these kinds of problems. They are already well supported, and you can actually do non-trivial things by specifying metadata in metapackages and passing off to the SAT solver. In my opinion, the only thing that's still fundamentally missing from the package metadata spec is conflicting packages.

Optional dependencies are something that I've thought about before, but I've never been clear how they would actually work. The SAT dependency solver in conda tries to pick the minimal number of packages to install, meaning any optional dependency would not be installed, unless it would be already installed anyway. I guess you could require that if it is installed that it be a certain version, although that also requires #634 to be useful.

There was an internal suggestion for a "provides" feature, as a way to deal with things like spyder that can depend on either pyqt or pyside. I quote my response below:

To answer the question, conda does not have any notion of optional
dependencies or logical operations.

It is possible to do it actually by creating a metapackage, one
version of which depends on pyqt, and the other of which depends on
pyside (the newer version would depend on pyqt, to indicate that it is
preferred). Then spyder could depend on this metapackage, without an
explicit version. Conda will then pick whatever version of that
metapackage it can, preferring the new one, and hence pyqt or pyside.

I think such a metapackage would really be the "provides" you are
looking for. I think there is enough logic in the version
specification now that you can do almost anything you'd want with
this.

[NB: Except for the conflicting concept I mentioned above]

A downside to this is that you have to version things, meaning you
have to always pick which option is "better" (has the newer version).
But then again, conda will have to pick one of the options at some
point anyway, so it's better if you make the decision for it, rather
than letting it be arbitrary.

The idea here was to provide a way to install spyder in environments where pyqt is disallowed due to licensing restrictions.
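The version trick described in the quoted reply can be sketched in a few lines of Python. This is only a toy stand-in, not conda's SAT solver: the metapackage name `qt-binding`, its version numbers, and the function `pick_metapackage_version` are all hypothetical, and the selection is a plain newest-first scan.

```python
from typing import Dict, List, Optional, Set, Tuple

Version = Tuple[int, ...]

# Hypothetical index: each version of the metapackage "qt-binding" depends on
# a different concrete binding. Names and versions are made up for illustration.
INDEX: Dict[str, Dict[Version, List[str]]] = {
    "qt-binding": {
        (2, 0): ["pyqt"],    # newer version, so it is preferred
        (1, 0): ["pyside"],  # older version, used as the fallback
    },
}


def pick_metapackage_version(
    name: str, disallowed: Set[str]
) -> Optional[Tuple[Version, List[str]]]:
    """Return the newest (version, deps) whose deps avoid `disallowed`."""
    for version in sorted(INDEX[name], reverse=True):
        deps = INDEX[name][version]
        if not disallowed.intersection(deps):
            return version, deps
    return None  # no version of the metapackage can be installed


if __name__ == "__main__":
    # No restrictions: the newest version wins, pulling in pyqt.
    print(pick_metapackage_version("qt-binding", set()))
    # pyqt disallowed (e.g. for licensing reasons): fall back to pyside.
    print(pick_metapackage_version("qt-binding", {"pyqt"}))
```

A package such as spyder would then depend on `qt-binding` without a version, and the preference between the two bindings is encoded purely in which metapackage version is newer.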

@cdeil
Contributor

cdeil commented Aug 27, 2015

I have a conda package that depends only on a few things, but I want to package it so that users get some important optional dependencies by default as well.

In Debian this is possible with the "recommends" list
https://www.debian.org/doc/manuals/debian-faq/ch-pkg_basics.en.html#s-depends
and the user has to explicitly say --no-install-recommends to deselect those.

@asmeurer – You're saying this should be done by creating a separate conda metapackage?
I.e. in my case I'd create a meta-package gammapy-all that includes gammapy as well as optional dependencies like scipy, matplotlib, ...?

@cdeil
Contributor

cdeil commented Aug 27, 2015

pip also has this feature ... there it's called "extras":
http://pythonhosted.org/setuptools/setuptools.html#declaring-extras-optional-features-with-their-own-dependencies

If the recommended solution for this remains to create meta-packages, can this be added to the conda docs? (I couldn't find anything useful with Google.)

@asmeurer
Contributor

Look at, for instance, how blaze vs. blaze-core work (conda info blaze/conda info blaze-core).

@janssen

janssen commented Jan 23, 2016

It would be great to have the matplotlib recipe drop the dependency on Qt. I use matplotlib purely with the agg backend, and I use it with Kivy. Neither use requires Qt, but I still get Qt when I specify matplotlib as a dependency in conda recipes. I have to go through and delete all the Qt stuff after the env is built. I think there should be a 'matplotlib' package, and a 'matplotlib-with-Qt' metapackage.

@dhirschfeld
Contributor

I think there's a fair amount of overhead in specifying a new yaml and building a new package for each set of optional dependencies. Yes, it can work, but IMHO it would be much easier to specify them in the same yaml as the required dependencies.

My suggestion would be to allow optional dependencies to simply be labelled sections under the build or run requirements. The labelled sections then specify the dependencies for that label - e.g. for the hypothetical meta.yaml below:

package:
  name: mypackage
  version: 1.0.0

  requirements:
    build:
      - python

    run:
      - python
      - numpy

      pyqt:
        - pyqt

      docs:
        - sphinx
        - numpydoc >= 0.5.0

      test:
        - nose
        - mock

Only python and numpy would get installed with:

conda install mypackage

to install all the optional deps you would use

conda install mypackage[all]

...which would be equivalent to

conda install mypackage[pyqt, docs, test]

To install mypackage with just the pyqt dependency would be

conda install mypackage[pyqt]

Conveniently this would also solve my own issue (#1665) where I want to be able to independently install the test deps.
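To make the proposed bracket syntax concrete, here is a rough Python sketch of how a spec such as `mypackage[pyqt, docs]` could be expanded into a dependency list. Nothing here is existing conda behaviour: `parse_spec`, `resolve_run_deps`, and the `REQUIRED`/`OPTIONAL` tables are hypothetical and simply mirror the meta.yaml sketch above.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical metadata mirroring the meta.yaml sketch above.
REQUIRED: List[str] = ["python", "numpy"]
OPTIONAL: Dict[str, List[str]] = {
    "pyqt": ["pyqt"],
    "docs": ["sphinx", "numpydoc >= 0.5.0"],
    "test": ["nose", "mock"],
}

# A package name, optionally followed by [label, label, ...].
SPEC_RE = re.compile(r"^(?P<name>[A-Za-z0-9_.-]+)(?:\[(?P<labels>[^\]]*)\])?$")


def parse_spec(spec: str) -> Tuple[str, List[str]]:
    """Split 'name[a, b]' into ('name', ['a', 'b'])."""
    match = SPEC_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"cannot parse spec: {spec!r}")
    labels_text = match.group("labels") or ""
    labels = [part.strip() for part in labels_text.split(",") if part.strip()]
    return match.group("name"), labels


def resolve_run_deps(spec: str) -> List[str]:
    """Required deps plus the deps of every requested label, without duplicates."""
    _name, labels = parse_spec(spec)
    if "all" in labels:
        labels = list(OPTIONAL)  # 'all' expands to every labelled section
    deps = list(REQUIRED)
    for label in labels:
        if label not in OPTIONAL:
            raise KeyError(f"unknown optional section: {label!r}")
        for dep in OPTIONAL[label]:
            if dep not in deps:
                deps.append(dep)
    return deps


if __name__ == "__main__":
    print(resolve_run_deps("mypackage"))
    print(resolve_run_deps("mypackage[pyqt]"))
    print(resolve_run_deps("mypackage[all]"))
```

The sketch only covers the expansion step; it says nothing about how conda would remember which labels were requested for an installed package.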

@msarahan
Contributor

Fwiw, I really like this syntax.

@mcg1969
Contributor

mcg1969 commented Jan 25, 2016

I think something like this is a good idea. However, it's important to note that it depends on the underlying package being able to gracefully handle the different combinations of dependencies. It's also similar in execution to our with_features_depends system.

@jankatins

IMO, the conda install mypackage[all] syntax is not a good idea, as it is very Python-specific. It also doesn't address all the issues:

  • You need to be able to add files to specific packages (matplotlib might build a Qt backend, but this backend should be in a different package).
  • The above example assumes that the package is built at install time (at least if sphinx is a requirement for the docs package), but that doesn't work: sphinx needs to be installed for the build, and the docs probably do not have any run requirements.

I would find it better if there were an additional way to declare binary packages that take specific files, with the rest going to the main package. Like:

package:
  name: mypackage
  version: 1.0.0

  requirements:
    build:
      # build requirements are for all packages...
      - python
      - .... all the rest, including the qt dependencies...

    run:
      - python
      - numpy

binary-package:
  name: mypackage-pyqt
  run-requirements:
    - pyqt
    - matplotlib {PACKAGE_VERSION} # replaced by the complete version of this package
  files:
    include:
      - pyqt/*.*
    exclude:
      - pyqt/README.md

binary-package:
  name: mypackage-docs
  run-requirements:
    - matplotlib {PACKAGE_VERSION} # replaced by the complete version of this package
  files:
    include:
      - docs/*.*
    exclude:
      - docs/README.md

binary-package:
  name: mypackage-tests
  run-requirements:
    - nose
    - mock
    - matplotlib {PACKAGE_VERSION} # replaced by the complete version of this package
  files:
    include:
      - src/matplotlib/tests

This would build 4 packages: mypackage-tests, mypackage-docs, mypackage-pyqt and mypackage. Each package can be installed as a normal package. In this case, the three additional packages depend on the exact version of the main package, so that updates to e.g. mypackage-pyqt will also update the main package and keep them in sync.

See also the debian dir for the matplotlib Debian package, which works similarly, only the above info is split across multiple files: https://anonscm.debian.org/cgit/python-modules/packages/matplotlib.git/tree/debian

  • control defines the packages and their dependencies
  • *.install tells the build process which files belong to which package
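The include/exclude idea from the recipe sketch above could work roughly like this. It is a minimal illustration under stated assumptions, not conda-build code: the `SUBPACKAGES` table and `assign_files` are hypothetical, and the patterns are matched with Python's `fnmatch`, whose `*` also matches path separators (a real implementation might choose stricter glob semantics).

```python
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical sub-package rules, taken from the recipe sketch above.
SUBPACKAGES: Dict[str, Dict[str, List[str]]] = {
    "mypackage-pyqt": {"include": ["pyqt/*.*"], "exclude": ["pyqt/README.md"]},
    "mypackage-docs": {"include": ["docs/*.*"], "exclude": ["docs/README.md"]},
}


def assign_files(files: List[str], main: str = "mypackage") -> Dict[str, List[str]]:
    """Give each built file to the first sub-package that claims it, else to `main`."""
    result: Dict[str, List[str]] = {main: []}
    for name in SUBPACKAGES:
        result[name] = []
    for path in files:
        for name, rules in SUBPACKAGES.items():
            included = any(fnmatch(path, pattern) for pattern in rules["include"])
            excluded = any(fnmatch(path, pattern) for pattern in rules["exclude"])
            if included and not excluded:
                result[name].append(path)
                break
        else:
            # No sub-package claimed the file, so the main package takes it.
            result[main].append(path)
    return result


if __name__ == "__main__":
    built = ["pyqt/backend.py", "pyqt/README.md", "docs/index.html", "core.py"]
    for package, paths in assign_files(built).items():
        print(package, paths)
```

Files that match an exclude pattern fall through to the main package, which matches the "rest is taken by the main package" behaviour described above.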

@mcg1969
Contributor

mcg1969 commented Jan 25, 2016

Upon further reflection, it seems to me that the "package feature" idea really can't be fully executed without some significant improvements to conda's internals. In particular, this will require some sort of persistent storage to save which features the user has installed, and conda currently doesn't save details like this. In fact, conda really doesn't save data about the current environment beyond the list of packages installed. We have some good ideas about what we could do if we did give environments a database of metadata to play with; this would be yet another thing.

@JanSchulz's idea looks more feasible, because it relies on the standard package/dependency mechanism.

In the meantime, I do think that a lot of the work here can be accomplished on the package side (i.e., without conda changes). For instance, conda packages should nominally specify their minimal set of dependencies. If there is functionality that requires additional packages, that can be offered in the documentation, and the code should be written in such a manner as to detect the presence of those dependencies and fail gracefully if they are absent.
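One common way to "detect the presence of those dependencies and fail gracefully" on the package side is a small import helper. The sketch below uses only the standard library; the helper names `has_module` and `optional_import`, and the example install hint, are assumptions made for illustration.

```python
import importlib
import importlib.util
from types import ModuleType


def has_module(module_name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def optional_import(module_name: str, feature: str, hint: str) -> ModuleType:
    """Import an optional dependency, or explain which feature needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'. "
            f"Install it with: {hint}"
        ) from exc


def plot(data):
    # The optional dependency is imported only when the feature is used, so
    # the rest of the package keeps working when matplotlib is absent.
    pyplot = optional_import(
        "matplotlib.pyplot", "Plotting", "conda install matplotlib"
    )
    return pyplot.plot(data)
```

Deferring the import to the function that needs it keeps the package's hard dependency list minimal while still giving users an actionable message.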

@jankatins

@mcg1969 There are two sides: IMO the only changes needed for my proposal would be on the conda-build side: instead of one package produce multiple. Conda itself wouldn't need any changes.

But then the user would have to do the work, e.g. selecting the right backend. To make that easier, relationships beyond depends would be needed. One is alternatives: matplotlib depends on matplotlib-backend | matplotlib-backend-qt, each backend provides the matplotlib-backend package, and if none is already installed, conda installs the one after the |. Another is enhances/recommends: depending on what the user put in their config, a recommended package gets installed or not, and removing a recommended package is possible.
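The alternatives idea can be illustrated with a short Python sketch. This is not something conda implements: the `PROVIDES` table, the extra `matplotlib-backend-agg` package, and `resolve_alternatives` are hypothetical, and the rule "install the one after the |" is taken directly from the description above.

```python
from typing import Dict, List, Set

# Hypothetical "provides" table: concrete packages that satisfy a virtual name.
PROVIDES: Dict[str, str] = {
    "matplotlib-backend-agg": "matplotlib-backend",
    "matplotlib-backend-qt": "matplotlib-backend",
}


def resolve_alternatives(alternatives: List[str], installed: Set[str]) -> List[str]:
    """Return the packages to install for a dependency written as 'a | b'.

    `alternatives` holds the names in written order. If an installed package
    is, or provides, any of them, nothing is installed; otherwise the last
    alternative (the one after the '|') is chosen as the default.
    """
    satisfied = set(installed)
    satisfied.update(PROVIDES[pkg] for pkg in installed if pkg in PROVIDES)
    if any(alt in satisfied for alt in alternatives):
        return []
    return [alternatives[-1]]


if __name__ == "__main__":
    dependency = ["matplotlib-backend", "matplotlib-backend-qt"]
    print(resolve_alternatives(dependency, set()))
    print(resolve_alternatives(dependency, {"matplotlib-backend-agg"}))
```

With an agg-only backend already installed, the dependency is satisfied through the provides relationship and Qt is never pulled in.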

@jankatins

See also #1696

@jbednar

jbednar commented Feb 5, 2016

"For instance, conda packages should nominally specify their minimal set of dependencies. If there is functionality that requires additional packages, that can be offered in the documentation, and the code should be written in such a manner as to detect the presence of those dependencies and fail gracefully if they are absent."

I agree with the principle here, but I don't think it really works in practice. I think there are a lot of cases where a package is not strictly a dependency, yet it's something that will be wanted by 80% or 90% of users of another package. Asking each one of those users to discover and install the recommended packages separately is a big burden on them, and requires a lot of human communication via documentation at a time when people are not likely to be reading it (when they are first trying things out and getting started, not diving deep). What will most likely happen in those cases is that the "recommended" packages just won't get installed, and then users will miss out on features that most people will want. Yet if those recommended packages are listed as hard dependencies, then there is no way that the software can be installed without them, which can be a major problem for some subset of users (when there are licensing issues, as for Qt with matplotlib; when some packages aren't available on certain platforms due to lack of binaries; or just because the dependencies are really big). NumPy, for instance, can be installed with no dependencies, yet most users will want the associated numerical libraries, and if it's installed without them they may give up on NumPy because they think it's too slow. So it would be really great if recommended packages could be installed by default, unless overridden explicitly to obtain a minimal installation.

Yes, you can always solve this problem with metapackages, but that's a very heavyweight solution, multiplying the number of packages greatly. I do think it's the right approach for Matplotlib, which is already splitting into separate packages to avoid requiring Qt, but it seems like quite a burden on package maintainers in general, with the predictable result that they'll usually keep the 80% of users happy while making life very painful for the other 20%.

@jbednar

jbednar commented Jul 18, 2016

Has there been any progress on this issue? We remain very frustrated when trying to make conda packages, being unable to specify optional packages without having them become strict dependencies!

@maartenbreddels

As an intermediate option, I'd love to see something like

conda install matplotlib --skip-dep=pyqt

I like that all batteries are included by default, but for people who know what they are doing, such an option would be useful for conda and pip. There is a --no-deps option, but that excludes everything.

@kalefranz
Contributor

Part of conda 4.4

@jbednar

jbednar commented Apr 27, 2017

As conda 4.4 has not been released, is there some PR you can point to (or other documentation) explaining what's been implemented?

@kalefranz
Contributor

The PR for conda is #4982. Work now needs to be done for conda-build and documentation.

I've created an issue in conda-build to track progress there. conda/conda-build#1964

@github-actions

Hi there, thank you for your contribution to Conda!

This issue has been automatically locked since it has not had recent activity after it was closed.

Please open a new issue if needed.

@github-actions github-actions bot added the locked [bot] locked due to inactivity label Oct 31, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 31, 2021