Conda solver slowdown FAQ and recommendations #13774
This issue is intended to keep the community up to date about the current state of the conda solver, how you can improve things, and what we are working on to make it better.
What is the problem?
Conda currently uses a SAT (Boolean satisfiability) solver to figure out the correct, and hopefully working, set of packages required to construct a functional environment. This involves downloading the package index, cutting down the search space, traversing the dependency graph, inspecting the pinnings, and so on.
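To make "figuring out a working set of packages" concrete, here is a minimal sketch of the underlying problem: pick one version per package so that every dependency constraint holds. It is a brute-force search over a tiny, hypothetical index (`INDEX`, `solve` and the package data are made up for illustration). Conda itself hands the problem to a real SAT solver as Boolean clauses rather than enumerating combinations like this.

```python
from itertools import product

# Hypothetical toy index: package -> {version: [(dependency, allowed versions)]}.
INDEX = {
    "deseq2": {"1.18": [("r-base", {"3.4"})], "1.22": [("r-base", {"3.5"})]},
    "tool": {"1.0": [("r-base", {"3.4"})], "2.0": [("r-base", {"3.5"})]},
    "r-base": {"3.4": [], "3.5": []},
}


def solve(requested, pins=None):
    """Return {package: version} satisfying every constraint, or None.

    `requested` lists package names; `pins` optionally restricts a package to a
    set of versions. Brute force only; a real SAT solver prunes instead.
    """
    pins = pins or {}
    # Collect every package reachable from the request through dependencies.
    names, stack = set(), list(requested)
    while stack:
        name = stack.pop()
        if name not in names:
            names.add(name)
            for deps in INDEX[name].values():
                stack.extend(dep for dep, _ in deps)
    names = sorted(names)
    # Newest first (plain string sort is enough for this toy data), so the
    # first consistent combination prefers recent versions.
    choices = [
        [v for v in sorted(INDEX[n], reverse=True) if n not in pins or v in pins[n]]
        for n in names
    ]
    for combo in product(*choices):
        env = dict(zip(names, combo))
        if all(env[dep] in allowed
               for name, version in env.items()
               for dep, allowed in INDEX[name][version]):
            return env
    return None


print(solve(["deseq2", "tool"]))
# {'deseq2': '1.22', 'r-base': '3.5', 'tool': '2.0'}
print(solve(["deseq2", "tool"], pins={"tool": {"1.0"}}))
# {'deseq2': '1.18', 'r-base': '3.4', 'tool': '1.0'}
```

Pinning `tool` to 1.0 forces the search back to older `deseq2` and `r-base` versions; with thousands of packages and versions, this kind of search is what gets expensive.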
Conda/Bioconda is special in that we have thousands of Python and R packages. Recently, we’ve begun adding entire Bioconductor releases, each with thousands of packages. Conda supports mixed environments, like Python+R+Perl, and does not remove old packages from the index. On the one hand, this enables reproducibility in the future (need an old version of an R package or deepTools? No problem); on the other hand, it results in an incredibly large search space for the dependency solver to traverse. So, in contrast to other package managers, the Conda index is constantly growing and we are currently not cutting out dead wood.
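A rough way to see why never removing old packages hurts: if the solver could pick any available version of each package, the number of candidate environments is the product of the per-package version counts. The sketch below uses made-up counts (10 hypothetical packages with 5 versions each) and ignores the pruning a real SAT solver does, so treat it as an upper bound rather than a measure of conda's actual work.

```python
import math


def search_space(version_counts):
    """Naive upper bound on candidate environments: one version choice per package."""
    return math.prod(version_counts.values())


# Hypothetical: 10 packages, each with 5 versions kept in the index.
today = {f"pkg{i}": 5 for i in range(10)}
# The same packages after each one gains a single new release
# (old releases are kept for reproducibility).
later = {name: count + 1 for name, count in today.items()}

print(search_space(today))  # 9765625 (5**10)
print(search_space(later))  # 60466176 (6**10)
```

One extra release per package multiplies the bound by about six in this example; adding a whole Bioconductor release adds many packages at once.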
So we do face a special situation in Conda. Please take this into account when considering Conda’s performance. Yes, Conda is slow and will probably never be as fast as other package managers, because Conda is vastly larger and supports scientific use cases that others do not support.
How to improve solver performance
Conda is especially slow if R is involved. The reasons are historical: most of the packages are in all 3 supported channels (anaconda, conda-forge, bioconda). This was our fault. However, things should improve dramatically if you install the latest available version, e.g. bioconductor-deseq2=1.22.1. We’ve learned from past issues and now pin to one particular R version. Old packages are still around, though, for the sake of reproducibility.
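Installing an explicit version (e.g. `bioconductor-deseq2=1.22.1`) and pinning R helps because it removes candidates before the solver even starts. The sketch below filters a hypothetical index (the version lists are illustrative, not real channel contents); the matching loosely mirrors conda's `=` spec, where `1.22` also matches `1.22.1`.

```python
import math


def matches(version, pin):
    """Loose prefix match in the spirit of conda's `pkg=1.22` spec."""
    return version == pin or version.startswith(pin + ".")


def apply_pins(index, pins):
    """Keep only versions matching a pin; unpinned packages keep every version."""
    return {
        name: [v for v in versions if name not in pins or matches(v, pins[name])]
        for name, versions in index.items()
    }


# Hypothetical version lists for three packages.
index = {
    "r-base": ["3.3.2", "3.4.1", "3.5.1"],
    "bioconductor-deseq2": ["1.16.1", "1.18.1", "1.20.0", "1.22.1"],
    "r-ggplot2": ["2.2.1", "3.0.0", "3.1.0"],
}
pinned = apply_pins(index, {"r-base": "3.5.1", "bioconductor-deseq2": "1.22"})

print(math.prod(len(v) for v in index.values()))   # 36 combinations
print(math.prod(len(v) for v in pinned.values()))  # 3 combinations
```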
Use pins, install packages with versions. Even
A few recommendations, especially for environments with R inside:
conda install pycryptosat
conda config --set sat_solver pycryptosat
conda config --set channel_priority strict
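`channel_priority strict` shrinks the search space further: once a package name is found in a higher-priority channel, versions of it in lower-priority channels are no longer considered. A minimal sketch of that rule, with a hypothetical `index` mapping channel to package to versions (the helper name and data are illustrative):

```python
def candidates(package, channels, index, strict=True):
    """Return (channel, version) pairs the solver may consider for `package`.

    `channels` is ordered highest priority first; `index` maps
    channel -> package name -> list of versions (hypothetical data).
    """
    found = [(ch, v) for ch in channels for v in index.get(ch, {}).get(package, [])]
    if not strict or not found:
        return found
    top_channel = found[0][0]  # highest-priority channel that has the package
    return [(ch, v) for ch, v in found if ch == top_channel]


index = {
    "conda-forge": {"r-base": ["3.5.1"]},
    "bioconda": {},
    "defaults": {"r-base": ["3.4.2", "3.5.0", "3.5.1"]},
}
channels = ["conda-forge", "bioconda", "defaults"]

print(candidates("r-base", channels, index))                # strict: 1 candidate
print(candidates("r-base", channels, index, strict=False))  # flexible: 4 candidates
```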
Cutting down the search space
Please have a look at https://github.com/regro/conda-metachannel. Conda metachannels are a work in progress, but they will allow users to specify up front the portion of the graph they care about. It is very rare that users actually need ALL of the packages in bioconda/conda-forge. Think of it as a constrained channel: only a specific set of your packages appears in this special channel. All others are unavailable, so you cannot recreate a 3-year-old environment with this channel. However, if you have that use case, you can simply switch back to the normal channels.
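The constrained-channel idea can be sketched as a dependency closure: start from the packages you ask for, follow their dependencies, and drop everything else from the index. This is a hypothetical illustration (the `index` maps a package name to the names it depends on, versions ignored) and does not claim to reflect how conda-metachannel is implemented.

```python
def constrained_channel(index, wanted):
    """Sub-index with `wanted` and everything they transitively depend on."""
    keep, stack = set(), list(wanted)
    while stack:
        name = stack.pop()
        if name in keep or name not in index:
            continue
        keep.add(name)
        stack.extend(index[name])
    return {name: index[name] for name in sorted(keep)}


# Hypothetical dependency lists (names only).
index = {
    "cnvkit": ["python", "pandas", "r-base"],
    "pandas": ["python", "numpy"],
    "numpy": ["python"],
    "python": [],
    "r-base": [],
    "deeptools": ["python", "numpy"],
    "bioconductor-deseq2": ["r-base"],
}

print(sorted(constrained_channel(index, ["cnvkit"])))
# ['cnvkit', 'numpy', 'pandas', 'python', 'r-base']
```

Anything outside the closure (here `deeptools` and `bioconductor-deseq2`) never reaches the solver.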
Maybe we should have this for our community at some point. The idea could be to have all recent (~2 years) packages in this channel, while all others remain available to reproduce old environments. Start a discussion!
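Such a "recent packages" channel could be derived by splitting the index on build age. The sketch assumes each record carries a `timestamp` in milliseconds since the epoch (many package records in `repodata.json` have one, but older builds may not; records without it are treated as old here). The ~2-year window and the records themselves are illustrative.

```python
from datetime import datetime, timedelta, timezone


def split_by_age(records, now, days=730):
    """Split records into (recent, archive) around a cutoff `days` before `now`."""
    cutoff = now - timedelta(days=days)
    recent, archive = [], []
    for rec in records:
        ts = rec.get("timestamp")  # assumed: milliseconds since the epoch
        built = datetime.fromtimestamp(ts / 1000, tz=timezone.utc) if ts else None
        (recent if built and built >= cutoff else archive).append(rec)
    return recent, archive


def ms(year, month, day):
    """Helper for the example: a UTC date as epoch milliseconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


records = [
    {"name": "r-base", "version": "3.2.2", "timestamp": ms(2015, 9, 1)},
    {"name": "r-base", "version": "3.5.1", "timestamp": ms(2018, 7, 10)},
    {"name": "deeptools", "version": "2.0.0"},  # no timestamp -> archive
]
recent, archive = split_by_age(records, now=datetime(2019, 3, 1, tzinfo=timezone.utc))
print([r["version"] for r in recent])   # ['3.5.1']
print([r["version"] for r in archive])  # ['3.2.2', '2.0.0']
```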
Bioconda is prepared
Very early on, we recognised the special challenges Conda is trying to address, and we are prepared for the special use case of long-term reproducibility: BioContainers. The containers are frozen sets of conda environments. A BioContainer is created for every Bioconda package, but you can also create your own. https://usegalaxy.eu currently maintains 1034 environments using BioContainers, and it works well in that demanding setting.
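At the conda level, a "frozen" environment is simply a fully pinned list of `name=version=build` specs, such as the text `conda list --export` writes and `conda create --file` reads back. Here is a small sketch that parses that format; the sample lines and build strings are made up, and BioContainers' own build process is not shown.

```python
def parse_export(text):
    """Parse `conda list --export` style text into (name, version, build) tuples.

    Comment lines (starting with '#') and blank lines are skipped; every other
    line is assumed to have exactly three '='-separated fields.
    """
    specs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, version, build = line.split("=")
        specs.append((name, version, build))
    return specs


# Illustrative export with made-up build strings.
sample = """\
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
r-base=3.5.1=h123abc_0
bioconductor-deseq2=1.22.1=r351_0
"""

for name, version, build in parse_export(sample):
    print(f"{name} pinned to {version} (build {build})")
```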
I recommend BioContainers for static/reproducible environments. For flexible environments we could use a metachannel in the future if we want to maintain this.
That said, I use conda on a daily basis and, with the above recommendations, I do not need a metachannel, as the normal conda solver is fast enough for me. However, I believe the conda community is prepared for the future.
We would like to get feedback; benchmarks and examples help us. What does slow mean? Considering what Conda is doing for you behind the scenes, is 30 s or a minute really slow? Please provide numbers and the exact installation command.
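When reporting numbers, it helps to time the exact command more than once, since the first run may include downloading the package index. A minimal timing helper in plain Python (`time_command` and `report` are our own names, not part of conda):

```python
import shlex
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=3):
    """Run `cmd` (argument list) `repeats` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def report(cmd, timings):
    """One line suitable for pasting into an issue: command plus min/median."""
    return (f"{shlex.join(cmd)}: min {min(timings):.1f} s, "
            f"median {statistics.median(timings):.1f} s over {len(timings)} runs")


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the slow one.
    cmd = [sys.executable, "-c", "pass"]
    print(report(cmd, time_command(cmd)))
```

For example, assuming conda is on your `PATH`, `time_command(["conda", "create", "--dry-run", "-n", "bench", "cnvkit", "r-base=3.5.1"])` times the solve without installing anything.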
Last but not least I would like to thank the conda-forge team, Anaconda and the
Björn and all;
Is this something we want to host/do in a standard way? It might be useful for other projects as well and also would be nice to have a standard set of URLs for these channels. I'd be interested to hear if anyone else has explored this yet.
We were discussing setting up a meta channel where only the last 1-2 years of packages would be included. In theory that could be setup for things like bcbio as well, though I expect that'd be done on the bcbio side (presumably after a convenient "step by step guide" was put together).
Are you using environment yaml files in bcbio to install things? I've been using them in snakePipes and it's generally worked pretty well for our rather complicated environments.
@chapmanb please let us know which exact command is slow and what slow means :)
Btw, are you not using environment.yaml files in bcbio? This should be super fast, as the solver will not be stressed much.
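For reference, an environment file fixes the channels (highest priority first) and the pinned dependencies in one place. The sketch below only renders such a file as text so it stays dependency-free (no YAML library); the environment name, channels and pins are illustrative. The result can be used with `conda env create -f environment.yml`.

```python
def render_environment(name, channels, dependencies):
    """Render a minimal environment.yml; channels are listed highest priority first."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {spec}" for spec in dependencies]
    return "\n".join(lines) + "\n"


text = render_environment(
    "cnvkit-r",
    channels=["conda-forge", "bioconda", "defaults"],
    dependencies=["cnvkit", "r-base=3.5.1"],
)
print(text, end="")
```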
Are there any statistics about the popularity and usage of the R packages, e.g. the ratio of bioconda recipes that depend on R packages, or the number of downloads?
Usually simple solutions win, like Conda itself! Wouldn't it work to simply(?) split and clone the latest version of the recipes into 2-3 newly spawned channels like
From my own experience, bioconda has been much more about Python scripts and libraries, precompiled C++ tools, and some Perl dependencies. I also guess hardcore R devs typically prefer sticking to R's own packaging solutions.
Can you tell us the contents of the environment? BioConda uses conda-forge for a great many dependencies.
On 4 Mar 2019, at 23:37, Jake VanCampen wrote: After no resolution in an overnight conda solving fiasco, I simply removed the conda-forge channel from my conda config and solved in under three minutes, creating a new environment with a package from bioconda.
With conda-forge in my channels and the following config, the solver was still running overnight:
Make sure to put conda-forge before BioConda in your channel list.
On 5 Mar 2019, at 08:35, Björn Grüning wrote: @jakevc following the suggestions above, the following works in seconds for me: conda create -n cnvkit-r cnvkit r-base=3.5.1
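The order matters because `conda config --add channels X` puts X at the top of the list (highest priority), while `--append` puts it at the bottom. A small simulation of the resulting `channels` list; the helper is our own, and moving an already-listed channel rather than duplicating it mirrors what conda reports doing, but treat that detail as an assumption.

```python
def apply_channel_commands(commands, channels=None):
    """Simulate `conda config --add/--append channels NAME` on a channel list.

    `--add` prepends (highest priority), `--append` appends (lowest priority);
    an existing entry is moved rather than duplicated.
    """
    channels = list(channels or [])
    for flag, name in commands:
        if flag not in ("--add", "--append"):
            raise ValueError(f"unsupported flag: {flag}")
        if name in channels:
            channels.remove(name)
        if flag == "--add":
            channels.insert(0, name)
        else:
            channels.append(name)
    return channels


# Adding defaults, then bioconda, then conda-forge leaves conda-forge on top.
order = apply_channel_commands(
    [("--add", "defaults"), ("--add", "bioconda"), ("--add", "conda-forge")]
)
print(order)  # ['conda-forge', 'bioconda', 'defaults']
```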
The recommended channel order did make everything more responsive: