Conda solver slowdown FAQ and recommendations #13774
Comments
Thanks for this. I'm working on a similar post for Anaconda. I'll link to this and also copy some of the suggestions here. |
Björn and all; Is this something we want to host/do in a standard way? It might be useful for other projects as well and also would be nice to have a standard set of URLs for these channels. I'd be interested to hear if anyone else has explored this yet. |
We were discussing setting up a meta channel where only the last 1-2 years of packages would be included. In theory that could be set up for things like bcbio as well, though I expect that'd be done on the bcbio side (presumably after a convenient "step by step guide" was put together). Are you using environment yaml files in bcbio to install things? I've been using them in snakePipes and it's generally worked pretty well for our rather complicated environments. |
@chapmanb please let us know exactly which command is slow and what "slow" means :) Btw., are you not using environment.yaml files in bcbio? This should be super fast as the solver will not be stressed much. |
According to bioconda/bioconda-recipes#13774 this should speed up conda resolutions
Hello @bioconda/core, Is there any statistics about the popularity and usage level of the R packages, e.g. the ratio of bioconda recipes that depend on the R packages or number of downloads? Usually simple solutions win, like the Conda itself! Wouldn't it work to simply(?) split and clone the latest version of the recipes into 2-3 new spawned channels like From my solo experience, bioconda has been much more about the python scripts&libs, pre-compiled C++ tools and also some perl dependencies. Also I guess hardcore R devs typically prefer sticking to R's specific packaging solutions. |
After no resolution in an overnight conda solving fiasco, I simply removed conda-forge channel from my conda config and solved in under three minutes, creating a new environment with a package from bioconda. |
Can you tell us the contents of the environment? BioConda uses conda-forge for a great many dependencies.
|
With conda-forge in my channels and the following config, the solver was still running overnight:
|
@jakevc following the suggestions from above, the following works in seconds for me:
conda create -n cnvkit-r cnvkit r-base=3.5.1 |
Make sure to put conda-forge before Bioconda in your channel list.
|
@jakevc the recommended channel order for using bioconda together with conda-forge changed a few months back. Could you reorder your channels as below?
channels:
  - conda-forge
  - bioconda
  - defaults |
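For reference, the same order can also be written from the command line. This is a minimal sketch, assuming conda 4.6 or newer (needed for the strict setting). The function name `set_bioconda_channels` and the `CONDA` override are hypothetical, introduced here only so the commands can be dry-run with `CONDA=echo`.

```shell
# Hypothetical helper: writes the channel order shown above to the user's condarc.
# `conda config --add` prepends to the list, so the channel added last
# ends up with the highest priority.
set_bioconda_channels() {
    conda_cmd="${CONDA:-conda}"   # assumption: set CONDA=echo to print instead of run
    "$conda_cmd" config --add channels defaults
    "$conda_cmd" config --add channels bioconda
    "$conda_cmd" config --add channels conda-forge
    # With strict priority, a package found in a higher-priority channel
    # is never taken from a lower-priority one.
    "$conda_cmd" config --set channel_priority strict
}
```

Running `CONDA=echo set_bioconda_channels` prints the four commands without touching any configuration, which is a cheap way to check the order before applying it.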
The recommended channels order did make everything more responsive:
|
Relevant: conda/conda#7239 and conda/conda#7700. |
Is there some documentation other than the "Medium" post about how to use metachannel? Looking at the command lines in the post, it seems hard to use. I'm suffering not only from lengthy install times, but the startup time on my WSL box is extremely long, like 30-50s. |
@abalter Hello, I often do this:
function condain
    conda install --override-channels -c https://metachannel.conda-forge.org/defaults,alienzj,pytorch,bioconda,conda-forge/$argv,--max-build-no "$argv"
end
Define a condain function like this in your shell (this one is for the fish shell). Then, for example, condain bwa will install bwa quickly. |
mamba also does a great job of speeding up the solving step. |
@mvdbeek good to know, maybe a good lead. But maybe worth mentioning the Beta status. |
Give conda 4.7 a try. https://www.anaconda.com/how-we-made-conda-faster-4-7/ |
This is best practice as described here: bioconda/bioconda-recipes#13774
The following command:
conda install -c conda-forge -c bioconda 'changeo=0.4.4' 'biopython=1.72' 'python=3.7.1' 'r-data.table=1.11.4' 'bash=4.4.18' 'xlrd=1.2.0' 'r-reshape2=1.4.3' 'r-seqinr=3.4_5' 'r-scales=0.5.0' 'tar=1.34' 'file=5.39' 'r-ggplot2=3.0.0' 'unzip=6.0' -p /usr/local --copy --yes --strict-channel-priority
finishes in 10-15 seconds with strict channel priority. It never finishes (at least not within 3 hours) without it.
Are there any plans to move the bioconda build system to mamba, like conda-forge did? |
Hi all,
this issue is intended to keep the community up to date about the current state of the conda solver, how you can improve things, and what we are working on to make it better.
What is the problem?
Conda currently uses a SAT (Boolean satisfiability) solver to figure out the correct, and hopefully working, set of packages required to construct a functional environment. This means downloading the package index, cutting down the search space, iterating over the dependency graph, inspecting the pinnings and so on.
Conda/Bioconda is special in that we have 1000s of Python and R packages. Recently, we’ve begun adding entire Bioconductor releases, with thousands of packages. Conda supports mixed environments, like Python+R+Perl, and does not remove old packages from the index. On the one hand, this enables reproducibility in the future (Need an old version of an R package or deepTools? No problem.), on the other hand it results in an incredibly large search space for the dependency solver to traverse. So in contrast to other package managers, Conda is constantly growing and we are currently not cutting out dead wood.
So we do face a special situation in Conda. Please take this into account when considering Conda’s performance. Yes, Conda is slow and will probably never be as fast as other package managers because Conda is vastly larger and supports scientific use-cases that others do not support.
However, we are aware of this and multiple people are working on it. See our tips below.
How to improve solver performance
Conda is especially slow if R is involved. This has historical reasons, as most of the packages are in all 3 supported channels (anaconda, conda-forge, bioconda). This was our fault. However, things should improve dramatically if you install the latest version available, e.g. bioconductor-deseq2=1.22.1. We’ve learned from past issues and now pin to one particular R version. However, old packages are still around for the sake of reproducibility.
Use pins, install packages with versions. Even
conda create -n foo python=3 deeptools
will help. You will magically solve all your R envs by simply adding r-base=3.5.1 to your package install list.
Recommendations
A few recommendations, especially for environments with R inside:
* Use the pycryptosat solver (https://www.anaconda.com/conda-4-6-release …)
* Use --strict-channel-priority
* Instead of conda install, use conda create
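The recommendations above can be combined into a single call. The following is only a sketch, assuming conda 4.6 or newer. The wrapper name `create_pinned_env` and the `CONDA` override (for dry runs) are hypothetical. The pinned versions in the usage comment are the ones mentioned earlier in this issue, and pairing them in one environment is an untested illustration.

```shell
# Hypothetical wrapper illustrating the recommendations above.
# Usage: create_pinned_env ENV_NAME SPEC [SPEC...]
create_pinned_env() {
    conda_cmd="${CONDA:-conda}"   # assumption: set CONDA=echo to print instead of run
    env_name="$1"
    shift
    # Create a fresh environment rather than installing into an existing one,
    # with strict channel priority and conda-forge listed before bioconda.
    "$conda_cmd" create --name "$env_name" \
        --strict-channel-priority \
        --channel conda-forge --channel bioconda \
        "$@"
}

# Example with pinned versions (specs taken from the text above):
#   create_pinned_env deseq2 r-base=3.5.1 bioconductor-deseq2=1.22.1
```

Passing explicit versions as the trailing arguments is what keeps the search space small. The wrapper itself adds nothing beyond fixing the channel order and the priority flag.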
Different people from the community are trying to improve the solver or are using different strategies to improve the situation. This is, and will probably always be, a work in progress. Conda will grow, and Anaconda and the community will improve things as we go.
Cutting down the search space
Please have a look at https://github.com/regro/conda-metachannel. Conda metachannels are a work in progress but will allow users to specify the portion of the graph they care about upfront. It is very rare that users will actually need ALL of the packages in bioconda/conda-forge. Think of it as a constrained channel: only a specific set of packages appears in this special channel. All others are not available, so you cannot recreate a 3-year-old environment with this channel. However, if you have this use case you can just switch back to the normal channels.
Maybe we should have this at some point for our community. The idea could be to have all recent (~2 years) packages in this space, with all others still available to reproduce old envs. Start a discussion!
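To make the idea concrete, here is a small sketch that builds a metachannel URL in the shape used in the comments on this issue (`<channels>/<packages>,--max-build-no`). The helper name `metachannel_url` is hypothetical, and joining several packages with commas is an assumption. Whether the hosted instance at metachannel.conda-forge.org is reachable is also not something this sketch can guarantee.

```shell
# Hypothetical helper: builds a metachannel URL of the form
#   https://metachannel.conda-forge.org/<channels>/<packages>,--max-build-no
# Usage: metachannel_url CHANNELS PACKAGE [PACKAGE...]
metachannel_url() {
    channels="$1"   # comma-separated channel list, e.g. "conda-forge,bioconda"
    shift
    # Join the remaining arguments with commas (assumed constraint syntax).
    packages="$(IFS=,; printf '%s' "$*")"
    printf 'https://metachannel.conda-forge.org/%s/%s,--max-build-no\n' "$channels" "$packages"
}

# Example, restricting the index to the part of the graph that bwa needs:
#   conda install --override-channels -c "$(metachannel_url conda-forge,bioconda bwa)" bwa
```

The point of `--override-channels` in the example is that only the constrained channel is searched, so the solver never sees the full bioconda and conda-forge indexes.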
Bioconda is prepared
Very early on we recognised the special challenges that Conda is trying to face and we are prepared for the special use-case of long-term reproducibility - BioContainers. The containers are frozen sets of conda environments. A BioContainer is created for every Bioconda package, but you can also create your own. https://usegalaxy.eu is maintaining 1034 environments currently using BioContainers and it works well in that demanding environment.
Read more about this in our manuscript.
I recommend BioContainers for static/reproducible environments. For flexible environments we could use a metachannel in the future if we want to maintain this.
That said, I use conda on a daily basis and with the above recommendation I do not need a metachannel, as the normal conda solver is fast enough for me. However, I believe the conda community is prepared for the future.
Feedback
We would like to get feedback, benchmarks and examples to help us. What does slow mean? Considering what Conda is doing for you behind the scenes, is 30s or a minute really slow? Please provide numbers and the exact installation command.
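One way to produce such numbers is to time a dry run, which resolves the environment without creating it. This is a rough sketch. The function name `time_solve`, the environment name `solver-benchmark` and the `CONDA` override are hypothetical, and whole-second resolution from `date` is assumed to be good enough for reports of this kind.

```shell
# Hypothetical helper: times how long conda needs to resolve a set of specs.
# Usage: time_solve SPEC [SPEC...]
time_solve() {
    conda_cmd="${CONDA:-conda}"   # assumption: override for testing, e.g. CONDA=true
    start="$(date +%s)"
    # --dry-run resolves the environment but does not create it.
    "$conda_cmd" create --name solver-benchmark --dry-run \
        --channel conda-forge --channel bioconda "$@" >/dev/null 2>&1
    status=$?
    end="$(date +%s)"
    # Report the exit status, elapsed seconds and the exact specs that were solved.
    echo "exit=$status seconds=$((end - start)) specs=$*"
}

# Example (specs from a comment on this issue):
#   time_solve cnvkit r-base=3.5.1
```

Including the output of `conda --version` and `conda config --show channels` alongside the timing makes reports easier to compare.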
Last but not least I would like to thank the conda-forge team, Anaconda and the @bioconda/core team that are constantly working on all the packages and trying to keep things fast and reliable even with 100k packages.