Advice for how to cache the conda installation? #13

Closed
dhimmel opened this issue Feb 29, 2020 · 13 comments

@dhimmel

dhimmel commented Feb 29, 2020

We use this action for Manubot, see workflow and example build. It's great, thanks for making it! Our current configuration is:

      - name: Install Environment
        uses: goanpeca/setup-miniconda@v1
        with:
          activate-environment: manubot
          environment-file: build/environment.yml
          auto-activate-base: false
          miniconda-version: 'latest'

In manubot/rootstock#316, I'm looking into using actions/cache to cache a directory of data created during action runtime.

However, the slowest part of our build is often installing the conda environment, which currently takes about 100 seconds of the 200-second total time. So I was wondering whether the conda environment would be cacheable.

For example, what about the following:

      - name: Cache Conda
        id: cache-conda
        uses: actions/cache@v1
        with:
          path: WHAT_GOES_HERE
          key: conda-cache-${{ hashFiles('build/environment.yml') }}
      - name: Some conditional set here?
        if: steps.cache-conda.outputs.cache-hit != 'true'
        uses: goanpeca/setup-miniconda@v1

Basically, our goal is to use a pre-existing conda installation unless our environment.yml specification changes. @goanpeca does that sound like a good idea? Any advice on what directory we should cache and how to configure this action to use a cached directory?

@goanpeca
Member

goanpeca commented Mar 1, 2020

If you do not set miniconda-version to 'latest' and simply leave that key undefined, this action will use the conda bundled in the container, which is good enough for most use cases. I will look into caching and document the available env variables so you can use them as needed.

Will need a couple of days though!

Thanks for making this action better and for the kind words!

@goanpeca goanpeca self-assigned this Mar 1, 2020
@dhimmel
Author

dhimmel commented Mar 2, 2020

If you do not use the ‘latest’ but just do not define that key this action will use the bundled conda in the container which is good enough for most use cases

Good to know!

Will need a couple of days though!

No rush! Thanks for looking into this.

@goanpeca
Member

goanpeca commented Mar 3, 2020

@dhimmel just to check, did the situation improve with

      - name: Install Environment
        uses: goanpeca/setup-miniconda@v1
        with:
          activate-environment: manubot
          environment-file: build/environment.yml
          auto-activate-base: false

?

@dhimmel
Author

dhimmel commented Mar 3, 2020

Removed miniconda-version in manubot/rootstock@544458a. Conda installation times don't seem to decrease. Build time went from 1:24 to 1:40 (but probably not due to external variation).

I don't think that installing miniconda is actually taking a large amount of the time. Most of the time is being consumed by having to resolve the dependencies when installing the environment.yml. Therefore, I'm hoping for a solution that only needs to resolve dependencies when the actual environment spec changes.

@goanpeca
Member

goanpeca commented Mar 3, 2020

I could add beta support for using mamba instead of conda 🤷‍♂

https://github.com/QuantStack/mamba

?

@dhimmel
Author

dhimmel commented Mar 3, 2020

Not sure about mamba. Not sure how likely it would be to behave differently than conda and cause headaches.

I was thinking more along the lines of the solution in this CI config. Are there directories we can cache that would cache the package download or dependency files / resolution steps?

Another example at theochem/cgrid#8 (comment)

@goanpeca
Member

goanpeca commented Mar 3, 2020

Are there directories we can cache that would cache the package download or dependency files / resolution steps?

Yes and no. GitHub imposes size limits on the cache you can store, so if your dev environment requires too many packages, caching will fail.

In the meantime, could you try running conda config --show on your CI and cache whatever you consider appropriate?

These values will change for different OSes, but those are the directories that store the downloaded packages.
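
As a rough sketch of that suggestion, a step like the one below would print the relevant directories in the build log. The step name is made up for illustration, and it assumes the step runs after the Install Environment step with a login shell so that conda is on the PATH. `pkgs_dirs` and `envs_dirs` are standard conda configuration keys.

```yaml
      - name: Show conda directories  # hypothetical step name
        # Assumption: a login shell is needed so conda is available in this step.
        shell: bash -l {0}
        run: |
          # Print only the package cache and environment directories.
          conda config --show pkgs_dirs envs_dirs
          # Broader summary of the active conda installation.
          conda info
```

The first entry under `pkgs_dirs` in the output is the candidate for the `path` value of a cache step.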

@goanpeca
Member

goanpeca commented Mar 3, 2020

Also, the conda solver has been getting horribly slow for some time now (years?), which is why mamba exists.

@goanpeca
Member

goanpeca commented Mar 3, 2020

Also, if you are not testing against different Python versions (or different package versions) and always run the same thing, then you could try caching the pkgs_dirs entry from the config.
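
A minimal sketch of what that could look like, assuming the package cache lives at `/usr/share/miniconda/pkgs` on the Linux runner. That path is an assumption (the default `pkgs` folder under the conda root) and should be checked against the output of `conda config --show pkgs_dirs`. The key reuses the `hashFiles` expression from the first post.

```yaml
      - name: Cache conda packages
        uses: actions/cache@v1
        with:
          # Assumed location; confirm with `conda config --show pkgs_dirs`.
          path: /usr/share/miniconda/pkgs
          # A new cache entry is created whenever environment.yml changes.
          key: conda-pkgs-${{ hashFiles('build/environment.yml') }}
      - name: Install Environment
        uses: goanpeca/setup-miniconda@v1
        with:
          activate-environment: manubot
          environment-file: build/environment.yml
          auto-activate-base: false
```

This only avoids re-downloading packages. The dependency solve still runs on every build.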

@dhimmel
Author

dhimmel commented Mar 3, 2020

Take a look at manubot/rootstock#319. It seems like it does save above 40 seconds by caching the pkg_dir at /usr/share/miniconda/envs/manubot.

According to the build log, the cache size is ~193 MB. So quite hefty but still fits. Now I wonder whether there is a way to cache more to cut down on the remaining time (which I think is probably dependency management).

Also if you are not testing for different python versions (or different packages versions) and always run the same thing

This is the case, but even if not I think creating different cache keys for different OSes solves the problem.
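
For reference, the cache described in this comment could look roughly like the following. This is a sketch, not the exact configuration in manubot/rootstock#319: the path is the one quoted above, the key reuses the `hashFiles` expression from the first post, and the `runner.os` prefix is an added assumption for keeping caches from different OSes apart. Whether the action reuses a restored environment directory unchanged is not confirmed here.

```yaml
      - name: Cache Conda
        uses: actions/cache@v1
        with:
          # Environment directory reported in this thread for the Linux runner.
          path: /usr/share/miniconda/envs/manubot
          # Separate entries per OS; invalidated when environment.yml changes.
          key: ${{ runner.os }}-conda-cache-${{ hashFiles('build/environment.yml') }}
      - name: Install Environment
        uses: goanpeca/setup-miniconda@v1
        with:
          activate-environment: manubot
          environment-file: build/environment.yml
          auto-activate-base: false
```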

@goanpeca
Member

goanpeca commented Mar 3, 2020

This is the case, but even if not I think creating different cache keys for different OSes solves the problem.

Sure, I mean if you are running the Linux container to test things with Py37 and Py38. Caching pkg_dir will only help if the conda packages are noarch or do not depend on Python; otherwise the caching won't help much, so it really does depend on your workflow and your environment.
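
One way to handle that case, sketched under assumptions, is to put the Python version into the cache key so each matrix job gets its own entry. The matrix variable name `python-version` and the cache path are illustrative and do not come from any workflow in this thread.

```yaml
    strategy:
      matrix:
        # Hypothetical matrix for the Py37/Py38 case described above.
        python-version: ['3.7', '3.8']
    steps:
      - name: Cache conda packages
        uses: actions/cache@v1
        with:
          # Assumed location; confirm with `conda config --show pkgs_dirs`.
          path: /usr/share/miniconda/pkgs
          # One cache entry per OS and Python version, so the jobs do not share packages.
          key: conda-pkgs-${{ runner.os }}-py${{ matrix.python-version }}-${{ hashFiles('build/environment.yml') }}
```

The trade-off is more cache entries, each of which counts toward the GitHub cache size limits mentioned earlier.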

As I said, conda has become really slow :-\

@nh13 nh13 mentioned this issue Mar 4, 2020
@wasdee

wasdee commented Mar 23, 2020

What is the guideline for enabling the cache?
From my understanding, don't set miniconda-version to latest, right?

@goanpeca
Member

Correct, do not specify the miniconda version at all and don't enable the conda auto-update option. That will use everything already installed in the container.
