Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow caching of repodata from file:/// channels #9730

Merged

Conversation

LorcanHamill
Copy link
Contributor

@LorcanHamill LorcanHamill commented Mar 3, 2020

(This is the second attempt at this, sorry for any inconvenience.)

As discussed in issue #9661 the performance of "file:///" channels is very poor in some circumstances with the latest conda versions. This seems to be because the solver requests repodata about directories more often than it used to. There is a caching mechanism for repodata but it is explicitly disabled for local channels; this PR removes that check so that the caching mechanism works for file:/// channels in the same way as HTTP channels.

That broke some unit tests, because of the way that an environment variable can be set to force conda to use only .tar.bz2 files, not .conda files. The tests try both settings, which changes the repodata that is fetched and should therefore be cached. The tests fail because the wrong version of the data appears in the cache. This could never happen in production as the cache is in memory. So it starts out empty when the conda command is invoked, and the command will only see one setting of the environment variable, so the cache will only ever have the appropriate data. To fix the tests, I added a method to clear the cache and invoked it before each test.

This change makes a huge difference to my company's use of conda. We use large conda channels on relatively slow NFS volumes, and without this change the creation of a new environment takes minutes when it previously took a few seconds.

Fixes #9661

@cla-bot
Copy link

cla-bot bot commented Mar 3, 2020

We require contributors to sign our Contributor License Agreement, and we don't have one on file for @LorcanHamill. In order for us to review and merge your code, please e-sign the PDF at https://conda.io/en/latest/contributing.html#conda-contributor-license-agreement. We then need to manually verify your signature. We will ping the bot to refresh the PR status when we have confirmed your signature.

@LorcanHamill
Copy link
Contributor Author

Okay, this is a new PR that just adds a class method that is not (yet) called from anywhere. And yet lots of tests are broken. I would be very grateful if someone could explain what I've done wrong..?

@LorcanHamill
Copy link
Contributor Author

LorcanHamill commented Mar 3, 2020

To be clear, this draft PR is just meant to be a starting point; I suspected that the tests might fail, and they did, but I can't see how my code change could have caused that. When the tests are passing I'll update the PR to have a full implementation of the new caching behaviour.

@LorcanHamill
Copy link
Contributor Author

I've been away from this for some time but hope to start work again on it shortly. I've just signed the CLA again; I think it may not have worked previously because my GitHub profile somehow didn't include my email address. Hopefully the CLA process will work this time...

@LorcanHamill LorcanHamill force-pushed the allow-repodata-caching-for-local-channels branch from 3733364 to 92b7fbb Compare April 17, 2020 16:00
@cla-bot
Copy link

cla-bot bot commented Apr 17, 2020

We require contributors to sign our Contributor License Agreement, and we don't have one on file for @LorcanHamill. In order for us to review and merge your code, please e-sign the PDF at https://conda.io/en/latest/contributing.html#conda-contributor-license-agreement. We then need to manually verify your signature. We will ping the bot to refresh the PR status when we have confirmed your signature.

@LorcanHamill
Copy link
Contributor Author

Could someone please check why the "CLA-Signed" check is failing on this PR? I've signed the CLA at least three times, and the bot eventually marked a previous PR (#9677) with the "cla-signed" label on March 3. Thanks!

@LorcanHamill LorcanHamill force-pushed the allow-repodata-caching-for-local-channels branch from 92b7fbb to 21a1809 Compare April 20, 2020 12:13
@cla-bot
Copy link

cla-bot bot commented Apr 20, 2020

We require contributors to sign our Contributor License Agreement, and we don't have one on file for @LorcanHamill. In order for us to review and merge your code, please e-sign the PDF at https://conda.io/en/latest/contributing.html#conda-contributor-license-agreement. We then need to manually verify your signature. We will ping the bot to refresh the PR status when we have confirmed your signature.

@jjhelmus
Copy link
Contributor

@cla-bot check

@cla-bot cla-bot bot added the cla-signed [bot] added once the contributor has signed the CLA label Apr 20, 2020
@cla-bot
Copy link

cla-bot bot commented Apr 20, 2020

The cla-bot has been summoned, and re-checked this pull request!

@jjhelmus
Copy link
Contributor

Could someone please check why the "CLA-Signed" check is failing on this PR? I've signed the CLA at least three times, and the bot eventually marked a previous PR (#9677) with the "cla-signed" label on March 3. Thanks!

This process is manually. I've added you to the signed list and summoned the bot.

@LorcanHamill
Copy link
Contributor Author

This process is manually. I've added you to the signed list and summoned the bot.

Ah, I see - I had assumed it was an automated process. Thanks, @jjhelmus !

@LorcanHamill LorcanHamill force-pushed the allow-repodata-caching-for-local-channels branch 2 times, most recently from e097c13 to 34c555c Compare April 21, 2020 14:36
@LorcanHamill LorcanHamill marked this pull request as ready for review April 21, 2020 15:15
@LorcanHamill LorcanHamill requested a review from a team as a code owner April 21, 2020 15:15
@LorcanHamill
Copy link
Contributor Author

As far as I can tell, this PR is ready. The AppVeyor build didn't fail - it was killed for exceeding the 1 hour time limit.

The issue that this fixes - #9661 - is preventing our company from upgrading to conda 4.8, so I'd really like to see this merged soon. Please let me know if there's anything I can do to help that happen. (This is my first conda PR, so I'm not sure of the procedure.) Thanks!

@LorcanHamill LorcanHamill force-pushed the allow-repodata-caching-for-local-channels branch 2 times, most recently from af854a4 to 10db6bd Compare April 24, 2020 05:52
@LorcanHamill
Copy link
Contributor Author

I don't understand what's happened. All checks were passing on this PR, except that the AppVeyor build was timing out. I pushed a couple of empty commits to trigger re-runs of the checks, and now some tests are failing. Guess I may have rebased on top of other changes that broke those tests? Or is something else going on? If anyone could enlighten me, that would be great!

@aldanor
Copy link
Contributor

aldanor commented Apr 28, 2020

@msarahan Could you please take a look at this?

@LorcanHamill LorcanHamill force-pushed the allow-repodata-caching-for-local-channels branch from 10db6bd to 718930b Compare May 7, 2020 08:02
@LorcanHamill
Copy link
Contributor Author

Ok, this PR is passing all checks again, apart from AppVeyor which appears to be broken?

Is there anything else that I need to do, to get this processed? If not, can it please be reviewed and merged?

I appreciate that everyone is busy, but this is essentially a one-line code change, with the additional tweaks that are required to keep the tests passing. It is very important to our company; we can't possibly upgrade to conda 4.8 without this change or something similar; the performance in our environment is so bad as to be unusable. We're not the only ones affected, either - see #9661 and comments on #9677 from @carloswanguw

It's almost three months since I submitted the first version of this change, and I'd really like to see it accepted soon.

@msarahan
Copy link
Contributor

msarahan commented May 7, 2020

I am no longer employed by Anaconda and can't merge anything. I'm not sure about this pr. How does the cache for local repodata ever get invalidated? From what I can tell, it doesn't. That will cause immense problems to anyone who builds packages.

@LorcanHamill
Copy link
Contributor Author

I am no longer employed by Anaconda and can't merge anything. I'm not sure about this pr. How does the cache for local repodata ever get invalidated? From what I can tell, it doesn't. That will cause immense problems to anyone who builds packages.

Thanks for the response. I was unaware of the change in your situation, I hope that's a positive development for you.

The cache is an in-memory one that only lives for the duration of the process and is now handled in the same way for file:// and https:// URLs. Does the metadata for file URLs get used during package building? And does it change during package building, after it has been read, and during the lifetime of a single conda process? (To be clear, those are not rhetorical questions, I don't know the answers.) In any case, I would have assumed that issues like that would be caught by one of the many unit and integration tests?

@msarahan
Copy link
Contributor

msarahan commented May 7, 2020

Does the metadata for file URLs get used during package building? And does it change during package building, after it has been read, and during the lifetime of a single conda process?

Yes, unfortunately. Conda-build loads this the first time it creates the build or host environments, and then it needs to reload it once the new package has been built so that the test environment creation can use the new package. Conda-build could add a call to your new function here that clears things, but this change would break all past versions of conda-build.

There are tests that rely on "file:///" channels NOT being cached
between tests.  There are also some that rely on "https:///" channels
being cached.  So the conservative approach is to ensure that what's
cached between tests remains the same as before.
This is a test that changes the value of the "CONDA_USE_ONLY_TAR_BZ2"
environment variable.  That affects the metadata that is returned, so
the cache must be cleared when it changes.   This issue does not arise
in production as the cache is in memory and the variable doesn't
change once the conda process is up and running.
The first one confirms that if we create SubdirData object for
a channel and then do that again, the second object is actually
the same one as the first; it's been fetched from the in-memory cache.

The second test confirms that if we do the same thing but clear the
cache between the creation of the first and second objects then we get
two distinct objects, but they're equivalent; a query on each of them
yields the same result.
@LorcanHamill LorcanHamill force-pushed the allow-repodata-caching-for-local-channels branch from 2ce514b to e7e8f1b Compare July 23, 2020 16:51
@LorcanHamill LorcanHamill marked this pull request as ready for review July 23, 2020 19:10
@LorcanHamill
Copy link
Contributor Author

LorcanHamill commented Jul 23, 2020

This now passes all tests. For some reason AppVeyor marking the py27 run of the integration tests as a failure but it succeeded the next time it ran (which just an amended commit message).

Anyway, this is once again ready for review. Can someone please review and approve this?

We now timestamp each cached metadata object with the time it was placed
in the cache.  When we're about to use a cached object from a
"file:///" channel, we first check if the JSON file from which it was
loaded still exists, and get its modification time.   If the JSON file
no longer exists, or if the JSON file is newer than the cached object,
then we ignore the cached object and create a new one (which will
replace the old one in the cache).
@LorcanHamill LorcanHamill force-pushed the allow-repodata-caching-for-local-channels branch from e7e8f1b to e27ad54 Compare July 24, 2020 08:13
@aldanor
Copy link
Contributor

aldanor commented Jul 24, 2020

@mcg1969 @chenghlee @jjhelmus Could this be re-reviewed please? This is a major regression so would be nice to have it finally patched.

@LorcanHamill
Copy link
Contributor Author

This PR has been awaiting review for over a week now. Is there anything else I can do to try to progress this?

I'm not sure how to request review other than by tagging people in a comment here, so I'll mention everyone who recently approved or merged PRs:

@chenghlee @mcg1969 @angloyna @jjhelmus

Apologies if that looks like I'm spamming people, but I don't know what else to do. There doesn't seem to be any other forum to discuss this sort of thing? (The "conda" mailing list, mentioned in some docs that I've seen, appears to be dead?)

@mcg1969
Copy link
Contributor

mcg1969 commented Jul 31, 2020

@LorcanHamill as you can see we have a fair amount of backlog here. I assure you we're going to get to it, and in fact we are in the process of making some internal adjustments so we can devote more resources to clearing this backlog. That's all I can assure you of now, except to say that I acknowledge this is great work and I'm appreciative. (And to be honest, even if it were merged, you'd then be waiting for a release to be tagged, too.)

@LorcanHamill
Copy link
Contributor Author

Thanks for the response, @mcg1969 . It's the not knowing that is most frustrating, as I'm sure you understand. I appreciate that it's not always possible to keep up with incoming PRs. As a matter of interest, is there a schedule somewhere for releases, with cut-off dates for merges and so on? It would help with our planning if we have some idea of when this would eventually make it into a release, if all goes well.

@mcg1969
Copy link
Contributor

mcg1969 commented Jul 31, 2020

I think that's a good suggestion @LorcanHamill ... as it turns out there are active discussions in the conda community around the larger questions of governance for this project. I must admit I'm not involved in those discussions so I can't tell you when or what but I'm genuinely positive changes are coming.

@LorcanHamill
Copy link
Contributor Author

I gather this PR has missed the cut for 4.8.4? Is there something I need to do now, to keep this PR alive?

@chenghlee chenghlee added this to the 4.9.0 milestone Aug 12, 2020
@chenghlee
Copy link
Contributor

Closing & reopening this PR to force new CI checks to run; once those pass, I'll merge this PR.

@chenghlee chenghlee closed this Oct 13, 2020
@chenghlee chenghlee reopened this Oct 13, 2020
@chenghlee chenghlee merged commit da9498f into conda:master Oct 13, 2020
@LorcanHamill
Copy link
Contributor Author

Great to see this merged - thanks, @chenghlee !

@haywardlam
Copy link

My company is using the latest conda 4.9.2 and experiencing the same problem with network drive based channels. Any idea what version of conda will have this fix?

@LorcanHamill Really appreciate your work on this.

@LorcanHamill
Copy link
Contributor Author

LorcanHamill commented Apr 23, 2021 via email

@haywardlam
Copy link

@LorcanHamill I see it in the changelog 4.9.0. However, I am using 4.9.2 (latest Anaconda install) with our file:// network drive and conda is still extremely slow. Is there any configuration in .condarc required to use this caching feature?

@LorcanHamill
Copy link
Contributor Author

LorcanHamill commented May 3, 2021 via email

@github-actions
Copy link

github-actions bot commented May 4, 2022

Hi there, thank you for your contribution!

This pull request has been automatically locked because it has not had recent activity after being closed.

Please open a new issue or pull request if needed.

Thanks!

@github-actions github-actions bot added the locked [bot] locked due to inactivity label May 4, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cla-signed [bot] added once the contributor has signed the CLA locked [bot] locked due to inactivity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conda 4.7+ extremely slow "Collecting package metadata (repodata.json)" with local (file:///) channels
7 participants