Avoid redundant metadata downloads #2682

Merged
merged 1 commit into from
Feb 26, 2019

HebaruSan
Member

@HebaruSan HebaruSan commented Feb 11, 2019

Motivation

Updating the registry can be slow: it requires downloading a 2.5 MB ZIP or tar.gz file, extracting it, parsing the JSON of every module, and populating the registry with the results. In the GUI there's a further delay while the mod list and filters are updated.

If there haven't been any metadata changes since the last time you refreshed, then all of this is done for naught. It would be nice to skip it in that case.

Changes

Now we use the ETag HTTP response header to determine when a repo's metadata is already up to date. This is an opaque hexadecimal string (with quotes around it) that changes when the master.tar.gz file does. It is returned when the file is downloaded, but it can also be retrieved with a much quicker HEAD request. In my testing, this value does indeed change when something in the CKAN-meta repo is modified.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag
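
For illustration only, here is a minimal Python sketch of reading an ETag with a HEAD request. It is not the code from this pull request (CKAN itself is written in C#); the `fetch_etag` name and the `opener` parameter are assumptions made up for this example, and the `opener` hook exists only so the function can be exercised without a network connection.

```python
import urllib.request


def fetch_etag(url, opener=urllib.request.urlopen, timeout=10):
    """Return the ETag response header for `url`, or None if the server sends none.

    A HEAD request asks for the headers only, so the body (for example a
    multi-megabyte archive) is never transferred.
    `opener` is a hypothetical injection point, not part of any real API.
    """
    request = urllib.request.Request(url, method="HEAD")
    with opener(request, timeout=timeout) as response:
        # The value is returned exactly as sent, including its surrounding quotes.
        return response.headers.get("ETag")
```

Comparing the returned string with the value saved after the previous download is enough to tell whether the file has changed; the string is opaque, so it should only ever be compared for equality.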

The Repository objects stored in the registry now have a last_server_etag property corresponding to the most recent value received for their URLs.

Repo.UpdateAllRepositories's return type is changed from bool to an enum with three options: Updated (formerly true), Failed (formerly false), and NoChanges. If all of our repos have the same ETag that they had the last time we downloaded them, then it returns NoChanges without updating anything, which is much quicker than doing a full update. Otherwise an update is performed as normal, and the ETags are captured from WebClient and saved into the Repository objects. (We don't attempt to suppress individual repo updates because we don't have the ability to identify which modules are from which repos. To keep the list complete, updates have to be all or nothing.)
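
The all-or-nothing decision described above can be sketched roughly as follows. This is a hedged Python illustration, not the actual C# change: `UpdateResult`, `Repository`, and `update_all_repositories` mirror names from the description, while the `fetch_etag` and `download_and_load` callables, the use of `OSError` to signal a failed download, and the rule that a missing ETag forces a full update are all assumptions for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class UpdateResult(Enum):
    FAILED = "Failed"          # formerly false
    UPDATED = "Updated"        # formerly true
    NO_CHANGES = "NoChanges"   # new: every repo's ETag is unchanged


@dataclass
class Repository:
    name: str
    url: str
    last_server_etag: Optional[str] = None


def update_all_repositories(
    repos: List[Repository],
    fetch_etag: Callable[[str], Optional[str]],          # hypothetical: cheap HEAD lookup
    download_and_load: Callable[[Repository], Optional[str]],  # hypothetical: full update, returns new ETag
) -> UpdateResult:
    # Step 1: cheap check. Skip everything only if *every* repo is unchanged.
    current = {repo.name: fetch_etag(repo.url) for repo in repos}
    if all(
        current[repo.name] is not None
        and current[repo.name] == repo.last_server_etag
        for repo in repos
    ):
        return UpdateResult.NO_CHANGES

    # Step 2: at least one repo changed (or gave no ETag), so refresh all of them.
    try:
        new_etags = [download_and_load(repo) for repo in repos]
    except OSError:
        # Assumption: a failed download leaves the saved ETags untouched.
        return UpdateResult.FAILED

    # Step 3: remember the ETags seen during the download for next time.
    for repo, etag in zip(repos, new_etags):
        repo.last_server_etag = etag
    return UpdateResult.UPDATED
```

A caller such as the GUI could then branch on the result and rebuild its mod list only when it is `UpdateResult.UPDATED`.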

When it encounters a NoChanges return value, GUI skips rebuilding the mod list, which further speeds up redundant refreshes.

Fixes #854.

@HebaruSan added the Enhancement, GUI, Cmdline, Core (ckan.dll), Pull request, Registry, and Network labels Feb 11, 2019
@techman83
Member

I have some work in progress to split up the repo so that historical data lives on separate branches, which would make the payload smaller. Some issues with the testing held me up, but I should get back to it: KSP-CKAN/NetKAN-bot#58

@DasSkelett
Member

Can confirm that it is a lot faster:
(Tested with time ckan update)
Before: ~8.3s
After: ~4.8s
-> 1.7 times faster

Member

@politas politas left a comment

Looks like a great little tweak. Please merge with a changelog update.
