Retries and performance improvements #9
Merged
Closes #7
Closes #8
Related:
We now attempt to read the `collection_info` property before trying to read the manifest out of the tarball via Artifactory (#5), and we fall back to reading the manifest in case the property doesn't exist. The fallback is just a holdover in case there are any collections in Artifactory that were uploaded with the first version of galactory (which probably only affects me). It will be removed in the next version, so find any of those collections: if they're upstreams-made-local, maybe just delete them and let them get repopulated; if they're local, republish them.
This avoids a potentially expensive request.
But I went even further on performance. As I've used this, it's gotten slower and slower as the number of collections increased, and I realized the reason: when iterating, even when we know the specific collection and version we want, we're effectively just browsing the whole list, and for each collection we're asking for a stat and its properties (two separate HTTP requests) before we can eliminate the collection as a candidate. This was really slowing things down.
So I've added a fast detection mode, on by default (and not user-configurable at this time), that uses the file naming convention to determine what we can use to prevent further requests and skip quickly.
For example, `briantist-whatever-0.1.0.tar.gz` can be split into `namespace: briantist`, `collection: whatever`, and `version: 0.1.0`, and that's enough information to rule out a non-match; we can skip it before making any additional requests.

If the collection isn't eliminated by that, we proceed with the rest of the screening as before, except that I've now split the `stat` and `properties` requests: we do the `stat` first, then the conditional that might skip, then ask for `properties`, then the conditional that might skip on that. So there's a small improvement there in addition to the above.

In testing, this is a LOT faster, and I think it will also help alleviate some of the connection errors I was seeing that led to #7 and #8.