Retries and performance improvements #9
Merged
Closes #7
Closes #8
Related:
We now attempt to read the `collection_info` property before trying to read the manifest out of the tarball via Artifactory (#5), and we fall back to reading the manifest in case the property doesn't exist. The fallback is just a holdover in case there are any collections in Artifactory that were uploaded with the first version of galactory (which probably only affects me). It will be removed in the next version, so find any of those collections: if they're upstreams-made-local, maybe just delete them and let them get repopulated; if they're local, republish them.
This avoids a potentially expensive request.
But I went even further on performance. As I've used this, it's gotten slower and slower as the number of collections increased, and I realized the reason: when iterating, even when we know the specific collection and version we want, we're effectively just browsing the whole list, and for each collection we're asking for a stat and its properties (two separate HTTP requests) before we can eliminate the collection as a candidate. This was really slowing things down.
So I've added a fast detection mode, on by default (and not user-configurable at this time), that uses the file naming convention to determine what we can use to prevent further requests and skip quickly.
For example, `briantist-whatever-0.1.0.tar.gz` can be split into `namespace: briantist`, `collection: whatever`, and `version: 0.1.0`, and that's enough information to rule out a non-match; we can skip it before making any additional requests.

If the collection isn't eliminated by that, we proceed with the rest of the screening as before, except that I've now split the `stat` and `properties` requests: we do the `stat` first, then the conditional that might skip, then ask for `properties`, then the conditional that might skip on that. So there's a small improvement there in addition to the above.

In testing, this is a LOT faster, and I think it will also help alleviate some of the connection errors I was seeing that led to #7 and #8.