Improve publish performance, especially for prefixes with a large number of snapshots #1222
23a815b to ae09dfc
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1222      +/-   ##
==========================================
- Coverage   74.85%   66.06%   -8.80%
==========================================
  Files         143      143
  Lines       16187    16215      +28
==========================================
- Hits        12117    10712    -1405
- Misses       3134     4752    +1618
+ Partials      936      751     -185
I think this is a fantastic change! @neolynx @dario-gallucci you wanna have a look at this?
> I think this is a fantastic change!
Yes indeed, thanks!
> @neolynx @dario-gallucci you wanna have a look at this?
Sure!
In some local tests with a slowed-down filesystem, this cut the time to clean up a repository by ~3x, bringing the total 'publish update' time from ~16s to ~13s.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
When merging reflists with ignoreConflicting set to true and overrideMatching set to false, the individual ref components are never examined, but the refs are still split. Skipping the split when the components are never used brings a massive speedup: on my system, the included benchmark goes from ~1500 us/it to ~180 us/it.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
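To illustrate the idea in the commit message above, here is a minimal sketch, not the actual aptly code. It assumes refs are space-separated byte strings whose last component is a file hash, and it only models the ignoreConflicting flag (overrideMatching is left out for brevity). The names `mergeRefs` and `sameIdentity` are hypothetical. The point is that the split happens only on the branch that needs the components.

```go
package main

import (
	"bytes"
	"fmt"
)

// sameIdentity reports whether two refs agree on every component except the
// last one (assumed here to be a file hash). This is the only place where
// refs get split, so it is also the only place that pays for the split.
func sameIdentity(a, b []byte) bool {
	pa := bytes.Split(a, []byte(" "))
	pb := bytes.Split(b, []byte(" "))
	if len(pa) != len(pb) || len(pa) < 2 {
		return false
	}
	for i := 0; i < len(pa)-1; i++ {
		if !bytes.Equal(pa[i], pb[i]) {
			return false
		}
	}
	return true
}

// mergeRefs merges two sorted ref lists into one sorted list.
// With ignoreConflicting set, refs are only compared as whole byte strings,
// so sameIdentity (and therefore bytes.Split) is never called.
func mergeRefs(left, right [][]byte, ignoreConflicting bool) [][]byte {
	result := make([][]byte, 0, len(left)+len(right))
	il, ir := 0, 0
	for il < len(left) && ir < len(right) {
		rel := bytes.Compare(left[il], right[ir])
		switch {
		case rel == 0:
			// Identical ref on both sides: keep a single copy.
			result = append(result, left[il])
			il++
			ir++
		case !ignoreConflicting && sameIdentity(left[il], right[ir]):
			// Same package with a different hash: in this sketch the right side wins.
			result = append(result, right[ir])
			il++
			ir++
		case rel < 0:
			result = append(result, left[il])
			il++
		default:
			result = append(result, right[ir])
			ir++
		}
	}
	result = append(result, left[il:]...)
	result = append(result, right[ir:]...)
	return result
}

func main() {
	left := [][]byte{[]byte("Pamd64 a 1.0 h1"), []byte("Pamd64 b 1.0 h1")}
	right := [][]byte{[]byte("Pamd64 a 1.0 h2"), []byte("Pamd64 c 1.0 h1")}
	for _, ref := range mergeRefs(left, right, true) {
		fmt.Println(string(ref))
	}
}
```

The short-circuit in `!ignoreConflicting && sameIdentity(...)` is what keeps the fast path free of splits and their allocations.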
The cleanup phase needs to list out all the files in each component in order to determine what's still in use. When there's a large number of sources (e.g. from having many snapshots), the time spent just loading the package information becomes substantial.

However, in many cases, most of the packages being loaded are actually shared across the sources; if you're taking frequent snapshots, for instance, most of the packages in each snapshot will be the same as in other snapshots. In these cases, re-reading the packages repeatedly is just a waste of time.

To improve this, we maintain a list of refs that we know were processed for each component. When listing the refs from a source, only the ones that have not yet been processed will be examined.

Some tests were also added specifically to check listing the files in a component.

With this change, listing the files in components on a copy of our production database went from >10 minutes to ~10 seconds, and the newly added benchmark went from ~300ms to ~43ms.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
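The approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: the `loader` type, `filesForRef`, and `listComponentFiles` are hypothetical names, the "database" is an in-memory map, and the processed refs are tracked in a Go map used as a set.

```go
package main

import "fmt"

// loader stands in for the expensive package lookup. The loads counter
// exists only to show how many lookups actually happen.
type loader struct {
	loads int
	files map[string][]string // ref -> files (hypothetical in-memory database)
}

func (l *loader) filesForRef(ref string) []string {
	l.loads++
	return l.files[ref]
}

// listComponentFiles collects the files referenced by every source of one
// component, loading each distinct ref at most once.
func listComponentFiles(l *loader, sources [][]string) []string {
	processed := make(map[string]struct{}) // refs already handled for this component
	var out []string
	for _, refs := range sources {
		for _, ref := range refs {
			if _, done := processed[ref]; done {
				continue // shared with an earlier source, skip the reload
			}
			processed[ref] = struct{}{}
			out = append(out, l.filesForRef(ref)...)
		}
	}
	return out
}

func main() {
	l := &loader{files: map[string][]string{
		"a": {"pool/a.deb"},
		"b": {"pool/b.deb"},
		"c": {"pool/c.deb"},
	}}
	// Two snapshots that share most of their packages.
	sources := [][]string{{"a", "b"}, {"a", "b", "c"}}
	files := listComponentFiles(l, sources)
	fmt.Println(len(files), "files from", l.loads, "loads")
}
```

With many snapshots that overlap heavily, the number of loads tracks the number of distinct refs rather than the total number of refs across all sources.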
Reflists are basically stored as arrays of strings, which are quite space-efficient in MessagePack. Thus, using zero-copy decoding results in nice performance and memory savings, because the overhead of separate allocations ends up far exceeding the overhead of the original slice.

With the included benchmark run for 20s with -benchmem, the runtime, memory usage, and allocations go from ~740us/op, ~192KiB/op, and 4100 allocs/op to ~240us/op, ~97KiB/op, and 13 allocs/op, respectively.
Signed-off-by: Ryan Gonzalez <ryan.gonzalez@collabora.com>
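To make the trade-off in the commit message above concrete, here is a small sketch of zero-copy decoding in general. It does not use MessagePack or the decoder from this PR; it assumes a toy format where each string is one length byte followed by its bytes, and `decode` is a hypothetical function. The zero-copy path returns subslices of the input buffer, while the copying path allocates once per string.

```go
package main

import (
	"errors"
	"fmt"
)

var errTruncated = errors.New("truncated input")

// decode reads length-prefixed byte strings from buf.
// With zeroCopy set, each item aliases buf instead of being copied, so there
// is no per-item allocation, but buf must stay alive and unmodified for as
// long as the items are in use.
func decode(buf []byte, zeroCopy bool) ([][]byte, error) {
	var out [][]byte
	for len(buf) > 0 {
		n := int(buf[0])
		buf = buf[1:]
		if n > len(buf) {
			return nil, errTruncated
		}
		// The full slice expression caps the capacity, so appending to an
		// item cannot overwrite the bytes of the next item.
		item := buf[:n:n]
		if !zeroCopy {
			item = append([]byte(nil), item...) // one allocation per item
		}
		out = append(out, item)
		buf = buf[n:]
	}
	return out, nil
}

func main() {
	buf := []byte{3, 'a', 'b', 'c', 2, 'd', 'e'}
	copied, _ := decode(buf, false)
	shared, _ := decode(buf, true)

	// Mutating the input shows which result aliases it.
	buf[1] = 'X'
	fmt.Printf("copied: %s, zero-copy: %s\n", copied[0], shared[0])
}
```

The aliasing shown in `main` is the cost of the technique: the decoded strings are only valid while the original buffer is retained and left untouched.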
As an aside, we've been running this on our production infra for a bit now, and nothing has exploded, which is probably a good thing.
This contains a variety of improvements for publish performance, specifically speeding up the cleanup operation by:
Benchmarks were added for all of these, and some unit tests were added specifically to test aspects of cleanup.
We have a relatively large aptly repository with >90 repositories, ~207k packages across all of the repositories, and >3.5k snapshots; a testing version of that repository was used to measure the publishing performance. Prior to these changes, publishing took >9 minutes, with over 8 minutes of that time just in the cleanup phase. With these, the cleanup time goes down to ~13 seconds, for a total publish time of a little under a minute.
Checklist
AUTHORS
(I already have a commit for this in #1220, "Improve test output regexes for better perf and Go 1.20 support", and don't want merge conflicts)