
Optimize delta generation (in speed and size) #299

Draft · wants to merge 1 commit into master
Conversation

@lmbarros (Collaborator) commented Jun 3, 2022

Use optimal block length to generate deltas

Previously, we used a block length hardcoded to 512 bytes. Our measurements have shown that this value was generally inadequate: it produced relatively large deltas and took relatively long to do so.

librsync, by default, uses a block length equal to the square root of the size of the old (basis) file (with a minimum of 256). This value results in significantly smaller deltas and shorter run times.

In this commit, we do one more optimization and round this value up to the next power of two. Since librsync-go has a code path optimized for buffers whose sizes are powers of two, this gives us another performance gain.
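
For illustration, here is a minimal sketch (hypothetical, not the actual code from this commit) of how such a block length could be computed:

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// Hypothetical sketch, not the PR's actual code: compute a block length the
// way the description says -- librsync's default (square root of the basis
// file size, with a 256-byte minimum), then rounded up to the next power of
// two so that librsync-go's power-of-two buffer code path kicks in.
func blockLength(basisSize uint64) uint64 {
	bl := uint64(math.Sqrt(float64(basisSize)))
	if bl < 256 {
		bl = 256
	}
	if bits.OnesCount64(bl) != 1 {
		// Not a power of two: round up to the next one.
		bl = 1 << bits.Len64(bl)
	}
	return bl
}

func main() {
	for _, size := range []uint64{1 << 20, 100 << 20, 3 << 30} {
		fmt.Printf("basis %11d B -> block length %6d B\n", size, blockLength(size))
	}
}
```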

@lmbarros (Collaborator, Author) commented

> librsync, by default, uses a block length equal to the square root of the size of the old (basis) file. This value results in significantly smaller deltas and shorter run times.

Well, this is the theory, and it matches the results I got in librsync-go when testing with synthetic data. I am now running some measurements with actual images, and the results aren't good. In summary:

  • Memory usage went down to something on the order of 1/2 or even close to 1/3. This is good news, and it makes sense: AFAIK, our current memory bottleneck during image generation is the delta signature, whose size grows linearly with the number of blocks in the basis image, so larger blocks mean fewer blocks and hence less memory usage (see the sketch after this list).
  • Delta sizes went up, often by a factor of 2, and by more in other cases, up to almost 5 in one case. Generally speaking, it seems to be worse for larger images. So far I haven't investigated why, but I'd guess the issue is that in a typical image we wouldn't have very long matches, so large blocks drastically decrease the number of matches.
  • Times to generate the deltas were a mixed bag: good improvements in some cases, disappointing regressions in others. I don't have a good explanation for how this change could make delta times go up.
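
To make the memory point concrete, here is a back-of-envelope sketch; the per-block byte count is an assumption for illustration, not librsync-go's actual signature layout:

```go
package main

import "fmt"

// Back-of-envelope sketch: the per-block byte count below is an assumption,
// not librsync-go's real signature format. The point is only that signature
// memory scales linearly with basisSize / blockLen.
const perBlockBytes = 32 + 4 // assumed: strong hash (32 B) + rolling checksum (4 B)

func signatureBytes(basisSize, blockLen uint64) uint64 {
	blocks := (basisSize + blockLen - 1) / blockLen // blocks in the basis file
	return blocks * perBlockBytes
}

func main() {
	const basis = 1 << 30 // a 1 GiB basis image, for example
	for _, bl := range []uint64{512, 32768} { // old hardcoded value vs. sqrt(1 GiB)
		fmt.Printf("block length %5d B -> ~%d KiB of signature\n",
			bl, signatureBytes(basis, bl)>>10)
	}
}
```

With these assumed numbers, going from 512-byte blocks to ~32 KiB blocks shrinks the signature by a factor of 64; the measured reduction above (1/2 to 1/3) is much smaller, which would suggest the signature is only part of the overall memory footprint.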

Will need to spend more time digging.

@balena-os deleted a comment Jul 4, 2023