Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid the use of a local window when splitting #70

Closed
wants to merge 1 commit into from

Conversation

Dr-Emann
Copy link

@Dr-Emann Dr-Emann commented Jul 6, 2022

The splitting code in bupsplit.c was written as if it had to handle
the case where bytes were fed one by-one, but it actually gets the whole
data chunk: there's no need to copy to a local window, when we have the
original data available: we can simply use the passed data to fetch the
byte that is leaving the current window. This should actually be a fair
bit faster.

Additionally, try to use __builtin_ctz for filling in the bits ptr, if
it is available.

Signed-off-by: Zachary Dremann dremann@gmail.com

@gdt
Copy link
Contributor

gdt commented Jul 6, 2022

It would be good to make sure this is ok on all compilers currently known to build bup. I'm not sure what that is exactly, and I'm not claiming it would cause issues -- I am really just suggesting that we be sure this doesn't bring an unnoticed requirements bump. I will try to test it on NetBSD 9 with gcc 7.

@Dr-Emann
Copy link
Author

Dr-Emann commented Jul 6, 2022

It seems to work on at least gcc 4.1.2 when copied into godbolt

@gdt
Copy link
Contributor

gdt commented Jul 6, 2022

As you'll see from the list, i had to work through 2 problems to be able to build master, and after that I was able to build with the commit in this PR, and pass tests.

My system was:

  • python 3.9
  • NetBSD 9 stable
  • amd64 CPU, running as a Xen domU
  • dependencies all from pkgsrc

I have an older netbsd-5 system kept around for extreme portability testing and I'm not sure I can build bup on that; I'll try that too. But it's now too crufty to care about, and I think it fails with gcc 4.1 because of earlier non-portable commits about inttypes.

@rlbdv
Copy link
Member

rlbdv commented Jul 9, 2022

Thanks for the help. I suspect this may need adjustment for, or be superseded by Johannes' pending treesplitting work, which first overhauls the hashsplitter to make the blob and fanout bits customizable (it also moves the entire hashsplitter to C). We've been working on it off and on via IRC for a while, and I think it's about ready to merge. Here's the current state: https://github.com/rlbdv/bup/tree/tmp-johill-treesplit-v2, though I think it's likely we'll squash my changes into his before we finish. I was planning to post it to the list for review once a couple of remaning questions are settled.

And though I haven't thought about them yet, the ctz changes may be interesting in their own right. (I didn't know about __has_builtin, though now I see that intprops.h uses it too.)

I'm also trying to decide if I think we should pursue a release before merging the hashsplitter overhaul, since we already have a good bit of work pending, and deferring it (given its critical nature) would give it plenty of time to settle before a release that includes it.

Thanks again, appreciate your time.

The splitting code in `bupsplit.c` was written as if it had to handle
the case where bytes were fed one by-one, but it actually gets the whole
data chunk: there's no need to copy to a local window, when we have the
original data available: we can simply use the passed data to fetch the
byte that is leaving the current window.

Additionally, try to use __builtin_ctz for filling in the bits ptr, if
it is available.

Signed-off-by: Zachary Dremann <dremann@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants