-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic Needle #26
Generic Needle #26
Conversation
- makes library `no_std` - uses `AsRef<[u8]>` instead of `Needle` (this _should_ be optimised like `Needle`). - merges in work from @zakcutner for memcmp
Thanks! I'll try to take a look soon but I am pretty busy right now. |
Sorry for not taking a look at this sooner, I really appreciate your contribution! I like your ideas (particularly making the crate You probably saw that in #8 I bounded let comparison = match self.size() {
2 => memcmp::memcmp1(chunk, needle),
3 => memcmp::memcmp2(chunk, needle),
4 => memcmp::memcmp3(chunk, needle),
5 => memcmp::memcmp4(chunk, needle),
6 => memcmp::memcmp5(chunk, needle),
7 => memcmp::memcmp6(chunk, needle),
8 => memcmp::memcmp7(chunk, needle),
9 => memcmp::memcmp8(chunk, needle),
10 => memcmp::memcmp9(chunk, needle),
11 => memcmp::memcmp10(chunk, needle),
12 => memcmp::memcmp11(chunk, needle),
13 => memcmp::memcmp12(chunk, needle),
n => memcmp::memcmp(chunk, needle, n - 1),
}; If the generic needle parameter, is set to something like let comparison = memcmp::memcmp3(chunk, needle); Instead, if the generic is instead Interestingly, from a few small benchmarks, I found that specialising My Thanks again for your work on this 😄 @marmeladema What do you think? It would definitely be cool to make the API more generic. |
So I think there are actually 3 different things to discuss in this PR:
|
I agree, it seems to me that the changes are also all independent from one another so they can be implemented individually.
I cannot see any reason this would improve download/build time because these dependencies are listed in One option would be to make the dependencies on these other crates optional and not include compiling those benchmarks in CI but as you pointed out we still want to check everything compiles without errors. An alternative approach I've seen used in the
My understanding is that this should create no difference as long as we do not use anything that is not compatible with |
Making the needle generic has been done in #31 |
FWIW, Aspects of the |
@BurntSushi glad to hear we were a motivation ! Very cool work! 👍 I have added some benchmarks against your new implementation in #37 |
Ah interesting. Here are my benchmarks (limited to measurements with a 5% or more difference):
(You can run that same command on a checkout of the Two of the benchmarks, Note that I do expect
Yeah I don't track perf regressions by instruction count, so this is difficult for me to interpret. I do expect some cases to get slightly slower since |
Ok so it seems we are not actually measuring the same things. I never actually measure wall-clock time because from my own experience, it's very hard to have reproducible results on any recent Intel x86_64 hardware whereas measuring instructions, even though is far from perfect, at least gives pretty good reproducible results. To get acceptable results you would need configure your kernel to isolate some physical core, removing irqs for this core too, disabling all frequency scaling features (like turbo mode etc) and even with all of that, I cannot get reproducible results on my laptop with a From run to run of the exact same binary with |
Yeah I don't experience that same level of noise. But to be clear, this isn't just an instruction count thing. The wall clock time differences in your benchmarks are bigger than in mine, and those differences are stable for me. (I run my benchmarks on an otherwise idle machine with CPU frequency governor set to Digging more, I've spotted one inconsistency. What I thought was your "long haystack" benchmark wasn't quite correct. Some time ago, it changed from using and this in yours (Those are both from So I think there's still some stuff to work out here to convince the compiler to give better codegen more consistently. But it's been pretty hit or miss for me. :-/ |
It turns out that upstreams 'long_haystack' benchmark changed from using the i386 corpus to something different (and also smaller). We keep the i386 benchmark, but add a benchmark corresponding to sliceslice's current 'long_haystack' benchmark. See: cloudflare/sliceslice-rs#26
Wow, I am not sure that was an intended change :/ I will have to look at it more closely |
Ok @BurntSushi thank you very much for finding out the unintended benchmark change! I have pushed a fix and have updated the results in #37 |
@marmeladema Aye, thanks. I think our measurements roughly align now, with the only unknown remaining is the difference in codegen for I would definitely encourage you to take a look at the benchmarks in |
@BurntSushi yes I definitely will! Those are very very good to have and also the results you posted are incredibly useful information because it seems there are quite a few big outliers that are not handled properly in our implementation. |
Closing this, it seems this PR has already been the place of too much unrelated discussion ^^ @BurntSushi I am not closing to end the discussion but we can probably continue in a dedicated issue or something else more appropriate 👍 |
@marmeladema Aye no worries. I think we hit a good point anyway! Very happy to get pinged and talk about stuff! And would always love to add more benchmarks. |
no_std
AsRef<[u8]>
instead ofNeedle
(this should be optimised likeNeedle
).Note: I haven't seen out how
Needle
was impl by @marmeladema, just saw the trait so there could be more work there missing in this one.