update the Rust crate to v0.2 and enable the "c" feature #14
This enables optimized assembly implementations, and also AVX-512 support. This requires GCC or similar, but callers building native code are pretty likely to have that installed.
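For reference, enabling the feature is a one-line change in the binding's `Cargo.toml` (a hypothetical sketch; the exact dependency line in this repo may differ):

```toml
[dependencies]
# Opt into the optimized assembly implementations and AVX-512 support.
# Building with this feature requires a C toolchain such as GCC or Clang.
blake3 = { version = "0.2", features = ["c"] }
```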
They're working on it, but WebAssembly doesn't support threading yet. Some compilers can shim threads onto web workers, but the setup is finicky, and without shared array buffers the overhead of copying memory would probably make it slower than doing everything in the main thread or in a single worker, which runs at about 400 MB/s on my machine. The update to 0.2 speeds up Node by about 2x for large inputs, which is nice. WebAssembly is about the same speed.
It looks like swapping in rayon-based
I'd be happy to accept a PR for adding such an option, or I can do it later this week or weekend. One note: I would add it as an option in the
That's a little surprising to me. Assuming you're running on an AVX2 machine, the new assembly implementation should be maybe 10-20% faster than the previous version. But there are a lot of potential factors, depending on exactly how you're benchmarking.
Yep, that's typical. Usually I find the breakeven point is around 128 KiB, and as a rule of thumb I tell people not to bother with multithreading below 1 MiB.
Hmm, everything would depend on the buffer size they're using inside of

When I've experimented with multithreaded streaming before (as opposed to hashing a large input all at once, which makes recursive fork-join concurrency a good fit), it's required some fairly complicated background buffering. The idea is that you use a couple of really big buffers, say 1 MiB each, and have a background thread filling one buffer while all the worker threads hash the other one. That avoids putting all the worker threads to sleep every time you need to refill the buffer, which kills your parallelism. You can get pretty close to all-at-once performance if you go through all that effort, but I don't expect many people to do it in practice.

Realistically speaking, in a pipelined/streaming setting, you're much more likely to be bottlenecked by the network or something. And programs that read the filesystem, and that really care about this, should be using memory mapping instead of buffered reads.
This is something that's unfortunate about how streams are done in JavaScript: buffers are passed to subsequent operators in the stream rather than shared with them, which means no reuse and new allocations for every byte of incoming data. I much prefer Go's simple io.Reader/Writer approach. There are low-level, C-ish read/write functions, but they aren't idiomatic and are very rarely used.
I see. I think we could do this in Neon--return control back to user code and trigger a callback into Node when the hash completes. That would allow the Node consumer to implement appropriate buffering for whatever use case they have. A prepackaged file hashing implementation might make sense--I would do that on the Rust side of things, though, since Node.js doesn't support memory mapping, and we would probably bottleneck transiting the data through V8 anyhow. But as you pointed out, few consumers are likely to be bottlenecked on the hash function. BLAKE3 is just too fast already 😛
This should be a measurable performance improvement in the native build. That said, I'm not quite sure how to test or benchmark this project properly :)
One of the things that changed from v0.1 to v0.2 is that multithreading can now be enabled on a per-caller basis. This library could consider exposing a boolean `multithreading` flag or something like that, as I've done in https://github.com/oconnor663/blake3-py. I don't know what the multithreading story is for Wasm (I assume there isn't one yet?), but fingers crossed, with the native build it could Just Work.