Greetings! I am using the library and really like it. Very flexible and performs quite well. However... there's always room for optimization. I noticed that the activity on this project is low and most comments are very old, so I should ask first: should I be looking into a newer project that has effectively replaced this one?
That being said, my experience in C++ based sorting showed two improvements can produce very significant results:
Pipelining: during the block sort phase, instead of doing the accumulate/blocksort/write in a procedural loop, fill each block and then lob it into an execution pipeline that separates the sort and write into separate parallel tasks.
Compression: A light compression like Snappy can reduce temp space by 70% or so, and can result in faster I/O (especially if compression is done in parallel using the above pipeline technique).
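To make the pipelining idea concrete, here is a minimal Java sketch, not code from this library. The class `PipelinedBlockSort`, the method `sortBlocks`, and the `maxInFlight` parameter are all hypothetical names chosen for illustration. It assumes line-oriented input and uses only JDK classes: the reading thread fills blocks, a small pool sorts them, and a separate single thread writes them, with a semaphore capping how many blocks are held in memory at once.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: none of these names come from the library under discussion.
class PipelinedBlockSort {

    /**
     * Cuts the input into blocks of blockSize lines. Each full block is handed
     * to a sort stage and then a write stage, so the caller can keep filling
     * the next block while earlier ones are still being sorted or written.
     * Returns the sorted block files in input order.
     */
    static List<Path> sortBlocks(Iterator<String> input, int blockSize,
                                 int maxInFlight, Path tempDir) {
        ExecutorService sorters = Executors.newFixedThreadPool(2);     // CPU-bound stage
        ExecutorService writers = Executors.newSingleThreadExecutor(); // I/O-bound stage
        Semaphore inFlight = new Semaphore(maxInFlight);               // bounds memory use
        List<CompletableFuture<Path>> pending = new ArrayList<>();
        try {
            List<String> block = new ArrayList<>(blockSize);
            while (input.hasNext()) {
                block.add(input.next());
                if (block.size() == blockSize) {
                    pending.add(submit(block, tempDir, sorters, writers, inFlight));
                    block = new ArrayList<>(blockSize);
                }
            }
            if (!block.isEmpty()) {
                pending.add(submit(block, tempDir, sorters, writers, inFlight));
            }
            List<Path> files = new ArrayList<>();
            for (CompletableFuture<Path> f : pending) {
                files.add(f.join()); // rethrows any failure from either stage
            }
            return files;
        } finally {
            sorters.shutdown();
            writers.shutdown();
        }
    }

    private static CompletableFuture<Path> submit(List<String> block, Path tempDir,
            ExecutorService sorters, ExecutorService writers, Semaphore inFlight) {
        inFlight.acquireUninterruptibly(); // wait here if too many blocks are queued
        return CompletableFuture
                .supplyAsync(() -> { Collections.sort(block); return block; }, sorters)
                .thenApplyAsync(sorted -> writeBlock(sorted, tempDir), writers)
                .whenComplete((path, error) -> inFlight.release());
    }

    private static Path writeBlock(List<String> sorted, Path tempDir) {
        try {
            Path file = Files.createTempFile(tempDir, "block-", ".txt");
            Files.write(file, sorted, StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The pool sizes here are arbitrary; how to expose them as tuning knobs is a design question this sketch does not try to answer. The resulting block files would still need the usual merge phase afterwards.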
I haven't dug deep enough into the code to see if some of this is already supported; please tell me to RTFM if I've missed something.
Thanks,
john
Hiya! The project is stable and mature, in that I haven't needed changes myself, but there is no replacement that I know of (i.e. I have not written a newer package).
If you or anyone else is interested in experimenting with improvements -- performance, usability/ergonomics, configurability, interoperability -- I'd be happy to help get those integrated.
Right now I don't have a personal itch to work on things, but I still maintain it, for example if someone were to find a bug.
Now, on pipelining: there is no support for it at this point. Someone actually did something like that for the LZF codec I wrote (https://github.com/ning/compress), and the main question there is probably how to model the way things should fit together, and how to expose tuning with respect to threads to use and synchronization.
As to compression: I think this is something that can be handled by allowing extensions and does not necessarily have to be part of the core package... although I can see how supporting the codecs that the JDK comes with (deflate/gzip) could work out of the box, as a default implementation.
Alternatively, this package could be made a multi-module Maven project so that extension compression codecs could be built from the same repo and just result in separate jar(s).
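As a rough illustration of what such an extension point might look like, here is a minimal Java sketch. The `TempFileCodec` interface and the `GzipTempFileCodec` and `CodecDemo` classes are hypothetical names, not part of this package; only the `java.util.zip` stream classes are real JDK API. The idea is that the sorter would wrap its temp-file streams through a codec, with gzip as a JDK-only default and other codecs supplied from separate jars.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical extension point: wraps the raw temp-file streams.
interface TempFileCodec {
    OutputStream wrapOutput(OutputStream raw) throws IOException;
    InputStream wrapInput(InputStream raw) throws IOException;
}

// Default implementation that needs nothing beyond the JDK.
class GzipTempFileCodec implements TempFileCodec {
    @Override
    public OutputStream wrapOutput(OutputStream raw) throws IOException {
        return new GZIPOutputStream(raw);
    }

    @Override
    public InputStream wrapInput(InputStream raw) throws IOException {
        return new GZIPInputStream(raw);
    }
}

// In-memory helpers standing in for the sorter's temp-file writes and reads.
class CodecDemo {
    static byte[] compress(TempFileCodec codec, byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = codec.wrapOutput(sink)) {
            out.write(data);
        } // closing the wrapper flushes the gzip trailer
        return sink.toByteArray();
    }

    static byte[] decompress(TempFileCodec codec, byte[] packed) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = codec.wrapInput(new ByteArrayInputStream(packed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }
        return result.toByteArray();
    }
}
```

A Snappy or LZF codec could implement the same interface from an extension jar without the core package depending on it.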