Improve throughput of read streams by transferring multiple records at once #70
Comments
Respecting […]: getting more than one item at the same time will significantly increase throughput.

The streams read ahead according to […]. On-demand read-ahead will guarantee some performance speedup because it reduces the time an object will live on the heap: as a result, the GC will end up doing less work. I would recommend implementing this anyway.
Error-handling included, I propose the following API:

```js
iterator.nextv(size, function (err, records) {
  if (err) return // errored
  if (records.length === 0) return // reached end
  for (const record of records) {
    // ..
  }
})
```
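For example, a consumer could drain an iterator in batches with that shape (a sketch only; `drain` and `db.iterator()` are illustrative names, not part of any released API):

```js
// Repeatedly request batches of up to `size` records until the
// iterator signals the end with an empty array.
function drain (iterator, size, callback) {
  iterator.nextv(size, function (err, records) {
    if (err) return callback(err)
    if (records.length === 0) return callback() // reached end
    for (const record of records) {
      // .. process record
    }
    drain(iterator, size, callback) // ask for the next batch
  })
}
```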
We can play with this idea in […].

It might increase the number of times we cross the C++ / JS boundary, especially for small records, because the `highWaterMark` of streams in `objectMode` is measured in number of records, while the `highWaterMark` of leveldown's iterator is measured in bytes. IMO this does not matter because both of these parameters can be tweaked by the user as necessary. Would it warrant semver-major though?
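To make the unit mismatch concrete, a minimal sketch (assuming `db` is a leveldown instance; the iterator option is the one discussed below as being undocumented):

```js
const { Readable } = require('stream')

// For a stream in objectMode, highWaterMark counts records:
const stream = new Readable({ objectMode: true, highWaterMark: 1000, read () {} })

// For a leveldown iterator, highWaterMark counts bytes and sizes the
// internal read-ahead cache:
const iterator = db.iterator({ highWaterMark: 16 * 1024 })
```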
What changes to leveldown would this require? Changing the C++ iterator cache size (per iterator) to match the `nextv(size, …)`?
Depends on what we want to do with the […].
👍 We implement […]. When creating a leveldown iterator we set the high water mark to […]. Implementing a native […].
I didn't even know you could specify it on the iterator options (it's not in the docs). Is `highWaterMark` meant to be undocumented? Also, is this something that is (or can be) made uniform across AbstractLevelDown implementations?
@MeirionHughes It's missing from the docs (Level/leveldown#468). Only […].
How would y'all feel about altogether removing the hwm measured in bytes, in favor of a […]?
👍 for […]. I think removing […]. Alternatively phrased, if there's no […].
TODO: benchmarks, decide on default highWaterMark. Ref Level/community#70 Ref #1
On the C++ side:

- Replace asBuffer options with encoding options
- Refactor iterator_next to work for nextv(). We already had an iterator.ReadMany(size) method in C++, with a hardcoded size. Now size is taken from the JS argument to _nextv(size). The cache logic for next() is the same as before. Ref Level/community#70, Ref Level/abstract-level#12
- Use std::vector<Entry> in iterator.cache_ instead of std::vector<std::string>, so that the structure of the cache matches the desired result of nextv() in JS.

On the JS side:

- Use classes for ChainedBatch, Iterator, ClassicLevel
- Defer approximateSize() and compactRange()
- Encode arguments of approximateSize() and compactRange(). Ref Level/community#85
- Add promise support to additional methods
- Remove tests that were copied to abstract-level.

This is most of it; a few more changes are needed in follow-up commits.
To remove a conflict with streams. Also adds documentation. Ref Level/leveldown#468 Ref Level/community#70
Done in […]. When not using streams, you can still benefit from the new machinery by using […].
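For example, batched reads without a stream could look roughly like this (illustrative only; assumes a promise-returning `nextv()` and an iterator `close()` method):

```js
async function readAll (db) {
  const iterator = db.iterator()

  try {
    let entries
    // Pull up to 1000 records per call instead of one at a time.
    while ((entries = await iterator.nextv(1000)).length > 0) {
      for (const [key, value] of entries) {
        // .. process key and value
      }
    }
  } finally {
    await iterator.close()
  }
}
```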
Working on the `level-bench` benchmarks got me thinking. Currently `level-iterator-stream` ignores the `size` argument of `stream._read(size)`. Per tick it transfers only 1 db record from the underlying iterator to the stream's buffer. I think we can be smarter about this, by connecting the knowledge that `size` records are requested, all the way down to the db (in the case of `leveldown`, down to the C++, potentially replacing its current read-ahead cache mechanism).

In pseudo-code it would look something like (ignore error handling for a moment):
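The pseudo-code block itself was not captured on this page. As a rough illustration of the idea (names like `EntryStream` and the `nextv()` shape are assumptions matching the API proposed elsewhere in this thread; backpressure and error handling are simplified):

```js
const { Readable } = require('stream')

class EntryStream extends Readable {
  constructor (iterator) {
    super({ objectMode: true })
    this._iterator = iterator
  }

  _read (size) {
    // Ask the db for up to `size` records in one call, instead of
    // transferring a single record per tick.
    this._iterator.nextv(size, (err, records) => {
      if (err) return this.destroy(err)
      if (records.length === 0) return this.push(null) // reached end
      for (const record of records) this.push(record)
    })
  }
}
```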
This also avoids allocating 3 callback functions per record. Alternatively:

[…]
Or if streams were to get a `.pushv` method similar to `.writev`:
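Streams have no such method today, so the following is purely hypothetical, as a variant of the `_read()` sketch above:

```js
// Inside the same (assumed) EntryStream class as above. pushv() does
// not exist on Node.js streams; this only sketches the proposed shape.
_read (size) {
  this._iterator.nextv(size, (err, records) => {
    if (err) return this.destroy(err)
    if (records.length === 0) return this.push(null)
    // Hand the whole batch to the stream in one call, mirroring how
    // _writev() receives a batch on the write side.
    this.pushv(records)
  })
}
```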
/cc @mcollina: could such an API be faster? I'm also wondering how `_read()` behaves in an asyncIterator. Is `size` always 1 in that case, or does the stream read ahead?

@peakji @ralphtheninja /cc @kesla