batch operations #107
@dominictarr Reverse iteration would be implemented optimistically. Forward iteration is performed using links, but reverse iteration starts from the right-most leaf and works its way backwards, descending the tree to the preceding leaf. This is the intended design. The locking strategy for Strata is designed for liveness. The locking required to maintain reverse links would reduce liveness. It is assumed that descents are cheap in Strata, and quick to make progress, because exclusive locks are not global.
Batch operations are a requirement for a log-structured tree. Currently, every insert into Strata is an append. Nothing is lost. However, each write opens and closes the leaf page. Yes, I know, horrible, but now that I understand the requirements, it's easy enough to use a … But, yes, durability was there at the start. Absurdly durable. The file open and close was in the hopes that it would get the filesystem to flush. You can probably tell me a lot about how to properly sync Node.js I/O and I'd be happy to learn.
Note that Strata doesn't store a record as-is. It is a b-tree primitive. The record that goes into Strata is not the one provided by the user. It was always assumed that, in order to build a useful structure for a database, you would include database metadata in a record. For example, Strata does not support duplicate index values, but it was designed so that you could easily fake duplicates with a hidden increment. The increment makes the key unique, but you don't share that with the user. With the right record, or a hidden record, you could fulfill your atomicity requirement.
It is my intention to build an MVCC NoSQL database on top of Strata as a proof of concept. (Your concepts would be powerful proof too, so maybe I'll focus on that at the outset.) The MVCC database would use Strata to store an object/document, keeping a version number that gets incremented for each connection. This version number is stored, hidden from the user, in a record that contains the user's object/document. When a descent visits a leaf page, it can inspect the version numbers of the other records with the same key. There is a common collection of outstanding transactions. If one of the versions is that of an outstanding transaction, then we know that when we commit, if that outstanding transaction has already committed, we have a conflict, and we can give a constraint error to the user.
Thus, to get a batch, you might use a separate Strata b-tree that represents a WAL. Each record would represent a component of your batching data structure, instead of trying to fit it all into one record. You would key the records on a batch number.
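The hidden-record idea above can be sketched in a few lines. This is a hypothetical illustration, not Strata's actual API: the record stored in the tree wraps the user's document with MVCC metadata, and the collation compares the user key first and then a hidden increment, so duplicate user keys stay unique inside the tree.

```javascript
// Wrap a user document with hidden database metadata (names are
// hypothetical): a version number for MVCC and an increment that
// makes otherwise-duplicate keys unique inside the b-tree.
function wrap (document, key, version, increment) {
  return { key: key, version: version, increment: increment, document: document }
}

// Collation compares the user-visible key first, then the hidden
// increment, so records sharing a user key sort by insertion order.
function compare (a, b) {
  if (a.key < b.key) return -1
  if (a.key > b.key) return 1
  return a.increment - b.increment
}

var record = wrap({ name: 'dominic' }, 'users/1', 3, 0)
// The user only ever sees `record.document`; the version and
// increment are internal to the database built on top of Strata.
```

The user never sees the wrapper; the database layer strips it before returning results, and uses the version numbers to detect conflicts with outstanding transactions.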
Let me know where you believe I should insert Strata into the LevelDB ecosystem and I'll fork and submit a pull request. I can have a look myself, I suppose, to see if there is a backing storage API.
No pull request is needed; instead, you just need to support the API of https://github.com/rvagg/node-abstract-leveldown
Hmm, I still think that batch operations should be lower in the abstraction hierarchy, but I'm not gonna press the point.
I get your point about the write-ahead log; in fact, that is pretty much how LevelDB does it. The first level is an in-memory cache that is backed by a log file, and when this gets large, it's compacted into a sorted file. Since this is relatively small, you could use the same approach on top of Strata... hmm, it wouldn't even need to be a b-tree... it could just be a small file that is appended to, with all the data kept in memory, and then when it gets to a certain size, you'd write it out to the b-tree and start a new file. To get batches, you'd just write each collection of records out inside an array, so if it crashed part way through, the unfinished array would simply be ignored. Hmm, this is inspiring me to continue with my pure js leveldown that I started a while back...
Strata is going to create a file-backed table of sorted records, as you've described. If you do not want it to be a b-tree, then don't balance it. But you probably do want it to be a b-tree, and balanced frequently, because then, if you're inserting random data, operations can be performed in parallel, as different clients are updating different leaves of the tree. I don't know if LevelDB supports this or if it is a single-writer API. (I'll look harder tomorrow.) Strata has always been intended for use in an MVCC database, and I've always wanted to allow concurrent writes to a b-tree.
I'll look harder at the abstraction too. I'm not sure I see the support for batched operations. I'm looking forward to that directory of benchmarks in the LevelDOWN repository. In any case, this is an open issue, because using those benchmarks, I'll be designing a batching strategy for Strata (along with a binary storage strategy, since it appears that everyone feels that is necessary).
leveldb is threadsafe, but may only be opened by a single process, and you can do parallel writes. For binary data, maybe you can just use msgpack? Another option for off-the-shelf serialization would be the Redis protocol.
I'd do checksummed and length-prefixed, but with delimiters, so that you could scan through the file and save records. One nice attribute of Strata is that the leaves are lines of JSON, so if worst came to worst, you could run through a ruined b-tree reading the leaves, saving what you can. Moreover, there is no operation in Strata that isn't either a copy or an append. One of the open issues is to archive dead pages instead of deleting them: an option that will allow Strata some time to build trust, and to recover from violations of that trust. So, some sort of sentry to find where the records begin and end. From there, a length and checksum to retrieve the record. Of course, if there is no damage, there is no scanning. I'm only thinking about the damage.
Then storage is just a blob, so it could be anything, and msgpack is a good option at the next level of abstraction. I'm not sure what LevelDB stores as binary. I only see binary. I have also been asked for binary by Kevin Swiber for his https://github.com/argo/medea database. I know I don't do binary. I suppose the right API is to offer up a buffer, or a write stream that accepts buffers. Currently, storage is just a JSON blob. The user provides a collation function and a key extraction function. There might be a vivify function added to the mix, possibly a no-op, that allows the user to transform their buffer into a more meaningful intermediate state, like a JSON object from serialized JSON or msgpack or other.
I'm going to need a faster checksum. I was going to implement Murmur3. Looking forward to benchmarks blasting away at my assumptions. I know I need better batching and a better delayed-write strategy, but I didn't want to bother implementing them without a real application. Also, I'm obviously going to need a merging iterator.
Oh yeah, I also have this: https://npmjs.org/package/json-buffer It just converts buffers into base64, but you could use this and Strata would have binary, albeit not the most performant possible. I appreciate the simplicity you get from a line-separated format, but the downside is that you have to escape the data. The difference, if you can avoid memory copies, is measurable. LevelDB uses Protocol Buffers, and Snappy (just found this). Snappy is fast compression (byte oriented, instead of bit oriented, optimised for speed first and compression second). There doesn't seem to be a js implementation of snappy (it can be disabled in leveldb), but it would be pretty cool.
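The underlying idea of json-buffer, sketched with plain Node primitives rather than the package's own API: binary data rides through a line-oriented JSON format by encoding buffers as base64 strings, at the cost of the extra encode/decode copies mentioned above.

```javascript
// Encode a buffer as a JSON line by way of base64 (an illustration
// of the approach, not json-buffer's actual wire format).
function encode (buffer) {
  return JSON.stringify({ base64: buffer.toString('base64') })
}

// Decode the JSON line back into a buffer.
function decode (line) {
  return Buffer.from(JSON.parse(line).base64, 'base64')
}

var original = Buffer.from([0xde, 0xad, 0xbe, 0xef])
var restored = decode(encode(original))
// restored.equals(original) === true
```

Simple and line-safe, but every record pays for a base64 expansion plus two copies, which is exactly the measurable difference noted above.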
This is actually related to good old #4. |
Convert final use of `writePositions` to write footer separately. See bigeasy#356. See bigeasy#355. See bigeasy#107. See bigeasy#4.
Implement `writeBranch` using the streaming entry writing functions. Rework `Transcript` to correctly convert stream errors to callback errors; I hope, at least, since it is hard to tell if I've managed to capture every possible error condition. See bigeasy#356. See bigeasy#355. See bigeasy#107. See bigeasy#4.
Having a batch operation, that atomically processes a series of puts and dels,
is vital for building reliable applications.
For example, say I wanted to implement a map-reduce, or some other async transformation on inserted data: on inserting the data, also insert a job item,
and when that is durably written, start performing the job, and when the result of the job is written, delete the job.
Thus, if the system crashes while data is being written, you either get no data and no job, or the data and then the job, eventually!
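The job pattern above can be sketched against a levelup-style batch API (the array-of-operations form). The store below is a hypothetical in-memory stand-in, just enough to show the shape; a real store would apply the batch atomically and durably.

```javascript
// Minimal in-memory stand-in for a levelup-style store. `batch`
// applies all operations together; a real backend would also make
// them durable before calling back.
function MemDB () { this.store = {} }

MemDB.prototype.batch = function (operations, callback) {
  for (var i = 0; i < operations.length; i++) {
    var op = operations[i]
    if (op.type === 'put') this.store[op.key] = op.value
    else delete this.store[op.key]
  }
  callback(null) // a real store calls back asynchronously, after sync
}

// Insert the data and its job record in one atomic batch: a crash
// leaves either both or neither, never data without a job.
function insertWithJob (db, key, value, callback) {
  db.batch([
    { type: 'put', key: 'data/' + key, value: value },
    { type: 'put', key: 'job/' + key, value: 'pending' }
  ], callback)
}

// Write the job's result and delete the job record atomically.
function completeJob (db, key, result, callback) {
  db.batch([
    { type: 'put', key: 'result/' + key, value: result },
    { type: 'del', key: 'job/' + key }
  ], callback)
}
```

Because each step is one atomic batch, recovery after a crash only has to scan for surviving `job/` records and re-run them.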
I have been doing a lot of work building node.js databases based on leveldb ( https://github.com/rvagg/node-levelup ), and have been looking for a suitable database primitive implemented in javascript, so that it is not necessary to compile anything.
Then, strata would be usable with the growing number of level-* modules:
https://github.com/rvagg/node-levelup/wiki/Modules
The other thing that is needed, is reverse iteration.
Browsing your code, I'm thinking that to get a batch operation, you'd need an extended checksum that validated the entire batch, or some other way of grouping a whole batch of writes into one line? Maybe some other separator, or a symbol meaning that the line is part of a batch, and the checksum applies to the batch as a whole?
Cheers, Dominic