
bulk writes #1

Open
dominictarr opened this issue Aug 28, 2019 · 2 comments
@dominictarr
Owner

Hey @Happy0, for ssbc/ssb-db#264 a simpler approach might be to update async-write to accept an array, and reduce that array into the async write buffer: https://github.com/dominictarr/async-write/blob/master/index.js#L30

Currently, calling async-write in a loop will call the underlying write method as soon as the buffer is full, but if you passed it an array, you could put everything into the buffer before isFull(buffer) is checked.

That would cover cases where one message in a batch is invalid, but it could still be a problem if a batch of messages is half written; we'd need to fix that in the flumelog, though.
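A rough sketch of the idea (the names reduce, isFull, and flush here are illustrative, not async-write's actual internals): reduce the whole array into the buffer first, and only then check isFull, so a batch is never split by a mid-loop flush.

```javascript
// Hypothetical sketch of a write function that accepts either a single
// item or an array of items, batching the array into the buffer before
// the isFull(buffer) check.
function createBatchedWrite(reduce, isFull, flush) {
  let buffer = null
  let writing = false

  function maybeFlush() {
    if (!writing && buffer != null && isFull(buffer)) {
      const toWrite = buffer
      buffer = null
      writing = true
      // flush is the underlying async write; flush again when it's done,
      // in case more data accumulated meanwhile.
      flush(toWrite, function () {
        writing = false
        maybeFlush()
      })
    }
  }

  return function write(data) {
    const items = Array.isArray(data) ? data : [data]
    // Put everything into the buffer *before* checking isFull.
    for (const item of items) buffer = reduce(buffer, item)
    maybeFlush()
  }
}
```

With this shape, write([a, b, c]) guarantees all three items land in the same buffer, whereas calling write(a); write(b); write(c) in a loop could trigger a flush between any two of them.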

@Happy0

Happy0 commented Sep 12, 2019

Sorry about not replying sooner! Thanks for looking :).

I'm super nervous about changing code this central to the stack, so simpler than my original PRs is better. I'd hate to break SSB; I can think of nothing more devastating!

I think I've thought of a way to handle the openlaw use case at an application level (since we already keep track of a part field, we could just treat the entry as though it doesn't exist if any parts are missing). Really, the main situation I'm trying to avoid is someone pulling the plug on the server in the middle of a write or a big import.
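Something like this, roughly (field names partsTotal and parts are made up for illustration):

```javascript
// Sketch of the application-level workaround: an entry that tracks how
// many parts it expects is only visible once every part has arrived.
function isComplete(entry) {
  return entry.parts.length === entry.partsTotal
}

function visibleEntries(entries) {
  // Treat half-imported entries as though they don't exist.
  return entries.filter(isComplete)
}
```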

However, I think this would still be nice to have in the ssb stack if we can achieve it in a safe and well tested way. Your idea sounds promising - how tricky would it be to deal with the 'half written' situation (someone pulling the plug on the server) in the flumelog?

@dominictarr
Owner Author

It's actually much easier to implement this sort of thing at the lower levels, and especially to test it: firstly, because there is just much less stuff flying around down there, and secondly, it's nicer if the low-level generic stuff is where the hard problems are solved, and the high-level stuff (ssb-db) is really just glue. If you are worried about breaking everything, it doesn't make much difference whether it's "more core" or not: everything uses ssb-db, so a mistake there is still gonna break everything.

How to handle half-written records: I think this needs to be part of the format. It's all about how to recover from a half write. Currently, when a flumelog starts up, it scans backwards to find the last valid offset. That works if a single record was half written, but if several records were meant to be written in one batch, you'd need a way to rewind back to the start of the batch.

The flumelog-offset format is currently:

offset-><data.length (UInt32BE)>
        <data ...>
        <data.length (UInt32BE)>
        <file_length (UInt32BE or UInt48BE or UInt53BE)>

and with batching it would become:

offset-><data.length (UInt32BE)>
        <batch remaining (UInt32BE)> //added this
        <data ...>
        <data.length (UInt32BE)>
        <file_length (UInt32BE or UInt48BE or UInt53BE)>

where batch remaining is an integer that counts down: if you write a single record it's 0, and if you write three they're tagged 2, 1, 0.
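The recovery scan could then look something like this (a sketch, not flumelog-offset's actual code, operating on the batch-remaining counters of parsed records in file order):

```javascript
// A batch is complete only when its counter counts down to 0. If the
// log ends mid-batch (e.g. the plug was pulled), we keep only the
// records up to and including the last one whose counter is 0.
function lastValidCount(batchRemainings) {
  let lastComplete = 0
  for (let i = 0; i < batchRemainings.length; i++) {
    if (batchRemainings[i] === 0) lastComplete = i + 1
  }
  // Number of records to keep; everything after this is truncated.
  return lastComplete
}
```

So a crash after the second record of a three-record batch leaves counters [2, 1] at the tail of the log, and both records get discarded on restart.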

Does that make sense?
