
bulk writes #1

Open
dominictarr opened this issue Aug 28, 2019 · 2 comments
@dominictarr
Owner

Hey @Happy0, for ssbc/ssb-db#264 a simpler approach might be to update async-write to accept an array, and reduce that array into the async write buffer: https://github.com/dominictarr/async-write/blob/master/index.js#L30

Currently, calling async-write in a loop will call the underlying write method as soon as the buffer is full, but if you passed it an array, you could put everything into the buffer before isFull(buffer) is checked.

That would cover cases where one message in a batch is invalid, but it could still be a problem if a batch of messages is half written; we'd need to fix that in the flumelog, though.
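A rough sketch of the idea (the names reduce, isFull, and flush here are illustrative, not async-write's actual internals): reduce the whole array into the buffer first, and only then check isFull, so a batch is never split by a mid-loop flush.

```javascript
// Hypothetical sketch of a write function that accepts either a single
// item or an array of items, batching the array into the buffer before
// the isFull(buffer) check.
function createBatchedWrite(reduce, isFull, flush) {
  let buffer = null
  let writing = false

  function maybeFlush() {
    if (!writing && buffer != null && isFull(buffer)) {
      const toWrite = buffer
      buffer = null
      writing = true
      // flush is the underlying async write; flush again when it's done,
      // in case more data accumulated meanwhile.
      flush(toWrite, function () {
        writing = false
        maybeFlush()
      })
    }
  }

  return function write(data) {
    const items = Array.isArray(data) ? data : [data]
    // Put everything into the buffer *before* checking isFull.
    for (const item of items) buffer = reduce(buffer, item)
    maybeFlush()
  }
}
```

With this shape, write([a, b, c]) guarantees all three items land in the same buffer, whereas calling write(a); write(b); write(c) in a loop could trigger a flush between any two of them.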

@Happy0

Happy0 commented Sep 12, 2019

Sorry about not replying sooner! Thanks for looking :).

I'm super nervous about changing code this central to the stack, so simpler than my original PRs is better. I'd hate to break SSB; I can think of nothing more devastating!

I think I've thought of a way to handle the openlaw use case at an application level (since we already keep track of a part field, we could just treat the entry as though it doesn't exist if any parts are missing). Really, the main situation I'm trying to avoid is someone pulling the plug on the server in the middle of a write or a big import.
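Something like this, roughly (field names partsTotal and parts are made up for illustration):

```javascript
// Sketch of the application-level workaround: an entry that tracks how
// many parts it expects is only visible once every part has arrived.
function isComplete(entry) {
  return entry.parts.length === entry.partsTotal
}

function visibleEntries(entries) {
  // Treat half-imported entries as though they don't exist.
  return entries.filter(isComplete)
}
```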

However, I think this would still be nice to have in the ssb stack if we can achieve it in a safe and well tested way. Your idea sounds promising - how tricky would it be to deal with the 'half written' situation (someone pulling the plug on the server) in the flumelog?

@dominictarr
Owner Author

It's actually much easier to implement this sort of thing at the lower levels, and especially to test it: firstly, because there is just much less stuff flying around down there, and secondly, it's nicer if the low-level generic stuff is where the hard problems are solved, and the high-level stuff (ssb-db) is really just glue. If you are worried about breaking everything, it doesn't make much difference whether it's "more core" or not: everything uses ssb-db, so a mistake there is still gonna break everything.

How to handle half-written records: I think this needs to be part of the format. It's all about how to recover from a half write. Currently, when a flumelog starts up, it scans backwards to find the last valid offset. That works if a single record was half written, but if several records were meant to be written in one batch, you'd need a way to rewind back to the start of the batch.

The flumelog-offset format is currently:

offset-><data.length (UInt32BE)>
        <data ...>
        <data.length (UInt32BE)>
        <file_length (UInt32BE or UInt48BE or UInt53BE)>

and with batching it would become:

offset-><data.length (UInt32BE)>
        <batch remaining (UInt32BE)> //added this
        <data ...>
        <data.length (UInt32BE)>
        <file_length (UInt32BE or UInt48BE or UInt53BE)>

where batch remaining is an integer that counts down: if you write a single record it's 0, and if you write three they're tagged 2, 1, 0.
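The recovery scan could then look something like this (a sketch, not flumelog-offset's actual code, operating on the batch-remaining counters of parsed records in file order):

```javascript
// A batch is complete only when its counter counts down to 0. If the
// log ends mid-batch (e.g. the plug was pulled), we keep only the
// records up to and including the last one whose counter is 0.
function lastValidCount(batchRemainings) {
  let lastComplete = 0
  for (let i = 0; i < batchRemainings.length; i++) {
    if (batchRemainings[i] === 0) lastComplete = i + 1
  }
  // Number of records to keep; everything after this is truncated.
  return lastComplete
}
```

So a crash after the second record of a three-record batch leaves counters [2, 1] at the tail of the log, and both records get discarded on restart.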

Does that make sense?
