Flush write buffer when full #253

Closed · wants to merge 6 commits from mheinzel/table-write-flush

Conversation

mheinzel (Collaborator)

No description provided.

modifyMVar_ (tableContent thEnv) $ \tableContent -> do
  let !wb = WB.addEntries (resolveMupsert (tableConfig th)) es $
              tableWriteBuffer tableContent
  if WB.sizeInBytes wb <= fromIntegral (confWriteBufferAlloc (tableConfig th))
mheinzel (Collaborator, Author):

We might actually just want to look at the number of entries for now.

For the runs, this simplifies things quite a bit (e.g. with mupsert, a merged run might end up larger than the sum of its inputs), so it would make sense to do it for the write buffer as well.
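
For illustration, a minimal sketch of what the entry-count check could look like in place of the byte-size check above. WB.numEntries and confWriteBufferEntries are hypothetical names, not this PR's API:

modifyMVar_ (tableContent thEnv) $ \tableContent -> do
  let !wb = WB.addEntries (resolveMupsert (tableConfig th)) es $
              tableWriteBuffer tableContent
  -- Hypothetical: compare the number of entries rather than the byte size.
  if WB.numEntries wb <= confWriteBufferEntries (tableConfig th)
    then pure tableContent { tableWriteBuffer = wb }
    else flushWriteBuffer thEnv (tableLevels tableContent) wb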

jorisdral (Collaborator) · Jun 17, 2024:

I wonder if it would be useful to still keep the size calculation in here

Collaborator:

I think on balance we should drop it (but keep the code on a branch in case we decide to resurrect it).

Collaborator:

Yes, I think we can keep this branch. I've opened #259 with the alternative approach based on the number of entries.

jorisdral (Collaborator) left a comment:

Looks good already! I'll take over this PR and make changes if necessary

Files with review threads:
test/Test/Database/LSMTree/Internal/Merge.hs (outdated)
src/Database/LSMTree/Internal/WriteBuffer.hs (outdated)
src/Database/LSMTree/Internal/WriteBuffer.hs (outdated)
src/Database/LSMTree/Internal/WriteBuffer.hs
src/Database/LSMTree/Internal/WriteBuffer.hs (outdated)
test/Test/Database/LSMTree/Internal/Lookup.hs (outdated)
src/Database/LSMTree/Internal.hs
Comment on lines +561 to +573
flushWriteBuffer ::
     m ~ IO
  => TableHandleEnv m h
  -> Levels (Handle h)
  -> WriteBuffer
  -> m (TableContent h)
flushWriteBuffer thEnv levels wb = do
    n <- incrUniqCounter (tablesSessionUniqCounter thEnv)
    let runPaths = Paths.runPath (tableSessionRoot thEnv) n
    run <- Run.fromWriteBuffer (tableHasFS thEnv) runPaths wb
    let levels' = addRunToLevels run levels
    return TableContent {
        tableWriteBuffer = WB.empty
      , tableLevels = levels'
      , tableCache = mkLevelsCache levels'
      }
Collaborator:

Maybe this should only partially flush the write buffer: a batch of updates might have overfilled the write buffer by a lot

Collaborator:

Wouldn't matter if we go by key count only.

Collaborator:

Oh, I see. Perhaps it'd be simpler to split a batch of inserts so that we never overfill the write buffer.

Or, even simpler: if the batch would take us over the keys limit, flush the buffer as-is (so it's not 100% full) and put the new batch into the new (empty) buffer. That works except when the buffer is barely full but the batch is so large that it'd take us over the limit; in that case we'd still need to split batches of inserts.
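
For illustration, a pure toy sketch of this policy, modelling a write buffer as a plain list of entries with maxEntries as the configured limit. All names are illustrative, not this PR's API, and a real version would also avoid flushing an empty buffer:

-- Takes the limit, the incoming batch, and the current buffer; returns
-- the buffers to flush (oldest first) and the new in-memory buffer.
addBatch :: Int -> [e] -> [e] -> ([[e]], [e])
addBatch maxEntries es wb
    -- the batch still fits: no flush needed
  | length wb + length es <= maxEntries = ([], wb ++ es)
    -- the batch fits in an empty buffer: flush the current one as-is
  | length es <= maxEntries = ([wb], es)
    -- the batch exceeds a whole buffer: flush, emit one full buffer, recurse
  | otherwise =
      let (now, later)     = splitAt maxEntries es
          (flushed, final) = addBatch maxEntries later []
      in  (wb : now : flushed, final)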


jorisdral self-assigned this on Jun 17, 2024
jorisdral force-pushed the mheinzel/table-write-flush branch 2 times, most recently from 3429b6a to 2b588e8, on June 18, 2024 at 12:44
mheinzel and others added 5 commits on June 18, 2024 at 14:57
The directory that runs were created in was deleted accidentally, causing an
`FsError` to appear after the first benchmark.
dcoutts (Collaborator) left a comment:

This is looking OK, but I agree that the way to go for now is size in number of keys, not a size-in-bytes measure.

    writeBufferContent :: !(Map SerialisedKey (Entry SerialisedValue SerialisedBlob))
    -- | Keeps track of the total size of keys and values in the buffer.
    -- This means reconstructing the 'WB' constructor on every update.
  , writeBufferSizeBytes :: {-# UNPACK #-} !Int
Collaborator:

Following our discussion yesterday, is it worth tracking this? We concluded that the size measure we would use for the LSM merging would be the number of keys rather than size of keys + values. There's a non-trivial cost to tracking it here in the WriteBuffer. Is it worth paying?

Provided we keep the branch, we can always grab the code again if we decide we do need it.

    let (!wb', !s') = runState (insert k wb) s
    in  WB wb' s'
  where
    -- TODO: this seems inelegant, but we want to avoid traversing the Map twice
Collaborator:

And this is only needed to track the size measure.
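
For illustration, one way to update the map and the running byte count in a single traversal is Data.Map.Strict.alterF with a pair functor. A toy sketch, where entrySize stands in for the real serialised-size function and the update is plain replacement, ignoring mupsert resolution:

import qualified Data.Map.Strict as Map

insertTracked :: Ord k => (v -> Int) -> k -> v
              -> (Map.Map k v, Int) -> (Map.Map k v, Int)
insertTracked entrySize k v (m, size) =
    let (delta, m') = Map.alterF upsert k m
    in  (m', size + delta)
  where
    -- Size delta: add the new entry's size, subtract the old one's (if any).
    upsert old = (entrySize v - maybe 0 entrySize old, Just v)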



jorisdral commented on Jun 20, 2024:

We are going to use a write buffer allocation method defined in terms of the number of entries rather than the number of bytes. We're keeping the code in this PR around in case we want to switch the allocation method to byte size later, and we've opened #259 for the alternative method.
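
For illustration, one way to leave the door open to both allocation methods is a small sum type in the table configuration; a sketch only, with illustrative constructor names rather than the final API:

-- Sketch: both allocation methods expressible in the config, so the
-- byte-size variant can be revived later without an API break.
data WriteBufferAlloc =
    AllocNumEntries !Int  -- flush once the buffer holds this many entries
  | AllocNumBytes   !Int  -- flush once keys + values exceed this many bytes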

jorisdral closed this on Jun 20, 2024
jorisdral deleted the mheinzel/table-write-flush branch on July 2, 2024