Flush write buffer when full #253
Conversation
Force-pushed from 67e0129 to ee21e0d.
src/Database/LSMTree/Internal.hs (outdated)

```haskell
modifyMVar_ (tableContent thEnv) $ \tableContent -> do
  let !wb = WB.addEntries (resolveMupsert (tableConfig th)) es $
        tableWriteBuffer tableContent
  if WB.sizeInBytes wb <= fromIntegral (confWriteBufferAlloc (tableConfig th))
```
We might actually just want to look at number of entries for now.
For the runs, this simplifies things quite a bit (e.g. with mupsert a merged run might end up larger than the sum of its inputs), so it would make sense to do it for the write buffer as well.
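The entry-count measure needs almost no bookkeeping, since `Data.Map` stores its size in the tree nodes and `Map.size` is O(1). A minimal sketch of what such a check could look like; the `WriteBuffer` synonym and `AllocNumEntries` constructor here are hypothetical stand-ins, not the library's actual types:

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for the real write buffer and config types.
type WriteBuffer k v = Map.Map k v
newtype WriteBufferAlloc = AllocNumEntries Int

-- An entry-count check needs no extra tracked field: Map.size is O(1)
-- in containers, so the buffer is "full" exactly when it holds the
-- configured number of entries.
isFull :: WriteBufferAlloc -> WriteBuffer k v -> Bool
isFull (AllocNumEntries n) wb = Map.size wb >= n
```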
I wonder if it would be useful to still keep the size calculation in here
I think on balance we should drop it (but keep the code on a branch in case we decide to resurrect it).
Yes, I think we can keep this branch. I've opened #259 with the alternative approach based on number of entries.
Looks good already! I'll take over this PR and make changes if necessary
```haskell
flushWriteBuffer ::
     m ~ IO
  => TableHandleEnv m h
  -> Levels (Handle h)
  -> WriteBuffer
  -> m (TableContent h)
flushWriteBuffer thEnv levels wb = do
    n <- incrUniqCounter (tablesSessionUniqCounter thEnv)
    let runPaths = Paths.runPath (tableSessionRoot thEnv) n
    run <- Run.fromWriteBuffer (tableHasFS thEnv) runPaths wb
    let levels' = addRunToLevels run levels
    return TableContent {
        tableWriteBuffer = WB.empty
      , tableLevels = levels'
      , tableCache = mkLevelsCache levels'
      }
```
Maybe this should only partially flush the write buffer: a batch of updates might have overfilled the write buffer by a lot
Wouldn't matter if we go by key count only.
Oh I see. Perhaps it'd be simpler to split a batch of inserts so that we never over-fill the write buffer.
Or even simpler, if the batch would take us over the keys limit, flush the buffer as-is (so it's not 100% full) and put the new batch into the new (empty) buffer. That works except when the buffer is hardly full but the batch is so large it'd take us over the limit. In that case we'd need to split batches of inserts.
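The policy described above can be sketched as follows. This is a simplified, hypothetical model, not the PR's code: a plain `Map` stands in for the write buffer, `flush` is an abstract action, and the limit is assumed to be at least one entry:

```haskell
import qualified Data.Map.Strict as Map

type WriteBuffer k v = Map.Map k v

-- If adding a batch would exceed the limit, flush the current buffer
-- as-is (so it may be less than 100% full); if the batch alone still
-- exceeds the limit, split it into limit-sized chunks, flushing each
-- full chunk. Assumes limit >= 1, otherwise the recursion would not
-- make progress.
addBatch :: (Ord k, Monad m)
         => Int                        -- ^ max entries in the buffer
         -> (WriteBuffer k v -> m ())  -- ^ flush action
         -> [(k, v)]                   -- ^ new batch of inserts
         -> WriteBuffer k v
         -> m (WriteBuffer k v)
addBatch limit flush batch wb
  | Map.size wb + length batch <= limit =
      pure (Map.union (Map.fromList batch) wb)
  | not (Map.null wb) = do
      flush wb                          -- flush as-is, not 100% full
      addBatch limit flush batch Map.empty
  | otherwise = do                      -- batch alone overfills: split it
      let (now, later) = splitAt limit batch
      flush (Map.fromList now)
      addBatch limit flush later Map.empty
```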
Force-pushed from 3429b6a to 2b588e8.
The directory that runs were created in was accidentally deleted, causing an `FsError` after the first benchmark.
Force-pushed from 2b588e8 to 48fd56b.
This is looking OK, but I agree that the way to go for now is size in number of keys, not a byte-size measure.
```haskell
writeBufferContent :: !(Map SerialisedKey (Entry SerialisedValue SerialisedBlob))
  -- | Keeps track of the total size of keys and values in the buffer.
  -- This means reconstructing the 'WB' constructor on every update.
, writeBufferSizeBytes :: {-# UNPACK #-} !Int
```
Following our discussion yesterday, is it worth tracking this? We concluded that the size measure we would use for the LSM merging would be the number of keys rather than size of keys + values. There's a non-trivial cost to tracking it here in the WriteBuffer. Is it worth paying?
Provided we keep the branch, we can always grab the code again if we decide we do need it.
```haskell
let (!wb', !s') = runState (insert k wb) s
in WB wb' s'
where
  -- TODO: this seems inelegant, but we want to avoid traversing the Map twice
```
And this is only needed to track the size measure.
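For reference, one way to keep a tracked size without traversing the `Map` twice is `Map.insertLookupWithKey`, which inserts and returns the previous value in a single pass, so the size delta can be computed from the old and new entries directly. A hedged sketch under assumed names, not the PR's actual code:

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical single-traversal insert that maintains a tracked total
-- size. Map.insertLookupWithKey both inserts the new value (the
-- combining function keeps 'new') and returns the old value, if any.
insertTracked :: Ord k
              => (v -> Int)            -- ^ size measure for an entry
              -> k -> v
              -> (Map.Map k v, Int)    -- ^ map plus tracked total size
              -> (Map.Map k v, Int)
insertTracked sizeOf k v (m, total) =
  let (mOld, m') = Map.insertLookupWithKey (\_ new _ -> new) k v m
      delta      = sizeOf v - maybe 0 sizeOf mOld
  in (m', total + delta)
```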
We are going to use a write buffer allocation method that is defined in terms of number of entries, instead of number of bytes. We're keeping the code in this PR around in case we want to change the write buffer allocation method to byte size later, and we've opened #259 for the alternative method.