
Refactor split/spill #148

Merged
merged 1 commit into from
May 3, 2014

Conversation

benbjohnson
Member

This commit refactors the split/spill functionality. The previous implementation attempted to avoid additional allocations by doing some fancy depth sorting and looping trickery but it was really hard to follow/debug/extend.

The new implementation adds a node.children slice that is populated with in-memory child nodes. This makes the split() and spill() code a simple recursive algorithm and will make #94 easy to fix. This refactor is also required to make inline buckets (#124) possible.
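A minimal sketch of the recursive shape described above, with simplified types. The node.children field and the split()/spill() names come from this PR, but the pageSize parameter, the inode layout, and the write step are illustrative assumptions, not the actual implementation.

```go
package main

import "fmt"

// inode is a simplified in-memory key/value element of a node.
type inode struct {
	key, value []byte
}

// node is an in-memory, deserialized page plus its materialized children.
type node struct {
	inodes   []inode
	children []*node // populated as child nodes are read or modified
}

// spill works bottom-up: spill the children first, then split this node into
// page-sized pieces and write each piece out.
func (n *node) spill(pageSize int) {
	for _, child := range n.children {
		child.spill(pageSize)
	}
	for _, piece := range n.split(pageSize) {
		fmt.Printf("write node with %d inodes\n", len(piece.inodes))
	}
}

// split breaks a node into nodes whose combined key/value size fits in a page.
func (n *node) split(pageSize int) []*node {
	var nodes []*node
	current, size := &node{}, 0
	for _, in := range n.inodes {
		elemSize := len(in.key) + len(in.value)
		if len(current.inodes) > 0 && size+elemSize > pageSize {
			nodes = append(nodes, current)
			current, size = &node{}, 0
		}
		current.inodes = append(current.inodes, in)
		size += elemSize
	}
	return append(nodes, current)
}

func main() {
	root := &node{
		inodes:   []inode{{[]byte("a"), make([]byte, 300)}, {[]byte("b"), make([]byte, 300)}},
		children: []*node{{inodes: []inode{{[]byte("c"), make([]byte, 100)}}}},
	}
	root.spill(512) // the child is written first, then the root as two pieces
}
```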

Although there is an extra allocation required for branch nodes, I would expect it to amortize well over transactions. I'll add line notes to the PR for further explanation.

Note: This has no effect on the file format or the API. It's just an internal change.

/cc @snormore @mkobetic

tx *Tx
buckets map[string]*Bucket
rootNode *node
nodes map[pgid]*node
}
Member Author

The pending slice was used for tracking in-progress split nodes during allocation. That way, if an allocation caused the mmap to be remapped, all pending nodes could be dereferenced.

Since the split/spill phases have been separated this is no longer necessary.

@mkobetic
Contributor

mkobetic commented May 3, 2014

I'm still catching up on how the nested buckets work. It seems that a nested bucket is a key/value pair where the value is the simple bucket struct {root pgid, sequence uint64}, right? Simple enough. The bit I'm struggling with is the exact purpose of Bucket.spill(). Why does it need to recurse into nested buckets?
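For reference, a minimal sketch of the value layout described in this question; the fields match the {root pgid, sequence uint64} shape mentioned above, and treating pgid as a uint64 page id is an assumption for illustration.

```go
// The value stored under a nested bucket's key in its parent: a small,
// fixed-size header rather than the bucket's contents themselves.
type pgid uint64

type bucket struct {
	root     pgid   // page id of the nested bucket's root page
	sequence uint64 // the bucket's sequence counter
}
```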

@mkobetic
Contributor

mkobetic commented May 3, 2014

OK it seems spilling is the creation of the storage pages from the materialized nodes during commit. Nodes should already be split and rebalanced as needed by that time, so they just need to be written out, right?

@@ -703,13 +765,12 @@ func TestBucket_Delete_Quick(t *testing.T) {
db.View(func(tx *Tx) error {
b := tx.Bucket([]byte("widgets"))
for j, exp := range items {
var value = b.Get(exp.Key)
Member

nitpick: value := b.Get(exp.Key) for consistency.

Member Author

@snormore I've been moving toward using var since it makes declarations more explicit but you're right -- it's inconsistent. I'll change it back.

@snormore
Member

snormore commented May 3, 2014

LGTM

@benbjohnson
Member Author

@mkobetic Yep, you're correct. Spilling simply writes the nodes to byte slices and those byte slices get written out to disk. The spill() function calls split() on the nodes because splitting can be somewhat expensive: it has to go through each inode and add up the size. Maybe we can do some caching in the future so we can split as inodes are added.

The spilling gets called on nested buckets only when those nested buckets have been materialized. So if you don't use a bucket in a transaction, it won't be spilled. Changing the spill to be recursive makes it easy to determine whether a bucket is small enough to fit inline on a parent page.
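A hedged sketch of the bucket-level recursion described here: the buckets and rootNode fields mirror the Bucket struct quoted earlier in this PR, while the size accounting and the quarter-page inline threshold are illustrative assumptions rather than the actual logic.

```go
package main

import "fmt"

type inode struct{ key, value []byte }

type node struct{ inodes []inode }

// Bucket holds only the child buckets and nodes materialized in this
// transaction, so spilling skips anything that was never touched.
type Bucket struct {
	buckets  map[string]*Bucket
	rootNode *node
}

// size sums the serialized size of the root node's inodes -- the same kind of
// per-inode accumulation that makes split() somewhat expensive.
func (b *Bucket) size() int {
	s := 16 // assumed fixed page-header overhead
	for _, in := range b.rootNode.inodes {
		s += len(in.key) + len(in.value)
	}
	return s
}

// spill recurses into materialized child buckets first, deciding per child
// whether it is small enough to live inline on the parent's page.
func (b *Bucket) spill(pageSize int) {
	for name, child := range b.buckets {
		if child.size() < pageSize/4 {
			fmt.Printf("bucket %q fits inline in its parent\n", name)
			continue
		}
		child.spill(pageSize)
		fmt.Printf("bucket %q spilled to its own page(s)\n", name)
	}
	// ...then split b.rootNode and write the resulting pages out.
}

func main() {
	small := &Bucket{rootNode: &node{inodes: []inode{{[]byte("k"), []byte("v")}}}}
	big := &Bucket{rootNode: &node{inodes: []inode{{[]byte("k"), make([]byte, 4096)}}}}
	root := &Bucket{
		buckets:  map[string]*Bucket{"small": small, "big": big},
		rootNode: &node{},
	}
	root.spill(4096)
}
```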

benbjohnson added a commit that referenced this pull request May 3, 2014
@benbjohnson benbjohnson merged commit 60624c4 into boltdb:master May 3, 2014
@benbjohnson benbjohnson deleted the split-spill branch May 3, 2014 22:21