This repository has been archived by the owner on Mar 9, 2019. It is now read-only.

Add nested buckets. #127

Merged
merged 3 commits into from
Apr 11, 2014

benbjohnson
Member

Overview

This pull request adds the ability to nest buckets inside other buckets. It also replaces the buckets page with a root bucket. The pull request looks bigger than it is: most of the change moves Tx functionality over to the Bucket type.

API Changes

Previously, the ability to create, retrieve, and delete buckets lived entirely in Tx and the buckets page. This made sense because buckets were a top-level construct. Now that buckets can exist inside other buckets, all of this functionality has moved into the Bucket type.

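Buckets and simple byte slice values all exist in the same keyspace, but a given key can hold only one kind. If a bucket named foo exists, putting a simple value with the key foo returns an ErrIncompatibleValue error, and creating a bucket named foo where a simple value already exists returns the same error.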

New Bucket functions:

func (b *Bucket) Bucket(name []byte) *Bucket
func (b *Bucket) CreateBucket(key []byte) error
func (b *Bucket) CreateBucketIfNotExists(key []byte) error
func (b *Bucket) DeleteBucket(key []byte) error

Changes in Bucket functions:

  • Bucket.Get() returns nil when a bucket key is provided.
  • Bucket.Put() returns ErrIncompatibleValue when putting to an existing bucket key.
  • Bucket.Delete() returns ErrIncompatibleValue when deleting an existing bucket key.
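The keyspace rules above can be illustrated with a toy in-memory model. This is a hypothetical sketch, not Bolt's implementation: the map-based Bucket type and the ErrBucketExists and ErrBucketNotFound errors are assumptions made for illustration; only ErrIncompatibleValue and the method names come from this PR. Each key lives in exactly one of two maps, so it holds either a simple value or a nested bucket, never both.

```go
package main

import (
	"errors"
	"fmt"
)

// Only ErrIncompatibleValue is named in this PR; the other two errors are
// hypothetical stand-ins for this sketch.
var (
	ErrIncompatibleValue = errors.New("incompatible value")
	ErrBucketExists      = errors.New("bucket already exists")
	ErrBucketNotFound    = errors.New("bucket not found")
)

// Bucket is a toy model, not Bolt's type: a key lives in exactly one of
// the two maps, so values and nested buckets share a single keyspace.
type Bucket struct {
	values  map[string][]byte
	buckets map[string]*Bucket
}

func NewBucket() *Bucket {
	return &Bucket{values: map[string][]byte{}, buckets: map[string]*Bucket{}}
}

// Bucket returns the nested bucket at name, or nil if there is none.
func (b *Bucket) Bucket(name []byte) *Bucket { return b.buckets[string(name)] }

// CreateBucket refuses keys that already hold a simple value or a bucket.
func (b *Bucket) CreateBucket(key []byte) error {
	if _, ok := b.values[string(key)]; ok {
		return ErrIncompatibleValue
	}
	if _, ok := b.buckets[string(key)]; ok {
		return ErrBucketExists
	}
	b.buckets[string(key)] = NewBucket()
	return nil
}

// CreateBucketIfNotExists treats an existing bucket as success.
func (b *Bucket) CreateBucketIfNotExists(key []byte) error {
	if err := b.CreateBucket(key); err != nil && err != ErrBucketExists {
		return err
	}
	return nil
}

// DeleteBucket refuses to remove a simple value.
func (b *Bucket) DeleteBucket(key []byte) error {
	if _, ok := b.values[string(key)]; ok {
		return ErrIncompatibleValue
	}
	if _, ok := b.buckets[string(key)]; !ok {
		return ErrBucketNotFound
	}
	delete(b.buckets, string(key))
	return nil
}

// Get returns nil for a missing key and for a key that holds a bucket.
func (b *Bucket) Get(key []byte) []byte { return b.values[string(key)] }

// Put refuses to overwrite a nested bucket; it stores a copy of value.
func (b *Bucket) Put(key, value []byte) error {
	if _, ok := b.buckets[string(key)]; ok {
		return ErrIncompatibleValue
	}
	v := make([]byte, len(value))
	copy(v, value)
	b.values[string(key)] = v
	return nil
}

// Delete refuses to remove a nested bucket; DeleteBucket must be used.
func (b *Bucket) Delete(key []byte) error {
	if _, ok := b.buckets[string(key)]; ok {
		return ErrIncompatibleValue
	}
	delete(b.values, string(key))
	return nil
}

func main() {
	root := NewBucket()
	if err := root.CreateBucket([]byte("foo")); err != nil {
		panic(err)
	}
	fmt.Println(root.Get([]byte("foo")) == nil)       // true
	fmt.Println(root.Put([]byte("foo"), []byte("x"))) // incompatible value
	fmt.Println(root.Delete([]byte("foo")))           // incompatible value
}
```

Splitting values and buckets into separate maps keeps the "one kind per key" check to a single lookup in the other map, which mirrors the directories-and-files analogy raised later in the thread.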

Tx functions

All the previous Tx bucket functions still exist and simply wrap the corresponding functions on the root bucket:

func (tx *Tx) Bucket(name []byte) *Bucket
func (tx *Tx) CreateBucket(name []byte) error
func (tx *Tx) CreateBucketIfNotExists(name []byte) error
func (tx *Tx) DeleteBucket(name []byte) error

However, since buckets are in the regular keyspace, their names are byte slices instead of strings.
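The delegation described above can be sketched as follows. This is a hypothetical toy, not Bolt's code: the stripped-down Bucket (which tracks only nested buckets), the root field on Tx, and the ErrBucketExists and ErrBucketNotFound errors are all assumptions for illustration. Each Tx method is a one-line wrapper around the same method on the root bucket, and names are byte slices throughout.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors for this sketch; the PR text does not name them.
var (
	ErrBucketExists   = errors.New("bucket already exists")
	ErrBucketNotFound = errors.New("bucket not found")
)

// Bucket is a stripped-down toy that only tracks nested buckets.
type Bucket struct {
	buckets map[string]*Bucket
}

func newBucket() *Bucket { return &Bucket{buckets: map[string]*Bucket{}} }

func (b *Bucket) Bucket(name []byte) *Bucket { return b.buckets[string(name)] }

func (b *Bucket) CreateBucket(name []byte) error {
	if b.buckets[string(name)] != nil {
		return ErrBucketExists
	}
	b.buckets[string(name)] = newBucket()
	return nil
}

func (b *Bucket) CreateBucketIfNotExists(name []byte) error {
	if err := b.CreateBucket(name); err != nil && err != ErrBucketExists {
		return err
	}
	return nil
}

func (b *Bucket) DeleteBucket(name []byte) error {
	if b.buckets[string(name)] == nil {
		return ErrBucketNotFound
	}
	delete(b.buckets, string(name))
	return nil
}

// Tx owns a private root bucket. Because this Tx exposes no Put, the root
// can only ever contain buckets.
type Tx struct {
	root *Bucket
}

// Each Tx method is a one-line wrapper around the root bucket.
func (tx *Tx) Bucket(name []byte) *Bucket { return tx.root.Bucket(name) }

func (tx *Tx) CreateBucket(name []byte) error { return tx.root.CreateBucket(name) }

func (tx *Tx) CreateBucketIfNotExists(name []byte) error {
	return tx.root.CreateBucketIfNotExists(name)
}

func (tx *Tx) DeleteBucket(name []byte) error { return tx.root.DeleteBucket(name) }

func main() {
	tx := &Tx{root: newBucket()}
	if err := tx.CreateBucket([]byte("widgets")); err != nil {
		panic(err)
	}
	// Nested buckets are created through the Bucket type, not the Tx.
	if err := tx.Bucket([]byte("widgets")).CreateBucket([]byte("colors")); err != nil {
		panic(err)
	}
	fmt.Println(tx.Bucket([]byte("widgets")).Bucket([]byte("colors")) != nil) // true
}
```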

File Format Changes

Allowing buckets to be nested made the buckets page redundant. Now, instead of a specialized page type, there is a root bucket that the Tx uses to find and create top-level buckets. This root bucket is only accessible to the Tx, so it only stores buckets (i.e. no simple byte slice values).

Because of this change, older Bolt databases will not be compatible once this pull request is merged. Before merging it, I'll merge an import/export tool through a separate PR. That will allow databases to be exported to a generalized JSON format and imported into a new version of Bolt.

The data format version has been bumped to 2 with this PR. Opening a database that uses a previous data format version will return an ErrVersionMismatch error.
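The version check amounts to comparing the stored format version against the current one. A minimal sketch, assuming a hypothetical meta struct with a version field and a validate method; only the version number 2 and the ErrVersionMismatch name come from this PR, and the error definition here is a stand-in.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrVersionMismatch is named in the PR; this definition is a stand-in.
var ErrVersionMismatch = errors.New("version mismatch")

// version is the data format version introduced by this PR.
const version = 2

// meta is a hypothetical, simplified view of a database's metadata.
type meta struct {
	version uint32
}

// validate rejects any data format other than the current one.
func (m *meta) validate() error {
	if m.version != version {
		return ErrVersionMismatch
	}
	return nil
}

func main() {
	fmt.Println((&meta{version: 1}).validate()) // version mismatch
	fmt.Println((&meta{version: 2}).validate()) // <nil>
}
```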

Upgrading

To upgrade a Bolt database that uses the previous version (1), you'll need to install Bolt from the data/v1 tag:

$ cd $GOPATH/src/github.com/boltdb/bolt
$ git fetch origin
$ git checkout "data/v1"
$ go install ./cmd/bolt
$ bolt export path/to/db > dump.json

Then install Bolt from master:

$ git checkout master
$ go install ./cmd/bolt
$ bolt import --input dump.json path/to/newdb

Now you can swap out your old database:

$ mv path/to/db path/to/db.bak
$ mv path/to/newdb path/to/db

Other Notes

This pull request lacks a few optimizations that will be coming in future PRs:

  1. Inline buckets (Inline Buckets #124) - Currently, every bucket requires its own root page. Inlining buckets would allow small buckets (< pageSize) to be stored, along with their data, inside the same value that holds their bucket header.
  2. Allocations (Optimize nested buckets for read-only transactions #125) - Currently, accessing a sub-bucket requires allocating a Bucket. This is mostly to track nodes for each bucket, which shouldn't be required for read-only transactions. Tweaking this should make iteration over nested buckets faster.
  3. Cursor Reuse (Cursor reuse #126) - Someone iterating over two levels of buckets should only need to allocate two cursors and reuse them at each level. Currently, Bolt requires a new cursor after every change in a higher-level bucket.

Fixes #56.

/cc @tv42 @snormore @mkobetic

@mkobetic
Contributor

mkobetic commented Apr 9, 2014

Haven't looked at the code yet, but a thought prompted by your excellent writeup: it seems a bit too easy to delete "buckets of data" by accidentally overwriting the bucket key with a simple value. Maybe consider treating that as an error instead, i.e. you need to explicitly delete the bucket before you can reuse the key for a simple value. FWIW, it would be similar to directories and files that way.

@benbjohnson
Member Author

@mkobetic That's a good point. I'll change it to disallow overwrite by different value types.

@benbjohnson
Member Author

@mkobetic I changed Put() and Delete() to return ErrIncompatibleValue if a bucket already exists at the provided key. I also changed CreateBucket() and DeleteBucket() to return ErrIncompatibleValue if a non-bucket value already exists at the provided key.

I added a bunch of test coverage for everything and fixed some bugs today. I'm feeling pretty good about it now. Tomorrow I'll add nested buckets to the randomized tests, add an importer/exporter, and then merge the PR.

/cc @snormore

This commit adds the ability to create buckets inside of other buckets.
It also replaces the buckets page with a root bucket.

Fixes boltdb#56.
Conflicts:
	db_test.go
	tx_test.go
@benbjohnson
Member Author

I updated the PR comment to show how to upgrade existing Bolt databases to the "Version 2" format and I documented the new API.

I rewrote the randomized tests yesterday and found some deadlock issues, which are now fixed.

I also fixed #101 while I was in there.

I've gone through and reviewed the PR several times and it seems to be pretty solid.

benbjohnson added a commit that referenced this pull request Apr 11, 2014
benbjohnson merged commit 2c8020e into boltdb:master Apr 11, 2014
benbjohnson deleted the nested-keys branch April 11, 2014 21:11
@ericnp

ericnp commented Jul 13, 2015

I don't see in the docs or in the tests how to traverse the bucket trees (i.e. nested buckets). It seems that when a cursor or ForEach hits a sub-bucket key, it returns the value as nil. Is that the only case where you can get nil; i.e., if you see a nil value in that context, is it guaranteed to be a sub-bucket? And then how do you access the sub-bucket to traverse it? ~~~I guess it's actually stored at the root level of the db? Meaning all buckets actually share that one namespace and you have to worry about naming collisions among all subbuckets and top level buckets? Sorry if this is a stupid question. I'd be happy to work on embellishing the documentation with this stuff if you welcome contributions like that.~~~

Awesome software btw, thank you for open sourcing :)

@tv42
Contributor

tv42 commented Jul 13, 2015

http://godoc.org/github.com/boltdb/bolt#Cursor
"Cursors see nested buckets with value == nil."

http://godoc.org/github.com/boltdb/bolt#Bucket.Bucket
"Bucket retrieves a nested bucket by name."

@ericnp

ericnp commented Jul 14, 2015

Oh, right, Bucket.Bucket. Thanks.

Development

Successfully merging this pull request may close these issues.

Nested Keys
4 participants