moss

moss provides a simple, fast, persistable, ordered key-val collection implementation as a 100% golang library.

moss stands for "memory-oriented sorted segments".


Features

  • ordered key-val collection API
  • 100% go implementation
  • key range iterators
  • snapshots provide for isolated reads
  • atomic mutations via a batch API
  • merge operations allow for read-compute-write optimizations for write-heavy use cases (e.g., updating counters); see the merge-operator sketch after this list
  • concurrent readers and writers don't block each other
  • child collections allow multiple related collections to be atomically grouped
  • optional, advanced APIs to avoid extra memory copying
  • optional lower-level storage implementation, called "mossStore", that uses an append-only design for writes and mmap() for reads, with configurable compaction policy; see: OpenStoreCollection()
  • mossStore supports navigating back through previous commit points in read-only fashion, and also reverting to a previous commit point.
  • optional persistence hooks to allow write-back caching to a lower-level storage implementation that advanced users may wish to provide (e.g., you can hook moss up to leveldb, sqlite, etc)
  • event callbacks allow the monitoring of asynchronous tasks
  • unit tests
  • fuzz tests via go-fuzz & smat (github.com/mschoch/smat); see README-smat.md
  • moss store's diagnostic tool: mossScope
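
As an illustration of the merge-operations feature above, here is a minimal sketch of a counter-style MergeOperator wired into CollectionOptions. The counterMerge type and its decimal string encoding are this example's own choices, not part of moss:

import (
    "strconv"

    "github.com/couchbase/moss"
)

// counterMerge adds decimal-encoded integer operands together; the
// decimal encoding is this example's own choice, not something moss requires.
type counterMerge struct{}

func (counterMerge) Name() string { return "counterMerge" }

func (counterMerge) FullMerge(key, existingValue []byte,
    operands [][]byte) ([]byte, bool) {
    total, _ := strconv.Atoi(string(existingValue)) // missing value => 0.
    for _, op := range operands {
        n, _ := strconv.Atoi(string(op))
        total += n
    }
    return []byte(strconv.Itoa(total)), true
}

func (counterMerge) PartialMerge(key, left, right []byte) ([]byte, bool) {
    l, _ := strconv.Atoi(string(left))
    r, _ := strconv.Atoi(string(right))
    return []byte(strconv.Itoa(l + r)), true
}

c, err := moss.NewCollection(moss.CollectionOptions{
    MergeOperator: counterMerge{},
})
c.Start()
defer c.Close()

batch, err := c.NewBatch(0, 0)
defer batch.Close()

batch.Merge([]byte("visits"), []byte("1")) // read-compute-write folded into one op.

err = c.ExecuteBatch(batch, moss.WriteOptions{})

A later Get of "visits" then returns the fully merged total.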

License

Apache 2.0

Example

import "github.com/couchbase/moss"

c, err := moss.NewCollection(moss.CollectionOptions{})
c.Start()
defer c.Close()

batch, err := c.NewBatch(0, 0)
defer batch.Close()

batch.Set([]byte("car-0"), []byte("tesla"))
batch.Set([]byte("car-1"), []byte("honda"))

err = c.ExecuteBatch(batch, moss.WriteOptions{})

ss, err := c.Snapshot()
defer ss.Close()

ropts := moss.ReadOptions{}

val0, err := ss.Get([]byte("car-0"), ropts) // val0 == []byte("tesla").
valX, err := ss.Get([]byte("car-not-there"), ropts) // valX == nil.

// A Get can also be issued directly against the collection
val1, err := c.Get([]byte("car-1"), ropts) // val1 == []byte("honda").
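
Key-range iteration works off a Snapshot; a minimal sketch (nil start and end keys mean an unbounded range, and error handling is elided):

iter, err := ss.StartIterator(nil, nil, moss.IteratorOptions{})
defer iter.Close()

for {
    k, v, itErr := iter.Current()
    if itErr != nil {
        break // moss.ErrIteratorDone marks the end of the range.
    }
    fmt.Printf("%s => %s\n", k, v) // e.g., "car-0 => tesla".
    if iter.Next() != nil {
        break
    }
}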

For persistence, you can use...

store, collection, err := moss.OpenStoreCollection(directoryPath,
    moss.StoreOptions{}, moss.StorePersistOptions{})
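
When you are finished, close both of the returned handles; the usual pattern (a sketch) is to close the collection before the store:

collection.Close()
store.Close()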

Design

The design is similar to a (much) simplified LSM tree, with a stack of sorted, immutable key-val arrays or "segments".

To incorporate the next Batch of key-val mutations, the incoming key-val entries are first sorted into an immutable "segment", which is then atomically pushed onto the top of the stack of segments.

For readers, a higher segment in the stack will shadow entries of the same key from lower segments.

Separately, an asynchronous goroutine (the "merger") will continuously merge N sorted segments to keep stack height low.

In the best case, a remaining, single, large sorted segment will be efficient in memory usage and efficient for binary search and range iteration.

Iterations when the stack height is > 1 are implemented using an N-way heap merge.
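
As a rough illustration of that idea, here is a generic sketch of an N-way merge over sorted segments, driven by a heap of per-segment cursors. This is not moss's actual iterator code, and for brevity it ignores the shadowing of duplicate keys described above:

import (
    "bytes"
    "container/heap"
)

// cursor tracks the next unread position within one sorted segment's keys.
type cursor struct {
    keys [][]byte // one segment's keys, in ascending order.
    pos  int
}

// cursorHeap orders cursors by their current (smallest unread) key.
type cursorHeap []*cursor

func (h cursorHeap) Len() int { return len(h) }
func (h cursorHeap) Less(i, j int) bool {
    return bytes.Compare(h[i].keys[h[i].pos], h[j].keys[h[j].pos]) < 0
}
func (h cursorHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x interface{}) { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() interface{} {
    old := *h
    c := old[len(old)-1]
    *h = old[:len(old)-1]
    return c
}

// mergeSegments returns the keys of all segments in ascending order.
func mergeSegments(segments [][][]byte) [][]byte {
    h := &cursorHeap{}
    for _, keys := range segments {
        if len(keys) > 0 {
            *h = append(*h, &cursor{keys: keys})
        }
    }
    heap.Init(h)

    var out [][]byte
    for h.Len() > 0 {
        c := (*h)[0] // the cursor with the smallest current key.
        out = append(out, c.keys[c.pos])
        c.pos++
        if c.pos < len(c.keys) {
            heap.Fix(h, 0) // the cursor's next key may belong elsewhere.
        } else {
            heap.Pop(h) // this segment is exhausted.
        }
    }
    return out
}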

In this design, the stack of segments is treated as immutable via a copy-on-write approach whenever the stack needs to be "modified". So, multiple readers and writers won't block each other, and taking a Snapshot is similarly cheap, since it just clones the stack.

See also the DESIGN.md writeup.

Limitations and considerations

NOTE: Keys in a Batch must be unique. That is, myBatch.Set("x", "foo"); myBatch.Set("x", "bar") is not supported. Applications that do not naturally meet this requirement might maintain their own map[key]val data structures to ensure this uniqueness constraint.
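
For example, one simple approach (a sketch, not a moss API), given a collection c as in the example above, is to stage mutations in a map and build the batch from it, so that the last write for each key wins:

pending := map[string][]byte{} // staging area; later Sets overwrite earlier ones.
pending["x"] = []byte("foo")
pending["x"] = []byte("bar") // "bar" replaces "foo" before batching.

batch, err := c.NewBatch(0, 0)
defer batch.Close()

for k, v := range pending {
    batch.Set([]byte(k), v) // each key now appears at most once in the batch.
}

err = c.ExecuteBatch(batch, moss.WriteOptions{})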

Max key length is 2^24 bytes (24 bits are used to track key length).

Max val length is 2^28 bytes (28 bits are used to track val length).

Metadata overhead for each key-val operation is 16 bytes.

Read performance characterization is roughly O(log N) for key-val retrieval.

Write performance characterization is roughly O(M log M), where M is the number of mutations in a batch when invoking ExecuteBatch().

Those performance characterizations, however, don't account for background, asynchronous processing for the merging of segments and data structure maintenance.

For example, a background merger that is too slow can eventually stall the ingest of new batches. (See the CollectionOptions settings that limit segment stack height.)

As another example, a slow reader that holds onto a Snapshot or an Iterator for a long time can tie up a lot of resources. In the worst case, the reader's Snapshot or Iterator may delay the reclamation of large, old segments whose entries have already been obsoleted by incoming mutations.

Error handling

Please note that moss's background goroutines may run into errors, for example during optional persistence operations. To be notified of these cases, your application can provide an optional CollectionOptions.OnError callback func (highly recommended), which moss will invoke when such an error occurs.
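
For example (a minimal sketch; the logging is this example's own choice):

c, err := moss.NewCollection(moss.CollectionOptions{
    OnError: func(err error) {
        // Called from moss's background goroutines, e.g. on a persistence error.
        log.Printf("moss background error: %v", err)
    },
})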

Logging

Please see the optional CollectionOptions.Log callback func and the CollectionOptions.Debug flag.
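
For example, a minimal sketch that routes moss's diagnostics to the standard library logger (here, Debug is treated as a verbosity level, with higher values producing more output when Log is non-nil):

c, err := moss.NewCollection(moss.CollectionOptions{
    Debug: 2, // higher values emit more diagnostic detail.
    Log: func(format string, a ...interface{}) {
        log.Printf(format, a...)
    },
})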

Performance

Please try go test -bench=. for some basic performance tests.

Each performance test will emit output that generally looks like...

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
spec: {numItems:1000000 keySize:20 valSize:100 batchSize:100 randomLoad:false noCopyValue:false accesses:[]}
     open || time:     0 (ms) |        0 wop/s |        0 wkb/s |        0 rop/s |        0 rkb/s || cumulative:        0 wop/s |        0 wkb/s |        0 rop/s |        0 rkb/s
     load || time:   840 (ms) |  1190476 wop/s |   139508 wkb/s |        0 rop/s |        0 rkb/s || cumulative:  1190476 wop/s |   139508 wkb/s |        0 rop/s |        0 rkb/s
    drain || time:   609 (ms) |        0 wop/s |        0 wkb/s |        0 rop/s |        0 rkb/s || cumulative:   690131 wop/s |    80874 wkb/s |        0 rop/s |        0 rkb/s
    close || time:     0 (ms) |        0 wop/s |        0 wkb/s |        0 rop/s |        0 rkb/s || cumulative:   690131 wop/s |    80874 wkb/s |        0 rop/s |        0 rkb/s
   reopen || time:     0 (ms) |        0 wop/s |        0 wkb/s |        0 rop/s |        0 rkb/s || cumulative:   690131 wop/s |    80874 wkb/s |        0 rop/s |        0 rkb/s
     iter || time:    81 (ms) |        0 wop/s |        0 wkb/s | 12344456 rop/s |  1446616 rkb/s || cumulative:   690131 wop/s |    80874 wkb/s | 12344456 rop/s |  1446616 rkb/s
    close || time:     2 (ms) |        0 wop/s |        0 wkb/s |        0 rop/s |        0 rkb/s || cumulative:   690131 wop/s |    80874 wkb/s | 12344456 rop/s |  1446616 rkb/s
total time: 1532 (ms)
file size: 135 (MB), amplification: 1.133
BenchmarkStore_numItems1M_keySize20_valSize100_batchSize100-8

There are various phases in each test...

  • open - opening a brand new moss storage instance
  • load - time to load N sequential keys
  • drain - additional time after load for persistence to complete
  • close - time to close the moss storage instance
  • reopen - time to reopen the moss storage instance (OS/filesystem caches are still warm)
  • iter - time to sequentially iterate through key-val items
  • access - time to perform various access patterns, like random or sequential reads and writes

The file size measurement is after final compaction, with amplification as a naive calculation to compare overhead against raw key-val size.

Contributing changes

Please see the CONTRIBUTING.md document.