Roaring bitmaps in Go (golang)
Go Other
Latest commit 943a2da Dec 23, 2016 @lemire lemire committed on GitHub correcting link to C version
Permalink
Failed to load latest commit information.
testdata add readl-roaring-datasets as git submodule Dec 21, 2016
.gitignore ignore test generated files Dec 21, 2016
.gitmodules add readl-roaring-datasets as git submodule Dec 21, 2016
.travis.yml Extending cross-compilation. Dec 23, 2016
AUTHORS Standardizing copyright statements. Jun 8, 2016
CONTRIBUTORS Adding @0x4139 to the contributors Dec 21, 2016
LICENSE Standardizing copyright statements. Jun 8, 2016
LICENSE-2.0.txt Standardizing copyright statements. Jun 8, 2016
Makefile add fetch-real-roaring-datasets to Makefile Dec 21, 2016
README.md correcting link to C version Dec 24, 2016
arraycontainer.go atg. decommission numberOfTrailingZeros in favor of countTrailingZero… Dec 21, 2016
arraycontainer_gen.go runContainer16/32 and the associated Dec 16, 2016
arraycontainer_gen_test.go runContainer16/32 and the associated Dec 16, 2016
arraycontainer_test.go Ran gofmt -s on source code. Dec 21, 2016
benchmark_test.go runContainer16/32 and the associated Dec 16, 2016
bitmapcontainer.go atg. update to use faster deBruijn method; also fix bug in deBruijn impl Dec 21, 2016
bitmapcontainer_gen.go runContainer16/32 and the associated Dec 16, 2016
bitmapcontainer_gen_test.go runContainer16/32 and the associated Dec 16, 2016
bitmapcontainer_test.go Ran gofmt -s on source code. Dec 21, 2016
container_test.go atg. update to use faster deBruijn method; also fix bug in deBruijn impl Dec 21, 2016
ctz.go atg. elminate ctz asm. fixes #89 Dec 23, 2016
ctz_test.go atg. elminate ctz asm. fixes #89 Dec 23, 2016
example_roaring_test.go go vet and golint cleanup Dec 17, 2016
fastaggregation.go runContainer16/32 and the associated Dec 16, 2016
fastaggregation_test.go Adding a few more tests. Dec 21, 2016
popcnt.go runContainer16/32 and the associated Dec 16, 2016
popcnt_amd64.s Cleaning popcount code. Feb 11, 2016
popcnt_asm.go Renaming. Feb 12, 2016
popcnt_generic.go Fixing a typo in the readme and moving todos in a separate file. Aug 25, 2014
popcnt_test.go Adding a few tests to improve coverage. Apr 9, 2016
priorityqueue.go Removing dead code Dec 21, 2016
rle.go Ran gofmt -s on source code. Dec 21, 2016
rle16.go Ran gofmt -s on source code. Dec 21, 2016
rle16_gen.go private runIterators, msgpack docs++ Dec 17, 2016
rle16_gen_test.go private runIterators, msgpack docs++ Dec 17, 2016
rle16_test.go Ran gofmt -s on source code. Dec 21, 2016
rle_gen.go private runIterators, msgpack docs++ Dec 17, 2016
rle_gen_test.go private runIterators, msgpack docs++ Dec 17, 2016
rle_test.go Ran gofmt -s on source code. Dec 21, 2016
rlecommon.go atg. improve rlecommon test coverage Dec 17, 2016
rlei.go Ran gofmt -s on source code. Dec 21, 2016
rlei_test.go Ran gofmt -s on source code. Dec 21, 2016
roaring.go More simplification and modularity for serialization. Dec 22, 2016
roaring_test.go Simplifying readFrom Dec 22, 2016
roaringarray.go Simplifying readFrom Dec 22, 2016
roaringarray_gen.go runContainer16/32 and the associated Dec 16, 2016
roaringarray_gen_test.go runContainer16/32 and the associated Dec 16, 2016
roaringcow_test.go atg. rebase and fix serialization size checks Dec 21, 2016
serialization.go provide benchmarks for the countTrailingZero Dec 21, 2016
serialization_generic.go atg. elminate ctz asm. fixes #89 Dec 23, 2016
serialization_littleendian.go provide benchmarks for the countTrailingZero Dec 21, 2016
serialization_test.go Ran gofmt -s on source code. Dec 21, 2016
setutil.go atg. decommission numberOfTrailingZeros in favor of countTrailingZero… Dec 21, 2016
setutil_test.go Fix bug in localintersection2by2 Mar 20, 2015
shortiterator.go cleanup more lint issues Aug 6, 2014
smat.go 1. Adding RunOptimize function, Dec 16, 2016
smat_generate_test.go fuzz testing via the smat library Sep 19, 2016
smat_hits_test.go fuzz testing via the smat library Sep 19, 2016
util.go provide benchmarks for the countTrailingZero Dec 21, 2016

README.md

roaring Build Status Coverage Status GoDoc Go Report Card

This is a go port of the Roaring bitmap data structure.

Roaring is used by Apache Spark (https://spark.apache.org/), Apache Kylin (http://kylin.io), Druid.io (http://druid.io/), Whoosh (https://pypi.python.org/pypi/Whoosh/) and Apache Lucene (http://lucene.apache.org/) (as well as supporting systems such as Solr and Elastic).

The original java version can be found at https://github.com/RoaringBitmap/RoaringBitmap and there is a C/C++ version at https://github.com/RoaringBitmap/CRoaring

The Java, C, C++ and Go version are binary compatible: e.g, you can save bitmaps from a Java program and load them back in Go, and vice versa. The format specification can be found at https://github.com/RoaringBitmap/RoaringFormatSpec

This code is licensed under Apache License, Version 2.0 (ASL2.0).

Copyright 2016 by the authors.

References

Dependencies

Dependencies are fetched automatically by giving the -t flag to go get.

they include

  • github.com/smartystreets/goconvey/convey
  • github.com/willf/bitset
  • github.com/mschoch/smat

Note that the smat library requires Go 1.6 or better.

Installation

  • go get -t github.com/RoaringBitmap/roaring

Example

Here is a simplified but complete example:

package main

import (
    "fmt"
    "github.com/RoaringBitmap/roaring"
    "bytes"
)


func main() {
    // example inspired by https://github.com/fzandona/goroar
    fmt.Println("==roaring==")
    rb1 := roaring.BitmapOf(1, 2, 3, 4, 5, 100, 1000)
    fmt.Println(rb1.String())

    rb2 := roaring.BitmapOf(3, 4, 1000)
    fmt.Println(rb2.String())

    rb3 := roaring.NewBitmap()
    fmt.Println(rb3.String())

    fmt.Println("Cardinality: ", rb1.GetCardinality())

    fmt.Println("Contains 3? ", rb1.Contains(3))

    rb1.And(rb2)

    rb3.Add(1)
    rb3.Add(5)

    rb3.Or(rb1)

    // prints 1, 3, 4, 5, 1000
    i := rb3.Iterator()
    for i.HasNext() {
        fmt.Println(i.Next())
    }
    fmt.Println()

    // next we include an example of serialization
    buf := new(bytes.Buffer)
    rb1.WriteTo(buf) // we omit error handling
    newrb:= roaring.NewBitmap()
    newrb.ReadFrom(buf)
    if rb1.Equals(newrb) {
        fmt.Println("I wrote the content to a byte stream and read it back.")
    }
}

If you wish to use serialization and handle errors, you might want to consider the following sample of code:

    rb := BitmapOf(1, 2, 3, 4, 5, 100, 1000)
    buf := new(bytes.Buffer)
    size,err:=rb.WriteTo(buf)
    if err != nil {
        t.Errorf("Failed writing")
    }
    newrb:= NewBitmap()
    size,err=newrb.ReadFrom(buf)
    if err != nil {
        t.Errorf("Failed reading")
    }
    if ! rb.Equals(newrb) {
        t.Errorf("Cannot retrieve serialized version")
    }

Given N integers in [0,x), then the serialized size in bytes of a Roaring bitmap should never exceed this bound:

8 + 9 * ((long)x+65535)/65536 + 2 * N

That is, given a fixed overhead for the universe size (x), Roaring bitmaps never use more than 2 bytes per integer. You can call BoundSerializedSizeInBytes for a more precise estimate.

Documentation

Current documentation is available at http://godoc.org/github.com/RoaringBitmap/roaring

Goroutine safety

In general, it should not generally be considered safe to access the same bitmaps using different goroutines--they are left unsynchronized for performance. Should you want to access a Bitmap from more than one goroutine, you should provide synchronization. Typically this is done by using channels to pass the *Bitmap around (in Go style; so there is only ever one owner), or by using sync.Mutex to serialize operations on Bitmaps.

Coverage

We test our software. For a report on our test coverage, see

https://coveralls.io/github/RoaringBitmap/roaring?branch=master

Benchmark

Type

     go test -bench Benchmark -run -

Iterative use

You can use roaring with gore:

  • go get -u github.com/motemen/gore
  • Make sure that $GOPATH/bin is in your $PATH.
  • go get github/RoaringBitmap/roaring
$ gore
gore version 0.2.6  :help for help
gore> :import github.com/RoaringBitmap/roaring
gore> x:=roaring.New()
gore> x.Add(1)
gore> x.String()
"{1}"

Fuzzy testing

You can help us test further the library with fuzzy testing:

     go get github.com/dvyukov/go-fuzz/go-fuzz
     go get github.com/dvyukov/go-fuzz/go-fuzz-build
     go test -tags=gofuzz -run=TestGenerateSmatCorpus
     go-fuzz-build github.com/RoaringBitmap/roaring
     go-fuzz -bin=./roaring-fuzz.zip -workdir=workdir/ -timeout=200

Let it run, and if the # of crashers is > 0, check out the reports in the workdir where you should be able to find the panic goroutine stack traces.

Alternative in Go

There is a Go version wrapping the C/C++ implementation https://github.com/RoaringBitmap/gocroaring

For an alternative implementation in Go, see https://github.com/fzandona/goroar The two versions were written independently.

Mailing list/discussion group

https://groups.google.com/forum/#!forum/roaring-bitmaps