roaring

This is a Go version of the Roaring bitmap data structure.

Roaring bitmaps are used by several major systems such as Apache Lucene and derivative systems such as Solr and Elasticsearch, Metamarkets' Druid, LinkedIn Pinot, Netflix Atlas, Apache Spark, OpenSearchServer, Cloud Torrent, Whoosh, Pilosa, Microsoft Visual Studio Team Services (VSTS), and eBay's Apache Kylin.

Roaring bitmaps are found to work well in many important applications:

Use Roaring for bitmap compression whenever possible. Do not use other bitmap compression methods (Wang et al., SIGMOD 2017)

The roaring Go library is used by

This library is used in production in several systems and is part of the Awesome Go collection.

There are also Java and C/C++ versions. The Java, C, C++ and Go versions are binary compatible: e.g., you can save bitmaps from a Java program and load them back in Go, and vice versa. We have a format specification.

This code is licensed under Apache License, Version 2.0 (ASL2.0).

Copyright 2016-... by the authors.

References

  • Daniel Lemire, Owen Kaser, Nathan Kurz, Luca Deri, Chris O'Hara, François Saint-Jacques, Gregory Ssi-Yan-Kai, Roaring Bitmaps: Implementation of an Optimized Software Library, Software: Practice and Experience 48 (4), 2018 arXiv:1709.07821
  • Samy Chambi, Daniel Lemire, Owen Kaser, Robert Godin, Better bitmap performance with Roaring bitmaps, Software: Practice and Experience 46 (5), 2016. http://arxiv.org/abs/1402.6407 This paper used data from http://lemire.me/data/realroaring2014.html
  • Daniel Lemire, Gregory Ssi-Yan-Kai, Owen Kaser, Consistently faster and smaller compressed bitmaps with Roaring, Software: Practice and Experience 46 (11), 2016. http://arxiv.org/abs/1603.06549

Dependencies

Dependencies are fetched automatically by giving the -t flag to go get.

They include:

  • github.com/smartystreets/goconvey/convey
  • github.com/willf/bitset
  • github.com/mschoch/smat
  • github.com/glycerine/go-unsnap-stream
  • github.com/philhofer/fwd
  • github.com/jtolds/gls

Note that the smat library requires Go 1.6 or later.

Installation

  • go get -t github.com/RoaringBitmap/roaring

Example

Here is a simplified but complete example:

package main

import (
    "bytes"
    "fmt"

    "github.com/RoaringBitmap/roaring"
)


func main() {
    // example inspired by https://github.com/fzandona/goroar
    fmt.Println("==roaring==")
    rb1 := roaring.BitmapOf(1, 2, 3, 4, 5, 100, 1000)
    fmt.Println(rb1.String())

    rb2 := roaring.BitmapOf(3, 4, 1000)
    fmt.Println(rb2.String())

    rb3 := roaring.New()
    fmt.Println(rb3.String())

    fmt.Println("Cardinality: ", rb1.GetCardinality())

    fmt.Println("Contains 3? ", rb1.Contains(3))

    rb1.And(rb2)

    rb3.Add(1)
    rb3.Add(5)

    rb3.Or(rb1)

    // computes the union of the three bitmaps in parallel using 4 workers
    // (the result is returned as a new bitmap and is discarded here)
    roaring.ParOr(4, rb1, rb2, rb3)
    // computes the intersection of the three bitmaps in parallel using 4 workers
    roaring.ParAnd(4, rb1, rb2, rb3)


    // prints 1, 3, 4, 5, 1000
    i := rb3.Iterator()
    for i.HasNext() {
        fmt.Println(i.Next())
    }
    fmt.Println()

    // next we include an example of serialization
    buf := new(bytes.Buffer)
    rb1.WriteTo(buf) // we omit error handling
    newrb := roaring.New()
    newrb.ReadFrom(buf)
    if rb1.Equals(newrb) {
        fmt.Println("I wrote the content to a byte stream and read it back.")
    }
    // you can iterate over bitmaps using ReverseIterator(), Iterator(), ManyIterator()
}

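If you wish to use serialization and handle errors, you might want to consider the following sample of code. It is written as it would appear inside a test in the roaring package, hence the t.Errorf calls and the unqualified BitmapOf and New: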

	rb := BitmapOf(1, 2, 3, 4, 5, 100, 1000)
	buf := new(bytes.Buffer)
	// WriteTo also returns the number of bytes written; it is ignored here
	_, err := rb.WriteTo(buf)
	if err != nil {
		t.Errorf("Failed writing")
	}
	newrb := New()
	// ReadFrom also returns the number of bytes read; it is ignored here
	_, err = newrb.ReadFrom(buf)
	if err != nil {
		t.Errorf("Failed reading")
	}
	if !rb.Equals(newrb) {
		t.Errorf("Cannot retrieve serialized version")
	}

Given N integers in [0,x), the serialized size in bytes of a Roaring bitmap should never exceed this bound:

8 + 9 * ((long)x+65535)/65536 + 2 * N

That is, given a fixed overhead for the universe size (x), Roaring bitmaps never use more than 2 bytes per integer. You can call BoundSerializedSizeInBytes for a more precise estimate.
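As a rough illustration, the bound above can be evaluated with a small helper. This is only a sketch: upperBoundBytes is a hypothetical name, not part of the roaring API, and it assumes the expression is evaluated with integer arithmetic from left to right, as Java or C would evaluate it as written.

```go
package main

import "fmt"

// upperBoundBytes is a hypothetical helper (not part of the roaring API).
// It evaluates 8 + 9 * (x+65535)/65536 + 2 * N with unsigned integer
// arithmetic, left to right, for n integers drawn from [0, x).
func upperBoundBytes(n, x uint64) uint64 {
	return 8 + 9*(x+65535)/65536 + 2*n
}

func main() {
	// Seven integers in [0, 1001), e.g. {1, 2, 3, 4, 5, 100, 1000}.
	fmt.Println(upperBoundBytes(7, 1001)) // prints 31
}
```

For a tighter figure on a real bitmap, the library's own BoundSerializedSizeInBytes mentioned above is the better choice; the helper here only mirrors the closed-form expression.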

Documentation

Current documentation is available at http://godoc.org/github.com/RoaringBitmap/roaring

Goroutine safety

In general, it should not be considered safe to access the same bitmap from different goroutines: bitmaps are left unsynchronized for performance. Should you want to access a Bitmap from more than one goroutine, you should provide synchronization. Typically this is done by using channels to pass the *Bitmap around (in Go style, so that there is only ever one owner), or by using sync.Mutex to serialize operations on Bitmaps.
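The sync.Mutex approach might look like the following minimal sketch. To keep it runnable with the standard library alone, it uses a small map-based stand-in (intSet) where a real program would hold a *roaring.Bitmap; intSet, safeSet, newSafeSet and fillConcurrently are hypothetical names chosen for illustration, not part of the roaring API.

```go
package main

import (
	"fmt"
	"sync"
)

// intSet is a stand-in for *roaring.Bitmap so that this sketch has no
// external dependency. Like a Bitmap, it is not safe for concurrent use.
type intSet map[uint32]struct{}

func (s intSet) Add(x uint32) { s[x] = struct{}{} }

// safeSet (hypothetical) serializes every operation on the wrapped set
// with a mutex, so that several goroutines may share it.
type safeSet struct {
	mu  sync.Mutex
	set intSet
}

func newSafeSet() *safeSet { return &safeSet{set: intSet{}} }

func (s *safeSet) Add(x uint32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.set.Add(x)
}

func (s *safeSet) Cardinality() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.set)
}

// fillConcurrently adds the values 0..n-1 using the given number of
// goroutines; worker w handles w, w+workers, w+2*workers, and so on.
func fillConcurrently(s *safeSet, workers int, n uint32) {
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(start uint32) {
			defer wg.Done()
			for x := start; x < n; x += uint32(workers) {
				s.Add(x)
			}
		}(uint32(w))
	}
	wg.Wait()
}

func main() {
	s := newSafeSet()
	fillConcurrently(s, 4, 1000)
	fmt.Println(s.Cardinality()) // prints 1000
}
```

The same wrapper shape should carry over to a real Bitmap by replacing the stand-in field with a *roaring.Bitmap and locking around each call. The channel-based alternative avoids the lock entirely by making sure only one goroutine owns the bitmap at any time.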

Coverage

We test our software. For a report on our test coverage, see

https://coveralls.io/github/RoaringBitmap/roaring?branch=master

Benchmark

Type

     go test -bench Benchmark -run -

To run benchmarks on the Real Roaring Datasets, run the following:

go get github.com/RoaringBitmap/real-roaring-datasets
BENCH_REAL_DATA=1 go test -bench BenchmarkRealData -run -

Interactive use

You can use roaring with gore:

  • go get -u github.com/motemen/gore
  • Make sure that $GOPATH/bin is in your $PATH.
  • go get github.com/RoaringBitmap/roaring

$ gore
gore version 0.2.6  :help for help
gore> :import github.com/RoaringBitmap/roaring
gore> x:=roaring.New()
gore> x.Add(1)
gore> x.String()
"{1}"

Fuzz testing

You can help us test the library further with fuzz testing:

     go get github.com/dvyukov/go-fuzz/go-fuzz
     go get github.com/dvyukov/go-fuzz/go-fuzz-build
     go test -tags=gofuzz -run=TestGenerateSmatCorpus
     go-fuzz-build github.com/RoaringBitmap/roaring
     go-fuzz -bin=./roaring-fuzz.zip -workdir=workdir/ -timeout=200

Let it run, and if the number of crashers is greater than 0, check the reports in the workdir, where you should find the goroutine stack traces of the panics.

Alternative in Go

There is a Go version wrapping the C/C++ implementation: https://github.com/RoaringBitmap/gocroaring

For an alternative implementation in Go, see https://github.com/fzandona/goroar (the two versions were written independently).

Mailing list/discussion group

https://groups.google.com/forum/#!forum/roaring-bitmaps