ruby-gcs

This is a small prerelease Ruby library for generating and querying Golomb Compressed Set databases, as produced by gcstool.

Golomb Compressed Sets are similar to Bloom filters - they're space-efficient data structures that let you test whether a given element is a member of a set.

Like Bloom filters, they have a controllable false-positive rate (they may report an element as a member of the set even if it was never added), while having no false negatives. If a GCS says it hasn't seen something, it's definitely not on the list.
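
These guarantees follow from how the set is built. Every item is hashed into a range of n * P values, where P is the inverse false-positive rate, and only the hashes are kept. A query counts as present if its hash is among them. An added item is therefore always found. An unseen one is found only if it collides with a stored hash, which happens with probability of at most about 1 in P. The minimal, uncompressed sketch below assumes a truncated SHA-1 as the hash. `hash_to_range` and `build_hashed_set` are hypothetical names, not part of ruby-gcs.

```ruby
require "digest/sha1"
require "set"

# Hypothetical helper (an assumption, not ruby-gcs's actual hash): map an
# item to an integer in [0, range) using the first 8 bytes of its SHA-1.
def hash_to_range(item, range)
  Digest::SHA1.digest(item).unpack1("Q>") % range
end

# Keep only the hashed values. With n items spread over n * inverse_fp
# possible values, an unseen item matches a stored value with a
# probability of at most about 1 / inverse_fp.
def build_hashed_set(items, inverse_fp)
  range = items.size * inverse_fp
  [items.map { |item| hash_to_range(item, range) }.to_set, range]
end

members = (1..1_000).map { |i| "member-#{i}" }
set, range = build_hashed_set(members, 100)

# No false negatives: an added item always hashes to a stored value.
missed = members.count { |m| !set.include?(hash_to_range(m, range)) }

# False positives: unseen items whose hash happens to be stored.
probes = (1..10_000).map { |i| "stranger-#{i}" }
false_hits = probes.count { |s| set.include?(hash_to_range(s, range)) }

puts "missed members: #{missed}"
puts "false-positive rate: #{false_hits.fdiv(probes.size)} (aiming for about 0.01)"
```

A GCS keeps this same sorted list of hashes but compresses it. That compression step is where the Golomb coding comes in.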

Their main benefit over Bloom filters is being somewhat more compact, particularly with larger lists and lower false-positive rates.

Usage

ruby-gcs comes with two small command-line utilities to create and query GCS databases.

bin/create

% wc -l /usr/share/dict/words
  235924 words
% gzip --stdout -9 /usr/share/dict/words >words.gz && du -Ah words.gz
  737K    words.gz
% bin/create 10000000 /usr/share/dict/words words-p10m.gcs
% du -Ah words-p10m.gcs
  741K    words-p10m.gcs

So the result is about the same size as the gzip -9 output, at the cost of roughly 1 in every 10 million queries for words not in the dictionary being reported as "found".
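
Building the set works roughly like this. Each item is hashed into the range 0 to n * P, where P is the false-positive denominator (10000000 above). The hashes are sorted, and the gaps between consecutive values are Golomb-coded. The sketch below uses the Rice variant of Golomb coding, in which the divisor is a power of two. To stay readable, it builds a string of "0"/"1" characters instead of packed bytes. The SHA-1-based hash and the `build_gcs` and `rice_encode` names are assumptions for illustration. This is not the file format bin/create writes.

```ruby
require "digest/sha1"

# Hypothetical hash (an assumption): first 8 bytes of SHA-1, reduced into
# [0, range). gcstool's real hashing scheme may differ.
def hash_to_range(item, range)
  Digest::SHA1.digest(item).unpack1("Q>") % range
end

# Rice-code one gap with divisor 2**k: the quotient in unary (q "1"s then
# a "0"), followed by the remainder as exactly k binary digits.
def rice_encode(gap, k, bits)
  q, r = gap.divmod(1 << k)
  bits << ("1" * q) << "0"
  bits << r.to_s(2).rjust(k, "0") if k > 0
  bits
end

# Hash, sort, drop duplicate hashes, then Rice-code the gaps. The gaps
# average about inverse_fp, so k = floor(log2(inverse_fp)) is a good fit.
def build_gcs(items, inverse_fp)
  range = items.size * inverse_fp
  k = Math.log2(inverse_fp).floor
  values = items.map { |item| hash_to_range(item, range) }.sort.uniq
  bits = +""
  previous = 0
  values.each do |value|
    rice_encode(value - previous, k, bits)
    previous = value
  end
  { bits: bits, k: k, range: range, count: values.size }
end

words = (1..10_000).map { |i| "word-#{i}" }
gcs = build_gcs(words, 1024)
bits_per_item = gcs[:bits].size.fdiv(gcs[:count])
puts "#{gcs[:count]} distinct hashes, #{bits_per_item.round(2)} bits each"
```

With k = 10, the result should land at around 11.6 bits per item. That breaks down into 10 remainder bits, one terminating zero, and on average a little over half a unary digit.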

bin/query

% bin/query words-p10m.gcs
abiogenesis
Found in 0.77ms
llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch
Not found in 1.73ms

Refer to bin/create and bin/query to see how the library's API is used.
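
Querying reverses the build. First hash the query into the same range. Then decode gaps from the start, keeping a running total, until the total reaches or passes the target. The sketch below is self-contained: it includes a compact builder (assumed SHA-1 hash, Rice-coded gaps, bits stored as a "0"/"1" string) and a hypothetical `gcs_include?`. Neither is ruby-gcs's actual API.

```ruby
require "digest/sha1"

# Hypothetical hash (an assumption): first 8 bytes of SHA-1, reduced into
# [0, range).
def hash_to_range(item, range)
  Digest::SHA1.digest(item).unpack1("Q>") % range
end

# Compact builder: sorted, de-duplicated hashes with Rice-coded gaps
# (unary quotient, then a k-bit remainder), stored as a "0"/"1" string.
def build_gcs(items, inverse_fp)
  range = items.size * inverse_fp
  k = Math.log2(inverse_fp).floor
  bits = +""
  previous = 0
  items.map { |item| hash_to_range(item, range) }.sort.uniq.each do |value|
    q, r = (value - previous).divmod(1 << k)
    bits << ("1" * q) << "0"
    bits << r.to_s(2).rjust(k, "0") if k > 0
    previous = value
  end
  { bits: bits, k: k, range: range }
end

# Lookup: decode gaps in order, accumulating the current value, and stop
# as soon as it equals the target (found) or exceeds it (not found).
def gcs_include?(gcs, item)
  target = hash_to_range(item, gcs[:range])
  bits = gcs[:bits]
  k = gcs[:k]
  pos = 0
  value = 0
  while pos < bits.size
    q = 0
    while bits[pos] == "1" # unary quotient
      q += 1
      pos += 1
    end
    pos += 1 # skip the terminating "0"
    r = k > 0 ? bits[pos, k].to_i(2) : 0
    pos += k
    value += (q << k) + r
    return true if value == target
    return false if value > target
  end
  false
end

words = %w[aardvark abiogenesis banana cherry zebra]
gcs = build_gcs(words, 1 << 20)
puts gcs_include?(gcs, "abiogenesis")          # true: it was added
puts gcs_include?(gcs, "llanfairpwllgwyngyll") # false unless a false positive
```

This version decodes from the beginning on every lookup, so its cost grows with the size of the set. A practical implementation would normally also store an index of periodic checkpoints, each holding a decoded value and its bit offset. A lookup then only needs to decode a short run. The sketch omits this.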

The full pwned-passwords-2.0.txt list, some 500 million entries strong, imports to 1.5GB with a 1 in 10 million false-positive rate. That's quite an improvement on the 9GB compressed hash list, the 30GB uncompressed text file, or a 1.95GB Bloom filter.
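
These sizes can be sanity-checked with standard rules of thumb. Let P be the inverse false-positive rate. The sketch below assumes an optimally tuned Bloom filter, which needs log2(P) / ln(2) bits per item. It also assumes a GCS needs roughly log2(P) + 1.5 bits per item. The 1.5-bit overhead is an approximation, not a figure taken from ruby-gcs, and the helper names are hypothetical.

```ruby
# Back-of-the-envelope estimates, not measurements. P is the inverse
# false-positive rate (10_000_000 for 1 in 10 million):
# - an optimally tuned Bloom filter needs log2(P) / ln(2) bits per item;
# - a Golomb-coded set needs roughly log2(P) + 1.5 bits per item, the
#   extra ~1.5 bits being a rule-of-thumb cost of coding the gaps.
def bloom_bits_per_item(inverse_fp)
  Math.log2(inverse_fp) / Math.log(2)
end

def gcs_bits_per_item(inverse_fp)
  Math.log2(inverse_fp) + 1.5
end

# Convert a bit count to GiB (2**30 bytes).
def gib(bits)
  bits / 8.0 / 1024**3
end

# Per-item cost at a few false-positive rates: the gap between the two
# widens as the rate gets lower.
[100, 10_000, 10_000_000].each do |inverse_fp|
  printf("1 in %-10d Bloom %5.1f bits, GCS %5.1f bits\n",
         inverse_fp, bloom_bits_per_item(inverse_fp), gcs_bits_per_item(inverse_fp))
end

items = 500_000_000     # roughly the size of pwned-passwords-2.0.txt
inverse_fp = 10_000_000 # 1 in 10 million
bloom_gib = gib(items * bloom_bits_per_item(inverse_fp))
gcs_gib = gib(items * gcs_bits_per_item(inverse_fp))
printf("Bloom filter: ~%.2f GiB, GCS: ~%.2f GiB\n", bloom_gib, gcs_gib)
```

For these inputs, the estimates come out at roughly 1.95 GiB for the Bloom filter and 1.44 GiB for the GCS. Both are close to the figures quoted above.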

You're advised to use gcstool for generating such large databases, as it's both much faster (~20x) and much more memory efficient (~8x).

TODO

Test suite.
Less basic tools.
On-disk intermediate state for building large files.
Better documentation.
Plugin for Rodauth.

Freaky/ruby-gcs

Folders and files

Latest commit

History

Repository files navigation

ruby-gcs

Usage

bin/create

bin/query

TODO

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages