Skip to content

Latest commit

 

History

History
197 lines (128 loc) · 4.62 KB

code.rst

File metadata and controls

197 lines (128 loc) · 4.62 KB

pyprobables API

Here you can find the full developer API for the pyprobables project. pyprobables provides a suite of probabilistic data-structures to be used in data analytics and data science projects.

Data Structures and Classes

Bloom Filters

Bloom Filters are a class of probabilistic data structures used for set operations. Bloom Filters guarantee a zero percent false negative rate and a predetermined false positive rate. Once the number of elements inserted exceeds the estimated elements, the false positive rate will increase over the desired amount.

Further Reading

BloomFilter

probables.BloomFilter

BloomFilterOnDisk

probables.BloomFilterOnDisk

For more information of all methods and properties, see BloomFilter.

ExpandingBloomFilter

probables.ExpandingBloomFilter

RotatingBloomFilter

probables.RotatingBloomFilter

CountingBloomFilter

probables.CountingBloomFilter

Cuckoo Filters

Cuckoo filters are a space efficient data structure that supports set membership testing. Cuckoo filters support insertion, deletion, and lookup of elements with low overhead and few false positive results. The name is derived from the cuckoo hashing strategy used to resolve conflicts.

Further Reading

CuckooFilter

probables.CuckooFilter

CountingCuckooFilter

probables.CountingCuckooFilter

Count-Min Sketches

Count-Min Sketches, and its derivatives, are good for estimating the number of occurrences of an element in streaming data while not needing to retain all the data elements. The result is a probabilistic count of elements inserted into the data structure. It will always provide the maximum number of times a data element was encountered. Notice that the result may be more than the true number of times it was inserted, but never fewer.

Further Reading

CountMinSketch

probables.CountMinSketch

CountMeanSketch

probables.CountMeanSketch

For more information of all methods and properties, see CountMinSketch.

CountMeanMinSketch

probables.CountMeanMinSketch

For more information of all methods and properties, see CountMinSketch.

HeavyHitters

probables.HeavyHitters

For more information of all methods and properties, see CountMinSketch.

StreamThreshold

probables.StreamThreshold

For more information of all methods and properties, see CountMinSketch.

QuotientFilter

Quotient filters are an aproximate membership query filter (AMQ) that is both space efficient and returns a zero false negative rate and a probablistic false positive rate. Unlike Bloom filters, the quotient filter only requires a single hash of the element to insert. The upper q bits denote the location within the filter while the lower r bits are stored in the filter.

Quotient filters provide some useful benifits over Bloom filters including:

  • Merging of two filters (not union)
  • Resizing of the filter
  • Ability to remove elements

Further Reading

QuotientFilter

probables.QuotientFilter

Utilities

Bitarray

probables.utilities.Bitarray

Exceptions

probables.exceptions

Hashing Functions

probables.hashes

Indices and Tables

  • home
  • quickstart
  • genindex
  • modindex
  • search