python-hll

A Python implementation of HyperLogLog whose goal is to be storage compatible with java-hll, js-hll and postgresql-hll.

NOTE: This is a fairly literal port of java-hll to Python. Internally, bytes are represented as Java-style signed bytes (-128 to 127) rather than Python-style unsigned bytes (0 to 255). This implementation is also quite slow: for example, HLLSerializationTest takes 12 seconds to run in Java, while test_hll_serialization takes 1.5 hours to run in Python (about 400x slower).
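
To make the byte convention concrete, here is a minimal standard-library sketch of converting between the two ranges. The helper names `to_java_byte` and `to_python_byte` are hypothetical and are not part of python-hll; they only illustrate the arithmetic.

```python
def to_java_byte(value: int) -> int:
    """Map an unsigned byte (0 to 255) to a Java-style signed byte (-128 to 127)."""
    if not 0 <= value <= 255:
        raise ValueError("expected a value in 0..255")
    return value - 256 if value > 127 else value


def to_python_byte(value: int) -> int:
    """Map a Java-style signed byte (-128 to 127) back to an unsigned byte (0 to 255)."""
    if not -128 <= value <= 127:
        raise ValueError("expected a value in -128..127")
    return value & 0xFF


# Round-trip three bytes through the signed representation.
signed = [to_java_byte(b) for b in bytes.fromhex("128D7F")]
unsigned = bytes(to_python_byte(b) for b in signed)
print(signed)          # [18, -115, 127]
print(unsigned.hex())  # 128d7f
```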

Overview

See java-hll for an overview of what HLLs are and how they work.

Usage

Hashing and adding a value to a new HLL:

import mmh3
from python_hll.hll import HLL

value_to_hash = 'foo'
hashed_value = mmh3.hash(value_to_hash)  # mmh3.hash returns a 32-bit signed integer

hll = HLL(13, 5)  # log2m=13, regwidth=5
hll.add_raw(hashed_value)  # add_raw takes the already-hashed value

Retrieving the cardinality of an HLL:

cardinality = hll.cardinality()

Unioning two HLLs together (and retrieving the resulting cardinality):

hll1 = HLL(13, 5) # log2m=13, regwidth=5
hll2 = HLL(13, 5) # log2m=13, regwidth=5

# ... (add values to both sets) ...

hll1.union(hll2) # modifies hll1 to contain the union
cardinality_union = hll1.cardinality()
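
As a rough illustration of why an in-place union loses no information, the sketch below uses a toy register array. In the standard HyperLogLog construction, merging two sketches takes the register-wise maximum, which gives the same registers as adding every value to a single sketch. `ToyHLL` is a hypothetical class written for this example, and SHA-256 truncated to 64 bits is an assumed stand-in for a real hash function. It is not how python-hll is implemented, and it omits the cardinality estimate entirely.

```python
import hashlib


class ToyHLL:
    """Illustrative register array only; not python-hll's implementation."""

    def __init__(self, log2m: int = 8) -> None:
        self.log2m = log2m
        self.registers = [0] * (1 << log2m)

    def add(self, value: str) -> None:
        # Assumption: SHA-256 truncated to 64 bits stands in for a real hash.
        hashed = int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")
        index = hashed & ((1 << self.log2m) - 1)  # low bits select a register
        rest = hashed >> self.log2m               # remaining bits determine the rank
        if rest:
            rank = (rest & -rest).bit_length()    # 1-based position of the lowest set bit
        else:
            rank = 64 - self.log2m + 1
        self.registers[index] = max(self.registers[index], rank)

    def union(self, other: "ToyHLL") -> None:
        # Modifies self in place, mirroring hll1.union(hll2) above.
        if other.log2m != self.log2m:
            raise ValueError("both sketches must use the same log2m")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]


left, right, combined = ToyHLL(), ToyHLL(), ToyHLL()
for i in range(1000):
    left.add(f"user-{i}")
    combined.add(f"user-{i}")
for i in range(500, 1500):
    right.add(f"user-{i}")
    combined.add(f"user-{i}")

left.union(right)
print(left.registers == combined.registers)  # True
```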

Reading an HLL from its hex representation in storage specification v1.0.0 format (for example, as retrieved from a PostgreSQL database):

from python_hll.util import NumberUtil
hex_input = '\\x128D7FFFFFFFFFF6A5C420'  # avoid shadowing the built-in input()
hex_string = hex_input[2:]  # strip the leading "\x" prefix
hll = HLL.from_bytes(NumberUtil.from_hex(hex_string, 0, len(hex_string)))
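
For orientation, the following standard-library sketch decodes the three header bytes of the example string above. It is based on one reading of the v1 storage specification (version and type nibbles, then a parameter byte, then a cutoff byte), so treat the field layout as an assumption to check against the specification. `describe_hll_hex` and `HLL_TYPES` are hypothetical names and are not part of python-hll.

```python
import struct

# Assumed type codes from the v1 storage specification.
HLL_TYPES = {0: "UNDEFINED", 1: "EMPTY", 2: "EXPLICIT", 3: "SPARSE", 4: "FULL"}


def describe_hll_hex(pg_hex: str) -> dict:
    """Decode the header of a v1 storage-spec HLL given as a PostgreSQL hex literal."""
    if not pg_hex.startswith("\\x"):
        raise ValueError("expected a PostgreSQL hex literal with a leading backslash-x")
    raw = bytes.fromhex(pg_hex[2:])
    if len(raw) < 3:
        raise ValueError("expected at least three header bytes")
    version_byte, parameter_byte, cutoff_byte = raw[0], raw[1], raw[2]
    info = {
        "schema_version": version_byte >> 4,           # high nibble
        "type": HLL_TYPES.get(version_byte & 0x0F, "UNKNOWN"),
        "regwidth": (parameter_byte >> 5) + 1,         # top 3 bits store regwidth - 1
        "log2m": parameter_byte & 0x1F,                # bottom 5 bits
        "sparse_enabled": bool(cutoff_byte & 0x40),
        "explicit_cutoff_code": cutoff_byte & 0x3F,
    }
    if info["type"] == "EXPLICIT":
        # Assumed payload: big-endian signed 64-bit hashed values.
        payload = raw[3:]
        count = len(payload) // 8
        info["values"] = list(struct.unpack(f">{count}q", payload[: count * 8]))
    return info


info = describe_hll_hex("\\x128D7FFFFFFFFFF6A5C420")
print(info["type"], info["log2m"], info["regwidth"])  # EXPLICIT 13 5
print(info["values"])                                 # [-156908512]
```

Under this reading, the decoded parameters agree with the `HLL(13, 5)` constructor call used earlier in this section.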

Writing an HLL to its hex representation in storage specification v1.0.0 format (for example, to be inserted into a PostgreSQL database):

hll_bytes = hll.to_bytes()  # avoid shadowing the built-in bytes type
output = "\\x" + NumberUtil.to_hex(hll_bytes, 0, len(hll_bytes))

Also see the API documentation.

Development

See Contributing for how to get started building, testing, and deploying the code.
