lots of changes; fix some issues; better init
Showing 10 changed files with 197 additions and 56 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# PyProbables Changelog | ||
|
||
### Initial Version: | ||
* Probabilistic data structures: | ||
    * Bloom Filter | ||
    * Bloom Filter (on disk) | ||
    * Count-Min Sketch | ||
    * Heavy Hitters | ||
    * Stream Threshold | ||
* Import and export of each |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
PyProbables | ||
=========== | ||
|
||
**pyprobables** is a Python library for probabilistic data structures. The goal | ||
is to provide the developer with a pure-Python implementation of common | ||
probabilistic data structures to use in their work. | ||
|
||
Installation | ||
------------------ | ||
|
||
Pip Installation: **coming soon** | ||
|
||
:: | ||
|
||
    $ pip install pyprobables | ||
|
||
To install from source: | ||
|
||
Clone the `repository on GitHub | ||
<https://github.com/barrust/pyprobables>`__, then run the following from the cloned folder: | ||
|
||
:: | ||
|
||
    $ python setup.py install | ||
|
||
`pyprobables` supports Python versions 2.7 and 3.3 - 3.6. | ||
|
||
Documentation | ||
------------- | ||
|
||
Documentation is currently under development. The documentation of | ||
the latest release will be hosted on | ||
`readthedocs.io <http://pyprobables.readthedocs.io/en/stable/?>`__ | ||
|
||
Once completed, you can build the documentation yourself by running: | ||
|
||
:: | ||
|
||
    $ pip install sphinx | ||
    $ cd docs/ | ||
    $ make html | ||
|
||
|
||
|
||
Automated Tests | ||
------------------ | ||
|
||
To run the automated tests, run the following command from the | ||
downloaded folder: | ||
|
||
:: | ||
|
||
    $ python setup.py test | ||
|
||
|
||
Quickstart | ||
------------------ | ||
|
||
Import pyprobables and set up a Bloom Filter: | ||
|
||
.. code:: python | ||
|
||
    >>> from probables import (BloomFilter) | ||
    >>> blm = BloomFilter(est_elements=1000, false_positive_rate=0.05) | ||
    >>> blm.add('google.com') | ||
    >>> blm.check('facebook.com') # should return False | ||
    >>> blm.check('google.com') # should return True | ||
|
||
Import pyprobables and set up a Count-Min Sketch: | ||
|
||
.. code:: python | ||
|
||
    >>> from probables import (CountMinSketch) | ||
    >>> cms = CountMinSketch(width=1000, depth=5) | ||
    >>> cms.add('google.com') # should return 1 | ||
    >>> cms.add('facebook.com', 25) # insert 25 at once; should return 25 | ||
|
||
See the documentation for other data structures available and for further | ||
|
||
Changelog | ||
------------------ | ||
|
||
Please see the `changelog | ||
<https://github.com/barrust/pyprobables/blob/master/CHANGELOG.md>`__ for a list | ||
of all changes. |
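
The Quickstart in this README configures a Bloom filter with `est_elements` and `false_positive_rate`. As a rough illustration of what those two parameters imply, the sketch below applies the textbook Bloom filter sizing formulas. The helper name `bloom_parameters` is hypothetical and nothing here is taken from the library's source, which may size or round differently.

```python
from math import ceil, log


def bloom_parameters(est_elements, false_positive_rate):
    """Return (number_of_bits, number_of_hashes) from the textbook formulas.

    Hypothetical helper for illustration only; not part of pyprobables.
    """
    if est_elements <= 0:
        raise ValueError("est_elements must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    # m = -n * ln(p) / (ln 2)^2
    n_bits = int(ceil(-est_elements * log(false_positive_rate) / (log(2) ** 2)))
    # k = (m / n) * ln 2, with at least one hash function
    n_hashes = max(1, int(round(float(n_bits) / est_elements * log(2))))
    return n_bits, n_hashes


# Same parameters as the Quickstart example.
bits, hashes = bloom_parameters(est_elements=1000, false_positive_rate=0.05)
print(bits, hashes)
```

Lowering `false_positive_rate` while keeping `est_elements` fixed increases the number of bits, which is the trade-off the two constructor arguments express.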
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
''' Probables Hashing library ''' | ||
from __future__ import (unicode_literals, absolute_import, print_function) | ||
from hashlib import (md5, sha256) | ||
from struct import (unpack) # needed to turn digests into numbers | ||
|
||
|
||
UIN64_MAX = 2 ** 64 | ||
|
||
|
||
def default_fnv_1a(key, depth): | ||
    ''' the default fnv-1a hashing routine ''' | ||
    res = list() | ||
    tmp = key | ||
    for _ in range(depth): | ||
        if tmp != key: | ||
            tmp = fnv_1a("{0:x}".format(tmp)) | ||
        else: | ||
            tmp = fnv_1a(key) | ||
        res.append(tmp) | ||
    return res | ||
|
||
|
||
def fnv_1a(key): | ||
    ''' 64 bit fnv-1a hash ''' | ||
    hval = 14695981039346656073 | ||
    fnv_64_prime = 1099511628211 | ||
    for t_str in key: | ||
        hval = hval ^ ord(t_str) | ||
        hval = (hval * fnv_64_prime) % UIN64_MAX | ||
    return hval | ||
|
||
|
||
def default_md5(key, depth): | ||
    ''' the default md5 hashing routine ''' | ||
    res = list() | ||
    tmp = key | ||
    for _ in range(depth): | ||
        tmp = md5(tmp).digest() | ||
        res.append(str(unpack('Q', tmp[:8])[0])) # turn into 64 bit number | ||
    return res | ||
|
||
|
||
def default_sha256(key, depth): | ||
    ''' the default sha256 hashing routine ''' | ||
    res = list() | ||
    tmp = key | ||
    for _ in range(depth): | ||
        tmp = sha256(tmp).digest() | ||
        res.append(str(unpack('Q', tmp[:8])[0])) # turn into 64 bit number | ||
    return res |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
''' some utility functions ''' | ||
from __future__ import (unicode_literals, absolute_import, print_function) | ||
import string | ||
import os | ||
|
||
|
||
def is_hex_string(hex_string): | ||
    ''' check if the passed in string is really hex ''' | ||
    if hex_string is None: | ||
        return False | ||
    return all(c in string.hexdigits for c in hex_string) | ||
|
||
|
||
def is_valid_file(filepath): | ||
    ''' check if the passed filepath points to a real file ''' | ||
    if filepath is None: | ||
        return False | ||
    return os.path.isfile(filepath) |
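
The two helpers in this file are small enough that their edge cases are easy to miss. The sketch below copies their logic into a standalone script (the module path is not shown in this diff, so nothing is imported from the package) and exercises a few inputs. The temporary file is only a stand-in for a real exported structure.

```python
import os
import string
import tempfile


def is_hex_string(hex_string):
    """Mirror of the helper in the diff: every character must be a hex digit."""
    if hex_string is None:
        return False
    return all(c in string.hexdigits for c in hex_string)


def is_valid_file(filepath):
    """Mirror of the helper in the diff: the path must be an existing file."""
    if filepath is None:
        return False
    return os.path.isfile(filepath)


print(is_hex_string("deadBEEF"))  # True: upper and lower case digits are accepted
print(is_hex_string("0x1f"))      # False: the 'x' prefix is not a hex digit
print(is_hex_string(""))          # True: all() over an empty string is vacuously true

with tempfile.NamedTemporaryFile() as handle:
    print(is_valid_file(handle.name))          # True while the file exists
print(is_valid_file(tempfile.gettempdir()))    # False: a directory is not a file
```

If an empty string should not count as valid hex, a caller would need an explicit length check in addition to `is_hex_string`.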