Python C extension to split a file by variable block sizes using a Rabin fingerprint.

Python C extension to find chunk boundaries using a Rabin-Karp rolling hash. This is useful for slicing data into variable-sized chunks based on content. If a file is changed, only the modified chunk (and possibly the next one) is affected; the chunks that follow stay the same. This makes it well suited to data deduplication before sending data over slow connections or when storing multiple similar files (such as backups using tar snapshots).
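To illustrate the idea, here is a minimal pure-Python sketch of content-defined chunking with a polynomial rolling hash. It is not this extension's code or API: the function name `chunk_boundaries` and every parameter (`window`, `mask`, `min_size`, `max_size`, `prime`, `mod`) are assumptions chosen for the example. A boundary is declared wherever the low bits of the hash over the last `window` bytes are all zero, so boundaries depend on nearby content rather than on absolute file offsets.

```python
from typing import List


def chunk_boundaries(data: bytes, window: int = 48, mask: int = 0x1FFF,
                     min_size: int = 2048, max_size: int = 65536,
                     prime: int = 31, mod: int = (1 << 61) - 1) -> List[int]:
    """Return the end offsets of content-defined chunks in `data`.

    Hypothetical helper for illustration only; all names and defaults
    are assumptions, not taken from the extension.
    """
    boundaries = []
    start = 0   # offset where the current chunk begins
    h = 0       # rolling hash over the last `window` bytes of this chunk
    top = pow(prime, window - 1, mod)  # weight of the byte leaving the window

    for i, byte in enumerate(data):
        pos = i - start
        if pos >= window:
            # Slide the window: drop the contribution of the oldest byte.
            h = (h - data[i - window] * top) % mod
        # Shift the remaining bytes up one position and add the new byte.
        h = (h * prime + byte) % mod

        size = pos + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            boundaries.append(i + 1)
            start = i + 1
            h = 0   # restart the hash for the next chunk

    if start < len(data):
        boundaries.append(len(data))  # trailing partial chunk
    return boundaries
```

With these assumed defaults, chunks are at least 2 KiB and at most 64 KiB long. Hashing each chunk (for example with SHA-256) and storing or transmitting only the chunks not already seen is one way to build deduplication on top of the returned offsets.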

Have a look at for an introduction to the Rabin-Karp rolling hash.


Installation requires a working GCC compiler and Python development libraries.

git clone git://
cd python-rabin-fingerprint
sudo python setup.py install