Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
The motivation behind this project was me trying to come up with a hash function suitable for 8M+ keys with zero collisions. When using an array as a hash table, your keyspace becomes 0 to N-1 keys. Generating a hash function here without collisions isn't something I was capable of, so I began searching for other solutions. Using straight MD5 or SHA-1 hashing means you can't exactly fit a proper hash value into an array of N keys. I tried looking into things like gperf but running it on over 8 million keys just wasn't going to happen. I decided on using a balanced binary search tree instead of an array for storing my key/value pairs. The lookups aren't O(1) like a normal hash table, but they're still pretty fast at O(log n). I'm planning on using this hashtree project in my wikigraph project (http://github.com/davidreynolds/wikigraph) eventually. That's why I had to test it with my key dataset. This project is licensed under the MIT License. HashTree is a static library written in C that may eventually be an installable shared library. For now it's meant to be included in your project builds. To build libhashtree.a: $ make To build and run tests/CLI: $ make test $ ./test --convert sample.txt The above command converts sample.txt from a keyfile into a hashtree file (keytree.hat). Load the generated hashtree: $ ./test keytree.hat You'll then be presented with an interface to run hashtree lookups on the key file you imported. In the case of sample.txt: > foo 782023 => foo > bar 6475191 => bar It runs like this until you ctrl+c the program. To include libhashtree.a in your own projects, see the Makefile for how I link it in test.o. Your link step should look something like -lssl -L. -lhashtree