Implementing a dictionary data structure using a red-black tree.
Switch branches/tags
Nothing to show
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


The motivation behind this project was me trying to come up with a hash function suitable for 8M+ keys with zero collisions. When using an array as a hash table, your keyspace becomes 0 to N-1 keys. Generating a hash function here without collisions isn't something I was capable of, so I began searching for other solutions. Using straight MD5 or SHA-1 hashing means you can't exactly fit a proper hash value into an array of N keys.

I tried looking into things like gperf but running it on over 8 million keys just wasn't going to happen. I decided on using a balanced binary search tree instead of an array for storing my key/value pairs. The lookups aren't O(1) like a normal hash table, but they're still pretty fast at O(log n).

I'm planning on using this hashtree project in my wikigraph project ( eventually. That's why I had to test it with my key dataset.

This project is licensed under the MIT License.

HashTree is a static library written in C that may eventually be an installable shared library. For now it's meant to be included in your project builds.

To build libhashtree.a:

    $ make

To build and run tests/CLI:

    $ make test
    $ ./test --convert sample.txt

The above command converts sample.txt from a keyfile into a hashtree file (keytree.hat).

Load the generated hashtree:

    $ ./test keytree.hat

You'll then be presented with an interface to run hashtree lookups on the key file you imported. In the case of sample.txt:
    > foo
    782023 => foo
    > bar
    6475191 => bar

It runs like this until you ctrl+c the program.

To include libhashtree.a in your own projects, see the Makefile for how I link it in test.o. Your link step should look something like -lssl -L. -lhashtree