TST - Erlang ternary search tree
The ternary search tree is a prefix binary search tree. Compared to similar tries, it is more memory efficient at the cost of execution speed.
This library provides an Erlang module for creating, manipulating and searching in-memory TSTs. The following operations are supported:
- membership testing (
- insertion (
- partial match searches (
- near neighbor searches (
- word size of given tree (
- coercion from list of strings (
- coercion to list of strings (
Binaries are not supported, though this is an intended feature. This TST is case-sensitive. The words "moose", "Moose" and "MoOsE" are all considered distinct words.
> erl -pa ebin Erlang R15B03 (erts-5.9.3) [source] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:false] Eshell V5.9.3 (abort with ^G) 1> TST = tst:from_list(["moose", "goose", "house", "computer"]). ... 2> tst:contains("moose", TST). true 3> tst:partial_matches(".oose", TST). ["goose","moose"] 4> tst:partial_matches(".o.se", TST). ["goose","house","moose"] 5> tst:to_list(TST). ["computer","goose","house","moose"] 6> tst:to_list(tst:insert("abba", TST)). ["abba","computer","goose","house","moose"]
Please see in-line documentation and in-line test cases for more.
tst uses Erlang strings--lists of machine words as
character--internally to represent strings the search tree will consume more
memory than reading the source words into memory as binaries would. Consider:
> du -sh priv/words priv/words 2.4M
The test suite reads this file in to run benchmarks; the size of the TST
priv/words is reported in megabytes.
src/tst.erl:459:<0.50.0>: TST size :: 42.339 megabytes
Future work will attempt to get the resulting TST to consume no more than the same amount of memory as an on-disk representation.
This data-structure is implemented from the discussion given in Bentley and Sedgewick's Fast Algorithsm for Searching and Sorting. It is released under the MIT license.