Skip to content
MoreComplexDataStructures is a class library containing a collection of data structures more complex than those found in the standard .NET framework
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
MoreComplexDataStructures.UnitTests
MoreComplexDataStructures
Referenced Libraries
.gitattributes
.gitignore
LICENSE
MoreComplexDataStructures.sln
NOTICE
README.md

README.md

MoreComplexDataStructures

MoreComplexDataStructures is a class library containing a collection of data structures (plus related utility classes) more complex than those found in the standard .NET framework.

Contents

WeightBalancedTree - An implementation of a weight-balanced tree. The tree maintains counts of the nodes in each node's subtrees, and implements basic self balancing by performing rotations on any nodes where the subtree sizes differ, and rotation would improve the balance. The class implements several methods to traverse based on item value comparison (e.g. GetNextLessThan(), GetNextGreaterThan(), GetAllLessThan(), GetAllGreaterThan()), and executes GetCountLessThan() and GetCountGreaterThan() in O(log(n)) time (since these values are stored and maintained at each node). The class also provides methods to perform pre, post, and in-order depth-first search, breadth-first search, and to return a random node item.

Note that the current balancing algorithm can result in unbalanced trees for certain specific sequences of input values (e.g. as a worst case, inserting values 1, 7, 2, 6, 3, 5, 4 in order would result in a tree which was effectively a linked list).

MinHeap - A tree-based implementation of a min heap. Insert() and ExtractMin() methods return with order O(log(n)) time complexity. Also provides a method to traverse the nodes of the underlying tree via a breadth-first search.

MaxHeap - A tree-based implementation of a max heap. Insert() and ExtractMax() methods return with order O(log(n)) time complexity. Also provides a method to traverse the nodes of the underlying tree via a breadth-first search.

LongIntegerStatusStorer - Stores a true/false status for a complete set of long (Int64) integers. Uses an underlying tree holding ranges of integers to store the statuses. Also provides a method TraverseTree() to traverse the ranges stored in the tree via a breadth-first search. Designed to be more memory efficient than an equivalent boolean array when large sets of the Int64 key values are contiguous (and hence can be 'condensed' into a range), and to support ranges larger than Int32.MaxValue.

ListRandomizer - Randomizes a List or Array using the Fisher/Yates/Knuth algorithm (O(n) time complexity).

WeightedRandomGenerator - Returns items randomly based on configured weightings. The underlying implementation uses a tree, so the Generate() method returns with order O(log(n)) time complexity (where n is the number of weightings defined). The items and weightings can be defined as follows...

WeightedRandomGenerator<Char> weightedRandomGenerator = new WeightedRandomGenerator<Char>();
List<Tuple<Char, Int64>> weightings = new List<Tuple<Char, Int64>>()
{
    new Tuple<Char, Int64>('a', 1), 
    new Tuple<Char, Int64>('b', 2),
    new Tuple<Char, Int64>('c', 3),
    new Tuple<Char, Int64>('d', 4)
};a
weightedRandomGenerator.SetWeightings(weightings);

...then calling the Generate() method 100,000 times would result in a distribution similar to the following...

a : 10022
b : 19694
c : 30456
d : 39828

Trie - An implementation of a trie / prefix tree. Nodes of the trie maintain a count of the number of sequences in each subtree, hence the GetCountOfSequencesWithPrefix() method returns a count of sequences with the specified prefix with order O(n) time complexity (where n is the number of elements in the prefix sequence).

FrequencyTable - A simple (Dictionary-based) frequency table, which stores the frequency of occurrence of objects.

BinarySearchTreeBalancedInserter - Inserts a set of items into a binary search tree, ensuring that the tree is balanced, and depth is minimized (although this class is now somewhat redundant since balancing has been implemented in WeightBalancedTree).

CharacterTrie - Effectively a Trie<Char>, but with special implementations of the Insert(), Delete(), and Contains() methods which accept String parameters (and avoid the overhead of having to call String.ToCharArray() as is required with a Trie<Char>).

LRUCache - A simple implementation of a least-recently-used cache using an underlying Dictionary and LinkedList. By default the cache stores a fixed number of items (defined by constructor parameter 'itemLimit'), but it optionally allows overriding the routine to check whether the cache is full (e.g. to decide instead based on total memory usage, etc...).

UnflaggedNumberGenerator - Allows 'flagging' (i.e. setting true or false) for each number in a consecutive range (where the range length is <= Int64.MaxValue), and then provides methods to identify which numbers have not been flagged (GetLowestUnflaggedNumbers(), GetHighestUnflaggedNumbers(), etc...). The underlying implementation uses a tree of integer ranges, so the class is most efficient when large, contiguous sections of the overall range are either flagged or unflagged.

UniqueRandomGenerator - Generates unique Int64 random numbers within a given range (where the range length is <= Int64.MaxValue). The underlying implementation uses a balanced tree of integer ranges, so the Generate() method returns with order O(log(n)) time complexity.

Note - Another (more simple) way to generate a range of unique random numbers is to initialize the numbers sequentially in an array, and then randomize the array using the Fisher/Yates/Knuth algorithm (as the ListRandomizer class does). The issue with this method for very large ranges is that the memory usage is also large (since every number in the range must be preallocated in memory). Also the range cannot be larger than Int32.MaxValue. I built this class with the goal of generating larger ranges with far more efficient memory usage (the class starts with a single node item representing the full range, and then ’splits’ these node items and pushes them down the tree as the Generate() method is called… hence consecutive numbers are ‘condensed’ into a single range class). After testing I found that during the process of generating all numbers in the range, at some point, the number of nodes in the tree will become ~n/4 (where n is the range size). Unfortunately this makes the class less memory efficient than the Fisher/Yates/Knuth array method, as each node consumes more than 4 times as much memory as an Int64 (references to parent and child nodes, counts of child nodes, counts of ranges of child nodes, object overhead, etc…). In any case building the class was an interesting exercise, and someone may find a use for it.

Future Enhancements

  • Enhance WeightBalancedTree balancing algorithm to handle sequences of input values which currently result in an unbalanced tree (e.g. Int32 tree with insert sequence 1, 7, 2, 6, 3, 5, 4).
  • Enhance any methods which return an IEnumerable to throw an InvalidOperationException if the object structure is changed while enumerating.
  • Remove inefficiency of converting List<Char> to String in CharacterTrie.GetAllStringsWithPrefix().
  • Remove redundant / unused conditional branches in WeightBalancedTree.GetRandomItem() method (e.g. 'if (currentNode.LeftChildNode == null)')
  • Refactor to remove inefficiency of traversing to the start node twice in GetAllLessThan(T item) and GetAllGreaterThan(T item) in the WeightBalancedTree class.
  • Refactor methods Insert() and ExtractMax()/ExtractMin() on the MaxHeap/MinHeap classes into the HeapBase class.
  • Abstract use of IComparable<T>.CompareTo() in heap classes to make code easier to read.
  • Consider adding a linked list implementation which supports Contains(T item) ( O(1) ), by additionally storing list data in a HashSet.
  • Consider adding a priority queue.
  • Consider adding a skip list.
  • Consider building an IBinarySearchTree implementation using struct rather than class nodes (for potential reduced memory usage).

Release History

Version Changes
1.5.0.0 Converted to .NET Standard.
Added LRUCache, CharacterTrie, UniqueRandomGenerator, UnflaggedNumberGenerator classes.
Implemented automatic balancing in WeightBalancedTree.
Added Min and Max properties to WeightBalancedTree.
Added method Get() to WeightBalancedTree to return a specified node item (useful when the node items are container classes which hold additional data to that used in the IComparable implementation... i.e. allowing the tree to be used as a treemap).
Trie.GetAllSequencesWithPrefix() now returns IEnumerable<List<T>>.
Added an additional Trie constructor which returns the root node via an 'out' parameter (to allow custom traversals).
Added method FrequencyTable.Clear().
Refactored WeightedRandomGenerator to remove Dictionary member 'weightingToItemMap' and instead store data in ItemAndWeighting<T> as each tree node item.
Removed TreeBasedListRandomizer class.
1.4.0.0 Added GetAllLessThan(T item) and GetAllGreaterThan(T item) methods to WeightBalancedTree.
1.3.0.0 Added Trie, FrequencyTable, and BinarySearchTreeBalancedInserter.
WeightBalancedTree.GetRandomItem() method now returns random items with even distribution.
WeightBalancedTree now implements interface IBinarySearchTree<T>.
1.2.0.0 Added WeightedRandomGenerator.
1.1.0.0 Added MinHeap, MaxHeap, LongIntegerStatusStorer, and ListRandomizer.
Allowed the WeightBalancedTree 'Depth' property to return without an exception if called after an item has been removed from the tree.
Added additional constructor to WeightBalancedTree to accept an IEnumerable<T> object, whose contents are added to the tree.
1.0.0.0 Initial version containing WeightBalancedTree.
You can’t perform that action at this time.