Re-implemented the Trie crow uses to match rules with URLs #166

The-EDev · 2021-07-10T20:44:13Z

This is one of the parts I avoided documenting because of how complex it was. Here I am, digging into the deepest parts of it and making a better version.

With the poetic crap out of the way, here's what the old version did and how I changed it.

OLD

Nodes are placed in a list. Initially the list contains only 1 head node.
Nodes have 3 variables: rule_index, param_childrens, and children.
- rule_index is an integer that marks where a rule name ends.
- param_childrens is an array of 5 integers, one for each of the 5 parameter types (<int>, <path>, etc.). If a child of the node is a parameter, its index (in the nodes list) is placed in the corresponding slot of the array.
- children is a map<string, uint> that holds the letter (or letters) associated with a child and that child's index in the nodes list.
Adding a node to the trie involves increasing the size of the nodes list and then adding the index of the last element to the children of the parent node.
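The old index-based layout can be sketched roughly as follows. This is a minimal illustration built only from the description above, not the code that was in routing.h: `OldNode`, `OldTrie` and `add_child` are hypothetical names, and treating a `rule_index` of 0 as "no rule" is an assumption.

```cpp
#include <array>
#include <map>
#include <string>
#include <vector>

// Hypothetical reconstruction of the old node: everything refers to other
// nodes by their index in one shared list.
struct OldNode {
    unsigned rule_index = 0;                   // assumption: 0 means no rule ends here
    std::array<unsigned, 5> param_childrens{}; // one slot per parameter type
    std::map<std::string, unsigned> children;  // key fragment -> index in the nodes list
};

struct OldTrie {
    // The list starts out holding only the head node (index 0).
    std::vector<OldNode> nodes = std::vector<OldNode>(1);

    // Grow the nodes list, then store the index of the new last element
    // in the parent's children map.
    unsigned add_child(unsigned parent, const std::string& key) {
        nodes.emplace_back();
        unsigned index = static_cast<unsigned>(nodes.size() - 1);
        nodes[parent].children.emplace(key, index);
        return index;
    }
};
```

Note that the parent is looked up by index only after the list has grown, since growing a vector can move its elements.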
Searching is done by going through the tree one children list at a time, then recursively looking for the rest of the string in the matching child's children (if one is found). The idea is that checking an entire level happens in 1 function call, preventing unnecessary calls and jumps.
Optimizing simply flattened the entire trie. This was done by taking the children of a node and merging their own children into them, then recursively doing the same for the newly merged children. Merging involved concatenating the strings of a child and its grandchild and having the result point to the grandchild's index. This did not happen when the grandchild was a param node or when the child had a rule index. There were some anomalies where merging did not occur, but they were not investigated (primarily because the implementation was going to be changed anyway).
Printing the tree had a problem where params would not appear in the correct position for their level. This was due to CROW_LOG_DEBUG adding its own line ending, and to the printing function emitting the level spaces and the param in 2 different calls.

NEW

The Trie always has direct access to the head node only, with each node containing its own children.
Nodes have 4 variables: rule_index, key, param, and children.
- rule_index remains unchanged.
- key was previously the string in children and is now part of the node itself.
- param replaces param_childrens and is a single variable giving the parameter type of the node itself.
- children is a list of node pointers, replacing the map.
Adding a node now involves adding a pointer to a new node to the children of its parent (a parent has to be given).
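A rough sketch of the pointer-based layout, again derived only from the description above. The `ParamType` values, the `add_child` helper and the use of `std::unique_ptr` for ownership are assumptions made for the example; the text only says "node pointers".

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Placeholder parameter types: the description only names <int> and <path>.
enum class ParamType { None, Int, Path };

struct Node {
    unsigned rule_index = 0;                     // assumption: 0 means no rule ends here
    std::string key;                             // text matched by this node
    ParamType param = ParamType::None;           // parameter type of the node itself
    std::vector<std::unique_ptr<Node>> children; // each node owns its children
};

// Hypothetical helper: create a node and attach it to the given parent.
Node* add_child(Node& parent, std::string key, ParamType param = ParamType::None) {
    auto child = std::make_unique<Node>();
    child->key = std::move(key);
    child->param = param;
    parent.children.push_back(std::move(child));
    return parent.children.back().get();
}
```

With this shape the trie only needs to hold the head node; everything else is reachable through `children`.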
Searching was changed only as far as needed to be compatible with the variable changes, which simplified certain parts of the function. The functionality was mostly left unchanged.
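The level-at-a-time search can be illustrated with a simplified sketch over the new node shape. It handles literal keys only and leaves out parameter matching entirely, so it is not the actual search function; `find`, the trimmed-down `Node`, and the convention that 0 means "no rule" are assumptions.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Trimmed-down node for the example: no param field.
struct Node {
    unsigned rule_index = 0; // assumption: 0 means no rule ends here
    std::string key;
    std::vector<std::unique_ptr<Node>> children;
};

// Check one whole level of children in a single call, then recurse into a
// child whose key matches the next part of the URL.
unsigned find(const Node& node, const std::string& url, std::size_t pos = 0) {
    if (pos == url.size()) {
        return node.rule_index; // the whole URL has been consumed
    }
    for (const auto& child : node.children) {
        // Does the URL continue with this child's key at position pos?
        if (url.compare(pos, child->key.size(), child->key) == 0) {
            unsigned found = find(*child, url, pos + child->key.size());
            if (found != 0) {
                return found;
            }
        }
    }
    return 0; // nothing on this level leads to a rule
}
```

The search starts from the head node, whose own key is ignored here.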
Optimization was almost completely rewritten. The optimize function now checks only the node itself and its child, and merges only if the node has 1 child, has no rule index, and has no children with a param (because param overrides any key for a given node). This solves both the flattening issue and the no-merging anomalies.
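The merge rule can be sketched as follows. This is an illustration of the rule as described, not the function from routing.h: `can_merge` and `optimize` are hypothetical names, ownership through `std::unique_ptr` is an assumption, and the extra check that the node itself is not a param is an added guard that the description does not state.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class ParamType { None, Int, Path }; // placeholder values

struct Node {
    unsigned rule_index = 0; // assumption: 0 means no rule ends here
    std::string key;
    ParamType param = ParamType::None;
    std::vector<std::unique_ptr<Node>> children;
};

// A node may absorb its child only if it has exactly 1 child, no rule of its
// own, and neither it (added guard) nor the child is a param node.
bool can_merge(const Node& n) {
    return n.rule_index == 0 && n.param == ParamType::None &&
           n.children.size() == 1 &&
           n.children[0]->param == ParamType::None;
}

void optimize(Node& node) {
    // Keep absorbing the single child while the rule allows it.
    while (can_merge(node)) {
        std::unique_ptr<Node> child = std::move(node.children[0]);
        node.key += child->key;                     // concatenate the keys
        node.rule_index = child->rule_index;        // inherit the child's rule
        node.children = std::move(child->children); // adopt the grandchildren
    }
    // Only then move down a level.
    for (auto& child : node.children) {
        optimize(*child);
    }
}
```

Because a node with a rule index is never merged into its child, branch points and rule endings survive, which is what keeps the trie from being flattened.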
Printing was cleaned up to prevent the spacing issue from happening, and some commented-out code was removed.
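One way to avoid that kind of spacing issue is to build each line completely before handing it to a logger that appends its own line ending. The sketch below shows the idea only; `log_line` is a stand-in for such a logger (it is not CROW_LOG_DEBUG), and `debug_print`, `is_param` and the `<param>` label are hypothetical.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct Node {
    std::string key;
    bool is_param = false; // simplified stand-in for the param field
    std::vector<std::unique_ptr<Node>> children;
};

// Stand-in logger: every call produces exactly one finished line.
void log_line(std::vector<std::string>& out, const std::string& line) {
    out.push_back(line);
}

void debug_print(const Node& node, std::size_t level, std::vector<std::string>& out) {
    // Indentation and label go into one string, so they reach the logger
    // in a single call and cannot end up on separate lines.
    std::string line(2 * level, ' ');
    line += node.is_param ? "<param>" : node.key;
    log_line(out, line);
    for (const auto& child : node.children) {
        debug_print(*child, level + 1, out);
    }
}
```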

The architectural changes were primarily giving the node more autonomy over data related to it, and fixing the optimization.

Performance-wise, 2 different tests consistently showed the following:

The old optimization function increased the memory consumption of the trie by about 10%.
The new implementation takes about 35% as much memory as the old one for the same tree (pre-optimization).
Post-optimization, the new implementation consumes 10% as much memory as the old implementation did, which shows that the optimization actually works (at least when it comes to memory).
Speed testing (for searching) shows that the new implementation is a lot more consistent, with a much smaller standard deviation from the average time than the old version. It is also consistently faster, taking on average 50% as much time to find a string. To put things into perspective: the old version took 20µs with 40µs jumps and the new one takes 10µs, so while it is a large gain in percentage, it is very small in actual time.

include/crow/routing.h

The-EDev added 2 commits July 10, 2021 22:45

Re-implemented Trie

341a9b7

used CROW_LOG_DEBUG instead of std::cout (which was used to rapidly t…

32d8872

…est the trie outside crow)

The-EDev requested a review from mrozigor July 10, 2021 20:44

Merge branch 'master' into better_trie

603dbe3

mrozigor reviewed Jul 18, 2021

include/crow/routing.h (resolved)

include/crow/routing.h (outdated, resolved)

removed unnecessary boolean

6d6fbe2

mrozigor approved these changes Jul 19, 2021

The-EDev merged commit b18fbb1 into master Jul 19, 2021

The-EDev deleted the better_trie branch August 14, 2021 01:18
