Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory usage of Trie maybe #21

Open
UnknownShadow200 opened this issue Mar 1, 2017 · 3 comments
Open

Reduce memory usage of Trie maybe #21

UnknownShadow200 opened this issue Mar 1, 2017 · 3 comments

Comments

@UnknownShadow200
Copy link
Collaborator

can probably get rid of Tag and look directly at Children and Payload. not a high priority though.

@mstefarov
Copy link
Collaborator

mstefarov commented Mar 1, 2017

Originally, Tag was added to reduce memory use 😛. This field's first use is to describe TrieNode type based on the number of child nodes:

  • "Leaf" tag means no children. Such nodes are found at the bottom of the hierarchy. This can be inferred if Children==null.
  • "Multi" tag means multiple children. Such nodes are found at the top and middle of the hierarchy. This can be inferred if Children != null && Children.Length>1.
  • All other tag values mean "single" child. Such nodes are found in the middle of the hierarchy. That's where Tag has a secondary use: it identifies which character code the single child represents. This information cannot be easily inferred.

For example, a Trie containing words "bat" and "bacon" would have this structure:

                      _____________________________________________
                     |               single (root)                 |
                     | Tag='b', Payload=null, Children=TrieNode[1] |
                     |_____________________________________________|
                                          |
                                          v
                      _____________________________________________
                     |                  single                     |
                     | Tag='a', Payload=null, Children=TrieNode[1] |
                     |_____________________________________________|
                                          |
                                          v
                     ________________________________________________
                    |                  multi                         |
                    | Tag=Multi, Payload=null, Children=TrieNode[37] |
                    |________________________________________________|
                                |                           |
                                v                           v
  _____________________________________________     ________________________________________
 |                  single                     |   |                 leaf                   |
 | Tag='o', Payload=null, Children=TrieNode[1] |   | Tag=Leaf, Payload="bat", Children=null |
 |_____________________________________________|   |________________________________________|
                        |
                        v
  _____________________________________________
 |                  single                     |
 | Tag='n', Payload=null, Children=TrieNode[1] |
 |_____________________________________________| 
                        |
                        v
    __________________________________________
   |                  leaf                    |
   | Tag=Leaf, Payload="bacon", Children=null |
   |__________________________________________|

I'd recommend looking into radix trees for further memory savings. You might lose some insertion speed, but you can save lots of nodes.

Edit: Fixed a couple things in the diagram since original posting

@UnknownShadow200
Copy link
Collaborator Author

First off: Great explanation and diagram!

I did a rather poor job of explaining what I was attempting to do.
The intent was to instead store everything as an object, then use type casting, to minimise memory usage. (at the cost of slightly less performant lookups)

So for example,
MultiNode becomes a straight object[37]
SingleNode stores Tag and object
LeafNode removed entirely

object can be cast to either null, the direct payload, a single node, or an object[37]

                      ______________________________________
                     |           single (root)              |
                     | Tag='b'  Payload=single              |
                     |______________________________________|
                                          |
                                          v
                      ______________________________________
                     |           single                     |
                     | Tag='a'  Payload=object[37]          |
                     |______________________________________|
                                          |
                                          v
                      ______________________________________
                     |            object[37]                |
                     |______________________________________|
                                |                       |
                                v                       v
  ______________________________________     ______________________________________
 |           single                     |   | "bat"                                |
 | Tag='o'  Payload=single              |   |______________________________________|
 |______________________________________|
                        |
                        v
  ______________________________________
 |           single                     |
 | Tag='n'  Payload="bacon"             |
 |______________________________________| 

I don't know if this is actually feasible though. Last time I tried to implement it, ended up running into issues and gave up.

@mstefarov
Copy link
Collaborator

The only problem I see with your approach is handling values attached to non-leaf nodes. For example, consider the case where we add "ba" and "batman" to the Trie. Now object[37] node needs a payload, and "bat" node needs a child.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants