# Trie

## What is a Trie

It's a tree-based data structure that organizes information in a hierarchy. While most other structures are designed to handle generic data, a Trie is most often used with strings.

Properties:
* It is typically used to store or search strings in a space- and time-efficient way
* Any node in a Trie can store multiple, non-repeating characters
* Every node stores a link to the next character of the string
* Every node keeps track of the *'end of string'*

Let's look at these properties one by one. When we work with strings, our main goal is usually to make search operations as efficient as possible, and a Trie is well suited to that.
Let's see one example here:

                                             AB
                                           /    \
                                          I      AIM
                                        /       / | \
                                      RT       R  L  .
                                     /   \    /   |
                                    .     .  .    .

Here the words in the Trie are, respectively: **AIR, AIT, BAR, BIL, BM**
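
To check the words read off the diagram, the example Trie can be rebuilt and walked depth-first. This is a minimal sketch, assuming the one-character-per-link layout used by the `TrieNode` class later in this notebook; `build_trie` and `list_words` are hypothetical helper names, not part of the original code.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_word = False  # True when a stored word ends at this node


def build_trie(words):
    """Build a Trie from an iterable of words (hypothetical helper)."""
    root = TrieNode()
    for word in words:
        node = root
        for character in word:
            # setdefault reuses an existing child or stores a new one
            node = node.children.setdefault(character, TrieNode())
        node.is_word = True
    return root


def list_words(node, prefix=""):
    """Return every stored word below `node`, in alphabetical order."""
    words = [prefix] if node.is_word else []
    for character in sorted(node.children):
        words.extend(list_words(node.children[character], prefix + character))
    return words


root = build_trie(["AIR", "AIT", "BAR", "BIL", "BM"])
print(list_words(root))  # ['AIR', 'AIT', 'BAR', 'BIL', 'BM']
```

Words that share a prefix, such as *AIR* and *AIT*, share the nodes for that prefix, which is where the space saving comes from.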

## Why do we need Trie?

A Trie solves many standard string problems efficiently, for example:

* Spelling checker
* Auto completion

## Common operations

### Creation of Trie

In [25]:
class TrieNode:
    def __init__(self):
        self.children = {}  # maps each character to its child node
        self.is_word = False  # True when a string ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    # O(m) time | O(m) space, where m is the length of the word
    def insert_string(self, word):
        current_node = self.root
        for character in word:
            node = current_node.children.get(character)
            if node is None:
                # No link for this character yet: create the node
                node = TrieNode()
                current_node.children[character] = node
            current_node = node
        # Mark the end of the string
        current_node.is_word = True
        print('inserted:', word)

    # O(m) time | O(1) space
    def search_string(self, word):
        current_node = self.root
        for character in word:
            node = current_node.children.get(character)
            if node is None:
                return False
            current_node = node
        # All characters are present; the word exists only if it ends here
        return current_node.is_word

new_trie = Trie()

### Insert a string in a Trie

When we want to insert a string in a Trie, we face 4 cases

* Case 1: **A Trie is blank**
Let's imagine we want to insert *APP*
  * First we create a blank node
  * We assign *A* to the blank node and set *is_word* to False to indicate that there are some characters left
  * We create a new node with the character *P*. *A* is linked to *P*, and we set *is_word* to False here again
  * The last step is repeated with the second *P*, and *is_word* is again set to False
  * Finally we add a last blank node with *is_word* set to True

* Case 2: **The new string shares a prefix with another string**
Now let's imagine we want to insert *API*
  * We can see that *A* and *P* are common to these 2 strings, so we won't insert *A* and *P* into the Trie because they already exist; we only insert *I*, the missing character
  * We know that one node can hold multiple characters, so we add *I* to the node that already holds the second *P*
  * We create a new blank node to indicate the end of the word and set *is_word* to True, as the word *API* is now formed.

* Case 3: **The new string's prefix is already present as a complete string**
Let's add *APIS* to the previous Trie. We see that *API* is a complete string, and we want to add *S* to it.
  * We set the blank node at the end of the already formed word to *S*
  * We create a new blank node linked to *S* and set its *is_word* to True

* Case 4: **The string to be inserted is already present in the Trie**. If the string exists, we don't need to do anything.
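
The four cases can be told apart by how many nodes an insertion has to create. The following is a minimal sketch under one assumption: it uses the flag-based layout of the `TrieNode` class (one character per link, with *is_word* set on the node where the string ends) rather than a separate blank end node. `insert_and_count` is a hypothetical helper written for this illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_word = False  # True when a stored string ends here


def insert_and_count(root, word):
    """Insert `word` and return how many new nodes were created."""
    created = 0
    node = root
    for character in word:
        if character not in node.children:
            node.children[character] = TrieNode()
            created += 1
        node = node.children[character]
    node.is_word = True
    return created


root = TrieNode()
print(insert_and_count(root, "APP"))   # case 1, blank Trie: 3
print(insert_and_count(root, "API"))   # case 2, shared prefix AP: 1
print(insert_and_count(root, "APIS"))  # case 3, API is already a string: 1
print(insert_and_count(root, "APIS"))  # case 4, already present: 0
```

Only the characters that are not yet reachable from the root cost a new node, so inserting an existing string changes nothing.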

Let's try the insertion method of the Trie class above

In [26]:
new_trie.insert_string('APP')
new_trie.insert_string('APPL')

inserted: APP
inserted: APPL


### Search a string in a Trie

* Case 1: **String does not exist in a Trie**
For example, we want to search *BCD* in the Trie.

                                        A
                                       /
                                      P
                                     /
                                    I
                                     \
                                       .

  * The first thing to do is to take the first character *B* and compare it with the root, *A* in our case. They don't match, so the string doesn't exist in this Trie and we return False.

* Case 2: **String exists in a Trie**
*API*
  * We take the first character *A* and compare it with the root, *A* in our case. So it exists in this Trie.
  * We continue to the next node, where we have *P* and *P*
  * At the next node we have *I* and *I*. This means that all characters are present in the Trie, but not yet that the word exists in it: we still have to check whether the next node is the end of string.
  * Here we have an end of string, so we return True as the string exists in the Trie.
* Case 3: **String is a prefix of another string, but it does not exist in a Trie**
*AP*
  * As usual we take the first character and compare it with the root, *A* and root *A* in our case
  * Same thing with the second character
  * All the characters exist, but we should not forget the last step: checking whether the next node is the end of string
  * Here the next node is not the end of string, so *AP* doesn't exist in the Trie; it's just a prefix of the *API* string.
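
The three search outcomes can be made explicit by returning a label instead of a boolean. This is a minimal sketch, assuming the flag-based `TrieNode` layout of this notebook; the `insert` and `classify` helpers and the label strings are hypothetical names chosen for the illustration.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_word = False  # True when a stored string ends here


def insert(root, word):
    node = root
    for character in word:
        node = node.children.setdefault(character, TrieNode())
    node.is_word = True


def classify(root, word):
    """Return 'missing', 'prefix only' or 'word' for `word`."""
    node = root
    for character in word:
        node = node.children.get(character)
        if node is None:
            return "missing"  # case 1: a character has no link
    # Every character was found; check the end of string marker
    return "word" if node.is_word else "prefix only"


root = TrieNode()
insert(root, "API")
print(classify(root, "BCD"))  # missing
print(classify(root, "API"))  # word
print(classify(root, "AP"))   # prefix only
```

A plain boolean search treats the first and third outcomes alike, which is why the final *is_word* check matters.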

Let's try the search method of the Trie class above

In [27]:
print(new_trie.search_string('APP'))

True


### Delete a string from Trie

* Case 1: Another string shares a prefix with the one that we want to delete (*API, APPLE*)
```markdown
                                                A
                                              /
                                            P
                                          /   \
                                        I       P
                                         \        \
                                           .        L
                                                      \
                                                        E
                                                          \
                                                            .
```
We can clearly see that the 2 strings share the same prefix *AP* here. In this situation we have to be very careful, and the important point is that **deletion of any string from a Trie always starts from the LEAF NODE** and then continues up towards the root.
  * So the first step is to check whether this string is in the Trie or not
  * After finding that the string is in the Trie, we start from the last node and go up one by one; every time, we check whether any other node depends on the current node (the leaf node is *'.'*)
    * If there is none, we simply delete it and continue upwards (*'.'* and *I*).
    * If there is one, it means that we cannot delete it (*P*).
  * At this point we have successfully deleted *API* from the Trie.

* Case 2: The string is a prefix of another string (*API, APIS*)
```markdown
                                                A
                                              /
                                            P
                                          /   \
                                        I.      P
                                         \        \
                                           S         L
                                             \         \
                                               .        E
                                                          \
                                                            .
```
Here, if we want to delete *API*, we see that another string (*APIS*) depends on it. In this case we don't delete any node; we just **update is_word to False**, so that *API* is no longer recognised as a complete string and only serves as a prefix for other strings.
  * The first step is to go to the root, where we see that *A* exists; we continue to *P*, which also exists; then we continue to *I*, where the string finishes, so the next node will be an end of string node.
  * To remove *API* here we just have to remove the end of string by setting *is_word* to *False*

* Case 3: Another string is a prefix of this string (*APIS, API*)
```markdown
                                                A
                                              /
                                            P
                                          /
                                        I. 
                                         \ 
                                           S
                                             \
                                               .
```
Here we want to delete *APIS*, but there is another string, *API*, which is a prefix of it. As always we start from the leaf node, but before that we check whether the string *APIS* exists in the Trie or not.
* We start from the last node and go up one by one; every time, we check whether any other node depends on the current node (the leaf node is *'.'*)
    * If there is none, we simply delete it and continue upwards (*'.'* and *S*).
    * We reach the node *I* with *is_word* set to *True*, which means that a string ends at this node; in other words, *API* is a complete string here
    * To mark that end of string we only need a blank node after *I*, so we stop here: *S* is removed and *API* stays in the Trie


* Case 4: No other node depends on this string (*K*)
```markdown
                                        AK
                                      /    \
                                    P        .
                                  /
                                I.
                                  \
                                    S
                                      \
                                        .
```
* If we want to delete *K* here, we have to check whether any node depends on it, so we go to its child node and check
* We can see that there's no link from *K* to any other node; its next node is the end of string node, so we can simply remove that node
* Next, we see that the root node holds two characters, but we only want to delete *K*, so we set the value of *K* to null

In [28]:
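def delete_string(root, word, index):
    # Returns True when the node for word[index] was removed from root.children,
    # which tells the caller that it may be removable as well
    if index >= len(word):
        return False
    char = word[index]
    current_node = root.children.get(char)
    # The string is not in the Trie: nothing to delete
    if current_node is None:
        return False

    # We reached the last character of the string
    if index == len(word) - 1:
        if not current_node.is_word:
            # Only a prefix of other strings, not a stored string
            return False
        if current_node.children:
            # Case 2: other strings continue from here, so just unmark the end of string
            current_node.is_word = False
            return False
        # Cases 1 and 4: nothing depends on the leaf, so remove it
        root.children.pop(char)
        return True

    # Go down to the leaf first, then clean up on the way back to the root
    can_be_deleted = delete_string(current_node, word, index + 1)
    # Case 3 (another string ends here) or case 1 (other branches remain): keep the node
    if not can_be_deleted or current_node.is_word or current_node.children:
        return False
    root.children.pop(char)
    return True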

In [29]:
delete_string(new_trie.root, 'APP', 0)
print(new_trie.search_string('APP'))

False
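
To see the four deletion cases side by side, here is an alternative, iterative sketch. It is not the notebook's recursive function: it records the path from the root and then prunes from the leaf upwards. It assumes the flag-based `TrieNode` layout, and `insert`, `contains` and `delete` are hypothetical helper names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_word = False  # True when a stored string ends here


def insert(root, word):
    node = root
    for character in word:
        node = node.children.setdefault(character, TrieNode())
    node.is_word = True


def contains(root, word):
    node = root
    for character in word:
        node = node.children.get(character)
        if node is None:
            return False
    return node.is_word


def delete(root, word):
    """Remove `word`; return True if it was stored, False otherwise."""
    path = []  # (parent, character) pairs from the root down to the last node
    node = root
    for character in word:
        child = node.children.get(character)
        if child is None:
            return False  # the string is not in the Trie
        path.append((node, character))
        node = child
    if not node.is_word:
        return False  # only a prefix of other strings
    node.is_word = False
    # Walk back up from the leaf, pruning nodes nothing else depends on
    for parent, character in reversed(path):
        child = parent.children[character]
        if child.is_word or child.children:
            break  # another string ends here or passes through here
        del parent.children[character]
    return True


root = TrieNode()
for w in ["API", "APIS", "APPLE", "K"]:
    insert(root, w)

print(delete(root, "API"), contains(root, "APIS"))    # True True
print(delete(root, "APIS"), contains(root, "APPLE"))  # True True
print(delete(root, "K"), sorted(root.children))       # True ['A']
print(delete(root, "AP"))                             # False
```

The first call only clears the end of string flag because *APIS* still depends on *API*; the second prunes *S* and *I* but stops at the *P* that *APPLE* passes through.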


### Practical use of Trie

* Auto completion
* Spelling checker
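
As an illustration of auto completion, the sketch below walks down to the node for a prefix and then collects every string stored beneath it. It is a minimal example, assuming the flag-based `TrieNode` layout of this notebook; `insert` and `autocomplete` are hypothetical helper names.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> child TrieNode
        self.is_word = False  # True when a stored string ends here


def insert(root, word):
    node = root
    for character in word:
        node = node.children.setdefault(character, TrieNode())
    node.is_word = True


def autocomplete(root, prefix):
    """Return all stored strings that start with `prefix`, sorted."""
    node = root
    for character in prefix:
        node = node.children.get(character)
        if node is None:
            return []  # no stored string starts with this prefix
    results = []
    stack = [(node, prefix)]  # explicit stack instead of recursion
    while stack:
        current, text = stack.pop()
        if current.is_word:
            results.append(text)
        for character, child in current.children.items():
            stack.append((child, text + character))
    return sorted(results)


root = TrieNode()
for w in ["APP", "APPLE", "API", "BAR"]:
    insert(root, w)

print(autocomplete(root, "AP"))  # ['API', 'APP', 'APPLE']
print(autocomplete(root, "C"))   # []
```

Reaching the prefix node takes time proportional to the prefix length; the rest of the work depends only on how many nodes lie below it.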