# Trees and Graphs

In programming, a tree is a data structure that models a hierarchy. Similar to a Linked List, a tree is a collection of linked nodes, but in this case the nodes are not connected linearly: each node can link to several others. Trees have many applications, for example in file systems, parsers, database indexes, and decision-tree models.

![](images/Tree_structure.png)



A tree is a data structure composed of:

1. A root node (Technically, this isn't necessary in graph theory, but it's usually how trees are used in programming)
2. The root node has zero or more child nodes
3. Each child node has zero or more child nodes, and so on
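The definition above can be sketched in a few lines of Python. This is only an illustration: the class name `TreeNode`, the `add_child` helper, and the example values are assumptions made for this sketch, not part of the notebook's later code.

```python
class TreeNode:
    """A node in a general tree: a value plus any number of children."""

    def __init__(self, value):
        self.value = value
        self.children = []  # An empty list means the node has no children

    def add_child(self, value):
        """Create a child node, attach it to this node, and return it."""
        child = TreeNode(value)
        self.children.append(child)
        return child


# Build a small hierarchy by hand (hypothetical values)
root = TreeNode("root")
docs = root.add_child("docs")
src = root.add_child("src")
src.add_child("main.py")

print([child.value for child in root.children])  # ['docs', 'src']
```

Storing the children in a list is what allows "zero or more" of them; a binary tree, shown later, restricts this to at most two.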

Before moving on, we should specify some nomenclature:

![](images/tree_nomenclature.jpg)

- Root: The root node is the topmost node in the tree. It is the only node without a parent.
- Parent: Any node except the root node has one edge upward to a node called its parent.
- Child: A node is a child if it has a parent node.
- Leaf: A node is a leaf if it has no children.
- Sibling: Two nodes are siblings if they have the same parent node.
- Level: It represents the generation of a node. If the root is at level 0, then its children are at level 1, its grandchildren are at level 2, and so on.
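To make these terms concrete, here is a minimal sketch that computes each of them on a small tree. The tree itself, the `{parent: [children]}` dictionary layout, and all function names are hypothetical choices made for this example.

```python
# Hypothetical example tree stored as {parent: [children]}
children = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": [],
    "F": [],
}


def parent_of(node):
    """Return the parent of a node, or None if the node is the root."""
    for parent, kids in children.items():
        if node in kids:
            return parent
    return None


def is_leaf(node):
    """A leaf is a node without children."""
    return not children[node]


def siblings_of(node):
    """Return the other nodes that share this node's parent."""
    parent = parent_of(node)
    if parent is None:
        return []
    return [kid for kid in children[parent] if kid != node]


def level_of(node):
    """Count the edges between the node and the root (the root is level 0)."""
    level = 0
    while parent_of(node) is not None:
        node = parent_of(node)
        level += 1
    return level


print(parent_of("D"))    # B
print(is_leaf("C"))      # False
print(siblings_of("D"))  # ['E']
print(level_of("F"))     # 2
```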


## Traversing a Tree

We have two main strategies to traverse a tree:
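1. BFS: breadth-first search. In BFS, we scan the tree level by level: we start at the root, then visit all of its children, then all of their children, and so on.
2. DFS: depth-first search. In DFS, we take depth as the main priority: we follow one branch all the way down to a leaf before backtracking to explore the next branch.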

Within DFS, we can find three ways to traverse a tree:
1. In-order Traversal
2. Pre-order Traversal
3. Post-order Traversal

![](images/BFS_DFS.png)


### In-order Traversal

In-order traversal visits the nodes in the order left, root, right. On a binary search tree (covered later), this returns the values in sorted order:

![](images/inorder_traversal.jpg)

Starting from the root, we first recursively traverse the left subtree. Then we visit the root itself, and finally we recursively traverse the right subtree. In this case, the output will be:

D -> B -> E -> A -> F -> C -> G

### Pre-Order Traversal

In pre-order traversal, we visit the root node first, then recursively visit the left subtree and finally the right subtree.

![](images/preorder_traversal.jpg)

The process continues until all the nodes are visited. For this tree, the output is:

A -> B -> D -> E -> C -> F -> G
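As a quick check of the two outputs listed so far, here is a minimal sketch that runs both orders on the lettered tree. The tree shape (A at the root, B and C as its children, D and E under B, F and G under C) is inferred from the outputs above, and the tuple layout and function names are assumptions made for this example.

```python
# Each node is a tuple (value, left, right); None marks a missing child
def leaf(value):
    return (value, None, None)


tree = ("A",
        ("B", leaf("D"), leaf("E")),
        ("C", leaf("F"), leaf("G")))


def in_order(node):
    """Left subtree, then the node itself, then the right subtree."""
    if node is None:
        return []
    value, left, right = node
    return in_order(left) + [value] + in_order(right)


def pre_order(node):
    """The node itself, then the left subtree, then the right subtree."""
    if node is None:
        return []
    value, left, right = node
    return [value] + pre_order(left) + pre_order(right)


print(" -> ".join(in_order(tree)))   # D -> B -> E -> A -> F -> C -> G
print(" -> ".join(pre_order(tree)))  # A -> B -> D -> E -> C -> F -> G
```

The only difference between the two functions is where `[value]` sits relative to the recursive calls.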

### Post-Order Traversal

In post-order traversal, we visit the left subtree, then the right subtree, then the root.

![](images/postorder_traversal.jpg)

The process continues until all the nodes are visited. For this tree, the output is:

D -> E -> B -> F -> G -> C -> A

For more information about traversing binary trees, see [Traversing binary trees simply and cheaply](https://www.sciencedirect.com/science/article/abs/pii/0020019079900681).

## Binary Trees

Binary trees are trees where each node has up to two children.

![](images/general_tree_vs_binary.jpg)

## Implementing a Binary Tree

We have already seen Nodes and Linked Lists. Now we will implement a Binary Tree using the same principle we used for Linked Lists. In this case, we are implementing a very basic binary tree, using just a node class. Note that, instead of a single `next` attribute, each node has two: `left` and `right`.

In [6]:
class Node:
    def __init__(self, data: int, left=None, right=None):
        # If both children are None, then this node is a leaf
        self.left = left  # The left child of this node
        self.right = right  # The right child of this node
        self.data = data

    def __str__(self):
        # Show the node together with the values of its immediate children
        if self.left is None and self.right is None:
            return 'None <-' + str(self.data) + '-> None'
        elif self.left is None:
            return 'None <-' + str(self.data) + '->' + str(self.right.data)
        elif self.right is None:
            return str(self.left.data) + '<-' + str(self.data) + '-> None'
        return f'''        <---------{str(self.data)}---------> \n {self.left}      {self.right} '''


root = Node(1)
root.left = Node(2)
root.right = Node(3)
print(root)

    

        <---------1---------> 
 None <-2-> None      None <-3-> None 


## Implementing Tree Traversal algorithms

As we saw, there are two main ways to traverse a tree: BFS and DFS.

In one of the assessments you will be asked to traverse a tree in a BFS manner. Here, we are going to see the DFS version.



### Implementing In-Order Traversal and Pre-Order Traversal

Now, we are going to walk through two code examples, one for in-order traversal and one for pre-order traversal. Then you will have to implement the post-order traversal yourself:

In [5]:
class Node:

    def __init__(self, data):

        self.left = None
        self.right = None
        self.data = data

    def insert(self, data):
        # If this node already holds a value, compare the new data against it to decide which subtree it belongs in.
        if self.data is not None:
            # If the data is less than the root, we are going to look in the left hand side subtree
            if data < self.data:
                # If the left subtree is empty, we are going to insert the data to the left
                if self.left is None:
                    self.left = Node(data)
                # Otherwise, we are going to traverse in the left subtree recursively (notice we are calling the same function)
                else:
                    self.left.insert(data)
            # If the data is greater than the root, we are going to look in the right hand side subtree
            elif data > self.data:
                # If the right subtree is empty, we are going to insert the data to the right
                if self.right is None:
                    self.right = Node(data)
                # Otherwise, we are going to traverse in the right subtree recursively (notice we are calling the same function)
                else:
                    self.right.insert(data)
        # If the tree is empty, we are going to insert the data to the root
        else:
            self.data = data

    # Print the tree in order, one value per line
    def PrintTree(self):
        if self.left:
            self.left.PrintTree()
        print(self.data)
        if self.right:
            self.right.PrintTree()

    def inorderTraversal(self, root):
        res = []
        if root:            
            res = self.inorderTraversal(root.left)
            res.append(root.data)
            res = res + self.inorderTraversal(root.right) 
            print(res)  # Show the partial result built so far at this level of the recursion
        return res

root = Node(5)
root.insert(2)
root.insert(6)
root.insert(4)
root.insert(8)
root.insert(42)
root.insert(9)
print(root.inorderTraversal(root))
root.PrintTree()  # Prints each value on its own line, in order


[4]
[2, 4]
[9]
[9, 42]
[8, 9, 42]
[6, 8, 9, 42]
[2, 4, 5, 6, 8, 9, 42]
[2, 4, 5, 6, 8, 9, 42]
2
4
5
6
8
9
42


In [23]:
class Node:

    def __init__(self, data):

        self.left = None
        self.right = None
        self.data = data

    def insert(self, data):
        # If this node already holds a value, compare the new data against it to decide which subtree it belongs in.
        if self.data is not None:
            # If the data is less than the root, we are going to look in the left hand side subtree
            if data < self.data:
                # If the left subtree is empty, we are going to insert the data to the left
                if self.left is None:
                    self.left = Node(data)
                # Otherwise, we are going to traverse in the left subtree recursively (notice we are calling the same function)
                else:
                    self.left.insert(data)
            # If the data is greater than the root, we are going to look in the right hand side subtree
            elif data > self.data:
                # If the right subtree is empty, we are going to insert the data to the right
                if self.right is None:
                    self.right = Node(data)
                # Otherwise, we are going to traverse in the right subtree recursively (notice we are calling the same function)
                else:
                    self.right.insert(data)
        # If the tree is empty, we are going to insert the data to the root
        else:
            self.data = data

    # Print the tree in order, one value per line
    def PrintTree(self):
        if self.left:
            self.left.PrintTree()
        print(self.data)
        if self.right:
            self.right.PrintTree()

    def inorderTraversal(self, root):
        res = []
        if root:
            res = self.inorderTraversal(root.left)
            res.append(root.data)
            res = res + self.inorderTraversal(root.right)
        return res

    def PreorderTraversal(self, root):
        res = []
        if root:
            res.append(root.data)
            res = res + self.PreorderTraversal(root.left)
            res = res + self.PreorderTraversal(root.right)
        return res

root = Node(5)
root.insert(2)
root.insert(6)
root.insert(4)
root.insert(8)
root.insert(42)
root.insert(9)
print(root.inorderTraversal(root))
print(root.PreorderTraversal(root))

[2, 4, 5, 6, 8, 9, 42]
[5, 2, 4, 6, 8, 42, 9]


## Challenge: Postorder Traversal

Now, using the same code above, implement a postorder traversal method.

In [73]:
class Node:

    def __init__(self, data):

        self.left = None
        self.right = None
        self.data = data

    def insert(self, data):
        # If this node already holds a value, compare the new data against it to decide which subtree it belongs in.
        if self.data is not None:
            # If the data is less than the root, we are going to look in the left hand side subtree
            if data < self.data:
                # If the left subtree is empty, we are going to insert the data to the left
                if self.left is None:
                    self.left = Node(data)
                # Otherwise, we are going to traverse in the left subtree recursively (notice we are calling the same function)
                else:
                    self.left.insert(data)
            # If the data is greater than the root, we are going to look in the right hand side subtree
            elif data > self.data:
                # If the right subtree is empty, we are going to insert the data to the right
                if self.right is None:
                    self.right = Node(data)
                # Otherwise, we are going to traverse in the right subtree recursively (notice we are calling the same function)
                else:
                    self.right.insert(data)
        # If the tree is empty, we are going to insert the data to the root
        else:
            self.data = data

    # Print the tree in order, one value per line
    def PrintTree(self):
        if self.left:
            self.left.PrintTree()
        print(self.data)
        if self.right:
            self.right.PrintTree()

    def inorderTraversal(self, root):
        res = []
        if root:
            res = self.inorderTraversal(root.left)
            res.append(root.data)
            res = res + self.inorderTraversal(root.right)
        return res

    def PreorderTraversal(self, root):
        res = []
        if root:
            res.append(root.data)
            res = res + self.PreorderTraversal(root.left)
            res = res + self.PreorderTraversal(root.right)
        return res
    
    def PostorderTraversal(self, root):
        ### Your code here
        pass

root = Node(5)
root.insert(2)
root.insert(6)
root.insert(4)
root.insert(8)
root.insert(42)
root.insert(9)
print(root.inorderTraversal(root))
print(root.PreorderTraversal(root))
print(root.PostorderTraversal(root))


[2, 4, 5, 6, 8, 9, 42]
[5, 2, 4, 6, 8, 42, 9]
None


## Binary Search Trees

One special type of binary tree is the binary search tree. A binary search tree is a binary tree in which every node fits a specific property: _all left descendants have values lower than or equal to the node's value, and in turn, the node has a lower value than all right descendants._

This inequality must be true for every node in the tree, not just its immediate children.

![](images/bst-vs-not-bst.jpg)

_Why is the right-hand tree not a binary search tree?_

This data structure makes searching for a specific value very fast, because every comparison discards an entire subtree. The first challenge in this notebook consists of implementing a binary search tree. Take a good look at the picture, remember the rules, and try to implement the algorithm.
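To illustrate why the search is fast, here is a minimal lookup sketch. The `BSTNode` class, the `bst_contains` function, and the hand-built tree are assumptions made for this example; the function also returns the number of nodes it inspected, so you can see that it follows a single root-to-leaf path instead of scanning the whole tree.

```python
class BSTNode:
    """Hypothetical minimal node, built by hand for this example."""

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def bst_contains(root, target):
    """Return (found, steps), where steps counts the nodes inspected."""
    node = root
    steps = 0
    while node is not None:
        steps += 1
        if target == node.data:
            return True, steps
        # Smaller values can only be on the left, larger ones on the right
        node = node.left if target < node.data else node.right
    return False, steps


# Hand-built tree that satisfies the BST property:
#         8
#       /   \
#      3     10
#     / \      \
#    1   6      14
root = BSTNode(8,
               BSTNode(3, BSTNode(1), BSTNode(6)),
               BSTNode(10, None, BSTNode(14)))

print(bst_contains(root, 6))  # (True, 3)
print(bst_contains(root, 7))  # (False, 3)
```

On a reasonably balanced tree, the number of steps grows with the height of the tree rather than with the total number of nodes.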

One of the most common use cases for search trees is storing indexes and keys in a database. For example, PostgreSQL backs every primary key with a B-tree index, a balanced generalization of the binary search tree in which a node can hold many keys. The keys in the index point to the corresponding database rows.

## Graphs

We have just seen trees, which are a special case of graphs. Not all graphs are trees. Essentially, a graph is a collection of nodes (or vertices) and edges that connect them. Graphs are a powerful tool that allows us to represent relationships. For example, imagine a map: each node represents a city, and each edge represents a road between two cities.

![](images/usa_map.jpg)


We can use graphs to calculate the shortest path between two cities. Graphs of this kind are one of the main structures behind routing services such as Google Maps.



### Graph characteristics

- Graphs can be directed or undirected: in a directed graph, each edge points from one vertex to another and can only be followed in that direction. In an undirected graph, each edge connects two vertices in both directions.
- Graphs can have cycles (cyclic graph) or not (acyclic graph).
- A graph can be weighted or unweighted. In a weighted graph, each edge is given a numerical weight, such as the distance between two cities.
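These characteristics can be sketched with plain dictionaries. In the example below, the node names, the weights, and the `make_undirected` helper are all made up for illustration; `graph[u][v]` holds the weight of the edge from `u` to `v`.

```python
# Directed, weighted graph: graph[u][v] is the weight of the edge u -> v
# (hypothetical nodes and weights)
directed = {
    "A": {"B": 5, "C": 2},
    "B": {"C": 1},
    "C": {},
}


def make_undirected(graph):
    """Return a copy in which every edge u -> v also exists as v -> u."""
    result = {node: dict(neighbours) for node, neighbours in graph.items()}
    for u, neighbours in graph.items():
        for v, weight in neighbours.items():
            result.setdefault(v, {})[u] = weight
    return result


undirected = make_undirected(directed)

print("A" in directed["B"])  # False: the directed edge only goes from A to B
print(undirected["B"]["A"])  # 5: the undirected edge can be followed both ways
```

An unweighted graph can use the same layout with every weight set to 1, or simply store a set of neighbours per node.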


![](images/DAG.png)

A directed graph with no cycles is called a directed acyclic graph (DAG). One of the most common DAGs you will see is a feedforward neural network, where each node is a neuron.

![](images/Neural_Network.png)

## Representing a graph

In programming, there are two common ways to represent a graph:
1. Adjacency list
2. Adjacency matrix

### Adjacency List
Every vertex stores a list of adjacent vertices. In an undirected graph, an edge is stored twice, once in the list of each of its endpoints.

Let's see an example:

![](images/graph_example.png)


In [7]:
class AdjNode:
    def __init__(self, value):
        self.vertex = value
        self.next = None


class Graph:
    def __init__(self, num):
        self.V = num
        self.graph = [None] * self.V

    # Add edges
    def add_edge(self, s, d):
        node = AdjNode(d)
        node.next = self.graph[s]
        self.graph[s] = node

        node = AdjNode(s)
        node.next = self.graph[d]
        self.graph[d] = node

    # Print the graph
    def print_agraph(self):
        for i in range(self.V):
            print("Vertex " + str(i) + ":", end="")
            temp = self.graph[i]
            while temp:
                print(" -> {}".format(temp.vertex), end="")
                temp = temp.next
            print(" \n")

V = 4

# Create graph and edges
graph = Graph(V)
graph.add_edge(0, 1)
graph.add_edge(0, 2)
graph.add_edge(0, 3)
graph.add_edge(1, 2)

graph.print_agraph()

Vertex 0: -> 3 -> 2 -> 1 

Vertex 1: -> 2 -> 0 

Vertex 2: -> 1 -> 0 

Vertex 3: -> 0 



### Adjacency Matrix

An adjacency matrix is a 2D array of booleans (or 0s and 1s) where a true value at matrix[i][j] represents an edge from node i to node j.

For the example above, the adjacency matrix would look like:

|   | 0 | 1  | 2  | 3  |
|---|---|---|---|---|
| 0 | 0 | 1  | 1  | 1  |
| 1 | 1 | 0  | 1  | 0  |
| 2 | 1 | 1  | 0  | 0  |
| 3 | 1 | 0  | 0  | 0 |

Observe that the matrix is symmetric. This is because the graph is undirected: an edge from i to j is also an edge from j to i. If the graph were directed, the matrix would not necessarily be symmetric.
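As a sketch, the matrix above can be built from the same list of edges used in the adjacency list example. The function names `build_adjacency_matrix` and `is_symmetric` are assumptions made for this example.

```python
def build_adjacency_matrix(num_vertices, edges):
    """Build the adjacency matrix of an undirected graph from (s, d) pairs."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for s, d in edges:
        matrix[s][d] = 1
        matrix[d][s] = 1  # Undirected: store the edge in both directions
    return matrix


def is_symmetric(matrix):
    """Check that matrix[i][j] == matrix[j][i] for every pair (i, j)."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i]
               for i in range(n) for j in range(n))


# The same edges as in the adjacency list example
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
adj = build_adjacency_matrix(4, edges)

for row in adj:
    print(row)
print(is_symmetric(adj))  # True
```

Removing the line marked "Undirected" would turn this into a builder for directed graphs, and `is_symmetric` would then return `False` for these edges.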

Representing a graph this way is not only a visual aid: it also lets us look up the relationship between two nodes without iterating through the whole structure. For example, the adjacency matrix above can be stored as a list of lists:
```
adj_mat = [[0, 1, 1, 1],
           [1, 0, 1, 0],
           [1, 1, 0, 0],
           [1, 0, 0, 0]]
```
We can read the neighbours of node 1 from the row `adj_mat[1]`, and we can check whether nodes 1 and 2 are connected by looking at `adj_mat[1][2]`.

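Both the matrix and the list are used in search algorithms. Adjacency lists are usually more efficient for traversal, since iterating through the neighbours of a node touches only that node's edges, whereas an adjacency matrix requires scanning a full row with one entry per node in the graph. In exchange, the matrix can tell us whether two given nodes are connected with a single lookup.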
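The trade-off can be seen in a small sketch that answers the same questions with both representations of the example graph. The dictionary-based `adj_list` and the function names are assumptions made for this example.

```python
adj_mat = [[0, 1, 1, 1],
           [1, 0, 1, 0],
           [1, 1, 0, 0],
           [1, 0, 0, 0]]

# The same graph as an adjacency list (hypothetical dictionary layout)
adj_list = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}


def neighbours_from_matrix(matrix, node):
    """Scan the whole row: one check per vertex in the graph."""
    return [other for other, connected in enumerate(matrix[node]) if connected]


def neighbours_from_list(adjacency, node):
    """The neighbours are already stored: no scanning needed."""
    return adjacency[node]


def has_edge(matrix, u, v):
    """With a matrix, checking a single edge is one lookup."""
    return matrix[u][v] == 1


print(neighbours_from_matrix(adj_mat, 3))  # [0]
print(neighbours_from_list(adj_list, 3))   # [0]
print(has_edge(adj_mat, 1, 2))             # True
print(has_edge(adj_mat, 2, 3))             # False
```

For node 3, the matrix version inspects four entries to find one neighbour, while the list version returns the stored list directly.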

## Graph Search

The most common ways to search a graph are depth-first search and breadth-first search.

In DFS, we start at an arbitrary node and follow one branch as far as possible before backtracking to explore the next one. That is, we go deep first before we go wide. DFS is often preferred if we want to visit every node.

In BFS, we start at a chosen node and explore each of its neighbours before going on to any of their neighbours. If we want to find the shortest path between two nodes, measured in number of edges, we can use BFS.

![](images/BFS-DFS.png)

# Challenges

## 1. Implement a Binary Search Tree.

Take into account that the tree is a binary tree, and that every value in the left subtree of a node is smaller than or equal to the node's value, while every value in the right subtree is bigger. This rule holds not only for the immediate children, but for the entire tree.



## 2. Number of nodes

Given a tree, find its number of nodes and its maximum depth.


## 3. DFS and BFS Implementation for graphs

Given the implementation of DFS below, implement the BFS algorithm for a graph. Use the following graph as an example:

![](images/graphs_q3.png)

In [84]:
from collections import defaultdict

class Graph:
 
    def __init__(self):

        self.graph = defaultdict(list)
 
    def addEdge(self, u, v):
        self.graph[u].append(v)
 
    def DFSUtil(self, v, visited):
 
        visited.add(v)
        print(v, end=' ')
 
        for neighbour in self.graph[v]:
            if neighbour not in visited:
                self.DFSUtil(neighbour, visited)
 
    def DFS(self, v):
        visited = set()
        self.DFSUtil(v, visited)

g = Graph()
g.addEdge(0, 1)
g.addEdge(0, 2)
g.addEdge(1, 2)
g.addEdge(2, 0)
g.addEdge(2, 3)
g.addEdge(3, 3)
 
print("Following is DFS from (starting from vertex 2)")
g.DFS(2)

Following is DFS from (starting from vertex 2)
2 0 1 3 


## 4. Binary Search?

Given a tree, implement a function that returns whether the tree is a binary search tree.

## 5. Mirrored Tree

Create a function that, given a tree, returns true if it is a mirrored tree.

## 6. Path Sum

Return true if and only if the given binary tree has a root-to-leaf path such that adding up all the values along the path equals the given sum.

# Assessments

1. Look up information on heaps (max heap, min heap).
2. Look up information on n-ary trees.
3. Look up information on tries (prefix trees).
4. Implement in-order, pre-order, and post-order traversals using iterative methods. (Hint: You might need a stack, so you can use `deque` from the `collections` module.)
5. Implement BFS traversal iteratively. (Hint: You might need a queue, so you can use `deque` from the `collections` module.)
6. Look up information on topological sort.
7. Look up information on Dijkstra's algorithm.
8. Look up information on AVL trees.
9. Look up information on Red-Black trees.