## Chapter 4 - Trees and Graphs

In [1]:
class Graph:
    def __init__(self):
        self._graph = {}

    def add_vertex(self,x):
        if x not in self._graph:
            self._graph[x] = []

    def add_vertices(self,vertices):
        for v in vertices:
            self.add_vertex(v)

    def add_edge(self,edge):
        vertex1, vertex2 = tuple(edge)
        # Register both endpoints so that every vertex is a key in the adjacency dict.
        self.add_vertex(vertex1)
        self.add_vertex(vertex2)
        self._graph[vertex1].append(vertex2)

    def add_edges(self,edges):
        for edge in edges:
            self.add_edge(edge)

    def get_edges(self):
        edges = []
        for v in self._graph:
            for neigh in self._graph[v]:
                edges.append((v,neigh))

        return edges

class TreeNode:
    def __init__(self,val):
        self.val = val
        self.left = None
        self.right = None
        self.parent = None


4.1 **Route between nodes** Given a directed graph, design an algorithm to find out whether there is a route between two nodes.

In [2]:
from collections import deque

def is_route(graph,start,end):
    # graph is assumed to be an adjacency dict: node -> list of neighbours
    if start == end:
        return True
    queue = deque([start])
    visited = {start}
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                if neighbour == end:
                    return True
                queue.append(neighbour)
                visited.add(neighbour)
    return False
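
A quick way to sanity-check the search is to run it on a small adjacency dict. The sketch below is self-contained; `has_route` and the `sample` graph are illustrative names that do not appear in the notebook, and the graph is assumed to be a plain dict mapping each node to a list of neighbours.

```python
from collections import deque

def has_route(graph, start, end):
    """Breadth-first search over an adjacency dict (node -> list of neighbours)."""
    if start == end:
        return True
    queue = deque([start])
    visited = {start}
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour == end:
                return True
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return False

# Hypothetical sample graph: a -> b -> c, and d -> a.
sample = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
print(has_route(sample, "a", "c"))  # True
print(has_route(sample, "c", "a"))  # False: edges are directed
```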

4.2 **Minimal Tree**: Given a sorted (increasing order) array with unique integer elements, write an algorithm to create a binary search tree with minimal height.

In [3]:
def bst(array, start, end):
    if start > end:
        return None
    
    mid = (start + end) // 2
    curr = TreeNode(array[mid])
    curr.left = bst(array, start, mid-1)
    curr.right = bst(array, mid+1, end)
    return curr
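
To see that picking the middle element keeps the tree shallow, the following self-contained sketch builds a tree from seven sorted values and measures its height. `Node`, `build_minimal` and `tree_height` are names made up for this illustration.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def build_minimal(array, start, end):
    """Use the middle element as the root so both halves have (almost) equal size."""
    if start > end:
        return None
    mid = (start + end) // 2
    node = Node(array[mid])
    node.left = build_minimal(array, start, mid - 1)
    node.right = build_minimal(array, mid + 1, end)
    return node

def tree_height(node):
    """Number of nodes on the longest root-to-leaf path (0 for an empty tree)."""
    if node is None:
        return 0
    return 1 + max(tree_height(node.left), tree_height(node.right))

values = [1, 2, 3, 4, 5, 6, 7]
root = build_minimal(values, 0, len(values) - 1)
print(root.val, tree_height(root))  # 4 3
```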

4.3 **List of Depths**: Given a binary tree, design an algorithm which creates a linked list of all the nodes at each depth (e.g., if you have a tree with depth D, you'll have D linked lists).

In [4]:
def depths(tree,level,lists):
    if not tree:
        return
    
    lst = []
    if len(lists) == level:
        lst.append(tree)
        lists.append(lst)
    else:
        lst = lists[level]
        lst.append(tree)
        
    depths(tree.left,level+1,lists)
    depths(tree.right,level+1,lists)
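
The same per-level grouping can also be produced iteratively with a breadth-first traversal. This is an alternative sketch, not the recursive method above: `Node` and `levels` are hypothetical names, and plain Python lists of values stand in for the linked lists of the problem statement.

```python
from collections import deque

class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def levels(root):
    """Return one list of values per depth, top to bottom."""
    result = []
    queue = deque([root]) if root else deque()
    while queue:
        level = []
        # Everything currently in the queue belongs to the same depth.
        for _ in range(len(queue)):
            node = queue.popleft()
            level.append(node.val)
            if node.left:
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        result.append(level)
    return result

tree = Node(1, Node(2, Node(4)), Node(3))
print(levels(tree))  # [[1], [2, 3], [4]]
```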

4.4 **Check balanced** Implement a function to check if a binary tree is balanced. For the purpose of this question, a balanced binary tree is defined to be a tree such that the heights of the two subtrees of any node never differ by more than one.

In [7]:
def height(root):
    # An empty tree has height 0.
    if not root:
        return 0
    return 1 + max(height(root.left),height(root.right))

def balanced(root):
    if not root:
        return 0
    
    left = balanced(root.left)
    if left == -1:
        return -1
    
    right = balanced(root.right)
    if right == -1:
        return -1
    
    diff = abs(left - right)
    if diff > 1:
        return -1
    else:
        # Reuse the heights already computed instead of recursing again.
        return 1 + max(left, right)
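
Since the function above reports either a height or the sentinel -1, callers usually want a boolean wrapper around it. A minimal self-contained sketch, with `Node`, `checked_height` and `is_balanced` as illustrative names:

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def checked_height(node):
    """Return the height of the subtree, or -1 if any subtree is unbalanced."""
    if node is None:
        return 0
    left = checked_height(node.left)
    if left == -1:
        return -1
    right = checked_height(node.right)
    if right == -1:
        return -1
    if abs(left - right) > 1:
        return -1
    return 1 + max(left, right)

def is_balanced(root):
    return checked_height(root) != -1

balanced_tree = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))
print(is_balanced(balanced_tree))  # True
print(is_balanced(chain))          # False
```

Each node is visited once, so the check runs in O(N) time.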

4.5 **Validate BST** Implement a function to check if a binary tree is a binary search tree.

In [7]:
def validate_bst(root,low=None,high=None):
    if not root:
        return True
    
    alright = (low is None or root.val > low) and (high is None or root.val < high)
    return alright and validate_bst(root.left,low,root.val) and validate_bst(root.right,root.val,high)

4.6 **Successor** Write an algorithm to find the next node (i.e. in-order successor) of a given node in a binary search tree. You may assume that each node has a link to its parent.

In [8]:
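def successor(node):
    if node.right:
        # Successor is the leftmost node of the right subtree.
        temp = node.right
        while temp.left:
            temp = temp.left
        return temp
    else:
        # Climb until we arrive from a left child; that parent is the successor.
        # Returns None if node is the last node in the in-order traversal.
        child, parent = node, node.parent
        while parent and parent.left is not child:
            child, parent = parent, parent.parent
        return parent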
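
One way to check a successor function is to start at the smallest node and follow successors until none is left; the visited values should come out sorted. The sketch below is self-contained, and `Node`, `bst_insert` and `inorder_successor` are names introduced only for this example.

```python
class Node:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None
        self.parent = None

def bst_insert(root, val):
    """Plain BST insert that also maintains parent links; returns the new node."""
    node = Node(val)
    current = root
    while current is not None:
        side = "left" if val < current.val else "right"
        child = getattr(current, side)
        if child is None:
            setattr(current, side, node)
            node.parent = current
            break
        current = child
    return node

def inorder_successor(node):
    if node.right:
        # Leftmost node of the right subtree.
        node = node.right
        while node.left:
            node = node.left
        return node
    # Climb until we come up from a left child.
    while node.parent and node.parent.left is not node:
        node = node.parent
    return node.parent

root = Node(5)
nodes = {5: root}
for v in [3, 8, 4]:
    nodes[v] = bst_insert(root, v)

visited = []
node = nodes[3]  # smallest value in this tree
while node:
    visited.append(node.val)
    node = inorder_successor(node)
print(visited)  # [3, 4, 5, 8]
```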

4.7 **Build Order** - You are given a list of projects and a list of dependencies. All of a project's dependencies must be built before the project is. Find a build order that will allow the projects to be built. If there is no valid build order, return an error.

In [5]:
class Sol47:

    def __init__(self,graph):
        self.graph = graph

    def build_order(self):
        self.order = []
        self.visited = set()
        self.partial = set()
        for node in self.graph:
            if node not in self.visited:
                if not self.dfs(node):
                    return None
        return self.order[::-1]

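    def dfs(self,node):
        # A node that is still in progress has been reached again: there is a cycle.
        if node in self.partial:
            return False

        self.partial.add(node)
        for neighbour in self.graph[node]:
            if neighbour not in self.visited:
                if not self.dfs(neighbour):
                    return False

        # All descendants are done, so the node is no longer in progress.
        self.partial.discard(node)
        self.visited.add(node)
        self.order.append(node)
        return True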
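
A build order can also be found without recursion by repeatedly taking projects that have no unbuilt dependencies left (often called Kahn's algorithm). This is an alternative to the DFS class above, not a restatement of it. The function name and the `(before, after)` pair format are assumptions made for this sketch.

```python
from collections import deque

def build_order_kahn(projects, dependencies):
    """dependencies is a list of (before, after) pairs: 'before' must be built first.

    Returns a valid order, or None when the dependencies contain a cycle.
    """
    dependents = {p: [] for p in projects}
    indegree = {p: 0 for p in projects}
    for before, after in dependencies:
        dependents[before].append(after)
        indegree[after] += 1

    # Projects with no remaining dependencies can be built right away.
    ready = deque(p for p in projects if indegree[p] == 0)
    order = []
    while ready:
        project = ready.popleft()
        order.append(project)
        for dependent in dependents[project]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                ready.append(dependent)

    # If some project was never released, it is part of a cycle.
    return order if len(order) == len(projects) else None

projects = ["a", "b", "c", "d", "e", "f"]
dependencies = [("a", "d"), ("f", "b"), ("b", "d"), ("f", "a"), ("d", "c")]
print(build_order_kahn(projects, dependencies))
print(build_order_kahn(["x", "y"], [("x", "y"), ("y", "x")]))  # None
```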

4.8 **First Common Ancestor**: Design an algorithm and write code to find the first common ancestor of two nodes in a binary tree. Avoid storing additional nodes in a data structure. NOTE: This is not necessarily a binary search tree.

In [8]:
# INPUT: root, first_node, second_node
# OUTPUT: node that is first common ancestor (or least common ancestor)

# Several approaches are possible. The most obvious one depends on each node having a pointer to its parent.
# From each node, travel upwards to the root and count the steps to get that node's depth.
# Move the deeper node up by the difference in depths, then move both nodes up together until they meet.

# This is not a general solution, since it needs parent pointers. Another option is to count, starting from the root,
# how many of the two target nodes each subtree contains, and to move accordingly until there is one on each side.
# So: start at root, recursively look for 'first_node' and 'second_node' in both subtrees. There are 3 cases:
# - 2 hits on the left: we need to move left.
# - 2 hits on the right: we need to move right.
# - one hit on each side: the current node is the LCA.
# While this top-down approach works for the general case, it is inefficient (up to O(N^2) on a skewed tree),
# because it searches the same nodes over and over again.

# A better approach is to look for the nodes in a bottom-up manner. When we find a node, we propagate it upwards
# until we've found both nodes.
def LCA(root,p,q):
    if not root:
        return None
    
    # Found one of the nodes we are looking for.
    if root == p or root == q:
        return root
    
    # Recursively look for the nodes in both subtrees.
    L = LCA(root.left,p,q)
    R = LCA(root.right,p,q)
    
    # If we have found both, return the current node.
    if L and R:
        return root
    
    # Return either one we have found
    return L or R

# This approach relies on recursively searching a binary tree bottom-up and 'propagating' a result upwards.
# The earlier exercise that checks whether a binary tree is balanced uses a similar pattern: it first recurses into
# both subtrees, the base case (an empty subtree) returns a fixed value, and the results from the two subtrees are
# then combined, with special cases handled explicitly (there, a result of -1 was simply propagated upwards).
# In this exercise we are looking for specific nodes, so the node itself is what gets propagated upwards.
# That is why we can compare the current node with p and q before making the recursive calls. When the decision
# depends on a VALUE computed from the subtrees (such as a height), the recursive calls have to happen first.
# Note that the function assumes both p and q are present in the tree; if only one is, that node is returned.
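
The parent-pointer approach described in the comments at the top of this cell can be sketched as follows. `Node`, `depth` and `lca_with_parents` are illustrative names, and the sketch assumes every node stores a `parent` reference and that both nodes belong to the same tree.

```python
class Node:
    def __init__(self, val, parent=None):
        self.val = val
        self.parent = parent

def depth(node):
    """Number of steps from node up to the root."""
    steps = 0
    while node.parent:
        node = node.parent
        steps += 1
    return steps

def lca_with_parents(p, q):
    depth_p, depth_q = depth(p), depth(q)
    # Lift the deeper node until both are at the same depth.
    while depth_p > depth_q:
        p = p.parent
        depth_p -= 1
    while depth_q > depth_p:
        q = q.parent
        depth_q -= 1
    # Move up in lockstep until the two paths meet.
    while p is not q:
        p = p.parent
        q = q.parent
    return p

root = Node("root")
a = Node("a", root)
b = Node("b", root)
c = Node("c", a)
print(lca_with_parents(c, b).val)  # root
print(lca_with_parents(c, a).val)  # a
```

No extra nodes are stored; only two depth counters are kept, at the cost of requiring parent links.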

4.9 **BST Sequences** - A binary search tree was created by traversing through an array from left to right, and inserting each element. Given a binary search tree with distinct elements, print all possible arrays that could have led to this tree.
EXAMPLE
INPUT: 1<-2->3
Output: {2,1,3},{2,3,1}

In [18]:
# There was a similar problem in the Google foobar challenge. However, it involved counting the number
# of ways one could build a specific binary search tree from an array, not listing all of those arrays.

"""
The idea behind the algorithm is understanding how the binary search tree is constructed, and how that relates to the
positions of the elements in the array.
How do we insert an element in a binary search tree? We recursively search through the tree, smaller values go left,
larger values go right, until we find an empty spot. At that spot we place the current element we are trying to insert.

So if we have a binary search tree with 50 at the root, 50 has to be the first element in the array that constructed
the tree. The elements that follow end up in the left or right subtree of the root. The relative order of an element
from the left subtree and an element from the right subtree doesn't matter; only the order of elements on the same
side of the current root does. So if the root is 50 and we then get {20, 60}, their order is irrelevant. We have
2 possible arrays that can form the tree 20<-50->60: {50,20,60},{50,60,20}.

When we receive elements that fall in the same category relative to the root (smaller, or larger), the order matters.
Given {50,20,10}, we cannot use {50,10,20}: it would produce a different tree, because the left child of the root
would be 10 instead of 20.

So let's say we had the arrays for the left and right subtrees of the root. Call them leftArray20 and rightArray60.
How would we construct the arrays where the root is also included? We need to 'weave' the arrays, and prepend the
value of the root to each combination.

What is 'weaving'? Getting all the possible combinations of the elements of 2 arrays, while still preserving the
relative order of the elements within each array. So for {1,2} and {3,4}, we would get:
{1,2,3,4}, {1,3,2,4}, {1,3,4,2}, {3,1,2,4}, {3,1,4,2}, {3,4,1,2}.
See that the relative order of the elements in the individual arrays is preserved in the combination arrays.

Let's build the weaving function first. It receives 2 arrays as arguments, a 3rd array to store the final results,
and an optional prefix array that can be empty.

What we do is basically chop elements off the front of the first and second arrays and add them to the prefix. When
either the first or the second array is empty, one complete array has been constructed and we add it to the results.
"""

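def weave_lists(first, second, results, prefix=None):
    # Avoid a mutable default argument; start from an empty prefix when none is given.
    if prefix is None:
        prefix = []
    # Base Case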
    # Check if first or second are empty
    if not first or not second:
        # Prepend prefix
        result = prefix[:]
        result.extend(first)
        result.extend(second)

        # Add to final results
        results.append(result)
        return

    # Remove elements and recurse
    # Use backtracking to put items back
    # Always remove first element of array

    headFirst = first.pop(0)
    prefix.append(headFirst)
    weave_lists(first,second,results,prefix)
    prefix.pop()
    first.insert(0,headFirst)

    headSecond = second.pop(0)
    prefix.append(headSecond)
    weave_lists(first,second,results,prefix)
    prefix.pop()
    second.insert(0,headSecond)
    
"""
Now let's focus on using the weaving function to build all possible sequences.
We are going to use a bottom-up approach: we recurse to the bottom of the tree and start from the leaves,
building up the arrays for the subtree rooted at each node.
So initially you would have 1-element arrays for the leaves. The next level up (call it n-1) would use the weaving function
to combine 2 arrays with 1 element each, and prepend the value at that node to each array. And so on.
"""

def allSequences(root):
    # All results stored here
    result = []
    # Base case, no node.
    if not root:
        result.append([])
        return result

    # We are going to use the value of the current node as
    # a prefix to the arrays from the subtrees that are going to be weaved.
    prefix = [root.val]

    # Recurse in subtrees
    leftSequences = allSequences(root.left)
    rightSequences = allSequences(root.right)

    # To try out all possible combinations. Weave together each list from each side.
    for leftSeq in leftSequences:
        for rightSeq in rightSequences:
            # Store weave results
            weaved = []
            weave_lists(leftSeq,rightSeq,weaved,prefix)
            result.extend(weaved)

    return result
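
To make the weaving step concrete, here is a compact variant that builds new lists by slicing instead of popping and re-inserting in place. It is a sketch for illustration rather than the implementation above; `Node`, `weave` and `sequences` are hypothetical names.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def weave(first, second, prefix=()):
    """Return every interleaving of two lists that keeps each list's internal order."""
    if not first or not second:
        return [list(prefix) + first + second]
    take_first = weave(first[1:], second, prefix + (first[0],))
    take_second = weave(first, second[1:], prefix + (second[0],))
    return take_first + take_second

def sequences(root):
    """All insertion orders that produce the BST rooted at root."""
    if root is None:
        return [[]]
    result = []
    for left_seq in sequences(root.left):
        for right_seq in sequences(root.right):
            # The root always comes first; the two sides are woven after it.
            result.extend(weave(left_seq, right_seq, (root.val,)))
    return result

for combo in weave([1, 2], [3, 4]):
    print(combo)
# [1, 2, 3, 4]
# [1, 3, 2, 4]
# [1, 3, 4, 2]
# [3, 1, 2, 4]
# [3, 1, 4, 2]
# [3, 4, 1, 2]

print(sequences(Node(2, Node(1), Node(3))))  # [[2, 1, 3], [2, 3, 1]]
```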

4.10 **Check Subtree** - T1 and T2 are two very large binary trees, with T1 much bigger than T2. Create an algorithm to determine if T2 is a subtree of T1. A tree T2 is a subtree of T1 if there exists a node n in T1 such that the subtree of n is identical to T2. That is, if you cut off the tree at node n, the two trees would be identical.

In [20]:
"""
We must determine whether T1 has a subtree with the same structure and values as T2.
We could serialize both trees with a pre-order traversal that also records a marker for every missing child, and then
look for T2's sequence inside T1's sequence (substring matching, basically). A plain in-order traversal is not enough,
because different trees can produce the same in-order output.

An alternative approach would be to search within the larger tree (T1) looking for a node that matches the root of
T2. Then we would call another method to tell us if the subtree is an exact match. (we would recurse inside the subtree
as well as T2 at the same time.)

Initially let's build the method that recurses through the larger tree looking for a node that matches the value of 
the root of T2.
"""
def check_subtree(t1root,t2root):
    # An empty tree is a subtree of any tree.
    if not t2root:
        return True

    # Base case
    # We've reached the end of the tree and haven't found a match.
    if not t1root:
        return False
    
    # We've found a node with the exact same value as the root of T2.
    if t1root.val == t2root.val:
        # Determine if it is a match, and if so return True
        if is_match(t1root,t2root):
            return True
    
    # Recurse top down
    return check_subtree(t1root.left,t2root) or check_subtree(t1root.right,t2root)

def is_match(t1root,t2root):
    # Both trees ended at the same point, so everything so far has matched.
    if not t1root and not t2root:
        return True
    
    # Only one of the trees ended, so the structures differ.
    if not t1root or not t2root:
        return False
    
    # Values differ, can't be identical subtree
    if t1root.val != t2root.val:
        return False
    
    # Continue to recurse and check top-down for all values to match.
    return is_match(t1root.left,t2root.left) and is_match(t1root.right,t2root.right)
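
The traversal-based idea from the notes at the top of this cell can be sketched like this, assuming a pre-order traversal with an explicit `"#"` marker for every missing child. `Node`, `preorder_tokens` and `contains_tree` are illustrative names, and the sketch assumes no node value is itself the string `"#"`. Tokens are compared as list items rather than as one concatenated string so that a value such as 1 cannot match part of 11.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def preorder_tokens(node, out):
    """Append the pre-order serialization of node to out, marking empty children with '#'."""
    if node is None:
        out.append("#")
        return
    out.append(str(node.val))
    preorder_tokens(node.left, out)
    preorder_tokens(node.right, out)

def contains_tree(t1, t2):
    big, small = [], []
    preorder_tokens(t1, big)
    preorder_tokens(t2, small)
    # Naive sublist search; fine for a sketch, O(len(big) * len(small)) in the worst case.
    width = len(small)
    return any(big[i:i + width] == small for i in range(len(big) - width + 1))

t1 = Node(1, Node(2, Node(4), Node(5)), Node(3))
print(contains_tree(t1, Node(2, Node(4), Node(5))))  # True
print(contains_tree(t1, Node(2, Node(4))))           # False: node 5 is missing
```

This trades extra memory for simplicity, which is worth keeping in mind given that the problem describes both trees as very large.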

4.11 **Random Node** - You are implementing a binary tree class from scratch which, in addition to insert, find, and delete, has a method getRandomNode() which returns a random node from the tree. All nodes should be equally likely to be chosen. Design and implement an algorithm for getRandomNode, and explain how you would implement the rest of the methods.

In [22]:
"""
I need to review probability theory again for this one.
One option would be to build the tree normally and when getRandomNode() is used we would use a traversal algorithm to 
store all the tree values in an array, and then just return a random element. Time and space would both be O(N). 
Solution is too obvious, let's look for something more efficient.
Because we are building the class for the binary tree, we should leverage that and use an internal data structure. 
However, lets say even if we stored the values as an array while building it, and random access would be O(1), delete
would still be O(N).

Well I guess probability theory isn't really necessary, logically every node should have 1/N chances of being selected.
Where N = number of nodes in the tree. This means that when we are at the root, the root should have 1/N chance of being
selected, and going left and right should have probabilities distributed depending on the sizes of the subtrees.

Example: a tree whose left subtree has 8 nodes and whose right subtree has 4 nodes. Logically we should go left more
often than right. How often? We have 13 nodes in total. The root should have probability 1/13, going left 8/13 and
going right 4/13, so they add up to 1.

This means that at each node we draw a random number between 1 and the size of the subtree rooted at that node, and
depending on its value we go left, go right, or choose the current node. We can store the subtree size in each node.
"""
import random
class RandomTree:
    def __init__(self,val):
        self.val = val
        self.size = 1 # the current node counts to the size
        self.left = None
        self.right = None
        
    def insert(self,val):
        if val <= self.val:
            if not self.left:
                self.left = RandomTree(val)
            else:
                self.left.insert(val)
        else:
            if not self.right:
                self.right = RandomTree(val)
            else:
                self.right.insert(val)
        
        self.size += 1
    
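    def find(self,val):
        # Returns the matching node, or None if the value is not in the tree.
        if self.val == val:
            return self
        elif self.val < val:
            return self.right.find(val) if self.right else None
        else:
            return self.left.find(val) if self.left else None
            
    
    def getRandomNode(self):
        leftSize = self.left.size if self.left else 0
        # Uniform index in [0, size): left subtree first, then the current node, then the right subtree.
        index = random.randrange(self.size)
        if index < leftSize:
            return self.left.getRandomNode()
        elif index == leftSize:
            return self
        else:
            return self.right.getRandomNode()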
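
Whether every node really gets probability 1/N can be checked empirically by sampling many times and counting. The sketch below is self-contained; `SizedNode` and `random_node` are names chosen for this example, and a seeded `random.Random` instance is passed in only to make the run repeatable.

```python
import random
from collections import Counter

class SizedNode:
    def __init__(self, val):
        self.val = val
        self.size = 1  # number of nodes in the subtree rooted here
        self.left = None
        self.right = None

    def insert(self, val):
        if val <= self.val:
            if self.left is None:
                self.left = SizedNode(val)
            else:
                self.left.insert(val)
        else:
            if self.right is None:
                self.right = SizedNode(val)
            else:
                self.right.insert(val)
        self.size += 1

    def random_node(self, rng):
        left_size = self.left.size if self.left else 0
        index = rng.randrange(self.size)  # uniform in [0, size)
        if index < left_size:
            return self.left.random_node(rng)
        if index == left_size:
            return self
        return self.right.random_node(rng)

rng = random.Random(0)
root = SizedNode(5)
for v in [3, 8, 1, 4, 9]:
    root.insert(v)

counts = Counter(root.random_node(rng).val for _ in range(6000))
print(sorted(counts.items()))  # each of the 6 values should appear roughly 1000 times
```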

4.12 **Paths with Sum**: You are given a binary tree in which each node contains an integer value (which might be positive or negative). Design an algorithm to count the number of paths that sum to a given value. The path does not need to start or end at the root or a leaf, but it must go downwards (traveling only from parent nodes to child nodes).

In [23]:
"""
I guess a brute-force approach would be to start searching from each node in the tree and sum up every possible 
combination. Basically looking at each possible path, tracking the sum as we go. As soon as we hit a target sum,
we increment a counter.
"""
# Top down brute force approach
"""
A top down approach means operating on the current node first, and then recursing and applying the same operation
to the left and right child. The results from all 3 are then combined, in whatever way the problem requires,
and returned.

Both of these functions use a top down approach. The first one travels through the tree and, at each node,
counts the number of paths starting from that node that sum to the target value, so we get back
a count. Then we need the same count for the entire left subtree and right subtree. Adding them up
gives us the total number of paths in the tree that sum up to the target.

What does the second function do? It also recurses top down through the tree. It also keeps a sum variable which
holds the sum of all the node values we have encountered so far. Basically tracing a path through the entire tree.
At each node it reaches, it adds the nodes value to the running sum (# operates on current node first) and then
determines if the current sum is equal to the target we are looking for. If it is, it increments a variable that
holds the number of paths in the tree starting at that node. Then it computes the number of paths, recursively, for
the left and right tree and returns the total number of paths found.
"""
def brute_force_count(root, target):
    if not root:
        return 0
    
    # Count paths for current node
    paths_root = count_path(root,target,0)
    
    # Recursively do this for left and right node
    left_paths = brute_force_count(root.left,target)
    right_paths = brute_force_count(root.right,target)
    
    return paths_root + left_paths + right_paths

# This will return the number of paths that sum up to target
# STARTING SPECIFICALLY from this node.
def count_path(root,target,current_sum):
    if not root:
        return 0
    
    current_sum += root.val
    
    total_paths = 0
    if current_sum == target:
        total_paths += 1
        
    total_paths += count_path(root.left,target,current_sum)
    total_paths += count_path(root.right,target,current_sum)
    
    return total_paths

# Optimizing the algorithm as current approach is O(N^2)
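
One common way to get this down to O(N) is to keep a table of the running sums seen on the path from the root to the current node. A path ending at the current node sums to the target exactly when some earlier running sum equals `running - target`. The following is a sketch of that technique under the assumption that nodes expose `val`, `left` and `right`; it is not taken from the notebook, and `Node` and `count_paths` are hypothetical names.

```python
from collections import defaultdict

class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def count_paths(root, target):
    # seen[s] = how many prefixes of the current root-to-node path sum to s.
    seen = defaultdict(int)
    seen[0] = 1  # the empty prefix, so paths starting at the root are counted

    def walk(node, running):
        if node is None:
            return 0
        running += node.val
        # Every earlier prefix with sum (running - target) closes a valid path here.
        total = seen.get(running - target, 0)
        seen[running] += 1
        total += walk(node.left, running)
        total += walk(node.right, running)
        # Backtrack: this prefix is not on the path of sibling subtrees.
        seen[running] -= 1
        return total

    return walk(root, 0)

tree = Node(10,
            Node(5, Node(3, Node(3), Node(-2)), Node(2, None, Node(1))),
            Node(-3, None, Node(11)))
print(count_paths(tree, 8))  # 3  (5->3, 5->2->1, -3->11)
```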
