- Resource: The Algorithms
- Table of contents
- Part A - Algorithm
- 1. Recursion
- 2. Sorting
- 3. Analysis of Algorithm
- 4. BFS and DFS
- Part B - Data Structure
- 1. What is Data Structure
- 2. Arrays
- 3. Linked List
- 4. Stacks, Queues and Deques
- 5. Dynamic Arrays
- 6. Tree
- 7. Heaps
- 8. Graph
- General Example: Factorial, Fibonacci Number, Euclidean Algorithm, Extended Euclidean Algorithm
- Reverse Example: Reverse String, Reverse the Linked List
- Base case: a terminating scenario that does not use recursion to produce an answer.
- Recurrence Relation: the relationship between the result of a problem and the result of its subproblems, which reduces all other cases towards the base case.
- Tail Recursion: a recursion where the recursive call is the final instruction in the recursive function, and there should be only one recursive call in the function.
- The benefit of having tail recursion is that it could avoid the accumulation of stack overheads during the recursive calls
- Since in tail recursion, we know that as soon as we return from the recursive call we are going to immediately return as well, so we can skip the entire chain of recursive calls returning and return straight to the original caller. That means we don't need a call stack at all for all of the recursive calls, which saves us space.
- Different from the memoization technique, tail recursion could optimize the space complexity of the algorithm, by eliminating the stack overhead incurred by recursion. More importantly, with tail recursion, one could avoid the problem of stack overflow that comes often with recursion.
- Another advantage of tail recursion is that it is often easier to read and understand than non-tail recursion, because there is no post-call dependency in tail recursion (i.e. the recursive call is the final action in the function), unlike non-tail recursion.
def sum_non_tail_recursion(ls):
    if len(ls) == 0:
        return 0
    # not a tail recursion because it does some computation after the recursive call returned
    return ls[0] + sum_non_tail_recursion(ls[1:])

def sum_tail_recursion(ls, acc=0):
    if len(ls) == 0:
        return acc
    # this is a tail recursion because the final instruction is a recursive call
    return sum_tail_recursion(ls[1:], ls[0] + acc)
- Memoization: a common technique used to avoid the duplicate-calculation problem that can occur with recursion, by caching the results of subproblems.
- Cache `memo` can be passed into the function as an input param:

def climbStairs(n, memo={1: 1, 2: 2}):
    if n not in memo:
        # T(n) = T(n-1) + T(n-2)
        memo[n] = climbStairs(n - 1, memo) + climbStairs(n - 2, memo)
    return memo[n]
- Cache `memo` can also be initialized in the `__init__` of the class:

class Solution(object):
    def __init__(self):
        self.cache = {0: 1, 1: 1}

    def climbStairs(self, n):
        if n not in self.cache:
            self.cache[n] = self.climbStairs(n - 1) + self.climbStairs(n - 2)
        return self.cache[n]
- Execution tree: a tree that is used to denote the execution flow of a recursive function in particular.
- Each node in the tree represents an invocation of the recursive function. Therefore, the total number of nodes in the tree corresponds to the number of recursion calls during the execution.
- For example: Execution tree for the calculation of Fibonacci number f(4).
    - In a full binary tree with n levels, the total number of nodes would be `2^n - 1`. So, the time complexity for f(n) would be `O(2^n)`.
    - With memoization, we save the result of the Fibonacci number for each index `n`. As a result, the recursion to calculate `f(n)` would be invoked `n-1` times to calculate all the preceding numbers that it depends on. So, the time complexity for f(n) would be `O(n)` (see the sketch after this list).
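A minimal sketch (not from the original notes) contrasting the two cases above: the naive recursion re-computes subproblems, while the memoized version caches each index once.

def fib_naive(n):
    # every call spawns two more calls -> O(2^n) nodes in the execution tree
    if n <= 1:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_memo(n, memo={0: 0, 1: 1}):
    # each index is computed once and cached -> O(n) recursive calls
    if n not in memo:
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]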
- Two parts of the space consumption that one should bear in mind when calculating the space complexity of a recursive algorithm: `recursion related` and `non-recursion related` space.
    - The recursion-related space refers to the memory cost that is incurred directly by the recursion, i.e. the stack to keep track of recursive function calls.
    - It is due to recursion-related space consumption that sometimes one might run into a situation called `stack overflow`, where the stack allocated for a program reaches its maximum space limit and the program crashes. Therefore, when designing a recursive algorithm, one should carefully check if there is a possibility of stack overflow when the input scales up (see the sketch below).
    - As suggested by the name, the non-recursion related space refers to the memory space that is not directly related to recursion, which typically includes the space (normally on the heap) that is allocated for global variables.
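As a small illustration of the recursion-related space limit (an assumed example, not from the original notes): Python caps the call-stack depth, so a deep enough recursion raises a RecursionError rather than silently overflowing.

import sys

def depth(n):
    # each call adds one frame to the call stack (recursion-related space)
    if n == 0:
        return 0
    return 1 + depth(n - 1)

print(sys.getrecursionlimit())  # typically 1000 by default
# depth(10**6) would raise RecursionError: maximum recursion depth exceeded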
Theme of recursion: paradigms that are often applied together with recursion to solve some problems.
- Divide and Conquer
- Backtracking
- Divide-and-conquer algorithm: works by recursively breaking the problem down into two or more subproblems of the same or related type, until these subproblems become simple enough to be solved directly. Then one combines the results of subproblems to form the final solution.
- Example: Merge Sort, Quick Sort, Search a 2D Matrix II
- Divide-and-conquer Template: the essential part of divide and conquer is to figure out the `recurrence relationship` between the subproblems and the original problem, which subsequently defines the functions of `divide()` and `combine()`.
def divide_and_conquer( S ):
# (1). Divide the problem into a set of subproblems.
[S1, S2, ... Sn] = divide(S)
# (2). Solve the subproblem recursively,
# obtain the results of subproblems as [R1, R2... Rn].
rets = [divide_and_conquer(Si) for Si in [S1, S2, ... Sn]]
[R1, R2,... Rn] = rets
# (3). combine the results from the subproblems.
# and return the combined result.
return combine([R1, R2,... Rn])
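As a concrete (hypothetical) instance of the template above, finding the maximum of a list: divide splits the list in half, the halves are solved recursively, and combine takes the larger of the two results.

def dac_max(S):
    # base case: a single-element problem is solved directly
    if len(S) == 1:
        return S[0]
    # divide: split the problem into two subproblems
    mid = len(S) // 2
    left, right = S[:mid], S[mid:]
    # conquer the subproblems recursively, then combine the results
    return max(dac_max(left), dac_max(right))

print(dac_max([-4, 2, 1, 3, 5]))  # 5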
- The procedure of backtracking can be viewed as a tree traversal.
- Starting from the root node, one sets out to search for solutions that are located at the leaf nodes.
- Each intermediate node represents a partial candidate solution that could potentially lead us to a final valid solution.
- At each node, we would fan out to move one step further to the final solution, i.e. we iterate the child nodes of the current node.
- Once we can determine if a certain node cannot possibly lead to a valid solution, we abandon the current node and backtrack to its parent node to explore other possibilities.
- It is due to this backtracking behaviour that backtracking algorithms are often much faster than brute-force search, since they eliminate many unnecessary explorations.
- Example: N-Queens II, Robot Room Cleaner, Sudoku Solver, Permutations
- Backtracking Template
def backtrack(candidate):
if find_solution(candidate):
output(candidate)
return
# iterate all possible candidates.
for next_candidate in list_of_candidates:
if is_valid(next_candidate):
# try this partial candidate solution
place(next_candidate)
# given the candidate, explore further.
backtrack(next_candidate)
# backtrack
remove(next_candidate)
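As a sketch (assumed, not the original solution) of how the template maps onto the Permutations example listed above: each step places one unused number, recurses, and then removes it to backtrack.

def permute(nums):
    res, candidate = [], []

    def backtrack():
        if len(candidate) == len(nums):   # find_solution
            res.append(candidate[:])      # output a copy of the candidate
            return
        for x in nums:                    # iterate all possible candidates
            if x not in candidate:        # is_valid: skip numbers already placed
                candidate.append(x)       # place
                backtrack()               # explore further
                candidate.pop()           # backtrack
    backtrack()
    return res

print(permute([1, 2, 3]))  # [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]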
- Time Complexity: `O(n^2)`
def insertion_sort(A):
for i in range(1, len(A)): #i = start from the second the element to End of Array
key = A[i] #Select A[i] as Key, and start to insert from j = i - 1 all the way to 0
j = i - 1
while (j >= 0 and key < A[j]):
A[j+1] = A[j] #if A[j] > Key, swap A[j] to A[j+1] position
j-=1
A[j+1] = key #Place key to the right pos, which is A[j+1]
return A
- Time Complexity: `O(n*log(n))`
def merge_sort(A,lower, upper):
if lower < upper:
mid = (lower+upper)//2
merge_sort(A, lower, mid)
merge_sort(A, mid + 1, upper)
merge(A, lower, mid, upper)
array = [-4,2,1,3,5,3,11,100,7,4,-1]
merge_sort(array, 0, len(array)-1) #[-4, -1, 1, 2, 3, 3, 4, 5, 7, 11, 100]
- For the merge function, we will create two copies of `A[lower, mid]` and `A[mid+1, upper]` into `tmpL` and `tmpR`, then merge them back in sorted order.
- Need to add the stopper `float("inf")` to the end of both `tmpL` and `tmpR`.
def merge(A, lower, mid, upper):
#Step 1: Create Copy of A[lower, mid] to tmpL and A[mid+1, upper] to tmpR
tmpL = [0]*((mid - lower + 1)+1) #mid - lower + 1 = len(tmpL), +1 to add the place for Stopper
for i in range(0, (mid - lower)+1): #len(A[lower, mid]) = mid - lower + 1
tmpL[i] = A[i+lower]
tmpL[mid-lower+1] = float('inf') #add the stopper to the end of tmpL
tmpR = [0]*(upper - mid + 1) #[upper - (mid+1) + 1] + 1 = upper - mid + 1
for j in range(0, upper - mid): #len(A[mid+1, upper]) = upper - (mid+1) + 1 = upper - mid
tmpR[j] = A[j+mid+1]
tmpR[upper - mid] = float('inf') #add the stopper to the end of tmpR
#Step 2: Start to merge tmpL and tmpR back to A[lower, upper]
i, j = 0, 0
for k in range(lower, upper + 1):
if tmpL[i] <= tmpR[j]:
A[k] = tmpL[i]
i+=1
else:
A[k] = tmpR[j]
j+=1
- Depending on the pivot values, the time complexity of the quick sort algorithm can vary from best case `O(N*log(N))` to worst case (already sorted input) `O(N^2)`.
def partition(A, p, r):
pivot = A[r]
i = p - 1 #track A[j] < pivot
for j in range(p, r):
if A[j] <= pivot:
i+=1 #grow Blue
A[i], A[j] = A[j], A[i]
#Swap A[i+1] & A[r] once done
A[i+1], A[r] = A[r], A[i+1]
return i+1
def quicksort(A, p, r):
if p < r:
q = partition(A, p, r)
quicksort(A, p, q - 1)
quicksort(A, q + 1, r)
quicksort(A, 0, len(A) - 1)
- In order to avoid the worst case `O(N^2)` in quick sort, we can use the random sampling technique, which is to pick a random pivot element.
- On average, `randomized_quicksort` will cost `O(N*log(N))`.
import random

def randomized_partition(A, p, r):
i = random.randint(p, r)
A[i], A[r] = A[r], A[i]
#Swap i, which is a random pivot, to last pos (r), and then
#call the standard partition
return partition(A, p, r)
def randomized_quicksort(A, p, r):
if p < r:
q = randomized_partition(A, p, r)
randomized_quicksort(A, p, q - 1)
randomized_quicksort(A, q + 1, r)
- Worst Case: the maximum running time over all inputs of a given size n
- Average Case: difficult to compute, as we need to find the combinations of inputs and then take the average of their run times
- Amortized Case: sampling with replacement
- Asymptotic Order of Growth: know how the function behaves as the input gets larger, towards infinity (i.e: in the limit)
- For example (referring to the plot): the blue algorithm scales better than the red one.
- Note 1: Exponential > (dominates) polynomial > (dominates) logarithmic
    - For example: `n*log(n)` = `O(n^(1.5))` = `O(n*sqrt(n))`, since log(n) grows slower than sqrt(n), so `n*log(n)` will be upper bounded by `n*sqrt(n)`.
    - Generalized: `n*log(n)` = `O(n^(1+k))` for any k > 0, since `log(n)` = `O(n^k)` for any k > 0.
- Note 2: `log(n) < n` → `log(log(n)) < log(n)` holds, as log is a monotonically increasing function.
- Note 3: `n! = O(n^n)`
- Determine which relationship is correct and briefly explain why
- a. f(n) = log(n^2) = 2log(n), so f(n) = Theta(g(n))
- b. g(n) = log(n^2) = 2log(n), so f(n) = Omega(g(n))
- c. log(n) < n => log(log(n)) < log(n), so f(n) = O(g(n))
- d. since (log(n))^2 grows slower than n, f(n) = Omega(g(n))
- e. f(n) = Omega(g(n))
- f. f(n) = Theta(g(n))
- g. f(n) = O(g(n))
- h. f(n) = O(g(n))
- Motivation of Divide and Conquer: break a problem of size n into smaller problems such that, by solving the smaller problems, we can construct a solution to the entire problem:
- Divide
- Solve each of the smaller problems
- Combine solutions
- A recurrence describes a function `T(n)` in terms of its values on smaller inputs `T(m)`, where m < n.
- There are 3 ways to solve a recurrence: the substitution method, the recursion-tree method, and the Master Theorem.
- Example of the Master Theorem:
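The original worked example is not reproduced here; a standard illustration (assumed) is the merge-sort recurrence:
- `T(n) = 2T(n/2) + O(n)`, so a = 2, b = 2, f(n) = n, and `n^(log_b(a)) = n^(log_2(2)) = n`.
- Since f(n) = Theta(n^(log_b(a))), Case 2 of the Master Theorem gives `T(n) = Theta(n*log(n))`, matching merge sort's `O(n*log(n))`.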
- BFS: needs to use a queue, as it processes the nodes in a First-In-First-Out (FIFO) manner.
- Application: do traversal or find the `shortest path from the root node to the target node`
- Round 1: we process the root node
- Round 2: we process the nodes next to the root node
- Round 3: we process the nodes which are two steps from the root node; so on and so forth.
- Similar to tree's level-order traversal, the nodes closer to the root node will be traversed earlier.
- If a target node is added to the queue in the `kth` round, the length of the shortest path between the root node and the target node is exactly `k`.
- That is to say, you are already on the shortest path the first time you find the target node.
- After each outer while loop, we are one step farther from the root node.
- Variable `step` indicates the distance between the root node and the current node we are visiting.
- Template 1 does not need to keep the visited hash set because:
    - There is no cycle, for example, in tree traversal.
    - You do want to add the node to the queue multiple times.
from collections import deque

def BFS(root, target):
    queue = deque()     # store all nodes which are waiting to be processed
    step = 0            # number of steps needed from root to current node
    queue.append(root)  # append root to queue
    while queue:
        step = step + 1
        # iterate the nodes which are already in the queue
        size = len(queue)
        for _ in range(size):
            cur_node = queue.popleft()            # dequeue the first node in queue
            if cur_node == target: return step    # return step if cur is target
            # else: continue to add the neighbors of cur_node to queue
            for next_node in cur_node.neighbors:  # placeholder: iterate the neighbors of cur_node
                queue.append(next_node)
    return -1  # there is no path from root to target
- It is important to make sure that we never visit a node twice. Otherwise, we might get stuck in an infinite loop, e.g. in a graph with a cycle.
- Use a `set`, instead of a `list`, as the `visited` collection for marking nodes that have already been visited. For example: Open the Lock, Number of Islands.
from collections import deque

def BFS(root, target):
    queue = deque()     # store all nodes which are waiting to be processed
    visited = set()     # store all the nodes that we've visited
    step = 0            # number of steps needed from root to current node
    queue.append(root)  # append root to queue
    visited.add(root)   # mark root as visited
    while queue:
        step = step + 1
        # iterate the nodes which are already in the queue
        size = len(queue)
        for _ in range(size):
            cur_node = queue.popleft()              # dequeue the first node in queue
            if cur_node == target: return step      # return step if cur is target
            # else: continue to add the neighbors of cur_node to queue
            for neighbour in cur_node.neighbors:    # placeholder: iterate the neighbors of cur_node
                if neighbour not in visited:        # ensure that the neighbor is not visited
                    visited.add(neighbour)
                    queue.append(neighbour)
    return -1  # there is no path from root to target
- ADTs: designing abstractions
- Data Structures: concrete implementations
- Data Structures are ways to store and organize data to facilitate efficient
- Access(Read)
- Modification(Write)
- An Abstract Data Type (ADT) is a mathematical model of a data structure (the DS is the implementation), which defines
- Types of data stored
- Operations that should be supported (for example: design the DS so that it is easy to insert and delete)
- Parameters for the operations
An array is a contiguous chunk of memory. The computer registers:
- Starting Address: 0x8600 (0x is the prefix for hexadecimal). Address of the nth cell = `starting addr + index*sizeOf(data_type)` (worked example after this list)
- Data Type: of the values stored in the cells
- Size: how many cells there are
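As a quick worked example (assuming 4-byte integers, which the notes do not state explicitly): with starting address 0x8600, the cell at index 3 is at `0x8600 + 3*4 = 0x860C`.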
- Node: a data structure that consists of a value and a pointer to the next node
- Linked List: the parent data structure built from nodes (see the sketch after this list)
- The computer registers: the head of the Linked List
- Advantage: easy to expand
- Disadvantage:
    - (1) Space: half of the space is wasted on the pointer, as we need to store the address of the next value (a modern computer's address is 8 bytes)
    - (2) Slow to retrieve: we need to jump here and there in memory (with an Array you can retrieve directly, e.g. the 30th element = starting addr + 30*4, which is O(1))
- Circular Linked List: a linked list where all nodes are connected to form a circle. There is no NULL at the end.
- Doubly Linked List: usually use both Head & Tail for reversing the list, as Tail will be another Head of the list from the other direction.
- Lookup (e.g. get the i-th element): for a Linked List O(n), as we need to travel node by node from the Head to reach the target.
- `Insert()` and `Delete()`: for an Array O(n), as we need to shift elements when inserting or deleting a cell.
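A minimal sketch (not from the original notes) of a singly linked node and an O(n) traversal from the head:

class Node:
    def __init__(self, val):
        self.val = val    # value stored in this node
        self.next = None  # pointer (reference) to the next node

# the computer only registers the head; every other node is reached by following next
head = Node(1)
head.next = Node(2)
head.next.next = Node(3)

cur = head
while cur:            # O(n) traversal: jump from node to node in memory
    print(cur.val)
    cur = cur.next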
- Deques: since we have 2 ends, we can enqueue & dequeue from both the front and the back
- Stacks: Singly Linked List with `Head`
- Queues: Singly Linked List with `Head` & `Tail`
- Deques: Doubly Linked List with `Head` & `Tail`
- Application: BFS
- One way is to implement a queue using a dynamic array and an index pointing to the `head` of the queue.
- Solution: `Circular queue`. Specifically, we may use a fixed-size array and two pointers to indicate the starting position and the ending position, with the goal of reusing the wasted storage mentioned previously.
- Example: Design Circular Queue. Please refer to the link for the Circular Queue animation (a sketch of the idea follows below).
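A hedged sketch of the circular-queue idea (assumed, not the official solution): a fixed-size array plus a head index and a count, with modular arithmetic to reuse freed cells.

class MyCircularQueue:
    def __init__(self, k):
        self.data = [None] * k  # fixed-size storage
        self.head = 0           # index of the front element
        self.count = 0          # number of elements currently stored
        self.capacity = k

    def enQueue(self, value):
        if self.count == self.capacity:
            return False
        self.data[(self.head + self.count) % self.capacity] = value  # wrap around to reuse cells
        self.count += 1
        return True

    def deQueue(self):
        if self.count == 0:
            return False
        self.head = (self.head + 1) % self.capacity  # advance the head pointer circularly
        self.count -= 1
        return True

    def Front(self):
        return -1 if self.count == 0 else self.data[self.head]

    def Rear(self):
        return -1 if self.count == 0 else self.data[(self.head + self.count - 1) % self.capacity]

    def isEmpty(self):
        return self.count == 0

    def isFull(self):
        return self.count == self.capacity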
- Queue with List: dequeue with `list.pop(0)` (but requiring O(n) time, as all elements need to be shifted) and enqueue with `list.append(item)`
- Queue with Built-in Function:
from collections import deque
q = deque() # Initializing a queue
q.append('a') # Adding elements to a queue
# Removing elements from a queue - only O(1) in compare with using List to implement Queue
q.popleft()
- 5.1.1. Dynamic Arrays: a linked list of arrays
    - Which level? (i.e. which array): `Array_Index = (pos + 1).bit_length() - 1 = h`
    - Which cell? (within that array): `Cell_Index = pos - (2**h - 1)`
    - For example: pos = 5 → pos + 1 = 6 = (110) → bit_length(110) = 3 → Array_Index = bit_length(110) - 1 = 2.
    - Therefore, the element at pos = 5 is in array index 2, and its cell index = 5 - (2**2 - 1) = 2.
def __translate(self, pos):
    h = (pos + 1).bit_length() - 1
    return h, pos + 1 - 2 ** h
- 5.1.2. Number of Arrays: `# of arrays = log(capacity + 1)`
- Capacity = 1 → 1 array
- Capacity = 3 → 2 arrays
- Capacity = 7 → 3 arrays
- 5.1.3. Dynamic Arrays in Python: in Python, the capacity grows by ~1/8 of the current size
def grow(curr_size):
new_allocated = (curr_size >> 3) + (3 if curr_size < 9 else 6)
return curr_size + new_allocated
- Access(i) = O(log(n)): since we first need to travel through at most log(n) levels (the max number of arrays from 5.1.2. Number of Arrays), then O(1) to access the position within that array level.
- Append(i) = O(log(n)): as we need to traverse to the last array in log(n), then O(1) to the last position.
- Delete & Insert = O(n): we need to shift the remaining elements.
- Tree: made up of a single node r (called the `root`) connected to disjoint sets of nodes (called subtrees of r)
- Internal Node: a node with at least one child
- Leaf Node: a node with no children
- From the graph view, a tree can also be defined as a directed acyclic graph which has `N` nodes and `N-1` edges.
- Level of a Node:
- If Root → Level 0
- If not Root → Level of Node = Level of Parent + 1
- Height of the Tree: maximum level of its nodes
- Binary Tree: an empty tree, or a node with at most 2 branches (left and right subtrees)
- Full Binary Tree: either empty, or a node with both left and right subtrees being full binary trees of the same height
    - # of nodes of a full binary tree of height h: `2^(h+1) - 1`
- Complete Binary Tree: Left-justified Tree
- A completely filled tree on all levels except possibly the lowest level, which is filled from the left.
- A complete binary tree has at most one node with only one child.
- Traversal Type
    - `Pre-Order`: Root - Left - Right
    - `In-Order`: Left - Root - Right
class Solution:
    def inorderTraversal(self, root: Optional[TreeNode]) -> List[int]:
        res = []
        self.dfs_inorder(root, res)
        return res

    def dfs_inorder(self, root, res):
        if root:
            self.dfs_inorder(root.left, res)
            res.append(root.val)
            self.dfs_inorder(root.right, res)
    - `Post-Order`: Left - Right - Root
- Note 1: when deleting nodes in a tree, the deletion process will be in post-order. That is to say, when you delete a node, you will delete its left child and its right child before you delete the node itself.
- Note 2: post-order is widely used for mathematical expressions (expression trees).
    - You can retrieve the original expression using the in-order traversal. However, it is not easy for a program to handle this expression since you have to check the priorities of operations.
    - If you handle this tree in post-order, you can easily handle the expression using a stack. Each time you meet an operator, you can just pop 2 elements from the stack, calculate the result and push the result back onto the stack (a sketch follows below).
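A minimal sketch (assumed, not from the original notes) of evaluating a post-order (postfix) expression with a stack: pop two operands per operator and push the result back.

def eval_postfix(tokens):
    stack = []
    for tok in tokens:
        if tok in ("+", "-", "*", "/"):
            b, a = stack.pop(), stack.pop()  # right operand is popped first
            if tok == "+":   stack.append(a + b)
            elif tok == "-": stack.append(a - b)
            elif tok == "*": stack.append(a * b)
            else:            stack.append(a / b)
        else:
            stack.append(int(tok))           # operand: push onto the stack
    return stack.pop()

print(eval_postfix(["3", "4", "+", "2", "*"]))  # (3 + 4) * 2 = 14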
`Breadth-First Search` is an algorithm to traverse the tree level by level.
- We will use a queue to help us do BFS.
def levelOrder(self, root: Optional[TreeNode]) -> List[List[int]]:
    res = []
    if root:
        queue = [root]
        while len(queue) > 0:
            # to keep track of the level, process exactly len(queue) nodes per round
            # at level 0, size = 1; at level 1, size = 2; and so on
            size, cur_level = len(queue), []
            for _ in range(size):
                cur_node = queue.pop(0)
                cur_level.append(cur_node.val)
                if cur_node.left:
                    queue.append(cur_node.left)
                if cur_node.right:
                    queue.append(cur_node.right)
            res.append(cur_level)
    return res
- A `binary search tree` (BST), a special form of a binary tree, satisfies the binary search property:
    - The value in each node must be greater than (or equal to) any values stored in its left subtree.
    - The value in each node must be less than (or equal to) any values stored in its right subtree.
- inorder traversal in BST will be in ascending order
- BSTs support three main operations: search, insertion and deletion.
class Solution:
def searchBST(self, root, val):
if (root):
if root.val == val:
return root
elif root.val > val:
return self.searchBST(root.left, val)
elif root.val < val:
return self.searchBST(root.right, val)
else:
return None
class Solution:
def insertIntoBST(self, root, val):
if not root:
return TreeNode(val) #Base Case: Null Node, then return a new Node with "val"
if root.val < val:
root.right = self.insertIntoBST(root.right, val) #If root.right is Null, so TreeNode(val) will be returned from Base Case & assign to root.right
elif root.val > val:
root.left = self.insertIntoBST(root.left, val) #If root.left is Null, so TreeNode(val) will be returned from Base Case & assign to root.left
return root #return root after insertion
- First, traverse the tree until `root.val == key`.
- Case 0: the node does not have any children, like 1, 8, 11, 14, 6 or 18: then we just delete it and there is nothing else to do.
- Case 1: the node has a left child but no right child (or vice versa), for example 3 or 20. In this case we can just delete this node and connect its parent directly to its child: for example for 3, we put the connection 5->1, and for 20 we put the connection 17->18. Note that the BST property is still fulfilled, because for the parent the whole left subtree is still less than its value and nothing changes for the other nodes.
- Case 2: the node has both children, like 12, 5, 7, 9 or 15. In this case we cannot just delete it.
    - Let us consider node 5. We want to find the successor of this node: the node with the next value. To do this we go one step to the right and then as far left as possible.
    - For node 5 our successor will be 6: we go 5->7->6.
    - How can we delete node 5 now? We swap nodes 5 and 6 (or just put the value 6 into 5), and then we need to deal with the new tree, where we need to delete the node that now holds the duplicated value.
    - How to do it? Note that this node does not have a left child, so deleting it falls under Case 0 or Case 1, which we already know how to solve.
class Solution:
def deleteNode(self, root, key):
if not root: return None #Base Case
if root.val > key:
root.left = self.deleteNode(root.left, key)
elif root.val < key:
root.right = self.deleteNode(root.right, key)
else: #root.val = key
#Case 0: node has no child
if not root.left and not root.right: return None
#Case 1: node has 1 child
if not root.left: return root.right #if no left child, so right child will replace root
if not root.right: return root.left #if no right child, so left child will replace root
#Case 2: node has 2 child, will replace root with its successor
if root.left and root.right:
#root's succssor = left-most child of the root's right sub-tree
temp = root.right
while(temp.left):
temp = temp.left
#replace root with it successor's val
root.val = temp.val
#delete root's succssor: Since root's succssor will have no left child
#so delete it will fall under case 0, or case 1 only
root.right = self.deleteNode(root.right, temp.val)
return root
Definition: a binary heap is a complete binary tree with the following binary heap property:
- (1) if the key at a node is greater than or equal to the key of its parent, we call it a min-heap.
- (2) if the key at a node is smaller than or equal to the key of its parent, we call it a max-heap.
- A heap can be represented by a list with indices starting from 1.
- Example: the max-heap in the previous example can be represented by the list `[None, 6, 5, 3, 4, 2, 1]`.
Given a key at position `i`:
- Position of the left child: `2*i`
- Position of the right child: `2*i + 1`
- Position of the parent: `i//2`
- Example: i = 4, value = 17 → parent = `i//2` = 2, left child = `2*i` = 8, right child = `2*i + 1` = 9.
- Height of the heap = `floor(log(n)) + 1`
    - if the heap has 1 node, its height will be 1
    - if the heap has 2 to 3 nodes, its height will be 2
    - if the heap has 4 to 7 nodes, its height will be 3
    - ...
    - if the heap has 2^i to 2^(i+1) - 1 nodes, its height will be i + 1
Since a heap is a complete binary tree:
- # of Internal Nodes = `floor(n/2)` or `n//2`
- # of Leaves = `# of internal nodes` or `# of internal nodes + 1`
- `heapify`: convert an arbitrary list → a list that satisfies the max-heap property. For example: `[None, 1, 2, 3]` → `[None, 3, 2, 1]`
- `insert`: insert an element into the list without violating the max-heap property.
- `pop`: retrieve the maximum element of the list.
- Time Complexity of Heapify: `O(n)` (see proof below)
- Time Complexity of Heapify_at(i): `O(log(n))`, since in the worst case we need to shift down from root to leaf = all levels (h = log(n))
- Step 1: Start from the last internal node
    - For a heap of size `n`, there are exactly `n//2` internal nodes, so we can start checking the max-heap property from the last internal node, with index `n//2`, down to the root node (index 1).
def heapify (self):
for i in range (self.__size // 2, 0, -1):
self.__heapify_at(i)
- Step 2: Execute heapify_at(i) for all the internal nodes
    - What the heapify_at(i) function does is make sure that every sub-heap rooted at the element at position i satisfies the heap property, i.e. the key at the root is not smaller than any other key in the sub-heap.
    - try_swap function: if the child key is greater than the parent key, we swap the two keys and return True to indicate a successful swap; otherwise it returns False. heapify_at is notified by the returned True/False and decides whether it is necessary to heapify the sub-heap rooted at the child, since its key has become smaller.
def __heapify_at(self, i):
    if 2*i > self.__size:
        # Base case: there is no left or right child, i.e. it is a leaf node
        return
    elif 2*i == self.__size:
        # Case 1: internal node has only one child, i.e. the left child (2*i) is at the end of the list
        if self.__try_swap(i, 2*i):
            self.__heapify_at(2*i)
    elif self.__content[2*i] > self.__content[2*i+1]:
        # Case 2.1: internal node has 2 children & the left child is larger
        if self.__try_swap(i, 2*i):
            self.__heapify_at(2*i)      # swap i with its left child and heapify the left sub-heap
    else:
        # Case 2.2: internal node has 2 children & the right child is larger
        if self.__try_swap(i, 2*i+1):
            self.__heapify_at(2*i+1)    # swap i with its right child and heapify the right sub-heap
def __try_swap(self, i, j):
    if self.__content[i] >= self.__content[j]:
        # parent >= child, no swap needed
        return False
    else:
        # parent < child, swap them
        self.__content[i], self.__content[j] = self.__content[j], self.__content[i]
        return True
- Time Complexity of Insertion: `O(log(n))`, since in the worst case we need to shift up from leaf to root = all levels (h = log(n))
- insert is to append a new element at the end of the list.
    - After we have inserted this new element, we have to check if it is larger than its parent;
    - if yes, we swap it with the parent, and repeat until the next parent is larger.
    - The try_swap function is used here to break the loop when the parent contains a larger key.
def insert (self, k):
self.__size += 1
i = self.__size
self.__content[i] = k
while i > 1 and self.__try_swap(i // 2, i):
i = i // 2
- Time Complexity of Pop: `O(log(n))`, the same as T(heapify_at(root))
- pop is to return the maximum key in the heap, which is contained in the root of the heap, at index 1.
    - However, we cannot leave the heap without its root.
    - So the last element of the heap is now put at the root, and
    - this is followed by heapify_at to maintain the heap property.
def pop (self):
if self.__size == 0:
return None
else:
k = self.__content[1]
self.__content[1], self.__content[self.__size] = self.__content[self.__size], None
self.__size -= 1
self.__heapify_at(1)
return k
Question: For a priority queue, where does the saving come from when using heap extract and insert instead of merge sort?
- The priority queue problem is to extract the maximum.
- Using a heap, we can use heapify, which takes `O(n)`, to maintain the max-heap, say `10 9 7 8 4 5 3` (this is a partially sorted order) → then we can extract the maximum `10`.
- Meanwhile, using merge sort, which takes `O(nlogn)`, we maintain a totally sorted order, say `10 9 8 7 5 4 3` → then we can extract the maximum `10`.
- Answer: it depends on the use-case; for this use-case we only need to maintain a partially sorted order, so using a heap `O(n)` is better than merge sort `O(nlogn)`.
- Maximum number of edges in an un-directed graph: `n(n-1)/2`
    - In an undirected graph, each edge is specified by its two endpoints and order doesn't matter. The number of edges is therefore the number of subsets of size 2 chosen from the set of vertices. Since the set of vertices has size n, the number of such subsets is given by the binomial coefficient C(n,2) (also known as "n choose 2"). Using the formula for binomial coefficients, C(n,2) = n(n-1)/2.
- Maximum number of edges in a directed graph: `n(n-1)`
    - Each edge is specified by its start vertex and end vertex. There are n choices for the start vertex. Since there are no self-loops, there are n-1 choices for the end vertex. Multiplying these together, we have n(n-1) possible choices.
- Dense Graph: `|E| = O(|V|^2)`
- Sparse Graph: `|E| = O(|V|)`
- Main difference between BFS/DFS on a graph vs. on a tree: we have to mark nodes that have already been searched (see the DFS sketch below).
- Time Complexity:
    - Search on a Tree: `O(n)`, where n is the number of nodes
    - Search on a Graph: `O(n+e)`, where n is the number of nodes and e is the number of edges
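A minimal DFS sketch (assumed, not from the original notes) on a graph stored as an adjacency list; like BFS on a graph, it must mark visited nodes so cycles do not cause infinite loops, and it touches each node and edge once, i.e. O(n + e).

def dfs(graph, node, visited=None):
    if visited is None:
        visited = set()
    visited.add(node)                  # mark the node as searched
    for neighbour in graph[node]:
        if neighbour not in visited:   # skip nodes that were already searched
            dfs(graph, neighbour, visited)
    return visited

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(dfs(graph, 0))  # {0, 1, 2, 3}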