- Resource: The Algorithms
- Table of contents
- Part A - Algorithm
- 1. Recursion
- 2. Sorting
- 3. Analysis of Algorithm
- 4. BFS and DFS
- Part B - Data Structure
- 1. What is Data Structure
- 2. Arrays
- 3. Linked List
- 4. Stacks, Queues and Deques
- 5. Dynamic Arrays
- 6. Tree
- 7. Heaps
- 8. Graph
- General Example: Factorial, Fibonacci Number, Euclidean Algorithm, Extended Euclidean Algorithm
- Reverse Example: Reverse String, Reverse the Linked List
- Base case: a terminating scenario that does not use recursion to produce an answer.
- Recurrence Relation: the relationship between the result of a problem and the result of its subproblems, which reduces all other cases towards the base case.
- Tail Recursion: a recursion where the recursive call is the final instruction in the recursive function, and there should be only one recursive call in the function.
- The benefit of having tail recursion is that it could avoid the accumulation of stack overheads during the recursive calls
- Since in tail recursion, we know that as soon as we return from the recursive call we are going to immediately return as well, so we can skip the entire chain of recursive calls returning and return straight to the original caller. That means we don't need a call stack at all for all of the recursive calls, which saves us space.
- Different from the memoization technique, tail recursion could optimize the space complexity of the algorithm, by eliminating the stack overhead incurred by recursion. More importantly, with tail recursion, one could avoid the problem of stack overflow that comes often with recursion.
- Another advantage of tail recursion is that it is often easier to read and understand than non-tail recursion, because there is no post-call dependency in tail recursion (i.e. the recursive call is the final action in the function), unlike non-tail recursion.
def sum_non_tail_recursion(ls):
    if len(ls) == 0:
        return 0
    # not a tail recursion because it does some computation after the recursive call returned
    return ls[0] + sum_non_tail_recursion(ls[1:])

def sum_tail_recursion(ls, acc=0):
    if len(ls) == 0:
        return acc
    # this is a tail recursion because the final instruction is a recursive call
    return sum_tail_recursion(ls[1:], ls[0] + acc)
- Memoization: a common technique used to avoid the duplicate-calculation problem that can occur with recursion, by caching the results of subproblems.
- Cache `memo` can be passed into the function as an input param:

def climbStairs(n, memo={1: 1, 2: 2}):
    if n not in memo:
        # T(n) = T(n-1) + T(n-2)
        memo[n] = climbStairs(n - 1, memo) + climbStairs(n - 2, memo)
    return memo[n]
- Cache `memo` can also be initialized in the `__init__` of the class:

class Solution(object):
    def __init__(self):
        self.cache = {0: 1, 1: 1}

    def climbStairs(self, n):
        if n not in self.cache:
            self.cache[n] = self.climbStairs(n - 1) + self.climbStairs(n - 2)
        return self.cache[n]
- Execution tree: a tree that is used to denote the execution flow of a recursive function in particular.
- Each node in the tree represents an invocation of the recursive function. Therefore, the total number of nodes in the tree corresponds to the number of recursion calls during the execution.
- For example: Execution tree for the calculation of Fibonacci number f(4).
    - In a full binary tree with n levels, the total number of nodes would be `2^n - 1`. So, the time complexity for f(n) would be `O(2^n)`.
    - With memoization, we save the result of the Fibonacci number for each index `n`. As a result, the recursion to calculate `f(n)` would be invoked `n-1` times to calculate all the preceding numbers that it depends on. So, the time complexity for f(n) would be `O(n)` (see the sketch after this list).
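A minimal sketch (not from the original notes) contrasting the two cases above: the naive recursion re-computes subproblems, while the memoized version caches each index once.

def fib_naive(n):
    # every call spawns two more calls -> O(2^n) nodes in the execution tree
    if n <= 1:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_memo(n, memo={0: 0, 1: 1}):
    # each index is computed once and cached -> O(n) recursive calls
    if n not in memo:
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]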
- Two parts of the space consumption that one should bear in mind when calculating the space complexity of a recursive algorithm: `recursion related` and `non-recursion related` space.
    - The recursion-related space refers to the memory cost that is incurred directly by the recursion, i.e. the stack to keep track of recursive function calls.
    - It is due to recursion-related space consumption that sometimes one might run into a situation called `stack overflow`, where the stack allocated for a program reaches its maximum space limit and the program crashes. Therefore, when designing a recursive algorithm, one should carefully check if there is a possibility of stack overflow when the input scales up (see the sketch below).
    - As suggested by the name, the non-recursion related space refers to the memory space that is not directly related to recursion, which typically includes the space (normally on the heap) that is allocated for global variables.
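As a small illustration of the recursion-related space limit (an assumed example, not from the original notes): Python caps the call-stack depth, so a deep enough recursion raises a RecursionError rather than silently overflowing.

import sys

def depth(n):
    # each call adds one frame to the call stack (recursion-related space)
    if n == 0:
        return 0
    return 1 + depth(n - 1)

print(sys.getrecursionlimit())  # typically 1000 by default
# depth(10**6) would raise RecursionError: maximum recursion depth exceeded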
Theme of recursion: paradigms that are often applied together with recursion to solve some problems.
- Divide and Conquer
- Backtracking
- Divide-and-conquer algorithm: works by recursively breaking the problem down into two or more subproblems of the same or related type, until these subproblems become simple enough to be solved directly. Then one combines the results of subproblems to form the final solution.
- Example: Merge Sort, Quick Sort, Search a 2D Matrix II
- Divide-and-conquer Template: the essential part of divide and conquer is to figure out the `recurrence relationship` between the subproblems and the original problem, which subsequently defines the functions of `divide()` and `combine()`.
def divide_and_conquer( S ):
# (1). Divide the problem into a set of subproblems.
[S1, S2, ... Sn] = divide(S)
# (2). Solve the subproblem recursively,
# obtain the results of subproblems as [R1, R2... Rn].
rets = [divide_and_conquer(Si) for Si in [S1, S2, ... Sn]]
[R1, R2,... Rn] = rets
# (3). combine the results from the subproblems.
# and return the combined result.
return combine([R1, R2,... Rn])
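As a concrete (hypothetical) instance of the template above, finding the maximum of a list: divide splits the list in half, the halves are solved recursively, and combine takes the larger of the two results.

def dac_max(S):
    # base case: a single-element problem is solved directly
    if len(S) == 1:
        return S[0]
    # divide: split the problem into two subproblems
    mid = len(S) // 2
    left, right = S[:mid], S[mid:]
    # conquer the subproblems recursively, then combine the results
    return max(dac_max(left), dac_max(right))

print(dac_max([-4, 2, 1, 3, 5]))  # 5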
- The procedure of backtracking can be viewed as a tree traversal.
- Starting from the root node, one sets out to search for solutions that are located at the leaf nodes.
- Each intermediate node represents a partial candidate solution that could potentially lead us to a final valid solution.
- At each node, we would fan out to move one step further to the final solution, i.e. we iterate the child nodes of the current node.
- Once we can determine if a certain node cannot possibly lead to a valid solution, we abandon the current node and backtrack to its parent node to explore other possibilities.
- It is due to this backtracking behaviour that backtracking algorithms are often much faster than brute-force search, since they eliminate many unnecessary explorations.
- Example: N-Queens II, Robot Room Cleaner, Sudoku Solver, Permutations
- Backtracking Template
def backtrack(candidate):
if find_solution(candidate):
output(candidate)
return
# iterate all possible candidates.
for next_candidate in list_of_candidates:
if is_valid(next_candidate):
# try this partial candidate solution
place(next_candidate)
# given the candidate, explore further.
backtrack(next_candidate)
# backtrack
remove(next_candidate)
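As a sketch (assumed, not the original solution) of how the template maps onto the Permutations example listed above: each step places one unused number, recurses, and then removes it to backtrack.

def permute(nums):
    res, candidate = [], []

    def backtrack():
        if len(candidate) == len(nums):   # find_solution
            res.append(candidate[:])      # output a copy of the candidate
            return
        for x in nums:                    # iterate all possible candidates
            if x not in candidate:        # is_valid: skip numbers already placed
                candidate.append(x)       # place
                backtrack()               # explore further
                candidate.pop()           # backtrack
    backtrack()
    return res

print(permute([1, 2, 3]))  # [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]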
- Time Complexity: `O(n^2)`
def insertion_sort(A):
for i in range(1, len(A)): #i = start from the second the element to End of Array
key = A[i] #Select A[i] as Key, and start to insert from j = i - 1 all the way to 0
j = i - 1
while (j >= 0 and key < A[j]):
A[j+1] = A[j] #if A[j] > Key, swap A[j] to A[j+1] position
j-=1
A[j+1] = key #Place key to the right pos, which is A[j+1]
return A
- Time Complexity: `O(n*log(n))`
def merge_sort(A,lower, upper):
if lower < upper:
mid = (lower+upper)//2
merge_sort(A, lower, mid)
merge_sort(A, mid + 1, upper)
merge(A, lower, mid, upper)
array = [-4,2,1,3,5,3,11,100,7,4,-1]
merge_sort(array, 0, len(array)-1) #[-4, -1, 1, 2, 3, 3, 4, 5, 7, 11, 100]
- For the merge function, we will create two copies of `A[lower, mid]` and `A[mid+1, upper]` into `tmpL` and `tmpR`, then merge them back in sorted order.
- Need to add the stopper `float("inf")` to the end of both `tmpL` and `tmpR`.
def merge(A, lower, mid, upper):
#Step 1: Create Copy of A[lower, mid] to tmpL and A[mid+1, upper] to tmpR
tmpL = [0]*((mid - lower + 1)+1) #mid - lower + 1 = len(tmpL), +1 to add the place for Stopper
for i in range(0, (mid - lower)+1): #len(A[lower, mid]) = mid - lower + 1
tmpL[i] = A[i+lower]
tmpL[mid-lower+1] = float('inf') #add the stopper to the end of tmpL
tmpR = [0]*(upper - mid + 1) #[upper - (mid+1) + 1] + 1 = upper - mid + 1
for j in range(0, upper - mid): #len(A[mid+1, upper]) = upper - (mid+1) + 1 = upper - mid
tmpR[j] = A[j+mid+1]
tmpR[upper - mid] = float('inf') #add the stopper to the end of tmpR
#Step 2: Start to merge tmpL and tmpR back to A[lower, upper]
i, j = 0, 0
for k in range(lower, upper + 1):
if tmpL[i] <= tmpR[j]:
A[k] = tmpL[i]
i+=1
else:
A[k] = tmpR[j]
j+=1
- Depending on the pivot values, the time complexity of the quick sort algorithm can vary from best case `O(N*log(N))` to worst case (already sorted input) `O(N^2)`.
def partition(A, p, r):
pivot = A[r]
i = p - 1 #track A[j] < pivot
for j in range(p, r):
if A[j] <= pivot:
i+=1 #grow Blue
A[i], A[j] = A[j], A[i]
#Swap A[i+1] & A[r] once done
A[i+1], A[r] = A[r], A[i+1]
return i+1
def quicksort(A, p, r):
if p < r:
q = partition(A, p, r)
quicksort(A, p, q - 1)
quicksort(A, q + 1, r)
quicksort(A, 0, len(A) - 1)
- In order to avoid the worst case `O(N^2)` in quick sort, we can use the random sampling technique, which is to pick a random pivot element.
- On average, `randomized_quicksort` will cost `O(N*log(N))`.
import random

def randomized_partition(A, p, r):
i = random.randint(p, r)
A[i], A[r] = A[r], A[i]
#Swap i, which is a random pivot, to last pos (r), and then
#call the standard partition
return partition(A, p, r)
def randomized_quicksort(A, p, r):
if p < r:
q = randomized_partition(A, p, r)
randomized_quicksort(A, p, q - 1)
randomized_quicksort(A, q + 1, r)
- Worst Case: the maximum running time over all inputs of a given size n
- Average Case: difficult to compute, as we need to find the combinations of inputs and then take the average of their run times
- Amortized Case: sampling with replacement
- Asymptotic Order of Growth: know how the function behaves as the input gets larger, towards infinity (i.e: in the limit)
- For example (referring to the plot): the blue algorithm scales better than the red one.
- Note 1: Exponential > (dominates) polynomial > (dominates) logarithmic
    - For example: `n*log(n)` = `O(n^(1.5))` = `O(n*sqrt(n))`, since log(n) grows slower than sqrt(n), so `n*log(n)` will be upper bounded by `n*sqrt(n)`.
    - Generalized: `n*log(n)` = `O(n^(1+k))` for any k > 0, since `log(n)` = `O(n^k)` for any k > 0.
- Note 2: `log(n) < n` → `log(log(n)) < log(n)` holds, as log is a monotonically increasing function.
- Note 3: `n! = O(n^n)`
- Determine which relationship is correct and briefly explain why
- a. f(n) = log(n^2) = 2log(n), so f(n) = Theta(g(n))
- b. g(n) = log(n^2) = 2log(n), so f(n) = Omega(g(n))
- c. log(n) < n => log(log(n)) < log(n), so f(n) = O(g(n))
- d. since (log(n))^2 grows slower than n, f(n) = Omega(g(n))
- e. f(n) = Omega(g(n))
- f. f(n) = Theta(g(n))
- g. f(n) = O(g(n))
- h. f(n) = O(g(n))
- Motivation of Divide and Conquer: break a problem of size n into smaller problems such that, by solving the smaller problems, we can construct a solution to the entire problem:
- Divide
- Solve each of the smaller problems
- Combine solutions
- A recurrence describes a function `T(n)` in terms of its values on smaller inputs `T(m)`, where m < n.
- There are 3 ways to solve a recurrence: the substitution method, the recursion-tree method, and the Master Theorem.
- Example of the Master Theorem:
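The original worked example is not reproduced here; a standard illustration (assumed) is the merge-sort recurrence:
- `T(n) = 2T(n/2) + O(n)`, so a = 2, b = 2, f(n) = n, and `n^(log_b(a)) = n^(log_2(2)) = n`.
- Since f(n) = Theta(n^(log_b(a))), Case 2 of the Master Theorem gives `T(n) = Theta(n*log(n))`, matching merge sort's `O(n*log(n))`.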
- BFS: needs to use a queue, as it processes the nodes in a First-In-First-Out (FIFO) manner.
- Application: do traversal or find the `shortest path from the root node to the target node`
- Round 1: we process the root node
- Round 2: we process the nodes next to the root node
- Round 3: we process the nodes which are two steps from the root node; so on and so forth.
- Similar to tree's level-order traversal, the nodes closer to the root node will be traversed earlier.
- If a target node is added to the queue in the `kth` round, the length of the shortest path between the root node and the target node is exactly `k`.
- That is to say, you are already on the shortest path the first time you find the target node.
- After each outer while loop, we are one step farther from the root node.
- Variable `step` indicates the distance between the root node and the current node we are visiting.
- Template 1 does not need to keep the visited hash set because:
    - There is no cycle, for example, in tree traversal.
    - You do want to add the node to the queue multiple times.
from collections import deque

def BFS(root, target):
    queue = deque()     # store all nodes which are waiting to be processed
    step = 0            # number of steps needed from root to current node
    queue.append(root)  # append root to queue
    while queue:
        step = step + 1
        # iterate the nodes which are already in the queue
        size = len(queue)
        for _ in range(size):
            cur_node = queue.popleft()            # dequeue the first node in queue
            if cur_node == target: return step    # return step if cur is target
            # else: continue to add the neighbors of cur_node to queue
            for next_node in cur_node.neighbors:  # placeholder: iterate the neighbors of cur_node
                queue.append(next_node)
    return -1  # there is no path from root to target
- It is important to make sure that we never visit a node twice. Otherwise, we might get stuck in an infinite loop, e.g. in a graph with a cycle.
- Use a `set`, instead of a `list`, as the `visited` collection for marking nodes that have already been visited. For example: Open the Lock, Number of Islands.
from collections import deque

def BFS(root, target):
    queue = deque()     # store all nodes which are waiting to be processed
    visited = set()     # store all the nodes that we've visited
    step = 0            # number of steps needed from root to current node
    queue.append(root)  # append root to queue
    visited.add(root)   # mark root as visited
    while queue:
        step = step + 1
        # iterate the nodes which are already in the queue
        size = len(queue)
        for _ in range(size):
            cur_node = queue.popleft()              # dequeue the first node in queue
            if cur_node == target: return step      # return step if cur is target
            # else: continue to add the neighbors of cur_node to queue
            for neighbour in cur_node.neighbors:    # placeholder: iterate the neighbors of cur_node
                if neighbour not in visited:        # ensure that the neighbor is not visited
                    visited.add(neighbour)
                    queue.append(neighbour)
    return -1  # there is no path from root to target
- ADTs: designing abstractions
- Data Structures: concrete implementations
- Data Structures are ways to store and organize data to facilitate efficient
- Access(Read)
- Modification(Write)
- An Abstract Data Type (ADT) is a mathematical model of a data structure (the DS is the implementation), which defines
- Types of data stored
- Operations that should be supported (for example: design the DS so that it is easy to insert and delete)
- Parameters for the operations
An array is a contiguous chunk of memory. The computer registers:
- Starting Address: 0x8600 (0x is the prefix for hexadecimal). Address of the nth cell = `starting addr + index*sizeOf(data_type)` (worked example after this list)
- Data Type: of the values stored in the cells
- Size: how many cells there are
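As a quick worked example (assuming 4-byte integers, which the notes do not state explicitly): with starting address 0x8600, the cell at index 3 is at `0x8600 + 3*4 = 0x860C`.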
- Node: a data structure that consists of a value and a pointer to the next node
- Linked List: the parent data structure built from nodes (see the sketch after this list)
- The computer registers: the head of the Linked List
- Advantage: easy to expand
- Disadvantage:
    - (1) Space: half of the space is wasted on the pointer, as we need to store the address of the next value (a modern computer's address is 8 bytes)
    - (2) Slow to retrieve: we need to jump here and there in memory (with an Array you can retrieve directly, e.g. the 30th element = starting addr + 30*4, which is O(1))
- Circular Linked List: a linked list where all nodes are connected to form a circle. There is no NULL at the end.
- Doubly Linked List: usually use both Head & Tail for reversing the list, as Tail will be another Head of the list from the other direction.
- Lookup (e.g. get the i-th element): for a Linked List O(n), as we need to travel node by node from the Head to reach the target.
- `Insert()` and `Delete()`: for an Array O(n), as we need to shift elements when inserting or deleting a cell.
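A minimal sketch (not from the original notes) of a singly linked node and an O(n) traversal from the head:

class Node:
    def __init__(self, val):
        self.val = val    # value stored in this node
        self.next = None  # pointer (reference) to the next node

# the computer only registers the head; every other node is reached by following next
head = Node(1)
head.next = Node(2)
head.next.next = Node(3)

cur = head
while cur:            # O(n) traversal: jump from node to node in memory
    print(cur.val)
    cur = cur.next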
- Deques: since we have 2 ends, we can enqueue & dequeue from both the front and the back
- Stacks: Singly Linked List with `Head`
- Queues: Singly Linked List with `Head` & `Tail`
- Deques: Doubly Linked List with `Head` & `Tail`
- Application: BFS
- One way is to implement a queue using a dynamic array and an index pointing to the `head` of the queue.
- Solution: `Circular queue`. Specifically, we may use a fixed-size array and two pointers to indicate the starting position and the ending position, with the goal of reusing the wasted storage mentioned previously.
- Example: Design Circular Queue. Please refer to the link for the Circular Queue animation (a sketch of the idea follows below).
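A hedged sketch of the circular-queue idea (assumed, not the official solution): a fixed-size array plus a head index and a count, with modular arithmetic to reuse freed cells.

class MyCircularQueue:
    def __init__(self, k):
        self.data = [None] * k  # fixed-size storage
        self.head = 0           # index of the front element
        self.count = 0          # number of elements currently stored
        self.capacity = k

    def enQueue(self, value):
        if self.count == self.capacity:
            return False
        self.data[(self.head + self.count) % self.capacity] = value  # wrap around to reuse cells
        self.count += 1
        return True

    def deQueue(self):
        if self.count == 0:
            return False
        self.head = (self.head + 1) % self.capacity  # advance the head pointer circularly
        self.count -= 1
        return True

    def Front(self):
        return -1 if self.count == 0 else self.data[self.head]

    def Rear(self):
        return -1 if self.count == 0 else self.data[(self.head + self.count - 1) % self.capacity]

    def isEmpty(self):
        return self.count == 0

    def isFull(self):
        return self.count == self.capacity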
- Queue with List: dequeue with `list.pop(0)` (but requiring O(n) time, as all elements need to be shifted) and enqueue with `list.append(item)`
- Queue with Built-in Function:
from collections import deque
q = deque() # Initializing a queue
q.append('a') # Adding elements to a queue
# Removing elements from a queue - only O(1) in compare with using List to implement Queue
q.popleft()
- 5.1.1. Dynamic Arrays: a linked list of arrays
    - Which level? (i.e. which array): `Array_Index = (pos + 1).bit_length() - 1 = h`
    - Which cell? (within that array): `Cell_Index = pos - (2**h - 1)`
    - For example: pos = 5 → pos + 1 = 6 = (110) → bit_length(110) = 3 → Array_Index = bit_length(110) - 1 = 2.
    - Therefore, the element at pos = 5 is in array index 2, and its cell index = 5 - (2**2 - 1) = 2.
def __translate(self, pos):
    h = (pos + 1).bit_length() - 1
    return h, pos + 1 - 2 ** h
- 5.1.2. Number of Arrays: `# of arrays = log(capacity + 1)`
- Capacity = 1 → 1 array
- Capacity = 3 → 2 arrays
- Capacity = 7 → 3 arrays
- 5.1.3. Dynamic Arrays in Python: in Python, the capacity grows by ~1/8 of the current size
def grow(curr_size):
new_allocated = (curr_size >> 3) + (3 if curr_size < 9 else 6)
return curr_size + new_allocated
- Access(i) = O(log(n)): since we first need to travel through at most log(n) levels (the max number of arrays from 5.1.2. Number of Arrays), then O(1) to access the position within that array level.
- Append(i) = O(log(n)): as we need to traverse to the last array in log(n), then O(1) to the last position.
- Delete & Insert = O(n): we need to shift the remaining elements.
- Tree: made up of a single node r (called the `root`) connected to disjoint sets of nodes (called subtrees of r)
- Internal Node: a node with at least one child
- Leaf Node: a node with no children
- From the graph view, a tree can also be defined as a directed acyclic graph which has `N` nodes and `N-1` edges.
- Level of a Node:
- If Root → Level 0
- If not Root → Level of Node = Level of Parent + 1
- Height of the Tree: maximum level of its nodes
- Binary Tree: an empty tree, or a node with at most 2 branches (left and right subtrees)
- Full Binary Tree: either empty, or a node with both left and right subtrees being full binary trees of the same height
    - # of nodes of a full binary tree of height h: `2^(h+1) - 1`
- Complete Binary Tree: Left-justified Tree
- A completely filled tree on all levels except possibly the lowest level, which is filled from the left.
- A complete binary tree has at most one node with only one child.
- Traversal Type
    - `Pre-Order`: Root - Left - Right
    - `In-Order`: Left - Root - Right
class Solution:
    def inorderTraversal(self, root: Optional[TreeNode]) -> List[int]:
        res = []
        self.dfs_inorder(root, res)
        return res

    def dfs_inorder(self, root, res):
        if root:
            self.dfs_inorder(root.left, res)
            res.append(root.val)
            self.dfs_inorder(root.right, res)
    - `Post-Order`: Left - Right - Root
- Note 1: when deleting nodes in a tree, the deletion process will be in post-order. That is to say, when you delete a node, you will delete its left child and its right child before you delete the node itself.
- Note 2: post-order is widely used for mathematical expressions (expression trees).
    - You can retrieve the original expression using the in-order traversal. However, it is not easy for a program to handle this expression since you have to check the priorities of operations.
    - If you handle this tree in post-order, you can easily handle the expression using a stack. Each time you meet an operator, you can just pop 2 elements from the stack, calculate the result and push the result back onto the stack (a sketch follows below).
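A minimal sketch (assumed, not from the original notes) of evaluating a post-order (postfix) expression with a stack: pop two operands per operator and push the result back.

def eval_postfix(tokens):
    stack = []
    for tok in tokens:
        if tok in ("+", "-", "*", "/"):
            b, a = stack.pop(), stack.pop()  # right operand is popped first
            if tok == "+":   stack.append(a + b)
            elif tok == "-": stack.append(a - b)
            elif tok == "*": stack.append(a * b)
            else:            stack.append(a / b)
        else:
            stack.append(int(tok))           # operand: push onto the stack
    return stack.pop()

print(eval_postfix(["3", "4", "+", "2", "*"]))  # (3 + 4) * 2 = 14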
`Breadth-First Search` is an algorithm to traverse the tree level by level.
- We will use a queue to help us do BFS.
def levelOrder(self, root: Optional[TreeNode]) -> List[List[int]]:
    res = []
    if root:
        queue = [root]
        while len(queue) > 0:
            # to keep track of the level, process exactly len(queue) nodes per round
            # at level 0, size = 1; at level 1, size = 2; and so on
            size, cur_level = len(queue), []
            for _ in range(size):
                cur_node = queue.pop(0)
                cur_level.append(cur_node.val)
                if cur_node.left:
                    queue.append(cur_node.left)
                if cur_node.right:
                    queue.append(cur_node.right)
            res.append(cur_level)
    return res
- A `binary search tree` (BST), a special form of a binary tree, satisfies the binary search property:
    - The value in each node must be greater than (or equal to) any values stored in its left subtree.
    - The value in each node must be less than (or equal to) any values stored in its right subtree.
- inorder traversal in BST will be in ascending order
- BSTs support three main operations: search, insertion and deletion.
class Solution:
def searchBST(self, root, val):
if (root):
if root.val == val:
return root
elif root.val > val:
return self.searchBST(root.left, val)
elif root.val < val:
return self.searchBST(root.right, val)
else:
return None
class Solution:
def insertIntoBST(self, root, val):
if not root:
return TreeNode(val) #Base Case: Null Node, then return a new Node with "val"
if root.val < val:
root.right = self.insertIntoBST(root.right, val) #If root.right is Null, so TreeNode(val) will be returned from Base Case & assign to root.right
elif root.val > val:
root.left = self.insertIntoBST(root.left, val) #If root.left is Null, so TreeNode(val) will be returned from Base Case & assign to root.left
return root #return root after insertion
- First, traverse the tree until `root.val == key`.
- Case 0: the node does not have any children, like 1, 8, 11, 14, 6 or 18: then we just delete it and there is nothing else to do.
- Case 1: the node has a left child but no right child (or vice versa), for example 3 or 20. In this case we can just delete this node and connect its parent directly to its child: for example for 3, we put the connection 5->1, and for 20 we put the connection 17->18. Note that the BST property is still fulfilled, because for the parent the whole left subtree is still less than its value and nothing changes for the other nodes.
- Case 2: the node has both children, like 12, 5, 7, 9 or 15. In this case we cannot just delete it.
    - Let us consider node 5. We want to find the successor of this node: the node with the next value. To do this we go one step to the right and then as far left as possible.
    - For node 5 our successor will be 6: we go 5->7->6.
    - How can we delete node 5 now? We swap nodes 5 and 6 (or just put the value 6 into 5), and then we need to deal with the new tree, where we need to delete the node that now holds the duplicated value.
    - How to do it? Note that this node does not have a left child, so deleting it falls under Case 0 or Case 1, which we already know how to solve.
class Solution:
def deleteNode(self, root, key):
if not root: return None #Base Case
if root.val > key:
root.left = self.deleteNode(root.left, key)
elif root.val < key:
root.right = self.deleteNode(root.right, key)
else: #root.val = key
#Case 0: node has no child
if not root.left and not root.right: return None
#Case 1: node has 1 child
if not root.left: return root.right #if no left child, so right child will replace root
if not root.right: return root.left #if no right child, so left child will replace root
#Case 2: node has 2 child, will replace root with its successor
if root.left and root.right:
#root's succssor = left-most child of the root's right sub-tree
temp = root.right
while(temp.left):
temp = temp.left
#replace root with it successor's val
root.val = temp.val
#delete root's succssor: Since root's succssor will have no left child
#so delete it will fall under case 0, or case 1 only
root.right = self.deleteNode(root.right, temp.val)
return root
Definition: a binary heap is a complete binary tree with the following binary heap property:
- (1) if the key at a node is greater than or equal to the key of its parent, we call it a min-heap.
- (2) if the key at a node is smaller than or equal to the key of its parent, we call it a max-heap.
- A heap can be represented by a list with indices starting from 1.
- Example: the max-heap in the previous example can be represented by the list `[None, 6, 5, 3, 4, 2, 1]`.
Given a key at position `i`:
- Position of the left child: `2*i`
- Position of the right child: `2*i + 1`
- Position of the parent: `i//2`
- Example: i = 4, value = 17 → parent = `i//2` = 2, left child = `2*i` = 8, right child = `2*i + 1` = 9.
- Height of the heap = `floor(log(n)) + 1`
    - if the heap has 1 node, its height will be 1
    - if the heap has 2 to 3 nodes, its height will be 2
    - if the heap has 4 to 7 nodes, its height will be 3
    - ...
    - if the heap has 2^i to 2^(i+1) - 1 nodes, its height will be i + 1
Since a heap is a complete binary tree:
- # of Internal Nodes = `floor(n/2)` or `n//2`
- # of Leaves = `# of internal nodes` or `# of internal nodes + 1`
- `heapify`: convert an arbitrary list → a list that satisfies the max-heap property. For example: `[None, 1, 2, 3]` → `[None, 3, 2, 1]`
- `insert`: insert an element into the list without violating the max-heap property.
- `pop`: retrieve the maximum element of the list.
- Time Complexity of Heapify: `O(n)` (see proof below)
- Time Complexity of Heapify_at(i): `O(log(n))`, since in the worst case we need to shift down from root to leaf = all levels (h = log(n))
- Step 1: Start from the last internal node
    - For a heap of size `n`, there are exactly `n//2` internal nodes, so we can start checking the max-heap property from the last internal node, with index `n//2`, down to the root node (index 1).
def heapify (self):
for i in range (self.__size // 2, 0, -1):
self.__heapify_at(i)
- Step 2: Execute heapify_at(i) for all the internal nodes
    - What the heapify_at(i) function does is make sure that every sub-heap rooted at the element at position i satisfies the heap property, i.e. the key at the root is not smaller than any other key in the sub-heap.
    - try_swap function: if the child key is greater than the parent key, we swap the two keys and return True to indicate a successful swap; otherwise it returns False. heapify_at is notified by the returned True/False and decides whether it is necessary to heapify the sub-heap rooted at the child, since its key has become smaller.
def __heapify_at(self, i):
    if 2*i > self.__size:
        # Base case: there is no left or right child, i.e. it is a leaf node
        return
    elif 2*i == self.__size:
        # Case 1: internal node has only one child, i.e. the left child (2*i) is at the end of the list
        if self.__try_swap(i, 2*i):
            self.__heapify_at(2*i)
    elif self.__content[2*i] > self.__content[2*i+1]:
        # Case 2.1: internal node has 2 children & the left child is larger
        if self.__try_swap(i, 2*i):
            self.__heapify_at(2*i)      # swap i with its left child and heapify the left sub-heap
    else:
        # Case 2.2: internal node has 2 children & the right child is larger
        if self.__try_swap(i, 2*i+1):
            self.__heapify_at(2*i+1)    # swap i with its right child and heapify the right sub-heap
def __try_swap(self, i, j):
    if self.__content[i] >= self.__content[j]:
        # parent >= child, no swap needed
        return False
    else:
        # parent < child, swap them
        self.__content[i], self.__content[j] = self.__content[j], self.__content[i]
        return True
- Time Complexity of Insertion: `O(log(n))`, since in the worst case we need to shift up from leaf to root = all levels (h = log(n))
- insert is to append a new element at the end of the list.
    - After we have inserted this new element, we have to check if it is larger than its parent;
    - if yes, we swap it with the parent, and repeat until the next parent is larger.
    - The try_swap function is used here to break the loop when the parent contains a larger key.
def insert (self, k):
self.__size += 1
i = self.__size
self.__content[i] = k
while i > 1 and self.__try_swap(i // 2, i):
i = i // 2
- Time Complexity of Pop: `O(log(n))`, the same as T(heapify_at(root))
- pop is to return the maximum key in the heap, which is contained in the root of the heap, at index 1.
    - However, we cannot leave the heap without its root.
    - So the last element of the heap is now put at the root, and
    - this is followed by heapify_at to maintain the heap property.
def pop (self):
if self.__size == 0:
return None
else:
k = self.__content[1]
self.__content[1], self.__content[self.__size] = self.__content[self.__size], None
self.__size -= 1
self.__heapify_at(1)
return k
Question: For a priority queue, where does the saving come from when using heap extract and insert instead of merge sort?
- The priority queue problem is to extract the maximum.
- Using a heap, we can use heapify, which takes `O(n)`, to maintain the max-heap, say `10 9 7 8 4 5 3` (this is a partially sorted order) → then we can extract the maximum `10`.
- Meanwhile, using merge sort, which takes `O(nlogn)`, we maintain a totally sorted order, say `10 9 8 7 5 4 3` → then we can extract the maximum `10`.
- Answer: it depends on the use-case; for this use-case we only need to maintain a partially sorted order, so using a heap `O(n)` is better than merge sort `O(nlogn)`.
- Maximum number of edges in an un-directed graph: `n(n-1)/2`
    - In an undirected graph, each edge is specified by its two endpoints and order doesn't matter. The number of edges is therefore the number of subsets of size 2 chosen from the set of vertices. Since the set of vertices has size n, the number of such subsets is given by the binomial coefficient C(n,2) (also known as "n choose 2"). Using the formula for binomial coefficients, C(n,2) = n(n-1)/2.
- Maximum number of edges in a directed graph: `n(n-1)`
    - Each edge is specified by its start vertex and end vertex. There are n choices for the start vertex. Since there are no self-loops, there are n-1 choices for the end vertex. Multiplying these together, we have n(n-1) possible choices.
- Dense Graph: `|E| = O(|V|^2)`
- Sparse Graph: `|E| = O(|V|)`
- Main difference between BFS/DFS on a graph vs. on a tree: we have to mark nodes that have already been searched (see the DFS sketch below).
- Time Complexity:
    - Search on a Tree: `O(n)`, where n is the number of nodes
    - Search on a Graph: `O(n+e)`, where n is the number of nodes and e is the number of edges
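A minimal DFS sketch (assumed, not from the original notes) on a graph stored as an adjacency list; like BFS on a graph, it must mark visited nodes so cycles do not cause infinite loops, and it touches each node and edge once, i.e. O(n + e).

def dfs(graph, node, visited=None):
    if visited is None:
        visited = set()
    visited.add(node)                  # mark the node as searched
    for neighbour in graph[node]:
        if neighbour not in visited:   # skip nodes that were already searched
            dfs(graph, neighbour, visited)
    return visited

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(dfs(graph, 0))  # {0, 1, 2, 3}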