# Oral Exam preparation:


### Algorithms to know
- Merge sort, quicksort
- Longest common subsequence
- Huffman encoding
- Signals, BFS, DFS, Shortest paths, topological sort
- Dijkstra, Bellman-Ford, Floyd-Warshall, Prim, PageRank






### Big O

In [1]:
# Why slicing and concatenation are O(n) and not O(k):
#

# Slicing :
#       k is the length of the substring being sliced or concatenated.
#       k is different at different subproblems (between 1...n).
#       k ultimately depends on n.
#   
#       But worst case, k = n (k grows linearly with the input size.)
#       
#       Slicing is O(n).

# Concatenation
#       Strings are immutable in Python.
#       So adding a char (concatenation) means creating a NEW string,
#       which copies every character of the existing string (length k).
#       
#       Again, worst case could be k = n.
#       
#       So concatenation is O(n).
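
To make the slicing and concatenation argument above concrete, here is a minimal sketch that counts how many characters get copied when a string is peeled or built one character at a time. The helper names `total_slice_work` and `total_concat_work` are made up for this illustration, and the count models the "every operation copies the whole string" reasoning above (CPython may sometimes optimize `result = result + c` in place, so real timings can be better than this model).

```python
def total_slice_work(s):
    """Sum of the lengths of every slice made while peeling one char at a time."""
    copied = 0
    while s:
        s = s[1:]          # builds a new string of length len(s) - 1
        copied += len(s)   # characters copied by that slice
    return copied


def total_concat_work(chars):
    """Sum of the lengths of every intermediate string built by repeated `+`."""
    result, copied = "", 0
    for c in chars:
        result = result + c    # models allocating a new string of length len(result)
        copied += len(result)
    return result, copied


n = 8
print(total_slice_work("a" * n))       # (n-1) + ... + 1 + 0 = n(n-1)/2 = 28
print(total_concat_work("a" * n)[1])   # 1 + 2 + ... + n     = n(n+1)/2 = 36
```

Each single operation is O(k) with k ≤ n, and doing it once per character adds up to O(n^2) overall, which is why slicing inside a recursion cannot be treated as free.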


## 1. Divide and Conquer

### Merge Sort

In [2]:
# Merge Sort :
#       To visualize : https://pythontutor.com/
#
# Methodology :
#       - Goal : sort a list of numbers
#       - Idea : halve the list recursively until we have lists of size 1 or 0 (which are sorted by definition)
#               then merge the sorted lists back together
# 
#       1. Base case : Return immediately if the list has size 0 or 1 (already sorted).
#       2. Divide : Split the input into two halves.
#       3. Conquer : 
#           recursively sort each half, then merge:
#           compare the first elements of the two sorted halves and move the smaller one to sorted_list

#   Example : [3, 7, 4, 1]
#       Split into [3, 7] and [4, 1]
#       split [3, 7] into [3] and [7] 
#        -> while A and B are both non-empty, compare their first elements, pop the smaller one and add it to sorted_list.
#        -> result is [3, 7]
#       same for [4, 1] -> [1, 4]
#
#       finally merge [3, 7] and [1, 4] 
#         -> compare A[0]=3 and B[0]=1, pop B[0], add to sorted_list = [1]
#         -> compare A[0]=3 and B[0]=4, pop A[0], add to sorted_list = [1, 3]
#         -> compare A[0]=7 and B[0]=4, pop B[0], add to sorted_list = [1, 3, 4]
#         -> B is empty, add remaining A to sorted_list = [1, 3, 4, 7]

# ====================================
def merge_sort(numbers):
    # Simple cases
    if len(numbers) < 2: return numbers         # O(1)
        
    # Divide
    M = len(numbers) // 2                       # O(1)
    L1 = numbers[:M]                            # O(n) - slicing copies M elements
    L2 = numbers[M:]                            # O(n) - slicing copies n - M elements
    
    # Conquer - assuming merge_sort worked on A and B
    A = merge_sort(L1)                          # T(n/2)
    B = merge_sort(L2)                          # T(n/2)
    sorted_numbers = []                         # O(1)
    while A and B:                              # at most n iterations
        sorted_numbers += [
            A.pop(0) if A[0] < B[0] else B.pop(0)   # pop(0) shifts the remaining elements: O(len) per call
        ]
    return sorted_numbers + A + B              # O(n) for concatenation

cards = [3, 7, 4, 1, 2, 7, 3]
sorted_cards = merge_sort(cards)
print(sorted_cards)


# ======== COMPLEXITY =========
#
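#       Base case : O(1)
#       Divide    : O(n)   (the two slices copy the list)
#       Conquer :
#           -Recursion : 2 T(n/2)
#           -Merging   : O(n), assuming each step of the merge costs O(1)
#
#       Total : 
#           T(n) = 2 T(n/2) + O(n) 
#
#       Halving the list gives a recursion depth of O(log n): 8->4->2->1 is 3 levels (log2(8)=3)
#       Dividing and merging cost O(n) in total at each level
#       So total cost is O(n) * O(log n) = O(n log n)
#
#       => T(n) = O( n log n )      (or prove it by Master Theorem)
#
#       Caveat : merge_sort above uses pop(0), which is O(len) on a Python list,
#                so its merge is O(n^2) in the worst case and T(n) = 2 T(n/2) + O(n^2) = O(n^2).
#                The O(n log n) bound holds for the index-based version below.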
#

# -----------------------------------
# This one avoids pop(0) which adds O(n) complexity due to shifting elements
# Instead we use indices for both sublists.
def merge_sort_efficient(numbers):
    if len(numbers) < 2:
        return numbers                                   # O(1)
    m = len(numbers) // 2
    left = merge_sort_efficient(numbers[:m])             # T(n/2) (slicing cost O(m))
    right = merge_sort_efficient(numbers[m:])            # T(n/2) (slicing cost O(n-m))
    i = j = 0
    out = []
    while i < len(left) and j < len(right):              # O(n) total for merge
        if left[i] < right[j]:
            out.append(left[i]); i += 1                  # append is amortized O(1)
        else:
            out.append(right[j]); j += 1
    if i < len(left):
        out.extend(left[i:])                             # O(k) copy of remainder
    if j < len(right):
        out.extend(right[j:])                            # O(k) copy of remainder
    return out



[1, 2, 3, 3, 4, 7, 7]
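
As a sanity check on the O(n log n) claim, the sketch below instruments an index-based merge sort to count element comparisons and prints that count next to n * log2(n). The function name `merge_sort_count` and the choice to count comparisons (rather than measure time) are assumptions made for this illustration only.

```python
import math
import random


def merge_sort_count(numbers):
    """Return (sorted_list, comparisons) using an index-based merge."""
    if len(numbers) < 2:
        return list(numbers), 0
    m = len(numbers) // 2
    left, c_left = merge_sort_count(numbers[:m])
    right, c_right = merge_sort_count(numbers[m:])

    out, i, j = [], 0, 0
    comparisons = c_left + c_right
    while i < len(left) and j < len(right):
        comparisons += 1                  # one comparison per merged element
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])                  # at most one of these is non-empty
    out.extend(right[j:])
    return out, comparisons


random.seed(0)
for n in (8, 64, 512):
    data = [random.randint(0, 1000) for _ in range(n)]
    result, comparisons = merge_sort_count(data)
    assert result == sorted(data)
    # second column stays below the third: comparisons <= n * log2(n)
    print(n, comparisons, round(n * math.log2(n)))
```

Merging two lists with n elements in total needs at most n - 1 comparisons, and there are log2(n) levels, so the count can never exceed n * log2(n) when n is a power of two.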


### Quick Sort

In [3]:
# Quick Sort :
#
# Methodology :
#    Goal : sort a list of numbers
#    Idea : pick a pivot (median, first/last elem, random, ...),
#             put the elems smaller than the pivot to the left, the bigger ones to the right, 
#             recursively sort low and high, then combine: low + pivots + high
#
#   1. Base case : Return list immediately if list size is 0 or 1 (by definition, sorted).
#   2. Divide    : Partition the array around a pivot into `low`, `pivots`, `high`.
#   3. Conquer   : Recursively sort `low` and `high`, then return `low + pivots + high`.
#
#   Example : [3, 7, 4, 1]
#       pivot = 4 (middle)
#       low = [3,1], pivots=[4], high=[7]
#       quicksort(low) -> [1,3]; quicksort(high) -> [7]
#       result -> [1,3,4,7]


# ====================================
def quicksort(numbers):
    # Simple cases
    if len(numbers) < 2: return numbers               # O(1)
    
    # CHOOSE pivot type
    # pivot = numbers[0]                              # First elem : O(1)
    pivot = numbers[len(numbers) // 2]                # Middle elem : O(1)

    # Divide
    low    = [x for x in numbers if x <  pivot]       # O(n) - each list scans ALL the elems of the original list -> O(n).
    pivots = [x for x in numbers if x == pivot]       # O(n)
    high   = [x for x in numbers if x >  pivot]       # O(n)
        # ---> Total: 3 * O(n) = O(n)
    
    # Conquer
    return quicksort(low) + pivots + quicksort(high)  # O(n) for concatenation


cards = [1, 2, 6, 5, 3, 7, 4]
res = quicksort(cards)
print(res)



# ======== COMPLEXITY =========
#
#   1. Base case : O(1)
#   2. Divide    : O(n)   (three passes over the list: 3 * O(n) = O(n))
#
#   3. Conquer :
#       Recursion : 
#           Balanced (best / average) : lists of roughly equal size = 2 T(n/2)
#           Worst                     : unbalanced lists = T(n-1) + T(0)     (one list has all elements except the pivot, the other is empty)
#       Combining :
#           Concatenation (low + pivots + high): O(n)
#
#   Work per call outside the recursion : O(n) + O(n) = O(n)
#
#   Total : 
#       Average : T(n) = 2 T(n/2) + O(n)
#                    => T(n) = O(n log n)   (Master Theorem)
#       Worst   : T(n) = T(n-1) + T(0) + O(n)
#                   => T(n) = O(n^2)       (arithmetic series sum)



# Conclusion :
    # Time Complexity:
    # - Best and average case: O(n log n)
    #     - When the pivot divides the list into roughly equal halves at each recursive step, the depth of recursion is about log n.
    #     - Each level of recursion processes all n elements to partition into low, pivots, and high lists, resulting in O(n) work per level.
    #     - Total: O(n log n).

    # - Worst case: O(n^2)
    #     - When the pivot is always the smallest or largest element, leading to highly unbalanced partitions.
    #     - The recursion depth becomes n, and each level processes all remaining elements, resulting in O(n^2).




[1, 2, 3, 4, 5, 6, 7]
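
The difference between the balanced and the worst case can be observed directly by measuring recursion depth. The sketch below reuses the same three-way partition as `quicksort` but takes the pivot rule as a parameter; `quicksort_depth`, `first_pivot` and `middle_pivot` are names invented for this illustration. On an already sorted list, a first-element pivot always produces an empty `low`, which is exactly the T(n-1) + T(0) situation.

```python
def quicksort_depth(numbers, pick_pivot):
    """Return (sorted_list, recursion_depth) for a given pivot rule."""
    if len(numbers) < 2:
        return list(numbers), 0
    pivot = pick_pivot(numbers)
    low    = [x for x in numbers if x <  pivot]
    pivots = [x for x in numbers if x == pivot]
    high   = [x for x in numbers if x >  pivot]
    sorted_low, d_low = quicksort_depth(low, pick_pivot)
    sorted_high, d_high = quicksort_depth(high, pick_pivot)
    return sorted_low + pivots + sorted_high, 1 + max(d_low, d_high)


def first_pivot(xs):
    return xs[0]


def middle_pivot(xs):
    return xs[len(xs) // 2]


data = list(range(64))                         # already sorted input
print(quicksort_depth(data, first_pivot)[1])   # 63 -> depth n - 1 (worst case)
print(quicksort_depth(data, middle_pivot)[1])  # 6  -> depth log2(64) (balanced)
```

With O(n) work per level, depth n - 1 gives the O(n^2) bound and depth log2(n) gives O(n log n).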


## 2. Dynamic Programming :

#### LCS - Return Int:

In [5]:
# Longest Common Subsequence (LCS) between two strings A and B


# Methodology :
#      Goal : return the LENGTH of the longest common subsequence of A and B
#             (characters in the same relative order in both strings, not necessarily adjacent)
#      Idea : if first letters match -> take them and recurse on A[1:], B[1:]
#             else -> try skipping A[0] or B[0] and take the max

# ================================================================
import functools
@functools.lru_cache(maxsize=None)
def LCS(A, B):
    # Base case:
    if len(A) == 0 or len(B) == 0: return 0         # O(1)

    # check if first letters match
    if A[0] == B[0]: return 1 + LCS(A[1:], B[1:])   # O(slicing) + T(n-1, m-1)

    l1 = LCS(A, B[1:])                              # O(slicing) + T(n, m-1)
    l2 = LCS(A[1:], B)                              # O(slicing) + T(n-1, m) 

    return max(l1, l2)                     # O(1)

A = "ACE"
B = "ABCDE"
print(LCS(A, B))  # Output: 3

A = "HYPERLINKING"
B = "DOLPHINSPEAK"
print(LCS(A, B))  # Output: 4


# ======== COMPLEXITY =========
#
# Idea :
#   At each call LCS(A,B) we either:
#     - match first chars -> one recursive call on (A[1:],B[1:])
#     - or mismatch -> two recursive calls: (A, B[1:]) and (A[1:], B)
# 
# 
#    T(n, m) : length of A is n, length of B is m
# 

# No memoization (the same function without the @lru_cache decorator) : 
#
#      Best Case : 
#         All characters match, leading to a single chain of recursive calls.
#         -> T(n, m) = 1 + T(n-1, m-1)
#
#         Each recursive call shortens both strings by 1.
#         Linear chain of calls, not a tree.
#         The chain has at most min(n, m) calls (it stops when the shorter string is empty).
#         
#         Each call slices both strings, which costs O(n + m).
#         
#         Total Cost : O( min(n,m) * (n + m) ) 
#                    
#         if n ~ m :
#                -> O(n^2) - Quadratic time.
#         
#      Worst Case : 
#           No characters match. Each call branches into two further calls, leading to an exponential number of calls.
#           -> T(n, m) = 1 + T(n-1, m  ) + T(n  , m-1) 
#
#           Each “mismatch” node branches into 2 subproblems.
#           Each branch decreases either n or m by 1.
#           A branch stops when either string is empty: after at least min(n,m) and at most n+m-1 steps.
#           
#           So the binary tree has height < n+m and at most 2^{n+m} nodes
#           (and at least 2^{min(n,m)}, since it is complete down to depth min(n,m)).
#           
#           Each call slices both strings, which costs O(n + m).
#           
#           Total Cost : O( (n + m) * 2^{n+m} ) 
#           
#           if n ~ m :
#                    -> O(n * 4^n) - Exponential time.


# With memoization :
#       
#      Example: 
#           LCS("AB","CD") (first letters differ) branches into :
#                   -> LCS("AB","D") and 
#                   -> LCS("B","CD") 
#           both of which branch into LCS("B","D"), so it is computed twice.
# 
#       Memoization: 
#           Store the result of each subproblem (i,j) in a table (dictionary or 2D array),
#           where (i,j) means "LCS of A[i:] and B[j:]".
#           If we ever reach (i,j) again, return the stored value instead of recomputing it.
#           This removes redundant computations entirely.
#           
#        How many subproblems are there in total? 
#           Every call compares a suffix of A with a suffix of B, so i and j range over the lengths of A and B.
#
#           A = "ABCDE" (length n=5)
#           B = "ACE"   (length m=3)
#
#           (0, 0) -> comparing "ABCDE" with "ACE" (full strings)
#           (0, 1) -> comparing "ABCDE" with "CE"
#           (1, 0) -> comparing "BCDE"  with "ACE"
#           ...
#
#           i can be 0..n (length of A) -> n+1 possibilities 
#           j can be 0..m (length of B) -> m+1 possibilities
#
#           Total unique subproblems = (n+1) * (m+1) = O(n*m)
#
#           Each subproblem (i,j) takes O(1) time to compute (just a few comparisons and additions).
#           But slicing adds O(n + m) time per subproblem.
#           
#           T(n, m) = (Number of unique subproblems) * (cost per subproblem)
#                   = O(n * m) * O(n + m) 
#                   = O(n * m * (n + m))
#           
#           if n ~ m :
#                    -> O(n^2) - Quadratic time. (NO slicing)
#                    -> O(n^3) - Cubic time.     (slicing)



def LCS_inverted(A, B):
    if len(A) == 0 or len(B) == 0: return 0

    if A[-1] == B[-1]: return 1 + LCS_inverted(A[:-1], B[:-1])

    return max(LCS_inverted(A, B[:-1]), LCS_inverted(A[:-1], B))

print(LCS_inverted("HYPERLINKING", "DOLPHINSPEAK"))


3
4
4
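
The "NO slicing" O(n * m) bound mentioned above can be reached by memoizing on the index pair (i, j) instead of on sliced strings. The following is one possible sketch, not the notebook's own implementation: `lcs_length` and its inner `solve` are hypothetical names, and `solve(i, j)` stands for the LCS length of the suffixes `A[i:]` and `B[j:]` without ever building them.

```python
import functools


def lcs_length(A, B):
    """Length of the longest common subsequence, memoized on indices."""

    @functools.lru_cache(maxsize=None)
    def solve(i, j):
        # Base case: one of the suffixes is empty
        if i == len(A) or j == len(B):
            return 0
        # First letters of the suffixes match: take them
        if A[i] == B[j]:
            return 1 + solve(i + 1, j + 1)
        # Otherwise skip B[j] or A[i] and keep the best
        return max(solve(i, j + 1), solve(i + 1, j))

    return solve(0, 0)


print(lcs_length("ACE", "ABCDE"))                 # 3
print(lcs_length("HYPERLINKING", "DOLPHINSPEAK")) # 4
```

Each of the (n+1) * (m+1) index pairs is computed once with O(1) work, since passing two integers costs nothing compared with copying two strings. For long inputs the recursion depth (up to n + m) may hit Python's recursion limit, in which case a bottom-up table is the usual alternative.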


#### LCS - Return String:

In [None]:
# LCS + return the actual subsequence string, not just its length

@functools.lru_cache(maxsize=None)
def LCS(A, B):
    # Base case:
    if len(A) == 0 or len(B) == 0: return ""           # O(1) - Return empty string

    # check if first letters match
    if A[0] == B[0]: return A[0] + LCS(A[1:], B[1:])   # O(concat) + O(slicing)

    else:
        l1 = LCS(A, B[1:])
        l2 = LCS(A[1:], B)
        return max([l1, l2], key=len)


# def LCS(A, B):
#     if len(A) == 0 or len(B) == 0: return ""
#     if A[-1] == B[-1]: return LCS(A[:-1], B[:-1]) + A[-1]
    
#     guesses = [LCS(A, B[:-1]), LCS(A[:-1], B)]
#     return max(guesses, key=len)



A = "ACE"
B = "ABCDE"
# A = "HYPERLINKING"
# B = "DOLPHINSPEAK"
print(LCS(A, B))  # Output: ACE




# ======== COMPLEXITY =========
# 
# 
#   Here, instead of returning lengths, we return actual subsequence strings.
# 
#   Idea :
#     At each call LCS(A,B):
#       - match first chars -> return matched char + one recursive call on (A[1:],B[1:])
#       - or mismatch -> two recursive calls: (A, B[1:]) and (A[1:], B)
# 
#   Example:
#       Call 1:
#           "ACE" vs. "ABCDE".
#           match 'A' -> return 'A' + LCS("CE", "BCDE")
#
#       Call 2:
#           "CE" vs. "BCDE".
#           no match ('C' != 'B') 
#           -> return max(  LCS("CE", "CDE"),  LCS("E", "BCDE"),  key=len  )
#               eventually those calls return "CE" and "E" respectively.
#           So we return the longer one: "CE"
# 
# 
#   Important Note:
#       the cost per call INCREASES when returning strings instead of numbers.
#       previously : return 1    + LCS(...)
#       now        : return char + LCS(...)  (string concatenation)
#       
#       String concatenation takes O(k) time, where k ≤ min(n−i, m−j) is the length of the returned subsequence.
#       So the cost per subproblem increases.
#       
#       Slicing takes Θ((n−i) + (m−j)) time.
#       
#       In the worst case, concatenation is O(min(n,m)) and slicing is O(n + m).
#       
#       Cost per subproblem = concat + slicing = O(min(n,m)) + O(n + m) = O(n + m)
#       
#       Same : Total unique subproblems = O(n * m)
#       
#       Total Cost : O(n * m) * O(n + m) = O(n * m * (n + m))
#       
#       if n ~ m :
#                -> O(n^3) - Cubic time.
#       





ACE
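
If the per-subproblem slicing and concatenation cost is a concern, one common alternative is to fill a table of lengths first and rebuild the subsequence afterwards by walking through that table. The sketch below is a swapped-in bottom-up technique, not the recursive method used in the cell above; `lcs_string` and the table name `length` are chosen for this illustration, with `length[i][j]` holding the LCS length of `A[i:]` and `B[j:]`.

```python
def lcs_string(A, B):
    """Return one longest common subsequence of A and B in O(n * m) time."""
    n, m = len(A), len(B)

    # Fill the table from the end of both strings towards the start
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if A[i] == B[j]:
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = max(length[i][j + 1], length[i + 1][j])

    # Walk from (0, 0), collecting matched characters along the way
    i = j = 0
    chars = []
    while i < n and j < m:
        if A[i] == B[j]:
            chars.append(A[i]); i += 1; j += 1
        elif length[i][j + 1] >= length[i + 1][j]:
            j += 1          # skipping B[j] loses nothing
        else:
            i += 1          # skipping A[i] loses nothing
    return "".join(chars)   # single O(k) join instead of repeated concatenation


print(lcs_string("ACE", "ABCDE"))   # ACE
```

Filling the table costs O(n * m), the walk takes at most n + m steps, and the characters are joined once at the end. When several subsequences share the maximal length, this sketch returns just one of them, which may differ from the one the recursive version picks.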


## 3. Greedy Algorithms

### Huffman Encoding:

## 4. Unweighted Graphs

### Signals

### BFS

### DFS

### Shortest Path

### Topological Sort

## 5. Weighted Graphs

### Dijkstra

### Bellman-Ford

### Floyd-Warshall

### Prim