<h1><b>Bubble Sort</b></h1>

Standard Version Time Complexity
$$

$$
$
\begin{aligned}
    T(n) &= \sum_{i=0}^{n-2}\sum_{j=0}^{n-i-2}c\\
    &= \sum_{i=0}^{n-2}(n-i-1)c\\
    &= \sum_{i=0}^{n-2}(n-1)c-\sum_{i=0}^{n-2}ic\\
    &= (n-1)(n-1)c-\dfrac{(n-2)(n-1)}{2}c\\
    &= c(n-1)\left[(n-1)-\dfrac{n-2}{2}\right]\\
    &= c(n-1)\left(\dfrac{n}{2}\right)\\
    &= \dfrac{1}{2}n^2c-\dfrac{1}{2}nc\\
    T(n)&= \boxed{\Theta(n^2)}\\
\end{aligned}
$
$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(n^2)\\
    \text{Avg. Case: }&\Theta(n^2)\\
    \text{Worst Case: }&O(n^2)\\
\end{aligned}
$

In [80]:
"""
Stable: Yes. Swaps only when A[j] > A[j+1] (strict), so equal elements never pass each other: [2, 2', 1] --> [2, 1, 2'] --> [1, 2, 2']
In-Place: Yes. All checks and swaps happen inside list A; only O(1) extra space (temp) is needed.
"""

def bubble_sort(A):
    for i in range(0,len(A)-1): # runs n-1 times
        for j in range(0, len(A)-i-1): # runs n-i-1 times
            if A[j] > A[j+1]: # check & swap takes c time
                temp = A[j]
                A[j] = A[j+1]
                A[j+1] = temp

    return A

Modified Version Time Complexity
$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(n)\\
    \text{Avg. Case: }&\Theta(n^2)\\
    \text{Worst Case: }&O(n^2)\\
\end{aligned}
$

In [81]:
"""
Stable: Yes. Ensures A[j+1] >= A[j] so if [2, 2', 1] --> [2,1,2'] --> [1,2,2']
"""

def bubble_sort2(A):
    for i in range(0, len(A)-1): # runs between 1 and n-1 times (because of break)
        swapped = False # assignment takes c1 time
        for j in range(0, len(A)-i-1): # runs n-i-1 times
            if A[j] > A[j+1]: # check and swap takes c2 time
                temp = A[j]
                A[j] = A[j+1]
                A[j+1] = temp
                swapped = True
        
        if not swapped: # check takes c3 time
            break   
    
    return A

In [82]:
nums = [3,1,4,2] # avoid naming a variable `list`, which shadows the built-in type

bubble_sort2(nums)

[1, 2, 3, 4]

<ol>
    <li> Why would you use bubble sort? What advantages does it have over something like quicksort or merge sort? 
        <ul>
            <li> After each pass, bubble sort guarantees that the largest remaining element is in its final position at the end of the list, whereas merge sort and quicksort have to complete all their recursive calls before the list is sorted.
            <li> In the best case, where all elements are already sorted, the modified bubble sort takes linear time, whereas merge sort and quicksort each take at least log-linear time, as defined by their recurrence relations.
        </ul>
    <li> What is true about the list after the first pass of bubble sort?
        <ul>
            <li> Largest element in list is now at the last index.
        </ul>
</ol>
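
Both answers above can be checked with a small experiment. This is only a sketch: `bubble_sort_passes` and `first_pass` are hypothetical helper names, not part of the notes, and both work on a copy of the input.

```python
def bubble_sort_passes(A):
    """Early-exit bubble sort that also reports how many passes it made."""
    A = list(A)  # work on a copy so the caller's list is untouched
    passes = 0
    for i in range(len(A) - 1):
        swapped = False
        passes += 1
        for j in range(len(A) - i - 1):
            if A[j] > A[j + 1]:
                A[j], A[j + 1] = A[j + 1], A[j]
                swapped = True
        if not swapped:  # a pass without swaps means the list is sorted
            break
    return A, passes


def first_pass(A):
    """Run only the first bubble-sort pass and return the resulting list."""
    A = list(A)
    for j in range(len(A) - 1):
        if A[j] > A[j + 1]:
            A[j], A[j + 1] = A[j + 1], A[j]
    return A


print(bubble_sort_passes([1, 2, 3, 4, 5]))  # ([1, 2, 3, 4, 5], 1): one pass
print(bubble_sort_passes([5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5], 4): n-1 passes
print(first_pass([3, 5, 1, 4, 2]))          # [3, 1, 4, 2, 5]: the largest is last
```

An already sorted list needs a single pass (linear time), while a reversed list needs all n-1 passes.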

<h1><b>Insertion Sort</b></h1>

Standard Version Time Complexity
$$

$$
$
\begin{aligned}
    T_{\text{best}}(n) &= \sum_{i=1}^{n-1}c_1+c_2+c_4\\
    &= (n-1)(c_1+c_2+c_4)\\
    T_{\text{best}}(n)&= \boxed{\Theta(n)}\\
\end{aligned}
$
$$

$$
$
\begin{aligned}
    T_{\text{worst}}(n) &= \sum_{i=1}^{n-1}\left[c_1+i(c_2+c_3)+c_4\right]\\
    &= (n-1)(c_1+c_4)+\dfrac{(n-1)(n)}{2}(c_2+c_3)\\
    &= (n-1)\left[(c_1+c_4)+\dfrac{n}{2}(c_2+c_3)\right]\\
    &= \dfrac{1}{2}n^2(c_2+c_3)-\dfrac{1}{2}n(c_2+c_3)+n(c_1+c_4)-(c_1+c_4)\\
    T_{\text{worst}}(n)&= \boxed{\Theta(n^2)}\\
\end{aligned}
$
$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(n)\\
    \text{Avg. Case: }&\Theta(n^2)\\
    \text{Worst Case: }&O(n^2)\\
\end{aligned}
$

In [83]:
"""
Stable: Yes. Elements shift only while key < A[j] (strict), so key never moves past an equal element: [2, 2', 1] --> [2, 2', 1] --> [1, 2, 2']
"""

def insertion_sort(A):
    for i in range(1, len(A)): # runs n-1 times
        key = A[i] # takes c1 time
        j = i-1 # takes c1 time
        while j >= 0 and key < A[j]: # takes c2 time per check (1 check in the best case, up to i+1 checks in the worst)
            A[j+1] = A[j] # takes c3 time
            j-=1 # takes c3 time
        A[j+1] = key # takes c4 time

    return A

<ol>
    <li>Do you actually know the difference between this one and selection sort?
    <ul>
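        <li> Insertion sort takes the next unsorted element and shifts larger elements up until it fits into the sorted prefix, whereas selection sort searches the unsorted part for the smallest element and swaps it to the front.
        <li> Both grow a sorted prefix at the beginning of the list. In selection sort that prefix already holds the smallest elements in their final positions; in insertion sort it does not.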
    </ul>
</ol>
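
To see the difference, it can help to record the list after every outer pass of each sort. The sketch below does that; `insertion_states` and `selection_states` are hypothetical names used only for this illustration.

```python
def insertion_states(A):
    """Return a snapshot of the list after each outer pass of insertion sort."""
    A = list(A)
    states = []
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        while j >= 0 and key < A[j]:
            A[j + 1] = A[j]  # shift the larger element one slot to the right
            j -= 1
        A[j + 1] = key
        states.append(list(A))
    return states


def selection_states(A):
    """Return a snapshot of the list after each outer pass of selection sort."""
    A = list(A)
    states = []
    for i in range(len(A) - 1):
        min_index = i
        for j in range(i + 1, len(A)):
            if A[j] < A[min_index]:
                min_index = j
        A[i], A[min_index] = A[min_index], A[i]
        states.append(list(A))
    return states


print(insertion_states([3, 4, 1, 2]))  # [[3, 4, 1, 2], [1, 3, 4, 2], [1, 2, 3, 4]]
print(selection_states([3, 4, 1, 2]))  # [[1, 4, 3, 2], [1, 2, 3, 4], [1, 2, 3, 4]]
```

After the first pass, insertion sort has a sorted prefix `[3, 4]` that is not final, while selection sort already has the smallest element fixed in place.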

<h1><b>Selection Sort</b></h1>

Standard Version Time Complexity
$$

$$
$
\begin{aligned}
    T(n) &= \sum_{i=0}^{n-2}\left[c_1+\left[\sum_{j=i+1}^{n-1}c_2\right]+c_3\right]\\
    &= \sum_{i=0}^{n-2}\left[c_1+(n-i-1)c_2+c_3\right]\\
    &= \sum_{i=0}^{n-2}[c_1+(n-1)c_2+c_3] - \sum_{i=0}^{n-2} ic_2\\
    &= (n-1)^2c_2+(n-1)(c_1+c_3)-\dfrac{(n-2)(n-1)}{2}c_2\\
    &= \dfrac{1}{2}n^2c_2+n(c_1-\dfrac{1}{2}c_2+c_3)-(c_1+c_3)\\
    T(n)&= \boxed{\Theta(n^2)}\\
\end{aligned}
$
$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(n^2)\\
    \text{Avg. Case: }&\Theta(n^2)\\
    \text{Worst Case: }&O(n^2)\\
\end{aligned}
$

In [84]:
"""
Stable: No. The swap can move an element past an equal one: [2, 2', 1] --> [1, 2', 2].
"""

def selection_sort(A):
    for i in range(0, len(A)-1): # runs n-1 times
        min_index = i # takes c1 time
        for j in range(i+1, len(A)): # runs n-i-1 times
            if A[j] < A[min_index]: # check takes c2 time
                min_index = j

        # swap takes c3 time
        temp = A[i] 
        A[i] = A[min_index]
        A[min_index] = temp
    
    return A

<ol>
    <li>What is true after the first pass of selection sort?
    <ul>
        <li> The smallest element is sorted and will be at the beginning of the list.
    </ul>
    <li>Does selection sort use more or fewer operations than bubble sort? Why or why not?
    <ul>
        <li> Fewer swaps, because bubble sort may swap after every comparison, but selection sort swaps only once at the end of each pass (n-1 swaps in total). Both still make Θ(n^2) comparisons.
    </ul>
</ol>
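
The claim about operation counts can be made concrete by instrumenting both sorts. This is a rough sketch with hypothetical helper names (`bubble_counts`, `selection_counts`); it counts every executed swap, including selection sort's swap of an element with itself.

```python
def bubble_counts(A):
    """Return (comparisons, swaps) made by the standard bubble sort."""
    A = list(A)
    comparisons = swaps = 0
    for i in range(len(A) - 1):
        for j in range(len(A) - i - 1):
            comparisons += 1
            if A[j] > A[j + 1]:
                A[j], A[j + 1] = A[j + 1], A[j]
                swaps += 1
    return comparisons, swaps


def selection_counts(A):
    """Return (comparisons, swaps) made by selection sort."""
    A = list(A)
    comparisons = swaps = 0
    for i in range(len(A) - 1):
        min_index = i
        for j in range(i + 1, len(A)):
            comparisons += 1
            if A[j] < A[min_index]:
                min_index = j
        A[i], A[min_index] = A[min_index], A[i]
        swaps += 1  # exactly one swap per outer pass
    return comparisons, swaps


data = [5, 4, 3, 2, 1]
print(bubble_counts(data))     # (10, 10)
print(selection_counts(data))  # (10, 4)
```

On a reversed list of 5 elements both sorts make 10 comparisons, but bubble sort makes 10 swaps against selection sort's 4.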

<h1><b>Merge Sort</b></h1>

Standard Version Time Complexity
$$

$$
Recurrence Relation:
$
    T(n) = 2T(n/2) + f(n),\ T(1)=C \text{ \& } f(n)=\Theta(n)
$
$$

$$
Master Theorem (All Cases): 
$
\left\{\begin{aligned}
    T(n) &= 2T(n/2) + \Theta(n)\\
    c=1 &\text{ \& } a=2, b=2, \log_ba = \log_22 = 1 \rightarrow \operatorname{Case}\ 2\\
    T(n) &= \boxed{\Theta(n\lg n)}\\
\end{aligned}\right.
$
$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(n\lg n)\\
    \text{Avg. Case: }&\Theta(n\lg n)\\
    \text{Worst Case: }&O(n\lg n)\\
\end{aligned}
$

In [85]:
"""
Stable: Yes. Takes from L on ties (L[i] <= R[j]), so it maintains the left-to-right order of equal elements when merging.
"""

def merge(A, L, R):
    i = 0
    j = 0
    k = 0
    while i < len(L) and j < len(R):
        if L[i] <= R[j]: # <= (not <) so ties are taken from the left half first
            A[k] = L[i]
            i+=1
        else:
            A[k] = R[j]
            j+=1
        k+=1

    while i < len(L):
        A[k] = L[i]
        i+=1
        k+=1

    while j < len(R):
        A[k] = R[j]
        j+=1
        k+=1

def mergesort(A):
    if len(A) > 1:
        mid = len(A) // 2
        L = A[0:mid]
        R = A[mid:]
        mergesort(L) # runs in T(n/2) time
        mergesort(R) # runs in T(n/2) time
        merge(A, L, R) # runs in f(n) ~ n time

    return A

<p>Be able to not just do mergesort, but also merge. You should know the recurrence for this one off
the top of your head.</p>
<ol>
    <li>Do you actually know the difference between this one and selection sort?
    <ul>
        <li> Merge sort is a recursive divide-and-conquer algorithm that runs in Θ(n lg n), whereas selection sort is an iterative algorithm that runs in Θ(n^2).
    </ul>
    <li>What makes mergesort parallelizable?
    <ul>
        <li> Divide and conquer splits the list into two halves that are sorted independently with the same algorithm, so the two recursive calls can run at the same time; only the merge has to wait for both.
    </ul>
</ol>
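
Since the merge step is worth knowing on its own, here is a minimal sketch of it on `(key, label)` pairs, which makes stability visible. The name `merge_by_key` and the pair layout are assumptions made for this example; it returns a new list instead of writing into `A`.

```python
def merge_by_key(L, R):
    """Merge two lists of (key, label) pairs that are already sorted by key."""
    out = []
    i = j = 0
    while i < len(L) and j < len(R):
        if L[i][0] <= R[j][0]:  # ties go to the left half, which keeps the merge stable
            out.append(L[i])
            i += 1
        else:
            out.append(R[j])
            j += 1
    out.extend(L[i:])  # at most one of these two tails is non-empty
    out.extend(R[j:])
    return out


print(merge_by_key([(1, "a"), (2, "b")], [(2, "c"), (3, "d")]))
# [(1, 'a'), (2, 'b'), (2, 'c'), (3, 'd')]
```

The two records with key 2 come out in their original left-to-right order because ties are resolved in favor of the left list.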

<h1><b>Quick Sort</b></h1>

Standard Version Time Complexity
$$

$$
Recurrence Relation:
$
    T(n) = T(k) + T(n-k-1) + f(n),\ T(1)=C \text{ \& } f(n)=c_1+c_2(n-1)\\
$
$$

$$
Master Theorem (Best Case - Pivot is the Median of the List): 
$
\left\{\begin{aligned}
    T_{\text{best}}(n) &= 2T(n/2) + f(n),\ T(1)=C \text{ \& } f(n)=\Theta(n)\\
    c=1 &\text{ \& } a=2, b=2, \log_ba = \log_22 = 1 \rightarrow \operatorname{Case}\ 2\\
    T_{\text{best}}(n) &= \boxed{{\Theta(n\lg n)}}\\
\end{aligned}\right.
$
$$

$$
Pattern Analysis (Worst Case - Pivot is the Smallest or Largest Element): 
$
\left\{\begin{aligned}
    T_{\text{worst}}(n) &= T(n-1) + c_1 +c_2(n-1),\ T(1)=C\\
    T(1) &= C\\
    T(2) &= T(1)+c_1+c_2(2-1) = c_1 + c_2 + C\\
    T(3) &= T(2)+c_1+c_2(3-1) = 2c_1 + 3c_2 + C\\
    T(4) &= T(3)+c_1+c_2(4-1) = 3c_1 + 6c_2 + C\\
    T(5) &= T(4)+c_1+c_2(5-1) = 4c_1 + 10c_2 + C\\
        &\vdots\\
    T(n) &= (n-1)c_1+\dfrac{n(n-1)}{2}c_2+C \\
        &= \dfrac{1}{2}n^2c_2+n\left(c_1-\dfrac{1}{2}c_2\right)+C-c_1 \\
    T_{\text{worst}}(n) &= \boxed{\Theta(n^2)}\\
\end{aligned}\right.
$
$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(n\lg n)\\
    \text{Avg. Case: }&\Theta(n\lg n)\\
    \text{Worst Case: }&O(n^2)\\
\end{aligned}
$

In [86]:
"""
Stable: No. The swaps move elements over long distances, so equal keys can change order. For example, the final swap A[t] <-> A[r] may turn [...,4,3',...,3] into [...,3,3',...,4].
"""
def partition(A, l, r):
    pivot_key = A[r] # the pivot is the last element of A[l..r]
    t = l # A[l..t-1] holds the elements <= pivot_key seen so far
    for i in range(l, r):
        if A[i] <= pivot_key:
            temp = A[i]
            A[i] = A[t]
            A[t] = temp
            t+=1

    A[r] = A[t]
    A[t] = pivot_key

    return t

def quicksort(A, l, r):
    if l < r:
        pivot_index = partition(A, l, r)
        quicksort(A, l, pivot_index-1)
        quicksort(A, pivot_index+1, r)
    
    return A

<p>Know how partition works, and how it interacts with the best and worst case. Make sure you
remember what a pivot is.</p>
<ol>
    <li>What is the recurrence for Quicksort? Is it different in the best, worst, and average case?
    <ul>
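        <li>The general recurrence is stated above. Yes: the worst case is when the pivot is the smallest or largest element, so it lands at one end of the list and T(n) = T(n-1) + f(n); the best case is when the pivot is the median, so it lands in the middle of the list and T(n) = 2T(n/2) + f(n).
    </ul>
    <li>Why is having the partition be the median better?
    <ul>
        <li> The median is guaranteed to be in the middle of the list after partition, so both subproblems have about n/2 elements and we always get the best-case recurrence relation.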
    </ul>
</ol>
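
One way to see how the pivot drives the recurrence is to measure recursion depth on an already sorted list. The sketch below is an illustration under assumptions: `quicksort_depth` and its `use_middle` flag are hypothetical, and picking the middle element only equals the median because the input here is sorted.

```python
def partition(A, l, r):
    """Partition A[l..r] around the last element and return the pivot's index."""
    pivot_key = A[r]
    t = l
    for i in range(l, r):
        if A[i] <= pivot_key:
            A[i], A[t] = A[t], A[i]
            t += 1
    A[r], A[t] = A[t], A[r]
    return t


def quicksort_depth(A, l, r, use_middle):
    """Sort A[l..r] in place and return the deepest recursion level reached."""
    if l >= r:
        return 0
    if use_middle:
        mid = (l + r) // 2
        A[mid], A[r] = A[r], A[mid]  # move the middle element into the pivot slot
    p = partition(A, l, r)
    left = quicksort_depth(A, l, p - 1, use_middle)
    right = quicksort_depth(A, p + 1, r, use_middle)
    return 1 + max(left, right)


data = list(range(64))  # already sorted input
print(quicksort_depth(list(data), 0, 63, use_middle=False))  # 63
print(quicksort_depth(list(data), 0, 63, use_middle=True))   # 6
```

With the last element as pivot the depth is n-1 (the T(n-1) recurrence); with the median as pivot the depth is lg n (the 2T(n/2) recurrence).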

<h1><b>Heap Sort</b></h1>

Standard Version Time Complexity
$$

$$
Maxheapify (Run on $n$ nodes):
$
\left\{\begin{aligned}
    \text{Best Case: }&\Omega(1)\\
    \text{Avg. Case: }&\Theta(\lg n)\\
    \text{Worst Case: }&O(\lg n)\\
\end{aligned}\right.
$
$$

$$
Convert To Max Heap (Run on $\frac{n}{2} - 1\leqslant\operatorname{floor}(\frac{n}{2}) \leqslant \frac{n}{2}$ nodes):
$
\left\{\begin{aligned}
    \text{Best Case: }&\Omega(n)\\
    \text{Avg. Case: }&\Theta(n)\\
    \text{Worst Case: }&O(n)\ \text{(tight; } O(n\lg n) \text{ is a looser upper bound)}\\
\end{aligned}\right.
$
$$

$$
Heapsort:
$
\left\{\begin{aligned}
    T_{\text{best}}(n) &= O(n)+\sum_{i=2}^n[C+\Theta(1)] = O(n)+(C+\Theta(1))(n-1) = O(n)\\
    T_{\text{worst}}(n) &= O(n\lg n)+\sum_{i=2}^n[C+O(\lg(i-1))] \leqslant O(n\lg n)+\sum_{i=2}^n[C+O(\lg n)] = O(n\lg n)+(C+O(\lg n))(n-1) = O(n\lg n)\\
\end{aligned}\right.
$

$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(n)\\
    \text{Avg. Case: }&\Theta(n\lg n)\\
    \text{Worst Case: }&O(n\lg n)\\
\end{aligned}
$

In [87]:
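import math
"""
Stable: No. Equal elements may end up in different subtrees of the heap, so their relative order is not preserved.
Indexing: the heap is 1-indexed. A[0] is an unused placeholder and n is the number of elements, so len(A) == n + 1.
"""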

def maxheapify(A, i, n):
    left_node = 2*i
    right_node = 2*i+1
    largest_node = i
    
    if left_node <= n and A[left_node] > A[largest_node]:
        largest_node = left_node
    if right_node <= n and A[right_node] > A[largest_node]:
        largest_node = right_node
    if largest_node != i:
        temp = A[i]
        A[i] = A[largest_node]
        A[largest_node] = temp
        maxheapify(A, largest_node, n)
    
def convert_to_maxheap(A, n):
    for i in range(math.floor(n/2), 0, -1):
        maxheapify(A,i,n)

    return A

def heapsort(A, n):
    convert_to_maxheap(A, n)
    
    for i in range(n,1,-1):
        temp = A[1]
        A[1] = A[i]
        A[i] = temp
        maxheapify(A, 1, i-1)
    
    return A

<p>Make sure you know what a heap is, how to build one, and how to use it as a sort. There’s a number of different operations you might have to perform with
this one (removing the min, building the heap, etc).</p>
<ol>
    <li>What is a heap? How is it different from a binary tree?
    <ul>
        <li>A heap is a special kind of binary tree: it is a <u>complete binary tree</u> (every level is full except possibly the last, which is filled from left to right).
        <li>Heaps can be classified as max or min heaps. In a <b>max heap</b>, <u>children &le; parent</u>. In a <b>min heap</b>, <u>children &ge; parent</u>.
        <li>Heaps are represented here as 1-indexed arrays/lists and are used for priority queues &amp; scheduling more often than for sorting.
    </ul>
    <li>Why do we send something to the top of the heap and let it float down when we remove the min?
    <ul>
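        <li>Step 1: Swap the root with the last leaf. In the max heap used by heapsort this brings the max element to the end of the list, where it is now sorted (this algorithm guarantees the end of the list is sorted first); removing the min from a min heap works the same way.
        <li>Step 2: Reconvert the tree into a heap, excluding the sorted element (the new root floats down until it finds its new place). Using the last leaf keeps the tree complete.
        <li>Step 3: Apply the same process again, excluding the sorted elements.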
    </ul>
    <li>Does heapsort have any worst cases or best cases?
    <ul>
        <li>There is a unique best case when all elements have the same value. Then no swaps happen and every heapify call returns immediately, so the sort just scans all elements in the list in linear time. Otherwise the runtime is O(n lg n).
    </ul>
</ol>
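
The "send it to the top and let it float down" step can be sketched for a min heap as follows. Note the assumptions: unlike the 1-indexed max heap above, this sketch is 0-indexed, and `sift_down`, `build_min_heap`, and `remove_min` are hypothetical names.

```python
def sift_down(heap, i, n):
    """Let heap[i] float down until neither child is smaller (0-indexed min heap)."""
    while True:
        left, right, smallest = 2 * i + 1, 2 * i + 2, i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest


def build_min_heap(A):
    """Heapify a copy of A bottom-up, starting from the last parent node."""
    heap = list(A)
    for i in range(len(heap) // 2 - 1, -1, -1):
        sift_down(heap, i, len(heap))
    return heap


def remove_min(heap):
    """Pop the root: move the last leaf to the top, then let it float down."""
    smallest = heap[0]
    last = heap.pop()  # removing the last leaf keeps the tree complete
    if heap:
        heap[0] = last
        sift_down(heap, 0, len(heap))
    return smallest


heap = build_min_heap([5, 3, 8, 1, 9, 2])
print([remove_min(heap) for _ in range(6)])  # [1, 2, 3, 5, 8, 9]
```

Removing the min repeatedly yields the elements in ascending order, which is heapsort in disguise.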

<h1><b>Counting Sort</b></h1>

Standard Version Time Complexity
$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(n+k)\\
    \text{Avg. Case: }&\Theta(n+k)\\
    \text{Worst Case: }&O(n+k)\\
\end{aligned}
$

In [88]:
""" 
Stable: Yes, provided we count left to right and place elements back right to left, filling descending indices.
[...,3,...,3',...,3'',...] -> [...,_,_,3'',...] -> [...,_,3',3'',...] -> [...,3,3',3'',...]
"""

def counting_sort(A):
    POS = [0] * (max(A)+1) # size k+1, where k is the largest element in A
    ANEW = [0] * len(A) # size n, length of A

    for i in range(0, len(A)): # runs n times
        POS[A[i]] += 1
    
    for i in range(1, len(POS)): # runs k times
        POS[i] += POS[i-1]

    for i in range(len(A)-1, -1, -1): # runs n times, right to left so equal values keep their order
        ANEW[POS[A[i]]-1] = A[i]
        POS[A[i]] -= 1

    for i in range(0, len(A)): # runs n times
        A[i] = ANEW[i]

    return A

<p>Counting sort had multiple sub-sections. If you encounter it again, the first pass step where numbers
are counted will be called the counting array, and the second step will be called the cumulative
array. You should know the runtime, and why it’s Θ(n + k)</p>
<ol>
    <li>What are the weaknesses of counting sort? When should you not use it?
    <ul>
        <li>Counting sort, by itself/one iteration, doesn’t work on decimal values.
        <li>Counting sort is inefficient if the range of values to be sorted is very large.
        <li>Counting sort uses extra space for sorting the array elements (not in-place).
    </ul>
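    <li>Why do we decrement each value in the cumulative array by one?
    <ul>
        <li>To make sure that the next time the same value is placed, it goes one index below and does not overwrite the previous copy.
    </ul>
    <li>What does the cumulative array represent?
    <ul>
        <li>The index into the cumulative array is a value v from A; the entry stored there is the number of elements &le; v, so that entry minus 1 is the last index v occupies in the sorted list.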
    </ul>
</ol>
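
The counting array and the cumulative array are easier to remember with a concrete input. A small sketch, assuming non-negative integers; `counting_arrays` is a hypothetical helper that only builds the two arrays and does not sort.

```python
def counting_arrays(A):
    """Return (counting array, cumulative array) for non-negative integers in A."""
    counts = [0] * (max(A) + 1)
    for value in A:  # first pass: count how often each value occurs
        counts[value] += 1
    cumulative = list(counts)
    for v in range(1, len(cumulative)):  # second pass: running totals
        cumulative[v] += cumulative[v - 1]
    return counts, cumulative


counts, cumulative = counting_arrays([2, 0, 2, 1, 3, 0])
print(counts)      # [2, 1, 2, 1]
print(cumulative)  # [2, 3, 5, 6]
```

Here `cumulative[2]` is 5, so the last 2 goes to index 4 of the sorted list `[0, 0, 1, 2, 2, 3]`.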

<h1><b>Radix Sort</b></h1>

Standard Version Time Complexity
$$

$$
$
\begin{aligned}
    f:\ &T_{\text{underlying}}(n)\\
    \text{Best Case: }&\Omega(df)\\
    \text{Avg. Case: }&\Theta(df)\\
    \text{Worst Case: }&O(df)\\
\end{aligned}
$
$$

$$
Counting Inner Sort: 
$
\left\{\begin{aligned}
    \text{Best Case: }&\Omega(d(n+(b-1)))\\
    \text{Avg. Case: }&\Theta(d(n+(b-1)))\\
    \text{Worst Case: }&O(d(n+(b-1)))\\
\end{aligned}\right.
$
$$

$$
Recursive CBA Sort: 
$
\left\{\begin{aligned}
    \text{Best Case: }&\Omega(dn\lg n)\\
    \text{Avg. Case: }&\Theta(dn\lg n)\\
    \text{Worst Case: }&O_{\text{merge}}(dn\lg n)\ \& \ O_{\text{quick}}(dn^2)\\
\end{aligned}\right.
$
$$

$$
Iterative CBA Sort: 
$
\left\{\begin{aligned}
    \text{Best Case: }&\Omega_{\text{bubble || insertion}}(dn) \ \& \ \Omega_{\text{selection}}(dn^2)\\
    \text{Avg. Case: }&\Theta(dn^2)\\
    \text{Worst Case: }&O(dn^2)\\
\end{aligned}\right.
$

In [89]:
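def radix_sort(A, sort):
    # sort(A, d) must be a stable sort of A on the digit selected by d (1, 10, 100, ...)
    maxval = max(A)
    d = 1
    while maxval // d > 0: # one pass per digit of the largest value
        sort(A, d)
        d = 10*d

    return A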

<p>Understand how radix sort interacts with the underlying sort, the number of digits, and the base
of the numbers it is sorting.</p>
<ol>
    <li>Why does radix sort start with the rightmost digit?
    <ul>
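        <li>Least-Significant Digit (LSD) implementations are usually stable, unlike many Most-Significant Digit (MSD) implementations; with a stable pass, ties on the current digit keep the order set by the earlier, less significant digits.
        <li>It is also simple and memory efficient: LSD works right to left, and the rightmost digit is always at the same position, whereas the index of the leftmost digit may vary (ex. [AB, ABC, A] -> lengths [2, 3, 1]), so starting at the rightmost index stays efficient as we move up the decimal places.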
    </ul>
    <li>Does the underlying sort for radix matter, and if so, how?
    <ul>
        <li>Yes, in terms of complexity, stability, size.
        <li>Complexity: All radix sorts are O(df), but f is the time of the underlying sort, so changing the underlying sort changes the radix sort time. If each digit takes only a small range of values, counting sort is more efficient than CBA sorts.
        <li>Stability: If the underlying sort is stable, the radix sort will be stable too. LSD radix sort in fact needs a stable underlying sort to produce a correctly sorted list.
        <li>Size: If the underlying sort is in-place, less extra space is used, which matters for larger datasets.
    </ul>
    <li>How does the base and length of digits affect the worst, best, and average case?
    <ul>
        <li>More digits (d approaches N) -> More time (O(d) -> O(N)) : More radix loop iterations
        <li>Larger base (b approaches N) -> More time (O(b) -> O(N)) : More underlying sort iterations
    </ul>
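    <li>Is it always better to sort binary numbers with radix sort, so the underlying sort has less work to do?
    <ul>
        <li>No. If the list is already sorted, something like bubble/insertion sort is better: it finishes in one linear pass and doesn't have to check each digit place, even though base 2 gives the underlying sort little work per pass. Binary numbers also have more digits, so radix sort needs more passes.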
    </ul>
</ol>
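
The `radix_sort(A, sort)` cell above expects an underlying `sort(A, d)` that orders `A` by one digit. A possible counting-sort version is sketched here, assuming base 10 and non-negative integers; the names `counting_sort_by_digit` and `radix_sort_lsd` are made up for this example.

```python
def counting_sort_by_digit(A, d):
    """Stable in-place counting sort of A on the base-10 digit selected by d."""
    counts = [0] * 10
    for value in A:
        counts[(value // d) % 10] += 1
    for digit in range(1, 10):  # turn counts into cumulative counts
        counts[digit] += counts[digit - 1]
    out = [0] * len(A)
    for value in reversed(A):  # right to left keeps equal digits in input order
        digit = (value // d) % 10
        counts[digit] -= 1
        out[counts[digit]] = value
    A[:] = out


def radix_sort_lsd(A):
    """LSD radix sort: one stable counting pass per decimal digit."""
    if not A:
        return A
    max_val = max(A)
    d = 1
    while max_val // d > 0:
        counting_sort_by_digit(A, d)
        d *= 10
    return A


print(radix_sort_lsd([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Each pass costs O(n + b) with b = 10, and there are d passes, matching the O(d(n + b)) bound for the counting inner sort.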

<h1><b>Bucket Sort</b></h1>

Standard Version Time Complexity
$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(n) \rightarrow\text{ (Uniform Distribution = 1 elem. per bucket)}\\
    \text{Avg. Case: }&\Theta(n) \rightarrow\text{ (At most 4 elem. per bucket = O(1) underlying sort)}\\
    \text{Worst Case: }&O(O_{underlying}(f(n))) \rightarrow\text{ (All elem. in 1 bucket = Worst case of underlying sort with time f(n))}\\
\end{aligned}
$

In [90]:
import math

def bucket_sort(A, sort):
    n = len(A)
    max_val = max(A) # computed once; calling max(A) inside the loop would cost O(n) per element
    buckets = [[] for _ in range(0, n)]
    ANEW = []

    for i in range(0, n): # runs n times
        buckets[math.floor(n * A[i] / (max_val + 1))].append(A[i]) # assumes non-negative values
    for b in buckets: # runs n * underlying sort (O(1) best & avg case OR O(f) worst case)
        sort(b)
    for b in buckets: # runs n times
        ANEW.extend(b)
    for i in range(0, len(ANEW)): # runs n times
        A[i] = ANEW[i]

    return A

<p>Bucket sort has a number of interesting properties. Know how the buckets are chosen and why. Know how many buckets are typically used, and how the input distribution affects the sort.</p>
<ul>
    <li>Suppose list <b>A</b> of length <b>n</b>. <u># of buckets</u> = <b>n</b> &amp; <u>range per bucket</u> = <b>(max(A)+1)/n</b> 
</ul>
<ol>
    <li>How does the choice of underlying sort affect the runtime of the sort?
    <ul>
        <li>A <u>simpler sort</u> like <i>insertion sort</i> is generally faster for <u>small buckets</u>. It has low overhead (O(n^2) in the worst case, but it often performs much better in practice on the short or nearly-sorted lists found within buckets).
        <li>A more <u>complex sort</u> like <i>merge sort</i> might be more efficient for <u>larger buckets</u> due to its average case time complexity of O(n log n).
    </ul>
    <li>Depending on the distribution of inputs, how might you adjust the buckets?
    <ul>
        <li>If the data has a uniform distribution, then using equal-width buckets is optimal. This gives every element a similar chance of landing in each bucket, so the buckets stay small and the sort is efficient.
        <li>When the data is skewed, using equal-width buckets can become inefficient. Many elements might end up in the same few buckets, increasing the workload for the underlying sort (e.g., insertion sort might need to handle a large number of elements in a single bucket).
        <li>One approach adjusts the bucket boundaries based on the observed distribution of the values. For example, you could use narrower buckets where values are dense and wider buckets where they are sparse. This balances the workload across buckets.
    </ul>
    <li>Can you use bucket sort as your secondary sort?
    <ul>
        <li>Yes, whether the outer sort is radix sort or bucket sort itself, calling bucket sort as the underlying sort is fine, especially if the data set is uniformly distributed (best case).
    </ul>
    <li>When is bucket sort unstable?
    <ul>
        <li>If the underlying sort is unstable, then bucket sort is also unstable.
    </ul>
    <li>What change could you make to double the average number of items per bucket, and what effect would it have?
    <ul>
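        <li>Use half as many buckets (n/2 instead of n, each covering twice the range), so each bucket holds 2 items on average. Less memory goes to buckets and each underlying sort does slightly more work, but for uniform input the buckets are still O(1) in size, so the total stays Θ(n).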
    </ul>
</ol>
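
How the input distribution and the number of buckets change the load per bucket can be checked directly. The sketch below assumes non-negative integers and equal-width buckets, using the same index formula as `bucket_sort` but in integer arithmetic; `bucket_sizes` is a hypothetical helper.

```python
def bucket_sizes(A, num_buckets):
    """Return how many items of A land in each of num_buckets equal-width buckets."""
    max_val = max(A)
    sizes = [0] * num_buckets
    for value in A:
        # same index as floor(num_buckets * value / (max_val + 1))
        sizes[value * num_buckets // (max_val + 1)] += 1
    return sizes


uniform = [0, 10, 20, 30, 40, 50, 60, 70]
skewed = [0, 1, 2, 3, 4, 5, 6, 70]

print(bucket_sizes(uniform, 8))  # [1, 1, 1, 1, 1, 1, 1, 1]
print(bucket_sizes(skewed, 8))   # [7, 0, 0, 0, 0, 0, 0, 1]
print(bucket_sizes(uniform, 4))  # [2, 2, 2, 2]
```

Uniform data gives one item per bucket, skewed data piles almost everything into one bucket, and halving the bucket count doubles the average load per bucket.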

<h1><b>Karatsuba's Multiplication</b></h1>

Standard Version Time Complexity
$$

$$
Two-Digit Multiplication: 
$
\left\{\begin{aligned}
    AB &= (10A_1+A_0)(10B_1+B_0)\\
    &= 100A_1B_1+10(A_1B_0 + A_0B_1)+A_0B_0\\
    &= 100A_1B_1+10(A_1B_0 + A_0B_1 + A_0B_0 + A_1B_1 - A_0B_0 - A_1B_1)+A_0B_0\\
    &= 100A_1B_1+10((A_1+A_0)B_0 + (A_1+A_0)B_1 - A_0B_0 - A_1B_1)+A_0B_0\\
    &= 100A_1B_1+10((A_1+A_0)(B_0 + B_1) - A_0B_0 - A_1B_1)+A_0B_0\\
    &= 100k_1+10(k_2-k_1-k_3)+k_3\\
\end{aligned}\right.
$
$$

$$
Generalized Multiplication: 
$
    AB = 10^nk_1+10^{n/2}(k_2-k_1-k_3)+k_3\\
$
$$

$$
Master Theorem: 
$
\left\{\begin{aligned}
    T(n) &= 3T(n/2) + \Theta(n)\\
    c=1 &\text{ \& } a=3, b=2, \log_ba = \log_23 > 1 \rightarrow \operatorname{Case}\ 1\\
    T(n) &= \boxed{\Theta(n^{\lg 3})}\\
\end{aligned}\right.
$
$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(1) \\
    \text{Avg. Case: }&\Theta(n^{\lg3}) \\
    \text{Worst Case: }&O(n^{\lg3}) \\
\end{aligned}
$

In [91]:
import math

def num_of_digits(x):
    if x == 0:
        return 1  
    
    count = 0
    while x > 0:
        count += 1
        x //= 10
    
    return count

def karatsuba(A, B):
    if num_of_digits(A) <= 1 or num_of_digits(B) <= 1: # base case: A or B is a single digit
        return A * B

    n = max(num_of_digits(A), num_of_digits(B)) // 2 
    split_point = 10 ** n
    A1, A0 = A // split_point, A % split_point
    B1, B0 = B // split_point, B % split_point

    k1 = karatsuba(A1, B1)
    k2 = karatsuba(A1 + A0, B1 + B0)
    k3 = karatsuba(A0, B0)

    return (10 ** (2 * n) * k1) + (10 ** n * (k2 - k1 - k3)) + (k3)
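
The two-digit identity derived above can be checked numerically with a single, non-recursive step. This is only a sketch; `karatsuba_step` is a hypothetical name, and it assumes both inputs are in the range 0 to 99.

```python
def karatsuba_step(A, B):
    """One level of Karatsuba for two numbers below 100, using 3 products."""
    A1, A0 = divmod(A, 10)  # A = 10*A1 + A0
    B1, B0 = divmod(B, 10)  # B = 10*B1 + B0
    k1 = A1 * B1
    k2 = (A1 + A0) * (B1 + B0)
    k3 = A0 * B0
    return 100 * k1 + 10 * (k2 - k1 - k3) + k3


print(karatsuba_step(12, 34))  # 408
```

For 12 x 34: k1 = 3, k2 = 21, k3 = 8, so the result is 300 + 10(21 - 3 - 8) + 8 = 408, with three multiplications instead of four.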