<h1><b>Counting Sort</b></h1>

Standard Version Time Complexity
$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(n+k)\\
    \text{Avg. Case: }&\Theta(n+k)\\
    \text{Worst Case: }&O(n+k)\\
\end{aligned}
$

In [6]:
"""
Stable: Yes. We count left to right, then place elements back right to left, filling descending indices.
[...,3,...,3',...,3'',...] -> [...,_,_,3'',...] -> [...,_,3',3'',...] -> [...,3,3',3'',...]
"""

def counting_sort(A):
    POS = [0] * (max(A)+1) # size k+1, where k is the largest element in A
    ANEW = [0] * len(A) # size n, length of A

    for i in range(0, len(A)): # runs n times
        POS[A[i]] += 1
    
    for i in range(1, len(POS)): # runs k times
        POS[i] += POS[i-1]

    for i in range(len(A)-1, -1, -1): # runs n times, right to left so equal elements keep their order (stable)
        ANEW[POS[A[i]]-1] = A[i]
        POS[A[i]] -= 1

    for i in range(0, len(A)): # runs n times
        A[i] = ANEW[i]

    return A

<p>Counting sort has several phases. The array built in the first pass, where each value is counted,
is called the counting array; after the second pass turns those counts into running totals, it is called
the cumulative array. You should know the runtime and why it is Θ(n + k).</p>
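<p>A rough operation count for the implementation above, as a sketch only: assume each loop iteration costs a constant c<sub>i</sub>, with n = len(A) and k = max(A). No input makes any of the four loops shorter, which is why the best, average, and worst cases coincide.</p>
$
\begin{aligned}
    T(n,k) &= \underbrace{c_1 n}_{\text{count}} + \underbrace{c_2 k}_{\text{cumulative}} + \underbrace{c_3 n}_{\text{place}} + \underbrace{c_4 n}_{\text{copy back}}\\
           &= (c_1 + c_3 + c_4)\,n + c_2 k\\
           &= \Theta(n+k)
\end{aligned}
$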
<ol>
    <li>What are the weaknesses of counting sort? When should you not use it?
    <ul>
        <li>Counting sort, by itself (a single pass), doesn’t work on non-integer (decimal) values, because each value is used directly as an array index.
        <li>Counting sort is inefficient if the range of values to be sorted is very large.
        <li>Counting sort uses extra space for sorting the array elements (not in-place).
    </ul>
    <li>Why do we decrement each value in the cumulative array by one?
    <ul>
        <li>So that the next time an element with the same value is placed in the output, it goes one index lower, keeping equal elements in order.
    </ul>
    <li>What does the cumulative array represent?
    <ul>
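        <li>Each index of the cumulative array is a value that can occur in A. The entry stored there is the number of elements less than or equal to that value, so (entry - 1) is the output index of the last element with that value.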
    </ul>
</ol>
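<p>The counting array, the cumulative array, and the decrement step can be traced on a small input. This is a minimal sketch, not the notebook's implementation: the function name <code>counting_sort_trace</code>, its <code>key</code> parameter, and the tagged sample records are hypothetical, and keys are assumed to be small non-negative integers.</p>

```python
def counting_sort_trace(items, key=lambda x: x):
    """Return (counts, cumulative, output) for items with small non-negative integer keys."""
    k = max(key(x) for x in items)

    counts = [0] * (k + 1)
    for x in items:                      # pass 1: build the counting array
        counts[key(x)] += 1

    cumulative = counts[:]               # pass 2: turn counts into running totals
    for v in range(1, k + 1):
        cumulative[v] += cumulative[v - 1]

    pos = cumulative[:]                  # working copy that gets decremented
    output = [None] * len(items)
    for x in reversed(items):            # pass 3: right to left keeps ties in order
        pos[key(x)] -= 1                 # (entry - 1) is the next free slot for this key
        output[pos[key(x)]] = x
    return counts, cumulative, output


# Hypothetical records: (key, tag). The tags show that equal keys keep their order.
records = [(3, "a"), (1, "b"), (3, "c"), (0, "d"), (3, "e")]
counts, cumulative, output = counting_sort_trace(records, key=lambda r: r[0])
print(counts)      # [1, 1, 0, 3]
print(cumulative)  # [1, 2, 2, 5]
print(output)      # [(0, 'd'), (1, 'b'), (3, 'a'), (3, 'c'), (3, 'e')]
```

<p>The three records with key 3 come out in their original order a, c, e, which is what stability means here.</p>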

<h1><b>Radix Sort</b></h1>

Standard Version Time Complexity
$$

$$
$
\begin{aligned}
    f:\ &T_{\text{underlying}}(n)\\
    \text{Best Case: }&\Omega(df)\\
    \text{Avg. Case: }&\Theta(df)\\
    \text{Worst Case: }&O(df)\\
\end{aligned}
$
$$

$$
Counting Inner Sort: 
$
\left\{\begin{aligned}
    \text{Best Case: }&\Omega(d(n+(b-1)))\\
    \text{Avg. Case: }&\Theta(d(n+(b-1)))\\
    \text{Worst Case: }&O(d(n+(b-1)))\\
\end{aligned}\right.
$
$$

$$
Recursive CBA Sort: 
$
\left\{\begin{aligned}
    \text{Best Case: }&\Omega(dn\lg n)\\
    \text{Avg. Case: }&\Theta(dn\lg n)\\
    \text{Worst Case: }&O_{\text{merge}}(dn\lg n)\ \& \ O_{\text{quick}}(dn^2)\\
\end{aligned}\right.
$
$$

$$
Iterative CBA Sort: 
$
\left\{\begin{aligned}
    \text{Best Case: }&\Omega_{\text{bubble || insertion}}(dn) \ \& \ \Omega_{\text{selection}}(dn^2)\\
    \text{Avg. Case: }&\Theta(dn^2)\\
    \text{Worst Case: }&O(dn^2)\\
\end{aligned}\right.
$
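<p>As a rough worked comparison, assume n = 1000 keys with d = 3 decimal digits (so b = 10), and treat the growth terms as approximate operation counts with constants ignored. The specific numbers are illustrative only.</p>
$
\begin{aligned}
    \text{Counting inner sort: }& d(n+(b-1)) = 3(1000+9) = 3027\\
    \text{Merge inner sort: }& dn\lg n \approx 3\cdot 1000\cdot 9.97 \approx 29{,}900\\
    \text{Insertion inner sort (worst): }& dn^2 = 3\cdot 1000^2 = 3{,}000{,}000\\
\end{aligned}
$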

In [7]:
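def radix_sort(A, sort):
    maxval = max(A)
    d = 1 # current place value: 1s, 10s, 100s, ...
    while d <= maxval: # <= so the top digit is not skipped when maxval is a power of 10
        sort(A, d) # sort is expected to stably sort A in place by the digit at place d
        d = 10*d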

<p>Understand how radix sort interacts with the underlying sort, the number of digits, and the base
of the numbers it is sorting.</p>
<ol>
    <li>Why does radix sort start with the rightmost digit?
    <ul>
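        <li>Least-Significant Digit (LSD) implementations are usually stable, unlike many Most-Significant Digit (MSD) implementations: each later pass on a more significant digit preserves the order set by earlier passes, provided the underlying sort is stable.
        <li>It is memory efficient. The rightmost digit sits at the same place for every key, while the position of the leftmost digit varies with key length (e.g. [AB, ABC, A] has lengths [2, 3, 1]). Working right to left lets every pass handle the whole list at once, with no extra bookkeeping as more digit places are added.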
    </ul>
    <li>Does the underlying sort for radix matter, and if so, how?
    <ul>
        <li>Yes, in terms of complexity, stability, size.
        <li>Complexity: All radix sorts are O(df), but f is the time of the underlying sort, so changing the underlying sort changes the radix sort time. When the digits come from a small range (a small base), counting sort is more efficient than using CBA sorts.
        <li>Stability: If the underlying sort is stable, the radix sort will be stable too.
        <li>Size: If the underlying sort is in-place, less extra space is used, which matters more for larger datasets.
    </ul>
    <li>How does the base and length of digits affect the worst, best, and average case?
    <ul>
        <li>More digits (d approaches N) -> More time (O(d) -> O(N)) : More radix loop iterations
        <li>Larger base (b approaches N) -> More time (O(b) -> O(N)) : More underlying sort iterations
    </ul>
    <li>Is it always better to sort binary numbers with radix sort, so the underlying sort has less work to do?
    <ul>
        <li>No. If the list is already sorted, something like bubble or insertion sort is better: it finishes in a single pass and doesn’t have to check every digit place, even though the base is only 2.
    </ul>
</ol>
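<p>To make the interaction between radix sort and its underlying sort concrete, here is a minimal sketch of a stable per-digit counting sort that matches the <code>sort(A, d)</code> call shape used above, where <code>d</code> is the place value (1, 10, 100, ...). The names <code>counting_sort_by_digit</code> and <code>radix_sort_lsd</code> and the <code>base</code> parameter are hypothetical, and the keys are assumed to be non-negative integers.</p>

```python
def counting_sort_by_digit(A, place, base=10):
    """Stably sort A in place by the digit at `place` (1, 10, 100, ... for base 10)."""
    counts = [0] * base
    for x in A:                               # count each digit value: n steps
        counts[(x // place) % base] += 1
    for digit in range(1, base):              # running totals: b - 1 steps
        counts[digit] += counts[digit - 1]
    out = [0] * len(A)
    for x in reversed(A):                     # right to left keeps the pass stable
        digit = (x // place) % base
        counts[digit] -= 1
        out[counts[digit]] = x
    A[:] = out                                # copy back so the caller sees the result


def radix_sort_lsd(A, base=10):
    """LSD radix sort for non-negative integers; returns A for convenience."""
    if not A:
        return A
    largest = max(A)
    place = 1
    while largest // place > 0:               # one pass per digit of the largest key
        counting_sort_by_digit(A, place, base)
        place *= base
    return A


data = [170, 45, 75, 90, 802, 24, 2, 66, 10]
print(radix_sort_lsd(data))  # [2, 10, 24, 45, 66, 75, 90, 170, 802]
```

<p>Each pass costs about n + (b - 1) steps and there are d passes, which is where the d(n + (b - 1)) bound for the counting inner sort comes from. A larger <code>base</code> means fewer passes but a longer counts array per pass.</p>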

<h1><b>Bucket Sort</b></h1>

Standard Version Time Complexity
$$

$$
$
\begin{aligned}
    \text{Best Case: }&\Omega(n) \rightarrow\text{ (Uniform Distribution = 1 elem. per bucket)}\\
    \text{Avg. Case: }&\Theta(n) \rightarrow\text{ (At most 4 elem. per bucket = O(1) underlying sort)}\\
    \text{Worst Case: }&O(O_{\text{underlying}}(f(n))) \rightarrow\text{ (All elem. in 1 bucket = Worst case of underlying sort with time f(n))}\\
\end{aligned}
$

In [8]:
import math

def bucket_sort(A, sort):
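    n = len(A)
    hi = max(A) + 1 # computed once; calling max(A) inside the loop would cost O(n) per element
    buckets = [[] for _ in range(n)] # one bucket per element
    ANEW = []

    for i in range(n): # runs n times
        # values are assumed non-negative, so the bucket index falls in 0..n-1
        buckets[math.floor(n * A[i] / hi)].append(A[i])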
    for b in buckets: # runs n * underlying sort (O(1) best & avg case OR O(f) worst case)
        sort(b)
    for b in buckets: # runs n times
        ANEW.extend(b)
    for i in range(0, len(ANEW)): # runs n times
        A[i] = ANEW[i]

    return A

<p>Bucket sort has a number of interesting properties. Know how the buckets are chosen and why. Know how many buckets are typically used, and how the input distribution affects the sort.</p>
<ul>
    <li>Suppose list <b>A</b> of length <b>n</b>. <u># of buckets</u> = <b>n</b> and <u>range per bucket</u> = <b>(max(A) + 1)/n</b>
</ul>
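<p>The bucket choice can be checked on a small list. This sketch assumes non-negative values and uses the same index formula as the implementation above; the helper name <code>bucket_index</code> and the sample list are hypothetical.</p>

```python
import math


def bucket_index(x, n, largest):
    """Index of the bucket for x when n buckets cover the range [0, largest]."""
    return math.floor(n * x / (largest + 1))


A = [29, 3, 11, 40, 17]
n, largest = len(A), max(A)
indices = [bucket_index(x, n, largest) for x in A]
print(indices)  # [3, 0, 1, 4, 2]
```

<p>With n = 5 and max(A) = 40, each bucket covers a range of width 41/5 = 8.2, so these five values land in five different buckets.</p>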
<ol>
    <li>How does the choice of underlying sort affect the runtime of the sort?
    <ul>
        <li>A <u>simpler sort</u> like <i>insertion sort</i> is generally faster for <u>small buckets</u>. Its worst case is O(n^2), but it has very little overhead and often performs much better in practice on short lists like those within buckets.
        <li>A more <u>complex sort</u> like <i>merge sort</i> might be more efficient for <u>larger buckets</u> due to its O(n log n) time complexity.
    </ul>
    <li>Depending on the distribution of inputs, how might you adjust the buckets?
    <ul>
        <li>If the data has a uniform distribution, then using equal-sized buckets is optimal. Every bucket is equally likely to receive each element, so the elements spread evenly and each bucket stays small.
        <li>When the data is skewed, using equal-sized buckets can become inefficient. Many elements might end up in the same few buckets, increasing the workload for the underlying sort (e.g., insertion sort might need to handle a large number of elements in a single bucket).
        <li>One approach adjusts bucket ranges based on the observed distribution of values. For example, you could allocate more, narrower buckets to value ranges with a higher frequency and fewer, wider buckets to less frequent ranges. This balances the workload across buckets.
    </ul>
    <li>Can you use bucket sort as your secondary sort?
    <ul>
        <li>Yes. Whether the outer sort is radix sort or bucket sort itself, using bucket sort as the underlying sort is fine, especially if the data is uniformly distributed (best case).
    </ul>
    <li>When is bucket sort unstable?
    <ul>
        <li>If the underlying sort is unstable, then bucket sort is also unstable.
    </ul>
    <li>What change could you make to double the average number of items per bucket, and what effect would it have?
    <ul>
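        <li>Use half as many buckets (n/2 instead of n), which doubles the range covered by each bucket. The average bucket then holds 2 items instead of 1. That is still a constant, so the expected runtime stays Θ(n), while the bucket list uses about half the memory.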
    </ul>
</ol>
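<p>The effect of the input distribution and of the number of buckets can be explored by counting how many elements land in each bucket. This is a small sketch under stated assumptions: the helper <code>bucket_occupancy</code> and the two sample lists are hypothetical, values are non-negative, and the index formula is the same one used in the implementation above.</p>

```python
import math
from collections import Counter


def bucket_occupancy(A, num_buckets):
    """Return a list with the number of elements that land in each bucket."""
    largest = max(A)
    hits = Counter(math.floor(num_buckets * x / (largest + 1)) for x in A)
    return [hits.get(i, 0) for i in range(num_buckets)]


uniform = [0, 10, 20, 30, 40, 50, 60, 70]   # evenly spread values
skewed = [0, 1, 2, 3, 4, 5, 6, 70]          # most values crowd the low end

print(bucket_occupancy(uniform, 8))  # [1, 1, 1, 1, 1, 1, 1, 1]
print(bucket_occupancy(uniform, 4))  # [2, 2, 2, 2]
print(bucket_occupancy(skewed, 8))   # [7, 0, 0, 0, 0, 0, 0, 1]
```

<p>Halving the bucket count on the uniform list doubles the items per bucket but keeps them balanced. On the skewed list, almost everything falls into one bucket, so the underlying sort does nearly all the work.</p>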