<a href="https://colab.research.google.com/github/Ryan-M-Smith/CS315/blob/main/HW06/quickselect.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Quickselect Algorithm Analysis

### Implementation

In [1]:
import numpy as np
import random
import time

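def quickselect(arr: list[int], i: int) -> int:
  """Return the i-th smallest element of arr (1-indexed), using a random pivot."""
  if not 1 <= i <= len(arr):
    raise ValueError(f"i must be between 1 and {len(arr)}, got {i}")
  if len(arr) == 1:
    return arr[0]

  # Three-way partition around a randomly chosen pivot
  pivot  = random.choice(arr)
  lows   = [x for x in arr if x < pivot]
  highs  = [x for x in arr if x > pivot]
  pivots = [x for x in arr if x == pivot]

  if i <= len(lows):
    # The answer is among the elements smaller than the pivot
    return quickselect(lows, i)
  elif i <= len(lows) + len(pivots):
    # The rank falls inside the run of pivot-valued elements
    return pivot
  else:
    # Skip past lows and pivots, adjusting the rank accordingly
    return quickselect(highs, i - len(lows) - len(pivots))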
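
Before timing the function, it can help to confirm that it returns the right order statistic. The sketch below is one possible spot check, not part of the original assignment: `check_select` and `select_by_sorting` are hypothetical helper names, and the check assumes the 1-indexed convention used by `quickselect` above, so the expected answer is `sorted(arr)[i - 1]`.

```python
import random

def check_select(select, trials=200, seed=0):
  """Return True if select(arr, i) equals sorted(arr)[i - 1] on random inputs."""
  rng = random.Random(seed)  # local generator so the check is repeatable
  for _ in range(trials):
    size = rng.randint(1, 40)
    # A small value range forces duplicates, which exercises the pivots branch
    data = [rng.randint(0, 20) for _ in range(size)]
    i = rng.randint(1, size)  # 1-indexed rank
    if select(list(data), i) != sorted(data)[i - 1]:
      return False
  return True

def select_by_sorting(arr, i):
  """Reference answer: sort, then index the i-th smallest (1-indexed)."""
  return sorted(arr)[i - 1]

print(check_select(select_by_sorting))  # prints True
```

In the notebook, calling `check_select(quickselect)` after the implementation cell would run the same comparison against the function under test.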

### Test Setup

In [2]:
rng = np.random.default_rng()  # unseeded, so the random inputs differ between runs
SIZES = [10, 50, 100, 1000, 2500, 7500, 10000, 12500, 17500, 20000]
TEST_RUNS = len(SIZES)  # one timed run per input size

### Sorted Data

In [3]:
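for i in range(TEST_RUNS):
  n = SIZES[i]
  statistic = n // 2    # 1-indexed rank near the median
  arr = list(range(n))  # already sorted in ascending order

  # Time a single call; perf_counter() returns seconds
  start = time.perf_counter()
  quickselect(arr, statistic)
  end = time.perf_counter()

  print(f"n = {n:<8,} time = {(end - start) * 1_000:.3f} ms")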

n = 10       time = 0.017 ms
n = 50       time = 0.043 ms
n = 100      time = 0.041 ms
n = 1,000    time = 0.245 ms
n = 2,500    time = 0.369 ms
n = 7,500    time = 1.701 ms
n = 10,000   time = 2.744 ms
n = 12,500   time = 3.691 ms
n = 17,500   time = 3.826 ms
n = 20,000   time = 3.159 ms


### Reverse-sorted Data

In [4]:
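for i in range(TEST_RUNS):
  n = SIZES[i]
  statistic = n // 2          # 1-indexed rank near the median
  arr = list(range(n))[::-1]  # sorted in descending order

  # Time a single call; perf_counter() returns seconds
  start = time.perf_counter()
  quickselect(arr, statistic)
  end = time.perf_counter()

  print(f"n = {n:<8,} time = {(end - start) * 1_000:.3f} ms")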

n = 10       time = 0.012 ms
n = 50       time = 0.026 ms
n = 100      time = 0.046 ms
n = 1,000    time = 0.375 ms
n = 2,500    time = 0.568 ms
n = 7,500    time = 1.414 ms
n = 10,000   time = 4.758 ms
n = 12,500   time = 4.985 ms
n = 17,500   time = 5.377 ms
n = 20,000   time = 7.898 ms


### Random Data

In [5]:
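for i in range(TEST_RUNS):
  n = SIZES[i]
  statistic = n // 2  # 1-indexed rank near the median
  # Values are drawn from [0, 100), so larger inputs contain many duplicates
  arr = rng.integers(low=0, high=100, size=n, dtype=int).tolist()

  # Time a single call; perf_counter() returns seconds
  start = time.perf_counter()
  quickselect(arr, statistic)
  end = time.perf_counter()

  print(f"n = {n:<8,} time = {(end - start) * 1_000:.3f} ms")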

n = 10       time = 0.021 ms
n = 50       time = 0.017 ms
n = 100      time = 0.051 ms
n = 1,000    time = 0.274 ms
n = 2,500    time = 0.662 ms
n = 7,500    time = 1.901 ms
n = 10,000   time = 1.846 ms
n = 12,500   time = 3.267 ms
n = 17,500   time = 4.318 ms
n = 20,000   time = 4.925 ms
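
Each measurement above comes from a single call, so scheduler noise and the random pivot choices can move an individual number noticeably. One common way to steady the timings is to repeat each call and keep the median. The following is a minimal sketch under that assumption: `median_time_ms` is a hypothetical helper, and `sorted` is used only as a stand-in workload so the snippet runs on its own.

```python
import statistics
import time

def median_time_ms(func, *args, repeats=5):
  """Call func(*args) `repeats` times and return the median wall time in ms."""
  samples = []
  for _ in range(repeats):
    start = time.perf_counter()
    func(*args)
    samples.append((time.perf_counter() - start) * 1_000)
  return statistics.median(samples)

# Stand-in workload; in the notebook this would be quickselect(arr, statistic)
data = list(range(10_000))
print(f"median = {median_time_ms(sorted, data):.3f} ms")
```

In the cells above, `median_time_ms(quickselect, arr, statistic)` could replace the single `start`/`end` measurement.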


## Results

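To test the measurements against the average-case $\Theta(n)$ runtime and the worst-case $\Theta(n^2)$ runtime, we divide each measured time $T(n)$ by $n$ and by $n^2$. If $T(n)$ grows linearly, $\frac{T(n)}{n}$ should level off at a constant while $\frac{T(n)}{n^2}$ shrinks toward zero; if $T(n)$ grows quadratically, $\frac{T(n)}{n^2}$ should level off instead. All times are in milliseconds, so the ratios are in ms per element and ms per element squared: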

<table border="1" cellpadding="4" style="border-collapse: collapse; text-align: center;">
  <tr>
    <th colspan="1"/>
    <th colspan="3">Sorted Data</th>
    <th colspan="3">Reverse-sorted Data</th>
    <th colspan="3">Random Data</th>
  </tr>
  <tr>
    <th>$n$</th>
    <th>$T(n)$ (ms)</th><th>$\dfrac{T(n)}{n}$</th><th>$\dfrac{T(n)}{n^2}$</th>
    <th>$T(n)$ (ms)</th><th>$\dfrac{T(n)}{n}$</th><th>$\dfrac{T(n)}{n^2}$</th>
    <th>$T(n)$ (ms)</th><th>$\dfrac{T(n)}{n}$</th><th>$\dfrac{T(n)}{n^2}$</th>
  </tr>

  <tr>
    <td>10</td>
    <td>0.017</td><td>0.0017</td><td>0.00017</td>
    <td>0.012</td><td>0.0012</td><td>0.00012</td>
    <td>0.021</td><td>0.0021</td><td>0.00021</td>
  </tr>
  <tr>
    <td>50</td>
    <td>0.043</td><td>0.00086</td><td>0.000017</td>
    <td>0.026</td><td>0.00052</td><td>0.0000104</td>
    <td>0.017</td><td>0.00034</td><td>0.0000068</td>
  </tr>
  <tr>
    <td>100</td>
    <td>0.041</td><td>0.00041</td><td>0.0000041</td>
    <td>0.046</td><td>0.00046</td><td>0.0000046</td>
    <td>0.051</td><td>0.00051</td><td>0.0000051</td>
  </tr>
  <tr>
    <td>1,000</td>
    <td>0.245</td><td>0.000245</td><td>0.000000245</td>
    <td>0.375</td><td>0.000375</td><td>0.000000375</td>
    <td>0.274</td><td>0.000274</td><td>0.000000274</td>
  </tr>
  <tr>
    <td>2,500</td>
    <td>0.369</td><td>0.0001476</td><td>0.000000059</td>
    <td>0.568</td><td>0.000227</td><td>0.000000091</td>
    <td>0.662</td><td>0.000265</td><td>0.000000106</td>
  </tr>
  <tr>
    <td>7,500</td>
    <td>1.701</td><td>0.0002268</td><td>0.0000000302</td>
    <td>1.414</td><td>0.000189</td><td>0.0000000252</td>
    <td>1.901</td><td>0.000253</td><td>0.000000034</td>
  </tr>
  <tr>
    <td>10,000</td>
    <td>2.744</td><td>0.0002744</td><td>0.0000000274</td>
    <td>4.758</td><td>0.0004758</td><td>0.0000000476</td>
    <td>1.846</td><td>0.0001846</td><td>0.0000000185</td>
  </tr>
  <tr>
    <td>12,500</td>
    <td>3.691</td><td>0.0002953</td><td>0.0000000236</td>
    <td>4.985</td><td>0.0003988</td><td>0.0000000319</td>
    <td>3.267</td><td>0.000261</td><td>0.0000000209</td>
  </tr>
  <tr>
    <td>17,500</td>
    <td>3.826</td><td>0.0002186</td><td>0.0000000125</td>
    <td>5.377</td><td>0.000307</td><td>0.0000000176</td>
    <td>4.318</td><td>0.000247</td><td>0.0000000141</td>
  </tr>
  <tr>
    <td>20,000</td>
    <td>3.159</td><td>0.0001579</td><td>0.0000000079</td>
    <td>7.898</td><td>0.0003949</td><td>0.0000000197</td>
    <td>4.925</td><td>0.000246</td><td>0.0000000123</td>
  </tr>
</table>
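
The ratio columns can be recomputed directly from the measured times, which is a convenient way to double-check the table. The sketch below does this for the sorted-data run only; the `ratios` helper is a hypothetical name, and the times are copied from the output of the sorted-data cell.

```python
# Input sizes and measured times (ms) from the sorted-data run
sizes = [10, 50, 100, 1000, 2500, 7500, 10000, 12500, 17500, 20000]
sorted_ms = [0.017, 0.043, 0.041, 0.245, 0.369, 1.701, 2.744, 3.691, 3.826, 3.159]

def ratios(n, t_ms):
  """Return (T(n)/n, T(n)/n^2) for a time given in milliseconds."""
  return t_ms / n, t_ms / n**2

for n, t in zip(sizes, sorted_ms):
  per_n, per_n2 = ratios(n, t)
  print(f"n = {n:<8,} T/n = {per_n:.3e}  T/n^2 = {per_n2:.3e}")
```

Swapping in the reverse-sorted or random timings reproduces the other two column groups.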



## Analysis

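Looking at the ratios, once $n \ge 1{,}000$ the values in the $\frac{T(n)}{n}$ columns stop trending and stay within a narrow band of roughly $1.5 \times 10^{-4}$ to $5 \times 10^{-4}$ ms per element for all three input types, so the runtime grows in proportion to $n$. **This strongly suggests that the algorithm runs in $\Theta\left(n\right)$ time on average.** By contrast, $\frac{T(n)}{n^2}$ keeps shrinking toward zero as $n$ gets larger, falling roughly in proportion to $\frac{1}{n}$ (from about $3 \times 10^{-7}$ at $n = 1{,}000$ to about $10^{-8}$ at $n = 20{,}000$), which is what we expect when $T(n)$ grows more slowly than $n^2$. Sorted and reverse-sorted inputs behave much like random input because the pivot is chosen at random, so input order alone cannot force the $\Theta(n^2)$ worst case. Each size was timed only once, so individual entries are noisy (for example, the sorted run at $n = 20{,}000$ finished faster than the one at $n = 17{,}500$).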
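
As a complementary check that is not part of the original analysis, the growth exponent can be estimated by fitting a straight line to $\log T(n)$ against $\log n$: a slope near 1 is consistent with linear growth and a slope near 2 with quadratic growth. The sketch below uses a plain least-squares fit on the random-data timings for $n \ge 1{,}000$; `loglog_slope` is a hypothetical helper name, and with so few noisy points the result should be read as a rough indication only.

```python
import math

def loglog_slope(sizes, times):
  """Least-squares slope of log(time) versus log(size)."""
  xs = [math.log(n) for n in sizes]
  ys = [math.log(t) for t in times]
  mean_x = sum(xs) / len(xs)
  mean_y = sum(ys) / len(ys)
  num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
  den = sum((x - mean_x) ** 2 for x in xs)
  return num / den

# Random-data timings (ms) for n >= 1,000, copied from the results table
sizes = [1000, 2500, 7500, 10000, 12500, 17500, 20000]
random_ms = [0.274, 0.662, 1.901, 1.846, 3.267, 4.318, 4.925]

print(f"estimated exponent: {loglog_slope(sizes, random_ms):.2f}")
```

The smallest inputs are left out of the fit because fixed per-call overhead dominates their timings.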
