### Step 0: Setup
##### If you're using Datahub:
* Run the cell below **and restart the kernel if needed**

##### If you're running locally:
* Make sure you've activated the conda environment: `conda activate cs170`
* Launch jupyter: `jupyter lab`
* Run the cell below **and restart the kernel if needed**

# Quickselect
In this notebook, we will implement the quickselect algorithm. Quickselect is an efficient divide-and-conquer algorithm for finding the $k$-th smallest element of an unsorted array. We will first demonstrate a naive solution to this problem, then implement quickselect and compare the two.

The full algorithm is detailed at https://people.eecs.berkeley.edu/~vazirani/algorithms/chap2.pdf#page=10.

In [3]:
# Install dependencies
!pip install -r requirements.txt --quiet

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
datasets 2.2.1 requires pyarrow>=6.0.0, but you have pyarrow 5.0.0 which is incompatible.

In [4]:
import otter

assert (
    otter.__version__ >= "4.2.1"
), "Please reinstall the requirements and restart your kernel."

grader = otter.Notebook("quickselect.ipynb")
import numpy.random as random

rng_seed = 0

## Naive Solution

The naive solution to the problem is as follows: 
1. sort the input array 
2. return the $k$-th element

In [5]:
def naive_select(array, k):
    sorted_array = sorted(array)
    return sorted_array[k]

We can run this on a few test cases to check that it works.

In [6]:
array1 = [6, 1, 3, 5, 7, 5, 8]
array2 = [10, 4, 7, 2, 8, 9]
array3 = [12, 4, 6, 8, 3, 4, 2]

print("The smallest element of ", array1, " is ", naive_select(array1, 0))
print("The median element of ", array2, " is ", naive_select(array2, len(array2) // 2))
print("The largest element of ", array3, " is ", naive_select(array3, len(array3) - 1))

The smallest element of  [6, 1, 3, 5, 7, 5, 8]  is  1
The median element of  [10, 4, 7, 2, 8, 9]  is  8
The largest element of  [12, 4, 6, 8, 3, 4, 2]  is  12


### Runtime analysis

This algorithm first sorts the array, which takes $O(n \log n)$ time with an efficient comparison sort (Python's built-in `sorted` meets this bound in the worst case), and then indexes into the array, which takes $O(1)$ time. Thus, the algorithm takes $O(n \log n)$ time overall.

This is not a very efficient solution, however, since it is unnecessary to sort the entire array just to find one element. Thus, we will next explore quickselect.
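
To make the $O(n \log n)$ cost concrete, the sketch below counts the comparisons `sorted` performs on shuffled inputs of growing size. `CountingInt` and `count_sort_comparisons` are hypothetical helpers introduced only for this illustration, and the standard-library `random` module is imported under the alias `py_random` so that it does not clash with the notebook's `numpy.random` alias.

```python
import math
import random as py_random  # stdlib RNG, aliased to avoid shadowing numpy.random


class CountingInt:
    """Hypothetical helper: wraps an int and counts every `<` comparison."""

    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        CountingInt.comparisons += 1
        return self.value < other.value


def count_sort_comparisons(n, seed=0):
    """Sort a shuffled permutation of range(n); return the comparisons made."""
    rng = py_random.Random(seed)
    values = [CountingInt(v) for v in rng.sample(range(n), n)]
    CountingInt.comparisons = 0
    sorted(values)
    return CountingInt.comparisons


for n in (100, 1_000, 10_000):
    count = count_sort_comparisons(n)
    print(f"n={n:>6}: {count} comparisons, n*log2(n) ~ {n * math.log2(n):.0f}")
```

The count should grow roughly in proportion to $n \log_2 n$, which is far more work than is needed to identify a single element.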

## Write a D&C Solution

Quickselect is a randomized divide and conquer algorithm which is able to solve this problem in expected $O(n)$ time. See https://people.eecs.berkeley.edu/~vazirani/algorithms/chap2.pdf#page=11 for a detailed runtime analysis. The main idea of the algorithm is as follows:

1. Randomly select a pivot element from the array
2. Partition the array into three partitions (the elements less than, equal to, and greater than the pivot)
3. Recurse on the partition which must contain the $k$-th smallest element

With this in mind, please implement the quickselect algorithm by replacing the ellipses "..." with your solution.
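
Before writing the full function, it may help to see a single partition step in isolation. The sketch below uses a hypothetical helper, `partition_three_way`, and a hand-picked pivot instead of a random one so that the result is easy to follow. It illustrates steps 2 and 3 only and is not the required implementation.

```python
def partition_three_way(array, pivot):
    """Split array into elements less than, equal to, and greater than pivot."""
    less = [x for x in array if x < pivot]
    equal = [x for x in array if x == pivot]
    greater = [x for x in array if x > pivot]
    return less, equal, greater


example = [6, 1, 3, 5, 7, 5, 8]
less, equal, greater = partition_three_way(example, pivot=5)
print(less, equal, greater)  # [1, 3] [5, 5] [6, 7, 8]

# Decide which partition holds the element at sorted index k (0-indexed).
k = 4
if k < len(less):
    where = "less"      # recurse into `less` with the same k
elif k < len(less) + len(equal):
    where = "equal"     # the answer is the pivot itself
else:
    where = "greater"   # recurse into `greater` with k - len(less) - len(equal)
print(where)  # greater (k becomes 4 - 2 - 2 = 0 in the recursive call)
```

Here the recursive call would look for the smallest element of `[6, 7, 8]`, which is 6, matching index 4 of the sorted array `[1, 3, 5, 5, 6, 7, 8]`.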

In [7]:
def quick_select(array, k):
    """
    Returns the k-th smallest element of the array.

    Args:
        array (List[int]): List of integers to select from.
        k (int): The order statistic to select (0 is the smallest, len(array)-1 is the largest).
    """
    # randomly pick a pivot (numpy's randint excludes the upper bound)
    v = array[random.randint(0, len(array))]

    partition1 = []  # elements less than the pivot
    partition2 = []  # elements equal to the pivot
    partition3 = []  # elements greater than the pivot
    # assign each element to the appropriate partition
    for x in array:
        if x < v:
            partition1.append(x)
        elif x == v:
            partition2.append(x)
        else:
            partition3.append(x)
    # recurse on the partition which contains the k-th smallest element
    if k < len(partition1):
        return quick_select(partition1, k)
    elif k < len(partition1) + len(partition2):
        # the k-th smallest element equals the pivot
        return v
    else:
        # skip the elements that are known to be smaller than the target
        return quick_select(partition3, k - len(partition1) - len(partition2))
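
As a rough sketch of why the expected running time is linear (following the argument in the chapter linked above, and not a full proof): call a pivot "good" if it falls between the 25th and 75th percentiles of the array. A random pivot is good with probability $1/2$, so on average two partitioning passes are enough to shrink the array to at most $3/4$ of its size. Writing $T(n)$ for the expected running time on an array of length $n$, and $c$ for an assumed constant such that the expected partitioning work before a good pivot is found is at most $cn$:

$$T(n) \le T(3n/4) + cn \le cn\left(1 + \tfrac{3}{4} + \left(\tfrac{3}{4}\right)^2 + \cdots\right) = 4cn = O(n).$$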

We can then test the function on the same set of arrays as before to check for correctness.

In [8]:
array1 = [6, 1, 3, 5, 7, 5, 8]
array2 = [10, 4, 7, 2, 8, 9]
array3 = [12, 4, 6, 8, 3, 4, 2]

print("The smallest element of ", array1, " is ", quick_select(array1, 0))
print("The median element of ", array2, " is ", quick_select(array2, len(array2) // 2))
print("The largest element of ", array3, " is ", quick_select(array3, len(array3) - 1))

The smallest element of  [6, 1, 3, 5, 7, 5, 8]  is  1
The median element of  [10, 4, 7, 2, 8, 9]  is  8
The largest element of  [12, 4, 6, 8, 3, 4, 2]  is  12


### Verification

For a more thorough test, we can check that `quick_select` returns the same elements as `naive_select` for a large number of random arrays. Naive algorithms are often much simpler to implement and verify than more efficient ones. Thus, one way to verify the correctness of our implementation is to compare it against the naive implementation, which we know to be correct.

The following block of code generates 1000 random arrays and 1000 random values for k, and checks that both solutions return the same answer each time. If your implementation is correct, the following code will print "success".

**This cell is not used for grading; feel free to modify it to help you debug.**

In [9]:
for i in range(1000):
    array = random.randint(1000, size=1000)
    k = random.randint(1000)

    assert naive_select(array, k) == quick_select(array, k)

print("success")

success
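
If the check above fails, the bare `assert` does not say which input caused the disagreement. One possible debugging aid, sketched here under stated assumptions, is a helper that returns the first counterexample it finds. `find_mismatch`, `reference_select`, and `buggy_select` are hypothetical names used only for this illustration. The sketch draws small arrays with many duplicates from the standard-library `random` module (aliased as `py_random`) instead of `numpy.random`, since short inputs are easier to trace by hand.

```python
import random as py_random  # stdlib RNG; the notebook itself uses numpy.random


def find_mismatch(reference, candidate, trials=1000, size=20, max_value=10, seed=0):
    """Return (array, k, expected, actual) for the first disagreement, else None."""
    rng = py_random.Random(seed)
    for _ in range(trials):
        array = [rng.randrange(max_value) for _ in range(size)]
        k = rng.randrange(size)
        expected = reference(list(array), k)
        actual = candidate(list(array), k)
        if expected != actual:
            return array, k, expected, actual
    return None


def reference_select(array, k):
    # Same idea as naive_select above: sort, then index.
    return sorted(array)[k]


def buggy_select(array, k):
    # Deliberately wrong stand-in (off by one) to show what a report looks like.
    return sorted(array)[max(k - 1, 0)]


print(find_mismatch(reference_select, reference_select))  # None
print(find_mismatch(reference_select, buggy_select))
```

In the notebook, calling `find_mismatch(naive_select, quick_select)` would report a failing array and `k` if the two implementations ever disagree on these inputs.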


Now, check your implementation against the autograder's test cases:

_Points:_ 5

In [10]:
grader.check("q1")

Testing correctness...


100%|██████████| 999/999 [00:00<00:00, 1978.08it/s]


Testing speed...


100%|██████████| 100/100 [00:22<00:00,  4.44it/s]


## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit.

In [None]:
grader.export(pdf=False, force_save=True, run_tests=True)

Running your submission against local test cases...

