# Selection Algorithms
### Selection by sorting
Items in a list may undergo statistical inquiries such as finding the mean, median, and mode values. Finding the mean and the mode does not require the list to be ordered. The median, however, is defined in terms of the ordered list: it is the element in the middle position once the items are ordered. More generally, we may want to find the k<sup>th</sup> smallest item in a list, such as the first-smallest item (the minimum) or the last-smallest item (the maximum). In such situations, selection algorithms can be useful.

To find the 1<sup>st</sup> smallest number in an unordered list of items, we need to know the index at which that item occurs. Because the elements of the list are not sorted, there is no guarantee that the element at index 0 is really the first-smallest number.

A pragmatic and obvious thing to do when dealing with unordered lists is to first sort the list. After the list is sorted, you can rest assured that the element at index 0 holds the first-smallest element in the list. However, it is not a good idea to sort a long list just to obtain its minimum or maximum value: a comparison-based sort takes $\mathcal{O}(n\log n)$ time, whereas a single pass over the list finds the minimum or maximum in $\mathcal{O}(n)$ time.
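The trade-off can be sketched in a few lines of Python. The helper names `select_by_sorting` and `smallest_by_scan` are hypothetical and chosen only for illustration: the first sorts a copy of the list and reads off index k, while the second finds only the minimum with one linear pass.

```python
def select_by_sorting(items, k):
    """Return the k-th smallest item (k = 0 is the smallest) by sorting a copy."""
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    return sorted(items)[k]  # O(n log n); the original list is left unchanged


def smallest_by_scan(items):
    """Return the smallest item with a single O(n) pass, without sorting."""
    smallest = items[0]
    for item in items[1:]:
        if item < smallest:
            smallest = item
    return smallest


data = [3, 1, 10, 4, 6]
print(select_by_sorting(data, 0))               # 1, the first-smallest item
print(select_by_sorting(data, len(data) // 2))  # 4, the median of this odd-length list
print(smallest_by_scan(data))                   # 1, found without sorting
```

Sorting answers every selection query at once, which is convenient, but it does more work than a single query needs.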
### Randomized selection
In the previous chapter, we discussed the quicksort algorithm. Quicksort sorts an unordered list of items, and each of its partitioning steps places one element, the pivot, at the index it will occupy in the final sorted list. Generally speaking, the quicksort algorithm does the following:
1. Selects a pivot
2. Partitions the unsorted list around the pivot
3. Recursively sorts the two sublists on either side of the pivot using steps 1 and 2

One interesting and important fact is that after every partitioning step, the pivot sits at the index it will occupy in the fully sorted list, and that index does not change as the rest of the list is sorted. This means that after each iteration, the selected pivot value is placed at its correct position in the list. It is this property that enables us to work with a list that is not fully sorted to obtain the i<sup>th</sup> smallest number. Because randomized selection is based on the quicksort algorithm, it is generally referred to as quickselect.
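This property can be checked directly. The sketch below uses a Lomuto-style partition with the first element as the pivot (a swapped-in scheme that is simpler than the two-pointer partition used later in this chapter; the name `lomuto_partition` is hypothetical) and confirms that the pivot ends up at the same index it occupies in the sorted list, while both sublists remain unordered.

```python
def lomuto_partition(items, left, right):
    """Partition items[left..right] around items[left]; return the pivot's final index."""
    pivot = items[left]
    boundary = left  # items[left+1..boundary] are all smaller than the pivot
    for i in range(left + 1, right + 1):
        if items[i] < pivot:
            boundary += 1
            items[boundary], items[i] = items[i], items[boundary]
    # Move the pivot between the smaller and the larger-or-equal elements
    items[left], items[boundary] = items[boundary], items[left]
    return boundary


data = [5, 9, 3, 8, 1, 7]
expected_index = sorted(data).index(data[0])  # where the pivot 5 belongs once sorted
split = lomuto_partition(data, 0, len(data) - 1)
print(split, expected_index)                     # 2 2
print(data[:split], data[split], data[split + 1:])  # [1, 3] 5 [8, 9, 7]
```

The right-hand sublist `[8, 9, 7]` is still unordered, yet 5 is already where it belongs; a selection algorithm can exploit this without sorting either side.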
### Quick select
The quickselect algorithm is used to obtain the k<sup>th</sup> smallest element in an unordered list of items and is based on the quicksort algorithm. In quicksort, we recursively sort the elements of both sublists on either side of the pivot. In each iteration of quicksort, the pivot value reaches its correct position in the list, leaving two sublists whose elements are still unordered.

However, in the case of the quickselect algorithm, we recursively call the function exclusively for the sublist that has the k<sup>th</sup> smallest element. In the quickselect algorithm, we compare the index of the pivot point with the k value to obtain the k<sup>th</sup> smallest element from the given unordered list. There will be three cases in the quickselect algorithm, and they are as follows:
1. If the index of the pivot point is smaller than k, then we are sure that the k<sup>th</sup> smallest value will be present in the right sublist from the pivot point. So, we only recursively call the quickselect function for the right sublist.
2. If the index of the pivot point is greater than k, then the k<sup>th</sup> smallest element must be present in the left sublist from the pivot point. So, we only recursively look for the k<sup>th</sup> element in the left sublist.
3. If the index of the pivot point is equal to k, then we have found the k<sup>th</sup> smallest value, and we return it.

To implement the quickselect algorithm, we first need to understand the main function, where we have three possible conditions. We declare the main method of the algorithm as follows:
```python
def quick_select(array_list, left, right, k):
    split = partition(array_list, left, right)
    if split == k:
        return array_list[split]
    elif split < k:
        return quick_select(array_list, split + 1, right, k)
    else:
        return quick_select(array_list, left, split - 1, k)
```
The `quick_select` function takes the index of the first element of the list and the index of the last element as parameters. The position of the element we are looking for is specified by the fourth parameter, k. The value of k must be greater than or equal to zero (and must lie between `left` and `right`), and it maps directly to the index the item would occupy in the sorted list: k = 0 searches for the first-smallest item, k = 1 for the second-smallest item, and so on. Some implementations prefer a 1-based rank instead, in which case one is subtracted from the rank before calling `quick_select`.

The call to the partition function, `split = partition(array_list, left, right)`, returns the `split` index. This is the position in the list where all elements between `left` and `split-1` are less than or equal to the element at `split`, while all elements between `split+1` and `right` are greater than or equal to it.
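The condition described above can be written as a small check, which is handy when testing a partition function. The name `is_partitioned` is hypothetical; it treats `left`, `right`, and `split` as inclusive indices, as `quick_select` does, and allows elements equal to the pivot on either side.

```python
def is_partitioned(array_list, left, right, split):
    """Check the partition invariant on array_list[left..right] (inclusive bounds)."""
    pivot = array_list[split]
    left_ok = all(x <= pivot for x in array_list[left:split])
    right_ok = all(x >= pivot for x in array_list[split + 1:right + 1])
    return left_ok and right_ok


print(is_partitioned([1, 3, 2, 5, 9, 7], 0, 5, 3))  # True: 5 separates the two sides
print(is_partitioned([1, 6, 2, 5, 9, 7], 0, 5, 3))  # False: 6 sits to the left of 5
```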

When the `partition` function returns the `split` value, we compare it with k to find out whether `split` corresponds to the k<sup>th</sup> item. If `split` is less than k, then the k<sup>th</sup> smallest item must lie between `split+1` and `right`; if `split` is greater than k, it must lie between `left` and `split-1`.
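Because only one side of the pivot is ever searched, the recursion can also be replaced by a loop that narrows `left` and `right`. The sketch below is an iterative variant under a few assumptions: it uses a Lomuto-style partition around the first element rather than the two-pointer `partition` function used in this chapter, it works on a copy so the caller's list is not reordered, and the name `quick_select_iterative` is hypothetical.

```python
def quick_select_iterative(items, k):
    """Return the k-th smallest item (0-based) of items without modifying it."""
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    array_list = list(items)  # work on a copy
    left, right = 0, len(array_list) - 1
    while True:
        # Partition array_list[left..right] around its first element (Lomuto style)
        pivot = array_list[left]
        split = left
        for i in range(left + 1, right + 1):
            if array_list[i] < pivot:
                split += 1
                array_list[split], array_list[i] = array_list[i], array_list[split]
        array_list[left], array_list[split] = array_list[split], array_list[left]

        # The same three cases as quick_select, expressed as bound updates
        if split == k:
            return array_list[split]
        elif split < k:
            left = split + 1   # the answer lies to the right of the pivot
        else:
            right = split - 1  # the answer lies to the left of the pivot


stored = [3, 1, 10, 4, 6, 5]
print([quick_select_iterative(stored, k) for k in range(len(stored))])  # [1, 3, 4, 5, 6, 10]
print(stored)  # [3, 1, 10, 4, 6, 5], unchanged
```

The loop keeps the invariant that k lies between `left` and `right`, so when the range shrinks to a single element, that element is the answer.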
### Understanding the partition step
The partition step is similar to what we had in the quicksort algorithm. There are a couple of things that are worth noting:
```python
def partition(unsorted_array, first_index, last_index):
    # A sublist with a single element is already partitioned
    if first_index == last_index:
        return first_index

    # The first element of the sublist is used as the pivot
    pivot = unsorted_array[first_index]
    pivot_index = first_index
    index_of_last_element = last_index

    less_than_pivot_index = index_of_last_element
    greater_than_pivot_index = first_index + 1

    while True:
        # Scan right past elements smaller than the pivot
        while unsorted_array[greater_than_pivot_index] < pivot and greater_than_pivot_index < last_index:
            greater_than_pivot_index += 1
        # Scan left past elements greater than the pivot
        while unsorted_array[less_than_pivot_index] > pivot and less_than_pivot_index >= first_index:
            less_than_pivot_index -= 1

        if greater_than_pivot_index < less_than_pivot_index:
            # Swap the out-of-place pair, then step past both positions so that
            # two elements equal to the pivot cannot be swapped back and forth forever
            temp = unsorted_array[greater_than_pivot_index]
            unsorted_array[greater_than_pivot_index] = unsorted_array[less_than_pivot_index]
            unsorted_array[less_than_pivot_index] = temp
            greater_than_pivot_index += 1
            less_than_pivot_index -= 1
        else:
            break

    # Put the pivot into its final sorted position
    unsorted_array[pivot_index] = unsorted_array[less_than_pivot_index]
    unsorted_array[less_than_pivot_index] = pivot

    return less_than_pivot_index
```
An `if` statement has been inserted at the beginning of the function definition to cater for situations where `first_index` is equal to `last_index`. In such cases, there is only one element in our sublist, so it is already partitioned and we simply return its index, `first_index`.

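The first element is always chosen as the pivot. This choice is arbitrary rather than random, and it often does not yield a good split, and subsequently a good partition: on a list that is already sorted, for example, every split leaves one of the two sublists empty. However, the k<sup>th</sup> element will still be found eventually, because each partitioning step places the pivot in its final position and shrinks the range being searched.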

The `partition` function returns the pivot index pointed to by `less_than_pivot_index`, as we saw in the preceding chapter.
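The pivot can be made genuinely random by swapping a randomly chosen element into the first position before partitioning. With a random pivot, the expected running time of quickselect is linear, although the worst case remains $\mathcal{O}(n^2)$. The sketch below assumes Python's standard `random` module and, for brevity, a Lomuto-style partition instead of the two-pointer loop above; the names `random_partition` and `randomized_select` are hypothetical.

```python
import random


def random_partition(array_list, left, right):
    """Partition array_list[left..right] around a randomly chosen pivot."""
    # Move a random element to the front so that it acts as the pivot
    chosen = random.randint(left, right)  # both bounds are inclusive
    array_list[left], array_list[chosen] = array_list[chosen], array_list[left]
    pivot = array_list[left]
    split = left
    for i in range(left + 1, right + 1):
        if array_list[i] < pivot:
            split += 1
            array_list[split], array_list[i] = array_list[i], array_list[split]
    array_list[left], array_list[split] = array_list[split], array_list[left]
    return split


def randomized_select(array_list, left, right, k):
    """The same three cases as quick_select, but with a random pivot."""
    split = random_partition(array_list, left, right)
    if split == k:
        return array_list[split]
    elif split < k:
        return randomized_select(array_list, split + 1, right, k)
    else:
        return randomized_select(array_list, left, split - 1, k)


stored = [3, 1, 10, 4, 6, 5]
print(randomized_select(stored, 0, len(stored) - 1, 2))  # 4, the third-smallest item
```

The result does not depend on which pivots happen to be drawn; only the amount of work done does.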
### Deterministic selection
The worst-case performance of the randomized selection algorithm is $\mathcal{O}(n^2)$, even though its expected running time with a randomly chosen pivot is linear. It is possible to improve the pivot selection step of the randomized selection algorithm to obtain a worst-case performance of $\mathcal{O}(n)$. An algorithm that achieves this is known as **deterministic selection**.

Median of medians is an algorithm that provides us with an approximate median value, that is, a value close to the actual median of a given unsorted list of elements. This approximate median is often used as a pivot point in the quickselect algorithm for selecting the i<sup>th</sup> smallest element from a list. The median of medians algorithm finds the estimated median in linear time, and when this estimate is used as the pivot in the quickselect algorithm, the worst-case running time improves drastically from $\mathcal{O}(n^2)$ to linear $\mathcal{O}(n)$. The median of medians algorithm therefore helps the quickselect algorithm perform significantly better, because it supplies a good pivot value.

The general approach to the deterministic algorithm to select the i<sup>th</sup> smallest element is listed here:

1. Select a pivot: split the list of unordered items into groups of five elements each, then sort each group and find its median. Repeat these two steps recursively on the list of medians to obtain an approximate median of the whole list (the median of medians).
2. Use this approximate median as the pivot to partition the list of unordered items.
3. Recurse into the part of the partitioned list that may contain the i<sup>th</sup> smallest element.
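For the textbook form of deterministic selection, in which the median of the group medians is itself found by a recursive call to the selection algorithm (rather than by repeating the grouping step), the standard analysis gives the recurrence below for the running time $T(n)$. The first term is the recursive call on the $\lceil n/5 \rceil$ group medians, and the second bounds the recursion into whichever side of the pivot contains the answer, which has at most $7n/10 + 6$ elements:

$$
T(n) \le T\left(\left\lceil \frac{n}{5} \right\rceil\right) + T\left(\frac{7n}{10} + 6\right) + \mathcal{O}(n)
$$

Because $\frac{1}{5} + \frac{7}{10} = \frac{9}{10} < 1$, the total work shrinks geometrically from one level of recursion to the next, and $T(n) = \mathcal{O}(n)$.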

### Pivot selection
To implement the deterministic algorithm that efficiently determines the i<sup>th</sup> smallest value from the list, we start by implementing the pivot selection method. Previously, in the randomized selection algorithm, we selected the first element as the pivot. We shall replace that step with a sequence of steps that enables us to obtain the approximate median. This improves how well the list is partitioned around the pivot:
```python
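def partition(unsorted_array, first_index, last_index):
    if first_index == last_index:
        return first_index
    else:
        # Approximate median of the sublist, used as the pivot
        nearest_median = median_of_medians(unsorted_array[first_index:last_index])

    index_of_nearest_median = get_index_of_nearest_median(unsorted_array, first_index, last_index, nearest_median)
    # Move the approximate median to the front of the sublist
    swap(unsorted_array, first_index, index_of_nearest_median)
    pivot = unsorted_array[first_index]
    pivot_index = first_index

    less_than_pivot_index = last_index
    greater_than_pivot_index = first_index + 1
    # The rest of the partition step is the same as in quickselect
    while True:
        while unsorted_array[greater_than_pivot_index] < pivot and greater_than_pivot_index < last_index:
            greater_than_pivot_index += 1
        while unsorted_array[less_than_pivot_index] > pivot and less_than_pivot_index >= first_index:
            less_than_pivot_index -= 1
        if greater_than_pivot_index < less_than_pivot_index:
            swap(unsorted_array, greater_than_pivot_index, less_than_pivot_index)
            greater_than_pivot_index += 1
            less_than_pivot_index -= 1
        else:
            break

    unsorted_array[pivot_index] = unsorted_array[less_than_pivot_index]
    unsorted_array[less_than_pivot_index] = pivot
    return less_than_pivot_index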
```
Let's now understand the code for the `partition` function. The `nearest_median` variable stores the true or approximate median of the sublist:
```python
def partition(unsorted_array, first_index, last_index):
    if first_index == last_index:
        return first_index
    else:
        nearest_median = median_of_medians(unsorted_array[first_index:last_index])
    ...
```
If the sublist between `first_index` and `last_index` has only one element, `first_index` and `last_index` will be equal, so `first_index` is returned immediately. However, if the sublist has more than one element, we call the `median_of_medians` function with the section of the array demarcated by `first_index` and `last_index`, and store the return value in `nearest_median`.
### Median of medians
The `median_of_medians` function is responsible for finding the approximate median of any given list of items. The function uses recursion to return this approximate median:
```python
def median_of_medians(elems):
    sublists = [elems[j:j+5] for j in range(0, len(elems), 5)]
    medians = []
    for sublist in sublists:
        medians.append(sorted(sublist)[len(sublist)//2])
    if len(medians) <= 5:
        return sorted(medians)[len(medians)//2]
    else:
        return median_of_medians(medians)
```
The function begins by splitting the list, `elems`, into groups of five elements each. This means that if `elems` contains 100 items, the `sublists = [elems[j:j+5] for j in range(0, len(elems), 5)]` statement creates 20 groups of five elements each; when the length of `elems` is not a multiple of five, the last group contains fewer. An empty list is created and assigned to `medians`, which will store the median of each group in `sublists`.

The `for` loop iterates over the list of lists inside `sublists`. Each sublist is sorted, and its median is found and stored in the `medians` list. The `medians.append(sorted(sublist)[len(sublist)//2])` statement sorts the sublist and obtains the element stored at its middle index. This becomes the median of the five-element list. Using the built-in `sorted` function here does not hurt the performance of the algorithm, because each group holds at most five elements, so sorting it takes constant time. Thereafter, if the `medians` list contains five or fewer elements, we sort it and return the element located at its middle index:
```python
if len(medians) <= 5:
    return sorted(medians)[len(medians)//2]
```
If, on the other hand, the `medians` list contains more than five elements, we recursively call the `median_of_medians` function again, supplying it with the list of medians stored in `medians`.
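To see what the grouping does on concrete data, the sketch below restates `median_of_medians` in terms of a hypothetical helper, `group_medians`, that returns the median of each group of five (the behavior is the same as the function above). The input list is an arbitrary example.

```python
def group_medians(elems):
    """Return the median of each group of (at most) five consecutive elements."""
    sublists = [elems[j:j + 5] for j in range(0, len(elems), 5)]
    return [sorted(sublist)[len(sublist) // 2] for sublist in sublists]


def median_of_medians(elems):
    medians = group_medians(elems)
    if len(medians) <= 5:
        return sorted(medians)[len(medians) // 2]
    return median_of_medians(medians)


data = [12, 3, 5, 7, 4, 19, 26, 23, 1, 8, 14, 2, 9, 17, 30]
print(group_medians(data))           # [5, 19, 14]
print(median_of_medians(data))       # 14
print(sorted(data)[len(data) // 2])  # 9, the true median, for comparison
```

The approximate median, 14, is not the true median, 9, but it sits at index 9 of the 15 sorted values rather than at either end, which is what makes it a reasonable pivot.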

The median of medians can also be used to choose the pivot in the quicksort algorithm for sorting a list of elements. This significantly improves the worst-case performance of quicksort from $\mathcal{O}(n^2)$ to $\mathcal{O}(n \log n)$.
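As an illustration of that idea, the quicksort below picks its pivot with the grouping approach from this section and splits the items into three new lists (smaller, equal, and larger) instead of partitioning in place. The function name `mom_quicksort` and the list-building partition are assumptions chosen for brevity. The $\mathcal{O}(n \log n)$ worst-case bound is usually proved for an exact median of medians, so treat this sketch as a demonstration of the pivot choice rather than a proof of the bound.

```python
def median_of_medians(elems):
    """Approximate median, as defined in this section."""
    sublists = [elems[j:j + 5] for j in range(0, len(elems), 5)]
    medians = [sorted(sublist)[len(sublist) // 2] for sublist in sublists]
    if len(medians) <= 5:
        return sorted(medians)[len(medians) // 2]
    return median_of_medians(medians)


def mom_quicksort(items):
    """Return a sorted copy of items, using the median of medians as the pivot."""
    if len(items) <= 1:
        return list(items)
    pivot = median_of_medians(items)
    # The pivot is one of the items, so `equal` is never empty and recursion shrinks
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return mom_quicksort(smaller) + equal + mom_quicksort(larger)


print(mom_quicksort([12, 3, 5, 7, 4, 19, 26, 23, 1, 8, 14, 2, 9, 17, 30]))
# [1, 2, 3, 4, 5, 7, 8, 9, 12, 14, 17, 19, 23, 26, 30]
```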
### Partitioning step
Now that we have obtained the approximate median, the `get_index_of_nearest_median` function finds its index within the bounds of the list indicated by the `first` and `second` parameters:
```python
def get_index_of_nearest_median(array_list, first, second, median):
    if first == second:
        return first
    else:
        return first + array_list[first:second].index(median)
```
Once again, if there is only one element in the list, we simply return `first`. Otherwise, `array_list[first:second]` returns a new list whose indices run from 0 up to its own length minus one, so the position returned by `.index(median)` is relative to that slice and no longer tells us where the median sits in `array_list`. Therefore, we must add the index returned by `array_list[first:second].index(median)` to `first` to obtain the true index where the median was found. For example, if `first` is 3 and the median is at position 2 of the slice, its true index in `array_list` is 3 + 2 = 5:
```python
swap(unsorted_array, first_index, index_of_nearest_median)
```
We then swap the first element of the sublist, at `first_index`, with the element at `index_of_nearest_median`, using the `swap` function.

The utility function `swap`, which swaps two array elements, is shown here:
```python
def swap(array_list, first, second):
    temp = array_list[first]
    array_list[first] = array_list[second]
    array_list[second] = temp
```
Our approximate median is now stored at `first_index` of the unsorted list, and the partition function continues as it would in the quickselect algorithm. The main function of the deterministic selection algorithm is as follows:
```python
def deterministic_select(array_list, left, right, k):
    split = partition(array_list, left, right)
    if split == k:
        return array_list[split]
    elif split < k:
        return deterministic_select(array_list, split + 1, right, k)
    else:
        return deterministic_select(array_list, left, split - 1, k)
```
As you will have already observed, the main function of the deterministic selection algorithm looks exactly the same as its randomized selection counterpart; only the `partition` function it calls differs. After the initial `array_list` has been partitioned around the approximate median, `split` is compared with k.

If `split` is less than k, a recursive call to `deterministic_select(array_list, split+1, right, k)` is made, which looks for the k<sup>th</sup> element in the right-hand part of the array. Otherwise, if `split` is greater than k, the call `deterministic_select(array_list, left, split-1, k)` searches the left-hand part.
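For comparison, the sketch below is a compact, non-in-place version of deterministic selection in its textbook form: the median of the group medians is found by calling the selection function recursively (rather than by repeating the grouping step), and the items are split into new `smaller`, `equal`, and `larger` lists, which also handles duplicate values. The name `select` and the list-based partition are assumptions for illustration, not the in-place code above.

```python
def select(items, k):
    """Return the k-th smallest element (0-based) of items in worst-case linear time."""
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    if len(items) <= 5:
        return sorted(items)[k]

    # Median of each group of five
    groups = [items[j:j + 5] for j in range(0, len(items), 5)]
    medians = [sorted(group)[len(group) // 2] for group in groups]
    # Exact median of the medians, found by a recursive selection call
    pivot = select(medians, len(medians) // 2)

    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]

    # The same three cases as deterministic_select, using list sizes as indices
    if k < len(smaller):
        return select(smaller, k)
    elif k < len(smaller) + len(equal):
        return pivot
    else:
        return select(larger, k - len(smaller) - len(equal))


stored = [3, 1, 10, 4, 6, 5]
print([select(stored, k) for k in range(len(stored))])  # [1, 3, 4, 5, 6, 10]
```

Building new lists costs extra memory, but each call still does only linear work apart from its two recursive calls, which matches the recurrence for deterministic selection.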

Running `quick_select` on a few small lists produces the following output:

```python
stored = [5, 3]
print(stored)
print(quick_select(stored, 0, 1, 0))

stored = [3, 5]
print(stored)
print(quick_select(stored, 0, 1, 0))

stored = [3, 1, 10, 4, 6, 5]
print(stored)
print(quick_select(stored, 0, 5, 0))
stored = [3, 1, 10, 4, 6, 5]
print(quick_select(stored, 0, 5, 1))
stored = [3, 1, 10, 4, 6, 5]
print(quick_select(stored, 0, 5, 2))
stored = [3, 1, 10, 4, 6, 5]
print(quick_select(stored, 0, 5, 3))
stored = [3, 1, 10, 4, 6, 5]
print(quick_select(stored, 0, 5, 4))
stored = [3, 1, 10, 4, 6, 5]
print(quick_select(stored, 0, 5, 5))
```

```
[5, 3]
3
[3, 5]
3
[3, 1, 10, 4, 6, 5]
1
3
4
5
6
10
```

