# Lesson 1 Week 3

In [None]:
unsorted = []
with open("_32387ba40b36359a38625cbb397eee65_QuickSort.txt") as file:
    for line in file:
        if line.strip():                  # skip blank lines, e.g. a trailing newline
            unsorted.append(int(line))    # int() ignores surrounding whitespace

In [None]:
'''
quicksort - implementation of quicksort algorithm for problem #
'''
from random import randint

# Partition specified subarray in place and return the final index of the pivot.
#
# Arguments:
#
#  A     - input array, unsorted
#  l     - index of leftmost element in subarray
#  r     - index of rightmost element in subarray
#  i     - index of pivot element (l <= i <= r)
#  count - optional callable; count(1) is called once per comparison

def partition(A, l, r, i, count=None):

    if not (l <= i <= r):
        raise IndexError("Pivot index out of bounds.")
    pivot = A[i]                    # value of pivot element
    A[l], A[i] = A[i], A[l]         # swap pivot element with leftmost element
    comp_index = l + 1              # boundary: A[l+1 .. comp_index-1] holds the elements < pivot
    for j in range(l + 1, r + 1):
        if count: count(1)                  # increment comparison count
        if A[j] < pivot:
            A[j], A[comp_index] = A[comp_index], A[j] # swap
            comp_index += 1                               
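    i = comp_index - 1
    A[l], A[i] = A[i], A[l]         # swap pivot with the last element < pivot
    # Sanity checks (O(n) each; remove once the implementation is trusted).
    if not all(elem < pivot for elem in A[l:i]):
        raise AssertionError("element >= pivot found left of the pivot")
    if not all(elem >= pivot for elem in A[i+1:r+1]):
        raise AssertionError("element < pivot found right of the pivot")
    return i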


rand = lambda arr, l, r: randint(l, r)

def quicksort(A, l=0, r=None, count=None, pivot=rand):
    if r is None: r = len(A) - 1
    if r - l <= 0: 
        return
    i = pivot(A, l, r)                    ## leftmost element, rightmost element, or median element of the subarray
    i = partition(A, l, r, i, count)      ## update the index of pivot, 
                                          ## so that quicksort can be applied to the lower or upper section subarray
    quicksort(A, l, i - 1, count, pivot)
    quicksort(A, i + 1, r, count, pivot)


# The following are pivot choice methods - you can pass any of these
# to `quicksort` to choose a particular element on which to pivot.

def first(A, l, r): return l

def last(A, l, r): return r

def median(A, l, r):
    '''
    Return index of the median value for the following three 
    elements within specified sub-array, where `l` is the index 
    of the first element, `r` is the index of the last element:
    [first, middle-element-between-first-last, last]
    '''
    m = l + (r - l) // 2            # middle index (left of the two middles for even length)
    M = A[m]                        # middle element
    L = A[l]                        # leftmost element
    R = A[r]                        # rightmost element
    ordered = sorted([(L, l), (M, m), (R, r)])
    return ordered[1][1]            # index of the median of the three


class Counter:
    '''
    A counter class.
    for counting up the number of comparisons made by quicksort.
    
    reference:
    https://stackoverflow.com/questions/9663562/what-is-the-difference-between-init-and-call-in-python
    
    '''
    
    ## initialize the class
    def __init__(self, n=0):
        self.total = n

    ## enable the instance to be called as a function
    ## e.g. count = Counter(); count(1); count.total == 1
    def __call__(self, x=0):
        self.total += x
        
        


if __name__ == '__main__':

    data = unsorted                 # avoid shadowing the built-in `input`
    for pivot_selection in [first, last, median]:
        arr = data[:]
        count = Counter()
        quicksort(arr, count=count, pivot=pivot_selection)
        # assert arr == sorted(data)
        print(pivot_selection.__name__, count.total)

#### Video 1 ~ Video 4

Overview:
- Quicksort runs in $O(n\log n)$ time on average
- input: an unsorted array. output: the same elements in increasing order.
- the algorithm in this video only deals with arrays with distinct numbers
- Key ideas:
    - partition array around a pivot element
    - pick an element of the array as the pivot.
        - say, pick the 1st element
        - if we pick another element, we first swap it with the first element.
    - rearrange such that
        - left of pivot => less than the pivot
        - right of pivot => greater than the pivot
    - This operation puts the pivot in its rightful (final sorted) position. All we do is swap; the left and right parts are not themselves sorted by the rearrangement.
- Two cool facts about partitioning:
    - runs in $O(n)$ time, in place
    - reduces problem size

High-level descriptions:
- Quicksort(arr, n)
```
if n == 1: return
p = choosePivot(arr, n)
partition array around p
recursively sort 1st part
recursively sort 2nd part
```

#### Implementation

During processing, the array has three parts 
```
p|_____<p_____|_____>p_____|________?________|
    already partitioned    unpartitioned

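i marks the boundary between <p and >p (index of the leftmost >p element)
j marks the boundary between the partitioned and unpartitioned parts (next element to scan)

whenever a <p element is scanned, it is swapped with the leftmost >p element and i advances;
a >p element needs no swap.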
```

Pseudocode for partition

```
Partition(A,l,r) [input := A[l, ..., r]]
p = A[l]
i = l + 1
for j = l + 1 to r:
    if A[j] < p: 
        swap A[j] and A[i] (while no >p element has been seen yet, i == j and the swap is redundant)
        i++
swap A[l] and A[i-1]
```

The running time = $O(n)$, where $n = r - l + 1$
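
To make the roles of `i` and `j` concrete, here is a minimal sketch that follows the pseudocode above and prints the array after each scan step. The function name `partition_trace` and the sample array are illustrative choices, not part of the notes.

```python
def partition_trace(A, l, r):
    """Partition A[l..r] around the pivot A[l], printing each scan step."""
    p = A[l]
    i = l + 1                        # A[l+1 .. i-1] holds the elements < p
    for j in range(l + 1, r + 1):    # A[i .. j-1] holds the elements > p
        if A[j] < p:
            A[i], A[j] = A[j], A[i]  # move A[j] to the end of the <p part
            i += 1
        print(f"j={j} i={i} {A}")
    A[l], A[i - 1] = A[i - 1], A[l]  # put the pivot between the two parts
    return i - 1                     # final index of the pivot


data = [3, 8, 2, 5, 1, 4, 7, 6]
pos = partition_trace(data, 0, len(data) - 1)
print(pos, data)
```

Each printed line shows one step of the single linear scan, which is why the cost is proportional to $r - l + 1$.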

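Proof of correctness of quicksort (by induction on the array length):
- let $P(n)$ be the claim "quicksort correctly sorts every array of length $n$"
- $P(1)$ holds trivially. Suppose $P(k)$ holds for all $k \leq n$
    - must prove $P(n+1)$ holds
- Partitioning an array of length $n+1$ puts the pivot in its final position and leaves two parts of lengths $k_{1}$ and $k_{2}$. Since $k_{1}, k_{2} \leq n$, both recursive calls sort their parts correctly by the hypothesis, so the whole array is sorted and our claim is proved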

Running time:
- worst case.
    - if we always use the first element and the input is already sorted, every split is maximally unbalanced, so the running time is $n + (n-1) + ... + 1 = \Theta(n^{2})$
- best case.
    - balanced split? median! then the running time is $\Theta(n\log n)$
    - $T(n) \leq 2T(\frac{n}{2}) + O(n)$. Using the master method, we get $O(n\log n)$
- Random pivot
    - Intuition:
        - if we always get a 25-75 split, good enough for $O(n\log n)$
        - half of the elements give a 25-75 split or better
    - Indeed, **Quicksort Theorem**:
        - for every input array of length $n$, the average running time of QuickSort (with random pivots) is $O(n\log n)$.
        - "average": taken over the random pivot choices, not over inputs. Individual runs fluctuate between the best case $\Theta(n\log n)$ and the worst case $\Theta(n^{2})$
            - The best case is dominating!
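
The gap between the worst case and the random-pivot average can be observed by counting comparisons. The sketch below is a standalone approximation, not the notebook's `quicksort`: the helper `count_comparisons`, the array size `n = 2000`, and the 20 trials are arbitrary illustrative choices. It uses an explicit stack so that the worst case does not hit Python's recursion limit.

```python
import math
import random


def count_comparisons(A, choose_pivot):
    """Sort A in place and return the number of element comparisons made."""
    total = 0
    stack = [(0, len(A) - 1)]          # subarrays still to be sorted
    while stack:
        l, r = stack.pop()
        if l >= r:
            continue
        k = choose_pivot(l, r)         # index of the pivot within [l, r]
        A[l], A[k] = A[k], A[l]
        p, i = A[l], l + 1
        for j in range(l + 1, r + 1):
            if A[j] < p:
                A[i], A[j] = A[j], A[i]
                i += 1
        A[l], A[i - 1] = A[i - 1], A[l]
        total += r - l                 # one comparison per scanned element
        stack.append((l, i - 2))       # part left of the pivot
        stack.append((i, r))           # part right of the pivot
    return total


n = 2000
# Worst case: already sorted input, always pivoting on the first element.
worst = count_comparisons(list(range(n)), lambda l, r: l)

# Random pivots: random.randint(l, r) returns a uniform index in [l, r].
random.seed(0)
trials = [count_comparisons(list(range(n)), random.randint) for _ in range(20)]
average = sum(trials) / len(trials)

print("first pivot, sorted input:", worst, "vs n(n-1)/2 =", n * (n - 1) // 2)
print("random pivot, average:", round(average), "vs 2 n ln n =", round(2 * n * math.log(n)))
```

The first count equals $n(n-1)/2$ exactly, while the random-pivot average stays below $2n\ln n$.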

#### Video 5 ~ Video 9
0. sample space $\Omega$: all possible outcomes of the random pivot choices in QuickSort
1. decompose the random variable $C(\sigma)$, where $C$ is the total number of comparisons between pairs of input elements that QuickSort makes on outcome $\sigma$.
    - $z_{i}$: the $i$th smallest element of the input array, i.e. indexed by position in the **sorted** order, not in the **unsorted** input.
    - $X_{ij}$: the individual random variable for one pair of elements,
        - the # of comparisons between $z_{i}$ and $z_{j}$
2. Consider the set $A = \{z_{i}, z_{i+1}, ..., z_{j-1}, z_{j}\}$, where $z_{i}$ is the $i$th smallest element. (These elements need not be contiguous in the input.)
    - As long as no element of $A$ has been picked as a pivot, all of $A$ stays together in the same recursive call.
    - When $z_{i}$ or $z_{j}$ is the first element of $A$ to be picked, these two get compared.
    - When any element of $A$ other than $z_{i}$ or $z_{j}$ is picked first, $z_{i}$ and $z_{j}$ are split apart and never get compared.
3. Two elements get compared only when one of them is the pivot. Fix two elements of the input array: they get compared either 0 or 1 times, so each $X_{ij}$ is an indicator variable.
4. $C(\sigma) = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}X_{ij}(\sigma)$
    - $E[C] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}E[X_{ij}]$
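    - $E[C] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\Pr(z_{i} \text{ and } z_{j} \text{ get compared})$
    - in sum, we identify the random variable $y$ we care about, then decompose $y$ into a sum of simple indicator variables and apply linearity of expectation.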
5. key claim: for all $i < j$, $\Pr(z_{i} \text{ and } z_{j} \text{ get compared}) = 2/(j-i+1)$
    - elements of $A$ never get compared with each other until one element of $A$ is selected as pivot
        - if $z_{i}$ or $z_{j}$ gets chosen first, they are compared
        - if an element between $z_{i}$ and $z_{j}$ gets chosen first, $z_{i}$ and $z_{j}$ never meet again. (they are partitioned into different subarrays)
        - each of the $j-i+1$ elements of $A$ is equally likely to be chosen first, and 2 of them lead to a comparison
        - so that $E[X_{ij}] = 2/(j-i+1)$
        - so that $E[C] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} 2/(j-i+1)$
6. $E[C]$ is bounded
    - for each fixed $i$, the inner sum is at most $2\sum_{k=2}^{n}\frac{1}{k}$, so $E[C] \leq 2 (n-1) \sum_{k=2}^{n}\frac{1}{k} \leq 2n\ln n$
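
The key claim and the final bound can be checked numerically. This is a rough sketch under simplifying assumptions: `compared_pairs` is a hypothetical helper that models randomized quicksort with list comprehensions rather than in-place partitioning (the set of compared pairs is the same), the input is $1, ..., n$ so that $z_{i} = i$, and the values of `n`, `i`, `j`, and `runs` are arbitrary.

```python
import math
import random


def compared_pairs(values):
    """Run randomized quicksort on distinct values; return the set of
    (smaller, larger) pairs that were compared."""
    pairs = set()
    stack = [list(values)]
    while stack:
        sub = stack.pop()
        if len(sub) <= 1:
            continue
        p = random.choice(sub)                  # uniform random pivot
        for x in sub:
            if x != p:                          # pivot meets every other element
                pairs.add((min(x, p), max(x, p)))
        stack.append([x for x in sub if x < p])
        stack.append([x for x in sub if x > p])
    return pairs


random.seed(1)
n, i, j, runs = 10, 3, 7, 20000

# Empirical frequency with which z_i and z_j get compared.
hits = sum((i, j) in compared_pairs(range(1, n + 1)) for _ in range(runs))
estimate = hits / runs
claim = 2 / (j - i + 1)

# Exact value of the double sum for E[C], and the 2 n ln n bound.
exact = sum(2 / (b - a + 1) for a in range(1, n) for b in range(a + 1, n + 1))
bound = 2 * n * math.log(n)

print("Pr estimate:", estimate, "claimed:", claim)
print("E[C] =", exact, "<= 2 n ln n =", bound)
```

The estimate should land close to $2/(j-i+1)$, and the exact double sum sits below $2n\ln n$.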

Hint for problem set 3.

1. the sample space of the random variable is {1, 2, ..., floor(n/2)}. Assuming n is even, the corresponding probability is 2/n, ...,
    - 2/n is the probability of each sample point
    - n/2 and α * n are respectively the upper and lower bound.

2. see the discussion forum. $\alpha^{k} n  = 1$

3. reference: https://stackoverflow.com/questions/28326380/quicksorts-estimation-of-recursion-depth

4. This is a classical problem: the key is that the probability that a randomly selected pair has the same birthday is $1/365$, not $1/365^{2}$.
    - https://math.stackexchange.com/questions/2140681/birthday-problem-among-k-people

5. think about a simple die with sample space {1,2,3}; the result should carry through.
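
The observation in hint 4 can be turned into a quick calculation. This sketch assumes 365 equally likely birthdays and uses linearity of expectation over all pairs among $k$ people. The function name `expected_shared_pairs` is illustrative, and whether this matches the exact quantity the problem asks for is an assumption.

```python
def expected_shared_pairs(k, days=365):
    """Expected number of pairs (among k people) sharing a birthday."""
    pairs = k * (k - 1) // 2      # number of distinct pairs
    return pairs / days           # each pair matches with probability 1/days


# Smallest k for which the expected number of matching pairs reaches 1.
k = 1
while expected_shared_pairs(k) < 1:
    k += 1
print(k, expected_shared_pairs(k))
```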
