# **4M26 - Examples Paper 3 - Sorting Algorithms** 

---
<br>
<br>

## 1.

Briefly answer the questions below.

&emsp; (i)&emsp; Why is it often said that sorting $n$ values must necessarily cost $n \lg n$ steps? In which cases is this assertion false?

<div style="border-width:2px;border-style:solid;border-color:black;">

For every comparison sort, the worst-case number of comparisons is $\Omega(n \log n)$, for the following reason.

An unsorted array of length $n$ has $n!$ possible orderings, and a correct sort must be able to distinguish all of them. Model the sort as a binary decision tree in which each internal node is a comparison and each leaf is one resulting permutation, so the tree needs at least $n!$ leaves. The number of comparisons in the worst case is the height, $h$, of the tree, and a binary tree of height $h$ has at most $2^h$ leaves.

$n! \leq 2^h \Rightarrow h \geq \log_2 (n!) \geq n \log_2 n - n \log_2 e = \Omega(n \log n)$

Hence the longest root-to-leaf path, i.e. the worst case, is $\Omega(n \log n)$, and the same bound holds for the average depth of the leaves.

The assertion is false for non-comparison sorts such as radix sort or counting sort, which exploit the structure of the keys (e.g. integers in a bounded range) and can run in linear time.

</div>
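
As a rough numerical check of the decision-tree argument, the sketch below evaluates the lower bound $\lceil \log_2(n!) \rceil$ for a few small $n$. The function name `min_comparisons` is an illustrative choice and not part of the course material.

```python
import math

def min_comparisons(n):
    # A decision tree with at least n! leaves has height >= log2(n!)
    return math.ceil(math.log2(math.factorial(n)))

for n in (3, 4, 5, 10):
    print(n, min_comparisons(n))
```

For example, no comparison sort can sort every input of 5 values with fewer than 7 comparisons in the worst case.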

&emsp; (ii)&emsp; In the context of Heapsort, explain how to take initially unordered data and rearrange it so that the heap property applies.

<div style="border-width:2px;border-style:solid;border-color:black;padding:20px">

To build a max-heap from an unordered array, call the `max_heapify` procedure, which costs $O(h)$ for a node of height $h$, on every non-leaf node, working backwards from index $\lfloor n/2 \rfloor - 1$ down to the root at index $0$. The second half of the array holds only leaves, which are already trivial heaps. The total cost is $O(n)$.

`max_heapify` assumes that both subtrees of node $i$ are already max-heaps and sinks $A[i]$ to its correct place:

1. find the largest of $A[i]$ and those of its children that lie inside the heap;
2. if a child is the largest, swap it with $A[i]$ and repeat from that child's index;
3. otherwise node $i$ already satisfies the heap property, so exit.

| | Max_Heapify(A, size, i) |
| - | - |
| 0 | max_i = i |
| 1 | left = left_child(i) |
| 2 | right = right_child(i) |
| 3 | if left < size and A[left] > A[max_i] |
| 4 | -- max_i = left |
| 5 | if right < size and A[right] > A[max_i] |
| 6 | -- max_i = right |
| 7 | if max_i != i |
| 8 | -- A[i], A[max_i] = A[max_i], A[i] |
| 9 | -- Max_Heapify(A, size, max_i) |


</div>
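
The heap-building procedure can be sketched in Python as follows. This is a minimal illustration assuming a zero-indexed array; `build_max_heap` and `is_max_heap` are names chosen here for the example.

```python
def max_heapify(A, size, i):
    # Sink A[i] until neither child inside the heap is larger
    largest = i
    left, right = 2 * i + 1, 2 * i + 2
    if left < size and A[left] > A[largest]:
        largest = left
    if right < size and A[right] > A[largest]:
        largest = right
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        max_heapify(A, size, largest)

def build_max_heap(A):
    # Leaves are already heaps, so start at the last parent
    for i in range(len(A) // 2 - 1, -1, -1):
        max_heapify(A, len(A), i)
    return A

def is_max_heap(A):
    # Every node must be no larger than its parent
    return all(A[(i - 1) // 2] >= A[i] for i in range(1, len(A)))

heap = build_max_heap([3, 9, 2, 1, 4, 5])
print(heap, is_max_heap(heap))
```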

&emsp; (iii)&emsp; Explain how to reinstate the heap property if one of the values in the heap is changed. What is the cost of such operation? Does it matter whether the value is increased or decreased?

<div style="border-width:2px;border-style:solid;border-color:black;">

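The repair depends on the direction of the change, but either way it costs $O(h) = O(\log n)$ for a heap of $n$ values.

If the value is decreased (in a max-heap), it may now be smaller than one of its children, so call `max_heapify` on its index to sink it down. If the value is increased, its children remain valid but it may now exceed its parent, and `max_heapify` will not fix that, since it only moves values down. Instead, repeatedly swap the value with its parent until the parent is at least as large or the root is reached. Both repairs follow a single path between the root and a leaf, so there is no need to rebuild the whole heap, which would cost $O(n)$.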

</div>
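
A minimal sketch of repairing a max-heap after a single value changes, assuming a zero-indexed array. The names `sift_up`, `sift_down` and `change_key` are illustrative.

```python
def sift_up(A, i):
    # Swap with the parent while the parent is smaller
    while i > 0 and A[(i - 1) // 2] < A[i]:
        parent = (i - 1) // 2
        A[i], A[parent] = A[parent], A[i]
        i = parent

def sift_down(A, i):
    # Swap with the larger child while a child is larger
    n = len(A)
    while True:
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and A[child] > A[largest]:
                largest = child
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def change_key(A, i, new_key):
    old_key = A[i]
    A[i] = new_key
    if new_key > old_key:
        sift_up(A, i)      # an increased key can only clash with its parent
    else:
        sift_down(A, i)    # a decreased key can only clash with its children

heap = [9, 4, 5, 1, 3, 2]
change_key(heap, 4, 10)    # increase the value at index 4
print(heap)
change_key(heap, 0, 0)     # decrease the value at the root
print(heap)
```

Each call walks at most one root-to-leaf path, which is where the $O(\log n)$ cost comes from.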

&emsp; (iv)&emsp;  In bad cases Quicksort can be much more costly than Heapsort. Why, then, is it still very often used?  

<div style="border-width:2px;border-style:solid;border-color:black;">

The worst case of quicksort is very rare in practice, and can be made rarer still by careful pivot choice (e.g. a random pivot or median-of-three). The average cases of both are $O(n \log n)$, but the constant factors for quicksort are typically lower than for heapsort, giving better general performance.

</div>

&emsp; (v)&emsp; In the context of Quicksort, you are given an array of length, $n$, containing arbitrary data in an arbitrary order and a selected value, $p$, taken from that data (*the pivot*). Explain two strategies for rearranging the data so that all values less than the pivot end up to its left, and all values greater than it, to its right.

<div style="border-width:2px;border-style:solid;border-color:black;">

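1. Lomuto partition: scan the array once from left to right, keeping a boundary index such that everything before it is less than the pivot, and swap each newly found smaller value to the boundary.
2. Hoare partition: move two indices inwards from the two ends of the array, swapping each pair of values found on the wrong side of the pivot, until the indices meet.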

</div>
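
Both partition schemes can be sketched as below. This is an illustration under two assumptions that the question does not fix: the Lomuto version takes the last element as the pivot and the Hoare version takes the first. Note that the Hoare version returns a split point and does not necessarily leave the pivot at its final position.

```python
def lomuto_partition(A, low, high):
    # Pivot is A[high]; returns the pivot's final index
    pivot = A[high]
    boundary = low              # A[low:boundary] holds values < pivot
    for j in range(low, high):
        if A[j] < pivot:
            A[boundary], A[j] = A[j], A[boundary]
            boundary += 1
    A[boundary], A[high] = A[high], A[boundary]
    return boundary

def hoare_partition(A, low, high):
    # Pivot is A[low]; returns j such that every value in A[low..j]
    # is <= every value in A[j+1..high]
    pivot = A[low]
    i, j = low - 1, high + 1
    while True:
        i += 1
        while A[i] < pivot:
            i += 1
        j -= 1
        while A[j] > pivot:
            j -= 1
        if i >= j:
            return j
        A[i], A[j] = A[j], A[i]

data = [7, 2, 9, 4, 1, 8, 5]
a = data[:]
print(lomuto_partition(a, 0, len(a) - 1), a)
b = data[:]
print(hoare_partition(b, 0, len(b) - 1), b)
```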

&emsp; (vi)&emsp; List a set of operations that you would expect to form an implementation of a priority queue. Name a data structure suitable for implementing a priority queue, and explain how much time it should take to perform the aforementioned operations. Why would this data structure be preferred over Red-Black trees?

<div style="border-width:2px;border-style:solid;border-color:black;">

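A max-priority queue should support `get_max()`, `pop_max()`, `insert()` and `increase_key()`. A binary max-heap is a suitable data structure.

`get_max()` is $O(1)$, as the maximum is always at the root.

`pop_max()`, `insert()` and `increase_key()` are each $O(\log n)$, as each moves a single value along one root-to-leaf path.

A red-black tree also offers $O(\log n)$ operations, but a heap is simpler, is stored in a plain array with no pointers or rebalancing, and can be built from $n$ values in $O(n)$.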


</div>
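
As a small illustration, a max-priority queue can be sketched on top of Python's standard `heapq` module, which provides a binary min-heap, by storing negated keys. The class name `MaxPriorityQueue` is hypothetical, and `increase_key()` is left out because `heapq` has no public function for changing a key in place.

```python
import heapq

class MaxPriorityQueue:
    def __init__(self):
        self._heap = []          # stores negated keys

    def insert(self, key):
        heapq.heappush(self._heap, -key)    # O(log n)

    def get_max(self):
        return -self._heap[0]               # O(1): the root of the heap

    def pop_max(self):
        return -heapq.heappop(self._heap)   # O(log n)

pq = MaxPriorityQueue()
for key in (3, 10, 5):
    pq.insert(key)
print(pq.get_max())
print(pq.pop_max(), pq.pop_max())
```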

&emsp; (vii)&emsp; Both quicksort and mergesort have an average case run time performance of $O(n \log n)$
to sort a list containing $n$ items. However, in the worst case, Quicksort can require $O(n^2)$
time, while Mergesort’s worst case running time is the same as its average case time: $O(n \log n)$.  What accounts for the difference in worst case times? How is it that Quicksort can
require $O(n^2)$ time, while Mergesort always guarantees $O(n \log n)$ run time? Why would one prefer to
use Quicksort over Mergesort despite a worse worst-case performance?  

<div style="border-width:2px;border-style:solid;border-color:black;">

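Quicksort splits the array around a pivot value, and in the unluckiest case every partition is maximally unbalanced. E.g. on a pre-sorted array, taking the final element as the pivot every time, $n$ pivots are chosen, each involving an $O(n)$ partition -> $O(n^2)$.

Mergesort, on the other hand, always splits the array exactly in half regardless of the data, so the recursion depth is always $O(\log n)$ with an $O(n)$ merge at each level.

Quicksort may still be preferred because it sorts in place, whereas mergesort needs $O(n)$ extra space, and its constant factors are typically lower.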

</div>
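
The quadratic worst case can be observed by counting comparisons. The sketch below assumes a last-element pivot with a Lomuto-style partition, and `quicksort_comparisons` is a name chosen for this example. On sorted input of length $n$ it performs exactly $n(n-1)/2$ comparisons, whereas a shuffled input typically needs far fewer.

```python
import random

def quicksort_comparisons(A):
    # Sorts a copy of A with last-element pivots and returns the
    # number of key comparisons performed
    A = list(A)
    count = 0
    stack = [(0, len(A) - 1)]    # explicit stack avoids deep recursion
    while stack:
        low, high = stack.pop()
        if low >= high:
            continue
        pivot, boundary = A[high], low
        for j in range(low, high):
            count += 1
            if A[j] < pivot:
                A[boundary], A[j] = A[j], A[boundary]
                boundary += 1
        A[boundary], A[high] = A[high], A[boundary]
        stack.append((low, boundary - 1))
        stack.append((boundary + 1, high))
    return count

n = 200
print(quicksort_comparisons(range(n)))                     # sorted input
print(quicksort_comparisons(random.sample(range(n), n)))   # random order
```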

&emsp; (viii)&emsp; How would you sort data in each of the following circumstances? Justify or comment on your choices of method. <br>

&emsp; 1. The data is a set of 10 entries in the high-table list for a video-game console, to be sorted by the score. <br>
&emsp; 2. You have 10 million people to sort based on their age in years. If two people have the same age, it does not matter in which order they appear in the output list. <br>
&emsp; 3. Your data consists of a large file of names. The first (about) 90% of it is already believed to be in sorted order from last time (but you are not 100% confident about that). The remaining 10% is new data that has been written onto the end of the original file in a chaotic, unordered state. You want a single file consisting of the old and new data all neatly sorted.

<div style="border-width:2px;border-style:solid;border-color:black;">
Write your answer here.
</div>

&emsp; (ix)&emsp; Describe an algorithm that given $n$ integers in the range $0$ to $k$, preprocesses its input and then answers any query about how many of the $n$ integers fall into a range $[a\cdots b]$ in $O(1)$ time. Your algorithm should use $\Theta(n+k)$ preprocessing time.

<div style="border-width:2px;border-style:solid;border-color:black;">

Use a variation on counting sort to tally a running total, such that a delta can be computed via array indexing:

</div>

In [9]:
class range_check():
    def __init__(self, A, k=None) -> None:
        if k is None:
            k = max(A)
        # keys lie in 0..k inclusive, so k + 1 counters are needed
        self.counts = [0 for x in range(k + 1)]

        # for each item, tally it in the counts array
        for key in A:
            self.counts[key] += 1
        # for each position in the tally, update to running total
        for i in range(1, k + 1):
            self.counts[i] += self.counts[i - 1]
        # print(f"Counts: {self.counts}")

    # inclusive range check: how many items x satisfy a <= x <= b
    def count_range(self, a, b):
        # nothing lies below 0, so avoid wrapping round to counts[-1]
        below = self.counts[a - 1] if a > 0 else 0
        return self.counts[b] - below
    
rc = range_check([1,4,5,12,13,35,40], 50)
print(rc.count_range(3,14))

4


&emsp; (x)&emsp; Explain how to sort $n$ integers in the range $0$ to $n^4 -1$ in $O(n)$ time.

<div style="border-width:2px;border-style:solid;border-color:black;">

?

</div>

&emsp; (xi)&emsp; Explain why the worst-case running time for bucket sort is $\Theta(n^2)$. What simple change to the algorithm preserves its linear average-case running time and makes its worst-case running time of $O(n \log n)$?

<div style="border-width:2px;border-style:solid;border-color:black;">

?

</div>

## 2.

Write the function, **sort**($L$), which takes an unsorted list, $L$, of $n$ real numbers and outputs a sorted list of these numbers. Your solution should have average case runtime of $O(n \log n)$.

### Examples:

**Input:** `[5, 3, 2, 7, 1, 8, 9, 12]` <br>
**Output:** `[1, 2, 3, 5, 7, 8, 9, 12]`

**Input:**  `[2.0, 1.01, 1.2, -3]` <br>
**Output:**  `[-3, 1.01, 1.2, 2.0]`

### Constraints:

- $1\leq n \leq 1000$. 
- $-10^6\leq L[i] \leq 10^6$. 

### Code:

In [49]:
# average case runtime O(n log n) - mergesort or quicksort
def sort(L):
    return merge_sort(L)

def merge(A, low, mid, high):
    # copy the two sorted sublists A[low..mid] and A[mid+1..high]
    left = A[low:mid+1]
    right = A[mid+1:high+1]
    # print(f"{left} {right}")

    # sentinels so that neither sublist runs out during the merge
    left.append(float("inf"))
    right.append(float("inf"))
    i = 0 # left index
    j = 0 # right index

    # k is the index in the top level list we're replacing
    for k in range(low, high+1):
        # take the smaller head element; <= keeps the sort stable
        if left[i] <= right[j]:
            A[k] = left[i]
            i += 1
        # if not, replace with the head of the right sublist
        else:
            A[k] = right[j]
            j += 1


def merge_sort(A, low = None, high = None):
    if low is None:
        low = 0
    if high is None:
        high = len(A) - 1

    # zero or one item - no need to sort
    if low < high:
        mid = (low + high) // 2
        merge_sort(A, low, mid)
        merge_sort(A, mid+1, high)

        merge(A, low, mid, high)

    return A

L = [5,1,4,2]
sort(L)
print(L)

[1, 2, 4, 5]


### Tests:

##### Run example test case 1:

In [50]:
input_value = [5, 3, 2, 7, 1, 8, 9, 12]
print (sort(input_value))

[1, 2, 3, 5, 7, 8, 9, 12]


##### Run example test case 2:

In [51]:
input_value = [2.0, 1.01, 1.2, -3]
print (sort(input_value))

[-3, 1.01, 1.2, 2.0]


## 3.

Implement the function, **min_priority_queue**($L$), which takes a list, $L$, of $n$ elements, each of which is a dictionary. Each dictionary represents one of the three operations (with corresponding inputs), `"insert"`, `"pop_min"`, `"decrease_key"`, to be performed on a min-priority queue. E.g. `{"op":"insert","key":5}` corresponds to an insertion of key, `5`, whereas, `{"op":"decrease_key","new_key":3,"index":2}` corresponds to decreasing the key value to `3` for an element with index, `2`, in the existing heap array. We assume that keys of the heap are stored in a zero-indexed array. *Note, in this question we only concern ourselves with storing keys and not values in the heap data structure.* <br>

The function should output the keys in the same order as they are stored in the array, representing the heap data structure, after all operations provided in the input are executed.

### Examples:

**Input:** `[{"op":"insert","key":10},{"op":"insert","key":5},{"op":"insert","key":1},{"op":"decrease_key","new_key":3,"index":2},{"op":"pop_min"}]` <br>
**Output:** `[3,10]`

**Input:**  `[{"op":"insert","key":5},{"op":"insert","key":4},{"op":"insert","key":3},{"op":"insert","key":2},{"op":"decrease_key","index":2,"new_key":2},{"op":"pop_min"},{"op":"pop_min"},{"op":"insert","key":9},{"op":"insert","key":8},{"op":"insert","key":7},{"op":"insert","key":6}]` <br>
**Output:**  `[3, 5, 6, 8, 7, 9]`

### Constraints:

- $1 \leq n \leq 1000$
- All operations are valid (e.g. one never attempts to decrease a key for an element with $index < 0$ or $index \geq heap\_size$, or attempts to pop an element from an empty queue).

### Code:

In [54]:
class Heap:
    def __init__(self):
        self.H = []
        self.size = 0

    def insert(self, key):
        # append as the last leaf, then float it up: O(log n)
        self.H.append(key)
        self.size += 1
        self.sift_up(self.size - 1)
    
    def pop_min(self):
        # move the last leaf to the root, then sink it: O(log n)
        self.size -= 1
        self.H[0], self.H[-1] = self.H[-1], self.H[0]
        x = self.H.pop()
        self.min_heapify(0)
        return x
        
    def decrease_key(self, index, new_key):
        # a smaller key can only violate the heap property with its parent
        self.H[index] = new_key
        self.sift_up(index)

    def sift_up(self, i):
        # swap with the parent while the parent is larger
        while i > 0 and self.H[(i - 1)//2] > self.H[i]:
            parent = (i - 1)//2
            self.H[i], self.H[parent] = self.H[parent], self.H[i]
            i = parent

    def min_heapify(self, i):
        def left_child(i):
            return 2*i + 1
        def right_child(i):
            return 2*i + 2
        
        left = left_child(i)
        right = right_child(i)
        min_i = i

        # find the smallest of node i and its children inside the heap
        if left < self.size and self.H[left] < self.H[min_i]:
            min_i = left
        if right < self.size and self.H[right] < self.H[min_i]:
            min_i = right
        if min_i != i:
            self.H[i], self.H[min_i] = self.H[min_i], self.H[i]
            self.min_heapify(min_i)

def min_priority_queue(L):

    heap = Heap()
    
    for op in L:
        if op["op"] == "insert":
            heap.insert(op["key"])
            x = op["key"]
            print(f"Insert {x} => {heap.H}")
        elif op["op"] == "pop_min":
            x = heap.pop_min()
            print(f"Pop {x} => {heap.H}")
        elif op["op"] == "decrease_key":
            heap.decrease_key(op["index"],op["new_key"])
            ind, val = op["index"],op["new_key"]
            print(f"Dec index {ind} val {val} => {heap.H}")
        else:
            assert False, "Invalid operation"
            
    return heap.H[:heap.size]

### Tests:

##### Run example test case 1:

In [55]:
input_value = [{"op":"insert","key":10},{"op":"insert","key":5},{"op":"insert","key":1},{"op":"decrease_key","new_key":3,"index":2},{"op":"pop_min"}]
print (min_priority_queue(input_value))

Insert 10 => [10]
Insert 5 => [5, 10]
Insert 1 => [1, 10, 5]
Dec index 2 val 3 => [1, 10, 3]
Pop 1 => [3, 10]
[3, 10]


##### Run example test case 2:

In [56]:
input_value = [{"op":"insert","key":5},{"op":"insert","key":4},{"op":"insert","key":3},{"op":"insert","key":2},{"op":"decrease_key","index":2,"new_key":2},{"op":"pop_min"},{"op":"pop_min"},{"op":"insert","key":9},{"op":"insert","key":8},{"op":"insert","key":7},{"op":"insert","key":6}]
print (min_priority_queue(input_value))

Insert 5 => [5]
Insert 4 => [4, 5]
Insert 3 => [3, 5, 4]
Insert 2 => [2, 3, 4, 5]
Dec index 2 val 2 => [2, 3, 2, 5]
Pop 2 => [2, 3, 5]
Pop 2 => [3, 5]
Insert 9 => [3, 5, 9]
Insert 8 => [3, 5, 9, 8]
Insert 7 => [3, 5, 9, 8, 7]
Insert 6 => [3, 5, 6, 8, 7, 9]
[3, 5, 6, 8, 7, 9]


## 4. 

Write the function, **pairs**($L$), which, given a list, $L$, of $n$ non-negative integers, finds the number of distinct pairs, $(i, j)$, such that $j \geq i $ and $L[i] = L[j]$. Your solution should have a linear worst-case run time.

### Examples:

**Input:** `[5,2,1,5]` <br>
**Output:** `5`

**Input:**  `[1,2,3,4,1,1,1]` <br>
**Output:**  `13`

### Constraints:

- $1\leq n \leq 10^6$.
- $1\leq L[i] \leq 10^6$.

### Code:

In [57]:
def pairs(L):
    k = max(L)
    counts = [0 for x in range(k+1)]

    # for each item, tally it in the counts array
    for key in L:
        counts[key] += 1

    # a value that occurs c times contributes
    #   c pairs with i == j, plus
    #   c * (c - 1) / 2 pairs with i < j,
    # i.e. c + (c - 1) + ... + 1 = c * (c + 1) / 2 pairs in total

    def sum_to_n(n):
        # integer arithmetic avoids floating-point rounding for large n
        return n * (n + 1) // 2
    
    total = 0 
    for x in counts:
        total += sum_to_n(x)

    return total

### Tests:

##### Run example test case 1:

In [58]:
input_value = [5,2,1,5]
print (pairs(input_value))

5


##### Run example test case 2:

In [59]:
input_value = [1,2,3,4,1,1,1]
print (pairs(input_value))

13


## 5. 

Write a function, **close_pairs**($L$), which takes a list, $L$, of three elements, the first of which is a list of non-negative integer numbers, $A$, and the second and third of which are positive integer values, $\delta_{index}$ and $\delta_{value}$, respectively. This function should output a string, `"true"`, if array, $A$, contains a pair of indices, $(i,j)$, such that: 
1. $i \neq j$, 
2. $\lvert i-j\rvert\leq \delta_{index}$ and 
3. $\lvert A[i] - A[j]\rvert \leq \delta_{value}$. <br>

If no such pair exists, the implemented function should return a string, `"false"`. Your solution should have linear average case run time performance. Note that you can use a Python dictionary (i.e. `D = {}`) as an implementation of a hash table with an assumed constant average case run time cost for insertion, deletion and element search operations.
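
The three conditions can be checked directly with a brute-force reference, sketched below. It runs in $O(n \cdot \delta_{index})$ time, so it does not meet the linear-time requirement, but it may be useful for testing a faster solution against. The name `close_pairs_brute_force` is hypothetical.

```python
def close_pairs_brute_force(L):
    A, delta_index, delta_value = L
    n = len(A)
    for i in range(n):
        # only look ahead, so each pair (i, j) with i < j is checked once
        for j in range(i + 1, min(i + delta_index, n - 1) + 1):
            if abs(A[i] - A[j]) <= delta_value:
                return "true"
    return "false"

print(close_pairs_brute_force([[1, 50, 4, 8, 16, 12, 2], 1, 1]))
print(close_pairs_brute_force([[1, 16, 4, 8, 64, 2, 32], 2, 3]))
```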

### Examples:

**Input:** `[[1,50,4,8,16,12,2],1,1]` <br>
**Output:** `"false"`

**Input:**  `[[1,16,4,8,64,2,32],2,3]` <br>
**Output:**  `"true"`

### Constraints:

- $1 \leq \text{len }(A) \leq 10^4$.
- $1 \leq \delta_{index} \leq 10^4$.
- $1 \leq \delta_{value} \leq 10^6$.
- $1 \leq A[i] \leq 10^6$.

### Code:

In [91]:
def close_pairs(L):

    A, delta_index, delta_value = L[0], L[1], L[2]

    # store buckets of width delta_value, where each bucket has the ID
    # b_index = A[i] // delta_value
    # D maps bucket ID -> the one key currently in that bucket
    # on each loop remove the element at index i - delta_index - 1,
    # so D only holds keys within delta_index positions of i
    # a bucket never holds more than 1 key: a second key in the same
    # bucket is within delta_value of the first, so we return at once
    D = {}

    for i in range(len(A)):
        b_index = A[i] // delta_value
        # print(f"{D}<-{A[i]} Bucket = {b_index}")

        # if an element has lapsed out of the window, remove its bucket
        if i - delta_index - 1 >= 0:
            b_index_old = A[i - delta_index - 1] // delta_value
            D.pop(b_index_old)

        # same bucket: the values differ by less than delta_value
        if b_index in D:
            return "true"

        # neighbouring buckets may or may not be close enough
        if b_index+1 in D and abs(D[b_index+1] - A[i]) <= delta_value:
            return "true"
        if b_index-1 in D and abs(D[b_index-1] - A[i]) <= delta_value:
            return "true"

        D[b_index] = A[i]
    return "false"

### Tests:

##### Run example test case 1:

In [92]:
input_value = [[1,50,4,8,16,12,2],1,1]
print (close_pairs(input_value))

false


##### Run example test case 2:

In [93]:
input_value = [[1,16,4,8,64,2,32],2,3]
print (close_pairs(input_value))

true


## 6.

Write the function, **sort_strings**($S$), which takes a list, $S$, of strings and sorts it in the following order.
1. If the length of string, $a$, is smaller than the length of string, $b$, then string, $a$, should appear in the output before string, $b$.
2. If strings, $a$ and $b$, are of the same length, then string, $a$, should come before string, $b$, if $ord(a[i])$<$ord(b[i])$ where $i$ is the index of the first character for which $a[i]\neq b[i]$. Note, $ord$, is a python function which converts a character into its ASCII integer value (e.g. `ord('a')=97`).

Also note that a function, **get_character_dict**($S$), is provided. This function creates a dictionary, $char\_dict$, which for each character used in $S$, computes its position in a ranked list of all characters used in $S$. 
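
The required ordering can be illustrated with Python's built-in `sorted` and a `(length, string)` key, as sketched below. This is only a reference for the ordering rule: it relies on a comparison sort and does not use **get_character_dict**, so it is probably not the radix-style solution the question has in mind. The name `sort_strings_reference` is hypothetical.

```python
def sort_strings_reference(S):
    # Shorter strings first; ties broken by code-point order, which is
    # what comparing ord() at the first differing character amounts to
    return sorted(S, key=lambda s: (len(s), s))

print(sort_strings_reference(["abcd", "ab", "abc", "abcde", "a", "e"]))
```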

### Examples:

**Input:** `["abcd", "ab", "abc", "abcde", "a","e"]` <br>
**Output:** `["a", "e", "ab", "abc", "abcd", "abcde"]`

**Input:**  `["314", "712", "632", "201", "111","11","1","20","5","50","1a"]` <br>
**Output:**  `["1", "5", "11", "1a", "20", "50", "111", "201", "314", "632", "712"]`

### Constraints:

- $ 1\leq \text{len }(S) \leq 100$.

### Code:

In [None]:
def get_character_dict(S):
    
    char_set = set([])
    for s in S:
        char_set=char_set | set(list(s))
        
    count = 0
    char_dict={}
    for c in sorted(list(char_set)):
        char_dict[c]=count
        count+=1
    return char_dict

def sort_strings(S):
 
    char_dict = get_character_dict(S)

    #Write your code here

### Tests:

##### Run example test case 1:

In [None]:
input_value = ["abcd", "ab", "abc", "abcde", "a","e"]
print (sort_strings(input_value))

##### Run example test case 2:

In [None]:
input_value = ["314", "712", "632", "201", "111","11","1","20","5","50","1a"]
print (sort_strings(input_value))