# **4M26 - Examples Paper 3 - Sorting Algorithms** 

---
<br>
<br>

## 1.

Briefly answer the questions below.

&emsp; (i)&emsp; Why is it often said that sorting $n$ values must necessarily cost $n \log n$ steps? In which cases is this assertion false?

<div style="border-width:2px;border-style:solid;border-color:black;">
When this statement is made, it usually means that comparison sorts (e.g. quicksort, heapsort, insertion sort) need $\Omega(n \log n)$ comparisons both in the worst case and in the average case. <br><br>

Lower bounds can be established by noting that there are $n!$ possible orderings of $n$ distinct numbers. Running a comparison sort on a particular input corresponds to following one root-to-leaf path of the comparison tree described in the lecture notes, and each of the $n!$ orderings must end at a different leaf. A binary tree of height $h$ has at most $2^h$ leaves, so $2^h \geq n!$, i.e. $h \geq \log_2 n!$. By Stirling's formula, $n!= \left( \frac{n}{e}\right)^n \sqrt{2\pi n}\left(1+\Theta\left(\frac{1}{n}\right)\right)\geq \left(\frac{n}{e}\right)^n$, hence $h \geq n\log_2 n - n\log_2 e = \Omega(n\log n)$. Also, since the average depth of the leaves is smallest when the tree is completely balanced, it is straightforward to show that $\Omega(n \log n)$ comparisons are also needed in the average case.<br><br>
    
    
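The analysis above does not apply in three situations. The first is when a non-comparison-based sorting method is used, e.g. counting sort or radix sort on integer keys from a bounded range. The second is when one can exploit knowledge about which orderings are likely, e.g. nearly sorted input, on which insertion sort runs in close to linear time. The third is when the items being sorted have many repeated keys.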
</div>
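The sketch below gives a quick numerical illustration of the argument above. It computes $\lceil \log_2 n! \rceil$, the smallest possible height of a comparison tree with $n!$ leaves. That height is therefore a lower bound on the worst-case number of comparisons for $n$ distinct keys. The sketch compares it with the weaker Stirling-based estimate $n\log_2(n/e)$. The function names are hypothetical and chosen only for this example.

```python
import math

def min_worst_case_comparisons(n):
    """Smallest possible height of a binary comparison tree with n! leaves: ceil(log2(n!))."""
    # For an integer m >= 1, (m - 1).bit_length() equals ceil(log2(m)) exactly,
    # which avoids floating-point rounding for large factorials.
    return (math.factorial(n) - 1).bit_length()

def stirling_lower_estimate(n):
    """The weaker bound log2((n/e)^n) = n * log2(n / e) used in the argument above."""
    return n * math.log2(n / math.e)

for n in (4, 10, 100, 1000):
    print(n, min_worst_case_comparisons(n), round(stirling_lower_estimate(n), 1))
```

For $n = 4$ this gives 5 comparisons and for $n = 10$ it gives 22.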

&emsp; (ii)&emsp; In the context of Heapsort, explain how to take initially unordered data and rearrange it so that the heap property applies.

<div style="border-width:2px;border-style:solid;border-color:black;">
One can build a max-heap from an unsorted array in linear time. To do so, apply the $max\_heapify$ procedure to every non-leaf node, i.e. to the first half of the array. Start from the last non-leaf node (index $\lfloor n/2 \rfloor - 1$ in a zero-indexed array) and finish with the root (index 0). Working backwards guarantees that both subtrees of a node are already max-heaps when $max\_heapify$ is called on it. A single call costs $O(\log n)$. However, most nodes lie near the bottom of the heap, where calls are cheap, so the total cost sums to $O(n)$. <br><br>

The $max\_heapify$ procedure starts at a particular node and checks whether that node satisfies the max-heap property, i.e. whether its key is at least as large as the keys of both of its children. If the max-heap property is satisfied, the procedure terminates. Otherwise it swaps the current node with its child that has the largest key and recursively applies $max\_heapify$ at that child's position. <br><br>

The procedure for building a min-heap is analogous, the only difference being that the min-heap property must be satisfied and the child with the smallest key is swapped.
</div>

In [1]:
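def left_child(i):
    # Index of the left child in a zero-indexed array heap.
    return 2 * i + 1

def right_child(i):
    # Index of the right child in a zero-indexed array heap.
    return 2 * i + 2

def max_heapify(A, heap_size, i):
    # Restore the max-heap property at index i, assuming both subtrees of i are max-heaps.
    left = left_child(i)
    right = right_child(i)
    max_i = i
    if left < heap_size and A[left] > A[max_i]:
        max_i = left
    if right < heap_size and A[right] > A[max_i]:
        max_i = right
    if max_i != i:
        A[i], A[max_i] = A[max_i], A[i]
        max_heapify(A, heap_size, max_i)

def build_max_heap(A):
    # Heapify every non-leaf node, from the last one (index n//2 - 1) back to the root.
    heap_size = len(A)
    for i in range(heap_size // 2 - 1, -1, -1):
        max_heapify(A, heap_size, i)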

&emsp; (iii)&emsp; Explain how to reinstate the heap property if one of the values in the heap is changed. What is the cost of such an operation? Does it matter whether the value is increased or decreased?

<div style="border-width:2px;border-style:solid;border-color:black;">
Let us assume that we are dealing with a max-heap. The same steps, with the cases reversed, apply to a min-heap.<br><br>

1. If the key of a particular node has been decreased, the max-heap property still holds between that node and its parent, because the parent was already at least as large as the old key. Hence we only need to run the $max\_heapify$ procedure described in Part (ii) on that node. This costs $O(\log n)$ for a max-heap storing $n$ values. <br><br>

2. If the key has been increased, we repeatedly swap the node with its parent while its key is larger than the parent's key, moving up one level per swap. This also costs $O(\log n)$, as the height of the heap is $\lfloor \log_2 n \rfloor$. <br><br>

So the direction of the change matters: a decreased value moves down the heap and an increased value moves up. Both cases cost $O(\log n)$.
</div>
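Below is a minimal sketch of the two cases in Part (iii). It assumes a zero-indexed array max-heap like the one built in Part (ii). The helper names `sift_up`, `sift_down`, `change_key` and `is_max_heap` are hypothetical. `sift_down` is an iterative version of $max\_heapify$.

```python
def sift_down(A, i, heap_size):
    # Decreased key: push it down, swapping with the larger child (iterative max_heapify).
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and A[left] > A[largest]:
            largest = left
        if right < heap_size and A[right] > A[largest]:
            largest = right
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def sift_up(A, i):
    # Increased key: swap with the parent while it is larger than the parent.
    while i > 0 and A[i] > A[(i - 1) // 2]:
        parent = (i - 1) // 2
        A[i], A[parent] = A[parent], A[i]
        i = parent

def change_key(A, i, new_key):
    """Set A[i] = new_key in a zero-indexed max-heap A and restore the heap property in O(log n)."""
    old_key = A[i]
    A[i] = new_key
    if new_key > old_key:
        sift_up(A, i)
    else:
        sift_down(A, i, len(A))

def is_max_heap(A):
    # Every node must be at least as large as each of its children.
    return all(A[(i - 1) // 2] >= A[i] for i in range(1, len(A)))

heap = [9, 7, 8, 3, 5, 6, 1]
change_key(heap, 4, 10)   # increase: 10 climbs to the root
change_key(heap, 0, 2)    # decrease: 2 sinks down
print(heap, is_max_heap(heap))
```

In this example the increased key 10 climbs from index 4 to the root. The decreased root key 2 then sinks back to index 4, leaving `[9, 7, 8, 3, 2, 6, 1]`.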

&emsp; (iv)&emsp;  In bad cases Quicksort can be much more costly than Heapsort. Why, then, is it still very often used?  

<div style="border-width:2px;border-style:solid;border-color:black;">
The bad cases for Quicksort are fairly rare. If the input is randomised (or the pivot is chosen at random), common inputs such as data that is already almost sorted no longer cause trouble. In the average case Quicksort has a smaller constant factor than Heapsort, even though both have the same asymptotic cost of $O(n \log n)$. Its inner loop is simple and scans memory sequentially, which makes good use of caches. Across a series of uses it is therefore liable to win even if a bad case happens occasionally. Quicksort is also fairly easy to implement and, like Heapsort, sorts in place. It needs only $O(\log n)$ stack space if the smaller partition is sorted first, which keeps its memory requirements low.
</div>
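The sketch below makes the "bad case" concrete. It counts the key comparisons made by a Lomuto-partition Quicksort on already sorted input, once with the last element as pivot and once with a uniformly random pivot. The function names are hypothetical. An explicit stack replaces recursion only so that the degenerate case does not hit Python's recursion limit.

```python
import random

def quicksort_comparisons(A, choose_pivot):
    """Sort A in place (Lomuto partition) and return the number of key comparisons made."""
    comparisons = 0
    stack = [(0, len(A) - 1)]            # explicit stack instead of recursion
    while stack:
        low, high = stack.pop()
        if low >= high:
            continue
        p = choose_pivot(low, high)
        A[p], A[high] = A[high], A[p]    # move the chosen pivot to the end
        pivot_val = A[high]
        i = low
        for j in range(low, high):
            comparisons += 1
            if A[j] <= pivot_val:
                A[i], A[j] = A[j], A[i]
                i += 1
        A[i], A[high] = A[high], A[i]    # the pivot lands at its final position i
        stack.append((low, i - 1))
        stack.append((i + 1, high))
    return comparisons

def last_pivot(low, high):
    return high                          # deterministic: always the last element

def random_pivot(low, high):
    return random.randint(low, high)     # randomised: uniform over the sub-array

n = 1000
print(quicksort_comparisons(list(range(n)), last_pivot))    # n(n-1)/2 = 499500 on sorted input
print(quicksort_comparisons(list(range(n)), random_pivot))  # O(n log n) in expectation
```

With the fixed pivot, sorted input of length $n$ costs exactly $n(n-1)/2$ comparisons. With a random pivot the expected count is $O(n \log n)$ for every input.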

&emsp; (v)&emsp; In the context of Quicksort, you are given an array of length $n$, containing arbitrary data in an arbitrary order, and a selected value $p$ taken from that data (*the pivot*). Explain two strategies for rearranging the data so that all values less than the pivot end up to its left and all values greater than it end up to its right.

In [158]:
def lomuto( A, low, high ):
    # Lomuto partition of A[low:high] (high exclusive here) using A[high - 1] as the pivot.
    p = high - 1
    pv = A[p]
    
    i = low
    for j in range(low, p):
        if A[j] <= pv:
            A[j], A[i] = A[i], A[j]
            i += 1
    A[i], A[p] = A[p], A[i]
    
    return i


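def hoare(nums, low, high):
    # Hoare partition of nums[low..high] (high inclusive) with the middle element as pivot;
    # returns j such that nums[low..j] <= pivot <= nums[j+1..high].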
    pivot = nums[(high + low) // 2]
    i = low - 1
    j = high + 1

    while True:
        i += 1
        j -= 1

        while nums[i] < pivot:
            i += 1
        while nums[j] > pivot:
            j -= 1

        if i >= j:
            return j

        nums[i], nums[j] = nums[j], nums[i]    
        

def quicksort( A, low, high ):
    # Sorts A[low..high] (high inclusive) in place using the Hoare partition above.
    if high - low < 1:
        return A
    
    pivot = hoare( A, low, high )
    quicksort( A, low, pivot )
    quicksort( A, pivot+1, high )
    
    return A
    
    
quicksort([0, 1, 6, 4, 2, 7, 8, 3, 1, 5], 0, 9 )

[0, 1, 1, 2, 3, 4, 5, 6, 7, 8]

<div style="border-width:2px;border-style:solid;border-color:black;">
Either Lomuto's or Hoare's partition scheme can be used. Assume that the sub-array $A[low:high+1]$ of array $A$ (i.e. indices $low$ to $high$ inclusive) has to be partitioned.<br><br>

In the Lomuto partition scheme, the element with the largest index ($p = A[high]$) is selected as the pivot. The sub-array is split into four regions, ordered from its start to its end: (1) elements $\leq p$, (2) elements $>p$, (3) elements not yet examined and (4) the pivot $A[high]$ itself. Regions (1) and (2) are empty at the start and grow as the sub-array is scanned from index $low$ to index $high-1$. Each element $\leq p$ found by the scan is swapped with the first element of region (2), so that region (1) grows by one. After the loop, the pivot is swapped with the first element of region (2), which places it at its final sorted position. See the Python code below. <br><br><br>

In the Hoare partition scheme, the element with the smallest index ($p=A[low]$) is selected as the pivot. The sub-array is split into three regions, ordered from its start to its end: (1) elements $\leq p$, (2) elements not yet examined and (3) elements $\geq p$. Regions (1) and (3) grow from the two ends of the sub-array inwards. The left scan stops at an element $\geq p$, the right scan stops at an element $\leq p$, and these two elements are swapped. The procedure stops when the scans cross (see the code below).<br><br>

Note that Hoare's scheme is more efficient than Lomuto's on average, as it performs about three times fewer swaps. Also note that in Hoare's scheme the pivot does not necessarily end up at the returned index $j$. The only guarantee is that every element of $A[low..j]$ is $\leq$ every element of $A[j+1..high]$, and Quicksort then recurses on these two parts.

</div>

In [2]:
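def lomuto_partition(A, low, high):
    # Partition A[low..high] (inclusive) around the pivot A[high].
    pivot = high
    pivot_val = A[pivot]
    i = low                      # A[low..i-1] holds the elements <= pivot_val
    for j in range(low, high):
        if A[j] <= pivot_val:
            A[i], A[j] = A[j], A[i]
            i = i + 1
    A[i], A[pivot] = A[pivot], A[i]
    return i  # final position of the pivot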
A = [13,19,9,5,12,8,7,4,11,2,21,6]
p_new = lomuto_partition(A,0,11)
print(A, '\n', p_new, A[p_new])

[5, 4, 2, 6, 12, 8, 7, 19, 11, 9, 21, 13] 
 3 6


In [53]:
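def hoare_partition(A, low, high):
    # Partition A[low..high] (inclusive) around the pivot x = A[low].
    x = A[low]
    i = low - 1
    j = high + 1
    while True:
        j = j - 1
        while A[j] > x:          # scan from the right for an element <= x
            j = j - 1
        
        i = i + 1
        while A[i] < x:          # scan from the left for an element >= x
            i = i + 1
        
        if i < j:
            A[i], A[j] = A[j], A[i]
        else:
            return j  # A[low..j] <= x <= A[j+1..high]; the pivot need not be at j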
A = [13,19,9,5,12,8,7,4,11,2,21,6]
p_new = hoare_partition(A,0,11)
print(A,'\n',p_new,A[p_new])

[6, 2, 9, 5, 12, 8, 7, 4, 11, 19, 21, 13] 
 8 11


&emsp; (vi)&emsp; List a set of operations that you would expect to form an implementation of a priority queue. Name a data structure suitable for implementing a priority queue, and explain how much time it should take to perform the aforementioned operations. Why would this data structure be preferred over Red-Black trees?

<div style="border-width:2px;border-style:solid;border-color:black;">
Operations for a max-priority queue: <br><br>

1. <b>get_max</b>() - return value with the largest key. <br>
2. <b>pop_max</b>() - pop and return value with largest key. <br>
3. <b>insert</b>($value$, $key$) - insert $value$ with a key of $key$ into the priority queue. <br>
4. <b>increase_key</b>($value$, $key$) - increase the key associated with $value$ to $key$. This requires a handle to the item, e.g. its current index in the heap array.  <br> <br>


A binary max-heap stored in an array can be used to implement a max-priority queue. Run-time complexity for a queue with $n$ items:<br><br>
1. <b>get_max</b> - $\Theta(1)$. Max value is always at the top of the max-heap. <br>
2. <b>pop_max</b> - $O(\log n)$. Moves the last element of the heap array to the root and calls <b>max_heapify</b> on the root. <br>
3. <b>insert</b> - $O(\log n)$. Appends the element to the end of the heap array with key $-\infty$ and calls <b>increase_key</b> for that element. <br>
4. <b>increase_key</b> - $O(\log n)$. Discussed in Part iii.  <br><br>

A Red-Black tree would achieve the same $O(\log n)$ cost for <b>pop_max</b>, <b>insert</b> and <b>increase_key</b>. However, <b>get_max</b> would cost $O(\log n)$ as opposed to $\Theta(1)$, unless a pointer to the maximum is maintained separately. Moreover, a heap is stored in a plain array without child pointers or colour bits. It therefore uses less memory, has smaller constant factors and is much simpler to implement. It can also be built from $n$ items in $\Theta(n)$ time. Hence one should prefer the heap data structure for implementing a priority queue.
</div>
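Below is a minimal array-based sketch of the max-priority queue described above. The class name `MaxPriorityQueue` is hypothetical, and items are stored as `[key, value]` pairs. As an assumption, `increase_key` identifies the item by its current index in the heap array rather than by its value. To support lookup by value, one would typically also keep a value-to-index map.

```python
class MaxPriorityQueue:
    """Binary max-heap of [key, value] pairs stored in a zero-indexed Python list."""

    def __init__(self):
        self.heap = []

    def get_max(self):
        # Theta(1): the item with the largest key is always at the root.
        return self.heap[0][1]

    def insert(self, value, key):
        # O(log n): append at the end, then sift up (as increase_key does).
        self.heap.append([key, value])
        self._sift_up(len(self.heap) - 1)

    def increase_key(self, index, key):
        # O(log n): index is the item's current position in the heap array (assumed handle).
        if key < self.heap[index][0]:
            raise ValueError("new key is smaller than the current key")
        self.heap[index][0] = key
        self._sift_up(index)

    def pop_max(self):
        # O(log n): move the last item to the root, then sift it down (max_heapify).
        top = self.heap[0]
        last = self.heap.pop()
        if self.heap:
            self.heap[0] = last
            self._sift_down(0)
        return top[1]

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[parent][0] >= self.heap[i][0]:
                break
            self.heap[i], self.heap[parent] = self.heap[parent], self.heap[i]
            i = parent

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] > self.heap[largest][0]:
                    largest = child
            if largest == i:
                return
            self.heap[i], self.heap[largest] = self.heap[largest], self.heap[i]
            i = largest


pq = MaxPriorityQueue()
for value, key in [("a", 3), ("b", 7), ("c", 5)]:
    pq.insert(value, key)
pq.increase_key(2, 9)       # the item at index 2 ("c", key 5) gets key 9
print(pq.get_max())         # c
```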

&emsp; (vii)&emsp; Both quicksort and mergesort have an average case run time performance of $O(n \log n)$
to sort a list containing $n$ items. However, in the worst case, Quicksort can require $O(n^2)$
time, while Mergesort’s worst case running time is the same as its average case time: $O(n \log n)$.  What accounts for the difference in worst case times? How is it that Quicksort can
require $O(n^2)$ time, while Mergesort always guarantees $O(n \log n)$ run time? Why would one prefer to
use Quicksort over Mergesort despite a worse worst-case run time performance?  

<div style="border-width:2px;border-style:solid;border-color:black;">
In Mergesort, the list of $n$ elements is always divided in half, so the recursion depth is $\lceil \log_2 n \rceil$. The merging done at each level of the recursion costs $O(n)$ in total. This gives a total running time of $O(n \log n)$ regardless of the input. <br><br>

Quicksort's partition function does not necessarily divide the list in half, because the split depends on the chosen pivot. In the worst case the pivot is always the largest or the smallest element, so each partition only removes the pivot. Quicksort then makes $n$ levels of recursive calls, and the call on a sub-array of length $m$ costs $\Theta(m)$ for partitioning. This gives a total running time of $n + (n-1) + \dots + 1 = O(n^2)$. <br><br>

As also mentioned in the answer to Part (iv), Quicksort sorts a list in place. In contrast, array-based Mergesort needs $\Theta(n)$ additional memory to merge the sorted sublists. Also, as argued in Part (iv), the worst-case behaviour of Quicksort is unlikely, especially with a randomised pivot. Its constant factors are small and it is relatively straightforward to implement.
</div>
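Below is a minimal top-down Mergesort sketch for comparison with Quicksort's in-place partitioning; the name `merge_sort` is illustrative. The split is always at the midpoint, which is what bounds the recursion depth by $\lceil \log_2 n \rceil$. The slices and the `merged` buffer are the extra memory that this version needs.

```python
def merge_sort(A):
    """Return a new sorted list; the split is always at the midpoint."""
    if len(A) <= 1:
        return list(A)
    mid = len(A) // 2
    left = merge_sort(A[:mid])     # slices copy data: extra memory
    right = merge_sort(A[mid:])
    merged = []                    # auxiliary buffer for the merge
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:    # taking from the left on ties keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])        # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 3, 2, 7, 1, 8, 9, 12]))   # [1, 2, 3, 5, 7, 8, 9, 12]
```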

&emsp; (viii)&emsp; How would you sort data in each of the following circumstances? Justify or comment on your choices of method. <br>

&emsp; 1. The data is a set of 10 entries in the high-score table for a video-game console, to be sorted by score. <br>

&emsp; 2. You have 10 million people to sort based on their age in years. If two people have the same age, it does not matter in which order they appear in the output list. <br>

&emsp; 3. Your data consists of a large file of names. The first (about) 90% of it is already believed to be in sorted order from last time (but you are not 100% confident about that). The remaining 10% is new data that has been written onto the end of the original file in a chaotic, unordered state. You want a single file consisting of the old and new data all neatly sorted.

<div style="border-width:2px;border-style:solid;border-color:black;">
    
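1. For just 10 values, a simple insertion sort is good enough. With such a tiny amount of data performance hardly matters, but simplicity does. Moreover, the table probably changes only slowly (e.g. one new score at a time). Insertion sort on an almost sorted list runs in close to linear time, which also favours it. <br><br>

2. The key observation is that there are only around 100 distinct keys (ages), so counting sort is appropriate. A first pass counts the number of people of each age. A prefix sum over these counts then gives the starting position of each age in the output, and a second pass moves every person to their place. This costs $\Theta(n + k)$ with $n = 10^7$ and $k \approx 100$, i.e. linear time. <br><br>

3. This seems best handled by splitting the data at the point where the old, well-sorted part ends. Insertion sort is then used on the almost sorted part. Its running time is $O(n + I)$, where $I$ is the number of out-of-order pairs, so the cost is close to linear here because $I$ is small. Quicksort is used on the new 10%, and a final linear-time merge combines the two sorted parts. Simple insertion sort on the whole file is not sensible. Each of the unordered new items may have to move a long way back into the old data, giving quadratic cost.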
</div>
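The sketch below illustrates cases 2 and 3; all names are hypothetical. `sort_by_age` is a stable counting sort on `(name, age)` pairs. It assumes integer ages no larger than a made-up `max_age` of 130. `sort_old_plus_new` assumes that the position `split` where the old data ends is known, e.g. from the old file's length. It uses Python's built-in `sorted` as a stand-in for Quicksort on the new part, then merges the two runs with `heapq.merge`.

```python
import heapq

def sort_by_age(people, max_age=130):
    """Stable counting sort of (name, age) pairs with integer ages 0..max_age: Theta(n + k) time."""
    counts = [0] * (max_age + 2)
    for _, age in people:                # pass 1: count people of each age
        counts[age + 1] += 1
    for a in range(1, len(counts)):      # prefix sums: counts[a] = first output slot for age a
        counts[a] += counts[a - 1]
    output = [None] * len(people)
    for person in people:                # pass 2: move each record to its slot
        age = person[1]
        output[counts[age]] = person
        counts[age] += 1
    return output

def insertion_sort(A):
    """In-place insertion sort: O(n + I) time, where I is the number of out-of-order pairs."""
    for i in range(1, len(A)):
        x = A[i]
        j = i - 1
        while j >= 0 and A[j] > x:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = x
    return A

def sort_old_plus_new(names, split):
    """names[:split] is the old, (almost) sorted data; names[split:] is the new, unordered data."""
    old = insertion_sort(names[:split])  # cheap when only a few old items are out of place
    new = sorted(names[split:])          # stand-in for quicksort on the unordered 10 %
    return list(heapq.merge(old, new))   # linear-time merge of the two sorted runs

print(sort_by_age([("Ann", 34), ("Bob", 7), ("Cy", 34), ("Di", 0)]))
# [('Di', 0), ('Bob', 7), ('Ann', 34), ('Cy', 34)]
print(sort_old_plus_new(["ann", "bea", "cal", "eve", "dan", "zed", "ali", "kim"], 5))
# ['ali', 'ann', 'bea', 'cal', 'dan', 'eve', 'kim', 'zed']
```

Because the counting sort is stable, people of the same age keep their input order, e.g. "Ann" stays before "Cy".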

In [4]:
import random

nums = [ random.randint(0,99) for _ in range(1000) ]

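def preprocess( nums, k ):
    # Counting-sort style preprocessing for values in 0..k-1, in Theta(n + k) time.
    preprocessed = [ 0 for _ in range(k) ]

    for n in nums:
        preprocessed[n] += 1                       # occurrences of each value
        
    for i in range(1,k):
        preprocessed[i] += preprocessed[i-1]       # prefix sums: number of values <= i
        
    return preprocessed
        
    
def cumulation( preprocessed, a, b ):
    # Number of values in the range [a, b], answered in O(1) time.
    return preprocessed[b] - (preprocessed[a-1] if a > 0 else 0)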

cumulation( preprocess(nums, 100), 29, 53 )


246

&emsp; (ix)&emsp; Describe an algorithm that given $n$ integers in the range $0$ to $k$, preprocesses its input and then answers any query about how many of the $n$ integers fall into a range $[a\cdots b]$ in $O(1)$ time. Your algorithm should use $\Theta(n+k)$ preprocessing time.

<div style="border-width:2px;border-style:solid;border-color:black;">
The algorithm begins by preprocessing exactly as counting sort does. It counts the occurrences of each value and then takes prefix sums, so that $counts[i]$ contains the number of elements less than or equal to $i$ in the original array. This takes $\Theta(n + k)$ time. The number of integers that fall into a range $[a\cdots b]$ is then $counts[b]-counts[a-1]$, or simply $counts[b]$ when $a = 0$. This is computed in $O(1)$ time.
</div>

&emsp; (x)&emsp; Explain how to sort $n$ integers in the range $0$ to $n^4 -1$ in $O(n)$ time.

<div style="border-width:2px;border-style:solid;border-color:black;">
Treat each integer as a number written in base $n$ and perform radix sort on its digits. Since every value is less than $n^4$, it has at most $\log_n (n^4) = 4$ base-$n$ digits, so only 4 passes are needed. In each pass, a stable counting sort on one digit (whose value lies in $0$ to $n-1$) takes $\Theta(n + n) = \Theta(n)$ time. This gives a total running time of $O(4n)=O(n)$.
</div>
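Below is a minimal sketch of this radix sort. It assumes non-negative integers below $n^4$, where $n$ is the length of the input list; the function names are hypothetical. Each pass is a stable counting sort on one base-$n$ digit $\lfloor x / n^d \rfloor \bmod n$.

```python
def counting_sort_by_digit(A, base, exp):
    """Stable counting sort of A on the digit (x // exp) % base."""
    counts = [0] * base
    for x in A:
        counts[(x // exp) % base] += 1
    for d in range(1, base):
        counts[d] += counts[d - 1]          # counts[d] = number of items with digit <= d
    output = [0] * len(A)
    for x in reversed(A):                   # right-to-left keeps equal digits in order
        d = (x // exp) % base
        counts[d] -= 1
        output[counts[d]] = x
    return output

def sort_below_n4(A):
    """Sort n integers from [0, n^4 - 1] with 4 base-n counting-sort passes: Theta(n) time."""
    n = len(A)
    if n <= 1:
        return list(A)
    exp = 1
    for _ in range(4):                      # values < n^4 have at most 4 base-n digits
        A = counting_sort_by_digit(A, n, exp)
        exp *= n
    return A

print(sort_below_n4([15, 0, 9, 3]))         # n = 4, values below 4**4 = 256 -> [0, 3, 9, 15]
```

Stability of each pass is essential: it preserves the order established by the less significant digits.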

&emsp; (xi)&emsp; Explain why the worst-case running time for bucket sort is $\Theta(n^2)$. What simple change to the algorithm preserves its linear average-case running time and makes its worst-case running time $O(n \log n)$?

<div style="border-width:2px;border-style:solid;border-color:black;">
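The worst case for bucket sort occurs when all $n$ numbers fall into the same bucket. That bucket is sorted with insertion sort, whose worst-case running time is $\Theta(n^2)$, so bucket sort also takes $\Theta(n^2)$ time in the worst case. We can reduce the worst-case running time by sorting each bucket with mergesort instead of insertion sort. The resulting algorithm has worst-case running time $O(n\log n)$. The average case stays linear, because each bucket holds $O(1)$ elements in expectation, so sorting a bucket costs $O(1)$ in expectation.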
</div>
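Below is a minimal sketch of bucket sort with the suggested change. It assumes the inputs are real numbers in $[0, 1)$ and uses $n$ buckets. Python's built-in `sorted` (Timsort, a mergesort-based algorithm with $O(m \log m)$ worst-case cost) stands in for the per-bucket mergesort. The function name is hypothetical.

```python
def bucket_sort(A):
    """Bucket sort for real numbers in [0, 1) using n buckets."""
    n = len(A)
    buckets = [[] for _ in range(n)]
    for x in A:
        # min() guards against floating-point rounding pushing n * x up to n.
        buckets[min(int(n * x), n - 1)].append(x)
    result = []
    for b in buckets:
        # sorted() is Timsort: O(m log m) in the worst case, instead of insertion sort's O(m^2).
        result.extend(sorted(b))
    return result

print(bucket_sort([0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]))
```

If every value lands in one bucket, the cost is now dominated by a single $O(n \log n)$ sort rather than a quadratic insertion sort.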

## 2.

Write the function, **sort**($L$), which takes an unsorted list, $L$, of $n$ real numbers and outputs a sorted list of these numbers. Your solution should have average case runtime of $O(n \log n)$.

### Examples:

**Input:** `[5, 3, 2, 7, 1, 8, 9, 12]` <br>
**Output:** `[1, 2, 3, 5, 7, 8, 9, 12]`

**Input:**  `[2.0, 1.01, 1.2, -3]` <br>
**Output:**  `[-3, 1.01, 1.2, 2.0]`

### Constraints:

- $1\leq n \leq 1000$. 
- $-10^6\leq L[i] \leq 10^6$. 

### Code:

In [20]:
import random

def hoare(A, low, high):
    # Hoare-style partition of A[low:high] (high exclusive) with pivot pv = A[low].
    # Afterwards every element of A[low:j] is <= every element of A[j:high], and the
    # returned index j satisfies low < j < high, so both recursive calls shrink.
    pv = A[low]
    
    i = low
    j = high - 1
    
    while i < j:
        
        if A[i] < pv:
            i += 1
            
        elif A[j] >= pv:
            j -= 1
            
        else:
            A[i], A[j] = A[j], A[i]
            
    if j == low:
        return j + 1
    
    return j
    
def quicksort(A, low=None, high=None):
    
    low = 0 if low is None else low
    high = len(A) if high is None else high
    
    if high - low <= 1:
        return A
    
    pivot = hoare(A, low, high)
    quicksort(A, low, pivot)
    quicksort(A, pivot, high)
    
    return A

quicksort([ random.randint(-100, 100) for _ in range(50)])

[-97,
 -90,
 -84,
 -79,
 -77,
 -67,
 -61,
 -60,
 -60,
 -58,
 -56,
 -51,
 -46,
 -40,
 -39,
 -39,
 -37,
 -33,
 -30,
 -25,
 -22,
 -21,
 -16,
 -16,
 -13,
 -4,
 7,
 13,
 16,
 18,
 19,
 19,
 19,
 20,
 21,
 32,
 35,
 59,
 60,
 66,
 66,
 70,
 70,
 77,
 79,
 80,
 82,
 93,
 97,
 98]

In [4]:
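#We simply implement quicksort with Lomuto partition. The pivot is chosen at random, so
#the expected running time is O(n log n) for every input. A fixed last-element pivot would
#take O(n^2) time on an already sorted list and recurse n levels deep (past Python's default
#recursion limit of 1000 for n = 1000).

import random

def lomuto_partition(A, low, high):

    r = random.randint(low, high)      # random pivot, moved to the end
    A[r], A[high] = A[high], A[r]
    pivot = high
    pivot_val = A[pivot]
    i = low
    for j in range(low, high):
        if A[j] <= pivot_val:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[pivot] = A[pivot], A[i]
    return i

def quicksort_lomuto(A, low=0, high=None):

    if high is None:
        high = len(A) - 1
    
    # Recurse on the smaller part and loop on the larger one: recursion depth is O(log n).
    while low < high:
        pivot = lomuto_partition(A, low, high)
        if pivot - low < high - pivot:
            quicksort_lomuto(A, low, pivot - 1)
            low = pivot + 1
        else:
            quicksort_lomuto(A, pivot + 1, high)
            high = pivot - 1
        
def sort(A):

    #Write your code here
    
    quicksort_lomuto(A,0,len(A)-1)
    return A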

### Tests:

##### Run example test case 1:

In [5]:
input_value = [5, 3, 2, 7, 1, 8, 9, 12]
print (sort(input_value))

[1, 2, 3, 5, 7, 8, 9, 12]


##### Run example test case 2:

In [6]:
input_value = [2.0, 1.01, 1.2, -3]
print (sort(input_value))

[-3, 1.01, 1.2, 2.0]


## 3.

Implement the function, **min_priority_queue**($L$), which takes a list, $L$, of $n$ elements, each of which is a dictionary. Each dictionary represents one of the three operations (with corresponding inputs), `"insert"`, `"pop_min"` and `"decrease_key"`, to be performed on a min-priority queue. E.g. `{"op":"insert","key":5}` corresponds to an insertion of key `5`, whereas `{"op":"decrease_key","new_key":3,"index":2}` corresponds to decreasing the key value to `3` for the element with index `2` in the existing heap array. We assume that keys of the heap are stored in a zero-indexed array. *Note, in this question we only concern ourselves with storing keys and not values in the heap data structure.* <br>

The function should output the keys in the same order as they are stored in the array, representing the heap data structure, after all operations provided in the input are executed.

### Examples:

**Input:** `[{"op":"insert","key":10},{"op":"insert","key":5},{"op":"insert","key":1},{"op":"decrease_key","new_key":3,"index":2},{"op":"pop_min"}]` <br>
**Output:** `[3,10]`

**Input:**  `[{"op":"insert","key":5},{"op":"insert","key":4},{"op":"insert","key":3},{"op":"insert","key":2},{"op":"decrease_key","index":2,"new_key":2},{"op":"pop_min"},{"op":"pop_min"},{"op":"insert","key":9},{"op":"insert","key":8},{"op":"insert","key":7},{"op":"insert","key":6}]` <br>
**Output:**  `[3, 5, 6, 8, 7, 9]`

### Constraints:

- $1 \leq n \leq 1000$.
- All operations are valid (e.g. one never attempts to decrease a key for an element with $index < 0$ or $index \geq heap\_size$, or to pop an element from an empty queue).

### Code:

In [7]:
#A straightforward implementation of a min-heap stored in a zero-indexed array.

class Heap:
    def __init__(self):
        self.H = []
        self.size = 0

    def insert(self,key):

        #Write your code here
        
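        # Reuse a free slot left by pop_min if there is one; otherwise grow the array
        # with append (amortised O(1), unlike self.H + [0], which copies the whole list).
        if self.size==len(self.H):
            self.H.append(0)
        self.size+=1
        # decrease_key overwrites the new last slot with key and sifts it up.
        self.decrease_key(self.size-1,key)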
    
    def pop_min(self):

        #Write your code here
        
        min_key = self.H[0]
        self.H[0] = self.H[self.size-1]   # move the last element to the root
        self.size -=1
        self.min_heapify(0)                # sift it down to restore the heap property
        return min_key
        
    def min_heapify(self,index):
        
        left = 2*index +1 
        right = 2 *index +2
        min_ind = index
        if left < self.size and self.H[left]<self.H[min_ind]:
            min_ind = left
        if right < self.size and self.H[right]<self.H[min_ind]:
            min_ind = right
        if min_ind != index:
            self.swap(index, min_ind)
            self.min_heapify(min_ind)
            
    def swap(self,a,b):
        temp = self.H[a]
        self.H[a] = self.H[b]
        self.H[b] = temp
        
    def decrease_key(self, index, new_key):

        #Write your code here

        self.H[index] = new_key
        
        while index > 0 and self.H[index] < self.H[(index-1)//2]:
            self.swap(index,(index-1)//2)
            index = (index-1)//2

def min_priority_queue(C):

    heap = Heap()
    
    for c in C:
        if c["op"] == "insert":
            heap.insert(c["key"])
        elif c["op"] == "pop_min":
            _ = heap.pop_min()
        elif c["op"] == "decrease_key":
            heap.decrease_key(c["index"],c["new_key"])
        else:
            assert False, "Invalid operation"
            
    return heap.H[:heap.size]

### Tests:

##### Run example test case 1:

In [8]:
input_value = [{"op":"insert","key":10},{"op":"insert","key":5},{"op":"insert","key":1},{"op":"decrease_key","new_key":3,"index":2},{"op":"pop_min"}]
print (min_priority_queue(input_value))

[3, 10]


##### Run example test case 2:

In [9]:
input_value = [{"op":"insert","key":5},{"op":"insert","key":4},{"op":"insert","key":3},{"op":"insert","key":2},{"op":"decrease_key","index":2,"new_key":2},{"op":"pop_min"},{"op":"pop_min"},{"op":"insert","key":9},{"op":"insert","key":8},{"op":"insert","key":7},{"op":"insert","key":6}]
print (min_priority_queue(input_value))

[3, 5, 6, 8, 7, 9]


## 4. 

Write the function, **pairs**($L$), which, given a list, $L$, of $n$ non-negative integers, finds the number of distinct pairs, $(i, j)$, such that $j \geq i $ and $L[i] = L[j]$. Your solution should have a linear worst-case run time.

### Examples:

**Input:** `[5,2,1,5]` <br>
**Output:** `5`

**Input:**  `[1,2,3,4,1,1,1]` <br>
**Output:**  `13`

### Constraints:

- $1\leq n \leq 10^6$.
- $1\leq L[i] \leq 10^6$.

### Code:

In [34]:
def pairs(L):
    # Hash-table version: expected linear time; integer arithmetic avoids a float result.
    counts = {}
    for l in L:
        counts[l] = counts.get(l, 0) + 1
    return sum(n * (n + 1) // 2 for n in counts.values())

In [26]:
# Simply count the number of times each distinct value occurs in the list (counting-sort style,
# linear in n + max(L)). A value occurring c times contributes c*(c+1)/2 pairs: c pairs with
# i = j plus c*(c-1)/2 pairs with i < j.

def pairs(L):

    #Write your code here
    
    max_val=L[0]
    for i in range(1,len(L)):
        if L[i]>max_val:
            max_val = L[i]
    c = [0 for i in range(max_val+1)]
    for i in range(len(L)):
        c[L[i]] += 1
    result = 0
    for i in range(len(c)):
        if c[i]>0:
            result += (c[i]*(c[i]+1))//2
    return result

### Tests:

##### Run example test case 1:

In [32]:
input_value = [5,2,1,5]
print (pairs(input_value))

5


##### Run example test case 2:

In [33]:
input_value = [1,2,3,4,1,1,1]
print (pairs(input_value))

13


## 5. 

Write a function, **close_pairs**($L$), which takes a list, $L$, of three elements, the first of which is a list of non-negative integer numbers, $A$, and the second and third of which are positive integer values, $\delta_{index}$ and $\delta_{value}$, respectively. This function should output a string, `"true"`, if array, $A$, contains a pair of indices, $(i,j)$, such that: 
1. $i \neq j$, 
2. $\lvert i-j\rvert\leq \delta_{index}$ and 
3. $\lvert A[i] - A[j]\rvert \leq \delta_{value}$. <br>

If no such pair exists, the implemented function should return the string `"false"`. Your solution should have linear worst-case run time performance. Note that you can use a Python dictionary (i.e. `D = {}`) as an implementation of a hash table with an assumed constant average-case run time cost for insertion, deletion and element search operations.

### Examples:

**Input:** `[[1,50,4,8,16,12,2],1,1]` <br>
**Output:** `"false"`

**Input:**  `[[1,16,4,8,64,2,32],2,3]` <br>
**Output:**  `"true"`

### Constraints:

- $1 \leq \text{len }(A) \leq 10^4$.
- $1 \leq \delta_{index} \leq 10^4$.
- $1 \leq \delta_{value} \leq 10^6$.
- $1 \leq A[i] \leq 10^6$.

### Code:

In [50]:
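def close_pairs(L):
    # Alternative version: each bucket keeps the set of values from the current index window.
    A, dind, dval = L
    buckets = {}
    
    for i, l in enumerate(A):
        bucket = l // dval
        
        # Only the same bucket and its two neighbours can contain values within dval of l.
        for b in (bucket - 1, bucket, bucket + 1):
            for close in buckets.get(b, set()):
                if abs(close - l) <= dval:
                    return "true"
            
        buckets.setdefault(bucket, set()).add(l)
        
        # A[i - dind] is too far from index i + 1, so drop it before the next step.
        if i >= dind:
            buckets[ A[i-dind] // dval ].discard(A[i-dind])
        
    return "false"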

In [44]:
set([1,2,3]) | set([4])

{1, 2, 3, 4}

In [13]:
# We loop through all the elements in the array and record each value in a bucket of width
# d_value (bucket = value // d_value). Two values in the same bucket differ by less than d_value,
# so if a bucket is already occupied we have found a close pair. Values in a neighbouring bucket
# may or may not be close, so we check them explicitly (if close, return "true"). Once we are
# past the first d_index values, we remove the "oldest" value, which is no longer within the
# index range, from its bucket. Each bucket therefore holds at most one value. Each item is
# added at most once and removed at most once, with a constant number of comparisons, so the
# running time is linear (assuming O(1) dictionary operations).

def close_pairs(L):
        
    #Write your code here
    
    A, d_index, d_value = L[0], L[1], L[2]
    
    D = {}
    
    for i in range(len(A)):
        val = A[i]
        bucket = val//d_value
        if bucket in D:
            return "true"
        D[bucket] = val
        if bucket-1 in D:
            if abs(D[bucket-1] -val)<= d_value:
                return "true"
        if bucket+1 in D:
            if abs(D[bucket+1] -val)<= d_value:
                return "true"
        if i-d_index>=0:
            D.pop(A[i-d_index]//d_value)
    return "false"

### Tests:

##### Run example test case 1:

In [51]:
input_value = [[1,50,4,8,16,12,2],1,1]
print (close_pairs(input_value))

false

##### Run example test case 2:

In [49]:
input_value = [[1,16,4,8,64,2,32],2,3]
print (close_pairs(input_value))

true


## 6.

Write the function, **sort_strings**($S$), which takes a list, $S$, of strings and sorts it in the following order.
1. If length of string, $a$, is smaller than length of string, $b$, then string, $a$, should be appearing in the output before string, $b$.
2. If strings $a$ and $b$ are of the same length, then string $a$ should come before string $b$ if $ord(a[i]) < ord(b[i])$, where $i$ is the index of the first character for which $a[i]\neq b[i]$. Note, $ord$ is a Python function which converts a character into its ASCII integer value (e.g. `ord('a')=97`).

Also note that a function, **get_character_dict**($S$), is provided. This function creates a dictionary, $char\_dict$, which for each character used in $S$, computes its position in a ranked list of all characters used in $S$. 

### Examples:

**Input:** `["abcd", "ab", "abc", "abcde", "a","e"]` <br>
**Output:** `["a", "e", "ab", "abc", "abcd", "abcde"]`

**Input:**  `["314", "712", "632", "201", "111","11","1","20","5","50","1a"]` <br>
**Output:**  `["1", "5", "11", "1a", "20", "50", "111", "201", "314", "632", "712"]`

### Constraints:

- $1\leq \text{len}(S) \leq 100$.

### Code:

In [16]:
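# A straightforward implementation of LSD radix sort with additional handling of strings of
# varying lengths. Strings are right-aligned, and a missing leading character gets sub-key 0.
# That is smaller than every real character (ranked from 1), so shorter strings come first.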

def counting_sort_on_digit(A, k_dict, d, reverse_digit_pos):

    def subkey(key):

        if reverse_digit_pos> len(key):
            return 0
        else:
            return k_dict[key[-reverse_digit_pos]]+1

    n = len(A)
    counts = [0 for _ in range(len(k_dict)+1)]
    output = [None for _ in range(n)]
    for key in A:
        counts[subkey(key)] += 1
    for i in range(1, len(counts)):
        counts[i] += counts[i - 1]
    for key in reversed(A):
        output[counts[subkey(key)] - 1] = key
        counts[subkey(key)] = counts[subkey(key)] - 1
    return output


def radix_sort_lsd(A, d, k_dict):

    for digit_pos in range(1, d+1):
        A = counting_sort_on_digit(A, k_dict, d, digit_pos)
        
    return A

def get_character_dict(S):
    
    char_set = set([])
    for s in S:
        char_set=char_set | set(list(s))
        
    count = 0
    char_dict={}
    for c in sorted(list(char_set)):
        char_dict[c]=count
        count+=1
    return char_dict

def sort_strings(S):
 
    char_dict = get_character_dict(S)

    #Write your code here
    
    max_length=len(S[0])
    
    for i in range(len(S)):
        if len(S[i])>max_length:
            max_length = len(S[i])

    B = radix_sort_lsd(S, max_length, char_dict) 
    
    return B

### Tests:

##### Run example test case 1:

In [17]:
input_value = ["abcd", "ab", "abc", "abcde", "a","e"]
print (sort_strings(input_value))

['a', 'e', 'ab', 'abc', 'abcd', 'abcde']


##### Run example test case 2:

In [18]:
input_value = ["314", "712", "632", "201", "111","11","1","20","5","50","1a"]
print (sort_strings(input_value))

['1', '5', '11', '1a', '20', '50', '111', '201', '314', '632', '712']
