This group of exercises is themed around sorting. `mergesort` and `quicksort` are singled out as algorithms you really should know, so here are implementations of both.

In [1]:
def mergesort(arr):
    if len(arr) <= 1:
        return arr
    
    mid = len(arr) // 2  # split point (not a pivot in the quicksort sense)
    left = arr[:mid]
    right = arr[mid:]

    mergesort(left)
    mergesort(right)

    i = 0
    j = 0
    k = 0
    
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in their original order (stable)
            arr[k] = left[i]
            i = i+1
        else:
            arr[k] = right[j]
            j = j+1
        k = k+1

    while i < len(left):
        arr[k] = left[i]
        i = i+1
        k = k+1

    while j < len(right):
        arr[k] = right[j]
        j = j+1
        k = k+1
        
    return arr

In [2]:
mergesort([1,5,3])

[1, 3, 5]

In [3]:
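def partition(arr, pivot_i):
    # Rearranges arr in place around the value at pivot_i and returns
    # [left part, [pivot], right part], with smaller values on the left
    # and larger values on the right.
    pivot_v = arr[pivot_i]
    for v_i in range(len(arr)):
        v = arr[v_i]
        if v < pivot_v and v_i > pivot_i:
            # A small value sits right of the pivot: move it into the pivot's
            # slot and shift the pivot one step to the right.
            if v_i == pivot_i + 1:
                arr[pivot_i], arr[pivot_i + 1] = v, pivot_v
            else:
                arr[pivot_i], arr[pivot_i + 1], arr[v_i] = v, pivot_v, arr[pivot_i + 1]
            pivot_i += 1
        elif v > pivot_v and v_i < pivot_i:
            # A large value sits left of the pivot: swap it with the pivot.
            arr[v_i], arr[pivot_i] = pivot_v, v
            pivot_i = v_i

    return [arr[:pivot_i], [arr[pivot_i]], arr[pivot_i + 1:]]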

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    
    left, pivot, right = partition(arr, len(arr) // 2)
    
    left = quicksort(left)
    right = quicksort(right)
    
    return left + pivot + right

In [4]:
quicksort([2,5,4,3,7,1])

[1, 2, 3, 4, 5, 7]

Self-assessment: these do seem like something I ought to be able to implement at a whiteboard. The hard part is retaining the ideas.

PS: I already implemented radix sort in JS in another repo.

## Q1

Q: You are given two sorted arrays, A and B, where A has enough space at the end to hold B. Merge B into A in sorted order.

In [5]:
def sortInto(A, B):
    C = []
    A_i = B_i = 0
    
    while A_i < len(A):
        a, b, c = A[A_i], B[B_i] if B_i < len(B) else None, C[0] if len(C) > 0 else None
        if not a:
            if c:
                A[A_i] = c
                C = C[1:]
            else:
                A[A_i] = b
                B_i += 1
        elif c:
            if c < a:
                C.append(a)
                A[A_i] = c
                C = C[1:]
        elif b:
            if b < a:
                C.append(a)
                A[A_i] = b
                B_i += 1
        A_i += 1
    
    return A

In [6]:
sortInto([1, 2, None, None],[1, 3])

[1, 1, 2, 3]

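Self-analysis: I missed one of the four cases. For example, `sortInto([5, 6, None, None], [1, 2])` returns `[1, 5, 6, 2]`, because once the overflow buffer `C` is non-empty the next element of `B` is never compared against it. The `if not a` test also treats a legitimate `0` as an empty slot. Need to sketch out the solution space more rigorously ahead of time.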
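One way to avoid the case analysis altogether is to merge from the back, so that nothing in `A` is overwritten before it has been read. The sketch below is a different technique from the cell above, not a repair of it; the name `merge_into` is made up for illustration, and it assumes the free space in `A` is exactly `len(B)` trailing slots.

```python
def merge_into(a, b):
    """Merge sorted list b into sorted list a, whose last len(b) slots are free."""
    write = len(a) - 1          # next slot to fill, starting from the end
    i = len(a) - len(b) - 1     # last real element of a
    j = len(b) - 1              # last element of b

    # Once b is exhausted, the rest of a is already in place.
    while j >= 0:
        if i >= 0 and a[i] > b[j]:
            a[write] = a[i]
            i -= 1
        else:
            a[write] = b[j]
            j -= 1
        write -= 1
    return a


print(merge_into([1, 2, None, None], [1, 3]))  # [1, 1, 2, 3]
print(merge_into([5, 6, None, None], [1, 2]))  # [1, 2, 5, 6]
```

Because the largest remaining value is always written into the rightmost free slot, no side buffer is needed and the merge runs in $O(|A|)$ time with $O(1)$ extra space.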

## Q2

Q: Write a method to sort an array of strings so that all anagrams are next to one another.

### Brute-force solution

In [7]:
from collections import defaultdict

def wordhash(word):
    # Radix sort might be faster, but less readable.
    d = defaultdict(int)
    for c in word:
        d[c] += 1
    return "".join(sorted([c + str(i) for c, i in d.items()]))

def sortAnagrams(arr):
    return sorted([wordhash(word) for word in arr])

In [8]:
sortAnagrams(['bob', 'mark', 'obb', 'amrk'])

['a1k1m1r1', 'a1k1m1r1', 'b2o1', 'b2o1']

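`wordhash` is $O(c\log{c})$ for the sort, where $c$ is the number of unique characters in an average string. There is also a linear pass over the word to count characters; it is not a constant, but it is treated as a lower-order term here, which is reasonable as long as word length is comparable to $c$.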

`sortAnagrams` is $O(n\log{n} + nc \log{c})$.

Note: I didn't bother preserving the words because I jumped straight to a more optimized solution.

We can make `sortAnagrams` faster by taking advantage of structure in the anagram list, instead of using a naive sorting algorithm.

In [9]:
def sortAnagrams(arr):
    out_map = defaultdict(list)
    
    for i, word in enumerate(arr):
        out_map[wordhash(word)].append(i)
    
    out = []
    for idx_list in out_map.values():
        for idx in idx_list:
            out.append(arr[idx])
    
    return out

In [10]:
sortAnagrams(['bob', 'mark', 'obb', 'amrk'])

['bob', 'obb', 'mark', 'amrk']

This solution replaces the $O(n\log{n})$ sort of the hashes with a single hash-map grouping pass, which is $O(n)$ on average, thus reducing the time complexity to $O(nc\log{c})$.
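The remaining $c\log{c}$ factor comes from sorting inside `wordhash`. If the alphabet is small and known, a counting key avoids that sort. The sketch below assumes words contain only lowercase ASCII letters `a` to `z`; `letter_count_key` and `group_anagrams` are illustrative names, not part of the cells above.

```python
from collections import defaultdict


def letter_count_key(word):
    # Assumption: word contains only lowercase ASCII letters a-z.
    counts = [0] * 26
    for ch in word:
        counts[ord(ch) - ord("a")] += 1
    return tuple(counts)  # tuples are hashable, so they can be dict keys


def group_anagrams(words):
    groups = defaultdict(list)
    for word in words:
        groups[letter_count_key(word)].append(word)
    # dicts preserve insertion order (Python 3.7+), so groups come out in
    # order of first appearance.
    return [word for group in groups.values() for word in group]


print(group_anagrams(['bob', 'mark', 'obb', 'amrk']))
# ['bob', 'obb', 'mark', 'amrk']
```

Each key costs time linear in the word length plus the fixed alphabet size, so the whole pass is linear in the total number of characters.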

## Q3

Q: Given a sorted array that has been rotated, search for the index of a specific element in the array.

Find the minimum element, and then perform an adjusted binary search.

In [14]:
def findRotatedMinimum(arr):
    # The first element smaller than arr[0] is the rotation point.
    for idx in range(1, len(arr)):
        if arr[idx] < arr[0]:
            return idx
    return 0  # the array is not rotated (or is empty)


def adjustedBinarySearch(arr, minimum_idx, v):
    # Binary search over "virtual" positions 0..n-1, where virtual position p
    # corresponds to the real index (p + minimum_idx) % n.
    n = len(arr)
    lo, hi = 0, n - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        real_idx = (mid + minimum_idx) % n
        if arr[real_idx] == v:
            return real_idx
        elif arr[real_idx] < v:
            lo = mid + 1
        else:
            hi = mid - 1

    return -1  # v is not in the array


def rotatedBinarySearch(arr, v):
    return adjustedBinarySearch(arr, findRotatedMinimum(arr), v)

In [16]:
rotatedBinarySearch([3, 4, 1, 2], 2)

3

This solution is $O(n + \log{n}) = O(n)$ instead of $O(n\log{n} + \log{n}) = O(n\log{n})$, the latter being what we would get if we did this using naive sorting algorithms.
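The linear term comes from scanning for the minimum. That scan can itself be a binary search, which would bring the whole lookup down to $O(\log{n})$. A minimal sketch, assuming the values are distinct (with duplicates the comparison against the right end becomes ambiguous); `find_rotation_point` is an illustrative name:

```python
def find_rotation_point(arr):
    """Index of the smallest element of a rotated sorted array (distinct values)."""
    lo, hi = 0, len(arr) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if arr[mid] > arr[hi]:
            lo = mid + 1  # the drop is strictly to the right of mid
        else:
            hi = mid      # the minimum is at mid or to its left
    return lo


print(find_rotation_point([3, 4, 1, 2]))  # 2
```

The returned index can be passed as the `minimum_idx` argument of the adjusted binary search in place of the result of the linear scan.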

## Q5

Q: Given a sorted array of strings interspersed with empty strings, write a method to find the location of a given string.

This is kind of like binary search, but with the possibility of landing on an empty string, which forces us to scan outward for the nearest non-empty neighbor after each jump. A sketch for this painful problem (written out as Python, but not an executed cell):

```
def searchLocale(arr, pos, lo, hi):
    # Index of the non-empty string nearest to pos within [lo, hi], or -1.
    if arr[pos]:
        return pos

    left, right = pos - 1, pos + 1
    while left >= lo or right <= hi:
        if left >= lo and arr[left]:
            return left
        if right <= hi and arr[right]:
            return right
        left -= 1
        right += 1
    return -1


def sparseFind(arr, target_value):
    lo, hi = 0, len(arr) - 1

    while lo <= hi:
        pos = searchLocale(arr, (lo + hi) // 2, lo, hi)

        if pos == -1:
            return -1  # only empty strings are left in the window
        elif arr[pos] == target_value:
            return pos
        elif arr[pos] < target_value:
            lo = pos + 1
        else:
            hi = pos - 1

    return -1
```

The resulting algorithm is $O(\log{n})$ in a good case, and $O(n)$ in the worst case (for example, when almost every entry is empty). The average performance is $O(c\log{n})$, where $c$ is the average number of empty strings that have to be scanned around a probed position.

## Q6

Q: How would you sort a file that doesn't fit in memory?

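A: This is an external sort. You split the file into chunks that fit in memory, sort each chunk, and write it back to disk. Then you merge the sorted chunks, two at a time (or all at once with a k-way merge), streaming them from disk so that only a small buffer per chunk is in memory.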
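A minimal sketch of that idea, using a k-way merge via `heapq.merge` rather than merging two chunks at a time. The function name `external_sort` and the `chunk_size` parameter are made up for illustration, and the sketch assumes the input is an iterable of newline-terminated text lines (for example, an open file).

```python
import heapq
import os
import tempfile


def external_sort(lines, chunk_size):
    """Yield the given lines in sorted order, holding at most chunk_size
    lines in memory during the chunking phase."""
    chunk_paths = []
    chunk = []

    def flush():
        # Sort the in-memory chunk and spill it to its own temporary file.
        chunk.sort()
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(chunk)
        chunk_paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    files = [open(path) for path in chunk_paths]
    try:
        # heapq.merge reads lazily, one line per chunk file at a time.
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()
        for path in chunk_paths:
            os.remove(path)


data = ["pear\n", "apple\n", "fig\n", "banana\n", "cherry\n"]
print(list(external_sort(data, chunk_size=2)))
# ['apple\n', 'banana\n', 'cherry\n', 'fig\n', 'pear\n']
```

In a real setting `chunk_size` would be chosen from the available memory, and the merged output would be written to a destination file rather than collected into a list.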

## Q9

Q: Write an algorithm to efficiently search a matrix in which every row and every column is sorted in ascending order.

### Brute-force solution

In [23]:
def findInSortedMatrix(mat, v):
    def binary_search(v, values):
        # Standard binary search; returns the index of v in values, or -1.
        lo, hi = 0, len(values) - 1
        while lo <= hi:
            pos = (lo + hi) // 2
            if values[pos] == v:
                return pos
            elif values[pos] < v:
                lo = pos + 1
            else:
                hi = pos - 1
        return -1
                
    
    row_idx = 0
    while row_idx < len(mat):
        result = binary_search(v, mat[row_idx])
        print(mat[row_idx], result)
        if result != -1:
            return row_idx, result
        else:
            row_idx += 1
            
    return -1

This is just a binary search repeated over each row (it could equally be done over the columns). For an $n \times n$ matrix the algorithm is $O(n\log{n})$.

In [25]:
findInSortedMatrix([[1,2],[3,4]], 4)

[1, 2] -1
[3, 4] 1


(1, 1)

### Improved algorithm

We can perform a two-dimensional iterative search to find the matching entry. Basically, we start at the middle cell, then step one cell forwards or backwards in either the row or the column (whichever one is both legal and hasn't been tried yet, defaulting to columns arbitrarily).

This is intended to be $O(n)$ time for an $n \times n$ matrix, and it also requires $O(n)$ space to store the cells that have already been visited.

In [35]:
def findInSortedMatrix(mat, v):
    # Heuristic walk: assumes v is present. There is no bounds check on
    # col_idx and no exit condition if v is missing.
    row_idx = col_idx = len(mat) // 2
    seen = set()  # visited (row, col) cells

    while True:
        curr_v = mat[row_idx][col_idx]
        if curr_v == v:
            return row_idx, col_idx
        elif curr_v > v:
            if (row_idx - 1, col_idx) in seen or row_idx == 0:
                col_idx -= 1
            else:
                row_idx -= 1
            seen.add((row_idx, col_idx))
        else:
            if (row_idx + 1, col_idx) in seen or row_idx == len(mat) - 1:
                col_idx += 1
            else:
                row_idx += 1
            seen.add((row_idx, col_idx))

In [36]:
findInSortedMatrix([[1,2],[3,4]], 1)

(0, 0)
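For comparison, a widely used alternative for this problem needs no visited set: start at the top-right corner and walk a "staircase", discarding one column or one row per comparison. The sketch below is that different technique, not a variant of the cell above; the name `staircase_search` and the `None` return for a missing value are choices made for illustration.

```python
def staircase_search(mat, v):
    """Search a matrix whose rows and columns are each sorted ascending.
    Returns (row, col) of v, or None if v is absent."""
    if not mat or not mat[0]:
        return None

    row, col = 0, len(mat[0]) - 1  # top-right corner
    while row < len(mat) and col >= 0:
        curr = mat[row][col]
        if curr == v:
            return row, col
        elif curr > v:
            col -= 1  # everything further down this column is even larger
        else:
            row += 1  # everything further left in this row is even smaller
    return None


print(staircase_search([[1, 2], [3, 4]], 1))  # (0, 0)
print(staircase_search([[1, 2], [3, 4]], 5))  # None
```

Every step removes a full row or column from consideration, so an $n \times m$ matrix takes at most $n + m$ comparisons and $O(1)$ extra space, and the search terminates even when the value is missing.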

### Even more improved algorithm

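Instead of stepping one cell at a time, we can take binary-search-sized jumps in whichever direction. The aim is $O(\log{n})$ steps, but halving jumps can skip past the target, and searching a matrix that is sorted only along its rows and columns needs $\Omega(n)$ comparisons in the worst case, so treat this as a heuristic rather than a guaranteed $O(\log{n})$ search.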

In [37]:
def findInSortedMatrix(mat, v):
    # Same heuristic walk with halving jumps: assumes v is present and can
    # jump past it, so it is not guaranteed to terminate.
    row_idx = col_idx = len(mat) // 2
    seen = set()  # visited (row, col) cells

    while True:
        curr_v = mat[row_idx][col_idx]
        if curr_v == v:
            return row_idx, col_idx
        elif curr_v > v:
            new_row_idx = row_idx // 2
            if (new_row_idx, col_idx) in seen or row_idx == 0:
                col_idx = col_idx // 2
            else:
                row_idx = new_row_idx
            seen.add((row_idx, col_idx))
        else:
            new_row_idx = row_idx + (len(mat) - row_idx) // 2
            if (new_row_idx, col_idx) in seen or row_idx == len(mat) - 1:
                col_idx = col_idx + (len(mat) - col_idx) // 2
            else:
                row_idx = new_row_idx
            seen.add((row_idx, col_idx))

In [38]:
findInSortedMatrix([[1,2],[3,4]], 1)

(0, 0)

Notes: this one went very smoothly. Interestingly, the book doesn't use this approach at all; instead it performs a quadrangle search. Hmm.

## Q10

Q: *Response ranking*: Write a routine to look up the rank of a selected value in a stream of values.

A: This is the response ranking problem, which I studied in depth early into my time at RC.

The book recommends using a binary search tree. I could also use a hash map, which is simpler to implement.
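Neither the tree nor the hash map is written out here. As a third option, swapped in purely for brevity, a sorted list maintained with the standard `bisect` module gives $O(\log{n})$ rank lookups at the cost of $O(n)$ inserts. The class name `RankTracker`, its method names, and the definition of rank used (the number of tracked values less than or equal to `x`, not counting `x` itself, or `-1` if `x` was never seen) are assumptions for this sketch.

```python
import bisect


class RankTracker:
    def __init__(self):
        self._sorted = []  # every tracked value, kept in ascending order

    def track(self, x):
        # O(n) because of the list insert, O(log n) to find the position.
        bisect.insort(self._sorted, x)

    def rank(self, x):
        # bisect_right gives the count of values <= x.
        pos = bisect.bisect_right(self._sorted, x)
        if pos == 0 or self._sorted[pos - 1] != x:
            return -1  # x has not been tracked
        return pos - 1  # exclude x itself


tracker = RankTracker()
for value in [5, 1, 4, 4, 5, 9, 7, 13, 3]:
    tracker.track(value)

print(tracker.rank(1), tracker.rank(3), tracker.rank(4))  # 0 1 3
```

A balanced binary search tree that stores subtree sizes would make both operations $O(\log{n})$, which is the trade-off to weigh against the simplicity of this version.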

## Q11

Q: Sort an array into alternating peaks and valleys (i.e. single-length monotonic subsequences). For example: `0 5 3 4 1 7`.

In [39]:
def peakValleySort(arr):
    if len(arr) < 2:
        return list(arr)

    new_arr = list(arr[:2])
    remaining = list(arr[2:])
    # If the first step went down, the last element is a valley, so the
    # next element we want is a peak (and vice versa).
    want_peak = new_arr[0] > new_arr[1]

    while remaining:
        last = new_arr[-1]
        # Run forward until an element satisfies the current desire.
        for offset, x in enumerate(remaining):
            fits = x >= last if want_peak else x <= last
            if fits:
                new_arr.append(remaining.pop(offset))
                break
        else:
            # Nothing fits after `last`, so slot the next element in just
            # before it; `last` then takes on the desired role itself.
            new_arr.insert(len(new_arr) - 1, remaining.pop(0))
        want_peak = not want_peak
    return new_arr

The idea is to "run forward" until we find an element that matches our desired condition, and then splice into the output array, before resetting the search at the next element.

Some version of the code above will work, although it's not run/tested at the moment.

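Best-case runtime would be $O(n)$. Worst-case runtime would be $O(n^2)$, since each of the $n$ placements may scan all of the remaining elements. We can expect $O(mn)$ runtime, where $m$ is the average length of that forward scan and is a function of the sortedness of the array.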
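The forward scan can be avoided entirely. A common single-pass alternative, sketched here as a different technique from the cell above, visits every odd index and swaps the largest of the three neighboring values into it, which makes each odd index a peak. The name `peak_valley_in_place` is made up for illustration, and "peak" is taken to mean greater than or equal to both neighbors.

```python
def peak_valley_in_place(arr):
    """Rearrange arr in place so that every odd index holds a peak."""
    for i in range(1, len(arr), 2):
        # Find the largest of arr[i - 1], arr[i], arr[i + 1].
        biggest = i
        if arr[i - 1] > arr[biggest]:
            biggest = i - 1
        if i + 1 < len(arr) and arr[i + 1] > arr[biggest]:
            biggest = i + 1
        # Put it at the odd index; the displaced value is no larger, so
        # peaks that were already placed stay valid.
        arr[i], arr[biggest] = arr[biggest], arr[i]
    return arr


print(peak_valley_in_place([5, 3, 1, 2, 3]))  # [3, 5, 1, 3, 2]
```

This runs in $O(n)$ time with $O(1)$ extra space regardless of how sorted the input is.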