# Divide and Conquer Algorithms

In this notebook, you will be implementing three different divide and conquer algorithms.

__Run this to install the necessary packages.__

In [None]:
#!pip install numpy

In [1]:
from numpy import random
import numpy as np

## Q1: Fixed Points

Given a sorted array of distinct integers $A[0,..., n-1]$, you want to find out whether there is an
index $i$ for which $A[i] = i$. We want to find a divide-and-conquer algorithm that runs in time $O(\log n)$.

#### Naive Solution

We will later use this naive solution to check your implementation.

In [3]:
def fixed_point_naive(array):
    for i in range(len(array)):
        if array[i] == i:
            return True
    return False

This is a naive algorithm that runs in $O(n)$ time. It simply goes through each index of the array and checks whether the element at this index equals the index. Note that this algorithm does not use the fact that the array is sorted or that it consists of distinct integers; it would work for any array, sorted or not. We will use both facts to devise a D&C algorithm that runs much faster.

### 1a. Write a D&C Solution

The fact that the array is sorted and consists of distinct elements allows us to use a D&C approach for this problem. Recall that binary search uses the fact that the input array is sorted: at each step it looks at the middle element and, based on its value, cuts the size of the input in half.

Let us look at a few examples that might lead us to a D&C solution.

Example 1: $A = [0, 2, 100, 101, 102]$

Does this array have such an index? Yes, we can see that $A[0] = 0$. Looking at the array, is there something you notice about it that will allow you to cut down the size of it by half and recurse on the remaining subarray? Try looking at the middle element, $A[2]$.

Example 2: $A = [-2, -1, 1, 2, 4]$

For this array, the answer is again true, because $A[4] = 4$. Again, is there a part of the array that you would be able to cut down and only focus your attention on the remaining part?

In general, looking at examples is a very useful way to approach a problem. By observing specific examples, you can notice patterns that you can later generalize in your solution.
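
One pattern the examples hint at can be checked numerically. Because the integers are sorted and distinct, each element is at least one larger than the previous one, so the offsets $A[i] - i$ never decrease. The short sketch below prints these offsets for Example 1; the helper name `offsets` is hypothetical and is not part of the assignment.

```python
def offsets(array):
    """Return the list of array[i] - i for every index i."""
    return [value - i for i, value in enumerate(array)]

example = [0, 2, 100, 101, 102]   # Example 1 from above
diffs = offsets(example)
print(diffs)                      # [0, 1, 98, 98, 98]

# For sorted distinct integers the offsets are non-decreasing,
# so a fixed point (offset 0) can be located by looking at the middle first.
assert all(a <= b for a, b in zip(diffs, diffs[1:]))
```

A fixed point is exactly an index whose offset is 0, which is why the middle element tells you which half can be discarded.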

Now, it's time to write the solution. Since you already got some practice with quick select and with binary search, we offer no skeleton code for this section of the assignment.

In [7]:
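def fixed_point(array):
    # Fill in your solution here.
    # Feel free to define any helper functions you might need.
    
    return fixed_point_range(array, 0, len(array) - 1)

def fixed_point_range(array, lo, hi):
    """Return whether there is an index i in [lo, hi] with array[i] == i."""
    # Passing index bounds instead of slicing keeps each step O(1),
    # so the whole search runs in O(log n). An empty range returns False.
    if lo > hi:
        return False
    mid = (lo + hi) // 2
    if array[mid] == mid:
        return True
    if array[mid] > mid:
        # array[j] - j never decreases, so no index right of mid can match
        return fixed_point_range(array, lo, mid - 1)
    # array[mid] < mid, so no index left of mid can match
    return fixed_point_range(array, mid + 1, hi)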

### Verification

Let us check your solution on some generated inputs. You do not have to understand the generating code. If your implementation is correct, a 'success' message will be written.

In [8]:
def generate():
    size = random.randint(100, 1000)
    array = [random.randint(-10000, 10000)]
    for i in range(1, size):
        step = random.randint(1, 100)
        array.append(array[i - 1] + step)
    return array

def generate_true():
    size = random.randint(100, 1000)
    true_index = random.randint(0, size)
    array = [0] * size
    array[true_index] = true_index
    for i in range(true_index - 1, -1, -1):
        step = random.randint(1, 100)
        array[i] = array[i + 1] - step
    for i in range(true_index + 1, size):
        step = random.randint(1, 100)
        array[i] = array[i - 1] + step
    return array

NUM_SAMPLES = 1000

for i in range(NUM_SAMPLES):
    generated = generate()
    assert fixed_point_naive(generated) == fixed_point(generated)
    generated = generate_true()
    assert fixed_point_naive(generated) == fixed_point(generated)
    
print('success')

success


## Q2: Quickselect

In this section, we will implement the quickselect algorithm. Quickselect is an efficient divide and conquer algorithm for finding the $k$-th smallest element of an unsorted array. We will first demonstrate a naive solution for this problem, then implement quickselect and compare the two.

The full algorithm is detailed here https://people.eecs.berkeley.edu/~vazirani/algorithms/chap2.pdf#page=10.

#### Naive Solution

The naive solution to the problem is as follows: 1) sort the input array; 2) return the element at index $k$ of the sorted array (so $k = 0$ gives the smallest element).

In [9]:
def naive_select(array, k):
    sorted_array = sorted(array)
    return sorted_array[k]

We can run this on a few test cases to check that it works.

In [10]:
array1 = [6, 1, 3, 5, 7, 5, 8]
array2 = [10, 4, 7, 2, 8, 9]
array3 = [12, 4, 6 ,8 ,3, 4, 2]

print("The smallest element of ", array1, " is ", naive_select(array1, 0))
print("The median element of ", array2, " is ", naive_select(array2, len(array2)//2))
print("The largest element of ", array3, " is ", naive_select(array3, len(array3) - 1))

The smallest element of  [6, 1, 3, 5, 7, 5, 8]  is  1
The median element of  [10, 4, 7, 2, 8, 9]  is  8
The largest element of  [12, 4, 6, 8, 3, 4, 2]  is  12


#### Runtime analysis

This algorithm first sorts the array, which takes $O(n\log n)$ time with an efficient comparison sort, and then indexes into the sorted array, which takes $O(1)$. Thus, the algorithm takes $O(n\log n)$ overall.

However, this is not a very efficient solution, since it is unnecessary to sort the entire array just to find one element. Thus, we will next explore quickselect.

### 2a. Write a D&C Solution

Quickselect is a randomized divide and conquer algorithm which is able to solve this problem in expected $O(n)$ time. See https://people.eecs.berkeley.edu/~vazirani/algorithms/chap2.pdf#page=11 for a detailed runtime analysis. The main idea of the algorithm is as follows:

1. Randomly select a pivot element from the array
2. Partition the array into three partitions (the elements less than, equal to, and greater than the pivot)
3. Recurse on the partition which must contain the k-th smallest element
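
As a rough sketch of why the expected running time is linear (the symbol $T(n)$ and the 25th/75th percentile cutoffs are notation assumed here for illustration; the linked chapter gives the full argument): call a pivot "good" if it lies between the 25th and 75th percentile of the array. A uniformly random pivot is good with probability $1/2$, so on average two picks suffice, and a good pivot leaves at most $3n/4$ elements to recurse on. This gives

$$T(n) \le T\left(\tfrac{3n}{4}\right) + O(n) \quad\Longrightarrow\quad T(n) = O\left(n \sum_{j \ge 0} \left(\tfrac{3}{4}\right)^j\right) = O(n).$$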

With this in mind, please implement the quickselect algorithm by replacing the ellipses "..." with your solution.

In [11]:
import numpy.random as random
def quick_select(array, k):
    
    # randomly pick a pivot
    v = array[random.randint(len(array))]
    
    
    # create the partitions
    partition1 = []
    partition2 = []
    partition3 = []
    for elem in array:
        if elem < v:
            partition1.append(elem)
        elif elem == v:
            partition2.append(elem)
        else:
            partition3.append(elem)
    
    
    # recurse on the partition which contains the k-th smallest element
    if len(partition1) > k:
        return quick_select(partition1, k)
    elif len(partition1) + len(partition2) > k:
        return v
    else:
        return quick_select(partition3, k - len(partition1) - len(partition2))
    

We can then test the function on the same set of arrays as before to check for correctness.

In [12]:
array1 = [6, 1, 3, 5, 7, 5, 8]
array2 = [10, 4, 7, 2, 8, 9]
array3 = [12, 4, 6 ,8 ,3, 4, 2]

print("The smallest element of ", array1, " is ", quick_select(array1, 0))
print("The median element of ", array2, " is ", quick_select(array2, len(array2)//2))
print("The largest element of ", array3, " is ", quick_select(array3, len(array3) - 1))

The smallest element of  [6, 1, 3, 5, 7, 5, 8]  is  1
The median element of  [10, 4, 7, 2, 8, 9]  is  8
The largest element of  [12, 4, 6, 8, 3, 4, 2]  is  12


### Verification

For a more thorough test, we can check that quick_select returns the same elements as naive_select for a large number of random arrays. Often, naive algorithms are much simpler to implement and verify than more efficient algorithms. Thus, one way to verify the correctness of our implementation is to compare it to the naive implementation, which we know to be correct.

The following block of code generates 1000 random arrays and 1000 random values for k, and checks that both solutions return the same answer each time. If your implementation is correct, the following code will print "success".

In [13]:
for i in range(1000):
    array = random.randint(1000,size = 1000)
    k = random.randint(1000)
    
    assert naive_select(array, k) == quick_select(array,k)
    
print("success")

success
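
To illustrate why the pivot should be chosen at random, the sketch below counts partitioning rounds for two pivot rules on an already sorted input. It is an iterative variant written only for this comparison: the names `quick_select_rounds` and `pick_pivot` are hypothetical, and it uses the standard library's `random.choice` rather than `numpy.random`.

```python
import random

def quick_select_rounds(array, k, pick_pivot):
    """Return (k-th smallest element, number of partitioning rounds used)."""
    rounds = 0
    while True:
        rounds += 1
        v = pick_pivot(array)
        smaller = [x for x in array if x < v]
        equal = [x for x in array if x == v]
        larger = [x for x in array if x > v]
        if k < len(smaller):
            array = smaller
        elif k < len(smaller) + len(equal):
            return v, rounds
        else:
            k -= len(smaller) + len(equal)
            array = larger

sorted_input = list(range(200))

# A first-element pivot on sorted input discards only one element per round.
print(quick_select_rounds(sorted_input, 199, lambda a: a[0]))   # (199, 200)

# A uniformly random pivot usually needs far fewer rounds (the count varies per run).
value, rounds = quick_select_rounds(sorted_input, 199, random.choice)
print(value, rounds)
```

With a fixed first-element pivot, every round scans the remaining list and removes a single element, so the total work on sorted input grows quadratically.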


## Q3: Binary Search

Binary search is a well-known search algorithm, first introduced in CS10, CS61A, and various other courses. Nevertheless, it serves as an excellent example of a divide and conquer algorithm.

#### Finding the index of a number in a sorted array
For the first question, we will implement binary search such that it returns the index of the element if it exists in the array and -1 otherwise. If an element occurs multiple times, return any one of its indices.

Steps: 
1. Find a pivot
2. Check if the pivot element is what you want; if so, return the pivot index
3. If not, update the left and right indices based on the pivot and repeat

### 3a. Write an iterative binary search
This is how it is typically performed.

In [37]:
def indexOf_iterative(lst, of):
    """
    Implements an iterative version of binary search which returns the index of an element in an array.
    If there are multiple such elements, return the index of any one of them.
    
    args:
    lst: sorted list of ints
    of: int which the function returns the index of
    """
    
    
    l_index = 0
    r_index = len(lst) - 1
    
    while r_index >= l_index:
        mid = (r_index + l_index) // 2
        if lst[mid] == of:
            return mid
        elif lst[mid] > of:
            r_index = mid - 1
        else:
            l_index = mid + 1
    
    return -1

### 3b. Write a recursive binary search

In [38]:
def indexOf_recursive(lst, of):
    """
    Implements a recursive version of binary search which returns the index of an element in an array.
    If there are multiple such elements, return the index of any one of them.
    
    args:
    lst: sorted list of ints
    of: int which the function returns the index of
    """
    
    def bin_search(l_index, r_index):
        """
        Helper method with the indices in the arguments
        """
        if l_index > r_index:
            return -1
        mid = (l_index + r_index) // 2
        if lst[mid] == of:
            return mid
        elif lst[mid] < of:
            return bin_search(mid + 1, r_index)
        else:
            return bin_search(l_index, mid - 1)
    
    return bin_search(0, len(lst) - 1)

#### Overflow Error

When working with large arrays, calculating the midpoint, $\frac{l + r}{2}$, can result in an integer overflow error, because the intermediate sum $l + r$ may exceed the largest representable value even when both $l$ and $r$ fit. This does not happen with Python's built-in `int`, which is arbitrary-precision and never overflows.

However, languages with fixed-width integers, like Java, C, and Rust, can encounter this issue, making $\frac{l + r}{2}$ undesirable. `numpy` is written in C, and its fixed-width integer types suffer from the same problem.

### 3c. Using some algebraic manipulation, find a simple yet elegant expression to compute the same midpoint in a way that avoids overflow errors.

In [39]:
# Create two 8-bit ints using numpy
a: np.int8 = np.int8(116)
b: np.int8 = np.int8(127)

In [40]:
def return_pivot_incorrect(l: np.int8, r: np.int8) -> np.int8:
    return (l + r) // 2

print('The standard implementation overflows, and we get a negative "midpoint":',
      return_pivot_incorrect(a, b))

The standard implementation overflows, and we get a negative "midpoint": -7




In [41]:
def return_pivot(l: np.int8, r: np.int8) -> np.int8:
    return l // 2 + r // 2 + (l % 2) * (r % 2)

return_pivot(a, b)
assert return_pivot(a, b) == 121, "Returned wrong value"
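
Another commonly used way to avoid the overflow is to add half of the distance to the left bound, $l + \lfloor (r - l)/2 \rfloor$. The sketch below is an illustration under stated assumptions: it simulates 8-bit wraparound with plain Python ints instead of `numpy`, the helper names are hypothetical, and it assumes $0 \le l \le r$, as is the case for array indices.

```python
def wrap_int8(x):
    """Simulate storing x in a signed 8-bit integer (two's complement wraparound)."""
    return (x + 128) % 256 - 128

def midpoint_naive(l, r):
    # The sum l + r is wrapped before halving, as fixed-width arithmetic would do.
    return wrap_int8(l + r) // 2

def midpoint_offset(l, r):
    # r - l stays within range whenever 0 <= l <= r, so nothing wraps.
    return l + wrap_int8(r - l) // 2

print(midpoint_naive(116, 127))    # -7
print(midpoint_offset(116, 127))   # 121
```

Both this form and the one in `return_pivot` give the same midpoint for valid index bounds; the offset form simply needs fewer operations.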

#### Verification


##### Common Mistakes

Binary search is also a notoriously buggy algorithm to implement, due to the number of edge cases that are often unaccounted for (see https://stackoverflow.com/questions/504335/what-are-the-pitfalls-in-implementing-binary-search).

Here are a few bugs we will check for in your solution:

1. It fails if the array is length 0. This is easy to fix with more careful indexing.
2. The algorithm can fail to return the index of a key which exists in the array. This often happens due to indexing errors where the algorithm ends up on the element that is to the immediate left or right of the key. This can be fixed with careful indexing, or with an if statement after the main loop if you know the algorithm always rounds up or down one too many times.
3. The algorithm fails if the key is greater than the largest element or smaller than the smallest element in the array. 

In [42]:
import numpy as np
arr_empty = []
assert indexOf_iterative(arr_empty, 0) == -1
print("success")

success


In [43]:
arr_wrong_index = [np.random.randint(0,100) for i in range(0,100)] + [50 for i in range(10)]
arr_wrong_index = sorted(arr_wrong_index)
assert indexOf_iterative(arr_wrong_index, 50) != -1 and arr_wrong_index[indexOf_iterative(arr_wrong_index, 50)] == 50
print("success")

success


In [44]:
arr_out_of_bounds = [np.random.randint(0,100) for i in range(0,100)]
arr_out_of_bounds = sorted(arr_out_of_bounds)
assert indexOf_iterative(arr_out_of_bounds, -1) == -1
assert indexOf_iterative(arr_out_of_bounds, 101) == -1
print("success")

success


In [45]:
arr_empty = []
assert indexOf_recursive(arr_empty, 0) == -1
print("success")

success


In [46]:
arr_wrong_index = [np.random.randint(0,100) for i in range(0,100)] + [50 for i in range(10)]
arr_wrong_index = sorted(arr_wrong_index)
assert indexOf_recursive(arr_wrong_index, 50) != -1 and arr_wrong_index[indexOf_recursive(arr_wrong_index, 50)] == 50
print("success")

success


In [47]:
arr_out_of_bounds = [np.random.randint(0,100) for i in range(0,100)]
arr_out_of_bounds = sorted(arr_out_of_bounds)
assert indexOf_recursive(arr_out_of_bounds, -1) == -1
assert indexOf_recursive(arr_out_of_bounds, 101) == -1
print("success")

success


We will now verify the correctness of your algorithms by running both of them on 2500 random arrays. If your solutions are correct, it should print "success".

In [48]:
for i in range(2500):
    arr = [np.random.randint(0,1000) for i in range(0,1000)]
    arr = sorted(arr)
    key = np.random.randint(0,1000)
    index_it = indexOf_iterative(arr, key)
    index_rec = indexOf_recursive(arr, key)
    if key in arr:
        assert index_it != -1 and arr[index_it] == key
        assert index_rec != -1 and arr[index_rec] == key
    else:
        assert index_it == -1
        assert index_rec == -1

print("success")

success


### 3d. Modify your iterative binary search so that it returns the index of the first occurrence of an element if it doesn't do so already.

Sometimes, not only do you want to find the index of an element, you want to find the lowest index of that element if there are ties.

To do so, rather than immediately returning the pivot if the pivot element is what you're searching for, search on the left half of the array to determine if there is a smaller index for that element. You may have to experiment with indices (i.e. setting the left or right to pivot, pivot + 1, or pivot - 1).

__Note: your solution must be $O(\log n)$; you may not call your initial binary search function and simply iterate left.__

In [53]:
def lowest_indexOf(lst, of):
    """
    Implements an iterative version of binary search which returns the index of an element in an array.
    If there are multiple such elements, return the lowest index.
    
    args:
    lst: sorted list of ints
    of: int which the function returns the index of
    """
    l_index = 0
    r_index = len(lst) - 1
    
    while r_index >= l_index:
        if r_index == l_index:
            if lst[l_index] == of:
                return l_index
            else:
                break
        mid = (r_index + l_index) // 2
        if lst[mid] >= of:
            r_index = mid
        else:
            l_index = mid + 1
    
    return -1

We can verify correctness by simply comparing your solution to Python's built-in `list.index()` method, which returns the lowest index of a value.

In [54]:
for i in range(2500):
    arr = [np.random.randint(0,1000) for i in range(0,1000)]
    arr = sorted(arr)
    key = np.random.randint(0,1000)
    index = lowest_indexOf(arr, key)
    if key in arr:
        assert index == arr.index(key)
    else:
        assert index == -1

print("success")

success
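
As a further cross-check, the standard library's `bisect.bisect_left` already performs an $O(\log n)$ search for the leftmost insertion point, which coincides with the lowest index of a value when that value is present. The wrapper below is a minimal sketch; the name `lowest_index_bisect` is hypothetical and is not part of the assignment.

```python
from bisect import bisect_left

def lowest_index_bisect(lst, of):
    """Return the lowest index of `of` in sorted `lst`, or -1 if it is absent."""
    i = bisect_left(lst, of)          # first position i with lst[i] >= of
    if i < len(lst) and lst[i] == of:
        return i
    return -1

print(lowest_index_bisect([1, 2, 2, 2, 5], 2))   # 1
print(lowest_index_bisect([1, 2, 2, 2, 5], 3))   # -1
print(lowest_index_bisect([], 7))                # -1
```

The bounds check before comparing `lst[i]` covers both the empty list and keys larger than every element.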


### Contributors

v1.0 (2022 Fall) Wilson Wu, yxu, Evgenii Sizykh