# CSPB 3104 Assignment 2

***
# Instructions

This assignment is to be completed as a python3 notebook.  

The questions provided below will ask you to either write code or
write answers in the form of markdown.

A Markdown syntax guide is here: [click here](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)

Using markdown you can typeset formulae using LaTeX.
This way you can write nice readable answers with formulae like this:

The algorithm runs in time $\Theta\left(n^{2.1\log_2(\log_2( n \log^*(n)))}\right)$, 
where $\log^*(n)$ is the _iterated logarithm_ function.

__Double click anywhere on this box to find out how your instructor typeset it. Press Shift+Enter to go back.__


***
## Question 1: Setting Up and Solving Recurrences

Consider the Python-like pseudocode below:

~~~
def div_and_conquer_fun(a):
    # a is an array of size n
    n = length(a)
    if n == 0:
        return 0
    if n == 1: 
        return a[1]
    # 1. Divide into 3 parts
    a1 = a[1 ... n//3]
    a2 = a[n//3+1 ... 2*n//3]
    a3 = a[2*n//3+1 ... n]
    # note // denotes integer division a//b := floor(a/b)
    (b1, b2) = coalesce_arrays_into_two(a1, a2, a3)
    # note b1, b2 are arrays of size n//4 each.
    c1 = div_and_conquer_fun(b1)
    c2 = div_and_conquer_fun(b2)
    return c1 + c2  # Theta(1) time
~~~

1. The algorithm first divides an array of size n into 3 roughly equal parts.
2. Next, it uses the function `coalesce_arrays_into_two(a1, a2, a3)` that runs in $\Theta(n)$ time, returning two arrays `b1` and `b2` of size $\frac{n}{4}$ each.
3. The function is then recursively called on `b1` and `b2`.
4. Finally, the result is summed up and returned.

Write down a recurrence relation for the running time of the divide and conquer function above. Use the master method to solve the recurrence: state which case of the master method applies and give the result.


        def div_and_conquer_fun(a):
            # a is an array of size n
    O(1)    n = length(a)
    O(1)    if n == 0:
    O(1)        return 0
    O(1)    if n == 1:
    O(1)        return a[1]
            # 1. Divide into 3 parts
    O(1)    a1 = a[1 ... n//3]
    O(1)    a2 = a[n//3+1 ... 2*n//3]
    O(1)    a3 = a[2*n//3+1 ... n]
            # note // denotes integer division a//b := floor(a/b)
    O(n)    (b1, b2) = coalesce_arrays_into_two(a1, a2, a3)
            # note b1, b2 are arrays of size n//4 each.
    D1      c1 = div_and_conquer_fun(b1)
    D2      c2 = div_and_conquer_fun(b2)
    O(1)    return c1 + c2  # Theta(1) time

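D1 and D2 each cost $T(n/4)$, where $T(n)$ is the running time of this algorithm on an input of size $n$.
$$ T(n) = T\left(\frac{n}{4}\right) + T\left(\frac{n}{4}\right) + O(n) + 9 \cdot O(1) $$
$$ T(n) = 2\,T\left(\frac{n}{4}\right) + \Theta(n) $$
$$ \log_4 2 = 0.5 $$
$$ c = 1 > 0.5 $$

Here $a = 2$, $b = 4$, and the non-recursive work is $f(n) = \Theta(n^c)$ with $c = 1$, dominated by the $\Theta(n)$ coalesce step. Since $c > \log_b a = 0.5$, and the regularity condition holds (taking $f(n) = n$, $a\,f(n/b) = 2 \cdot \frac{n}{4} = \frac{n}{2} \le k\,f(n)$ for $k = \frac{1}{2} < 1$), case 3 of the master method applies. The running time is therefore $T(n) = \Theta(n)$.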
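
As a quick sanity check on the recurrence above, the sketch below evaluates it numerically. It assumes the linear term is exactly $n$ and that the base case costs 1; both are illustrative choices, not part of the assignment. If the solution is linear, the ratio $T(n)/n$ should settle near a constant.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Evaluate T(n) = 2*T(n//4) + n with an assumed base cost of 1."""
    if n <= 1:
        return 1
    return 2 * T(n // 4) + n

# Print the ratio T(n)/n for growing powers of four
for k in range(1, 11):
    n = 4 ** k
    print(n, T(n), T(n) / n)
```

Under these assumptions, powers of four satisfy $T(4^k) = 2 \cdot 4^k - 2^k$, so the ratio approaches 2 from below, which is consistent with a linear bound.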



***
## Question 2(a): Counting Dominances
Suppose you are given two sorted arrays $a$ and $b$ of sizes $m$ and $n$, respectively. A "dominance" of $a$ over $b$ is a pair of indices $(i,j)$ such that $a[i] > b[j]$.  Note that $i$ must be an index of array $a$ and $j$ must be an index of array $b$.


Write a __brute force__ algorithm that counts the number of dominances of $a$ over $b$ that runs in $\Theta(n^2)$ time.

In [227]:
#Answer 2(a):
def count_dominances_brute_force(a, b):
    count = 0

    for i in range(len(a)) :
        for j in range(len(b)) :
            if (a[i] > b[j]) :
                count+=1
    
    return count
    

## Question 2(b): Counting Dominances
However, the brute force algorithm is suboptimal. Design a $\Theta(n)$ algorithm to count the number of dominances. Do this by modifying the merge algorithm we studied as part of merge sort. Instead of merging the two sorted arrays, count the number of dominances.

In [228]:
import math

#Answer 2(b):
def count_dominances(a, b):
    count = 0
    i = 0
    n = len(a)
    j = 0
    m = len(b)

    if m == 0 or n == 0 :
        return 0

    while i < n and j < m :
        if a[i] > b[j] :
            count+=1+j
            j+=1
        else : 
            i+=1

    # j = m but there's still items in a
    while i < n :
        i+=1
        j = m-1
        if i < n :
            if (a[i] > b[j]) :
                count+=1+j
    
    
    return count

## Question 2(c): Counting Dominances

I spent a lot of time on this, but I find it challenging at times to debug in Jupyter notebooks.

My logic was to find all the dominances by incrementing the indices of a and b one at a time.  If the current item a[i] is larger than the current item b[j], we add to the count, also adding any dominances that could have been missed along the way, and then increment the "pointer" j so the next item of b can be examined.  Otherwise we increment i so that a larger value in a can be found.  After the first loop terminates, the second loop captures any remaining dominances for the unfinished part of a.  No second loop is needed for the case i = n with j < m because both arrays are in increasing order: if b[j] is greater than or equal to the last element of a, then none of the remaining elements of b will be smaller, so they provide no dominances.

The biggest challenge I had was adding the missed dominances correctly.  All the smaller values in b are also dominated, which is why I chose j+1 as the number of dominances from the jth element down to the 0th.  I am sure I am over-adding at times, but after spending a lot of time trying to get the count correct, I am unsure where the logic error is.
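
One likely source of the miscounting described above is that `1 + j` is added every time `j` advances, rather than once per element of `a`. That can count the same pair several times, and it can also skip pairs for later elements of `a`. Below is a minimal sketch of one alternative, assuming both inputs are sorted in non-decreasing order. It adds `j` exactly once per element of `a`, after `j` has moved past everything that element dominates. The name `count_dominances_merge` is hypothetical, chosen only to avoid clashing with the answer above.

```python
def count_dominances_merge(a, b):
    """Count pairs (i, j) with a[i] > b[j]; a and b are assumed sorted ascending."""
    count = 0
    j = 0  # b[0..j-1] are all strictly smaller than the current element of a
    for x in a:
        # j never moves backwards, so this inner loop runs at most len(b) times in total
        while j < len(b) and x > b[j]:
            j += 1
        count += j  # x dominates exactly the first j elements of b
    return count

print(count_dominances_merge([5, 7, 10], [1, 2, 3]))             # 9
print(count_dominances_merge([6, 10, 15, 21], [4, 19, 25, 32]))  # 5
```

Because `i` and `j` each only move forward, the total work is proportional to the combined length of the two arrays.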

***
## Question 3(a): Finding a Fixed Point. 
A fixed point of an array $A$, if it exists, is an index $i$ such that $A[i] = i$.
Given a _sorted_ array $A$ of _distinct_ __integers__, return the index of the fixed point if one exists, or otherwise, return `-1` to signal that no fixed point exists. Your algorithm must be as efficient as possible.

In [229]:
def find_fixed_helper(a, start, stop) :
    # Search a[start..stop] (both ends inclusive) for an index i with a[i] == i

    #case 1 - Base case - empty range, so no fixed point here
    if (start > stop) :
        return -1

    mid = start + (stop - start)//2 # middle index of the passed array portion

    #case 2 - Base case - mid point is fixed
    if (a[mid] == mid) :
        fixed_point = mid
    #case 3 - mid element is less than its index, don't call recursion on left half
    #               but recurse on the right half (excluding mid itself)
    elif (a[mid] < mid) :
        fixed_point = find_fixed_helper(a, mid + 1, stop)
    #case 4 - mid element is greater than its index, don't call recursion on right half
    #               but recurse on the left half (excluding mid itself)
    else :
        fixed_point = find_fixed_helper(a, start, mid - 1)
    
    return fixed_point

In [230]:
#Answer 3(a)
def find_fixed_point(a):

    fixed_index = find_fixed_helper(a, 0, len(a) - 1)

    return fixed_index

In [231]:
# #My original version I didn't want to delete completely
# def find_fixed_point2(a):
#     i = len(a) - 1 # our array index

#     while i >= 0 :
#         if a[i] == i :
#             return i
#         elif a[i] < i : 
#             # if there's elements < their index, then 
#             # no matches exist in lower indices
#             break
#         else :
#             i-=1

#     return -1

## Question 3(b): 


I used a divide-and-conquer technique in the style of binary search. The helper examines the middle element of the current range and returns its index if it is a fixed point. Otherwise it discards the half that cannot contain a fixed point and recurses on the other half, returning -1 once there is nothing left to examine.

The worst case occurs when the range has to be halved about $\log_2 n$ times. Each call does constant work and makes only one recursive call, so the recurrence is $T(n) = T(n/2) + \Theta(1)$, which gives $\Theta(\log n)$. Unlike merge sort, there is no linear-time merge step and only one half is explored, which is why the bound is $\log n$ and not $n \log n$.

The best case occurs when the first element examined (the middle of the array) is a fixed point, so the algorithm inspects only one element and runs in $\Theta(1)$ time.
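
The halving step is safe because, for a sorted array of distinct integers, the difference `a[i] - i` never decreases as `i` grows, so a fixed point can only lie on one side of any index whose difference is nonzero. The sketch below illustrates this with an iterative variant of the same idea. The name `find_fixed_point_iterative` is hypothetical and is not part of the submitted answer.

```python
def find_fixed_point_iterative(a):
    """Binary search for an index i with a[i] == i in sorted, distinct integers."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        if a[mid] == mid:
            return mid
        if a[mid] < mid:
            lo = mid + 1   # everything left of mid is also below its index
        else:
            hi = mid - 1   # everything right of mid is also above its index
    return -1

a = [-10, -5, -2, 2, 3, 5, 7, 10, 15, 25, 35, 78, 129]
diffs = [x - i for i, x in enumerate(a)]
print(diffs == sorted(diffs))         # True: a[i] - i is non-decreasing
print(find_fixed_point_iterative(a))  # 5
```

The loop halves the range `lo..hi` on every iteration, which matches the $\log n$ worst case discussed above.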

## Question 3(c): Finding a Fixed Point Again. 

Given a _sorted_ array $A$ of _distinct_ __natural numbers__, return the index of the fixed point if one exists, or otherwise, return `-1` to signal that no fixed point exists. Your algorithm must be as efficient as possible.

In [232]:
#Answer 3(c)
def find_fixed_point_natural(a):
    # By nature of natural numbers, if the first
    # element is not zero, none of the others
    # will match because the values are sorted, distinct whole numbers.
    # An empty array has no fixed point either.
    if (len(a) > 0 and a[0] == 0) :
        return 0
    else :
        return -1
        

## Question 3(d)

Because the array contains only distinct natural numbers in sorted order, each value is at least one larger than the previous one. If a[0] is not 0, then a[0] is already larger than its index, and every later element stays larger than its index too, so no fixed point can exist. Once I realized this, the algorithm was easy to write, apart from the time I spent debugging a version that returned False instead of -1.

The running time is $\Theta(1)$ because the algorithm performs a single comparison, with no loops or recursion.  This is both the best-case and the worst-case complexity.
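
The argument above can be written as a short chain of inequalities. This is a sketch that assumes, as the answer does, that the natural numbers include 0 and that the array is sorted in increasing order with distinct entries:

$$ a[i] \ge a[i-1] + 1 \quad\Rightarrow\quad a[i] \ge a[0] + i $$
$$ a[0] \ge 1 \quad\Rightarrow\quad a[i] \ge i + 1 > i \text{ for every index } i $$

So a fixed point can exist only when $a[0] = 0$, in which case index 0 is itself a fixed point.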

## Testing your solutions -- Do not edit code beyond this point

In [233]:
# This code runs 5 test cases on your two algorithms
def test_count_dominances(func):
    a1 = [ 5, 7, 10]
    b1 = [ 1, 2,  3] 
    n1 = 9

    a2 = [ 6, 10, 15, 21]
    b2 = [ 4, 19, 25, 32]
    n2 = 5
    
    
    a3 = [ 6, 10, 15, 21]
    b3 = []
    n3 = 0
    
    a4 = [ 1, 3, 5, 7, 9, 11, 13]
    b4 = [ 2,  4, 6, 8, 10]
    n4 = 20
    
    a5 = [1, 3, 5, 6, 7, 9, 11, 13]
    b5 = [2, 4, 6, 6, 6, 8, 10]
    n5 = 30
    
    problems = [(a1, b1, n1), (a2, b2, n2), (a3, b3, n3), (a4, b4, n4), (a5, b5, n5)]
    num_passed = 0
    for (a, b, n) in problems:
        res = func(a, b)
        if res == n:
            num_passed = num_passed + 1
        else: 
            print('FAILED: a = ', a, 'b = ', b, ' expected = ', n, 'your code = ', res)
    print('--- Done ---')
    print ('Num tests = ', len(problems))
    print ('Num passed = ', num_passed)

In [234]:
print('Testing brute force:')
test_count_dominances(count_dominances_brute_force)

Testing brute force:
--- Done ---
Num tests =  5
Num passed =  5


In [235]:
print('Testing modified merge algorithm:')
test_count_dominances(count_dominances)

Testing modified merge algorithm:
FAILED: a =  [5, 7, 10] b =  [1, 2, 3]  expected =  9 your code =  12
FAILED: a =  [6, 10, 15, 21] b =  [4, 19, 25, 32]  expected =  5 your code =  3
FAILED: a =  [1, 3, 5, 6, 7, 9, 11, 13] b =  [2, 4, 6, 6, 6, 8, 10]  expected =  30 your code =  35
--- Done ---
Num tests =  5
Num passed =  2


In [236]:
from random import sample
def compare_brute_force_vs_fast():
    a = sorted( sample (range(60), 20) )
    b = sorted( sample (range(60), 20) )
    n1 = count_dominances_brute_force(a, b)
    n2 = count_dominances(a, b)
    if n1 != n2:
        print('Disparity observed between two algorithms:', a, b, n1, n2)
        return False
    return True
    
print('Comparing the two implementations.')
num_passed = 0
total = 100
for i in range(total):
    if compare_brute_force_vs_fast():
        num_passed = num_passed + 1
print(' -- all tests done -- ')
print(' passed = ', num_passed, ' out of ', total)

Comparing the two implementations.
Disparity observed between two algorithms: [2, 3, 4, 5, 8, 9, 11, 12, 14, 19, 22, 26, 30, 32, 38, 42, 43, 53, 57, 58] [0, 1, 4, 5, 8, 10, 12, 13, 14, 15, 21, 22, 23, 24, 26, 27, 36, 43, 50, 51] 216 250
Disparity observed between two algorithms: [1, 3, 9, 12, 13, 15, 18, 19, 21, 25, 28, 29, 34, 36, 40, 43, 44, 50, 56, 57] [4, 5, 7, 10, 11, 13, 15, 16, 19, 22, 24, 26, 27, 33, 36, 38, 42, 45, 47, 53] 218 230
Disparity observed between two algorithms: [6, 7, 9, 12, 13, 14, 15, 19, 22, 25, 29, 34, 38, 40, 46, 48, 51, 52, 55, 58] [2, 7, 8, 9, 11, 14, 16, 21, 25, 26, 30, 31, 32, 34, 39, 46, 50, 51, 55, 56] 205 210
Disparity observed between two algorithms: [2, 12, 13, 15, 17, 21, 22, 25, 27, 28, 31, 33, 40, 42, 43, 44, 49, 52, 58, 59] [1, 5, 6, 11, 15, 20, 23, 24, 26, 29, 32, 36, 42, 49, 51, 53, 54, 56, 58, 59] 192 190
Disparity observed between two algorithms: [0, 2, 6, 8, 9, 10, 11, 12, 13, 14, 18, 24, 27, 38, 43, 48, 51, 52, 55, 59] [4, 7, 16, 17, 19, 20,

In [237]:
print(find_fixed_point([-10, -5, -2, 2, 3, 5, 7, 10, 15, 25, 35, 78, 129]))

5


In [238]:
def find_fixed_point_very_naive(a):
    n = len(a)
    for i in range(0, n):
        if a[i] == i:
            return i
    return -1

In [239]:
def test_find_fixed_point_code(n_tests, test_size):
    n_passed = 0
    for i in range(0, n_tests):
        a = sorted( sample( range(-10 * n_tests,  10 * n_tests ), test_size))
        j = find_fixed_point(a)
        if j >= 0 and a[j] != j:
            print(' Code failed for input: ', a, 'returned : ', j, 'expected:', find_fixed_point_very_naive(a))
        elif j < 0: 
            assert j == -1, 'Your code returns an illegal negative number: have you implemented it yet?'
            k = find_fixed_point_very_naive(a)
            if k >= 0:
                print('Code failed for input', a)
                print('Your code failed to find a fixed point')
                print('However, for j = ', k, 'a[j] =', a[k])
            else: 
                n_passed = n_passed + 1
        else: 
            n_passed = n_passed + 1
    return n_passed

n_tests = 10000
n_passed = test_find_fixed_point_code(10000, 10)
print(' num tests  = ', n_tests)
print(' num passed = ', n_passed)

 num tests  =  10000
 num passed =  10000


In [240]:
print('Test: expected answer = 5, your answer = ', find_fixed_point([-10, -5, -2, 2, 3, 5, 7, 10, 15, 25, 35, 78, 129])) 

Test: expected answer = 5, your answer =  5


In [241]:
def test_find_fixed_point_natural_code(n_tests, test_size):
    n_passed = 0
    for i in range(0, n_tests):
        a = sorted( sample( range(0,  10 * n_tests ), test_size))
        j = find_fixed_point_natural(a)
        if j >= 0 and a[j] != j:
            print(' Code failed for input: ', a, 'returned : ', j, 'expected:', find_fixed_point_very_naive(a))
        elif j < 0: 
            assert j == -1, 'Your code returns an illegal negative number: have you implemented it yet?'
            k = find_fixed_point_very_naive(a)
            if k >= 0:
                print('Code failed for input', a)
                print('Your code failed to find a fixed point')
                print('However, for j = ', k, 'a[j] =', a[k])
            else: 
                n_passed = n_passed + 1
        else: 
            n_passed = n_passed + 1
    return n_passed

n_tests = 10000
n_passed = test_find_fixed_point_natural_code(10000, 10)
print(' num tests  = ', n_tests)
print(' num passed = ', n_passed)

 num tests  =  10000
 num passed =  10000


In [242]:
print('Test: expected answer = 0, your answer = ', find_fixed_point_natural([0,1, 2, 3, 5, 7, 10, 15, 25, 35, 78, 129])) 

Test: expected answer = 0, your answer =  0
