# Exercise 1.

Although merge sort has a better asymptotic complexity than selection sort ($O(n \log n)$ versus $O(n^2)$), selection sort can be faster for small inputs.

Rewrite `merge_sort(A, min_size)` such that sub-arrays smaller than an input parameter `min_size` are sorted with our `selection_sort` from the lecture `algorithms intro`.

Time the difference between pure merge sort and this new algorithm. Is it faster? Why or why not?

In [None]:
'''
Exercise 1

INTRO:
To make it easier for me,
this exercise has 4 parts.

PART I.
Contains the functions
merge() and merge_sort()

PART II.
Contains the functions
linear_search() and selection_sort()

PART III.
Contains the new function
merge_sort_min_size(A, min_size)

PART IV.
Contains a brief analysis
of speed

'''

In [1]:
'''
PART I.

The merge() function
merges sorted left and right arrays.

Both arrays must already be sorted,
which merge_sort() guarantees.

These 2 functions will be used below
for sub-arrays in A equal to or larger than the min_size parameter
of the new function
merge_sort_min_size(A, min_size)

I timed the use of merge_sort with
A = [33, 1, 55, 2343, -232, 344, 2, 53, -4, 923]
The output was:
17.7 µs ± 2.38 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

'''

def merge(left, right):
    # make sure we don't modify input data
    # good code practice to copy
    left = left.copy()
    right = right.copy()
    res = []
    
    #zip thru R and L parts
    while len(left) > 0 and len(right) > 0:
        if left[0] < right[0]:
            res.append(left[0])
            left.pop(0)
        else:
            res.append(right[0])
            right.pop(0)
    
    #cleanup step: add remaining elements (if any)
    for e in left:
        res.append(e)
    for e in right:
        res.append(e)
    return res
            
        
#print(merge([1, 2, 4, 9], [3, 4, 5, 6, 88])) #output should be [1, 2, 3, 4, 5, 6, 9, 88]


'''
This function sorts an array
by splitting it into 2 halves (left, right),
sorting each half recursively,
then merging them with the function merge(),
see above
'''

def merge_sort(A): 
    n = len(A)
    if n > 1:
        m = n // 2
        left = merge_sort(A[:m])
        right = merge_sort(A[m:])
        return merge(left, right) #the merge function is above
    else:
        return A

A = [33, 1, 55, 2343, -232, 344, 2, 53, -4, 923]
%timeit merge_sort(A)

17.7 µs ± 2.38 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [14]:
'''
PART II.

The linear_search() function
finds the index of the minimum element.

The selection_sort() function
sorts an array.

These 2 functions will be used below
for sub-arrays in A smaller than the min_size parameter
of the new function 
merge_sort_min_size(A, min_size)

I timed the use of selection_sort with
arr = [33, 1, 55, 2343, -232, 344, 2, 53, -4, 923]
The output was:
12.3 µs ± 830 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
'''

def linear_search(arr):
    # initialize current best to +infinity
    # So any element beats it
    current_min = float('inf')
    current_min_idx = 0
    
    for i in range(len(arr)):
        if arr[i] < current_min:
            current_min = arr[i]
            current_min_idx = i
    return current_min_idx


def selection_sort(arr):
    n_sorted = 0
    
    while n_sorted < len(arr):
        # Get the index of the min of the remaining elements.
        # linear_search returns an index relative to the slice,
        # so we correct the result with `+ n_sorted`
        min_idx = linear_search(arr[n_sorted:]) + n_sorted

        # Swap minimum element with leftmost remaining element
        to_swap = arr[n_sorted]
        arr[n_sorted] = arr[min_idx]
        arr[min_idx] = to_swap

        # Increment and restart
        n_sorted += 1

arr = [33, 1, 55, 2343, -232, 344, 2, 53, -4, 923]
selection_sort(arr)
arr
%timeit selection_sort([33, 1, 55, 2343, -232, 344, 2, 53, -4, 923])

12.3 µs ± 830 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [3]:
'''
PART III.
The new function
merge_sort_min_size(A, min_size)


I timed the use of merge_sort_min_size(A, min_size) with
arr = [33, 1, 55, 2343, -232, 344, 2, 53, -4, 923]
min_size = 4

The output was:
16.6 µs ± 929 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

'''

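def merge_sort_min_size(A, min_size):
    n = len(A)

    # base case: nothing to sort
    if n <= 1:
        return A

    # small sub-array: selection_sort works in place and returns None,
    # so sort a copy and return that copy
    if n < min_size:
        A = A.copy()
        selection_sort(A)
        return A

    # otherwise split and recurse with the hybrid function itself,
    # so that small sub-arrays actually reach selection_sort
    m = n // 2
    left = merge_sort_min_size(A[:m], min_size)
    right = merge_sort_min_size(A[m:], min_size)
    return merge(left, right) #the merge function is above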
    
#merge_sort_min_size([33, 1, 55, 2343, -232, 344, 2, 53, -4, 923], 4)
%timeit merge_sort_min_size([33, 1, 55, 2343, -232, 344, 2, 53, -4, 923], 4)

16.6 µs ± 929 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [4]:
'''
PART IV.

For the input:
A = [33, 1, 55, 2343, -232, 344, 2, 53, -4, 923]

The outputs were:
12.3 µs ± 830 ns per loop for the selection_sort() function

17.7 µs ± 2.38 µs per loop for the merge_sort() function

16.6 µs ± 929 ns per loop for the hybrid merge_sort_min_size() function

(each: mean ± std. dev. of 7 runs, 100000 loops each)

For an array of length 10,
selection_sort() was clearly faster
than merge_sort(). This supports the idea
that a simple O(n**2) algorithm with little overhead
can beat a divide and conquer algorithm
on small arrays: merge_sort() pays for
recursive calls, slicing and list copies
that only pay off as n grows.

The hybrid merge_sort_min_size() measured
slightly faster than merge_sort(),
but the 1.1 µs gap is smaller than the
±2.38 µs spread of the merge_sort() timing,
so this result alone is not conclusive.

Food for thought:
with a better computer, it would be
interesting to run these algorithms
with much larger arrays, and with several
values of min_size, to test at what point
the hybrid merge_sort_min_size()
is surpassed in efficiency by merge_sort()

'''
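
The "food for thought" above can be explored with a small benchmark. The following is only a sketch under stated assumptions: `selection_sort_copy`, `merge_sorted`, `hybrid_sort` and `benchmark` are hypothetical helper names (not the lecture's functions), the merge uses index pointers instead of `pop(0)`, and the array sizes and `min_size` values are arbitrary choices. Passing `min_size=1` makes the hybrid behave as pure merge sort, which gives a baseline.

```python
import random
import timeit


def selection_sort_copy(arr):
    """Return a sorted copy of arr using selection sort."""
    out = list(arr)
    for start in range(len(out)):
        # index of the smallest remaining element
        min_idx = min(range(start, len(out)), key=out.__getitem__)
        out[start], out[min_idx] = out[min_idx], out[start]
    return out


def merge_sorted(left, right):
    """Merge two sorted lists using index pointers (no pop(0))."""
    res, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            res.append(left[i])
            i += 1
        else:
            res.append(right[j])
            j += 1
    res.extend(left[i:])
    res.extend(right[j:])
    return res


def hybrid_sort(arr, min_size):
    """Merge sort that hands sub-arrays shorter than min_size to selection sort."""
    n = len(arr)
    if n <= 1:
        return list(arr)
    if n < min_size:
        return selection_sort_copy(arr)
    m = n // 2
    return merge_sorted(hybrid_sort(arr[:m], min_size),
                        hybrid_sort(arr[m:], min_size))


def benchmark(sizes=(10, 100, 1000), min_sizes=(1, 4, 16), repeats=20):
    """Return {(n, min_size): best seconds per call}; min_size=1 is pure merge sort."""
    rng = random.Random(0)  # fixed seed so every run sorts the same data
    results = {}
    for n in sizes:
        data = [rng.randint(-10_000, 10_000) for _ in range(n)]
        for k in min_sizes:
            timer = timeit.Timer(lambda: hybrid_sort(data, k))
            results[(n, k)] = min(timer.repeat(repeat=3, number=repeats)) / repeats
    return results


if __name__ == "__main__":
    for (n, k), seconds in benchmark().items():
        print(f"n={n:5d}  min_size={k:3d}  {seconds * 1e6:10.1f} µs")
```

The timings depend on the machine and on the Python version, so the table this prints should be read as a comparison between rows rather than as absolute numbers.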


# Exercise 2. 

Let $A[1...n]$ be an array of $n$ distinct numbers. If $i < j$ and $A[i] > A[j]$, then the pair $(i, j)$ is called an **inversion** of $A$. 

In other words, an inversion is a pair of elements that are out of order relative to each other.

**1)** List the five inversions of $[2, 3, 8, 6, 1]$ 

**2)** Give an algorithm that determines the number of inversions in any permutation on $n$ elements in $O(n \log_2 n)$ worst-case time. (Hint: Modify merge sort.)

In [None]:
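'''Exercise 2.1
The five inversions of [2, 3, 8, 6, 1],
as 1-based index pairs (i, j) with the values (A[i], A[j]):
(3, 4) -> (8, 6)
(3, 5) -> (8, 1)
(4, 5) -> (6, 1)
(2, 5) -> (3, 1)
(1, 5) -> (2, 1)

'''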
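
The hand-written answer can be double-checked by brute force. This is a minimal sketch, assuming a hypothetical helper `list_inversions` that reports inversions as 1-based index pairs, matching the definition in the exercise statement; it checks every pair, so it runs in O(n**2) and is only meant for verification.

```python
def list_inversions(arr):
    """Return all inversions of arr as 1-based (i, j) index pairs."""
    n = len(arr)
    return [(i + 1, j + 1)
            for i in range(n)
            for j in range(i + 1, n)
            if arr[i] > arr[j]]


A = [2, 3, 8, 6, 1]
for i, j in list_inversions(A):
    # print the index pair next to the pair of values it refers to
    print((i, j), "->", (A[i - 1], A[j - 1]))
```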

In [None]:
'''Exercise 2.2
An algorithm that determines
the number of inversions in any permutation
on n elements in O(n log2(n)) worst-case time.
Hint: Modify merge_sort.
'''



In [27]:
'''
    See further below for my answer, this part is for my info only
    
    This function does the job,
    but it is O(n**2)
    
    It works by incrementing inv_count
    every time the value at the left index, arr[i],
    is bigger than the value at the right index, arr[j]
    
'''
 
def merge_sort_count1(arr):
    
    n = len(arr)
    inv_count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (arr[i] > arr[j]):
                inv_count += 1
 
    return inv_count
 
merge_sort_count1([2,3,8,6,1])

5

In [None]:
'''
Resources: 

https://www.khanacademy.org/computing/computer-science/algorithms/merge-sort/a/overview-of-merge-sort
explained using pseudocode + java


https://www.youtube.com/watch?v=owZhw-A0yWE
video explaining interaction of merge_sort and merge

https://medium.com/@ssbothwell/counting-inversions-with-merge-sort-4d9910dc95f0
python code: the use of ai, bi, to be able to use in inversions count formula

'''

In [26]:
def merge_sort_count(arr):
    n = len(arr)
    
    # base case
    if n <= 1:
        return arr, 0
    
    m = n // 2
    left = arr[:m]
    right = arr[m:]

    # ai counts the inversions inside the left half
    left, ai = merge_sort_count(left)
    # bi counts the inversions inside the right half
    right, bi = merge_sort_count(right)
    sorted_array = []

    # set both index pointers to zero
    i = 0
    j = 0

    # start from the inversions already found inside each half;
    # inversions across the two halves are added during the merge
    inversions = ai + bi

        
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            sorted_array.append(left[i])
            i += 1
        else:
            sorted_array.append(right[j])
            j += 1
            inversions += (len(left)-i)
    
    sorted_array += left[i:]
    sorted_array += right[j:]
    
    # returns the sorted array, final inversions count
    return sorted_array, inversions

merge_sort_count([2,3,8,6,1])

([1, 2, 3, 6, 8], 5)

# 3. Recursive sum

Write a function that uses recursion to compute the sum of an array or list of numbers

```
recursive_sum([2, 4, 5, 6, 7])

output: 24
```

In [5]:
'''
A recursive function
to sum an array or list of numbers

'''
def recursive_sum(list_num):
    # base case: an empty list sums to 0
    if len(list_num) == 0:
        return 0
    
    else:
        return (list_num[0] + recursive_sum(list_num[1:]))
        

recursive_sum([2, 4, 5, 6, 7]) # output should be 24

24

# 4. Recursive denominators

Write a Python program that uses recursion to find the greatest common divisor (gcd) of two integers.

```
recursive_gcd(12,14)

output : 2
```

In [4]:
'''
A recursive function to find the gcd
of two integers

'''

def recursive_gcd(x, y):
    # base case
    if y == 0:
        return x
    else:
        return recursive_gcd(y, x%y)
    
recursive_gcd(12, 14) # output should be 2

2
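
The recursion relies on the identity gcd(x, y) = gcd(y, x mod y), with gcd(x, 0) = x as the stopping point. As an illustration only, the sketch below records each (x, y) pair that the recursion would visit; `gcd_trace` is a hypothetical helper written iteratively, and the final value is compared against the standard library's `math.gcd`.

```python
import math


def gcd_trace(x, y):
    """Return the list of (x, y) pairs visited by Euclid's algorithm."""
    steps = [(x, y)]
    while y != 0:
        # same step as recursive_gcd(y, x % y)
        x, y = y, x % y
        steps.append((x, y))
    return steps


steps = gcd_trace(12, 14)
print(steps)  # [(12, 14), (14, 12), (12, 2), (2, 0)]
assert steps[-1][0] == math.gcd(12, 14)
```

Note that the first step simply swaps the arguments when x < y, which is why the function works regardless of the order of its inputs.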

# 5. Recursive power function

Write a recursive function to calculate the value of 'a' to the power 'b'. 

```
recursive_pow(3, 4)

output: 81
```

In [2]:
'''
A recursive function to calculate
"a" to the power of a non-negative integer "b"
'''

def recursive_pow(a, b):
    # base case: anything to the power 0 is 1
    if b == 0:
        return 1
    
    else:
        return a * recursive_pow(a, b-1)

recursive_pow(3, 4) # output should be 81

81

# 6. (Stretch) K-Nearest Neighbours

Consider a matrix with the following format:

```
[[0.3, 0.8],
 [-0.2, 0.5],
 [1, -1],
 [0.9, 0.5]
]
```

Each row denotes a point, and the numbers in each row are the coordinates. The coordinates in this example are in 2D, but the matrix could be in 3D (3 numbers per row) or even higher dimensions.

Your task is to write a function `knn(m, p, k)` (or `k_nearest_neighbors(m, p, k)`) which takes in a matrix of points `m`, an integer `p` denoting the index of a point in that matrix, and an integer `k` denoting the number of nearest neighbors to return.

The function returns the indices of the `k` nearest neighbors of the point `p` in the matrix `m`.

```
dataset = [[2.7810836,2.550537003,0],
	[1.465489372,2.362125076,0],
	[3.396561688,4.400293529,0],
	[1.38807019,1.850220317,0],
	[3.06407232,3.005305973,0],
	[7.627531214,2.759262235,1],
	[5.332441248,2.088626775,1],
	[6.922596716,1.77106367,1],
	[8.675418651,-0.242068655,1],
	[7.673756466,3.508563011,1]]

knn(dataset, 0, 2)

output : [4, 1]
```

You can use `from sklearn.neighbors import NearestNeighbors` to test your function
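
One possible brute-force approach is sketched below; it is not the only valid solution. It assumes Euclidean distance over every column of a row (so the third column of the example dataset is treated as a coordinate, as the task statement describes), excludes the point `p` itself, and uses `math.dist`, which requires Python 3.8 or later.

```python
import math


def knn(m, p, k):
    """Return the indices of the k points in m closest to the point m[p]."""
    target = m[p]
    # distance from the target to every other point, paired with its index
    distances = [(math.dist(target, row), idx)
                 for idx, row in enumerate(m) if idx != p]
    # sort by distance; ties fall back to the smaller index
    distances.sort()
    return [idx for _, idx in distances[:k]]


dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]

print(knn(dataset, 0, 2))  # [4, 1]
```

Sorting all distances costs O(n log n) per query, which is fine for small matrices; for large datasets a partial selection or a spatial index would be the usual next step.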