# Exercise 1.

Although merge sort has better asymptotic complexity than selection sort ($O(n \log n)$ versus $O(n^2)$), selection sort can be faster for small inputs.

Rewrite `merge_sort(A, min_size)` such that sub-arrays smaller than an input parameter `min_size` are sorted with our `selection_sort` from the lecture `algorithms intro`.

Time the difference between pure merge sort and this new algorithm. Is it faster? Why or why not?

In [1]:
def merge_sort(A, min_size): 
    # Sub-arrays shorter than min_size are handed to selection sort
    if len(A) < min_size:
        return selection_sort(A)
    else:
        size = len(A)
        if size > 1:
            m = size // 2
            # A[:m] is the left half, A[m:] is the right half
            left = merge_sort(A[:m], min_size) 
            right = merge_sort(A[m:], min_size)
            return merge(left, right)
        else:
            return A

def pure_merge_sort(A): 
    size = len(A)
    if size > 1:
        m = size // 2
        # A[:m] is the left half, A[m:] is the right half
        left = pure_merge_sort(A[:m]) 
        right = pure_merge_sort(A[m:])
        return merge(left, right)
    else:
        return A

def merge(left, right):
    res = []
    # Merge step: repeatedly move the smaller front element of left/right into res
    while len(left)>0 and len(right)>0: 
        if left[0]<right[0]: 
            res.append(left[0]) 
            left.pop(0)
        else: 
            res.append(right[0]) 
            right.pop(0)
    # Copy in remaining elements of left and right
    # (if there are any)
    for i in left: 
        res.append(i) 
    for i in right: 
        res.append(i)
    return res

# selection sort with linear search
def linear_search(a):
    min_ = a[0]
    min_index = 0
    for i in range(len(a)):
        if a[i] < min_:
            min_ = a[i]
            min_index = i
    return min_index

def selection_sort(arr):
    a = arr.copy()
    for i in range(len(a)):
        j = linear_search(a[i:])+i
        ai = a[i]
        a[i] = a[j]
        a[j] = ai
    return a

In [15]:
import random

A = random.sample(range(-1000, 1000), 1000) 

%timeit merge_sort(A,30)
%timeit pure_merge_sort(A)

# Depending on the value of min_size we can get different results.
# Selection sort is always O(n^2), but on small sub-arrays (a few dozen values)
# its simple loops have less overhead than merge sort's recursive calls,
# slicing and list allocation, so the hybrid version can come out ahead.

7.46 ms ± 209 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.83 ms ± 71.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
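
Since the result depends on `min_size`, one way to explore the question is to sweep several thresholds with the standard `timeit` module. The sketch below is self-contained and uses compact stand-ins for the cells above: `hybrid_merge_sort`, the index-based `merge`, and the chosen thresholds are assumptions for illustration, not the lecture's code. Absolute timings will differ by machine, so none are quoted here.

```python
import random
import timeit

def selection_sort(a):
    """Return a sorted copy of a using selection sort."""
    a = list(a)
    for i in range(len(a)):
        # Index of the smallest element in a[i:]
        j = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[j] = a[j], a[i]
    return a

def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    res, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            res.append(left[i])
            i += 1
        else:
            res.append(right[j])
            j += 1
    res.extend(left[i:])
    res.extend(right[j:])
    return res

def hybrid_merge_sort(a, min_size):
    """Merge sort that switches to selection sort below min_size."""
    if len(a) < min_size:
        return selection_sort(a)
    if len(a) <= 1:
        return a
    m = len(a) // 2
    return merge(hybrid_merge_sort(a[:m], min_size),
                 hybrid_merge_sort(a[m:], min_size))

random.seed(0)
data = random.sample(range(-1000, 1000), 1000)

timings = {}
for min_size in (1, 4, 8, 16, 32, 64, 128):
    # min_size=1 never triggers selection sort on non-empty input,
    # so it behaves like pure merge sort
    assert hybrid_merge_sort(data, min_size) == sorted(data)
    runs = timeit.repeat(lambda: hybrid_merge_sort(data, min_size),
                         number=10, repeat=3)
    timings[min_size] = min(runs) / 10  # best time per call, in seconds

for min_size, seconds in timings.items():
    print(f"min_size={min_size:>3}: {seconds * 1e3:.2f} ms")
```

Taking the minimum over repeats reduces noise from other processes; the threshold with the lowest time is the empirical sweet spot for that machine and input size.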


# Exercise 2. 

Let $A[1...n]$ be an array of $n$ distinct numbers. If $i < j$ and $A[i] > A[j]$, then the pair $(i, j)$ is called an **inversion** of $A$. 

In other words an inversion is a pair of unsorted elements in an array.

**1)** List the five inversions of $[2, 3, 8, 6, 1]$ 

**2)** Give an algorithm that determines the number of inversions in any permutation on $n$ elements in $O(n \log_2 n)$ worst-case time. (Hint: Modify merge sort.)

In [3]:
# inversions (0-based indices):
# 1. (0,4) because 2 > 1
# 2. (1,4) because 3 > 1
# 3. (2,4) because 8 > 1
# 4. (3,4) because 6 > 1
# 5. (2,3) because 8 > 6
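
The list above can be double-checked by brute force, directly from the definition. This is a minimal sketch: `list_inversions` is a hypothetical helper name, it uses 0-based indices, and it runs in $O(n^2)$ time, so it is only a reference for testing and not the $O(n \log_2 n)$ algorithm asked for in part 2.

```python
from itertools import combinations

def list_inversions(A):
    """Return all pairs (i, j), 0-based, with i < j and A[i] > A[j]."""
    return [(i, j) for i, j in combinations(range(len(A)), 2) if A[i] > A[j]]

print(list_inversions([2, 3, 8, 6, 1]))
# [(0, 4), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The length of the returned list gives the inversion count, which makes it a convenient oracle for any faster implementation.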

In [4]:
def merge_sort_inv(A): 
    # Returns (sorted copy of A, number of inversions in A)
    size = len(A)
    if size > 1:
        m = size // 2
        left, inv_left = merge_sort_inv(A[:m])
        right, inv_right = merge_sort_inv(A[m:])
        res, inv_split = merge_inv(left, right)
        # Inversions inside each half plus inversions across the two halves
        return res, inv_left + inv_right + inv_split
    else:
        return A, 0

def merge_inv(left, right):
    # Merges two sorted lists and counts the inversions between them
    res = []
    inv = 0
    i = j = 0
    while i < len(left) and j < len(right): 
        if left[i] <= right[j]:
            res.append(left[i]) 
            i += 1
        else:    
            # right[j] is smaller than every remaining element of left,
            # so it forms one inversion with each of them
            res.append(right[j]) 
            j += 1
            inv += len(left) - i
    # Copy in remaining elements of left and right
    # (if there are any); they add no further inversions
    res.extend(left[i:])
    res.extend(right[j:])
    return res, inv

In [5]:
merge_sort_inv([2, 3, 8, 6, 1])[1]
#merge_sort_inv([2, 3, 8, 6, 1, 9])[1]


5

# 3. Recursive sum

Write a function that uses recursion to compute the sum of an array or list of numbers

```
recursive_sum([2, 4, 5, 6, 7])

output: 24
```

In [6]:
def recursive_sum(n):
    # Base case: an empty list sums to 0 (so [] is a valid input too)
    if len(n) == 0:
        return 0
    else:
        return n[0] + recursive_sum(n[1:])

In [7]:
recursive_sum([2, 4, 5, 6, 7])

24

# 4. Recursive greatest common divisor

Write a Python program that uses recursion to find the greatest common divisor (gcd) of two integers.

```
recursive_gcd(12,14)

output : 2
```

In [8]:
def recursive_gcd(n1,n2):
    if n2 == 0:
        return n1
    else:
        return recursive_gcd(n2, n1 % n2)

In [9]:
recursive_gcd(12,14)

2
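
To see why the recursion terminates with the right answer, it can help to trace the argument pairs that Euclid's algorithm visits. The sketch below is an iterative mirror of the recursive function, written only for illustration; `gcd_steps` is a hypothetical helper name. Each step replaces `(a, b)` with `(b, a % b)` until the second value reaches 0.

```python
import math

def gcd_steps(a, b):
    """Return the list of (a, b) pairs visited by Euclid's algorithm."""
    steps = [(a, b)]
    while b != 0:
        a, b = b, a % b
        steps.append((a, b))
    return steps

steps = gcd_steps(12, 14)
print(steps)         # [(12, 14), (14, 12), (12, 2), (2, 0)]
print(steps[-1][0])  # 2, the same value math.gcd(12, 14) returns
```

Note that the first step simply swaps the arguments when the first one is smaller, which is why the function does not need to order its inputs.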

# 5. Recursive power function

Write a recursive function to calculate the value of 'a' to the power 'b'. 

```
recursive_pow(3, 4)

output: 81
```

In [10]:
def recursive_pow(n,p):
    if p == 0:
        return 1
    elif p < 0:
        return 1 / (n * recursive_pow(n, -p-1))
    else:
        return n * recursive_pow(n, p-1)

In [11]:
print(recursive_pow(3,4))  # 81
print(recursive_pow(3,-4)) # 0.01234

81
0.012345679012345678
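
The solution above makes one recursive call per unit of the exponent, so large exponents can hit Python's recursion limit. A common alternative, shown here as a sketch and not as part of the original exercise, is exponentiation by squaring, which needs only about $\log_2 |p|$ levels of recursion. The name `fast_pow` is an assumption for illustration, and `p` is assumed to be an integer.

```python
def fast_pow(n, p):
    """Compute n**p for an integer p by repeated squaring."""
    if p < 0:
        # Negative exponent: invert the positive power
        return 1 / fast_pow(n, -p)
    if p == 0:
        return 1
    # n**p = (n**(p // 2))**2, times one extra n when p is odd
    half = fast_pow(n, p // 2)
    return half * half if p % 2 == 0 else half * half * n

print(fast_pow(3, 4))   # 81
print(fast_pow(2, 10))  # 1024
```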


# 6. (Stretch) K-Nearest Neighbours

Consider a matrix with the following format:

```
[[0.3, 0.8],
 [-0.2, 0.5],
 [1, -1],
 [0.9, 0.5]
]
```

Each row denotes a point, and the numbers in each row are its coordinates. The coordinates in this example are 2D, but the points could be 3D (3 numbers per row) or higher-dimensional.

Your task is to write a function `knn(m, p, k)` (or `k_nearest_neighbors(m, p, k)`) which takes a matrix of points `m`, an integer `p` denoting the index of a point in that matrix, and an integer `k` denoting the number of nearest neighbors to return.

The function returns the indices of the `k` nearest neighbors of the point `p` in the matrix `m`, ordered from nearest to farthest.

```
dataset = [[2.7810836,2.550537003,0],
	[1.465489372,2.362125076,0],
	[3.396561688,4.400293529,0],
	[1.38807019,1.850220317,0],
	[3.06407232,3.005305973,0],
	[7.627531214,2.759262235,1],
	[5.332441248,2.088626775,1],
	[6.922596716,1.77106367,1],
	[8.675418651,-0.242068655,1],
	[7.673756466,3.508563011,1]]

knn(dataset, 0, 2)

output : [4, 1]
```

You can use `from sklearn.neighbors import NearestNeighbors` to test your function

In [12]:
import numpy as np

dataset = [[2.7810836,2.550537003,0],
	[1.465489372,2.362125076,0],
	[3.396561688,4.400293529,0],
	[1.38807019,1.850220317,0],
	[3.06407232,3.005305973,0],
	[7.627531214,2.759262235,1],
	[5.332441248,2.088626775,1],
	[6.922596716,1.77106367,1],
	[8.675418651,-0.242068655,1],
	[7.673756466,3.508563011,1]]

In [13]:
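def knn(m, p, k):
    m = np.array(m)
    point = m[p]
    # Euclidean distance from the query point to every row of m
    ed = np.linalg.norm(m - point, axis=1)
    # equivalent:
    #ed = np.sqrt( ((m - point) ** 2).sum(axis=1) )
    # Sort by distance and skip position 0, which is the query point itself
    # (distance 0), assuming no other row is identical to it
    return np.argsort(ed)[1:k+1]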

In [14]:
knn(dataset, 0, 2)

array([4, 1], dtype=int64)
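
The scikit-learn check suggested in the exercise could look like the sketch below. It assumes scikit-learn is installed and uses `NearestNeighbors` with its default Euclidean metric; `knn_sklearn` is a hypothetical wrapper name introduced here for testing. As in the solution above, all three columns (including the label column) are treated as coordinates.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]

def knn_sklearn(m, p, k):
    """Reference k-NN via scikit-learn, for comparison only."""
    X = np.asarray(m, dtype=float)
    # Ask for k + 1 neighbours: the query point is part of X and is
    # returned as its own nearest neighbour (distance 0)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, indices = nn.kneighbors(X[p:p + 1])
    return indices[0][1:]

neighbors = knn_sklearn(dataset, 0, 2)
print(neighbors)
```

The printed indices should agree with the result of `knn(dataset, 0, 2)`; comparing the two for several values of `p` and `k` gives a quick regression test.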