# Divide and Conquer algorithms
In this notebook we will code and analyse the following three divide and conquer algorithms.

1. Counting the number of inversions
2. Matrix multiplication using Strassen's algorithm
3. Computing the closest pair of points

## Counting Inversions
The number of inversions measures how similar (or dissimilar) two rankings of the same items are. This measure is used in recommender systems, where the system recommends something (say, a movie) to a consumer based on how closely their ratings match those of other people.

Collaborative filtering is a technique that looks for similarities between people's preferences. Suppose we want to compare the movie rankings of two people: we sort the movies by the first person's preference and then write down the rank the second person gave to each of those movies.

For example, if you ranked three movies A, B and C in that order and your friend ranked them B, A, C, then the ranking list between you and your friend becomes 2 1 3: A is your favorite movie but your friend's second favorite, and B is your second favorite but your friend's favorite. You both agree on the third-best movie.
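As a minimal sketch of this construction, the array can be built by looking up each of your movies in your friend's list. The helper name `relative_ranking` and the two movie lists are illustrative assumptions, not part of the original notebook.

```python
def relative_ranking(my_ranking, friend_ranking):
    """Return the friend's rank (1-based) for each movie, in my order of preference."""
    # Map each movie to its position in the friend's list.
    friend_rank = {movie: pos for pos, movie in enumerate(friend_ranking, start=1)}
    return [friend_rank[movie] for movie in my_ranking]

print(relative_ranking(["A", "B", "C"], ["B", "A", "C"]))  # [2, 1, 3]
```

The resulting list is the array whose inversions we count in the rest of this section.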

From the above example we see that if two people have identical tastes, the second person's ranks appear in sorted order (1 2 3) and there are no inversions. The higher the number of inversions, the greater the difference in preferences.

We will use a divide and conquer algorithm, modelled on merge sort, to find the number of inversions in an array. The number of inversions in an array $A$ is the number of pairs of indices $(i, j)$ where $i < j$ and $A[i] > A[j]$.

A sorted array has no inversions. The converse is also true: an array with no inversions is sorted, and an array with at least one inversion is not sorted.

For example, consider the following array

1 3 5 2 4 6

The inversions in this array are the following pairs of numbers: (3, 2), (5, 2) and (5, 4).
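To make the example concrete, the pairs can be enumerated directly from the definition. The helper `list_inversions` below is a hypothetical name used only for this illustration.

```python
def list_inversions(arr):
    """Return every pair (arr[i], arr[j]) with i < j and arr[i] > arr[j]."""
    return [
        (arr[i], arr[j])
        for i in range(len(arr))
        for j in range(i + 1, len(arr))
        if arr[i] > arr[j]
    ]

print(list_inversions([1, 3, 5, 2, 4, 6]))  # [(3, 2), (5, 2), (5, 4)]
```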

For an array of $n$ numbers, we can have a maximum of $(n - 1) + (n - 2) + \dots + 1 = n(n - 1) / 2$ inversions. This maximum is reached when the array is sorted in descending order.
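The bound can be checked on small reversed arrays. This is only a sanity check under the definition above; `max_inversions` and `count_inversions` are names introduced here for illustration.

```python
def max_inversions(n):
    """Closed form for the largest possible number of inversions in n numbers."""
    return n * (n - 1) // 2

def count_inversions(arr):
    """Count pairs i < j with arr[i] > arr[j] by checking every pair."""
    return sum(
        1
        for i in range(len(arr))
        for j in range(i + 1, len(arr))
        if arr[i] > arr[j]
    )

# A strictly descending array puts every pair out of order.
for n in range(1, 8):
    worst_case = list(range(n, 0, -1))
    assert count_inversions(worst_case) == max_inversions(n)

print(max_inversions(6), max_inversions(8))  # 15 28
```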

Let us implement inversion counting in two different ways. The first is the brute-force approach, and the second is a divide and conquer approach that sorts the numbers with merge sort while counting the inversions.



### Brute force approach for Counting Inversions

In [19]:
def count_inversions_brute_force(arr):
    count_inv = 0
    for i in range(len(arr)-1):
        for j in range(i+1, len(arr)):
            if arr[i] > arr[j]:
                count_inv +=1
    print(count_inv)

count_inversions_brute_force([1,2,3,4,5,6,7,8,9])
count_inversions_brute_force([1,2,3,6,5,4,7,8,9])
count_inversions_brute_force([6,5,4,3,2,1])
count_inversions_brute_force([1, 3, 5, 2, 4, 6])
count_inversions_brute_force([ 8, 7, 6, 5, 4, 3 ,2, 1 ])

0
3
15
3
28


We will now load the two test case files provided <a href="http://algorithmsilluminated.org/">here</a> and test our implementation.

In [47]:
import urllib3
http = urllib3.PoolManager()

# Test case
r1 = http.request('GET', "http://algorithmsilluminated.org/datasets/problem3.5test.txt")
# r1.data is bytes: decode it, then split on any whitespace (handles \r\n and \n)
problem35teststr = r1.data.decode('utf-8').split()
problem35test = [int(i) for i in problem35teststr]
count_inversions_brute_force(problem35test)

# Challenge data set
# The brute force approach does not work efficiently on larger problems.
#r2 = http.request('GET', "http://algorithmsilluminated.org/datasets/problem3.5.txt")
#problem35str = r2.data.decode('utf-8').split()
#problem35 = [int(i) for i in problem35str]
#count_inversions_brute_force(problem35)

28


### Divide and Conquer approach for Counting Inversions

The brute force approach works fine for small inputs, but its $O(n^2)$ running time makes it impractical for the challenge data set of 100000 numbers. We will now implement inversion counting piggybacked on merge sort.

In [73]:
def sort_and_count_inversions(arr, count_inversions):
    n = len(arr)
    # Base case: an array of 0 or 1 elements is sorted and adds no inversions.
    if n <= 1:
        return arr, count_inversions

    # Recursively sort each half, accumulating the inversions found inside it.
    mid = n // 2  # integer division so the slice index is an int
    left_tree, count_inversions = sort_and_count_inversions(arr[:mid], count_inversions)
    right_tree, count_inversions = sort_and_count_inversions(arr[mid:], count_inversions)

    # Merge the two sorted halves while counting split inversions.
    sorted_arr = [None] * n
    i = 0
    j = 0
    for k in range(n):
        if i < len(left_tree) and j < len(right_tree):
            if left_tree[i] <= right_tree[j]:
                sorted_arr[k] = left_tree[i]
                i += 1
            else:
                # right_tree[j] is smaller than every remaining left element,
                # so it forms an inversion with each of them.
                sorted_arr[k] = right_tree[j]
                j += 1
                count_inversions += len(left_tree) - i
        elif j < len(right_tree):
            # Left half exhausted: copy the rest of the right half.
            sorted_arr[k] = right_tree[j]
            j += 1
        else:
            # Right half exhausted: copy the rest of the left half.
            sorted_arr[k] = left_tree[i]
            i += 1

    return sorted_arr, count_inversions
    

print(sort_and_count_inversions([4, 3, 2] , count_inversions = 0)[1])    
print(sort_and_count_inversions([4, 3, 2, 10, 12, 1, 5, 6, 24, 33, 23,54, 12, 6 ] , count_inversions = 0)[1])    
print(sort_and_count_inversions([1,2,3,4,5,6,7,8,9] , count_inversions = 0)[1])
print(sort_and_count_inversions([1,2,3,6,5,4,7,8,9] , count_inversions = 0)[1])
print(sort_and_count_inversions([6,5,4,3,2,1] , count_inversions = 0)[1])
print(sort_and_count_inversions([1, 3, 5, 2, 4, 6] , count_inversions = 0)[1])
print(sort_and_count_inversions([ 8, 7, 6, 5, 4, 3 ,2, 1 ] , count_inversions = 0)[1])

3
25
0
3
15
3
28


In [80]:
import urllib3
http = urllib3.PoolManager()

# Test case
r1 = http.request('GET', "http://algorithmsilluminated.org/datasets/problem3.5test.txt")
# r1.data is bytes: decode it, then split on any whitespace (handles \r\n and \n)
problem35teststr = r1.data.decode('utf-8').split()
problem35test = [int(i) for i in problem35teststr]
print("Number of inversions in the test array is {}".format(sort_and_count_inversions(problem35test, count_inversions = 0)[1]))

# Challenge data set
# The divide and conquer approach works efficiently on larger problems.
r2 = http.request('GET', "http://algorithmsilluminated.org/datasets/problem3.5.txt")
problem35str = r2.data.decode('utf-8').split()
problem35 = [int(i) for i in problem35str]
print("Number of inversions in the challenge data set is {}".format(sort_and_count_inversions(problem35, count_inversions = 0)[1]))

Number of inversions in the test array is 28
Number of inversions in the challenge data set is 2407905288


The divide and conquer method sort_and_count_inversions is conceptually simple. We count the inversions in the left and right halves and get each half back sorted. We then count the split inversions while merging the two sorted halves. The total number of inversions is the number on the left plus the number on the right plus the number of split inversions.

The ingenuity lies in the merge step of sort_and_count_inversions, which piggybacks on the merge step of merge sort. Whenever an element of the right half is copied to the output while elements of the left half are still waiting, it forms a split inversion with each of those remaining elements, so the count increases by len(left_tree) - i.
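One way to make the three-part decomposition explicit is to return the counts instead of threading an accumulator through the recursive calls. The sketch below is an alternative formulation, not the notebook's implementation; the names `merge_and_count_split` and `sort_and_count` are assumptions introduced here.

```python
def merge_and_count_split(left, right):
    """Merge two sorted lists and count the split inversions between them."""
    merged = []
    split_inversions = 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes all remaining left elements: one inversion each.
            merged.append(right[j])
            j += 1
            split_inversions += len(left) - i
    # At most one of these two slices is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, split_inversions

def sort_and_count(arr):
    """Return (sorted copy of arr, number of inversions in arr)."""
    if len(arr) <= 1:
        return list(arr), 0
    mid = len(arr) // 2
    left, left_inv = sort_and_count(arr[:mid])
    right, right_inv = sort_and_count(arr[mid:])
    merged, split_inv = merge_and_count_split(left, right)
    # Total = inversions on the left + on the right + split inversions.
    return merged, left_inv + right_inv + split_inv

print(sort_and_count([1, 3, 5, 2, 4, 6]))  # ([1, 2, 3, 4, 5, 6], 3)
```

Returning the count keeps each call free of shared state, at the cost of a slightly different signature from the function used in the cells above.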

If there are no split inversions between the two halves of an array A, then every element in the first half is less than or equal to every element in the second half.

The function sort_and_count_inversions splits the input array in two and recursively sorts and counts inversions in the left and right halves. Each level of the recursion has twice as many subproblems as the level above it, and each subproblem receives half the input of its parent. This is the same structure as merge sort, and since the merge-and-count step runs in linear time, sort_and_count_inversions also runs in $O(n \log n)$ time.
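The running time can be sketched with the standard merge sort recurrence, assuming the merge-and-count step costs at most $cn$ for some constant $c$ and that $n$ is a power of two (both are simplifying assumptions):

$$
T(n) = 2\,T(n/2) + cn, \qquad T(1) = c
$$

Level $j$ of the recursion tree has $2^j$ subproblems of size $n/2^j$, so the work at each level is $2^j \cdot c\,(n/2^j) = cn$. There are $\log_2 n + 1$ levels, which gives

$$
T(n) = cn\,(\log_2 n + 1) = O(n \log n)
$$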

***