# Week 2 - counting inversions

### Instructions

This file contains all of the 100,000 integers between 1 and 100,000 (inclusive) in some order, with no integer repeated.

Your task is to compute the number of inversions in the file given, where the i-th row of the file indicates the i-th entry of an array.

Because of the large size of this array, you should implement the fast divide-and-conquer algorithm covered in the video lectures.

The numeric answer for the given input file should be typed in the space below.

So if your answer is 1198233847, then just type 1198233847 in the space provided without any space / commas / any other punctuation marks. You can make up to 5 attempts, and we'll use the best one for grading.

(We do not require you to submit your code, so feel free to use any programming language you want --- just type the final numeric answer in the following space.)

[TIP: before submitting, first test the correctness of your program on some small test files of your own devising.  Then post your best test cases to the discussion forums to help your fellow students!]

### Analysis

Remember that the 'brute force' approach, which compares every pair of entries, has a $\Theta(n^2)$ time-complexity, but we can use a sort-and-count approach to achieve a $\Theta(n \log n)$ time-complexity.

The algorithm will have the following structure:

 &emsp;1) Divide the complete list into 2 halves: left and right\
 &emsp;2) Order the left half, counting the inversions found within it\
 &emsp;3) Order the right half, counting the inversions found within it\
 &emsp;4) Count the inversions between numbers in the left and right (by now sorted) sections of the list. We do this by building a final list that repeatedly takes the smaller front element of the left or right section. Every time we take an element from the right section while elements remain in the left section, it forms one inversion with each of those remaining elements, since the left section came first in the whole list and all of its remaining elements are larger.

Recall that ordering a list with merge sort has a time-complexity of $\Theta(n \log n)$, where $n$ is the length of the list. Taking elements from either of the lists after ordering them, on the other hand, has a time-complexity of $\Theta(n)$, so counting inversions during the merge adds no extra cost and the overall complexity of the algorithm will be $\Theta(n \log n)$.
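
To see where this bound comes from, the running time can be written as a recurrence. This is only a sketch: $T(n)$ (the time to sort and count a list of $n$ elements) and the constant $c$ are symbols introduced here for illustration, and $n$ is assumed to be a power of 2.

$$
T(n) = 2\,T\!\left(\frac{n}{2}\right) + c\,n, \qquad T(1) = c
$$

The recursion tree has $\log_2 n + 1$ levels and each level does $c\,n$ work in total, so $T(n) = c\,n\,(\log_2 n + 1) = \Theta(n \log n)$.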

Another concept to remember is that an inversion is a pair of indices (i,j) in a list A, where i<j and A[i]>A[j]. So, for example, using 0-based indices, the list A=[1,3,5,2,4,6] has 3 inversions, written here as pairs of values: [3,2] (since i=1, j=3, so i<j, but A[1]=3 > A[3]=2), [5,2], and [5,4].
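
The definition can be checked directly with a brute-force enumeration. The sketch below is illustrative only: `list_inversions` is a hypothetical helper name, not part of the assignment, and it runs in $\Theta(n^2)$ time, so it is only suitable for small lists.

```python
from itertools import combinations


def list_inversions(A):
    """Return every inversion of A as a pair of values (A[i], A[j]) with i < j."""
    # combinations yields index pairs (i, j) with i < j, so only A[i] > A[j] needs checking
    return [(A[i], A[j]) for i, j in combinations(range(len(A)), 2) if A[i] > A[j]]


A = [1, 3, 5, 2, 4, 6]
pairs = list_inversions(A)
print(pairs)       # [(3, 2), (5, 2), (5, 4)]
print(len(pairs))  # 3
```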


First, let's create a recursive merge-sort function that takes in a list with $n$ elements and returns the number of inversions in the list plus the elements in an ordered list (time-complexity $\Theta(n \log n)$):

In [147]:
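def order_list(total_inv_cntr: int, X: list) -> tuple[int, list]:
    """Sort X with merge sort and return (number of inversions, ordered list).

    total_inv_cntr is the count accumulated so far; pass 0 on the initial call.
    """
    # initialising values
    X_ord = []
    l = len(X)
    left_inv_cntr, right_inv_cntr, merge_inv_count = 0, 0, 0

    # recursive calls
    if l > 1:
        if l == 2 and X[0] > X[1]:
            # base case: 2 elements out of order form exactly 1 inversion
            X_ord = [X[1], X[0]]
            total_inv_cntr += 1
        else:
            # Each half starts its own count at 0, so the caller's running
            # total is added only once (below) and never double-counted.
            left_inv_cntr, left_list = order_list(0, X[: l // 2])
            right_inv_cntr, right_list = order_list(0, X[l // 2:])
            merge_inv_count, X_ord = merge_ordered_lists(left_list, right_list)
    else:
        # base case, with 0 or 1 elements: nothing to sort, no inversions
        X_ord = X

    total_inv_cntr += left_inv_cntr + right_inv_cntr + merge_inv_count

    return total_inv_cntr, X_ord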

Notice that in the function above we assumed we had a function that could receive 2 ordered lists and return the number of inversions between the 2 lists plus a unified merged list.

Here we create that function to merge the 2 lists and return the number of inversions found (time-complexity $\Theta(n)$):

In [142]:
def merge_ordered_lists(X: list, Y: list) -> tuple[int, list]:
    """Merge two ordered lists and return (inversions between them, merged list)."""
    # initialising values
    Output_list = []
    n = len(X) + len(Y)
    i, j, inv_cntr = 0, 0, 0

    # This loop populates the output list from lists X and Y, maintaining an increasing order.
    # Notice that this merging algorithm keeps only one copy of a value that appears in
    # both lists (the exercise input has no repeated values, so nothing is dropped there).
    # To count inversions, we increase the counter every time we take a value from Y
    # while values remain in X.
    for k in range(n):
        if i < len(X) and j < len(Y):
            if X[i] < Y[j]:
                Output_list.append(X[i])
                i += 1
            elif X[i] == Y[j]:
                Output_list.append(X[i])
                i += 1
                j += 1
            else:
                Output_list.append(Y[j])
                j += 1
                # the value just taken from Y is smaller than every value
                # remaining in X, so it forms one inversion with each of them
                inv_cntr += len(X) - i
        elif j < len(Y):
            # X is finished: the remaining Y values add no inversions
            Output_list.append(Y[j])
            j += 1
        elif i < len(X):
            # Y is finished: the inversions involving these X values were
            # already counted when the smaller Y values were taken
            Output_list.append(X[i])
            i += 1

    return inv_cntr, Output_list
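
As a worked example of the merge step, here is a minimal standalone sketch. `merge_and_count` is a hypothetical name used only for this illustration, and unlike the function above it keeps repeated values rather than dropping them.

```python
def merge_and_count(left, right):
    """Merge two sorted lists; return (inversions between them, merged list)."""
    merged, count = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than left[i] and every value after it in left
            merged.append(right[j])
            count += len(left) - i
            j += 1
    # at most one of these slices is non-empty; leftovers add no new inversions
    merged.extend(left[i:])
    merged.extend(right[j:])
    return count, merged


print(merge_and_count([1, 3, 5], [2, 4, 6]))  # (3, [1, 2, 3, 4, 5, 6])
```

Taking 2 counts two inversions (with 3 and 5) and taking 4 counts one (with 5), which matches the 3 inversions of A=[1,3,5,2,4,6].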

Now, we need a set of lists to test the functions:

In [157]:
A = [0, 1, 2, 3, 4, 5] # this list should return 0 inversions
nbr_inversions, ord_list = order_list(0, A)

print(f"Number of inversions:{nbr_inversions}\nOrdered list: {ord_list}\n")


A = [5, 4, 3, 2, 1] # this list should return 10 inversions
nbr_inversions, ord_list = order_list(0, A)

print(f"Number of inversions:{nbr_inversions}\nOrdered list: {ord_list}\n")

A=[1, 5, 3, 2, 4] # this list should return 4 inversions
nbr_inversions, ord_list = order_list(0, A)

print(f"Number of inversions:{nbr_inversions}\nOrdered list: {ord_list}\n")

A=[2, 4, 1, 6, 3, 8, 5, 10, 7, 9, 12, 11, 14, 15, 13, 17, 16, 19, 18, 20] # this list should return 14 inversions
nbr_inversions, ord_list = order_list(0, A)

print(f"Number of inversions:{nbr_inversions}\nOrdered list: {ord_list}\n")

A = [5, 3, 2, 4, 1, 8, 7, 10, 6, 9, 12, 11, 14, 15, 13, 17, 16, 19, 18, 20] # there are 18 inversions in this list
nbr_inversions, ord_list = order_list(0, A)

print(f"Number of inversions:{nbr_inversions}\nOrdered list: {ord_list}\n")



Number of inversions:0
Ordered list: [0, 1, 2, 3, 4, 5]

Number of inversions:10
Ordered list: [1, 2, 3, 4, 5]

Number of inversions:4
Ordered list: [1, 2, 3, 4, 5]

Number of inversions:14
Ordered list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

Number of inversions:18
Ordered list: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
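
Hand-picked lists only cover a few cases, so one further option is to compare the fast algorithm against a brute-force count on many small random lists. The sketch below is self-contained and uses its own hypothetical helpers (`brute_force_count` and `sort_and_count`); to test the functions defined in this notebook instead, the call to `sort_and_count(A)` could be swapped for `order_list(0, A)`.

```python
import random


def brute_force_count(A):
    """Count inversions by checking every pair; Theta(n^2), for testing only."""
    return sum(1 for i in range(len(A)) for j in range(i + 1, len(A)) if A[i] > A[j])


def sort_and_count(A):
    """Return (inversion count, sorted copy of A) using merge sort."""
    if len(A) <= 1:
        return 0, list(A)
    mid = len(A) // 2
    left_count, left = sort_and_count(A[:mid])
    right_count, right = sort_and_count(A[mid:])
    merged, split_count = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes every value still waiting in left
            merged.append(right[j])
            split_count += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return left_count + right_count + split_count, merged


random.seed(0)  # fixed seed so a failing case can be reproduced
for _ in range(200):
    n = random.randint(0, 30)
    A = random.sample(range(1, 1000), n)  # distinct values, as in the exercise file
    count, ordered = sort_and_count(A)
    assert count == brute_force_count(A), A
    assert ordered == sorted(A), A
print("all random tests passed")
```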



Let's now create a function to read from the text file into a list: 

In [159]:
def read_integers_from_file(file_path):
    try:
        with open(file_path, 'r') as file:
            integers_list = [int(line.strip()) for line in file]
        return integers_list
    except FileNotFoundError:
        print(f"File not found: {file_path}")
        return None
    except Exception as e:
        print(f"Error reading file: {e}")
        return None


In [162]:
# Example usage:
file_path = 'Week_2_IntegerArray.txt'  # Replace with the path to your file
list_exercise = read_integers_from_file(file_path)

#if list_exercise is not None:
#    print(f"Integers from file: {list_exercise}")

Finally, let's run the code on the exercise file:

In [164]:
nbr_inv, ord_list = order_list(0, list_exercise)
print(f"Number of inversions:{nbr_inv}")

Number of inversions:2407905288
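
A quick plausibility check on this number: a list of $n$ distinct values has at most $n(n-1)/2$ inversions (a fully reversed list), and a uniformly random permutation has $n(n-1)/4$ on average. The sketch below compares the result against both. Whether the exercise file is close to a random permutation is an assumption, so the ratio is only a rough sanity check, not a proof of correctness.

```python
n = 100_000
answer = 2407905288  # value printed above

max_inversions = n * (n - 1) // 2  # reached by a fully reversed list
expected_random = n * (n - 1) / 4  # average over uniformly random permutations

print(max_inversions)                      # 4999950000
print(0 <= answer <= max_inversions)       # True
print(round(answer / expected_random, 3))  # 0.963
```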
