<a href="https://colab.research.google.com/github/bubuloMallone/Algorithms_1/blob/main/1_count_inversion.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
# The dataset is an array of all the integers between 1 and 100,000 in random order. The testset is a much shorter array used for debugging.

!wget https://raw.githubusercontent.com/bubuloMallone/Algorithms_1/refs/heads/main/datasets/1_count_inversion/dataset.txt
!wget https://raw.githubusercontent.com/bubuloMallone/Algorithms_1/refs/heads/main/datasets/1_count_inversion/testset.txt


# Read each file into a list of integers, skipping empty lines.
# The `with` blocks close the files once they have been read.
with open('dataset.txt', 'r') as f:
  dataset = [int(x) for x in f.read().split('\n') if x]

with open('testset.txt', 'r') as f:
  testset = [int(x) for x in f.read().split('\n') if x]

testset[:3]

[54044, 14108, 79294]

What is an inversion?

Given an array A, an inversion is a pair $(i,j)$ such that $i<j$ and $A[i]>A[j]$.

For example, in the list $[2, 4, 1, 3, 5]$ the inversions, written as pairs of values $(A[i], A[j])$, are $(2,1)$, $(4,1)$, and $(4,3)$, for a total of $3$.

Note that a sorted array has zero inversions, while the maximum number of inversions is reached by a reverse-sorted array (with distinct elements). In that case every pair of elements is an inversion, so an $n$-element array has $\binom{n}{2} = \frac{n(n-1)}{2}$ of them.
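
To make the definition concrete, here is a minimal sketch that lists inversions directly from the definition. The helper name `list_inversions` is hypothetical and is not part of the notebook; it checks every index pair, so it is only meant for tiny arrays.

```python
from itertools import combinations
from math import comb


def list_inversions(values):
    """Return every inversion as an index pair (i, j) with i < j and values[i] > values[j]."""
    n = len(values)
    return [(i, j) for i, j in combinations(range(n), 2) if values[i] > values[j]]


example = [2, 4, 1, 3, 5]
index_pairs = list_inversions(example)
value_pairs = [(example[i], example[j]) for i, j in index_pairs]

print(index_pairs)  # [(0, 2), (1, 2), (1, 3)]
print(value_pairs)  # [(2, 1), (4, 1), (4, 3)]

# Extreme cases: a sorted array has no inversions, a reverse-sorted one has C(n, 2)
print(len(list_inversions([1, 2, 3, 4, 5])))             # 0
print(len(list_inversions([5, 4, 3, 2, 1])), comb(5, 2))  # 10 10
```

The index pairs follow the definition, while the value pairs match the notation used in the example.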


In [12]:
# GOAL: count inversions with a divide-and-conquer algorithm based on merge sort.
# Merge sort is modified slightly so that every recursive merge operation also counts the inversions it resolves.

# Define the merge of two sorted arrays, to be used later as a subroutine.

def merge_counting(left, right):

  i = j = 0
  inv_count = 0
  merged = []

  while i < len(left) and j < len(right):
    if left[i] <= right[j]:
      merged.append(left[i])
      i += 1
    else:
      merged.append(right[j])
      j += 1
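      # the element just taken from right is smaller than every remaining element left[i:],
      # and each of those forms one inversion with it
      inv_count += len(left) - i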

  merged.extend(left[i:])  # slices are clamped to the valid index range, so an out-of-range slice is just an empty list
  merged.extend(right[j:])
  return merged, inv_count

# Define the recursive divide-and-conquer routine, which uses the merge subroutine above.

def inversions_merge_sort(array):

  if len(array) <= 1:
    return array, 0

  mid = len(array) // 2
  left, inv_left = inversions_merge_sort(array[:mid])
  right, inv_right = inversions_merge_sort(array[mid:])

  merged, inv_split = merge_counting(left, right)

  return merged, inv_left + inv_right + inv_split
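
The key step is the increment by `len(left) - i` during the merge. The following sketch is a traced variant of the merge step, written only for illustration (the name `merge_counting_traced` and the `steps` list are assumptions, not part of the notebook). Each time an element is taken from `right`, it records which remaining elements of `left` it is inverted with.

```python
def merge_counting_traced(left, right):
    """Merge two sorted lists, counting split inversions and recording where they come from."""
    i = j = 0
    inv_count = 0
    merged = []
    steps = []  # (element taken from right, elements of left it is inverted with)

    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # left is sorted, so right[j] is smaller than everything in left[i:]
            steps.append((right[j], left[i:]))
            inv_count += len(left) - i
            merged.append(right[j])
            j += 1

    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_count, steps


merged, inv_count, steps = merge_counting_traced([1, 3, 5], [2, 4, 6])
print(merged)     # [1, 2, 3, 4, 5, 6]
print(inv_count)  # 3
print(steps)      # [(2, [3, 5]), (4, [5])]
```

Here `2` is inverted with `3` and `5`, and `4` is inverted with `5`, which gives the three split inversions.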




In [None]:
# example of the recursive splitting (mid = len(array) // 2 at every level)

# A = [1,2,3,5,4,6]
# b1 = [1,2,3] b2 = [5,4,6]                               -->  b1 = A[:mid]
# c1 = [1] c2 = [2,3] c3 = [5] c4 = [4,6]                 -->  c1 = b1[:mid] = A[:mid][:mid]
# d1 = [1] d2 = [2] d3 = [3] d4 = [5] d5 = [4] d6 = [6]   -->  d2 = A[:mid][mid:][:mid]
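
The splitting shown in the comments above can be reproduced with a small sketch. The function `split_levels` is a hypothetical helper, not part of the notebook; it only mirrors the halving rule `mid = len(array) // 2` and does no merging or counting.

```python
def split_levels(array, depth=0):
    """Return (depth, sub_array) for every call of the recursive halving, in visiting order."""
    visited = [(depth, list(array))]
    if len(array) > 1:
        mid = len(array) // 2
        visited += split_levels(array[:mid], depth + 1)
        visited += split_levels(array[mid:], depth + 1)
    return visited


for depth, sub_array in split_levels([1, 2, 3, 5, 4, 6]):
    # indent each sub-array by its recursion depth
    print("  " * depth + str(sub_array))
```

Single-element lists are the base case, so a list such as `[1]` stops splitting one level earlier than `[2, 3]`.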

In [13]:
# test the algorithm on the two extreme cases: sorted (minimum) and reverse-sorted (maximum)

n = 10
ordered_test = list(range(1, n + 1))
reversed_test = list(range(n, 0, -1))

_, inv_min = inversions_merge_sort(ordered_test)
_, inv_max = inversions_merge_sort(reversed_test)

min_true = 0
max_true = n * (n - 1) // 2  # integer division keeps the result an exact int

print('ordered: ', min_true, '-->', inv_min)
print('reversed: ', max_true, '-->', inv_max)

ordered:  0 --> 0
reversed:  45 --> 45


In [14]:
# compare the algorithm with the brute force O(n^2) iterative algorithm

def inversions_bf(array):
  n = len(array)
  inv_count = 0

  for i in range(n):
    for j in range(i+1, n):
      if array[j] < array[i]:
        inv_count += 1

  return inv_count


_, inv_count = inversions_merge_sort(testset)
inv_count_bf = inversions_bf(testset)

print(inv_count, inv_count_bf)

28 28
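
Agreement on a single short array is a weak check. A possible extension, sketched below under stated assumptions, compares the two approaches on many small random arrays, including arrays with repeated values (which exercise the `<=` tie handling in the merge). The functions `count_inversions`, `count_inversions_bf`, and `cross_check` are self-contained stand-ins written for this sketch, not the notebook's own functions.

```python
import random


def count_inversions(array):
    """Merge-sort based count; returns (sorted_list, inversion_count)."""
    if len(array) <= 1:
        return list(array), 0
    mid = len(array) // 2
    left, inv_left = count_inversions(array[:mid])
    right, inv_right = count_inversions(array[mid:])
    merged, i, j, inv_split = [], 0, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # ties are not inversions
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
            inv_split += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inv_left + inv_right + inv_split


def count_inversions_bf(array):
    """Quadratic reference count: check every pair i < j."""
    n = len(array)
    return sum(1 for i in range(n) for j in range(i + 1, n) if array[j] < array[i])


def cross_check(trials=200, max_len=30, seed=0):
    """Compare both counts on random arrays with duplicates; raise on the first mismatch."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        data = rng.choices(range(10), k=rng.randint(0, max_len))
        merged, fast = count_inversions(data)
        assert merged == sorted(data), data
        assert fast == count_inversions_bf(data), data
    return True


print(cross_check())  # True
```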


In [20]:
import time

# Time both implementations on the first 10,000 elements of the dataset.
# perf_counter is a monotonic, high-resolution clock intended for measuring intervals.
start_time = time.perf_counter()
_, inv_count = inversions_merge_sort(dataset[:10000])
end_time = time.perf_counter()
execution_time_dc = end_time - start_time  # divide-and-conquer, in seconds

start_time = time.perf_counter()
inv_count_bf = inversions_bf(dataset[:10000])
end_time = time.perf_counter()
execution_time_bf = end_time - start_time  # brute force, in seconds


print(f"The function divide-conquer took {execution_time_dc:.3f} seconds to execute.")
print(f"The function brute-force took {execution_time_bf:.3f} seconds to execute.")
print('inversions: ', '(ms)', inv_count, '(bf)', inv_count_bf)

The function divide-conquer took 0.033 seconds to execute.
The function brute-force took 3.541 seconds to execute.
inversions:  (ms) 24936914 (bf) 24936914
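
The gap between the two timings follows from the operation counts. The sketch below is a rough estimate only: it counts the $\binom{n}{2}$ comparisons of the brute-force loop, uses $n \log_2 n$ as an order-of-magnitude figure for the merge-based approach (constant factors ignored), and extrapolates the brute-force time from the single run reported above, which will vary between machines and runs.

```python
import math

n_sample = 10_000   # size used in the timing cell above
n_full = 100_000    # size of the full dataset

# Brute force compares every pair exactly once: n(n-1)/2 comparisons
bf_sample = math.comb(n_sample, 2)
bf_full = math.comb(n_full, 2)

# Order-of-magnitude work for the merge-based count (constant factors ignored)
dc_sample = n_sample * math.log2(n_sample)

# Extrapolate the measured brute-force time (3.541 s in the run shown above)
measured_bf_seconds = 3.541
estimated_bf_full_seconds = measured_bf_seconds * bf_full / bf_sample

print(bf_sample)                         # 49995000
print(round(dc_sample))                  # 132877
print(round(estimated_bf_full_seconds))  # 354
```

Under these assumptions the brute-force approach would need several minutes on the full dataset, since the number of pairs grows about 100-fold when $n$ grows 10-fold.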


In [21]:
# find the total number of inversions on the dataset array

_, inv_count = inversions_merge_sort(dataset)
print(inv_count)

2407905288
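
A coarse plausibility check on this number, sketched below, compares it with two reference values: the maximum $\binom{n}{2}$ and the mean over all permutations of $n$ distinct elements, $\frac{n(n-1)}{4}$. This can only catch a count that is out of range or off by a large factor, not subtle bugs. The sketch also shows that the count does not fit in a signed 32-bit integer, which is harmless in Python (arbitrary-precision integers) but would matter in a port to a fixed-width language.

```python
n = 100_000
observed = 2_407_905_288  # value printed by the cell above

max_inversions = n * (n - 1) // 2   # reverse-sorted array
mean_inversions = n * (n - 1) // 4  # average over all permutations of n distinct values

print(0 <= observed <= max_inversions)        # True
print(mean_inversions)                        # 2499975000
print(round(observed / mean_inversions, 3))   # 0.963

# Largest signed 32-bit value is 2**31 - 1 = 2147483647
print(observed <= 2**31 - 1)                  # False
```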
