
# Randomized selection algorithm

**Purpose of the Algorithm**

The **selection problem** asks: given an unsorted array of $n$ elements and an integer $i$ (with $1 \leq i \leq n$), find the element that would occupy the $i$-th position if the array were sorted. In other words, the algorithm finds the $i$-th order statistic:

- $i=1 \rightarrow$ minimum element
- $i=n \rightarrow$ maximum element
- $i=\frac{n+1}{2} \rightarrow$ median (if $n$ is odd)
- $i=\frac{n}{2} \rightarrow$ lower median (if $n$ is even, by convention)

A naive approach is to sort the array in $O(n \log n)$ time and return the $i$-th element. This does more work than necessary: we need only one position, not the full sorted order.
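For reference, the naive approach fits in a few lines. This is a minimal sketch; the helper names `select_by_sorting` and `median_rank` are illustrative and not part of the notebook. `median_rank` encodes the convention above (rank $(n+1)/2$ for odd $n$, rank $n/2$ for even $n$), which integer division expresses in one formula as `(n + 1) // 2`.

```python
def select_by_sorting(values, i):
    """Return the i-th smallest element (1-based) by sorting a copy: O(n log n)."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= len(values)")
    return sorted(values)[i - 1]


def median_rank(n):
    """Median rank: (n + 1) / 2 for odd n, n / 2 for even n (the lower median)."""
    return (n + 1) // 2


data = [7, 2, 9, 4, 1, 8]
print(select_by_sorting(data, 1))                       # 1 (minimum)
print(select_by_sorting(data, len(data)))               # 9 (maximum)
print(select_by_sorting(data, median_rank(len(data))))  # 4 (lower median)
```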

The **Randomized Selection Algorithm** solves this problem in **expected** linear time, $O(n)$. Like QuickSort, it partitions the array around a pivot; unlike QuickSort, it then recurses only into the part that contains the answer, so the problem size shrinks at every step.

**Algorithmic Idea**

1. Choose a pivot element uniformly at random from the array.
2. Partition the array into:
   - elements "less than" the pivot,
   - the pivot itself,
   - elements "greater than" the pivot.
3. Determine which side to recurse on:
   - Let $k$ be the number of elements in the "less than" partition.
   - If $i=k+1$, the pivot is the $i$-th order statistic: return it.
   - If $i \leq k$, recursively search for the $i$-th smallest element in the "less than" subarray.
   - If $i>k+1$, recursively search for the $(i-k-1)$-th smallest element in the "greater than" subarray.
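The three steps translate almost literally into Python if we build new lists instead of partitioning in place. The sketch below (the name `select_simple` is hypothetical) is an alternative to the in-place implementation later in the notebook: it uses $O(n)$ extra memory per call, and it collects all copies of the pivot in an `equal` list, so that with distinct elements `len(equal) == 1` and the recursive rank is exactly $i-k-1$.

```python
import random


def select_simple(values, i):
    """Return the i-th smallest element (1-based) of values; expected O(n) time."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= len(values)")
    pivot = random.choice(values)                  # step 1: random pivot
    less = [x for x in values if x < pivot]        # step 2: partition
    equal = [x for x in values if x == pivot]
    greater = [x for x in values if x > pivot]
    k = len(less)
    if i <= k:                                     # step 3: answer is left of the pivot
        return select_simple(less, i)
    if i <= k + len(equal):                        # the pivot itself has rank i
        return pivot
    return select_simple(greater, i - k - len(equal))


print(select_simple([5, 3, 8, 1, 9, 2], 3))  # 3
```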

With a random pivot, each recursive step discards a constant fraction of the remaining elements **on average**, so the subproblem sizes shrink geometrically, leading to an **expected running time** of $O(n)$.


**Complexity Analysis**

Let $T(n)$ denote the expected running time on an array of size $n$.

- The partition step takes $O(n)$ time, since every element is compared to the pivot once.

- The recursive step is performed only on one side of the partition (unlike QuickSort, which recurses on both sides).

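The **expected size** of the recursive subproblem is at most about $3n/4$: the recursion continues on one side only, and that side is never larger than the bigger of the two parts. Since the pivot rank is uniform over $\{1, \dots, n\}$, the size of the bigger part is spread roughly uniformly between $n/2$ and $n-1$, so its expected value is about $3n/4$.

Treating this as a recurrence on expected sizes (a heuristic; a rigorous proof bounds the expected total work directly), we get

$$
T(n) \le T(3n/4)+O(n),
$$

and solving this recurrence with the Master method ($a=1$, $b=4/3$, $f(n)=\Theta(n)$, case 3) yields

$$
T(n)=O(n).
$$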
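To see why any recurrence of this shape is linear, it helps to unroll it. Assume the non-recursive work on a subproblem of size $m$ is at most $cm$ and each step shrinks the problem to at most $\alpha m$ for some constant $0<\alpha<1$; the total work is then a geometric series:

$$
T(n) \le cn + c\alpha n + c\alpha^2 n + \cdots \le cn \sum_{j=0}^{\infty} \alpha^j = \frac{c}{1-\alpha}\, n = O(n).
$$

For example, $\alpha = 1/2$ gives a constant of $2c$ and $\alpha = 3/4$ gives $4c$: the shrinkage factor only affects the constant, not the linear growth.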


**Worst-Case Complexity**

In the worst case (e.g., if the pivot is always the smallest or largest element), one side of the partition has size $n-1$ and the other has size zero:

$$
T(n)=T(n-1)+O(n)=O\left(n^2\right)
$$


However, because the pivot is chosen randomly, this happens with extremely low probability, and the expected running time remains linear.
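The gap between the two cases is easy to observe by counting comparisons. The sketch below is a hypothetical instrumented, iterative variant of the Lomuto-based QuickSelect used in this notebook (the names `count_comparisons` and `random_pivot` are illustrative). On an already sorted input, always taking the last element as pivot while searching for the minimum costs exactly $n(n-1)/2$ comparisons; with a random pivot the count stays proportional to $n$ in expectation.

```python
import random


def count_comparisons(values, i, random_pivot, rng=None):
    """Lomuto-based QuickSelect for rank i (1-based, assumed valid) on a copy of values.

    Returns the number of element-vs-pivot comparisons performed.
    With random_pivot=False the last element of each range is the pivot.
    """
    if rng is None:
        rng = random.Random()
    arr = list(values)
    left, right = 0, len(arr) - 1
    comparisons = 0
    while left < right:
        if random_pivot:
            # Move a uniformly random element of arr[left..right] into the pivot slot.
            p = rng.randint(left, right)
            arr[p], arr[right] = arr[right], arr[p]
        pivot, s = arr[right], left - 1
        for j in range(left, right):
            comparisons += 1
            if arr[j] <= pivot:
                s += 1
                arr[s], arr[j] = arr[j], arr[s]
        arr[s + 1], arr[right] = arr[right], arr[s + 1]
        k = s + 2 - left  # rank of the pivot within arr[left..right]
        if i == k:
            break
        if i < k:
            right = s      # continue left of the pivot
        else:
            i -= k
            left = s + 2   # continue right of the pivot
    return comparisons


n = 1000
sorted_input = list(range(n))
print(count_comparisons(sorted_input, 1, random_pivot=False))  # 499500 = n(n-1)/2
print(count_comparisons(sorted_input, 1, random_pivot=True, rng=random.Random(0)))
```

Doubling `n` roughly quadruples the first count, while the second roughly doubles on average.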

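**Space Complexity**

The partitioning is done in place, so the only extra memory is the recursion stack. Each call makes at most one recursive call, so the stack depth equals the number of partitioning rounds: $O(\log n)$ in expectation and $O(n)$ in the worst case. Because that recursive call is the last operation of the function (tail recursion), it can be replaced by a loop over the `left`/`right` bounds, which brings the auxiliary space down to $O(1)$; Python does not perform this transformation automatically.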
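Since each call of the recursive version makes at most one recursive call, as its last action, the recursion can be turned into a loop that only updates the `left`/`right` bounds. A minimal sketch, assuming the same 1-based rank convention, random pivot and Lomuto partition as the notebook's `quickselect`; the name `quickselect_iterative` is hypothetical.

```python
import random


def quickselect_iterative(array, i):
    """Return the i-th smallest element (1-based); reorders array in place, O(1) extra space."""
    if not 1 <= i <= len(array):
        raise ValueError("i must satisfy 1 <= i <= len(array)")
    left, right = 0, len(array) - 1
    while left < right:
        # Random pivot, moved to the end of the current range.
        p = random.randint(left, right)
        array[p], array[right] = array[right], array[p]
        # Lomuto partition of array[left..right] around array[right].
        pivot, s = array[right], left - 1
        for j in range(left, right):
            if array[j] <= pivot:
                s += 1
                array[s], array[j] = array[j], array[s]
        array[s + 1], array[right] = array[right], array[s + 1]
        k = s + 2 - left  # rank of the pivot within array[left..right]
        if i == k:
            return array[s + 1]
        if i < k:
            right = s          # continue in the <= pivot part
        else:
            i -= k
            left = s + 2       # continue in the > pivot part
    return array[left]


print(quickselect_iterative([9, 4, 7, 1, 8], 2))  # 4
```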


```python
# Randomized in-place QuickSelect
import random


def partition(arr, left, right):
    """Lomuto partition of arr[left..right] around the pivot arr[right].

    Rearranges the sub-array into a <= pivot portion and a > pivot portion,
    places the pivot between them, and returns the pivot's final index.
    """
    pivot = arr[right]
    s = left - 1  # right boundary of the <= pivot portion
    for i in range(left, right):
        if arr[i] <= pivot:
            s += 1
            arr[s], arr[i] = arr[i], arr[s]  # grow the <= pivot portion
    # Move the pivot between the two portions.
    arr[s + 1], arr[right] = arr[right], arr[s + 1]
    return s + 1


def random_partition(arr, left, right):
    """Pick a pivot index uniformly at random, move it to the end, then partition."""
    pivot_index = random.randint(left, right)  # randint includes both endpoints
    arr[right], arr[pivot_index] = arr[pivot_index], arr[right]
    return partition(arr, left, right)


def quickselect(array, i, left=0, right=None):
    """Return the i-th smallest element (1-based) of array[left..right].

    Note: the array is reordered in place.
    """
    # First call: search the whole array and validate the rank.
    if right is None:
        right = len(array) - 1
        if not 1 <= i <= len(array):
            raise ValueError(f"rank i={i} must satisfy 1 <= i <= {len(array)}")

    # Base case: sub-array of size 1.
    if left == right:
        return array[left]

    # Put the pivot in its final place; it splits the range into two partitions.
    pivot_index = random_partition(array, left, right)
    k = pivot_index - left + 1  # rank of the pivot within array[left..right]

    # Recurse only into the partition that contains the i-th element.
    if i == k:
        return array[pivot_index]
    elif i < k:
        return quickselect(array, i, left, pivot_index - 1)
    else:
        return quickselect(array, i - k, pivot_index + 1, right)
```
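One weakness of the `<=` partition above: elements equal to the pivot all land on the same side, so on an array whose entries are all equal every round only peels off the pivot, and searching for the median takes quadratic time whether or not the pivot is random. A common remedy, sketched here as an alternative rather than as the notebook's method, is a three-way (Dutch national flag) partition that groups all copies of the pivot in the middle. The names `partition3` and `quickselect3` are hypothetical.

```python
import random


def partition3(arr, left, right, pivot):
    """Three-way partition of arr[left..right] around the value pivot.

    Returns (lt, gt) such that arr[left:lt] < pivot, arr[lt:gt+1] == pivot
    and arr[gt+1:right+1] > pivot.
    """
    lt, j, gt = left, left, right
    while j <= gt:
        if arr[j] < pivot:
            arr[lt], arr[j] = arr[j], arr[lt]
            lt += 1
            j += 1
        elif arr[j] > pivot:
            arr[j], arr[gt] = arr[gt], arr[j]
            gt -= 1  # do not advance j: the swapped-in element is unexamined
        else:
            j += 1
    return lt, gt


def quickselect3(array, i):
    """Return the i-th smallest element (1-based); reorders array in place."""
    if not 1 <= i <= len(array):
        raise ValueError("i must satisfy 1 <= i <= len(array)")
    left, right = 0, len(array) - 1
    while True:
        pivot = array[random.randint(left, right)]
        lt, gt = partition3(array, left, right, pivot)
        n_less, n_equal = lt - left, gt - lt + 1
        if i <= n_less:
            right = lt - 1                 # answer is smaller than the pivot
        elif i <= n_less + n_equal:
            return pivot                   # answer equals the pivot
        else:
            i -= n_less + n_equal
            left = gt + 1                  # answer is larger than the pivot


print(quickselect3([5] * 10000, 5000))  # 5, after a single partition pass
```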


```python
# Download the datasets (IPython/Colab shell commands) and load them as lists of integers.
# -nc ("no clobber") skips files that already exist instead of saving numbered copies.
!wget -nc https://raw.githubusercontent.com/bubuloMallone/Algorithms_1/refs/heads/main/datasets/3_selection/pi_digits_10e5.txt
!wget -nc https://raw.githubusercontent.com/bubuloMallone/Algorithms_1/refs/heads/main/datasets/3_selection/testset1.txt
!wget -nc https://raw.githubusercontent.com/bubuloMallone/Algorithms_1/refs/heads/main/datasets/3_selection/testset2.txt

# Pi: keep the digits after "3." and cut them into 10-digit integers.
with open('pi_digits_10e5.txt', 'r') as f:
    digits = f.read().split('.')[1].strip()
pi_digits = [int(digits[i:i + 10]) for i in range(0, len(digits), 10)]


def read_int_list(path):
    """Read one integer per line, skipping empty lines."""
    with open(path, 'r') as f:
        return [int(x) for x in f.read().split('\n') if x]


testset1 = read_int_list('testset1.txt')
testset2 = read_int_list('testset2.txt')

pi_digits[:10]
```


[1415926535,
 8979323846,
 2643383279,
 5028841971,
 6939937510,
 5820974944,
 5923078164,
 628620899,
 8628034825,
 3421170679]
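The list above contains one 9-digit entry, 628620899: the 10-digit chunk `0628620899` starts with a zero, which `int` drops. A small sketch of the same chunking on a hard-coded prefix of pi (the helper name `chunk_digits` is illustrative; a final chunk shorter than `width` is kept as is):

```python
def chunk_digits(text, width=10):
    """Split the digits after the decimal point into integers of `width` digits."""
    digits = text.split('.')[1].strip()
    return [int(digits[i:i + width]) for i in range(0, len(digits), width)]


pi_prefix = "3.14159265358979323846264338327950288419716939937510"
print(chunk_digits(pi_prefix))  # [1415926535, 8979323846, 2643383279, 5028841971, 6939937510]
print(int("0628620899"))        # 628620899: the leading zero is lost
```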

```python
# Search for the median of testset1.
# Convention: rank n/2 if n is even (lower median), (n+1)/2 if n is odd;
# (n + 1) // 2 covers both cases.
n = len(testset1)
median_order = (n + 1) // 2

median = quickselect(testset1, median_order)  # note: reorders testset1 in place
print("The median value is  ", median)
```

The median value is   5469


```python
# Check the result: sort a copy with QuickSort and read off the median.
# QuickSort reuses random_partition from above, but recurses on both sides.

def quicksort(array, left=0, right=None):
    """Sort array[left..right] in place with randomized QuickSort."""
    # First call: sort the whole array.
    if right is None:
        right = len(array) - 1
        print("Sorting input array ...")

    # Sub-arrays of size 0 or 1 are already sorted.
    if left < right:
        # Put the pivot in its final place, then sort both partitions.
        pivot_index = random_partition(array, left, right)
        quicksort(array, left, pivot_index - 1)
        quicksort(array, pivot_index + 1, right)
```

```python
array = testset1.copy()
quicksort(array)
median = array[median_order - 1]  # median_order is 1-based, list indices are 0-based
print("The median value (quicksort) is  ", median)
```

Sorting input array ...
The median value (quicksort) is   5469
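A third, independent check needs no sorting code of our own. Python's standard library provides `statistics.median_low`, which returns the lower of the two middle values when $n$ is even (matching the median convention used here), and `heapq.nsmallest(i, data)[-1]`, which returns the $i$-th smallest element using a bounded heap. A small sketch on hard-coded data (the helper name `stdlib_select` is hypothetical); on `testset1` one would compare these values against the QuickSelect result.

```python
import heapq
import statistics


def stdlib_select(data, i):
    """i-th smallest element (1-based) via heapq.nsmallest, roughly O(n log i)."""
    return heapq.nsmallest(i, data)[-1]


sample = [31, 4, 15, 92, 65, 35, 89, 79]
n = len(sample)
print(statistics.median_low(sample))        # 35
print(stdlib_select(sample, (n + 1) // 2))  # 35
```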


```python
# Repeat the comparison on the larger testset2.
n = len(testset2)
median_order = (n + 1) // 2
median = quickselect(testset2, median_order)
print("The median value (quickselect) is  ", median)

array = testset2.copy()
quicksort(array)
median = array[median_order - 1]
print("The median value (quicksort) is  ", median)
```

The median value (quickselect) is   4715
Sorting input array ...
The median value (quicksort) is   4715


```python
# Now consider the integers formed by 10-digit chunks of the first 10^5 decimal digits of pi,
# and find their median.
n = len(pi_digits)
median_order = (n + 1) // 2

median = quickselect(pi_digits, median_order)
print("The median value (quickselect) is  ", median)

array = pi_digits.copy()
quicksort(array)
median = array[median_order - 1]
print("The median value (quicksort) is  ", median)
```

The median value (quickselect) is   5021715913
Sorting input array ...
The median value (quicksort) is   5021715913
