<a href="https://colab.research.google.com/github/bubuloMallone/Algorithms_1/blob/main/4_selection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Randomized selection algorithm

**Purpose of the Algorithm**

The **selection problem** asks: given an unsorted array of $n$ elements and an integer $i$ (with $1 \leq i \leq n$), find the element that would occupy the $i$-th position if the array were sorted. In other words, the algorithm finds the $i$-th order statistic:

- $i=1 \rightarrow$ minimum element
- $i=n \rightarrow$ maximum element
- $i=\frac{n+1}{2} \rightarrow$ median (if $n$ is odd)
- $i=\frac{n}{2} \rightarrow$ median (if $n$ is even, taking the lower median by convention)

A naive approach would be to sort the array in $O(n \log n)$ time and return the $i$-th element. However, this does more work than necessary: we need only one position, not the full sorted order.
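As a baseline, the sort-then-index approach can be sketched in a few lines. The function name `select_by_sorting` is a hypothetical name chosen here, and `i` is 1-indexed to match the text.

```python
def select_by_sorting(values, i):
    """Return the i-th smallest element (1-indexed) by sorting a copy of the input."""
    if not 1 <= i <= len(values):
        raise ValueError("i must satisfy 1 <= i <= len(values)")
    # sorted() runs in O(n log n) time and leaves the original list untouched.
    return sorted(values)[i - 1]


print(select_by_sorting([7, 2, 9, 4, 1], 3))  # 3rd smallest of 1, 2, 4, 7, 9 is 4
```

This version is handy as a reference answer when checking a faster selection routine.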

The **Randomized Selection Algorithm** solves this problem in **expected** linear time, $O(n)$. Like QuickSort, it partitions the array around a pivot, but it then recurses into only the one part that contains the desired element.

**Algorithmic Idea**

1. Choose a pivot element uniformly at random from the array.
2. Partition the array into:
   - Elements "less than" the pivot,
   - The pivot itself,
   - Elements "greater than" the pivot.
3. Determine which side to recurse on:
   - Let $k$ be the number of elements in the "less than" partition.
   - If $i=k+1$, the pivot is the $i$-th order statistic, so return it.
   - If $i \leq k$, recursively search for the $i$-th smallest element in the "less than" subarray.
   - If $i>k+1$, recursively search for the $(i-k-1)$-th smallest element in the "greater than" subarray.

On average, each recursive step discards a constant fraction of the array (about half under an idealized even split), leading to an **expected running time** of $O(n)$.
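The steps above can be sketched in Python as follows. This is a minimal illustration, not necessarily the implementation used elsewhere in the notebook: the names `partition` and `rselect`, the 1-indexed convention for `i`, and the choice to reorder the input list in place are all assumptions made here.

```python
import random


def partition(a, lo, hi, p):
    """Partition a[lo..hi] around the pivot a[p]; return the pivot's final index."""
    a[lo], a[p] = a[p], a[lo]          # move the pivot to the front
    pivot = a[lo]
    boundary = lo + 1                  # invariant: a[lo+1 .. boundary-1] < pivot
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            a[boundary], a[j] = a[j], a[boundary]
            boundary += 1
    a[lo], a[boundary - 1] = a[boundary - 1], a[lo]   # put the pivot in place
    return boundary - 1


def rselect(a, i, lo=0, hi=None):
    """Return the i-th smallest (1-indexed) element of a[lo..hi]; reorders a."""
    if hi is None:
        hi = len(a) - 1
    if not 1 <= i <= hi - lo + 1:
        raise ValueError("i is out of range for the given subarray")
    if lo == hi:
        return a[lo]
    p = partition(a, lo, hi, random.randint(lo, hi))  # step 1 and step 2
    k = p - lo                         # number of elements in the "less than" part
    if i == k + 1:
        return a[p]
    if i <= k:
        return rselect(a, i, lo, p - 1)
    return rselect(a, i - k - 1, p + 1, hi)


print(rselect([7, 2, 9, 4, 1], 3))     # 3rd smallest of 1, 2, 4, 7, 9 is 4
```

Elements equal to the pivot end up on the "greater than" side in this sketch, so the result stays correct even when the input contains duplicates.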


**Complexity Analysis**

Let $T(n)$ denote the expected running time on an array of size $n$.

- The partition step takes $O(n)$ time, since every element is compared to the pivot once.

- The recursive step is performed only on one side of the partition (unlike QuickSort, which recurses on both sides).

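Heuristically, a uniformly random pivot splits the array near the middle on average, so the subproblem passed to the recursive call has size about $n/2$. (More carefully: with probability $1/2$ the pivot falls in the middle half of the sorted order, and then the subproblem has size at most $3n/4$, which yields the same linear bound.)

Under this simplification, the running time satisfies the recurrence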

$$
T(n)=T(n / 2)+O(n),
$$


and solving it with the Master method (one subproblem of half the size plus linear work per level, so the work at the root dominates) yields

$$
T(n)=O(n).
$$
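
To see where the linear bound comes from without invoking the Master method, the recurrence can also be unrolled directly. The sketch below assumes that the $O(n)$ term is at most $cn$ for some constant $c$ (a symbol introduced here) and that $T(1)$ is a constant:

$$
T(n) \le cn + T(n/2) \le cn + \frac{cn}{2} + T(n/4) \le \dots \le cn \sum_{j=0}^{\infty} \frac{1}{2^j} + T(1) = 2cn + O(1) = O(n).
$$

The per-level costs form a geometric series, so the total is dominated by the first partition.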


**Worst-Case Complexity**

In the worst case (e.g., if the pivot is always the smallest or largest element), one side of the partition has size $n-1$ and the other has size zero:

$$
T(n)=T(n-1)+O(n)=O\left(n^2\right)
$$
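
Unrolling this recurrence makes the quadratic bound explicit. Assuming that partitioning an array of size $m$ costs at most $cm$, with $c$ a constant introduced here for illustration, the costs form an arithmetic series:

$$
T(n) \le c\,n + c\,(n-1) + \dots + c \cdot 1 = c\,\frac{n(n+1)}{2} = O\left(n^2\right).
$$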


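However, because each pivot is chosen uniformly at random, a long run of such extreme pivots is very unlikely (the probability that every pivot is the smallest or largest remaining element is $2^{n-1}/n!$), and the expected running time remains linear for every input.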

**Space complexity**

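The partitioning is done in place, so it needs only $O(1)$ extra space. A recursive implementation additionally uses the call stack: $O(\log n)$ frames in expectation and $O(n)$ in the worst case. Because the recursive call is the last operation performed, it can be replaced by a loop, which brings the total auxiliary space down to $O(1)$.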
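
A minimal sketch of a loop-based variant, which avoids recursion-stack growth so that the extra space stays constant. The name `rselect_iterative` is hypothetical, `i` is 1-indexed, and the input list is reordered in place; the partition here moves the pivot to the end of the current range, which is one of several equivalent choices.

```python
import random


def rselect_iterative(a, i):
    """Return the i-th smallest (1-indexed) element of a using O(1) extra space."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= len(a)")
    lo, hi = 0, len(a) - 1
    target = i - 1                     # index the answer would have in sorted order
    while True:
        if lo == hi:
            return a[lo]
        p = random.randint(lo, hi)     # uniformly random pivot position
        a[p], a[hi] = a[hi], a[p]      # park the pivot at the end of the range
        pivot = a[hi]
        store = lo                     # invariant: a[lo .. store-1] < pivot
        for j in range(lo, hi):
            if a[j] < pivot:
                a[store], a[j] = a[j], a[store]
                store += 1
        a[store], a[hi] = a[hi], a[store]   # pivot lands at its sorted position
        if target == store:
            return a[store]
        if target < store:
            hi = store - 1             # continue in the "less than" part
        else:
            lo = store + 1             # continue in the "greater than" part
```

Tracking the absolute index `target` instead of re-basing `i` at every step keeps the loop state down to the two bounds `lo` and `hi`.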


In [None]:
# Download the test arrays on which the selection algorithm will be run.

# !wget https://raw.githubusercontent.com/bubuloMallone/Algorithms_1/refs/heads/main/datasets/2_quick_sort/dataset.txt
!wget https://raw.githubusercontent.com/bubuloMallone/Algorithms_1/refs/heads/main/datasets/3_selection/testset1.txt
!wget https://raw.githubusercontent.com/bubuloMallone/Algorithms_1/refs/heads/main/datasets/3_selection/testset2.txt


def load_integers(path):
    """Read one integer per line from `path`, skipping blank lines."""
    # The `with` block closes the file even if parsing fails.
    with open(path, 'r') as f:
        return [int(line) for line in f.read().split('\n') if line.strip()]


# dataset = load_integers('dataset.txt')
testset1 = load_integers('testset1.txt')
testset2 = load_integers('testset2.txt')

testset1