Quicksort is an efficient and commonly used sorting algorithm, serving as a method that puts one number of array in correct place for each recursion.
(Image form Wikipedia)
Quicksort is a comparison sort, which indicates that for any kind of dataset that the relation of less or greater than is defined it will work.
From efficiency aspect, quicksort is not a stable algorithm, meaning that the time complexity of quicksort is not same for average and worst case.
Consider quicksort is a kind of in-place algorithm, meaning that during sorting process there are only swap operations and no copy or cover operations, which indicates that only few addition memory will be needed in the process.
Suppose, we have an array of integers A = [1,6,3,2,5,8,7,4]
as input and the supposed output after quicksort is A=[1,2,3,4,5,6,7,8]
.
The main idea of quicksort is recursion. Our mission is to seperate the array into two subarray, less than and great than, then use same algorithm on subarray again.
-
We choose the last number of array as a
key
, in this case,key = 4
, andA[len(A)-1] = key
.Index 0 1 2 3 4 5 6 7 Pointer key Value 1 6 3 2 5 8 7 4 -
We initialize an index
bar = -1
.Index -1 0 1 2 3 4 5 6 7 Pointer bar key Value 1 6 3 2 5 8 7 4 -
For
i
in the range of indices, meaningi++
for each step, do the loop:if the value of
A[i]
is less thankey
,bar++
and then swapA[bar]
andA[i]
.In this instance, for
i=0
,bar=-1
,A[i]=1
is less thankey
, sobar++
,bar=0
, then swapA[bar]
andA[i]
. Becausei==bar
in this case, nothing will change. We have:Index 0 1 2 3 4 5 6 7 Pointer bar key Value 1 6 3 2 5 8 7 4 For
i=1
,bar=0
,A[i]=6
is great thankey
, so do nothing this step. We have:Index 0 1 2 3 4 5 6 7 Pointer bar key Value 1 6 3 2 5 8 7 4 For
i=2
,bar=0
,A[i]=3
is less thankey
, sobar++
,bar=1
, then swapA[bar]
andA[i]
. We have:Index 0 1 2 3 4 5 6 7 Pointer bar key Value 1 3 6 2 5 8 7 4 For
i=3
,bar=1
,A[i]=2
is less thankey
, sobar++
,bar=2
, then swapA[bar]
andA[i]
. We have:Index 0 1 2 3 4 5 6 7 Pointer bar key Value 1 3 2 6 5 8 7 4 For
i=4
,bar=2
,A[i]=5
is great thankey
, so do nothing in this step. Similarly, we do nothing fori=5
andi=6
.Index 0 1 2 3 4 5 6 7 Pointer bar key Value 1 3 2 6 5 8 7 4 -
When the loop reach
key
, which means it reach the last index of array, swapA[bar+1]
andA[key]
.For
i=7
,bar=2
, the loop reach the end of the array, where we set thekey
. So, in this step, we swapA[bar+1]
andA[len(A)-1]
, we have:Index 0 1 2 3 4 5 6 7 Pointer bar bar+1 Value 1 3 2 4 5 8 7 6 By the end of loop, the array have reached a state that the index less than
bar+1
holds smaller number thanA[bar+1]
and index great thanbar+1
holds larger number thanA[bar+1]
. And now we find number 4, theA[bar+1]
and the previouskey
, on the correct position of the sorted array. -
Recursion on subarrays
A[0:bar+1]
andA[bar+2:len(A)]
.In this case,
A[0:bar+1]
is[A[0],A[1],A[2]]
,A[bar+2:len(A)]
is[A[4],A[5],A[6],A[7]]
. Because number 4 is already find the correct place to stay in, so it won't join the recursion.
A = [1,6,3,2,5,8,7,4]
def part(A,s,r):
# s: start point
# r: rear point
key=A[r] # Choose last number as key
bar=s-1 # Set bar as start -1
for i in range(s,r):
if A[i]<=key:
bar+=1
A[bar], A[i] = A[i], A[bar] # Swap
A[bar+1], A[r] = A[r], A[bar+1] # Swar key with bar+1
return bar+1
def qsort(A,s,r):
if s<r:
m=part(A,s,r)
qsort(A,s,m-1)
qsort(A,m+1,r)
qsort(A,0,len(A)-1)
print A
#include "iostream"
using namespace std;
template <class T>
void change(T &a,T &b){
T temp;
temp = a;
a = b;
b = temp;
}
int part(int A[],int s,int r){
// s: start point
// r: rear point
int key=A[r]; // Choose last number as key
int bar=s-1; // Set bar as start -1
for (int i=s;i<r;i++){
if (A[i]<=key){
bar++;
change(A[bar], A[i]); // Swap
}
}
change(A[bar+1], A[r]); // Swar key with bar+1
return bar+1;
}
void qsort(int A[],int s,int r){
if(s<r){
int m=part(A,s,r);
qsort(A,s,m-1);
qsort(A,m+1,r);
}
}
int main(){
int A[]={1,6,3,2,5,8,7,4};
int lenth = sizeof(A)/sizeof(A[0]);
qsort(A,0,lenth-1);
for (int i=0;i<lenth;i++){
cout << A[i];
}
cout << endl;
return 0;
}
Worst case
The worst case is the most unbalanced partition when we divide the list into two sublists of sizes
Best case
The best case is the most balanced case, each recursion we divide the list into two nearly equal subarrays. This means each recursive call processes a list of half the size. It's easy to prove that we only have
\[ f(n) = O(n)\cdot O(\log n)=O(n\log n)\]
Average case
We could use recursion function to solve the average time complexity. See the prove on Wikipedia: Quicksort
By the definition of worst case, we could easily find that the sorted array (positive sequence or inverse) may cause the worst time complexity. To solve this problem and to optimize the algorithm, we could use following approaches:
- Choose key randomly.
- Choose middle index.
- Choose the median of the first, middle and last element
If the number of elements in a subarray is quite small, under a threshold, the recursive method may not efficient anymore. We could switch to some non-recursive algorithm like insertion sort.
Insertion sort is a simple sorting algorithm, could be done in 3 lines of codes, which place one element on its correct order each time. The main idea of insertion sort is to build up the sorted array in-place, and insert the unsorted element into sorted array by inverse search and comparison.
(Image from Wikipedia)
The average and worst time complexity of insertion sort is
Insertion sort, as we mentioned in quicksort, performs well in small dataset so we could use insertion sort to optimize quicksort when the lenth of subarray is less than a threshold.
Suppose, we have an array of integers A = [1,6,3,2,5,8,7,4]
as input and the supposed output after quicksort is A=[1,2,3,4,5,6,7,8]
.
The main idea of insertion sort is to update the sorted array with one new element in unsorted part each iteration.
(Image from Wikipedia)
-
We choose the first number of array as sorted, the main iteration
i
will begin with the second element, which indicates the first element of unsorted array. Sub-iteration variablej
is used for comparison, we will talk it soon.Index 0 1 2 3 4 5 6 7 Pointer j i Value 1 6 3 2 5 8 7 4 -
Compare
A[i]
with sorted element one by one by sub-iterationj
until find the correct position, indicating that is larger than left side elememnt and smaller than right side element. Then update the iteration variable, meaning that the elements before iteration variable are sorted.In this case,
A[j+1]=6>A[j]=1
, so 6 should put after 1, the sorted array is[1,6]
now, iteration variablesi=i+1
,j=i-1
.Index 0 1 2 3 4 5 6 7 Pointer j i Value 1 6 3 2 5 8 7 4 In this case,
A[j+1]=3<A[j]=6
, so 3 should put before 6. We swapA[j]
withA[j+1]
thenj=j-1
. Update sub-iteration variablej
is to find somewhere to stop, meaning that find a element is less than the element we are going to sort.Index 0 1 2 3 4 5 6 7 Pointer j i Value 1 3 6 2 5 8 7 4 In this case,
A[j+1]=3>A[j]=1
, so 3 should put after 1. We find the correct position of 3. Then update the iteration variablesi=i+1
,j=i-1
.Index 0 1 2 3 4 5 6 7 Pointer j i Value 1 3 6 2 5 8 7 4 Same as previous step, we find
A[j+1]=2<A[j]=6
, so swap 2 and 6, thenj=j-1
.Index 0 1 2 3 4 5 6 7 Pointer j i Value 1 3 2 6 5 8 7 4 Still,
A[j+1]=2<A[j]=3
, swap and updatej
.Index 0 1 2 3 4 5 6 7 Pointer j i Value 1 2 3 6 5 8 7 4 We find that
A[j+1]=2<A[j]=1
, so element 2 is in its correct position. Update iteration variablesi=i+1
,j=i-1
.Index 0 1 2 3 4 5 6 7 Pointer j i Value 1 2 3 6 5 8 7 4 Samely, we could replace the rest of array, unsorted array, in their right position.
Index 0 1 2 3 4 5 6 7 Pointer Value 1 2 3 4 5 6 7 8
A = [1,6,3,2,5,8,7,4]
def insert_sort(A):
for i in range(1,len(A)):
for j in range(i,0,-1):
if A[j-1]>A[j]: A[j-1], A[j] = A[j], A[j-1]
return A
insert_sort(A)
print A
#include "iostream"
using namespace std;
void insert_sort(int A[],int lenth){
for (int i=1;i<lenth;i++){
for (int j=i;j>0;j--){
if (A[j-1]>A[j]) swap(A[j-1],A[j]);
}
}
}
int main(){
int A[]={1,6,3,2,5,8,7,4};
int lenth = sizeof(A)/sizeof(A[0]);
insert_sort(A, lenth);
for (int i=0;i<lenth;i++){
cout << A[i];
}
cout << endl;
return 0;
}
Best case
The best case of insertion sort is sorted array. In this case, the whole algorithm process won't swap any elements and for each iteration only one comparison will happened, because sorted array must have A[j+1]>A[j]
. So the time complexity is
Worst case
Inversive sorted array is one of worst case. For all iterations, it needs i
times compar3 and swap operations, and it easily to have the time complexity is
Average case
In average case, we could imagine that the insertion sort need to swap half of elements in the array. So, we could say that the average case is also quadratic, which makes insertion sort impractical for sorting large arrays.