
## Sorting

* **Sorting** is the process of placing elements from a collection in some kind of order. 
* Examples of sorting:
    * Sorting list of words alphabetically or by length
    * Sorting list of cities by population, by area, or zip codes
* Like searching, the efficiency of a sorting algorithm is related to the number of items being processed. 
* Operations that can be used to **Analyze** a sorting process:
    * First, two values must be compared to see which is smaller or larger. Sorting a collection requires a systematic way to compare values and decide whether they are out of order. The **total number of comparisons** is the most common way to **measure** a sort procedure.
    * Second, when values are not in the correct position with respect to one another, it may be necessary to **exchange** them. An exchange is a **costly operation**, so the total number of exchanges also matters when evaluating the overall efficiency of the algorithm. 
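
The word-sorting examples above can be tried directly with Python's built-in `sorted`, which accepts an optional `key` function. This is only a small illustration; the word list is made up for the example.

```python
# Illustrative data only; any list of strings would do.
words = ["pear", "fig", "banana", "kiwi"]

# Default ordering for strings is alphabetical (lexicographic).
alphabetical = sorted(words)

# key=len orders by length; ties keep their original relative order.
by_length = sorted(words, key=len)

print(alphabetical)
print(by_length)
```

The same collection ends up in two different orders depending on how values are compared, which is why the comparison is treated as the basic unit of work in the analysis that follows.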

## The Bubble Sort

* The bubble sort makes multiple passes through a list. 
* It **compares adjacent items** and **exchanges** those that are out of order.
* **Each pass** through the list places the **next largest value** in its proper place.
* In essence, each item **"bubbles up"** to the location where it belongs.
* If there are **n** items in the list, then there are **n-1 pairs** of items that need to be compared on the first pass. 
* At the start of the second pass, the largest value is now in place. There are **n-1** items left to sort, meaning that there will be **n-2 pairs** to compare. 
* Since each pass places the next largest value in place, the total number of passes necessary will be **n-1**. 
* After completing the **n-1** passes, the **smallest item** must be in the correct position with no further processing required.
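
To make the pass-by-pass behaviour described above concrete, here is a minimal sketch that records the list after every pass. The helper name `bubble_passes` is hypothetical and exists only for this illustration; it works on a copy so the input is left untouched.

```python
def bubble_passes(items):
    """Return one snapshot of the list per bubble sort pass."""
    data = list(items)          # work on a copy
    snapshots = []
    for passnum in range(len(data) - 1, 0, -1):
        # Compare each adjacent pair in the still-unsorted part.
        for i in range(passnum):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
        snapshots.append(list(data))
    return snapshots

snapshots = bubble_passes([54, 26, 93, 17])
for number, snapshot in enumerate(snapshots, start=1):
    print("after pass", number, snapshot)
```

For a list of 4 items this produces 3 snapshots, one per pass, and the largest remaining value sits at the end of the unsorted part after each one.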


In [17]:
# SWAP routine in most programming languages, with temporary storage
# temp = alist[i]
# alist[i] = alist[j]
# alist[j] = temp

# SWAP routine in Python, using simultaneous assignment
# alist[i],alist[j] = alist[j],alist[i]

In [18]:
# The Bubble Sort
def bubbleSort(alist):
    for passnum in range(len(alist)-1,0,-1):
        for i in range(passnum):
            if alist[i] > alist[i+1]:
                alist[i],alist[i+1] = alist[i+1],alist[i]
                
alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
bubbleSort(alist)
print(alist)
    

[17, 20, 26, 31, 44, 54, 55, 77, 93]


### Analyze Bubble Sort

* Regardless of how the items are arranged in the initial list, **n-1 passes** will be made to sort a list of size **n**.
* The total number of comparisons is the sum of the first **n-1 integers**.
* Sum of the first **n** integers = $$n(n+1)/2 = n^2/2 + n/2$$
* Sum of the first **n-1** integers = $$(n-1)(n)/2 = n^2/2 - n/2$$, which means the bubble sort performs $$O(n^2)$$ comparisons.
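
The comparison count can be checked empirically. The sketch below is an instrumented copy of the bubble sort loop; the function name `count_bubble_comparisons` is an assumption made for this example, not part of the original code.

```python
def count_bubble_comparisons(items):
    """Run a plain bubble sort on a copy and count the comparisons made."""
    data = list(items)
    comparisons = 0
    for passnum in range(len(data) - 1, 0, -1):
        for i in range(passnum):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return comparisons

for n in (2, 5, 9):
    reversed_items = list(range(n, 0, -1))
    # The count should match n(n-1)/2 regardless of the initial order.
    print(n, count_bubble_comparisons(reversed_items), n * (n - 1) // 2)
```

Because the loop bounds do not depend on the data, an already sorted list of the same length gives the same count.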


In [19]:
# The short bubble sort: stops early once a pass makes no exchanges
def shortBubbleSort(alist):
    exchanges = True
    passnum = len(alist)-1
    while passnum > 0 and exchanges:
        exchanges = False
        for i in range(passnum):
            if alist[i] > alist[i+1]:
                exchanges = True
                alist[i],alist[i+1] = alist[i+1],alist[i]
        passnum = passnum-1

alist = [20,30,40,90,50,60,70,80,100,110]
shortBubbleSort(alist)
print(alist)
    

[20, 30, 40, 50, 60, 70, 80, 90, 100, 110]
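
`shortBubbleSort` stops as soon as a pass completes without any exchange, so a nearly sorted list needs far fewer than **n-1** passes. A rough way to see this is to count the passes; `short_bubble_pass_count` below is a hypothetical helper that mirrors the loop above on a copy of the list.

```python
def short_bubble_pass_count(items):
    """Return how many passes the short bubble sort makes on a copy of items."""
    data = list(items)
    passes = 0
    exchanges = True
    passnum = len(data) - 1
    while passnum > 0 and exchanges:
        exchanges = False
        passes += 1
        for i in range(passnum):
            if data[i] > data[i + 1]:
                exchanges = True
                data[i], data[i + 1] = data[i + 1], data[i]
        passnum -= 1
    return passes

nearly_sorted = [20, 30, 40, 90, 50, 60, 70, 80, 100, 110]
print(short_bubble_pass_count(nearly_sorted))    # one pass to fix 90, one to confirm
print(short_bubble_pass_count([5, 4, 3, 2, 1]))  # reversed input still needs n-1 passes
```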


## The Selection Sort

* The **selection sort** improves on the bubble sort by making **only one exchange** for every pass through the list.
* To do this, a selection sort looks for the largest value as it makes a pass and, after completing the pass, places it in the proper location.
* As with the bubble sort, after the first pass the largest item is in the correct place. After the second pass, the next largest is in its place. This process continues and requires **(n-1)** passes to sort **n** items. 

In [20]:
# Selection sort algorithm - my version
def selectionSort(alist):
    listLen = len(alist)
    while listLen > 1:
        location = 0
        for i in range(listLen):
            if  alist[i] > alist[location]:
                location = i
        
        if location != listLen-1:
            alist[location],alist[listLen-1] = alist[listLen-1],alist[location]
        
        listLen = listLen-1

alist = [54,26,93,17,77,31,44,55,20]
selectionSort(alist)
print(alist)
        

[17, 20, 26, 31, 44, 54, 55, 77, 93]


In [21]:
# Selection sort algorithm - other version
def altSelectionSort(alist):
    for fillslot in range(len(alist)-1,0,-1):
        location = 0
        for i in range(1,fillslot+1):
            if alist[i] > alist[location]:
                location = i
        
        alist[location],alist[fillslot] = alist[fillslot],alist[location]

alist = [54,26,93,17,77,31,44,55,20]
altSelectionSort(alist)
print(alist)

[17, 20, 26, 31, 44, 54, 55, 77, 93]


### Analyze Selection Sort
* Selection sort makes the same number of comparisons as the bubble sort and is therefore also $$O(n^2)$$
* However, due to reduction in the number of exchanges, the selection sort typically executes faster in benchmark studies.
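
The difference in exchanges can be counted directly. The two helpers below (`bubble_exchanges` and `selection_exchanges`, both names invented for this sketch) are instrumented copies of the sorts shown earlier and operate on copies of their input.

```python
def bubble_exchanges(items):
    """Count the exchanges a plain bubble sort performs."""
    data = list(items)
    exchanges = 0
    for passnum in range(len(data) - 1, 0, -1):
        for i in range(passnum):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                exchanges += 1
    return exchanges

def selection_exchanges(items):
    """Count the exchanges a selection sort performs (at most one per pass)."""
    data = list(items)
    exchanges = 0
    for fillslot in range(len(data) - 1, 0, -1):
        location = 0
        for i in range(1, fillslot + 1):
            if data[i] > data[location]:
                location = i
        if location != fillslot:      # skip the swap when already in place
            data[location], data[fillslot] = data[fillslot], data[location]
            exchanges += 1
    return exchanges

sample = [54, 26, 93, 17, 77, 31, 44, 55, 20]
print("bubble:", bubble_exchanges(sample))
print("selection:", selection_exchanges(sample))
```

The selection sort can never exceed **n-1** exchanges, while the bubble sort performs one exchange for every out-of-order pair it encounters.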

## The Insertion Sort

* Insertion sort is of order $$O(n^2)$$, but in practice it tends to perform slightly better than the bubble and selection sorts.
* Insertion sort always maintains a sorted sublist in the lower positions of the list.
* Each new item is then **"inserted"** back into the previous sublist such that the sorted sublist is one item larger.
* Insertion sort algorithm:
    * We begin by assuming that a list with one item (position 0) is already sorted. 
    * On each pass, one for each item 1 through n-1, the current item is checked against those in the already sorted sublist. 
    * As we look back into the already sorted sublist, we shift those items that are greater to the right. When we reach a smaller item or the end of the sublist, the current item can be inserted.
* The maximum number of comparisons for an insertion sort is the sum of the first n-1 integers. Again this is $$O(n^2)$$.
* Note: In general, a **shift operation** requires approximately a third of the processing work of an **exchange operation**, since only one assignment is performed. 
* In benchmark studies, insertion sort will show very good performance.
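
The gap between the best and worst case is easy to observe by counting comparisons. This is a sketch only; `insertion_comparisons` is a hypothetical helper that follows the steps listed above and sorts a copy of its input.

```python
def insertion_comparisons(items):
    """Count item-to-item comparisons made by an insertion sort."""
    data = list(items)
    comparisons = 0
    for i in range(1, len(data)):
        current = data[i]
        j = i
        while j > 0:
            comparisons += 1
            if data[j - 1] > current:
                data[j] = data[j - 1]   # shift the larger item right
                j -= 1
            else:
                break                   # found a smaller-or-equal item
        data[j] = current
    return comparisons

print(insertion_comparisons([1, 2, 3, 4, 5]))   # already sorted: one comparison per pass
print(insertion_comparisons([5, 4, 3, 2, 1]))   # reversed: sum of the first n-1 integers
```

On already sorted input each pass stops after a single comparison, so the total is **n-1** rather than the quadratic maximum.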

In [22]:
# Insertion Sort algorithm.
def insertionSort(alist):
    for i in range(1,len(alist)):
        currentvalue = alist[i]
        j = i
        
        while j > 0 and alist[j-1] > currentvalue:
            alist[j] = alist[j-1]
            j = j-1
        
        alist[j] = currentvalue
        
alist = [54,26,93,17,77,31,44,55,20,1]
insertionSort(alist)
print(alist)  

[1, 17, 20, 26, 31, 44, 54, 55, 77, 93]


## The Shell Sort

* The **Shell sort**, sometimes called the **diminishing increment sort**, improves on the insertion sort by breaking the original list into a number of smaller sublists, each of which is sorted using an **insertion sort**. 
* The unique way that these sublists are chosen is the key to the shell sort.
* The shell sort uses an increment **i**, sometimes called the **gap**, to create a sublist by choosing all items that are **i** items apart.
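
One way to picture the sublists is with extended slicing, where `alist[start::gap]` picks every `gap`-th item beginning at `start`. The sketch below uses a gap of 4, which is the first increment `shellSort` would use for a list of 9 items (`len(alist)//2`); it only displays the sublists and does not sort anything.

```python
alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
gap = len(alist) // 2   # 4 for this list

# One sublist per start position 0 .. gap-1.
sublists = [alist[start::gap] for start in range(gap)]

for start, sublist in enumerate(sublists):
    print("start", start, "->", sublist)
```

Each of these sublists is sorted independently with an insertion sort, which moves items that are far out of place across large distances in a single step.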

### Analyze Shell Sort

* The final pass is a standard **insertion sort** with gap=1. This final insertion sort does not need to do very many comparisons (or shifts), since the list has been pre-sorted by earlier incremental insertion sorts. 
    * In other words, each pass produces a list that is **more sorted** than the previous one. This makes the final pass very efficient. 
* The running time of shell sort falls somewhere between $$O(n)$$ and $$O(n^2)$$.


In [23]:
# Shell Sort Algorithm
def shellSort(alist):
    sublistcount = len(alist)//2
    while sublistcount > 0:
        
        for startposition in range(sublistcount):
            gapInsertionSort(alist,startposition,sublistcount)
        
        print("After increments of size", sublistcount, "The list is", alist)
        
        sublistcount = sublistcount//2
        
def gapInsertionSort(alist,start,gap):
    for i in range(start+gap,len(alist),gap):
        currentvalue = alist[i]
        position = i
        
        # Require position >= gap so position-gap never goes negative
        # (a negative index would silently wrap around to the end of the list)
        while position >= gap and alist[position-gap] > currentvalue:
            alist[position]=alist[position-gap]
            position=position-gap
        
        alist[position]=currentvalue

alist = [54,26,93,17,77,31,44,55,20]
shellSort(alist)
print(alist)

After increments of size 4 The list is [20, 26, 44, 17, 54, 31, 93, 55, 77]
After increments of size 2 The list is [20, 17, 44, 26, 54, 31, 77, 55, 93]
After increments of size 1 The list is [17, 20, 26, 31, 44, 54, 55, 77, 93]
[17, 20, 26, 31, 44, 54, 55, 77, 93]


## The Merge Sort

* Merge sort uses a **divide and conquer** strategy for sorting items.
* Merge sort is a **recursive algorithm** that continually splits a list in half.
* If the list is empty or has one item, it is sorted (**base case**).
* If the list has more than one item, we **split the list** and **recursively invoke** a merge sort on both halves. 
* Once the two halves are sorted, the fundamental operation, called a **merge**, is performed. 
* **Merging** is the process of taking two smaller sorted lists and combining them together into a single sorted list. 
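
The merge step on its own can be sketched as a small standalone function. The name `merge` and the choice to return a new list (rather than writing back into the original, as `mergeSort` does) are assumptions made to keep the example short.

```python
def merge(left, right):
    """Combine two already sorted lists into one new sorted list."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller front item from either list.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has items; they are already in order.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

result = merge([17, 26, 54, 93], [20, 31, 44, 55, 77])
print(result)
```

Every item is appended exactly once, so merging two lists with **n** items in total takes on the order of **n** steps.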

In [24]:
# Merge Sort
def mergeSort(alist):
    print("Splitting ",alist)

    if len(alist) > 1:              # Recursive case (length 0 or 1 is the base case)
        mid = len(alist)//2
        lefthalf = alist[:mid]
        righthalf = alist[mid:]
        
        mergeSort(lefthalf)
        mergeSort(righthalf)
        
        i=0
        j=0
        k=0
        while i < len(lefthalf) and j < len(righthalf):
            if lefthalf[i] <= righthalf[j]:   # <= keeps equal items in their original order (stable)
                alist[k] = lefthalf[i]
                i=i+1
            else:
                alist[k] = righthalf[j]
                j=j+1
            k=k+1
        
        while i < len(lefthalf):
            alist[k] = lefthalf[i]
            i=i+1
            k=k+1
            
        while j < len(righthalf):
            alist[k] = righthalf[j]
            j=j+1
            k=k+1
        
    print("Merging ", alist)

alist = [54,26,93,17,77,31,44,55,20]
mergeSort(alist)
print(alist)

Splitting  [54, 26, 93, 17, 77, 31, 44, 55, 20]
Splitting  [54, 26, 93, 17]
Splitting  [54, 26]
Splitting  [54]
Merging  [54]
Splitting  [26]
Merging  [26]
Merging  [26, 54]
Splitting  [93, 17]
Splitting  [93]
Merging  [93]
Splitting  [17]
Merging  [17]
Merging  [17, 93]
Merging  [17, 26, 54, 93]
Splitting  [77, 31, 44, 55, 20]
Splitting  [77, 31]
Splitting  [77]
Merging  [77]
Splitting  [31]
Merging  [31]
Merging  [31, 77]
Splitting  [44, 55, 20]
Splitting  [44]
Merging  [44]
Splitting  [55, 20]
Splitting  [55]
Merging  [55]
Splitting  [20]
Merging  [20]
Merging  [20, 55]
Merging  [20, 44, 55]
Merging  [20, 31, 44, 55, 77]
Merging  [17, 20, 26, 31, 44, 54, 55, 77, 93]
[17, 20, 26, 31, 44, 54, 55, 77, 93]


### Analyze Merge Sort

* Two distinct processes make up the implementation of **merge sort**:
    * First, the list is **split** into halves. We can divide the list in half **log(n)** times (base 2), where **n** is the length of the list.
    * Second, the items in the sublists are **merged**. Each item in the list will eventually be processed and placed on the **sorted list**. So a merge operation that results in a list of size **n** requires **n operations**. 
* The result of this analysis is **log(n)** levels of splitting, each of which costs **n** operations to merge, for a total of **nlog(n)**.
* A **merge sort** is an **O(nlog(n)) algorithm**
* <font color='red'>Note: mergeSort function requires extra space to hold the two halves as they are extracted with the slicing operations. This additional space can be a critical factor if the list is large and can make this sort problematic when working on large data sets.</font>
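
The split-and-merge argument in this analysis can also be written as a recurrence. As a sketch, assume that $$n$$ is a power of two and that merging $$n$$ items costs exactly $$n$$ operations (both are simplifying assumptions, not claims about the code above):

$$T(n) = 2\,T(n/2) + n, \qquad T(1) = 0$$

Expanding the recurrence $$k$$ times gives

$$T(n) = k\,n + 2^{k}\,T(n/2^{k})$$

and setting $$k = \log_2 n$$ so that $$n/2^{k} = 1$$ yields

$$T(n) = n \log_2 n$$

which agrees with the $$O(n \log n)$$ result stated above.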

## The Quick Sort

* The **quick sort** uses **divide and conquer** to gain the same advantage as the merge sort, while **not using additional storage**.
    * <font color='red'>Disadvantage: If the list is not divided in half, we will see that the performance is diminished</font>
* A quick sort first selects a value, which is called the **pivot value**.  Although there are many ways to choose the pivot value, we will simply use the **first item in the list**. 
* The role of the pivot value is to assist with splitting the list.
* The actual position where the pivot value belongs in the final sorted list, commonly called the **split point**, will be used to divide the list for subsequent calls to the quick sort. 
* After the pivot value is chosen, the **partition** process happens next. It finds the split point and at the same time moves other items to the appropriate side of the list, either less than or greater than the pivot value.
* Partitioning begins by locating **two position markers**, **leftmark** and **rightmark**, at the beginning and end of the remaining items (excluding the pivot value) in the list.
* **Goal of the partition process** is to move items that are on the wrong side with respect to the pivot value while also, **converging on the split point**. 
    * We begin by incrementing **leftmark** until we locate a value that is greater than the pivot value. 
    * We then decrement **rightmark** until we find a value that is less than the pivot value. 
    * At this point we have discovered two items that are out of place with respect to the eventual split point. 
    * Now we can **exchange** these two items and then repeat the process again.
    * At the point where **rightmark** becomes less than **leftmark**, we stop. The position of the **rightmark is the split point**.
    * The **pivot value** can be **exchanged** with the contents of the **split point** and the pivot value is now in place. 
    * The list can now be divided at the split point and the quick sort can be invoked **recursively** on the two halves. 

In [25]:
# The Quick Sort algorithm
def quickSort(alist):
    quickSortHelper(alist,0,len(alist)-1)

def quickSortHelper(alist,first,last):
    if first < last:
        splitpoint = partition(alist,first,last)
        
        quickSortHelper(alist,first,splitpoint-1)
        quickSortHelper(alist,splitpoint+1,last)

def partition(alist, first, last):
    pivotvalue = alist[first]
    leftmark = first+1
    rightmark = last
    
    done = False
    while not done:
        # Check the bounds first so alist[leftmark] is never read past the end of the list
        while leftmark <= rightmark and alist[leftmark] <= pivotvalue:
            leftmark = leftmark+1

        while rightmark >= leftmark and alist[rightmark] >= pivotvalue:
            rightmark = rightmark-1

        if rightmark < leftmark:
            done = True
        else:
            alist[leftmark],alist[rightmark]=alist[rightmark],alist[leftmark]
        
    alist[first],alist[rightmark]=alist[rightmark],alist[first]
    
    return rightmark

alist = [54,26,93,17,77,31,44,55,20]
quickSort(alist)
print(alist)
    
    
    

[17, 20, 26, 31, 44, 54, 55, 77, 93]


### Analyze Quick Sort

* For a list of length **n**, if the partition always occurs in the **middle** of the list, there will be **log(n)** divisions. 
* In order to find the **split point**, each of the **n** items needs to be checked against the pivot value. The result is **nlog(n)**.
* **Advantage over merge sort**: no additional memory needed.
* Unfortunately, in the **worst case**, the split point may not be in the middle and can be very skewed to the left or the right, leaving a very uneven division. In this case the result is an **$$O(n^2)$$** sort with all the overhead that recursion requires.
* To alleviate the potential for **uneven division**, we can use a technique called **median of three**. To choose the **pivot value**, we consider the **first, middle and last** elements in the list and use the median of those three values as the pivot. 
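
A minimal sketch of median-of-three pivot selection follows. The helper name `median_of_three_index` and the decision to return an index (so the caller can swap that item into the `first` position before partitioning) are assumptions made for this illustration.

```python
def median_of_three_index(alist, first, last):
    """Return the index of the median of alist[first], alist[mid], alist[last]."""
    mid = (first + last) // 2
    # Pair each candidate value with its index, then sort the three pairs.
    candidates = sorted([(alist[first], first),
                         (alist[mid], mid),
                         (alist[last], last)])
    return candidates[1][1]   # the middle pair holds the median value

alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
pivot_index = median_of_three_index(alist, 0, len(alist) - 1)
print(pivot_index, alist[pivot_index])

already_sorted = [1, 2, 3, 4, 5]
print(median_of_three_index(already_sorted, 0, len(already_sorted) - 1))
```

For an already sorted list, taking the first item as the pivot gives the most uneven split possible, whereas the median of three picks the middle item. To use this with the `partition` function above, one option is to exchange `alist[first]` with the item at the returned index before partitioning.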

## Summary

* A bubble sort, a selection sort, and an insertion sort are $$O(n^2)$$ algorithms.
* A shell sort improves on the insertion sort by sorting incremental sublists. It falls between $$O(n)$$ and $$O(n^2)$$.
* A merge sort is $$O(n \log n)$$, but requires additional space for the merging process.
* A quick sort is $$O(n \log n)$$, but may degrade to $$O(n^2)$$ if the split points are not near the middle of the list. It does not require additional space.