# Bubble Sort, Merge Sort, Quicksort and Divide-and-Conquer Algorithms in Python

Let's check out our problem from lesson 3 of the Jovian course:

## Problem 


In this notebook, we'll focus on solving the following problem:

> **QUESTION 1**: You're working on a new feature on Jovian called "Top Notebooks of the Week". Write a function to sort a list of notebooks in decreasing order of likes. Keep in mind that up to millions of notebooks can be created every week, so your function needs to be as efficient as possible.


Sorting a list of objects comes up over and over in computer science and software development, so it's important to understand the common approaches to sorting and the trade-offs they offer. Before we solve the problem above, we'll solve a simplified version of it:

> **QUESTION 2**: Write a program to sort a list of numbers.


"Sorting" usually refers to "sorting in ascending order", unless specified otherwise.

Remember our method of attack:

1. State the problem clearly, in your own words. Identify the input and output.
2. Come up with some examples for test cases using the inputs and outputs. Cover your edge cases.
3. Find a simple solution to the problem in English (or whatever language you prefer).
4. Implement your solution and test it with the example inputs. Squash your bugs.
5. Analyze your algorithm's time and space complexity.
6. Optimize your code for any inefficiencies.

Sorting solutions are essential to solving common problems in computer science.

Generally, we'll have inputs and outputs like these:

**Input**

1. `nums`: A list of numbers, e.g. `[4, 2, 6, 3, 4, 6, 2, 1]`

**Output**

1. `sorted_nums`: The sorted version of `nums`, e.g. `[1, 2, 2, 3, 4, 4, 6, 6]`

Here are some cases that we need to test:

1. Lists of numbers in random order.
2. Lists that have been sorted.
3. Lists sorted in descending order.
4. Lists with repeating elements.
5. Empty lists.
6. Lists with one element.
7. Lists with one element that is repeated many times.
8. Long lists.

```python
def sort(nums):
    pass
```

```python
# Random list with no repeated elements
test0 = {
    'input': {
        'nums': [4, 6, 3, 8, 5, 7, 2, 1]
    },
    'output': [1, 2, 3, 4, 5, 6, 7, 8]
}
```

```python
# Random list with negative elements
test1 = {
    'input': {
        'nums': [5, 2, 6, 1, 23, 7, -12, 12, -243, 0]
    },
    'output': [-243, -12, 0, 1, 2, 5, 6, 7, 12, 23]
}
```

```python
# Sorted list
test2 = {
    'input': {
        'nums': [3, 5, 6, 8, 9, 10, 99]
    },
    'output': [3, 5, 6, 8, 9, 10, 99]
}
```

```python
# List in descending order
test3 = {
    'input': {
        'nums': [99, 10, 9, 8, 6, 5, 3]
    },
    'output': [3, 5, 6, 8, 9, 10, 99]
}
```

```python
# Random list with repeating elements
test4 = {
    'input': {
        'nums': [5, -12, 2, 6, 1, 23, 7, 7, -12, 6, 12, 1, -243, 1, 0]
    },
    'output': [-243, -12, -12, 0, 1, 1, 1, 2, 5, 6, 6, 7, 7, 12, 23]
}
```

```python
# Empty list
test5 = {
    'input': {
        'nums': []
    },
    'output': []
}
```

```python
# List with one element
test6 = {
    'input': {
        'nums': [23]
    },
    'output': [23]
}
```

```python
# List with one element repeated many times
test7 = {
    'input': {
        'nums': [23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23]
    },
    'output': [23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23]
}
```


```python
# Really long list
# Use helper functions from the random module, so you don't have to build the list by hand:
import random

in_list = list(range(10000))
out_list = list(range(10000))

random.shuffle(in_list)

test8 = {
    'input': {
        'nums': in_list
    },
    'output': out_list
}
```

```python
tests = [test0, test1, test2, test3, test4, test5, test6, test7, test8]
```

First, we should come up with a simple solution in English.

1. Iterate over a given list.
2. Compare one number with the next.
3. Swap the two numbers if the first is greater than the second.
4. Repeat steps 1-3 until the list is sorted.

We need to repeat steps 1-3 at most n-1 times: each full pass moves the largest remaining unsorted element to its final position at the end of the list, so after n-1 passes every element is in place.

This method is called **bubble sort**, because the larger elements "bubble" towards the end of the list with each pass, while the smaller ones sink towards the front.

See the following for a visual representation:

![](https://upload.wikimedia.org/wikipedia/commons/c/c8/Bubble-sort-example-300px.gif)

Next, we will implement a solution.

```python
def bubble_sort(nums):
    # Copy the list so the input is not modified
    nums = list(nums)

    # Repeat the process n-1 times
    for _ in range(len(nums) - 1):

        # Iterate over each element in the list except the last
        for i in range(len(nums) - 1):

            # Compare the number with the next one
            if nums[i] > nums[i + 1]:

                # Swap the numbers. (We can do both at once, because Python is awesome.)
                nums[i], nums[i + 1] = nums[i + 1], nums[i]

    # Return the sorted list
    return nums
```

Testing, attention please:

```python
nums0, output0 = test0['input']['nums'], test0['output']

print('Input:', nums0)
print('Expected output:', output0)
result0 = bubble_sort(nums0)
print('Actual output:', result0)
print('Match:', result0 == output0)
```

```
Input: [4, 6, 3, 8, 5, 7, 2, 1]
Expected output: [1, 2, 3, 4, 5, 6, 7, 8]
Actual output: [1, 2, 3, 4, 5, 6, 7, 8]
Match: True
```


Let's think about time complexity.

Bubble sort has two nested loops, and each of them iterates $n-1$ times. Since neither loop exits early, bubble sort as written performs $(n-1) \cdot (n-1)$ comparisons for every input:

$$(n-1) \cdot (n-1) = n^2 - 2n + 1$$

We drop the constants and lower-order terms and keep only the highest power of $n$, which gives a time complexity of $O(n^2)$. This is also called quadratic complexity.
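To check this count empirically, here's a sketch that instruments a copy of `bubble_sort` with a counter (the name `bubble_sort_comparisons` is ours, for illustration only). Because neither loop in `bubble_sort` exits early, the count comes out to exactly $(n-1)^2$ for any input of length $n$, whether it is shuffled, sorted or reversed:

```python
def bubble_sort_comparisons(nums):
    """Run the same loops as bubble_sort and return how many comparisons were made."""
    nums = list(nums)
    comparisons = 0
    for _ in range(len(nums) - 1):          # n - 1 passes
        for i in range(len(nums) - 1):      # n - 1 comparisons per pass
            comparisons += 1
            if nums[i] > nums[i + 1]:
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
    return comparisons


for n in [10, 100, 500]:
    worst_case = list(range(n, 0, -1))  # descending order
    print(n, bubble_sort_comparisons(worst_case), (n - 1) ** 2)
# 10 81 81
# 100 9801 9801
# 500 249001 249001
```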

What about space complexity?

The swaps themselves need only a constant amount of extra space, but our function begins by copying the input list, so its space complexity is $O(n)$.

Where is the inefficiency?

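Bubble sort's running time grows quadratically with the size of the list, which makes it very slow for large lists compared to more efficient algorithms. Each comparison can move an element by only one position, and as written the function keeps making passes even after the list is already sorted.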
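A common optimization, not part of the implementation above, is to stop as soon as a full pass makes no swaps (which means the list is sorted) and to skip the end of the list that earlier passes have already put in place. The worst case is still $O(n^2)$, but an already sorted list now needs only a single pass, which is $O(n)$. Here is a sketch using the hypothetical name `bubble_sort_early_exit`:

```python
def bubble_sort_early_exit(nums):
    """Bubble sort that stops once a pass makes no swaps (returns a sorted copy)."""
    nums = list(nums)
    n = len(nums)
    for pass_num in range(n - 1):
        swapped = False
        # After pass_num passes, the last pass_num elements are already in place
        for i in range(n - 1 - pass_num):
            if nums[i] > nums[i + 1]:
                nums[i], nums[i + 1] = nums[i + 1], nums[i]
                swapped = True
        # No swaps means every adjacent pair is in order, so the list is sorted
        if not swapped:
            break
    return nums


print(bubble_sort_early_exit([4, 6, 3, 8, 5, 7, 2, 1]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```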

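Let's look at another algorithm with the same $O(n^2)$ complexity, **insertion sort**, which takes each element in turn and inserts it into its correct position among the already sorted elements before it. After that, we'll improve our efficiency.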



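```python
def insertion_sort(nums):
    # Copy the list so the input is not modified
    nums = list(nums)
    for i in range(len(nums)):
        # Remove the current element; nums[:i] is already sorted
        cur = nums.pop(i)
        # Walk left past every sorted element greater than cur
        j = i - 1
        while j >= 0 and nums[j] > cur:
            j -= 1
        # Insert cur just after the first element that is <= cur
        nums.insert(j + 1, cur)
    return nums
```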

```python
nums0, output0 = test0['input']['nums'], test0['output']

print('Input:', nums0)
print('Expected output:', output0)
result0 = insertion_sort(nums0)
print('Actual output:', result0)
print('Match:', result0 == output0)
```

```
Input: [4, 6, 3, 8, 5, 7, 2, 1]
Expected output: [1, 2, 3, 4, 5, 6, 7, 8]
Actual output: [1, 2, 3, 4, 5, 6, 7, 8]
Match: True
```
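Insertion sort spends its time in two places: searching for the insertion point and shifting elements to make room. Since the elements before the current one are already sorted, the search can use binary search. Here is a sketch of that variant, often called binary insertion sort, built on the standard library's `bisect.insort_right` (the function name `binary_insertion_sort` is our own). It cuts the number of comparisons to $O(n \log n)$, but inserting into a Python list still shifts elements, so the worst case remains $O(n^2)$:

```python
import bisect


def binary_insertion_sort(nums):
    """Return a sorted copy of nums, inserting each element with binary search."""
    sorted_part = []
    for x in nums:
        # Binary search for the insertion point (O(log n) comparisons), then insert.
        # insort_right places x after any equal elements, keeping the sort stable.
        # The insert itself still shifts up to len(sorted_part) elements.
        bisect.insort_right(sorted_part, x)
    return sorted_part


print(binary_insertion_sort([4, 6, 3, 8, 5, 7, 2, 1]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```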


To sort more efficiently than $O(n^2)$, we can divide and conquer:

1. Divide the inputs into two roughly equal parts.
2. Recursively solve the problem for each of the two parts.
3. Combine the results to solve the problem for the original inputs.
4. Include terminating conditions for small or indivisible inputs.

Here's a visual representation of the strategy:

![](https://www.educative.io/api/edpresso/shot/5327356208087040/image/6475288173084672)

Applying this divide-and-conquer strategy to sorting gives us an algorithm called merge sort.

### Merge Sort

Here's an example:


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Merge_sort_algorithm_diagram.svg/2560px-Merge_sort_algorithm_diagram.svg.png" width="480">



Roughly, the idea follows these steps:

1. If the input is empty or has one element, it is sorted. Return it.
2. Otherwise, divide the list into two roughly equal halves.
3. Sort each half recursively using merge sort.
4. Merge the two sorted halves into a single sorted list.

Here's a visual representation: https://youtu.be/GW0USDwhBgo?t=28


```python
# Here's how to embed the video in our notebook:
from IPython.display import YouTubeVideo

YouTubeVideo('GW0USDwhBgo', width=600, height=300, start=28)
```

```python
# Implement merge sort like this

def merge_sort(nums):
    # Base case: an empty or single-element list is already sorted
    if len(nums) <= 1:
        return nums

    # Get the midpoint
    mid = len(nums) // 2

    # Split the list in half
    left = nums[:mid]
    right = nums[mid:]

    # Recursively sort each half
    left_sorted, right_sorted = merge_sort(left), merge_sort(right)

    # Combine the two sorted halves
    sorted_nums = merge(left_sorted, right_sorted)

    return sorted_nums
```


`merge_sort` won't run until we define the `merge` function that it calls. Let's do that after we visualize the merge operation:

<img src="https://i.imgur.com/XeEpa0U.png" width="480">



```python
def merge(nums1, nums2):
    # List to store the merged result
    merged = []

    # Indices for iterating over nums1 and nums2
    i, j = 0, 0

    # Loop until we reach the end of either list
    while i < len(nums1) and j < len(nums2):

        # Append the smaller element to the result and move to the next element
        if nums1[i] <= nums2[j]:
            merged.append(nums1[i])
            i += 1
        else:
            merged.append(nums2[j])
            j += 1

    # Remaining elements (at most one of these tails is non-empty)
    nums1_tail = nums1[i:]
    nums2_tail = nums2[j:]

    # Return the merged list
    return merged + nums1_tail + nums2_tail
```
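Python's standard library provides the same merging step as `heapq.merge`, which lazily merges already sorted iterables. It's handy as a cross-check for our `merge` function, and it also accepts `key` and `reverse` arguments; `reverse=True` is for inputs that are sorted in descending order. A quick sketch:

```python
import heapq

# heapq.merge returns an iterator, so wrap it in list() to see the result
merged = list(heapq.merge([1, 4, 7, 9, 11], [-1, 0, 2, 3, 8, 12]))
print(merged)  # [-1, 0, 1, 2, 3, 4, 7, 8, 9, 11, 12]

# With reverse=True, the inputs must already be sorted in descending order
descending = list(heapq.merge([9, 5, 1], [8, 2], reverse=True))
print(descending)  # [9, 8, 5, 2, 1]
```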

"Testing, attention please" - Eminem, probably

```python
merge([1, 4, 7, 9, 11], [-1, 0, 2, 3, 8, 12])
```

```
[-1, 0, 1, 2, 3, 4, 7, 8, 9, 11, 12]
```

Yay, it works. Now let's test the `merge_sort` function:

```python
nums0, output0 = test0['input']['nums'], test0['output']

print('Input:', nums0)
print('Expected output:', output0)
result0 = merge_sort(nums0)
print('Actual output:', result0)
print('Match:', result0 == output0)
```

```
Input: [4, 6, 3, 8, 5, 7, 2, 1]
Expected output: [1, 2, 3, 4, 5, 6, 7, 8]
Actual output: [1, 2, 3, 4, 5, 6, 7, 8]
Match: True
```


Analyzing a recursive algorithm like merge sort is trickier. To see what happens, let's track the recursion depth and add print statements that show each call to `merge_sort` and `merge`.

```python
def merge(nums1, nums2, depth=0):
    print('  '*depth, 'merge:', nums1, nums2)
    i, j, merged = 0, 0, []
    while i < len(nums1) and j < len(nums2):
        if nums1[i] <= nums2[j]:
            merged.append(nums1[i])
            i += 1
        else:
            merged.append(nums2[j])
            j += 1
    return merged + nums1[i:] + nums2[j:]

def merge_sort(nums, depth=0):
    print('  '*depth, 'merge_sort:', nums)
    if len(nums) < 2:
        return nums
    mid = len(nums) // 2
    return merge(merge_sort(nums[:mid], depth+1),
                 merge_sort(nums[mid:], depth+1),
                 depth+1)
```

```python
merge_sort([5, -12, 2, 6, 1, 23, 7, 7, -12])
```

```
 merge_sort: [5, -12, 2, 6, 1, 23, 7, 7, -12]
   merge_sort: [5, -12, 2, 6]
     merge_sort: [5, -12]
       merge_sort: [5]
       merge_sort: [-12]
       merge: [5] [-12]
     merge_sort: [2, 6]
       merge_sort: [2]
       merge_sort: [6]
       merge: [2] [6]
     merge: [-12, 5] [2, 6]
   merge_sort: [1, 23, 7, 7, -12]
     merge_sort: [1, 23]
       merge_sort: [1]
       merge_sort: [23]
       merge: [1] [23]
     merge_sort: [7, 7, -12]
       merge_sort: [7]
       merge_sort: [7, -12]
         merge_sort: [7]
         merge_sort: [-12]
         merge: [7] [-12]
       merge: [7] [-12, 7]
     merge: [1, 23] [-12, 7, 7]
   merge: [-12, 2, 5, 6] [-12, 1, 7, 7, 23]
```

```
[-12, -12, 1, 2, 5, 6, 7, 7, 23]
```

The trace shows that merge sort calls itself recursively on each half of the list, halving the size at every level until the sublists have 0 or 1 elements. The merging happens on the way back up: each merge repeatedly compares the front elements of two sorted lists and appends the smaller one to a new list. Since there are about $\log_2 n$ levels of recursion and the merges at each level process $n$ elements in total, merge sort runs in $O(n \log n)$ time.
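One way to see the $O(n \log n)$ bound is to write the running time $T(n)$ as a recurrence. Assume, for simplicity, that $n$ is a power of 2 and that merging two lists with $n$ elements in total takes at most $c\,n$ steps for some constant $c$:

$$
\begin{aligned}
T(n) &= 2\,T\!\left(\tfrac{n}{2}\right) + c\,n \\
     &= 4\,T\!\left(\tfrac{n}{4}\right) + 2\,c\,n \\
     &= 8\,T\!\left(\tfrac{n}{8}\right) + 3\,c\,n \\
     &\;\;\vdots \\
     &= 2^k\,T\!\left(\tfrac{n}{2^k}\right) + k\,c\,n
\end{aligned}
$$

After $k = \log_2 n$ halvings the sublists have one element each, so $T(n) = n\,T(1) + c\,n \log_2 n$, which is $O(n \log n)$.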

This is much faster than bubble sort and can sort lists with hundreds of thousands of elements efficiently. However, merge sort allocates new lists for every split and merge, so it needs $O(n)$ extra memory. The next algorithm we'll implement, quicksort, sorts the list in place instead.

### Quick Sort

1. If the list is empty or has one element, return it, because it is already sorted.
2. Pick an element from the list to act as the *pivot*. (Our implementation below uses the last element; picking a random element is a common alternative.)
3. Partition the list so that the numbers less than or equal to the pivot come before it and the rest come after it. The pivot is now in its final position.
4. Call quicksort recursively on the part before the pivot and on the part after it.
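To make the four steps concrete, here is a short sketch that follows them directly but builds new lists instead of rearranging the original one in place (the name `quicksort_simple` is ours). It picks a random pivot, and it splits the list three ways (less than, equal to and greater than the pivot) so that every recursive call is on a strictly smaller list, even when the list has many repeated elements. The price is $O(n)$ extra memory for the new lists, which the in-place version avoids:

```python
import random


def quicksort_simple(nums):
    """Return a sorted copy of nums (not in place)."""
    # Step 1: an empty or single-element list is already sorted
    if len(nums) <= 1:
        return list(nums)
    # Step 2: pick a random pivot
    pivot = random.choice(nums)
    # Step 3: partition into elements less than, equal to and greater than the pivot
    less = [x for x in nums if x < pivot]
    equal = [x for x in nums if x == pivot]
    greater = [x for x in nums if x > pivot]
    # Step 4: sort the two outer parts recursively and join the pieces
    return quicksort_simple(less) + equal + quicksort_simple(greater)


print(quicksort_simple([4, 6, 3, 8, 5, 7, 2, 1]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```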

Check out this visual representation:

![](https://images.deepai.org/glossary-terms/a5228ea07c794b468efd1b7f758b9ead/Quicksort.png)



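```python
def quicksort(nums, start=0, end=None):
    # On the first call, copy the list and sort the whole range
    if end is None:
        nums = list(nums)
        end = len(nums) - 1

    # Ranges with fewer than two elements are already sorted
    if start < end:
        # Partition around a pivot; it ends up at its final index
        pivot = partition(nums, start, end)
        # Sort the parts on either side of the pivot
        quicksort(nums, start, pivot - 1)
        quicksort(nums, pivot + 1, end)

    return nums
```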

Here's how the partition operation works ([source](https://medium.com/basecs/pivoting-to-understand-quicksort-part-1-75178dfb9313)):

<img src="https://i.imgur.com/Igk7Kr4.png" width="420">


We need to write an implementation of partition before we can run our Quicksort function:

```python
def partition(nums, start=0, end=None):
    if end is None:
        end = len(nums) - 1

    # The last element, nums[end], is the pivot.
    # Initialize left and right pointers
    l, r = start, end - 1

    # Iterate until the pointers meet
    while r > l:
        # Increment left if the number is less than or equal to the pivot
        if nums[l] <= nums[end]:
            l += 1

        # Decrement right if the number is greater than the pivot
        elif nums[r] > nums[end]:
            r -= 1

        # Otherwise both elements are out of place: swap them
        else:
            nums[l], nums[r] = nums[r], nums[l]

    # Place the pivot between the two parts
    if nums[l] > nums[end]:
        nums[l], nums[end] = nums[end], nums[l]
        return l
    else:
        return end
```


And away we go:

```python
l1 = [1, 5, 6, 2, 0, 11, 3]
pivot = partition(l1)
print(l1, pivot)
```

```
[1, 0, 2, 3, 5, 11, 6] 3
```


Let's see Quicksort in action:

```python
nums0, output0 = test0['input']['nums'], test0['output']

print('Input:', nums0)
print('Expected output:', output0)
result0 = quicksort(nums0)
print('Actual output:', result0)
print('Match:', result0 == output0)
```

```
Input: [4, 6, 3, 8, 5, 7, 2, 1]
Expected output: [1, 2, 3, 4, 5, 6, 7, 8]
Actual output: [1, 2, 3, 4, 5, 6, 7, 8]
Match: True
```


There we have it! You can run all the test cases at once with Jovian's test-case evaluation helper, which you import from `jovian.pythondsa`. This works best on the jovian.ai website.
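If you aren't working on jovian.ai, a few lines of plain Python can do a similar job. The sketch below is a hypothetical `run_tests` helper (not part of Jovian's library) that assumes the test-case format used above: a dict with `'input': {'nums': ...}` and `'output'`. It also checks that the sort function left its input unchanged, since the same test lists are reused across all our sorting functions. To keep the block runnable on its own, it is demonstrated with Python's built-in `sorted` on a few small tests; in the notebook you would call, for example, `run_tests(quicksort, tests)`. Expect `bubble_sort` and `insertion_sort` to take a while on the 10,000-element test.

```python
def run_tests(sort_fn, tests):
    """Hypothetical helper: run sort_fn on each test case and report the results.

    Each test is assumed to be a dict like {'input': {'nums': [...]}, 'output': [...]}.
    Returns a list of booleans, one per test.
    """
    results = []
    for index, test in enumerate(tests):
        nums = test['input']['nums']
        original = list(nums)  # snapshot, to detect sorts that modify their input
        actual = sort_fn(nums)
        passed = actual == test['output'] and nums == original
        results.append(passed)
        print('Test {}: {}'.format(index, 'PASSED' if passed else 'FAILED'))
    print('{} of {} tests passed'.format(sum(results), len(results)))
    return results


# Self-contained demo using Python's built-in sorted as the sort function
demo_tests = [
    {'input': {'nums': [4, 6, 3, 8, 5, 7, 2, 1]}, 'output': [1, 2, 3, 4, 5, 6, 7, 8]},
    {'input': {'nums': []}, 'output': []},
    {'input': {'nums': [23, 23, 23]}, 'output': [23, 23, 23]},
]
run_tests(sorted, demo_tests)
```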

### Quicksort Time Complexity


Best case partitioning:

<img src="https://i.imgur.com/DgvYvnG.png" width="480">


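If we partition the list into two nearly equal parts every time, the analysis is similar to that of merge sort: there are about $\log n$ levels of recursion with $O(n)$ work per level, so quicksort runs in $O(n \log n)$ time. This is the best case, and with a randomly chosen pivot the expected (average-case) running time is also $O(n \log n)$.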


Worst case partitioning visualization:


<img src="https://cdn.kastatic.org/ka-perseus-images/7da2ac32779bef669a6f05decb62f219a9132158.png" width="480">

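In this case, partition is called about `n` times, on lists of sizes `n`, `n-1`, `n-2` and so on. This happens, for example, when the list is already sorted and we always pick the last element as the pivot. The total number of comparisons is then roughly $n + (n-1) + (n-2) + \dots + 2 + 1 = \frac{n(n+1)}{2}$, so the worst-case complexity of quicksort is $O(n^2)$.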

Even with the worst case time complexity equal to $O(n^2)$, Quicksort is preferred in many situations, because its running time is closer to $O(n \log n)$ in practice if you pick a pivot well. 

Here are a couple of ways to pick a good pivot:

- [Picking a random pivot](https://cs.stackexchange.com/questions/7582/what-is-the-advantage-of-randomized-quicksort)
- [Picking median of medians](https://en.wikipedia.org/wiki/Median_of_medians)
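The sketch below compares two pivot choices by counting comparisons (the function name `quicksort_counting` and its `random_pivot` flag are ours, for illustration). It uses the Lomuto partition scheme, a simpler alternative to the two-pointer `partition` above, and an explicit stack instead of recursion, since a bad pivot on a long sorted list would otherwise recurse about `n` levels deep. With the last element as the pivot, each partition of `k` elements makes `k - 1` comparisons, so an already sorted list of `n` elements costs exactly $n(n-1)/2$ comparisons. Swapping a randomly chosen element into the last position first makes this bad case very unlikely:

```python
import random


def quicksort_counting(nums, random_pivot=False):
    """Return (sorted copy of nums, number of comparisons made)."""
    nums = list(nums)
    comparisons = 0

    def partition(start, end):
        # Lomuto partition of nums[start..end] around the pivot nums[end]
        nonlocal comparisons
        if random_pivot:
            # Move a randomly chosen element into the pivot position
            k = random.randint(start, end)
            nums[k], nums[end] = nums[end], nums[k]
        pivot = nums[end]
        boundary = start  # nums[start:boundary] holds elements <= pivot
        for i in range(start, end):
            comparisons += 1
            if nums[i] <= pivot:
                nums[i], nums[boundary] = nums[boundary], nums[i]
                boundary += 1
        # Put the pivot right after the elements <= pivot
        nums[boundary], nums[end] = nums[end], nums[boundary]
        return boundary

    # Explicit stack of (start, end) ranges that still need sorting
    stack = [(0, len(nums) - 1)]
    while stack:
        start, end = stack.pop()
        if start < end:
            p = partition(start, end)
            stack.append((start, p - 1))
            stack.append((p + 1, end))
    return nums, comparisons


n = 500
already_sorted = list(range(n))
print(quicksort_counting(already_sorted)[1])  # 124750, i.e. n * (n - 1) // 2
print(quicksort_counting(already_sorted, random_pivot=True)[1])  # varies; usually far fewer
```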



## Custom Comparison Functions

Let's return to our original problem statement now.

> **QUESTION 1**: You're working on a new feature on Jovian called "Top Notebooks of the Week". Write a function to sort a list of notebooks in decreasing order of likes. Keep in mind that up to millions of notebooks can be created every week, so your function needs to be as efficient as possible.

This time we need to sort objects rather than numbers, in decreasing order of likes. We can reuse our earlier algorithm by passing it a custom comparison function that decides which of two notebooks should come first.

Let's do that now:

```python
class Notebook:
    def __init__(self, title, username, likes):
        self.title, self.username, self.likes = title, username, likes

    def __repr__(self):
        return 'Notebook <"{}/{}", {} likes>'.format(self.username, self.title, self.likes)
```

```python
nb0 = Notebook('pytorch-basics', 'aakashns', 373)
nb1 = Notebook('linear-regression', 'siddhant', 532)
nb2 = Notebook('logistic-regression', 'vikas', 31)
nb3 = Notebook('feedforward-nn', 'sonaksh', 94)
nb4 = Notebook('cifar10-cnn', 'biraj', 2)
nb5 = Notebook('cifar10-resnet', 'tanya', 29)
nb6 = Notebook('anime-gans', 'hemanth', 80)
nb7 = Notebook('python-fundamentals', 'vishal', 136)
nb8 = Notebook('python-functions', 'aakashns', 74)
nb9 = Notebook('python-numpy', 'siddhant', 92)
```

```python
notebooks = [nb0, nb1, nb2, nb3, nb4, nb5, nb6, nb7, nb8, nb9]
```

```python
notebooks
```

```
[Notebook <"aakashns/pytorch-basics", 373 likes>,
 Notebook <"siddhant/linear-regression", 532 likes>,
 Notebook <"vikas/logistic-regression", 31 likes>,
 Notebook <"sonaksh/feedforward-nn", 94 likes>,
 Notebook <"biraj/cifar10-cnn", 2 likes>,
 Notebook <"tanya/cifar10-resnet", 29 likes>,
 Notebook <"hemanth/anime-gans", 80 likes>,
 Notebook <"vishal/python-fundamentals", 136 likes>,
 Notebook <"aakashns/python-functions", 74 likes>,
 Notebook <"siddhant/python-numpy", 92 likes>]
```

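Here's a custom comparison function for the notebooks. It returns `'lesser'` when the first notebook has *more* likes, so notebooks with more likes are placed first: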

```python
def compare_likes(nb1, nb2):
    # A notebook with more likes counts as 'lesser', so it is placed first
    if nb1.likes > nb2.likes:
        return 'lesser'
    elif nb1.likes == nb2.likes:
        return 'equal'
    else:
        return 'greater'
```

Let's implement that with merge sort, updated to accept a comparison function:

```python
def default_compare(x, y):
    # 'lesser' means x should come before y
    if x < y:
        return 'lesser'
    elif x == y:
        return 'equal'
    else:
        return 'greater'

def merge_sort(objs, compare=default_compare):
    if len(objs) < 2:
        return objs
    mid = len(objs) // 2
    return merge(merge_sort(objs[:mid], compare),
                 merge_sort(objs[mid:], compare),
                 compare)

def merge(left, right, compare):
    i, j, merged = 0, 0, []
    while i < len(left) and j < len(right):
        result = compare(left[i], right[j])
        # Taking from the left on ties keeps equal elements in their original order
        if result == 'lesser' or result == 'equal':
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```

```python
sorted_notebooks = merge_sort(notebooks, compare_likes)
```

```python
sorted_notebooks
```

```
[Notebook <"siddhant/linear-regression", 532 likes>,
 Notebook <"aakashns/pytorch-basics", 373 likes>,
 Notebook <"vishal/python-fundamentals", 136 likes>,
 Notebook <"sonaksh/feedforward-nn", 94 likes>,
 Notebook <"siddhant/python-numpy", 92 likes>,
 Notebook <"hemanth/anime-gans", 80 likes>,
 Notebook <"aakashns/python-functions", 74 likes>,
 Notebook <"vikas/logistic-regression", 31 likes>,
 Notebook <"tanya/cifar10-resnet", 29 likes>,
 Notebook <"biraj/cifar10-cnn", 2 likes>]
```
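In everyday Python code you would usually reach for the built-in `sorted` function, which is guaranteed to be stable and runs in $O(n \log n)$ time. The sketch below repeats the `Notebook` class with just three of the notebooks (in a list named `sample_notebooks`, so it doesn't overwrite `notebooks`) so that it runs on its own. It shows two ways to get decreasing order of likes: a `key` function with `reverse=True`, and a numeric comparison function wrapped with `functools.cmp_to_key`. The hypothetical `compare_likes_numeric` returns a negative number, zero or a positive number, which is the convention `cmp_to_key` expects, unlike the string-based `compare_likes` above:

```python
from functools import cmp_to_key


class Notebook:
    def __init__(self, title, username, likes):
        self.title, self.username, self.likes = title, username, likes

    def __repr__(self):
        return 'Notebook <"{}/{}", {} likes>'.format(self.username, self.title, self.likes)


sample_notebooks = [
    Notebook('pytorch-basics', 'aakashns', 373),
    Notebook('linear-regression', 'siddhant', 532),
    Notebook('logistic-regression', 'vikas', 31),
]

# Option 1: sort by a key; reverse=True gives decreasing order of likes
by_likes = sorted(sample_notebooks, key=lambda nb: nb.likes, reverse=True)

# Option 2: a numeric comparison function, converted into a key with cmp_to_key.
# A negative result means nb1 comes first, so comparing nb2 to nb1 puts more likes first.
def compare_likes_numeric(nb1, nb2):
    return nb2.likes - nb1.likes

by_likes_cmp = sorted(sample_notebooks, key=cmp_to_key(compare_likes_numeric))

print(by_likes)
print(by_likes_cmp)
```

The key-function form is generally simpler and faster, because each key is computed once per element instead of once per comparison.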

Because our `merge_sort` function is generic, we can use it with any comparison function. For example, here's one that sorts notebooks by title:

```python
def compare_titles(nb1, nb2):
    if nb1.title < nb2.title:
        return 'lesser'
    elif nb1.title == nb2.title:
        return 'equal'
    elif nb1.title > nb2.title:
        return 'greater'
```

```python
merge_sort(notebooks, compare_titles)
```

```
[Notebook <"hemanth/anime-gans", 80 likes>,
 Notebook <"biraj/cifar10-cnn", 2 likes>,
 Notebook <"tanya/cifar10-resnet", 29 likes>,
 Notebook <"sonaksh/feedforward-nn", 94 likes>,
 Notebook <"siddhant/linear-regression", 532 likes>,
 Notebook <"vikas/logistic-regression", 31 likes>,
 Notebook <"aakashns/python-functions", 74 likes>,
 Notebook <"vishal/python-fundamentals", 136 likes>,
 Notebook <"siddhant/python-numpy", 92 likes>,
 Notebook <"aakashns/pytorch-basics", 373 likes>]
```

## Summary and Exercises

That concludes our exercises with the following algorithms:

1. Bubble sort
2. Insertion sort
3. Merge sort
4. Quicksort

In the future, we will dissect more sorting algorithms and see when we should use them. Counting sort, radix sort, bucket sort, comb sort, Shell sort, pancake sort and Timsort are each useful under different circumstances.

Until then, try out some problems on sorting here:

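* [LeetCode: Sorting](https://leetcode.com/tag/sort/)
* [Techie Delight: Sorting interview questions](https://www.techiedelight.com/sorting-interview-questions/)
* [HackerRank: Arrays and sorting](https://www.hackerrank.com/domains/algorithms?filters%5Bsubdomains%5D%5B%5D=arrays-and-sorting)
* [LeetCode: Divide and conquer](https://leetcode.com/tag/divide-and-conquer/)
* [GeeksforGeeks: Divide and conquer](https://www.geeksforgeeks.org/divide-and-conquer/)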

Thanks to Jovian, and congratulations on following until the end! You really sorted this one out.