# Bubble Sort

---

In week 1 we learned that we can sort a list of items using a helpful list method called `sort()`. How exactly does this method work?

That is not such a straightforward question, since Python's `sort()` is based on Timsort, which combines several strategies (including merge sort and insertion sort) to reach the answer as quickly as it can. To understand sorting, we must begin by learning simpler ways to sort items.

All sorting methods are different, but they all solve the same simple task: put a list of items in order.

The **Bubble sort method** is one of the easiest ways to begin. How does it work?

### Idea
1. Iterate over the list, and at each position compare the item with the item that comes after it:
   - If they are in order, leave them.
   - If they are out of order, swap them.
2. Once we have reached the end, we have ensured that the item at the end of the list is the largest item.
3. Continue to iterate again and again until all items are in sorted order.
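A single pass from the steps above can be sketched as a small helper. `one_pass` is a hypothetical name used only for this illustration; it works on a copy so the input list is left untouched.

```python
def one_pass(items):
    """Return a copy of items after one left-to-right bubble sort pass (hypothetical helper)."""
    result = list(items)  # copy so the caller's list is not modified
    for j in range(len(result) - 1):
        # compare each item with the item that comes after it
        if result[j] > result[j + 1]:
            # out of order: swap them
            result[j], result[j + 1] = result[j + 1], result[j]
    return result


after_first = one_pass([3, 7, 4, 2, 9, 1])
print(after_first)                          # [3, 4, 2, 7, 1, 9]
print(after_first[-1] == max(after_first))  # True: the largest item is now last
```

Only the last position is guaranteed after one pass; the rest of the list may still be out of order.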

<img src="https://www.w3resource.com/w3r_images/bubble-short.png" />

Notice that in each pass we do not go all the way to the end. It is unnecessary! We know that after the first pass, the last element in the list is the biggest of the entire list, and after the second pass the second-to-last item is the second biggest. Watch carefully how each pass only goes as far as the part of the list that is not yet sorted.
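To watch the sorted tail grow, this sketch records the list after every pass on the same example list used later in this notebook. `bubble_sort_trace` is a hypothetical helper; its inner loop uses the shrinking range described above.

```python
def bubble_sort_trace(items):
    """Sort a copy of items and record the list after each pass (hypothetical helper)."""
    lst = list(items)
    n = len(lst)
    snapshots = []
    for i in range(n - 1):
        # after i passes, the last i items are already in place, so stop before them
        for j in range(n - 1 - i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
        snapshots.append(list(lst))  # store a copy of the list after this pass
    return snapshots


for number, snapshot in enumerate(bubble_sort_trace([3, 7, 4, 2, 9, 1]), start=1):
    print(f"after pass {number}: {snapshot}")
# after pass 1: [3, 4, 2, 7, 1, 9]
# after pass 2: [3, 2, 4, 1, 7, 9]
# after pass 3: [2, 3, 1, 4, 7, 9]
# after pass 4: [2, 1, 3, 4, 7, 9]
# after pass 5: [1, 2, 3, 4, 7, 9]
```

The sorted tail grows by one item per pass: `9`, then `7, 9`, then `4, 7, 9`, and so on.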


<img src="https://upload.wikimedia.org/wikipedia/commons/0/06/Bubble-sort.gif" />

### Outside loop

Let's begin by deciding how many times we must iterate over the entire list to make sure every item is sorted.

If we have n items, the first pass brings the largest item to the end of the list. Each pass places one more item, so if we make n passes, we will cover all n items.

Is it necessary to iterate n times? After n - 1 passes, n - 1 items are already in their correct positions at the end of the list, so the one remaining item at the front must already be in its correct position too. A final pass would have nothing left to do.

In conclusion: Iterate n - 1 times on the outside!
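A reverse-ordered list is a handy check that n - 1 passes are both needed and enough: the smallest item starts at the far end and moves only one position to the left per pass. `run_passes` is a hypothetical helper that stops after `k` passes.

```python
def run_passes(items, k):
    """Return a copy of items after k bubble sort passes (hypothetical helper)."""
    lst = list(items)
    n = len(lst)
    for i in range(k):
        for j in range(n - 1 - i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
    return lst


worst = [6, 5, 4, 3, 2, 1]  # reverse order: the smallest item starts at the far end
n = len(worst)
print(run_passes(worst, n - 2))  # [2, 1, 3, 4, 5, 6] -> not sorted yet
print(run_passes(worst, n - 1))  # [1, 2, 3, 4, 5, 6] -> sorted after n - 1 passes
```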

### Inside loop

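The inside loop goes from the beginning of the list up to, but not including, the items that are already in their correct positions. At the beginning none of the elements are in place, so we must compare all the way to the very last index (n - 1). After each pass one more item is in place, so on pass i the last index we need to reach is n - 1 - i. Since each comparison looks at `lst[j]` and `lst[j + 1]`, `j` itself runs from 0 up to n - 2 - i, which is exactly `range(n - 1 - i)`.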
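To see the shrinking end point concretely, this sketch lists which index pairs get compared on each pass for a list of 5 items. `compared_pairs` is a hypothetical helper that only records indices; it does not sort anything.

```python
def compared_pairs(n):
    """List the index pairs (j, j + 1) compared in each pass for a list of length n."""
    passes = []
    for i in range(n - 1):
        # range(n - 1 - i) stops before the i items already in place at the end
        passes.append([(j, j + 1) for j in range(n - 1 - i)])
    return passes


for i, pairs in enumerate(compared_pairs(5)):
    print("i =", i, pairs)
# i = 0 [(0, 1), (1, 2), (2, 3), (3, 4)]
# i = 1 [(0, 1), (1, 2), (2, 3)]
# i = 2 [(0, 1), (1, 2)]
# i = 3 [(0, 1)]
```

In total that is 4 + 3 + 2 + 1 = 10 comparisons, which matches n(n - 1)/2 for n = 5.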

### Check

If our current item is bigger than the one to its right, swap them!

To swap two values we can use the fancy Python syntax (tuple unpacking), or simply use a temporary variable.

```python
lst[j], lst[j + 1] = lst[j + 1], lst[j]  # swap the item and the item to its right
```
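Both swapping styles can be sketched side by side. `swap_with_temp` and `swap_with_unpacking` are hypothetical helper names; each swaps `lst[j]` with the item to its right in place.

```python
def swap_with_temp(lst, j):
    """Swap lst[j] and lst[j + 1] using a temporary variable."""
    temp = lst[j]         # remember the left item
    lst[j] = lst[j + 1]   # move the right item to the left
    lst[j + 1] = temp     # put the remembered item on the right


def swap_with_unpacking(lst, j):
    """Swap lst[j] and lst[j + 1] using tuple unpacking."""
    lst[j], lst[j + 1] = lst[j + 1], lst[j]


a = [7, 4]
b = [7, 4]
swap_with_temp(a, 0)
swap_with_unpacking(b, 0)
print(a, b)  # [4, 7] [4, 7]
```

The tuple version works because Python evaluates the whole right-hand side before assigning anything, so no temporary variable is needed.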

Put this all together and we get: 

```python
lst = [3, 7, 4, 2, 9, 1]
n = len(lst)

for i in range(n - 1):
    for j in range(n - 1 - i):
        if lst[j] > lst[j + 1]:
            lst[j], lst[j + 1] = lst[j + 1], lst[j]

print(lst)
```

```
[1, 2, 3, 4, 7, 9]
```
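Wrapping the loops in a function makes them reusable, and reading `n` from `len()` inside the function means the same code works for any list length. `bubble_sort` is a hypothetical name; unlike the cell above, which sorts `lst` in place, this version copies its input and returns a new list.

```python
def bubble_sort(items):
    """Return a new sorted list using the nested loops above (hypothetical wrapper)."""
    lst = list(items)  # work on a copy instead of changing the caller's list
    n = len(lst)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
    return lst


print(bubble_sort([3, 7, 4, 2, 9, 1]))        # [1, 2, 3, 4, 7, 9]
print(bubble_sort([5, 1, 5, 3]))              # [1, 3, 5, 5] (duplicates are fine)
print(bubble_sort(["pear", "apple", "fig"]))  # ['apple', 'fig', 'pear']
print(bubble_sort([]))                        # []
```

Because it only uses `>`, the function sorts anything Python can compare, such as strings.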


### Optimization

Let's imagine a situation where our list is sorted, but we are unaware that it is.  

Is it really necessary for us to go through all this trouble of comparing and swapping just to get the correct order? Especially in this case, where no swap would ever happen?

Idea: as we run each pass of our bubble sort, keep track of whether we found any adjacent items that were out of order (that is, whether we made any swaps). If a whole pass finds nothing out of order, we can simply stop, since every item is less than or equal to the item to its right.
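The stopping condition can also be written as a standalone check. `is_sorted` is a hypothetical helper; a bubble sort pass that makes no swaps has effectively performed this same check on the part of the list that is not yet in place.

```python
def is_sorted(lst):
    """True if every item is less than or equal to the item to its right."""
    return all(lst[k] <= lst[k + 1] for k in range(len(lst) - 1))


print(is_sorted([1, 2, 3, 4, 5, 6]))  # True
print(is_sorted([3, 7, 4, 2, 9, 1]))  # False
```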

Let's add this condition to our code to save us all the unnecessary work.

```python
lst2 = [1, 2, 3, 4, 5, 6]
n2 = len(lst2)

for i in range(n2 - 1):
    swapped = False  # no swaps seen yet in this pass
    for j in range(n2 - 1 - i):
        if lst2[j] > lst2[j + 1]:
            lst2[j], lst2[j + 1] = lst2[j + 1], lst2[j]
            swapped = True
    if not swapped:
        break  # a full pass with no swaps means the list is already sorted

print(lst2)
```

```
[1, 2, 3, 4, 5, 6]
```
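To see how much work the early exit saves, this sketch also counts the passes it makes. `bubble_sort_passes` is a hypothetical helper: a sorted list needs only one pass (which finds no swaps), while a reverse-ordered list still needs all n - 1 passes.

```python
def bubble_sort_passes(items):
    """Sort a copy with early exit; return (sorted list, number of passes made)."""
    lst = list(items)
    n = len(lst)
    passes = 0
    for i in range(n - 1):
        passes += 1
        swapped = False
        for j in range(n - 1 - i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swapped = True
        if not swapped:
            break  # a pass with no swaps means the list is sorted
    return lst, passes


print(bubble_sort_passes([1, 2, 3, 4, 5, 6]))  # ([1, 2, 3, 4, 5, 6], 1)
print(bubble_sort_passes([6, 5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5, 6], 5)
```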


### Difference in performance

Although this seems like a very small change, imagine if we were working with really large lists. Small adjustments in our code can make a big difference when we are working with large quantities of data. Let's take a look at the performance difference between these two versions when handling large data!

Idea: to see this, let's generate a list of 10,000 numbers (already in sorted order) and pass it through our two functions. We will time each one to see how they differ.
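Before timing anything, we can predict the gap by counting comparisons, a measure that does not depend on how fast the computer is. `count_comparisons` is a hypothetical helper, and its `early_exit` flag switches the `swapped` check on or off. This sketch uses n = 1000 so it runs quickly.

```python
def count_comparisons(items, early_exit):
    """Count the comparisons bubble sort makes on a copy of items (hypothetical helper)."""
    lst = list(items)
    n = len(lst)
    comparisons = 0
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            comparisons += 1
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swapped = True
        if early_exit and not swapped:
            break
    return comparisons


n = 1000
already_sorted = list(range(n))
print(count_comparisons(already_sorted, early_exit=False))  # 499500, which is n * (n - 1) // 2
print(count_comparisons(already_sorted, early_exit=True))   # 999, which is n - 1
```

For n = 10,000 the same formulas give 49,995,000 comparisons for the regular version versus 9,999 for the optimized one on sorted input.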

```python
import datetime

# generate our data for testing: 10,000 numbers already in sorted order
test_range = 10000
test_items_regular = list(range(test_range))
test_items_optimized = list(range(test_range))  # separate copy for the second run

# write our bubble sort functions
def bubbleNormal(test_lst):
    n = len(test_lst)  # use the list's own length instead of a global
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if test_lst[j] > test_lst[j + 1]:
                test_lst[j], test_lst[j + 1] = test_lst[j + 1], test_lst[j]


def bubbleOptimized(test_lst):
    n = len(test_lst)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if test_lst[j] > test_lst[j + 1]:
                test_lst[j], test_lst[j + 1] = test_lst[j + 1], test_lst[j]
                swapped = True
        if not swapped:
            break  # no swaps in this pass: the list is already sorted

# test: time each function on its own copy of the data
start = datetime.datetime.now()
bubbleNormal(test_items_regular)
total_time = datetime.datetime.now() - start
print('the total time for regular bubble sort is:', total_time)

start = datetime.datetime.now()
bubbleOptimized(test_items_optimized)
total_time = datetime.datetime.now() - start
print('the total time for optimized bubble sort is:', total_time)
```

```
the total time for regular bubble sort is: 0:00:04.340574
the total time for optimized bubble sort is: 0:00:00.000892
```


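<div class="alert alert-block alert-info">
Notice how the regular version took several seconds to run, while the optimized version took less than a millisecond. Keep in mind that this test uses an already-sorted list, the best case for the optimized version, which stops after a single pass. The regular version always makes n(n - 1)/2 comparisons, so multiplying the list size by 10 multiplies its running time by roughly 100. 10,000 is still a relatively small number; with lists of millions of items we could be waiting hours or even days for our code to finish running!
</div>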
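The timing cell uses an already-sorted list, which is the best case for the early exit. On shuffled data the early exit usually fires only near the end, so the savings mostly disappear. This sketch counts comparisons on a randomly shuffled list of 500 numbers; `count_comparisons` is the same hypothetical counting helper as before, repeated here so the block runs on its own, and the shuffle is generated test data rather than anything from this notebook.

```python
import random


def count_comparisons(items, early_exit):
    """Count the comparisons bubble sort makes on a copy of items (hypothetical helper)."""
    lst = list(items)
    n = len(lst)
    comparisons = 0
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            comparisons += 1
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swapped = True
        if early_exit and not swapped:
            break
    return comparisons


random.seed(0)  # fixed seed so the run is repeatable
n = 500
shuffled = random.sample(range(n), n)  # a random ordering of 0..n-1
regular = count_comparisons(shuffled, early_exit=False)
optimized = count_comparisons(shuffled, early_exit=True)
print(regular)              # 124750, always n * (n - 1) // 2
print(optimized / regular)  # close to 1: little work is saved on random data
```

Both versions remain proportional to n² on typical input; the early exit only pays off when the list is already sorted or nearly sorted.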

### Conclusion

As programmers, we always want to look for new ways to solve problems, even problems that already have solutions. As we solve our problems, we want to look for as many optimizations as we can to speed up the time it takes our code to run. Be very careful when making optimizations in your code! If your optimization does not work for **all** test cases, you will run into wrong answers along the way.
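One way to follow this advice is to compare an optimized sort against a trusted reference, Python's built-in `sorted()`, on many lists, including edge cases such as an empty list, a single item, and duplicates. `bubble_optimized` and `check_against_sorted` are hypothetical names; `bubble_optimized` mirrors `bubbleOptimized` but reads the length from its argument, and the random lists are generated test inputs.

```python
import random


def bubble_optimized(lst):
    """In-place bubble sort with early exit (hypothetical sketch of bubbleOptimized)."""
    n = len(lst)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            if lst[j] > lst[j + 1]:
                lst[j], lst[j + 1] = lst[j + 1], lst[j]
                swapped = True
        if not swapped:
            break


def check_against_sorted(trials=200):
    """Compare bubble_optimized with sorted() on edge cases and random lists."""
    random.seed(1)  # fixed seed so any failure can be reproduced
    edge_cases = [[], [1], [2, 1], [5, 5, 5], [3, 1, 2, 1]]
    random_cases = [
        [random.randint(-10, 10) for _ in range(random.randint(0, 30))]
        for _ in range(trials)
    ]
    for case in edge_cases + random_cases:
        expected = sorted(case)
        actual = list(case)
        bubble_optimized(actual)
        if actual != expected:
            return False, case  # report the first list that went wrong
    return True, None


print(check_against_sorted())  # (True, None)
```

If a future "optimization" breaks the sort, the check returns the first input that failed, which makes the bug easy to reproduce.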