# Sorting

<small><a href="https://colab.research.google.com/github/brandeis-jdelfino/cosi-10a/blob/main/lectures/notebooks/14_sorting.ipynb">Link to interactive slides on Google Colab</a></small>

## 7 volunteers needed

You'll need to hold a piece of paper with a number on it, and stand at the front of the class for a few demonstrations.

## Please sort yourselves by the order of your numbers

## What happened? How did you sort yourselves?

Sort yourselves in the order of your numbers - lowest on stage right, highest on stage left.

How did you do it?

However you did it, that's a sorting algorithm.

You may think you surveyed multiple numbers at once, but you probably compared them one pair at a time, just very quickly.

Computers generally can't compare multiple things at once. 

Complex comparisons are made up of many smaller, individual comparisons.

Sorting algorithms range from simple to complex, but if you dig down far enough, they are all made up of a series of comparisons of 2 numbers.

More complex sorting algorithms attempt to minimize the number of comparisons needed to sort a whole list.

## Sorting algorithms

Today we'll look at 2 of the simpler sorting algorithms:
* Bubble sort
* Insertion sort

## Bubble sort

1. Compare the first 2 elements in the list. 
   * If the first is larger than the second, swap them.
2. Repeat for each pair of elements in the list - (2 and 3), (3 and 4), (4 and 5), and so on.
3. Repeat steps 1 and 2 until no swaps were made for the entire list.

It's called "bubble sort" because the next-largest item is "bubbled up" to the end of the list on each pass.

## Insertion sort

At a high level, the process is: 
1. Create a new list
2. Insert the elements into it one at a time, putting them into their correct positions in the new list.

## Insertion sort

More specifically:

1. Create a new list
2. Insert each element from the original list into the new list, using the following steps:  
   2a. Compare the new element to the first element in the new list.
      * If the new element is smaller, insert it just before that element.
      * Otherwise, repeat step 2a with the next element from the new list.
            
   2b. If the new element is larger than all elements in the new list, insert it at the end.


## Thank you volunteers!

## Coding bubble sort

Let's write code for bubble sort!

1. Compare the first 2 elements in the list. 
   * If the first is larger than the second, swap them.
2. Repeat for each pair of elements in the list - (2 and 3), (3 and 4), (4 and 5), and so on.
3. Repeat steps 1 and 2 until no swaps were made for the entire list.

First, the core part: walk through the list, compare each consecutive pair of numbers, and swap them if the first is larger.

In [None]:
data = [36, 83, 9, 54, 94, 20, 18, 84, 12, 56]
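# incorrect: this loses a value instead of swapping
for i in range(len(data)-1):
    if data[i] > data[i+1]:
        # data[i] is overwritten first, so the second line copies the new value back
        data[i] = data[i+1]
        data[i+1] = data[i]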
        
print(data)

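The first assignment overwrites `data[i]` before its old value is saved, so both positions end up holding the same number. We need an intermediate, temporary variable to hold one value during the swap.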

In [None]:
data = [36, 83, 9, 54, 94, 20, 18, 84, 12, 56]
# correct
for i in range(len(data)-1):
    if data[i] > data[i+1]:
        temp = data[i+1]
        data[i+1] = data[i]
        data[i] = temp
        
print(data)
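Python also lets us swap two values without naming a temporary variable, using tuple assignment. This is a minimal sketch of the same single pass, written that way. It works because the whole right-hand side is evaluated before anything is assigned.

```python
data = [36, 83, 9, 54, 94, 20, 18, 84, 12, 56]

# One pass over the list, swapping out-of-order neighbors.
for i in range(len(data) - 1):
    if data[i] > data[i + 1]:
        # Both right-hand values are read before either slot is overwritten.
        data[i], data[i + 1] = data[i + 1], data[i]

print(data)
```

After this one pass, the largest value (94) has moved to the end of the list, but the rest is not yet sorted.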

Now we need to do those passes until no swaps happen:

In [None]:
data = [36, 83, 9, 54, 94, 20, 18, 84, 12, 56]

while True:
    swapped = False
    for i in range(len(data)-1):
        if data[i] > data[i+1]:
            temp = data[i+1]
            data[i+1] = data[i]
            data[i] = temp
            swapped = True
    if not swapped:
        break
print(data)
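To watch the "bubbling" happen, here is a sketch that wraps the same loop in a function and records a snapshot of the list after every pass. The function name `bubble_sort_passes` and the snapshot list are illustrative choices for this example, not part of the algorithm itself.

```python
def bubble_sort_passes(values):
    """Bubble sort a copy of values; return (sorted list, snapshot after each pass)."""
    data = list(values)  # work on a copy so the caller's list is unchanged
    snapshots = []
    while True:
        swapped = False
        for i in range(len(data) - 1):
            if data[i] > data[i + 1]:
                temp = data[i + 1]
                data[i + 1] = data[i]
                data[i] = temp
                swapped = True
        snapshots.append(list(data))  # record the list as it looks after this pass
        if not swapped:
            break
    return data, snapshots

result, snapshots = bubble_sort_passes([36, 83, 9, 54, 94, 20, 18, 84, 12, 56])
for pass_number, snapshot in enumerate(snapshots, start=1):
    print(pass_number, snapshot)
```

In the printed output, the last element is correct after pass 1, the last two after pass 2, and so on. The final pass makes no swaps, which is how the loop knows it is done.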

## Insertion sort

1. Create a new list
2. Insert each element from the original list into the new list, using the following steps:  
   2a. Compare the new element to the first element in the new list.
      * If the new element is smaller, insert it just before that element.
      * Otherwise, repeat step 2a with the next element in the new list.  
            
   2b. If the new element is larger than all elements in the new list, insert it at the end.

In [None]:
data = [36, 83, 9, 54, 94, 20, 18, 84, 12, 56]
sorted_data = []

for num in data:
    insertion_point = 0
    while insertion_point < len(sorted_data) and num > sorted_data[insertion_point]:
        insertion_point += 1
    sorted_data.insert(insertion_point, num)

print(sorted_data)
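The same loop can be packaged as a function that prints the new list after each insertion, which makes it easier to see the sorted list grow. This is a sketch on a shorter list; the name `insertion_sort` and the trace output are choices made for this example.

```python
def insertion_sort(values):
    """Return a new sorted list built by inserting one element at a time."""
    sorted_data = []
    for num in values:
        insertion_point = 0
        # Step 2a: move right while num is larger than the element at this position.
        while insertion_point < len(sorted_data) and num > sorted_data[insertion_point]:
            insertion_point += 1
        # Step 2b is covered too: if we ran off the end, this inserts at the end.
        sorted_data.insert(insertion_point, num)
        print(f"inserted {num}: {sorted_data}")
    return sorted_data

data = [36, 83, 9, 54]
result = insertion_sort(data)
```

Because the function builds a new list, the original `data` list is left in its original order.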

## Sorting efficiency

Both of these sorts are relatively **inefficient**: in the worst case, the number of comparisons grows roughly with the square of the list's length, so doubling the list roughly quadruples the work.

The sorting algorithm used by `.sort()` and `sorted()` is called "[Timsort](https://en.wikipedia.org/wiki/Timsort)" (named for Tim Peters, who wrote it for Python), and is significantly more efficient.

The algorithm is relatively complex, and the theory behind why it is so efficient is even more complex.

However, just like every other sorting algorithm, it's ultimately doing a series of comparisons of 2 values at a time.

Luckily, we can use it without learning all those details.
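One way to see the difference is to count comparisons. The sketch below uses a small helper class, `Counted` (a hypothetical name invented for this example), that wraps a number and adds one to a counter every time two instances are compared with `<` or `>`. The list size of 200 and the random seed are arbitrary, and the exact counts will vary with the input.

```python
import random

class Counted:
    """Wraps a number and counts every comparison made between instances."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value

    def __gt__(self, other):
        Counted.comparisons += 1
        return self.value > other.value


def bubble_sort(data):
    """The bubble sort from earlier, sorting the list in place."""
    while True:
        swapped = False
        for i in range(len(data) - 1):
            if data[i] > data[i + 1]:
                temp = data[i + 1]
                data[i + 1] = data[i]
                data[i] = temp
                swapped = True
        if not swapped:
            break


random.seed(10)
numbers = [random.randint(0, 9999) for _ in range(200)]

Counted.comparisons = 0
bubble_sort([Counted(n) for n in numbers])
bubble_count = Counted.comparisons

Counted.comparisons = 0
sorted(Counted(n) for n in numbers)
builtin_count = Counted.comparisons

print(f"bubble sort: {bubble_count} comparisons")
print(f"sorted():    {builtin_count} comparisons")
```

On a shuffled list like this one, the built-in sort should need far fewer comparisons than bubble sort.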

## Let's look at some more powerful ways to sort sequences in Python

## Exercise

Sort the words in a string alphabetically

In [None]:
value = "This note is about Bill Murray, a famous American actor."
words = value.split()
words.sort()
print(' '.join(words))

Hmm... case matters: uppercase letters sort before all lowercase letters. We could lowercase the whole string before splitting...

In [None]:
value = "This note is about Bill Murray, a famous American actor."
value = value.lower()
words = value.split()
words.sort()
print(' '.join(words))

But what if we want to preserve the original case in our resulting string?

## Telling `sort` how to sort

By default, `sort()` orders the values by... the values.

We can tell it to order them by something else, using the `key` parameter.

The `key` parameter should be a **function** that returns the sort key for a given value.

In [None]:
def make_lower(value):
    return value.lower()

value = "This note is about Bill Murray, a famous American actor."
words = value.split()

words.sort(key=make_lower)
print(' '.join(words))

Wait, what? Did we just pass a function as a parameter?

Yes, yes we did.

# Functions are Objects

Python has **first class functions**. 

This means that functions are objects, just like other types: `int`, `str`, `list`, `dict`, `set`, etc.

They can be passed as arguments to other functions, assigned to variables, returned by other functions, and even stored in lists, dictionaries, and sets.

We'll only use this concept for this specific sorting technique, but it is quite powerful and widely used in advanced programming.
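Here is a small sketch of what that looks like in practice. The functions `shout`, `whisper`, `pick_style`, and `apply_to_all` are made up for this illustration.

```python
def shout(text):
    return text.upper() + "!"

def whisper(text):
    return text.lower() + "..."

# Assign a function to a variable (no parentheses: we are not calling it).
speak = shout
print(speak("hello"))

# Store functions in a dictionary and look one up by name.
styles = {"loud": shout, "quiet": whisper}
print(styles["quiet"]("Hello"))

# Return a function from another function.
def pick_style(is_loud):
    if is_loud:
        return shout
    return whisper

print(pick_style(False)("Hello"))

# Pass a function as an argument to another function.
def apply_to_all(func, items):
    results = []
    for item in items:
        results.append(func(item))
    return results

print(apply_to_all(shout, ["a", "b"]))
```

Passing `shout` to `apply_to_all` is the same move as passing `make_lower` to `sort`: the receiving function decides when to call it.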

## Back to sorting...

In this example, our final answer uses the original, capitalized values.

But the comparisons used to determine the overall order were done with the lowercase values.

In [None]:
def make_lower(value):
    return value.lower()

value = "This note is about Bill Murray, a famous American actor."
words = value.split()

words.sort(key=make_lower)
print(' '.join(words))
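To make the split between the sort key and the sorted value visible, this sketch prints each word next to the key that `make_lower` computes for it, then sorts. The printed table is only for illustration; `sort` computes these keys internally.

```python
def make_lower(value):
    return value.lower()

value = "This note is about Bill Murray, a famous American actor."
words = value.split()

# Each word is compared using its key, not its original spelling.
for word in words:
    print(f"{word:10} -> key {make_lower(word)!r}")

words.sort(key=make_lower)
print(' '.join(words))
```

The words are ordered according to the keys on the right, but the original spellings on the left are what end up in the list.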

For another example, we'll sort the words by length:

In [None]:
def get_len(value):
    return len(value)

value = "This note is about Bill Murray, a famous American actor."
words = value.split()

words.sort(key=get_len)
print(' '.join(words))

A little trick: because `len` is already a function, we don't actually need to define `get_len`. We can just use `len` directly:

In [None]:
value = "This note is about Bill Murray, a famous American actor."
words = value.split()

words.sort(key=len)
print(' '.join(words))

We could even sort by the 2nd letter of each word:

In [None]:
def second_letter(value):
    if len(value) > 1:
        return value[1]
    return value

value = "This note is about Bill Murray, a famous American actor."
words = value.split()

words.sort(key=second_letter)
print(' '.join(words))
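The `key` parameter works the same way with `sorted()`, which returns a new list instead of changing the original. A short sketch, also combining `key` with `reverse=True`:

```python
value = "This note is about Bill Murray, a famous American actor."
words = value.split()

# sorted() returns a new list and leaves words untouched.
by_length = sorted(words, key=len)
longest_first = sorted(words, key=len, reverse=True)

print(' '.join(by_length))
print(' '.join(longest_first))
print(' '.join(words))  # still in the original order
```

Words with the same length keep their original relative order in both results.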

# Sorting class instances

We can use this new idea to easily sort class instances.

In [None]:
class Book:
    def __init__(self, title, word_count, author):
        self.title = title
        self.word_count = word_count
        self.author = author

    def format(self):
        return f"{self.title} ({self.word_count} words), by {self.author}"

In [None]:
books = [
    Book("War and Peace", 561304, "Leo Tolstoy"),
    Book("Harry Potter 1", 77325, "J. K. Rowling"),
    Book("The Golden Compass", 112815, "Philip Pullman"),
    Book("The Hobbit", 95356, "J. R. R. Tolkien"),
    Book("The Old Man and the Sea", 26601, "Ernest Hemingway")
]

Previously, when we've needed to sort like this, we've created a list of 2-tuples, containing the sort value first, and the actual value second, then sorted that list.

Example of sorting books by their length, "the hard way":

In [None]:
sortable = []
for book in books:
    sortable.append((book.word_count, book))
sortable.sort(reverse=True)

for count, book in sortable:
    print(book.format())


Instead, we can define a function that returns the word count for a book, and use that as a sort key.

In [None]:
def get_wc(book):
    return book.word_count

books.sort(reverse=True, key=get_wc)
for book in books:
    print(book.format())

Similarly, we can define a sort key to sort by author's name.

In [None]:
def get_author(book):
    return book.author

books.sort(key=get_author)
for book in books:
    print(book.format())

We could even sort by author's last name:

In [None]:
def get_author_last_name(book):
    return book.author.split()[-1]

books.sort(key=get_author_last_name)
for book in books:
    print(book.format())

# Max/min

The built-in functions `max` and `min` also take `key` parameters.

In [None]:
def get_wc(book):
    return book.word_count

highest = max(books, key=get_wc)
lowest = min(books, key=get_wc)
print(f"Book with highest wordcount: {highest.format()}")
print(f"Book with lowest wordcount: {lowest.format()}")
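The same idea works on any sequence, not just lists of class instances. As a quick sketch, here are `max` and `min` applied to the words from the earlier examples, with and without a key:

```python
value = "This note is about Bill Murray, a famous American actor."
words = value.split()

longest = max(words, key=len)
shortest = min(words, key=len)
# Without a key, strings are compared alphabetically, uppercase before lowercase.
first_alphabetically = min(words)

print(f"Longest word: {longest}")
print(f"Shortest word: {shortest}")
print(f"First alphabetically: {first_alphabetically}")
```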

# Lambdas

Some of you have seen sorting examples written with the `lambda` keyword, like this:

In [None]:
books.sort(reverse=True, key=lambda book: book.word_count)
for book in books:
    print(book.format())

This is equivalent to:

In [None]:
def get_wc(book):
    return book.word_count

books.sort(reverse=True, key=get_wc)
for book in books:
    print(book.format())

This example uses a **lambda** to create a function inline, rather than needing to define it separately.

A **lambda** is an **anonymous, inline function** that can contain a single expression. The return value is whatever the expression evaluates to.

The syntax is:
```
lambda <arguments>: <expression>
```
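A couple of small sketches of that syntax. The name `double` is only for illustration; in practice, a function that needs a name is usually written with `def`.

```python
# A lambda is a function object, so it can be assigned to a name and called.
double = lambda x: x * 2
print(double(21))

value = "This note is about Bill Murray, a famous American actor."
words = value.split()

# Does the same job as the make_lower key function from earlier.
words.sort(key=lambda word: word.lower())
print(' '.join(words))
```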

Lambdas are similar to list comprehensions: they are a relatively advanced programming feature, and allow you to express complex ideas more concisely. 

You never have to use a lambda. You can always define a normal function. 

We're only covering them because they are very commonly (and conveniently) used to specify sort keys.