# Searching and sorting

Searching and sorting are among the most commonly studied problems in computing science.

## Searching

The textbook glosses over how to perform a search of an unsorted list. To search for a specified value in an
unsorted list, we iterate over the elements of the list and test whether each element is equal to the one we are
looking for:

In [None]:
def contains(t, val):
    found = False
    for elem in t:
        if elem == val:
            found = True
    return found

t = ['a', 'b', 'c']
print('t contains "a"?', contains(t, 'a'))
print('t contains "z"?', contains(t, 'z'))

The implementation of `contains` is efficient if `val` is not in the list `t`: We must examine every element in
the list to conclude that `val` is not in the list.

The implementation of `contains` is inefficient if `val` is in the list `t` because we know the answer as soon
as we find the first element equal to `val`. We can stop the loop from running when we find an element equal to
`val` using a `break` statement:

In [None]:
def contains(t, val):
    found = False
    for elem in t:
        if elem == val:
            found = True
            break
    return found

t = ['a', 'b', 'c']
print('t contains "a"?', contains(t, 'a'))
print('t contains "z"?', contains(t, 'z'))

Alternatively, we can simply return `True` inside the `if` statement, and `False` at the end of the function:

In [None]:
def contains(t, val):
    for elem in t:
        if elem == val:
            return True
    return False

t = ['a', 'b', 'c']
print('t contains "a"?', contains(t, 'a'))
print('t contains "z"?', contains(t, 'z'))
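
To make the cost difference concrete, the following sketch counts how many elements each approach examines. The
helper names `count_full_scan` and `count_early_exit` are hypothetical and do not come from the textbook; they
only mirror the first and last versions of `contains` shown above.

```python
def count_full_scan(t, val):
    # mirrors the first version of contains: the loop never stops early
    found = False
    examined = 0
    for elem in t:
        examined += 1
        if elem == val:
            found = True
    return found, examined

def count_early_exit(t, val):
    # mirrors the last version of contains: returns as soon as val is found
    examined = 0
    for elem in t:
        examined += 1
        if elem == val:
            return True, examined
    return False, examined

t = list(range(1000))
print('full scan, val = 0   :', count_full_scan(t, 0))
print('early exit, val = 0  :', count_early_exit(t, 0))
print('early exit, val = -1 :', count_early_exit(t, -1))
```

When `val` is absent, both versions examine all 1000 elements, so the early exit only helps when a match exists.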

## Textbook section 4.1 *Binary search*

The textbook provides a recursive version of binary search in the program `questions.py`. For convenience, it is
repeated in the cell below, slightly modified so that it works in the notebook. Enter the string `True` or
`False` when prompted by the program:

In [None]:
def search(lo, hi):
    if (hi - lo) == 1:
        return lo
    mid = (hi + lo) // 2
    s = input('Greater than or equal to ' + str(mid) + '? ')
    if s == 'True':
        return search(mid, hi)
    else:
        return search(lo, mid)

s = input('Enter a positive integer value of k: ')
k = int(s)
n = 2 ** k
print('Think of a number ')
print('between 0 and ' + str(n - 1))
guess = search(0, n)
print('Your number is ' + str(guess))

An iterative version of `search` is shown in the following cell:

In [None]:
def search(lo, hi):
    while hi - lo > 1:
        mid = (hi + lo) // 2
        s = input('Greater than or equal to ' + str(mid) + '? ')
        if s == 'True':
            lo = mid
        else:
            hi = mid
    return lo

s = input('Enter a positive integer value of k: ')
k = int(s)
n = 2 ** k
print('Think of a number ')
print('between 0 and ' + str(n - 1))
guess = search(0, n)
print('Your number is ' + str(guess))

**Exercise 1** It seems as though we should be able to assign `hi = mid - 1` if the target number is less than
`mid`. Does this work? Can you explain why or why not?

<div class="alert alert-info">
    No. <tt>hi</tt> must be 1 greater than the largest number in the range being searched. To see that this is
    true, pick k = 3 and a secret number of either 3 or 5, and trace what happens in the function.
</div>
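
The trace can also be automated. The sketch below is an assumption-laden variant of the iterative `search`: the
hypothetical parameter `secret` stands in for the person answering the prompts, and the names `search_auto` and
`search_auto_broken` are made up for this illustration.

```python
def search_auto(lo, hi, secret):
    # iterative search from above; the comparison replaces the input() prompt
    while hi - lo > 1:
        mid = (hi + lo) // 2
        if secret >= mid:
            lo = mid
        else:
            hi = mid
    return lo

def search_auto_broken(lo, hi, secret):
    # same loop, but with the tempting hi = mid - 1 assignment
    while hi - lo > 1:
        mid = (hi + lo) // 2
        if secret >= mid:
            lo = mid
        else:
            hi = mid - 1
    return lo

k = 3
n = 2 ** k
for secret in (3, 5):
    print('secret', secret,
          '| hi = mid:', search_auto(0, n, secret),
          '| hi = mid - 1:', search_auto_broken(0, n, secret))
```

With `hi = mid - 1` the function reports 2 when the secret is 3 and 4 when the secret is 5, because the true
value is excluded from the half-open range `[lo, hi)`.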

## Selection sort

The textbook introduces sorting with the *insertion sort* algorithm, but there is a different algorithm
called *selection sort* that is somewhat easier to understand.

Suppose you have a list sorted in ascending order; for example, consider the list:

```python
[-10, -3, 2, 7, 12, 29]
```

Now make the following observation regarding the elements of the sorted list:

- `-10` is the smallest element in the entire list (the sublist starting at index `i=0`)
- `-3` is the smallest element in the sublist starting at index `i = 1`
- `2` is the smallest element in the sublist starting at index `i = 2`
- `7` is the smallest element in the sublist starting at index `i = 3`
- `12` is the smallest element in the sublist starting at index `i = 4`
- `29` is the smallest element in the sublist starting at index `i = 5`
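
The observation above can be checked directly. This is only a small sketch using the built-in `min` and list
slicing; the variable name `is_sorted` is chosen for illustration.

```python
t = [-10, -3, 2, 7, 12, 29]

# for every index i, t[i] should equal the minimum of the sublist t[i:]
for i in range(len(t)):
    print('i =', i, ' t[i] =', t[i], ' min(t[i:]) =', min(t[i:]))

is_sorted = all(t[i] == min(t[i:]) for i in range(len(t)))
print('every t[i] is the minimum of t[i:]?', is_sorted)
```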

Now consider an unsorted list `t` having $n > 0$ elements.
Suppose that we have a function `min_to_front(t, start)` that moves the smallest element in a sublist of `t` 
starting at index `start` to the front of the sublist. It is not difficult to implement such a function:

In [None]:
def exchange(t, i, j):
    tmp = t[i]
    t[i] = t[j]
    t[j] = tmp
    
def min_to_front(t, start):
    # assume the minimum value is located at t[start]
    min_val = t[start]
    min_i = start
    # search t[start + 1:] for the minimum value
    for i in range(start + 1, len(t)):
        elem_i = t[i]
        if elem_i < min_val:
            min_val = elem_i
            min_i = i
    # exchange t[start] and t[min_i]
    exchange(t, start, min_i)
    
t = [5, 4, 3, 2, 1]
print('t before:', t)
min_to_front(t, 0)
print('t after :', t)

Given `min_to_front`, it is very easy to implement selection sort: simply call `min_to_front(t, i)` for all values
of `i` equal to 0, 1, 2, ..., `len(t) - 2`. We don't need to use `i = len(t) - 1` because once the other $n-1$
elements are in their correct sorted positions, the one remaining element must be in its correct position too.

In [None]:
def exchange(t, i, j):
    tmp = t[i]
    t[i] = t[j]
    t[j] = tmp
    
def min_to_front(t, start):
    # assume the minimum value is located at t[start]
    min_val = t[start]
    min_i = start
    # search t[start + 1:] for the minimum value
    for i in range(start + 1, len(t)):
        elem_i = t[i]
        if elem_i < min_val:
            min_val = elem_i
            min_i = i
    # exchange t[start] and t[min_i]
    exchange(t, start, min_i)
    
def selection_sort(t):
    n = len(t)
    for i in range(0, n - 1):
        min_to_front(t, i)
        
        
t = [5, 4, 3, 2, 1]
print('t before:', t)
selection_sort(t)
print('t after :', t)

**Exercise 2** What is the big-O complexity of selection sort?

<div class="alert alert-info">
    <tt>min_to_front(t, i)</tt> examines the n - 1 - i elements after index i, so it has complexity O(n - i)
    where i = 0, 1, 2, ..., n-2. The sum (n-1) + (n-2) + ... + 1 is equal to n(n-1)/2 which is in
    O(n<sup>2</sup>).
</div>
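
As a sanity check on the n(n-1)/2 figure, the following sketch counts element comparisons while sorting. The
function `count_comparisons` is a hypothetical helper that inlines `min_to_front` into the selection sort loop
and works on a copy of the list.

```python
def count_comparisons(t):
    # selection sort on a copy of t, returning the number of
    # element comparisons that were performed
    t = list(t)
    comparisons = 0
    n = len(t)
    for start in range(0, n - 1):
        min_i = start
        for i in range(start + 1, n):
            comparisons += 1
            if t[i] < t[min_i]:
                min_i = i
        # exchange t[start] and t[min_i]
        t[start], t[min_i] = t[min_i], t[start]
    return comparisons

for n in (5, 10, 100):
    t = list(range(n, 0, -1))
    print('n =', n, ' comparisons =', count_comparisons(t),
          ' n(n-1)/2 =', n * (n - 1) // 2)
```

The count depends only on `n`, not on the order of the elements, since the inner loop always scans the whole
remaining sublist.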

**Exercise 3** Re-write `selection_sort` so that it uses recursion instead of iteration. You may use
the `min_to_front(t, start)` and `exchange(t, i, j)` functions in your solution.

For extra practice with recursion, re-write `min_to_front` so that it uses recursion instead of iteration.

In [None]:
def exchange(t, i, j):
    tmp = t[i]
    t[i] = t[j]
    t[j] = tmp
    
def min_to_front(t, start):
    if start == len(t) - 1:
        return
    else:
        min_to_front(t, start + 1)
        if t[start] > t[start + 1]:
            exchange(t, start, start + 1)
    
def selection_sort(t):
    if len(t) > 1:
        selection_sort_impl(t, 0)
    
def selection_sort_impl(t, start):
    if start == len(t) - 1:
        return
    else:
        min_to_front(t, start)
        selection_sort_impl(t, start + 1)
        
        
t = [5, 4, 3, 2, 1]
print('t before:', t)
selection_sort(t)
print('t after :', t)

## Textbook section 4.2 *Insertion sort*

The textbook does not really explain what the insertion sort algorithm does to sort a list. This notebook
section is intended to illustrate how insertion sort works.

Consider an unsorted list `t` having the elements:

```python
[3, 2, 5, 0, 1, 4]
```

Insertion sort starts by considering the sublist made up of the first element of the list:

```python
[3, ?, ?, ?, ?, ?]
```

The sublist `[3]` is a sorted list. The size of the sorted sublist is called `i` and is given the value `i = 1`.

Now, consider the second element of `t` which has an index `j = 1`. Add the second element of `t` to the sorted
sublist to make a new sublist:

```python
[3, 2, ?, ?, ?, ?]
```

The sublist `[3, 2]` is not sorted, but it is easy to sort; simply move the element at index `j` towards the
front of the list until it is in the correct sorted position:

```python
[2, 3, ?, ?, ?, ?]
```

The sublist `[2, 3]` is a sorted list and has a size `i = 2`.

Now, consider the third element of `t` which has an index `j = 2`. Add the third element of `t` to the sorted
sublist to make a new sublist:

```python
[2, 3, 5, ?, ?, ?]
```

The sublist `[2, 3, 5]` is a sorted list and has a size `i = 3`.

Now, consider the fourth element of `t` which has an index `j = 3`. Add the fourth element of `t` to the sorted
sublist to make a new sublist:

```python
[2, 3, 5, 0, ?, ?]
```

The sublist `[2, 3, 5, 0]` is not sorted, but it is easy to sort; simply move the element at index `j` towards the
front of the list until it is in the correct sorted position. To do this, we repeatedly exchange the `0`
with the element immediately in front of it until the `0` reaches its correct sorted position:

```python
[2, 3, 5, 0, ?, ?]   # j = 3, exchange t[j] and t[j-1] then decrement j
[2, 3, 0, 5, ?, ?]   # j = 2, exchange t[j] and t[j-1] then decrement j
[2, 0, 3, 5, ?, ?]   # j = 1, exchange t[j] and t[j-1] then decrement j
[0, 2, 3, 5, ?, ?]   # j = 0, done because there is nothing in front of t[0]
```

The sublist `[0, 2, 3, 5]` is a sorted list and has a size `i = 4`.

Now, consider the fifth element of `t` which has an index `j = 4`. Add the fifth element of `t` to the sorted
sublist to make a new sublist:

```python
[0, 2, 3, 5, 1, ?]
```

The sublist `[0, 2, 3, 5, 1]` is not sorted, but it is easy to sort; simply move the element at index `j` towards
the front of the list until it is in the correct sorted position. To do this, we repeatedly exchange the `1`
with the element immediately in front of it until the `1` reaches its correct sorted position:

```python
[0, 2, 3, 5, 1, ?]   # j = 4, exchange t[j] and t[j-1] then decrement j
[0, 2, 3, 1, 5, ?]   # j = 3, exchange t[j] and t[j-1] then decrement j
[0, 2, 1, 3, 5, ?]   # j = 2, exchange t[j] and t[j-1] then decrement j
[0, 1, 2, 3, 5, ?]   # j = 1, done because t[j] >= t[j - 1]
```

The sublist `[0, 1, 2, 3, 5]` is a sorted list and has a size `i = 5`.

Now, consider the sixth element of `t` which has an index `j = 5`. Add the sixth element of `t` to the sorted
sublist to make a new sublist:

```python
[0, 1, 2, 3, 5, 4]
```

The sublist `[0, 1, 2, 3, 5, 4]` is not sorted, but it is easy to sort; simply move the element at index `j`
towards the front of the list until it is in the correct sorted position. To do this, we repeatedly exchange the
`4` with the element immediately in front of it until the `4` reaches its correct sorted position:

```python
[0, 1, 2, 3, 5, 4]   # j = 5, exchange t[j] and t[j-1] then decrement j
[0, 1, 2, 3, 4, 5]   # j = 4, done because t[j] >= t[j - 1]
```

The sublist `[0, 1, 2, 3, 4, 5]` is a sorted list and has a size `i = 6`.  We are done sorting the
list because `i == len(t)`.

The preceding example illustrates how insertion sort works. It maintains a sorted sublist at the front
of the original list. For each iteration of the `i` loop, it adds one element $x = t[j]$ to the sorted sublist:

![](week09/insertion_sort_before.png)

and moves that element towards the front of the list one position at a time until it reaches its sorted place in
the sublist:

![](week09/insertion_sort_after.png)

## An insertion sort implementation

The main operation of the insertion sort algorithm is shifting an element towards the front of the list until
it is in its correct sorted position. It is not difficult to implement a function that performs this operation:

In [None]:
def exchange(t, i, j):
    tmp = t[i]
    t[i] = t[j]
    t[j] = tmp

def shift_to_front(t, j):
    # shifts the element at t[j] towards the front of the list until
    # it reaches a position where the element in front of it is
    # not larger
    #
    # j is used to keep track of the position of the element that
    # we are moving towards the front of the list
    if j == 0:
        # nothing to shift because t[j] is already at the front of the list
        return
    while t[j] < t[j - 1]:
        # shift t[j] to front one position by exchanging t[j] and t[j-1]
        exchange(t, j, j - 1)
        
        # element has moved to the front of the list by one position
        j = j - 1
        
        # is the element now at the front of the list?
        if j == 0:
            return
        
t = [1, 3, 4, 5, 2]
print('t before:', t)
shift_to_front(t, 4)
print('t after :', t)

Analyzing `shift_to_front` we can observe that the `while` loop runs when two conditions are true:

1. `t[j] < t[j - 1]`
2. `j > 0`

This means that we can combine these two conditions in the loop condition and remove the last `if` statement:

In [None]:
def exchange(t, i, j):
    tmp = t[i]
    t[i] = t[j]
    t[j] = tmp

def shift_to_front(t, j):
    # shifts the element at t[j] towards the front of the list until
    # it reaches a position where the element in front of it is
    # not larger
    #
    # j is used to keep track of the position of the element that
    # we are moving towards the front of the list
    if j == 0:
        # nothing to shift because t[j] is already at the front of the list
        return
    while j > 0 and t[j] < t[j - 1]:
        # shift t[j] to front one position by exchanging t[j] and t[j-1]
        exchange(t, j, j - 1)
        
        # element has moved to the front of the list by one position
        j = j - 1

        
t = [1, 3, 4, 5, 2]
print('t before:', t)
shift_to_front(t, 4)
print('t after :', t)

Given the function `shift_to_front`, it is easy to implement `insertion_sort`:

In [None]:
def exchange(t, i, j):
    tmp = t[i]
    t[i] = t[j]
    t[j] = tmp

def shift_to_front(t, j):
    # shifts the element at t[j] towards the front of the list until
    # it reaches a position where the element in front of it is
    # not larger
    #
    # j is used to keep track of the position of the element that
    # we are moving towards the front of the list
    if j == 0:
        # nothing to shift because t[j] is already at the front of the list
        return
    while j > 0 and t[j] < t[j - 1]:
        # shift t[j] to front one position by exchanging t[j] and t[j-1]
        exchange(t, j, j - 1)
        
        # element has moved to the front of the list by one position
        j = j - 1

        
def insertion_sort(t):
    n = len(t)
    # i is the length of the sorted sublist
    for i in range(1, n): 
        # element to add to the sorted sublist is t[i]
        shift_to_front(t, i)
        
t = [5, 1, 3, 2, 4]
print('t before:', t)
insertion_sort(t)
print('t after :', t)

This is almost the algorithm shown in the textbook. If we take the contents of `shift_to_front`, paste it into
the body of `insertion_sort`, and assign `j = i` we get:

In [None]:
def exchange(t, i, j):
    tmp = t[i]
    t[i] = t[j]
    t[j] = tmp

def insertion_sort(t):
    n = len(t)
    # i is the length of the sorted sublist
    for i in range(1, n): 
        # element to add to the sorted sublist is t[i]
        j = i
        if j == 0:
            return
        while j > 0 and t[j] < t[j - 1]:
            exchange(t, j, j - 1)
            j = j - 1
        
t = [5, 1, 3, 2, 4]
print('t before:', t)
insertion_sort(t)
print('t after :', t)

Notice that `j` is initialized to have the value `i` which is never equal to 0; thus, we can remove
`if j == 0: return` to get:

In [None]:
def exchange(t, i, j):
    tmp = t[i]
    t[i] = t[j]
    t[j] = tmp

def insertion_sort(t):
    n = len(t)
    # i is the length of the sorted sublist
    for i in range(1, n): 
        # element to add to the sorted sublist is t[i]
        j = i
        while j > 0 and t[j] < t[j - 1]:
            exchange(t, j, j - 1)
            j = j - 1
        
t = [5, 1, 3, 2, 4]
print('t before:', t)
insertion_sort(t)
print('t after :', t)

which is more or less identical to the textbook version.
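
To connect this final version back to the walkthrough of `[3, 2, 5, 0, 1, 4]`, the sketch below records the list
after each pass of the `i` loop. The function name `insertion_sort_trace` and the snapshot list are assumptions
added for illustration; the sorting logic is the same as above, with a tuple assignment in place of `exchange`.

```python
def insertion_sort_trace(t):
    # sorts t in place and returns a list of snapshots, one per
    # value of i, taken after t[i] has been moved into position
    snapshots = []
    for i in range(1, len(t)):
        j = i
        while j > 0 and t[j] < t[j - 1]:
            t[j], t[j - 1] = t[j - 1], t[j]
            j = j - 1
        snapshots.append(list(t))
    return snapshots

t = [3, 2, 5, 0, 1, 4]
for size, snapshot in enumerate(insertion_sort_trace(t), start=2):
    print('sorted sublist size', size, ':', snapshot)
```

Each printed line should match the state of the list at the end of the corresponding step in the walkthrough,
with the not-yet-examined elements shown in place of the `?` marks.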

**Exercise 4** Insertion sort is said to be a *stable* sorting algorithm. A stable sorting algorithm is one that
does not change the relative order of equal elements. For example, if we sort the list:

```python
[2, 1a, 0, 1b]
```

where `1a` and `1b` are both equal to `1`, then a stable sorting algorithm will produce the list:

```python
[0, 1a, 1b, 2]
```

because `1a` comes before `1b` in the unsorted list.

Explain why insertion sort is a stable sorting algorithm.

<div class="alert alert-info">
    It is stable because the while loop will not move an element in front of an element that it is equal to
    (it only moves an element towards the front of the list if <tt>t[j] &lt; t[j - 1]</tt>).
</div>
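
Stability can be observed by sorting `(key, label)` pairs while comparing only the keys. In this sketch the
function `insertion_sort_pairs` and its `moves_forward` parameter are hypothetical; passing `operator.le`
instead of the default `operator.lt` shows what happens if the loop condition used `<=`.

```python
import operator

def insertion_sort_pairs(t, moves_forward=operator.lt):
    # sorts (key, label) pairs in place; only the keys are compared
    for i in range(1, len(t)):
        j = i
        while j > 0 and moves_forward(t[j][0], t[j - 1][0]):
            t[j], t[j - 1] = t[j - 1], t[j]
            j = j - 1
    return t

# corresponds to the list [2, 1a, 0, 1b] from the exercise
pairs = [(2, ''), (1, 'a'), (0, ''), (1, 'b')]
print('using < :', insertion_sort_pairs(list(pairs)))
print('using <=:', insertion_sort_pairs(list(pairs), operator.le))
```

With `<` the pair labelled `'a'` stays ahead of the pair labelled `'b'`; with `<=` the two equal keys swap
places, so that variant is not stable.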

**Exercise 5** How many assignments are required each time the `while` loop runs in insertion sort?

<div class="alert alert-info">
    4 (3 for exchange, 1 for j = j - 1)
</div>

**Exercise 6** There is a way to implement insertion sort where two assignments are required each time the
`while` loop runs (see [Wikipedia](https://en.wikipedia.org/wiki/Insertion_sort)). Try implementing such a
version of insertion sort.

<div class="alert alert-info">
    It is left as an exercise for the reader to translate the Wikipedia code into Python.
</div>
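
For readers who want to check their attempt, here is one possible sketch of the idea, not necessarily identical
to the Wikipedia pseudocode. Instead of exchanging neighbours, it saves the element being inserted in a
variable `x`, slides larger elements one position towards the back, and writes `x` once at the end. The name
`insertion_sort_shift` is an assumption made for this example.

```python
def insertion_sort_shift(t):
    for i in range(1, len(t)):
        x = t[i]          # element to insert into the sorted sublist t[0:i]
        j = i
        # slide larger elements one position towards the back;
        # two assignments per iteration: t[j] and j
        while j > 0 and t[j - 1] > x:
            t[j] = t[j - 1]
            j = j - 1
        t[j] = x          # place x into the gap that was opened up

t = [5, 1, 3, 2, 4]
print('t before:', t)
insertion_sort_shift(t)
print('t after :', t)
```

Because the loop only slides an element when `t[j - 1] > x`, equal elements keep their relative order, so this
variant should remain stable.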