# xSoc Python Course - Week 6

The end is in sight! This week's content is going to focus on efficiency, compactness and making multiple Python files work together.

### Recursion
**Recursion** means defining a function in terms of itself; you'll call the function you're writing inside itself. Every recursive function has to have 3 features:
- **Base case**: the non-recursive case, which returns an actual value (or `None`!)
- **General case**: the recursive case, which makes a call to the function again. Each call should move you closer to reaching the base case
- You ***must*** reach a base case after a finite number of function calls. Bad things happen if you don't
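
As a minimal sketch of these three features, here's a tiny recursive function next to one that never reaches a base case. Both `countdown` and `no_base_case` are made-up names for illustration, not part of the course code. When the nested calls run too deep, Python stops them by raising a `RecursionError`:

```python
def countdown(n):
    """Return the list [n, n-1, ..., 1] using recursion."""
    if n <= 0:                      # base case: nothing left to count
        return []
    return [n] + countdown(n - 1)   # general case: n - 1 is closer to 0


def no_base_case(n):
    """Hypothetical broken function: it never stops calling itself."""
    return no_base_case(n + 1)


print(countdown(3))   # [3, 2, 1]

try:
    no_base_case(0)
except RecursionError:
    print("Python gave up: too many nested calls")
```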

Have a look at this code snippet:

In [None]:
current = 1
previous = 1

for i in range(5):
    print(previous)
    temp = current
    current += previous
    previous = temp


Hopefully, you'll recognise the first 5 terms of the Fibonacci sequence. If you look closely, however, this loop fulfils the criteria for a recursive function:
- Base case: have we printed out 5 terms?
- General case: printing the next term, updating the values of previous and current
- At each general case, we increment a counter variable by 1

Here's the Fibonacci code rewritten to be a recursive function:

In [None]:
def recursive_fibonacci(counter, previous, current):
    if counter > 4:
        return
    else:
        print(previous)
        return recursive_fibonacci(counter+1, current, current+previous)

recursive_fibonacci(0, 1, 1)

Note that this function is more compact than the previous for loop. Recursion is often chosen when it makes a solution more succinct and easier to follow.
> Task: Modify the `recursive_fibonacci()` function so that the user can change how many terms are printed out
>
> Task: Which of the following snippets can be re-written to be recursive functions? Where possible, implement the recursive function.

1)
<pre><code>from random import randint

total = 0
for i in range(0, 100, 2): # the loop will start at 0, increment i by 2 each time, and stop once i reaches 100
    total += i * randint(1,11) * randint(1,11)  # randint(1,11) generates a random integer between 1 and 11 (inclusive)
    if total > 75:
        break
</code></pre>

2)
<pre><code>from random import randint

total = 0
i = 0
while i < 100:
    total += i * randint(1,11) * randint(1,11)
</code></pre>

3)
<pre><code>def isPalindrome(text):
    i = 0
    j = len(text) - 1
    while i < j and text[i] == text[j]:   # stop once the two indices meet or cross
        i += 1
        j -= 1
    return i >= j   # True only if every pair of characters matched
</code></pre>

### A Few Useful Data Structures
A ***set*** is a collection that is unordered, unindexed and does not allow duplicate values. Just like with lists, sets can contain elements of different types.

The syntax for declaring a set is:
<pre><code>my_set = {"item1", "item2", "item3", ...}</code></pre>

An alternative way to declare a set is by using the set constructor:
<pre><code>my_constructed_set = set(("first_item", "second_item", ...))</code></pre>
Note the double brackets here: `set()` takes a single iterable argument, so the inner brackets wrap the values in a tuple, and every item of that tuple becomes an element of the set.

Once a set has been declared, you can add (`.add(el)`) and remove (`.remove(el)`) elements, but you cannot directly edit an individual element. Since the elements in a set are unindexed, there's no way to access an individual element by position. Instead, you have to loop through the elements one by one using a for-loop.
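
Here's a short sketch of those operations on a made-up set called `colours` (the name and values are just for illustration). It also uses the `in` operator, which checks whether a value is in the set without needing an index:

```python
colours = {"red", "green"}

colours.add("blue")       # add a new element
colours.add("red")        # already present, so nothing changes
colours.remove("green")   # raises KeyError if the element is missing

print(len(colours))       # 2
print("blue" in colours)  # True

for colour in colours:    # the order of iteration is not guaranteed
    print(colour)
```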

Python also supports standard set operations with the methods `.difference()`, `.intersection()`, `.symmetric_difference()` and `.union()`. Using these methods will return a new set. If you just want to update an existing set, use the methods `.difference_update()`, `.intersection_update()`, `.symmetric_difference_update()` and `.update()`. Here's a quick example to show the difference:

In [None]:
set1 = {"a", "b" , "c"}
set2 = {1, 2, 3}

print(set1)
print(set2)

set3 = set1.union(set2)
print(set3)
set1.update(set2)
print(set1)

> Task:

A ***dictionary*** is a collection that stores items as **key:value pairs**. It is changeable, ordered (as of Python 3.7), and does not allow duplicate keys. Values, however, can be repeated.
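
A quick sketch of these properties, using a made-up dictionary called `ages` (the names and numbers are invented for illustration):

```python
ages = {"alice": 20, "bob": 22}   # made-up example data

ages["carol"] = 20       # add a new key (values are allowed to repeat)
ages["bob"] = 23         # assigning to an existing key overwrites its value

print(ages["alice"])     # 20
print(ages.get("dave"))  # None: .get() doesn't raise KeyError for a missing key

for name, age in ages.items():   # keys come back in insertion order
    print(name, age)
```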

### Sorting Things Out
Now that we've introduced some data structures, it's worth introducing some algorithms you might find handy. The first class of algorithms we'll cover are the sorting algorithms. We'll start with the ***Bubble Sort***:

In [None]:
def bubble_sort(arr):
    for i in range(len(arr)):
        for j in range(len(arr)):
            if arr[i] < arr[j]:
                temp = arr[i]
                arr[i] = arr[j]
                arr[j] = temp
    return arr

unsorted_array = [2, 7, 1, 5, 3, 8, 0]
print(unsorted_array)
sorted_array = bubble_sort(unsorted_array)
print(sorted_array)

Seems fairly simple right? Iterate through the list and check that each element is smaller than all the rest. Our algorithm works, but it's not as efficient as it could be. 

Luckily, we can exploit one of the features of a bubble sort that compares neighbouring elements: after each pass, the largest element (if you're sorting in ascending order) will have moved to the end of the unsorted part of the list - that means we don't need to check it anymore!

But what happens if we pass an already-sorted array into the function? The for-loops would execute anyway, but they wouldn't do anything useful! In the end, the sort would take the same amount of time regardless of how sorted your array already is. We can do better still: what if we check how sorted our array is after each pass by tracking whether we made any swaps? If we make no swaps in a pass, we know the array has been sorted.

Let's add these features to the code:

In [None]:
def better_bubble_sort(arr):
    swaps = True
    n = len(arr) - 1
    while (swaps == True):
        swaps = False
        for i in range(n):
            if arr[i] > arr[i+1]:
                temp = arr[i]
                arr[i] = arr[i+1]
                arr[i+1] = temp
                swaps = True
        n -= 1
    return arr

unsorted_array = [2, 7, 1, 5, 3, 8, 0]
print(unsorted_array)
sorted_array = better_bubble_sort(unsorted_array)
print(sorted_array)

> Task: Another common type of sort is the ***Insertion Sort***. We "split" the array into sorted and unsorted parts, then values in the unsorted part are picked and placed in the correct position in the sorted array. Try to implement your own Insertion Sort (there are plenty of solutions online, but try to solve it yourself as much as possible).
>
> Task (optional): What are the advantages of an Insertion Sort over the Bubble Sort? When might you use one over the other?

If implementing your own sort is too much effort, don't worry - Python has your back. The inbuilt `sort()` method can be called on any list using: `array.sort()`. The method will sort the list in ascending order by default; if you want the list sorted in descending order you need to specify the `reverse` parameter: `array.sort(reverse=True)`.

> Task: Use `.sort()` to sort `unsorted_array` in *ascending* order. Now sort it in *descending* order.

### A Quick Detour: Efficiency and Big-Oh
If the first bubble sort we wrote worked, then why did we improve it? It has to do with *code efficiency*. Often, there are lots of different implementations that solve the same problem. The best solutions will run the fastest, and use the least amount of additional storage space. Many of the constructs you've learned over the past weeks will help with writing efficient code, such as:
- Using loops for repeated actions
- Using data structures instead of separate variables
- Using functions if you're going to be repeating the same blocks of actions throughout your code
- Use of in-built features / external code libraries
- Use of recursion

A fairly common notation you'll see when talking about the efficiency of algorithms is **Big-Oh notation O()**. Big-Oh gives us an upper bound to the growth rate of an algorithm as the input size *n* increases. 
Big-Oh has a couple of basic rules:
- If your run time is a polynomial of degree *d*, then the run time is O(nᵈ). You drop any lower-order and constant terms (as n increases, the nᵈ term grows the fastest). For example, a run time of 3n² + 5n + 2 is O(n²)
- Use the *smallest possible* class of function

For example, consider going one by one through a list of *n* elements. As we add more elements to the list, it's going to take longer to visit all of the elements. The time varies linearly (hence the term *linear search*) with input size, so we say it runs in O(n) time.
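
To make that concrete, here's a minimal sketch of a linear search that also counts how many elements it visits. The `linear_search` function and its `steps` counter are made up for this illustration:

```python
def linear_search(items, target):
    """Return (index, steps); index is -1 if target is not found."""
    steps = 0
    for index, item in enumerate(items):
        steps += 1                  # one more element visited
        if item == target:
            return index, steps
    return -1, steps


for n in (10, 100, 1000):
    numbers = list(range(n))
    _, steps = linear_search(numbers, -1)   # -1 is never in the list: worst case
    print(n, steps)
```

In the worst case the number of steps equals the length of the list, so ten times as many elements means ten times as many steps.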

By contrast, the bubble and insertion sorts we wrote in the previous section run in O(n²) time. That means that the run time of our algorithms grows in proportion to the **square** of the size of the input. For smaller inputs, it's not a huge problem, but if we were sorting arrays with thousands of elements, things would quickly get out of hand.
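
To see the quadratic growth in numbers, here's a rough sketch that counts the comparisons made by the simple double-loop sort from the previous section. The `count_comparisons` helper is invented for this illustration:

```python
def count_comparisons(n):
    """Sort a reversed list of n numbers with the double-loop sort and count the comparisons."""
    arr = list(range(n, 0, -1))     # n, n-1, ..., 1
    comparisons = 0
    for i in range(len(arr)):
        for j in range(len(arr)):
            comparisons += 1
            if arr[i] < arr[j]:
                arr[i], arr[j] = arr[j], arr[i]   # swap the two elements
    return comparisons


for n in (10, 20, 40):
    print(n, count_comparisons(n))
```

Each time the input size doubles, the number of comparisons goes up by a factor of four (100, 400, 1600).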

That's not to say that there aren't *worse* sorting algorithms out there. The ***Bogosort*** has no upper bound on its runtime (aka O(∞)) and an **average** runtime of O((n+1)!). We're not going to bother with a Python implementation (you can attempt it if you really want; check Python's `random` library to help), but the pseudocode for the randomised Bogosort is:
<pre><code>while not inOrder(list):
    shuffle(list)
</code></pre>
Unsurprisingly, nobody actually uses Bogosorts.

### Divide and Conquer
The Insertion and Bubble sorts are *iterative*. A **Divide and Conquer** algorithm solves a large problem by:
<ol>
    <li>Breaking the problem into smaller sub-problems</li>
    <li>Recursively solving the sub-problems</li>
    <li>Combining them to get the desired output</li>
</ol>
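
The three steps above can be sketched on a problem that isn't a sort: finding the largest number in a list. This is only an illustration (the `largest` function is made up for it; in practice you'd just call Python's built-in `max()` on the whole list):

```python
def largest(numbers, low, high):
    """Return the largest value in numbers[low..high] (inclusive)."""
    if low == high:                 # a sub-problem of size 1 is solved directly
        return numbers[low]
    mid = (low + high) // 2         # 1. break the problem into two halves
    left_best = largest(numbers, low, mid)          # 2. solve each half recursively
    right_best = largest(numbers, mid + 1, high)
    return max(left_best, right_best)               # 3. combine the two answers


data = [2, 7, 1, 5, 3, 8, 0]
print(largest(data, 0, len(data) - 1))   # 8
```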

We'll look at 2 Divide and Conquer sorts - the ***QuickSort*** and ***MergeSort***.

A ***QuickSort*** partitions the list to be sorted by choosing a *pivot*. The remaining items are then grouped into 3 sub-lists:
- Items less than the pivot
- Items equal to the pivot
- Items greater than the pivot

The less than/greater than sub-lists are then sorted using the QuickSort, and the final sorted lists are combined. The sort works however you choose the pivot (including choosing randomly), although the choice can affect how fast it runs. For now we'll stick to choosing the element at the middle index as the pivot.

> Task: I want to sort a list of numbers in *ascending* order. I've written the code for the partition, can you write the recursive method?
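<pre><code>def swap(arr, i, j):
    temp = arr[j]
    arr[j] = arr[i]
    arr[i] = temp

def partition(arr, low, high):
    pivot_index = (low+high) // 2
    swap(arr, pivot_index, high)    # park the pivot at the end so the loop can't move it
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] < pivot:          # smaller items are moved to the left part
            i += 1
            swap(arr, i, j)
    swap(arr, i+1, high)            # put the pivot between the two parts
    return i+1                      # final index of the pivot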
</code></pre>

The ***MergeSort*** is similar to the QuickSort, except it divides the list into two halves, then calls the MergeSort on both halves and merges the two sorted sub-lists. The implementation of a MergeSort is slightly harder:

In [None]:
def merge(arr, low, mid, high):
    left_size = mid - low + 1
    right_size = high - mid
    left_array = []
    right_array = []

    for i5 in range(left_size):         # copy elements into the left sub-list
        left_array.append(arr[low+i5])
    
    for j5 in range(right_size):        # copy elements into the right sub-list
        right_array.append(arr[mid+1+j5])
    
    i5 = 0      # Start of left sub-list
    j5 = 0      # Start of right sub-list
    k = low     # Start of merged list

    while i5 < left_size and j5 < right_size:
        if left_array[i5] <= right_array[j5]:
            arr[k] = left_array[i5]
            i5 += 1
        else:
            arr[k] = right_array[j5]
            j5 += 1
        k += 1
    
    # Copy any remaining elements of the left sub-list into the merged list
    while i5 < left_size:
        arr[k] = left_array[i5]
        i5 += 1
        k += 1
    
    # Copy any remaining elements of the right sub-list into the merged list
    while j5 < right_size:
        arr[k] = right_array[j5]
        j5 += 1
        k += 1

def merge_sort(arr, low, high):
    if low < high:
        mid = (low+high) // 2
        merge_sort(arr, low, mid)
        merge_sort(arr, mid+1, high)
        merge(arr, low, mid, high)

unsorted_array = [2, 7, 1, 5, 3, 8, 0]
print(unsorted_array)
merge_sort(unsorted_array, 0, len(unsorted_array)-1)
print(unsorted_array)   # not so unsorted anymore

### More Algorithms: Searching

### Compactness

### ***This week of content was written by the Computing Society***

We hope you enjoyed the course!