# Workshop 3: Functions

## The Collatz Problem Revisited

Recall the Collatz Operation from Tutorial 1. Given an integer $n$, the next number in the Collatz sequence is:

 - if $n$ is even, divide it by two
 - if $n$ is odd, triple it and add one

Repeatedly applying the Collatz Operation results in a Collatz Sequence, which terminates once it reaches the number $1$.

The number of terms in the Collatz Sequence starting with $n$ is the Collatz Number of $n$. For example, for $n = 5$ the Collatz Sequence is $5, 16, 8, 4, 2, 1$ and the Collatz Number is $6$.
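
As a second worked example, write $C(n)$ for the Collatz Operation (the symbol $C$ is our own shorthand here, not notation from Tutorial 1). The operation, and the sequence it produces starting from $n = 6$, can then be written out as:

$$
C(n) = \begin{cases} n/2 & \text{if } n \text{ is even} \\ 3n + 1 & \text{if } n \text{ is odd} \end{cases}
$$

$$
6 \to 3 \to 10 \to 5 \to 16 \to 8 \to 4 \to 2 \to 1
$$

This sequence has $9$ terms, so the Collatz Number of $6$ is $9$.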

:::{exercise}
:label: exercise_3_6
Define a function `collatz_op(n)` that returns the next number in the Collatz Sequence. Check that your function returns the correct result for `n = 5` and `n = 6`.

```
def collatz_op(n):
    # replace with your code
```
:::

:::{exercise}
:label: exercise_3_7
Define a function `collatz_number` that returns the Collatz Number of `n`. Your function should use the function `collatz_op`.

```
def collatz_number(n):
    while n > 1:
        # Replace with your code to
        # calculate the Collatz number

    return num
```
:::

The [Wikipedia article](https://en.wikipedia.org/wiki/Collatz_conjecture) for the Collatz Conjecture contains [a graph](https://en.wikipedia.org/wiki/Collatz_conjecture#/media/File:Collatz5.svg) of the first $100$ Collatz Numbers.

:::{exercise}
:label: exercise_3_8
Create an array containing the first $100$ Collatz numbers. Then plot a line graph like the one in the Wikipedia article.


```
collatz_numbers = np.zeros(100)

# For i from 0 to 99
#   set the value of collatz_numbers[i]

# plot a line graph of collatz_numbers
```

:::



## Investigating the SIR Model

In the previous tutorial, we wrote code that simulated the spread of an epidemic using the SIR model. We were able to investigate how the spread of the epidemic was influenced by the parameters of the model: the recovery rate $a$ and the infection rate $b$.

:::{math}
:label: SIR_equations_2
\begin{align}S_{i+1} &= S_i - bS_iI_i\\
I_{i+1} &= I_i + bS_iI_i - aI_i.\end{align}
:::
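
To see what a single step of these equations does, here is a minimal sketch that applies them once by hand. The parameter values and initial populations are the ones used in the exercise below; the variable names `S0`, `I0`, `S1`, `I1` and `new_infections` are only for illustration.

```python
# Parameter values and initial populations (the same as in the exercise below)
a = 0.1        # recovery rate
b = 0.00005    # infection rate
S0 = 20000     # value of S on day 0
I0 = 100       # value of I on day 0

# The term b * S * I appears in both equations:
# it is subtracted from S and added to I
new_infections = b * S0 * I0

# Apply the two update equations once
S1 = S0 - new_infections
I1 = I0 + new_infections - a * I0

print(S1, I1)  # roughly 19900 and 190
```

After one step, about $100$ people have moved from $S$ to $I$, and about $10$ have left $I$ through recovery.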

Through policy interventions, the government can influence the value of the parameter $b$. It would like to understand how the value of $b$ affects the peak number of infections.

:::{exercise}
:label: exercise_3_9

Complete the code below so that it simulates the epidemic for 100 days using the parameter values $a = 0.1$ and $b = 0.00005$, and initial populations $S_0 = 20000$ and $I_0 = 100$. You can reuse your code from last week's exercise.

Then use the `numpy` function `np.max` to calculate peak infections (the maximum value of the array `I`).

Change the value of the parameter `b` and (by inspecting the graph of `I`) check that your calculated value of peak infections is correct.

```
import numpy as np
import matplotlib.pyplot as plt

n_days = 100
a = 0.1
b = 0.00005

# your code here
```
:::

:::{solution} exercise_3_9
:hidden:
Using our code from last week, we just need to make a little addition to calculate peak infections:

```
import numpy as np
import matplotlib.pyplot as plt

# set up variables and arrays
n_days = 100
a = 0.1
b = 0.00005
S = np.zeros(n_days)
I = np.zeros(n_days)

# initialise the variables
S[0] = 20000
I[0] = 100

# implement equations
for i in range(n_days - 1):
    S[i+1] = S[i] - (b * S[i] * I[i])
    I[i+1] = I[i] + (b * S[i] * I[i]) - (a * I[i])

# plot the figure
plt.figure(figsize=(5,5))
plt.plot(I, label="Infected")
plt.plot(S, label="Susceptible")
plt.xlabel("Time (days)")
plt.ylabel("Population")
plt.legend()


# check the maximum 
peak_infections = np.max(I)
print('For infection rate', b, 'infections peak at', peak_infections, 'cases.')
```

Looking at the graph, infections peak at around $15000$ cases, which is similar to what we find from `np.max`. Let's check for another value of $b$.

```
import numpy as np
import matplotlib.pyplot as plt

# set up variables and arrays
n_days = 100
a = 0.1
b = 0.00001
S = np.zeros(n_days)
I = np.zeros(n_days)

# initialise the variables
S[0] = 20000
I[0] = 100

# implement equations
for i in range(n_days - 1):
    S[i+1] = S[i] - (b * S[i] * I[i])
    I[i+1] = I[i] + (b * S[i] * I[i]) - (a * I[i])

# plot the figure
plt.figure(figsize=(5,5))
plt.plot(I, label="Infected")
plt.plot(S, label="Susceptible")
plt.xlabel("Time (days)")
plt.ylabel("Population")
plt.legend()


# check the maximum 
peak_infections = np.max(I)
print('For infection rate', b, 'infections peak at', peak_infections, 'cases.')
```

This time, both from `np.max` and by visually inspecting the graph we see that the peak number of infections is much lower. 

:::

Next, we'd like to produce a graph which shows how peak infections vary with the value of the infection rate parameter `b`. To do this, we will write a function `max_infected(a, b)` which calculates and returns the peak infections.

:::{exercise}
:label: exercise_3_10
1. Write a function `max_infected(a, b)` which calculates and returns the maximum number of infected people over the course of the epidemic given parameter values `a` and `b`. Check that your function returns the expected values for `a = 0.1` and various values of `b`.
2. Use `np.linspace` to create an array `b_array` which contains a sequence of `10` evenly spaced numbers from `0` to `0.00005`.
3. Use `np.zeros` to create an array `peak_infections` containing `10` zeros.
4. Use a loop to set the value of `peak_infections` for each value of `b` in `b_array`. 
5. Finally, create a plot to show how peak infections vary with infection rate.

```
def max_infected(a, b):
    # Run the simulation and calculate
    # the peak number of infections

b_array = # your code here
peak_infections = # your code here
for i in range(10):
    # Calculate the peak number of infections
    # for the given value of b

# Create a plot of peak infections against b

```
:::
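
If `np.linspace` is unfamiliar, the short sketch below shows how it behaves on a simpler range, and how an array of zeros can be filled in by a loop. The numbers and the names `x` and `y` are illustrative only and are not the ones needed for the exercise. Note that both end points are included.

```python
import numpy as np

# 5 evenly spaced numbers from 0 to 1, including both end points
x = np.linspace(0, 1, 5)
print(x)  # [0.   0.25 0.5  0.75 1.  ]

# An array of zeros with the same length, ready to be filled in by a loop
y = np.zeros(5)
for i in range(5):
    # here we simply square each value of x
    y[i] = x[i] ** 2

print(y)
```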

:::{solution} exercise_3_10
:hidden:
1. First we need to write a function that will return the maximum number of infected people for different parameter values. We can just modify the code above so that it is inside a function!

```
def max_infected(a, b):
    # Run the simulation and calculate
    # the peak number of infections
    
    # set up variables and arrays
    n_days = 100
    S = np.zeros(n_days)
    I = np.zeros(n_days)

    # initialise the variables
    S[0] = 20000
    I[0] = 100

    # implement equations
    for i in range(n_days - 1):
        S[i+1] = S[i] - (b * S[i] * I[i])
        I[i+1] = I[i] + (b * S[i] * I[i]) - (a * I[i])
        
    # check the maximum 
    peak_infections = np.max(I)
    
    return peak_infections
```

We can check this works by calling the function for `a = 0.1` and the values of `b` we tested above. For example:
```
max_infected(0.1, 0.00005)
```
returns $14873$, as we expect!

2. Recall from last week that to generate `10` evenly spaced numbers from  `0` to `0.00005` we can use:
```
b_array = np.linspace(0, 0.00005, 10)
```

3. Again, recall from last week that we can create an array containing 10 zeros:
```
peak_infections = np.zeros(10)
```

4. We have also seen how to use a loop to set a value in an array. Notice that we need to retrieve each value of `b` to test from `b_array`.
```
for i in range(10):
    # Calculate the peak number of infections
    # for the given value of b
    peak_infections[i] = max_infected(0.1, b_array[i])
```

5. Now we just need to plot the `peak_infections` array against `b_array` to see how peak infections vary with the infection parameter. 
```
# Create a plot of peak infections against b
plt.figure(figsize=(4,4))
plt.plot(b_array, peak_infections)
plt.title('Peak infection size against infection rate')
plt.xlabel('Infection rate')
plt.ylabel('Peak infection')
```

Putting this all together we have:
```
def max_infected(a, b):
    # Run the simulation and calculate
    # the peak number of infections
    
    # set up variables and arrays
    n_days = 100
    S = np.zeros(n_days)
    I = np.zeros(n_days)

    # initialise the variables
    S[0] = 20000
    I[0] = 100

    # implement equations
    for i in range(n_days - 1):
        S[i+1] = S[i] - (b * S[i] * I[i])
        I[i+1] = I[i] + (b * S[i] * I[i]) - (a * I[i])
        
    # check the maximum 
    peak_infections = np.max(I)
    
    return peak_infections

b_array = np.linspace(0, 0.00005, 10)
peak_infections = np.zeros(10)

for i in range(10):
    # Calculate the peak number of infections
    # for the given value of b
    peak_infections[i] = max_infected(0.1, b_array[i])

# Create a plot of peak infections against b
plt.figure(figsize=(4,4))
plt.plot(b_array, peak_infections)
plt.title('Peak infection size against infection rate')
plt.xlabel('Infection rate')
plt.ylabel('Peak infection')
```

As we expect, as the infection rate grows so does the peak infection!
    
:::

## Taking it Further

As well as the peak number of infected people, the government is interested in the *total* amount of medical care that will be required over the course of the epidemic. Assuming that each day, every infected person has an equal and independent probability of requiring medical care, the total cost of medical care will be proportional to the sum of the number of infected people over all days. We call this the number of *infected person-days*; it is simply the sum of the values in the array `I`.
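
As a small illustration with made-up numbers (not output from the model): if 3 people are infected on the first day, 5 on the second and 2 on the third, that makes 10 infected person-days in total. `I_example` is a hypothetical array used only here.

```python
import numpy as np

# Hypothetical infected counts on three consecutive days
I_example = np.array([3, 5, 2])

# Infected person-days: add up the number infected on each day
person_days = np.sum(I_example)
print(person_days)  # 10
```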

:::{exercise}
:label: exercise_3_11
Write a function `total_infected(a, b)` which calculates and returns the total number of infected person-days for the duration of the epidemic.

Then, plot the total number of infected person-days against $b$ for values between $0$ and $0.00005$.
:::

:::{solution} exercise_3_11
:hidden:

We can use the function `np.sum` to sum the values in an array. To find out the total number of infected person-days we just need to sum the values in `I` - so we just need to make a small tweak to the `max_infected` function we wrote above:

```
def total_infected(a, b):
    # Run the simulation and calculate
    # the total number of infections

    # set up variables and arrays
    n_days = 100
    S = np.zeros(n_days)
    I = np.zeros(n_days)

    # initialise the variables
    S[0] = 20000
    I[0] = 100

    # implement equations
    for i in range(n_days - 1):
        S[i+1] = S[i] - (b * S[i] * I[i])
        I[i+1] = I[i] + (b * S[i] * I[i]) - (a * I[i])

    # check the total - sum instead of max!
    total_infections = np.sum(I)

    return total_infections
```

This code is tricky to test as it is not easy to see the total infected person-days from the graph! However, when $b = 0$ and $a = 1$, the equations reduce to 
:::{math}
:label: SIR_equations_exercise
\begin{align}S_{i+1} &= S_i \\
I_{i+1} &= 0.\end{align}
:::
so everyone recovers after just $1$ day, and no one else is infected. This means the total infected person-days for these parameter values should be $100$ - let's check!

```
total_infected(1, 0)
```

Now we just need to plot the total infected person-days for different values of $b$ - just like we did above:

```
# set up variables and arrays
b_array = np.linspace(0, 0.00005, 10)
total_infections = np.zeros(10)

for i in range(10):
    # Calculate the total number of infections
    # for the given value of b
    total_infections[i] = total_infected(0.1, b_array[i])

# Create a plot of total infections against b
plt.figure(figsize=(4,4))
plt.plot(b_array, total_infections)
plt.title('Total infection size against infection rate')
plt.xlabel('Infection rate')
plt.ylabel('Total infection size')

```

Through public policy interventions such as vaccinations, the infection rate parameter $b$ can be reduced. However, such interventions are costly. Government analysts estimate that the cost of interventions (in thousands of pounds) is given by the following formula:

$$ \mathrm{intervention~cost} = \frac{1}{5b} - 2000$$

where $b$ is the desired infection parameter.

Likewise, the cost of providing medical care (in thousands of pounds) is:

$$ \mathrm{medical~cost} = I$$

where $I$ is the total number of infected person-days.
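
To get a feel for the intervention cost formula, here it is evaluated at two illustrative values of $b$ (chosen by us for the arithmetic):

$$
b = 0.00005: \quad \frac{1}{5 \times 0.00005} - 2000 = 4000 - 2000 = 2000
$$

$$
b = 0.00001: \quad \frac{1}{5 \times 0.00001} - 2000 = 20000 - 2000 = 18000
$$

Since costs are in thousands of pounds, pushing the infection rate down from $0.00005$ to $0.00001$ raises the intervention cost from £2 million to £18 million.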

:::{exercise}
:label: exercise_3_12
Calculate arrays `intervention_cost` and `medical_cost` containing the intervention and medical costs respectively over the range of `b` from $0$ to $0.00005$. Assume $a = 0.1$.

Calculate an array `total_cost` which is the sum of the intervention and medical costs.

Plot all three on the same axes. Roughly what value of `b` minimises the total cost?
:::

:::{solution} exercise_3_12
:hidden:

As above, let's check the costs for 10 values of $b$ in the range $0$ to $0.00005$. We need to set up arrays to contain the costs and then calculate the corresponding cost for each value of $b$.

Notice that when $b = 0$ the formula suggests the intervention cost will be infinite. Dividing by zero does not give a usable number: plain Python raises an error (try it!), and NumPy returns `inf` with a warning. To get around this, we can use `np.linspace` to set up the array for $b$ as normal, and then just make the first element non-zero (but still small).
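
The sketch below shows what happens on division by zero. One detail worth knowing: ordinary Python numbers raise a `ZeroDivisionError`, whereas NumPy floating-point values (such as the elements of an array made by `np.linspace`) give `inf` together with a warning. Either way the result is no use for plotting, so avoiding $b = 0$ is still a sensible fix. The variable names are ours.

```python
import numpy as np

# Plain Python numbers raise an exception on division by zero
try:
    1 / 0.0
    plain_python_raised = False
except ZeroDivisionError:
    plain_python_raised = True

# NumPy floats return inf instead (normally with a RuntimeWarning,
# which is silenced here so the output stays tidy)
with np.errstate(divide='ignore'):
    numpy_result = 1 / np.float64(0.0)

print(plain_python_raised, numpy_result)
```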

Also notice that rather than initialising a third array and calculating each element of `total_cost` one by one, we have just been able to add the `intervention_cost` and `medical_cost` arrays together - this works because they are both the same size!
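
A minimal illustration of this element-by-element addition, using made-up numbers rather than the real costs (`cost_a`, `cost_b` and `combined` are hypothetical names):

```python
import numpy as np

# Two hypothetical cost arrays of the same length
cost_a = np.array([1.0, 2.0, 3.0])
cost_b = np.array([10.0, 20.0, 30.0])

# Adding two arrays of the same size adds them element by element
combined = cost_a + cost_b
print(combined)  # [11. 22. 33.]
```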

```
# set up variables and arrays
b_array = np.linspace(0, 0.00005, 10)
intervention_cost = np.zeros(10)
medical_cost = np.zeros(10)

# avoid b = 0
b_array[0] = 0.000001

for i in range(10):
    intervention_cost[i] = 1/(5 * b_array[i]) - 2000
    medical_cost[i] = total_infected(0.1, b_array[i])

# add up the total cost
total_cost = intervention_cost + medical_cost

# Create a plot of costs against b
plt.figure(figsize=(4,4))
plt.plot(b_array, intervention_cost, label='Intervention cost')
plt.plot(b_array, medical_cost, label='Medical cost')
plt.plot(b_array, total_cost, label='Total cost')
plt.title('Costs against infection rate')
plt.xlabel('Infection rate')
plt.ylabel('Cost (thousands of £s)')
plt.legend()
```

Examining the graph, we can see that costs are minimised when $b \approx 5 \times 10^{-6}$.
:::