# Week 1 Lecture and Lab

This is a short Jupyter notebook to take you through some Python concepts and relate them to Section 1.4 of (Ross, 2017).

You will need to install Jupyter notebooks (as well as Python 3) on your machine. A guide to installing and working with them can be found [here](https://jupyter.readthedocs.io/en/latest/) (other online sources are also available). There are two types of cell: markdown (for text) and code (for Python). This is a markdown cell. The rules of markdown syntax can be found online; [here is a particularly good guide](https://www.markdownguide.org/basic-syntax/).


## Variables and Types
Here we'll look briefly at what variables are and what types they can have. The examples are based on John Graunt's analysis of deaths in the population.

In [1]:
# There were 3 deaths per 88 people,
# calculate the fractional death_rate and assign it to a variable death_rate
death_rate = 3/88
# output the value of the variable death_rate
death_rate

0.03409090909090909

Let's create more variables.

In [2]:
# to get the people implied per recorded death we take the reciprocal
people_per_death = 1/death_rate
# output the new variable
print(people_per_death)
# overwrite the variable from a direct calculation
people_per_death = 88/3
people_per_death

29.333333333333336


29.333333333333332

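Note that the first value of <code>people_per_death</code> appears only because we called <code>print</code>. Jupyter automatically displays the value of the last line of a cell (when that line is an expression) and nothing else, so to display a value at any other point we need another mechanism. (The two values differ in their final digit because floating-point arithmetic is approximate: <code>1/(3/88)</code> and <code>88/3</code> are rounded differently.) Below are three ways to print out information. Please look up strings, <code>print</code> and string formatting for more information.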

In [3]:
print(
    "The number of people in the population suggested by each death is ",
    people_per_death)
some_output = "The proportion of people in the population who die in a given year is %.4f" % death_rate
print(some_output)
recorded_deaths = 13200
# use the variable rather than repeating the number 13200
estimated_population = recorded_deaths*people_per_death
print(f"Given {recorded_deaths} recorded deaths, John Graunt estimated the population to be {estimated_population}")

The number of people in the population suggested by each death is  29.333333333333332
The proportion of people in the population who die in a given year is 0.0341
Given 13200 recorded deaths, John Graunt estimated the population to be 387200.0


We have now created an <code>int</code> variable <code>recorded_deaths</code>, a <code>str</code> (string) variable <code>some_output</code> and a number of <code>float</code> variables, e.g. <code>death_rate</code>. We do so simply by assigning a value with the <code>=</code> operator. Unlike in some languages, we do not have to declare the type of a variable.
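Python's built-in <code>type</code> function reports the type of any value, which is a quick way to check what a variable holds. The sketch below re-creates a few variables from the cells above so that it runs on its own; <code>value</code> is a hypothetical name used only for illustration. It also shows that the type belongs to the value, not to the variable, which is why no type declaration is needed.

```python
# Re-create some of the variables from the cells above
death_rate = 3 / 88
recorded_deaths = 13200
some_output = "The proportion of people in the population who die in a given year is %.4f" % death_rate

# type() returns the class of the value a variable currently refers to
print(type(recorded_deaths))  # <class 'int'>
print(type(death_rate))       # <class 'float'>
print(type(some_output))      # <class 'str'>

# Reassigning a variable to a value of a different type is allowed
value = 13200
print(type(value))            # <class 'int'>
value = "13200"
print(type(value))            # <class 'str'>
```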

## Exercises A

Create some additional variables in the code cell below. Look at the questions at the end of (Ross, 2017, Chp. 1) for inspiration on what to create. Try creating an <code>int</code>, a <code>float</code> and a <code>str</code> (string) variable. What other [Python types](https://www.w3schools.com/python/python_datatypes.asp) do you know? Create some additional variables with other basic types too (don't worry about lists, tuples and dictionaries for now).

Output your variables using the various approaches shown above. Can you format your <code>float</code> variables so that they display only 1 decimal place? Look at the Python documentation [here](https://docs.python.org/3.6/library/string.html) or the tutorial [here](https://www.w3schools.com/python/ref_string_format.asp) to see if you can solve this problem yourself.

In [4]:
# to complete

## Functions
If we want to run this population estimate multiple times, it is good practice to write a function to do it. For instance...

In [5]:
def estimate_population(recorded_deaths, death_rate):
    return recorded_deaths/death_rate

We can now calculate the estimate for a number of scenarios...

In [6]:
print("estimate 1: ", estimate_population(23456, 0.03))
print("estimate 2: ", estimate_population(18000, 0.05))
print("estimate 3: ", estimate_population(50000, 0.07))


estimate 1:  781866.6666666667
estimate 2:  360000.0
estimate 3:  714285.7142857142


To define a function we use the <code>def</code> keyword, followed by the function name, then the argument names in brackets, and finally a colon. The lines that follow must be indented, which indicates that they belong to the function body; the <code>return</code> statement sends a value back to the caller. Any consistent indentation is allowed, but <b>you are strongly advised to use 4 spaces (do not use tabs)</b>.
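A function body can contain several statements, as long as they all share the same indentation. A minimal sketch; <code>estimate_death_rate</code> is a hypothetical helper (the inverse of <code>estimate_population</code> above), written only to show the layout of a multi-line body.

```python
def estimate_death_rate(recorded_deaths, population):
    # Every line of the body is indented by the same 4 spaces
    rate = recorded_deaths / population
    # return ends the function and hands the value back to the caller
    return rate


# Back at the left margin: this line is no longer part of the function
death_rate = estimate_death_rate(3, 88)
print(death_rate)  # 0.03409090909090909
```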

## Lists, Tuples and Numpy Arrays

### Lists

Lists can be used to store collections of items of any type.

In [7]:
# This is an acceptable list
x = [1, 3, 5, 7, 9, 11]
print(f"{x} is an acceptable list")
# This is an acceptable list
y = ["fish", "and", "chips"]
print(f"{y} is an acceptable list")
# This is an acceptable list
z = ["One", 2, True, None, 3.4, ["A", "list", "within", "a", "list"]]
print(f"{z} is an acceptable list")


[1, 3, 5, 7, 9, 11] is an acceptable list
['fish', 'and', 'chips'] is an acceptable list
['One', 2, True, None, 3.4, ['A', 'list', 'within', 'a', 'list']] is an acceptable list


In [8]:
print(f"x looks like this {x}")
index = 1
element = x[index]
print(f"The element at index {index} of x is {element}")
# we can update x with
x[index] = 247
print(f"Now x looks like this {x}")
element = x[index]
print(f"And now the element at index {index} of x is {element}")

# we can access multiple elements of a list at the same time with a slice
print(f"x[0:2] looks like this {x[0:2]}")
print(f"x[2:] looks like this {x[2:]}")
print(f"x[:4] looks like this {x[:4]}")
# even going backwards
print(f"x[4:1:-1] looks like this {x[4:1:-1]}")
print(f"x[::-1] looks like this {x[::-1]}")


x looks like this [1, 3, 5, 7, 9, 11]
The element at index 1 of x is 3
Now x looks like this [1, 247, 5, 7, 9, 11]
And now the element at index 1 of x is 247
x[0:2] looks like this [1, 247]
x[2:] looks like this [5, 7, 9, 11]
x[:4] looks like this [1, 247, 5, 7]
x[4:1:-1] looks like this [9, 7, 5]
x[::-1] looks like this [11, 9, 7, 5, 247, 1]


We can access the elements of a list by their index, both to read and to write them. We can also append an element to a list, extend a list with the elements of another, and concatenate two lists.
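The indexing and slicing in the previous cell follow two rules worth stating explicitly: indices start at 0, and a slice <code>start:stop</code> includes the element at <code>start</code> but stops just before <code>stop</code>. A short sketch on a fresh copy of the original list (the split at index 3 is an arbitrary choice for illustration):

```python
x = [1, 3, 5, 7, 9, 11]

print(x[0])            # 1: the first element is at index 0
print(x[1:4])          # [3, 5, 7]: indices 1, 2 and 3, because the stop index 4 is excluded
print(x[:3] + x[3:])   # [1, 3, 5, 7, 9, 11]: splitting at the same index loses nothing
print(x[::2])          # [1, 5, 9]: a third value sets the step
```

Because the stop index is excluded, <code>x[:k]</code> and <code>x[k:]</code> always fit back together exactly, which is one reason Python chose this convention.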

In [9]:
# append an element
z.append("next")
print(f"Now z looks like this {z}")
# extend a list
x.extend(z)
print(f"Now x looks like this {x}")

# concatenate lists
new_list = y + ["are", "delicious"]
print(f"new_list looks like this {new_list}")

Now z looks like this ['One', 2, True, None, 3.4, ['A', 'list', 'within', 'a', 'list'], 'next']
Now x looks like this [1, 247, 5, 7, 9, 11, 'One', 2, True, None, 3.4, ['A', 'list', 'within', 'a', 'list'], 'next']
new_list looks like this ['fish', 'and', 'chips', 'are', 'delicious']
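The difference between <code>append</code> and <code>extend</code> is easy to miss: in the cell above, <code>z</code> gained a single element while <code>x</code> gained every element of <code>z</code>. A small sketch with hypothetical lists <code>a</code>, <code>b</code>, <code>c</code> and <code>d</code> makes the contrast explicit, and shows that <code>+</code> leaves the original list untouched.

```python
a = [1, 2]
a.append([3, 4])   # append adds its argument as ONE new element
print(a)           # [1, 2, [3, 4]]

b = [1, 2]
b.extend([3, 4])   # extend adds each element of its argument in turn
print(b)           # [1, 2, 3, 4]

c = [1, 2]
d = c + [3, 4]     # + builds a new list and leaves c unchanged
print(c, d)        # [1, 2] [1, 2, 3, 4]
```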


### Tuples
Tuples are like lists, but they are immutable: once created, they cannot be modified.


In [10]:
w = (0,2,4,"end")
print(f"{w} is an acceptable tuple")
# we can access elements of w
print(f"w[1] is {w[1]}")
print(f"w[3] is {w[3]}")
# but we cannot modify elements, append to or extend w.

(0, 2, 4, 'end') is an acceptable tuple
w[1] is 2
w[3] is end
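To see what "cannot be modified" means in practice, here is a sketch that attempts an assignment and catches the resulting <code>TypeError</code> (the <code>try</code>/<code>except</code> is used only so the example runs to completion):

```python
w = (0, 2, 4, "end")

try:
    w[0] = "start"  # item assignment is not allowed on a tuple
except TypeError as error:
    print(f"TypeError: {error}")  # TypeError: 'tuple' object does not support item assignment

# Tuples also have no append or extend methods
print(hasattr(w, "append"))  # False
```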


If you want to modify a tuple, you must first convert it to a list. Once you have made your changes, you can convert the list back to a tuple if need be.

In [11]:
print(f"w looks like this {w}")
w_list = list(w)
print(f"w_list looks like this {w_list}")
w_list[0] = "start"
print(f"Now w_list looks like this {w_list}")
w = tuple(w_list)
print(f"Now w looks like this {w}")



w looks like this (0, 2, 4, 'end')
w_list looks like this [0, 2, 4, 'end']
Now w_list looks like this ['start', 2, 4, 'end']
Now w looks like this ('start', 2, 4, 'end')


### Numpy arrays

Numpy is a library that you will need to install before you can use it. We will be making much use of it in the module, so I suggest you install it now if you haven't already ([see here](https://numpy.org/install/)).

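Numpy arrays come with a suite of extra functionality (such as element-wise arithmetic and multi-dimensional indexing) but are slightly more restrictive in their use; for example, all the elements of an array share a single data type.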
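A small sketch of the trade-off between arrays and lists: arithmetic on an array acts element by element, while every element of an array shares one data type. The names <code>numbers_list</code>, <code>numbers_array</code> and <code>mixed</code> are illustrative only.

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array([1, 2, 3])

# Arithmetic on an array is applied element by element...
print(numbers_array * 2)   # [2 4 6]
# ...whereas * on a list repeats the list
print(numbers_list * 2)    # [1, 2, 3, 1, 2, 3]

# Element-wise operations also work between arrays of the same shape
print(numbers_array + np.array([10, 20, 30]))  # [11 22 33]

# One data type per array: mixing ints and floats converts every element to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)         # float64
```

Element-wise arithmetic like this is what makes arrays convenient for calculations over a whole column of a table at once.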

In [12]:
# numpy must be imported somewhere in the file before it is used, e.g.
#import numpy
# Many prefer to do this with an alias 
import numpy as np

a = np.array([-1, 6, 8, 9, 31, 47, 53, 72])
print(f"a looks like this {a}")
print(f"a[2:] looks like this {a[2:]}")
# we can use a list of indices for a
print(f"a[[1,3,6]] looks like this {a[[1,3,6]]}")

# arrays can be multi-dimensional
b = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
print(f"b looks like this {b}")
# and can be transposed like this:
print(f"b.transpose() looks like this {b.transpose()}")
# or like this
print(f"b.T looks like this {b.T}")
# we can access sub-arrays, here rows 1 and 2 and every column from 1 onwards
print(f"b[1:3,1:] looks like this {b[1:3,1:]}")


a looks like this [-1  6  8  9 31 47 53 72]
a[2:] looks like this [ 8  9 31 47 53 72]
a[[1,3,6]] looks like this [ 6  9 53]
b looks like this [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
b.transpose() looks like this [[ 1  4  7 10]
 [ 2  5  8 11]
 [ 3  6  9 12]]
b.T looks like this [[ 1  4  7 10]
 [ 2  5  8 11]
 [ 3  6  9 12]]
b[1:3,1:] looks like this [[5 6]
 [8 9]]


Don't worry if numpy arrays look a little complicated right now. We will be working with them a great deal, and you will get plenty of practice.

## Exercises B

Look at the code cell containing the <code>roulette_colour</code> function (in the Control flow section below). Create your own function in that same cell, called <code>switch_first_and_last</code>, which takes a list as input and swaps the elements in the first and last positions. You can refer to the last position in two ways; look at [this tutorial page](https://www.w3schools.com/python/python_lists.asp) on lists if you do not know how. Be careful: you must store one of the values in a temporary variable before copying the other across.

Now look at question 3 in (Ross, 2017, p. 12). Then complete the following:

1. Encode each row of the table as its own list.
1. Calculate the change in percentage for males between 1999 and 2000.
1. Can you calculate all changes in percentage from one year to the next for both males and females? Try converting your lists to Numpy arrays first.
1. How would you use Python to help answer Question 3 (a) and (b)?

In [13]:
# to complete


## Control flow

Control-flow statements such as if, if-else and while are available in Python, and you use them just as in other procedural programming languages. The body of an if-block (or of any other block) must be indented, and nested blocks must be indented further to the right than their parent block. There is no need for brackets around if/while conditions, and we use elif for else-if. For instance:


In [14]:
if True:
    print(f"{True} is always true")
    
if 1 == 0:
    print("This statement will not print")
else:
    print("When an if-test fails the else block will be executed.")
    
def roulette_colour(number):
    # a simplified rule: on a real roulette wheel the colours do not strictly follow odd/even
    if (number % 2) == 1:
        return "red"
    elif number != 0:
        return "black"
    else:
        return "green"

print(f"{0} is coloured {roulette_colour(0)}")
print(f"{1} is coloured {roulette_colour(1)}")
print(f"{2} is coloured {roulette_colour(2)}")

True is always true
When an if-test fails the else block will be executed.
0 is coloured green
1 is coloured red
2 is coloured black
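The cell above shows if, else and elif but not while, which repeats its indented block for as long as its condition is true. A minimal sketch: the question it answers (how many years until fewer than half of 88 people remain, at Graunt's rate of 3 deaths per 88 people per year) is a toy model chosen for illustration, and it ignores births entirely.

```python
death_rate = 3 / 88      # deaths per person per year, as in the first cell
population = 88.0        # illustrative starting population
years = 0

# The condition is checked before every pass; the loop stops once it is False
while population >= 44:
    population = population * (1 - death_rate)  # remove one year's deaths
    years = years + 1

print(f"After {years} years fewer than 44 of the original 88 remain ({population:.1f})")
```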


For loops have a very compact syntax. One very common form is:

<code>for {variable} in {collection}:</code>

Another powerful form is:

<code>for {variable} in {generator}:</code>

A generator can be thought of as a lazily evaluated collection (i.e. elements are only calculated as they are needed). One very common lazy object is <code>range</code>; strictly speaking it is a sequence type rather than a generator, but it likewise produces its numbers on demand rather than storing them all. See [here](https://wiki.python.org/moin/Generators) for more on generators and [here](https://docs.python.org/3/library/functions.html) for more on built-in functions.
    

In [15]:
for n in range(7):
    print(f"{n} is coloured {roulette_colour(n)}")

0 is coloured green
1 is coloured red
2 is coloured black
3 is coloured red
4 is coloured black
5 is coloured red
6 is coloured black
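A short sketch of the two loop forms described above: a loop over a list (a collection), <code>range</code> with an explicit start, stop and step, and a genuine generator expression. The names <code>word</code>, <code>squares</code> and <code>s</code> are illustrative only.

```python
# for {variable} in {collection}: loop over the items of a list
for word in ["fish", "and", "chips"]:
    print(word)

# range(start, stop, step) counts from start in steps of step, stopping before stop
print(list(range(1, 10, 3)))  # [1, 4, 7]

# A generator expression: each square is computed only when the loop asks for it
squares = (n * n for n in range(5))
for s in squares:
    print(s)                  # prints 0, 1, 4, 9 and 16, one per line

# A generator can only be consumed once; it is now exhausted
print(list(squares))          # []
```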


This gives us enough to start on the statistics. There are many more features of Python, and we will begin to explore some of these during the module. However, I suggest you look at the Moodle page [guidance on python](https://moodle.ucl.ac.uk/mod/page/view.php?id=2011137) to familiarise yourself with them a little in advance.

# Final Exercises

What follows contains some incomplete parts for you to fill in yourself.

## John Graunt's "Total Deaths in England" (Ross, 2017, Tbl. 1.2)
Let's encode Table 1.2 in Python and explore the values. Two lists, <code>years</code> and <code>burials</code>, have been created for you, corresponding to columns 0 and 1 of the table (counting from 0). Create a third variable <code>plague_deaths</code> based on the final column. Then iterate over the three lists, printing out the year, burials and plague deaths for each row of the table.

In [None]:
years = [1592, 1593, 1603, 1625, 1636]
burials = [25886,17844,37294,51758,23359]

# to complete

### Calculating the average
We would like to calculate the average (mean) number of burials and of plague deaths per year over the years recorded. Again there are a number of ways to do this, but you should use the Numpy <code>mean</code> function. In this module you will be asked to investigate the documentation for unfamiliar functions so that you can learn how to use them. You can find documentation on <code>numpy.mean</code> [here](https://numpy.org/doc/stable/reference/generated/numpy.mean.html). You should always look at the standard documentation first, and in this case it includes examples of usage. However, if you find the documentation difficult to read at first, you can always search the web for examples of usage.

Calculate the mean number of burials and plague deaths recorded and print these out. Can you calculate the standard deviation too using another Numpy function?

In [None]:
# we've already imported numpy with short name np, so there is no need to 
# import this again here.

# to complete

### Significant figures

Calculations like the above can produce long strings of digits, many of which have very little impact on the result. For instance, the digits after the decimal point here are probably much less important than those before it. More importantly, they can give us an unwarranted impression of the accuracy of a calculation (more on this later). It is therefore common to limit the number of significant figures, or to fix the number of decimal places, in your output. Look at the documentation for formatting numbers as strings and then recreate the last exercise using 0 decimal places.

[advanced] Do the same but producing 3 significant figures (use the g flag). Is this better?
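For reference, here is a sketch of the <code>f</code> (fixed decimal places) and <code>g</code> (significant figures) presentation types from Python's format specification. It uses <code>death_rate</code> from the first cell and an arbitrary number, <code>value</code>, chosen only for illustration; it is not a table value.

```python
death_rate = 3 / 88    # as in the first cell
value = 31228.2        # an arbitrary number, for illustration only

# f: a fixed number of decimal places
print(f"{value:.0f}")        # 31228
print(f"{death_rate:.4f}")   # 0.0341

# g: a fixed number of significant figures
print(f"{death_rate:.3g}")   # 0.0341
print(f"{value:.3g}")        # 3.12e+04
```

Note that <code>g</code> switches to scientific notation when a number has more digits before the decimal point than the requested number of significant figures, which is worth bearing in mind when deciding whether it is "better".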

In [None]:
# to complete

## A very quick introduction to plotting
We can plot data in a number of ways using the <code>matplotlib</code> library. You can find the documentation [here](https://matplotlib.org/). Make sure you have the library installed before you proceed.

To use this library we must first import it. We are interested in the <code>pyplot</code> module. Then we can use the module's <code>plot</code> function to plot the data we have. See below for an example.

In [None]:
import matplotlib.pyplot as plt

# plague_deaths must have been created in the exercise above
plt.plot(years, burials)
plt.plot(years, plague_deaths)
plt.show()


Is this the best way of showing the data? We'll discuss this in more detail through the module, but for now your task is to label the axes and provide a legend. Investigate the documentation and other online sources to see how.

In [None]:
# to complete
# no need to import module again.

Look again at Question 3 from (Ross, 2017, Chp. 1) and the variables you created above. Can you create plots for this data too? Do they help answer the question?

In [None]:
# to complete

## Graunt's Mortality Table (Ross, 2017, Tbl. 1.3)
Here we are going to investigate Graunt's mortality table (Table 1.3) from the textbook. Let's encode the data before we start to explore it.
We'll start by encoding the age boundaries between rows as a single numpy array (variable <code>age_boundaries</code>) and the deaths per row as another (variable <code>deaths</code>). Why is it better to encode the age boundaries first rather than anything else?

Now create arrays <code>lower_boundaries</code> and <code>upper_boundaries</code> using the existing variables. What is the problem?

In [None]:
# no need to import numpy module again (here called np)
age_boundaries = np.array([0, 6, 16, 26, 36, 46, 56, 66, 76])
deaths = np.array([36, 24, 15, 9, 6, 4, 3, 2, 1])
print(f"age_boundaries = {age_boundaries}")
print(f"deaths = {deaths}")

# to complete

What barriers are there to us performing an average age at death calculation?

Instead, you are going to calculate values for which there aren't similar barriers. Use python and your knowledge of the data to calculate the following.

  1. Given Graunt's analysis, how many people in each 100 die before their 16th birthday?
  1. Given this analysis and 387200 births (Graunt's estimate of the population of London), how many of those people would die before their 16th birthday?
  1. Based on these values, what proportion of people died sometime between the ages of 36 (inclusive) and 56 (exclusive) in Graunt's time?
  1. Given all those still alive on their 26th birthday, what proportion go on to live to 56 or over?

Print out these values with appropriate statements.

In [None]:
# to complete