# The Basics of Programming with Python

These first nine notebooks will take you from zero to hero, teaching you the basics of Python syntax and the key features of programming in Python.

The remaining three notebooks will introduce key modules that you're likely to use in the future. We will talk through these during the course but you will do the exercises after the course.

We will use the notebooks to work through a series of descriptive text, examples and exercises.

Tasks (in light blue) will be completed during the course, while Exercises (in light orange) are for completing outside the course to build up your skills.

## Why Python?

First, why are we using Python? You may have heard of other programming languages like R, Java or C++, and each programming language has its benefits. Python has several benefits that make it an ideal starting language for most scientists:

* Simple syntax that's human readable
* Very widely used in the scientific community
* Large community support and good documentation
* Packages provide almost infinite extensibility - and there are many scientifically-focussed packages
* Obvious 'pythonic' ways of doing things

## Python Files

Generally, Python files are plain text files that end with the `.py` extension. This extension lets the operating system (and programs such as editors) know that the file is a Python program whose text should be interpreted as Python commands. Such files can be edited with a text editor and run from the shell.

## Jupyter Lab

However, we will focus on using Jupyter Lab notebooks (`.ipynb` extension). Notebooks are a more user-friendly way of working with Python and let you combine text notes (like this 'cell'), your Python code and the results of that code, all in one place.

In general, Jupyter Lab will probably satisfy most of your programming needs in the near future.

## Cells

In Jupyter each block is called a cell. The currently selected cell is highlighted by a blue bar to the left and, if the cell is being edited, a blue frame around the cell.

You can move between cells using the arrow keys (up and down); however, you must exit edit mode before doing so. This can be done by hitting the <kbd>Esc</kbd> key.

### Markdown Cells

Markdown cells, like this cell, are a form of formatted text. If you double click this cell (or press <kbd>Enter</kbd> when the cell is highlighted), it will change from pretty, formatted text to the raw Markdown text behind it.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 1.1:</strong> Double click this cell to see the following.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/OGFurQ15a5s'>here</a> for a walkthrough.</div>

1. How numbered lists work
2. How bulleted lists work:
   * How **bold** text can be created
   * How to use *italics*
   * And how to show `code`
  
Throughout this course we recommend you add your own Markdown cells to make additional notes that you may find useful in the future.

To add a new cell, highlight an existing cell and hit the <kbd>A</kbd> key to create a cell above or <kbd>B</kbd> to create one below. The new cell is, by default, a code cell. Hit <kbd>M</kbd> to change the cell to Markdown. (You can always hit <kbd>Y</kbd> to convert it back to code.)

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 1.2:</strong> Let's create a new Markdown cell below this one where we can describe how to do this with the mouse and user interface.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/Ce3meEVIyfI'>here</a> for a walkthrough.</div>

### Python Cells

Code cells, like the one below, can be filled with a series of Python commands. When the cell is run (<kbd>Ctrl+Enter</kbd> or <kbd>Shift+Enter</kbd>) the output is printed below.

Note that a code cell has a pair of square brackets to the left of the cell. If the square brackets are empty (`[ ]`), the cell has not been run; if they contain an asterisk (`[*]`), the cell is running; and if they contain a number (e.g. `[5]`), the cell has been run and the number shows its place in the running order (here, the fifth cell to be run). The running order of cells is important, because Jupyter only knows about data and functions defined in cells that have already been run.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 1.4:</strong> Let's run our first Python program: Hello World! (Printing "Hello World!" is a conventional first program in any programming language.)
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/4tnTID73n2g'>here</a> for a walkthrough.</div>

In [None]:
print("Hello World!")

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 1.5:</strong> The cell below contains some Python code; however, it has accidentally been converted to a Markdown cell (see above for how to do this). Can you convert the cell back to a code cell using the Jupyter Lab interface? How might you do this with a keyboard shortcut?
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/ep04qB40-lM'>here</a> for a walkthrough.</div>

print(1 + 2)

### Closing Notebooks

Now that we've completed this notebook, we need to close it and shut down the 'kernel', which runs the Python code and stores its data. We do this to stop an old notebook from holding on to resources such as your computer's memory.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 1.6:</strong> Close this notebook and shut down the kernel.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/Z4GGt3LyZ08'>here</a> for a walkthrough.</div>

### Help with Jupyter

Sometimes we all need help. Thankfully, Jupyter Lab has a Help menu that links to useful documentation for:

* Keyboard Shortcuts in Jupyter
* Jupyter
* Markdown
* Python
* NumPy, SciPy, Matplotlib and pandas (all of which we'll get to later)

**Now let's do the blue Tasks 1.1 to 1.5 - around 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

# Basic Python Syntax and Variables

## Comments

First, let's address comments. As well as using Markdown cells in Jupyter, it's good practice to use Python comments throughout your code, giving the reader (usually you, six months later) useful hints about what variables contain or operations do.

In Python, comments start with the `#` symbol. Anything after the `#` on that line is ignored when you run the cell.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 2.1:</strong> Run the following cells and note how everything after the <code>#</code> is ignored. Note also how Jupyter Lab automatically displays the value of the final line of each cell.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/9iB1ov8N2Q4'>here</a> for a walkthrough.
</div>

In [None]:
1 + 1 + 2 + 3

In [None]:
# Everything after the # is ignored
1  # + 1 + 2 +3

## Arithmetic Calculations

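Python has several standard arithmetic operators, such as `+` (addition) and `*` (multiplication), which can be used without any special knowledge. The next two cells store a number in a variable called `total` (variables are covered properly below) and show two ways to see its value: Jupyter displays the value on the last line of a cell automatically, and `print()` prints it explicitly.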

In [None]:
total = 1000
total

In [None]:
print(total)
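As a short sketch with arbitrary numbers and illustrative variable names, `+` and `*` work directly on numbers. Python follows the usual mathematical order of operations, so multiplication happens before addition unless brackets say otherwise:

```python
# Addition and multiplication with plain numbers
sum_result = 2 + 3              # 5
product_result = 4 * 5          # 20

# Multiplication is done before addition, as in ordinary maths
mixed_result = 2 + 3 * 4        # 14

# Brackets change the order of operations
bracketed_result = (2 + 3) * 4  # 20

print(sum_result, product_result, mixed_result, bracketed_result)  # prints: 5 20 14 20
```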

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 2.2:</strong> Create two new Python cells (below this one) and take a guess at how to calculate:
<ol>
    <li>1024 minus 512</li>
    <li>42 divided by 6</li>
</ol>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/ZmvUGkbXtvc'>here</a> for a walkthrough.</div>

## Variables

**Variables** are ways of accessing data that we want to use multiple times or that we want to perform operations on. Variables are created with the `=` symbol - we call this 'assignment'.

Variable names can only contain letters, numbers and underscores, and they cannot start with a number (nor, by convention, should they start with an underscore). They also cannot be one of Python's reserved keywords, such as `for` or `if`. There are many conventions for naming variables, but the key thing is: **make it understandable**.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 2.4:</strong> In the next cell, complete the Python code with your own details and run the cell.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/2Eg07VFnOZw'>here</a> for a walkthrough.</div>

In [None]:
age =
first_name =
last_name =
shoe_size =

**N.B.** Strings (letters, words and sentences) must always be delimited by `'` or `"`. We will encounter some other rules about strings as the course proceeds.

### Operations on Variables

Operations can be done on variables just as they can on raw numbers.

It's also very important to understand that variables **persist** between cells - they continue to exist after a cell has been run.
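A minimal sketch with made-up values and illustrative names. It stores two numbers in variables and then multiplies the variables, exactly as you would multiply the raw numbers:

```python
# Store two numbers in variables
width = 3
height = 4

# Operations work on variables exactly as they do on raw numbers
area = width * height
print(area)  # prints 12
```

If you run this in one cell, a later cell can still use `area`, `width` and `height`, because the variables persist until the kernel is restarted or shut down.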

**Now let's do the blue Tasks 2.1 to 2.5 - around 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 2.5:</strong> Given that knowledge, create a new Python cell (below this one) and calculate your age divided by your shoe size.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/gPu4Jr4sBec'>here</a> for a walkthrough.</div>

### Indexing

Certain variables, such as those holding strings, can be 'indexed': we can access individual elements (like the letters in a word).

In Python we do this with square brackets, `[]`.

**Important:** Python (like many other programming languages) starts indexing at `0`, so the first element is at index `0`, the second at index `1`, and so on.
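For example, using an arbitrary word, index `0` gives the first character and index `1` gives the second:

```python
word = "Python"

first_letter = word[0]   # index 0 is the FIRST character: 'P'
second_letter = word[1]  # index 1 is the SECOND character: 'y'

print(first_letter, second_letter)  # prints: P y
```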

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 2.7:</strong> In the next cell index your <code>first_name</code> variable to get just the first vowel in your name.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/7-LHjwRsJCo'>here</a> for a walkthrough.</div>

In [None]:
first_name[]

### Length of Variables

Variables that can be indexed, such as strings (above), also have a length (e.g. the number of characters in a string).

Python has a built-in function, `len()`, which will give you this length.
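A short sketch with an arbitrary word. `len()` counts the characters, and because indexing starts at `0`, the last character sits at index `len(word) - 1`:

```python
word = "Python"
word_length = len(word)
print(word_length)  # prints 6

# Because indexing starts at 0, the last valid index is one less than the length
last_letter = word[word_length - 1]
print(last_letter)  # prints n
```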

**Now let's do the blue Tasks 2.7 and 2.10 - around 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 2.10:</strong> In the cell below, add your variable to find the length of your last name.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/N2Fq5JXNlXo'>here</a> for a walkthrough.</div>

In [None]:
len()

## Printing Variables

You've already seen the `print()` function in the first notebook - remember 'Hello World!'?

The `print()` function (note the brackets) is one of many built-in Python functions and we'll use it a lot throughout this course.

Knowing how to print variables is one of the most important tools in a Python programmer's arsenal. If your code isn't working, printing important variables is often a good first step in debugging (finding and fixing mistakes in your code).

Sometimes you just want to print a variable, such as `first_name`, in which case you simply pass that variable name to the `print()` function:
```python
print(first_name)
```
Often it's more useful to print out several variables and some context.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 2.11:</strong> Run the next cell to print out your full name and age as a complete, useful sentence. There's a lot of new syntax in this cell, but we'll discuss it all afterwards.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/Jv3-nGJkV7I'>here</a> for a walkthrough.</div>

In [None]:
print("{0} {1} is {2} years old.".format(first_name, last_name, age))

### The `.format()` Method

**Disclaimer**: This topic is a little tricky to start with. Don't worry if you don't get it straight away - as with a lot of Python it becomes clearer the more you use it and we will use it throughout the course.

Let's take some of this new syntax one step at a time.

`.format()` (technically `str.format()`), just like `print()`, is built into Python.

However, the `print()` function only operates on what you pass into the brackets, such as `first_name` in the example above.

The `str.format()` function also operates on the data before the dot. We call this type of function a method. Think of it as a function with a "special argument" written before the dot, plus the normal arguments passed inside the brackets as usual.

In the case of `.format()`, the data before the dot must be a string (or `str`).

In the example above we have the string `'{0} {1} is {2} years old.'` and we call the `.format()` method on that string, so Python will perform a formatting operation on it.

So how does Python know what to format and how?

In this example we haven't told Python 'how' to format anything. Python makes sensible choices based on each value's type, and we won't cover this until later in the course.

But we have told Python 'what' to format: inside the brackets of `.format()` we passed three variables, `first_name`, `last_name` and `age`.

So Python knows the three things it needs to format, and guesses how it should format them, but 'where' should Python put this formatted data?

That's where `{0}` comes in: `{0}` means 'put the variable at index 0 (the first one) here', `{1}` means 'put the variable at index 1 (the second one) here', and so on. (Remember that Python indexes from zero.)
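As a small sketch with arbitrary values, the numbered placeholders are filled from the arguments of `.format()`, and the same number can be used more than once:

```python
# Numbered placeholders are filled from the arguments of .format()
sentence = "{0} + {1} = {2}".format(2, 3, 2 + 3)
print(sentence)  # prints: 2 + 3 = 5

# The same placeholder can be used more than once
cheer = "{0}, {0}, {1}!".format("Hip", "hooray")
print(cheer)  # prints: Hip, Hip, hooray!
```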

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 2.12:</strong> In the cell below, change the code to swap your first name and your last name and report your shoe size instead of your age.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/7egvS4tsea0'>here</a> for a walkthrough.</div>

In [None]:
print("{0} {1} is {2} years old".format(first_name, last_name, age))

## Watching Variables Change

It's often very useful to watch how a variable changes after each operation, usually when you're confirming that your code works or trying to find out why it doesn't!

**Now let's do the blue Tasks 2.12 and 2.13 - around 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 2.13:</strong> Read the following Python code cell - what value do you think <code>myVariable</code> will be at each call of the <code>print()</code> function? Don't run the cell before you make a prediction and add your prediction as a comment on each line.
<br/>
Run the cell to confirm your expectations.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/egujpXm8gkc'>here</a> for a walkthrough.</div>

In [None]:
myVariable = age
print(myVariable)

myVariable = myVariable / shoe_size
print(myVariable)

myVariable = myVariable + 10
print(myVariable)

myVariable = first_name
print(myVariable)

# Data Types in Python

## Data Types

So far we've said that variables store information or data. But Python needs to know what 'type' of data a variable is in order to understand that data.

For example, Python (and hopefully you) thinks that it's perfectly okay to multiply two numbers together but not two words!

Python has a large number of types but for now we will consider only three:

* `int` or integers (whole numbers)
* `float` or floats (floating-point numbers, i.e. numbers with a decimal point)
* `str` or character strings, i.e. text
  * Note that strings are always delimited by `'` or `"`

Python also has a built-in function, `type()`, that will tell you what type some data is.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 3.1:</strong> Read the next cell and predict what the five outputs will be - then run the cell.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/ajgHILv0p-4'>here</a> for a walkthrough.</div>

In [None]:
print(type(52))

print(type("52"))

print(type(52.0))

print(type(52.0))

myVariable = 52

print(type(myVariable))

## Operations and Types

Operators, methods and functions only work with certain data types. Or, perhaps more confusingly, they may act differently on different data types.
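As a small illustration with arbitrary values, the `*` operator multiplies two `int`s, but when given a `str` and an `int` it repeats the string:

```python
# With two ints, * is ordinary multiplication
number_product = 3 * 4     # 12

# With a str and an int, * repeats the string
repeated_text = "ab" * 3   # 'ababab'

print(number_product)  # prints 12
print(repeated_text)   # prints ababab
```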

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 3.3:</strong> Create a new code cell below this one for each of the following:
    <ol>
        <li>add (using <code>+</code>) an <code>int</code> with a different <code>int</code></li>
        <li>add an <code>int</code> with a <code>float</code></li>
        <li>add a <code>str</code> with a different <code>str</code></li>
        <li>add a <code>str</code> with an <code>int</code></li>
    </ol>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/_s6seW4vpuE'>here</a> for a walkthrough.</div>

### Mixing `int` and `float`

In some scenarios it makes sense to mix `float` and `int` data in our calculations.

For example, if dinner costs £141.24 (a `float`) and we want to split the bill amongst 6 people (an `int`), Python can do that. In the background it sees that we're mixing `float` and `int` values and converts the `int` to a `float` before doing the calculation.
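Here is a sketch with made-up ticket prices and illustrative names, so it doesn't give away the dinner task. Multiplying an `int` by a `float` gives a `float`:

```python
tickets = 4              # an int
price_per_ticket = 2.5   # a float

# Python converts the int to a float before multiplying
total_price = tickets * price_per_ticket
print(total_price)        # prints 10.0
print(type(total_price))  # prints <class 'float'>
```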

**Now let's do the blue Tasks 3.1 to 3.4 - around 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 3.4:</strong> Complete the next cell to work out the cost of dinner per person. What type is the output of your calculation?
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/QqSlrCmP3eg'>here</a> for a walkthrough.</div>

In [None]:
totalCost = 141.24

numberOfPeople = 6

costPerPerson =

print(costPerPerson)

## Methods/Functions and Types

And what about methods, like `.format()`, or functions, like `len()`?

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 3.6:</strong> Create a new code cell below this one for each of the following tasks:
    <ol>
        <li>Run <code>.format()</code> on a variable that holds an integer (don't forget to use <code>print()</code> as well)</li>
        <li>Print the length (<code>len()</code>) of a variable that holds a float</li>
    </ol>
What do the <code>AttributeError</code> and <code>TypeError</code> mean? Create a new Markdown cell and describe, in words you understand, what has gone wrong here.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/mrtTnlJ3Prs'>here</a> for a walkthrough.</div>

## `.format()` and Types

We've shown in the above exercises that the `.format()` method works on `str`s and not on `int`s (nor `float`s).

So, how do we do pretty printing of our number data?

As a reminder, in the previous notebook we defined four variables: `age`, `first_name`, `last_name` and `shoe_size` and we printed them with some formatting.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 3.7:</strong> Run the following code cell. The variables are not shared between different notebooks so you need to initialise them again.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/HrL71GHOEa4'>here</a> for a walkthrough.</div>

In [None]:
age =
first_name =
last_name =
shoe_size =
print('{0} {1} is {2} years old.'.format(first_name, last_name, age))

In this scenario, Python sees that `first_name` and `last_name` are `str`s and that `age` is an `int`, and formats each one accordingly.

But what if age was a float?

And what if we wanted to print the integer number of years from that float?

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 3.8:</strong> In the following cell, assign a new float value to <code>age</code> that is your exact age (or thereabouts - use at least 5 decimal places). Then run the cell. What is the output when this new age is printed?
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/RZtaCoCkIRQ'>here</a> for a walkthrough.</div>

In [None]:
age =

print('{0} {1} is {2} years old.'.format(first_name, last_name, age))

As you can see, Python now interprets `age` as a `float`, which it is.

In order to print just the integer number of years we can take two approaches:

1. We can modify our data by rounding `age` to the nearest whole number (there are a few ways to do that in Python), *but* this changes our data when we only want to print it in a formatted way.
2. We can modify the string that `.format()` acts upon. And this is what we'll do.

When we first met `.format()`, we learnt that `{0}` means 'put the first variable here' and `{2}` means 'put the third variable here', and that the variables are given inside the parentheses.

Inside the braces (`{` and `}`) we can put extra information that tells Python what data type we're passing and how we want Python to treat it.

The syntax for providing additional formatting information is to follow the variable number with a colon (`:`) and then the format specifiers. To tell Python to treat this variable as a `float` we use `f`, and to tell it that we want zero decimal places we use `.0`. Note that the type letter (here `f`) always comes at the end.

Altogether `{2}` becomes `{2:.0f}`.
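A minimal sketch with an arbitrary measurement. Adding `:.0f` or `:.3f` inside the braces changes only how the value is displayed, not the value itself:

```python
measurement = 31.7

# Index 0, treated as a float with zero decimal places: rounded for display only
print("{0:.0f}".format(measurement))  # prints 32

# The same idea with three decimal places
print("{0:.3f}".format(3.14159))      # prints 3.142

# The variable itself is unchanged - only the printed text was rounded
print(measurement)                    # prints 31.7
```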

**Now let's do the blue Tasks 3.9 and 3.10 - around 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 3.9:</strong> Run the next cell. Change the number after the decimal to print your age to two decimal places - don't forget to rerun the cell.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/Aib31-o8o-0'>here</a> for a walkthrough.</div>

In [None]:
print("{0} {1} is {2:.0f} years old.".format(first_name, last_name, age))

Now, you might like to add more decimal places to the displayed age. To do that, write the number of decimal places after the dot: `{2:.1f}` for one decimal place, `{2:.2f}` for two.
You can also display an `int` with decimal places, if you choose to, by formatting it as a float!

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 3.10:</strong> Modify the code below to display height with 1 mm accuracy.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/Z4M_9Z3-rxw'>here</a> for a walkthrough.</div>

In [None]:
height = 181  # cm
print("My height is precisely {0} cm".format(height))

Usually, you will be formatting floats to a given number of decimal places, so we won't cover any other formatting options in this course.

# Built-in Functions, Help and Documentation

## Functions

A function is a short command that allows you to access an underlying algorithm. Often a function takes some input data and returns some output data that has been transformed in some way.

We've already been introduced to, and used, three built-in Python functions: `print()`, `len()` and `type()`. 

* `len()` takes some input data (such as a string) and returns its length
* `type()` takes some input data and returns that data's type
* `print()` takes some input data (often a string, but any value will do) and prints it to your notebook (we will discuss what `print()` returns below)

Python has many, many built-in functions, and many more are available through packages.

All Python functions have several features in common:
* They take zero or more parameters or arguments (inputs)
* They always return something (an output or result) 
* They sometimes *do* something (for instance, print things) 
* They (should) raise useful error messages when something goes wrong
* They (should) be well documented

### Built-in Functions

Python built-in functions are all functions that don't need a Python package to be imported (this will make more sense later).

One common function that you might use is `round()`.

This function takes a number (such as a float) as an input parameter and returns that number rounded to zero decimal places, i.e. to the nearest whole number.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 4.1:</strong> In the code cell below this cell, create a new variable with pi rounded to zero decimal places. Print this variable and its type.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/uCWjUQTt1qI'>here</a> for a walkthrough.
</div>

In [None]:
piToSix = 3.141593

### Optional Parameters

Actually, `round()` can take a second parameter. This function, like many others, has an optional parameter.

An optional parameter is a parameter that has a default value.

The optional parameter for `round()` is `ndigits` and the default value is `None`. Note that `None` is a special Python object that we'll cover in more detail below. When `ndigits=None`, the `round()` function rounds the number to zero decimal places.

But what if `ndigits=3`?
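Here is a sketch using an approximation of *e* (not pi, so the task below is left for you) and an illustrative variable name. It compares `round()` with and without `ndigits`, including the type of each result:

```python
e_approx = 2.71828

# No ndigits: rounds to the nearest whole number and returns an int
nearest_whole = round(e_approx)
print(nearest_whole, type(nearest_whole))  # prints 3 <class 'int'>

# ndigits=2: rounds to two decimal places and returns a float
two_places = round(e_approx, ndigits=2)
print(two_places, type(two_places))        # prints 2.72 <class 'float'>
```

One thing to be aware of: when a number is exactly halfway between two whole numbers, Python's `round()` goes to the nearest even number, so `round(2.5)` gives `2` and `round(3.5)` gives `4`.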

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 4.2:</strong> Create a new code cell below this cell. In it create a new variable with pi rounded to three decimal places. Print this variable.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/d2qlShqrpfk'>here</a> for a walkthrough.
</div>

## Returns

As mentioned above, all functions return something. That means every time a function is called (run), it produces a new piece of information.

In our examples above (rounding pi), `round()` returned a number, which we assigned to a variable.

**N.B.** Remember that variables persist throughout the life of a notebook. This means `piToSix` defined above can be 'seen' by the cell below.

**Now let's do the blue Tasks 4.1 to 4.3 - around 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 4.3:</strong> What does a call to <code>print()</code> return? Run the next cell to find out.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/8nUYnJaPjqo'>here</a> for a walkthrough.</div>

In [None]:
print(piToSix)  # This function does something, but we don't store its result
returnFromPrint = print(piToSix)  # Now, we also store the result of printing.
print(returnFromPrint)  # The result is not very interesting.... Or is it?

### `None`

We've now encountered this magical `None` a couple of times.

`None` is a special Python value.

Sometimes we have a piece of information, returned from a function for example, that has no value, i.e. it's empty. Amongst other issues, leaving this piece of information blank would cause problems for a human trying to see what's going on in their code. So Python puts this special `None` value there.
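You can also assign `None` to a variable yourself and check for it later. A minimal sketch with an illustrative variable name:

```python
nothing_here = None

print(nothing_here)          # prints None
print(type(nothing_here))    # prints <class 'NoneType'>

# The recommended way to test for None is with `is`
print(nothing_here is None)  # prints True
```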

## Function Documentation

When any programmer encounters new functions, uncommon functions or functions they've just forgotten how to use they need help. This is the purpose of documentation.

### `help()`

There are many ways of accessing the documentation for Python functions.

One way is to use Python's built-in `help()` function.

Calling `help()` with a function name as a parameter will display the documentation for that function.
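For example, the following displays the documentation for two built-in functions we've already used. Because `help()` prints the documentation rather than handing it back as a value, what it returns is `None`:

```python
# Display the documentation for the built-in round() function
help(round)

# help() only prints; the value it returns is None
help_result = help(len)
print(help_result)  # prints None
```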

### Jupyter's Tricks

Jupyter Lab has two other quick ways to access documentation for a function:

* Place the cursor between the parentheses of a function and press <kbd>Shift+Tab</kbd>
* Type a function name followed by a question mark (e.g. `round?`) and run the cell

## Online Documentation

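Finally, most Python package functions and all built-in functions have online documentation. The built-in functions are described on the 'Built-in Functions' page of the official Python documentation (docs.python.org).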

**Now let's do the blue Tasks 4.5 to 4.8 - around 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

## Errors Are Helpful

Errors are one of the most off-putting things for new programmers.

But, in reality, a good error message can be a really helpful tool. An error message can help you identify a mistake you've made, or a mistake that you might not notice until it's too late.

One thing's for sure: an error message is more helpful than code that runs without errors but returns an unexpected value.

**Now let's do the blue Tasks 4.10 to 4.13 - around 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

### `TypeError`

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 4.10:</strong> Run the cell below. What does this <code>TypeError</code> mean? Create a new Markdown cell and describe, in words you understand, what has gone wrong here. What useful information does the traceback tell you?
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/8d6Rg_4_ThA'>here</a> for a walkthrough.</div>

In [None]:
myVariable = "This is an example string."

round(myVariable)

### `SyntaxError`

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 4.12:</strong> Run the cell below. What does this <code>SyntaxError</code> mean? Create a new Markdown cell and describe, in words you understand, what has gone wrong here. What does the arrow in the traceback point to? Fix this error and run the cell again.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/HH8UQyzpmbk'>here</a> for a walkthrough.</div>

In [None]:
myVariable = = 'This is an example string.

# print(myVariable

### `NameError`

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 4.13:</strong> Run the cell below. What does this <code>NameError</code> mean? Create a new Markdown cell and describe, in words you understand, what has gone wrong here.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/tqIoriOKuh8'>here</a> for a walkthrough.</div>

In [None]:
myVariable = "This is an example string."

print(myString)

# What We've Covered So Far

* Python and Jupyter Lab
* Variables and types
* Functions and methods
* Printing and `.format()`
* Documentation and `help()`
* Errors are useful!

## Some Reminders About Errors

* `TypeError` indicates a function parameter has an incompatible data type, e.g. calling `round()` on a string
* `AttributeError` often indicates an object does not have the method you've tried to call, e.g. calling `.format()` on a number
* `SyntaxError` indicates a writing error in your code, usually an unclosed string or a missing parenthesis - look for the arrowhead (`^`)
* `NameError` indicates that the function or variable you've called doesn't exist (did you forget to run a previous cell?)

Remember: you can post questions in the chat and Miks will answer them there, or will ask me to explain if it's likely to be a question everybody has.

Also, feel free to email me: chas.nelson@glasgow.ac.uk.

# Python Lists (Notebook 05)

## The `list` data type

As scientists, we usually work with many samples or measurements. Creating a new variable for each measurement would make our code complex and cumbersome. With enough different variables we might as well do it all by hand.

In Python there are several ways of storing lots of values in a single variable. We will cover three of these in the course:
- Python lists
- NumPy arrays
- pandas DataFrames

The Python `list` is a data type (remember our other data types). Unlike, for example, `int`, the `list` data type is able to store many different values in one place.

A `list` is 'contained' within square brackets (`[...]`) and each element within the list is separated by a comma (`,`).
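A small sketch with made-up temperature readings and an illustrative variable name, showing the square brackets and commas. It also shows an empty list:

```python
temperatures = [21.5, 19.0, 23.25]
print(temperatures)  # prints [21.5, 19.0, 23.25]

# An empty list is just a pair of square brackets
no_measurements = []
print(no_measurements)  # prints []
```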

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 5.1:</strong> Look at the cell below to see an example of creating a list of numbers. Add a line to print the entire list and run the cell. How might we find out how many elements are in our list?
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/Im-ybtmQ8w0'>here</a> for a walkthrough.
</div>

In [None]:
myVariable = [100, 2.5, 17.0, 9.5, 12.0]

## Indexing (continued)

We learnt earlier about indexing character strings. Strings and lists are similar in some ways and indexing is one of them.

```python
my_string = "Hello world!"
print(my_string[4])  # will print "o"

my_list = ["H", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d", "!"]
print(my_list[4])  # will also print "o"
```

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 5.2:</strong> Create a new code cell below this one. Remembering that variables persist between cells, print only the first element in <code>myVariable</code>. Edit your code to print the last element in the list and run the cell again.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/qAbNbnwyrNg'>here</a> for a walkthrough.
</div>

Once you know the index of an element, you can also modify that element within the list.
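A minimal sketch of modifying elements by assigning to their index. The list `readings` and its values are made up for illustration:

```python
readings = [4.2, 3.9, 5.1]  # hypothetical measurements

readings[1] = 4.0   # replace the second element (index 1)
readings[-1] = 5.0  # replace the last element (index -1)
print(readings)     # [4.2, 4.0, 5.0]
```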

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 5.3:</strong> Create a new code cell below this one. Remembering the <code>type()</code> built-in Python function, print the data type of the first and last elements.
<br/>
If you get stuck, see the video <a href='https://youtu.be/uCbWE3z4EfM'>here</a> for a walkthrough that includes the answer for the next task too.
</div>

Lists can contain elements of different datatypes. In fact, lists can even contain other lists!

Beware, this is not always true of other data structures.
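A short sketch of a list holding mixed data types and a list of lists; the names `mixed` and `nested` are made up for illustration. Indexing twice reaches inside an inner list:

```python
mixed = [1, 2.5, "three", True]    # an int, a float, a str and a bool
nested = [[1, 2, 3], [4, 5, 6]]    # a list whose elements are themselves lists

print(type(mixed[2]))  # <class 'str'>
print(nested[1])       # [4, 5, 6]: the second inner list
print(nested[1][0])    # 4: the first element of the second inner list
```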

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 5.4:</strong> The first element of <code>myVariable</code> is an integer, whereas the other elements are floats. Correct this mistake by replacing the first element with the same number but as a float.
<br/>
When you've done this and the cell above, or if you get stuck, see the video <a href='https://youtu.be/uCbWE3z4EfM'>here</a> for a walkthrough.
</div>

## Slicing

As well as accessing individual elements of a list, we can access a 'slice' of elements.

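List slicing takes a `start` index, a `stop` index and a `step` size (which defaults to 1). The slice begins at `start` and ends just *before* `stop`:
```python
myVariable = [100.0, 2.5, 17.0, 9.5, 12.0]
start, stop, step = 1, 4, 1  # example values: begin at index 1, stop before index 4
slicedElements = myVariable[start:stop:step]  # [2.5, 17.0, 9.5]
```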

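**Important** Remember that Python starts indexing at `0` and that the last element can also be accessed as `-1`. Because a slice stops *before* its `stop` index, `myVariable[0:-1:2]` would miss the last element and return only `[100.0, 17.0]`. So accessing every other element of the whole list would be:
```python
myVariable = [100.0, 2.5, 17.0, 9.5, 12.0]
slicedElements = myVariable[0:5:2]  # [100.0, 17.0, 12.0]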
```
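A few more slicing patterns, sketched on the same list: leaving out `start` means "from the beginning", leaving out `stop` means "to the end", and a negative `step` walks backwards through the list.

```python
myVariable = [100.0, 2.5, 17.0, 9.5, 12.0]

print(myVariable[:2])    # first two elements: [100.0, 2.5]
print(myVariable[3:])    # from index 3 to the end: [9.5, 12.0]
print(myVariable[::2])   # every other element: [100.0, 17.0, 12.0]
print(myVariable[::-1])  # the whole list reversed: [12.0, 9.5, 17.0, 2.5, 100.0]
```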

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 5.5:</strong> Create a new code cell below. In it, slice <code>myVariable</code> to access the third, fourth and fifth elements and store them as a new variable. Print that variable.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/v92RAPgBISY'>here</a> for a walkthrough.
</div>

## `list` Methods

There are many methods for the list data type. You can find a list of `list` methods with the help function or using online documentation.

Remember that methods (like `str.format()`) act on an object, which comes before the `.`, and (like functions) accept input parameters, which go between the parentheses.

Note that many of these methods work 'in-place', i.e. they update the variable they act on (and thus return `None`).

We are going to focus on two methods that are commonly used:

* `list.append()`, which appends a new element to the end of a list in-place and returns `None`
* `list.pop()`, which removes an element from a list in-place and returns that element
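Because both methods work in place, a common slip is to write `samples = samples.append(3.0)`, which replaces the list with `None`. The sketch below (using a made-up list `samples`) shows what each method returns; it also shows that `pop()` with no argument removes the *last* element:

```python
samples = [1.5, 2.0, 2.5]  # hypothetical data

result = samples.append(3.0)  # modifies samples in place...
print(result)                 # ...and returns None
print(samples)                # [1.5, 2.0, 2.5, 3.0]

last = samples.pop()          # no index given, so the last element is removed
print(last)                   # 3.0
print(samples)                # [1.5, 2.0, 2.5]

first = samples.pop(0)        # remove (and return) the element at index 0
print(first)                  # 1.5
print(samples)                # [2.0, 2.5]
```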

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 5.8:</strong> Run the cells below to see examples of both of these methods.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/fSbbaUNLc0k'>here</a> for a walkthrough.
</div>

In [None]:
myVariable = [100.0, 2.5, 17.0, 9.5, 12.0]
print("My original list:")
print(myVariable)

In [None]:
myVariable.append(51.6)  # add some new data
print("My list with an extra element:")
print(myVariable)

In [None]:
myElement = 0  # the index of the element to 'pop'
myValue = myVariable.pop(myElement)  # remove an anomalous piece of data
print("My list with a removed element:")
print(myVariable)
print("I removed element {0} with value {1}".format(myElement, myValue))

# Errors are helpful (continued)

## `IndexError`

Now that we're comfortable with indexing and slicing, we can introduce a new error: `IndexError`.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 5.9:</strong> Run the cell below. What does this <code>IndexError</code> mean? Create a new Markdown cell and describe, in words you understand, what has gone wrong here.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/uKomV_6PPo4'>here</a> for a walkthrough.
</div>

In [None]:
print(myVariable[100])

## Key Points

* `list`s are a Python data type
* `list`s can store many values (of different data types) as a single variable
* `list`s can be indexed and sliced to access the elements within the list
* Python has methods for modifying `list` variables, such as `list.append()` and `list.pop()`
* Errors are helpful! (again)
* `IndexError` indicates an attempt to access an index beyond the end of a list

**Now let's do the blue Tasks - 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

In [None]:
myVariable = [100, 2.5, 17.0, 9.5, 12.0]

print(len(myVariable))  # How long is my list (Task 5.1)

print(myVariable[0])  # The first element of my list (Task 5.2)
print(myVariable[-1])  # The last element of my list (Task 5.2)

print(type(myVariable[0]))  # The type of the first element of my list (Task 5.3)
print(type(myVariable[-1]))  # The type of the last element of my list (Task 5.3)

In [None]:
myVariable[0] = float(myVariable[0])  # or `myVariable[0] = 100.0` (Task 5.4)
print(myVariable)

newVariable = myVariable[2:5]  # The third, fourth and fifth elements as a new variable (Task 5.5)
print(newVariable)

newVariable.append(10.0)  # append a new number
print(newVariable)

# `for` Loops in Python (Notebook 06)

## Loops

As with lists, we usually want to process many samples or measurements. Writing a line of code to process each measurement would make our codes complex and cumbersome. Like with lists, with enough variables to process, we might as well do it all by hand.

There are two key ways to do repetitive tasks in Python - the `for` loop and the `while` loop. They work in slightly different ways but we'll only cover `for` loops today.

## `for` loops

In Python, a `for` loop takes a group of values and does an operation on each value in turn.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 6.1:</strong> Run the next two cells and compare their outputs. Imagine you had hundreds of values to process, which would you rather do?
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/ToKt2Wi9BMc'>here</a> for a walkthrough.</div>

In [None]:
print(1)
print(1 * 1)
print(2)
print(2 * 2)
print(5)
print(5 * 5)
print(9)
print(9 * 9)

In [None]:
for number in [1,2,5,9]:
    print(number)
    print(number * number)

Let's break this `for` loop down:

* The `for` statement tells Python we want to run a loop.
* `number` is a variable that the loop updates on each pass. In this course we call it the *iterator*, although it is more precisely called the *loop variable*. It always represents "the thing we want to process now". Unlike in some other languages, it still exists after the loop has finished, holding the last value it took.
* `in [1,2,5,9]` tells the `for` loop that `number` should take (in turn) the values *in the list* `[1,2,5,9]`.
* Finally, `print(number)`, or whatever code we choose to put inside the loop, is the code we want to repeat.

But there are two other extremely important features of the cell above:

1. The `for` statement must end in a colon; this signals the start of the code you want to repeat.
2. The body of the loop must be indented. Four spaces is the usual convention, but any consistent indentation (e.g. two spaces or one tab) works, as long as you do not mix tabs and spaces. Thankfully, Jupyter should autoindent lines following a colon for you.
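A short sketch of a loop body with more than one indented line (the list `squares` is made up for illustration). Both indented lines run once per value; the final, unindented `print()` runs only once, after the loop has finished:

```python
squares = []  # an empty list to collect results

for number in [1, 2, 5, 9]:
    square = number * number  # indented: runs once per value
    squares.append(square)    # indented: runs once per value

print(squares)  # not indented: runs once, printing [1, 4, 25, 81]
```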

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 6.2:</strong> Read the next cell carefully. What do you think the value of <code>myVariable</code> will be during each step of the loop? Add a call to <code>print()</code> in the cell and check that you're right.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/fBlAf0dR9L8'>here</a> for a walkthrough.</div>

In [None]:
myVariable = 0

for number in [1, 2, 5, 9]:
    myVariable = myVariable + number

## Errors are helpful (continued)

### IndentationError #1

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 6.3:</strong> Run the cell below. What does this <code>IndentationError</code> mean? Create a new Markdown cell and describe, in words you understand, what has gone wrong here. Modify the cell to fix the error and re-run it.
<br/>
If you get stuck, see the video <a href='https://youtu.be/G-FUA8TcDrE'>here</a> for a walkthrough, which also includes the next two tasks.</div>

In [None]:
for number in [1, 2, 3]:
print(number)

### IndentationError #2

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 6.4:</strong> Run the cell below. What does this <code>IndentationError</code> mean? Create a new Markdown cell and describe, in words you understand, what has gone wrong here. Modify the cell to fix the error and re-run it.
<br/>
If you get stuck, see the video <a href='https://youtu.be/G-FUA8TcDrE'>here</a> for a walkthrough, which also includes the previous and next tasks.</div>

In [None]:
firstName = 'Chas'
    lastName = 'Nelson'

### IndentationError #3

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 6.5:</strong> Run the cell below. What does this <code>IndentationError</code> mean? Create a new Markdown cell and describe, in words you understand, what has gone wrong here. Modify the cell to fix the error and re-run it.
<br/>
When you've done this and the above two cells, or if you get stuck, see the video <a href='https://youtu.be/G-FUA8TcDrE'>here</a> for a walkthrough, which also includes the previous two tasks.</div>

In [None]:
for number in [1, 2, 3]:
    print(number)
   print(number*number)

## Key Points

* Loops enable your Python codes to repeat tasks
* A `for` loop runs code on an iterator, which takes each value in a list (one-by-one)
* `for` loops must have a colon and everything after the colon should be indented
* There can be many lines of code inside a `for` loop.
* Errors are helpful! (yet again)

**Now let's do the blue Tasks - around 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

In [None]:
myVariable = 0

for number in [1, 2, 5, 9]:
    myVariable = myVariable + number

print(myVariable)

## `for` loops (continued)

### `range()`

Now that we've seen how a `for` loop is constructed using a list of numbers, what if we want to run some code over a range of regularly spaced values?

Python has a built-in function to help with this: `range()`.

```python
for number in range(10):
    print(number)
```
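`range()` can also take a start value and a step. A sketch (each range is converted to a `list` only so that we can see its values); as with list slicing, the stop value is excluded:

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]: starts at 0, stops before 5
print(list(range(2, 6)))       # [2, 3, 4, 5]: starts at 2, stops before 6
print(list(range(0, 20, 5)))   # [0, 5, 10, 15]: steps of 5
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]: a negative step counts down

for number in range(1, 10, 4):
    print(number)  # prints 1, then 5, then 9
```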

**Try Tasks 6.6 and 6.7 over the weekend if you haven't had chance to do them.**

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 6.6:</strong> Run the cell below to see the documentation for the <code>range()</code> function.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/Y3EVembmdvs'>here</a> for a walkthrough.</div>

In [None]:
?range

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 6.7:</strong> The following cell loops through the numbers 0 to 9 (inclusive) and prints them one-by-one. Modify the code to print the numbers 90 to 100 (inclusive).
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/iE0ZFyKPh-8'>here</a> for a walkthrough.</div>

In [None]:
for number in range(10):
    print(number)

# `if` Statements in Python (Notebook 07)

## Conditionals and `if` statements

In programming a 'conditional' gives code the ability to make a choice - usually a yes/no choice.

The most common conditional is the `if` statement, which takes the following form:

```python
if <BooleanExpression>:
    print('Do something in this special situation.')
```

Like a `for` loop, there are two special features of `if` statements:

1. `if` statements must end in a colon; this signals the start of the code you want to run in that situation.
2. The body of an `if` statement (called a branch) must be indented. Thankfully, Jupyter should autoindent lines following a colon for you.

## Boolean expressions

But what about the `<BooleanExpression>` in the template above?

A Boolean expression is any expression that can be evaluated as true or false, e.g.

* $1=1$ is true
* $1=2$ is false
* $1<2$ is true
* $4\ge5$ is false
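In Python, the four statements above are written with comparison operators: `==` for "equals" and `>=` for "greater than or equal to". A quick sketch:

```python
print(1 == 1)  # True
print(1 == 2)  # False
print(1 < 2)   # True
print(4 >= 5)  # False

# Other comparison operators: != (not equal), <= (less than or equal), > (greater than)
# Boolean expressions can be combined with `and`, `or` and `not`
print(1 < 2 and not 4 >= 5)  # True
```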

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 7.1:</strong> Run the following cells to see different examples of boolean expressions.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/Au420nePsJ0'>here</a> for a walkthrough.</div>

In [None]:
45 == 45

In [None]:
7 >= 6 or 3 > 1

In [None]:
True and False

In [None]:
True and "same string" == "same string"

In [None]:
"same string" == "different string"

In [None]:
123 != "123"

Note that in Python, to test whether two values are equal we use a double equals sign, `1 == 1`; a single `=` assigns a value to a variable.

Like `None` (see Built-in Functions, Help and Documentation, above), `True` and `False` are special Python values.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 7.2:</strong> Run the following cell to see a simple example of a Boolean expression. In the same cell, write an <code>if</code> statement that only prints a message if <code>myVariable</code> equals one.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/loaDJy1E1Lo'>here</a> for a walkthrough.</div>

In [None]:
myVariable = 1
# here we assign a value of 1 to our variable

print("An example of a False Boolean expression:")
print(myVariable == 10)  # our variable does not equal 10, i.e. this statement is False

print("An example of a True Boolean expression:")
print(myVariable == 1)  # our variable does equal 1, i.e. this statement is True

## `if`, `elif` and `else`

So an `if` statement allows us to run some code only in a specific situation (a branch), but what if we have more than one situation (multiple branches)? Or what if we want some default code that runs whenever the `if` branch is not taken?

Python has two additional statements for these cases:

* `elif`, short for 'else if', allows us to use multiple conditions and create several branches of code.
* `else` lets us provide alternative code that runs only if no other branch is taken.

Note that the conditions are tested in order, from top to bottom, and only the first branch whose condition is `True` is run.
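As a sketch of why the order matters (the variable `value` is made up for illustration), here both the `if` and the `elif` conditions are true, but only the first matching branch runs:

```python
value = 20

if value > 10:
    message = "greater than 10"  # this condition is True, so this branch runs...
elif value > 5:
    message = "greater than 5"   # ...and this one is skipped, even though 20 > 5
else:
    message = "5 or less"

print("value is {0}".format(message))  # value is greater than 10
```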

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 7.4:</strong> Read the following cell. What do you expect the output to be? Run the cell to confirm this. Now change the value of <code>myVariable</code> so that a different branch is taken. Re-run the cell and repeat for the third branch.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/7ncn_MV7yyQ'>here</a> for a walkthrough.</div>

In [None]:
myVariable = 1

if myVariable == 1:
    print("myVariable is equal to one")
elif myVariable == 2:
    print("myVariable is equal to two")
else:
    print("myVariable equals anything else")

## Combining conditionals and loops

An `if` statement is often most useful inside a `for` loop: the `for` loop runs through a series of values and the `if` statement chooses what to do with each value.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 7.6:</strong> Read the cell below. This <code>for</code> loop aims to go through a list of values, adding each value greater than 10 to a total and otherwise printing a warning to the user that the value has been ignored. Replace all the gaps (<code>____</code>) in the cell so that it runs without errors and produces the right output values.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/hx1AGxik1Y0'>here</a> for a walkthrough.</div>

In [None]:
myData = [12,9,13,19,2,16,7,10,4,1,6,18,11]
myTotal = ____

for number in ____:
    if number ____ 10:
        ____ = myTotal + number
    else:
        print('Warning: Program did not use value {0} as it was too low.'.format(number))

print('The total value is {0}'.format(____))

## Key Points

* Conditionals enable your Python codes to choose between options
* An `if` statement uses a Boolean expression to decide whether or not to execute some code (a branch)
* `if` statements must have a colon and everything after the colon should be indented
* There can be many lines of code inside an `if` statement (a branch).
* `elif` statements can be used to include additional branches of code
* `else` statements can be used to run a branch of code when no `if` or `elif` statements are run
* Conditionals are tested in order.

**Now let's do the blue Tasks - 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

In [None]:
myData = [12, 9, 13, 19, 2, 16, 7, 10, 4, 1, 6, 18, 11]
myTotal = 0

for number in myData:
    if number > 10:
        myTotal = myTotal + number
    else:
        print(
            "Warning: Program did not use value {0} as it was too low.".format(number)
        )

print("The total value is {0}".format(myTotal))

# Defining Custom Python Functions

## Functions

Programs can easily become large and unwieldy. Functions enable us to break programs down into smaller chunks, making our code easier to understand *and* easier to debug.

A further benefit of functions is that they can be reused - it's the 'write once, use often' philosophy of coding.

We've already called several built-in functions, which might seem a little like black boxes, but below we will define our own functions and control their internal workings ourselves.

## Defining Functions with `def`

Defining our own functions in Python uses a `def` statement and takes the following form:

```python
def my_function():
    print('My custom function has been called.')
    return
```

There are six important features of `def` statements:

1. Our definition starts with `def`.
2. This is followed by a name for our custom function. Function names follow the same rules as variable names.
3. After the function name we need a pair of parentheses (we will come back to these below).
4. `def` statements must end in a colon; this signals the start of the code you want to run when your custom function is called.
5. The body of a `def` statement must be indented. Thankfully, Jupyter should autoindent lines following a colon for you.
6. A function can end with a `return` statement, which sends a value back to wherever the function was called (refer back to Returns in Built-in Functions, Help and Documentation above). If no value is given after `return` (as here), or if there is no `return` statement at all, the function returns `None`.
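A minimal sketch of what a function with no `return` statement at all hands back; the function `say_hello` is hypothetical:

```python
def say_hello():
    print("Hello!")  # print a greeting; there is no return statement

result = say_hello()  # prints "Hello!"
print(result)         # None: no value was returned
```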

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 8.1:</strong> Run the following cell to define <code>my_function</code>, as defined above. Why does the cell not print anything?
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/s7Mqud_YJDc'>here</a> for a walkthrough.</div>

In [None]:
def my_function():
    print("My custom function has been called.")
    return

## Functions and Parameters

Many of the built-in functions we've used so far have input parameters.

These are defined by putting the name of a parameter between the parentheses of our `def` statement. In fact, we can include several parameters, like so:

```python
def print_date(year, month, day):
    print('The ISO format date is {0}-{1:02}-{2:02}'.format(year,month,day))
    return
```

N.B. The formatting `{0:02}` means 'format (`:`) the first parameter (`0`) with leading zeros (`0`) to a width of two digits (`2`), e.g. `'{0:02}'.format(1)` will print `01`.
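A quick sketch of calling `print_date` (redefined here so that the cell is self-contained) and of the zero-padding format on its own; the expected output is shown in the comments:

```python
def print_date(year, month, day):
    print('The ISO format date is {0}-{1:02}-{2:02}'.format(year, month, day))
    return

print_date(2019, 4, 5)      # The ISO format date is 2019-04-05

print('{0:02}'.format(1))   # 01: padded with a leading zero to width 2
print('{0:02}'.format(12))  # 12: already two digits wide, so unchanged
print('{0:03}'.format(7))   # 007: padded to width 3
```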

### Optional Parameters

We can even make optional parameters. More precisely, we can give parameters default values, which are used whenever the function is called without that parameter.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 8.2:</strong> Run the following cell to define and call <code>print_date</code>. How is this definition different to the one above? Add a second call to the function but do not use the <code>day</code> parameter.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/7ciBw5-cc90'>here</a> for a walkthrough.</div>

In [None]:
def print_date(year, month, day=1):
    # Our function defaults to the first day of the month if the day is not given
    print("The ISO format date is {0}-{1:02}-{2:02}".format(year, month, day))
    return


print_date(2019, 4, 5)

## Functions and Returns

So far we've left the return blank, which by default returns `None`.

Much of the time our functions will need to return a value.
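A minimal sketch of a function that returns a value, using a hypothetical `celsius_to_fahrenheit` conversion. The caller can store the returned value in a variable or use it directly:

```python
def celsius_to_fahrenheit(celsius):
    fahrenheit = celsius * 9 / 5 + 32
    return fahrenheit  # send the result back to the caller

boiling = celsius_to_fahrenheit(100)  # store the returned value
print(boiling)  # 212.0

# ...or use the returned value directly
print("Body temperature is about {0:.1f} F".format(celsius_to_fahrenheit(37)))
```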

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 8.5:</strong> Read the cell below. It defines a function <code>my_classifier</code> that takes a size (a number) and returns the name of the appropriate size category (a string). Replace all the gaps (<code>____</code>) in the cell so that it runs over all the sizes and without errors.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/0HokFLxlxog'>here</a> for a walkthrough.</div>

In [None]:
def my_classifier(____):
    if size >= 75:
        category = "Large"
    elif size____25:
        ____ = "Medium"
    else:
        category = "Small"
    return ____


mySizes = [81, 61, 18, 69, 78, 66, 41, 65, 84, 18, 64, 16, 35, 65, 84, 89, 45, 13, 64]

for ____ in mySizes:
    thisCategory = ____(thisSize)
    print("This item is {0} with a size of {1}".____(____, ____))

# Python Modules

## Python Modules: Doing More

At some point, very soon, you will realise that not everything you can think of can be done with built-in functions alone. You have learnt how to write your own functions, but common problems already have common solutions written by other, more experienced programmers.

Python itself provides an extensive library of [modules](https://docs.python.org/3/library/index.html) which are collections of functions (and other objects, like datatypes) allowing us to do some more specific tasks. 

Let's follow a simple example using the `math` module. `math` is a module focussed on simple mathematical functions and data types in Python.

First, let's calculate the factorial of 6 manually:
<br>
$6! = 1 \times 2 \times 3 \times 4 \times 5 \times 6$

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 9.1:</strong> Edit and run the following cell to calculate the factorial of 6 manually. Then run the next cell, which tries to use a built-in function to calculate the factorial. What does this <code>NameError</code> mean? Create a new Markdown cell and describe, in words you understand, what has gone wrong here.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/eeqdF6EJNQ8'>here</a> for a walkthrough.</div>

In [None]:
factorial_of_6 = 6 * 5 * 4 * 3 * 2 * 1
print("Factorial of 6 is {0}".format(factorial_of_6))

In [None]:
# Let's assume there is a built-in factorial function. There should be one, right?
factorial_using_builtin = factorial(6)

So Python has no built-in `factorial` function. However, there is a `factorial` function in `math`, which also covers many other simple mathematical operations. We will now _load_ the `math` module into our notebook environment.

To load a module we use the command `import <module name>`, e.g. `import math`. Once the module is imported, its functions and variables are available to use, much like built-in functions and variables. Note that you only need to import a module once per notebook session, although you will need to re-run the import if you restart the kernel.

To call a function from the imported module we use the syntax `<module name>.<function name>`, e.g. `math.factorial()`

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 9.2:</strong> Run the following cell to import the <code>math</code> module and calculate the factorial of 6. Fill in the blanks (<code>____</code>) and execute the code in the subsequent two cells to see more examples of the <code>math</code> module.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/80tzs58Hc4s'>here</a> for a walkthrough.</div>

In [None]:
import math

factorial_using_math = math.factorial(6)
print("Factorial of 6 is {0}".format(factorial_using_math))

In [None]:
pi_val = ____.pi
print("Value of Pi with 6 digits accuracy is {0:.6f}".format(pi_val))

In [None]:
sin_of_pi = ____.____(____.____)
print("Sine of Pi (180 degrees) with 2 digits accuracy is {____:____f}".format(____))

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 9.3:</strong> Run the following cell to see high level information about the <code>math</code> module and a list of available functions.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/eiIMqMb26lc'>here</a> for a walkthrough.</div>

In [None]:
help(math)

## Different Ways to Import

### Importing Functions Directly

In some cases you might need to use one specific function multiple times. But importing the whole module (like above) requires you to use the `<module name>.<function name>` syntax to access functions within that module. To make it simpler we can import one or more specific functions directly into our 'namespace'.

To do that you can use the `from <module name> import <function name>` syntax, e.g.:

```python
from math import log
```
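The same syntax can bring in more than one name at once, separated by commas. A sketch using `sqrt` and the constant `pi` from `math` (chosen only as examples):

```python
from math import sqrt, pi  # import two names from the math module at once

print(sqrt(16))      # 4.0 (no need to write math.sqrt)
print(round(pi, 4))  # 3.1416
```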

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 9.5:</strong> Run the following cell. Is the logarithm function available in pure Python? Add a line at the top of the cell to load the logarithm function from the <code>math</code> module and re-run the cell.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/MrxXjuE0X6s'>here</a> for a walkthrough.</div>

In [None]:
# Using logarithm function math.log()
log1010 = math.log(10, 10)
print("Logarithm of 10 with base 10 is {0} ".format(log1010))

# Using logarithm function log()
log8_2 = log(8, 2)
print("Logarithm of 8 with base 2 is {0} ".format(log8_2))
print("Well done!")

### Importing with Aliases

When we import libraries it is possible to shorten the library name to make coding easier. For instance, the module `datetime` ([details here](https://docs.python.org/3/library/datetime.html)) provides us with essential functionality for dealing with dates and time but its name is unnecessarily long.

You can use the syntax `import <module name> as <module alias>` to make your own short name for an imported module.

<div style="background-color:#abd9e9; border-radius: 5px; padding: 10pt"><strong>Task 9.6:</strong> Run the following cell to see an example of importing the <code>datetime</code> module using an alias. You will notice that the <code>datetime</code> module contains a class of the same name (<code>datetime.datetime</code>), an unfortunate complication.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/PToi3HohJlk'>here</a> for a walkthrough.</div>

In [None]:
import datetime as dt

current_time = dt.datetime.now()
print("Current time is {0}".format(current_time))

## Key Points

* Modules hold functions for many operations you may want to do
* It's easier to import a module than write your own function from scratch
* You can import specific functions or a whole module at a time
* You can use an alias when importing to make your code easier to understand
* Details for modules can be found in their documentation

**Now let's do the blue Tasks - 15 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

# Python Modules - NumPy and SciPy

## Numpy

NumPy (`numpy`) is an extremely well-developed module focussed on simple and complex mathematical functions and data types in Python. It is also a large module, so we will only introduce you to a couple of its functions today.

One of the key features of `numpy` is the introduction of a new datatype: the `numpy.ndarray`.
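One quick way to see that a `numpy.ndarray` behaves differently from a `list` is arithmetic. In this sketch (the names `numbersList` and `numbersArray` are made up for illustration), multiplying a list by 2 repeats it, whereas multiplying an array by 2 multiplies each element:

```python
import numpy as np

numbersList = [1, 2, 3]
numbersArray = np.array(numbersList)  # convert the list into a numpy.ndarray

print(numbersList * 2)     # [1, 2, 3, 1, 2, 3]: the list is repeated
print(numbersArray * 2)    # [2 4 6]: each element is doubled
print(type(numbersArray))  # <class 'numpy.ndarray'>
```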

In [None]:
import numpy as np

In [None]:
# Create a 3x3 array with the number 1 to 9
myArray = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(myArray)
print(myArray.shape)
print(myArray.dtype)

In [None]:
# Calculate the mean and standard deviation of myArray
print("The mean of myArray is {0:.2f} ...".format(myArray.mean()))
print("...and the standard deviation is {0:.2f}.".format(myArray.std()))

In [None]:
# Calculate the mean of each column and of each row
print("The mean of each column of myArray is:")
print(myArray.mean(axis=0))
print("The mean of each row of myArray is:")
print(myArray.mean(axis=1))

The `numpy.ndarray` is particularly important if you plan to analyse images or 2D+ data, e.g. geological recordings.

However, in the next notebook we will introduce the pandas DataFrame. This is another new data type and is built upon the `numpy.ndarray`. Many `numpy.ndarray` methods are also defined for pandas DataFrames.

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.2:</strong> Find the NumPy Documentation on-line. Can you easily navigate the documentation to find useful functions?
<br/>
When you've done this and the previous task, or if you get stuck, see the video <a href='https://youtu.be/KXEYPE4ryAU'>here</a> for a walkthrough, which also covers the previous task.</div>

## Key Points

* NumPy increases the functionality of Python significantly
* NumPy provides mathematical and statistical functions
* Whilst NumPy documentation can look overwhelming, it becomes much easier to navigate with practice

**Now let's do the orange Exercises up to 10.3 - 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

### Slicing (...continued)

**Note: the following topic is not in your notebook, but I have updated the ones on GitHub for you to download later.**

Just like with a `list`, it is often useful to access particular elements of a `numpy.ndarray`, e.g. access specific pixels in an image.

Also just like a list, this is done with square brackets, using the same index and slice syntax.

**Now let's do the orange Exercises up to 10.5 - 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

Similarly, we can navigate a 2D `numpy.ndarray` by using row and column numbers to extract a single element.

Remember: Python starts counting at zero so our axes are the 0th and 1st axes.
  
![Acessing pixels using axes.](../assets/arrays.png)

*Adapted from https://github.com/elegant-scipy/elegant-scipy*
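A sketch of indexing a small 2D array by `[row, column]`; the array `grid` is made up for illustration. As with lists, a colon selects a whole range along one axis:

```python
import numpy as np

grid = np.array([[10, 11, 12],
                 [20, 21, 22]])  # 2 rows (axis 0) by 3 columns (axis 1)

print(grid[0, 2])    # 12: row 0, column 2
print(grid[1, -1])   # 22: row 1, last column
print(grid[:, 1])    # [11 21]: every row, column 1
print(grid[1, 0:2])  # [20 21]: row 1, columns 0 and 1
```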

## Key Points

* NumPy arrays can easily be accessed with indexing and slicing, just like lists
* Errors are helpful! (again)

**Now let's do the orange Exercises up to 10.9 - 10 minutes in breakout rooms (with 5 minute wrap-up afterwards).**

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.5:</strong> The code cell beneath this one has a 1-dimensional <code>numpy.ndarray</code>. How might you access the fifth element of this array?</div>

In [None]:
my_array = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

print(my_array[4])

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.6:</strong> How might you access the second, third, fourth and fifth elements of this array?</div>

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.7:</strong> The code cell beneath this one has a 2-dimensional <code>numpy.ndarray</code>. How might you access the second element of the second row of this array?</div>

In [None]:
my_2d_array = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]])

print(my_2d_array)

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.8:</strong> The code cell beneath this one has a 3-dimensional <code>numpy.ndarray</code>. How might you access the second element of the second row of the second slice of this array?</div>

In [None]:
my_3d_array = np.array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])

print(my_3d_array)

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.9:</strong> Now run the following cell. Do you recognise the error? Create a new Markdown cell and describe this error in a way that's clear to you.</div>

In [None]:
print(my_3d_array[1, 1, 2])

In [None]:
print(my_array[4])

print(my_array[1:5])

print(my_2d_array[1, 1])

print(my_3d_array[1, 1, 1])

print(my_2d_array[1, :])

## SciPy

SciPy (`scipy`) is another large and extremely well-developed module, but it is focussed on mathematical, scientific and engineering functions and datatypes for Python. SciPy is also too large to cover in detail, so we will only introduce you to one key function right now.

### Importing SciPy

Importing `scipy` is a little bit unusual. `scipy` has several large submodules, and if you want to access the functions in these submodules, each must be imported as an individual module. For example, say you want to do some linear regression (which is in the `scipy.stats` submodule) and some image processing (using functions from `scipy.ndimage`): you need to import both submodules. E.g.:

```python
import scipy.stats
import scipy.ndimage
```

In [None]:
import scipy.stats
import scipy.ndimage

# Load data
x = np.arange(0, 9, 1)  # create an array of the numbers 0 to 8 (the stop value, 9, is excluded)
y = np.arange(0, 18, 2)  # create an array of the numbers 0 to 16 in steps of 2 (18 is excluded)
im = np.zeros([10, 10])  # create a 10 by 10 array of zeros, i.e. an empty image

# Linear Regression of x and y
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x, y)

# Apply Gaussian Filter to im
imBlurred = scipy.ndimage.gaussian_filter(im, sigma=5)

### Linear Regression

The SciPy module contains a lot of useful stats functions, including t-tests and linear regressions. Due to time constraints we will only explain the linear regression function (which we've already used above).
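
As a quick sanity check of what `scipy.stats.linregress` returns, here is a sketch on made-up data that lies exactly on the line y = 2x + 1, so we know in advance what the slope and intercept should be.

```python
import numpy as np
import scipy.stats

# Made-up data lying exactly on the line y = 2x + 1
x = np.array([0, 1, 2, 3, 4])
y = 2 * x + 1

result = scipy.stats.linregress(x, y)

# The result can be unpacked into five values, as in the cell above...
slope, intercept, r_value, p_value, std_err = result

# ...or its fields can be read by name
print(result.slope)      # close to 2.0
print(result.intercept)  # close to 1.0
print(result.rvalue)     # close to 1.0, i.e. a perfect positive correlation
```

Because the points fit the line perfectly, `std_err` comes out as (essentially) zero; with real, noisy measurements it will not.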

In [None]:
time_seconds = [0, 1, 2, 3, 5, 7, 14, 15, 16, 17, 18, 19]
distance_metres = [0, 9, 22, 30, 48, 74, 130, 148, 160, 170, 181, 189]

new_time_seconds = np.arange(0, 20, 1)

## Key Points

* Like NumPy, SciPy increases the functionality of Python significantly
* SciPy provides scientific and engineering functions
* Whilst SciPy documentation can look overwhelming, it can easily be interpreted

**Now let's do the orange Exercises up to 10.12 - 10 minutes in breakout rooms (with a 5-minute wrap-up afterwards).**

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.11:</strong> Load the documentation for <code>scipy.stats.linregress</code>. Create a Markdown cell beneath this one and write, in simple English, what each of the two parameters and five outputs mean.
<br/>
When you've done this, or if you get stuck, see the video <a href='https://youtu.be/yw6TEfzuujM'>here</a> for a walkthrough.</div>

<div style="background-color:#fdae61; border-radius: 5px; padding: 10pt"><strong>Exercise 10.12:</strong> The data above represents some simple experimental data. You have two arrays: <code>time_seconds</code>, which records the time in seconds at which each measurement was taken, and <code>distance_metres</code>, which records the distance travelled at that time (in metres). As you might notice, the times at which the data was taken are unevenly distributed (let's pretend that your colleague came in with a box of doughnuts and distracted you!), so you want to interpolate your data to give you measurements at evenly distributed points.

Now, you could do this with a for loop and lots of maths... but that isn't the Python way (if you can help it).
    
Working as a team, find an appropriate function in SciPy, go through the documentation together and try to create interpolated data for `new_time_seconds`.
    
We will plot this data next week.

Question: What does `np.arange()` do? How is it different to the `range()` function we've seen before?</div>

In [None]:
import scipy.stats
import scipy.ndimage

# Load data
x = np.arange(0, 9, 1)  # create an array of the numbers 0 to 8 (the stop value, 9, is excluded)
y = np.arange(0, 18, 2)  # create an array of the numbers 0 to 16 in steps of 2 (18 is excluded)
im = np.zeros([10, 10])  # create a 10 by 10 array of zeros, i.e. an empty image

# Linear Regression of x and y
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(x, y)

# Apply Gaussian Filter to im
imBlurred = scipy.ndimage.gaussian_filter(im, sigma=5)

In [None]:
from scipy import interpolate

interpolation = interpolate.interp1d(time_seconds, distance_metres)

new_distance_metres = interpolation(new_time_seconds)
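
To see what linear interpolation actually does, here is a tiny sketch with two made-up measurements. `interp1d` returns a new *function*, which we then call on the times we want estimates for; `np.interp` is shown alongside only as a cross-check that gives the same straight-line estimates here.

```python
import numpy as np
from scipy import interpolate

# Two made-up measurements: 0 m at 0 s and 10 m at 4 s
times = [0, 4]
distances = [0, 10]

# interp1d returns a function; kind="linear" is the default
linear = interpolate.interp1d(times, distances)
estimates = linear(np.array([0, 1, 2, 3, 4]))
print(estimates)  # straight-line estimates between the two measurements

# NumPy's np.interp gives the same linear estimates for this case
print(np.interp([0, 1, 2, 3, 4], times, distances))
```

By default, the function returned by `interp1d` raises a `ValueError` if you ask for a value outside the measured range, which is why `new_time_seconds` stays within the times in `time_seconds`.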

## For Friday: Please email me with all of your "Where can I go now?" questions and I'll try to answer them all.

# Python Modules - pandas

## pandas

pandas (`pandas`) is a large and well-developed module focused on data analytics functions and datatypes in Python. As pandas is a large module, we will only introduce you to a couple of its functions today.

### DataFrames

One of the key features of pandas is the introduction of another new datatype: the `pandas.DataFrame`.

We will use the `pandas.DataFrame` for the mini-project this afternoon (on your own data if you brought it).

A `DataFrame` is a collection of `Series`. You can consider a `DataFrame` to be a table of data and a `Series` to be a column of data.
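
A minimal sketch of that relationship, using a small made-up table: building a `DataFrame` from a dictionary makes each key a column, and selecting a single column hands back a `Series`.

```python
import pandas as pd

# A small, made-up table: each dictionary key becomes a column
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "species": ["setosa", "setosa", "virginica"],
})

column = df["species"]  # selecting one column gives a Series

print(type(df))
print(type(column))
print(df.shape)  # (3, 2): 3 rows, 2 columns
```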

pandas is built on top of NumPy and many NumPy array methods can be applied to `DataFrames` and `Series`.

If you're used to the `R` programming language, then `DataFrames` and `Series` may already be familiar to you, although Python has its own special ways of dealing with them.
There are many benefits to using a `DataFrame` instead of a NumPy array, including pandas' ability to deal with missing values and to perform relational database operations (such as joins) between DataFrames.
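
Both benefits can be sketched with two small, made-up tables: pandas skips a missing value (`NaN`) when taking a mean, and `.merge()` performs a database-style join on a shared column. The column names `sample`, `length` and `site` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up measurements with one missing value (NaN)
lengths = pd.DataFrame({"sample": ["a", "b", "c"], "length": [1.0, np.nan, 3.0]})

# pandas skips missing values by default when summarising: (1.0 + 3.0) / 2
mean_length = lengths["length"].mean()

# A second made-up table that shares the "sample" column
sites = pd.DataFrame({"sample": ["a", "b", "c"], "site": ["north", "south", "north"]})

# A database-style join: match rows on the shared "sample" column
combined = lengths.merge(sites, on="sample")

print(mean_length)  # 2.0
print(combined)
```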

### Loading CSV Data

The easiest way to load data as a DataFrame with pandas is to read a comma-separated values (CSV) file. These can easily be exported from Excel or similar software if you already have data in a different format.

**N.B.** `display()` can be used instead of `print()`; it is a Jupyter feature for pretty-printing complex objects like DataFrames.
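
Before loading the real iris data from the web, here is a sketch of what `pd.read_csv` does with a tiny made-up CSV held in a string. `io.StringIO` wraps the string so it behaves like an open file, which `read_csv` accepts just as it accepts a file path or URL.

```python
import io

import pandas as pd

# A tiny made-up CSV file: a header row, then one row per flower
csv_text = """species,sepal_length,sepal_width
setosa,5.1,3.5
versicolor,7.0,3.2
virginica,6.3,3.3
"""

# The header row becomes the column labels; numeric columns become floats
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)
```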

In [None]:
import pandas as pd

iris = pd.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
)
print("Our data is of type {0}".format(type(iris)))  # print the datatype
display(iris.head())  # print only the first five rows

### Accessing Elements of a DataFrame - Selecting by Indexing

DataFrames are accessible by indexing (as are the `numpy.ndarray`, `list` and `string` datatypes).

However, unlike other datatypes, we don't just use square brackets to select values by indexing. If we want to access elements of a `DataFrame` by integer position (row and column numbers, counting from zero) we must use the `.iloc` (Integer-LOCation) attribute.
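
A small sketch showing that `.iloc` works purely on positions. The made-up table deliberately has letter row labels, which `.iloc` ignores; negative positions and slices behave just as they do for lists.

```python
import pandas as pd

# Made-up table with letter row labels, to show that .iloc ignores labels
df = pd.DataFrame(
    {"length": [5.1, 4.9, 6.3], "width": [3.5, 3.0, 3.3]},
    index=["x", "y", "z"],
)

print(df.iloc[0, 0])    # first row, first column -> 5.1
print(df.iloc[-1, 1])   # last row, second column -> 3.3
print(df.iloc[0:2, 0])  # rows 0 and 1 (the stop position is excluded), first column
```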

In [None]:
# Access the first row, first column
display(iris.iloc[0, 0])

### Accessing Elements of a DataFrame - Selecting by Labels

DataFrames are also accessible by labels, i.e. column headers and row index labels (this is unlike the `numpy.ndarray`, `list` and `string` datatypes).

If we want to access elements of a `DataFrame` by labels we must use the `.loc` (label-LOCation) attribute.
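
The same kind of made-up table, this time accessed with `.loc`. One difference from position-based slicing is worth noticing: a label slice such as `"width":"species"` includes *both* end labels.

```python
import pandas as pd

# Made-up table with letter row labels
df = pd.DataFrame(
    {
        "length": [5.1, 4.9, 6.3],
        "width": [3.5, 3.0, 3.3],
        "species": ["setosa", "setosa", "virginica"],
    },
    index=["x", "y", "z"],
)

print(df.loc["x", "length"])           # row labelled "x", column "length" -> 5.1
print(df.loc["z", "width":"species"])   # label slices include both ends
```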

In [None]:
# Access the first row, 'sepal_length' column
display(iris.loc[0, "sepal_length"])

In [None]:
# Access the last row, last three columns (label slices include both ends)
display(iris.loc[iris.index[-1], "petal_length":"species"])

### Accessing Elements of a DataFrame - Selecting Whole Columns or Rows

To access a whole column (or row), we can use just a colon to indicate 'everything'.
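
A short sketch of the colon with both `.loc` and `.iloc`, on a made-up table. For a single column, plain square brackets are a common shortcut that gives the same `Series`.

```python
import pandas as pd

# Made-up table for illustration
df = pd.DataFrame({"length": [5.1, 4.9, 6.3], "width": [3.5, 3.0, 3.3]})

whole_column = df.loc[:, "width"]  # every row of the "width" column
whole_row = df.iloc[1, :]          # every column of the second row
same_column = df["width"]          # square-bracket shortcut for one column

print(whole_column)
print(whole_row)
```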

In [None]:
# Access the whole 'species' column
myVariable = iris.loc[:, "species"]
display(myVariable)

### Describing a DataFrame

Often, we just want a quick summary of numerical data, e.g. the mean and standard deviation. `pandas.DataFrame` objects have a method to give you a quick overview: `.describe()`.
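
To see exactly what `.describe()` reports, here is a sketch on four made-up numbers whose summary is easy to check by hand: the mean and median are both 2.5. Note that `std` is the *sample* standard deviation (dividing by n - 1), which for these numbers is the square root of 5/3.

```python
import pandas as pd

# Four made-up values with an easy-to-check summary
df = pd.DataFrame({"length": [1.0, 2.0, 3.0, 4.0]})

# Rows of the result: count, mean, std, min, 25%, 50%, 75%, max
summary = df.describe()
print(summary)
```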

In [None]:
display(iris.describe())

### Accessing Elements of a DataFrame - Selecting By Comparison

But what if we only want to see summary statistics for one species in our dataset? We could manually look through the `DataFrame` and pick the indices for each 'setosa' iris. In this case those would be indices `0-49`, but in many cases the ordering of our data may be random.

Luckily, DataFrames can be accessed not just by indices and labels but also by comparisons. Essentially, we create a Boolean 'mask' (a `True`/`False` value for every row), which tells the DataFrame which data to use.
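
A small sketch of masking on a made-up table. Masks can also be combined with `&` (and) and `|` (or); each comparison then needs its own brackets.

```python
import pandas as pd

# Made-up table for illustration
df = pd.DataFrame({
    "species": ["setosa", "virginica", "setosa", "versicolor"],
    "petal_length": [1.4, 6.0, 1.3, 4.5],
})

mask = df["species"] == "setosa"  # one True/False value per row
print(mask.tolist())              # [True, False, True, False]

setosa_only = df.loc[mask]        # keep only the rows where the mask is True

# Combine two comparisons: long petals AND not setosa
long_petals = df.loc[(df["petal_length"] > 4) & (df["species"] != "setosa")]
print(long_petals)
```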

In [None]:
# Create a mask from the 'species' column
mask = iris.loc[:, "species"] == "setosa"

# Print the mask
display(mask.head())

# Print the masked DataFrame
display(iris.loc[mask])

# Print summary statistics
display(iris.loc[mask].describe())

# Python Modules - Matplotlib and Seaborn

## Matplotlib

Matplotlib (`matplotlib`) is the most widely used scientific plotting module in Python. Many other modules are built upon Matplotlib and we will explore one of these in particular: Seaborn.

Matplotlib is a huge module and we will only introduce you to a few plotting tools today.

To make Jupyter display plots directly beneath the cell that creates them, we use a 'magic' command: `%matplotlib inline`. (Saving a figure to a file is done separately, with `plt.savefig()`.)

Most of the functions we will need are in the `matplotlib.pyplot` submodule - so we will only import that today.

In [None]:
%matplotlib inline

# Add your imports here
import matplotlib.pyplot as plt

iris = pd.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
)
display(iris.head())

### Scatter Plotting with Matplotlib

Plotting with Matplotlib is powerful but can be complicated (especially when you first start).
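
The cell below uses the `plt.` functions, which act on whichever figure is 'current'. Matplotlib also offers an object-oriented style, where you keep hold of the figure and axes and call methods on them. Here is a minimal sketch of that style on a few made-up points; the file name `myObjectOrientedFigure.png` is just an example.

```python
import matplotlib.pyplot as plt

# Made-up points for illustration
x = [4.9, 5.4, 6.1, 6.7]
y = [3.0, 3.4, 2.8, 3.1]

# plt.subplots() returns a Figure and an Axes; we then call methods on the Axes
fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(x, y, color="#FF0000", label="Example")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
ax.set_title("Sepal Width Against Sepal Length")
ax.legend()

# Save this specific figure to a file
fig.savefig("myObjectOrientedFigure.png")
```

Holding on to `fig` and `ax` becomes useful once a figure has several panels, because it is always clear which one each call draws on.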

In [None]:
# Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats

# Create a figure that's 5 by 5 inches
plt.figure(figsize=[5, 5])

# Create a mask for each species
maskSetosa = iris.loc[:, "species"] == "setosa"
maskVersicolor = iris.loc[:, "species"] == "versicolor"
maskVirginica = iris.loc[:, "species"] == "virginica"

# Plot a scatter for each species in a unique colour showing sepal_length against sepal_width
plt.scatter(
    iris.loc[maskSetosa, "sepal_length"],
    iris.loc[maskSetosa, "sepal_width"],
    color="#FF0000",
    label="Setosa",
)
plt.scatter(
    iris.loc[maskVersicolor, "sepal_length"],
    iris.loc[maskVersicolor, "sepal_width"],
    color="#00FF00",
    label="Versicolor",
)
plt.scatter(
    iris.loc[maskVirginica, "sepal_length"],
    iris.loc[maskVirginica, "sepal_width"],
    color="#0000FF",
    label="Virginica",
)

# Calculate a linear regression model for each species
(
    slopeSetosa,
    interceptSetosa,
    r_valueSetosa,
    p_valueSetosa,
    std_errSetosa,
) = scipy.stats.linregress(
    iris.loc[maskSetosa, "sepal_length"], iris.loc[maskSetosa, "sepal_width"]
)
(
    slopeVersicolor,
    interceptVersicolor,
    r_valueVersicolor,
    p_valueVersicolor,
    std_errVersicolor,
) = scipy.stats.linregress(
    iris.loc[maskVersicolor, "sepal_length"], iris.loc[maskVersicolor, "sepal_width"]
)
(
    slopeVirginica,
    interceptVirginica,
    r_valueVirginica,
    p_valueVirginica,
    std_errVirginica,
) = scipy.stats.linregress(
    iris.loc[maskVirginica, "sepal_length"], iris.loc[maskVirginica, "sepal_width"]
)

# Plot a line for each model over the range of sepal lengths using the colours from the appropriate scatter
xSetosa = np.linspace(
    iris.loc[maskSetosa, "sepal_length"].min(),
    iris.loc[maskSetosa, "sepal_length"].max(),
    100,
)
ySetosa = slopeSetosa * xSetosa + interceptSetosa
plt.plot(xSetosa, ySetosa, color="#FF0000")
xVersicolor = np.linspace(
    iris.loc[maskVersicolor, "sepal_length"].min(),
    iris.loc[maskVersicolor, "sepal_length"].max(),
    100,
)
yVersicolor = slopeVersicolor * xVersicolor + interceptVersicolor
plt.plot(xVersicolor, yVersicolor, color="#00FF00")
xVirginica = np.linspace(
    iris.loc[maskVirginica, "sepal_length"].min(),
    iris.loc[maskVirginica, "sepal_length"].max(),
    100,
)
yVirginica = slopeVirginica * xVirginica + interceptVirginica
plt.plot(xVirginica, yVirginica, color="#0000FF")

# Add a legend
plt.legend()

# Add a title and axis labels
plt.title("Sepal Length Against Sepal Width")
plt.ylabel("sepal_width")
plt.xlabel("sepal_length")

plt.savefig("myMatplotlibFigure.png")

## Seaborn

I'm sure we all agree that that's quite a lot of code - and quite daunting if you've never seen it before. But don't worry! Seaborn is here to make your life easier.

Matplotlib is an extremely powerful module. However, it can be complex, so some packages, like Seaborn, build upon Matplotlib to make plotting a little quicker and easier.

### Scatter Plotting with `seaborn`

Let's start by recreating the plot above.

In [None]:
# Imports
import seaborn as sns

# Create a plot of sepal_length vs sepal_width where colour (hue) is controlled by the species
#
# 'height' controls the figure height in inches
# 'truncate' prevents the regression extending beyond the data
sns.lmplot(
    x="sepal_length", y="sepal_width", data=iris, hue="species", height=5, truncate=True
)

# Save figure
plt.savefig("mySeabornFigure.png")

### Faceted plotting with Seaborn

Isn't that a lot simpler?!

Seaborn is doing all the hard work for you - it creates the figure, the scatter plots, the legend and it does the regression and plots the model with error bounds too.

But what if we want to split the data across three plots? Again, Seaborn comes to the rescue.

In [None]:
# Imports
import seaborn as sns

# Create a plot of sepal_length vs sepal_width where colour (hue) is controlled by the species
# and each species gets its own column of plots (col)
# height controls the height of each plot in inches
# truncate prevents the regression extending beyond the data
sns.lmplot(
    x="sepal_length",
    y="sepal_width",
    data=iris,
    hue="species",
    col="species",
    height=5,
    truncate=True,
)

# Save figure
plt.savefig("myFacetedFigure.png")

### Boxplots

Scatter and line plots are all part of Seaborn's relational plot tools. But sometimes we have categorical data (such as species) and might want to use box plots to explore this data.
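
Seaborn's categorical plots work most naturally with 'long' data, where each row holds a single measurement along with labels describing it. That is what `.melt()` produces in the cell below. Here is a sketch on a two-row made-up table so the reshaping is easy to see.

```python
import pandas as pd

# A made-up "wide" table: one row per flower, one column per measurement
wide = pd.DataFrame({
    "species": ["setosa", "virginica"],
    "sepal_length": [5.1, 6.3],
    "petal_length": [1.4, 6.0],
})

# melt() stacks the measurement columns into two columns:
# "measure" holds the old column name and "measurement" holds its value
long_format = wide.melt(
    id_vars="species",
    value_vars=["sepal_length", "petal_length"],
    var_name="measure",
    value_name="measurement",
)
print(long_format)  # 4 rows: 2 flowers x 2 measures
```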

In [None]:
# 'Melt' the data
irisMelted = iris.melt(
    id_vars="species",
    value_vars=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    var_name="measure",
    value_name="measurement",
)

# Plot the melted data
sns.catplot(
    x="species",
    y="measurement",
    col="measure",
    data=irisMelted,
    kind="box",
    height=5,
    aspect=0.5,
)

# Save the plot
plt.savefig("myBoxplot.png")