# Programming example

# Big Data Assessment of Measurement Accuracy in Suncreams

Suncream is an essential cosmetic product that protects skin from the sun's harmful ultraviolet (UV) light. 
The active ingredient in many suncream products is titanium dioxide, which absorbs the sun's harmful UV light, limiting its impact on our skin. 
However, companies that produce suncream frequently fail to report the amount of titanium dioxide in their products. 

Analytical scientists can use experimental measurements to estimate the amount of titanium dioxide in a suncream. 
This approach involves the calibration of instrumentation with samples of a known concentration. 
The calibrated instrumentation can then be used to estimate the titanium dioxide concentration in the unknown sample. 
Instrument calibration is an exercise in big data, and we must interpret our results using statistics and data modelling.
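As a rough illustration of the calibration idea, here is a minimal sketch that fits a straight line by least squares to readings from standards of known concentration, then inverts that line to estimate the concentration of an unknown sample. It assumes the instrument responds linearly to concentration. The function names are hypothetical, and the numbers are made-up values lying exactly on a line, chosen only to exercise the code (they are not measurements).

```python
def fit_line(concentrations, readings):
    """Least-squares fit of: reading = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(readings) / n
    # Spread of the x values, and how x and y vary together
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, readings))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(reading, slope, intercept):
    """Invert the calibration line to turn a new reading into a concentration."""
    return (reading - intercept) / slope


# Hypothetical standards: made-up values lying exactly on reading = 0.5 * c + 0.1
standards = [0.0, 1.0, 2.0, 3.0, 4.0]
signals = [0.1, 0.6, 1.1, 1.6, 2.1]

slope, intercept = fit_line(standards, signals)
unknown = estimate_concentration(1.35, slope, intercept)
print(slope, intercept, unknown)
```

With real measurements the points would scatter around the line, which is where the statistics and data modelling come in.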

## Getting Started with Jupyter

You may already have used a Jupyter Notebook. 
Before starting the data analysis, we will quickly recap some important aspects. 

### Interface Elements

There are a few parts of the Notebook interface that we will draw attention to ({numref}`interface`):

1. The **Notebook/file tabs**. Similar to modern web browsers, JupyterLab allows many files to be open simultaneously within a tabbed interface. 
2. The **toolbar** contains buttons for common actions when working with Notebooks; hovering the cursor over a button pops up a short description of what it does.
3. The **cell**, where Python code or Markdown (depending on the cell type) is written. 
4. The **prompt** indicates whether a cell has been run: a cell that has not been run shows `In [ ]:`, while a cell that has been run shows `In [x]:`, where `x` is a number giving the order in which the cells were run. 

![](./images/interface.png)
Some important interface elements in the Jupyter Notebook.

### Cells

Cells make up the body of a Notebook. 
When a new Notebook is opened, it will contain a single empty cell. 
New cells can be added below the currently selected one by running the last cell in the Notebook with "Shift + Enter", by pressing the "+" button in the toolbar, or, in command mode, by pressing "B" (pressing "A" adds a cell above the currently selected one).
Cells can be of different types; there are two particularly important ones to be aware of. 

#### Code Cells

A code cell contains Python code that can be executed. 
When the cell is run, the Notebook displays below it anything printed by the code, together with the value of the final line of the cell (if that line is an expression). 

![](./images/code-cell.png)
An example of a code cell that has been run: the Python code adds 4 and 3, and the result, 7, is displayed below the cell.

A cell is run either by clicking the &#9658; icon in the toolbar or by pressing "Control + Enter" (Windows) or "Command + Enter" (macOS) on the keyboard.
When the cell is run, its output is shown below the code cell. 
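To see this behaviour for yourself, you could try a small illustrative cell like the one below. In a Notebook, the output of `print` is always shown, whereas the value of a bare expression such as `total * 2` is displayed only because it is the final line of the cell; run as an ordinary Python script, that last line would display nothing.

```python
total = 4 + 3
print("The total is", total)  # print output is always shown

total * 2  # as the final line, its value (14) is displayed below the cell in a Notebook
```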

#### Markdown Cells

The type of cell can be changed using the drop-down menu in the toolbar. 
After "Code", the most important type of cell is "Markdown". 
A Markdown cell contains text that is formatted using [Markdown](https://www.markdownguide.org), a lightweight markup language for writing {term}`HTML` documents. 
When a Markdown cell is "run", the Markdown is converted to {term}`HTML`, and the formatted text is shown in place of the cell ({numref}`markdown-rendered`).

![](./images/markdown.png)
A markdown cell that has not been run yet, showing the raw markdown.

![](./images/markdown-rendered.png)
The rendered markdown, with the nicely formatted equations.

#### Active Cells

The currently active cell is highlighted.
The active cell is always in either command mode or edit mode; a blinking cursor (`|`) inside the cell shows that it is in edit mode. 

##### Command Mode

In command mode, the cell content cannot be edited, but keyboard shortcuts can be used to cut, paste, and move whole cells. 
A list of the keyboard shortcuts can be found [online](https://towardsdatascience.com/jypyter-notebook-shortcuts-bf0101a98330).

![](./images/command.png)
A Notebook cell in command mode.

##### Edit Mode

From command mode, pressing Enter or clicking in the input area of a cell switches the cell to edit mode. 
In edit mode, code or Markdown can be written; pressing Escape returns the cell to command mode. 

![](./images/edit-cursor.gif)
A Notebook cell in edit mode.

## Sequences

### Displaying

In this example we're going to display simple sequences using a turtle friend. We will tell him how many steps to walk before turning and he will show us what that looks like:

In [None]:
from draw import draw_sequence_with_turtle

sequence = [100, 100, 100, 100]

draw_sequence_with_turtle(sequence)

In this case, he walked 4 equal sides of 100 steps each, and so walked in a square.
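We don't look inside `draw_sequence_with_turtle` here, but one way to think about what it does is sketched below. This sketch assumes the turtle starts at (0, 0) facing right and turns 90° to the left after each walk (the real function may turn the other way); `turtle_corners` is a hypothetical helper written only for this illustration, and it tracks the corners the turtle visits using plain Python.

```python
def turtle_corners(sequence):
    """Corners visited by a turtle that walks each length, then turns 90 degrees left.

    Hypothetical helper: assumes left turns and a start at (0, 0) facing right.
    """
    # Directions met in order when turning left: right, up, left, down
    directions = [(1, 0), (0, 1), (-1, 0), (0, -1)]
    x, y = 0, 0
    corners = [(x, y)]
    for step, length in enumerate(sequence):
        dx, dy = directions[step % 4]  # one left turn after every walk
        x += dx * length
        y += dy * length
        corners.append((x, y))
    return corners


print(turtle_corners([100, 100, 100, 100]))
# [(0, 0), (100, 0), (100, 100), (0, 100), (0, 0)]
```

The last corner is back at (0, 0), where the turtle started, which is why four equal sides close into a square.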

> **Task**
>
> Can you make the turtle walk a different path? Can you make him form a rectangle?
>

In [None]:
## CAN USE CODE CELL (if needed) ##

In [None]:
## CAN USE CODE CELL (if needed) ##

In [None]:
## CAN USE CODE CELL (if needed) ##

## Creating sequences

### Multiples of 13

Let's consider a simple sequence containing multiples of 13:

    13, 26, 39, 52, 65, 78, ...

Rule:

    Each value is a multiple of 13 (1 x 13, 2 x 13, 3 x 13, ...)

In a code cell, we can store this sequence like this:

In [None]:
sequence = [13, 26, 39, 52, 65, 78]

`sequence` is called a *variable*. To see the value stored in it, we can write `print` followed by brackets `(` and `)`: anything between these brackets is shown on the screen.

In [None]:
print(sequence)

How could we carry on the sequence? We know the rules for this sequence:

- 1 * 13
- 2 * 13
- 3 * 13
- 4 * 13

...

We can use this as a way to find the next values in steps:
 - Define the number we want to multiply - in this case 13
 - Define number of values in our sequence - at the moment that is 6 values
 - Multiply by an increasing number, repeating for each value in our sequence

Within Python code we can use the `*` operator to multiply numbers:

In [None]:
factor = 13

value1 = 1 * factor
value2 = 2 * factor
value3 = 3 * factor
value4 = 4 * factor
value5 = 5 * factor
value6 = 6 * factor

sequence = [value1, value2, value3, value4, value5, value6]
print(sequence)

This involves lots of *repeated steps* and doesn't make it easy to make the sequence longer.

A structure we could use to repeat some of these steps for us is called a *loop*. This can be written as:

In [None]:
for i in range(1, 7):
    print(i)

Here, `range(1, 7)` produces the numbers 1, 2, 3, 4, 5, 6 for us: it starts at the first number and stops just *before* the second.
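A quick way to inspect a `range` is to turn it into a list with `list`. This small sketch also shows that if we leave out the first number, the range starts at 0:

```python
# list() collects every number that a range produces
print(list(range(1, 7)))  # [1, 2, 3, 4, 5, 6]
print(list(range(3, 6)))  # [3, 4, 5]  (the stop value 6 is not included)
print(list(range(5)))     # [0, 1, 2, 3, 4]  (the start defaults to 0)
```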

> **Task**
> 
> Can you update this range to print from 1 to 10? Try updating this and re-running the cell.
>

We can use this to create the numbers within our sequence:

In [None]:
factor = 13

for i in range(1, 7):
    print(i * factor)  # Calculate the new value

We can also start with an empty sequence and add each new value to the end of it as we go:

In [None]:
factor = 13

sequence = []  # Create empty sequence
for i in range(1, 7):
    sequence.append(i * factor)  # Add the new value to the end of this sequence

print(sequence)

We can generalise this by storing the number of values we want in a variable and using it in our loop. We add 1 to it because `range` stops just before its second number:

In [None]:
factor = 13
number_in_sequence = 10

sequence = []
for i in range(1, number_in_sequence + 1):
    sequence.append(i * factor)

print(sequence)

Let's try drawing this sequence:

In [None]:
draw_sequence_with_turtle(sequence)

Each side the turtle walks is now 13 steps longer than the one before, so he walks in a spiral pattern.
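If we want to build several of these sequences, one option is to wrap the loop in a function so that each new sequence takes a single line. The name `multiples_sequence` and its parameters are our own choice for this sketch; the loop inside is the same one used above. Because `sequence` inside the function is local, running this cell does not change the `sequence` variable drawn earlier.

```python
def multiples_sequence(factor, number_in_sequence):
    """Return the first `number_in_sequence` multiples of `factor` (hypothetical helper)."""
    sequence = []
    for i in range(1, number_in_sequence + 1):
        sequence.append(i * factor)  # i-th multiple of factor
    return sequence


print(multiples_sequence(13, 6))  # [13, 26, 39, 52, 65, 78]
```

The list it returns could then be passed straight to `draw_sequence_with_turtle`.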

> **Task**
>
> How would we build a longer sequence to continue the spiral? Try it and re-run to see what the turtle does.
>

> **Task**
> 
> Try creating a sequence of multiples of 7 rather than 13. What difference does this make to the plot?
>

## Bonus: more sequences

### Multiplying by 2

Using similar steps, let's consider how to create a sequence in which each value is double the previous one:

    1, 2, 4, 8, 16, 32, 64, ...

Rule:

    Multiply previous value by 2

Consider the steps as we did before:
 - Define start value
 - Set first value in sequence as start value
 - Define next value by multiplying previous value by 2
     - ... Keep repeating last step ...

In this case, rather than generating a new number for each repeat we need to use the previous value to define the next one.

In [None]:
start = 1
factor = 2

value1 = start
value2 = value1 * factor
value3 = value2 * factor
value4 = value3 * factor
value5 = value4 * factor
value6 = value5 * factor
value7 = value6 * factor

sequence = [value1, value2, value3, value4, value5, value6, value7]
print(sequence)

We can also reuse the same variable name, assigning it a new value each time. If we add each value to our sequence as soon as it is calculated, this could be written as:

In [None]:
start = 1
factor = 2

sequence = []

value = start
sequence.append(value)

value = value * factor
sequence.append(value)

value = value * factor
sequence.append(value)

value = value * factor
sequence.append(value)

value = value * factor
sequence.append(value)

value = value * factor
sequence.append(value)

# ...

print(sequence)

We can group the repeated lines into a loop:

In [None]:
start = 1
factor = 2
sequence = []

value = start
sequence.append(value)

for i in range(1, 6):
    value = value * factor
    sequence.append(value)

print(sequence)

In [None]:
draw_sequence_with_turtle(sequence)
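Because each value is the previous one multiplied by `factor`, the value at position `k` (counting from 0) works out to `start * factor ** k`, where `**` is Python's power operator. The sketch below is an optional cross-check that this formula gives the same list as the loop; it is not a replacement for the loop approach.

```python
start = 1
factor = 2

# Build the sequence with the loop, as above: 6 values in total
sequence = []
value = start
sequence.append(value)
for i in range(1, 6):
    value = value * factor
    sequence.append(value)

# Build the same values directly from the formula start * factor ** k
from_formula = []
for k in range(6):
    from_formula.append(start * factor ** k)

print(sequence)                  # [1, 2, 4, 8, 16, 32]
print(sequence == from_formula)  # True
```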

> **Task**
>
> Can you extend this sequence?

### Fibonacci

    1, 1, 2, 3, 5, 8, 13, ...

Rules:

    The first and second numbers are 1
    Each following number is the sum of the previous 2 numbers

> **Task**
> 
> Can you break this down into steps to create the sequence?

> **Task**
> 
> Can you plot this sequence using the turtle?

In [None]:
## CAN USE CODE CELL (if needed) ##

In [None]:
## CAN USE CODE CELL (if needed) ##