<a href="https://colab.research.google.com/github/ds4geo/ds4geo/blob/master/WS%202020%20Course%20Notes/Session%203.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# To Do


* write numpy indexing/nd array workshop (check timings/make optional?)
* Update assignment submission details
* References


* Create separate solutions notebook
* Enable output cells
* Final go-through and checks

# **Data Science for Geoscientists - Winter Semester 2020**
# **Session 3 - Numpy etc.**
In the previous sessions we learnt about python built-in objects useful for handling data. We also used the library Pandas for data handling, although we did not go into its details. The numerical basis of Pandas is a library called Numpy. This week we will introduce the basics of Numpy, including how to create, manipulate and index Numpy arrays.

We will then apply that understanding in a geoscience-related exercise: data reduction/calibration of raw Laser Ablation ICP Mass Spectrometer (LA-ICPMS) data.

# Part 3.1 - Data Icebreaker - *Discussion*
**Hypothesising about data**

I will provide a sequence of 4 numbers which follow a rule.

Participants have to figure out the rule.

Participants can provide additional sequences, and I will answer yes or no as to whether they follow the rule.

When participants think they know the rule, they write it down and give it to me.

The first participant to correctly guess the rule wins.

# Part 3.2 - Review of plotting assignment - *Discussion*
We will review and discuss the assignment submissions from last week.
Students will review and discuss improvements to each other's plots/visualisations.

# Part 3.3 - Numpy Introduction - *Mini-lecture*
Last week we used the popular python library Pandas, but didn't introduce it formally.
This week we will also be using a popular library called Numpy.
Pandas is built upon Numpy, and they work well together.
Pandas is good at data handling, manipulation and analysis, while Numpy is the basis of numerical operations and processing.
See more here:
* https://pandas.pydata.org/
* https://numpy.org/

We will use both Pandas and Numpy throughout the course. Together (along with matplotlib), they are the basis of Data Science in python.

Numpy is based around multi-dimensional arrays (of data), and allows efficient indexing, operations and aggregation of said arrays.
For those not familiar with multi-dimensional arrays (also called nd-arrays), imagine an Excel spreadsheet as a 2-dimensional table/array with rows and columns, then imagine that you can have as many dimensions as you like.

As an example, in satellite remote sensing, it is typical to have a time-series of many multi-band (e.g. red, green, blue, infra-red) images. Therefore, you might have an array of 4 dimensions: [pixel rows, pixel columns, time, band]. So for each x-y pixel, at each point in time, you have a value for each band.
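The satellite example can be sketched as a Numpy array filled with zeros standing in for real imagery. The sizes below (100 x 120 pixels, 12 time steps, 4 bands) are arbitrary placeholders, and the indexing and `axis` argument used here are covered in more detail later in this session:

```python
import numpy as np

# Hypothetical dimensions: pixel rows, pixel columns, time steps, bands
n_rows, n_cols, n_times, n_bands = 100, 120, 12, 4
images = np.zeros((n_rows, n_cols, n_times, n_bands))
print(images.shape)  # (100, 120, 12, 4)

# Time series of band index 3 (e.g. infra-red) for the single pixel at row 10, column 20
pixel_series = images[10, 20, :, 3]
print(pixel_series.shape)  # (12,)

# Mean over time for every pixel and band (the time axis, axis=2, is collapsed)
time_mean = images.mean(axis=2)
print(time_mean.shape)  # (100, 120, 4)
```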

In the following section, we will create arrays, learn how to do simple operations on them and perform basic aggregations. In the section after that, we will explore Numpy's powerful indexing system.

The website Datacamp.com provides an excellent Numpy "cheat-sheet". It is highly recommended to keep it handy when working with Numpy and to go through it in your own time.
https://www.datacamp.com/community/blog/python-numpy-cheat-sheet


# Part 3.4 - Numpy part 1 - *Walkthrough*



## 3.4.1 - Creating Arrays

In [None]:
# Import numpy
import numpy as np

In [None]:
# Here we cover simple ways to create numpy arrays.
# We will cover loading and importing data, e.g. from pandas later.

# The simplest way to create an array is from a list
array = np.array([1,2,3])
print(array)

# Or with nested lists for multiple dimensions
array_2d = np.array([[1,2,3],[4,5,6]])
print(array_2d)

In [None]:
# numpy provides some functions to create arrays by shape:
# make a 1d array of 5 zeros
array_zeros = np.zeros(5) 
print(array_zeros)

# Make a 2d array of 1s
array_ones = np.ones((2,5))
print(array_ones)

# numpy arrays have an attribute shape:
print("array_zeros shape:", array_zeros.shape)
print("array_ones shape:", array_ones.shape)

In [None]:
# Create an array of consecutive integers in a range using np.arange
arange_1 = np.arange(15,25)
print(arange_1)

# Use arange to create larger steps
arange_2 = np.arange(15,25,2)
print(arange_2)

# Python's built-in range produces a similar sequence; wrap it in list() to get a standard list
print(list(range(5)))

In [None]:
# Create an array over a range by the number of evenly spaced points (both endpoints included), rather than by the step size
linspace_1 = np.linspace(0,4,17)
print(linspace_1)
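To make the difference between `np.arange` and `np.linspace` concrete, the sketch below builds the same 17 values both ways. Note that `np.arange` excludes its stop value, so the stop has to be set just past 4; for non-integer steps `np.linspace` is usually the safer choice because floating-point rounding can make the length of an `np.arange` result surprising.

```python
import numpy as np

# linspace: 17 evenly spaced points from 0 to 4, both endpoints included
by_count = np.linspace(0, 4, 17)

# arange: the same values defined by the step size (0.25) instead.
# The stop value is excluded, so it is set one step past 4.
by_step = np.arange(0, 4.25, 0.25)

print(by_count[1] - by_count[0])       # step size: 0.25
print(np.allclose(by_count, by_step))  # True
```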

In [None]:
# Arrays of random numbers can be produced with np.random.random_sample (uniform, between 0 and 1)
# and np.random.standard_normal (normal distribution with mean 0 and standard deviation 1)
uni_random = np.random.random_sample(10)
print(uni_random)

norm_random = np.random.standard_normal(10)
print(norm_random)
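Because `np.random.random_sample` draws from [0, 1) and `np.random.standard_normal` has mean 0 and standard deviation 1, other ranges and distributions can be obtained by scaling (multiplying) and shifting (adding). A minimal sketch with arbitrary target values; the fixed seed is only there to make the example reproducible:

```python
import numpy as np

np.random.seed(0)  # fixed seed so the example is reproducible

# Uniform values in [0, 1) scaled to width 2 and shifted up by 2 -> values in [2, 4)
uniform_2_4 = np.random.random_sample(1000) * 2 + 2
print(uniform_2_4.min() >= 2, uniform_2_4.max() < 4)

# Standard normal values scaled by 3 and shifted by 10
# -> approximately mean 10 and standard deviation 3
normal_10_3 = np.random.standard_normal(100000) * 3 + 10
print(round(normal_10_3.mean(), 1), round(normal_10_3.std(), 1))
```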

## 3.4.2 - Operations

In [None]:
# Python lets us do operations on integers and floats
print(1+2)
print(2*3)
print(2.5*5)
print(2**6)
print(64/4)

In [None]:
# But on lists, these operators do other things:
print([1,2,3] + [4]) # List concatenation
print([1,2,3] * 3) # List duplication
# Operators like - and / are not defined for lists and raise a TypeError

In [None]:
# Operators can be applied to numpy arrays in an intuitive way:
# Operators between a numpy array and a single int or float apply the operation to all elements in the array:
a = np.ones(5)
b = np.arange(5)

print("a:",a)
print("a + 1:",a + 1)
print("a - 1:",a - 1)
print("a * 2:",a * 2)
print("a / 2:",a / 2)

print("b * 2:", b * 2)

In [None]:
# Operations between arrays of the same shape are applied element-wise:
print("b:",b)
print("b * b:", b * b)
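"Element-wise" means each element is combined with the element at the same position in the other array. The sketch below writes that out as an explicit Python loop over pairs and checks it against the vectorised Numpy expression:

```python
import numpy as np

b = np.arange(5)

# Element-wise multiplication written as an explicit Python loop over pairs...
looped = np.array([x * y for x, y in zip(b, b)])

# ...gives the same result as the vectorised numpy expression
vectorised = b * b
print(looped)                              # [ 0  1  4  9 16]
print(np.array_equal(looped, vectorised))  # True
```

The loop is only for illustration; the vectorised form is shorter and, for large arrays, much faster.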

## 3.4.3 - Aggregations
We will look more into describing data using statistics in a future session. Here, we simply introduce how to calculate them on Numpy arrays.

In [None]:
# Calculate summary statistics
# Create a large uniformly distributed random sample between 0 and 1
large_uni_random = np.random.random_sample(10000)
# Calculate and print various statistics
print("mean:",large_uni_random.mean())
print("min:",large_uni_random.min())
print("max:",large_uni_random.max())
print("standard deviation:",large_uni_random.std())

# Note: Pandas uses a very similar object-orientated system: pd.DataFrame.mean?

In [None]:
# Above is the object orientated approach. An alternative is using numpy functions:
print("mean:",np.mean(large_uni_random))
print("min:",np.min(large_uni_random))
print("max:",np.max(large_uni_random))
print("standard deviation:",np.std(large_uni_random))

In [None]:
"""
If one's data contains NaNs (not a number - null values),
one can use np.nanmean, np.nanmax, etc. to ignore the NaNs when calculating stats.
"""
# Create array with a NaN
data_with_nans = np.array([1, 2, np.nan, 3])
# Calculate mean with NaNs ignored
print(np.nanmean(data_with_nans))
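For comparison, the standard aggregation functions propagate NaNs: a single missing value makes the whole result NaN. This sketch contrasts them with the nan-prefixed versions on the same small array:

```python
import numpy as np

data_with_nans = np.array([1, 2, np.nan, 3])

# Standard aggregations propagate NaN: one missing value makes the result NaN
print(np.mean(data_with_nans))  # nan
print(np.max(data_with_nans))   # nan

# The nan-prefixed versions skip NaNs and use only the valid values
print(np.nanmean(data_with_nans))  # 2.0
print(np.nanmax(data_with_nans))   # 3.0

# Count how many values are NaN
print(np.isnan(data_with_nans).sum())  # 1
```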


# Part 3.5 - Numpy Exercise 1 - *Workshop*

In [None]:
# Create the following sequences as numpy arrays:
# 1. [3, 6, 9, 12, ...., 99]
# 2. [15, 15, 15, 15, 15, 15]
# 3. [0, 0.5, 1, 1.5, ...., 100]


In [None]:
# Create the following arrays:
# 1. 1d array of size 100 with random decimal numbers between 1 and 100
# 2. 1d array of size 50 with random integers between 25 and 75
# 3. 1d array of size 100 with normal (gaussian) distributed numbers with a mean of 5 and a standard deviation of 2


In [None]:
# Challenges
# 1. [0,1,0,2,0,4,0,8,0,16,0,32,0,64,.....65536]
#   Note, this can be done in many different ways, including with other numpy functions
#   and using numpy indexing, but it is possible to do with only the functions described above.
# 2. An array representing the sum of rolling a pair of 6-sided dice 1000 times (if your game of Monopoly overruns more than usual)

In [None]:
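# ANSWER (challenge 1): odd positions hold 2**0 to 2**16, even positions are multiplied by 0
challenge_1 = (2 ** np.arange(-0.5, 16.5, 0.5)) * np.array([0, 1] * 17)
print(challenge_1)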
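One possible approach to challenge 2, using `np.random.randint`, which is not described above. Its upper bound is excluded, so `randint(1, 7, ...)` gives integers from 1 to 6, and adding two independent arrays of rolls gives the sum for each throw:

```python
import numpy as np

n_rolls = 1000
# np.random.randint excludes the upper bound, so (1, 7) gives integers 1 to 6
die_1 = np.random.randint(1, 7, n_rolls)
die_2 = np.random.randint(1, 7, n_rolls)
dice_sums = die_1 + die_2  # element-wise sum of each pair of rolls

print(dice_sums[:10])
print(dice_sums.min(), dice_sums.max())  # always within 2 and 12
```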

# Part 3.6 - Numpy part 2 - *Walkthrough*

## 3.6.1 - Multi-dimensional arrays and Broadcasting
So far we have looked at 1-dimensional arrays (analogous to data with many rows but one column). Now we will look at multi-dimensional arrays (analogous to data with rows, columns and possibly additional dimensions) and how we can operate on them using broadcasting. We will focus on 2d arrays here, but Numpy allows arrays of any dimensionality.

More info:
* nd arrays: https://numpy.org/doc/stable/reference/arrays.ndarray.html
* Broadcasting: https://numpy.org/doc/stable/user/basics.broadcasting.html


In [None]:
# Create a multi-dimensional array using Numpy commands
zeros_2d = np.zeros((5,9)) # Provide a tuple of dimension sizes to functions like np.zeros
print(zeros_2d.shape)

In [None]:
# Create a multi-dimensional array using stacking
c1 = (np.random.random_sample(5) * 2) + 2 # 1d array of uniform random numbers between 2 and 4
c2 = np.zeros(5) + 19 # 1d array of 19s
c3 = np.full(5,np.nan) # 1d array of NaNs
vertical_stack = np.vstack([c1, c2, c3]) # "vertical" stack
horizontal_stack = np.hstack([c1, c2, c3]) # "horizontal" stack - in this case is concatenation

print("vertical stack:")
print(vertical_stack)
print("shape:", vertical_stack.shape)

print("horizontal stack:")
print(horizontal_stack)
print("shape:", horizontal_stack.shape)


In [None]:
'''
The vertical stack produced nearly what we wanted, but we intended c1, c2 and c3
to be columns rather than rows (the 1st value in shape is the number of rows, the 2nd the number of columns).
We can transpose the array to swap the rows and columns:
'''
# transpose vertical_stack
vertical_transposed = vertical_stack.T
print("vertical transposed:")
print(vertical_transposed)
print("shape:", vertical_transposed.shape)

In [None]:
'''
Side note: aggregations and other numpy functions on multi dimensional arrays:
there is an "axis" argument for many numpy functions which defines which dimension
the operation is performed on.
'''
# nanmean to ignore NaN values
print("without axis argument:", np.nanmean(vertical_transposed)) # mean of all values in array
print("axis=0 (one mean per column):", np.nanmean(vertical_transposed, axis=0)) # mean of each column
print("axis=1 (one mean per row):", np.nanmean(vertical_transposed, axis=1)) # mean of each row
# The axis=0 line gives a RuntimeWarning because the third column (c3) contains only NaNs
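A useful way to remember the `axis` argument is that it names the dimension that gets collapsed. The sketch below uses a small 5 x 3 array of consecutive integers so the results can be checked by hand; `keepdims=True` keeps the collapsed dimension with length 1, which is handy for broadcasting:

```python
import numpy as np

table = np.arange(15).reshape(5, 3)  # 5 rows, 3 columns: [[0, 1, 2], [3, 4, 5], ...]

# axis=0 collapses the rows: one value per column
print(table.sum(axis=0))  # [30 35 40]
# axis=1 collapses the columns: one value per row
print(table.sum(axis=1))  # [ 3 12 21 30 39]
# keepdims=True keeps the collapsed axis with length 1
print(table.sum(axis=1, keepdims=True).shape)  # (5, 1)
```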

In [None]:
'''
Broadcasting allows us to apply mathematical operations between arrays of different shapes
where it is obvious what is intended.
'''
# Create a 2d array from differently distributed random normal samples
Sr = np.random.standard_normal(100) * 5 + 15
Zn = np.random.standard_normal(100) * 0.2 + 0.5
Fe = np.random.standard_normal(100) * 10 + 1000
elements = np.vstack([Sr, Zn, Fe]).T # stack and transpose
print(elements.shape) # 100 rows and 3 columns

In [None]:
# Calculate the mean of each element
ele_mean = np.mean(elements, axis=0)
print(ele_mean)
print(ele_mean.shape)

In [None]:
'''
The mean of each column is length 3 (the number of columns/elements)
If we try to subtract the ele_mean of length 3 from the elements array (100, 3)
it is obvious we mean to subtract the mean of each column from all values in each column:
'''
# Centre the data on 0 by subtracting the mean
centered_elements = elements-ele_mean
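Numpy decides whether two shapes can be broadcast by comparing them from the last dimension backwards: two sizes are compatible if they are equal or one of them is 1. A minimal sketch with placeholder arrays (not the element data above) showing a compatible case, an incompatible case, and how adding an axis of length 1 fixes it:

```python
import numpy as np

values = np.ones((100, 3))             # 100 rows, 3 columns
col_means = np.array([1.0, 2.0, 3.0])  # shape (3,)
row_values = np.arange(100.0)          # shape (100,)

# (100, 3) and (3,): trailing sizes match (3 == 3), so col_means is applied to every row
print((values - col_means).shape)  # (100, 3)

# (100, 3) and (100,): trailing sizes 3 and 100 differ and neither is 1
try:
    values - row_values
except ValueError as err:
    print("ValueError:", err)

# Adding an axis of length 1 gives shape (100, 1), which broadcasts across the columns
print((values - row_values[:, np.newaxis]).shape)  # (100, 3)
```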

In [None]:
# Standardize the data by dividing by the standard deviation of each column
standardized_elements = centered_elements / np.std(centered_elements, axis=0)

# Now the mean of each element (column) should be zero (up to rounding error), and the standard deviation 1:
print("mean:", np.mean(standardized_elements, axis=0))
print("std:", np.std(standardized_elements, axis=0))

## 3.6.2 - Logical operations and booleans
We can compare Numpy arrays with comparison operators, resulting in boolean arrays with True or False.
This can be done element-wise between two arrays (corresponding elements are compared), or applied to a whole array when comparing it to a single value, e.g. an int or float.
Boolean arrays can be combined with numpy logical operator functions, e.g. np.logical_and, which operates elementwise on two boolean arrays.

In [None]:
# Create some arrays of ints between 0 and 9
r1 = np.random.randint(0,10,10)
r2 = np.random.randint(0,10,10)
r3 = np.random.randint(0,10,10)

print(r1)
print(r2)
print(r3)

In [None]:
# Apply some comparison operations (>, <, >=, <=, ==, !=)
b1 = r1 > r2
print(b1) # Return True where r1 is greater than r2 element-wise

b2 = r3 > 5
print(b2) # Return True where r3 is greater than 5

b3 = np.logical_and(b1, b2) # element-wise logical and
print(b3) # Return True when b1 and b2 are true
# see also np.logical_or, _xor, _not, and np.all, np.any
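For boolean arrays, the operators `&`, `|` and `~` are element-wise shortcuts for `np.logical_and`, `np.logical_or` and `np.logical_not`. The small fixed arrays below are made up so the results can be checked by eye. Note that comparisons need parentheses when combined this way, because `&` binds more tightly than `>`:

```python
import numpy as np

r1 = np.array([3, 8, 1, 6])
r3 = np.array([7, 2, 9, 6])

b1 = r1 > 5  # [False  True False  True]
b2 = r3 > 5  # [ True False  True  True]

print(np.array_equal(b1 & b2, np.logical_and(b1, b2)))  # True
print(b1 | b2)  # [ True  True  True  True]
print(~b1)      # [ True False  True False]

# Parentheses are required: without them Python would evaluate 5 & r3 first
both = (r1 > 5) & (r3 > 5)
print(both)     # [False False False  True]
```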

## 3.6.3 - Numpy Indexing
Last week we learnt how to index lists and dictionaries. Indexing in Numpy operates on a similar basis but with additional power and flexibility.

More info:
* https://numpy.org/doc/stable/user/basics.indexing.html#basics-indexing
* https://numpy.org/doc/stable/reference/arrays.indexing.html#arrays-indexing


In [None]:
# Simple 1d indexing works like for lists
arr = np.random.randint(0,100,15) # create an array of random ints
print(arr)
print("position 5 (the 6th element, as indexing starts at 0):", arr[5])
print("positions 2 to 7 (the end of a slice is excluded):", arr[2:8])
print("every 2nd position from 2 to 10:", arr[2:12:2])

In [None]:
# If there are more dimensions, indexing works separately for each dimension separated by commas:
arr2d = np.random.randint(0,100,(15,25)) #  create a 15x25 array of random ints
print("row 5, column 9:", arr2d[5,9])
print("rows 2 to 3 and column 9:", arr2d[2:4,9])
print("rows 2 to 5 and columns 9 to 13:", arr2d[2:6,9:14])

In [None]:
# We can also index using lists of indices:
print(arr)
# For 1 dimension
print("positions 1, 4, and 7:", arr[[1,4,7]])

# For 2 (or more) dimensions
print("positions (5,4), (2,8), (9,8):", arr2d[(5,2,9), (4,8,8)])
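When two index lists are given, Numpy pairs them up position by position, as in the cell above. To select every combination of some rows and some columns (a sub-array) instead, `np.ix_` can be used. A sketch on a small 4 x 5 array of consecutive integers:

```python
import numpy as np

grid = np.arange(20).reshape(4, 5)  # 4 rows, 5 columns

# Two index lists are paired up: elements (0, 1) and (2, 3)
print(grid[[0, 2], [1, 3]])  # [ 1 13]

# np.ix_ instead selects every combination: rows 0 and 2, columns 1 and 3
print(grid[np.ix_([0, 2], [1, 3])])  # a 2 x 2 sub-array
```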



In [None]:
# We can also index using boolean arrays
# get all values in arr greater than 75
print(arr)
print(arr>75)
print(arr[arr>75])

In [None]:
# Also works for multi dimensional arrays
print(arr2d[arr2d > 75])
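Boolean indexing of a multi-dimensional array returns a flat 1d array of the selected values, so their positions are lost. If the original shape should be kept, `np.where` fills the unselected positions with a replacement value, and `np.nonzero` returns the row and column positions. A sketch on a small made-up array:

```python
import numpy as np

grid = np.array([[10, 80, 30],
                 [90, 20, 76]])

# Boolean indexing returns only the selected values, as a flat 1d array
print(grid[grid > 75])  # [80 90 76]

# np.where keeps the original shape, filling unselected positions with 0
print(np.where(grid > 75, grid, 0))
# [[ 0 80  0]
#  [90  0 76]]

# Row and column positions of the selected values
rows, cols = np.nonzero(grid > 75)
print(rows, cols)  # [0 1 1] [1 0 2]
```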

# Part 3.7 - Numpy Exercise 2 - *Workshop*

# Part 3.8 - LA-ICPMS data reduction exercise - *Workshop*
*Write this, then check all requirements are fulfilled above*

In the geosciences, we often have raw measurement data from an analytical instrument and need to convert, or "reduce", that data to make it useful for further analysis and interpretation. A very common example is the conversion of raw count data from a mass spectrometer (or similar instrument) into composition data, such as the weight percentage or ppm of each element in the analysed material. In many cases there are specific software packages to perform this data reduction without any coding, but the underlying methodology is frequently not complex and could easily be done in python. As an example, in this exercise we will convert raw Laser Ablation - ICP Mass Spectrometer (LA-ICPMS) data to mass fractions of the sample material.

The following paper explains LA-ICPMS, typical data reduction procedures and software packages:
https://www.sciencedirect.com/science/article/abs/pii/S0009254118305461?via%3Dihub

Using python, we will perform steps 2 to 5 of the "Basic Processing" in section 2.1 of that paper.

**Note: Each field (within and beyond the geosciences) has its own literature about data reduction and processing. You should consult authoritative sources when doing this work to avoid methodological errors. This exercise is intended to demonstrate that the mathematics and programming required are easily achievable with only basic python knowledge.**

**Data Reduction Steps**:
1. Load the data
2. Identification of background, samples and standards in the raw data
3. Apply background correction
4. Standardise data
5. Calibrate data to standards
6. Calculate the mass fraction


## 3.8.1 - Load data

The example data we will use is from the testing datasets for a python tool for LA-ICPMS data reduction:
(https://github.com/oscarbranson/latools).

In [None]:
# Load the data from here as a pandas data frame:
# https://raw.githubusercontent.com/ds4geo/ds4geo/master/data/timeseries/laicpms_sample.csv


In [None]:
# Create:
# 1. a 1d numpy array of the "Time" column
# 2. a 2d numpy array of the element count columns

In [None]:
# ANSWERS
import pandas as pd

data_pd = pd.read_csv("https://raw.githubusercontent.com/ds4geo/ds4geo/master/data/timeseries/laicpms_sample.csv", header=1)
data = data_pd.to_numpy()
time = data[:,0]
raw_te = data[:,1:]

In [None]:
# Make a/some plots of the data to get an overview

## 3.8.2 - Identify background, samples and standards

When you plot the data, you will see several periods where the counts (for any element) are well above 0. The first 3 of these sections are standards, the last 4 are samples. The intermediate parts are background.

We need to identify the time intervals corresponding to samples, standards and background for the following analyses. We will do this by identifying the start and end times/positions (here, position is measured in time) for each relevant section. We then create a boolean index array for each.

It is recommended to work together with your classmates to complete this task by sharing the start and end positions.

In [None]:
# Find out and record the start and end positions of at least 2 background sections
# Record them in a list containing dictionaries with the following format:
# [{"start": <position in seconds>, "end": <position in seconds>},
#  {"start": <>, "end": <>},
#   .....]

In [None]:
# Do the same for all the sample sections, and separately for all the standards sections

In [None]:
# Create a boolean index array for each of background, samples and standards.
# Each should be the same shape as the time array (use the .shape attribute to check).
# hint: use np.logical_and to create a boolean index for each section,
#       then combine the results with np.any. 

In [None]:
# ANSWERS
bg_loc = [{"start": 0, "end": 25},
          {"start": 491, "end": 498}]
stand_loc = [{"start": 27, "end": 82},
            {"start": 105, "end": 160},
            {"start": 184, "end": 220},
            ]
samp_loc = [{"start": 269, "end": 340 },
            {"start": 363, "end": 389},
            {"start": 409, "end": 433},
            {"start": 453, "end": 487}]

# One boolean index per section (True inside that section), combined with np.any along axis 0
bg_idx = np.any([np.logical_and(time > loc["start"], time < loc["end"]) for loc in bg_loc],
                axis=0)

stand_idx = np.any([np.logical_and(time > loc["start"], time < loc["end"]) for loc in stand_loc],
                   axis=0)

# Loop over all 4 sample sections
samp_idx = np.any([np.logical_and(time > loc["start"], time < loc["end"]) for loc in samp_loc],
                  axis=0)


## 3.8.3 - Apply background correction
The background counts should be removed from the rest of the signal for each element.

We therefore take the average counts during the background periods for each element and subtract these values from the element arrays.
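The idea can be sketched on a tiny made-up dataset before applying it to the real data. The counts, time stamps and background periods below are hypothetical; the pattern (boolean index of background rows, per-column mean, broadcast subtraction) is the one described above:

```python
import numpy as np

# Hypothetical raw counts: 6 time steps (rows) x 2 elements (columns)
raw_counts = np.array([[10.0, 100.0],
                       [12.0, 104.0],   # rows 0-1: background
                       [510.0, 902.0],
                       [495.0, 898.0],  # rows 2-3: signal
                       [11.0, 96.0],
                       [13.0, 100.0]])  # rows 4-5: background
time_s = np.arange(6)  # hypothetical time stamps in seconds

# Boolean index of the background periods
is_background = (time_s < 2) | (time_s >= 4)

# Mean background per element (one value per column): 11.5 and 100.0
background = raw_counts[is_background].mean(axis=0)
print(background)

# Broadcasting subtracts each element's background from its whole column
corrected = raw_counts - background
print(corrected[2])  # first signal row: 498.5 and 802.0
```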

In [None]:
# Create an array of the average (mean) counts for each element during the background sections.


In [None]:
# Subtract the per-element backgrounds from the entire dataset

In [None]:
# ANSWER
bg_vals = np.mean(raw_te[bg_idx], axis=0)
bg_corr = raw_te - bg_vals

## 3.8.4 - Standardize data
In LA-ICPMS analysis, the amount of analyte which is measured depends on how much is ablated by the laser, which in turn depends on the material properties of the sample (known as matrix effects). To remove matrix effects and other spurious features, we can standardize the data to an element which we expect to be present at a constant concentration. For carbonates, an isotope of Ca is often used and is known as the internal standard.

Standardization in this case means that we convert all other element data to count ratios of that element vs Ca44.
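A sketch of the ratio calculation on made-up counts, assuming a hypothetical column index for the internal standard. Slicing with `ca_col:ca_col + 1` keeps a column shape of (n, 1), so the division broadcasts across all columns. Indexing with `[:, ca_col]` instead gives shape (n,), which broadcasts along the rows; with a square array like this one that runs without error but divides by the wrong values:

```python
import numpy as np

# Hypothetical background-corrected counts: 3 time steps x 3 columns,
# where column 2 is assumed to be the internal standard (e.g. Ca44)
counts = np.array([[20.0, 5.0, 1000.0],
                   [40.0, 10.0, 2000.0],
                   [30.0, 6.0, 1500.0]])
ca_col = 2  # hypothetical column index of the internal standard

# Shape (3, 1) divisor: every value is divided by the standard in its own row
ratios = counts / counts[:, ca_col:ca_col + 1]
print(ratios)

# Rows 0 and 1 give identical ratios although the raw counts doubled,
# which is the point of normalising to an internal standard
print(np.allclose(ratios[0], ratios[1]))  # True

# Pitfall: a shape (3,) divisor broadcasts along rows and silently gives wrong values
wrong = counts / counts[:, ca_col]
print(np.allclose(wrong, ratios))  # False
```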

In [None]:
# Convert all data into ratios to Ca44 - i.e. divide all element counts by Ca44 counts

In [None]:
#ANSWER
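# Column index 4 of the element counts is the Ca44 column used here; slicing with 4:5 keeps
# a (n, 1) shape so that the division broadcasts across all element columns
castd_te = bg_corr / bg_corr[:, 4:5]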

## 3.8.5 - Calibrate data
Next we calibrate the count ratios to composition ratios. We do this using the measured standards of known composition.

Preparation of the standard composition in terms of count ratio is beyond the scope of this exercise, so the required data is provided ready-to-use. In this case, we will only use one of the standards, but more complex methods exist to calibrate simultaneously with multiple standards and thereby improve the estimation of measurement uncertainty.
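The single-standard calibration reduces to one conversion factor per element: the known composition ratio of the standard divided by its measured count ratio. Multiplying the sample count ratios by this factor converts them to composition ratios. A sketch with made-up numbers for two elements (all values are hypothetical):

```python
import numpy as np

# Hypothetical values for 2 elements, already ratioed to the internal standard
known_standard = np.array([0.010, 0.50])     # known composition ratios of the standard
measured_standard = np.array([0.020, 0.25])  # mean measured count ratios of the standard
sample_ratios = np.array([[0.040, 0.30],
                          [0.030, 0.20]])    # measured count ratios of two sample points

# Conversion factor per element: known / measured (0.5 and 2.0 here)
factor = known_standard / measured_standard
print(factor)

# Broadcasting multiplies each column by its element's factor
calibrated_samples = sample_ratios * factor
print(calibrated_samples)
```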


In [None]:
# Load the standard composition from here:
# https://raw.githubusercontent.com/ds4geo/ds4geo/master/data/timeseries/laicpms_standard.csv
# Convert the dataframe to a numpy array and check its shape

In [None]:
# The calibration data refers only to the first measured standard.
# Create a boolean index array for this standard.
# Then calculate the mean values per element for the standard

In [None]:
# Calculate the conversion ratio by dividing calibration data by the measured (standardized) standard data
# Apply the conversion ratio by multiplying the standardized element data by it



In [None]:
# Remove all data except the samples
# Assign everything else to np.nan

In [None]:
# ANSWER
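import pandas as pd

# Known (calibration) composition of the first standard
std_dat = pd.read_csv("https://raw.githubusercontent.com/ds4geo/ds4geo/master/data/timeseries/laicpms_standard.csv").to_numpy()
# Boolean index and mean measured (Ca44-standardized) values of the first standard
stand_1_idx = np.logical_and(time > stand_loc[0]["start"], time < stand_loc[0]["end"])
stand_comp = np.mean(castd_te[stand_1_idx, :], axis=0)
# Conversion factor per element (known / measured), applied to all data
calibr = std_dat / stand_comp
calibrated = castd_te * calibr
# Keep only the sample sections
calibrated[~samp_idx] = np.nan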

## 3.8.6 - Calculate mass fraction
LA-ICPMS results are often reported simply as molar mass ratios to the internal standard (e.g. Ca), but one can also take an additional step to convert the molar mass ratios to the sample mass fraction.
We standardized our data to an internal standard of constant concentration (i.e. a ratio of Ca44). However, if we want to calculate the mass fraction, we need to convert our data back from that ratio to the mass fraction. To do this, we need to know the concentration of the internal standard in the material. This needs to be separately measured or assumed.

Here we will make an assumption and the calculations are provided.
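Written out, the calculation below and the final conversion step amount to the following relations. The symbols are introduced here for illustration only: $M$ are standard atomic weights, $a_{^{44}\mathrm{Ca}}$ is the assumed relative abundance of Ca44, $w$ are weight fractions and $R_X$ is the calibrated ratio of element $X$ to Ca44:

$$
w_{\mathrm{Ca}} = \frac{M_{\mathrm{Ca}}}{M_{\mathrm{Ca}} + M_{\mathrm{C}} + 3\,M_{\mathrm{O}}}, \qquad
w_{^{44}\mathrm{Ca}} = a_{^{44}\mathrm{Ca}}\, w_{\mathrm{Ca}}, \qquad
C_X\,[\mathrm{ppm}] = R_X \, w_{^{44}\mathrm{Ca}} \times 10^{6}
$$

With the values used below, $w_{\mathrm{Ca}} = 40.078 / 100.086 \approx 0.4004$, so the Ca44 content is roughly $0.4004 \times 0.02086 \times 10^6 \approx 8353$ ppm.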

In [None]:
# Estimate mass fraction of Ca44 in sample (a foram)
# Assume the foram is entirely composed of CaCO3
# 1. Calculate the weight fraction of Ca in CaCO3
Ca_aw = 40.078 # Calcium standard atomic weight
O_aw = 15.999
C_aw = 12.011
CaCO3_aw = Ca_aw + C_aw + (3 * O_aw) # calculate formula weight (molar mass) of CaCO3
Ca_rw = Ca_aw / CaCO3_aw # Calculate Ca weight fraction of CaCO3

# Assume the typical isotope fraction of Ca44 to all Ca
# 2. Find the Ca44 weight fraction of CaCO3
Ca44_ra = 0.02086 # relative abundance of Ca44 - i.e. about 2.1%
Ca44_rw = Ca_rw * Ca44_ra

# 3. Convert to ppm
Ca44_ppm = Ca44_rw * 1000000

In [None]:
# Apply the ratio-to-mass conversion by multiplying the calibrated ratio data by the Ca44 mass fraction (Ca44_ppm)

In [None]:
# Make some plots to visualise the output.
# Optionally you can convert the calibrated data back to a pandas DataFrame

In [None]:
# ANSWER
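import matplotlib.pyplot as plt
mass_fraction = calibrated * Ca44_ppm
# One line per element, labelled with the column names of the original data
for c, l in enumerate(data_pd.columns[1:]):
    plt.plot(time, mass_fraction[:, c], label=l)
plt.xlabel("Time (s)")
plt.ylabel("Mass fraction (ppm)")
plt.legend()
plt.ylim([0, 200])
plt.show()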

# Part 3.9 - Week 2 Assignment
Data contains hidden stories. Part of data science (arguably quantitative science in general!) is to find and tell those stories. Jupyter notebooks are a perfect tool for data storytelling because they allow text, images, code and code results (e.g. plots) to be combined.

**Task**

Create a Jupyter notebook where you tell the story of what it is like to live in 2 different cities in terms of the weather. Pick 2 from Innsbruck, London, Sydney, Tehran and Singapore.
* Daily weather data for these cities for 2015 to 2019 is provided (see below).
* You should not use any source of information (including personal experience) other than the data provided and your analysis of it.
* You are free to tell the story as you wish in terms of content and style. Your audience is the other course participants.
* Do not just summarise the data, tell a story. You do not need to use all of the available data.
* Length depends on how one tells the story, but 10 lines of code is too short and 100 is too long.
* Use Pandas to handle and analyse the data. Use the built in help, online Pandas help docs, and google to figure out how to perform your analysis. Note down anything you wish to do but cannot figure out how to do.
  * Hint: .groupby(), .mean(), .sum(), .count(), etc. will be useful!
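A minimal sketch of those methods on a made-up daily dataset. The column names `date`, `temperature` and `rain` and all values are hypothetical and will differ from the provided CSV files, so check the real column names with `.columns` first:

```python
import numpy as np
import pandas as pd

# Hypothetical daily data for two years
dates = pd.date_range("2015-01-01", "2016-12-31", freq="D")
day_of_year = dates.dayofyear.to_numpy()
weather = pd.DataFrame({
    "date": dates,
    "temperature": np.sin(2 * np.pi * day_of_year / 365) * 10 + 10,
    "rain": np.where(dates.day.to_numpy() % 5 == 0, 4.0, 0.0),  # rain every 5th day of the month
})

# Average temperature for each calendar month across all years
monthly_temp = weather.groupby(weather["date"].dt.month)["temperature"].mean()

# Total rain per year
yearly_rain = weather.groupby(weather["date"].dt.year)["rain"].sum()

# Number of rainy days per year
rainy = weather[weather["rain"] > 0]
rainy_days = rainy.groupby(rainy["date"].dt.year)["rain"].count()

print(monthly_temp.round(1))
print(yearly_rain)
print(rainy_days)
```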

**Data**

Pandas readable csv files are located here:
* Innsbruck:
  * Readable: https://github.com/ds4geo/ds4geo/blob/master/data/timeseries/meteo/Innsbruck_weather_2015-19.csv
  * Raw: https://raw.githubusercontent.com/ds4geo/ds4geo/master/data/timeseries/meteo/Innsbruck_weather_2015-19.csv
* London:
  * Readable: https://github.com/ds4geo/ds4geo/blob/master/data/timeseries/meteo/London_weather_2015-19.csv
  * Raw: https://raw.githubusercontent.com/ds4geo/ds4geo/master/data/timeseries/meteo/London_weather_2015-19.csv
* Sydney:
  * Readable: https://github.com/ds4geo/ds4geo/blob/master/data/timeseries/meteo/Sydney_weather_2015-19.csv
  * Raw: https://raw.githubusercontent.com/ds4geo/ds4geo/master/data/timeseries/meteo/Sydney_weather_2015-19.csv
* Tehran:
  * Readable: https://github.com/ds4geo/ds4geo/blob/master/data/timeseries/meteo/Tehran_weather_2015-19.csv
  * Raw: https://raw.githubusercontent.com/ds4geo/ds4geo/master/data/timeseries/meteo/Tehran_weather_2015-19.csv
* Singapore:
  * Readable: https://github.com/ds4geo/ds4geo/blob/master/data/timeseries/meteo/Singapore_weather_2015-19.csv
  * Raw: https://raw.githubusercontent.com/ds4geo/ds4geo/master/data/timeseries/meteo/Singapore_weather_2015-19.csv

The data was downloaded from https://rp5.ru/ and pre-processed from hourly to daily data.


**Submission**
* SUBMIT NOTEBOOKS HOW? Github or OLAT?
* The **deadline** is 23:59 on 27th October 2020.
* This assignment comprises 5% of the assessment for the course. Marks are awarded for clear, effective and interesting data-driven storytelling.

Submitted notebooks will be made available to the whole class, and will be discussed in session 4. The assignment to session 5 will build upon this assignment.



# References