In [None]:
import requests
from IPython.core.display import HTML
HTML(f"""
<style>
@import "https://cdn.jsdelivr.net/npm/bulma@0.9.4/css/bulma.min.css";
</style>
""")

# Data analysis, manipulation and plotting
_Note to self: Explain that the tutorial contains small exercises but that they are optional (do this for all tutorials!)_
## Introduction
The tutorial contains:
1. Introduction to arrays and vectors in numpy.

2. Loading/Saving data. 

3. Essential methods for data analysis/manipulation. 

4. Elementary plotting using matplotlib.




In [None]:
# Import necessary libraries
import os

import numpy as np
import matplotlib.pyplot as plt
from skimage.io import imread


### Creating data in numpy
#### Numpy arrays
Numpy has several convenient functions for creating arrays. The following are especially useful for this course (read more about array creation [here](https://numpy.org/doc/stable/user/basics.creation.html#array-creation)).
- [`np.ones(size)`](https://numpy.org/doc/stable/reference/generated/numpy.ones.html#numpy.ones), [`np.zeros(size)`](https://numpy.org/doc/stable/reference/generated/numpy.zeros.html): Create an array of shape `size` filled with ones or zeros, respectively.
- [`np.linspace(start, stop, num)`](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html#numpy.linspace), [`np.arange(start, stop, step)`](https://numpy.org/doc/stable/reference/generated/numpy.arange.html#numpy-arange): Create 1d arrays of values ranging from `start` to `stop`. `linspace` returns `num` evenly spaced values and includes `stop` by default, whereas `arange` advances in increments of `step` and excludes `stop`.
- [`np.random.uniform(low, high, size)`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.uniform.html#numpy.random.uniform), [`np.random.normal(loc, scale, size)`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.normal.html#numpy.random.normal): Create arrays with random elements drawn from either a uniform or a normal/Gaussian distribution. For the uniform distribution, values are drawn between `low` and `high` (0 and 1 by default). For the Gaussian, `loc`=$\mu$ (mean) and `scale`=$\sigma$ (standard deviation).

Don’t worry too much about remembering all of them for now. Examples are included in the next cell for easy experimentation:


In [None]:
a_ones = np.ones((2, 3)) # 2 by 3 array of ones.
a_zeros = np.zeros((3, 2)) # 3 by 2 array of zeros.
a_linspace = np.linspace(0, 10, 5) # creates an array of 5 numbers evenly spaced from 0 to 10 (both endpoints included).
a_arange = np.arange(0, 10, 2) # creates an array from 0 up to, but not including, 10 with a step of 2, so the largest value is 8.
a_uniform = np.random.uniform(size=(2, 2)) # creates a 2 by 2 array of "random" numbers drawn from a uniform distribution.
a_normal = np.random.normal(size=(2, 2)) # creates a 2 by 2 array of "random" numbers drawn from a normal/gaussian distribution.

print('ones:\n', a_ones)
print('zeros:\n', a_zeros)
print('linspace:\n', a_linspace)
print('arange:\n', a_arange)
print('uniform:\n', a_uniform)
print('normal:\n', a_normal)


**Note:** There is no need for iteration (i.e. loops) when creating arrays in numpy!
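
To make the difference between the two range functions concrete, here is a small sketch. The variable names (`lin`, `ar`, `squares_loop`, `squares_vec`) are illustrative only and not used elsewhere in the tutorial. It also compares a loop-based construction with its vectorised counterpart.

```python
import numpy as np

# linspace: `num` evenly spaced values, `stop` is included by default
lin = np.linspace(0, 10, 5)
print(lin)  # values: 0, 2.5, 5, 7.5, 10

# arange: fixed step size, `stop` itself is excluded
ar = np.arange(0, 10, 2)
print(ar)  # values: 0, 2, 4, 6, 8

# The same array built with a Python loop and with a vectorised expression
squares_loop = np.array([i**2 for i in range(5)])
squares_vec = np.arange(5) ** 2
print(np.array_equal(squares_loop, squares_vec))  # True
```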
## Saving arrays with numpy
The following example shows how data (the examples from above) is saved with numpy. We can store data in two different formats.
- [`np.save(save_path, array)`](https://numpy.org/doc/stable/reference/generated/numpy.save.html#numpy.save) stores the data as a binary npy file (numpy's own format, uncompressed).
- [`np.savetxt(save_path, array)`](https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html) stores the data as a plain text file (txt).



In [None]:
os.makedirs('./Data', exist_ok=True) # make sure the output folder exists

a_normal_50 = np.random.normal(size=(50,2))
np.save('./Data/RandomData.npy',a_normal_50) # saving the array as a binary npy file (numpy data format)

a_arange_50 = np.arange(0,100,2)
np.save('./Data/StructuredData.npy',a_arange_50)

# numpy can additionally save arrays in plain text formats such as txt.
a_linspace_50 = np.linspace((1,2),(10,20),10)
np.savetxt('./Data/Txt_file.txt',a_linspace_50) # saving data as a regular txt file; saving as a csv file is also possible


## Loading data with numpy
### Loading numpy data
The data can correspondingly be loaded with the numpy functions `np.load(path)` and `np.loadtxt(path)`.


In [None]:
A = np.load('./Data/RandomData.npy') # loading data stored as a binary npy file (numpy data format)

B = np.load('./Data/StructuredData.npy')

# load data stored in a plain text format such as txt or csv.
C = np.loadtxt('./Data/Txt_file.txt')


The loaded `numpy` arrays are printed in the cell below.


In [None]:
# Note A[:N] is only a slice i.e. the first N elements of A
print('A:\n',A[:5])
print('B:\n',B[:10])
print('C:\n',C[:5])
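
As a quick sanity check, the sketch below saves an array, loads it again and compares the result. It writes to a temporary directory rather than `./Data`, and the file names are hypothetical. `np.savez_compressed` is included only to illustrate how a compressed archive could be written, since `np.save` itself does not compress.

```python
import os
import tempfile

import numpy as np

data = np.linspace((1, 2), (10, 20), 10)  # array of shape (10, 2)

# A temporary directory keeps the sketch independent of the ./Data folder
with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "example.npy")
    txt_path = os.path.join(tmp, "example.txt")
    npz_path = os.path.join(tmp, "example.npz")

    np.save(npy_path, data)                   # binary, uncompressed
    np.savetxt(txt_path, data)                # plain text
    np.savez_compressed(npz_path, data=data)  # compressed archive

    from_npy = np.load(npy_path)
    from_txt = np.loadtxt(txt_path)
    with np.load(npz_path) as archive:
        from_npz = archive["data"]            # arrays are stored by keyword name

print(np.array_equal(data, from_npy))  # binary round trip is exact
print(np.allclose(data, from_txt))     # text round trip, compared with a tolerance
print(np.array_equal(data, from_npz))
```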


## Operate along dimensions
You will often perform operations on arrays that contain multiple instances grouped together. For example, in machine 
learning, you often concatenate input vectors into matrices. In these instances you may want to perform operations along 
only one or some of the array axes.
As an example, let’s try to calculate the average of $N$ random vectors. We define an $N\times K$ matrix of random 
values:


In [None]:
N, K = 20, 10
r = np.random.uniform(size=(N, K))


Numpy provides the function `np.mean` for calculating averages. Using the `axis` argument, we specify that the average 
should be calculated along axis 0, i.e. across the $N$ rows, which gives one average vector with $K$ elements:


In [None]:
np.mean(r, axis=0)


The `axis` argument is supported by most of Numpy’s reduction functions, including `sum` and `std`. Elementwise functions such as `sqrt` do not need it.
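
The effect of the `axis` argument is easiest to see from the resulting shapes. The following sketch reuses the $N\times K$ setup from above. The names `mean_over_rows`, `mean_over_cols` and `centered` are illustrative, and `keepdims` is an optional argument shown here only as one possible convenience.

```python
import numpy as np

N, K = 20, 10
r = np.random.uniform(size=(N, K))

mean_over_rows = np.mean(r, axis=0)  # collapses axis 0 -> shape (K,)
mean_over_cols = np.mean(r, axis=1)  # collapses axis 1 -> shape (N,)
mean_all = np.mean(r)                # no axis: a single number

print(mean_over_rows.shape)  # (10,)
print(mean_over_cols.shape)  # (20,)

# keepdims=True keeps the reduced axis with length 1, which is handy
# when the result is combined with the original array afterwards
centered = r - np.mean(r, axis=0, keepdims=True)
print(centered.shape)  # (20, 10)
```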
## Essential Numpy array methods for data analysis and manipulation
The next section covers essential methods for data analysis and manipulation. The following methods will be used extensively throughout the course and are worth paying careful attention to.
- [`np.mean(array, axis)`](https://numpy.org/doc/stable/reference/generated/numpy.mean.html), [`np.std(array, axis)`](https://numpy.org/doc/stable/reference/generated/numpy.std.html): Calculate the mean and the standard deviation of a numpy array of numbers (floats or integers), optionally along a given `axis`.
- `a.shape`: The shape of an array, i.e. its size along each dimension. `len(a)` gives the length of the first dimension of a list or array.
- Slicing with the `:` operator extracts part of an array `A` as `A[start:stop:step]`. Read more in the official guide [here](https://numpy.org/doc/stable/user/basics.indexing.html).
- Broadcasting makes it possible to perform elementwise operations between arrays of different but compatible shapes. Read more in the official guide [here](https://numpy.org/doc/stable/user/basics.broadcasting.html).
- Elementwise addition and multiplication: arithmetic operators such as `+` and `*` act element by element. This also works for more advanced operations, e.g. exponentiation of an array.
- [`np.concatenate(array_list, axis)`](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html): Join numpy arrays along the existing dimension given by `axis`.

Numpy can also be used for linear algebra, but that will be covered in another tutorial. 
Next, we consider simple examples to demonstrate the functions above:


In [None]:
A = np.linspace(0,9,10)

B = np.array([
    [-16, 15, -14, 13],
    [-12, 11, -10, 9],
    [-8, 7, -6, 5],
    [-4, 3, -2, 1]
])

print('A:\n',A)
print('B:\n',B)


In [None]:
### Mean of an array 
# Calling the mean function from the numpy library to determine the mean of the array.
print('Mean A:\n',np.mean(A)) 

# Most numpy array manipulation functions can additionally be called as methods on an array object
print('Mean of A using the array method:\n',A.mean())

# This equivalent way of calling is possible for most numpy data manipulation functions. 
### Std of an array 
print('Std of A:\n',np.std(A))

### Sum of an array 
print('A sum:\n', np.sum(A))

### shape (size) of an array
print('A shape:\n',A.shape)
print('B shape:\n',B.shape)

## np.concatenate([A, B]) example
print('Concatenation of A and Slice of B matrix:\n',np.concatenate([A,B[0,:]],axis=0))


## Slicing of arrays


In [None]:
### Slicing of array
print(B[:,0])

print(A[:5])
print('A[5:], A array except the first 5:\n',A[5:])

print('A[:-5], A array except the last 5:\n', A[:-5])

print('A[1::2] array of every second element of A starting from the second:\n',A[1::2])
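
The `start:stop:step` pattern also works with negative steps and along several dimensions at once. The sketch below redefines `A` and `B` as above so that it runs on its own. The particular slices are just examples.

```python
import numpy as np

A = np.linspace(0, 9, 10)
B = np.array([
    [-16, 15, -14, 13],
    [-12, 11, -10, 9],
    [-8, 7, -6, 5],
    [-4, 3, -2, 1],
])

print(A[2:8:3])     # start at index 2, stop before 8, step 3 -> indices 2 and 5
print(A[::-1])      # a negative step reverses the array
print(B[1:3, ::2])  # rows 1 and 2, every second column (columns 0 and 2)
print(B[-1, :])     # negative indices count from the end: the last row
```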


## Array Arithmetic


In [None]:
### Adding of array
print('Adding a slice of A shape (4,) to B shape (4,4) using broadcasting:\n',A[:4]+B)

print('Adding constant to A (10,) using broadcasting:\n',A+10)
print('Adding single element array (shape (1,)) to B (shape (4,4)) using broadcasting:\n',B  + np.array([10]))

### Elementwise multiplication of arrays
print('Elementwise multiplication of a slice of A (shape (4,)) to B (shape (4,4)) using broadcasting:\n',A[:4]*B)

### Elementwise division of arrays
print('Elementwise division of a slice of B (shape (4,)) and A (shape (4,)):\n',B[0,:]/A[1:5])
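
Broadcasting depends on the shapes lining up. As a rough sketch (with illustrative arrays `row` and `col` that are not part of the tutorial data), a shape `(4,)` array is matched against the last axis of a `(4, 4)` array, a shape `(4, 1)` array is stretched across the columns, and incompatible shapes raise an error.

```python
import numpy as np

B = np.arange(16).reshape(4, 4)  # shape (4, 4)
row = np.array([1, 2, 3, 4])     # shape (4,)
col = row.reshape(4, 1)          # shape (4, 1)

print(B + row)              # `row` is added to every row of B
print(B + col)              # col[i] is added to every element in row i of B
print((row + col).shape)    # (4,) and (4, 1) broadcast to (4, 4)

# Shapes (4, 4) and (3,) do not match along the last axis
try:
    B + np.array([1, 2, 3])
except ValueError as err:
    print("Incompatible shapes:", err)
```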


## Comparison operators
Just like the elementwise arithmetic operators, Numpy implements elementwise comparison operators (see the [official guide](https://numpy.org/doc/stable/user/basics.indexing.html#boolean-array-indexing) for additional detail). For example, if we 
wanted to find elements of `vr` larger than $98$ we can use the following code:


In [None]:
vr = np.random.randint(100, size=10000) # Create array of random values


vr > 98


In itself this isn’t super useful. Luckily, Numpy supports a special indexing mode using an array of booleans to 
indicate which elements to keep.
To get the actual values we simply insert the comparison code into the index operator for the array:


In [None]:
vr[vr > 98]


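Finally, we can combine boolean arrays using the elementwise logical operators `&` (and) and `|` (or). Each comparison must be wrapped in parentheses because these operators bind more tightly than the comparison operators:
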


In [None]:
vr[(vr < 2) | (vr > 98)]


These tricks also work for assignment:


In [None]:
vr[vr > 50] = 0
vr[:10]
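
A few related patterns are sketched below on a small fixed array `v` (a hypothetical example, chosen so the results are easy to verify by hand): counting matches, inverting a mask with `~`, and using `np.where` to build a modified copy instead of assigning in place.

```python
import numpy as np

v = np.array([3, 99, 0, 57, 1, 100, 42])

mask = (v < 2) | (v > 98)  # boolean array, True where either condition holds
print(mask.sum())          # True counts as 1, so this is the number of matches: 4
print(v[mask])             # the matching values: 99, 0, 1, 100
print(v[~mask])            # ~ inverts the mask: 3, 57, 42

# np.where(condition, a, b) picks from a where the condition holds, else from b.
# Unlike v[v > 50] = 0, this leaves v itself unchanged.
clipped = np.where(v > 50, 0, v)
print(clipped)             # 3, 0, 0, 0, 1, 0, 42
```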


## Basic plotting with matplotlib
We start by importing the library:


In [None]:
# importing matplotlib.pyplot
import matplotlib.pyplot as plt


The `pyplot` module is a simple API for creating and manipulating plots using functions.
`plot` and `scatter` will be the most frequently used functions in this course.
- [`plot`](https://matplotlib.org/stable/plot_types/basic/plot.html#sphx-glr-plot-types-basic-plot-py) is typically used for creating connected line segments described by x and y data.
- [`scatter`](https://matplotlib.org/stable/plot_types/basic/scatter_plot.html#sphx-glr-plot-types-basic-scatter-plot-py) is used for plotting individual points, e.g. from a dataset.

Take a look at the following sample plot code and output:


In [None]:
# The functions below are elementwise operations on simple numpy arrays.
x_range = np.linspace(0, 5, 50) # simple linspace array
y_linear = x_range + 3 # adding a constant to the numpy array (broadcasting)
y_quadratic = x_range**2 # elementwise exponentiation
y_exp = np.exp(x_range) # exponential function applied elementwise to x_range

plt.plot(x_range, y_linear)
plt.plot(x_range, y_quadratic)
plt.plot(x_range,y_exp);


**Notes:**
- A typical use of [`np.linspace`](https://numpy.org/doc/stable/reference/generated/numpy.linspace.html) is to create x-axis values for plotting functions.
- Notice how `y_quadratic` is created using elementwise exponentiation.
- Similarly, `y_exp` is generated using the numpy function [`np.exp(x)`](https://numpy.org/doc/stable/reference/generated/numpy.exp.html).
- Jupyter automatically displays the value of the last expression in a cell. Since [`plt.plot`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html) and similar functions return the objects they create, a short text representation of these objects is printed above the figure. This output can be hidden by appending a `;` to the last call in a cell.

**Scatter plot**
Scatter plots work similarly but only plot the points without connections. In the example below, we create a quadratic function from a newly defined `x_range`, add normally distributed random noise to it, and plot both the original (with [`plt.plot`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html)) and the noisy points (with [`plt.scatter`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.scatter.html#)) to compare.


In [None]:
# Create x-values and the noise-free quadratic function
x_range = np.linspace(-10, 10, 50)
y_values = x_range**2

# Add normally distributed noise (standard deviation 5) to each point
noise = np.random.normal(scale=5, size=50)
y_noise = y_values + noise

# Plot the original function as a line and the noisy samples as points
plt.plot(x_range, y_values)
plt.scatter(x_range, y_noise);


## Styling
Matplotlib allows customisation of the plots. Individual lines or point series can be customised. Here’s a short overview of the functionality:
- [`plt.plot`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.plot.html) takes an optional third argument, `fmt`, which is used to adapt the styling of lines. Generally, a letter designating a color (e.g. `r`, `g`, `b`) and a symbol designating line or point style (e.g. `+`, `--`) are combined to produce a format string, e.g. `r+` to create red crosses.
- [`plt.scatter`](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.scatter.html#) takes an argument `c` for the color (can be letter form or complete color names) and an argument `marker` for the marker style (e.g. `+`, `o`).

Here is a basic example:


In [None]:
plt.plot(x_range, y_values, 'r--')
plt.scatter(x_range, y_noise, c='green', marker='d');


Although it is possible to change colors manually, Matplotlib automatically assigns colors to lines and point series using an internally defined `style`. The current style can be changed for the rest of the session using [`plt.style.use(style)`](https://matplotlib.org/stable/api/style_api.html#matplotlib.style.use) or temporarily inside a `with` block using `plt.style.context(style)`. A reference of built-in stylesheets can be found [here](https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html). The following cell contains a simple example:


In [None]:
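# We create some normal and uniformly distributed noise. (random data i.e. not structured)
xs, ys = np.random.normal(size=(2, 100))
xu, yu = np.random.uniform(size=(2,100))

# Newer Matplotlib versions name this style 'seaborn-v0_8'; use whichever is available.
style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
with plt.style.context(style):
    plt.scatter(xs, ys, marker='+')
    plt.scatter(xu, yu, marker='x')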


### Label, Title and Legends
You can easily add extra features such as a legend, a title, and axis labels to plots. An overview and a simple example are provided below. 
- [`plt.legend(labels)`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html): Create a legend for the data series (not the axes!). A list of `labels` is assigned to the previously plotted elements in order. The list can be omitted if a `label` is provided in each separate plot call.
- [`plt.suptitle(title)`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.suptitle.html): Set the title of the figure using the string `title`.
- [`plt.ylabel(name)`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylabel.html)/[`plt.xlabel(name)`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlabel.html): Set plot axis labels.



In [None]:
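# Newer Matplotlib versions name this style 'seaborn-v0_8'; use whichever is available.
style = 'seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn'
with plt.style.context(style):
    plt.scatter(xs, ys, marker='+')
    plt.scatter(xu, yu, marker='x')
    plt.legend(['normal', 'uniform'])
    
    plt.suptitle('Distribution comparison')
    plt.ylabel('Y')
    plt.xlabel('X')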
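
An alternative to passing a list to `plt.legend` is to attach a `label` to each plot call and then call `plt.legend()` without arguments. The sketch below illustrates this with two simple curves. The data and label texts are made up for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 50)

plt.figure()
plt.plot(x, x + 3, label="linear")      # label is stored with the line
plt.plot(x, x**2, label="quadratic")
legend = plt.legend()                   # collects the labels given above
plt.suptitle("Function comparison")
plt.xlabel("x")
plt.ylabel("y");
```

Keeping the label next to the data it describes means the legend stays correct when plot calls are reordered or removed.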


## Combining plots
Matplotlib makes it possible to combine multiple plots in one figure, a feature you will likely use often. This introduces some more object-oriented aspects, but the API luckily remains largely the same. 
To create a figure with multiple subplots, use the function [`plt.subplots`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html). This is similar to `plt.figure`, which is used in regular Python scripts to create a new Matplotlib figure. The function returns a _figure_ object and an array of _axes_ objects. These are then used to fill in each subplot, add titles, and so forth. Examine the code below for a usage sample:
**Notes:**
- Subplots can be generated in multiple ways; you are welcome to explore other alternatives.



In [None]:
fig, ax = plt.subplots(2, 2, figsize=(7, 5))

ax[0, 0].plot(x_range, y_linear)
ax[0, 1].plot(x_range, y_quadratic)
ax[1, 0].scatter(xs, ys)
ax[1, 1].plot(x_range, y_values)
ax[1, 1].scatter(x_range, y_noise);
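
Titles and axis labels can be set per subplot through the _axes_ objects. The following is a minimal sketch, assuming a one-row layout with two panels. The data and title strings are illustrative. Note that the axes methods are named `set_title`, `set_xlabel` and so on, rather than `title` and `xlabel` as in the `plt` functions.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 50)

# With a single row, `axes` is a 1d array and is indexed with one index
fig, axes = plt.subplots(1, 2, figsize=(7, 3))

axes[0].plot(x, x + 3)
axes[0].set_title("Linear")
axes[0].set_xlabel("x")

axes[1].plot(x, x**2)
axes[1].set_title("Quadratic")
axes[1].set_xlabel("x")

fig.suptitle("Two subplots")  # one title for the whole figure
fig.tight_layout()            # reduce overlap between titles and labels
```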


## Saving plots
It is possible to save figures directly from a GUI or programmatically.
To save a plot, simply call [`plt.savefig(output_path)`](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.savefig.html), which saves the current figure. Alternatively, call the `savefig` method on a _figure_ object obtained from a `plt.subplots` or `plt.figure` call. A simple example is provided below:


In [None]:
plt.plot(y_quadratic)

plt.savefig('./Data/outputs.pdf')
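
When working with a _figure_ object, the same can be done with its `savefig` method. The sketch below writes a PNG to a temporary directory. The file name and the `dpi` and `bbox_inches` settings are illustrative choices, not requirements.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 5, 50)
fig, ax = plt.subplots()
ax.plot(x, x**2)

# A temporary directory keeps the sketch independent of the ./Data folder
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "quadratic.png")
    # The file format is inferred from the extension;
    # bbox_inches="tight" trims surrounding whitespace
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    saved = os.path.exists(out_path)
    size = os.path.getsize(out_path)

plt.close(fig)  # release the figure once it has been written
print(saved, size)
```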


That’s all for this first little tutorial, hope you find it somewhat helpful. 
