<a id='Q1'></a>
<div class=" alert alert-info"> 
    
# <center>Introduction to Python</center>
    
</div>

This notebook provides a quick introduction to the programming language Python. You will learn how to load, manipulate and visualize data. In particular, you will learn about: 

<a href='#Q1'>Python Libraries</a>

<a href='#Q2'>Built-in Functions in Python</a>

<a href='#Q3'>Immutable Data Types</a>

<a href='#Q4'>Iterations</a>

<a href='#Q5'>NumPy Arrays</a>

<a href='#Q6'>Matrices in Numpy</a>

<a href='#Q7'>Plotting with matplotlib</a>

<a href='#Q8'>User defined functions</a>

<a href='#Q9'>Measuring execution time</a>

<a href='#Q10'>Dictionaries and Dataframes </a>

<a href='#Q11'>Loading Data</a>

<a href='#Q12'>Working with images using Numpy</a>

<a id='Q1'></a>
<div class=" alert alert-info"> 
    
# <center>Python Libraries</center>
    
</div>

Python programs can import functions from libraries, also called packages. Some of the most commonly used Python libraries are:


**NumPy** - (Numerical Python) for operations involving arrays of numbers. One-dimensional numpy arrays are used to represent Euclidean vectors. Two-dimensional numpy arrays can represent matrices and higher-dimensional arrays represent tensors.

https://numpy.org/

**Pandas** - This library provides the `DataFrame` data structure for tabular data, functions for reading data from files, and basic functions for data visualization. 

https://pandas.pydata.org/docs/

**Matplotlib and Seaborn** - These libraries provide more powerful tools for data visualization, such as plotting time series or images. 

https://matplotlib.org/3.1.1/contents.html

https://seaborn.pydata.org/

**PIL** - (Python Imaging Library, maintained today as the fork Pillow) This library provides methods for reading image data from files and converting between different image formats. 

https://python-pillow.org/

**OpenCV** - This library provides computer vision methods such as edge detectors. 

https://opencv.org/


**Scikit-learn** - This library provides implementations of several basic machine learning methods, such as linear regression, decision trees and clustering methods. 

https://scikit-learn.org/stable/




 <b><center><font size=4>How To Use A Python Library</font></center></b>

In order to make use of the functions provided by a library, it must first be imported via the command

`import <library name> as <short name>`

for example

`import numpy as np`

A missing import is the most common cause of the error message

`NameError: name '<short name>' is not defined`

For example, the error message

`NameError: name 'np' is not defined`

arises if a function of the library imported as `np` is used before the library has been imported. 
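
The following minimal sketch reproduces this error and shows how the import resolves it. The alias `m` and the standard-library module `math` are illustrative choices that do not appear elsewhere in this notebook.

```python
# Using an alias before the library has been imported raises a NameError.
try:
    m.sqrt(16)                  # 'm' has not been defined yet
except NameError as err:
    message = str(err)
    print("NameError:", message)

# Importing the library under the alias 'm' makes the same call work.
import math as m

root = m.sqrt(16)
print(root)
```

Run the sketch in a fresh session; if `m` already exists from an earlier cell, the first call will not fail.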

<a id='Q2'></a>
<div class=" alert alert-info">  
    
# <center>Built-In Functions in Python</center>
</div>

Besides functions provided by libraries, Python also provides several built-in functions. One example of a built-in function is `print(a)`, which displays the value of the variable `a`. Another built-in function is `enumerate()`, which pairs each element of a given sequence with its index. You can find an overview of built-in functions here: 

https://docs.python.org/3/library/functions.html

<a id='Q3'></a>
<div class=" alert alert-info"> 
    
# <center>Immutable Data Types</center>
    
</div>

Python distinguishes between mutable and immutable objects or variables. An immutable object cannot be modified after it has been created. One important example of an immutable object is `range(n)` which is a sequence of n numbers starting at 0. See https://docs.python.org/3/library/stdtypes.html#range for more information. 

In particular, you can create a sequence of numbers using the built-in function  
`range(start, stop[, step])`  
This built-in function creates the sequence [start, start+step, start+2*step, ...], which ends before `stop` is reached. If the argument `step` is omitted, it defaults to 1. If the `start` argument is omitted, it defaults to 0. 
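
As a small sketch of these rules, the snippet below builds two ranges with an explicit step and then checks that a `range` object cannot be modified. The variable names (`r`, `modified`) are chosen only for this illustration.

```python
# range(start, stop, step): 'stop' itself is never part of the sequence
print(list(range(2, 11, 3)))     # start at 2, step 3, stop before 11
print(list(range(10, 0, -2)))    # a negative step counts down

# a range object supports indexing and len() ...
r = range(5)
first, size = r[0], len(r)

# ... but it is immutable: item assignment raises a TypeError
try:
    r[0] = 42
    modified = True
except TypeError:
    modified = False

print("range could be modified:", modified)
```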

<div class=" alert alert-danger"> 
    
<center><font size="5"><b>Caution!</b></font></center>
    
<p><center><font size="4">In <code>range(stop)</code> the sequence starts at 0 and does not include the value <code>stop</code></font></center></p>
    
</div>

In [None]:
# create 'range' object which represents a sequence 0,1,...,9
myrange = range(10)
# create list from iterable 'range' object
mylist  = list(myrange)
# print objects and their data type
print("myrange =", myrange, "data type =", type(myrange))
print("mylist =", mylist, "data type =", type(mylist))

In [None]:
# Start and stop arguments are passed

mylist = list(range(1,11))
mylist

In [None]:
# Start, stop and step arguments are passed

mylist = list(range(0,10,2))
mylist

<a id='Q4'></a>
<div class=" alert alert-info"> 
    
# <center>Iterations</center>

    
</div>

One of the main uses of the `range` data type is to create loops that iterate over a sequence of values. More generally, a `for` loop can iterate over any sequence, such as a list. 

In [None]:
# create a sequence consisting of four words 
some_sequence = ["hi","how","are","you"]

# loop over the sequence of elements 
for word in some_sequence:
    print(word)

<div class=" alert alert-danger"> 
    
<center><font size="5"><b>Caution!</b></font></center>

<p><center><font size="4">Indexing in Python starts by default at 0 (and not at 1!)</font></center></p>
    
</div>

In [None]:
# create a sequence consisting of four words
some_sequence =  ["hi","how","are","you"]
# find the length of the list
length = len(some_sequence)

# loop over the sequence of indices (0,1,2,3)
for i in range(length):
    print("index: {} value: {}".format(i, some_sequence[i]))  

In [None]:
# Nested for loops

# create a list 
mylist = [[1,2,3],[4,5,6],[7,8,9]]

# outer loop
for i in range(len(mylist)):
    print("\nouter loop index: {}  values: {}\n ".format(i, mylist[i]))
    
    # inner loop
    for j in range(len(mylist[0])):
        print("inner loop index: {}  value: {} ".format(j, mylist[i][j]))


In [None]:
# Iterating with enumerate() 
# It takes an iterable object as input and returns tuples of the form (index, element) 

# create a list 
some_sequence =  ["hi","how","are","you"]

# loop over elements of a list 
for ind, val in enumerate(some_sequence):
    print("index: {} value: {}".format(ind, val))  

If you need to iterate over two sequences of the same size, you can use the built-in function `zip()`.

In [None]:
# Iterating multiple lists with zip()

# create lists
some_sequence =  ["one","two","three","four"]
another_sequence =  ["eins","zwei","drei","vier"]

# loop over two lists at the same time
for val1, val2 in zip(some_sequence, another_sequence):
    print(val1, val2)

In [None]:
# Iterating multiple lists with zip() and enumerate()

# create lists
some_sequence =  ["one","two","three","four"]
another_sequence =  ["eins","zwei","drei","vier"]

# loop over two lists at the same time
for ind, (val1, val2) in enumerate(zip(some_sequence, another_sequence)):
    print("index: {} \nvalue mylist1: {}, value mylist2: {}".format(ind, val1, val2))  
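
The `zip()` examples above use sequences of equal length. As a hedged sketch of what happens otherwise, the snippet below zips two lists of different length; the list contents are made up for illustration.

```python
numbers = [1, 2, 3, 4]
words = ["one", "two", "three"]       # one element shorter than 'numbers'

# zip() stops as soon as the shortest input is exhausted
pairs = list(zip(numbers, words))
print(pairs)

# a simple length check guards against silently dropped elements
same_length = len(numbers) == len(words)
print("same length:", same_length)
```

The last element of `numbers` is dropped without any error, so it can be worth comparing lengths before zipping.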

<a id='Q5'></a>
<div class=" alert alert-info"> 
    
# <center>Numpy Arrays</center>
    
</div>

The Python library `numpy` provides implementations of many matrix operations as well as other useful features, such as random number generators. Many functions of this library are based on the data type "numpy array". A numpy array is an object that stores an $N$-dimensional array of numbers, where $N$ is the number of dimensions. The shape of a numpy array is given by a sequence of $N$ integers that indicate the number of "elements" in each dimension. The most important special cases are $N=1$, corresponding to vectors, and $N=2$, corresponding to matrices. A matrix with $5$ rows and $2$ columns is represented by a numpy array of shape $(5,2)$. 

Some additional resources to learn more about `numpy` arrays and related operations can be found here:

https://numpy.org/

https://www.youtube.com/watch?v=xECXZ3tyONo

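Using NumPy arrays allows for **vectorized computation**, i.e. operations applied to whole arrays at once instead of element by element in a Python loop, which typically results in faster code execution. 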

https://www.pythonlikeyoumeanit.com/Module3_IntroducingNumpy/VectorizedOperations.html

https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html

**<center><font size="4">Creating NumPy Arrays</font></center>**

In [None]:
# import the library "numpy" which provides functions for matrices and vectors 
import numpy as np

# convert a sequence 0,1,..,9 to a numpy array `myarray1`
mylist  = [0,1,2,3,4,5,6,7,8,9]
myarray1 = np.array(mylist)

# use range() to create a numpy array
myarray2 = np.array(range(10))

# use np.arange() function to create a numpy array
myarray3 = np.arange(10)

# print values of the arrays
myarray1, myarray2, myarray3

In [None]:
# create an array (6 rows, 3 columns) with zeros
zeroarray = np.zeros((6,3)) 

# create an array (6 rows, 3 columns) with ones
onesarray = np.ones((6,3))

print(zeroarray,'\n')
print(onesarray)

In [None]:
# Pass lists directly to create 2D array
myarray = np.array([[1,2,3], [4,5,6]])

# Check the array dimensions with .shape attribute (rows, columns)
print("Number of rows: {} \nNumber of columns: {}".format(myarray.shape[0], myarray.shape[1]))
print(myarray)

<div class=" alert alert-danger"> 
    
<center><font size="5"><b>Caution!</b></font></center>
    
<p><center><font size="4">A numpy array of shape (n,1) has two dimensions and is not the same as a one-dimensional numpy array of shape (n,)</font></center></p>
       
</div>

In [None]:
# Note! Array of shape (n,1) is not equal to the array of shape (n,)
# Use .shape attribute to check the array's dimensions
# Use .reshape() function to get the array with desired dimensions

myarray1 = np.array(range(10))
myarray2 = np.array(range(10)).reshape(-1,1)

myarray1.shape, myarray2.shape
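
The difference between the two shapes matters in arithmetic. The sketch below, with illustrative variable names `v`, `c` and `s`, shows that both objects are ordinary numpy arrays but have a different number of dimensions, and that adding them produces a two-dimensional result through broadcasting.

```python
import numpy as np

v = np.arange(3)                  # shape (3,)  : one dimension
c = np.arange(3).reshape(-1, 1)   # shape (3,1) : two dimensions (a column)

print(v.ndim, c.ndim)             # the number of dimensions differs
print(type(v) is type(c))         # both are numpy.ndarray objects

# broadcasting turns (3,) + (3,1) into an array of shape (3,3)
s = v + c
print(s.shape)
```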

**<center><font size="4">Slicing and Combining NumPy Arrays</font></center>**

In [None]:
# Access element of the array by index
# Note! Indexing starts with 0

# 1D array
myarray = np.arange(10,0,-1)
print(myarray)
print("First element of the array: {}\n".format(myarray[0]))

# 2D array
myarray = np.array([[1,2,3],[4,5,6]])
print(myarray)
print("2nd row, 3rd column element of the array: {}\n".format(myarray[1,2]))

# Conditional indexing - print values of the array larger than 2
myarray = np.array([[1,2,3],[4,5,6]])
print(myarray)
print("Values >2: {}\n".format(myarray[myarray>2]))

In [None]:
# Slicing numpy array
# create numpy array with shape=(4,5)
myarray = np.array([[1,2,3,4,5], [6,7,8,9,10], [11,12,13,14,15], [16,17,18,19,20]])

# print the values of the array
print('\n',myarray, "   array shape is ", myarray.shape)
# print the values located in the first two rows (indices 0,1) and in the second and third columns (indices 1,2)
print("\nSliced array:\n", myarray[:2,1:3])

Some more examples of numpy array slicing

<img src="../../../coursedata/R0_Intro/nparray_slicing.jpg" style="height: 500px;"/>
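
A few further slicing patterns, sketched on an array with the same values as the (4,5) array used above. The variable names are illustrative.

```python
import numpy as np

a = np.arange(1, 21).reshape(4, 5)   # values 1..20 arranged in 4 rows, 5 columns

last_row = a[-1]             # a negative index counts from the end
second_col = a[:, 1]         # ':' selects all rows
every_other = a[::2, ::2]    # a step of 2 along both axes

print(last_row)
print(second_col)
print(every_other)
```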


In [None]:
# Stack arrays vertically (row wise)
myarray = np.zeros((2,5))
print(np.vstack([myarray, myarray+2]),'\n')

# Stack arrays horizontally (column wise)
myarray = np.zeros((2,5))
print(np.hstack([myarray, myarray+2]))

**<center><font size="4">Viewing and Copying NumPy Arrays</font></center>**

Consider a numpy array `a` of shape (5,1). Assume you create a slice `b` which consists of the first two elements of `a` via `b=a[0:2]`. It is then important to be aware that the variable `b` is merely a view of (a reference to) the first two entries of `a`. Thus, when you modify the slice by `b[0] = 10`, you will simultaneously modify the first entry of `a`. If you want the slice to become a new object, you need to copy the slice using the function `copy()`. 

<div class=" alert alert-danger"> 
    
<center><font size="5"><b>Caution!</b></font></center>
    

<p><center><font size="4">Modification of an array slice will modify the original array</font></center></p>
    
</div>

In [None]:
# Slice view, creates view of the array and any modification of it will update that array.

# create the array 
myarray = np.arange(10)
# print values of the original array
print("Original array: ", myarray)

# assign the slice (view of the array) to a new variable 'myslice'
myslice = myarray[5:10]
# print values of variable 'myslice'
print("\nSlice of the array: ", myslice)

# modify variable 'myslice'
myslice[:] = 0

# print values of the original array and modified variable 'myslice'
print("\nModified slice of the array: ", myslice)
print("\nOriginal array: ", myarray)

In [None]:
# Copying array, creates a different object, original array is not modified.

# create the array 
myarray = np.arange(10)
# print values of the original array
print("Original array: ", myarray)

# assign the slice (copy of the array) to a new variable 'myslice'
myslice = np.copy(myarray[5:10])
# print values of variable 'myslice'
print("\nCopy of the array: ", myslice)

# modify variable 'myslice'
myslice[:] = 0

# print values of the original array and modified variable 'myslice'
print("\nModified copy of the array: ", myslice)
print("\nOriginal array: ", myarray)
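
Whether a variable is a view or a copy can also be checked directly instead of by modifying it. The following sketch uses `np.shares_memory` and the `.base` attribute; the variable names `view` and `copy` are illustrative.

```python
import numpy as np

a = np.arange(10)
view = a[5:10]           # basic slicing returns a view
copy = a[5:10].copy()    # .copy() allocates new memory

print(np.shares_memory(a, view))   # the view uses the memory of 'a'
print(np.shares_memory(a, copy))   # the copy does not
print(view.base is a)              # a view keeps a reference to its base array
```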

**You can find further reading about view and copy of NumPy Arrays here:**

https://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html

**<center><font size="4">Operations on NumPy Arrays</font></center>**

In [None]:
# create two numpy arrays

x = np.arange(10)
y = np.arange(20,30)
x, y

In [None]:
# elementwise addition and subtraction

print(x + y)
print(x - y)

In [None]:
# elementwise multiplication and division

print(x * y)
print(x / y)

In [None]:
# elementwise power

print(x**2)

In [None]:
# create numpy array
x = np.arange(10,0,-1)

# useful numpy array functions:
# sum of elements
x_sum = x.sum()

# maximum and minimum values
x_max = x.max()
x_min = x.min()

# indices of maximum and minimum values
x_indmax = x.argmax()
x_indmin = x.argmin()

print(x)
print("\nSum of the array: ", x_sum) 
print("\nMaximum and minimum values: {}, {} \nIndices of maximum and minimum values: {}, {}".format(
      x_max, x_min, x_indmax, x_indmin))

**<center><font size="4">Dot Products of Numpy Arrays</font></center>**

The [dot product](https://en.wikipedia.org/wiki/Dot_product) between two vectors, i.e. one-dimensional numpy arrays, of the same length is defined as
\begin{equation}
\big(\mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{m}\big)  \cdot \begin{pmatrix} \mathbf{y}_{1} \\ \mathbf{y}_{2} \\ \vdots \\ \mathbf{y}_{m} \end{pmatrix} = \mathbf{x}_{1}\mathbf{y}_{1}+\mathbf{x}_{2}\mathbf{y}_{2}+\ldots+\mathbf{x}_{m}\mathbf{y}_{m}
\end{equation} 
Geometrically, it is the product of the Euclidean norms (lengths) of the two vectors and the cosine of the angle between them. 

The dot product is also defined for numpy arrays with more than one dimension (see [numpy documentation](https://numpy.org/doc/stable/reference/generated/numpy.dot.html?highlight=dot#numpy.dot) for more info). 


In [None]:
# create two numpy arrays
x = np.arange(3)
y = np.arange(3,6)

# display the values of two arrays
print(x,y)
# dot product 0*3+1*4+2*5
x.dot(y) 
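
The geometric interpretation of the dot product can be checked numerically. The sketch below recovers the angle between two vectors from their dot product and their lengths; the two vectors are chosen only for illustration.

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

dot = x.dot(y)                                    # 1*1 + 0*1
norms = np.linalg.norm(x) * np.linalg.norm(y)     # product of the vector lengths
angle_deg = np.degrees(np.arccos(dot / norms))    # angle between x and y in degrees

print(dot, angle_deg)
```

The vector `y` lies on the diagonal of the plane, so the computed angle to the horizontal vector `x` is 45 degrees up to rounding.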

**<center><font size="4">Broadcasting</font></center>**

Sometimes we need to add the same constant value to all entries of a numpy array. Consider a numpy array `a` of arbitrary size and a numpy array `b` containing a single number. We would like to be able to write `a+b` to get a numpy array whose entries are given by adding the value in `b` to all entries in `a`. The concept of "broadcasting" for numpy arrays makes this possible! 

Find more information here:

https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

https://numpy.org/devdocs/user/theory.broadcasting.html

In [None]:
# It is possible to do operations with different size arrays - broadcasting
# create two numpy arrays
x = np.array([[1,2,3], [4,5,6], [7,8,9], [10,11,12]])
y = np.ones((1,3))

# display the values of two arrays
print("x = ", x)
print("\n", "y = ", y)

# print the result of arrays addition
print("\n\n x+y = ", x+y)
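
Broadcasting also covers scalars, rows and columns, and it fails when the shapes cannot be aligned. The following is a minimal sketch with made-up values.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])           # shape (2,3)

plus_scalar = a + 10                           # the scalar is added to every entry
plus_row = a + np.array([100, 200, 300])       # shape (3,)  : added to each row
plus_col = a + np.array([[10], [20]])          # shape (2,1) : added to each column

print(plus_scalar, plus_row, plus_col, sep="\n\n")

# shapes that cannot be aligned raise a ValueError
try:
    a + np.array([1, 2])                       # shape (2,) does not match 3 columns
    compatible = True
except ValueError:
    compatible = False

print("\nshapes (2,3) and (2,) compatible:", compatible)
```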

<a id='Q6'></a>
<div class=" alert alert-info"> 
    
# <center>Representing Matrices using numpy Arrays</center>
    
</div>

In many applications it is natural to represent data as a matrix, which corresponds to the special case of a two-dimensional numpy array. For example, a grayscale image can be represented by a matrix whose entries are the grayscale values of the individual pixels. 

https://en.wikipedia.org/wiki/Matrix_(mathematics)

We will discuss how to represent our data as a matrix for further analysis in the next round. 




Now we will use numpy arrays to create matrix $\mathbf{X}$ with $m$ rows and $n$ columns 
\
\
\begin{equation}
\mathbf{X}  = \begin{pmatrix} X_{1,1} & X_{1,2} & \ldots & X_{1,n} \\ 
X_{2,1} & X_{2,2}& \ldots & X_{2,n} \\ 
\vdots & \vdots & \vdots & \vdots \\ 
X_{m,1} & X_{m,2} & \ldots & X_{m,n} \end{pmatrix}\in \mathbb{R}^{m \times n}
\end{equation} 
and matrix $\mathbf{Y}$ with $n$ rows and $m$ columns  
\
\
\begin{equation}
\mathbf{Y}  = \begin{pmatrix} Y_{1,1} & Y_{1,2} & \ldots & Y_{1,m} \\ 
Y_{2,1} & Y_{2,2}& \ldots & Y_{2,m} \\ 
\vdots & \vdots & \vdots & \vdots \\ 
Y_{n,1} & Y_{n,2} & \ldots & Y_{n,m} \end{pmatrix}\in \mathbb{R}^{n \times m}
\end{equation}  


and perform matrix multiplication to compute the product $\mathbf{X}\mathbf{Y}$. 

[Matrix multiplication](https://en.wikipedia.org/wiki/Matrix_multiplication) is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the second matrix. The result matrix, known as the matrix product, has the number of rows of the first and the number of columns of the second matrix.

In Python, matrix multiplication can be performed using NumPy with the `@` operator, which is equivalent to the function `numpy.matmul()`, or, for two-dimensional arrays, with the `numpy.dot()` function.

In [None]:
# create an array of length m*n
m = 4
n = 3
array = np.arange(m*n)

# create matrix X represented as a numpy array of shape (m,n)
X = array.reshape(m,n)
dimension=np.shape(X)      # determine dimensions of matrix X
rows = dimension[0]        # first element of "dimension" is the number of rows 
cols = dimension[1]        # second element of "dimension" is the number of cols
print("the matrix X has", rows, "rows and", cols, "columns \n") 

# create matrix Y represented as a numpy array of shape (n,m)
Y = array.reshape(n,m)
dimension=np.shape(Y)      # determine dimensions of matrix Y
rows = dimension[0]        # first element of "dimension" is the number of rows 
cols = dimension[1]        # second element of "dimension" is the number of cols
print("the matrix Y has", rows, "rows and", cols, "columns \n") 

# matrix multiplication of X and Y
XY = X @ Y
# print the result of matrix multiplication 
print("the product XY=X@Y is XY = \n", XY) 
# print the shape of the XY matrix
print("\n the matrix XY has", XY.shape[0], "rows and", XY.shape[1], "columns \n") 

Note that the order of matrix multiplication is important and that `A*B` is element-wise multiplication, not matrix multiplication.

In [None]:
# For matrix multiplication A.dot(B) or A@B can be used
print("\nMatrix multiplication X@Y:\n\n", X @ Y)

# Order is important in matrix multiplication - A@B != B@A
print("\nMatrix multiplication Y@X:\n\n", Y @ X)

# Square of the matrix element-wise
Z = np.arange(9).reshape(3,3)
print("\nMatrix Z:\n\n", Z)
print("\nSquare - element-wise Z*Z:\n\n", Z**2)

# Square of the matrix by matrix multiplication
print("\nSquare - matrix multiplication Z@Z:\n\n", Z @ Z)
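
The shape rule for matrix multiplication can be verified in code. This sketch, which rebuilds `X` and `Y` with the same shapes as above, checks that the three ways of multiplying agree for two-dimensional arrays and that mismatched inner dimensions raise an error.

```python
import numpy as np

X = np.arange(12).reshape(4, 3)   # shape (4,3)
Y = np.arange(12).reshape(3, 4)   # shape (3,4)

# for two-dimensional arrays the three forms give the same result
same = np.array_equal(X @ Y, np.matmul(X, Y)) and np.array_equal(X @ Y, np.dot(X, Y))
print(same, (X @ Y).shape)

# the inner dimensions must agree: (4,3) @ (4,3) is not defined
try:
    X @ X
    defined = True
except ValueError:
    defined = False

print("X @ X defined:", defined)
```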

<a id='Q7'></a>
<div class=" alert alert-info"> 
    
# <center>Plotting with Matplotlib</center>
    
</div>

Matplotlib is a library that provides plotting functionality for Python. Good introductory tutorials for Matplotlib can be found at https://matplotlib.org/tutorials/index.html.

A useful command for creating a plot in Python is 

`fig, axes = plt.subplots()`

`plt.subplots()` returns a figure and axes (a single Axes object or an array of Axes objects).
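
A short sketch of the two return forms of `plt.subplots()`. The `matplotlib.use("Agg")` line is an assumption made so that the snippet also runs without a display; inside a Jupyter notebook it can be dropped. The grid size and title are illustrative.

```python
import matplotlib
matplotlib.use("Agg")     # assumption: non-interactive backend for running outside a notebook
import matplotlib.pyplot as plt

# without arguments, a single Axes object is returned
fig1, ax = plt.subplots()

# a 2x3 grid returns a numpy array of Axes objects with shape (2,3)
fig2, axes = plt.subplots(2, 3)
print(axes.shape)

# individual subplots are addressed by row and column index
axes[0, 2].set_title("row 0, column 2")

# release the figures again
plt.close("all")
```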

In [None]:
# Plotting line and scatter plot

# the library "pyplot" provides functions for plotting data 
import matplotlib.pyplot as plt

np.random.seed(42)

# create numpy arrays
x1 = np.linspace(10,100,50)
y1 = x1**2

# generate 100 realizations of a random variable uniformly distributed on [0,1)
x2 = np.random.rand(100,)
y2 = np.random.rand(100,)

# create figure and axes objects
fig, axes = plt.subplots(1,2)
# plot a line in 1st subplot
axes[0].plot(x1,y1,c='r')
# plot scatter in 2nd subplot
axes[1].scatter(x2,y2)

# set axes labels for 1st subplot 
axes[0].set_xlabel("x1")
axes[0].set_ylabel("y1")
# set axes labels for 2nd subplot 
axes[1].set_xlabel("x2")
axes[1].set_ylabel("y2")
# set titles
axes[0].set_title('plot 1')
axes[1].set_title('plot 2')

# adjust subplots so the labels of different axes are not overlapping 
fig.tight_layout()
# display plot
plt.show()

In [None]:
# Plotting 3D scatter plot

# the library "pyplot" provides functions for plotting data 
import matplotlib.pyplot as plt
# the library "mplot3d" provides functions for plotting 3D data 
from mpl_toolkits.mplot3d import Axes3D

np.random.seed(42)

# generate 100 realizations of a random variable uniformly distributed on [0,1)
x = np.random.rand(100,)
y = np.random.rand(100,)
z = np.random.rand(100,)

# create figure and axes objects
fig = plt.figure()
# add a new Axes3D axes to figure:
axes = fig.add_subplot(111, projection='3d')
# plot 3D scatter
axes.scatter(x,y,z)

# set axes labels 
axes.set_xlabel("x")
axes.set_ylabel("y")
axes.set_zlabel("z")

# set title
axes.set_title('3D scatter plot',fontweight='bold')

# display the plot
plt.show()

In [None]:
# Plotting 2D plot with meshgrid 

# create numpy arrays
x = np.arange(-5, 5, 1)
y = np.arange(-5, 5, 1)

# create the grid
xx, yy = np.meshgrid(x, y)

# plot the grid 
plt.plot(xx,yy,marker='.', color='k', linestyle='none')

# set axes labels 
plt.xlabel("x")
plt.ylabel("y")
# set title
plt.title('xy grid', fontweight='bold')

# display the plot
plt.show()

In [None]:
# Plotting 3D plot with meshgrid 

# create numpy arrays
x = np.linspace(0,1,1000)
y = np.linspace(0,1,1000)

# create grid of numbers
X, Y = np.meshgrid(x, y)
Z = 2*X**2 + 4*Y**2

# print shapes of X,Y,Z numpy arrays
print("X.shape={}, Y.shape={}, Z.shape={}".format(X.shape, Y.shape, Z.shape))

# create figure and axes objects
fig = plt.figure()
# add a new Axes3D axes to figure:
axes = fig.add_subplot(111, projection='3d')
# plot 3D surface 
axes.plot_surface(X, Y, Z, cmap='jet')

# set axes labels 
axes.set_xlabel("x")
axes.set_ylabel("y")
axes.set_zlabel("z")

# set title
axes.set_title('Surface plot', fontweight='bold')

# display the plot
plt.show()

In [None]:
# Plotting with Seaborn

# import "seaborn" library for plotting
import seaborn as sns

np.random.seed(42)

# generate 100 realizations of a Gaussian random variable 
x = np.random.randn(100)

# create figure and axes objects
fig, axes = plt.subplots(1,3,figsize=(10,4))

# set the labels for the plots
label1 = "histogram, plot 1"
label2 = "density estimation, plot 2"
label3 = "density estimation, plot 3"

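# note: distplot is deprecated since seaborn 0.11 (histplot and kdeplot are the suggested replacements)
# plot histogram obtained from the realizations stored in x 
sns.distplot(x, ax=axes[0], kde=False, label=label1)
# plot density estimation obtained from the realizations stored in x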
sns.distplot(x, ax=axes[1],label=label2, rug=True, hist=False,  color="r")
# plot histogram and density estimation 
sns.distplot(x, ax=axes[2], rug=True, rug_kws={"color": "g"},
                  kde_kws={"color": "k", "lw": 3, "label": label3},
                  hist_kws={"histtype": "step", "linewidth": 3,
                            "alpha": 1, "color": "g"})
# adjust legends 
axes[0].legend( prop=dict(size=12),loc='upper center', bbox_to_anchor=(4.1, 1.))
axes[1].legend( prop=dict(size=12),loc='upper center', bbox_to_anchor=(3, 0.85))
axes[2].legend( prop=dict(size=12),loc='upper center', bbox_to_anchor=(1.8, 0.7))
# display the plot
plt.show()

<a id='Q8'></a>
<div class=" alert alert-info"> 
    
# <center>User-Defined Functions</center>
    
</div>

Like in other programming languages, users can define their own functions in Python. Below are three examples that present the basic syntax of function definitions.

The code snippet below shows how to define a function `multiply` which takes two arguments `x` and `y`. This function computes the product of the arguments and returns it. 

In [None]:
# define a function
def multiply(x,y):
    '''   
    this function takes input x and y
    and returns multiplication of x and y
   
    '''
    return x*y

# apply the function 
y = multiply(2,3)

# print the result 
print(y)
# print the data type of the result
print(type(y))

The function `index_value()` is an example of a function that contains a for loop and does **NOT** return any output.

In [None]:
import numpy as np

# define a function
def index_value(x):
    '''
    this function takes a number as input, creates an np.array and 
    prints out value and index of each array element
    
    Note! In this example function does not return any output, 
    but only prints out the index and value
    '''
    x_array = np.arange(x)
    for i in range(x):
        print("index={} and value={}".format(i, x_array[i]))

# apply the function 
y = index_value(10)

# print the data type of the result
# you can see that variable 'y' is a NoneType, because the function does not return any output
print(type(y))

In [None]:
# Apply function to iterable with map(function, iterable)

# define a function
def square(x):
    return x**2

# create a range object (an iterable sequence 0,1,...,9)
x = range(10)
# map the function to all elements of 'x'; map() returns an iterator
y = map(square,x)

# display the values of 'x' and 'y' as lists
list(x) , list(y)
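
One detail of `map()` is easy to miss: it returns a lazy iterator that can be consumed only once. The sketch below illustrates this and shows a list comprehension as a reusable alternative; the variable names are illustrative.

```python
def square(x):
    return x**2

squares = map(square, range(5))     # lazy iterator, nothing is computed yet

first_pass = list(squares)          # consumes the iterator
second_pass = list(squares)         # the iterator is now exhausted

# a list comprehension gives the same values and can be reused
comprehension = [square(v) for v in range(5)]

print(first_pass, second_pass, comprehension)
```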

<a id='Q9'></a>
<div class=" alert alert-info"> 
    
# <center>Measuring Execution Time</center>
    
</div>

Sometimes it is useful to know how much time your code takes to execute. This is especially useful in applications involving massive datasets ("Big Data"). The execution time of your ML methods often translates directly into monetary costs. Indeed, nowadays you can rent computational infrastructure at an hourly rate ([click here](https://aws.amazon.com/pricing/)). Thus, the faster your ML method runs, the less you have to pay! 

Below we will go through a simple example on how to measure the execution time of a code block. 
Note the difference in execution time between creating a list with a for loop and using a numpy array operation.    

In [None]:
import time # import standard library time

# let's measure the time it takes to create a list with 1000000 values using a for loop
start_time = time.time() # save starting time to variable "start_time". Time is saved in seconds.

x = [] # initialize x list
for i in range(1000000): # loop 1000000 times
    x.append(i+1)     # Add one element to the list  
                      # and increase the value of the element by one with each iteration
end_time = (time.time() - start_time)*1000 # Compute the elapsed time, multiply by 1000 to get time in milliseconds.

print("--- %s milliseconds ---" % (end_time)) # print the variable "end_time"

In [None]:
import numpy as np
import time # import standard library time

# let's measure the time it takes to create the same values with a numpy array
start_time = time.time() # save starting time to variable "start_time". Time is saved in seconds.

x = np.arange(1000000) # Initialize np.array
x +=1                  # Increase the value of all elements by one

end_time = (time.time() - start_time)*1000 # Compute the elapsed time, multiply by 1000 to get time in milliseconds.

print("--- %s milliseconds ---" % (end_time)) # print the variable "end_time"
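
A single measurement with `time.time()` can fluctuate from run to run. As an alternative sketch, the standard-library module `timeit` repeats a piece of code several times and reports the total time. The helper functions `with_loop` and `with_numpy`, the reduced size `n`, and the repetition count are assumptions made for this illustration.

```python
import timeit
import numpy as np

n = 100_000    # assumption: smaller than the 1000000 used above, to keep the sketch quick
runs = 10      # assumption: number of repetitions per measurement

def with_loop():
    x = []
    for i in range(n):
        x.append(i + 1)
    return x

def with_numpy():
    return np.arange(n) + 1

# timeit.timeit calls the function 'number' times and returns the total time in seconds
t_loop = timeit.timeit(with_loop, number=runs)
t_numpy = timeit.timeit(with_numpy, number=runs)

print("loop : %.3f ms per run" % (t_loop / runs * 1000))
print("numpy: %.3f ms per run" % (t_numpy / runs * 1000))
```

The measured times depend on the machine, so no fixed numbers are given here.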

<a id='Q10'></a>
<div class=" alert alert-info"> 
    
# <center>Dictionaries and Dataframes</center>
    
</div>

**<center><font size="4">Dictionary</font></center>**

The raw data used in ML methods are typically not directly available as numpy arrays. While data is nothing but a (huge) pile of bits, some applications involve data that can be conveniently represented using a **Python dictionary**. 
[Dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries), also known as **associative arrays** are data structures that consist of a collection of key-value pairs. As we will see, Python dictionaries provide a convenient interface to data stored in files or online databases. 

In [None]:
# Let's define a simple dictionary consisting of three key-value pairs.
# Keys   - names of countries from Northern Europe 
# Values - indicate the capital city for each country

# initialize dictionary C
C = {'Finland':'Helsinki',
    'Sweden':'Stockholm',
    'Norway':'Oslo'} 

print('The type of the variable C is:', type(C))  # print the type of the variable "C"
print(C) # print out the dictionary contents

In [None]:
# Accessing Dictionary values by key
print("C['Finland']:", C['Finland']) # prints out "Helsinki"

In [None]:
# If you refer to a key that is not in the dictionary, i.e. print(C['Denmark']), 
# Python raises an exception "KeyError: 'Denmark'"
print(C['Denmark'])

In [None]:
# Adding an entry to an existing dictionary is simply a matter of assigning a new key and value:
C['Denmark'] = 'Copenhagen'   # adds a dictionary entry with key "Denmark" and value "Copenhagen"

print("C['Denmark']:", C['Denmark']) # Print out dictionary value with key "Denmark"
print(C) # print dictionary "C"

In [None]:
# Remember, you can’t treat a dictionary like a list or numpy array, e.g. indices don't work.

# Dictionaries can contain numerical and string values as keys and values:

# Create a dictionary where values are numbers
population = {'Finland':5500000,
                'Sweden':10000000,
                'Norway':5250000} 
print("The population of Finland is ", population['Finland'], " persons") # print the value of dictionary with key "Finland"

# Create a dictionary where keys are numbers
numbers = {3:'prime',
            6:'not prime',
            7:'prime'} # Create a dictionary "Numbers"
print("The number 3 is", numbers[3]) # print a value of dictionary Numbers with key "3"
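
To avoid the `KeyError` shown earlier, a key can be tested or looked up with a default value. The sketch below uses a dictionary with the same contents as `C`; the name `capitals` and the default string `'unknown'` are illustrative.

```python
capitals = {'Finland': 'Helsinki', 'Sweden': 'Stockholm', 'Norway': 'Oslo'}

# test whether a key exists before accessing it
has_denmark = 'Denmark' in capitals

# .get() returns a default value instead of raising a KeyError
denmark = capitals.get('Denmark', 'unknown')
print(has_denmark, denmark)

# iterate over all key-value pairs
for country, city in capitals.items():
    print(country, "->", city)
```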

**<center><font size="4">Data Frames</font></center>**

The library `Pandas` provides the class (object type) `DataFrame`. A `DataFrame` is a two-dimensional (with rows and columns) tabular structure. Dataframes are convenient for storing and manipulating heterogeneous data, such as mixtures of numeric and text data. 

In [None]:
# import 'pandas' - library providing high-performance, easy-to-use data structures and data analysis tools 
import pandas as pd

# create list
mylist = ['dogs','cats','mice','rats']

# create dataframe from list
df = pd.DataFrame(mylist)
df

In [None]:
# create dictionary
mydict = {'animal':['cat', 'dog','mouse','rat'],
         'name':['Fluffy','Chewy','Squeaky','Spotty'],
         'age, years': [3,5,0.5,1]}

# create dataframe from dictionary
df = pd.DataFrame(mydict, index=['id1','id2','id3','id4'])
df

In [None]:
# Accessing DataFrame elements

# access row by name with .loc 
print(df.loc['id1'])

# access row by index with .iloc 
print('\n', df.iloc[0])

In [None]:
# access column by name with .loc 
print(df.loc[:,'animal'])

# access column by name without .loc 
print('\n', df['animal'])

# access column by index with .iloc 
print('\n', df.iloc[:,0])

In [None]:
# access specific row and columns by name with .loc
print(df.loc['id1',['animal','name']])

# access specific row and columns by index with .iloc
print('\n', df.iloc[0,[0,1]])
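
One difference between `.loc` and `.iloc` shows up when slicing: label slices include the end label, while position slices exclude the end position. The sketch below rebuilds a small dataframe similar to the one above to illustrate this.

```python
import pandas as pd

df = pd.DataFrame({'animal': ['cat', 'dog', 'mouse', 'rat'],
                   'age, years': [3, 5, 0.5, 1]},
                  index=['id1', 'id2', 'id3', 'id4'])

by_label = df.loc['id1':'id2']    # label slice: 'id2' IS included
by_position = df.iloc[0:1]        # position slice: position 1 is NOT included

print(by_label)
print('\n', by_position)
```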

<a id='Q11'></a>
<div class=" alert alert-info"> 
    
# <center>Loading data in Python</center>
    
</div>

In [None]:
# Loading from .csv file by using pandas DataFrame structure

import pandas as pd

# load the .csv file with pandas 
df = pd.read_csv('../../../coursedata/R0_Intro/Data.csv')

# check the shape of the dataframe
print("Shape of the dataframe: ",df.shape)
print("Number of dataframe rows: ",df.shape[0])
print("Number of dataframe columns: ",df.shape[1])

# print first 5 rows 
df.head()

In [None]:
# Convert dataframe to numpy array

# DataFrame.values returns a NumPy representation of the DataFrame.
X = df.values
X

With the `pd.read_*` functions it is also possible to read Excel, JSON, HTML, SQL and many other types of files:

https://pandas.pydata.org/pandas-docs/stable/reference/io.html
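
If the course data file is not at hand, the loading steps can be tried on text held in memory. In this sketch the CSV content is made up, and `io.StringIO` stands in for a file on disk; `DataFrame.to_numpy()` is used as an equivalent of `df.values`.

```python
import io
import pandas as pd

# assumption: a tiny made-up CSV text standing in for a file on disk
csv_text = "x,y\n1,2.5\n3,4.5\n"

# read_csv accepts file-like objects as well as file paths
df = pd.read_csv(io.StringIO(csv_text))

# convert the dataframe to a numpy array
X = df.to_numpy()

print(df.shape)
print(X)
```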

**<center><font size="4">Load Data from Helsinki city map service </font></center>**

Here is an example of how to load an image from the Helsinki city map service https://kartta.hel.fi/. More information can be found at https://www.hel.fi/helsinki/en/maps-and-transport/city-maps-and-gis/geographic-information-data/open-geographic-data

The code snippet below demonstrates how to read in information from public geoinformation systems (GIS) using the Python package OWSLib (see https://geopython.github.io/OWSLib). After downloading a patch of the map covering Helsinki city area, we save this patch in the file "HelsinkiPatch.jpg" in the course data folder. 

In [None]:
# the library owslib provides functions for accessing geospatial (location) information 
# and services (like kartta.hel.fi)
from owslib.wms import WebMapService # import WebMapService from library owslib.wms
# the library io provides functions for handling data in the form of byte streams ("raw" data)
import io                            # import library io
# the library numpy provides functions for matrices and vectors 
import numpy as np                   # import library numpy as np
# the library matplotlib.pyplot provides functions for plotting data 
import matplotlib.pyplot as plt      # import library matplotlib.pyplot as plt
# The Python Imaging Library (PIL) provides helpful functions for image processing 
from PIL import Image

# get the helsinki map
wms = WebMapService('https://kartta.hel.fi/ws/geoserver/avoindata/wms', version='1.1.1')

# select the coordinate system to be used 
# https://en.wikipedia.org/wiki/EPSG_Geodetic_Parameter_Dataset
cs = 'EPSG:4326'  

# specify region of Helsinki city area map
xmin = 24.92      # x-coordinate of bottom-left corner 
ymin = 60.15      # y-coordinate of bottom-left corner 
xmax = 24.99      # x-coordinate of upper-right corner
ymax = 60.20      # y-coordinate of upper-right corner

# divide Helsinki area into 50 by 50 patches
nr_patches_x = 50 
nr_patches_y = 50 

# determine dimensions of one single patch
patch_x = (xmax-xmin)/nr_patches_x  
patch_y = (ymax-ymin)/nr_patches_y

# choose one particular patch and determine the corresponding bounding box 
nr_x = 11
nr_y = 10
patch_box = (xmin+nr_x*patch_x,ymin+nr_y*patch_y,xmin+(nr_x+1)*patch_x,ymin+(nr_y+1)*patch_y) 

# choose a layer from the map service (for a list of available layers visit https://kartta.hel.fi)
ortholayer = 'avoindata:Ortoilmakuva_2019_5cm' 
# set the resolution in number of pixels used in each direction 
res = (1000,1000) 
# get the image based on patch parameters (see above linked documentation for more details)
img = wms.getmap(layers=[ortholayer],srs=cs,bbox=patch_box,size=res,format='image/jpeg',transparent=True) 

# convert the raw image data into an image object 
pic = Image.open(io.BytesIO(img.read())) 

# convert image object into a numpy array 
# each entry of this numpy array represents a particular pixel of the image
X = np.array(pic, dtype='uint8') 
# initialize a plot figure of size 10 x 10 inches
fig = plt.figure(figsize=(10,10))  
# add the image to the plot
plt.imshow(X) 
# display the plot
plt.show()
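The download code above keeps the patch in memory as the image object `pic`. A minimal sketch of writing such an image object to a JPG file with the `save` method of `PIL` follows. To run without network access, it uses a random image as a stand-in for the downloaded patch, and the output location `out_path` in the temporary directory is a hypothetical choice, not the course data folder.

```python
import os
import tempfile

import numpy as np
from PIL import Image

# stand-in for the downloaded patch: a random 100 x 100 RGB image
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(100, 100, 3), dtype=np.uint8)
pic = Image.fromarray(patch)

# hypothetical output location in the system's temporary directory
out_path = os.path.join(tempfile.gettempdir(), "HelsinkiPatch_demo.jpg")

# write the image object to disk; the format is inferred from the file extension
pic.save(out_path)

# read the file back in to check what has been stored
reloaded = Image.open(out_path)
print("File format: ", reloaded.format)
print("Size in pixels: ", reloaded.size)
print("Pixel format: ", reloaded.mode)
```

Since JPG is a lossy format, the pixel values read back from the file generally differ slightly from those of the original array.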

<a id='Q12'></a>
<div class=" alert alert-info"> 
    
# <center>Working with Images in Numpy</center>
    
</div>

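NumPy arrays can represent images: a grayscale image corresponds to a two-dimensional array of shape (height, width), and an RGB image to a three-dimensional array of shape (height, width, 3) whose last axis holds the red, green and blue channels. Conversely, any image read from a file can be converted to a NumPy array.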

In [None]:
# Numpy array represented as RGB image
# define size of the image in pixels
width = 40   # number of columns in numpy array X
height = 30  # number of rows in numpy array X

# create 30x40x3 numpy array with integer values between 0 and 255
X = np.random.randint(0, 256, size=(height, width, 3))

# create a plot of size 4 by 4 inches 
fig = plt.figure(figsize=(4,4))

# remove all axes from the plot
plt.axis('off')
# display numpy array X as RGB image
plt.imshow(X)
plt.show()
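A numpy array can also be turned into a `PIL` image object and back. The sketch below assumes 8-bit pixel values (data type `uint8`), which is why the random integers are cast explicitly before the conversion. It also shows that the array shape lists the height first, whereas the `size` attribute of the image object lists the width first.

```python
import numpy as np
from PIL import Image

height, width = 30, 40

# random RGB image, cast to 8-bit unsigned integers
X_rgb = np.random.randint(0, 256, size=(height, width, 3)).astype(np.uint8)

# a two-dimensional array is interpreted as a grayscale image
X_gray = np.zeros((height, width), dtype=np.uint8)

# convert the arrays into image objects
img_rgb = Image.fromarray(X_rgb)
img_gray = Image.fromarray(X_gray)

# convert the RGB image object back into a numpy array
X_back = np.asarray(img_rgb)

print("array shape (height, width, channels): ", X_rgb.shape)
print("image size (width, height): ", img_rgb.size)
print("pixel formats: ", img_rgb.mode, img_gray.mode)
```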

<img src="../../../coursedata/R0_Intro/ImageNumpy.jpg" style="height: 600px;"/>

The code snippet below demonstrates how to read in an RGB image from a JPG file and transform it to a grayscale image. This can be done conveniently using the Python library `PIL` (see https://pillow.readthedocs.io/en/stable/reference/Image.html).


In [None]:
# The Python Imaging Library (PIL) provides helpful functions for image processing 
from PIL import Image # import Image from library PIL

# construct a variable "filename" which contains the relative path of the jpg file 
filename = "../../../coursedata/R0_Intro/HelsinkiPatch.jpg"

# read in the jpg file 
imagedata = Image.open(filename)

# Check some properties of the image 
print("File format: ",imagedata.format) 
print("Size in pixels: ",imagedata.size)  
print("Pixel format: ",imagedata.mode)  

# Read image as numpy array
X=np.asarray(Image.open(filename).convert("RGB"))
# As you can see, this array has one value per pixel for each of the three RGB channels
print("Numpy array shape: ",X.shape)
print("Numpy array data type:",X.dtype)

In [None]:
# create a plot of size 10 by 10 inches 
fig = plt.figure(figsize=(10,10))
# display image stored as RGB values in the numpy array X 
plt.imshow(X)
plt.show()

In [None]:
# Convert RGB image to grayscale (mode "L")
# transformation is made with formula -  L = R * 299/1000 + G * 587/1000 + B * 114/1000

X = np.asarray(Image.open(filename).convert("L"))

# create a plot of size 10 by 10 inches 
fig = plt.figure(figsize=(10,10))

# display image stored as grayscale values in the numpy array X 
plt.imshow(X, cmap=plt.get_cmap('gray'))
plt.show()
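The weighted sum in the formula above can also be evaluated directly with numpy. The sketch below is not how `PIL` implements the conversion internally; it only applies the stated weights to a tiny hand-made image `X_rgb` (four pixels: white, red, green and blue) using integer arithmetic with rounding. The rounding used by `PIL` itself may differ slightly for some pixel values.

```python
import numpy as np

# tiny 2 x 2 RGB test image: white, red, green and blue pixels
X_rgb = np.array([[[255, 255, 255], [255, 0, 0]],
                  [[0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

# weights of the R, G and B channels, in units of 1/1000
weights = np.array([299, 587, 114], dtype=np.uint32)

# weighted sum over the last axis (the colour channels);
# adding 500 before the integer division by 1000 rounds to the nearest integer
weighted = X_rgb.astype(np.uint32) @ weights
X_gray = ((weighted + 500) // 1000).astype(np.uint8)

print("shape of grayscale array: ", X_gray.shape)
print(X_gray)
```

Green contributes most to the perceived brightness and blue least, so the green pixel ends up much lighter than the blue one.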