# Intro and prerequisites
---
In today's laboratory session we quickly go through the Python tools needed for the rest of the course and for any other ML project.

We will use Google Colab, an online Jupyter notebook environment, to build our models and to access and analyze the data.

## Prerequisites
*   Laptop with any OS (Windows, Linux, MacOS) and an internet browser
*   Access to the internet (via Unimib or Eduroam Wi-Fi networks)
*   Google account

# Outline of the lesson:
---
*   Quick intro to `Python` (refresher)
*   Intro to `NumPy` for data handling
*   Intro to `matplotlib` for data representation
*   Intro to `SciPy` for fitting the data


# Quick intro to `Python`
---

## Basic functions
A code cell can be used for simple operations such as evaluating mathematical expressions or defining variables. To execute it, just press `Shift+Enter`

In [1]:
2 + 3

5

In [2]:
a = 10
print("a = {}".format(a))

a = 10


For more details on the use of the `format` method, see the Python [documentation](https://docs.python.org/3.4/library/string.html#formatspec).
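The format specification after the colon controls how a value is rendered. A short sketch follows; the variable `pi` and the chosen widths are illustrative only. f-strings, available from Python 3.6, accept the same specification:

```python
a = 10
pi = 3.14159265

s1 = "a = {}".format(a)          # positional placeholder
s2 = f"a = {a}"                  # f-string, same result as s1
s3 = "pi = {:.2f}".format(pi)    # two digits after the decimal point
s4 = "{:>6d}|".format(a)         # integer right-aligned in a field of width 6
print(s1, s2, s3, s4, sep="\n")
```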

It is important to note that cells in the notebook can be executed in a non-sequential order, which makes mistakes possible (e.g. using or changing a variable before the cell that defines it has been run). When in doubt, it is recommended to click on `Runtime -> Run All` from the toolbar.

## Types of variables

In Python (as in programming languages in general) variables can be of different types:
 - integers `int`
 - floats `float`
 - strings `str`
 - booleans `bool`

The type of a variable determines which operations can be performed on it.

In Python there is no need to declare the type of a variable: it is determined automatically each time a value is assigned to that variable.

In [3]:
a = 1
b = 1.2
c = 'a'
d = True
print('a = {} is {}'.format(a,type(a))) # type() returns the type of the variable
print('b = {} is {}'.format(b,type(b)))
print('c = {} is {}'.format(c,type(c)))
print('d = {} is {}'.format(d,type(d)))

a = 1 is <class 'int'>
b = 1.2 is <class 'float'>
c = a is <class 'str'>
d = True is <class 'bool'>
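Because the type belongs to the value rather than to the name, the same name can later be bound to a value of another type, and the built-in constructors `int()`, `float()` and `str()` convert between types. A minimal sketch (the variable names are illustrative only):

```python
# A name can be re-bound to a value of a different type
v = 1
print(type(v))      # <class 'int'>
v = 'one'
print(type(v))      # <class 'str'>

# Explicit conversions between types
n = int('3')        # string -> integer
f = float(n)        # integer -> float
s = str(f)          # float -> string
print(n, f, s)      # 3 3.0 3.0

# Mixing int and float in an expression gives a float
r = 1 + 1.2
print(type(r))      # <class 'float'>
```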


## Vectors (Lists) and Dictionaries

Python allows you to define vector-like variables (lists), which contain a sequence of values, variables or objects.
These can be homogeneous (all elements in the list are of the same type)


In [4]:
A = [1,2,3]
print(A)

[1, 2, 3]


or heterogeneous, i.e. made of elements of different types

In [5]:
B = [1,2.3,'a',A]
print(B)

[1, 2.3, 'a', [1, 2, 3]]


The elements of a list can be accessed by specifying the position of the element in the list (counting from 0)

In [6]:
B[0]

1
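Beyond single positions, lists also support negative indices, slices and in-place modification. A short sketch that rebuilds the list `B` from the cells above so that it runs on its own:

```python
B = [1, 2.3, 'a', [1, 2, 3]]

last = B[-1]        # negative indices count from the end
first_two = B[0:2]  # slice: elements 0 and 1 (the stop index is excluded)
nested = B[3][1]    # index into the nested list

B.append('new')     # lists are mutable: add an element at the end
B[0] = 100          # ... or overwrite an existing one
print(last, first_two, nested, B)
```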

A related container is the *dictionary*, i.e. a collection of key-value pairs in which each element (the key) is associated with another element (the value)

In [7]:
D={'a':1, 'b': 2.0, 1: 3}
print(D)

{'a': 1, 'b': 2.0, 1: 3}


To access an element of a dictionary you can use a syntax similar to that of lists, except that the key is used in place of the position

In [8]:
print(D['a'])
print(D[1])

1
3


To see which keys are available in a dictionary you can use the method `keys()` (the length of a list is instead given by the function `len()`)

In [9]:
print(D.keys())
print(len(B))

dict_keys(['a', 'b', 1])
4
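A few other common dictionary operations, sketched on a copy of the dictionary `D` defined above (the key `'c'` and the lookup of `'z'` are illustrative only):

```python
D = {'a': 1, 'b': 2.0, 1: 3}

D['c'] = 'new value'          # add a new key-value pair
has_b = 'b' in D              # membership test acts on the keys
missing = D.get('z', None)    # get() returns a default instead of raising KeyError

for key, value in D.items():  # iterate over key-value pairs
    print(key, '->', value)
```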


## Functions and Libraries

The advantage of using Python for scientific purposes is that it comes with many libraries of functions written and verified by the community, so there is little risk of having to re-invent the wheel whenever a specific function is required for a certain operation. For example, the [numpy](https://numpy.org) library contains various mathematical functions for common purposes, written with a syntax that allows efficient operations between vectors.

In [10]:
import numpy as np
print(np.sqrt(3))
print(np.sqrt(np.array([1,2,3])))

1.7320508075688772
[1.         1.41421356 1.73205081]


In [11]:
print('{:.20f}'.format(np.sqrt(3)))

1.73205080756887719318


In the first case a single argument, an integer (`int`), was passed to the function `np.sqrt`, which returned a `float`. In the second case a NumPy array was passed as the argument, and another array was returned with the result of the operation on each element of the first array.
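To make the element-wise behaviour concrete, the sketch below compares an explicit Python loop using `math.sqrt` from the standard library with a single call to `np.sqrt` on an array. The names `by_loop` and `by_numpy` are illustrative only:

```python
import math
import numpy as np

values = [1, 2, 3]
by_loop = [math.sqrt(v) for v in values]   # element by element, plain Python
by_numpy = np.sqrt(np.array(values))       # whole array in one call
print(np.allclose(by_loop, by_numpy))      # both approaches agree
```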

Note that the `numpy` *module* is loaded using the Python `import` statement and is given the alias `np` for brevity. After this command, each function contained in the `numpy` module can be called using the syntax `np.function(<argument>)`.
Renaming a module when loading it is optional.
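The different forms of the `import` statement can be compared side by side. The sketch below uses the standard-library module `math` instead of `numpy` only to keep the example free of external dependencies:

```python
import math                    # full module name: math.sqrt(...)
import math as m               # short alias: m.sqrt(...)
from math import sqrt          # import a single name: sqrt(...)

r1 = math.sqrt(3)
r2 = m.sqrt(3)
r3 = sqrt(3)
print(r1 == r2 == r3)          # all three calls use the same function
```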

To define a function yourself, use the keyword `def`

In [12]:
def func(x):
    y = x*x # pay attention to the indentation
    return y

Note the indentation, which is a fundamental rule of the Python language: it delimits the blocks of code that are executed within a function or within a loop (cycle)

In [13]:
z = func(4)
print(z)

16


Note that the variable `x` is passed as an argument to the function in an agnostic way, i.e. Python does not check in advance whether the operations in the function are valid for that argument. This has great advantages: for instance, it allows you to use the same function with completely different arguments, such as a NumPy array:

In [14]:
y = func(np.array([1,2,3]))
print(y)
print(type(y))

[1 4 9]
<class 'numpy.ndarray'>


However, it can also lead to errors when the function is used incorrectly, for instance if a string is passed as an argument

In [15]:
# y = func('a')  #commented out not to throw an error

Luckily, in this case Python raises a rather clear error (a `TypeError`). However, this does not always happen, and there is a risk of introducing a so-called *bug* in the code.
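If an invalid argument is a possibility, the error can be caught with a `try`/`except` block instead of stopping the notebook. A minimal sketch, which redefines `func` so that it runs on its own; setting `y` to `None` on failure is just one possible choice:

```python
def func(x):
    return x * x

try:
    y = func('a')          # multiplying two strings is not defined
except TypeError as err:
    y = None               # fall back to a placeholder value
    print('func failed:', err)
```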

## *For* and *While* Loops
In programming languages it is often useful to execute a certain operation multiple times (loop) using only a few lines of code. For this purpose, *for* and *while* loops are commonly used. The former lets a variable run over a given sequence of values and executes a block of commands for each of them

In [16]:
for i in [0,1,2]:
    print(i)

0
1
2


In [17]:
t = 0
for i in range(4): # range(n) generates the integers from 0 to n-1
    t += i
    print('i = {}, t = {}'.format(i,t))
print(t)

i = 0, t = 0
i = 1, t = 1
i = 2, t = 3
i = 3, t = 6
6
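`range` also accepts a start value and a step, and the built-in `enumerate` yields the position together with each element. A short sketch (the variable names are illustrative only):

```python
evens = list(range(0, 10, 2))       # start, stop (excluded), step
countdown = list(range(3, 0, -1))   # a negative step counts down

for idx, item in enumerate(['a', 'b', 'c']):
    print(idx, item)                # position and element together

print(evens, countdown)
```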


In [18]:
for i in ['a',1,3.3]: print(i)

a
1
3.3


*while* instead repeats a block of commands as long as a certain condition is satisfied, and stops as soon as the condition is no longer true

In [19]:
t = 4
while t>0:
    t = t-1
    print(t)

3
2
1
0
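A *while* loop can also be left early with `break`, which is handy when the stopping condition is easier to test inside the block. A small sketch (the threshold 50 is arbitrary):

```python
t = 0
while True:
    t += 1
    if t * t > 50:   # stop at the first integer whose square exceeds 50
        break
print(t)
```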


## Conditional statement

In [20]:
x = 0.5

if x > 1:
    print("x > 1")
else:
    print("x <= 1")

x <= 1
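When more than two cases must be distinguished, `elif` adds further branches; only the first branch whose condition is true is executed. A sketch, wrapped in a hypothetical helper `compare_to_one` so that several values can be tried:

```python
def compare_to_one(x):
    if x > 1:
        return "x > 1"
    elif x == 1:
        return "x = 1"
    else:
        return "x < 1"

for value in [0.5, 1, 2]:
    print(value, '->', compare_to_one(value))
```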


## Practical example: mean value
In the following example we define a function that calculates the mean value of the elements of a vector

In [21]:
def mean(x):
    m = 0
    for i in x:
        m += i # += increments the variable by a value i
    m /= len(x) # /= divides the variable m by len(x)
    return m

In [22]:
mean([1,2,3,1,2,4,2])

2.142857142857143

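**Exercise**: Write a function that calculates the standard deviation of the elements within a vector

In [None]:
def std(x):
    mu = mean(x) # use a new name: assigning to `mean` here would hide the function mean()
    var = 0
    for i in x:
        var += (i-mu)**2 # accumulate the squared deviations from the mean
    return np.sqrt(var/len(x)) # population standard deviation (divide by N)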
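Hand-written loops like these can be checked against NumPy's built-in `np.mean` and `np.std`. The sketch below is self-contained and uses the hypothetical names `mean_loop` and `std_loop`; it assumes the population definition of the standard deviation (division by N), which is also the default of `np.std`:

```python
import numpy as np

x = [1, 2, 3, 1, 2, 4, 2]

def mean_loop(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)

def std_loop(values):
    mu = mean_loop(values)
    acc = 0
    for v in values:
        acc += (v - mu) ** 2
    return (acc / len(values)) ** 0.5

print(mean_loop(x), np.mean(x))
print(std_loop(x), np.std(x))    # ddof=0 by default: divide by N
print(np.std(x, ddof=1))         # sample estimate: divide by N-1
```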

## Random generation

The standard library contains a module `random` for the generation of random numbers. Among others, the module provides the function `random()`, which returns a float in the interval [0, 1), and `choice()`, which picks a random element from a sequence.

In [23]:
import random

print(random.random())
print(random.choice(["yes", "no", "maybe"]))

0.18320713620831197
maybe
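Each run produces different numbers. When reproducible results are needed, the generator can be initialised with a fixed seed via `random.seed`. A sketch (the seed value 1234 is arbitrary):

```python
import random

random.seed(1234)                 # fix the seed
first = [random.random() for _ in range(3)]

random.seed(1234)                 # same seed -> same sequence
second = [random.random() for _ in range(3)]

print(first == second)
```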


In [24]:
tau = 40
Nevents = 100
expoSample = np.random.exponential(scale = tau, size=Nevents)

print(expoSample)

[2.74744889e+00 1.54728980e+01 1.08840781e+01 1.02678367e+01
 1.67310366e+01 5.55434781e+01 4.62436373e+01 7.31621303e+01
 1.99107059e+01 1.72960568e+01 2.27275043e+01 4.93030378e+01
 8.50750574e+01 2.45331067e+01 3.42333098e+01 1.03058924e+02
 1.66513833e+01 2.32186396e+01 1.82021812e+01 1.04018923e+01
 6.88700708e+01 2.69777242e+01 1.59503593e+01 6.28952662e+01
 1.08587220e+01 1.37303780e+01 1.48701777e+01 1.45205514e+00
 9.06085312e+01 3.07980558e+01 8.37852943e+00 2.50847507e+00
 1.02847355e+02 7.00247830e+00 2.44947368e+00 3.05022680e+00
 7.94039223e+00 9.71826068e+00 1.47532042e+01 8.80073156e+01
 8.00381960e+01 5.05389961e+00 5.25033835e+00 1.70225781e+00
 6.58248578e+00 4.05868718e+01 1.81496968e-01 1.18337672e+00
 3.00870759e+00 2.65733094e+01 1.20624463e+01 5.18369557e+01
 1.21879523e+01 1.46278463e+01 1.81021828e+01 9.09104412e+00
 1.76224574e+02 3.34002402e+00 1.64639267e+01 6.99182259e+01
 2.30971703e+01 5.34555381e+01 1.36942551e+02 2.23667724e+01
 7.80765052e+00 6.632236
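For an exponential distribution with scale `tau`, both the mean and the standard deviation equal `tau`, which gives a quick sanity check on the generated sample. The sketch below uses NumPy's `default_rng` generator with an arbitrary seed and a larger sample than the cell above, so that the estimates are more stable:

```python
import numpy as np

tau = 40
rng = np.random.default_rng(seed=0)   # seeded generator; the seed value is arbitrary
sample = rng.exponential(scale=tau, size=100_000)

print(sample.mean())      # expected to be close to tau
print(sample.std())       # also expected to be close to tau
print(sample.min() >= 0)  # exponential values are never negative
```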

# How to access files on Google Drive
---

It is often useful to store data or to access data that is already available. Google Drive can be used as storage for this purpose.

In [25]:
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)

ModuleNotFoundError: No module named 'google'
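The `google.colab` module is only available inside Colab, so the import fails with `ModuleNotFoundError` (as in the output above) when the notebook runs elsewhere. One possible guard is sketched here; the flag name `IN_COLAB` is hypothetical:

```python
try:
    from google.colab import drive   # only importable inside Colab
    IN_COLAB = True
except ImportError:                  # ModuleNotFoundError is a subclass of ImportError
    IN_COLAB = False

if IN_COLAB:
    drive.mount('/content/drive/', force_remount=True)
else:
    print('Not running in Colab: skipping the Google Drive mount')
```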

In [None]:
%cd /content/drive/My Drive/Didattica/Corsi/AI/Lectures/Lecture1/
!ls

In [None]:
!pwd

## Read and write a file
Read a file from disk, replace each "." with "," and each newline with "; ", then write the result to a new file

In [None]:
data = ""
with open("data/random_numbers.txt", "r") as file:
    data = file.read().replace(".", ",").replace("\n", "; ")

with open("data/random_numbers_mod.txt", "w") as out_file:
    out_file.write(data)

In [None]:
print(data)
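The same read, replace and write steps can be tried without access to Google Drive. The sketch below writes a small made-up input to a temporary directory; the three numbers are hypothetical and merely stand in for the content of `data/random_numbers.txt`:

```python
import os
import tempfile

# Hypothetical file content, standing in for data/random_numbers.txt
text = "0.25\n1.5\n3.75"

with tempfile.TemporaryDirectory() as tmp_dir:
    in_path = os.path.join(tmp_dir, "random_numbers.txt")
    out_path = os.path.join(tmp_dir, "random_numbers_mod.txt")

    with open(in_path, "w") as f:      # create the input file
        f.write(text)

    with open(in_path, "r") as f:      # read it and apply the replacements
        data = f.read().replace(".", ",").replace("\n", "; ")

    with open(out_path, "w") as f:     # write the modified text
        f.write(data)

    with open(out_path, "r") as f:     # read it back to check the result
        check = f.read()

print(check)
```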