<img align="right" width="200" height="200" src="ovalmoney-logo-green.png">
# A very hurried course in Python
#### By Stefano Calderan, Data Scientist @ Oval Money

## NumPy

In [None]:
import numpy as np        # Standard abbreviation for numpy

### Why NumPy?

- It's free
- It is a package that provides **_high-performance_ vector, matrix and higher-dimensional data structures** (such as tensors). *High performance* means that computations are very fast, because the package is implemented in compiled code (C and Fortran) and calculations are vectorized, i.e. they are expressed as operations on whole vectors and matrices
- Everybody uses it
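
To make "vectorized" concrete, here is a minimal sketch that computes the same squares twice: once with an explicit Python loop and once with a single array expression. The array size and the variable names are arbitrary choices made for this illustration.

```python
import numpy as np

data = np.arange(10_000, dtype=np.float64)

# Pure-Python version: the interpreter handles one element at a time
squares_loop = np.array([x * x for x in data])

# Vectorized version: one expression, the loop runs in compiled code
squares_vec = data * data

# Both approaches give the same values
print(np.array_equal(squares_loop, squares_vec))
```

Timing the two versions (for example with `%timeit` in a notebook) usually shows a large gap in favour of the vectorized form, although the exact ratio depends on the machine.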

### NumPy arrays

Arrays are the basic data structure of `numpy`. By default, every array supports the main **mathematical and statistical operations, implemented as methods**.  
Let's see how to build them.  

#### NumPy arrays from sequences
We can build them starting from sequences (lists or tuples; a set must first be converted to a list, because `np.array` does not iterate over it)

In [None]:
# 1-Dimensional arrays

a1 = np.array([1, 2, 3])
print(a1.shape)            # .shape is an attribute of every array

In [None]:
# a1 is neither a column nor a row vector. To transform it into a column or row vector, you
# have to use the .reshape() method

a_column = a1.reshape(3, 1)           # reshape returns a new array; a1 keeps its shape
print('column:\n', a_column)
a_row = a1.reshape(1, 3)
print('row:', a_row)
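
Two details of `.reshape()` are worth a quick sketch: one dimension can be given as `-1` and is then inferred from the array size, and for a simple array like this one the reshaped array shares its data with the original. The value `99` below is just a marker chosen for the illustration.

```python
import numpy as np

a1 = np.array([1, 2, 3])

# -1 asks numpy to infer that dimension from the number of elements
a_column = a1.reshape(-1, 1)
print(a_column.shape)                    # (3, 1)

# Here the reshaped array is a view on the same data
print(np.shares_memory(a1, a_column))    # True

# ...so writing through the view also changes a1
a_column[0, 0] = 99
print(a1)                                # [99  2  3]
```

Call `.copy()` on the result when an independent array is needed.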

In [None]:
# 2-Dimensional arrays (matrices)

M1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(M1)
print(M1.shape)
print(M1.size)

In [None]:
# We can specify the type of data inside the array

M2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)   # use the dtype argument to specify the type
print(M2, M2.dtype)                                             # dtype is also an attribute

# If we try to assign a value that cannot be converted to the array's dtype, we get an error
try:
    a1[0] = 'pippo'
except ValueError as err:
    print('ValueError:', err)
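
Not every type mismatch raises an error. As a small sketch (the array names are made up for this example), assigning a float to an integer array silently truncates the value, and `.astype()` is the way to get an array of a different type:

```python
import numpy as np

ints = np.array([1, 2, 3])      # integer dtype inferred from the data
ints[0] = 2.7                   # no error: the value is truncated to 2
print(ints, ints.dtype)

floats = ints.astype(float)     # astype returns a converted copy
floats[0] = 2.7                 # now the fractional part is kept
print(floats, floats.dtype)
```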

#### NumPy arrays from generating functions
Here we see some useful functions provided by `numpy`

In [None]:
# range array
a = np.arange(2, 20, 3)  # start, stop, step
a

In [None]:
# linspace and logspace arrays
# they divide linearly or in a log-fashion the space contained between start and stop

lin = np.linspace(0, 20, 10)  # start, stop (included), number of points
print(lin)

log = np.logspace(0, 2, 5, base=10)
print(log)
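
The relation between the two functions can be checked directly: `logspace` returns the base raised to the values that `linspace` would produce. The sketch below also shows the `endpoint` argument of `linspace`; the specific numbers are arbitrary.

```python
import numpy as np

lin = np.linspace(0, 2, 5)             # exponents 0, 0.5, 1, 1.5, 2
log = np.logspace(0, 2, 5, base=10)    # 10**0 ... 10**2

# logspace is the base raised to the linspace values
print(np.allclose(log, 10 ** lin))

# endpoint=False leaves the stop value out of the result
half_open = np.linspace(0, 2, 4, endpoint=False)
print(half_open)
```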

In [None]:
# zeros and ones

z_vector = np.zeros(5)
z_matrix = np.zeros((3, 3))
z_matrix_from_vector = np.zeros(6).reshape(2, 3)

ones = np.ones((2, 2))

print(z_vector, '\n')
print(z_matrix, '\n')
print(z_matrix_from_vector, '\n')
print(ones)

In [None]:
# Diagonal matrices

iden_matrix = np.identity(5)
diag_matrix = np.diag([1, 2, 3, 4])
diag_offset = np.diag([1, 2, 3], k=1)     # k indicates the offset from the main diagonal

print(iden_matrix, '\n')
print(diag_matrix, '\n')
print(diag_offset, '\n')

## Indexing and slicing
`numpy` arrays support the basic indexing and slicing of Python sequences, plus more flexible ("fancy") ways of doing it

In [None]:
# 1-dimensional arrays

a = np.arange(0, 3, 0.5)

print(a, '\n')
print(a[3], '\n')
print(a[1:3], '\n')
print(a[::2])

In [None]:
# 2-dimensional arrays

M = np.arange(0, 9).reshape(3, 3)
print(M)

For two-dimensional arrays, the indexing is double: the first index refers to the **row** (first dimension, or axis 0), while the second refers to the **column** (second dimension, or axis 1)

In [None]:
print(M[0, 1], '\n')
print(M[1, :], '\n')     # all the second row
print(M[:, 2], '\n')     # all the third column
print(M[:2, 1], '\n')    # first 2 rows, second column

In [None]:
# TO DO: starting from the M matrix defined above, create the matrix subM that corresponds to the square submatrix
# in the bottom-left corner (with elements 3, 4, 6, 7). Use slicing
# YOUR CODE HERE




### Fancy indexing
`numpy` arrays support very flexible ways of indexing.

In [None]:
# select indexes in the order you wish!
my_indexes = [2, 1, -1, -2]

a1 = np.arange(1, 6)
print(a1, '\n')
print(a1[my_indexes])

In [None]:
# you can do it also with multidimensional arrays

M = np.arange(0, 16).reshape(4, 4)
M

In [None]:
M[[1, 2], [0, 3]]

In [None]:
M[1:3, [0, 2]]     # take rows 1 and 2, only columns 0 and 2
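
The two cells above behave differently, and the difference is easy to miss. Two index lists are paired element by element, so `M[[1, 2], [0, 3]]` picks the elements at positions (1, 0) and (2, 3). To select the full block formed by some rows and some columns, one option is `np.ix_`, sketched here on the same 4x4 matrix:

```python
import numpy as np

M = np.arange(16).reshape(4, 4)

# Index lists are paired: elements (1, 0) and (2, 3)
paired = M[[1, 2], [0, 3]]
print(paired)                        # [ 4 11]

# np.ix_ builds the cross product: rows 1 and 2, columns 0 and 3
block = M[np.ix_([1, 2], [0, 3])]
print(block)
```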

In [None]:
# BOOLEAN INDEXING!
# 1-dimensional

a = np.arange(9)
a[a < 5]                     # take elements from a only where the condition holds

In [None]:
a[(a < 5) | (a%2 == 0)]      # | is the element-wise (bitwise) or; the parentheses are required

In [None]:
M = np.arange(16).reshape(4, 4)

M[M < 10]                          # on multidimensional arrays, boolean indexing returns a 1-dimensional array

In [None]:
# If you need the INDEXES where the condition applies, and not the elements, use the np.where function

a = np.linspace(10, 50, 5)
print(a)
np.where(a%20 == 0)
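
With a single argument, `np.where` returns a tuple containing one index array per dimension, which is why the output above is wrapped in parentheses. A short sketch of how to unpack it, and of the three-argument form that chooses between two values (the labels are invented for this example):

```python
import numpy as np

a = np.linspace(10, 50, 5)            # 10, 20, 30, 40, 50

# One index array per dimension; a is 1-dimensional, so the tuple has one item
(idx,) = np.where(a % 20 == 0)
print(idx)                            # [1 3]
print(a[idx])                         # the matching elements

# Three-argument form: pick a value depending on the condition
labels = np.where(a % 20 == 0, 'multiple', 'other')
print(labels)
```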

### Array operations
`numpy` arrays allow operations to be vectorized: a single expression is applied to the whole array, and the element-by-element loop runs in compiled code rather than in Python

In [None]:
# We'll use these two arrays for all operations demonstrations

v = np.arange(10, 60, 10)
M = np.array([[i * 10 + j for j in range(5)] for i in range(1, 6)])

print(v, v.shape)
print(M, M.shape)

In [None]:
# Scalar operations: the operation is applied to each element

print(v + 1, '\n')
print(v * 5, '\n')
print(M + 10, '\n')
print(M * 10)

**ELEMENT-WISE ARRAY OPERATIONS**  
The operation is applied element by element

In [None]:
# 1-dim array

v2 = np.ones(5)
print(v)
print(v + v2)

v3 = np.arange(5)
print(v * v3)

In [None]:
# multi-dim array

N = np.arange(25).reshape(5, 5)
print(M)
print(N)
print(M * N)

If we multiply a matrix by a 1-dimensional array whose length matches the number of columns, each row of the matrix is multiplied element-wise by that array:

In [None]:
M * v

The scalar and matrix-by-vector operations above rely on **broadcasting**, a mechanism that automatically *adjusts* the computation between objects that do not have the same dimensions (though they must have compatible shapes)
<img align="centre" width="450" height="500" src="broadcast.png">
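
As a rough rule, shapes are compared from the last dimension backwards, and two dimensions are compatible when they are equal or when one of them is 1. The sketch below (shapes chosen only for illustration) shows one compatible pair and one incompatible pair:

```python
import numpy as np

col = np.arange(3).reshape(3, 1)     # shape (3, 1)
row = np.arange(4).reshape(1, 4)     # shape (1, 4)

# Each size-1 dimension is stretched to match the other operand
table = col + row
print(table.shape)                   # (3, 4)
print(table)

# (3, 4) and (3,) are not compatible: the last dimensions are 4 and 3
try:
    np.ones((3, 4)) + np.ones(3)
except ValueError as err:
    print('incompatible shapes:', err)
```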


## Linear algebra
Let's now see how to do the basic operations from linear algebra

In [None]:
# INNER PRODUCT
# Use the function np.dot() or the array method .dot

v1 = np.arange(6)
v2 = np.ones(6)

print(v1, v2)
print(np.dot(v1, v2))
print(v1.dot(v2))

In [None]:
# MATRIX PRODUCT (m, n) x (n, p)
# Use np.matmul

M = np.arange(6).reshape(2, 3)
N = np.arange(6, 18).reshape(3, 4)
P = np.matmul(M, N)

print(M.shape, N.shape)
print(P, '\n', P.shape)
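
For 2-dimensional arrays, the `@` operator is a shorter way to write the same product as `np.matmul`. A minimal sketch with the same two matrices:

```python
import numpy as np

M = np.arange(6).reshape(2, 3)
N = np.arange(6, 18).reshape(3, 4)

# @ is the matrix-multiplication operator for arrays
P = M @ N
print(P.shape)                               # (2, 4)
print(np.array_equal(P, np.matmul(M, N)))    # True
```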

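The inner product and the matrix product can also be written with the `*` operator if we create numpy matrices instead of arrays. The NumPy documentation no longer recommends the matrix class for new code (plain arrays with `np.matmul` or the `@` operator are preferred), but you will still meet it in older code: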

In [None]:
# Matrix product

# np.mat was removed in NumPy 2.0; np.asmatrix does the same job in old and new versions
M = np.asmatrix(np.arange(6).reshape(2, 3))
N = np.asmatrix(np.arange(6, 18).reshape(3, 4))
P = M * N
print(P)

In [None]:
# Inner product

v1 = np.asmatrix(np.arange(6))              # this is by default a row vector
v2 = np.asmatrix(np.ones(6)).reshape(6, 1)  # we transformed v2 into a column vector
print(v1 * v2)

#### Other important matrix operations

In [None]:
# Diagonal
S = np.asmatrix([[6, 9, 34], [-5, 6, 7], [33, 10, -9]])
S, S.diagonal()

In [None]:
# Transpose
S.T

In [None]:
# Determinant
np.linalg.det(S)

In [None]:
# Trace
S.trace()

In [None]:
# Inverse
S.I

In [None]:
S.I * S
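
Because of floating-point rounding, the product of a matrix and its inverse is only approximately the identity. A sketch of the same check with plain arrays, using `np.linalg.inv` and `np.allclose` (the matrix is the `S` defined above):

```python
import numpy as np

S = np.array([[6, 9, 34], [-5, 6, 7], [33, 10, -9]])

S_inv = np.linalg.inv(S)

# Compare with the identity up to rounding error
print(np.allclose(S_inv @ S, np.eye(3)))    # True

# A non-zero determinant confirms that S is invertible
print(np.linalg.det(S))
```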

In [None]:
# Eigenvalues and eigenvectors

eigen_values, eigen_vecs = np.linalg.eig(S)
eigen_values, eigen_vecs 
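
The eigenvectors are returned as the *columns* of the second array, so column `i` goes with eigenvalue `i`. The sketch below checks the defining relation S v = λ v for each pair, using plain arrays and the same `S`; the shorter variable names are chosen for this example.

```python
import numpy as np

S = np.array([[6, 9, 34], [-5, 6, 7], [33, 10, -9]])

vals, vecs = np.linalg.eig(S)

# Column i of vecs is the eigenvector for vals[i]
for i in range(len(vals)):
    v = vecs[:, i]
    print(np.allclose(S @ v, vals[i] * v))

# The eigenvalues add up to the trace of the matrix
print(np.isclose(vals.sum(), np.trace(S)))
```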

In [None]:
# TO DO: write a function multmat that accepts two numpy arrays a1 and a2 and checks whether they have compatible
# shapes to perform matrix multiplication. If they do, it returns the resulting matrix; otherwise, it slices the
# appropriate matrix so that the multiplication can be performed
# YOUR CODE HERE

def multmat(a1, a2):
    pass  # replace with your implementation

### Array data processing
Every numpy array comes equipped with very nice data processing operations

In [None]:
ages = np.array([37, 34, 27, 39, 43, 34, 33, 26, 31, 26, 28, 42, 33, 32, 39, 33])

In [None]:
ages.sum()

In [None]:
ages.cumsum()  # cumulative sum

In [None]:
ages.prod(dtype=float)   # product; a float accumulator avoids silent int64 overflow with these 16 values

In [None]:
ages.cumprod(dtype=float)  # cumulative product (float for the same overflow reason)

In [None]:
ages2 = ages.copy()
ages2.sort()         # sorting in-place
ages2

In [None]:
ages.mean()

In [None]:
np.median(ages)  # arrays have no median method

In [None]:
ages.max(), ages.argmax()    # argmax gives the index of the maximum number

In [None]:
ages.min(), ages.argmin()

In [None]:
ages.std()
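
By default `.std()` computes the population standard deviation (it divides by N). When the data are a sample, the `ddof=1` argument gives the sample standard deviation (it divides by N - 1). A short sketch on the same `ages` array:

```python
import numpy as np

ages = np.array([37, 34, 27, 39, 43, 34, 33, 26, 31, 26, 28, 42, 33, 32, 39, 33])

pop_std = ages.std()             # divides by N
sample_std = ages.std(ddof=1)    # divides by N - 1

print(pop_std, sample_std)
```

The same `ddof` argument is accepted by `.var()`.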

## NumPy Random
This is a very useful module for sampling numbers from random distributions

In [None]:
np.random.seed(77)  # set the seed

In [None]:
# Random numbers between 0 and 1 (1 excluded), uniform distribution
np.random.rand(3, 4)

In [None]:
# Random ints, sampled from a uniform distribution

np.random.randint(1, 6, 10)  # low (included), high (excluded), size

In [None]:
# Binomial distribution

np.random.binomial(n=20, p=0.3, size=40)

In [None]:
# Poisson

np.random.poisson(lam=2, size=40)

In [None]:
# Normal

np.random.normal(loc=1, scale=1.5, size=30)  # mean and standard deviation
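
Recent NumPy versions (1.17 and later) also offer a generator-based interface, `np.random.default_rng`, which keeps the random state in an object instead of a global seed. The sketch below mirrors some of the calls above; the seed 77 is reused only for continuity, and the numbers produced differ from those of the functions above.

```python
import numpy as np

rng = np.random.default_rng(77)           # generator with its own seed

uniform = rng.random((3, 4))              # uniform in [0, 1)
ints = rng.integers(1, 6, size=10)        # low included, high excluded
normal = rng.normal(loc=1, scale=1.5, size=30)

# A second generator with the same seed reproduces the same stream
rng2 = np.random.default_rng(77)
print(np.array_equal(uniform, rng2.random((3, 4))))   # True
```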

## Linear regression and polynomial fit

We can also perform linear regression with `numpy.polyfit()`.  
For more, see the [`numpy.polyfit` documentation](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.polyfit.html)

In [None]:
x = np.array([-8,-2,3,4,5,6])
y = x * 2

fit_deg1 = np.polyfit(x, y, 1)
fit_deg1                         # Polynomial coefficients

In [None]:
fit_deg2 = np.polyfit(x, y, 2)
fit_deg2

In [None]:
fit_deg3 = np.polyfit(x, y, 3)
fit_deg3
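
The coefficients are returned from the highest degree to the lowest, so for the degree-1 fit they are the slope and the intercept. Since `y` is exactly `2 * x`, the fit should recover a slope of 2 and an intercept of about 0, and the extra coefficients of the higher-degree fits should come out close to 0. A sketch that checks this and evaluates the fitted line with `np.polyval`:

```python
import numpy as np

x = np.array([-8, -2, 3, 4, 5, 6])
y = x * 2

# Degree-1 fit: coefficients are [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)

# Evaluate the fitted polynomial at the original points
y_fit = np.polyval([slope, intercept], x)
print(np.allclose(y_fit, y))
```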