<img align="right" width="200" height="200" src="ovalmoney-logo-green.png">
# A very hurried course in Python
#### By Stefano Calderan, Data Scientist @ Oval Money

  
- [NumPy](#NumPy)
    - [Why use NumPy](#Why-NumPy?)
    - [The arrays](#NumPy-arrays)
        - [Create arrays from sequences](#NumPy-arrays-from-sequences)
        - [Create arrays from `numpy` functions](#NumPy-arrays-from-generating-functions)
    - [Indexing and slicing](#Indexing-and-slicing)
        - [Fancy ways of indexing](#Fancy-indexing)
    - [Operations with arrays](#Array-operations)
    - [Linear algebra](#Linear-algebra)
    - [Processing data inside arrays](#Array-data-processing)

## NumPy

In [2]:
import numpy as np        # Standard abbreviation for numpy

### Why NumPy?

- It's free
- It is a package that provides **_high-performance_ vector, matrix and higher-dimensional data structures** (such as tensors). *High performance* means that computations are very fast, because the package is implemented in C and Fortran and calculations are vectorized, i.e. they are formulated on whole vectors and matrices rather than on single elements
- It provides everything needed for **linear algebra operations** (matrices, polynomial fits, ...)
- Everybody uses it
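
As a rough illustration of what "vectorized" means, here is a minimal sketch that computes the same squares with a plain Python loop and with a single array expression. The names `values`, `squares_loop` and `squares_vec` are only illustrative.

```python
import numpy as np

values = list(range(1_000))

# Pure-Python version: an explicit loop over the elements
squares_loop = [x * x for x in values]

# Vectorized version: one expression on the whole array,
# the element-by-element loop runs inside numpy
arr = np.array(values)
squares_vec = arr * arr

print(squares_vec[:5])
print(np.array_equal(squares_vec, squares_loop))  # same result either way
```

On large arrays the vectorized form is usually much faster, although the exact speed-up depends on the machine and on the operation.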

### NumPy arrays

Arrays are the basic data structure of `numpy`. Every array comes with the main **mathematical and statistical operations implemented as methods**.  
Let's see how to build them.  

#### NumPy arrays from sequences
We can build them by starting from sequences (lists or tuples). Sets are not sequences: passing one to `np.array` does not give a numeric array

In [3]:
# 1-Dimensional arrays

a1 = np.array([1, 2, 3])
print(a1.shape)            # .shape is an attribute of every array

(3,)


In [4]:
# a1 is neither a column nor a row vector. To transform it into a column or row vector, you
# have to use the .reshape() method

a_column = a1.reshape(3, 1)           # reshape returns a new array; a1 itself keeps its shape
print('column:\n', a_column)
a_row = a1.reshape(1, 3)
print('row:', a_row)

column:
 [[1]
 [2]
 [3]]
row: [[1 2 3]]
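
A small sketch of two details of `.reshape()` that are easy to miss: one dimension can be given as `-1` so that numpy infers it, and the reshaped array normally shares its data with the original instead of copying it.

```python
import numpy as np

a1 = np.array([1, 2, 3])

a_column = a1.reshape(-1, 1)   # -1 lets numpy infer the length: shape (3, 1)
a_row = a1.reshape(1, -1)      # shape (1, 3)
print(a1.shape, a_column.shape, a_row.shape)

# The reshaped array is a view on the same data, not an independent copy
print(np.shares_memory(a1, a_column))
```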


In [5]:
# 2-Dimensional arrays (Matrices)

M1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(M1)
print(M1.shape)
print(M1.size)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
(3, 3)
9


In [6]:
# We can specify the type of data inside the array

M2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)   # use the dtype argument to specify the type
print(M2, M2.dtype)                                             # dtype is also an attribute

# if we try to assign a value of an incompatible type to an array element, we get an error
a1[0] = 'pippo'

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]] float64


ValueError: invalid literal for int() with base 10: 'pippo'
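
The error above comes from the fact that an array has a single, fixed `dtype`. The sketch below shows what this implies for numeric values; the names `ints`, `floats` and `mixed` are only illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3])
ints[0] = 7.9                  # no error, but the float is truncated to fit the integer dtype
print(ints)

floats = ints.astype(float)    # astype returns a converted copy
floats[0] = 7.9
print(floats, floats.dtype)

mixed = np.array([1, 2.5, 3])  # mixing ints and floats upcasts everything to float
print(mixed.dtype)
```

If you need a different type, convert explicitly with `.astype()` rather than relying on assignment.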

#### NumPy arrays from generating functions
Here we see some useful functions provided by `numpy`

In [7]:
# range array
a = np.arange(2, 20, 3)  # start, stop, step
a

array([ 2,  5,  8, 11, 14, 17])

In [8]:
# linspace and logspace arrays
# they split the interval between start and stop into evenly spaced points, on a linear or a logarithmic scale

lin = np.linspace(0, 20, 10)  # start, stop (included), number of points
print(lin)

log = np.logspace(0, 2, 5, base=10)
print(log)

[ 0.          2.22222222  4.44444444  6.66666667  8.88888889 11.11111111
 13.33333333 15.55555556 17.77777778 20.        ]
[  1.           3.16227766  10.          31.6227766  100.        ]
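
To make the relation between the two functions explicit, here is a short sketch: `logspace` takes *exponents* as start and stop, so it matches the base raised to a `linspace` of those exponents. It also shows the `endpoint` argument of `linspace`.

```python
import numpy as np

exponents = np.linspace(0, 2, 5)          # 0, 0.5, 1, 1.5, 2
log = np.logspace(0, 2, 5, base=10)       # 10**0 ... 10**2

print(np.allclose(log, 10 ** exponents))  # the two constructions agree

# By default the stop value is included; endpoint=False leaves it out
print(np.linspace(0, 20, 10, endpoint=False))
```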


In [9]:
# zeros and ones

z_vector = np.zeros(5)
z_matrix = np.zeros((3, 3))
z_matrix_from_vector = np.zeros(6).reshape(2, 3)

ones = np.ones((2, 2))

print(z_vector, '\n')
print(z_matrix, '\n')
print(z_matrix_from_vector, '\n')
print(ones)

[0. 0. 0. 0. 0.] 

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]] 

[[0. 0. 0.]
 [0. 0. 0.]] 

[[1. 1.]
 [1. 1.]]


In [10]:
# Diagonal matrices

iden_matrix = np.identity(5)
diag_matrix = np.diag([1, 2, 3, 4])
diag_offset = np.diag([1, 2, 3], k=1)     # k indicates the offset from the main diagonal

print(iden_matrix, '\n')
print(diag_matrix, '\n')
print(diag_offset, '\n')

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]] 

[[1 0 0 0]
 [0 2 0 0]
 [0 0 3 0]
 [0 0 0 4]] 

[[0 1 0 0]
 [0 0 2 0]
 [0 0 0 3]
 [0 0 0 0]] 



### Indexing and slicing
`numpy` arrays support the basic indexing and slicing of Python sequences, plus some more powerful ways of selecting elements

In [11]:
# 1-dimensional arrays

a = np.arange(0, 3, 0.5)

print(a, '\n')
print(a[3], '\n')
print(a[1:3], '\n')
print(a[::2])

[0.  0.5 1.  1.5 2.  2.5] 

1.5 

[0.5 1. ] 

[0. 1. 2.]


In [12]:
# 2-dimensional arrays

M = np.arange(0, 9).reshape(3, 3)
print(M)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


For two-dimensional arrays, each element is addressed by two indices: the first index refers to the **row** (first dimension, or axis 0), while the second refers to the **column** (second dimension, or axis 1) 

In [15]:
print(M, '\n')
print( M[0, 1], '\n')     # single element at row 0, column 1
print( M[1, :], '\n')     # the whole second row
print( M[:, 2], '\n')     # the whole third column
print( M[:2, 1], '\n')    # first 2 rows, second column

[[0 1 2]
 [3 4 5]
 [6 7 8]] 

1 

[3 4 5] 

[2 5 8] 

[1 4] 
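
The same axis numbering is used by array methods that take an `axis` argument. A minimal sketch with `.sum()` (the names `col_sums` and `row_sums` are only illustrative):

```python
import numpy as np

M = np.arange(0, 9).reshape(3, 3)

col_sums = M.sum(axis=0)   # collapse axis 0 (the rows): one total per column
row_sums = M.sum(axis=1)   # collapse axis 1 (the columns): one total per row

print(col_sums)
print(row_sums)
```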



In [17]:
# TO DO: starting from the M matrix defined above, create the matrix subM that corresponds to the square submatrix
# in the bottom-left corner (with elements 3, 4, 6, 7). Use slicing
# YOUR CODE HERE


subM = M[1:, :2]
print(M, '\n')
print(subM)

[[0 1 2]
 [3 4 5]
 [6 7 8]] 

[[3 4]
 [6 7]]


### Fancy indexing
`numpy` arrays support very flexible ways of indexing.

In [18]:
# select indices in the order you wish!
my_indexes = [2, 1, -1, -2]

a1 = np.arange(1, 6)
print(a1, '\n')
print( a1[my_indexes])

[1 2 3 4 5] 

[3 2 5 4]


In [19]:
# you can do it also with multidimensional arrays

M = np.arange(0, 16).reshape(4, 4)
M

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [20]:
M[[1, 2], [0, 3]]

array([ 4, 11])

In [21]:
M[1:3, [0, 2]]     # take rows 1 and 2, only columns 0 and 2

array([[ 4,  6],
       [ 8, 10]])
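
The two cells above behave differently: two index lists are paired up element by element, while a slice combined with a list selects a whole block. As a sketch, `np.ix_` gives the "block" behaviour with two lists as well (the names `pairs` and `block` are only illustrative).

```python
import numpy as np

M = np.arange(0, 16).reshape(4, 4)

# Two lists are paired: elements at (1, 0) and (2, 3)
pairs = M[[1, 2], [0, 3]]
print(pairs)

# np.ix_ crosses the lists: rows 1 and 2, each restricted to columns 0 and 3
block = M[np.ix_([1, 2], [0, 3])]
print(block)
```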

In [22]:
# BOOLEAN INDEXING!
# 1-dimensional

a = np.arange(9)
a[a < 5]                     # take elements from a only where the condition holds

array([0, 1, 2, 3, 4])

In [23]:
a[(a < 5) | (a%2 == 0)]      # | is the element-wise OR; the parentheses around each condition are required

array([0, 1, 2, 3, 4, 6, 8])

In [24]:
M = np.arange(16).reshape(4, 4)

M[M < 10]                          # on multidimensional arrays, boolean indexing returns a 1-dimensional array

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [25]:
# If you need the INDEXES where the condition applies, and not the elements, use the np.where function

a = np.linspace(10, 50, 5)
print(a)
np.where(a%20 == 0)      # the condition of 'elements divisible by 20' is respected by 20 and 40, at positions 1 and 3

[10. 20. 30. 40. 50.]


(array([1, 3]),)
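
A short sketch of what can be done with the result of `np.where`: the returned tuple of index arrays can be used directly to index the array, and the three-argument form of `np.where` picks values from one of two sources depending on the condition.

```python
import numpy as np

a = np.linspace(10, 50, 5)

idx = np.where(a % 20 == 0)      # tuple with one index array per dimension
selected = a[idx]                # use it to get the elements back
print(idx[0], selected)

# Three-argument form: keep a where the condition holds, put 0 elsewhere
masked = np.where(a % 20 == 0, a, 0)
print(masked)
```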

### Array operations
`numpy` arrays allow operations to be vectorized: an operation is written once for the whole array, and the element-by-element loop runs under the hood in compiled code instead of in Python

In [26]:
# We'll use these two arrays to demonstrate all the operations

v = np.arange(10, 60, 10)
M = np.array( [ [i * 10 + j for j  in range(5)] for i in range(1, 6)])

print(v, v.shape)
print(M, M.shape)

[10 20 30 40 50] (5,)
[[10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]
 [50 51 52 53 54]] (5, 5)


In [27]:
# Scalar operations: the operation is applied to each element

print(v + 1, '\n')
print(v * 5, '\n')
print(M + 10, '\n')
print(M * 10)

[11 21 31 41 51] 

[ 50 100 150 200 250] 

[[20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]
 [50 51 52 53 54]
 [60 61 62 63 64]] 

[[100 110 120 130 140]
 [200 210 220 230 240]
 [300 310 320 330 340]
 [400 410 420 430 440]
 [500 510 520 530 540]]


**ELEMENT-WISE ARRAY OPERATIONS**  
The operation is applied element by element

In [28]:
# 1-dim array

v2 = np.ones(5)
print(v)
print(v + v2)

v3 = np.arange(5)
print(v * v3)

[10 20 30 40 50]
[11. 21. 31. 41. 51.]
[  0  20  60 120 200]


In [29]:
# multi-dim array

N = np.arange(25).reshape(5, 5)
print(M)
print(N)
print(M * N)

[[10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]
 [50 51 52 53 54]]
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]
[[   0   11   24   39   56]
 [ 100  126  154  184  216]
 [ 300  341  384  429  476]
 [ 600  656  714  774  836]
 [1000 1071 1144 1219 1296]]


If we multiply a matrix by a vector with a compatible shape, each row of the matrix is multiplied element-wise by the vector:

In [30]:
M * v

array([[ 100,  220,  360,  520,  700],
       [ 200,  420,  660,  920, 1200],
       [ 300,  620,  960, 1320, 1700],
       [ 400,  820, 1260, 1720, 2200],
       [ 500, 1020, 1560, 2120, 2700]])

All the operations above rely on **broadcasting**, a mechanism that automatically *adjusts* the computation between arrays that do not have the same dimensions (though their shapes must be compatible: comparing the shapes from the last axis backwards, each pair of sizes must be equal, or one of them must be 1)
<img align="centre" width="450" height="500" src="broadcast.png">
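
As a minimal sketch of the compatibility rule, the example below multiplies the matrix `M` by `v` seen first as a row and then as a column, and finally tries a shape that cannot be broadcast. The names `by_columns` and `by_rows` are only illustrative.

```python
import numpy as np

v = np.arange(10, 60, 10)
M = np.array([[i * 10 + j for j in range(5)] for i in range(1, 6)])

# (5, 5) * (5,): v is stretched along the rows, so column j is scaled by v[j]
by_columns = M * v
print(by_columns[0])

# (5, 5) * (5, 1): v is now a column, so row i is scaled by v[i]
by_rows = M * v.reshape(5, 1)
print(by_rows[:, 0])

# (5, 5) * (4,): sizes 5 and 4 differ and neither is 1, so this fails
try:
    M * np.arange(4)
except ValueError as err:
    print('cannot broadcast:', err)
```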


### Linear algebra
Let's now see how to do the basic operations from linear algebra

In [31]:
# INNER PRODUCT
# Use the function np.dot() or the array method .dot

v1 = np.arange(6)
v2 = np.ones(6)

print(v1, v2)
print(np.dot(v1, v2))
print(v1.dot(v2))

[0 1 2 3 4 5] [1. 1. 1. 1. 1. 1.]
15.0
15.0


In [32]:
# MATRIX PRODUCT (m, n) x (n, p)
# Use np.matmul

M = np.arange(6).reshape(2, 3)
N = np.arange(6, 18).reshape(3, 4)
P = np.matmul(M, N)

print(M.shape, N.shape)
print(P, '\n', P.shape)

(2, 3) (3, 4)
[[ 38  41  44  47]
 [128 140 152 164]] 
 (2, 4)
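
On ordinary arrays, the `@` operator is a shorthand for `np.matmul`. A short sketch with the same `M` and `N` as above:

```python
import numpy as np

M = np.arange(6).reshape(2, 3)
N = np.arange(6, 18).reshape(3, 4)

P = M @ N                                  # same as np.matmul(M, N)
print(P)
print(np.array_equal(P, np.matmul(M, N)))

# Careful: * on arrays is element-wise, not a matrix product
```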


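The inner product and the matrix multiplication can be written more compactly if, instead of numpy arrays, we create numpy matrices directly: with matrices, `*` means matrix product. Note that the NumPy documentation no longer recommends the matrix class, and that the `np.mat` alias used here was removed in NumPy 2.0 (`np.asmatrix` is the equivalent call):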

In [33]:
# Matrix product

M = np.mat(np.arange(6).reshape(2, 3))
N = np.mat(np.arange(6, 18).reshape(3, 4))
P = M * N
print(P)

[[ 38  41  44  47]
 [128 140 152 164]]


In [34]:
# Inner product

v1 = np.mat(np.arange(6))              # this is by default a row vector
v2 = np.mat(np.ones(6)).reshape(6, 1)  # we transformed v2 into a column vector
print(v1 * v2)

[[15.]]


#### Other important matrix operations

In [35]:
# Diagonal
S = np.mat([[6, 9, 34], [-5, 6, 7], [33, 10, -9]])
S, S.diagonal()

(matrix([[ 6,  9, 34],
         [-5,  6,  7],
         [33, 10, -9]]), matrix([[ 6,  6, -9]]))

In [36]:
# Transpose
S.T

matrix([[ 6, -5, 33],
        [ 9,  6, 10],
        [34,  7, -9]])

In [37]:
# Determinant
np.linalg.det(S)

-7501.999999999996

In [38]:
# Trace
S.trace()

matrix([[3]])

In [39]:
# Inverse
S.I

matrix([[ 0.01652893, -0.05611837,  0.01879499],
        [-0.02479339,  0.1567582 ,  0.02825913],
        [ 0.03305785, -0.03159158, -0.01079712]])

In [40]:
S.I * S

matrix([[ 1.00000000e+00,  0.00000000e+00,  2.77555756e-17],
        [ 6.59194921e-17,  1.00000000e+00, -1.66533454e-16],
        [ 0.00000000e+00,  5.55111512e-17,  1.00000000e+00]])

In [41]:
# Eigenvalues and eigenvectors

eigen_values, eigen_vecs = np.linalg.eig(S)
eigen_values, eigen_vecs 

(array([-36.28864061,  33.02968399,   6.25895662]),
 matrix([[ 0.58791479,  0.78466076, -0.38123903],
         [ 0.19928542,  0.01534771,  0.89297386],
         [-0.78399077,  0.61973538, -0.23927909]]))
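
A quick way to check the result is to verify the defining relation $S v = \lambda v$ for each pair. The sketch below does it with a plain array instead of a matrix; the eigenvectors are the *columns* of `eigen_vecs`.

```python
import numpy as np

S = np.array([[6, 9, 34], [-5, 6, 7], [33, 10, -9]])
eigen_values, eigen_vecs = np.linalg.eig(S)

for i in range(len(eigen_values)):
    vec = eigen_vecs[:, i]                 # i-th eigenvector (a column)
    lhs = S @ vec
    rhs = eigen_values[i] * vec
    print(np.allclose(lhs, rhs))           # S v equals lambda v up to rounding
```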

In [42]:
# TO DO: write a function multmat that accepts two numpy arrays a1 and a2 and checks whether they have compatible
# shapes for matrix multiplication. If so, it returns the resulting matrix; otherwise, it slices the
# matrices so that the multiplication becomes possible
# YOUR CODE HERE

def multmat(a1, a2):
    a1rows, a1cols = a1.shape
    a2rows, a2cols = a2.shape
    
    if a1cols == a2rows:          # check if a1*a2 is possible
        return np.matmul(a1, a2)
    elif a2cols == a1rows:        # check if a2*a1 is possible
        return np.matmul(a2, a1)
    else:
        min_rowcol_number = min(a1cols, a2rows)       # otherwise we take the minimum common number between
        return np.matmul(a1[:, :min_rowcol_number],   # a1 columns and a2 rows and
                        a2[:min_rowcol_number, :])    # we return the matrix multiplication ;) 

In [44]:
# An example to show that the above function works
# M1 and M2 are matrices with incompatible shapes (4,4 and 3,3) but our function returns a value anyway

M1 = np.arange(16).reshape(4, 4)
M2 = np.arange(9).reshape(3, 3)

multmat(M1, M2)

array([[ 15,  18,  21],
       [ 51,  66,  81],
       [ 87, 114, 141],
       [123, 162, 201]])

### Array data processing
Every numpy array comes equipped with very nice data processing operations

In [45]:
ages = np.array([37, 34, 27, 39, 43, 34, 33, 26, 31, 26, 28, 42, 33, 32, 39, 33])

In [46]:
ages.sum()

537

In [47]:
ages.cumsum()  # cumulative sum

array([ 37,  71,  98, 137, 180, 214, 247, 273, 304, 330, 358, 400, 433,
       465, 504, 537])

In [48]:
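ages.prod()   # product (careful: the true product exceeds the int64 range, so the value below has silently wrapped around)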

5218580222841458688

In [49]:
ages.cumprod()  # cumulative product (the negative entry shows where int64 overflows)

array([                  37,                 1258,                33966,
                    1324674,             56960982,           1936673388,
                63910221804,        1661665766904,       51511638774024,
           1339302608124624,    37500473027489472,  1575019867154557824,
       -3364576605028246656,  3014013081353416704,  6866045730525941760,
        5218580222841458688])
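
The negative number in the cumulative product is a sign of integer overflow: fixed-size integers wrap around silently. As a sketch, the exact product can be computed with Python's arbitrary-precision integers and compared with the wrapped value (`exact` and `wrapped` are only illustrative names; `math.prod` requires Python 3.8 or later).

```python
import math
import numpy as np

ages = np.array([37, 34, 27, 39, 43, 34, 33, 26, 31, 26, 28, 42, 33, 32, 39, 33],
                dtype=np.int64)

wrapped = ages.prod()                        # int64 arithmetic, wraps on overflow
exact = math.prod(int(x) for x in ages)      # Python ints never overflow

print(wrapped)
print(exact)
print(exact > np.iinfo(np.int64).max)        # the true product does not fit in int64

# A float product avoids the wrap-around, at the price of rounding
print(ages.astype(float).prod())
```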

In [50]:
ages2 = ages.copy()
ages2.sort()         # sorting in-place
ages2

array([26, 26, 27, 28, 31, 32, 33, 33, 33, 34, 34, 37, 39, 39, 42, 43])

In [51]:
ages.mean()

33.5625

In [52]:
np.median(ages)  # arrays have no median method

33.0

In [53]:
ages.max(), ages.argmax()    # argmax gives the index of the maximum number

(43, 4)

In [54]:
ages.min(), ages.argmin()

(26, 7)

In [55]:
ages.std()

5.183733572435991
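
By default `.std()` computes the *population* standard deviation (it divides by the number of elements). A small sketch of how to get the *sample* standard deviation with the `ddof` argument:

```python
import numpy as np

ages = np.array([37, 34, 27, 39, 43, 34, 33, 26, 31, 26, 28, 42, 33, 32, 39, 33])

pop_std = ages.std()            # divides by n
sample_std = ages.std(ddof=1)   # divides by n - 1

print(pop_std, sample_std)
```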

## NumPy Random
This is a very useful module for sampling numbers from random distributions

In [56]:
np.random.seed(77)  # set the seed

In [57]:
# Random numbers in [0, 1), uniform distribution
np.random.rand(3, 4)

array([[0.91910903, 0.6421956 , 0.75371223, 0.13931457],
       [0.08731955, 0.78800206, 0.32615094, 0.54106782],
       [0.24023518, 0.54542293, 0.4005545 , 0.71519189]])

In [58]:
# Random ints, sampled from a uniform distribution

np.random.randint(1, 6, 10)  # low (included), high (excluded), size

array([3, 4, 4, 2, 4, 2, 2, 5, 1, 5])

In [59]:
# Binomial distribution

np.random.binomial(n=20, p=0.3, size=40)

array([ 7,  6,  4,  3,  5,  3,  7,  3,  6,  5,  4,  6,  6,  3,  8,  5,  4,
        7,  4,  5,  5,  6, 10,  7,  5,  9,  8,  5,  5,  4,  6,  9,  5,  6,
        9,  9,  5,  7,  3,  4])

In [60]:
# Poisson

np.random.poisson(lam=2, size=40)

array([2, 1, 4, 3, 0, 2, 2, 4, 0, 2, 1, 3, 5, 3, 1, 1, 3, 1, 0, 3, 3, 1,
       1, 2, 0, 0, 3, 1, 2, 0, 2, 2, 1, 1, 1, 3, 0, 3, 3, 2])

In [61]:
# Gaussian

np.random.normal(loc=1, scale=1.5, size=30)  # mean and standard deviation

array([ 2.84641781, -0.9584082 ,  2.137548  ,  1.21663723,  0.83506757,
        2.59641798,  0.41959387, -1.66938496,  1.51752755, -1.2544713 ,
       -0.13140551, -0.82536639,  0.27631677, -1.11695162,  3.82526394,
       -2.37278564,  0.46965114,  1.29323632,  1.37595121,  2.96117663,
        2.30419805,  1.20031734,  1.32297202,  0.79802141,  1.72238502,
        0.50590131,  3.96497408,  1.87193701, -3.69740505,  2.19110975])

In [62]:
# Standard Normal

np.random.randn(5, 5)

array([[ 0.42385748,  0.53991191, -0.07339363, -1.44667588,  1.6399904 ],
       [ 0.64215988, -0.11742656,  0.01896072, -0.63819645,  0.31624045],
       [ 0.05607358,  0.68218   ,  0.22992071,  0.63908435,  2.95855226],
       [-1.76392943,  0.79567636, -0.9577366 , -0.18890951, -0.27755406],
       [ 0.51693098,  0.50700864, -0.74811793, -0.80596897,  1.0689386 ]])
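
The cells above use the module-level functions with a global seed. Recent versions of NumPy also offer a generator object, created with `np.random.default_rng`, that keeps its own state. The sketch below redoes some of the draws with it, assuming a NumPy version that provides this API; the numbers will differ from the ones above even with the same seed.

```python
import numpy as np

rng = np.random.default_rng(77)                   # generator with its own seed

uniform = rng.random((3, 4))                      # uniform in [0, 1)
ints = rng.integers(1, 6, size=10)                # low included, high excluded
normal = rng.normal(loc=1, scale=1.5, size=30)    # mean and standard deviation

print(uniform.shape, ints, normal.mean())
```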

## Linear regression and polynomial fit

We can also perform linear regression with `numpy.polyfit()`.  
For more, see the [`numpy.polyfit` documentation](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.polyfit.html)

In [63]:
x = np.array([-8, -2, 3, 4, 5, 6])
y = x * 2

fit_deg1 = np.polyfit(x, y, 1)
fit_deg1                         # Polynomial coefficients, highest degree first

array([2.00000000e+00, 7.25194643e-16])

In [64]:
fit_deg2 = np.polyfit(x, y, 2)
fit_deg2

array([ 3.40835245e-16,  2.00000000e+00, -2.45819666e-15])

In [65]:
fit_deg3 = np.polyfit(x, y, 3)
fit_deg3

array([-1.78591839e-17, -1.06832092e-15,  2.00000000e+00,  3.87779304e-14])
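
In all three fits the only coefficient that is not numerically zero is the one of the linear term, which is what we expect since `y` is exactly `2 * x`. As a sketch, the fitted coefficients can be turned back into predictions with `np.polyval` (the names `coeffs` and `y_hat` are only illustrative):

```python
import numpy as np

x = np.array([-8, -2, 3, 4, 5, 6])
y = x * 2

coeffs = np.polyfit(x, y, 1)      # [slope, intercept]
y_hat = np.polyval(coeffs, x)     # evaluate the fitted polynomial at x

print(np.allclose(y_hat, y))      # the line reproduces the data
print(np.polyval(coeffs, 10))     # prediction for a new point
```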