# PytzMLS2018: Python for ML and DS Research - Numpy and Scipy basics

<center>**Anthony Faustine (sambaiga@gmail.com)**</center>

## 2.1. Numpy Basics

NumPy is the fundamental package for scientific computing with Python. It provides high-performance
vector, matrix and higher-dimensional data
structures and offers MATLAB-like capabilities within Python.

It contains among other things:

* a powerful N-dimensional array/vector/matrix object
* sophisticated (broadcasting) functions
* functions implemented in C/Fortran, which give good performance when code is vectorized
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities

This style of programming is also known as *array-oriented computing*. The recommended convention for importing NumPy is:

In [3]:
import numpy as np
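
To make the vectorization and broadcasting points in the list above more concrete, here is a minimal sketch. The variable names are illustrative and are not part of the original notebook.

```python
import numpy as np

values = np.arange(5, dtype=float)

# Plain Python loop: squares one element at a time
squares_loop = np.array([v ** 2 for v in values])

# Vectorized: the same operation applied to the whole array at once
squares_vec = values ** 2

# Broadcasting: a (3,) row combined with a (2, 1) column gives a (2, 3) grid
row = np.array([0, 1, 2])
col = np.array([[10], [20]])
grid = row + col

print(squares_vec)  # [ 0.  1.  4.  9. 16.]
print(grid)
```

Both versions give the same squares, but the vectorized form pushes the loop into compiled code, which is usually where the performance benefit comes from.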

### 2.1.1. Creating NumPy arrays

There are a number of ways to initialize new NumPy arrays, for example:

* from a Python list or tuple, or
* using functions dedicated to generating NumPy arrays, such as arange, linspace, empty, zeros, etc.

#### array from list

In [5]:
# a vector
v = np.array([0.5,0.8,2,1])
print(v)

[0.5 0.8 2.  1. ]


In [6]:
# a matrix
M = np.array([[1, 2], [3, 4]])
print(M)

[[1 2]
 [3 4]]


In [7]:
# a larger 2-D array (3 x 3 matrix)
N = np.array([[0.2,0.4,2],[0.1,2,5],[3,0.4,0.1]])
print(N)

[[0.2 0.4 2. ]
 [0.1 2.  5. ]
 [3.  0.4 0.1]]


#### use specific function

In [8]:
#Evenly spaced array (arange)
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [9]:
# create a range
x = np.arange(-4, 4, 1) # arguments: start, stop, step
print(x)

[-4 -3 -2 -1  0  1  2  3]


In [10]:
# using linspace, both end points ARE included
np.linspace(0, 10, 25)

array([ 0.        ,  0.41666667,  0.83333333,  1.25      ,  1.66666667,
        2.08333333,  2.5       ,  2.91666667,  3.33333333,  3.75      ,
        4.16666667,  4.58333333,  5.        ,  5.41666667,  5.83333333,
        6.25      ,  6.66666667,  7.08333333,  7.5       ,  7.91666667,
        8.33333333,  8.75      ,  9.16666667,  9.58333333, 10.        ])
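
The difference between arange and linspace is mostly about how the end point is handled. A small sketch, with illustrative variable names:

```python
import numpy as np

# arange takes a step; the stop value is excluded
a = np.arange(0, 1, 0.25)

# linspace takes a number of points; the stop value is included by default
b = np.linspace(0, 1, 5)

# endpoint=False makes linspace exclude the stop value as well
c = np.linspace(0, 1, 4, endpoint=False)

print(a)  # [0.   0.25 0.5  0.75]
print(b)  # [0.   0.25 0.5  0.75 1.  ]
print(c)  # [0.   0.25 0.5  0.75]
```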

In [13]:
#zeros array
np.zeros((4,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [14]:
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [17]:
# uninitialized array: the values are arbitrary and only happen to be zeros here
np.empty((10,2))

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])
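
Because np.empty does not initialize its memory, the array should be filled before it is read. The following sketch shows one way to do that, and uses np.full as an alternative when a constant fill value is wanted. The fill value 7.0 is an arbitrary choice for illustration.

```python
import numpy as np

# Allocate without initializing; the contents are arbitrary at this point
buf = np.empty((3, 2))

# Overwrite every element before using the array
buf[:] = 7.0

# np.full allocates and fills in one step
full = np.full((3, 2), 7.0)

print(buf)
```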

#### Random numbers and seeds

In [18]:
# uniform random numbers in [0, 1)
np.random.rand(5,5)

array([[0.20798185, 0.12183353, 0.47271806, 0.9911703 , 0.84152715],
       [0.85477429, 0.06276868, 0.55210903, 0.30579402, 0.25538967],
       [0.49258281, 0.78547287, 0.71817556, 0.4984441 , 0.03936651],
       [0.61428967, 0.63684164, 0.22735893, 0.12557733, 0.0660922 ],
       [0.74243524, 0.4222157 , 0.2247728 , 0.36073419, 0.32185464]])

In [20]:
# standard normal distributed random numbers
np.random.randn(5,5)

array([[ 0.82797404,  1.36267956,  1.02981107, -1.32163615, -0.77788716],
       [ 0.41823386,  1.34345713,  0.40658163,  0.48527115, -0.72927228],
       [-0.30619807,  2.17719042,  0.29351588, -0.32999666, -2.02125836],
       [ 0.17323617,  1.81961907, -0.56150269,  0.16079983,  0.23810884],
       [ 0.09826608, -1.74533883, -1.4289133 , -1.22350873,  0.15183508]])

#### Random seed

Setting the seed makes the random number generator produce the same sequence on every run, which gives repeatable (reproducible) results.

In [21]:
np.random.seed(77)
x=np.random.rand(8,2)
print(x)

[[0.91910903 0.6421956 ]
 [0.75371223 0.13931457]
 [0.08731955 0.78800206]
 [0.32615094 0.54106782]
 [0.24023518 0.54542293]
 [0.4005545  0.71519189]
 [0.83667994 0.58848114]
 [0.29615456 0.28101769]]
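
A quick way to check what the seed does is to reset it and draw again. The sketch below also shows np.random.default_rng, the Generator-based interface available in newer NumPy versions (1.17 and later); using it here is a suggestion, not something the original notebook does.

```python
import numpy as np

# Legacy global seed: resetting it replays the same sequence
np.random.seed(77)
first = np.random.rand(3)
np.random.seed(77)
second = np.random.rand(3)
print(np.array_equal(first, second))  # True

# Generator API: each generator carries its own seeded state
rng_a = np.random.default_rng(77)
rng_b = np.random.default_rng(77)
draw_a = rng_a.random(3)
draw_b = rng_b.random(3)
print(np.array_equal(draw_a, draw_b))  # True
```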


### Shape, size, dimension and dtype

In [23]:
print(x)

[[0.91910903 0.6421956 ]
 [0.75371223 0.13931457]
 [0.08731955 0.78800206]
 [0.32615094 0.54106782]
 [0.24023518 0.54542293]
 [0.4005545  0.71519189]
 [0.83667994 0.58848114]
 [0.29615456 0.28101769]]


In [25]:
x.shape

(8, 2)

In [26]:
x.size

16

In [27]:
x.ndim

2

In [29]:
x.dtype

dtype('float64')
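
The dtype is inferred from the input data unless it is given explicitly. A minimal sketch with illustrative names:

```python
import numpy as np

ints = np.array([1, 2, 3])        # integer dtype (exact width depends on the platform)
floats = np.array([1, 2.5, 3])    # one float is enough to make the array float64

# astype returns a converted copy
as_float = ints.astype(np.float64)

# the dtype can also be requested at creation time
small = np.zeros((2, 3), dtype=np.float32)

print(floats.dtype)                 # float64
print(small.itemsize, small.nbytes) # 4 24
```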

#### Shape manipulation

The shape of an array can be changed with various commands:

In [30]:
x = np.random.rand(20)
print(x)

[0.70559724 0.42259643 0.05731599 0.74702731 0.45231301 0.17577474
 0.049377   0.29247534 0.06679913 0.75115649 0.06377152 0.43190832
 0.36417241 0.15197153 0.54671034 0.44329304 0.03606131 0.82289319
 0.27329268 0.16898522]


In [31]:
x.shape

(20,)

In [33]:
x_new=x.reshape(-1,1)

In [34]:
x_new.shape

(20, 1)

In [35]:
x_new

array([[0.70559724],
       [0.42259643],
       [0.05731599],
       [0.74702731],
       [0.45231301],
       [0.17577474],
       [0.049377  ],
       [0.29247534],
       [0.06679913],
       [0.75115649],
       [0.06377152],
       [0.43190832],
       [0.36417241],
       [0.15197153],
       [0.54671034],
       [0.44329304],
       [0.03606131],
       [0.82289319],
       [0.27329268],
       [0.16898522]])
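
In reshape, the value -1 asks NumPy to infer that dimension from the total number of elements. The sketch below uses np.arange(20) instead of random data so that the shapes are easy to follow.

```python
import numpy as np

x = np.arange(20)

col = x.reshape(-1, 1)    # 20 rows inferred, 1 column
grid = x.reshape(4, -1)   # 4 rows, 5 columns inferred
back = col.reshape(-1)    # back to a flat vector

print(col.shape)   # (20, 1)
print(grid.shape)  # (4, 5)
print(back.shape)  # (20,)
```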

In [36]:
x = np.random.rand(10, 2)
print(x)

[[0.64436975 0.10754108]
 [0.3532451  0.38570366]
 [0.44555591 0.97705266]
 [0.72939401 0.31223506]
 [0.89475524 0.7832736 ]
 [0.26200034 0.30948319]
 [0.12945063 0.42217136]
 [0.93976503 0.36704287]
 [0.43477497 0.91709355]
 [0.94729392 0.25477295]]


In [37]:
x.flatten()

array([0.64436975, 0.10754108, 0.3532451 , 0.38570366, 0.44555591,
       0.97705266, 0.72939401, 0.31223506, 0.89475524, 0.7832736 ,
       0.26200034, 0.30948319, 0.12945063, 0.42217136, 0.93976503,
       0.36704287, 0.43477497, 0.91709355, 0.94729392, 0.25477295])
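
A related function is ravel. As a rough rule, flatten always returns a copy, while ravel returns a view of the original data when the memory layout allows it. A small sketch of the difference, with illustrative names:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat = a.flatten()  # independent copy
rav = a.ravel()     # view here, because a is contiguous

flat[0] = 100       # does not touch a
rav[1] = 200        # writes through to a

print(a)
# [[  0 200   2]
#  [  3   4   5]]
```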

#### vstack and hstack

In [38]:
x = np.ones((5, 2))
print(x)

[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]]


In [39]:
y = np.zeros((5, 2))
print(y)

[[0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]


In [40]:
z = np.hstack((x,y))
print(z)

[[1. 1. 0. 0.]
 [1. 1. 0. 0.]
 [1. 1. 0. 0.]
 [1. 1. 0. 0.]
 [1. 1. 0. 0.]]


In [41]:
z = np.vstack((x,y))
print(z)

[[1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [1. 1.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]
 [0. 0.]]
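
For 2D arrays such as these, hstack and vstack correspond to np.concatenate along axis 1 and axis 0 respectively. A minimal check:

```python
import numpy as np

x = np.ones((5, 2))
y = np.zeros((5, 2))

side_by_side = np.concatenate((x, y), axis=1)  # same result as np.hstack((x, y))
stacked = np.concatenate((x, y), axis=0)       # same result as np.vstack((x, y))

print(side_by_side.shape)  # (5, 4)
print(stacked.shape)       # (10, 2)
```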


### Indexing and slicing

In [42]:
data = np.random.randint(25,37, size=10)
print(data)

[31 30 36 34 29 36 33 25 27 31]


In [44]:
# print the first sensor reading
print(data[0])

31


In [45]:
# print the data at indices 3 to 6 (the stop index 7 is excluded)
print(data[3:7])

[34 29 36 33]


In [47]:
# print the last three values
print(data[7:])

[25 27 31]


In [48]:
# We can also use negative index
print(data[-1])

31


A multidimensional array behaves like a dataframe or matrix (i.e. it has rows and columns). Consider the following 2D array.

In [49]:
data = np.random.randint(25,37, size=(10,3))
print(data)

[[31 28 29]
 [29 32 31]
 [31 31 36]
 [33 35 29]
 [27 25 36]
 [34 32 29]
 [32 32 29]
 [25 34 26]
 [28 28 32]
 [29 34 33]]


In [None]:
# View the first column of the array
data[:,0]

In [None]:
# View the first row of the array
data[0, :]

In [None]:
# View the first two rows
data[:2, :]

In [None]:
# View the element in the first row and first column
data[0,0]
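
Since the array above is random, here is the same row and column indexing on a small deterministic array, so the results can be checked by eye. The array and the names are illustrative only.

```python
import numpy as np

grid = np.arange(12).reshape(4, 3)
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]]

first_col = grid[:, 0]    # [0 3 6 9]
first_row = grid[0, :]    # [0 1 2]
top_two = grid[:2, :]     # first two rows, shape (2, 3)
block = grid[1:3, 1:]     # rows 1-2, columns 1-2: [[4 5] [7 8]]
corner = grid[0, 0]       # 0

print(first_col, first_row, corner)
```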

#### Fancy indexing with a boolean mask

In [None]:
# view all values that are less than 30
mask = data<30
data[mask]

In [None]:
if (data > 30).any():
    print("at least one element in data is larger than 30")
else:
    print("no element in data is larger than 30")
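
Masks can be combined and summarized in several ways. The sketch below uses a small fixed array (made-up values in the same range as the sensor data) so the results are reproducible.

```python
import numpy as np

readings = np.array([[31, 28, 29],
                     [29, 32, 31],
                     [25, 34, 26]])

below_30 = readings[readings < 30]                       # [28 29 29 25 26]

# Combine conditions with & (and) or | (or); the parentheses are required
in_range = readings[(readings >= 28) & (readings <= 31)]  # [31 28 29 29 31]

any_hot = (readings > 30).any()   # True: at least one value above 30
all_hot = (readings > 30).all()   # False: not every value is above 30

# Count matches per column by summing the boolean mask
per_col = (readings < 30).sum(axis=0)                    # [2 1 2]

# np.where picks between two values element by element
capped = np.where(readings > 30, 30, readings)

print(below_30, in_range, per_col)
```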

### Save and load NumPy data to/from file

In [None]:
np.save("../data/sensor_data.npy",data)

In [None]:
sensor_data = np.load("../data/sensor_data.npy")
print(sensor_data)
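
The cells above assume that a ../data directory already exists. As a self-contained sketch, the round trip can also be tried in a temporary directory. The file names are hypothetical, and np.savetxt/np.loadtxt are included as a plain-text alternative that the original notebook does not cover.

```python
import os
import tempfile

import numpy as np

readings = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Binary .npy format: keeps shape and dtype
    npy_path = os.path.join(tmp_dir, "sensor_data.npy")
    np.save(npy_path, readings)
    restored = np.load(npy_path)

    # Text format: human readable, but values are read back as floats
    csv_path = os.path.join(tmp_dir, "sensor_data.csv")
    np.savetxt(csv_path, readings, delimiter=",", fmt="%d")
    from_csv = np.loadtxt(csv_path, delimiter=",")

print(restored)
print(from_csv.dtype)  # float64
```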

### Calculations

Often it is useful to store datasets in NumPy arrays. NumPy provides a number of functions to calculate statistics of datasets in arrays.

In [None]:
#mean
sensor_data.mean()

In [None]:
#std
sensor_data.std()

In [None]:
#min
sensor_data.min()

In [None]:
#max
sensor_data.max()
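
Called without arguments, these methods reduce over the whole array. With the axis argument they work per column (axis=0) or per row (axis=1). A sketch on a small made-up array:

```python
import numpy as np

readings = np.array([[31, 28, 29],
                     [29, 32, 31],
                     [30, 30, 36]])

overall_mean = readings.mean()      # single number for the whole array
col_means = readings.mean(axis=0)   # one value per column: [30. 30. 32.]
row_max = readings.max(axis=1)      # one value per row: [31 32 36]

print(col_means)
print(row_max)
```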

### NumPy calculations are element-wise

In [None]:
x = np.arange(1,10)
print(x)

In [None]:
print(x+2)

In [None]:
# equivalent to x**2
np.square(x)

In [None]:
np.log(x)
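
Element-wise behaviour also applies to the * operator on 2D arrays, which is a common source of confusion for readers coming from MATLAB. The matrix product uses the @ operator instead. A short sketch:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

elementwise = A * B   # multiplies matching entries: [[0 2] [3 0]]
matrix_prod = A @ B   # matrix product:              [[2 1] [4 3]]

print(elementwise)
print(matrix_prod)
```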

## 2.2. Scipy Basics

[SciPy](https://docs.scipy.org/doc/scipy/reference/tutorial/general.html): a collection of high-level mathematical operations such as linear algebra, optimization and signal processing.

[List of scipy modules](https://docs.scipy.org/doc/scipy/reference/tutorial/index.html)
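
As a small taste of two of these modules, the sketch below solves a linear system with scipy.linalg and minimizes a one-dimensional function with scipy.optimize. The system and the function are made up for illustration, and SciPy is assumed to be installed.

```python
import numpy as np
from scipy import linalg, optimize

# Linear algebra: solve 3x + y = 9 and x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)
print(solution)  # approximately [2. 3.]

# Optimization: find the minimum of (t - 2)^2, which lies at t = 2
result = optimize.minimize_scalar(lambda t: (t - 2.0) ** 2)
print(result.x)
```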

## References

- [python4datascience-atc](https://github.com/pythontz/python4datascience-atc)
- [PythonDataScienceHandbook](https://github.com/jakevdp/PythonDataScienceHandbook)
- [DS-python-data-analysis](https://github.com/jorisvandenbossche/DS-python-data-analysis)