# PytzMLS2018: Python for ML and DS Research - Numpy and Scipy basics

<center>**Anthony Faustine (sambaiga@gmail.com)**</center>

## 2.1. Numpy Basics

NumPy is the fundamental package for scientific computing with Python. It provides high-performance
vector, matrix, and higher-dimensional data
structures and offers MATLAB-like capabilities within Python.

It contains among other things:

* a powerful N-dimensional array/vector/matrix object
* sophisticated (broadcasting) functions
* functions implemented in C/Fortran, which give good performance when code is vectorized
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities

This style of programming is also known as *array-oriented computing*. The recommended convention for importing NumPy is:

In [1]:
import numpy as np
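
To make the point about vectorization concrete, here is a minimal sketch comparing an explicit Python loop with the equivalent array expression. The variable names are illustrative only; both versions compute the same number, but the second hands the whole computation to NumPy's compiled routines.

```python
import numpy as np

values = np.arange(1000, dtype=np.float64)

# Pure-Python loop: the interpreter handles one element at a time
loop_total = 0.0
for value in values:
    loop_total += value * 2.0

# Vectorized: the multiplication and the sum both run in compiled code
vectorized_total = (values * 2.0).sum()

print(loop_total, vectorized_total)
```

On large arrays the vectorized form is usually much faster, although the exact speed-up depends on the machine and the operation.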

### 2.1.1 Creating NumPy arrays

There are a number of ways to initialize new NumPy arrays, for example:

* from a Python list or tuple, or
* using functions dedicated to generating NumPy arrays, such as arange, linspace, empty, zeros, etc.

#### array from list

In [2]:
# a vector
v = np.array([0.5,0.8,2,1])
print(v)

[ 0.5  0.8  2.   1. ]


In [3]:
# a matrix
M = np.array([[1, 2], [3, 4]])
print(M)

[[1 2]
 [3 4]]


In [4]:
# a 3x3 matrix (a 2D array built from nested lists)
N = np.array([[0.2,0.4,2],[0.1,2,5],[3,0.4,0.1]])
print(N)

[[ 0.2  0.4  2. ]
 [ 0.1  2.   5. ]
 [ 3.   0.4  0.1]]


#### array from dedicated functions

In [5]:
#Evenly spaced array (arange)
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [6]:
# create a range
x = np.arange(-1, 1, 0.1) # arguments: start, stop, step
print(x)

[ -1.00000000e+00  -9.00000000e-01  -8.00000000e-01  -7.00000000e-01
  -6.00000000e-01  -5.00000000e-01  -4.00000000e-01  -3.00000000e-01
  -2.00000000e-01  -1.00000000e-01  -2.22044605e-16   1.00000000e-01
   2.00000000e-01   3.00000000e-01   4.00000000e-01   5.00000000e-01
   6.00000000e-01   7.00000000e-01   8.00000000e-01   9.00000000e-01]
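
The `-2.22044605e-16` in the output above, where an exact zero might be expected, comes from floating-point rounding with a non-integer step. A hedged sketch of one common alternative: ask `np.linspace` for a number of points instead of a step size. The argument `endpoint=False` is used here only to mimic the half-open range of `arange`.

```python
import numpy as np

# A float step can accumulate rounding error, so values may be slightly off
x_arange = np.arange(-1, 1, 0.1)

# linspace takes the number of points; endpoint=False excludes the stop value
x_linspace = np.linspace(-1, 1, 20, endpoint=False)

print(x_arange.shape, x_linspace.shape)
print(np.allclose(x_arange, x_linspace))
```

The two arrays agree to within rounding error; `linspace` is generally the safer choice when the number of samples matters more than the exact step.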


In [7]:
# using linspace, both end points ARE included
np.linspace(0, 10, 25)

array([  0.        ,   0.41666667,   0.83333333,   1.25      ,
         1.66666667,   2.08333333,   2.5       ,   2.91666667,
         3.33333333,   3.75      ,   4.16666667,   4.58333333,
         5.        ,   5.41666667,   5.83333333,   6.25      ,
         6.66666667,   7.08333333,   7.5       ,   7.91666667,
         8.33333333,   8.75      ,   9.16666667,   9.58333333,  10.        ])

In [8]:
#zeros array
np.zeros((2,3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [9]:
np.ones((3,3))

array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [10]:
# uninitialized array: the contents are arbitrary and not guaranteed to be zeros
np.empty((10,2))

array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
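
Although the output above happens to show zeros, `np.empty` only allocates memory and does not set the values. The sketch below, with illustrative variable names, shows the usual pattern: either fill the array before reading it, or request a known starting value with `np.zeros` or `np.full`.

```python
import numpy as np

# np.empty only allocates memory; the initial contents are arbitrary
scratch = np.empty((3, 2))

# Fill it explicitly before reading from it
scratch[:] = 0.0

# When a known starting value is needed, ask for it directly
zeros = np.zeros((3, 2))
sevens = np.full((3, 2), 7.0)

print(scratch)
print(sevens)
```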

#### Random numbers and seeds

In [11]:
# uniform random numbers in [0, 1), i.e. 1 is excluded
np.random.rand(5,5)

array([[ 0.98772181,  0.04017517,  0.36968699,  0.73573467,  0.35869779],
       [ 0.82253892,  0.162705  ,  0.16796984,  0.18668452,  0.99091935],
       [ 0.58313762,  0.66915286,  0.32057491,  0.37676649,  0.31806669],
       [ 0.10545221,  0.27485173,  0.89989117,  0.04186545,  0.85368306],
       [ 0.5046606 ,  0.8057503 ,  0.42752001,  0.49089731,  0.03309212]])

In [12]:
# standard normal distributed random numbers
np.random.randn(5,5)

array([[ 0.80279764, -0.0981139 , -0.97191092,  1.47711128,  0.46109836],
       [-1.26301208,  0.29733906, -1.47306489, -0.31168045, -0.37385879],
       [-0.45475429,  0.74344564, -0.72938988,  0.45724339,  0.56973645],
       [-1.8113575 , -0.40240674, -0.40110451,  0.93688043, -1.54581273],
       [ 0.54554045,  0.56179021,  0.89179139,  0.25292755,  2.0913259 ]])

#### Random seed

Setting a seed makes the sequence of random numbers repeatable, so results can be reproduced.

In [17]:
np.random.seed(77)
x=np.random.rand(8,2)
print(x)

[[ 0.91910903  0.6421956 ]
 [ 0.75371223  0.13931457]
 [ 0.08731955  0.78800206]
 [ 0.32615094  0.54106782]
 [ 0.24023518  0.54542293]
 [ 0.4005545   0.71519189]
 [ 0.83667994  0.58848114]
 [ 0.29615456  0.28101769]]
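
A short sketch of what "repeatable" means in practice: re-seeding with the same value replays the same numbers. The second half uses `np.random.default_rng`, the Generator interface available in NumPy 1.17 and later; it is not used elsewhere in this notebook and is shown only as an alternative that avoids global state.

```python
import numpy as np

# Re-seeding the global generator replays the same sequence
np.random.seed(77)
first = np.random.rand(3)
np.random.seed(77)
second = np.random.rand(3)
same_legacy = np.array_equal(first, second)
print(same_legacy)

# Generator API: the seed lives in an object rather than in global state
rng_a = np.random.default_rng(77)
rng_b = np.random.default_rng(77)
same_generator = np.array_equal(rng_a.random(3), rng_b.random(3))
print(same_generator)
```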


### Shape, size, dimension and dtype

In [18]:
print(x)

[[ 0.91910903  0.6421956 ]
 [ 0.75371223  0.13931457]
 [ 0.08731955  0.78800206]
 [ 0.32615094  0.54106782]
 [ 0.24023518  0.54542293]
 [ 0.4005545   0.71519189]
 [ 0.83667994  0.58848114]
 [ 0.29615456  0.28101769]]


In [19]:
x.shape

(8, 2)

In [20]:
x.size

16

In [21]:
x.ndim

2

In [22]:
x.dtype

dtype('float64')

####  Shape Manipulation
The shape of an array can be changed with various commands:

In [23]:
x = np.random.rand(20)
print(x)

[ 0.70559724  0.42259643  0.05731599  0.74702731  0.45231301  0.17577474
  0.049377    0.29247534  0.06679913  0.75115649  0.06377152  0.43190832
  0.36417241  0.15197153  0.54671034  0.44329304  0.03606131  0.82289319
  0.27329268  0.16898522]


In [24]:
x.shape

(20,)

In [31]:
x_new=x.reshape(-1,1)

In [34]:
x_new.shape

(20, 1)

In [35]:
x = np.random.rand(10, 2)
print(x)

[[ 0.64436975  0.10754108]
 [ 0.3532451   0.38570366]
 [ 0.44555591  0.97705266]
 [ 0.72939401  0.31223506]
 [ 0.89475524  0.7832736 ]
 [ 0.26200034  0.30948319]
 [ 0.12945063  0.42217136]
 [ 0.93976503  0.36704287]
 [ 0.43477497  0.91709355]
 [ 0.94729392  0.25477295]]


In [40]:
y=x.flatten()
print(y)

[ 0.64436975  0.10754108  0.3532451   0.38570366  0.44555591  0.97705266
  0.72939401  0.31223506  0.89475524  0.7832736   0.26200034  0.30948319
  0.12945063  0.42217136  0.93976503  0.36704287  0.43477497  0.91709355
  0.94729392  0.25477295]


In [39]:
y.shape

(20,)

In [41]:
x.shape

(10, 2)

In [42]:
x.reshape(-1,1)

array([[ 0.64436975],
       [ 0.10754108],
       [ 0.3532451 ],
       [ 0.38570366],
       [ 0.44555591],
       [ 0.97705266],
       [ 0.72939401],
       [ 0.31223506],
       [ 0.89475524],
       [ 0.7832736 ],
       [ 0.26200034],
       [ 0.30948319],
       [ 0.12945063],
       [ 0.42217136],
       [ 0.93976503],
       [ 0.36704287],
       [ 0.43477497],
       [ 0.91709355],
       [ 0.94729392],
       [ 0.25477295]])
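
Two details of the shape commands above are worth a small sketch: `-1` asks NumPy to infer that dimension from the array size, and `flatten` always returns a copy, whereas `ravel` returns a view when the memory layout allows it. The small integer array here is illustrative only.

```python
import numpy as np

x = np.arange(6).reshape(3, 2)

# -1 lets NumPy infer the missing dimension (6 elements / 2 rows = 3 columns)
inferred_shape = x.reshape(2, -1).shape
print(inferred_shape)

flat_copy = x.flatten()  # always a new array
flat_view = x.ravel()    # a view here, because x is contiguous in memory

x[0, 0] = 99             # modify the original

print(flat_copy[0])      # the copy is unaffected
print(flat_view[0])      # the view sees the change
```

Use `flatten` when the result must be independent of the original, and `ravel` when avoiding a copy matters.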

#### vstack and hstack

In [43]:
x = np.ones((5, 2))
print(x)

[[ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]]


In [44]:
y = np.zeros((5, 2))
print(y)

[[ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]


In [45]:
z = np.hstack((x,y))
print(z)

[[ 1.  1.  0.  0.]
 [ 1.  1.  0.  0.]
 [ 1.  1.  0.  0.]
 [ 1.  1.  0.  0.]
 [ 1.  1.  0.  0.]]


In [46]:
z = np.vstack((x,y))
print(z)

[[ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 1.  1.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]
 [ 0.  0.]]
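
For 2D arrays, `hstack` and `vstack` give the same result as `np.concatenate` along axis 1 and axis 0 respectively. A minimal sketch, reusing the shapes from the cells above:

```python
import numpy as np

x = np.ones((5, 2))
y = np.zeros((5, 2))

# axis=1 joins columns side by side, like np.hstack for 2D arrays
side_by_side = np.concatenate((x, y), axis=1)

# axis=0 appends rows, like np.vstack for 2D arrays
stacked = np.concatenate((x, y), axis=0)

print(side_by_side.shape)
print(stacked.shape)
```

The arrays must agree in every dimension except the one being joined.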


### Indexing and slicing

In [52]:
data = np.random.randint(25,37, size=10)
print(data)

[34 26 28 28 32 29 34 33 36 29]


In [53]:
#print the first sensor data
print(data[0])

34


In [54]:
# print the data from index 3 up to, but not including, index 7
print(data[3:7])

[28 32 29 34]


In [55]:
# print the last three values
print(data[7:])

[33 36 29]


In [56]:
# We can also use negative index
print(data[-1])

29


A multidimensional array behaves like a dataframe or matrix (i.e. it has rows and columns). Consider the following 2D array.

In [57]:
data = np.random.randint(25,37, size=(10,3))
print(data)

[[27 26 27]
 [25 27 34]
 [32 29 25]
 [32 26 27]
 [34 31 29]
 [35 36 26]
 [31 26 26]
 [36 26 29]
 [33 28 36]
 [28 35 25]]


In [62]:
# View the first column of the array
data[:,0]

array([27, 25, 32, 32, 34, 35, 31, 36, 33, 28])

In [60]:
# View the first row of the array
data[0,:]

array([27, 26, 27])

In [63]:
# View the first two rows
data[:2,]

array([[27, 26, 27],
       [25, 27, 34]])

In [64]:
# View the first element (row 0, column 0)
data[0,0]

27
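
One behaviour that the slicing cells above do not show is that a basic slice is a view onto the original array, not a copy. The sketch below uses a small illustrative array; call `.copy()` when an independent array is needed.

```python
import numpy as np

data = np.arange(12).reshape(4, 3)

first_column = data[:, 0]              # basic slice: a view
first_column_copy = data[:, 0].copy()  # independent copy

first_column[0] = -1                   # writes through to data

print(data[0, 0])                      # changed through the view
print(first_column_copy[0])            # the copy keeps the old value

# Slices also accept a step: every second row
every_other_row = data[::2, :]
print(every_other_row)
```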

#### Fancy indexing with boolean masks

In [65]:
# view all values that are less than 30
mask = data<30
data[mask]

array([27, 26, 27, 25, 27, 29, 25, 26, 27, 29, 26, 26, 26, 26, 29, 28, 28,
       25])

In [66]:
if (data > 30).any():
    print("at least one element in data is larger than 30")
else:
    print("no element in data is larger than 30")

at least one element in data is larger than 30
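
Masks can be combined and counted, and rows can also be picked by a list of integer positions. A brief sketch on a small hand-written array (the values are taken from the first three rows printed above, but any integers would do):

```python
import numpy as np

data = np.array([[27, 26, 27],
                 [25, 27, 34],
                 [32, 29, 25]])

# Combine conditions with & (and) or | (or); each needs its own parentheses
in_range = (data >= 26) & (data < 30)
selected = data[in_range]
print(selected)

# A boolean mask sums to the number of True entries
match_count = in_range.sum()
print(match_count)

# .all() checks that every element satisfies the condition
all_at_least_25 = (data >= 25).all()
print(all_at_least_25)

# Integer-array indexing selects rows by position
rows = data[[0, 2]]
print(rows)
```

Python's `and` and `or` keywords do not work element-wise on arrays, which is why `&` and `|` are used here.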


## Save and load NumPy data to/from file

In [67]:
np.save("../data/sensor_data.npy",data)

In [68]:
sensor_data = np.load("../data/sensor_data.npy")
print(sensor_data)

[[27 26 27]
 [25 27 34]
 [32 29 25]
 [32 26 27]
 [34 31 29]
 [35 36 26]
 [31 26 26]
 [36 26 29]
 [33 28 36]
 [28 35 25]]
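
The round trip can be checked explicitly. The sketch below writes to a temporary directory rather than `../data/`, so it runs anywhere; the file names are illustrative. It also shows `np.savetxt` and `np.loadtxt` as a plain-text alternative when the file must be readable by other tools.

```python
import os
import tempfile

import numpy as np

data = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    # The .npy format stores dtype and shape, so the round trip is exact
    npy_path = os.path.join(tmp_dir, "sensor_data.npy")
    np.save(npy_path, data)
    restored = np.load(npy_path)

    # Plain text does not record the dtype, so it is given again on load
    csv_path = os.path.join(tmp_dir, "sensor_data.csv")
    np.savetxt(csv_path, data, delimiter=",", fmt="%d")
    from_text = np.loadtxt(csv_path, delimiter=",", dtype=int)

print(np.array_equal(data, restored))
print(np.array_equal(data, from_text))
```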


### Calculations

It is often useful to store datasets in NumPy arrays. NumPy provides a number of functions for calculating statistics over the data in an array.

In [69]:
#mean
sensor_data.mean()

29.566666666666666

In [70]:
#std
sensor_data.std()

3.7477400597634243

In [71]:
#min
sensor_data.min()

25

In [72]:
#max
sensor_data.max()

36
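
The calls above reduce the whole array to a single number. Most of these methods also accept an `axis` argument to compute one statistic per column or per row. A minimal sketch on a small hand-written array:

```python
import numpy as np

sensor_data = np.array([[27, 26, 27],
                        [25, 27, 34],
                        [32, 29, 25]])

# axis=0 collapses the rows: one value per column
column_means = sensor_data.mean(axis=0)
print(column_means)

# axis=1 collapses the columns: one value per row
row_max = sensor_data.max(axis=1)
print(row_max)

# argmax returns the position of the largest value in the flattened array
largest_position = sensor_data.argmax()
print(largest_position)

# std() is the population standard deviation by default; ddof=1 gives the sample version
sample_std = sensor_data.std(ddof=1)
print(sample_std)
```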

### NumPy calculations are element-wise

In [73]:
x = np.arange(1,10)
print(x)

[1 2 3 4 5 6 7 8 9]


In [74]:
print(x+2)

[ 3  4  5  6  7  8  9 10 11]


In [75]:
# element-wise square, equivalent to x**2
np.square(x)

array([ 1,  4,  9, 16, 25, 36, 49, 64, 81])

In [76]:
np.log(x)

array([ 0.        ,  0.69314718,  1.09861229,  1.38629436,  1.60943791,
        1.79175947,  1.94591015,  2.07944154,  2.19722458])
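
The same element-wise rule applies between two arrays, which means `*` is not matrix multiplication; the `@` operator is used for that. The sketch also shows broadcasting, where a smaller array is stretched to match a larger one. The arrays are illustrative.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

elementwise = a * b   # multiplies matching entries
matrix_product = a @ b  # rows of a times columns of b

print(elementwise)
print(matrix_product)

# Broadcasting: the 1D row is added to every row of a
row = np.array([100, 200])
shifted = a + row
print(shifted)
```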

## 2.2. SciPy basics

[SciPy](https://docs.scipy.org/doc/scipy/reference/tutorial/general.html): a collection of high-level mathematical routines for tasks such as linear algebra, optimization, and signal processing.

[List of scipy modules](https://docs.scipy.org/doc/scipy/reference/tutorial/index.html)
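
As a small taste of two of those modules, the sketch below solves a linear system with `scipy.linalg.solve` and minimizes a one-dimensional function with `scipy.optimize.minimize_scalar`. The system and the function are made up for illustration and assume SciPy is installed alongside NumPy.

```python
import numpy as np
from scipy import linalg, optimize

# Solve the linear system A x = b
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)
print(x)

# Minimize a one-dimensional function; (t - 2)^2 has its minimum at t = 2
result = optimize.minimize_scalar(lambda t: (t - 2.0) ** 2)
print(result.x)
```

SciPy functions take and return NumPy arrays, so everything from section 2.1 carries over.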

## References

- [python4datascience-atc](https://github.com/pythontz/python4datascience-atc)
- [PythonDataScienceHandbook](https://github.com/jakevdp/PythonDataScienceHandbook)
- [DS-python-data-analysis](https://github.com/jorisvandenbossche/DS-python-data-analysis)