# PytzMLS2018: Python for ML and DS Research - Numpy and Scipy basics

<center>**Anthony Faustine (sambaiga@gmail.com)**</center>

## 2.1. Numpy Basics

NumPy is the fundamental package for scientific computing with Python. It provides high-performance
vector, matrix, and higher-dimensional data
structures and offers MATLAB-like capabilities within Python.

It contains among other things:

* a powerful N-dimensional array/vector/matrix object
* sophisticated (broadcasting) functions
* functions implemented in C/Fortran, which give good performance when code is vectorized
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities

Working with whole arrays instead of explicit loops is also known as *array-oriented computing*. The recommended convention for importing numpy is:

In [1]:
import numpy as np
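
As a minimal sketch of what "vectorized" means here, the example below computes the same squares twice: once with a Python loop and once with a single array expression. The variable names are illustrative only.

```python
import numpy as np

values = np.arange(1, 6)          # array([1, 2, 3, 4, 5])

# Pure-Python loop: one interpreted operation per element
squares_loop = [int(v) ** 2 for v in values]

# Vectorized: one array expression, evaluated in compiled code
squares_vec = values ** 2

print(squares_loop)   # [1, 4, 9, 16, 25]
print(squares_vec)    # [ 1  4  9 16 25]
```

On large arrays the vectorized form is usually much faster, because the loop over elements happens inside NumPy rather than in the Python interpreter.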

### 2.1.1 Creating numpy arrays

There are a number of ways to initialize new numpy arrays, for example:

* from a Python list or tuple, or
* using functions dedicated to generating numpy arrays, such as `arange`, `linspace`, `empty`, `zeros`, etc.

#### array from list

In [2]:
# a vector
v = np.array([0.5,0.8,2,1])
print(v)

[ 0.5  0.8  2.   1. ]


In [3]:
# a matrix
M = np.array([[1, 2], [3, 4]])
print(M)

[[1 2]
 [3 4]]


In [4]:
# a larger two-dimensional array (3x3) with float entries
N = np.array([[0.2,0.4,2],[0.1,2,5],[3,0.4,0.1]])
print(N)

[[ 0.2  0.4  2. ]
 [ 0.1  2.   5. ]
 [ 3.   0.4  0.1]]


#### use specific function

In [5]:
#Evenly spaced array (arange)
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [6]:
# create a range
x = np.arange(-1, 1, 0.1) # arguments: start, stop, step
print(x)

[ -1.00000000e+00  -9.00000000e-01  -8.00000000e-01  -7.00000000e-01
  -6.00000000e-01  -5.00000000e-01  -4.00000000e-01  -3.00000000e-01
  -2.00000000e-01  -1.00000000e-01  -2.22044605e-16   1.00000000e-01
   2.00000000e-01   3.00000000e-01   4.00000000e-01   5.00000000e-01
   6.00000000e-01   7.00000000e-01   8.00000000e-01   9.00000000e-01]
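
The `-2.22044605e-16` in the output above is floating-point rounding error: with a non-integer step, `arange` does not land exactly on zero. A possible alternative, sketched below, is `linspace`, which takes the number of points instead of the step; `endpoint=False` is used here only to mimic the half-open interval of `arange`.

```python
import numpy as np

# Float step: values are computed from start and step, so rounding error appears
x_arange = np.arange(-1, 1, 0.1)

# linspace takes the number of points; endpoint=False excludes the stop value,
# matching arange's half-open interval [-1, 1)
x_linspace = np.linspace(-1, 1, 20, endpoint=False)

print(x_arange.shape, x_linspace.shape)     # (20,) (20,)
print(np.allclose(x_arange, x_linspace))    # True
```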


In [None]:
# using linspace, both end points ARE included
np.linspace(0, 10, 25)

In [None]:
#zeros array
np.zeros((2,3))

In [None]:
np.ones((3,3))

In [None]:
# uninitialized array: the values are whatever happens to be in memory
np.empty((10,2))
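
Because `np.empty` only allocates memory, its contents should be overwritten before they are read. A small sketch of that pattern (the name `buffer` is just for illustration):

```python
import numpy as np

buffer = np.empty((10, 2))   # memory is allocated but NOT initialized
buffer[:] = 0.0              # fill every element before reading from it

print(np.array_equal(buffer, np.zeros((10, 2))))   # True
```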

#### Random numbers and seeds

In [None]:
# uniform random numbers in the half-open interval [0, 1)
np.random.rand(5,5)

In [None]:
# standard normal distributed random numbers
np.random.randn(5,5)

#### Random seed

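Setting the seed fixes the starting state of the random number generator, so the same "random" numbers are produced on every run. Use it when you want repeatable (reproducible) results.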

In [None]:
np.random.seed(77)
x=np.random.rand(8,2)
print(x)
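
To check that seeding really gives repeatable results, the sketch below draws the same array twice after resetting the seed. The last lines show `np.random.default_rng`, the generator-based interface that newer NumPy code often prefers; it is not used elsewhere in this notebook.

```python
import numpy as np

np.random.seed(77)
first = np.random.rand(8, 2)

np.random.seed(77)            # reset to the same seed
second = np.random.rand(8, 2)

print(np.array_equal(first, second))   # True

# Alternative: a local Generator object instead of the global seed
rng = np.random.default_rng(77)
sample = rng.random((8, 2))
print(sample.shape)                    # (8, 2)
```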

### Shape, size, dimension and dtype

In [None]:
print(x)

In [None]:
x.shape

In [None]:
x.size

In [None]:
x.ndim

In [None]:
x.dtype
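
For reference, here is what these four attributes return for an array whose layout is known in advance (a sketch using an 8x2 array of zeros rather than the random `x` above):

```python
import numpy as np

x = np.zeros((8, 2))

print(x.shape)   # (8, 2)   -> length of each axis
print(x.size)    # 16       -> total number of elements
print(x.ndim)    # 2        -> number of axes
print(x.dtype)   # float64  -> type of each element
```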

####  Shape Manipulation
The shape of an array can be changed with various commands:

In [None]:
x = np.random.rand(20)
print(x)

In [None]:
x.shape

In [None]:
x_new=x.reshape(-1,1)

In [None]:
x_new.shape
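
In `reshape`, a `-1` means "infer this dimension from the array size". A short sketch with a deterministic array:

```python
import numpy as np

x = np.arange(20)            # shape (20,)

column = x.reshape(-1, 1)    # 20 rows, 1 column: -1 is inferred as 20
grid = x.reshape(4, -1)      # 4 rows: -1 is inferred as 5

print(column.shape)   # (20, 1)
print(grid.shape)     # (4, 5)
```

Only one dimension may be `-1`, and the total number of elements must stay the same.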

In [None]:
x = np.random.rand(10, 2)
print(x)

In [None]:
x.flatten()
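
`flatten` always returns a copy. The related method `ravel` returns a view of the original data when it can, which avoids copying. The sketch below illustrates the difference on a small contiguous array:

```python
import numpy as np

x = np.arange(6).reshape(3, 2)

flat_copy = x.flatten()   # independent copy
flat_view = x.ravel()     # view for a contiguous array like this one

flat_copy[0] = 99         # does not touch x
print(x[0, 0])            # 0

print(np.shares_memory(x, flat_copy))   # False
print(np.shares_memory(x, flat_view))   # True
```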

#### vstack and hstack

In [None]:
x = np.ones((5, 2))
print(x)

In [None]:
y = np.zeros((5, 2))
print(y)

In [None]:
z = np.hstack((x,y))
print(z)

In [None]:
z = np.vstack((x,y))
print(z)
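
The resulting shapes make the difference clear: `hstack` joins arrays side by side (more columns), while `vstack` stacks them on top of each other (more rows). A sketch, with `np.concatenate` shown as the general form:

```python
import numpy as np

x = np.ones((5, 2))
y = np.zeros((5, 2))

side_by_side = np.hstack((x, y))
stacked = np.vstack((x, y))

print(side_by_side.shape)   # (5, 4)
print(stacked.shape)        # (10, 2)

# For 2-D arrays, hstack is concatenate along axis 1
same = np.array_equal(side_by_side, np.concatenate((x, y), axis=1))
print(same)                 # True
```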

### Indexing and slicing

In [None]:
data = np.random.randint(25,37, size=10)
print(data)

In [None]:
#print the first sensor data
print(data[0])

In [None]:
# print data from index 3 up to, but not including, index 7
print(data[3:7])

In [None]:
# print the last three values (indices 7, 8 and 9)
print(data[7:])

In [None]:
# We can also use negative index
print(data[-1])
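
Since the array above is random, the sketch below repeats the same slices on fixed values so the results can be checked by eye. The values are made up for illustration.

```python
import numpy as np

data = np.array([25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

print(data[3:7])    # [28 29 30 31]  -> stop index 7 is excluded
print(data[-3:])    # [32 33 34]     -> last three values
print(data[::2])    # [25 27 29 31 33] -> every second value
```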

A two-dimensional array behaves like a dataframe or matrix: it has rows and columns, and it is indexed as `array[row, column]`. Consider the following 2D array.

In [None]:
data = np.random.randint(25,37, size=(10,3))
print(data)

In [None]:
# View the first column of the array
data[:,0]

In [None]:
# View the first row of the array
data[0,]

In [None]:
# View the first two rows
data[:2,]

In [None]:
# View the first element (row 0, column 0)
data[0,0]
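
The same row and column selections on a small deterministic array, as a sketch (the 4x3 array is illustrative, not the sensor data above):

```python
import numpy as np

data = np.arange(12).reshape(4, 3)
# rows: [0 1 2], [3 4 5], [6 7 8], [9 10 11]

first_column = data[:, 0]     # every row, column 0
first_row = data[0, :]        # row 0, every column
block = data[:2, 1:]          # first two rows, last two columns

print(first_column)   # [0 3 6 9]
print(first_row)      # [0 1 2]
print(block.shape)    # (2, 2)
print(data[-1, -1])   # 11
```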

#### Fancy indexing

In [None]:
## view all data that is less than 30
mask = data<30
data[mask]

In [None]:
if (data > 30).any():
    print("at least one element in data is larger than 30")
else:
    print("no element in data is larger than 30")
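
A sketch of two related selection styles on fixed values: combining boolean conditions with `&` (each condition needs its own parentheses), and indexing with a list of integer positions. The array contents are made up for illustration.

```python
import numpy as np

data = np.array([[25, 31],
                 [36, 28],
                 [30, 33]])

below_30 = data[data < 30]                      # boolean mask
in_range = data[(data >= 28) & (data <= 31)]    # combined conditions
rows_0_and_2 = data[[0, 2]]                     # integer (list) indexing

print(below_30)             # [25 28]
print(in_range)             # [31 28 30]
print(rows_0_and_2.shape)   # (2, 2)
```

Selecting with a mask always returns a one-dimensional array, with elements in row-major order.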

### Saving and loading numpy data to/from file

In [None]:
np.save("../data/sensor_data.npy",data)

In [None]:
sensor_data = np.load("../data/sensor_data.npy")
print(sensor_data)
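
The cells above assume that a `../data/` directory already exists. The sketch below shows the same round trip in a temporary directory, so it runs anywhere; the file name is reused only for illustration.

```python
import os
import tempfile

import numpy as np

data = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sensor_data.npy")
    np.save(path, data)          # binary .npy format keeps shape and dtype
    restored = np.load(path)

print(np.array_equal(data, restored))   # True
print(restored.dtype == data.dtype)     # True
```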

### calculations

Often it is useful to store datasets in Numpy arrays. Numpy provides a number of functions to calculate statistics of datasets in arrays. 

In [None]:
#mean
sensor_data.mean()

In [None]:
#std
sensor_data.std()

In [None]:
#min
sensor_data.min()

In [None]:
#max
sensor_data.max()
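
Called without arguments, these methods reduce over the whole array. With the `axis` argument they reduce along one dimension only, for example per column or per row. A sketch on made-up values:

```python
import numpy as np

sensor_data = np.array([[25, 30, 35],
                        [27, 32, 37]])

overall_mean = sensor_data.mean()         # over all six values
column_means = sensor_data.mean(axis=0)   # one value per column
row_max = sensor_data.max(axis=1)         # one value per row

print(overall_mean)   # 31.0
print(column_means)   # [26. 31. 36.]
print(row_max)        # [35 37]
```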

### Numpy calculations are element-wise

In [None]:
x = np.arange(1,10)
print(x)

In [None]:
print(x+2)

In [None]:
#print(x**2)
np.square(x)

In [None]:
np.log(x)
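
Element-wise operations also work between arrays of different but compatible shapes; this is the broadcasting mentioned at the start of the section. A minimal sketch:

```python
import numpy as np

x = np.arange(1, 4)               # shape (3,):   [1 2 3]
column = np.array([[10], [20]])   # shape (2, 1)

product = x * x                   # element-wise, not a dot product
table = column + x                # shapes (2, 1) and (3,) broadcast to (2, 3)

print(product)       # [1 4 9]
print(table.shape)   # (2, 3)
print(table)
# [[11 12 13]
#  [21 22 23]]
```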

## 2.2 Scipy basics

[SciPy](https://docs.scipy.org/doc/scipy/reference/tutorial/general.html) is a collection of high-level mathematical routines built on top of NumPy arrays, covering areas such as linear algebra, optimization, and signal processing.

[List of scipy modules](https://docs.scipy.org/doc/scipy/reference/tutorial/index.html)
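
As a small taste of two of these modules, the sketch below solves a linear system with `scipy.linalg.solve` and minimizes a one-variable function with `scipy.optimize.minimize_scalar`. The system and the function are made up for illustration, and SciPy is assumed to be installed.

```python
import numpy as np
from scipy import linalg, optimize

# Linear algebra: solve A @ solution = b
#   3x +  y = 9
#    x + 2y = 8
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)
print(solution)                 # approximately [2. 3.]

# Optimization: find the t that minimizes (t - 2)^2
result = optimize.minimize_scalar(lambda t: (t - 2.0) ** 2)
print(round(result.x, 4))       # approximately 2.0
```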

## References

- [python4datascience-atc](https://github.com/pythontz/python4datascience-atc)
- [PythonDataScienceHandbook](https://github.com/jakevdp/PythonDataScienceHandbook)
- [DS-python-data-analysis](https://github.com/jorisvandenbossche/DS-python-data-analysis)