# Lesson 5: Practical Python libraries in data science
This lesson is mainly taken from <a href="https://www.w3schools.com/python/" target="_blank">this link</a>.


## 1. numpy


We already introduced this package in previous lessons. If you would like to learn more about it, take a look at <a href="https://docs.scipy.org/doc/numpy-1.15.1/user/quickstart.html" target="_blank">this link</a>.


## 2. scipy


**SciPy** is a scientific computation library that uses `NumPy` underneath.

### NumPy vs. SciPy

NumPy:

* NumPy is largely written in C and is used for mathematical and numerical calculation.
* Its array operations are generally much faster than equivalent pure-Python code.
* NumPy is one of the most widely used libraries in data science for basic calculations.
* NumPy is centered on the array data type, which supports basic operations such as sorting, reshaping, and indexing.

SciPy:

* SciPy is built on top of NumPy.
* SciPy provides a fully featured linear algebra module, while NumPy contains only a subset of those routines.
* Most new data science features are added to SciPy rather than NumPy.

SciPy is a collection of subpackages, including:

* `constants`: Physical and mathematical constants
* `optimize`: Optimization
* `signal`: Signal processing
* `fftpack`: Fourier transform (legacy; `scipy.fft` is its newer replacement)
* `sparse`: Sparse matrices
* `interpolate`: Interpolation
* `integrate`: Integration routines
* `io`: Data input and output
* `linalg`: Linear algebra routines
* `special`: Contains numerous functions of mathematical physics
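
As a rough illustration of the list above, the sketch below touches three of these subpackages. The particular functions (`constants.c`, `integrate.quad`, `special.gamma`) are picked here for illustration and are not taken from the lesson.

```python
import numpy as np
from scipy import constants, integrate, special

# Speed of light in vacuum, in metres per second
print(constants.c)

# Numerically integrate sin(x) from 0 to pi; the exact answer is 2
area, abs_err = integrate.quad(np.sin, 0, np.pi)
print(area)

# Gamma function: gamma(n) equals (n-1)! for positive integers
print(special.gamma(5))
```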

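Each subpackage should be imported explicitly, for example with `from scipy import optimize`. In older SciPy releases the subpackages are not imported automatically by `import scipy`, so the following use of `optimize` raises an `AttributeError`. Recent releases load subpackages lazily on first access, so the same cell may run without error there.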

In [None]:
import scipy
import math
def f(x):
    return (x**3 - 2*x**2+10-200*math.sin(x))
root=scipy.optimize.newton(f,-1)
print(root)
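
A minimal sketch of the fix, assuming the same function `f` and starting point as the cell above: import the `optimize` subpackage explicitly, which works regardless of the SciPy version. Since no derivative is passed, `newton` falls back to the secant method.

```python
import math
from scipy import optimize  # explicit subpackage import

def f(x):
    return x**3 - 2*x**2 + 10 - 200*math.sin(x)

# Start the search at x = -1, as in the cell above
root = optimize.newton(f, -1)

# f(root) should be very close to zero
print(root, f(root))
```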

### File input/output package

SciPy's I/O package, `scipy.io`, provides functions for working with a range of file formats, including MATLAB, ARFF, WAV, Matrix Market, IDL, and NetCDF. Plain text and CSV files are usually read with NumPy itself.

As an example, let's use the MATLAB `.mat` format, which is one of the most commonly used:

 


In [None]:
import numpy as np
from scipy import io as sio

array = np.ones((4, 4))
sio.savemat('example.mat', {'ar': array}) 

data = sio.loadmat('example.mat',struct_as_record=True)
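
To see what comes back, here is a small sketch: `loadmat` returns a dictionary whose keys are the saved variable names plus a few metadata entries such as `__header__`. The file name `example.mat` and the key `ar` follow the cell above; writing into a temporary directory is an assumption made here so that the sketch leaves no files behind.

```python
import os
import tempfile

import numpy as np
from scipy import io as sio

array = np.ones((4, 4))

# Save and reload inside a throwaway directory
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.mat')
    sio.savemat(path, {'ar': array})
    data = sio.loadmat(path)

# Keys include the variable name and metadata entries
print(sorted(data.keys()))

# The array round-trips with the same shape and values
loaded = data['ar']
print(loaded.shape)
print(np.array_equal(loaded, array))
```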


In [None]:
#help(sio.savemat)

### Linear Algebra with SciPy

* SciPy's linear algebra routines are built on the BLAS and LAPACK libraries (for example, an optimized implementation such as ATLAS).
* Because the heavy computation is done by these compiled libraries, the routines are very fast.
* The routines accept two-dimensional array objects. Many of them (such as `inv`) also return a two-dimensional array, while others (such as `det`) return a scalar.

Now let's try a few examples with `scipy.linalg`.

Calculating the determinant of a two-dimensional matrix:

In [None]:
from scipy import linalg
import numpy as np
#define square matrix
A = np.array([ [4,5], [3,2] ])
#pass values to det() function
linalg.det( A )

In [None]:
#compute the inverse of A (uses the linalg import from the previous cell)
linalg.inv(A)

In [None]:
eg_val, eg_vect = linalg.eig(A)
#get eigenvalues
print(eg_val)
#get eigenvectors
print(eg_vect)
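
As a quick sanity check on the three cells above, the sketch below verifies the results numerically: `A` times its inverse should be the identity, and each eigenpair should satisfy `A v = lambda v`. For this matrix the determinant is `4*2 - 5*3 = -7`.

```python
import numpy as np
from scipy import linalg

A = np.array([[4, 5], [3, 2]])

# det(A) = 4*2 - 5*3 = -7
print(np.isclose(linalg.det(A), -7))

# A times its inverse should give the 2x2 identity matrix
A_inv = linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))

# Each column of eg_vect is an eigenvector: A v = lambda v
eg_val, eg_vect = linalg.eig(A)
for lam, v in zip(eg_val, eg_vect.T):
    print(np.allclose(A @ v, lam * v))
```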

### Interpolation
There are several general interpolation facilities available in SciPy, for data in 1, 2, and higher dimensions:

* A class representing an interpolant (`interp1d`) in 1-D, offering several interpolation methods.

* Convenience function `griddata` offering a simple interface to interpolation in N dimensions (N = 1, 2, 3, 4, …). 

* Functions for 1- and 2-D (smoothed) cubic-spline interpolation, based on the FORTRAN library FITPACK.

* Interpolation using radial basis functions.

For a nice tutorial on `scipy.interpolate`, take a look at this link.


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

# Coarse sample points and their function values
x = np.linspace(0, 10, num=11, endpoint=True)
y = np.cos(-x**2/9.0)

# Linear (default) and cubic interpolants
f = interp1d(x, y)
f2 = interp1d(x, y, kind='cubic')

# Evaluate both interpolants on a finer grid
xnew = np.linspace(0, 10, num=101, endpoint=True)

plt.plot(x, y, 'o', xnew, f(xnew), '-', xnew, f2(xnew), '--')
plt.legend(['data', 'linear', 'cubic'], loc='best')
plt.show()
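
The `griddata` convenience function mentioned in the list above can be sketched in the same spirit for scattered 2-D data. The sample points and the plane `z = x + 2y` are hypothetical and chosen only to keep the result easy to check.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical scattered sample points in the unit square (corners included
# so that the query points below lie inside the convex hull)
rng = np.random.default_rng(0)
points = np.vstack([
    [[0, 0], [0, 1], [1, 0], [1, 1]],
    rng.random((20, 2)),
])

# Sample a simple plane z = x + 2y at those points
values = points[:, 0] + 2 * points[:, 1]

# Interpolate at two new locations
xi = np.array([[0.5, 0.5], [0.25, 0.75]])
zi = griddata(points, values, xi, method='linear')
print(zi)
```

Because the sampled function is a plane, linear interpolation should reproduce `x + 2y` at the query points up to rounding error. For curved data, the `method` argument also accepts `'nearest'` and `'cubic'`.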

## 3. matplotlib


Most of the `Matplotlib` utilities lie under the `pyplot` submodule, which is usually imported under the `plt` alias.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

x = np.array([0, 1, 2, 3])
y = np.array([3, 8, 1, 10])
plt.plot(x,y)

In [None]:

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])

plt.subplot(2, 2, 1)
plt.plot(x)

plt.subplot(2,2,2)
plt.plot(y, linestyle = 'dotted')

plt.subplot(2,2,3)
plt.plot(x,y, marker = 'o')

plt.subplot(2,2,4)
plt.scatter(x, y)


In [None]:
x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])

plt.subplot(1,3,1)
plt.bar(x,y)

xx = np.random.normal(170, 10, 250)
plt.subplot(1,3,2)
plt.hist(xx)

plt.subplot(1,3,3)
y = np.array([35, 25, 25, 15])
plt.pie(y)
plt.show()
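
The `plt.subplot` calls above can also be written with `plt.subplots`, which returns the figure and all of its axes at once. This is a sketch of that alternative style, reusing the data from the cell above; the variable names, titles, and figure size are choices made here for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

labels = np.array(["A", "B", "C", "D"])
counts = np.array([3, 8, 1, 10])
samples = np.random.normal(170, 10, 250)
shares = np.array([35, 25, 25, 15])

# One row, three columns of axes
fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].bar(labels, counts)
axes[0].set_title("Bar chart")

axes[1].hist(samples)
axes[1].set_title("Histogram")

axes[2].pie(shares, labels=labels)
axes[2].set_title("Pie chart")

# Avoid overlapping titles and tick labels
fig.tight_layout()
```

In a notebook the figure is displayed at the end of the cell; when running this as a script, finish with `plt.show()`.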

### Exercise:
Plot the Gaussian function from the previous exercise. Then calculate its discrete Fourier transform using `scipy.fftpack.fft`, and plot its amplitude and phase spectrum using `numpy.abs` and `numpy.angle`.