# Exercises for Day 5
Using SciPy, Scikit-Learn and Pandas

## 1. SciPy

### Linear Algebra
Have a look at the ```scipy.linalg``` [module](https://docs.scipy.org/doc/scipy/reference/linalg.html)




#### f. Solve the eigenvalue problem for the matrix A and print the eigenvalues and eigenvectors
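One possible approach, sketched under the assumption that A is the 3x3 matrix from exercise a: ```scipy.linalg.eig``` returns the eigenvalues and the right eigenvectors (as columns). The residual check at the end is an addition for illustration, not part of the exercise.

```python
import numpy as np
import scipy.linalg as la

# Assumed: the matrix A from exercise a
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# la.eig returns the eigenvalues w and the right eigenvectors as columns of v
w, v = la.eig(A)
print("eigenvalues:", w)
print("eigenvectors (columns):\n", v)

# Check A v_i = w_i v_i for all pairs at once (v * w scales column i by w[i])
residual = A @ v - v * w
print("max residual:", np.abs(residual).max())
```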

#### g. Calculate the inverse and the determinant of A
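A hedged sketch, again assuming A is the matrix from exercise a. That particular matrix is singular (its determinant is zero), so the inverse does not exist; the code therefore guards the call to ```la.inv``` and shows the pseudo-inverse ```la.pinv``` as one alternative. Whether ```la.inv``` raises an error or returns huge values depends on rounding, so both cases are handled.

```python
import numpy as np
import scipy.linalg as la

# Assumed: the matrix A from exercise a
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

det_A = la.det(A)
rank_A = np.linalg.matrix_rank(A)
print("det(A) =", det_A)    # zero up to rounding error
print("rank(A) =", rank_A)  # smaller than 3 for a singular matrix

try:
    A_inv = la.inv(A)
    # If no exception is raised, the entries are huge and not trustworthy
    print("inv(A) returned entries of magnitude", np.abs(A_inv).max())
except la.LinAlgError as err:
    print("inv(A) failed:", err)

# The Moore-Penrose pseudo-inverse exists for any matrix
A_pinv = la.pinv(A)
print("A @ pinv(A) @ A == A:", np.allclose(A @ A_pinv @ A, A))
```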

#### h. Calculate the norm of A with different orders
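As a minimal sketch (assuming the matrix A from exercise a), the ```ord``` argument of ```scipy.linalg.norm``` selects the norm. The selection of orders below is only an example.

```python
import numpy as np
import scipy.linalg as la

# Assumed: the matrix A from exercise a
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# "fro": Frobenius, "nuc": nuclear, 1: max column sum, 2: largest singular value,
# inf: max row sum, -inf: min row sum
orders = ["fro", "nuc", 1, 2, np.inf, -np.inf]
norms = {order: la.norm(A, ord=order) for order in orders}

for order, value in norms.items():
    print(f"ord={order!s:>5}: {value:.4f}")
```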


### Statistics
Have a look at the ```scipy.stats``` [module](https://docs.scipy.org/doc/scipy/reference/stats.html)

#### a. Create a discrete random variable with a Poisson distribution and plot its probability mass function (PMF), cumulative distribution function (CDF) and a histogram of 1000 random realizations of the variable
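A possible starting point, not the only solution: ```scipy.stats.poisson``` can be "frozen" with a fixed rate and then queried for ```pmf```, ```cdf``` and random samples. The rate ```mu = 3.0```, the range of ```k```, the seed and the helper name ```plot_poisson``` are assumptions made for illustration; the plotting helper assumes Matplotlib is installed.

```python
import numpy as np
from scipy import stats

mu = 3.0                  # assumed rate parameter
rv = stats.poisson(mu)    # frozen discrete random variable

k = np.arange(0, 15)      # values at which PMF and CDF are evaluated
pmf = rv.pmf(k)
cdf = rv.cdf(k)
samples = rv.rvs(size=1000, random_state=0)


def plot_poisson():
    """Plot the PMF, the CDF and a normalized histogram of the samples."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].vlines(k, 0, pmf)
    axes[0].set_title("PMF")
    axes[1].step(k, cdf, where="post")
    axes[1].set_title("CDF")
    # Bin edges at half-integers so that each bar covers exactly one value of k
    axes[2].hist(samples, bins=np.arange(-0.5, 15.5), density=True)
    axes[2].set_title("Histogram of 1000 samples")
    plt.show()
```

Calling ```plot_poisson()``` in a notebook cell should display the three panels side by side.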

#### b. Create a continuous random variable with a normal distribution and plot its probability density function (PDF), cumulative distribution function (CDF) and a histogram of 1000 random realizations of the variable
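For a continuous variable the density is obtained from the ```pdf``` method (continuous distributions in ```scipy.stats``` have no ```pmf```). The sketch below assumes a standard normal distribution, a fixed seed and a hypothetical helper ```plot_normal``` that needs Matplotlib.

```python
import numpy as np
from scipy import stats

rv = stats.norm(loc=0.0, scale=1.0)   # assumed: standard normal

x = np.linspace(-4, 4, 201)           # grid for evaluating PDF and CDF
pdf = rv.pdf(x)
cdf = rv.cdf(x)
samples = rv.rvs(size=1000, random_state=0)


def plot_normal():
    """Plot the PDF, the CDF and a normalized histogram of the samples."""
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].plot(x, pdf)
    axes[0].set_title("PDF")
    axes[1].plot(x, cdf)
    axes[1].set_title("CDF")
    # density=True rescales the histogram so it is comparable to the PDF
    axes[2].hist(samples, bins=30, density=True)
    axes[2].plot(x, pdf)
    axes[2].set_title("Histogram of 1000 samples")
    plt.show()
```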

#### c. Test if two sets of (independent) random data come from the same distribution
Hint: Have a look at the ```ttest_ind``` function, which tests whether two independent samples have equal means
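A small sketch of how the test could be used. The sample sizes, means and seed are assumptions chosen for illustration. Note that ```ttest_ind``` only compares means; ```ks_2samp``` (the two-sample Kolmogorov-Smirnov test) is shown as one alternative that compares the full distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=500)
b = rng.normal(loc=0.0, scale=1.0, size=500)  # same distribution as a
c = rng.normal(loc=1.0, scale=1.0, size=500)  # mean shifted by 1

# Small p-value: reject the hypothesis that both samples have the same mean
p_ab = stats.ttest_ind(a, b).pvalue
p_ac = stats.ttest_ind(a, c).pvalue
print("t-test a vs b: p =", p_ab)
print("t-test a vs c: p =", p_ac)

# Kolmogorov-Smirnov test: compares the empirical distribution functions
p_ks_ac = stats.ks_2samp(a, c).pvalue
print("KS test a vs c: p =", p_ks_ac)
```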



## 2. Pandas
For this exercise you need to have Pandas installed (you can install it with ```pip install pandas```)

Tutorials are taken from [https://github.com/guipsamora/pandas_exercises](https://github.com/guipsamora/pandas_exercises)

#### a. Download the notebook [food_facts.ipynb](food_facts.ipynb) and learn how to load and display data with Pandas

#### b. Download the notebook [army.ipynb](army.ipynb) and try using Pandas yourself to filter and sort data

#### c. Download the notebook [alcohol.ipynb](alcohol.ipynb) and try using Pandas yourself to group data


## Bonus. Publishing Python code

#### a. Take some piece of Python code (e.g. the ```simple_math.py``` module) and make it into a package.
A package should include an ```__init__.py``` inside every subdirectory, and it should be possible to import all the submodules from the top level. Functions should have documentation (docstrings), and ideally there is a docs folder with Sphinx documentation. You should also include a small ```README.md``` file
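To illustrate what "import from the top level" means, the following sketch builds a tiny throwaway package in a temporary directory and imports it. The package name ```simple_math_pkg```, the submodule ```basic``` and the function ```add``` are hypothetical; in the exercise you would create these files by hand from your own code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway location for the example package (hypothetical names throughout)
root = Path(tempfile.mkdtemp())
pkg = root / "simple_math_pkg"
pkg.mkdir()

# A submodule containing one documented function
(pkg / "basic.py").write_text(
    "def add(a, b):\n"
    '    """Return the sum of a and b."""\n'
    "    return a + b\n"
)

# __init__.py re-exports the function so it is available at the top level
(pkg / "__init__.py").write_text("from .basic import add\n")

# Make the parent directory importable and load the package
sys.path.insert(0, str(root))
simple_math_pkg = importlib.import_module("simple_math_pkg")

print(simple_math_pkg.add(2, 3))
print(simple_math_pkg.add.__doc__)
```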

#### b. Add a ```setup.py``` file to your package (following the instructions from the lecture notes)
Now you should be able to install your package using pip:

```
pip install . --user
```

#### c. Upload your package to the TestPyPI repository
Follow the instructions from [here](https://packaging.python.org/guides/using-testpypi/)




In [16]:
import scipy.linalg as la
import numpy as np
from numpy.linalg import matrix_rank

#### a. Define a matrix A
```
[[1 2 3]
 [4 5 6]
 [7 8 9]]
```

#### b. Define a vector b
```
[1 2 3]
```

#### c. Solve the linear system of equations A x = b

#### d. Check that your solution is correct by plugging it into the equation


In [5]:
# Exercise 1.a-d
# A = np.arange(1, 10, 1).reshape(3, 3)
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([1, 2, 3])
x = la.solve(A, b)
# print(la.det(A))  # zero up to rounding: A is singular (not invertible), so the solution is not unique
print(x)


[-0.23333333  0.46666667  0.1       ]




In [29]:
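# Try a least-squares solution
x1 = np.linalg.lstsq(A, b, rcond=None)[0]
x2 = np.linalg.solve(A, b)
print(x)  # x from the previous cell
# Check both solutions by plugging them into A x = b
print(A @ x1)
print(A @ x2)

# Both products reproduce b, so the solutions are OK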

[-0.23333333  0.46666667  0.1       ]
[1. 2. 3.]
[1. 2. 3.]


In [20]:
# For fun, try the QR decomposition
# Reduced row-echelon form: if it is the identity I (1 on the diagonal, 0 elsewhere), the solution is unique.
# If it is not I, there is either no solution or infinitely many solutions.
# Check the last row of the reduced matrix to tell which.

# QR decomposition of the singular matrix A
Q, R = la.qr(A)
# R_1 = la.inv(R)
# Q_t = Q.T
y = np.dot(Q.T, b)  # y = Q^T b (matrix-vector product)
x = la.solve(R, y)  # Solve R x = y
print(x)


[-0.1335896   0.26717919  0.19974374]
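The QR route and ```la.solve``` print different vectors, yet both can be valid: for a singular A the system has a whole family of solutions that differ by a multiple of a null-space vector. The sketch below checks this using the two printed outputs (copied as rounded literals) and ```scipy.linalg.null_space```.

```python
import numpy as np
import scipy.linalg as la

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
b = np.array([1, 2, 3], dtype=float)

# Rounded values copied from the printed outputs of the two cells
x_solve = np.array([-0.23333333, 0.46666667, 0.1])
x_qr = np.array([-0.1335896, 0.26717919, 0.19974374])

# Both satisfy A x = b (up to the rounding of the printed values)
print(np.allclose(A @ x_solve, b, atol=1e-6))
print(np.allclose(A @ x_qr, b, atol=1e-6))

# Orthonormal basis of the null space of A; here it has a single column
n = la.null_space(A)[:, 0]

# The difference of the two solutions is parallel to the null-space vector,
# so their cross product vanishes
diff = x_qr - x_solve
print(np.linalg.norm(np.cross(diff, n)))
```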




#### e. Repeat steps a-d using a random 3x3 matrix B (instead of the vector b)

In [44]:

#Exercise 1.e
#A = np.arange(1,10,1).reshape(3,3)
A= np.array([[1,2,3],[4,5,6],[7,8,9] ])
B = np.random.randint(100,size=(3,3))
print(B)


[[62 60  5]
 [30 16 77]
 [13 74 76]]


In [46]:
B

array([[62, 60,  5],
       [30, 16, 77],
       [13, 74, 76]])

In [57]:

X = np.linalg.solve(A, B)
# X = np.linalg.lstsq(A, B, rcond=None)[0]
# X = np.linalg.inv(A) @ B
# np.allclose(A @ np.linalg.solve(A, B), B)
X

array([[ 4.72877961e+16,  3.21557013e+17, -2.30133941e+17],
       [-9.45755922e+16, -6.43114027e+17,  4.60267882e+17],
       [ 4.72877961e+16,  3.21557013e+17, -2.30133941e+17]])

In [59]:
# Does A X = B hold?
np.allclose(np.dot(A, X), B)
# A @ X


False
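The ```False``` result and the entries of order 1e16 to 1e17 in X are consistent with A being singular: a random B will in general not lie in the column space of A, so A X = B has no exact solution. A rank comparison makes this visible; the sketch reuses the B printed above, and the least-squares call is shown only as one way to get a best-fit X.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
B = np.array([[62, 60, 5], [30, 16, 77], [13, 74, 76]], dtype=float)

# A X = B is solvable only if rank(A) equals the rank of the augmented matrix [A | B]
rank_A = np.linalg.matrix_rank(A)
rank_AB = np.linalg.matrix_rank(np.hstack([A, B]))
print("rank(A) =", rank_A)
print("rank([A | B]) =", rank_AB)

# Least squares returns the X that minimizes ||A X - B||, but cannot make it zero here
X_ls = np.linalg.lstsq(A, B, rcond=None)[0]
exact = np.allclose(A @ X_ls, B)
print("A @ X_ls reproduces B:", exact)
print("residual norm:", np.linalg.norm(A @ X_ls - B))
```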