## Data Modeling with Neural Networks (Class 8)
* closing the topic of linear regression (with ML example)
* the same example we will use for Neural Network later

# Order of the class:
* supplementary material:
  * supplementary material on data: the importance of data
  * one more metric for evaluating the goodness of fit: $R^2$ (R-squared)
  * gradient descent: multidimensional, and the case where we don't know the function
  * 3D plot (you know)

* scikit-learn (TensorFlow and neural networks are not self-sufficient; they need other packages)
  * numpy, pandas, matplotlib but also
  * scikit-learn
  * TensorFlow will have a syntax similar to scikit-learn's
* Train/test split: 2 ways
* simple linear and multiple regression: hypothesis and linear algebra
* simple regression example (machine learning application)
* homework

# Linear regression (continuation) 
* you know how it works
  * what least squares means
  * you know the metrics (MSE, MAE, MAPE, RMSE, + R-squared)
  * you know how to use `scipy.optimize` to fit a function
  * you know how gradient descent works
  * you know what hypothesis is
* simple linear and multiple: hypothesis and linear algebra
* `scikit-learn`
* scaling features
* machine learning application (example using scikit-learn)
  * you know how to use linear regression:
    * analytically
    * `scipy.optimize`
    * gradient descent written by yourself
    * now it's good to practice with the scikit-learn regressor
* homework
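
As a warm-up for the scikit-learn regressor mentioned in the list above, here is a minimal sketch. The data are synthetic (a noisy line with intercept 2 and slope 3, chosen only for illustration), and the variable names are not from the lecture.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Synthetic data (an assumption for illustration): y = 2 + 3x plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(0, 1, size=50)

# scikit-learn expects a 2D array of shape (samples, features)
X = x.reshape(-1, 1)

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

print(model.intercept_, model.coef_)  # estimates of theta_0 and theta_1
print(r2_score(y, y_pred))            # R-squared of the fit
```

For a regressor, `model.score(X, y)` returns the same $R^2$ value as `r2_score(y, y_pred)`.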

## Simple linear regression and the hypothesis
* single feature case

$$h_{\theta}(x)=\theta_0 + \theta_1 x$$

* if $x$ represents size of the house in square meters
* then $h_{\theta}(x)$ will be the estimated price of the house
* $\theta_0$ is called the bias (intercept)

* objective is to minimize $$
J(\theta_0,\theta_1) = \frac{1}{m} \sum_{i=1}^{m}[ h_{\theta}(x_i) - y_i]^2 
$$
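
A minimal NumPy sketch of the hypothesis and cost function above. The house sizes and prices are made-up numbers, and the function names `hypothesis` and `cost` are chosen here for illustration only.

```python
import numpy as np

def hypothesis(theta0, theta1, x):
    """Simple linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

def cost(theta0, theta1, x, y):
    """Mean squared error J(theta0, theta1) over the m samples."""
    residuals = hypothesis(theta0, theta1, x) - y
    return np.mean(residuals ** 2)

# Made-up data: house size in square meters and price (arbitrary units)
x = np.array([50.0, 80.0, 100.0, 120.0])
y = np.array([150.0, 240.0, 300.0, 360.0])  # constructed as y = 3 * x

print(cost(0.0, 3.0, x, y))  # parameters that reproduce the data: cost is 0
print(cost(0.0, 2.0, x, y))  # worse parameters: larger cost
```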

## Multiple linear regression  and the hypothesis

* multiple features case

$$h_{\theta}(x)=\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 $$

* Multiple linear regression assumes that the output scales linearly with each feature
  * each independent variable scales linearly with the dependent variable
  * the best way to see this is to visualize it

$$ \theta =
\begin{bmatrix}
\theta_{0} \\
\theta_{1} \\
\theta_{2} \\
\vdots\\
\theta_{n}
\end{bmatrix}
$$

$$ X=
\begin{bmatrix}
1 \\
x_{1} \\
x_{2} \\
\vdots\\
x_{n}
\end{bmatrix}
$$

If $x_0=1$, then we can rewrite this as a matrix-matrix multiplication (after transposing the first matrix), hence:

$$h_{\theta}(x)=\theta^T X $$

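* matrix-matrix multiplication works only if the inner dimensions match, $(n \times m) \times (m \times p)$; here, since $\theta$ and $X$ each have $n+1$ entries:
$$(1 \times (n+1)) \times ((n+1) \times 1)$$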

$$ \begin{bmatrix}
\theta_{0} &
\theta_{1} &
\theta_{2} &
\cdots &
\theta_{n}
\end{bmatrix} \times \begin{bmatrix}
1 \\
x_{1} \\
x_{2} \\
\vdots\\
x_{n}
\end{bmatrix}
$$


* objective is to minimize $$
J(\theta_0,\theta_1,\theta_2,\theta_3,\ldots) = \frac{1}{m} \sum_{i=1}^{m}[ h_{\theta}(x_i) - y_i]^2 
$$
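
The matrix form can be sketched in NumPy as follows. The data and the parameter vector are made up for illustration, and stacking all samples as rows of one matrix (called `X_b` here, a name not used in the lecture) is an assumption about how one would apply $\theta^T X$ to many samples at once.

```python
import numpy as np

# Made-up data: m = 4 houses, n = 2 features (size in m^2, number of bedrooms)
X = np.array([[50.0, 1.0],
              [80.0, 2.0],
              [100.0, 3.0],
              [120.0, 4.0]])
y = np.array([160.0, 260.0, 330.0, 400.0])

# Prepend the column x_0 = 1 so that theta_0 acts as the bias
X_b = np.column_stack([np.ones(len(X)), X])  # shape (m, n + 1)

def cost(theta, X_b, y):
    """J(theta) = (1/m) * sum((X_b @ theta - y)^2)."""
    predictions = X_b @ theta  # h_theta(x) for every sample at once
    return np.mean((predictions - y) ** 2)

theta = np.array([0.0, 3.0, 10.0])  # [theta_0, theta_1, theta_2], chosen by hand
print(X_b @ theta)                  # predictions for the 4 houses
print(cost(theta, X_b, y))          # 0 here, because y was built from this theta
```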


# Feature scaling (normalizing data)
* recall the gradient descent problem of finding minima in a 1D problem

<img src="imgs/gradient1d.png" width="400" /> 

### 2D problem is like a surface:
* multivariable problem; let's assume it's a 2D problem, so we have $x_1^{(i)}$ and $x_2^{(i)}$
  * if $x_1$ represents the size of the house in square meters, e.g. 5000
  * if $x_2$ represents the number of bedrooms, e.g. 5
  * then one variable is three orders of magnitude larger than the other, so the cost surface is far from symmetrical (strongly elongated), which makes gradient descent inefficient
<img src="imgs/feature_scaling.png" width="600" /> 

## Two methods of doing it:

### 1) Method one: manually
If $x$ is a feature (a column represents one type of data), then the scaled feature is calculated as follows:

$x_{\text{scaled}}^{(i)} = \frac{x^{(i)} - u}{s}$

where `u` is the mean of the training samples
and `s` is the standard deviation of the training samples

* which must be done to all the features (columns)
  * if for example column 1 represents size of the house
  * if for example column 2 represents number of bedrooms
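
The manual method can be sketched in NumPy like this. The three training rows are made-up values, and `u` and `s` follow the names used above.

```python
import numpy as np

# Made-up training data: column 0 = house size, column 1 = number of bedrooms
X_train = np.array([[5000.0, 5.0],
                    [3000.0, 3.0],
                    [1000.0, 1.0]])

u = X_train.mean(axis=0)  # mean of each column
s = X_train.std(axis=0)   # (population) standard deviation of each column

# Broadcasting applies the formula to every column separately
X_scaled = (X_train - u) / s

print(X_scaled.mean(axis=0))  # close to 0 for each column
print(X_scaled.std(axis=0))   # close to 1 for each column
```

After scaling, both columns live on the same scale even though the raw values differ by three orders of magnitude.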

### 2) Method two: scikit-learn scales all the features at once

`from sklearn import preprocessing`

`x = preprocessing.StandardScaler().fit(X).transform(X)`

done and done!
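
The one-liner above fits and transforms the same array. When the data are split into train and test sets, a common practice is to fit the scaler on the training set only and reuse its mean and standard deviation for the test set. A minimal sketch, assuming made-up arrays `X_train` and `X_test`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: column 0 = house size, column 1 = number of bedrooms
X_train = np.array([[5000.0, 5.0], [3000.0, 3.0], [1000.0, 1.0]])
X_test = np.array([[2000.0, 2.0]])

scaler = StandardScaler().fit(X_train)      # learns u and s from the training set
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)    # reuses the training u and s

print(scaler.mean_)    # u for each column
print(scaler.scale_)   # s for each column
print(X_test_scaled)
```

`StandardScaler` uses the population standard deviation, so `X_train_scaled` matches the manual formula computed with `X_train.std(axis=0)`.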

# Splitting data into train and test datasets

* the file 'mnist_test.csv' contains 10000 grayscale images of digits, saved as rows in the CSV file
* the first column contains the value of the digit, then 784 columns correspond to the pixels
* make an array of plots of 25 randomly chosen images (5x5) 
* each plot has to have a title corresponding to a digit 

## Two methods of doing it:

### method 1

In [1]:
import pandas as pd
df=pd.read_csv('mnist_test.csv')
df.head()

Unnamed: 0,label,1x1,1x2,1x3,1x4,1x5,1x6,1x7,1x8,1x9,...,28x19,28x20,28x21,28x22,28x23,28x24,28x25,28x26,28x27,28x28
0,7,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,2,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [2]:
data=df.to_numpy()
data.shape

(10000, 785)

In [3]:
y_data=data[:,0]   # first column: digit labels
X_data=data[:,1:]  # remaining 784 columns: pixel values

In [4]:
from sklearn.model_selection import train_test_split
# hold out 20% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X_data, y_data, test_size=0.2)
print('Train set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)

Train set: (8000, 784) (8000,)
Test set: (2000, 784) (2000,)


### method 2 
(the first method is more common, but it's good to be aware of other ways)

In [5]:
import numpy as np
import pandas as pd
df=pd.read_csv('mnist_test.csv')

In [6]:
rand_unif_numbers=np.random.rand(100)
#print(rand_unif_numbers)
# np.random.rand generates uniformly distributed numbers in the interval [0, 1)

In [7]:
print(np.min(rand_unif_numbers))
print(np.max(rand_unif_numbers))

0.008198550146346961
0.9684614187325206


In [8]:
rand_unif_numbers = np.random.rand(len(df))
#rand_unif_numbers[0:10]

In [9]:
mask = rand_unif_numbers < 0.8 # element-wise comparison: True for roughly 80% of the rows

In [10]:
print(mask)

[ True  True  True ...  True  True False]


In [11]:
df_train_data = df[mask]  # rows where mask is True; still a pandas DataFrame
df_test_data = df[~mask]  # rows where mask is False; still a pandas DataFrame

In [12]:
type(df_train_data)

pandas.core.frame.DataFrame

In [13]:
df_train_data.head(2)

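|  | label | 1x1 | 1x2 | 1x3 | 1x4 | 1x5 | 1x6 | 1x7 | 1x8 | 1x9 | ... | 28x19 | 28x20 | 28x21 | 28x22 | 28x23 | 28x24 | 28x25 | 28x26 | 28x27 | 28x28 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |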
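
Note that the mask in method 2 gives only an approximately 80/20 split, because each row is assigned independently. The sketch below illustrates this and shows one possible alternative that shuffles the row indices to get an exact split. The seed and the names `train_idx` and `test_idx` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so the split is reproducible
n_rows = 10000                   # same number of rows as mnist_test.csv

# Mask approach: each row lands in the training set with probability 0.8
mask = rng.random(n_rows) < 0.8
print(mask.sum(), (~mask).sum())  # roughly 8000 and 2000, but not exactly

# Alternative: shuffle the row indices and cut at exactly 80%
indices = rng.permutation(n_rows)
n_train = int(round(0.8 * n_rows))
train_idx, test_idx = indices[:n_train], indices[n_train:]
print(len(train_idx), len(test_idx))
```

With a DataFrame, `df.iloc[train_idx]` and `df.iloc[test_idx]` would then select the corresponding rows.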


# Logistic Regression
* you want probabilities as the output

<img src="imgs/logistic_5.png" width="800" /> 
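
Logistic regression usually obtains probabilities by passing the linear hypothesis through the logistic (sigmoid) function. A minimal sketch, assuming that this is the function the figure illustrates; the input values are hypothetical values of $\theta^T x$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])  # hypothetical values of theta^T x
print(sigmoid(z))               # close to 0, exactly 0.5 at z = 0, close to 1
```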