# End To End Machine Learning Projects

## ML Project Checklist

- [ ]  Look at the big picture: specify the problem and the goal
- [ ]  Get the data
- [ ]  Discover and visualize the data to gain insights
- [ ]  Prepare the data for machine learning algorithms
- [ ]  Select a model and train it
- [ ]  Fine-tune the model
- [ ]  Present the solution
- [ ]  Launch, monitor, and maintain the solution

## Some example machine learning datasets 

- UC Irvine Machine Learning Repository
- Kaggle datasets
- Amazon's AWS datasets

## Data Preprocessing

__1. Cleaning Data__

- Fix or remove outliers
- Fill in missing values, or
- Drop rows or columns with missing data (`pandas.DataFrame.dropna`)
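The cleaning steps above can be sketched with pandas. This is a minimal sketch on a hypothetical toy table; the "plausible range" of 1 to 10 rooms used for clipping outliers is an assumption made only for illustration, and filling with the column median is just one common choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two missing values (NaN) and one extreme value (40 rooms)
df = pd.DataFrame({
    "rooms": [3.0, 4.0, np.nan, 5.0, 40.0],
    "age": [10.0, np.nan, 25.0, 30.0, 15.0],
})

# Fix outliers: clip "rooms" to an assumed plausible range of 1 to 10 (NaN stays NaN)
clipped = df.assign(rooms=df["rooms"].clip(lower=1, upper=10))

# Option A: fill in missing values with each column's median
filled = clipped.fillna(clipped.median())

# Option B: drop every row that contains a missing value
dropped = clipped.dropna()

print(filled)
print(dropped)
```

Filling keeps all rows at the cost of inventing values; dropping keeps only observed values at the cost of losing data.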

__2. Feature Selection__

- Drop features that contribute little to model performance.
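One way to find low-value features is univariate scoring. The sketch below uses Scikit-Learn's `SelectKBest` with the `f_regression` score on hypothetical synthetic data where only features 0 and 2 drive the target; this is one technique among many (model-based importances or recursive elimination are alternatives), and `k=2` is an assumed setting.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

# Hypothetical data: 4 features, but only features 0 and 2 influence the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Score each feature individually against y and keep the k best
selector = SelectKBest(score_func=f_regression, k=2)
X_selected = selector.fit_transform(X, y)

kept = selector.get_support(indices=True)  # column indices of the kept features
print(kept, X_selected.shape)
```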

__3. Feature Engineering__

- Discretize continuous features, i.e. split the range of values of a feature into bins.
- Decompose features (categorical, date/time, etc.)
- Add promising transformations of features (log(x), sqrt(x), x^2, etc.)
- Aggregate features into promising new features
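Each feature-engineering idea above maps onto a line or two of pandas. A minimal sketch on a hypothetical housing-style table; the column names, bin edges, and category labels are all made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical records (column names are illustrative only)
df = pd.DataFrame({
    "income": [1.5, 3.0, 4.5, 6.0, 9.0],
    "sold_on": pd.to_datetime(["2021-01-15", "2021-06-30", "2022-03-01",
                               "2022-11-20", "2023-07-04"]),
    "total_rooms": [800, 1200, 1500, 2000, 2600],
    "households": [200, 300, 300, 400, 520],
})

# Discretize: bin a continuous feature into labelled ranges (0, 3], (3, 6], (6, inf)
df["income_cat"] = pd.cut(df["income"], bins=[0, 3, 6, np.inf],
                          labels=["low", "middle", "high"])

# Decompose a date/time feature into simpler parts
df["sold_year"] = df["sold_on"].dt.year
df["sold_month"] = df["sold_on"].dt.month

# Promising transformation of a skewed feature
df["log_income"] = np.log(df["income"])

# Aggregate two raw features into a new, possibly more informative one
df["rooms_per_household"] = df["total_rooms"] / df["households"]

print(df)
```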


__4. Feature Scaling__

- Standardize or Normalize features


## Mathematical Notation

Data instances are often represented as vectors of $D$ features:

$$
x = [x_1, x_2, \dots, x_D]^\intercal
$$

The complete data set $X$ of $N$ instances is represented as an $N \times D$ matrix, with one instance per row:

$$
X =
\begin{bmatrix}
x_{11} & x_{12} & \dots & x_{1D}\\
x_{21} & x_{22} & \dots & x_{2D}\\
\vdots & \vdots & \ddots & \vdots\\
x_{N1} & x_{N2} & \dots & x_{ND}
\end{bmatrix}
$$

There is also the output (target) vector, written $y$ or $t$, with one entry per instance, as well as the model parameter vector $w$:

$$
t = [t_1, t_2, \dots, t_N]^\intercal
$$

$$
w = [w_1, w_2, \dots, w_D]^\intercal
$$
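In NumPy the same layout is a 2-D array of shape `(N, D)` with one row per instance. A small sketch with hypothetical sizes ($N = 4$, $D = 3$) and placeholder values:

```python
import numpy as np

N, D = 4, 3  # hypothetical: 4 instances, 3 features each

# Data matrix X: one row per instance, one column per feature (shape N x D)
X = np.arange(N * D, dtype=float).reshape(N, D)

x_2 = X[1]                    # second instance [x_21, x_22, x_23] as a 1-D array
x_2_col = X[1].reshape(D, 1)  # the same instance as an explicit D x 1 column vector

t = np.array([1.0, 0.0, 1.0, 1.0])  # targets: one entry per instance (length N)
w = np.zeros(D)                     # parameters: one weight per feature (length D)

print(X.shape, x_2, t.shape, w.shape)
```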


### Linear Model

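- Some models compute a linear combination of an input $x$ and the weights $w$, i.e. their dot product:

$$
f(x; w) = w^\intercal x = \sum_{i=0}^{D} w_i x_i = w_0 x_0 + w_1 x_1 + \dots + w_D x_D
$$

By the usual convention, $x$ is augmented with a constant feature $x_0 = 1$, so that $w_0$ acts as a bias (intercept) term.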
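The linear model is a single dot product once the constant $x_0 = 1$ is prepended. A minimal NumPy sketch with hypothetical weights; the second half applies the same model to every row of a data matrix at once.

```python
import numpy as np

# Hypothetical weights [w_0, w_1, w_2, w_3]; w_0 is the bias term
w = np.array([0.5, 2.0, -1.0, 3.0])

# One instance with D = 3 features
x = np.array([4.0, 1.0, 2.0])

# Prepend the constant x_0 = 1 so that w_0 * x_0 = w_0
x_aug = np.concatenate(([1.0], x))
f_x = w @ x_aug  # w^T x = 0.5 + 8.0 - 1.0 + 6.0

# Same model for a whole data matrix X (N x D): one prediction per row
X = np.array([[4.0, 1.0, 2.0],
              [0.0, 0.0, 0.0]])
X_aug = np.hstack([np.ones((X.shape[0], 1)), X])
f_X = X_aug @ w

print(f_x)  # 13.5
print(f_X)  # [13.5  0.5]
```

For the all-zero instance only the bias survives, which is why the second prediction equals $w_0$.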


## Scaling Features

Features (independent variables) can have very different scales, which can give features with large values a false sense of relevance to the model. We want to rescale the features to similar ranges so we can judge them fairly against one another. We can use __standardization__ for this:


$$ \tilde{x} = \frac{x - \bar{x}}{\sigma} = \frac{x - \mathrm{mean}(x)}{\mathrm{std}(x)} $$

We can also use __Mean Normalization__, which normalizes the feature based on its range:

$$ \tilde{x} = \frac{x - \bar{x}}{\max(x) - \min(x)} $$
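Both formulas are one line of NumPy each. The sketch below applies them to a hypothetical feature and checks standardization against Scikit-Learn's `StandardScaler`, which (like `np.std` by default) uses the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical feature values

# Standardization: subtract the mean, divide by the standard deviation
x_std = (x - x.mean()) / x.std()

# Mean normalization: subtract the mean, divide by the range
x_mean_norm = (x - x.mean()) / (x.max() - x.min())

# StandardScaler expects a 2-D array (n_samples, n_features)
x_sk = StandardScaler().fit_transform(x.reshape(-1, 1)).ravel()

print(x_mean_norm)  # [-0.5  -0.25  0.    0.25  0.5 ]
```

In a real pipeline the scaler would be fitted on the training set only, and its stored mean and standard deviation reused to transform the test set.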


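We can also turn categorical values (low, middle, high) into numerical values (0, 1, 2). This is called ordinal encoding (Scikit-Learn's `OrdinalEncoder`) and suits categories with a natural order. For categories without an order, a one-hot encoder (Scikit-Learn's `OneHotEncoder`) instead creates one binary column per category.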
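A minimal sketch of both encoders on the low/middle/high example. Passing `categories` explicitly is an assumption made here so that the integer codes follow the natural order rather than alphabetical order.

```python
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

levels = [["low", "middle", "high"]]        # one list of categories per column
data = [["low"], ["high"], ["middle"]]      # hypothetical 2-D input, one column

# Ordinal encoding: low -> 0, middle -> 1, high -> 2
ordinal = OrdinalEncoder(categories=levels).fit_transform(data)

# One-hot encoding: one binary column per category (low, middle, high)
# The default output is a sparse matrix, so convert it to a dense array
onehot = OneHotEncoder(categories=levels).fit_transform(data).toarray()

print(ordinal.ravel())  # [0. 2. 1.]
print(onehot)
```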


## Performance Measures

The performance measures used for classification are __different__ from those used for regression. You can't use MSE for classification, and you can't use recall for regression.

The most common performance measures for regression are:
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
- Mean Absolute Error (MAE)
- $R^2$ Score

The most common performance measures for classification are:
- Accuracy
- Precision
- Recall
- $F_1$ Score
- ROC Curve
- Area Under the ROC Curve (AUC)
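The classification measures are all available in `sklearn.metrics` (the regression ones correspond to `mean_squared_error`, `mean_absolute_error`, and `r2_score` in the same module). A sketch on hypothetical labels and scores; note that the ROC curve and AUC take predicted scores or probabilities, not hard labels.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, roc_curve)

y_true = [0, 1, 1, 0, 1]              # hypothetical true labels
y_pred = [0, 1, 0, 0, 1]              # hard predictions from some classifier
y_score = [0.1, 0.9, 0.4, 0.3, 0.8]   # hypothetical predicted probabilities of class 1

acc = accuracy_score(y_true, y_pred)    # 4 of 5 correct -> 0.8
prec = precision_score(y_true, y_pred)  # 2 TP, 0 FP -> 1.0
rec = recall_score(y_true, y_pred)      # 2 TP, 1 FN -> 2/3
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall -> 0.8

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points along the ROC curve
auc = roc_auc_score(y_true, y_score)    # every positive outscores every negative -> 1.0
```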

## Data Partitioning

### Training and Testing Set

We split the data set into a larger training set (around 80% of the data) that the model learns from, and a smaller test set (the remaining data) whose answers the model never sees during training and which is used only for evaluation. Cross-validation, which repeatedly trains and evaluates on different splits of the training data, gives a more robust evaluation.


The train/test split is usually done with Scikit-Learn's `train_test_split` function (in `sklearn.model_selection`).
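A minimal sketch of an 80/20 split followed by 5-fold cross-validation on the training set. The synthetic data, the choice of `LinearRegression`, and `random_state=42` are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical synthetic regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Hold out 20% as a test set; fix the seed so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train on the training set, evaluate once on the unseen test set (R^2 score)
model = LinearRegression().fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# 5-fold cross-validation on the training set; scikit-learn maximizes scores,
# so RMSE is reported negated and we flip the sign back
scores = cross_val_score(LinearRegression(), X_train, y_train, cv=5,
                         scoring="neg_root_mean_squared_error")
rmse_per_fold = -scores

print(test_r2, rmse_per_fold.mean())
```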

## Fine-Tuning Model Hyperparameters

Hyperparameters are set before training starts and remain unchanged during a training run. They control how the model learns, and tuning them optimizes model performance. They differ from model parameters, which are adjusted during training.

In a neural network, neurons are organized in layers, and edges connect the neurons of adjacent layers. Each edge has an associated weight. In this example, the hyperparameters would be the number of neurons per layer and the number of layers, i.e. the major architectural components of the model, whereas the model parameters would be the weights of the edges.
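The split between hyperparameters and parameters can be seen directly in Scikit-Learn. The sketch below uses `GridSearchCV` (one tuning strategy; randomized search is another) to try a few hypothetical layer layouts for an `MLPRegressor`; the layouts, `max_iter`, and the synthetic data are assumptions. The architecture is chosen by the search, while the edge weights in `coefs_` are learned during training.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Hypothetical synthetic regression data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hyperparameters: fixed before each training run, compared across runs
param_grid = {"hidden_layer_sizes": [(5,), (20,), (10, 10)]}  # neurons per layer / layers

search = GridSearchCV(
    MLPRegressor(max_iter=1000, random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
print("Best hyperparameters:", search.best_params_)

# Model parameters: the learned weight matrices of the refitted best network
weights = search.best_estimator_.coefs_
print([w.shape for w in weights])
```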

## Example: California Housing Prices

In this example we want to predict housing prices in California based on a number of features. This is a regression problem, so we need a regression performance measure to evaluate the model.

We can use the Root Mean Square Error (RMSE):

$$\mathrm{RMSE}(X, h) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} \left(h(x^{(i)}) - y^{(i)}\right)^2}$$

This also implicitly defines the MSE, which is the same expression without the final square root.

Alternatively, we can use the Mean Absolute Error (MAE):

$$MAE(X, h) = \frac{1}{m} \sum \limits_{i=1}^m |h(x^{(i)}) - y^{(i)}|$$
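Both formulas translate directly into NumPy, with $h(x^{(i)})$ as the model's prediction for instance $i$ and $m$ as the number of instances. A sketch on hypothetical predicted and actual values, cross-checked against `sklearn.metrics`; the helper names `rmse` and `mae` are illustrative, not part of any library.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

def rmse(predictions, targets):
    """RMSE(X, h): square root of the mean squared prediction error."""
    errors = predictions - targets  # h(x^(i)) - y^(i) for each instance i
    return np.sqrt(np.mean(errors ** 2))

def mae(predictions, targets):
    """MAE(X, h): mean of the absolute prediction errors."""
    return np.mean(np.abs(predictions - targets))

# Hypothetical predictions h(x^(i)) and true values y^(i)
h_x = np.array([2.5, 5.0, 3.0, 8.0])
y = np.array([3.0, 5.0, 2.0, 7.0])

print(rmse(h_x, y))  # sqrt((0.25 + 0 + 1 + 1) / 4) = 0.75
print(mae(h_x, y))   # (0.5 + 0 + 1 + 1) / 4 = 0.625

# Cross-check: MSE is RMSE before the square root
assert np.isclose(rmse(h_x, y) ** 2, mean_squared_error(y, h_x))
assert np.isclose(mae(h_x, y), mean_absolute_error(y, h_x))
```

Because the errors are squared, RMSE penalizes a few large errors more heavily than MAE does.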