# Practical ML
Source: Andrew Ng Coursera [lectures](https://www.coursera.org/learn/machine-learning/)

## How to evaluate a hypothesis?

* Ensure there is **no ordering** in the data. If there is, shuffle it first.
* Split the data into a training set and a test set, typically 70-30.
* Fit the model on the training set and compute the error on the test set.
* For classification problems we can also compute the misclassification error: the fraction of test examples the model gets wrong.
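
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not code from the course: the helper names `shuffle_and_split` and `misclassification_error` and the toy data are assumptions made for the example.

```python
import numpy as np

def shuffle_and_split(X, y, test_fraction=0.3, seed=0):
    """Shuffle the examples, then hold out `test_fraction` of them as a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))          # random order removes any sorting
    X, y = X[order], y[order]
    n_test = int(round(test_fraction * len(X)))
    n_train = len(X) - n_test
    return X[:n_train], y[:n_train], X[n_train:], y[n_train:]

def misclassification_error(y_true, y_pred):
    """Fraction of examples whose predicted label differs from the true label."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

# Toy data sorted by label (hypothetical), i.e. the ordering problem noted above.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

X_train, y_train, X_test, y_test = shuffle_and_split(X, y)
print(X_train.shape, X_test.shape)                              # (7, 1) (3, 1)
print(misclassification_error([1, 0, 1, 1], [1, 1, 1, 0]))      # 0.5
```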

## Model Selection: Which hypothesis function(s) should we pursue?
* The goal of an ML model is to generalize well to **unseen data**.
* If we use the test data to select a model, we choose the model that best fits the test data, so its test error is an optimistic estimate. Such a model may not generalize well to truly unseen data.
* Instead, split the data into train, validation and test sets, typically 60-20-20.
* Fit the parameters on the training set, and use the `validation` set to choose between models and tune hyperparameters.

**Use the test set only once, to report the generalization performance of the final model.**
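
As a rough sketch of this workflow, the example below picks a polynomial degree using the validation set and touches the test set only once at the end. The synthetic data, the candidate degrees and the `mse` helper are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression data (assumed for illustration): a noisy quadratic.
x = rng.uniform(-1.0, 1.0, size=200)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.shape)

# 60-20-20 split; the samples were generated in random order already.
n_train, n_val = 120, 40
x_train, y_train = x[:n_train], y[:n_train]
x_val, y_val = x[n_train:n_train + n_val], y[n_train:n_train + n_val]
x_test, y_test = x[n_train + n_val:], y[n_train + n_val:]

def mse(coeffs, x, y):
    """Mean squared error of the polynomial `coeffs` on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Fit parameters on the training set only; compare candidates on validation only.
candidates = {}
for degree in range(1, 9):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    candidates[degree] = (coeffs, mse(coeffs, x_val, y_val))

best_degree = min(candidates, key=lambda d: candidates[d][1])
best_coeffs = candidates[best_degree][0]

# The test set is used exactly once, after the model has been chosen.
test_error = mse(best_coeffs, x_test, y_test)
print("chosen degree:", best_degree, "test MSE:", test_error)
```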

## Bias-Variance tradeoff
* As model complexity increases, training error decreases.
* For simple models, validation error is initially high (**underfitting** / **high bias**).
* This shows up as **high training error** and **high validation error**.
* As model complexity increases, validation error decreases at first.
* Beyond a certain point, further increases in complexity make validation error rise again (**overfitting** / **high variance**).
* This shows up as **low training error** and **high validation error**.

### Bias-Variance with respect to regularization
* Too much regularization leads to underfitting / high bias.
* Too little regularization leads to overfitting / high variance.

#### Choosing regularization parameter
* Define the train, validation and test errors as the model's optimization objective **without** the regularization term.
* For different values of the regularization parameter, fit the model on the training set, compute the validation error, and choose the value that minimizes the validation error.
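
A minimal sketch of this procedure, assuming ridge-regularized polynomial regression solved in closed form. The data, the list of candidate values and the helper names are illustrative assumptions. Note that `mse` leaves out the penalty term, as described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def poly_features(x, degree):
    """Design matrix with columns x^0, x^1, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * ||w[1:]||^2 (the bias term is not penalized)."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def mse(X, y, w):
    """Plain squared error, deliberately without the regularization term."""
    return float(np.mean((X @ w - y) ** 2))

# Synthetic data (assumed): a noisy sine fitted with a degree-8 polynomial.
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=x.shape)
X = poly_features(x, degree=8)
X_train, y_train = X[:30], y[:30]
X_val, y_val = X[30:], y[30:]

lambdas = [0.0, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
val_errors = {}
for lam in lambdas:
    w = fit_ridge(X_train, y_train, lam)      # training uses the penalty
    val_errors[lam] = mse(X_val, y_val, w)    # evaluation does not

best_lam = min(val_errors, key=val_errors.get)
for lam in lambdas:
    print(f"lambda={lam:<7} validation MSE={val_errors[lam]:.4f}")
print("chosen lambda:", best_lam)
```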

## Diagnosing High Bias or High Variance via Learning curves
* Plot training error and validation error against training set size.
* Training error generally increases with training set size, since a small number of examples is easy to fit.
* Validation error generally decreases as the training set grows, since the more training data we have, the better the model generalizes.


* If the model suffers from **high bias**, validation error stops decreasing beyond a certain point even as the training set grows.
* Similarly, training error increases and then levels off, ending up very close to the validation error.
* The result is high training error and high validation error with very similar values.
* So for a model suffering from high bias, **getting more data is NOT going to help much**.


* If the model suffers from **high variance**, training error increases with training set size, but only slightly, since the model fits the training data well.
* Validation error decreases as the training set grows, but only slowly, since the model does not generalize well.
* The validation error stays significantly higher than the training error.
* In this case **getting more training data WILL help**, since the gap between the two curves keeps narrowing as data is added.
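
The two cases can be reproduced with a small experiment. The sketch below computes learning curves for a degree-1 polynomial (assumed here to stand in for a high-bias model) and a degree-9 polynomial (assumed to stand in for a high-variance model) on synthetic data. All names and data are illustrative, and the numbers are printed rather than plotted to keep the example dependency-free.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy sine samples (synthetic, for illustration only)."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + rng.normal(scale=0.2, size=n)
    return x, y

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

def learning_curve(x_train, y_train, x_val, y_val, degree, sizes):
    """Return (train_errors, val_errors), one entry per training-set size."""
    train_errors, val_errors = [], []
    for m in sizes:
        # Train on the first m examples only; always validate on the full validation set.
        coeffs = np.polyfit(x_train[:m], y_train[:m], deg=degree)
        train_errors.append(mse(coeffs, x_train[:m], y_train[:m]))
        val_errors.append(mse(coeffs, x_val, y_val))
    return train_errors, val_errors

x_train, y_train = make_data(400)
x_val, y_val = make_data(200)
sizes = [15, 25, 50, 100, 200, 400]

for name, degree in [("high bias (degree 1)", 1), ("high variance (degree 9)", 9)]:
    train_err, val_err = learning_curve(x_train, y_train, x_val, y_val, degree, sizes)
    print(name)
    for m, tr, va in zip(sizes, train_err, val_err):
        print(f"  m={m:4d}  train={tr:.3f}  val={va:.3f}  gap={va - tr:.3f}")
```

For the degree-1 model the two errors should settle close together at a high value, while for the degree-9 model the gap should be large for small `m` and shrink as `m` grows.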

## What to do if a given ML model performs poorly?
* **Collect More Data**: fixes **HIGH VARIANCE**
* **Try fewer features**: fixes **HIGH VARIANCE**
* **Get more features**: fixes **HIGH BIAS**
* **Add new features from existing data**: fixes **HIGH BIAS**
* **Decrease regularization**: fixes **HIGH BIAS**
* **Increase regularization**: fixes **HIGH VARIANCE**

## Prioritizing what to work on to improve the algorithm
* Get more data
* Build better features
* Build algorithms to solve subproblems

### Error Analysis
* Implement a simple algorithm first.
* Plot learning curves to diagnose high bias or high variance.
* Error analysis: manually inspect the examples the algorithm misclassified and look for patterns in them.
* Categorize those examples along various dimensions to see where the algorithm makes the most mistakes.
* Work out which features would help classify those examples correctly.

### Single evaluation numerical metric
* Helps to evaluate ideas quickly
* Evaluate the algorithm's performance with and without the idea and see whether the metric improves.

## Handling skewed classes
* Accuracy alone is misleading when the classes are skewed, since we can just always predict the majority class and get very high accuracy.
* Precision and recall are better metrics in this case.
* Since we want a single-number evaluation metric, we can combine them into the F1 score.
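
To make the point concrete, the sketch below computes precision, recall and F1 from scratch on a skewed toy label set. The function name, the toy labels and the convention of returning 0 when a denominator is zero are assumptions for this example.

```python
import numpy as np

def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive class)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))     # predicted positive, actually positive
    fp = int(np.sum(~y_true & y_pred))    # predicted positive, actually negative
    fn = int(np.sum(y_true & ~y_pred))    # predicted negative, actually positive
    # Assumed convention: a metric with a zero denominator is reported as 0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Skewed toy labels: 95 negatives followed by 5 positives.
y_true = np.array([0] * 95 + [1] * 5)

# A "classifier" that always predicts the majority class.
always_negative = np.zeros(100, dtype=int)
print(float(np.mean(always_negative == y_true)))      # accuracy 0.95
print(precision_recall_f1(y_true, always_negative))   # (0.0, 0.0, 0.0)

# A classifier that finds 4 of the 5 positives and raises 2 false alarms.
some_model = always_negative.copy()
some_model[[0, 1, 95, 96, 97, 98]] = 1
print(precision_recall_f1(y_true, some_model))        # precision 4/6, recall 4/5, F1 8/11
```

The always-negative predictor reaches 95% accuracy but an F1 score of 0, which is why accuracy alone hides the problem.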

Note: use StatQuest channel to expand more on precision recall f1-score even in multi-class settings

## Tradeoff between precision and recall
* If the model predicts positive only when it is very sure (strict criterion, e.g. a high decision threshold), it predicts negative more often. Such a model has high precision and low recall.
* If the model predicts positive readily (relaxed criterion, e.g. a low decision threshold), it predicts positive more often. Such a model has high recall and low precision.
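
The tradeoff can be seen by sweeping a decision threshold over predicted scores. The labels, scores and helper name below are made up for illustration, and thresholding a score is an assumed way of expressing the strict versus relaxed criterion.

```python
import numpy as np

def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting positive for scores >= threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(scores) >= threshold
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical labels and predicted probabilities of the positive class.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

for threshold in (0.3, 0.5, 0.85):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.30  precision=0.57  recall=1.00
# threshold=0.50  precision=0.75  recall=0.75
# threshold=0.85  precision=1.00  recall=0.25
```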

## Large data in ML
* The more data we give a learning algorithm, the better the results generally are.
* With a large training set, the choice of algorithm matters less, as long as it has a sufficient number of parameters.

* Use an algorithm that has many parameters. This leads to low training error and takes care of high bias.
* Provide a large training set, which makes overfitting unlikely.
* Test error then ends up very close to training error, since the number of training examples is much larger than the number of features. Hence the model has low variance.

## Learning with large datasets

* Classical (batch) gradient descent is very slow on large datasets, since every update averages the gradient over all examples.
* In such cases it is better to use **stochastic gradient descent**, which updates the parameters using one example at a time.
* Another alternative is **minibatch gradient descent**, which uses only a small subset of examples for each update.
* Minibatch gradient descent is faster than batch gradient descent.
* Minibatch gradient descent can be faster than stochastic gradient descent if we use vectorized operations.
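
The three variants differ only in how many examples feed each update, so a single function with a `batch_size` argument can sketch all of them. This is a minimal illustration for linear regression with squared error. The function name, learning rate, epoch count and synthetic data are assumptions, not values from the lectures.

```python
import numpy as np

def minibatch_gradient_descent(X, y, batch_size, learning_rate=0.1, epochs=20, seed=0):
    """Linear regression by gradient descent on the squared-error cost.

    batch_size == len(X) gives batch gradient descent,
    batch_size == 1 gives stochastic gradient descent.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(epochs):
        order = rng.permutation(m)                  # reshuffle every epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            X_b, y_b = X[idx], y[idx]
            # Vectorized gradient of (1 / (2 * len(idx))) * sum((X_b @ w - y_b)^2).
            grad = X_b.T @ (X_b @ w - y_b) / len(idx)
            w -= learning_rate * grad
    return w

# Synthetic data (assumed): a bias column plus two features, known true weights.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(1000), rng.normal(size=(1000, 2))])
true_w = np.array([0.5, 2.0, -1.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)

for batch_size in (1000, 32, 1):
    w = minibatch_gradient_descent(X, y, batch_size)
    print(f"batch_size={batch_size:4d}  max |w - true_w| = {np.max(np.abs(w - true_w)):.4f}")
```

With the same number of passes over the data, the batch variant makes only one update per epoch, whereas the minibatch variant makes many cheap vectorized updates, which is where its speed advantage comes from.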