# Bias Variance Analysis
**Most avoidable errors in machine learning are a result of high bias or high variance.**

In this notebook, we'll learn how to identify when a model has high bias (underfitting) or high variance (overfitting), and how to optimise the model for the best bias-variance trade-off. Here is an overview of what will be covered:

1. The Bias-Variance Trade-Off
2. Diagnosing for Bias/Variance: Learning Curves, Loss Curves, and Validation Curves
3. Optimising for Bias/Variance: Model Hyperparameters, Model/Workflow Adjustments
4. Bonus 1: Baseline Models (Incremental Optimisation)
5. Bonus 2: Handling Imbalanced Datasets

## The Bias Variance Trade-Off

## Diagnosing for Bias/Variance

## Optimising for Bias/Variance
Now that you have evaluated your model for bias and variance, you need to know what actions to take to reduce bias (fix underfitting) or reduce variance (fix overfitting). There are two ways to approach this:
1. Model hyperparameter tuning - you should understand which hyperparameters adjust for bias and/or variance
2. Model/Workflow adjustments - you should understand how different steps in your modeling workflow affect bias and variance

### Using Hyperparameter Tuning
This section is more of a revision of what you covered when learning different machine learning algorithms. For each of the following algorithms, identify which hyperparameters reduce bias (adding model complexity) or reduce variance (regularisation). 

Below is an activity with a selection of common machine learning algorithms. Can you identify the key hyperparameters for reducing bias or reducing variance? Open the Scikit Learn documentation for each algorithm, go through each hyperparameter, and classify it in the table below:

| ML Algorithms | How to reduce bias (Add complexity) | How to reduce variance (Regularise) |
|---|---|---|
| K-Nearest Neighbours | **Reduce K** (`n_neighbors`) - the number of nearest neighbours, **Change weights** - give closer neighbours higher weights (`weights="distance"`), Use a **distance metric** that emphasises closer neighbours - larger `p` in the Minkowski metric, e.g. Euclidean (`p=2`) | Increase K, Use uniform weights (`weights="uniform"`), Use a distance metric that de-emphasises closer neighbours - smaller `p`, e.g. Manhattan distance (`p=1`) |
| Linear/Logistic Regression | Add polynomial features - a higher `degree` gives lower bias and higher variance, Other functional transformations - more complex ways to transform the data before regression | Add regularisation - increase the regularisation strength when using Lasso (L1), Ridge (L2), or Elastic Net (L1 + L2) regularisation (larger `alpha` for `Ridge`/`Lasso`/`ElasticNet`, smaller `C` for `LogisticRegression`) |
| Support Vector Machines |  | |
| Decision Trees | | |
| Random Forest | | |
| Gradient Boosting |  |  |
| Neural Networks |  |  |

Note that before using any new machine learning algorithm, it is essential to understand how it works, what hyperparameters it has, and what each hyperparameter adjusts: bias, variance, or other algorithmic settings (e.g. learning rate and maximum iterations in gradient descent based algorithms).
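
To make the K-Nearest Neighbours row of the table concrete, here is a minimal sketch that sweeps `n_neighbors` and compares training and validation accuracy. The synthetic `make_moons` dataset, the split, and the three values of K are assumptions chosen purely for illustration; they are not part of the activity above.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, noisy two-class data (an assumption made for illustration only)
X, y = make_moons(n_samples=400, noise=0.35, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Map each K to its (training accuracy, validation accuracy)
results = {}
for k in (1, 15, 150):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    results[k] = (knn.score(X_train, y_train), knn.score(X_val, y_val))
    print(f"K={k:>3}  train={results[k][0]:.2f}  validation={results[k][1]:.2f}")
```

A large gap between training and validation accuracy (typical for very small K) points to high variance, while low accuracy on both sets (typical for very large K) points to high bias. The same kind of sweep should work for the other algorithms in the table once you have identified their key hyperparameters.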

### Adjusting Models or Workflows
Outside the model hyperparameters, here is how other steps of your machine learning workflow affect bias and variance:
1. **Data:** Collecting more data is the most reliable way to reduce variance without increasing bias. More data helps the model generalise better (less variance), and it also makes it safer to fit a more complex model that can identify more complex patterns in the dataset (less bias).
2. **Features:** Adding features adds model complexity (reducing bias), while removing features reduces variance. There are numerous techniques for adjusting features, ranging from simple methods such as intuitive feature selection and collecting data for more features, to more advanced methods such as feature selection through feature importance scores, dimensionality reduction algorithms (Principal Component Analysis & Manifold Learning), and domain-anchored engineering of new features.
3. **Models:** You can always try different models to identify the one that performs best on your dataset. It is recommended to start with simple models and move to more complex ones as you seek to reduce bias, although trial and error still works. Your understanding of each model should tell you which ones can identify more complex data patterns. Examples: Gradient Boosting models tend to be more complex than a simple Linear Regression, and a Multilayer Perceptron is definitely more complex than a Perceptron or a Logistic Regression.
4. **Epochs**: In neural networks and other gradient descent algorithms, the more epochs/iterations we train for, the lower the bias and the higher the variance, as we saw when covering loss curves above.
5. **Objective/Cost Functions**: In some cases it is possible to adjust the objective function of your model. For example, in regression, fitting a model on mean squared error (MSE) penalises larger errors more heavily and therefore yields higher variance and lower bias. If the model is fit using mean absolute error (MAE) instead, there is less penalty on larger errors, hence higher bias and lower variance. You could also opt for **Huber Loss (Hybrid of MSE and MAE)** to balance bias and variance. In classification, you could opt for cross-entropy/log loss, where confident misclassifications are penalised more heavily, yielding higher variance and lower bias. Below are some additional things to note:
    - Not all models allow you to adjust the cost function but some do. For example, in Scikit Learn:
        - For linear models, `LinearRegression` is fixed to MSE but `SGDRegressor` allows you to choose the `loss` function
        - In decision trees, you can choose the `criterion` as `gini` or `entropy`
        - Gradient boosting models also support different `loss` parameters
        - When studying models, it is important to understand if the objective function can be adjusted
    - You can also use the `sample_weight` parameter, supported by most Scikit Learn models, to indirectly adjust the loss function. In the example below, we have doubled the penalty for misclassifying the positive class in our Logistic Regression
        ``` python
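        from sklearn.linear_model import LogisticRegression
        # Assumes X_train and y_train exist; errors on the positive class (y == 1) count double
        clf = LogisticRegression()
        weights = [2 if y == 1 else 1 for y in y_train]
        clf.fit(X_train, y_train, sample_weight=weights)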
        ```
    - Outside vanilla Scikit Learn, some libraries give you more control over the cost function. These include `scipy.optimize`, `XGBoost`, `LightGBM`, or even a custom estimator built on `sklearn.base.BaseEstimator`
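
The effect of the objective function described in point 5 can be sketched with a small experiment. The example below fits the same data with a squared-error objective (`LinearRegression`) and with a Huber objective. Using `HuberRegressor` here is one convenient choice and is an assumption of this sketch, as are the synthetic data and the outlier pattern.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 1, size=200)

# Corrupt the targets for the largest x values with big positive outliers
outliers = X[:, 0] > 9
y[outliers] += 40

# Squared-error objective: large residuals dominate the fit
ols = LinearRegression().fit(X, y)
# Huber objective: quadratic for small residuals, linear for large ones
huber = HuberRegressor().fit(X, y)

print(f"true slope = 3.00, squared error slope = {ols.coef_[0]:.2f}, "
      f"Huber slope = {huber.coef_[0]:.2f}")
```

Because the squared-error fit chases the corrupted points, its slope moves further from the true value than the Huber fit does. This is the sensitivity to large errors that the MSE versus MAE/Huber discussion above refers to.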

## Baseline Models (Incremental Optimisation)

## Imbalanced Datasets (Classification Problems)