# ML Tutorial Day 19

## Bias vs Variance

The concepts of bias and variance describe the two ways a machine learning model can fit the data badly: by underfitting it or by overfitting it.

![image.png](attachment:image.png)

### High Variance
Suppose we randomly split the data into training and testing datasets, and then repeat this with a second random split. If the model has (near) zero error on both training sets but extremely different errors on the corresponding testing sets, we say that the model has high variance. Such a model overfits the available training data while performing poorly on the testing data.

![image-2.png](attachment:image-2.png)

### High Bias
Now suppose we use an overly simplistic model that doesn't capture the structure of the data, so that we observe large errors even on the training data. We then say that the model has high bias. Here, even for two different random splits into training and testing datasets, the model shows comparable (and high) training and testing errors.

![image-3.png](attachment:image-3.png)

The bottom line is that a model with high variance performs extremely well on the training set but poorly on the testing set (overfitting). When talking about variance, we focus on the testing performance of the model.
A model with high bias performs poorly on both the training set and the testing set (underfitting). When talking about bias, we focus on the training performance of the model.

### Ideal Case: Low Variance and Low Bias
The ideal case is a model that fits the training data sufficiently well while also performing well on the testing data. Such a model has low variance because the testing error doesn't vary greatly when a different random set of data points is used to test it, and low bias because the error on the training dataset is also small.

![image-4.png](attachment:image-4.png)

### Variance
Definition: The error from a model's high sensitivity to fluctuations and noise in the training data. 

Result: Overfitting

Characteristics:
1. A model with high variance is too complex and sensitive to the training data. 
2. It fits the training data too closely, including the random noise. 
3. It performs very well on the training data but poorly on new, unseen data because it cannot generalize. 

Example: A very deep decision tree that "memorizes" the training data.
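
The memorization effect can be reproduced with a small sketch. Note the assumptions: the data is a made-up noisy sine curve, and a 1-nearest-neighbour regressor written in NumPy stands in for the very deep decision tree, because it memorizes the training set in the same way. The helper names (`make_data`, `predict_1nn`, `mse`) are hypothetical and only serve this illustration.

```python
import numpy as np

def make_data(n, rng):
    """Noisy samples of y = sin(2*pi*x) on [0, 1] (toy data, an assumption)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
    return x, y

def predict_1nn(x_train, y_train, x_query):
    """For each query point, return the target of the closest training point."""
    nearest = np.abs(x_query[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(0)
x_test, y_test = make_data(200, rng)

train_errors, test_errors = [], []
for _ in range(2):  # two different random training sets
    x_tr, y_tr = make_data(30, rng)
    train_errors.append(mse(y_tr, predict_1nn(x_tr, y_tr, x_tr)))
    test_errors.append(mse(y_test, predict_1nn(x_tr, y_tr, x_test)))

print("train MSE:", train_errors)  # zero: every training point is its own neighbour
print("test MSE: ", test_errors)   # above zero, and it changes with the training set
```

The training error is zero for both training sets because each training point is its own nearest neighbour, while the test error depends on which noisy points happened to be memorized.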

### Bias
Definition: The error from a model making overly simplistic assumptions, failing to capture the true relationship in the data. 

Result: Underfitting.

Characteristics:
1. A model with high bias is too simple. 
2. It makes systematic errors because it makes strong assumptions. 
3. It performs poorly on both training and test data. 

Example: Using a straight line to predict a non-linear relationship. 
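
A minimal sketch of this example, assuming a made-up quadratic relationship `y = x**2` with some noise. It fits a straight line (degree 1) and, for comparison, a degree-2 polynomial using `np.polyfit`. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Noisy samples of the non-linear relationship y = x**2 (toy data)."""
    x = rng.uniform(-3.0, 3.0, size=n)
    y = x ** 2 + rng.normal(0.0, 0.5, size=n)
    return x, y

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

x_train, y_train = make_data(200)
x_test, y_test = make_data(200)

results = {}
for degree in (1, 2):  # 1 = straight line, 2 = same shape as the true curve
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (
        mse(y_train, np.polyval(coeffs, x_train)),  # training error
        mse(y_test, np.polyval(coeffs, x_test)),    # testing error
    )
    print(f"degree {degree}: train MSE={results[degree][0]:.2f}, "
          f"test MSE={results[degree][1]:.2f}")
```

The straight line should show a large error on both the training and the testing set, and the two errors should be of similar size. That combination is the signature of high bias: the model is wrong in the same systematic way everywhere.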

## Bull's eye diagram for bias and variance
The following diagram explains the concept of bias and variance succinctly.
The central circle in each board is the true value and the diamonds are the predictions.

![image.png](attachment:image.png)

1. Low Bias-Low Variance: The predictions are centered on the truth (low bias) and they are tightly clustered together (low variance).
2. Low Bias-High Variance: The predictions are centered on the truth on average (low bias) but they are widely spread out around it (high variance).
3. High Bias-Low Variance: The predictions are far from the truth (high bias) but they are tightly clustered together (low variance).
4. High Bias-High Variance: The predictions are far from the truth on average (high bias) and they are also widely spread out (high variance).
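
The boards can be imitated numerically. In the sketch below, each "diamond" is the prediction at one fixed point `x0` from a model refit on a fresh noisy training set. The sine-shaped truth, the noise level, and the choice of polynomial degrees 1 and 5 are all assumptions made for illustration. The squared distance between the average prediction and the truth estimates the squared bias, and the spread of the predictions estimates the variance.

```python
import numpy as np

rng = np.random.default_rng(42)

def true_f(x):
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(0.0, 1.0, 20)  # fixed inputs; only the noise changes
x0 = 0.25                           # the "bull's eye" is true_f(x0)
n_repeats = 500

def simulate(degree):
    """Refit on many noisy training sets and collect the predictions at x0."""
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        y = true_f(x_grid) + rng.normal(0.0, 0.3, size=x_grid.size)
        preds[i] = np.polyval(np.polyfit(x_grid, y, deg=degree), x0)
    bias_sq = float((preds.mean() - true_f(x0)) ** 2)  # off-center distance
    variance = float(preds.var())                      # spread of the diamonds
    return bias_sq, variance

bias_sq_simple, var_simple = simulate(degree=1)
bias_sq_flexible, var_flexible = simulate(degree=5)

print(f"degree 1: bias^2={bias_sq_simple:.4f}, variance={var_simple:.4f}")
print(f"degree 5: bias^2={bias_sq_flexible:.4f}, variance={var_flexible:.4f}")
```

Under these assumptions the straight line behaves like the High Bias-Low Variance board, while the degree-5 polynomial lands much closer to the truth on average but scatters more from one training set to the next.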

## How to get a balanced fit model?
We can use several techniques to make sure that the model we are using has low bias and low variance.
1. Cross Validation (K-Fold): We divide the entire dataset into `k` smaller datasets (folds), train the model on `k-1` of them, and test it on the remaining `1`. We repeat this `k` times so that each fold is used for testing exactly once, recording the score on each iteration. The final score of the model is the average of all `k` scores. This does not change the model itself, but it gives a more reliable estimate of how well it generalizes, which lets us detect overfitting (high variance) and choose a model that avoids it.
2. Regularization (L1 and L2): We add a penalty term to the error that deters the model from using large coefficients (weights) for the features, thus making the model simpler and less sensitive to noise in the training data (lower variance, usually at the cost of slightly higher bias).
3. Dimensionality Reduction (PCA): We create new features from the principal components of our dataset and keep only those that capture most of the information (variance) present in the dataset, thus reducing the dimensions of the dataset and making the model simpler (lower variance).
4. Ensemble Techniques (Bagging and Boosting): In ensemble learning, we train multiple models and combine their predictions, for example by averaging or voting. Bagging trains each model on a random (bootstrap) sample of the dataset and mainly reduces variance. Boosting trains the models one after another, each focusing on the errors of the previous ones, and mainly reduces bias.
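
The first two techniques can be combined in a short NumPy sketch: K-fold cross validation is used to score a degree-9 polynomial model with and without an L2 penalty. Everything here is an assumption made for illustration: the toy sine data, the polynomial degree, the penalty strengths, and the helper names `poly_features`, `fit_ridge`, and `k_fold_scores`. In practice a library such as scikit-learn would normally handle both steps.

```python
import numpy as np

def poly_features(x, degree):
    """Columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ridge(X, y, lam):
    """Closed-form L2-regularized least squares; the intercept is not penalized."""
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

def k_fold_scores(x, y, degree, lam, k, rng):
    """Return the validation MSE of each of the k folds."""
    folds = np.array_split(rng.permutation(len(x)), k)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = fit_ridge(poly_features(x[train_idx], degree), y[train_idx], lam)
        pred = poly_features(x[val_idx], degree) @ w
        scores.append(float(np.mean((y[val_idx] - pred) ** 2)))
    return scores

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.3, size=60)

for lam in (0.0, 0.1):
    # Re-seeding the fold generator gives both settings the same folds.
    scores = k_fold_scores(x, y, degree=9, lam=lam, k=5,
                           rng=np.random.default_rng(1))
    print(f"lambda={lam}: mean CV MSE = {np.mean(scores):.3f}")
```

Comparing the averaged scores for different values of `lam` is one way to pick a penalty strength. Which value wins depends on the data, so the sketch prints the scores rather than assuming a result.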