___

<a href='https://github.com/ai-vithink'> <img src='https://avatars1.githubusercontent.com/u/41588940?s=200&v=4' /></a>
___

# 16. Bias-Variance Trade-Off

In [1]:
from IPython.display import HTML
HTML('''<script>
code_show_err=false; 
function code_toggle_err() {
 if (code_show_err){
 $('div.output_stderr').hide();
 } else {
 $('div.output_stderr').show();
 }
 code_show_err = !code_show_err
} 
$( document ).ready(code_toggle_err);
</script>
To toggle on/off output_stderr, click <a href="javascript:code_toggle_err()">here</a>.''')
# To hide warnings, which won't change the desired outcome.

In [4]:
%%HTML
<style type="text/css">
table.dataframe td, table.dataframe th {
    border: 3px  black solid !important;
  color: black !important;
}
</style>
<!-- Adds gridlines to rendered DataFrame tables. -->

In [5]:
import warnings
warnings.filterwarnings("ignore")

## Reading Material

- The bias-variance trade-off is fundamental to understanding your model's performance.

- Review Chapter 2 of ISLR for a more in-depth look at this topic.

### What is the Bias-Variance Trade-off?

- The bias-variance trade-off marks the point beyond which adding model complexity only fits noise.

- Past that point the training error keeps going down, as it must, but the test error starts to go up.

- Beyond this point the model begins to overfit.

- Imagine that the center of the target is a model that perfectly predicts the correct values. 

- As we move away from the bulls-eye, our predictions get worse and worse. 

![image.png](attachment:image.png)

- Imagine we can repeat our entire model building process to get a number of separate hits on the target. 
- Each hit represents an individual realization of our model, given the chance variability in the training data we gather. 
- Sometimes we will get a good distribution of training data so we predict very well and we are close to the bulls-eye, while sometimes our training data might be full of outliers or non-standard values resulting in poorer predictions.
- These different realizations result in a scatter of hits on the target.
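
The repeated "hits on the target" can be sketched as a small simulation. This is only an illustration under stated assumptions: the sine-shaped `true_function`, the noise level, the polynomial degrees, and the helper names `simulate_predictions` and `bias_and_variance` are all hypothetical and do not come from the notebook. Each refit on a fresh noisy training set gives one prediction at a fixed point `x0`; the average offset from the truth approximates the bias, and the spread of the hits approximates the variance.

```python
import numpy as np

def true_function(x):
    # Hypothetical ground truth, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def simulate_predictions(degree, n_repeats=200, n_train=30,
                         noise_sd=0.3, x0=0.25, seed=0):
    """Refit a polynomial on fresh noisy training sets and return the
    prediction at x0 from every fit (one "hit" per fit)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_train)  # same inputs every time
    predictions = np.empty(n_repeats)
    for i in range(n_repeats):
        # New chance variability in the training targets on each repeat.
        y = true_function(x) + rng.normal(0.0, noise_sd, n_train)
        coeffs = np.polyfit(x, y, degree)
        predictions[i] = np.polyval(coeffs, x0)
    return predictions

def bias_and_variance(predictions, target):
    """Average offset from the target (bias) and spread of the hits (variance)."""
    bias = predictions.mean() - target
    variance = predictions.var()
    return bias, variance

x0 = 0.25
target = true_function(x0)
for degree in (1, 7):
    hits = simulate_predictions(degree, x0=x0)
    bias, variance = bias_and_variance(hits, target)
    print(f"degree={degree}: bias={bias:+.3f}, variance={variance:.4f}")
```

Under these assumptions the straight line should land consistently away from the bulls-eye (large bias, tight scatter), while the degree-7 polynomial should center on it but scatter more widely.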

## Bias-Variance Trade-Off

- A common temptation for beginners is to continually add complexity to a model until it fits the training set very well.
- Suppose the training data are the red points and a simple model is the blue line. The simple model leaves some error on the training data, so beginners often make the model more complex, adjusting its parameters until it has almost no error on any training point. What we should care about is how the model predicts new values: a model that matches every training point this closely will fail on test values it has never seen, which is why we do train_test_split.
![image.png](attachment:image.png)


- Doing this can cause a model to overfit to your training data and cause large errors on new data, such as the test set.
- Let’s take a look at an example of how overfitting shows up from an error standpoint using test data.
- We’ll use a black curve with some “noise” points off of it to represent the true shape the data follows.

![image.png](attachment:image.png)

![image.png](attachment:image.png)
- The figures above show three plots. The first plots x against y with fits of increasing flexibility: a linear fit (yellow), a quadratic fit (blue), and a spline fit (green). The linear fit is the simplest and the spline fit is the most complex.

- The black curve is the true relationship the data follow; the points are just noise scattered around it.

- To evaluate our models and compare them, we plot the complexity (flexibility) of the model, for instance the polynomial degree of a regression fit, against an error metric such as mean squared error (MSE). The second plot does exactly this, for both the training data and the test data.

- In the second plot, the yellow points at the top show that the linear model has high error on both training and test data, so the linear fit is by far the worst.

- The blue quadratic fit has an MSE of around 1: slightly more than 1 on the test data and slightly less than 1 on the training data.

- The green spline fit has an MSE below 0.5 on the training data but around 1.5 on the test data. That gap between training and test MSE is much larger than for the quadratic fit, which is a sign of overfitting, so the quadratic fit is the best of the three.

- We have to find the point that balances bias and variance: a model flexible enough that its error is not high on both training and test data, but not so flexible that it overfits, so that new data can be predicted with sufficient accuracy.

- The third plot shows that the optimal point for the test data lies slightly beyond the point where the bias and variance curves intersect; this is where MSE is balanced, and it corresponds to the quadratic model.
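
The comparison of training and test MSE across model flexibility can be reproduced in a rough sketch. Everything specific here is an assumption made for illustration: the sine-shaped `true_curve` stands in for the black curve, polynomial degree stands in for flexibility, and a plain NumPy shuffle stands in for `train_test_split`. The numbers it prints will not match the plots above.

```python
import numpy as np

def true_curve(x):
    # Hypothetical stand-in for the "black curve"; the real one is not given.
    return np.sin(2 * np.pi * x)

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))

rng = np.random.default_rng(42)

# Noisy observations scattered around the true curve.
x = rng.uniform(0.0, 1.0, 230)
y = true_curve(x) + rng.normal(0.0, 0.3, x.size)

# Shuffle and hold out a test set (a NumPy stand-in for train_test_split).
order = rng.permutation(x.size)
train_idx, test_idx = order[:30], order[30:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]

# Fit polynomials of increasing degree and record both errors.
train_mse, test_mse = {}, {}
for degree in range(1, 10):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse[degree] = mse(y_train, np.polyval(coeffs, x_train))
    test_mse[degree] = mse(y_test, np.polyval(coeffs, x_test))
    print(f"degree={degree}: train MSE={train_mse[degree]:.3f}, "
          f"test MSE={test_mse[degree]:.3f}")

# The degree with the lowest held-out error is the one to prefer.
best_degree = min(test_mse, key=test_mse.get)
print("lowest test MSE at degree", best_degree)
```

Training MSE can only fall as the degree rises, because each higher-degree polynomial contains the lower-degree ones as special cases. Test MSE is the number to watch when choosing the degree.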

## Classic Representation of High and Low Bias and Variance
![image.png](attachment:image.png)

- Moving left, to a lower-complexity model, gives higher bias but lower variance; moving right, to a higher-complexity model, gives lower bias but higher variance.

- We pick a point where we are sufficiently satisfied with the bias-variance trade-off. To the left of this point the model starts to underfit; to the right it starts to overfit, meaning it hits all the training points but gives high error on test data.
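
The trade-off described above is usually written as a decomposition of the expected test MSE at a point. The notation is an assumption, since the notebook does not define any symbols: $x_0$ is a test input, $y_0$ its observed response, $\hat{f}$ the fitted model, and $\varepsilon$ the irreducible noise.

```latex
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \mathrm{Var}\left(\hat{f}(x_0)\right)
  + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2
  + \mathrm{Var}(\varepsilon)
```

Moving right on the complexity axis shrinks the squared-bias term and grows the variance term. The noise term stays fixed regardless of the model, so the test error is lowest where the sum of the first two terms is smallest.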