# Bias-Variance Trade-off for Model Evaluation

Review Chapter 2 of *An Introduction to Statistical Learning* (ISL) for a more in-depth look.

Bias refers to the error introduced by approximating a real-world problem with a simplified model. A model with high bias makes strong assumptions about the form of the underlying data and may consistently underpredict or overpredict the target variable across different samples of data. High-bias models are often too simple to capture the complexity of the true relationship between the features and the target variable. Examples of high-bias models include linear regression with few features or a low polynomial degree.
- Example 1: Underfitting. This occurs when the model is too simple to capture the underlying structure of the data: it fails to learn the patterns present in the training data and performs poorly on both the training data and unseen data.
    - We commonly see this when linear regression is fit to data with a non-linear relationship between the features and the target variable.
- Example 2: Inadequate complexity. The model may not have enough capacity to represent the complexity of the data, so it misses important patterns and produces biased predictions.
- Example 3: Assumptions. Models often make assumptions about the data distribution or the relationship between variables; if these assumptions are incorrect, the model's predictions will be biased.
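As a minimal illustration of underfitting (a sketch with hypothetical toy data, not taken from ISL), the snippet below fits a straight line to points that actually follow $y = x^2$. Because a line cannot bend, its training error stays high and the residuals follow a systematic pattern instead of random scatter, which is the signature of bias.

```python
# Hypothetical toy data: the true relationship is y = x**2 (no noise),
# but we fit a straight line y = intercept + slope * x to it.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (intercept, slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
train_mse = sum(r ** 2 for r in residuals) / len(xs)

print(f"intercept={intercept}, slope={slope}")  # intercept=4.0, slope=0.0
print(f"training MSE={train_mse}")              # training MSE=12.0
print("residuals:", residuals)                  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

With these seven symmetric points the fitted line is flat and the training MSE is 12.0: the model is wrong in the same direction at the ends (too low) and in the middle (too high). Adding an $x^2$ term would remove that error entirely.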

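The bias-variance trade-off describes what happens as we add model complexity (flexibility): bias goes down, but variance goes up. Past a certain point, the extra flexibility only lets the model fit noise.
- The training error keeps going down, as it has to (a more flexible model can always fit the training data at least as well), but the test error starts to go up.
- Beyond that point, the model begins to overfit.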
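Chapter 2 of ISL makes this precise. For a test point $x_0$ with $y_0 = f(x_0) + \varepsilon$, where the noise $\varepsilon$ has mean zero and is independent of the training data, the expected test MSE of a fitted model $\hat f$ splits into three parts:

$$
E\left[\big(y_0 - \hat f(x_0)\big)^2\right]
= \operatorname{Var}\big(\hat f(x_0)\big)
+ \left[\operatorname{Bias}\big(\hat f(x_0)\big)\right]^2
+ \operatorname{Var}(\varepsilon),
\qquad
\operatorname{Bias}\big(\hat f(x_0)\big) = E\big[\hat f(x_0)\big] - f(x_0)
$$

The expectation is taken over repeated training sets (and the noise in $y_0$). Adding flexibility shrinks the squared bias but grows the variance, and $\operatorname{Var}(\varepsilon)$ cannot be reduced by any model, which is why the test error traces a U shape while the training error keeps falling.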

## What is Bias? 

Imagine a dartboard: the bullseye at the center represents a model that predicts the true values perfectly, and the farther our hits land from the bullseye, the worse our predictions are.


Bias is about accuracy: is the crosshair dead center on the target, or are the hits systematically off to one side?

Variance is about spread: how much the individual hits deviate from one another.

Ideally, we want both low bias and low variance, but usually there is a trade-off between the two.

![44444.PNG](attachment:44444.PNG)
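To make the dartboard picture concrete, here is a small simulation sketch. The two "models", their aim points, and their spreads are made-up assumptions. It treats bias as the distance from the average hit to the bullseye, and variance as the average squared distance of the hits from their own average (summed over both axes).

```python
import math
import random

def throw_darts(aim_x, aim_y, spread, n, rng):
    """Simulate n hits scattered (Gaussian) around an aim point."""
    return [(rng.gauss(aim_x, spread), rng.gauss(aim_y, spread)) for _ in range(n)]

def bias_and_variance(hits, target=(0.0, 0.0)):
    """Bias: distance from the average hit to the bullseye.
    Variance: average squared distance of the hits from their own average."""
    n = len(hits)
    mean_x = sum(x for x, _ in hits) / n
    mean_y = sum(y for _, y in hits) / n
    bias = math.dist((mean_x, mean_y), target)
    variance = sum((x - mean_x) ** 2 + (y - mean_y) ** 2 for x, y in hits) / n
    return bias, variance

rng = random.Random(0)
# Hypothetical "models": one aims off-center but tightly, one aims at the center but scatters.
high_bias_low_var = throw_darts(2.0, 0.0, spread=0.3, n=500, rng=rng)
low_bias_high_var = throw_darts(0.0, 0.0, spread=1.5, n=500, rng=rng)

stats = {}
for name, hits in [("high bias / low variance", high_bias_low_var),
                   ("low bias / high variance", low_bias_high_var)]:
    stats[name] = bias_and_variance(hits)
    print(f"{name}: bias={stats[name][0]:.2f}, variance={stats[name][1]:.2f}")
```

The first thrower's hits cluster tightly about 2 units from the bullseye; the second's are centered on it but spread widely. Neither is ideal, which is the trade-off the figure shows.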



Imagine we could repeat our entire model-building process to get a number of separate hits on the target.
- Each hit represents an individual realization of our model, given the chance variability in the training data we gather.
- Sometimes we get a representative sample of training data, so we predict very well and land close to the bullseye; other times our training data might be full of outliers or unusual values, resulting in poorer predictions.
- These different realizations produce a scatter of hits on the target.
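This "repeat the model-building process" thought experiment can be simulated directly. The sketch assumes a hypothetical true curve $f(x) = \sin(3x)$, 20 fixed training inputs on $[-1, 1]$, Gaussian noise with standard deviation 0.3, and polynomial fits of degree 1, 3, and 9. The small `solve`/`polyfit`/`polyval` helpers are hypothetical, written out only to keep the example dependency-free (in practice you would likely use NumPy or scikit-learn). For each degree it retrains on 300 freshly drawn training sets and measures the spread of predictions at $x_0 = 0.5$.

```python
import math
import random

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (constant term first) via the normal equations."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)

def polyval(coefs, x):
    return sum(c * x ** i for i, c in enumerate(coefs))

def true_f(x):
    return math.sin(3 * x)  # hypothetical "truth" (the bullseye)

rng = random.Random(1)
xs = [-1 + 2 * i / 19 for i in range(20)]  # fixed training inputs
x0, noise_sd, n_repeats = 0.5, 0.3, 300

results = {}
for degree in (1, 3, 9):
    preds = []
    for _ in range(n_repeats):  # rebuild the model on a fresh training set
        ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
        preds.append(polyval(polyfit(xs, ys, degree), x0))
    mean_pred = sum(preds) / n_repeats
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_repeats
    results[degree] = (bias_sq, variance)
    print(f"degree {degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")
```

You should see the straight line's squared bias dominate while its variance stays small, and the degree-9 fit show the reverse: its hits are centered near the bullseye on average but scatter more from one training set to the next.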

**It is possible (in some cases) for a high-bias model to be very accurate.**
- A high-bias model makes strong assumptions about the data. However, if the true underlying relationship between the features and the target variable is simple enough to match those assumptions, the model can still be accurate (on that particular problem its actual bias is small).
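A quick sketch of that case, assuming hypothetical data whose true relationship really is linear ($y = 1 + 2x$ plus Gaussian noise with standard deviation 0.5): a plain straight-line fit, the classic "high-bias" model, recovers the slope and intercept closely, and its test MSE lands near the irreducible noise variance of 0.25.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

rng = random.Random(42)
noise_sd = 0.5

def sample(n):
    """Hypothetical data whose true relationship really is linear: y = 1 + 2x + noise."""
    xs = [rng.uniform(-2, 2) for _ in range(n)]
    return xs, [1 + 2 * x + rng.gauss(0, noise_sd) for x in xs]

train_x, train_y = sample(50)
test_x, test_y = sample(1000)
intercept, slope = fit_line(train_x, train_y)
test_mse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(test_x, test_y)) / len(test_x)

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print(f"test MSE={test_mse:.3f} (irreducible noise variance = {noise_sd ** 2:.3f})")
```

Because the model's assumption (a straight line) matches the truth, there is almost nothing left to gain from extra flexibility here; a more complex model would mostly add variance.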

![5152151.PNG](attachment:5152151.PNG)

In this scenario, the red dots are the training data (the original data used to train the model). We could shape the model to pass perfectly through every red dot, but then it will fail to predict new test points well, which is why we use a train-test split.
- Shaping the model perfectly around the red dots causes it to overfit and produce large errors on new, unseen data (test data).
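The following sketch shows the "hit every red dot" extreme under assumed settings: 10 evenly spaced training points around a hypothetical true curve $\sin(3x)$, with noise standard deviation 0.5. It uses Lagrange interpolation, which builds the degree-9 polynomial that passes exactly through all 10 points, so the training error is zero while the error on 200 fresh test points is much larger.

```python
import math
import random

def lagrange_interpolate(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs) - 1 through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def true_f(x):
    return math.sin(3 * x)  # hypothetical true curve

rng = random.Random(7)
noise_sd = 0.5
train_x = [-1 + 2 * i / 9 for i in range(10)]  # 10 hypothetical "red dots"
train_y = [true_f(x) + rng.gauss(0, noise_sd) for x in train_x]
test_x = [rng.uniform(-1, 1) for _ in range(200)]
test_y = [true_f(x) + rng.gauss(0, noise_sd) for x in test_x]

def mse(xs, ys):
    """Mean squared error of the interpolating polynomial on the given points."""
    return sum((y - lagrange_interpolate(train_x, train_y, x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

train_mse, test_mse = mse(train_x, train_y), mse(test_x, test_y)
print(f"training MSE={train_mse:.6f}, test MSE={test_mse:.3f}")
```

The training MSE prints as `0.000000` because the interpolating polynomial reproduces every training target exactly, noise included; that memorized noise is what drives the test error up.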

To illustrate this, the next figure draws the true shape the data follow as a black curve, with the data points generated as "noise" scattered around it.

![4242424.PNG](attachment:4242424.PNG)

In the plot on the left: 
- we have linear, quadratic, and spline fits. The black curve is the true relationship the data follow (the truth), so all the points are just noise scattered around it.

In the middle visual: 
- to evaluate your models and compare their complexities, plot the complexity/flexibility of the model (for example, the polynomial degree of a regression fit) against an error metric such as MSE
- the training error is plotted alongside the test error:
    - for the linear model (yellow), the error is high on both the training and test data (MSE is above 1.5 for the training data and above 2 for the test data)
    - for the quadratic model (blue), the error drops for both the training and test data
    - for the spline model (green), the extra complexity lowers the training error significantly, but the test error rises sharply
- Conclusion: we want to balance bias and variance by choosing the flexibility at which the test error is lowest; a large gap, with the training error far below the test error, is a sign of overfitting
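The middle panel can be reproduced in miniature. This sketch assumes a hypothetical true curve $\sin(3x)$, 20 random training points, 500 test points, and noise standard deviation 0.3, and uses polynomial degree as the flexibility knob. The `solve`/`polyfit`/`polyval` helpers are hypothetical, dependency-free least-squares code written out so the block stands alone.

```python
import math
import random

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (constant term first) via the normal equations."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)

def polyval(coefs, x):
    return sum(c * x ** i for i, c in enumerate(coefs))

def mse(coefs, xs, ys):
    return sum((y - polyval(coefs, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(3)
noise_sd = 0.3

def sample(n):
    """Hypothetical data: noise scattered around a true curve sin(3x) on [-1, 1]."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return xs, [math.sin(3 * x) + rng.gauss(0, noise_sd) for x in xs]

train_x, train_y = sample(20)
test_x, test_y = sample(500)

train_errors, test_errors = {}, {}
for degree in range(1, 9):  # flexibility = polynomial degree
    coefs = polyfit(train_x, train_y, degree)
    train_errors[degree] = mse(coefs, train_x, train_y)
    test_errors[degree] = mse(coefs, test_x, test_y)
    print(f"degree {degree}: train MSE={train_errors[degree]:.3f}, "
          f"test MSE={test_errors[degree]:.3f}")

best = min(test_errors, key=test_errors.get)
print("lowest test MSE at degree", best)
```

Training MSE can only go down as the degree increases (each polynomial family contains the previous one), while test MSE typically falls and then rises; the degree with the lowest test MSE is the balance point the conclusion above refers to.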

![455225252525.PNG](attachment:455225252525.PNG)

- the x-axis shows model complexity, from low to high
- the y-axis shows prediction error, from low to high
- the more we overfit, the higher the prediction (test) error, even as the training error keeps falling
- the more we underfit, the higher both the prediction error and the training error


The best trade-off is at the minimum of the red curve (the dip in the test error curve). Everything to the left of that point is underfitting; everything to the right is overfitting. Where that balance should sit in practice also depends on:
- Problem Complexity: More complex problems may require models with higher variance to capture intricate patterns in the data. Conversely, simpler problems may be adequately addressed by models with lower variance and bias.
- Data Availability: The amount and quality of data available for training can influence the bias-variance trade-off. With limited data, simpler models with higher bias may be preferred to avoid overfitting.
- Model Interpretability: In some cases, interpretability is crucial, and simpler models with higher bias may be preferred even if they sacrifice some predictive performance.
- Computational Resources: More complex models with higher variance may require more computational resources for training and inference. Consideration should be given to the computational constraints of the problem.
- Risk Tolerance: Depending on the application, there may be different tolerances for errors. For example, in medical diagnosis, false positives and false negatives may have different consequences.

Conclusion: Finding the right balance often involves experimentation, validation, and iteration. Techniques such as cross-validation, regularization, ensemble methods, and hyperparameter tuning can help in achieving a suitable balance between bias and variance for a given problem. It's essential to continually evaluate and refine the model to ensure that it generalizes well to unseen data and meets the requirements of the problem at hand.
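As one example of the techniques listed above, here is a k-fold cross-validation sketch for choosing a polynomial degree without touching a separate test set. The data-generating curve, noise level, sample size, helper functions, and the simple interleaved fold split are all assumptions for illustration; scikit-learn provides production versions of this idea (for example `cross_val_score`).

```python
import math
import random

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (constant term first) via the normal equations."""
    k = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(a, b)

def polyval(coefs, x):
    return sum(c * x ** i for i, c in enumerate(coefs))

def k_fold_cv_mse(xs, ys, degree, k=5):
    """Average validation MSE over k folds (each point is held out exactly once)."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # simple interleaved split
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        tr_x = [xs[i] for i in range(n) if i not in held]
        tr_y = [ys[i] for i in range(n) if i not in held]
        coefs = polyfit(tr_x, tr_y, degree)
        total += sum((ys[i] - polyval(coefs, xs[i])) ** 2 for i in held_out)
    return total / n

rng = random.Random(5)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [math.sin(3 * x) + rng.gauss(0, 0.3) for x in xs]  # hypothetical noisy data

cv_errors = {d: k_fold_cv_mse(xs, ys, d) for d in range(1, 9)}
best = min(cv_errors, key=cv_errors.get)
for d, err in cv_errors.items():
    print(f"degree {d}: 5-fold CV MSE={err:.3f}")
print("chosen degree:", best)
```

Since the inputs are drawn at random, an interleaved split behaves like a random split here; with ordered data you would shuffle before assigning folds.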

# Underfitting:

- Model Complexity: If the model is too simple to capture the underlying patterns in the data, it may underfit.

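- Insufficient Data: If the amount of data available for training is limited or does not adequately represent the underlying distribution of the problem, the model may not generalize well to unseen data. If the sample misses important patterns altogether, this can lead to underfitting; with flexible models, limited data more often leads to overfitting instead.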

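- Feature Selection: If important features are left out of the model, it cannot capture their effect on the target, which leads to bias (underfitting). Including many irrelevant features, on the other hand, mainly adds variance and can contribute to overfitting.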
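A small sketch of the omitted-feature case, assuming hypothetical data where the target depends on two features ($y = x_1 + 2x_2$ plus small noise). Fitting with only $x_1$ leaves the effect of $x_2$ in the residuals as systematic error, so the training MSE stays high even with plenty of data; adding $x_2$ back brings it down to roughly the noise level. The `solve`/`least_squares` helpers are hypothetical, dependency-free code.

```python
import random

def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares(rows, ys):
    """Fit y ~ rows @ beta via the normal equations (each row includes a leading 1 for the intercept)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(a, b)

def train_mse(rows, ys, beta):
    return sum((y - sum(c * v for c, v in zip(beta, r))) ** 2 for r, y in zip(rows, ys)) / len(ys)

rng = random.Random(11)
x1 = [rng.uniform(-1, 1) for _ in range(200)]
x2 = [rng.uniform(-1, 1) for _ in range(200)]
# Hypothetical truth: the target depends on both features.
y = [a + 2 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]

rows_missing = [[1.0, a] for a in x1]            # important feature x2 left out
rows_full = [[1.0, a, b] for a, b in zip(x1, x2)]

mse_missing = train_mse(rows_missing, y, least_squares(rows_missing, y))
mse_full = train_mse(rows_full, y, least_squares(rows_full, y))
print(f"without x2: MSE={mse_missing:.3f}; with x2: MSE={mse_full:.3f}")
```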

# Overfitting:

- Data Noise: If the training data contains a lot of noise or errors, the model may learn from these inconsistencies, leading to overfitting. Conversely, if the noise is not properly accounted for, it may cause the model to underfit.

- Model Architecture: In the case of neural networks, the choice of architecture, including the number of layers, nodes per layer, and activation functions, can significantly impact the model's ability to capture complex patterns. If the architecture is not appropriately designed for the problem at hand, it may result in underfitting or overfitting.

- Hyperparameter Tuning: Improper tuning of hyperparameters such as learning rate, batch size, or number of epochs can lead to suboptimal model performance, resulting in underfitting or overfitting.

# Assumptions (can contribute to both underfitting and overfitting):

- Model Complexity and Assumptions: Models often make assumptions about the data distribution or the relationships between variables. If these assumptions are incorrect, the model's predictions will be biased, potentially leading to underfitting or overfitting depending on the nature of the mismatch between the assumptions and the true underlying relationships.