# Bias-Variance Tradeoff

If a model is too simple and has very few parameters, it tends to have high bias and low variance. If it has a large number of parameters, it tends to have high variance and low bias. The goal is to find a balance between the two that neither underfits nor overfits the data.

This tradeoff in complexity is why there is a tradeoff between bias and variance. An algorithm can’t be more complex and less complex at the same time.

### What is Bias?

Bias is the difference between the model's average prediction and the correct value it is trying to predict. A model with high bias has a large error on both the training data and the test data. An algorithm should be low-biased to avoid the problem of **underfitting**. With high bias, the predictions follow something like a straight line that does not fit the data in the data set accurately. Such a fit is known as underfitting. It happens when the hypothesis is too simple, for example linear when the underlying relationship is not. Refer to the graph given below for an example of such a situation.

![image.png](attachment:fd7e0724-df93-4ec4-a129-a9cfaab6aade.png)
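The following is a minimal sketch of underfitting, not part of the original notes. It assumes a made-up noisy sine target and uses NumPy's `polyfit` to fit a straight line. The names `make_data`, `line_mse`, and `NOISE_STD` are hypothetical. Both the training and the test error should come out far above the noise variance, which is the signature of high bias.

```python
import numpy as np

rng = np.random.default_rng(0)
NOISE_STD = 0.1  # assumed noise level for this toy example


def make_data(n):
    """Sample n noisy points from a sine curve (a made-up target)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, NOISE_STD, n)
    return x, y


x_train, y_train = make_data(50)
x_test, y_test = make_data(200)

# Straight-line hypothesis y = slope * x + intercept, fitted by least squares.
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def line_mse(x, y):
    """Mean squared error of the fitted line on (x, y)."""
    return float(np.mean((slope * x + intercept - y) ** 2))


train_mse = line_mse(x_train, y_train)
test_mse = line_mse(x_test, y_test)

print(f"noise variance: {NOISE_STD ** 2:.3f}")
print(f"train MSE:      {train_mse:.3f}")
print(f"test MSE:       {test_mse:.3f}")
```

Because the line cannot bend, collecting more training data would not help much here. The error comes from the hypothesis itself.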

### What is Variance?

Variance is the variability of the model's prediction for a given data point: it tells us how much the prediction spreads when the model is trained on different samples of the data. A model with high variance fits the training data with a very complex curve and therefore does not fit accurately on data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data. A model with high variance is said to be **overfitting** the data. Overfitting fits the training set accurately via a complex curve and a high-order hypothesis, but it is not a solution because the error on unseen data is high. When training a model, variance should be kept low. A high-variance fit looks as follows.

![image.png](attachment:df813d79-db50-4c2e-9d37-11b13e578bdc.png)
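The following is a small sketch of overfitting under assumed conditions: 12 evenly spaced training points from a made-up noisy sine target, fitted with a degree-11 polynomial via `numpy.polynomial.Polynomial.fit`. The target, noise level, and degree are illustrative choices, not taken from the original notes. With as many parameters as training points, the curve can pass through every training point, so the training error is essentially zero while the error on unseen points stays well above it.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
NOISE_STD = 0.2  # assumed noise level


def target(x):
    """Made-up ground-truth function for this example."""
    return np.sin(2 * np.pi * x)


# A small training set and a larger set of unseen points.
x_train = np.linspace(0.0, 1.0, 12)
y_train = target(x_train) + rng.normal(0.0, NOISE_STD, x_train.size)
x_test = rng.uniform(0.0, 1.0, 500)
y_test = target(x_test) + rng.normal(0.0, NOISE_STD, x_test.size)

# Degree 11 with 12 points: enough parameters to hit every training point.
model = Polynomial.fit(x_train, y_train, deg=11)

train_mse = float(np.mean((model(x_train) - y_train) ** 2))
test_mse = float(np.mean((model(x_test) - y_test) ** 2))

print(f"train MSE: {train_mse:.2e}")
print(f"test MSE:  {test_mse:.2e}")
```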

### Bias Variance Tradeoff
If the algorithm is too simple (a hypothesis with a linear equation), it may have high bias and low variance, and is therefore error-prone. If the algorithm is too complex (a hypothesis with a high-degree equation), it may have high variance and low bias, and will not perform well on new entries. Between these two conditions lies a balance known as the Bias-Variance Tradeoff. For the graph, the ideal tradeoff looks like this.

![image.png](attachment:821653af-0748-47ea-93a8-286b7f1e5bc9.png)

![image.png](attachment:d75a8a44-837e-4ad5-bf29-658ce8abd222.png)

We try to minimize the total error of the model by using the Bias-Variance Tradeoff.
\begin{equation}
    \text{Total\_Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible Error}
\end{equation}
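The decomposition above can be checked numerically. This is a rough simulation sketch, under assumptions that are not in the original notes: a made-up sine target, fixed training inputs with only the noise redrawn, and polynomial models of degree 1 and 7. For each degree it estimates the squared bias and the variance from repeated fits, then compares their sum plus the noise variance with the error measured on fresh noisy targets.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
NOISE_STD = 0.3   # assumed; its square is the irreducible error
N_RUNS = 200      # number of simulated training sets


def target(x):
    """Made-up ground-truth function for this example."""
    return np.sin(2 * np.pi * x)


x_train = np.linspace(0.0, 1.0, 30)   # fixed inputs; only the noise is redrawn
x_eval = np.linspace(0.0, 1.0, 100)   # points where predictions are compared


def decompose(degree):
    """Estimate (bias^2, variance, measured error) for one polynomial degree."""
    preds = np.empty((N_RUNS, x_eval.size))
    noisy_err = np.empty(N_RUNS)
    for run in range(N_RUNS):
        y_train = target(x_train) + rng.normal(0.0, NOISE_STD, x_train.size)
        preds[run] = Polynomial.fit(x_train, y_train, deg=degree)(x_eval)
        # Error against fresh noisy targets the model has never seen.
        y_new = target(x_eval) + rng.normal(0.0, NOISE_STD, x_eval.size)
        noisy_err[run] = np.mean((preds[run] - y_new) ** 2)
    bias_sq = float(np.mean((preds.mean(axis=0) - target(x_eval)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance, float(noisy_err.mean())


results = {degree: decompose(degree) for degree in (1, 7)}
for degree, (bias_sq, variance, measured) in results.items():
    predicted = bias_sq + variance + NOISE_STD ** 2
    print(f"degree {degree}: bias^2={bias_sq:.4f} variance={variance:.4f} "
          f"sum+noise={predicted:.4f} measured={measured:.4f}")
```

In this setup the straight line should show the larger squared bias and the degree-7 polynomial the larger variance, with the measured error close to the sum of the three terms in both cases.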

The best fit is given by the hypothesis at the tradeoff point. The graph of error against model complexity shows the tradeoff:

![image.png](attachment:de5e8ae8-19ee-4374-81f0-e7ffb7e96418.png)

This is the best point to choose when training the algorithm, because it gives low error on both the training data and the test data.
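One common way to locate such a point in practice is to sweep the model complexity and keep the setting with the lowest error on held-out data. The sketch below assumes polynomial degree as the complexity measure and a made-up noisy sine data set. The names `make_data` and `best_degree` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)


def make_data(n):
    """Sample n noisy points from a sine curve (a made-up target)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y


x_train, y_train = make_data(40)
x_val, y_val = make_data(200)   # held-out data, not used for fitting

degrees = list(range(1, 11))
train_mse, val_mse = [], []
for degree in degrees:
    model = Polynomial.fit(x_train, y_train, deg=degree, domain=[0.0, 1.0])
    train_mse.append(float(np.mean((model(x_train) - y_train) ** 2)))
    val_mse.append(float(np.mean((model(x_val) - y_val) ** 2)))

# The tradeoff point: the complexity with the lowest held-out error.
best_degree = degrees[int(np.argmin(val_mse))]

for degree, tr, va in zip(degrees, train_mse, val_mse):
    marker = "  <- lowest validation error" if degree == best_degree else ""
    print(f"degree {degree:2d}: train={tr:.4f}  val={va:.4f}{marker}")
```

The training error can only go down as the degree grows, so it cannot be used to choose the degree. The held-out error is what reveals where added complexity stops helping.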

<br>

## Handling Underfitting and Overfitting

### Handling Underfitting
To reduce underfitting, increase the complexity of the algorithm, that is, use a slightly more complex hypothesis.
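As an illustration, one way to make a linear hypothesis slightly more complex is to add a polynomial feature. The sketch below assumes a made-up quadratic target and compares a least-squares fit on the features `[1, x]` with one on `[1, x, x^2]`. The helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Sample n noisy points from y = x^2 (a made-up target)."""
    x = rng.uniform(-1.0, 1.0, n)
    y = x ** 2 + rng.normal(0.0, 0.05, n)
    return x, y


def features(x, degree):
    """Feature matrix with columns [1, x, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)


x_train, y_train = make_data(100)
x_test, y_test = make_data(100)


def fit_and_score(degree):
    """Fit by least squares and return (train MSE, test MSE)."""
    weights, *_ = np.linalg.lstsq(features(x_train, degree), y_train, rcond=None)
    train = float(np.mean((features(x_train, degree) @ weights - y_train) ** 2))
    test = float(np.mean((features(x_test, degree) @ weights - y_test) ** 2))
    return train, test


linear_train, linear_test = fit_and_score(1)
quad_train, quad_test = fit_and_score(2)

print(f"linear:    train={linear_train:.4f}  test={linear_test:.4f}")
print(f"quadratic: train={quad_train:.4f}  test={quad_test:.4f}")
```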

### Handling Overfitting

#### Regularization
The Bias-Variance Tradeoff can be managed in a linear regression model using a technique known as regularization, which improves the generalization of the model. Two commonly used regularization methods are Ridge regression and Lasso regression.

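Regularization adds a penalty of the form $\lambda \times$ (model complexity) to the loss, where $\lambda$ controls the strength of the penalty.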

1. Ridge Regression (L2 Regularization)
   - Reduces model complexity by shrinking the weights assigned to features.
2. Lasso Regression (L1 Regularization)
   - Serves the same purpose as Ridge Regression, with the added advantage of sparsity. Lasso Regression can set the weights of unimportant features exactly to zero. In Ridge Regression, the weights of unimportant features become small but not necessarily zero.
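The difference in sparsity can be seen in a small NumPy sketch. It rests on several assumptions that are not in the original notes: synthetic data where only the first two of five features matter, no intercept, the scaling $\frac{1}{2n}\lVert y - Xw\rVert^2$ for the squared-error term, a hand-written coordinate-descent solver for Lasso, and an arbitrary penalty strength `lam`. The functions `ridge` and `lasso` are illustrative helpers, not a library API.

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, n_features = 100, 5
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 0.0])   # only two features matter
y = X @ true_w + rng.normal(0.0, 0.1, n_samples)

lam = 0.1  # penalty strength (arbitrary choice for this example)


def ridge(X, y, lam):
    """Minimize (1/2n)||y - Xw||^2 + (lam/2)||w||^2 using the closed form."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def lasso(X, y, lam, n_iters=200):
    """Minimize (1/2n)||y - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iters):
        for j in range(p):
            # Residual with feature j's current contribution removed.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n
            z = X[:, j] @ X[:, j] / n
            # Soft-thresholding: |rho| <= lam gives a weight of exactly zero.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return w


ridge_w = ridge(X, y, lam)
lasso_w = lasso(X, y, lam)

print("ridge:", np.round(ridge_w, 4))
print("lasso:", np.round(lasso_w, 4))
```

On this data the Lasso weights of the three irrelevant features should be exactly zero, while the Ridge weights for the same features are small but nonzero.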

#### Ridge Regression 
Recall from OLS that the objective is to find a column vector of coefficients such that the Sum of Squared Errors is minimum. Likewise, the objective of Ridge Regression is to find a column vector of coefficients that minimizes the sum of the Residual Sum of Squares (RSS) and a penalty equivalent to the sum of squared coefficients.

The penalty shrinks all weights toward zero, so features that contribute little to reducing the error end up with small weights while important features keep relatively larger ones.
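To make the objective concrete, here is a sketch in notation assumed for this example only: with feature matrix $X$, targets $y$, coefficient vector $\beta$, and penalty strength $\lambda$, Ridge Regression minimizes $\mathrm{RSS}(\beta) + \lambda \sum_j \beta_j^2$, which has the closed-form solution $\beta = (X^\top X + \lambda I)^{-1} X^\top y$. The data is synthetic and no intercept is fitted. `ridge_fit` and `ridge_objective` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0.0, 0.5, 50)


def ridge_fit(X, y, lam):
    """Return the beta that minimizes RSS(beta) + lam * sum(beta**2)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def ridge_objective(X, y, beta, lam):
    """RSS plus the penalty on the squared coefficients."""
    residual = y - X @ beta
    return float(residual @ residual + lam * (beta @ beta))


lambdas = [0.0, 1.0, 10.0, 100.0]
betas = [ridge_fit(X, y, lam) for lam in lambdas]
norms = [float(np.linalg.norm(beta)) for beta in betas]

for lam, beta, norm in zip(lambdas, betas, norms):
    print(f"lambda={lam:6.1f}  beta={np.round(beta, 3)}  ||beta||={norm:.3f}")
```

With $\lambda = 0$ this reduces to OLS, and the length of the coefficient vector shrinks as $\lambda$ grows.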