## Overfitting vs Underfitting

#### Bias 
- Bias refers to the simplifying assumptions a model makes so that the target function is easier to learn. In practice it shows up as the error rate on the training data: when that error rate is high, we call it high bias, and when it is low, we call it low bias.

#### Variance  
- The difference between the error rate on the training data and on the testing data is called variance. If the difference is high, it’s called high variance, and if the difference is low, it’s called low variance. Usually, we want low variance so that our model generalizes well.
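The working definitions above can be turned into a small calculation. This is only a sketch: the label lists are made-up toy values and `error_rate` is a hypothetical helper, not part of any library.

```python
def error_rate(y_true, y_pred):
    """Fraction of predictions that do not match the true labels."""
    wrong = sum(1 for truth, guess in zip(y_true, y_pred) if truth != guess)
    return wrong / len(y_true)

# Toy labels, invented for illustration only.
train_true = [0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
train_pred = [0, 1, 1, 0, 1, 0, 1, 1, 0, 1]   # 1 mistake out of 10
test_true = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]
test_pred = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]    # 4 mistakes out of 10

train_error = error_rate(train_true, train_pred)
test_error = error_rate(test_true, test_pred)

bias = train_error                   # error rate on the training data
variance = test_error - train_error  # gap between test and train error

print(f"bias={bias:.2f}, variance={variance:.2f}")  # bias=0.10, variance=0.30
```

Here the training error is low but the gap to the test error is large, which is the low bias, high variance pattern.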

### Underfitting

- A statistical model or a machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data, i.e., it performs poorly on the training data and also on the testing data. (It’s just like trying to fit into undersized pants!)
- Underfitting destroys the accuracy of our machine learning model. 
- Its occurrence simply means that our model or the algorithm does not fit the data well enough. 
- It usually happens when we have too little data to build an accurate model, or when we try to fit a linear model to non-linear data.
- In such cases, the rules of the machine learning model are too simple and inflexible to capture the patterns in the data, so the model will probably make a lot of wrong predictions. Underfitting can be avoided by using more data and by adding more informative features through feature engineering.

- In a nutshell, underfitting refers to a model that can neither perform well on the training data nor generalize to new data.
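A minimal sketch of underfitting, assuming synthetic quadratic data and a plain NumPy straight-line fit (the data, the split, and the `mse` helper are all illustrative choices): the model is too simple, so the error is high on both the training and the testing split.

```python
import numpy as np

# Hypothetical synthetic data: a quadratic trend plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = x**2 + rng.normal(0, 0.5, size=200)

# Simple hold-out split: first 150 points for training, last 50 for testing.
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

def mse(coeffs, x_values, y_values):
    """Mean squared error of a polynomial with the given coefficients."""
    predictions = np.polyval(coeffs, x_values)
    return float(np.mean((predictions - y_values) ** 2))

# A degree-1 polynomial (straight line) is too simple for quadratic data.
line = np.polyfit(x_train, y_train, deg=1)
train_mse = mse(line, x_train, y_train)
test_mse = mse(line, x_test, y_test)

print(f"train MSE: {train_mse:.2f}, test MSE: {test_mse:.2f}")
```

Both errors should come out many times larger than the noise variance (0.25 here), which is the signature of high bias.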

#### Reasons for Underfitting:

1. High bias and low variance.
2. The size of the training dataset used is not enough. 
3. The model is too simple. 
4. Training data is not cleaned and also contains noise in it.

![image.png](attachment:image.png)

#### Techniques to reduce underfitting: 

1. Increase model complexity
2. Increase the number of features, performing feature engineering
3. Remove noise from the data.
4. Increase the number of epochs or increase the duration of training to get better results.
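As a rough illustration of the first two techniques, the sketch below adds one engineered feature (`x**2`) to a linear least-squares model on synthetic quadratic data. The data and the helper names `design` and `fit_and_score` are assumptions made for this example.

```python
import numpy as np

# Hypothetical synthetic data: a quadratic trend plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=200)
y = x**2 + rng.normal(0, 0.5, size=200)
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

def design(x_values, add_square):
    """Build the feature matrix: intercept, x, and optionally x squared."""
    columns = [np.ones_like(x_values), x_values]
    if add_square:
        columns.append(x_values**2)
    return np.column_stack(columns)

def fit_and_score(add_square):
    """Fit by ordinary least squares and return (train MSE, test MSE)."""
    weights, *_ = np.linalg.lstsq(design(x_train, add_square), y_train, rcond=None)
    train_residual = design(x_train, add_square) @ weights - y_train
    test_residual = design(x_test, add_square) @ weights - y_test
    return float(np.mean(train_residual**2)), float(np.mean(test_residual**2))

simple_train, simple_test = fit_and_score(add_square=False)
richer_train, richer_test = fit_and_score(add_square=True)

print(f"x only     -> train {simple_train:.2f}, test {simple_test:.2f}")
print(f"x and x**2 -> train {richer_train:.2f}, test {richer_test:.2f}")
```

With the extra feature the model can represent the curve, so both errors drop toward the noise level.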


### Overfitting
- A statistical model is said to be overfitted when it fits the training data very closely but does not make accurate predictions on testing data. When a model is trained for too long or has too much capacity relative to the amount of data, it starts learning the noise and inaccurate entries in the data set, and the gap between training and testing error (the variance) becomes large.
- The model then fails to categorize new data correctly because it has memorized too many details and too much noise.
- Non-parametric and non-linear methods are especially prone to overfitting, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models.
- To avoid overfitting, use a linear algorithm if the data is linear, or constrain the model with parameters such as the maximal depth when using decision trees.

- In a nutshell, overfitting is a problem where a machine learning algorithm performs much better on the training data than on unseen data.
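A minimal sketch of this behaviour, assuming a small synthetic noisy sine data set and NumPy's `Polynomial.fit`: a degree-11 polynomial has as many coefficients as there are training points, so it can pass through every point, noise included. The data and the degrees 3 and 11 are illustrative choices.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def noisy_sine(x_values):
    """Hypothetical data source: one sine period plus Gaussian noise."""
    return np.sin(np.pi * x_values) + rng.normal(0, 0.3, size=x_values.shape)

# Only 12 training points, but plenty of test points.
x_train = np.linspace(-1, 1, 12)
y_train = noisy_sine(x_train)
x_test = np.linspace(-1, 1, 200)
y_test = noisy_sine(x_test)

def train_and_test_mse(degree):
    """Fit a polynomial of the given degree and return (train MSE, test MSE)."""
    model = Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((model(x_train) - y_train) ** 2))
    test_mse = float(np.mean((model(x_test) - y_test) ** 2))
    return train_mse, test_mse

simple_train, simple_test = train_and_test_mse(degree=3)
overfit_train, overfit_test = train_and_test_mse(degree=11)

print(f"degree 3  -> train {simple_train:.4f}, test {simple_test:.4f}")
print(f"degree 11 -> train {overfit_train:.4f}, test {overfit_test:.4f}")
```

The degree-11 model reaches a training error of essentially zero, yet its test error is far higher than its training error, and typically higher than the test error of the simpler model.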

#### Reasons for Overfitting are as follows:
1. High variance and low bias 
2. The model is too complex
3. The training dataset is too small

![image.png](attachment:image-2.png)

#### Techniques to reduce overfitting:

1. Increase training data.
2. Reduce model complexity.
3. Early stopping during the training phase (keep an eye on the validation loss during training, and stop as soon as it begins to increase).
4. Ridge Regularization and Lasso Regularization
5. Use dropout for neural networks to tackle overfitting.
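Early stopping can be sketched in a few lines. The function below is a simplified, framework-independent illustration: `early_stopping_epoch`, the `patience` parameter, and the loss values are all assumptions made for this example, and a real training loop would compute the validation loss after each epoch instead of reading it from a list.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, best_loss), stopping once the validation loss
    has failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # validation loss keeps rising: stop training here
    return best_epoch, best_loss

# Made-up validation losses: they improve, then start to rise again.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
best_epoch, best_loss = early_stopping_epoch(val_losses)
print(best_epoch, best_loss)  # 3 0.5
```

Training would stop at epoch 5, and the weights saved at epoch 3 would be kept, since that is where the validation loss was lowest.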


### Best fit

![image.png](attachment:image-3.png)

- Low bias, low variance: a good result, just right.

- Low bias, high variance: overfitting. The algorithm outputs very different predictions for similar data.

- High bias, low variance: underfitting. The algorithm outputs similar predictions for similar data, but the predictions are wrong (the algorithm consistently misses the target).

- High bias, high variance: a very bad algorithm. You will rarely see this in practice.



All these cases can be placed on the same plot. It is a bit less clear than the previous one but more compact.

![image.png](attachment:image-4.png)

## Techniques to find overfitting and underfitting

- Underfitting means that your model makes consistent but incorrect predictions. In this case, train error is large and val/test error is large too.

- Overfitting means that your model makes inaccurate predictions on unseen data. In this case, train error is very small and val/test error is large.

- When you find a good model, train error is small (but larger than in the case of overfitting), and val/test error is small too.
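These three cases can be written as a simple rule of thumb. The thresholds `high_error` and `max_gap` below are arbitrary placeholders, not recommended values; sensible cut-offs depend on the task and the metric.

```python
def diagnose(train_error, val_error, high_error=0.2, max_gap=0.1):
    """Classify a model from its train and val/test error.

    `high_error` and `max_gap` are hypothetical thresholds chosen
    only for this illustration.
    """
    if train_error > high_error:
        return "underfitting"   # large train error (val error is large too)
    if val_error - train_error > max_gap:
        return "overfitting"    # small train error, much larger val error
    return "good fit"           # both errors small and close together

print(diagnose(train_error=0.35, val_error=0.38))  # underfitting
print(diagnose(train_error=0.01, val_error=0.30))  # overfitting
print(diagnose(train_error=0.05, val_error=0.08))  # good fit
```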

#### As we remember:

- underfitting occurs when our model is too simple for our data.
- overfitting occurs when our model is too complex for our data.

#### Based on this, a simple intuition to keep in mind is:

- to fix underfitting, we should complicate the model.
- to fix overfitting, we should simplify the model.


![image.png](attachment:image.png)



 ### Ridge and Lasso regression are simple techniques that reduce model complexity and prevent overfitting, which may result from simple linear regression.

 1. Ridge Regression : 
   - In ridge regression, the cost function is altered by adding a penalty proportional to the sum of the squared coefficients (the L2 norm).
   - So ridge regression puts a constraint on the coefficients (w). The regularization strength (lambda) scales the penalty, so that the cost function is penalized when the coefficients take large values. Ridge regression therefore shrinks the coefficients, which helps to reduce model complexity and the effect of multicollinearity.

2. Lasso Regression :
  - Lasso regression stands for Least Absolute Shrinkage and Selection Operator. 
  - It adds a penalty term to the cost function. This term is the sum of the absolute values of the coefficients (the L1 norm).
  - As the coefficients move away from 0, this term grows, which pushes the model to decrease the coefficients in order to reduce the loss.
  - The difference between ridge and lasso regression is that lasso tends to shrink some coefficients to exactly zero, whereas ridge shrinks them toward zero but does not, in general, set them exactly to zero.
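The contrast between the two penalties can be seen in a small NumPy-only sketch. This is not how a production library implements these models. It assumes no intercept, synthetic data in which only the first two of five features matter, a closed-form ridge solution for the objective `||y - Xw||^2 + lam * ||w||^2`, and a basic coordinate-descent lasso for `0.5 * ||y - Xw||^2 + lam * ||w||_1`. The function names and the `lam` values are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X^T X + lam * I)^-1 X^T y, no intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def soft_threshold(value, threshold):
    """Shrink `value` toward zero by `threshold`, clipping at exactly zero."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_fit(X, y, lam, n_iters=200):
    """Coordinate descent for 0.5 * ||y - Xw||^2 + lam * ||w||_1, no intercept."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            # Residual with feature j's current contribution removed.
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual
            w[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j])
    return w

# Hypothetical data: only the first two of five features influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(0, 0.1, size=100)

ridge_w = ridge_fit(X, y, lam=10.0)
lasso_w = lasso_fit(X, y, lam=20.0)

print("ridge:", np.round(ridge_w, 3))
print("lasso:", np.round(lasso_w, 3))
```

Ridge leaves small non-zero weights on the three irrelevant features, while lasso sets them to exactly zero, so lasso also acts as a form of feature selection.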

- __Cost function of Ridge and Lasso regression and importance of regularization term.__
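One common way to write the two cost functions is shown below, as a sketch under stated assumptions: $n$ training examples, $p$ coefficients $w_j$, predictions $\hat{y}_i$, and regularization strength $\lambda \ge 0$. Scaling conventions (for example a factor of $\frac{1}{2n}$ in front of the squared-error term) vary between textbooks and libraries.

$$
J_{\text{ridge}}(w) = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} w_j^2
$$

$$
J_{\text{lasso}}(w) = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \lambda \sum_{j=1}^{p} \left|w_j\right|
$$

With $\lambda = 0$ both reduce to ordinary least squares. As $\lambda$ grows, the penalty term dominates and the coefficients are pushed toward zero, trading a little extra bias for lower variance.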

