### Q1: Define overfitting and underfitting in machine learning. What are the consequences of each, and how can they be mitigated?


#### <b><u>Underfitting</u></b>:

> A statistical model or a machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data, i.e., it performs poorly on the training data and, as a result, on the testing data as well.

* <b>Reasons for Underfitting</b>

    * High bias and low variance.
    * The size of the training dataset used is not enough.
    * The model is too simple.
    * Training data is not cleaned and also contains noise in it.


* <b>Techniques to Reduce Underfitting</b>

    * Increase model complexity.
    * Increase the number of features by performing feature engineering.
    * Remove noise from the data.
    * Increase the number of epochs or increase the duration of training to get better results.
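
As a minimal sketch of the first two techniques, assuming NumPy and synthetic quadratic data chosen purely for illustration: a straight-line least-squares fit underfits the curve, while adding an engineered $x^2$ feature lets the same model capture the trend. The helper `fit_mse` is hypothetical, not a library function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a quadratic trend plus a little Gaussian noise.
x = rng.uniform(-3, 3, size=200)
y = 0.5 * x**2 + rng.normal(scale=0.3, size=x.shape)

def fit_mse(features, target):
    """Hypothetical helper: ordinary least squares fit, returns the training MSE."""
    X = np.column_stack([np.ones(len(target))] + features)  # intercept column + features
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    return float(np.mean((X @ w - target) ** 2))

# Too simple: a straight line cannot follow the curve (underfitting).
mse_linear = fit_mse([x], y)

# Feature engineering: add x**2 so the model can represent the curve.
mse_quadratic = fit_mse([x, x**2], y)

print(f"linear features:    train MSE = {mse_linear:.3f}")
print(f"quadratic features: train MSE = {mse_quadratic:.3f}")
```

The underfit model has a high error even on the data it was trained on, which is the signature that separates underfitting from overfitting.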



* <b>Consequences of Underfitting</b>
    
> Underfitting severely degrades the accuracy of our machine learning model. Its occurrence simply means that our model or algorithm does not fit the data well enough, so it makes many wrong predictions on both the training data and new data.

<br>

#### <b><u>Overfitting</u></b>:

> When a model performs very well on training data but poorly on test data (new data), it is known as overfitting. In this case, the machine learning model learns the details and noise in the training data to the extent that it negatively affects the model's performance on test data. Overfitting is associated with low bias and high variance.


* <b>Reasons for Overfitting</b>:

    * High variance and low bias.
    * The model is too complex.
    * The training dataset is too small.

* <b>Techniques to Reduce Overfitting</b>:

    * Increase training data.
    * Reduce model complexity.
    * Early stopping during the training phase (monitor the validation loss over the training period and stop training as soon as it begins to increase).
    * Ridge (L2) and Lasso (L1) regularization.
    * Use dropout for neural networks to tackle overfitting.
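
The early-stopping rule above can be sketched as a small helper that watches the per-epoch validation losses. The function `early_stopping` and its `patience` parameter (how many consecutive non-improving epochs to tolerate) are illustrative assumptions, not a specific library's API; with `patience=1` it reduces to the rule exactly as stated, while a larger value guards against stopping on a single noisy uptick.

```python
def early_stopping(val_losses, patience=3):
    """Hypothetical helper: return (stop_epoch, best_epoch) for per-epoch validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the weights saved at `best_epoch` are the
    ones you would keep.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # The stopping rule never fired: training ran to the last epoch.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improving, then rising as the model starts to overfit.
losses = [0.90, 0.70, 0.55, 0.48, 0.47, 0.49, 0.52, 0.56, 0.61]
stop, best = early_stopping(losses, patience=3)
print(f"stop at epoch {stop}, restore weights from epoch {best}")
```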


* <b>Consequences of Overfitting</b>

> The model does not categorize new data correctly, because it has learned too many details and too much noise from the training data. Overfitting is more likely with non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models.

* <b>Note</b>:

> Signal: It refers to the true underlying pattern of the data that helps the machine learning model to learn from the data.

> Noise: Noise is unnecessary and irrelevant data that reduces the performance of the model.

### Q2: How can we reduce overfitting? Explain in brief.

* <b>Techniques to Reduce Overfitting</b>:

    * Increase training data.
    * Reduce model complexity.
    * Early stopping during the training phase (monitor the validation loss over the training period and stop training as soon as it begins to increase).
    * Ridge (L2) and Lasso (L1) regularization.
    * Use dropout for neural networks to tackle overfitting.
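
Of the techniques listed, dropout is perhaps the least self-explanatory. Below is a minimal NumPy sketch of *inverted* dropout, the common variant that rescales the surviving activations during training so that nothing needs to change at inference time. The function name and signature are illustrative assumptions, not any particular framework's API.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Hypothetical inverted-dropout helper: zero each unit with probability p.

    Surviving activations are scaled by 1 / (1 - p) so their expected value
    matches the no-dropout case; at inference the input passes through unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return activations
    rng = np.random.default_rng() if rng is None else rng
    keep_mask = rng.random(activations.shape) >= p  # True = keep this unit
    return activations * keep_mask / (1.0 - p)


rng = np.random.default_rng(42)
h = np.ones((4, 5))  # a hypothetical batch of hidden-layer activations
print(dropout(h, p=0.5, rng=rng))          # roughly half the entries become 0, the rest 2.0
print(dropout(h, p=0.5, training=False))   # unchanged at inference
```

Because a different random subset of units is dropped on every training step, no unit can rely on specific other units being present, which discourages the network from memorizing the training data.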



### Q3: Explain underfitting. List scenarios where underfitting can occur in ML.

#### <b><u>Underfitting</u></b>:

> A statistical model or a machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data, i.e., it performs poorly on the training data and on the testing data. (It’s just like trying to fit into undersized pants!) Underfitting severely degrades the accuracy of our machine learning model: the model or algorithm simply does not fit the data well enough. It typically happens when the model is too simple for the data, for example when we try to fit a linear model to non-linear data, or when we have too little data to build an accurate model. In such cases, the model's rules are too simple and rigid to capture the real pattern, and therefore the model will probably make a lot of wrong predictions. Underfitting can be avoided by increasing model complexity and by adding more informative features through feature engineering.

* <b>Reasons for Underfitting</b>

    * High bias and low variance.
    * The size of the training dataset used is not enough.
    * The model is too simple.
    * Training data is not cleaned and also contains noise in it.

### Q4: Explain the bias-variance tradeoff in machine learning. What is the relationship between bias and variance, and how do they affect model performance?

<br>

* <b>Bias-Variance Tradeoff</b>:
   * If the algorithm is too simple (e.g., a hypothesis that is a linear equation), it may have high bias and low variance, and is thus error-prone on both training and new data.
   * If the algorithm is too complex (e.g., a hypothesis that is a high-degree polynomial), it may have high variance and low bias. In this case, the model will perform poorly on new entries.
   * The right level of complexity lies between these two conditions, and balancing them is known as the Bias-Variance Tradeoff. Because an algorithm can’t be more complex and less complex at the same time, lowering bias by adding complexity tends to raise variance, and vice versa. The ideal tradeoff is shown in the graph below.

<br>

![image.png](attachment:5895b34c-5038-4fae-84d2-10b8715a9dd9.png)
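
The same tradeoff can be observed numerically. The sketch below assumes NumPy and synthetic data drawn from a noisy sine curve (both chosen for illustration). It fits polynomials of increasing degree and compares training error with error on a held-out validation set: low degrees underfit (both errors high), while at high degrees training error keeps falling and validation error typically climbs back up.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)

def make_data(n):
    """Hypothetical data: a sine curve plus Gaussian noise."""
    x = rng.uniform(0, 1, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(20)
x_val, y_val = make_data(200)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

results = {}  # degree -> (train MSE, validation MSE)
for degree in [1, 3, 5, 9, 15]:
    # Polynomial.fit rescales x internally, which keeps high-degree fits well conditioned.
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_val, y_val))
    print(f"degree {degree:2d}: train MSE = {results[degree][0]:.3f}, "
          f"validation MSE = {results[degree][1]:.3f}")
```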

<br>

* <b>Relationship between Bias and Variance</b>:
  * High Bias, Low Variance: A model with high bias and low variance is said to be underfitting.
  * High Variance, Low Bias: A model with high variance and low bias is said to be overfitting.
  * High Bias, High Variance: The model has both high bias and high variance, which means that it is not able to capture the underlying patterns in the data (high bias) and is also too sensitive to changes in the training data (high variance). As a result, the model will produce predictions that are inconsistent and also inaccurate on average.
  * Low Bias, Low Variance: A model with low bias and low variance is able to capture the underlying patterns in the data (low bias) and is not too sensitive to changes in the training data (low variance). This is the ideal scenario for a machine learning model, as it generalizes well to new, unseen data and produces consistent and accurate predictions. In practice, however, it is rarely fully achievable, because reducing one of the two usually increases the other.
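
Bias and variance can be estimated empirically by retraining the same model on many independently drawn training sets and examining its predictions at fixed test points: the squared gap between the average prediction and the true function estimates bias², and the spread of the predictions around their average estimates variance. The sketch below assumes NumPy, a known true function (only possible with synthetic data), and polynomials of degree 1 and 9 as illustrative stand-ins for a simple and a complex model; `bias_variance` is a hypothetical helper.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)

def true_f(x):
    return np.sin(2 * np.pi * x)  # assumed ground-truth function

x_test = np.linspace(0.1, 0.9, 50)  # fixed evaluation points

def bias_variance(degree, n_trials=200, n_train=30, noise=0.3):
    """Hypothetical helper: estimate bias^2 and variance of a polynomial fit at x_test."""
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Draw a fresh training set from the same distribution in every trial.
        x = rng.uniform(0, 1, size=n_train)
        y = true_f(x) + rng.normal(scale=noise, size=n_train)
        preds[t] = Polynomial.fit(x, y, deg=degree)(x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_f(x_test)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

results = {degree: bias_variance(degree) for degree in (1, 9)}
for degree, (b, v) in results.items():
    print(f"degree {degree}: bias^2 = {b:.3f}, variance = {v:.3f}")
```

The simple model should land in the high-bias, low-variance quadrant and the complex one in the low-bias, high-variance quadrant.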




### Q5: Discuss some common methods for detecting overfitting and underfitting in machine learning models. How can you determine whether your model is overfitting or underfitting?


* <b>Method: Learning Curve</b>:
Learning curves plot the training and validation loss as the number of training examples is incrementally increased. They help us identify whether adding more training examples would improve the validation score (the score on unseen data). If a model is overfit, adding more training examples might improve its performance on unseen data. In contrast, if a model is underfit, adding training examples doesn’t help. The `learning_curve` function can be imported from scikit-learn’s `model_selection` module.

<br>

* <b>Typical features of the learning curve of an overfit model</b>:

  1. Training loss and Validation loss are far away from each other.
  2. Gradually decreasing validation loss (without flattening) upon adding training examples.
  3. Very low training loss that’s very slightly increasing upon adding training examples.

<br>

* <b>Typical features of the learning curve of an underfit model</b>:

  1. Increasing training loss upon adding training examples.
  2. Training loss and validation loss are close to each other at the end.
  3. Sudden dip in the training loss and validation loss at the end (not always).
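
A minimal sketch of computing learning curves with scikit-learn's `learning_curve`, assuming a synthetic non-linear regression dataset and two illustrative models: a linear regression (likely to underfit this data) and an unpruned decision tree (likely to overfit). It prints the mean training and validation MSE at each training-set size, which is the data you would plot. The scores are negated because scikit-learn scorers follow a "higher is better" convention; the helper `curve` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = 2 * np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)  # hypothetical non-linear data

def curve(model):
    """Hypothetical helper: mean train/validation MSE at 5 training-set sizes."""
    sizes, train_scores, val_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.1, 1.0, 5),
        cv=5,
        scoring="neg_mean_squared_error",
        shuffle=True,
        random_state=0,
    )
    # Scores are negated MSE, so flip the sign to get losses.
    return sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)

for name, model in [("linear regression (underfit)", LinearRegression()),
                    ("deep decision tree (overfit)", DecisionTreeRegressor(random_state=0))]:
    sizes, train_loss, val_loss = curve(model)
    print(name)
    for n, tr, va in zip(sizes, train_loss, val_loss):
        print(f"  n={int(n):3d}  train MSE={tr:.3f}  val MSE={va:.3f}")
```

The linear model's two curves should sit close together at a high loss, while the tree's training loss stays at zero with a clear gap to its validation loss.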


### Q6: Compare and contrast bias and variance in machine learning. What are some examples of high bias and high variance models, and how do they differ in terms of their performance?


* <b>High Bias, Low Variance</b>: A model with high bias and low variance is said to be underfitting.
* <b>High Variance, Low Bias</b>: A model with high variance and low bias is said to be overfitting.
* <b>High Bias, High Variance</b>: The model has both high bias and high variance, which means that it is not able to capture the underlying patterns in the data (high bias) and is also too sensitive to changes in the training data (high variance). As a result, the model will produce predictions that are inconsistent and also inaccurate on average.
* <b>Low Bias, Low Variance</b>: A model with low bias and low variance is able to capture the underlying patterns in the data (low bias) and is not too sensitive to changes in the training data (low variance). This is the ideal scenario for a machine learning model, as it generalizes well to new, unseen data and produces consistent and accurate predictions. In practice, however, it is rarely fully achievable, because reducing one of the two usually increases the other.

<br>

#### <b>High Bias Model</b>: 

In this model, more assumptions are made to build the target function, so the model does not match the training dataset closely. A high-bias model is not able to capture the trend in the dataset. It is considered an underfitting model with a high error rate, usually caused by an overly simplified algorithm.

For example, a linear regression model may have a high bias if the data has a non-linear relationship.

<br>

#### <b>High Variance Model</b>: 

The model is very sensitive to changes in the training data, and its estimate of the target function can change significantly when it is trained on different subsets of data from the same distribution. This is the case of overfitting: the model performs well on the training data but poorly on new, unseen test data, because it fits the training data so closely that it fails to generalize.

High-variance models include those that strongly rely on individual data points to define their parameters, such as classification or regression trees, nearest neighbor models, and neural networks.
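
To make the contrast concrete, the sketch below (assuming scikit-learn and synthetic quadratic data chosen for illustration) compares a linear regression, which cannot represent the curve, with a 1-nearest-neighbor regressor, which effectively memorizes every training point. The linear model should show the high-bias pattern (similar, high error on both sets), and the 1-NN model the high-variance pattern (zero training error, clearly higher test error).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = X[:, 0] ** 2 + rng.normal(scale=1.0, size=400)  # hypothetical non-linear relationship
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "high bias (linear regression)": LinearRegression(),
    "high variance (1-nearest neighbor)": KNeighborsRegressor(n_neighbors=1),
}
errors = {}  # model name -> (train MSE, test MSE)
for name, model in models.items():
    model.fit(X_train, y_train)
    errors[name] = (
        mean_squared_error(y_train, model.predict(X_train)),
        mean_squared_error(y_test, model.predict(X_test)),
    )
    print(f"{name}: train MSE = {errors[name][0]:.2f}, test MSE = {errors[name][1]:.2f}")
```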


### Q7: What is regularization in machine learning, and how can it be used to prevent overfitting? Describe some common regularization techniques and how they work.


<b>What Is Regularization?</b>
> Regularization means restricting a model to avoid overfitting by shrinking the coefficient estimates toward zero. 


<b>How can it be used to prevent overfitting?</b>
>When a model suffers from overfitting, we should control the model's complexity. Technically, regularization avoids overfitting by adding a penalty to the model's loss function:

>Regularized Cost Function = Loss Function + Penalty



<b>Regularization Technique</b>:

There are three commonly used regularization techniques to control the complexity of machine learning models, as follows:

1. L2 regularization
2. L1 regularization
3. Elastic Net


#### <b><u>L2 Regularization</u></b>

A linear regression that uses the L2 regularization technique is called ridge regression. In other words, in ridge regression, a regularization term is added to the cost function of the linear regression, which keeps the magnitude of the model’s weights (coefficients) as small as possible. The L2 regularization technique tries to keep the model’s weights close to zero, but not zero, which means each feature should have a low impact on the output while the model's accuracy should be as high as possible.

$\text{Ridge Regression Cost Function} = \text{Loss Function} + \dfrac{1}{2} \lambda\sum_{j=1}^m w_j^2 $


Where $\lambda$ controls the strength of regularization, and $w_j$ are the model's weights (coefficients). Increasing $\lambda$ makes the model flatter and eventually underfit. On the other hand, decreasing $\lambda$ makes the model more prone to overfitting, and with $\lambda = 0$ the regularization term is eliminated.
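
The effect of $\lambda$ can be checked directly. Assuming the loss is half the residual sum of squares, $\frac{1}{2}\lVert y - Xw\rVert^2$ (an assumption, since the text leaves the loss unspecified), setting the gradient of the ridge cost to zero gives $(X^\top X + \lambda I)\,w = X^\top y$. The NumPy sketch below solves this for several values of $\lambda$ on synthetic data and shows the weights shrinking toward zero, but not to exactly zero, as $\lambda$ grows. For simplicity it has no intercept term, which practical implementations usually leave unpenalized; `ridge_fit` is a hypothetical helper.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Hypothetical helper: minimize 0.5*||y - X w||^2 + 0.5*lam*||w||^2 (no intercept)."""
    n_features = X.shape[1]
    # Solve (X^T X + lam I) w = X^T y rather than forming an explicit inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([3.0, -2.0, 0.5])  # hypothetical ground-truth weights
y = X @ true_w + rng.normal(scale=0.5, size=100)

for lam in [0.0, 10.0, 100.0, 1000.0]:
    w = ridge_fit(X, y, lam)
    print(f"lambda = {lam:7.1f}: w = {np.round(w, 3)}, ||w|| = {np.linalg.norm(w):.3f}")
```

With $\lambda = 0$ this reduces to ordinary least squares.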


#### <b><u>L1 Regularization</u></b>

Least Absolute Shrinkage and Selection Operator (lasso) regression is an alternative to ridge for regularizing linear regression. Lasso regression also adds a penalty term to the cost function, but a slightly different one, called L1 regularization. L1 regularization can drive some coefficients exactly to zero, meaning the model will ignore those features. Ignoring the least important features helps emphasize the model's essential features.

$\text{Lasso Regression Cost Function} = \text{Loss Function} + \lambda \sum_{j=1}^m |w_j|$

Where $\lambda$ controls the strength of regularization, and $w_j$ are the model's weights (coefficients). Lasso regression automatically performs feature selection by eliminating the least important features.
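
A minimal sketch with scikit-learn's `Lasso`, assuming synthetic data in which only the first two of ten features actually influence the target. Note that scikit-learn scales the squared-error loss as $\frac{1}{2n}\lVert y - Xw\rVert^2$ and calls the regularization strength `alpha`, so `alpha` plays the role of $\lambda$ here up to that scaling convention. With a moderate `alpha`, the coefficients of the irrelevant features are expected to be driven to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
# Hypothetical target: only features 0 and 1 matter; the other 8 are pure noise.
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n_samples)

lasso = Lasso(alpha=0.2).fit(X, y)
print("coefficients:", np.round(lasso.coef_, 3))
print("features kept:", np.flatnonzero(lasso.coef_))
```

The surviving coefficients are also slightly smaller in magnitude than the true values (4 and -3), which is the shrinkage part of lasso's name.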


#### <b><u>Elastic Net</u></b>

The Elastic Net is a regularized regression technique combining the ridge and lasso regularization terms. The parameter $r$ controls the combination ratio: when $r=1$, the L2 term is eliminated, and when $r=0$, the L1 term is removed.

$\text{Elastic Net Cost Function} = \text{Loss Function} + r \lambda \sum_{j=1}^m |w_j| + \dfrac{1-r}{2} \lambda \sum_{j=1}^m w_j^2$

Although combining the penalties of lasso and ridge often works better than using only one of the regularization techniques, adjusting the two parameters, $\lambda$ and $r$, is a little tricky.
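
One common way to handle this tuning is to search over both parameters with cross-validation. The sketch below assumes scikit-learn, whose `ElasticNet` penalty has the same form as the formula above, with `alpha` playing the role of $\lambda$ and `l1_ratio` the role of $r$ (its loss term is $\frac{1}{2n}\lVert y - Xw\rVert^2$). `ElasticNetCV` fits a path of `alpha` values for each candidate `l1_ratio` and keeps the combination with the lowest cross-validated error. The data and the candidate grid are hypothetical.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
# Hypothetical target: features 0 and 1 are informative, the other 8 are noise.
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Try several mixing ratios r; for each, alpha (lambda) is tuned along a path by 5-fold CV.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)
print(f"chosen alpha (lambda): {model.alpha_:.4f}")
print(f"chosen l1_ratio (r):   {model.l1_ratio_}")
print("coefficients:", np.round(model.coef_, 3))
```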