# Machine Learning 2

Q1: Define overfitting and underfitting in machine learning. What are the consequences of each, and how
can they be mitigated?

Ans. Overfitting and underfitting are two failure modes that can occur when training a machine learning model:

- **Overfitting:** Overfitting is a condition in which the model has very high accuracy on the training dataset but performs poorly on the test data. The model has low bias and high variance. This typically happens when the model is too complex for the amount of training data, or is trained for too long, so it starts learning the noise instead of the underlying pattern.
    - Consequences: poor generalization to unseen data, sensitivity to noise in the training set, and incorrect predictions.
    - Mitigation of overfitting: using more data, simpler models, cross-validation, and feature selection reduces the chance of overfitting.


- **Underfitting:** Underfitting is a condition in which the model has low accuracy on the training dataset as well as on the test data. The model has high bias and low variance. This happens when the model is too simple and fails to capture important patterns in the data.
    - Consequences: low accuracy on both training and test data, limited insights, and poor flexibility of the model.
    - Mitigation of underfitting: using a somewhat more complex model, feature engineering, and hyperparameter tuning reduces the chance of underfitting, and cross-validation helps confirm that a change actually improved the fit. Adding more data alone usually does not fix underfitting when the model itself is too simple.
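
The contrast between the two conditions can be illustrated with a small experiment. The following is a minimal sketch, assuming synthetic data drawn from a sine curve with Gaussian noise and NumPy's `Polynomial.fit`; the degrees 1, 3 and 12, the sample sizes, and the noise level are arbitrary choices for illustration, not values from the text above.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    # Hypothetical ground-truth relationship used only for this demo.
    return np.sin(2 * np.pi * x)

# Small noisy training set and a larger noisy test set (synthetic data).
x_train = np.linspace(0.0, 1.0, 20)
y_train = true_fn(x_train) + rng.normal(0.0, 0.2, size=x_train.shape)
x_test = rng.uniform(0.0, 1.0, 200)
y_test = true_fn(x_test) + rng.normal(0.0, 0.2, size=x_test.shape)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

errors = {}
for degree in (1, 3, 12):
    # Least-squares polynomial fit of the given degree.
    model = Polynomial.fit(x_train, y_train, degree)
    errors[degree] = {
        "train": mse(y_train, model(x_train)),
        "test": mse(y_test, model(x_test)),
    }
    print(f"degree={degree:2d}  train MSE={errors[degree]['train']:.4f}  "
          f"test MSE={errors[degree]['test']:.4f}")
```

In a run like this, the degree-1 model is expected to show high error on both sets (underfitting), while the degree-12 model should show a very low training error with a clearly higher test error (overfitting). The exact numbers depend on the random seed.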


Q2: How can we reduce overfitting? Explain in brief.

Ans. We can reduce overfitting in the following ways:

- More Data: Increasing the size of the training dataset can help the model capture the true underlying patterns rather than memorizing noise.
- Simpler Models: Use simpler model architectures with fewer parameters, reducing the risk of fitting noise.
- Feature Selection: Choose relevant features and eliminate irrelevant or redundant ones to improve the model's ability to generalize.
- Regularization: Add regularization techniques (such as L1 or L2 regularization) to penalize large parameter values and prevent the model from fitting noise.
- Cross-Validation: Use techniques like k-fold cross-validation to assess model performance on multiple subsets of the data and avoid over-optimistic estimates.
- Early Stopping: Monitor the model's performance on a validation set and stop training when performance starts to degrade.
- Ensemble Methods: Combine predictions from multiple models to reduce the impact of any single model's overfitting.
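
As an illustration of the early-stopping idea in the list above, here is a minimal sketch that operates on a list of already recorded validation losses. The function name `early_stopping` and the `patience` and `min_delta` parameters are hypothetical names chosen for this example; real training frameworks expose their own equivalents.

```python
def early_stopping(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training is considered stopped once the validation loss has failed to
    improve by more than `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch

    # Ran through all epochs without triggering the stopping rule.
    return best_epoch, len(val_losses) - 1


# Made-up validation losses: improvement stalls after epoch 2.
losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.71, 0.9]
best, stopped = early_stopping(losses, patience=2)
print(f"best epoch: {best}, training stopped at epoch: {stopped}")
```

In practice the model weights from the best epoch would be restored after stopping; that bookkeeping is omitted here to keep the sketch short.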

Q3: Explain underfitting. List scenarios where underfitting can occur in ML.

Ans. Underfitting occurs when a model is too simple to capture the underlying patterns in the training data. As a result, it performs poorly not only on the training data but also on new, unseen data, since it fails to capture the complexity of the relationships. The model has high bias and low variance.

Here are some scenarios where underfitting can occur:

1. **Insufficient Model Complexity:** When using a simple model architecture with too few parameters, it might struggle to represent complex relationships in the data.

2. **Limited Feature Space:** If the feature set used for training the model lacks important variables or doesn't capture the relevant aspects of the data, the model might not have enough information to make accurate predictions.

3. **High Bias Algorithms:** Algorithms like linear regression are naturally prone to underfitting if the relationships in the data are nonlinear.

4. **Sparse Data:** In cases where the data is sparse and there are many missing values, the model might not have enough examples to learn from.

5. **Low-Quality Data:** If the training data is noisy, contains errors, or is poorly labeled, the model might struggle to learn meaningful patterns.

6. **Insufficient Training Time:** Sometimes, a model might require more training iterations to learn the patterns in the data. Stopping training too early can lead to underfitting.

7. **Ignoring Domain Knowledge:** If domain-specific knowledge or expertise isn't incorporated into the model design, important insights might be missed.

8. **Imbalanced Classes:** In classification tasks with imbalanced class distributions, the model might struggle to learn the minority class due to insufficient examples.

9. **Ignoring Temporal Dynamics:** When dealing with time-series data, neglecting the temporal dependencies and trends can lead to underfitting.

10. **Under-Sampling or Over-Sampling:** In cases of imbalanced datasets, under-sampling the majority class or over-sampling the minority class without careful consideration can discard or distort information and lead to underfitting.

11. **Ignoring Nonlinearity:** Using a linear model for data that exhibits strong nonlinear relationships can result in underfitting.

12. **Small Training Dataset:** If the available training data is very limited, the model might not have enough examples to learn the underlying patterns, which can show up as underfitting (although with a flexible model, a small dataset more commonly causes overfitting).
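
The "Limited Feature Space" and "Ignoring Nonlinearity" scenarios can be sketched together. The example below assumes synthetic data generated from a quadratic relationship and uses ordinary least squares via `np.linalg.lstsq`; the helper name `fit_and_score` is hypothetical. It fits once with only `x` as a feature and once with `x` and `x**2`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a purely quadratic relationship plus noise.
x = np.linspace(-3.0, 3.0, 50)
y = x ** 2 + rng.normal(0.0, 0.5, size=x.shape)

def fit_and_score(features, target):
    """Fit least squares with an intercept and return the training MSE."""
    design = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.mean((target - design @ coef) ** 2))

# Linear model on x only: too simple for this data, so it underfits.
mse_linear = fit_and_score(x, y)

# Same algorithm, but with an engineered x**2 feature added.
mse_quadratic = fit_and_score(np.column_stack([x, x ** 2]), y)

print(f"training MSE, features [x]:       {mse_linear:.3f}")
print(f"training MSE, features [x, x^2]:  {mse_quadratic:.3f}")
```

The linear model's error stays high even on the training data, which is the signature of underfitting; adding the missing feature removes most of it without changing the learning algorithm.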


Q4: Explain the bias-variance tradeoff in machine learning. What is the relationship between bias and
variance, and how do they affect model performance?

Ans. The bias-variance tradeoff is a fundamental concept in machine learning that describes the relationship between two sources of error that affect a model's performance: bias and variance.

**Bias:**
Bias refers to the error introduced by approximating a real-world problem, which may be complex, by a simplified model. A model with high bias makes strong assumptions about the underlying relationships in the data and may not capture the true complexities. In other words, a biased model is systematically off the mark, consistently underestimating or overestimating the true values.

**Variance:**
Variance, on the other hand, refers to the model's sensitivity to small fluctuations or noise in the training data. A model with high variance is overly flexible and captures random noise in the training data. As a result, it can fit the training data very well, but it fails to generalize to new, unseen data because it's too tailored to the noise.

**Relationship Between Bias and Variance:**

**Low Bias, High Variance:** A model with low bias and high variance fits the training data very closely, including the noise. However, it's likely to perform poorly on new data because it's overly sensitive to small fluctuations. This is a case of overfitting.

**High Bias, Low Variance:** A model with high bias and low variance makes strong assumptions and simplifications, which may lead to underfitting. It doesn't capture the underlying patterns well and performs poorly both on the training data and new data.

**Balanced Bias and Variance:** The goal is to strike a balance between bias and variance. A model with moderate complexity aims to capture the essential patterns in the data while not fitting the noise. Such a model is more likely to generalize well to new data.

**Impact on Model Performance:**

**Bias's Impact:** Models with high bias tend to perform poorly on both the training data and new data. They overlook important relationships in the data and can lead to systematic errors.

**Variance's Impact:** Models with high variance perform very well on the training data but fail to generalize. They are sensitive to the specific training examples and can lead to erratic predictions on unseen data.
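
For squared error, the expected prediction error at a point decomposes into squared bias, variance, and irreducible noise. The two reducible terms can be estimated empirically by refitting a model on many noisy versions of the training set. The sketch below assumes a synthetic sine-curve target, a fixed set of training inputs with resampled noise, and polynomial models of degree 1 and 9; all of these are illustrative choices, not part of the answer above.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    # Hypothetical ground-truth function for the demo.
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 25)   # fixed training inputs
x_grid = np.linspace(0.0, 1.0, 50)    # points where predictions are compared
noise_sd = 0.3
n_repeats = 200

def bias_variance(degree):
    """Estimate average squared bias and variance of a polynomial fit."""
    preds = np.empty((n_repeats, x_grid.size))
    for r in range(n_repeats):
        # Draw a fresh noisy training set and refit the model.
        y_train = true_fn(x_train) + rng.normal(0.0, noise_sd, x_train.size)
        preds[r] = Polynomial.fit(x_train, y_train, degree)(x_grid)
    mean_pred = preds.mean(axis=0)
    bias2 = float(np.mean((mean_pred - true_fn(x_grid)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias2, variance

results = {degree: bias_variance(degree) for degree in (1, 9)}
for degree, (bias2, variance) in results.items():
    print(f"degree={degree}  bias^2={bias2:.4f}  variance={variance:.4f}")
```

The simple model should show the larger squared bias and the flexible model the larger variance, which is the tradeoff described above.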


Q5: Discuss some common methods for detecting overfitting and underfitting in machine learning models.
How can you determine whether your model is overfitting or underfitting?

Ans.

Detecting Overfitting:

- Validation Set Performance: Split your data into training and validation sets. If the model's performance is significantly better on the training set compared to the validation set, it might be overfitting.
- Learning Curves: Plot the model's training and validation performance as a function of the number of training examples. Overfitting is often indicated by a large gap between the training and validation performance curves.
- Cross-Validation: Use techniques like k-fold cross-validation to assess model performance on multiple subsets of the data. If the model performs much better on the training folds compared to the validation folds, overfitting might be occurring.

Detecting Underfitting:

- Validation Set Performance: If both the training and validation performance are poor, the model might be underfitting. The model is not capturing the underlying patterns in the data.
- Learning Curves: In the case of underfitting, both the training and validation curves plateau at a poor level (low accuracy or, equivalently, high error), indicating that the model is not learning effectively.
- Model Complexity: If you've chosen a very simple model architecture and the data exhibits complex patterns, it could result in underfitting.
- Visualization: Visualizing the model's predictions and comparing them to the actual data can help you see whether the model captures the relationships accurately.
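
The train-versus-validation comparison described above can be written as a simple rule of thumb. This is only a heuristic sketch: the function name `diagnose_fit` and the thresholds `min_acceptable` and `max_gap` are assumptions made for illustration, and sensible values depend on the task and metric.

```python
def diagnose_fit(train_score, val_score, min_acceptable=0.8, max_gap=0.1):
    """Classify a model's fit from its training and validation scores.

    Scores are assumed to be accuracies in [0, 1], where higher is better.
    The thresholds are illustrative defaults, not universal constants.
    """
    if train_score < min_acceptable:
        # Poor performance even on the data the model was trained on.
        return "underfitting"
    if train_score - val_score > max_gap:
        # Good on training data, noticeably worse on held-out data.
        return "overfitting"
    return "reasonable fit"


# Made-up scores for three hypothetical models.
for train, val in [(0.99, 0.70), (0.60, 0.58), (0.91, 0.89)]:
    print(f"train={train:.2f}  val={val:.2f}  ->  {diagnose_fit(train, val)}")
```

The same check can be applied per fold in k-fold cross-validation, averaging the training and validation scores before comparing them.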


![sensors-21-08083-g003.jpeg](attachment:27054d9e-88f5-4baf-ad1f-034012f428cf.jpeg)
(Source: https://pub.mdpi-res.com/sensors/sensors-21-08083/article_deploy/html/images/sensors-21-08083-g003.png?1638501762)

Q6: Compare and contrast bias and variance in machine learning. What are some examples of high bias
and high variance models, and how do they differ in terms of their performance?

Ans. 
**Bias:**
Bias refers to the error introduced by approximating a real-world problem, which may be complex, by a simplified model. A model with high bias makes strong assumptions about the underlying relationships in the data and may not capture the true complexities. In other words, a biased model is systematically off the mark, consistently underestimating or overestimating the true values.

**Variance:**
Variance, on the other hand, refers to the model's sensitivity to small fluctuations or noise in the training data. A model with high variance is overly flexible and captures random noise in the training data. As a result, it can fit the training data very well, but it fails to generalize to new, unseen data because it's too tailored to the noise.

**Relationship Between Bias and Variance:**

- Low Bias, High Variance: A model with low bias and high variance fits the training data very closely, including the noise. However, it's likely to perform poorly on new data because it's overly sensitive to small fluctuations. This is a case of overfitting.
- High Bias, Low Variance: A model with high bias and low variance makes strong assumptions and simplifications, which may lead to underfitting. It doesn't capture the underlying patterns well and performs poorly both on the training data and new data.
- Balanced Bias and Variance: The goal is to strike a balance between bias and variance. A model with moderate complexity aims to capture the essential patterns in the data while not fitting the noise. Such a model is more likely to generalize well to new data.

**High Bias Model (Underfitting):**
A high bias model is one that is too simplistic to capture the underlying patterns in the data. It makes strong assumptions about the relationships in the data, leading to poor performance both on the training data and on new, unseen data. Examples of high bias models include:

1. **Linear Regression on Nonlinear Data:** Using linear regression to model data with complex nonlinear relationships will likely result in underfitting. The model's assumptions of linear relationships won't capture the true data distribution.
2. **Low-Degree Polynomial Regression on Complex Data:** If you fit a low-degree polynomial (e.g., degree 1 or 2) to data that requires a higher-degree polynomial to accurately represent the relationships, the model will underperform.
3. **Very Shallow Neural Network:** Using a neural network with a single hidden layer or very few neurons in the hidden layers to model a complex problem can result in underfitting. The network won't have enough capacity to learn intricate patterns.

**High Variance Model (Overfitting):**
A high variance model is overly complex and captures noise or random fluctuations in the training data. It fits the training data very closely but fails to generalize to new data due to its sensitivity to noise. Examples of high variance models include:

1. **High-Degree Polynomial Regression on Small Data:** Fitting a high-degree polynomial to a small dataset can lead to overfitting. The model will fit the training data extremely well but will fail to generalize.
2. **Deep Neural Network on Small Data:** Training a deep neural network with many layers and parameters on a small dataset can lead to overfitting. The model may memorize the training examples and not generalize to new data.
3. **Decision Trees with No Pruning:** Unpruned decision trees can grow very deep, capturing noise in the data. This can lead to overfitting, where the tree creates too many splits that aren't indicative of the true data relationships.

**Performance Differences:**
- **High Bias Model Performance:** A high bias model will have both training and validation errors that are high and similar. The model doesn't fit the training data well, and it also fails to generalize to new data.

- **High Variance Model Performance:** A high variance model will have a significant gap between the training error and the validation error. The training error will be low (since it fits the training data closely), but the validation error will be much higher (due to poor generalization).
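
These two error patterns can be reproduced on synthetic data. The sketch below uses linear regression on nonlinear data as the high bias model and, as a stand-in for the high variance examples listed above, a 1-nearest-neighbour predictor (chosen here only because it is short to implement in NumPy). The sine-curve data, sample sizes, and noise level are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    # Synthetic nonlinear data: sine curve plus Gaussian noise.
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

x_train, y_train = sample(40)
x_test, y_test = sample(400)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

# High bias model: a straight line fitted to nonlinear data.
slope, intercept = np.polyfit(x_train, y_train, 1)

def linear_predict(x):
    return slope * x + intercept

# High variance model: predict the target of the closest training point.
def one_nn_predict(x):
    nearest = np.abs(x[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]

scores = {
    "linear": (mse(y_train, linear_predict(x_train)),
               mse(y_test, linear_predict(x_test))),
    "1-NN": (mse(y_train, one_nn_predict(x_train)),
             mse(y_test, one_nn_predict(x_test))),
}
for name, (train_err, test_err) in scores.items():
    print(f"{name:6s}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

The linear model's training and test errors should be similar and both high, while the 1-NN model memorizes the training set (zero training error) and shows a clear gap on the test set.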


Q7: What is regularization in machine learning, and how can it be used to prevent overfitting? Describe
some common regularization techniques and how they work.

Ans. Regularization is a set of techniques used in machine learning to prevent overfitting by adding a penalty to the model's loss function based on the complexity of the model. Regularization techniques encourage the model to have smaller parameter values, effectively reducing its capacity to fit noise in the training data and promoting better generalization to new, unseen data.

Suppose we have a linear model with $p$ features:
$$ h_{\theta}(x) = \theta_{0} + \theta_{1}x_{1} + \theta_{2}x_{2} + \dots + \theta_{p}x_{p} $$

Suppose we use Mean Squared Error over $n$ training examples as our cost function:
$$ J(\theta) = \frac{1}{n} \sum\limits_{i=1}^{n}\left(y_i - h_{\theta}(x_i)\right)^2 $$

Common regularization techniques and their modified cost functions are as follows:

- **L1 Regularization (Lasso):** L1 regularization adds a penalty term proportional to the absolute values of the model's parameters to the cost function. Because it can drive some weights exactly to zero, it can help with feature selection and with reducing model complexity.

    The regularized function is:

    $$ J(\theta) = \frac{1}{n} \sum\limits_{i=1}^{n}\left(y_i - h_{\theta}(x_i)\right)^2 + \lambda\sum\limits_{j=1}^{p}|\theta_{j}| $$


- **L2 Regularization (Ridge):** L2 regularization adds a penalty term proportional to the squared values of the model's parameters to the cost function. It encourages the model to have smaller but non-zero weight values for all features, distributing the influence more evenly across features.

    The regularized function is:

    $$ J(\theta) = \frac{1}{n} \sum\limits_{i=1}^{n}\left(y_i - h_{\theta}(x_i)\right)^2 + \lambda\sum\limits_{j=1}^{p}\theta_{j}^2 $$

- **Elastic Net Regularization:** Elastic Net combines both L1 and L2 regularization. It adds a combination of L1 and L2 penalty terms to the cost function. This can provide a balance between the feature selection capability of L1 regularization and the even distribution of weights from L2 regularization.

    The regularized function is:

    $$ J(\theta) = \frac{1}{n} \sum\limits_{i=1}^{n}\left(y_i - h_{\theta}(x_i)\right)^2 + \lambda_1\sum\limits_{j=1}^{p}|\theta_{j}| + \lambda_2\sum\limits_{j=1}^{p}\theta_{j}^2 $$

Here, $\lambda$ (and $\lambda_1$, $\lambda_2$ in Elastic Net) is a **hyperparameter** that controls the strength of the penalty. The penalty sums run over the $p$ feature weights $\theta_1, \dots, \theta_p$; the intercept $\theta_0$ is usually left unpenalized.
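
The three cost functions above translate directly into NumPy. The following is a minimal sketch: the function names are hypothetical, the intercept is kept separate as `theta0` and left unpenalized (an assumption of this sketch), and `ridge_fit` uses the closed-form minimizer of the L2-regularized cost on centered data. The synthetic data and the chosen values of `lam` are for illustration only.

```python
import numpy as np

def mse(theta0, theta, X, y):
    # Mean squared error of the linear model theta0 + X @ theta.
    return float(np.mean((y - (theta0 + X @ theta)) ** 2))

def l1_cost(theta0, theta, X, y, lam):
    return mse(theta0, theta, X, y) + lam * float(np.sum(np.abs(theta)))

def l2_cost(theta0, theta, X, y, lam):
    return mse(theta0, theta, X, y) + lam * float(np.sum(theta ** 2))

def elastic_net_cost(theta0, theta, X, y, lam1, lam2):
    return (mse(theta0, theta, X, y)
            + lam1 * float(np.sum(np.abs(theta)))
            + lam2 * float(np.sum(theta ** 2)))

def ridge_fit(X, y, lam):
    """Closed-form minimizer of l2_cost (intercept not penalized)."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Setting the gradient of the cost to zero gives
    # (Xc^T Xc + n * lam * I) theta = Xc^T yc.
    theta = np.linalg.solve(Xc.T @ Xc + n * lam * np.eye(p), Xc.T @ yc)
    theta0 = float(y_mean - x_mean @ theta)
    return theta0, theta

# Synthetic regression data in which two of the five features are irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(0.0, 0.5, 100)

for lam in (0.0, 0.1, 1.0, 10.0):
    theta0, theta = ridge_fit(X, y, lam)
    print(f"lambda={lam:5.1f}  ||theta||={np.linalg.norm(theta):.3f}  "
          f"train MSE={mse(theta0, theta, X, y):.3f}")
```

As $\lambda$ grows, the weight vector shrinks toward zero and the training error rises, which is how the penalty trades a little bias for lower variance. The L1 and Elastic Net costs have no closed-form minimizer and are usually optimized iteratively, so only their cost functions are shown.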