# Assignment (16th March): Introduction to Machine Learning - 2

### Q1: Define overfitting and underfitting in machine learning. What are the consequences of each, and how can they be mitigated?

**ANS:** 

**`1. Overfitting:`** A model learns the training data too well, including noise and outliers, resulting in poor generalization to new data.

- **Consequences:** High accuracy on training data, low accuracy on test data.

- **Mitigation:** Use techniques like cross-validation, regularization, pruning, or more training data.


**`2. Underfitting:`** A model is too simple to capture the underlying patterns in the data.

- **Consequences:** Poor performance on both training and test data.

- **Mitigation:** Use more complex models, feature engineering, or reduce regularization.
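
Both failure modes can be reproduced on a toy problem. The sketch below is only an illustration under assumed settings (a sine-shaped target, 20 noisy training points, and polynomial degrees 1, 3, and 12 chosen arbitrarily). It fits each polynomial with NumPy and reports training and test error: a degree that is too low tends to give high error on both sets, while a very high degree drives training error down without a matching gain on the test set.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def true_fn(x):
    # Assumed ground-truth relationship for this toy example
    return np.sin(2 * np.pi * x)

# 20 noisy training points and a dense, noise-free test grid
x_train = np.linspace(0.0, 1.0, 20)
y_train = true_fn(x_train) + rng.normal(0.0, 0.2, size=x_train.shape)
x_test = np.linspace(0.0, 1.0, 200)
y_test = true_fn(x_test)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

results = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares fit
    results[degree] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))
    print(f"degree={degree:2d}  train MSE={results[degree][0]:.4f}  "
          f"test MSE={results[degree][1]:.4f}")
```

The exact numbers depend on the seed and noise level, so the comparison between rows matters more than any single value.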

### Q2: How can we reduce overfitting? Explain in brief.

**ANS:** We can reduce overfitting by:

1. **Cross-validation:** Repeatedly splitting the data into training and validation folds to estimate how well the model generalizes, then using that estimate to select a simpler model or tune hyperparameters.
2. **Regularization:** Adding penalties for larger coefficients (e.g., L1 or L2 regularization).
3. **Pruning:** Simplifying decision trees by removing less important nodes.
4. **More training data:** Increasing the size of the training dataset.
5. **Dropout:** Randomly dropping neurons during training in neural networks.
6. **Early stopping:** Halting training when performance on a validation set starts to deteriorate.
7. **Data augmentation:** Increasing the diversity of the training data using techniques like rotation, scaling, and flipping in image data.
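
As an illustration of the first item, here is a minimal sketch of how k-fold cross-validation partitions a dataset. The function name `k_fold_indices` and its parameters are hypothetical, written for this example rather than taken from any library. Each sample lands in exactly one validation fold, and the model would be trained on the remaining indices each time.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, val_idx in folds:
    print(f"train={sorted(train_idx)}  val={sorted(val_idx)}")
```

Averaging the validation score over the folds gives a less noisy estimate of generalization than a single train/validation split.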

### Q3: Explain underfitting. List scenarios where underfitting can occur in ML.

**ANS:** `Underfitting` occurs when a machine learning model is too simple to capture the underlying patterns in the data, resulting in poor performance on both the training and test datasets.

**Scenarios where underfitting can occur:**
1. **Using a linear model for non-linear data:** Applying linear regression to a dataset with complex, non-linear relationships.
2. **Insufficient training time:** Stopping the training process too early.
3. **High regularization:** Over-penalizing the model's complexity, restricting it from learning the data's structure.
4. **Too few features:** Not including enough relevant features in the training data.
5. **Incorrect model selection:** Choosing a model that is too simplistic for the problem at hand (e.g., using k-NN with k too high).
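
Scenarios 1 and 4 can be seen together in a small, noise-free example. The sketch below assumes a purely quadratic target (`y = x**2`) and uses hand-written helper functions (`fit_line`, `mse`) that are illustrative only. A straight line in `x` cannot follow the curve, but the same linear model fits well once the feature `x**2` is supplied.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mse(xs, ys, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # purely quadratic target, no noise

# Linear model on the raw feature: the best line is flat and misses the curve
slope, intercept = fit_line(xs, ys)
linear_mse = mse(xs, ys, slope, intercept)

# Same model class, but with the engineered feature x^2 as input
squared = [x ** 2 for x in xs]
slope_sq, intercept_sq = fit_line(squared, ys)
quadratic_mse = mse(squared, ys, slope_sq, intercept_sq)

print(f"line in x:   slope={slope:.2f}, intercept={intercept:.2f}, MSE={linear_mse:.2f}")
print(f"line in x^2: slope={slope_sq:.2f}, intercept={intercept_sq:.2f}, MSE={quadratic_mse:.2f}")
```

The high error of the first fit is measured on the training data itself, which is the defining sign of underfitting.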

### Q4: Explain the bias-variance tradeoff in machine learning. What is the relationship between bias and variance, and how do they affect model performance?

**ANS:** **`Bias-Variance Tradeoff:`**

- **Bias:** The error introduced by approximating a real-world problem, which may be complex, by a simplified model. High bias leads to underfitting.
  
- **Variance:** The error introduced by the model's sensitivity to small fluctuations in the training set. High variance leads to overfitting.

**`Relationship:`**

- **High Bias, Low Variance:** The model is too simple, doesn't capture the complexity of the data, and performs poorly on both training and test data.
- **Low Bias, High Variance:** The model is too complex, captures noise in the training data, and performs well on training data but poorly on test data.
- **Optimal Model:** A balance between bias and variance, achieving good performance on both training and test data.

**`Effect on Model Performance:`**

- **High Bias:** Leads to underfitting, where the model misses important trends (high error on training and test data).
- **High Variance:** Leads to overfitting, where the model captures noise along with the trend (low error on training data, high error on test data).
- **Balanced Bias and Variance:** The model generalizes well to new data, minimizing overall error.
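
The tradeoff can be estimated empirically by training the same model on many resampled datasets. The sketch below is a rough simulation under assumed settings (sine-shaped target, Gaussian noise with standard deviation 0.2, 200 resampled datasets, polynomial degrees 1, 3, and 9), not a general-purpose tool. Squared bias is measured as the gap between the average prediction and the true function, and variance as the spread of predictions across datasets.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_fn(x):
    # Assumed ground-truth relationship for this simulation
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_datasets=200, n_points=20, noise_sd=0.2, seed=0):
    """Estimate squared bias and variance of a polynomial fit by resampling."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_points)
    x_eval = np.linspace(0.05, 0.95, 50)  # points where predictions are compared
    preds = np.empty((n_datasets, x_eval.size))
    for i in range(n_datasets):
        # Each dataset shares the same inputs but has fresh noise
        y_train = true_fn(x_train) + rng.normal(0.0, noise_sd, size=n_points)
        preds[i] = Polynomial.fit(x_train, y_train, degree)(x_eval)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_fn(x_eval)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

estimates = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b, v) in estimates.items():
    print(f"degree={d}  bias^2={b:.4f}  variance={v:.4f}")
```

In this setup the straight line should show the largest squared bias and the degree-9 polynomial the largest variance, with the intermediate degree sitting between the two.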

### Q5: Discuss some common methods for detecting overfitting and underfitting in machine learning models. How can you determine whether your model is overfitting or underfitting?

**ANS:** **`Common Methods for Detecting Overfitting and Underfitting:`**

1. **Training vs. Validation Performance:**
   - **Overfitting:** High accuracy on training data, low accuracy on validation/test data.
   - **Underfitting:** Low accuracy on both training and validation/test data.

2. **Learning Curves:**
   - Plot training and validation errors over epochs.
   - **Overfitting:** Training error decreases, but validation error increases after a point.
   - **Underfitting:** Both training and validation errors are high and do not decrease significantly.

3. **Cross-Validation:**
   - Use k-fold cross-validation to assess model performance.
   - **Overfitting:** Scores on the training folds are much better than scores on the held-out folds, often with large variation in the held-out score from fold to fold.
   - **Underfitting:** Consistently poor performance across all folds.

4. **Residual Plots:**
   - Analyze residuals (differences between predicted and actual values).
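   - **Overfitting:** Residuals on the training data are close to zero, while residuals on validation data are much larger.
   - **Underfitting:** Residuals are large and show a systematic pattern (e.g., a curve), indicating structure in the data that the model has not captured.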

5. **Complexity Curves:**
   - Plot model complexity (e.g., number of features, depth of tree) vs. performance.
   - **Overfitting:** Performance improves on training data but worsens on validation data as complexity increases.
   - **Underfitting:** At low complexity, performance is poor on both training and validation data, and both improve as complexity increases.

**`Determining Overfitting vs. Underfitting:`**

- **Overfitting:** If the model performs well on training data but poorly on validation/test data.
- **Underfitting:** If the model performs poorly on both training and validation/test data.
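
The decision rule above can be written as a small helper. This is a sketch only: the function name `diagnose_fit` and both thresholds (`acceptable_error`, `max_gap`) are assumptions made for illustration, and sensible values depend on the problem and the metric.

```python
def diagnose_fit(train_error, val_error, acceptable_error=0.10, max_gap=0.05):
    """Classify a model from its training and validation error rates.

    Both thresholds are illustrative assumptions and should be tuned per problem.
    """
    if train_error > acceptable_error:
        # The model cannot even fit the data it was trained on
        return "underfitting"
    if val_error - train_error > max_gap:
        # Fits the training data but does not carry over to held-out data
        return "overfitting"
    return "reasonable fit"

print(diagnose_fit(train_error=0.30, val_error=0.32))
print(diagnose_fit(train_error=0.01, val_error=0.25))
print(diagnose_fit(train_error=0.06, val_error=0.08))
```

Checking the training error first matters: a model with high error on both sets has a small gap, yet it is underfitting rather than generalizing well.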

### Q6: Compare and contrast bias and variance in machine learning. What are some examples of high bias and high variance models, and how do they differ in terms of their performance?

**ANS:** **`Bias:`**

- **Definition:** Bias is the error introduced by approximating a complex problem with a simplified model.
- **Characteristics:** 
  - High bias models are too simple.
  - They fail to capture the underlying patterns in the data.
  - Result in systematic errors.

**`Variance:`**

- **Definition:** Variance is the error introduced by the model's sensitivity to small fluctuations in the training data.
- **Characteristics:** 
  - High variance models are too complex.
  - They capture noise along with the underlying patterns.
  - Result in high variability in model predictions.

**`Examples:`**

- **High Bias Models:** 
  - **Linear Regression on Non-Linear Data:** A linear regression model applied to data with a non-linear relationship will underfit.
  - **Simple Decision Tree (Shallow Tree):** A decision tree with very few levels may not capture the complexity of the data.

- **High Variance Models:** 
  - **Overly Complex Decision Tree:** A decision tree grown to many levels without pruning tends to memorize the training data and overfit.
  - **k-Nearest Neighbors (k-NN) with k=1:** A k-NN model with k=1 will be too sensitive to noise in the training data.

**`Performance Differences:`**

- **High Bias Models:** 
  - **Training Performance:** Poor (high error).
  - **Validation/Test Performance:** Poor (high error).
  - **Example:** Linear regression on complex data shows consistently high error.

- **High Variance Models:** 
  - **Training Performance:** Good (low error).
  - **Validation/Test Performance:** Poor (high error).
  - **Example:** Complex decision tree performs well on training data but poorly on new data due to overfitting.
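
The k-NN example can be made concrete with a tiny one-dimensional regressor. The sketch below uses made-up data with one deliberately noisy point and a hand-written `knn_predict` helper (hypothetical, not a library function). With `k=1` the model reproduces every training target, noise included; with `k` equal to the dataset size it predicts the same average everywhere.

```python
def knn_predict(x_query, xs, ys, k):
    """Predict by averaging the targets of the k nearest training points (1-D)."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x_query))[:k]
    return sum(ys[i] for i in nearest) / k

def train_mse(xs, ys, k):
    return sum((y - knn_predict(x, xs, ys, k)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data: roughly y ~ x, with one noisy target at x = 3
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 6.0, 4.1, 4.9, 6.2]

for k in (1, 3, len(xs)):
    print(f"k={k}  train MSE={train_mse(xs, ys, k):.3f}  "
          f"prediction at x=3.1: {knn_predict(3.1, xs, ys, k):.2f}")
```

A training error of zero at `k=1` is the high-variance signature (the noisy point is copied verbatim into nearby predictions), while the constant prediction at `k=len(xs)` is the high-bias extreme.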


### Q7: What is regularization in machine learning, and how can it be used to prevent overfitting? Describe some common regularization techniques and how they work.

**ANS:** `Regularization` refers to techniques that reduce overfitting by constraining model complexity, most commonly by adding a penalty term to the model's loss function. The constraint discourages the model from becoming too complex, thus improving its ability to generalize to new data.

**`Common Regularization Techniques:`**

1. **L1 Regularization (Lasso):**
   - **Description:** Adds the sum of the absolute values of the coefficients, scaled by a strength parameter \( \lambda \), as a penalty term to the loss function.
   - **Mathematical Form:** \( \text{Loss} + \lambda \sum |w_i| \)
   - **Effect:** Encourages sparsity in the model (many coefficients become zero), effectively performing feature selection.

2. **L2 Regularization (Ridge):**
   - **Description:** Adds the sum of the squared coefficients, scaled by a strength parameter \( \lambda \), as a penalty term to the loss function.
   - **Mathematical Form:** \( \text{Loss} + \lambda \sum w_i^2 \)
   - **Effect:** Shrinks the coefficients towards zero but rarely makes them exactly zero, helping to reduce the model complexity.

3. **Dropout (for Neural Networks):**
   - **Description:** Randomly drops a fraction of the neurons during training.
   - **Effect:** Prevents neurons from co-adapting too much, encouraging the network to generalize better.

4. **Early Stopping:**
   - **Description:** Monitors the model's performance on a validation set and stops training when performance starts to degrade.
   - **Effect:** Limits overfitting by halting training near the point of lowest validation error, before the model starts fitting noise in the training data.
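
Early stopping is usually implemented with a patience counter. The sketch below is one possible version, assuming validation loss is recorded once per epoch; the function name `early_stopping_epoch`, the `patience` value, and the loss values are all made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs; `patience` is an assumed, tunable setting.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # stop here, keep weights from best_epoch
    return best_epoch, len(val_losses) - 1  # never triggered: ran all epochs

# Made-up validation losses: improving until epoch 3, then degrading
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=2)
print(f"best epoch: {best_epoch}, stopped at epoch: {stop_epoch}")
```

In practice the model weights from `best_epoch` are restored after stopping, since the last epochs trained are already past the minimum.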

**`How Regularization Works:`**

Penalty-based methods (L1 and L2) add a complexity term to the loss function, which the model minimizes together with the prediction error, so large coefficients are shrunk unless they reduce the error enough to justify their cost. Dropout and early stopping add no explicit penalty: dropout makes the network less reliant on any single neuron, and early stopping limits how closely the weights can adapt to the training data. In each case the model typically accepts a small increase in bias in exchange for a larger reduction in variance, which improves its performance on unseen data.
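
The different effects of L1 and L2 penalties on coefficients can be seen in a simplified one-coefficient setting. The sketch below assumes the unpenalized loss for each coefficient is `0.5 * (w - z)**2`, where `z` is the unregularized estimate (this corresponds to the special case of orthonormal features, not to general regression). Under that assumption both penalized problems have closed-form minimizers. The helper names and the sample values are illustrative.

```python
def l2_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam * w**2: proportional shrinkage."""
    return z / (1.0 + 2.0 * lam)

def l1_shrink(z, lam):
    """Minimizer of 0.5*(w - z)**2 + lam * abs(w): soft-thresholding."""
    if abs(z) <= lam:
        return 0.0  # small coefficients are set exactly to zero
    return z - lam if z > 0 else z + lam

unregularized = [3.0, -1.5, 0.4, -0.2]  # made-up coefficient estimates
lam = 0.5
ridge_like = [l2_shrink(z, lam) for z in unregularized]
lasso_like = [l1_shrink(z, lam) for z in unregularized]
print("L2:", ridge_like)
print("L1:", lasso_like)
```

The L2 rule scales every coefficient by the same factor and never reaches zero, whereas the L1 rule subtracts a fixed amount and zeroes out anything smaller than `lam`, which is where the sparsity of Lasso comes from.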