### Q1: Define overfitting and underfitting in machine learning. What are the consequences of each, and how can they be mitigated?

**Overfitting** occurs when a machine learning model captures not only the underlying pattern in the training data but also the noise and outliers. This leads to high accuracy on the training data but poor generalization to new, unseen data.

**Consequences of overfitting:**
- Poor performance on validation/test data.
- High variance, where the model’s predictions vary significantly with small changes in the training data.

**Mitigation of overfitting:**
- Use more training data.
- Apply regularization techniques (e.g., L1, L2 regularization).
- Simplify the model by reducing the number of features or parameters.
- Use cross-validation to estimate how well the model generalizes to unseen data and to tune its complexity accordingly.
- Employ techniques like dropout in neural networks to randomly omit units during training.
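
Dropout fits in a few lines. The framework-free sketch below uses *inverted* dropout, which is the variant assumed here. During training, each unit is zeroed with probability `p` and the surviving units are scaled by `1 / (1 - p)`, so nothing needs rescaling at inference time. The helper `dropout` and its parameters are hypothetical and not a library API.

```python
import random


def dropout(activations, p, training=True, rng=None):
    """Inverted dropout on a list of floats (hypothetical helper, for illustration).

    During training each unit is set to 0.0 with probability p; kept units are
    scaled by 1 / (1 - p) so the expected activation is unchanged.
    At inference time (training=False) the input is returned unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)
    if rng is None:
        rng = random.Random()
    scale = 1.0 / (1.0 - p)
    # rng.random() >= p happens with probability 1 - p, i.e. the unit is kept.
    return [a * scale if rng.random() >= p else 0.0 for a in activations]


h = [1.0] * 10
print(dropout(h, p=0.5, rng=random.Random(0)))  # every entry is either 0.0 or 2.0
print(dropout(h, p=0.5, training=False))        # unchanged at inference
```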

**Underfitting** occurs when a machine learning model is too simple to capture the underlying pattern in the data, resulting in poor performance on both training and test data.

**Consequences of underfitting:**
- High bias, where the model makes systematic errors and fails to capture the underlying trend of the data.
- Low accuracy on the training data and similarly low accuracy on test data, with only a small gap between the two.

**Mitigation of underfitting:**
- Increase the model complexity (e.g., adding more features or using more sophisticated algorithms).
- Reduce regularization if it is too strong.
- Increase the duration of training.
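
A small experiment shows both failure modes side by side. The pure-Python sketch below fits k-nearest-neighbor regression with different values of k. The sine-plus-noise data, the sample sizes, and the helper names are assumptions made for illustration. With k = 1 the model memorizes the training set, giving zero training error but a higher test error (overfitting). With k equal to the full training set size, it predicts one constant everywhere and does poorly on both sets (underfitting).

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Predict y at x as the mean target of the k nearest training points (1-D)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of a k-NN regressor built on (train_x, train_y), scored on (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def make_data(n, rng):
    """Noisy samples of y = sin(x) on [0, 2*pi]; the noise level 0.3 is an arbitrary choice."""
    xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys


rng = random.Random(42)
train_x, train_y = make_data(60, rng)
test_x, test_y = make_data(200, rng)

for k in (1, 7, len(train_x)):
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(train_x, train_y, test_x, test_y, k)
    print(f"k={k:>2}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Here k acts as an inverse measure of model complexity: a smaller k gives a more flexible model. An intermediate value such as k = 7 sits between the two extremes.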

### Q2: How can we reduce overfitting? Explain in brief.

To reduce overfitting:
- **Cross-validation:** Use techniques like k-fold cross-validation to measure performance on held-out folds and to choose hyperparameters (such as regularization strength) that do not overfit.
- **Regularization:** Apply L1 (Lasso) or L2 (Ridge) regularization to penalize large coefficients.
- **Pruning:** For decision trees, prune unnecessary branches to avoid over-complexity.
- **Early Stopping:** In neural networks, stop training when the performance on a validation set starts to degrade.
- **Dropout:** Randomly omit units during training in neural networks to prevent co-adaptation.
- **Data Augmentation:** Increase the diversity of training data by augmenting it with transformations.
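
Training frameworks usually provide early stopping out of the box. The framework-agnostic sketch below only illustrates the logic. It assumes a list of per-epoch validation losses plus a hypothetical `patience` setting, which is the number of consecutive non-improving epochs to tolerate.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops once the validation loss has failed to improve by more than
    min_delta for `patience` consecutive epochs; the weights saved at best_epoch
    are the ones to restore. Epochs are 0-indexed.
    """
    if not val_losses:
        raise ValueError("need at least one validation loss")
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Patience never ran out: training used all epochs.
    return len(val_losses) - 1, best_epoch


# Hypothetical validation curve: improves, then starts rising as the model overfits.
losses = [0.90, 0.70, 0.55, 0.48, 0.46, 0.47, 0.49, 0.52, 0.56]
print(early_stopping_epoch(losses, patience=3))  # (7, 4)
```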

### Q3: Explain underfitting. List scenarios where underfitting can occur in ML.

**Underfitting** occurs when a model is too simple to learn the underlying pattern in the data. This can happen due to:

- **Insufficient Model Complexity:** Using a linear model for non-linear data.
- **Excessive Regularization:** Strong regularization can constrain the model too much.
- **Insufficient Training:** Not training the model long enough to learn the patterns.
- **Poor Feature Selection:** Using too few features, or irrelevant features that do not capture the pattern in the data.
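
The first scenario, a linear model on non-linear data, is easy to reproduce. In the pure-Python sketch below, the quadratic target and the helper names are assumptions made for illustration. A straight line fitted to y = x² leaves a large error even on the training data, which is the signature of underfitting. Giving the same least-squares solver the feature x² removes that error.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ a + b*x (closed form for a single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def training_mse(xs, ys, a, b):
    """Mean squared error of the line a + b*x on the data it was fitted to."""
    return sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [x / 10 for x in range(-20, 21)]  # -2.0 ... 2.0
ys = [x ** 2 for x in xs]              # noise-free quadratic target

# Underfit: a straight line in x cannot bend, so even the training error stays high.
a, b = fit_line(xs, ys)
print("linear in x:", training_mse(xs, ys, a, b))

# Richer feature: regress on z = x**2 instead, and the same linear solver fits exactly.
zs = [x ** 2 for x in xs]
a2, b2 = fit_line(zs, ys)
print("linear in x**2:", training_mse(zs, ys, a2, b2))
```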

### Q4: Explain the bias-variance tradeoff in machine learning. What is the relationship between bias and variance, and how do they affect model performance?

The **bias-variance tradeoff** describes the balance between two sources of error that affect model performance:

- **Bias:** Error due to overly simplistic assumptions in the model. High bias leads to systematic errors and underfitting.
- **Variance:** Error due to sensitivity to small fluctuations in the training data. High variance leads to overfitting and poor generalization.

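The goal is a model with both low bias and low variance, but the two pull in opposite directions. Increasing model complexity to reduce bias tends to increase variance, and simplifying the model to reduce variance tends to increase bias. The key is to choose the level of complexity (or regularization strength) that minimizes total error on unseen data, typically by checking performance on validation data.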
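
For squared-error loss, this tradeoff can be stated precisely. Assume the data are generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and that $\hat f$ is a model trained on a random training set $D$. Then the expected error at a point $x$ decomposes as:

$$
\mathbb{E}_{D,\varepsilon}\big[(y - \hat f(x))^2\big]
= \underbrace{\big(\mathbb{E}_D[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\big[(\hat f(x) - \mathbb{E}_D[\hat f(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

Only the first two terms depend on the model. The noise term $\sigma^2$ is a floor that no model can go below.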

### Q5: Discuss some common methods for detecting overfitting and underfitting in machine learning models. How can you determine whether your model is overfitting or underfitting?

**Detecting overfitting:**
- **Performance Discrepancy:** High accuracy on training data but low accuracy on validation/test data.
- **Learning Curves:** Training loss keeps decreasing while validation loss plateaus or rises, leaving a large gap between them.

**Detecting underfitting:**
- **Consistently Low Accuracy:** Poor performance on both training and validation/test data.
- **Learning Curves:** Both training and validation loss are high and close to each other.

**Determining the type of fitting:**
- Evaluate model performance on training and validation data.
- Use cross-validation to assess how well the model generalizes to unseen data.
- Compare learning curves to identify discrepancies.
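
Cross-validation turns the training-versus-validation comparison into a more reliable measurement than a single split. Below is a minimal k-fold splitter. The function name and its `shuffle`/`seed` parameters are hypothetical and chosen for illustration. In practice, a library routine such as scikit-learn's `KFold` does the same job.

```python
import random


def k_fold_indices(n_samples, k, shuffle=True, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    The indices are split into k nearly equal folds; each fold serves once as
    the validation set while the remaining k - 1 folds form the training set.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # The first n_samples % k folds get one extra element.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(10, k=3):
    print(len(train_idx), len(val_idx))  # 6 4, then 7 3, then 7 3
```

To diagnose the fit, compute the training and validation scores inside this loop and average them across folds. A consistently large gap points to overfitting, and uniformly poor scores point to underfitting.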

### Q6: Compare and contrast bias and variance in machine learning. What are some examples of high bias and high variance models, and how do they differ in terms of their performance?

**Bias** refers to the error introduced by approximating a real-world problem by a simplified model:
- **High Bias Model:** Linear regression on non-linear data. Underfits and shows poor performance on both training and test data.

**Variance** refers to the error introduced by the model’s sensitivity to small fluctuations in the training data:
- **High Variance Model:** A deep neural network trained on too little data. It overfits: it performs well on training data but poorly on test data.

**Difference in performance:**
- High bias models make similar, systematic errors regardless of which training set they see, so they perform poorly on both training and test data.
- High variance models perform well on training data but poorly on new, unseen data due to overfitting.
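
Bias and variance can also be estimated empirically: retrain a model on many independently drawn training sets and look at its predictions at one fixed point. The sketch below uses two simple stand-ins chosen for illustration, not the models named above. A 1-nearest-neighbor regressor plays the high-variance model, and a constant predictor that always outputs the training-set mean plays an extreme high-bias model. The sine target, noise level, and sample sizes are also assumptions.

```python
import math
import random


def one_nn_predict(xs, ys, x0):
    """High-variance stand-in: return the target of the single nearest training point."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


def mean_predict(xs, ys, x0):
    """High-bias stand-in: ignore x entirely and predict the training-set mean."""
    return sum(ys) / len(ys)


def bias_variance_at(model, x0, f, trials=300, n=50, noise=1.0, seed=0):
    """Estimate squared bias and variance of model's prediction at x0 by
    retraining it on `trials` independently drawn training sets of size n."""
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
        ys = [f(x) + rng.gauss(0, noise) for x in xs]
        preds.append(model(xs, ys, x0))
    mean_pred = sum(preds) / trials
    bias_sq = (mean_pred - f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias_sq, variance


x0 = math.pi / 2  # the true value is sin(x0) = 1
for name, model in [("1-NN", one_nn_predict), ("mean", mean_predict)]:
    b2, var = bias_variance_at(model, x0, math.sin)
    print(f"{name:>4}: bias^2={b2:.3f}  variance={var:.3f}")
```

The 1-NN model is nearly unbiased at `x0` but its prediction swings with the noise in each training set. The mean predictor barely changes between training sets but is systematically wrong.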

### Q7: What is regularization in machine learning, and how can it be used to prevent overfitting? Describe some common regularization techniques and how they work.

**Regularization** is a technique that prevents overfitting by adding a penalty on model complexity, typically on the size of the model's parameters, to the loss function:

- **L1 Regularization (Lasso):** Adds the sum of the absolute values of the coefficients, scaled by a strength parameter λ, to the loss function. Encourages sparsity (some coefficients become exactly zero), which acts as a form of feature selection.

- **L2 Regularization (Ridge):** Adds the sum of the squared coefficients, scaled by λ, to the loss function. Shrinks large coefficients toward zero (but rarely to exactly zero), which keeps the model from becoming too complex.

- **Elastic Net:** Combines the L1 and L2 penalties, with a mixing parameter that balances sparsity against smooth shrinkage.

Regularization works by discouraging the model from fitting the noise in the training data, thus improving generalization to unseen data.
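
The different behavior of the L1 and L2 penalties is already visible with a single feature. Assume a no-intercept model y ≈ w·x with objective ½Σ(yᵢ − w·xᵢ)² + λ·penalty(w), where the penalty is |w| for Lasso and ½w² for Ridge. Scaling conventions vary between libraries. Under these assumptions, both minimizers have closed forms. Ridge shrinks w smoothly toward zero. Lasso soft-thresholds it and sets it exactly to zero once λ is large enough.

```python
def ridge_weight(xs, ys, lam):
    """Minimizer of 0.5*sum((y - w*x)**2) + 0.5*lam*w**2 (single feature, no intercept)."""
    rho = sum(x * y for x, y in zip(xs, ys))
    s = sum(x * x for x in xs)
    return rho / (s + lam)


def lasso_weight(xs, ys, lam):
    """Minimizer of 0.5*sum((y - w*x)**2) + lam*abs(w) (single feature, no intercept).

    The solution soft-thresholds the least-squares numerator rho = sum(x*y) by lam.
    """
    rho = sum(x * y for x, y in zip(xs, ys))
    s = sum(x * x for x in xs)
    shrunk = max(abs(rho) - lam, 0.0)
    return shrunk / s if rho >= 0 else -shrunk / s


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]  # roughly y = x
for lam in (0.0, 10.0, 40.0):
    print(lam, ridge_weight(xs, ys, lam), lasso_weight(xs, ys, lam))
```

With λ = 40, the Lasso weight is exactly 0.0 because λ exceeds |Σxᵢyᵢ| = 30.1. The Ridge weight is shrunk but stays positive.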