#### Q1: Define overfitting and underfitting in machine learning. What are the consequences of each, and how can they be mitigated?

**Overfitting** occurs when a model fits the noise and incidental details of the training data so closely that its performance on new, unseen data suffers. The model is typically too complex for the amount of data available and captures random fluctuations rather than the underlying pattern.

- **Consequences**: Poor generalization to new data: training error is low but test error is high. This is the high-variance, low-bias regime.
- **Mitigation**: Use regularization (L1, L2), early stopping, data augmentation, more training data, or a less complex model (e.g., fewer features, a simpler model class). Cross-validation helps detect overfitting and choose among these options.

**Underfitting** occurs when a model is too simple to capture the underlying pattern of the data, resulting in poor performance on both training and test data.

- **Consequences**: Low accuracy on training and test data alike. This is the high-bias, low-variance regime.
- **Mitigation**: Increase model complexity, add more features, use more sophisticated algorithms, or reduce regularization.
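
To make the two failure modes concrete, here is a minimal sketch that fits polynomials of increasing degree to a small noisy dataset. The target function `sin(2*pi*x)`, the noise level, the sample sizes, and the degrees are all assumptions chosen for illustration, not values from any particular problem.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Noisy samples of sin(2*pi*x) on [0, 1] (a made-up toy target)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

x_train, y_train = make_data(15)   # small training set
x_test, y_test = make_data(200)    # held-out data from the same distribution

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit; Polynomial.fit rescales x internally
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_mse[degree] = mse(model, x_train, y_train)
    test_mse[degree] = mse(model, x_test, y_test)
    print(f"degree={degree}: train MSE={train_mse[degree]:.3f}, "
          f"test MSE={test_mse[degree]:.3f}")
```

Training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. The straight line (degree 1) is expected to show high error on both sets, which is the underfitting signature. A high degree relative to the number of points tends to push training error toward zero while test error rises, which is the overfitting signature. The exact numbers depend on the seed.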

#### Q2: How can we reduce overfitting? Explain in brief.

To reduce overfitting, you can:

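- **Regularization**: Apply techniques like L1 (Lasso) or L2 (Ridge) regularization, which penalize large coefficients (weights) so the model cannot fit noise as freely.
- **Cross-Validation**: Use techniques like k-fold cross-validation to estimate performance on held-out data. Cross-validation does not reduce overfitting by itself, but it reveals it and guides the choice of model and hyperparameters.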
- **Reduce Model Complexity**: Simplify the model by reducing the number of features, pruning decision trees, or using simpler algorithms.
- **Early Stopping**: Stop training when performance on the validation set starts to degrade.
- **Data Augmentation**: Increase the size and variability of the training dataset by augmenting it with transformed versions of the existing data.
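- **Ensemble Methods**: Combine multiple models to improve generalization. Bagging (e.g., random forests) mainly reduces variance; boosting mainly reduces bias and usually needs its own regularization (e.g., a small learning rate or limited tree depth) to avoid overfitting.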
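
As one illustration of the techniques above, early stopping can be sketched as a patience rule over the validation loss. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss history are hypothetical, chosen for this example; deep learning frameworks ship their own variants of this logic.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran to the last recorded epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation-loss history: improves, then degrades
history = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.56, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(stop_epoch, best_epoch)  # 6 3
```

In practice the model weights from `best_epoch` are the ones to keep, since the epochs after it only fit the training set more tightly.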

#### Q3: Explain underfitting. List scenarios where underfitting can occur in ML.

Underfitting happens when the model is too simple to learn the underlying pattern in the data.

**Scenarios where underfitting can occur**:

- Using a linear model on data that has non-linear relationships.
- Using too few features, or features that carry little information about the target.
- Excessive regularization that limits the model's flexibility.
- Insufficient training (too few epochs or iterations).
- Over-simplified models (e.g., shallow decision trees, insufficient neural network layers).

#### Q4: Explain the bias-variance tradeoff in machine learning. What is the relationship between bias and variance, and how do they affect model performance?

The bias-variance tradeoff is a fundamental concept that reflects the balance between two sources of error that affect model performance:

- **Bias**: The error due to overly simplistic assumptions in the learning algorithm. High bias leads to underfitting.
- **Variance**: The error due to excessive sensitivity to small fluctuations in the training data. High variance leads to overfitting.

**Relationship**: Increasing model complexity usually reduces bias but increases variance, and vice versa. The goal is to find the right balance that minimizes total error (sum of bias², variance, and irreducible error).
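
For squared-error loss, the decomposition mentioned above can be written out explicitly. The notation is an assumption introduced here: $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$, $\hat{f}$ is the model fitted on a random training set, and the expectation is taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible error}}
$$

Only the first two terms depend on the model; $\sigma^2$ is a property of the data and sets a floor on the achievable error.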

#### Q5: Discuss some common methods for detecting overfitting and underfitting in machine learning models. How can you determine whether your model is overfitting or underfitting?

**Methods for detecting overfitting**:

- Large difference between training and validation/test set performance (high training accuracy, low validation/test accuracy).
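- Predictions or scores that change substantially when the model is retrained on different samples of the data (e.g., a wide spread of scores across cross-validation folds).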

**Methods for detecting underfitting**:

- Low performance on both training and validation/test sets.
- Systematic errors that persist across the dataset (e.g., residuals that show a clear pattern), a sign of high bias.

**Determine overfitting or underfitting**:

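Compare training and validation loss (or accuracy) as training progresses or as the training set grows, for example with a learning curve. Low training error with a much higher validation error indicates overfitting; training and validation errors that are close together but both high suggest underfitting.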
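
The comparison can be sketched as a simple rule of thumb. The function `diagnose_fit` and its thresholds `acceptable_error` and `max_gap` are hypothetical: what counts as "high" error or a "large" gap depends on the problem and has to be chosen by the practitioner.

```python
def diagnose_fit(train_error, val_error, acceptable_error, max_gap):
    """Rough heuristic label from final training and validation errors.

    `acceptable_error` and `max_gap` are problem-specific thresholds that
    the caller must choose; they are not standard values.
    """
    if train_error > acceptable_error:
        # The model cannot even fit the data it was trained on
        return "underfitting"
    if val_error - train_error > max_gap:
        # Fits the training data well but fails to generalize
        return "overfitting"
    return "reasonable fit"

print(diagnose_fit(0.02, 0.30, acceptable_error=0.10, max_gap=0.05))  # overfitting
print(diagnose_fit(0.35, 0.37, acceptable_error=0.10, max_gap=0.05))  # underfitting
print(diagnose_fit(0.06, 0.08, acceptable_error=0.10, max_gap=0.05))  # reasonable fit
```

A single pair of numbers is only a snapshot; plotting both errors over epochs or training-set sizes gives a more reliable picture.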

#### Q6: Compare and contrast bias and variance in machine learning. What are some examples of high bias and high variance models, and how do they differ in terms of their performance?

**Bias**:

- Tendency of a model to make systematic errors.
- High bias: Models make strong assumptions, oversimplifying the data.
- **Examples**: Linear regression on non-linear data, shallow decision trees.

**Variance**:

- Sensitivity of a model to small changes in the training data.
- High variance: Models capture noise in the training data.
- **Examples**: Deep neural networks without regularization, fully grown (unpruned) decision trees.

**Differences in performance**:

- High bias models perform poorly on both training and test data.
- High variance models perform well on training data but poorly on test data.
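
The contrast can also be measured in a simulation where the true function is known. The sketch below refits a degree-1 and a degree-9 polynomial on many noisy copies of the same inputs and estimates bias² and variance of the predictions. The target `sin(2*pi*x)`, the noise level, the fixed input grid, and the number of repeats are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)  # assumed ground truth for the simulation

x_train = np.linspace(0.0, 1.0, 20)   # fixed inputs: only the noise is resampled
x_eval = np.linspace(0.05, 0.95, 50)  # points where predictions are compared
noise_std = 0.3
n_repeats = 200

def bias_variance(degree):
    """Estimate average bias^2 and variance of a polynomial fit of this degree."""
    preds = np.empty((n_repeats, x_eval.size))
    for i in range(n_repeats):
        # Draw a fresh noisy training set and refit the model
        y_train = true_fn(x_train) + rng.normal(0.0, noise_std, x_train.size)
        model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
        preds[i] = model(x_eval)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_fn(x_eval)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

results = {degree: bias_variance(degree) for degree in (1, 9)}
for degree, (bias_sq, variance) in results.items():
    print(f"degree={degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")
```

The straight line should come out with the larger bias² (it cannot follow the sine shape no matter how the noise falls), and the degree-9 fit with the larger variance (its predictions move with each noise draw).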

#### Q7: What is regularization in machine learning, and how can it be used to prevent overfitting? Describe some common regularization techniques and how they work.

Regularization is a set of techniques that prevent overfitting by discouraging overly complex models, most commonly by adding a penalty term to the model's loss function.

**Common regularization techniques**:

- **L1 Regularization (Lasso)**: Adds the sum of the absolute values of the coefficients, scaled by a strength parameter, as a penalty term to the loss function. It can drive some coefficients to exactly zero, effectively performing feature selection.
- **L2 Regularization (Ridge)**: Adds the sum of the squared coefficients, scaled by a strength parameter, as a penalty. It shrinks coefficients toward zero but generally does not make them exactly zero, which keeps weights small and the fitted function smoother.
- **Elastic Net**: Uses a weighted sum of the L1 and L2 penalties, giving sparse solutions like Lasso while handling groups of correlated features more stably, as Ridge does.
- **Dropout (for neural networks)**: Randomly drops a fraction of neurons during training to prevent co-adaptation and promote robustness.
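
The different behavior of the L1 and L2 penalties is easiest to see in a simplified setting. The sketch below assumes orthonormal features (XᵀX = I) and the objectives ½‖y − Xw‖² + λ‖w‖₁ for L1 and ½‖y − Xw‖² + (λ/2)‖w‖² for L2; under those assumptions the L1 solution is a soft-thresholding of the unregularized coefficients and the L2 solution is a uniform rescaling. The coefficient values and function names are made up for illustration.

```python
import numpy as np

def soft_threshold(w, lam):
    """L1 solution for orthonormal features: shrink by lam, clip at zero."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def ridge_shrink(w, lam):
    """L2 solution for orthonormal features: scale every weight by 1 / (1 + lam)."""
    return w / (1.0 + lam)

w_ols = np.array([3.0, -0.5, 0.2, -2.0])  # hypothetical unregularized coefficients
w_l1 = soft_threshold(w_ols, lam=1.0)
w_l2 = ridge_shrink(w_ols, lam=1.0)
print("L1:", w_l1)  # coefficients smaller than lam in magnitude become exactly zero
print("L2:", w_l2)  # every coefficient shrinks, none reaches zero
```

With general (non-orthonormal) features the solutions no longer have this closed form, but the qualitative behavior carries over: L1 produces sparse coefficient vectors, L2 produces small but nonzero ones.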
