# Q1: Define overfitting and underfitting in machine learning. What are the consequences of each, and how can they be mitigated?
|Overfitting|Underfitting|
|---|---|
|When a model is too complex and fits the training data too closely, including the noise in the data.|When a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both the training and test data. |
|Result: high training accuracy, so low bias|Result: low training accuracy, so high bias|
|Result: low test accuracy, because variance is high|Result: low test accuracy, because bias is high (variance is typically low)|

## `Mitigate Overfitting:`

- Regularization: By adding a penalty term to the objective function during training, we can encourage the model to prefer simpler solutions and avoid overfitting. The two most common penalties are L1 and L2.
- L1 regularization: This technique adds a penalty term that is proportional to the `absolute value of the weights` in the model. L1 regularization encourages the model to `select a subset of the most important features and set the weights of the other features to zero`. This can be useful for feature selection and reducing the complexity of the model.
- L2 regularization: This technique adds a penalty term that is proportional to the `square of the weights` in the model. L2 regularization shrinks all weights toward zero without setting them exactly to zero, which encourages the model to `spread the weight values across all the features rather than relying on a few dominant features`. This can improve the generalization performance of the model.
- Dropout: By randomly dropping out some of the neurons in a neural network during training, we can prevent the network from relying too heavily on any particular feature or combination of features.
- Early stopping: By monitoring the performance of the model on a validation set during training, we can stop the training early when the validation performance starts to deteriorate, indicating that the model is starting to overfit.
- Cross-validation: Cross-validation is a technique used to assess the performance of a model on new, unseen data. The data is divided into training and validation sets, and the process is repeated multiple times with different splits. This gives a more reliable estimate of the model's performance on new data. Cross-validation does not prevent overfitting by itself, but it lets us detect overfitting and choose the model or hyperparameters that generalize best.

- Data augmentation: Data augmentation is a technique used to artificially increase the size of the training set by creating new, slightly modified versions of the existing data. By doing this, we can prevent overfitting by exposing the model to more variations of the data.
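
As an illustration of the early stopping item above, here is a minimal sketch in plain Python. The function name `early_stopping_epoch`, the `patience` parameter, and the validation losses are all hypothetical; real training frameworks provide their own callbacks for this.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs. The best epoch is the one whose
    weights we would restore.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # stop training here
    return best_epoch, len(val_losses) - 1  # patience never ran out


# Hypothetical validation losses: they fall, then rise as overfitting sets in.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
best_epoch, stop_epoch = early_stopping_epoch(val_losses, patience=3)
print(best_epoch, stop_epoch)  # 3 6
```

The loss is lowest at epoch 3, and training stops at epoch 6 after three epochs without improvement.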

## `Mitigate Underfitting:`
- Feature engineering: By transforming or combining the input features, we can create new features that capture the underlying patterns in the data more effectively.
- Model complexity: By increasing the complexity of the model, we can allow it to capture more complex patterns in the data.
- Ensemble methods: By combining multiple weak models into a stronger ensemble (for example, with boosting), we can reduce bias, improve accuracy, and lower the risk of underfitting.
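
The feature engineering item can be sketched with a toy example. The data (`y = x^2`) and the helper names `fit_line` and `mse` are assumptions chosen for illustration: a straight line on the raw feature underfits, while the same model on the engineered feature `x^2` fits exactly.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data with a purely quadratic relationship.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]

# Raw feature: the line cannot bend, so the training error stays high.
raw_slope, raw_intercept = fit_line(xs, ys)
raw_mse = mse(xs, ys, raw_slope, raw_intercept)

# Engineered feature z = x^2: the same linear model now matches the pattern.
zs = [x ** 2 for x in xs]
eng_slope, eng_intercept = fit_line(zs, ys)
eng_mse = mse(zs, ys, eng_slope, eng_intercept)

print(f"raw feature MSE: {raw_mse:.2f}, engineered feature MSE: {eng_mse:.2f}")
```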


# Q2: How can we reduce overfitting? Explain in brief.
Refer to the 'Mitigate Overfitting:' part of Q1.



# Q3: Explain underfitting. List scenarios where underfitting can occur in ML.

Refer to the 'Underfitting' part of Q1.


Underfitting can occur due to:
- Insufficient data
- Insufficient features
- Over-regularization
- High-bias model
- Poor model selection
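
The over-regularization scenario can be sketched as follows. This is a toy example under stated assumptions: a one-feature ridge fit without an intercept, hypothetical data that follows `y = 2x` exactly, and illustrative helper names `ridge_slope` and `train_mse`.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)^2) + lam * w^2 (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def train_mse(xs, ys, w):
    """Mean squared training error for the prediction w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # exactly y = 2x, so the ideal slope is 2

for lam in (0.0, 1.0, 1000.0):
    w = ridge_slope(xs, ys, lam)
    print(f"lam={lam:>7}: slope={w:.3f}, train MSE={train_mse(xs, ys, w):.3f}")
```

With no penalty the slope is recovered exactly. With a very large penalty the slope is pushed close to zero and even the training error becomes large, which is underfitting.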

# Q4: Explain the bias-variance tradeoff in machine learning. What is the relationship between bias and variance, and how do they affect model performance?
Bias-variance tradeoff describes the relationship between model complexity, model performance, and the ability of the model to generalize to new, unseen data.

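Bias refers to the difference between the expected prediction of the model and the true value of the target variable. High bias means the model makes overly simple assumptions and underfits.

Variance refers to the variability of the model's predictions across different training sets. High variance means the model is sensitive to the particular training sample and overfits.

A model performs best when both bias and variance are low. In practice, reducing one tends to increase the other: increasing model complexity lowers bias but raises variance, and simplifying the model does the reverse. The goal is the level of complexity at which the combined error is smallest.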
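
For squared-error loss, this relationship is usually written as a decomposition of the expected test error. The notation is an assumption added here, not part of the notes above: the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, $\hat{f}$ is the model fitted on a random training set, and the expectation is taken over training sets and noise.

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

The noise term cannot be reduced by any model, so model selection trades off only the first two terms.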


# Q5: Discuss some common methods for detecting overfitting and underfitting in machine learning models. How can you determine whether your model is overfitting or underfitting?

- `Visual inspection`: Plotting the `learning curves` for both the training and validation datasets. If the training error is significantly lower than the validation error, it indicates overfitting, while if both errors are high, it indicates underfitting.

- `Cross-validation`:  If the model performs well on the training data but poorly on the validation data, it indicates overfitting, while poor performance on both training and validation data indicates underfitting.

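- `Regularization`: Varying the regularization strength is itself a diagnostic. If validation performance improves when the penalty is increased, the model was overfitting. If the regularization strength is too high, it can lead to underfitting, which shows up as poor performance on both training and validation data that improves when the penalty is relaxed.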

- `Feature importance`: If the model is overfitting, it may be giving too much importance to certain features that are not important for the task, resulting in poor generalization. Feature importance analysis can help identify such features.

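- `Ensemble methods`: Ensemble methods such as bagging, boosting, and stacking can help reduce overfitting by combining multiple models and reducing the variance of the predictions. This is mainly a remedy rather than a detection method, but if a bagged ensemble clearly outperforms a single model on validation data, that suggests the single model suffered from high variance.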
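
The train-versus-validation comparison used in the first two items can be sketched as a small helper. The function name `diagnose_fit` and both thresholds (`high_error`, `max_gap`) are hypothetical; sensible values depend on the task and the metric.

```python
def diagnose_fit(train_error, val_error, high_error=0.30, max_gap=0.10):
    """Rough heuristic based on training and validation error rates.

    - High training error: the model fails even on data it has seen.
    - Low training error with a much higher validation error: the model
      does not generalize.
    The thresholds are illustrative assumptions, not standard values.
    """
    if train_error >= high_error:
        return "underfitting"
    if val_error - train_error > max_gap:
        return "overfitting"
    return "reasonable fit"


print(diagnose_fit(train_error=0.02, val_error=0.35))  # overfitting
print(diagnose_fit(train_error=0.40, val_error=0.42))  # underfitting
print(diagnose_fit(train_error=0.08, val_error=0.11))  # reasonable fit
```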


# Q6: Compare and contrast bias and variance in machine learning. What are some examples of high bias and high variance models, and how do they differ in terms of their performance?
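Refer to Q1 and Q4.

A high-bias model underfits: it performs poorly on both the training and the test data. For example, a linear regression model assumes a linear relationship between the input features and the target variable, so it underfits when the true relationship is non-linear.

A high-variance model overfits: it performs well on the training data but poorly on the test data. For example, an unpruned decision tree or a high-degree polynomial regression can fit the noise in a small training set.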
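
The difference in performance can be seen in a small simulation. The two models here are stand-ins chosen for simplicity, not ones named in the notes: a predictor that always returns the training mean (high bias) and a 1-nearest-neighbour predictor (high variance). The sine-shaped data and noise level are also assumptions.

```python
import math
import random

random.seed(0)

# Training set: evenly spaced inputs, sine targets with Gaussian noise.
train_x = [i * 2 * math.pi / 40 for i in range(40)]
train_y = [math.sin(x) + random.gauss(0, 0.3) for x in train_x]

# Test set: points halfway between the training inputs, with fresh noise.
test_x = [x + math.pi / 40 for x in train_x]
test_y = [math.sin(x) + random.gauss(0, 0.3) for x in test_x]

mean_y = sum(train_y) / len(train_y)


def mean_predictor(x):
    """High bias: ignores the input and always predicts the training mean."""
    return mean_y


def one_nn(x):
    """High variance: copies the target of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


results = {
    "mean (high bias)": (mse(mean_predictor, train_x, train_y),
                         mse(mean_predictor, test_x, test_y)),
    "1-NN (high variance)": (mse(one_nn, train_x, train_y),
                             mse(one_nn, test_x, test_y)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")
```

The mean predictor has a similar, high error on both sets. The 1-NN predictor has zero training error because it memorizes the noisy targets, but a clearly higher test error.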

# Q7: What is regularization in machine learning, and how can it be used to prevent overfitting? Describe some common regularization techniques and how they work.
Refer to the 'Mitigate Overfitting:' part of Q1 and to Q5.
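
To show how the L1 and L2 penalties act on weights, here is a minimal sketch of one shrinkage step for each. It assumes the penalties `lam * |w|` and `(lam / 2) * w**2` and applies their proximal (shrinkage) updates to a hypothetical weight vector; the names `l1_shrink` and `l2_shrink` are illustrative.

```python
def l1_shrink(w, lam):
    """Soft-thresholding: shrinkage step for the penalty lam * |w|.

    Weights with magnitude at most lam become exactly zero.
    """
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def l2_shrink(w, lam):
    """Shrinkage step for the penalty (lam / 2) * w**2.

    Every weight is scaled by the same factor and stays non-zero.
    """
    return w / (1.0 + lam)


weights = [3.0, -0.2, 0.05, -1.5]  # hypothetical weights
lam = 0.5

l1_weights = [l1_shrink(w, lam) for w in weights]
l2_weights = [l2_shrink(w, lam) for w in weights]

print("L1:", l1_weights)  # L1: [2.5, 0.0, 0.0, -1.0]
print("L2:", [round(w, 3) for w in l2_weights])
```

L1 removes the two small weights entirely, which is why it acts as feature selection. L2 keeps all four weights but makes each of them smaller.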


