# Guiding Principle

### When a model is underfitting, it usually helps to add more features
### When a model is overfitting, it usually helps to remove features

### Underfitting occurs when a model is too simple to capture the underlying patterns of the data. This can manifest as poor performance on both training and test datasets.

- Adding More Features: When a model underfits, it may not have enough information to make accurate predictions. Adding more features can provide the model with additional signals to learn from, potentially capturing more complexity and improving its predictive accuracy.

- Increasing Model Complexity: Besides adding more features, you might also consider increasing the complexity of the model itself. This could involve moving from a linear model to a more complex nonlinear model, increasing the depth of trees in decision tree-based methods, or adding layers to a neural network.
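A minimal sketch of the first remedy, in plain Python with no ML library. The synthetic data (`y = x * x`) and the helper names `solve_linear_system`, `fit_least_squares`, and `mse` are assumptions made for illustration. A straight-line model underfits the curved data, while the same least-squares fit given one extra feature (`x * x`) can capture it.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def mse(rows, targets, weights):
    """Mean squared error of a linear model on the given rows."""
    preds = [sum(w * v for w, v in zip(weights, r)) for r in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


# Synthetic, noise-free data with a curved relationship (an assumption).
xs = [x / 2 for x in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
ys = [x * x for x in xs]

# Too simple: bias term plus x only.
linear_rows = [[1.0, x] for x in xs]
# One added feature: bias term, x, and x squared.
quadratic_rows = [[1.0, x, x * x] for x in xs]

linear_mse = mse(linear_rows, ys, fit_least_squares(linear_rows, ys))
quadratic_mse = mse(quadratic_rows, ys, fit_least_squares(quadratic_rows, ys))

print(f"training MSE, x only:      {linear_mse:.4f}")
print(f"training MSE, x and x * x: {quadratic_mse:.4f}")
```

The training error stays high for the `x`-only model no matter how it is fit, which is the underfitting signature described above. The added feature gives the model the signal it was missing.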

### Overfitting occurs when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This is often seen as high accuracy on the training data but poor accuracy on the test data.

- Reducing Features: If a model is overfitting, one approach is to reduce the number of features. This helps by limiting the model's ability to fit to noise in the training data. Techniques like feature selection can help identify and retain the most useful features, removing those that contribute to overfitting.

- Simplifying the Model: This could involve reducing the complexity of the model, such as lowering the depth of decision trees, reducing the number of layers or neurons in a neural network, or choosing simpler models.

- Increasing Regularization: Adding or strengthening regularization (such as L1 or L2) penalizes large coefficients or more complex models, which discourages the model from fitting the training data too closely.
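As a rough illustration of the regularization bullet, the sketch below uses the closed-form solution of a one-feature ridge (L2) regression without an intercept. The data values and the function name `ridge_slope` are assumptions chosen for the example. It shows only the mechanism: a larger penalty shrinks the learned coefficient toward zero.

```python
def ridge_slope(xs, ys, l2_penalty):
    """Slope w that minimizes sum((y - w * x) ** 2) + l2_penalty * w ** 2.

    Setting the derivative to zero gives w = sum(x * y) / (sum(x * x) + l2_penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2_penalty)


# Small made-up dataset, roughly y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

slopes = {penalty: ridge_slope(xs, ys, penalty) for penalty in (0.0, 10.0, 100.0)}
for penalty, slope in slopes.items():
    print(f"l2_penalty={penalty:>5}: slope={slope:.3f}")
```

With a penalty of zero this is ordinary least squares. How strong the penalty should be is normally chosen by checking performance on validation data, not on the training data.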

## Question
### A Machine Learning Specialist has created a neural network model for an image classification task. The Specialist encountered an overfitting issue wherein the validation loss is much greater than the training loss. Which action would MOST likely solve the problem and how should the Specialist justify it?

- The option "The model is not generalizing well because it's not complex enough; therefore, additional nodes should be added at the hidden layer" is incorrect. Overfitting indicates that the model already has more capacity than the training data supports, so adding nodes would make the overfitting worse.

- Dropout is a technique that addresses this issue. The term refers to randomly dropping out units (hidden and visible) in a neural network during training. It reduces overfitting and provides a way of approximately combining exponentially many different neural network architectures efficiently. Hence, the correct answer is: Since the model is not generalizing well, the Specialist should increase the dropout rate at the hidden layer.
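To make the mechanism concrete, here is a minimal sketch of "inverted" dropout applied to one layer's activations, written in plain Python. The function name `inverted_dropout` and the sample activations are assumptions for illustration, and deep learning frameworks provide their own dropout layers. Each unit is zeroed with probability `drop_rate`, and the surviving units are scaled up so the expected activation stays the same. Dropout is applied only during training; at inference time all units are kept.

```python
import random


def inverted_dropout(activations, drop_rate, rng):
    """Zero each activation with probability drop_rate and rescale the survivors."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep_prob = 1.0 - drop_rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so repeated runs drop the same units
hidden = [0.5, 1.2, 0.8, 2.0, 0.3, 1.1]  # made-up hidden-layer activations

low_rate = inverted_dropout(hidden, 0.2, rng)
high_rate = inverted_dropout(hidden, 0.5, rng)

print("drop_rate=0.2:", low_rate)
print("drop_rate=0.5:", high_rate)
```

Raising `drop_rate` removes more units on each training step, which makes it harder for the network to rely on any single unit and acts as a stronger regularizer.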

## Question

### After training a SageMaker XGBoost-based model over a huge training dataset, the data science team observed that it has low accuracy on the training data as well as low accuracy on the test data. As an AWS Certified ML Specialist, which of the following techniques would you recommend to help resolve this problem? (Select two):

- Add regularization to the model
- Use more training data
- Use more features in the model
- Remove regularization from the model
- Use fewer features in the model
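Before picking options, it helps to decide which regime the symptoms describe. The sketch below encodes the rule of thumb from the Guiding Principle section. The function name `diagnose_fit` and both thresholds (`target_accuracy`, `max_gap`) are assumptions for illustration; sensible values depend on the problem.

```python
def diagnose_fit(train_accuracy, test_accuracy, target_accuracy=0.9, max_gap=0.05):
    """Classify a model as underfitting, overfitting, or acceptable.

    The thresholds are illustrative assumptions, not standard values.
    """
    if train_accuracy < target_accuracy:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if train_accuracy - test_accuracy > max_gap:
        # Good on training data, noticeably worse on unseen data.
        return "overfitting"
    return "acceptable"


print(diagnose_fit(0.62, 0.60))  # low on both sets, as in this question
print(diagnose_fit(0.99, 0.75))  # large train/test gap
print(diagnose_fit(0.93, 0.91))  # high on both sets, small gap
```

Low accuracy on both training and test data, as in this question, falls into the underfitting case.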


### Use More Features in the Model

- Rationale: Adding more features can provide the model with more information and potentially capture more complexity in the dataset. If the model is currently not performing well because it's too simplistic, increasing the number and variety of features might help it learn better and make more accurate predictions.
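- Impact: By introducing more relevant features, you allow the model to base its decisions on a wider set of signals, which might capture relationships and patterns it previously overlooked.

### Remove Regularization from the Model

- Rationale: Regularization penalizes model complexity. Low accuracy on both training and test data indicates underfitting, so the penalty is holding back a model that needs more capacity. Reducing or removing it lets the model fit the training data more closely.
- Impact: The remaining options do not address underfitting. Adding regularization or using fewer features makes the model simpler still, and the training dataset is already huge, so more data is unlikely to help a model that cannot fit what it has.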
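The other option that targets underfitting in this question is removing regularization. As a hedged sketch, the snippet below relaxes the regularization-related XGBoost hyperparameters `lambda` (L2), `alpha` (L1), and `gamma` (minimum loss reduction required to split), and raises `max_depth`. The starting values and the helper name `relax_regularization` are hypothetical, and the snippet only builds a hyperparameter dictionary; it does not launch a training job.

```python
# Hypothetical starting point: a shallow, heavily regularized configuration.
regularized_params = {
    "objective": "binary:logistic",
    "num_round": 100,
    "max_depth": 2,
    "lambda": 10.0,  # L2 penalty on leaf weights
    "alpha": 5.0,    # L1 penalty on leaf weights
    "gamma": 4.0,    # minimum loss reduction required to make a split
}


def relax_regularization(params):
    """Return a copy of params with regularization removed and more tree depth."""
    relaxed = dict(params)
    relaxed["lambda"] = 0.0
    relaxed["alpha"] = 0.0
    relaxed["gamma"] = 0.0
    # Allow deeper trees; 6 is the XGBoost default depth.
    relaxed["max_depth"] = max(params.get("max_depth", 6), 6)
    return relaxed


relaxed_params = relax_regularization(regularized_params)
print(relaxed_params)
```

After a change like this, training and test accuracy should be compared again: if training accuracy rises but test accuracy does not, the model has moved from underfitting toward overfitting and some regularization should be restored.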