![Model Tuning and Selection banner](./images/7_model_tuning_and_selection.png)

# 7. Model Tuning and Selection

## 7.1. Hyperparameter Tuning

After selecting and training a model, the next step is to tune its hyperparameters to improve performance. Hyperparameters are settings that are fixed before training and control the learning process rather than being learned from the data, such as the learning rate, the regularization strength, or the number of trees.

There are several techniques for hyperparameter tuning:
- **Grid search**: A brute-force approach that builds and evaluates a model for every combination of hyperparameter values in a pre-specified grid. The cost grows multiplicatively with each added hyperparameter, but every point on the grid is evaluated. Values that fall between grid points are never tried.
- **Random search**: Instead of covering a full grid, random search evaluates models with hyperparameter values sampled at random from the search space. It is usually more efficient than grid search in high-dimensional spaces, especially when only a few hyperparameters strongly affect performance.
- **Bayesian optimization**: A sequential search that fits a probabilistic surrogate model to the results of previous evaluations and uses it to choose the next hyperparameter values to try. It aims to find good values in fewer evaluations than grid or random search.
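
The first two techniques can be sketched in a few lines of plain Python. This is a minimal illustration, not a production tuner: `validation_score` is a hypothetical stand-in for "train a model with these hyperparameters and score it on validation data", and the grid values and sampling ranges are assumptions chosen for the example. Bayesian optimization is left out because it needs a surrogate model that does not fit in a short sketch.

```python
import itertools
import random

def validation_score(learning_rate, reg_strength):
    """Hypothetical stand-in for training a model and scoring it on validation data.
    Higher is better; the peak is at learning_rate=0.1, reg_strength=1.0."""
    return -((learning_rate - 0.1) ** 2) - 0.01 * (reg_strength - 1.0) ** 2

def grid_search(grid):
    """Evaluate every combination in the grid; return (best_score, best_params)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

def random_search(ranges, n_iter, seed=0):
    """Evaluate n_iter uniformly sampled points; return (best_score, best_params)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        params = {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

# 4 x 3 = 12 evaluations for the grid, and the same budget for random search.
grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "reg_strength": [0.1, 1.0, 10.0]}
ranges = {"learning_rate": (0.001, 1.0), "reg_strength": (0.1, 10.0)}

grid_best = grid_search(grid)
random_best = random_search(ranges, n_iter=12)
print("grid search:  ", grid_best)
print("random search:", random_best)
```

In practice, learning rates and regularization strengths are often sampled on a logarithmic scale rather than uniformly; the uniform sampling here only keeps the sketch short.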

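The goal is to find the hyperparameter values that give the best performance on data the model was not trained on. A single held-out validation set can be used for this, but the search is often run inside a cross-validation loop, which averages the score over several train/validation splits and gives a more reliable estimate of how well the selected hyperparameters generalize.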
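
As a rough sketch of tuning inside a cross-validation loop, the example below selects a regularization strength for a one-dimensional ridge regression. The model, the synthetic data, the candidate `alphas`, and all helper names are assumptions made for illustration; the point is only the structure of the loop: for each candidate value, average the validation error over k folds, then keep the value with the lowest average.

```python
import random

def fit_ridge_1d(xs, ys, alpha):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + alpha)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def mean_squared_error(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size.
    Real data should be shuffled first; the synthetic data below is already random."""
    return [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]

def cross_val_error(xs, ys, alpha, k=5):
    """Average validation MSE over k folds for one hyperparameter value."""
    fold_errors = []
    for val_idx in k_fold_indices(len(xs), k):
        held_out = set(val_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        w = fit_ridge_1d(train_x, train_y, alpha)
        val_x = [xs[i] for i in val_idx]
        val_y = [ys[i] for i in val_idx]
        fold_errors.append(mean_squared_error(w, val_x, val_y))
    return sum(fold_errors) / len(fold_errors)

# Synthetic data (assumed for the example): y = 2x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2 * x + rng.gauss(0, 0.3) for x in xs]

alphas = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_errors = {alpha: cross_val_error(xs, ys, alpha) for alpha in alphas}
best_alpha = min(cv_errors, key=cv_errors.get)
print("cross-validated MSE per alpha:", cv_errors)
print("selected alpha:", best_alpha)
```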

![Model Tuning and Selection loops](./images/7_model_tuning_and_selection_loops.png)

Some other important considerations:
- Feature engineering can have a big impact on model performance
- Evaluate using appropriate metrics for the problem (accuracy, F1, RMSE, etc.)
- Monitor for overfitting and underfitting during tuning
- Techniques like early stopping can help prevent overfitting
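
Early stopping, mentioned in the last point, can be sketched as a patience rule over per-epoch validation losses. The function name, the `patience` and `min_delta` parameters, and the loss values below are all hypothetical; in a real training loop the losses would be computed after each epoch rather than listed up front.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch).

    Training stops at the first epoch where the validation loss has not
    improved by more than min_delta for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    # Ran out of epochs without triggering the patience rule.
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improving until epoch 3, then drifting upward.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.60, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stop at epoch {stop_epoch}, restore weights from epoch {best_epoch}")
```

With these numbers the rule stops at epoch 6 and points back to epoch 3, the epoch with the lowest validation loss seen so far.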

In summary, hyperparameter tuning is a critical step for optimizing a model's performance after it has been selected and trained. Grid search, random search, and Bayesian optimization are the techniques most commonly used to find good hyperparameter values.

-----

## 7.2. Ensemble Methods

Ensemble methods are techniques that create multiple models and combine them to produce improved results compared to a single model. The main idea is that when weak models are correctly combined, more accurate and robust models can be obtained.

### 7.2.1. Bagging

Bagging (short for bootstrap aggregating) involves training multiple base models independently on bootstrap samples of the training data, that is, random samples drawn with replacement, and then combining their predictions by voting for classification or averaging for regression. Popular bagging methods include:
- **Random Forests**: An ensemble of decision trees, each trained on a bootstrap sample of the data, with only a random subset of features considered at each split.
- **Bootstrap Aggregating (Bagging)**: The general form of the method. It fits the same type of model on each bootstrap sample and averages, or votes over, the predictions.

Bagging aims to reduce the variance of individual models, making the ensemble more robust to overfitting.
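
The bagging procedure can be sketched with a very simple base model. The example below is an illustration under stated assumptions: the base learner is a one-dimensional threshold classifier (a decision stump), the noisy data set is synthetic, and `fit_stump`, `fit_bagging`, and `predict_bagging` are hypothetical helper names rather than a library API.

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a 1-D threshold classifier that predicts 1 when x > threshold, else 0.
    Tries every observed x as the threshold and keeps the one with the fewest errors."""
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(y) for x, y in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bootstrap_sample(points, rng):
    """Draw len(points) samples with replacement."""
    return [rng.choice(points) for _ in points]

def fit_bagging(points, n_models, seed=0):
    """Train one stump per bootstrap sample; each model is just its threshold."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(points, rng)) for _ in range(n_models)]

def predict_bagging(thresholds, x):
    """Majority vote over the individual stump predictions."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Synthetic data (assumed): the true label is 1 when x > 5, with about 10% of labels flipped.
rng = random.Random(1)
points = []
for _ in range(60):
    x = rng.uniform(0, 10)
    label = int(x > 5)
    if rng.random() < 0.1:
        label = 1 - label
    points.append((x, label))

thresholds = fit_bagging(points, n_models=25)  # odd count avoids tied votes
print("learned thresholds:", sorted(round(t, 2) for t in thresholds))
print("prediction at x=9.5:", predict_bagging(thresholds, 9.5))
print("prediction at x=0.5:", predict_bagging(thresholds, 0.5))
```

Each stump sees a slightly different sample, so the learned thresholds scatter around the true boundary; voting over them smooths out that scatter, which is the variance reduction described above.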

### 7.2.2. Boosting

Boosting trains models sequentially, with each new model focusing on the errors made by the models before it. The final prediction is a weighted combination of the weak learners. Key boosting algorithms include:
- **AdaBoost**: Increases the weights of training instances that the current weak learner misclassifies, so that the next weak learner concentrates on them.
- **Gradient Boosting**: Builds additive models by sequentially training weak learners on the negative gradients of the loss function.

Boosting primarily reduces bias, and can also reduce variance, which is how a sequence of weak learners becomes a strong predictor.
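
A minimal AdaBoost sketch with one-dimensional decision stumps makes the reweighting concrete. The ten-point data set and the helper names are assumptions chosen so that no single stump can classify every point, while three boosted stumps can. Labels are +1 and -1.

```python
import math

def stump_predict(x, threshold, polarity):
    """Decision stump: predict `polarity` left of the threshold and `-polarity` otherwise."""
    return polarity if x < threshold else -polarity

def fit_weighted_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) with the lowest weighted error."""
    values = sorted(set(xs))
    # One threshold below all points, plus the midpoints between neighbouring values.
    thresholds = [values[0] - 0.5] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def fit_adaboost(xs, ys, n_rounds):
    """Return a list of (alpha, threshold, polarity) weak learners."""
    weights = [1.0 / len(xs)] * len(xs)
    ensemble = []
    for _ in range(n_rounds):
        error, threshold, polarity = fit_weighted_stump(xs, ys, weights)
        if error >= 0.5:
            break  # the best stump is no better than chance
        error = max(error, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points (y * prediction = -1) gain weight; correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict_adaboost(ensemble, x):
    """Sign of the alpha-weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

single = fit_adaboost(xs, ys, n_rounds=1)
boosted = fit_adaboost(xs, ys, n_rounds=3)
single_errors = sum(predict_adaboost(single, x) != y for x, y in zip(xs, ys))
boosted_errors = sum(predict_adaboost(boosted, x) != y for x, y in zip(xs, ys))
print("errors with 1 stump: ", single_errors)
print("errors with 3 stumps:", boosted_errors)
```

On this data a single stump misclassifies three points, and the three-round ensemble classifies all ten training points correctly. Training accuracy alone says nothing about generalization, so the number of rounds is itself a hyperparameter to tune on validation data.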

### 7.2.3. Stacking

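Stacking involves training different types of models on the same data, then training a meta-model on their predictions to learn how best to combine them. The meta-model is typically fit on out-of-fold predictions, so that it does not learn from predictions the base models made on their own training data. This makes it possible to leverage the strengths of diverse base models.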
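
The sketch below shows one possible shape of a stacking pipeline for regression. Everything specific in it is an assumption made for illustration: the two base learners (a least-squares line and a k-nearest-neighbour regressor), the linear meta-model without intercept, the synthetic quadratic data, and all function names.

```python
import random

def fit_linear(xs, ys):
    """Least-squares line; returns a predict function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regressor; returns a predict function."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

BASE_LEARNERS = [fit_linear, fit_knn]

def out_of_fold_predictions(xs, ys, n_folds=5):
    """For each sample, base-model predictions from models that never saw that sample."""
    n = len(xs)
    rows = [None] * n
    for fold in range(n_folds):
        val = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        models = [fit([xs[i] for i in train], [ys[i] for i in train])
                  for fit in BASE_LEARNERS]
        for i in val:
            rows[i] = [model(xs[i]) for model in models]
    return rows

def fit_meta_weights(rows, ys):
    """Least-squares weights (no intercept) for two base predictions, via the 2x2 normal equations."""
    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * y for r, y in zip(rows, ys))
    b2 = sum(r[1] * y for r, y in zip(rows, ys))
    det = a11 * a22 - a12 * a12
    return [(a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det]

def fit_stacking(xs, ys):
    """Fit meta-weights on out-of-fold predictions, then refit base models on all data."""
    weights = fit_meta_weights(out_of_fold_predictions(xs, ys), ys)
    models = [fit(xs, ys) for fit in BASE_LEARNERS]
    return weights, lambda x: sum(w * m(x) for w, m in zip(weights, models))

def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (assumed): a noisy quadratic that a straight line fits poorly.
rng = random.Random(0)
xs = [rng.uniform(-3, 3) for _ in range(60)]
ys = [x * x + rng.gauss(0, 0.2) for x in xs]

weights, stacked = fit_stacking(xs, ys)
linear_mse = mse(fit_linear(xs, ys), xs, ys)
stacked_mse = mse(stacked, xs, ys)
print("meta-model weights (linear, knn):", weights)
print("training MSE, linear only:", linear_mse)
print("training MSE, stacked:    ", stacked_mse)
```

The meta-weights show how much the combination relies on each base model. The errors printed here are measured on the training data, so they illustrate the mechanics only; a fair comparison would use a separate test set.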

The success of ensemble methods depends on factors like diversity among base models, data sampling strategies, and the technique used to combine predictions. Ensembles frequently achieve top results in machine learning competitions, because combining models can reduce overfitting and capture different aspects of the data.
