# Model Tuning

## Tuning Model Hyperparameters with GridSearchCV

- Non-parametric models like decision trees and k-nearest neighbors have various hyperparameters that can be specified.
- Tuning these hyperparameters helps balance the bias-variance trade-off and find the best-performing model.
- Trying many hyperparameter combinations by hand quickly leads to long, repetitive code that is hard to read.
- Scikit-learn provides a tool called GridSearchCV for searching through a grid of hyperparameters.
- This tool is introduced in this section.

## Pickle and Model Deployment

- When you shut down your notebook kernel, your trained model is lost from memory.
- To use the model again, you would need to re-train it, which can be time-consuming.
- Pickling your model saves it to disk so you can load it later without re-training.
- Pickled models are common in model deployment. For example, a pickled model can be loaded to serve predictions as the backend of an API.
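A minimal sketch of the save/load round trip with Python's built-in pickle module. The synthetic dataset and the file name `model.pkl` are assumptions made only for illustration.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Train a small model on synthetic data (a stand-in for your real training set)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Save the trained model to disk; 'model.pkl' is a hypothetical file name
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Later (e.g. in a new kernel or inside an API server), load it without re-training
with open('model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

# The loaded model makes the same predictions as the original
same = (loaded_model.predict(X) == model.predict(X)).all()
print(same)  # True
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code. It is also safest to load the model with the same scikit-learn version that saved it.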

## GridSearchCV

### Parameter tuning

- Building and training a supervised learning model is an iterative process.
- Model performance can be improved by cleaning or engineering the data, or by finding good parameters to set when creating the model.
- Setting the wrong parameters can cause overfitting or underfitting.
- Each modeling problem is unique and requires a different set of parameters.
- A combinatorial grid search can be used to find the best combination of parameters for a given model.
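To make "combinatorial" concrete, the plain-Python sketch below enumerates every pairing of candidate values for two decision tree parameters. The candidate values themselves are hypothetical.

```python
from itertools import product

# Hypothetical candidate values for two decision tree parameters
max_depths = [2, 5, 10]
min_samples_splits = [2, 10]

# A combinatorial grid contains every pairing of the candidate values (3 * 2 = 6)
grid = list(product(max_depths, min_samples_splits))
for max_depth, min_samples_split in grid:
    print(max_depth, min_samples_split)

print(len(grid))  # 6
```

A grid search trains and evaluates one model per entry in `grid`, then keeps the best one.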

### Grid search

- The parameters you set on a model affect its overall performance.
- For a decision tree, increasing min_samples_split often improves performance, but only up to a certain point.
- A max_depth that is too low can cause underfitting, while one that is too high can cause overfitting.
- Because parameters interact, the best combination usually has to be found empirically, for example with an exhaustive grid search.
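The max_depth trade-off can be seen by comparing training accuracy with cross-validated accuracy for a few depths. This is a sketch on a synthetic, deliberately noisy dataset (an assumption, not data from these notes). Shallow trees tend to score low on both, which signals underfitting. Unrestricted trees tend to fit the training data perfectly while scoring lower on held-out folds, which signals overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some label noise (flip_y), standing in for a real dataset
X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=42)

results = {}
for max_depth in [1, 3, 5, 10, None]:  # None lets the tree grow until its leaves are pure
    clf = DecisionTreeClassifier(max_depth=max_depth, random_state=42)
    train_acc = clf.fit(X, y).score(X, y)            # accuracy on the data it was trained on
    cv_acc = cross_val_score(clf, X, y, cv=5).mean()  # mean accuracy on held-out folds
    results[max_depth] = (train_acc, cv_acc)
    print(f"max_depth={max_depth}: train={train_acc:.3f}, cv={cv_acc:.3f}")
```

Repeating this loop by hand for every parameter quickly becomes unwieldy, which is the gap GridSearchCV fills.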

### Use GridSearchCV

- The sklearn library provides a way to tune model parameters through an exhaustive search using GridSearchCV.
- GridSearchCV combines K-Fold Cross-Validation with a grid search of parameters.
- To use GridSearchCV, create a parameter grid: a dictionary whose keys are parameter names and whose values are lists of values to try.
- Pass the parameter grid to GridSearchCV along with the classifier.
- Set the number of cross-validation folds with the cv parameter (the default is 5 in recent scikit-learn versions).
- GridSearchCV fits and scores every combination of parameters. By default (refit=True), it then refits the best-scoring combination on the full training data, so the fitted search object can be used for predictions directly.
- Access the best combination of parameters by checking the best_params_ attribute.

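```python
# Import the classifier and the grid search tool
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

clf = DecisionTreeClassifier()

# Keys are parameter names; values are lists of values to try
param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [1, 2, 5, 10],
    'min_samples_split': [2, 5, 10, 20]  # must be at least 2 when given as an int
}

# 2 * 4 * 4 = 32 combinations, each evaluated with 3-fold cross-validation (96 fits)
gs_tree = GridSearchCV(clf, param_grid, cv=3)
gs_tree.fit(train_data, train_labels)  # train_data / train_labels: your training set

gs_tree.best_params_
```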
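A self-contained version of the search, runnable end to end. It assumes the iris dataset as a stand-in for `train_data` and `train_labels`, holds out a test split, and shows the attributes available after fitting.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# The iris dataset stands in for train_data / train_labels (an assumption)
X, y = load_iris(return_X_y=True)
train_data, test_data, train_labels, test_labels = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [1, 2, 5, 10],
    'min_samples_split': [2, 5, 10, 20],
}

gs_tree = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
gs_tree.fit(train_data, train_labels)

print(gs_tree.best_params_)  # best combination found in the grid
print(gs_tree.best_score_)   # mean cross-validated accuracy of that combination

# With the default refit=True, best_estimator_ has been refit on all of train_data,
# so the search object can score (or predict on) unseen data directly
print(gs_tree.score(test_data, test_labels))
```

The full table of per-combination scores is available in `gs_tree.cv_results_` if you want to see how close the runner-up combinations were.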

### Drawbacks of GridSearchCV

- GridSearchCV is useful for finding the best parameter combination.
- However, it can only find the best combination among the values we put in the grid.
- An exhaustive search like GridSearchCV can be very time-consuming and computationally expensive.
- For complex models or large datasets, the time needed to run a grid search can be prohibitive.
- Be thoughtful about which parameters and values you include: each addition multiplies the runtime, and the extra performance may not be worth it.
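A quick way to estimate the cost before running a search is to count the fits. The sketch below uses scikit-learn's `ParameterGrid`, which enumerates the same combinations GridSearchCV tries. The extra `min_samples_leaf` values are hypothetical, chosen only to show how fast the count grows.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'criterion': ['gini', 'entropy'],
    'max_depth': [1, 2, 5, 10],
    'min_samples_split': [2, 5, 10, 20],
}
cv = 3

n_combinations = len(ParameterGrid(param_grid))  # 2 * 4 * 4
n_fits = n_combinations * cv                     # each combination is fit once per fold
print(n_combinations, n_fits)  # 32 96

# Adding one more parameter with 5 candidate values multiplies the work by 5
bigger_grid = dict(param_grid, min_samples_leaf=[1, 2, 5, 10, 20])
print(len(ParameterGrid(bigger_grid)) * cv)  # 480
```

With the default refit=True, GridSearchCV performs one additional fit of the best combination on the full training data.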