# Hyperparameter optimization

## What are hyperparameters?

Hyperparameters are values that the engineer must specify before training, rather than values the model learns from the data. Examples are the maximum depth of a decision tree and the number of trees in a random forest.

Grid search and random search are typical techniques for finding good values for these hyperparameters.
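To make the distinction concrete, here is a minimal sketch of setting hyperparameters by hand. The values `max_depth=3` and `n_estimators=50` are arbitrary choices for illustration, not recommendations, and `random_state=0` is only there to make the run repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters are chosen by the engineer when the model is created.
# The values below are illustrative assumptions.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# get_params() returns the hyperparameters as they were specified.
print(tree.get_params()["max_depth"])
print(forest.get_params()["n_estimators"])

# Fitting learns the tree structure from the data; max_depth only caps it.
tree.fit(X, y)
print(tree.get_depth() <= 3)
```

Nothing in `fit` changes `max_depth` or `n_estimators`, which is why a separate search procedure is needed to choose them.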

### What is grid search?

Grid search trains and evaluates a model once for every combination of candidate hyperparameter values, then selects the combination whose predictions are the most accurate.

Grid search is often combined with cross-validation, and this text shows how to find good hyperparameter values using the two together.
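The procedure can be sketched by hand before turning to the library helper. The loop below is a simplified illustration, assuming a single hyperparameter, a hypothetical candidate list `candidate_depths`, and 5-fold cross-validation on the full iris data. It is not how `GridSearchCV` is implemented internally.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

candidate_depths = [2, 3, 4, 5]  # illustrative grid of values to try
mean_scores = {}

for depth in candidate_depths:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # cross_val_score trains and evaluates the model once per fold
    fold_scores = cross_val_score(model, X, y, cv=5)
    mean_scores[depth] = fold_scores.mean()

# Keep the candidate with the highest mean cross-validated accuracy
best_depth = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best max_depth:", best_depth)
```

With more than one hyperparameter, the loop would run over every combination of candidate values, which is where the "grid" in the name comes from.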

## Dataset preparation

In [1]:
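from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the iris dataset and separate features from labels
iris = load_iris()
data, target = iris.data, iris.target
# Hold out 20% of the samples for testing; the fixed seed makes the split reproducible
data_train, data_test, target_train, target_test = train_test_split(data, target, random_state=123, test_size=0.2)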

## Grid search with cross-validation

Here we search for the best value of `max_depth`, the maximum tree depth that is specified when the decision tree model is created.

In [2]:
from sklearn.tree import DecisionTreeClassifier
tree = DecisionTreeClassifier()

In [3]:
from sklearn.model_selection import GridSearchCV

In [4]:
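# Candidate values to try for max_depth
tree_depth = {'max_depth': [2, 3, 4, 5]}
# Evaluate each candidate with 10-fold cross-validation on the training data
clf = GridSearchCV(tree, param_grid=tree_depth, cv=10)
clf.fit(data_train, target_train)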

GridSearchCV(cv=10, estimator=DecisionTreeClassifier(),
             param_grid={'max_depth': [2, 3, 4, 5]})
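After fitting, the per-candidate scores are available in the `cv_results_` attribute. The sketch below repeats the setup from this section so that it runs on its own and prints the mean and standard deviation of the fold scores for each depth. Exact numbers can vary between runs because `DecisionTreeClassifier()` is created here without a fixed `random_state`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
data_train, data_test, target_train, target_test = train_test_split(
    iris.data, iris.target, random_state=123, test_size=0.2)

clf = GridSearchCV(DecisionTreeClassifier(),
                   param_grid={'max_depth': [2, 3, 4, 5]}, cv=10)
clf.fit(data_train, target_train)

# cv_results_ is a dict of arrays with one entry per candidate
results = clf.cv_results_
for params, mean, std in zip(results['params'],
                             results['mean_test_score'],
                             results['std_test_score']):
    print(params, f"mean={mean:.3f}", f"std={std:.3f}")
```

Looking at the whole table, rather than only the winner, shows how close the candidates are to each other.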

## Check the optimal value

In [6]:
clf.best_params_

{'max_depth': 4}

In [7]:
clf.best_estimator_

DecisionTreeClassifier(max_depth=4)
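Two related attributes are worth checking alongside `best_params_`. The sketch below assumes the default `refit=True`, under which `GridSearchCV` retrains the best candidate on the whole training set and exposes it as `best_estimator_`. The setup is repeated so the snippet is self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
data_train, data_test, target_train, target_test = train_test_split(
    iris.data, iris.target, random_state=123, test_size=0.2)

clf = GridSearchCV(DecisionTreeClassifier(),
                   param_grid={'max_depth': [2, 3, 4, 5]}, cv=10)
clf.fit(data_train, target_train)

# Mean cross-validated accuracy of the winning candidate
print(clf.best_score_)

# The refitted tree never exceeds the selected max_depth
print(clf.best_estimator_.get_depth(), clf.best_params_['max_depth'])

# clf.predict delegates to best_estimator_, so both give the same labels
same = (clf.predict(data_test) == clf.best_estimator_.predict(data_test)).all()
print(same)
```

Note that `best_score_` is measured on the cross-validation folds of the training data, not on the held-out test set.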

In [8]:
target_pred = clf.predict(data_test)
target_pred

array([1, 2, 2, 1, 0, 1, 1, 0, 0, 1, 2, 0, 1, 2, 2, 2, 0, 0, 1, 0, 0, 1,
       0, 2, 0, 0, 0, 2, 2, 0])

In [9]:
target_test

array([1, 2, 2, 1, 0, 2, 1, 0, 0, 1, 2, 0, 1, 2, 2, 2, 0, 0, 1, 0, 0, 2,
       0, 2, 0, 0, 0, 2, 2, 0])
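Comparing the two arrays by eye is error-prone, so a short check helps. The sketch below copies the predicted and true labels from the outputs above, then locates the mismatches and computes the accuracy with `accuracy_score` from `sklearn.metrics`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Arrays copied from the outputs shown above
target_pred = np.array([1, 2, 2, 1, 0, 1, 1, 0, 0, 1, 2, 0, 1, 2, 2, 2, 0, 0, 1, 0, 0, 1,
                        0, 2, 0, 0, 0, 2, 2, 0])
target_test = np.array([1, 2, 2, 1, 0, 2, 1, 0, 0, 1, 2, 0, 1, 2, 2, 2, 0, 0, 1, 0, 0, 2,
                        0, 2, 0, 0, 0, 2, 2, 0])

# Positions where the prediction differs from the true label
mismatches = np.flatnonzero(target_pred != target_test)
print(mismatches)  # indices 5 and 21

# Fraction of correct predictions: 28 of 30, about 0.933
accuracy = accuracy_score(target_test, target_pred)
print(accuracy)

# Rows are true classes, columns are predicted classes
print(confusion_matrix(target_test, target_pred))
```

Both errors are samples of class 2 predicted as class 1. With the fitted search object at hand, `clf.score(data_test, target_test)` should report the same accuracy without building the arrays by hand.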