<a href="https://colab.research.google.com/github/Mel-iza/Optuna_hyperparameter_search/blob/main/Testing_hyperparameter_search_Optuna.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This tutorial is based on the video [Hypertune Machine Learning Models with OPTUNA](https://www.youtube.com/watch?v=TgdEZ6LFj-I).

---

What are the necessary steps? <br>
1. Install optuna;
2. Build a surrogate probability model of the objective function;
3. Find the hyperparameters that perform best on the surrogate;
4. Apply these hyperparameters to the true objective function;
5. Update the surrogate model incorporating the new results;
6. Repeat steps 3-5 until the maximum number of iterations or the time limit is reached.
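
The loop in steps 2-6 can be sketched in plain Python. This is a toy illustration, not Optuna's implementation: it loosely imitates the idea behind the Tree-structured Parzen Estimator (Optuna's default sampler) on a single numeric parameter, and it assumes a cheap stand-in objective. The names `true_objective`, `density`, `suggest`, and `run_toy_search`, as well as the values of `gamma`, `bandwidth`, and `n_candidates`, are hypothetical choices made for this sketch.

```python
import math
import random


def true_objective(x):
    """Stand-in for an expensive evaluation such as cross-validated accuracy."""
    return -(x - 2.0) ** 2  # highest possible value is 0, reached at x = 2


def density(points, x, bandwidth=0.5):
    """Gaussian kernel density estimate at x, built from the observed points."""
    if not points:
        return 0.0
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def suggest(history, rng, low, high, gamma=0.25, n_candidates=24, bandwidth=0.5):
    """Steps 2-3: rebuild the surrogate from the history and pick the next x."""
    # Split past trials into a "good" group (top gamma fraction) and a "bad" group.
    ranked = sorted(history, key=lambda pair: pair[1], reverse=True)
    n_good = max(1, math.ceil(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    # Draw candidates around the good points, clipped to the search range.
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    # Prefer candidates that are likely under "good" and unlikely under "bad".
    return max(
        candidates,
        key=lambda x: density(good, x, bandwidth) / (density(bad, x, bandwidth) + 1e-12),
    )


def run_toy_search(n_trials=30, n_startup=5, low=-5.0, high=5.0, seed=0):
    rng = random.Random(seed)
    history = []  # list of (x, value) pairs
    for i in range(n_trials):
        if i < n_startup:
            x = rng.uniform(low, high)  # no surrogate yet: sample at random
        else:
            x = suggest(history, rng, low, high)
        # Step 4: evaluate the true objective; step 5: record the result so the
        # next call to suggest() sees it.
        history.append((x, true_objective(x)))
    return history


history = run_toy_search()
best_x, best_value = max(history, key=lambda pair: pair[1])
print(f"best x = {best_x:.3f}, value = {best_value:.4f}")
```

The surrogate here is rebuilt from scratch on every call, which is affordable only because the history is tiny; the point is the shape of the loop, where each new evaluation changes what the next suggestion looks like.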

In [2]:
!pip install optuna
!pip install scikit-learn



In [5]:
import optuna
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def objective(trial):
    # Load the Iris dataset: features in x, class labels in y.
    iris = load_iris()
    x, y = iris.data, iris.target

    # Search space: split criterion, tree depth (sampled on a log scale),
    # and number of trees in the forest.
    criterion = trial.suggest_categorical("criterion", ["gini", "entropy"])
    max_depth = trial.suggest_int("max_depth", 2, 32, log=True)
    n_estimators = trial.suggest_int("n_estimators", 100, 500)

    random_forest = RandomForestClassifier(
        criterion=criterion,
        max_depth=max_depth,
        n_estimators=n_estimators,
    )

    # The value Optuna maximizes: mean accuracy over 3-fold cross-validation.
    score = cross_val_score(random_forest, x, y, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy


# Create a study that maximizes the objective and run 100 trials.
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)


[I 2023-11-08 19:46:10,247] A new study created in memory with name: no-name-4f23f742-841a-48d2-837d-5d28bb84f7ea
[I 2023-11-08 19:46:14,668] Trial 0 finished with value: 0.96 and parameters: {'criterion': 'entropy', 'max_depth': 5, 'n_estimators': 333}. Best is trial 0 with value: 0.96.
[I 2023-11-08 19:46:15,916] Trial 1 finished with value: 0.9666666666666667 and parameters: {'criterion': 'entropy', 'max_depth': 17, 'n_estimators': 296}. Best is trial 1 with value: 0.9666666666666667.
[I 2023-11-08 19:46:17,227] Trial 2 finished with value: 0.9533333333333333 and parameters: {'criterion': 'entropy', 'max_depth': 2, 'n_estimators': 324}. Best is trial 1 with value: 0.9666666666666667.
[I 2023-11-08 19:46:18,343] Trial 3 finished with value: 0.96 and parameters: {'criterion': 'gini', 'max_depth': 5, 'n_estimators': 277}. Best is trial 1 with value: 0.9666666666666667.
[I 2023-11-08 19:46:18,830] Trial 4 finished with value: 0.96 and parameters: {'criterion': 'entropy', 'max_depth': 3,

In [7]:
trial = study.best_trial
trial

FrozenTrial(number=1, state=TrialState.COMPLETE, values=[0.9666666666666667], datetime_start=datetime.datetime(2023, 11, 8, 19, 46, 14, 671277), datetime_complete=datetime.datetime(2023, 11, 8, 19, 46, 15, 916243), params={'criterion': 'entropy', 'max_depth': 17, 'n_estimators': 296}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'criterion': CategoricalDistribution(choices=('gini', 'entropy')), 'max_depth': IntDistribution(high=32, log=True, low=2, step=1), 'n_estimators': IntDistribution(high=500, log=False, low=100, step=1)}, trial_id=1, value=None)

In [12]:
trial.distributions

{'criterion': CategoricalDistribution(choices=('gini', 'entropy')),
 'max_depth': IntDistribution(high=32, log=True, low=2, step=1),
 'n_estimators': IntDistribution(high=500, log=False, low=100, step=1)}

In [13]:
print('Accuracy: {}'.format(trial.value))
print('Best hyperparameters: {}'.format(trial.params))

Accuracy: 0.9666666666666667
Best hyperparameters: {'criterion': 'entropy', 'max_depth': 17, 'n_estimators': 296}
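
A natural follow-up is to train a final model with the best hyperparameters. The sketch below is one way to do that, assuming the parameter values printed above (in the notebook, `study.best_params` returns the same dictionary). The 80/20 stratified split and `random_state=0` are assumptions added for this example and are not part of the original tutorial. Because the search above cross-validated on the full dataset, the held-out rows were also seen during tuning, so this score is likely optimistic.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Best parameters copied from the output above; in the notebook,
# study.best_params could be used instead.
best_params = {"criterion": "entropy", "max_depth": 17, "n_estimators": 296}

x, y = load_iris(return_X_y=True)

# Assumed 80/20 stratified split (not in the original tutorial).
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=0
)

# Fit the final model on the training portion only.
final_model = RandomForestClassifier(**best_params, random_state=0)
final_model.fit(x_train, y_train)

# Accuracy on the rows that were not used for fitting the final model.
test_accuracy = final_model.score(x_test, y_test)
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

For an unbiased estimate, the test split would need to be set aside before running `study.optimize`, with the objective cross-validating on the training portion only.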
