Optuna is an open-source hyperparameter optimization (HPO) framework for machine learning. It automates hyperparameter tuning: you define an objective function and a search space, and Optuna uses search algorithms and pruning techniques to explore that space efficiently and find well-performing hyperparameters for a given model or algorithm.

In [1]:
!pip install optuna

Collecting optuna
  Downloading optuna-3.2.0-py3-none-any.whl (390 kB)
Collecting alembic>=1.5.0
  Downloading alembic-1.11.1-py3-none-any.whl (224 kB)
Collecting cmaes>=0.9.1
  Downloading cmaes-0.9.1-py3-none-any.whl (21 kB)
Collecting colorlog
  Downloading colorlog-6.7.0-py2.py3-none-any.whl (11 kB)
Collecting Mako
  Downloading Mako-1.2.4-py3-none-any.whl (78 kB)
Installing collected packages: Mako, colorlog, cmaes, alembic, optuna
Successfully installed Mako-1.2.4 alembic-1.11.1 cmaes-0.9.1 colorlog-6.7.0 optuna-3.2.0


In [16]:
import optuna
import sklearn.datasets
import sklearn.ensemble
import sklearn.model_selection
import sklearn.svm
from sklearn import metrics

In [3]:
iris = sklearn.datasets.load_iris()
X, y = iris.data, iris.target

In [4]:
X.shape

(150, 4)

In [6]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,random_state=42,test_size=0.33)

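In the objective function below, trial.suggest_categorical chooses between an SVC and a random forest. For the SVC, trial.suggest_float suggests a floating-point value for the svc_c hyperparameter in the range 1e-10 to 1e10 on a logarithmic scale (log=True), and that value is passed to the classifier as C. For the random forest, trial.suggest_int suggests rf_max_depth between 2 and 32, also on a logarithmic scale. The function returns the mean 3-fold cross-validation accuracy on the training set, which the study then maximizes.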

In [8]:
def objective(trial):
    classifier_name = trial.suggest_categorical("classifier", ["SVC", "RandomForest"])
    if classifier_name == "SVC":
        svc_c = trial.suggest_float("svc_c", 1e-10, 1e10, log=True)
        classifier_obj = sklearn.svm.SVC(C=svc_c, gamma="auto")
    else:
        rf_max_depth = trial.suggest_int("rf_max_depth", 2, 32, log=True)
        classifier_obj = sklearn.ensemble.RandomForestClassifier(
            max_depth=rf_max_depth, n_estimators=10
        )

    score = sklearn.model_selection.cross_val_score(classifier_obj, X_train, y_train, n_jobs=-1, cv=3)
    accuracy = score.mean()
    return accuracy
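
To make the logarithmic scale used for svc_c concrete, here is a minimal sketch of log-uniform sampling written with the standard library only. It is an illustration of what a log scale means for a range like 1e-10 to 1e10, not Optuna's own sampler; the helper name sample_log_uniform and the seed are assumptions made for this example.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    log_value = rng.uniform(math.log(low), math.log(high))
    return math.exp(log_value)


rng = random.Random(0)  # fixed seed, chosen only so the sketch is repeatable
samples = [sample_log_uniform(1e-10, 1e10, rng) for _ in range(10_000)]

# On a log scale every order of magnitude is equally likely, so about half
# of the draws fall below 1.0, the geometric midpoint of the range.
share_below_one = sum(s < 1.0 for s in samples) / len(samples)
print(f"share below 1.0: {share_below_one:.2f}")
```

With a linear scale over the same range, almost every draw would land near 1e10, and small values of C would practically never be tried.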

In [9]:
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_trial)

[I 2023-05-30 11:03:18,187] A new study created in memory with name: no-name-95dd2d78-48a8-4945-ba7b-d16e6f000cb0
[I 2023-05-30 11:03:25,504] Trial 0 finished with value: 0.929590017825312 and parameters: {'classifier': 'SVC', 'svc_c': 8217489007.697121}. Best is trial 0 with value: 0.929590017825312.
[I 2023-05-30 11:03:26,430] Trial 1 finished with value: 0.919489007724302 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 23}. Best is trial 0 with value: 0.929590017825312.
[I 2023-05-30 11:03:27,404] Trial 2 finished with value: 0.929590017825312 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 12}. Best is trial 0 with value: 0.929590017825312.
[I 2023-05-30 11:03:28,193] Trial 3 finished with value: 0.9494949494949495 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 31}. Best is trial 3 with value: 0.9494949494949495.
[I 2023-05-30 11:03:28,968] Trial 4 finished with value: 0.9099821746880571 and parameters: {'classifier': 'RandomForest', 'r

[I 2023-05-30 11:03:30,454] Trial 42 finished with value: 0.929590017825312 and parameters: {'classifier': 'SVC', 'svc_c': 708.1550578293574}. Best is trial 24 with value: 0.9497920380273323.
[I 2023-05-30 11:03:30,486] Trial 43 finished with value: 0.9396910279263221 and parameters: {'classifier': 'SVC', 'svc_c': 5.15184583081535}. Best is trial 24 with value: 0.9497920380273323.
[I 2023-05-30 11:03:30,502] Trial 44 finished with value: 0.929590017825312 and parameters: {'classifier': 'SVC', 'svc_c': 37429.37217556443}. Best is trial 24 with value: 0.9497920380273323.
[I 2023-05-30 11:03:30,517] Trial 45 finished with value: 0.9497920380273323 and parameters: {'classifier': 'SVC', 'svc_c': 57.642941120425476}. Best is trial 24 with value: 0.9497920380273323.
[I 2023-05-30 11:03:30,533] Trial 46 finished with value: 0.9197860962566845 and parameters: {'classifier': 'SVC', 'svc_c': 0.2878439895843587}. Best is trial 24 with value: 0.9497920380273323.
[I 2023-05-30 11:03:30,548] Trial 47

[I 2023-05-30 11:03:31,275] Trial 85 finished with value: 0.9197860962566845 and parameters: {'classifier': 'SVC', 'svc_c': 0.3562133765220864}. Best is trial 63 with value: 0.9595959595959597.
[I 2023-05-30 11:03:31,306] Trial 86 finished with value: 0.9393939393939394 and parameters: {'classifier': 'RandomForest', 'rf_max_depth': 3}. Best is trial 63 with value: 0.9595959595959597.
[I 2023-05-30 11:03:31,322] Trial 87 finished with value: 0.9396910279263221 and parameters: {'classifier': 'SVC', 'svc_c': 5.016803140768866}. Best is trial 63 with value: 0.9595959595959597.
[I 2023-05-30 11:03:31,353] Trial 88 finished with value: 0.9396910279263221 and parameters: {'classifier': 'SVC', 'svc_c': 192.49345056646575}. Best is trial 63 with value: 0.9595959595959597.
[I 2023-05-30 11:03:31,369] Trial 89 finished with value: 0.9497920380273323 and parameters: {'classifier': 'SVC', 'svc_c': 51.247441249262295}. Best is trial 63 with value: 0.9595959595959597.
[I 2023-05-30 11:03:31,385] Tria

FrozenTrial(number=63, state=TrialState.COMPLETE, values=[0.9595959595959597], datetime_start=datetime.datetime(2023, 5, 30, 11, 3, 30, 831604), datetime_complete=datetime.datetime(2023, 5, 30, 11, 3, 30, 850139), params={'classifier': 'SVC', 'svc_c': 161.59091829286348}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'classifier': CategoricalDistribution(choices=('SVC', 'RandomForest')), 'svc_c': FloatDistribution(high=10000000000.0, log=True, low=1e-10, step=None)}, trial_id=63, value=None)
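
Rather than copying the best value of C by hand, the winning configuration can be turned back into an estimator from a parameter dict. The sketch below assumes a dict shaped like the params field printed above; the helper build_classifier is a hypothetical name introduced here, and it mirrors the two branches of the objective function.

```python
import sklearn.ensemble
import sklearn.svm


def build_classifier(params):
    """Rebuild the estimator described by a parameter dict from the study."""
    if params["classifier"] == "SVC":
        return sklearn.svm.SVC(C=params["svc_c"], gamma="auto")
    return sklearn.ensemble.RandomForestClassifier(
        max_depth=params["rf_max_depth"], n_estimators=10
    )


# Parameters copied from the best trial printed above; in the notebook,
# study.best_params would supply the same dict.
best_params = {"classifier": "SVC", "svc_c": 161.59091829286348}
model = build_classifier(best_params)
print(model)
```

Calling build_classifier(study.best_params) keeps the final model in sync with the study if the search is rerun and a different trial wins.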


In [10]:
svc = sklearn.svm.SVC(C=161.59091829286348, gamma='auto')
svc.fit(X_train,y_train)

SVC(C=161.59091829286348, gamma='auto')

In [12]:
y_pred = svc.predict(X_test)

In [13]:
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test,y_pred))

0.98


In [18]:
def report_performance(model):
    # Predict with the model passed in rather than relying on the global y_pred
    y_pred = model.predict(X_test)
    print("\n\nConfusion Matrix:")
    print(metrics.confusion_matrix(y_test, y_pred))
    print("\n\nClassification Report: ")
    print(metrics.classification_report(y_test, y_pred))

In [19]:
report_performance(svc) 



Confusion Matrix:
[[19  0  0]
 [ 0 15  0]
 [ 0  1 15]]


Classification Report: 
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       0.94      1.00      0.97        15
           2       1.00      0.94      0.97        16

    accuracy                           0.98        50
   macro avg       0.98      0.98      0.98        50
weighted avg       0.98      0.98      0.98        50
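
The figures in the classification report follow directly from the confusion matrix. As a small worked check, the sketch below recomputes accuracy, precision, and recall in plain Python from the matrix printed above, reading rows as true classes and columns as predicted classes (the scikit-learn convention). The variable names are chosen for this example only.

```python
# Confusion matrix from the output above: rows are true classes,
# columns are predicted classes.
cm = [
    [19, 0, 0],
    [0, 15, 0],
    [0, 1, 15],
]

n_classes = len(cm)
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n_classes))
accuracy = correct / total

precisions = []
recalls = []
for k in range(n_classes):
    predicted_k = sum(cm[i][k] for i in range(n_classes))  # column sum
    actual_k = sum(cm[k])                                  # row sum
    precisions.append(cm[k][k] / predicted_k)
    recalls.append(cm[k][k] / actual_k)
    print(f"class {k}: precision={precisions[k]:.2f} recall={recalls[k]:.2f}")

print(f"accuracy={accuracy:.2f}")
```

The single off-diagonal entry is one class 2 sample predicted as class 1, which is why only the precision of class 1 and the recall of class 2 drop below 1.00.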



Optuna automates hyperparameter optimization by searching for the best combination of hyperparameters. In this example, 100 trials selected an SVC with C of about 161.6, which reached a cross-validation accuracy of about 0.96 on the training set and 0.98 accuracy on the held-out test set. Automating the search in this way simplifies tuning and saves time and effort for machine learning practitioners and researchers.