# Material Overview

Briefly explain, in your own words, the differences between grid search, randomized search, and Bayesian search CV, and how they compare with Optuna.

source: https://www.youtube.com/watch?v=t-INgABWULw

GridSearchCV tries every combination of the parameter values we specify. It is guaranteed to find the best combination within that grid, but it gets very slow when there are many parameters. RandomizedSearchCV is faster because it samples combinations at random, although it can miss the optimal one. Bayesian search (for example, BayesSearchCV in scikit-optimize) is smarter: each trial learns from the previous results so that the next one tries a more promising combination. Optuna is the most flexible and modern of the four: it can sample parameters more intelligently and can stop unpromising trials early (pruning), which saves a lot of time and compute.
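To make the grid versus random trade-off concrete, here is a minimal sketch using only the standard library. The search space and the `toy_score` function are hypothetical stand-ins for a real cross-validated score, chosen only for illustration; they are not part of the notebook.

```python
import itertools
import random

# Hypothetical search space, chosen only for illustration.
search_space = {
    "n_estimators": [100, 200, 300],
    "max_depth": [5, 10, 20],
    "min_samples_leaf": [1, 5],
}


def toy_score(params):
    """Stand-in for a cross-validated score; higher is better."""
    return -(
        abs(params["n_estimators"] - 200) / 100
        + abs(params["max_depth"] - 10) / 10
        + abs(params["min_samples_leaf"] - 1)
    )


def grid_search(space, score_fn):
    """Evaluate every combination in the grid and return the best one."""
    names = list(space)
    candidates = [
        dict(zip(names, values)) for values in itertools.product(*space.values())
    ]
    return max(candidates, key=score_fn), len(candidates)


def random_search(space, score_fn, n_iter, seed=0):
    """Evaluate n_iter randomly drawn combinations and return the best one."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(n_iter)
    ]
    return max(candidates, key=score_fn), len(candidates)


grid_best, grid_evals = grid_search(search_space, toy_score)
rand_best, rand_evals = random_search(search_space, toy_score, n_iter=6)

print("grid:  ", grid_evals, "evaluations, best =", grid_best)
print("random:", rand_evals, "evaluations, best =", rand_best)
```

Grid search pays for 3 x 3 x 2 = 18 evaluations and always returns the best point in the grid, while random search uses a fixed budget of 6 and can only match that result if it happens to draw the same combination.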

# Import Data & Libraries

In [69]:
# run this only once
!pip install optuna -q

In [70]:
# import the required libraries here
import seaborn as sns
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.preprocessing import LabelEncoder
import optuna

In [71]:
df = sns.load_dataset('iris')
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


# Data Preprocessing

In [72]:
# convert the categorical variable to numeric
le = LabelEncoder()
df['species'] = le.fit_transform(df['species'])

In [73]:
# separate the features (X) from the target (y)
X = df.drop(['species'], axis=1)
y = df['species']

# Dataset Splitting

In [74]:
# split with an 80:20 ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)


# Base Model Random Forest

In [75]:
# use a random forest classifier
rfr = RandomForestClassifier(random_state=42)
rfr.fit(X_train, y_train)

In [76]:
y_pred = rfr.predict(X_test)

In [77]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 0.900
Precision: 0.902
Recall: 0.900
F1 Score: 0.900
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.82      0.90      0.86        10
           2       0.89      0.80      0.84        10

    accuracy                           0.90        30
   macro avg       0.90      0.90      0.90        30
weighted avg       0.90      0.90      0.90        30



# Optuna

In [78]:
def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 100, 1000)
    max_depth = trial.suggest_int('max_depth', 10, 50)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 32)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 32)


    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        random_state=42
    )

    score = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')

    return score.mean()

Hyperparameters can be adapted to the algorithm being used. Here we use Random Forest, so the ones we tune are *n_estimators, max_depth, min_samples_split,* and *min_samples_leaf*.

In [79]:
study = optuna.create_study(direction='maximize', sampler=optuna.samplers.RandomSampler())

[I 2025-10-03 08:55:04,339] A new study created in memory with name: no-name-f8e6d0cd-6680-4b99-95c0-fc7b2b6b1d42


In [80]:
study.optimize(objective, n_trials=100)

[I 2025-10-03 08:55:24,020] Trial 0 finished with value: 0.9583333333333333 and parameters: {'n_estimators': 981, 'max_depth': 11, 'min_samples_split': 19, 'min_samples_leaf': 7}. Best is trial 0 with value: 0.9583333333333333.
[I 2025-10-03 08:55:30,704] Trial 1 finished with value: 0.95 and parameters: {'n_estimators': 796, 'max_depth': 16, 'min_samples_split': 30, 'min_samples_leaf': 3}. Best is trial 0 with value: 0.9583333333333333.
[I 2025-10-03 08:55:31,785] Trial 2 finished with value: 0.95 and parameters: {'n_estimators': 104, 'max_depth': 21, 'min_samples_split': 5, 'min_samples_leaf': 11}. Best is trial 0 with value: 0.9583333333333333.
[I 2025-10-03 08:55:35,348] Trial 3 finished with value: 0.95 and parameters: {'n_estimators': 418, 'max_depth': 49, 'min_samples_split': 4, 'min_samples_leaf': 2}. Best is trial 0 with value: 0.9583333333333333.
[I 2025-10-03 08:55:42,842] Trial 4 finished with value: 0.7083333333333333 and parameters: {'n_estimators': 937, 'max_depth': 16, 

This may take a while, so just wait and see ^^
<br>
The recommendation is to set n_trials to 100, because there seems to be no significant score increase beyond 100 trials, and running more is inefficient since the wait gets quite long.
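One way to check whether extra trials still help is to look at the last trial that improved the best score so far. The helper below is a hypothetical sketch (the name `last_improvement` is not an Optuna function); with a real study the input could be built as `[t.value for t in study.trials if t.value is not None]`.

```python
from itertools import accumulate


def last_improvement(values):
    """Return (index, best) where index is the last trial that raised the running best."""
    running_best = list(accumulate(values, max))
    last_idx = 0
    for i in range(1, len(running_best)):
        if running_best[i] > running_best[i - 1]:
            last_idx = i
    return last_idx, running_best[-1]


# Values of trials 0 to 4 as printed in the optimization log.
logged_values = [0.9583333333333333, 0.95, 0.95, 0.95, 0.7083333333333333]
print(last_improvement(logged_values))

# Made-up values, only to show a case where a later trial improves the score.
hypothetical_values = [0.90, 0.92, 0.91, 0.95, 0.95, 0.94]
print(last_improvement(hypothetical_values))
```

If the returned index is far below the number of trials run, the remaining trials did not improve the result, which supports using a smaller `n_trials`.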

In [81]:
study.best_params

{'n_estimators': 826,
 'max_depth': 31,
 'min_samples_split': 7,
 'min_samples_leaf': 19}

In [82]:
best_params = study.best_params

In [83]:
import matplotlib.pyplot as plt

In [84]:
optuna.visualization.plot_optimization_history(study)

In [85]:
optuna.visualization.plot_parallel_coordinate(study)

In [86]:
optuna.visualization.plot_slice(study, params=['n_estimators', 'max_depth', 'min_samples_split', 'min_samples_leaf'])

In [87]:
optuna.visualization.plot_param_importances(study)

Here are the hyperparameter tuning results from Optuna:

In [88]:
print("Best Hyperparameters:", best_params)

Best Hyperparameters: {'n_estimators': 826, 'max_depth': 31, 'min_samples_split': 7, 'min_samples_leaf': 19}


# Random Forest Using Optuna

In [89]:
best_n_estimators = best_params['n_estimators']
best_max_depth = best_params['max_depth']
best_min_samples_split = best_params['min_samples_split']
best_min_samples_leaf = best_params['min_samples_leaf']

In [90]:
best_model = RandomForestClassifier(
    n_estimators=best_n_estimators,
    max_depth=best_max_depth,
    min_samples_split=best_min_samples_split,
    min_samples_leaf=best_min_samples_leaf)

best_model.fit(X_train, y_train)
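A more compact alternative, sketched here, is to unpack the dictionary straight into the constructor. The name `best_model_alt` and the added `random_state=42` are assumptions: the seed matches the one used inside `objective`, but it is not set in the cell above, so results with it may differ from the printed ones.

```python
from sklearn.ensemble import RandomForestClassifier

# Values copied from the study.best_params output.
best_params = {
    "n_estimators": 826,
    "max_depth": 31,
    "min_samples_split": 7,
    "min_samples_leaf": 19,
}

# ** unpacks each key/value pair as a keyword argument.
# random_state=42 is an added assumption to make reruns repeatable.
best_model_alt = RandomForestClassifier(**best_params, random_state=42)

print(best_model_alt.get_params()["n_estimators"])
```

This avoids the four intermediate `best_*` variables and keeps working unchanged if more hyperparameters are added to the search space.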

In [91]:
y_pred = best_model.predict(X_test)

In [92]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 0.933
Precision: 0.933
Recall: 0.933
F1 Score: 0.933
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.90      0.90      0.90        10
           2       0.90      0.90      0.90        10

    accuracy                           0.93        30
   macro avg       0.93      0.93      0.93        30
weighted avg       0.93      0.93      0.93        30



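Model performance improved by about 3.3 percentage points in accuracy, recall, and F1 (0.900 to 0.933) and by about 3.1 points in precision (0.902 to 0.933). With only 30 test samples, this corresponds to one additional correctly classified sample.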
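The arithmetic behind that figure can be checked with a short sketch. The counts of correct predictions are inferred from the reported accuracies and the test-set size of 30, not read from a confusion matrix.

```python
n_test = 30  # total support in both classification reports

# Inferred counts: 0.900 * 30 = 27 and 0.933 * 30 is about 28.
base_correct = round(0.900 * n_test)
tuned_correct = round(0.933 * n_test)

base_acc = base_correct / n_test
tuned_acc = tuned_correct / n_test

# Absolute gain in percentage points versus gain relative to the baseline.
gain_points = (tuned_acc - base_acc) * 100
gain_relative = (tuned_acc - base_acc) / base_acc * 100

print(f"baseline: {base_correct}/{n_test} = {base_acc:.3f}")
print(f"tuned:    {tuned_correct}/{n_test} = {tuned_acc:.3f}")
print(f"gain: {gain_points:.1f} percentage points ({gain_relative:.1f}% relative)")
```

Because a single sample is worth 1/30 of the accuracy, differences of this size on such a small test set are best read as indicative rather than conclusive.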