# Material Overview

Briefly explain, in your own words, the differences between grid search, randomized search, and Bayesian search CV compared with Optuna.

source: https://www.youtube.com/watch?v=t-INgABWULw

Grid Search tries every combination of the supplied parameter values, so it always finds the best combination on the grid, but it takes longer. Randomized Search instead samples combinations at random, which is more efficient.
Bayesian SearchCV and Optuna are alike in that both learn from previous trials to find the best combination, but Optuna is usually more flexible and often performs better.
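
To make the first two strategies concrete, here is a minimal sketch that uses only the standard library. `toy_score` is a hypothetical stand-in for a cross-validated score, and the value ranges are made up for illustration; neither comes from the source video. Bayesian search and Optuna differ from both by choosing the next combination based on earlier scores, which this sketch does not attempt to show.

```python
import itertools
import random

def toy_score(depth, leaf):
    """Hypothetical stand-in for a cross-validated score; peaks at depth=6, leaf=2."""
    return 1.0 - 0.01 * (depth - 6) ** 2 - 0.02 * (leaf - 2) ** 2

depths = range(2, 11)   # 9 candidate values
leaves = range(1, 6)    # 5 candidate values

# Grid search: evaluate every combination (9 * 5 = 45 evaluations).
grid = list(itertools.product(depths, leaves))
grid_best = max(grid, key=lambda p: toy_score(*p))

# Random search: evaluate a fixed budget of randomly drawn combinations.
rng = random.Random(0)
sampled = [(rng.choice(depths), rng.choice(leaves)) for _ in range(10)]
random_best = max(sampled, key=lambda p: toy_score(*p))

print(len(grid), grid_best)
print(len(sampled), random_best)
```

Grid search pays for all 45 evaluations and is certain to hit the best grid point, while random search spends only 10 evaluations and may or may not land on it.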

# Import Data & Libraries

In [86]:
# run only once
!pip install optuna -q

In [87]:
# import the required libraries here
...

In [88]:
import pandas as pd
import numpy as np
import seaborn as sns
import optuna

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

In [89]:
df = sns.load_dataset('iris')
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


# Data Preprocessing

In [90]:
# convert the categorical variable to numeric
...

In [91]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['species'] = le.fit_transform(df['species'])

In [92]:
# separate the feature columns (X) from the target (y)
X = df.drop(['species'], axis=1)
y = df['species']

# Dataset Splitting

In [93]:
# split with an 80:20 ratio
...

In [94]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Base Model Random Forest

In [95]:
# use a random forest classifier
from sklearn.ensemble import RandomForestClassifier

rfr = RandomForestClassifier(random_state=69)
rfr.fit(X_train, y_train)

In [96]:
y_pred = rfr.predict(X_test)

In [97]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



# Optuna

In [98]:
def objective(trial):
    ...

    model = ...

    score = ...

    return score.mean()

In [99]:
def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 50, 300)
    max_depth = trial.suggest_int('max_depth', 2, 20)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 20)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 20)

    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        random_state=42
    )

    score = cross_val_score(
        model, X_train, y_train,
        cv=5,
        scoring="accuracy"
    )

    return score.mean()

The hyperparameters to tune depend on the algorithm being used. Here we use Random Forest, so the ones we tune are *n_estimators*, *max_depth*, *min_samples_split*, and *min_samples_leaf*.
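
As a rough illustration of why this search space is sampled rather than enumerated, the sketch below counts how many combinations an exhaustive grid over the same integer ranges would contain. The ranges are copied from the `suggest_int` calls in `objective`, whose bounds are inclusive; the dictionary name `search_space` is only an illustrative choice.

```python
import math

# Inclusive integer ranges copied from the suggest_int calls in objective().
search_space = {
    "n_estimators": (50, 300),
    "max_depth": (2, 20),
    "min_samples_split": (2, 20),
    "min_samples_leaf": (1, 20),
}

# Number of candidate values per hyperparameter (both bounds are inclusive).
sizes = {name: high - low + 1 for name, (low, high) in search_space.items()}
total = math.prod(sizes.values())

print(sizes)
print(total)  # 1812220
```

With 5-fold cross-validation per combination, a full grid would need millions of model fits, whereas this notebook asks Optuna for only 50 trials.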

In [100]:
study = optuna.create_study(direction="maximize")

[I 2025-10-04 17:13:52,673] A new study created in memory with name: no-name-84a0b4fd-ea3b-4c6c-8ca0-50f302ec9c0d


In [101]:
study.optimize(objective, n_trials=50)

[I 2025-10-04 17:13:53,691] Trial 0 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 72, 'max_depth': 20, 'min_samples_split': 10, 'min_samples_leaf': 5}. Best is trial 0 with value: 0.9416666666666667.
[I 2025-10-04 17:13:55,805] Trial 1 finished with value: 0.95 and parameters: {'n_estimators': 172, 'max_depth': 18, 'min_samples_split': 11, 'min_samples_leaf': 18}. Best is trial 1 with value: 0.95.
[I 2025-10-04 17:14:01,200] Trial 2 finished with value: 0.925 and parameters: {'n_estimators': 261, 'max_depth': 14, 'min_samples_split': 9, 'min_samples_leaf': 19}. Best is trial 1 with value: 0.95.
[I 2025-10-04 17:14:03,904] Trial 3 finished with value: 0.95 and parameters: {'n_estimators': 113, 'max_depth': 10, 'min_samples_split': 17, 'min_samples_leaf': 1}. Best is trial 1 with value: 0.95.
[I 2025-10-04 17:14:06,218] Trial 4 finished with value: 0.925 and parameters: {'n_estimators': 73, 'max_depth': 12, 'min_samples_split': 10, 'min_samples_leaf': 17}. Best

This may take a while, so just wait and see ^^
<br>
The recommended setting for `n_trials` is 100, because there appears to be no significant score increase beyond 100 trials; running more is also inefficient, since you have to wait quite a long time.
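
One way to check whether extra trials still help is to find the last trial that raised the running best score. The helper below is a hypothetical sketch, not part of Optuna; it is applied to the five scores visible in the log above. For a finished study, the same list could presumably be built with `[t.value for t in study.trials]`, assuming every trial completed with a value.

```python
def last_improvement(values):
    """Return (index, best value) of the last trial that raised the running best score."""
    best_value = float("-inf")
    best_index = None
    for index, value in enumerate(values):
        if value > best_value:
            best_value, best_index = value, index
    return best_index, best_value

# Scores of trials 0-4, copied from the log above.
logged = [0.9416666666666667, 0.95, 0.925, 0.95, 0.925]
print(last_improvement(logged))  # (1, 0.95)
```

If the returned index is far below the number of trials run, later trials did not improve the score, which is a hint that a smaller `n_trials` would have been enough.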

In [102]:
study.best_params

{'n_estimators': 86,
 'max_depth': 11,
 'min_samples_split': 6,
 'min_samples_leaf': 3}

The following are the hyperparameter tuning results from Optuna.

In [103]:
# check the hyperparameter tuning results from Optuna


In [104]:
print("Best parameters:", study.best_params)
print("Best score:", study.best_value)

Best parameters: {'n_estimators': 86, 'max_depth': 11, 'min_samples_split': 6, 'min_samples_leaf': 3}
Best score: 0.9666666666666666


# Random Forest Using Optuna

In [105]:
# store the best hyperparameters from tuning in a new variable
...

In [106]:
best_params = study.best_params

In [85]:
best_model = ...

best_model.fit(X_train, y_train)


In [107]:
best_model = RandomForestClassifier(
    n_estimators=best_params['n_estimators'],
    max_depth=best_params['max_depth'],
    min_samples_split=best_params['min_samples_split'],
    min_samples_leaf=best_params['min_samples_leaf'],
    random_state=42
)
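
The cell above passes each tuned value by name. Since the keys of `best_params` match the constructor's argument names, a shorter equivalent is to unpack the dictionary with `**`. The sketch below is self-contained under two assumptions: `load_iris` from scikit-learn stands in for the seaborn copy of the dataset, and the parameter values are hard-coded from the `study.best_params` output shown earlier.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Best parameters as reported by study.best_params in this notebook.
best_params = {
    "n_estimators": 86,
    "max_depth": 11,
    "min_samples_split": 6,
    "min_samples_leaf": 3,
}

# Assumption: load_iris stands in for the seaborn copy of the dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# ** unpacks the dict into keyword arguments; fit() must run before predict().
best_model = RandomForestClassifier(**best_params, random_state=42)
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_test)

print(best_model.get_params()["n_estimators"], len(y_pred))
```

Unpacking keeps the model definition in sync with whatever hyperparameters the objective function tunes, so adding a new `suggest_*` call does not require editing this cell.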

In [108]:
# fit the tuned model on the training data before predicting
best_model.fit(X_train, y_train)
y_pred = best_model.predict(X_test)

In [None]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Tuning with Optuna does not raise the score compared with the base model, because the base model alone already scores 1.000 on every metric on the test set, leaving no room for improvement.