# Material Overview

Briefly explain, in your own words, the differences between grid search, randomized search, Bayesian search CV, and Optuna.

source: https://www.youtube.com/watch?v=t-INgABWULw

Grid Search tries every combination of the parameter values we specify. This guarantees finding the best combination within that grid, which works well for a small parameter space, but it becomes very time-consuming when there are many combinations. Random Search, by contrast, samples only a subset of the combinations at random. It is faster and more efficient, but the best combination may never be sampled.
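The contrast above can be made concrete with a minimal sketch. It assumes scikit-learn's `GridSearchCV` and `RandomizedSearchCV`, loads iris from `sklearn.datasets` rather than seaborn, and uses a deliberately tiny, hypothetical search space that is not the one tuned later in this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Assumption: iris is loaded from scikit-learn so the sketch runs offline.
X_demo, y_demo = load_iris(return_X_y=True)

# Hypothetical small search space, chosen only for illustration.
param_grid = {
    "n_estimators": [10, 20, 30],
    "max_depth": [2, 4],
}

# Grid search evaluates every combination: 3 * 2 = 6 candidates.
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X_demo, y_demo)

# Randomized search evaluates only n_iter randomly drawn candidates.
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_grid,
    n_iter=3,
    cv=3,
    random_state=0,
)
rand.fit(X_demo, y_demo)

print("grid candidates:", len(grid.cv_results_["params"]), grid.best_params_)
print("random candidates:", len(rand.cv_results_["params"]), rand.best_params_)
```

Because the random candidates here are a subset of the grid, the randomized search can at best tie the grid search's score. It does so for half the fitting cost.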

Bayesian Search is more efficient because it uses information from previous trials to decide which parameters to try next. Optuna offers a more modern approach: it is flexible and adaptive, and it can stop unpromising trials early. This is why Optuna is often chosen: it is both more practical and time-saving.

# Import Data & Libraries

In [None]:
# run only once
!pip install optuna -q


In [22]:
# import the required libraries here
import seaborn as sns
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import optuna
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.preprocessing import LabelEncoder

In [None]:
df = sns.load_dataset('iris')
df.head()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa


# Data Preprocessing

In [12]:
# convert the categorical variable to numeric
iris_mapping = {
    'setosa': 0,
    'versicolor': 1,
    'virginica': 2
}

df['species'] = df['species'].map(iris_mapping)
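As an alternative to the hand-written mapping, the `LabelEncoder` imported earlier could do the same job. The sketch below is only an illustration on a small hypothetical Series, not a replacement for the cell above. `LabelEncoder` assigns codes in sorted order of the class names, which for iris happens to match `iris_mapping`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of species labels, standing in for df['species'].
species = pd.Series(["setosa", "versicolor", "virginica", "setosa"])

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

print(codes.tolist())                             # [0, 1, 2, 0]
print(encoder.classes_.tolist())                  # ['setosa', 'versicolor', 'virginica']
print(encoder.inverse_transform(codes).tolist())  # back to the original names
```

One practical difference: the encoder keeps `classes_`, so predictions can be mapped back to species names with `inverse_transform`. Note also that re-running a `.map(iris_mapping)` cell on an already mapped column would produce `NaN`, since the integer codes are not keys of the dictionary.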

In [None]:
# separate the features and the target variable
X = df.drop(['species'], axis=1)
y = df['species']

# Dataset Splitting

In [48]:
# split with an 80:20 ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=19
)
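The split above is purely random, so the three classes need not be equally represented in the test set. A possible variant, shown here only as a sketch on iris loaded from `sklearn.datasets` (an assumption made to keep the block offline), passes `stratify` so that each split keeps the class proportions.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris is loaded from scikit-learn so the sketch runs offline.
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo keeps the class proportions the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=19, stratify=y_demo
)

test_counts = Counter(y_te.tolist())
print(sorted(test_counts.items()))  # each of the 3 classes gets 10 test rows
```

With only 30 test rows, balanced class counts make the per-class precision and recall a little easier to compare.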

# Base Model Random Forest

In [49]:
# use a random forest classifier
rfc = RandomForestClassifier(random_state=69)
rfc.fit(X_train, y_train)

In [50]:
y_pred = rfc.predict(X_test)

In [51]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00        12
           2       1.00      1.00      1.00         8

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



# Optuna

In [52]:
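def objective(trial):
    # search space: Optuna proposes one value per hyperparameter for each trial
    n_estimators = trial.suggest_int('n_estimators', 100, 1000)
    max_depth = trial.suggest_int('max_depth', 10, 50)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 32)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 32)

    model = RandomForestClassifier(
        n_estimators=n_estimators,
        max_depth=max_depth,
        min_samples_split=min_samples_split,
        min_samples_leaf=min_samples_leaf,
        random_state=42)

    # mean 5-fold cross-validated accuracy is the value the study maximizes
    # note: this scores on the full X, y, so the test rows also take part in tuning;
    # passing X_train, y_train instead would keep the test set unseen
    score = cross_val_score(model, X, y, n_jobs=-1, cv=5, scoring='accuracy')

    return score.mean()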

The hyperparameters can be adapted to the algorithm being used. Here we use Random Forest, so the ones we tune are *n_estimators, max_depth, min_samples_split,* and *min_samples_leaf*.

In [55]:
study = optuna.create_study(direction='maximize', sampler=optuna.samplers.RandomSampler(seed=42))

[I 2025-10-04 14:45:19,455] A new study created in memory with name: no-name-486e3e47-ae38-4421-9a62-ec612fff1ed2


In [56]:
study.optimize(objective, n_trials=100)

[I 2025-10-04 14:45:51,032] Trial 0 finished with value: 0.9466666666666665 and parameters: {'n_estimators': 437, 'max_depth': 48, 'min_samples_split': 24, 'min_samples_leaf': 20}. Best is trial 0 with value: 0.9466666666666665.
[I 2025-10-04 14:45:52,801] Trial 1 finished with value: 0.8933333333333333 and parameters: {'n_estimators': 240, 'max_depth': 16, 'min_samples_split': 3, 'min_samples_leaf': 28}. Best is trial 0 with value: 0.9466666666666665.
[I 2025-10-04 14:45:58,659] Trial 2 finished with value: 0.7666666666666667 and parameters: {'n_estimators': 641, 'max_depth': 39, 'min_samples_split': 2, 'min_samples_leaf': 32}. Best is trial 0 with value: 0.9466666666666665.
[I 2025-10-04 14:46:05,196] Trial 3 finished with value: 0.9533333333333334 and parameters: {'n_estimators': 850, 'max_depth': 18, 'min_samples_split': 7, 'min_samples_leaf': 6}. Best is trial 3 with value: 0.9533333333333334.
[I 2025-10-04 14:46:07,956] Trial 4 finished with value: 0.9466666666666665 and paramete

This may take a while, so just wait and see.

The recommendation is to set n_trials to 100, since there seems to be no significant score increase beyond 100 trials. Running more trials is also inefficient because it means a much longer wait.

In [60]:
best_params = study.best_params
print(best_params)

{'n_estimators': 136, 'max_depth': 34, 'min_samples_split': 23, 'min_samples_leaf': 1}


Below is the result of the hyperparameter tuning with Optuna.

In [61]:
# check the best score from Optuna's hyperparameter tuning
best_score = study.best_value
print(best_score)

0.9666666666666668


# Random Forest Using Optuna

In [63]:
# store the best hyperparameters in new variables
n_estimators = best_params['n_estimators']
max_depth = best_params['max_depth']
min_samples_split = best_params['min_samples_split']
min_samples_leaf = best_params['min_samples_leaf']

In [64]:
best_model = RandomForestClassifier(
    n_estimators=n_estimators,
    max_depth=max_depth,
    min_samples_split=min_samples_split,
    min_samples_leaf=min_samples_leaf,
    random_state=42
)

best_model.fit(X_train, y_train)

In [65]:
y_pred = best_model.predict(X_test)

In [66]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00        12
           2       1.00      1.00      1.00         8

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



There is no score improvement compared with the model before Optuna, because the base model alone already scores well: it reaches 1.000 on every test metric, which leaves no room to improve.
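A test set of 30 rows is small, so a perfect score for both models says little about which one is better. One possible way to compare them more robustly, sketched here under the assumption that repeated stratified cross-validation is acceptable for this purpose, is to score the default model and the tuned model on many folds. The tuned values are the ones printed in `best_params` above. Iris is loaded from `sklearn.datasets` to keep the block self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Assumption: iris is loaded from scikit-learn so the sketch runs offline.
X_demo, y_demo = load_iris(return_X_y=True)

# 5 folds repeated 3 times gives 15 accuracy estimates per model.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)

candidates = {
    "base": RandomForestClassifier(random_state=69),
    "tuned": RandomForestClassifier(
        n_estimators=136,
        max_depth=34,
        min_samples_split=23,
        min_samples_leaf=1,
        random_state=42,
    ),
}

results = {}
for name, model in candidates.items():
    results[name] = cross_val_score(model, X_demo, y_demo, cv=cv, scoring="accuracy")
    print(f"{name}: {results[name].mean():.3f} +/- {results[name].std():.3f}")
```

Since the tuned values were selected using cross-validation on the full dataset, the tuned model's score here is somewhat optimistic. Treat the comparison as a rough check, not a final verdict.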