# Material Overview

Briefly explain, in your own words, the differences between grid search, randomized search, Bayesian search CV, and Optuna.

Grid: Tries every combination of hyperparameters from a predefined grid.

Randomized: Samples random combinations from the specified hyperparameter distributions.

Bayesian search CV: Uses a probabilistic model to predict which regions of the search space are most promising, then explores those regions.

source: https://www.youtube.com/watch?v=t-INgABWULw
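The difference between grid and randomized search can be sketched with scikit-learn's `GridSearchCV` and `RandomizedSearchCV`. This is a minimal illustration, not part of the original notebook: the small grid, the `randint` ranges, and loading iris through `sklearn.datasets.load_iris` are assumptions chosen to keep it fast and self-contained.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Assumption: iris is loaded from scikit-learn so the sketch needs no download.
X_demo, y_demo = load_iris(return_X_y=True)
demo_model = RandomForestClassifier(random_state=42)

# Grid search: evaluates every combination in the grid (2 x 3 = 6 candidates).
grid = GridSearchCV(
    demo_model,
    param_grid={"n_estimators": [10, 30], "max_depth": [2, 4, 8]},
    cv=3,
)
grid.fit(X_demo, y_demo)

# Randomized search: draws exactly n_iter candidates from the distributions.
rand = RandomizedSearchCV(
    demo_model,
    param_distributions={
        "n_estimators": randint(10, 50),
        "max_depth": randint(2, 10),
    },
    n_iter=4,
    cv=3,
    random_state=42,
)
rand.fit(X_demo, y_demo)

n_grid_candidates = len(grid.cv_results_["params"])
n_rand_candidates = len(rand.cv_results_["params"])
print(n_grid_candidates, n_rand_candidates)
```

The grid's cost grows with the product of the list lengths, while the randomized search's cost is fixed by `n_iter` regardless of how wide the distributions are.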

# Import Data & Libraries

In [27]:
# run this only once
!pip install optuna -q

In [28]:
# import the required libraries here
import seaborn as sns
import pandas as pd
import numpy as np

In [29]:
df = sns.load_dataset('iris')

print(df.head())
print(df.tail())

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
     sepal_length  sepal_width  petal_length  petal_width    species
145           6.7          3.0           5.2          2.3  virginica
146           6.3          2.5           5.0          1.9  virginica
147           6.5          3.0           5.2          2.0  virginica
148           6.2          3.4           5.4          2.3  virginica
149           5.9          3.0           5.1          1.8  virginica


In [30]:
new_df = df.copy()

In [31]:
df.dtypes

sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
species          object


# Data Preprocessing

In [32]:
# convert the categorical variable to numeric
spe_map = {"setosa": 0, "versicolor": 1, "virginica": 2}
df["species"] = df["species"].str.lower().map(spe_map)
df

     sepal_length  sepal_width  petal_length  petal_width  species
0             5.1          3.5           1.4          0.2        0
1             4.9          3.0           1.4          0.2        0
2             4.7          3.2           1.3          0.2        0
3             4.6          3.1           1.5          0.2        0
4             5.0          3.6           1.4          0.2        0
..            ...          ...           ...          ...      ...
145           6.7          3.0           5.2          2.3        2
146           6.3          2.5           5.0          1.9        2
147           6.5          3.0           5.2          2.0        2
148           6.2          3.4           5.4          2.3        2


In [33]:
df.isna().sum()

sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
species         0


In [35]:
# separate the features (X) from the target variable (y)
X = df.drop(['species'], axis=1)
y = df['species']

# Dataset Splitting

In [36]:
# split with an 80:20 ratio
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, train_size=0.8, random_state=42)

train, test = train_test_split(df, test_size = 0.2, train_size=0.8, random_state=42)
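The split above is not stratified, so the class counts in the test set can be uneven. A minimal sketch of a stratified variant follows. It is not what the notebook uses: the `stratify=y` argument, the `_s` variable names, and loading iris via `sklearn.datasets.load_iris` are assumptions made so the example runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: iris comes from scikit-learn here (targets are already 0, 1, 2).
X_s, y_s = load_iris(return_X_y=True)

# stratify=y keeps the class proportions identical in both partitions.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_s, y_s, test_size=0.2, random_state=42, stratify=y_s
)

# Count how many samples of each class landed in the test set.
test_counts = np.bincount(y_test_s)
print(test_counts)
```

With 50 samples per class and a 20% test size, each class contributes the same number of test samples, which makes per-class metrics easier to compare.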

In [37]:
train.head()

    sepal_length  sepal_width  petal_length  petal_width  species
22           4.6          3.6           1.0          0.2        0
15           5.7          4.4           1.5          0.4        0
65           6.7          3.1           4.4          1.4        1
11           4.8          3.4           1.6          0.2        0
42           4.4          3.2           1.3          0.2        0


In [38]:
test.head()

     sepal_length  sepal_width  petal_length  petal_width  species
73            6.1          2.8           4.7          1.2        1
18            5.7          3.8           1.7          0.3        0
118           7.7          2.6           6.9          2.3        2
78            6.0          2.9           4.5          1.5        1
76            6.8          2.8           4.8          1.4        1


# Base Model Random Forest

In [39]:
# use a random forest classifier
from sklearn.ensemble import RandomForestClassifier


rfr = RandomForestClassifier(random_state=42)
rfr.fit(X_train, y_train)

In [40]:
y_pred = rfr.predict(X_test)

In [41]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report

print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



# Optuna

In [42]:
import optuna

In [43]:
from sklearn.model_selection import cross_val_score

In [52]:
def objective(trial):
    # search space: each call samples one integer hyperparameter for this trial
    n_estimators = trial.suggest_int('n_estimators', 100, 1000)
    max_depth = trial.suggest_int('max_depth', 10, 100)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 32)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 32)

    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth,
                                   min_samples_split=min_samples_split,
                                   min_samples_leaf=min_samples_leaf,
                                   random_state=42)

    # the value to maximize: mean 5-fold cross-validated accuracy on the training set
    score = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    return score.mean()

The hyperparameters can be adapted to the algorithm in use. Here we use Random Forest, so the hyperparameters we tune are *n_estimators*, *max_depth*, *min_samples_split*, and *min_samples_leaf*.

In [51]:
study = optuna.create_study(direction='maximize', sampler=optuna.samplers.RandomSampler())

[I 2025-10-03 16:11:36,459] A new study created in memory with name: no-name-4b331d36-6798-485c-b567-e7b7ed681d6a


In [53]:
study.optimize(objective, n_trials=100)

[I 2025-10-03 16:12:35,009] Trial 0 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 751, 'max_depth': 32, 'min_samples_split': 25, 'min_samples_leaf': 16}. Best is trial 0 with value: 0.9416666666666667.
[I 2025-10-03 16:12:39,963] Trial 1 finished with value: 0.7 and parameters: {'n_estimators': 563, 'max_depth': 78, 'min_samples_split': 5, 'min_samples_leaf': 30}. Best is trial 0 with value: 0.9416666666666667.
[I 2025-10-03 16:12:42,405] Trial 2 finished with value: 0.8166666666666668 and parameters: {'n_estimators': 189, 'max_depth': 80, 'min_samples_split': 4, 'min_samples_leaf': 24}. Best is trial 0 with value: 0.9416666666666667.
[I 2025-10-03 16:12:47,976] Trial 3 finished with value: 0.9416666666666667 and parameters: {'n_estimators': 709, 'max_depth': 74, 'min_samples_split': 16, 'min_samples_leaf': 12}. Best is trial 0 with value: 0.9416666666666667.
[I 2025-10-03 16:12:49,461] Trial 4 finished with value: 0.7 and parameters: {'n_estimators': 188, 'm

This may take a while, so be patient.
<br>
The recommended setting is `n_trials=100`, because the score reportedly shows no significant improvement beyond 100 trials, and running more trials also means a considerably longer wait.
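Whether the score has stopped improving can be checked by tracking the running best value across trials. The sketch below uses only the five trial values visible in the log above. With a full study, the list could instead be built from `[t.value for t in study.trials]`. The variable names are assumptions.

```python
import numpy as np

# Values of trials 0-4, copied from the log output above.
trial_values = np.array([
    0.9416666666666667,
    0.7,
    0.8166666666666668,
    0.9416666666666667,
    0.7,
])

# Best score seen so far after each trial.
running_best = np.maximum.accumulate(trial_values)

# Trials at which the running best strictly increased.
improved_at = np.flatnonzero(np.diff(running_best, prepend=-np.inf) > 0)
last_improvement = int(improved_at[-1])

print(running_best)
print("last improvement at trial", last_improvement)
```

If the last improvement happens long before the final trial, additional trials are unlikely to pay off for this search space.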

In [54]:
study.best_params

{'n_estimators': 184,
 'max_depth': 22,
 'min_samples_split': 10,
 'min_samples_leaf': 1}

The hyperparameter tuning results from Optuna are retrieved below.

In [58]:
# retrieve the hyperparameter tuning results from Optuna
best_params = study.best_params

# Random Forest Using Optuna

In [59]:
# store the best tuned hyperparameters in new variables
best_n_estimators = best_params['n_estimators']
best_max_depth = best_params['max_depth']
best_min_samples_split = best_params['min_samples_split']
best_min_samples_leaf = best_params['min_samples_leaf']

In [60]:
best_model = RandomForestClassifier(n_estimators=best_n_estimators,
                                    max_depth=best_max_depth,
                                    min_samples_split=best_min_samples_split,
                                    min_samples_leaf=best_min_samples_leaf,
                                    random_state=42)  # same seed as in the objective, for reproducibility

best_model.fit(X_train, y_train)
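Because the keys of `study.best_params` match the constructor argument names, the same model can also be built by unpacking the dictionary. A minimal sketch, with the dictionary copied from the output above and `compact_model` as a hypothetical name:

```python
from sklearn.ensemble import RandomForestClassifier

# Best parameters as reported by study.best_params above.
best_params_demo = {
    "n_estimators": 184,
    "max_depth": 22,
    "min_samples_split": 10,
    "min_samples_leaf": 1,
}

# ** unpacks the dictionary into keyword arguments.
compact_model = RandomForestClassifier(**best_params_demo, random_state=42)
print(compact_model.get_params()["n_estimators"])
```

This avoids one intermediate variable per hyperparameter and keeps working if the search space in the objective changes.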

In [61]:
y_pred = best_model.predict(X_test)

In [62]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 1.000
Precision: 1.000
Recall: 1.000
F1 Score: 1.000
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30



The scores do not improve compared with the results before using Optuna, because the base model alone already achieves a perfect score (1.000) on the test set.
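Since a 30-sample test set saturates at 1.000 for both models, a cross-validated comparison may separate them better. The sketch below is one possible way to do that and is not part of the original analysis: the 5-fold `StratifiedKFold`, loading iris from scikit-learn, and the variable names are assumptions. The tuned parameters are copied from the Optuna output above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Assumption: iris is loaded from scikit-learn to keep the sketch standalone.
X_cv, y_cv = load_iris(return_X_y=True)

base_rf = RandomForestClassifier(random_state=42)
tuned_rf = RandomForestClassifier(
    n_estimators=184,
    max_depth=22,
    min_samples_split=10,
    min_samples_leaf=1,
    random_state=42,
)

# The same folds are used for both models so the scores are comparable.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
base_scores = cross_val_score(base_rf, X_cv, y_cv, cv=folds, scoring="accuracy")
tuned_scores = cross_val_score(tuned_rf, X_cv, y_cv, cv=folds, scoring="accuracy")

print(f"Base : {base_scores.mean():.3f} +/- {base_scores.std():.3f}")
print(f"Tuned: {tuned_scores.mean():.3f} +/- {tuned_scores.std():.3f}")
```

On a dataset as small and well separated as iris, any remaining difference is likely to be within the fold-to-fold variation.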