# **Optuna Hyperparameter Tuning**

Optuna is a hyperparameter optimization framework that automates the search for optimal hyperparameters.

Optuna is framework-agnostic and works with machine learning libraries such as TensorFlow, PyTorch, and scikit-learn.

Optuna allows users to define an objective function that evaluates the performance of a model with given hyperparameters.

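The framework then runs a series of trials. In each trial, a sampler (by default the Tree-structured Parzen Estimator, TPE) proposes new hyperparameter values based on the results of earlier trials, converging toward the best hyperparameters for the model.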

Optuna supports features like pruning, which can stop unpromising trials early, and visualization tools to analyze the optimization process.

Key features of Optuna include:
1. Automatic hyperparameter optimization
2. Pruning of unpromising trials
3. Visualization of optimization results
4. Support for various machine learning frameworks
5. Easy integration with existing codebases
6. Parallel and distributed optimization
7. Support for conditional hyperparameters
8. Flexible and extensible design
9. Support for multi-objective optimization
10. Rich set of samplers and pruners
11. Ability to save and load study results

Optuna is used in various machine learning tasks, including classification, regression, and deep learning.

It is particularly useful for tuning complex models where manual hyperparameter tuning would be inefficient or ineffective.

In [2]:
import seaborn as sns
import pandas as pd

healthexp = sns.load_dataset("healthexp")
print(healthexp)


     Year        Country  Spending_USD  Life_Expectancy
0    1970        Germany       252.311             70.6
1    1970         France       192.143             72.2
2    1970  Great Britain       123.993             71.9
3    1970          Japan       150.437             72.0
4    1970            USA       326.961             70.9
..    ...            ...           ...              ...
269  2020        Germany      6938.983             81.1
270  2020         France      5468.418             82.3
271  2020  Great Britain      5018.700             80.4
272  2020          Japan      4665.641             84.7
273  2020            USA     11859.179             77.0

[274 rows x 4 columns]


In [3]:
# Convert categorical variables to dummy/indicator variables
# This is useful for machine learning models that require numerical input
healthexp = pd.get_dummies(healthexp, dtype='int')
display(healthexp.head())

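|   | Year | Spending_USD | Life_Expectancy | Country_Canada | Country_France | Country_Germany | Country_Great Britain | Country_Japan | Country_USA |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1970 | 252.311 | 70.6 | 0 | 0 | 1 | 0 | 0 | 0 |
| 1 | 1970 | 192.143 | 72.2 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 1970 | 123.993 | 71.9 | 0 | 0 | 0 | 1 | 0 | 0 |
| 3 | 1970 | 150.437 | 72.0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 1970 | 326.961 | 70.9 | 0 | 0 | 0 | 0 | 0 | 1 |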
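To see what `pd.get_dummies` does on its own, here is a sketch on a tiny made-up frame (the values are illustrative, not from `healthexp`). Non-categorical columns are kept first, and each category becomes one 0/1 column named `<column>_<category>`:

```python
import pandas as pd

# Tiny stand-in frame (values made up for illustration)
df = pd.DataFrame({
    "Country": ["Japan", "USA", "Japan"],
    "Spending_USD": [150.4, 327.0, 160.0],
})
encoded = pd.get_dummies(df, dtype="int")
print(encoded.columns.tolist())  # ['Spending_USD', 'Country_Japan', 'Country_USA']
```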


In [4]:
X = healthexp.drop(['Life_Expectancy'], axis=1)
y = healthexp['Life_Expectancy']

In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
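With 274 rows and `test_size=0.2`, scikit-learn rounds the test count up: 0.2 × 274 = 54.8, so 55 rows go to the test set and 219 to training (consistent with the 55 predictions printed later). A quick check on placeholder arrays of the same length:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as healthexp (274)
X = np.arange(274 * 2).reshape(274, 2)
y = np.arange(274)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_test))  # 219 55
```

A 55-row test set is small, so test-set scores in this notebook carry noticeable sampling noise.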

In [6]:
from sklearn.ensemble import RandomForestRegressor

rfr = RandomForestRegressor()  # default hyperparameters; no random_state, so scores vary between runs
rfr.fit(X_train, y_train)

| Parameter | Value |
|---|---|
| n_estimators | 100 |
| criterion | 'squared_error' |
| max_depth | None |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_features | 1.0 |
| max_leaf_nodes | None |
| min_impurity_decrease | 0.0 |
| bootstrap | True |


In [7]:
display(rfr.score(X_test, y_test))  # for regressors, .score() returns R^2

0.991622718445446

In [8]:
Y_pred = rfr.predict(X_test)


In [9]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Outputs, in order: MSE, MAE, R^2
display(mean_squared_error(y_test, Y_pred), mean_absolute_error(y_test, Y_pred), r2_score(y_test, Y_pred))

0.1022747272727173

0.2514545454545284

0.991622718445446
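The three metrics are simple formulas over the residuals, and R^2 is the same quantity `rfr.score` returned above (both print 0.991622718445446). A sketch computing them by hand with NumPy on small made-up arrays that stand in for `y_test` and `Y_pred`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Small made-up arrays standing in for y_test and Y_pred
y_true = np.array([70.0, 72.0, 80.0, 84.0])
y_hat = np.array([71.0, 71.5, 79.0, 84.5])

residuals = y_true - y_hat
mse = np.mean(residuals ** 2)       # mean squared error
mae = np.mean(np.abs(residuals))    # mean absolute error
# R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mse, mae)  # 0.625 0.75
print(np.isclose(r2, r2_score(y_true, y_hat)))  # True
```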

In [10]:
import optuna
from sklearn.model_selection import cross_val_score

In [11]:
def objective(trial):
    # Search space: each suggest_int call samples one hyperparameter for this trial
    n_estimators = trial.suggest_int('n_estimators', 100, 1000)
    max_depth = trial.suggest_int('max_depth', 5, 20)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 10)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 4)

    model = RandomForestRegressor(n_estimators=n_estimators,
                                  max_depth=max_depth,
                                  min_samples_split=min_samples_split,
                                  min_samples_leaf=min_samples_leaf,
                                  random_state=42)

    # 10-fold CV on the training set only; scikit-learn reports MSE negated
    # so that higher is better, which is why the study uses direction='maximize'
    score = cross_val_score(model, X_train, y_train, cv=10, scoring='neg_mean_squared_error')

    return score.mean()

In [12]:
study = optuna.create_study(direction='maximize', sampler=optuna.samplers.RandomSampler())

[I 2025-07-25 12:51:21,883] A new study created in memory with name: no-name-c78eff4c-0622-48be-9c7f-87c465900a11


In [13]:
study.optimize(objective, n_trials=50)

[I 2025-07-25 12:51:27,822] Trial 0 finished with value: -0.28078356114357417 and parameters: {'n_estimators': 421, 'max_depth': 8, 'min_samples_split': 4, 'min_samples_leaf': 2}. Best is trial 0 with value: -0.28078356114357417.
[I 2025-07-25 12:51:39,222] Trial 1 finished with value: -0.31663809240171287 and parameters: {'n_estimators': 817, 'max_depth': 12, 'min_samples_split': 7, 'min_samples_leaf': 2}. Best is trial 0 with value: -0.28078356114357417.
[I 2025-07-25 12:51:53,157] Trial 2 finished with value: -0.22725536105486482 and parameters: {'n_estimators': 787, 'max_depth': 13, 'min_samples_split': 4, 'min_samples_leaf': 1}. Best is trial 2 with value: -0.22725536105486482.
[I 2025-07-25 12:51:58,683] Trial 3 finished with value: -0.48005497708219946 and parameters: {'n_estimators': 447, 'max_depth': 9, 'min_samples_split': 8, 'min_samples_leaf': 4}. Best is trial 2 with value: -0.22725536105486482.
[I 2025-07-25 12:52:01,858] Trial 4 finished with value: -0.2669767159514346 a

In [14]:
study.best_params

{'n_estimators': 429,
 'max_depth': 15,
 'min_samples_split': 2,
 'min_samples_leaf': 1}

In [15]:
best_params = study.best_params


In [16]:
import optuna.visualization  # Plotly-based plots; requires the plotly package

optuna.visualization.plot_optimization_history(study).show()


In [17]:
optuna.visualization.plot_parallel_coordinate(study).show()

In [18]:
optuna.visualization.plot_slice(study, params=['n_estimators', 'max_depth', 'min_samples_split', 'min_samples_leaf']).show()

In [19]:
optuna.visualization.plot_param_importances(study).show()

In [20]:
# Create a new model with the best parameters
best_n_estimators = best_params['n_estimators']
best_max_depth = best_params['max_depth']
best_min_samples_split = best_params['min_samples_split']
best_min_samples_leaf = best_params['min_samples_leaf']

In [21]:
best_model = RandomForestRegressor(    
    n_estimators=best_n_estimators,
    max_depth=best_max_depth,
    min_samples_split=best_min_samples_split,
    min_samples_leaf=best_min_samples_leaf,
    random_state=42
)
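Because `study.best_params` is a plain dictionary whose keys match the estimator's argument names, the two cells above can be collapsed by unpacking it into the constructor. A sketch using the values from the `study.best_params` output above:

```python
from sklearn.ensemble import RandomForestRegressor

# Values copied from the study.best_params output above
best_params = {"n_estimators": 429, "max_depth": 15, "min_samples_split": 2, "min_samples_leaf": 1}

# Unpack the dictionary straight into the constructor
best_model = RandomForestRegressor(**best_params, random_state=42)
print(best_model.get_params()["n_estimators"], best_model.get_params()["max_depth"])  # 429 15
```

This only works while every searched parameter name matches an estimator argument; any extra trial parameters would need to be removed first.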

In [22]:
best_model.fit(X_train, y_train)

| Parameter | Value |
|---|---|
| n_estimators | 429 |
| criterion | 'squared_error' |
| max_depth | 15 |
| min_samples_split | 2 |
| min_samples_leaf | 1 |
| min_weight_fraction_leaf | 0.0 |
| max_features | 1.0 |
| max_leaf_nodes | None |
| min_impurity_decrease | 0.0 |
| bootstrap | True |


In [25]:
y_pred_2 = best_model.predict(X_test)
display(y_pred_2)

array([72.85407925, 81.41491841, 82.34708625, 76.97785548, 81.02913753,
       81.78018648, 81.22470862, 78.31818182, 78.69090909, 81.83892774,
       76.6030303 , 78.74172494, 81.54615385, 80.72517483, 78.58717949,
       80.40722611, 80.21305361, 74.46317016, 76.75034965, 78.71818182,
       73.81118881, 74.9981352 , 79.94009324, 83.54219114, 84.23170163,
       81.10862471, 76.51212121, 73.88158508, 71.31888112, 77.55874126,
       72.7972028 , 75.72424242, 72.57342657, 77.5969697 , 77.83799534,
       75.40815851, 81.00559441, 78.75664336, 71.35571096, 74.97715618,
       76.23962704, 77.72610723, 74.93006993, 74.29487179, 77.64219114,
       81.404662  , 72.43333333, 80.28787879, 78.22703963, 81.91794872,
       75.5025641 , 80.83543124, 78.27972028, 71.66689977, 79.36083916])

In [26]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

print("RandomForestRegressor Score:", best_model.score(X_test, y_test))
print("Mean Squared Error:", mean_squared_error(y_test, y_pred_2))
print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred_2))
print("R^2 Score:", r2_score(y_test, y_pred_2))

RandomForestRegressor Score: 0.9904922673717763
Mean Squared Error: 0.11607593169706533
Mean Absolute Error: 0.2685823267641656
R^2 Score: 0.9904922673717763
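On this split, the tuned model's test MSE (0.116) is slightly higher than the default model's (0.102). The study optimized 10-fold cross-validated error on the training set, not the 55-row test set, and a single small test split is noisy, so this gap alone says little. A fairer comparison scores both configurations on identical cross-validation folds. A sketch, assuming synthetic `make_regression` data as a stand-in for the notebook's training data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for X_train / y_train (assumption; swap in the notebook's data)
X_demo, y_demo = make_regression(n_samples=220, n_features=8, noise=10.0, random_state=0)

# Same shuffled folds for both models, so the comparison is like-for-like
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "default": RandomForestRegressor(random_state=42),
    "tuned": RandomForestRegressor(n_estimators=429, max_depth=15, min_samples_split=2,
                                   min_samples_leaf=1, random_state=42),
}

results = {}
for name, model in models.items():
    # Negate scikit-learn's negative MSE to get positive per-fold errors
    results[name] = -cross_val_score(model, X_demo, y_demo, cv=cv, scoring="neg_mean_squared_error")
    print(f"{name}: mean MSE {results[name].mean():.3f} (std {results[name].std():.3f})")
```

If the per-fold standard deviations are large relative to the gap between the means, the difference between the two configurations is within noise.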
