# **Optuna Hyperparameter Tuning**

Optuna is a hyperparameter optimization framework that automates the search for optimal hyperparameters.

It is framework-agnostic and works with machine learning libraries such as TensorFlow, PyTorch, and scikit-learn.

Optuna allows users to define an objective function that evaluates the performance of a model with given hyperparameters.

The framework then uses a sampler to propose hyperparameters for each trial. The default is the Tree-structured Parzen Estimator (TPE); random search, grid search, and CMA-ES are among the alternatives.

Optuna supports features like pruning, which can stop unpromising trials early, and visualization tools to analyze the optimization process.

Key features of Optuna include:
1. Automatic hyperparameter optimization
2. Pruning of unpromising trials
3. Visualization of optimization results
4. Support for various machine learning frameworks
5. Easy integration with existing codebases
6. Parallel and distributed optimization
7. Support for conditional hyperparameters
8. Flexible and extensible design
9. Support for multi-objective optimization
10. Rich set of samplers and pruners
11. Ability to save and load study results

Optuna is used in various machine learning tasks, including classification, regression, and deep learning.

It is particularly useful for tuning complex models where manual hyperparameter tuning would be inefficient or ineffective.

In [1]:
import seaborn as sns
import pandas as pd

healthexp = sns.load_dataset("healthexp")
print(healthexp)


     Year        Country  Spending_USD  Life_Expectancy
0    1970        Germany       252.311             70.6
1    1970         France       192.143             72.2
2    1970  Great Britain       123.993             71.9
3    1970          Japan       150.437             72.0
4    1970            USA       326.961             70.9
..    ...            ...           ...              ...
269  2020        Germany      6938.983             81.1
270  2020         France      5468.418             82.3
271  2020  Great Britain      5018.700             80.4
272  2020          Japan      4665.641             84.7
273  2020            USA     11859.179             77.0

[274 rows x 4 columns]


In [2]:
# Convert categorical variables to dummy/indicator variables
# This is useful for machine learning models that require numerical input
healthexp = pd.get_dummies(healthexp, dtype='int')
display(healthexp.head())

   Year  Spending_USD  Life_Expectancy  Country_Canada  Country_France  Country_Germany  Country_Great Britain  Country_Japan  Country_USA
0  1970       252.311             70.6               0               0                1                      0              0            0
1  1970       192.143             72.2               0               1                0                      0              0            0
2  1970       123.993             71.9               0               0                0                      1              0            0
3  1970       150.437             72.0               0               0                0                      0              1            0
4  1970       326.961             70.9               0               0                0                      0              0            1


In [3]:
X = healthexp.drop(['Life_Expectancy'], axis=1)
y = healthexp['Life_Expectancy']

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:
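from sklearn.ensemble import RandomForestRegressor

# Baseline: a random forest with default hyperparameters, before any tuning.
# No random_state is set, so the scores below vary slightly from run to run.
rfr = RandomForestRegressor()
rfr.fit(X_train, y_train)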

In [6]:
# For regressors, score() returns the R^2 coefficient on the given data
display(rfr.score(X_test, y_test))

0.9906662794576698

In [7]:
Y_pred = rfr.predict(X_test)


In [8]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Test-set MSE, MAE, and R^2, in that order (R^2 matches rfr.score above)
display(mean_squared_error(y_test, Y_pred), mean_absolute_error(y_test, Y_pred), r2_score(y_test, Y_pred))

0.11395149090908245

0.26687272727271455

0.9906662794576698
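
The three values above are the mean squared error, the mean absolute error, and R². As a sketch of what each one measures, the snippet below recomputes them with NumPy on a small set of made-up values (hypothetical numbers, not taken from the dataset) and checks the results against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up targets and predictions, for illustration only.
y_true = np.array([70.0, 72.0, 75.0, 80.0])
y_hat = np.array([71.0, 72.0, 74.0, 78.0])

errors = y_true - y_hat
mse = np.mean(errors ** 2)                       # mean squared error
mae = np.mean(np.abs(errors))                    # mean absolute error
ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination

# The hand-written formulas agree with the library functions.
assert np.isclose(mse, mean_squared_error(y_true, y_hat))
assert np.isclose(mae, mean_absolute_error(y_true, y_hat))
assert np.isclose(r2, r2_score(y_true, y_hat))

print(mse, mae, r2)
```

MSE and MAE are in the units of the target (squared and unsquared, respectively), while R² is unitless and compares the model against always predicting the mean.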