# Hyperparameter Optimization with Optuna and MLflow

In this notebook, we compare a baseline XGBoost model with a tuned XGBoost model optimized using Optuna. 
The goal is to evaluate whether hyperparameter tuning improves model performance on the California Housing dataset.


In [1]:
pip install plotly



In [2]:
pip install optuna mlflow xgboost pandas scikit-learn



In [3]:
import json
import pandas as pd
import numpy as np
import optuna
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor
import matplotlib.pyplot as plt
from optuna.visualization import plot_optimization_history, plot_param_importances, plot_parallel_coordinate




In [4]:
import os
os.environ["SCIKIT_LEARN_DATA"] = r"C:\Users\DELL\scikit_learn_data"


In [5]:
with open("../outputs/results.json", "r") as f:
    results = json.load(f)

results


{'n_trials': 6,
 'best_value_rmse': 51810.604089555774,
 'best_params': {'n_estimators': 51,
  'max_depth': 6,
  'learning_rate': 0.1003206097319599,
  'subsample': 0.8638441659604221,
  'colsample_bytree': 0.7203777752894938,
  'gamma': 3.0283499395155857,
  'reg_lambda': 0.09023908797044065,
  'reg_alpha': 0.6092039774760741,
  'min_child_weight': 7}}

## Tuned Model (After Optuna Optimization)

We first load the saved Optuna study and inspect its best trial. The tuned model itself is retrained later in the notebook, after the baseline.


In [6]:
study = optuna.load_study(
    study_name="xgb_opt_study",
    storage="sqlite:///../outputs/optuna_study.db"
)
study


<optuna.study.study.Study at 0x24ff9542a50>

In [7]:
best_trial = study.best_trial

print("Best RMSE:", best_trial.value)
print("\nBest Params:\n")
for k, v in best_trial.params.items():
    print(f"{k}: {v}")


Best RMSE: 51810.604089555774

Best Params:

n_estimators: 51
max_depth: 6
learning_rate: 0.1003206097319599
subsample: 0.8638441659604221
colsample_bytree: 0.7203777752894938
gamma: 3.0283499395155857
reg_lambda: 0.09023908797044065
reg_alpha: 0.6092039774760741
min_child_weight: 7


In [8]:
plot_optimization_history(study)


In [9]:
plot_parallel_coordinate(study)


In [10]:
from sklearn.datasets import fetch_california_housing

data = fetch_california_housing(as_frame=True)

X = data.data
y = data.target

X.shape, X.columns


((20640, 8),
 Index(['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup',
        'Latitude', 'Longitude'],
       dtype='object'))

We split the data into 80% training and 20% testing.


In [12]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.2,
    random_state=42
)


## Baseline Model

We train an XGBoost regressor with fixed, hand-picked hyperparameters (`n_estimators=200`, `max_depth=6`, `learning_rate=0.1`) and no tuning.
This serves as a reference point to compare against the tuned model.


In [14]:
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error
from math import sqrt

baseline = XGBRegressor(
    random_state=42,
    n_estimators=200,
    max_depth=6,
    learning_rate=0.1,
    tree_method="hist",
    n_jobs=-1
)

baseline.fit(X_train, y_train)

baseline_rmse = sqrt(
    mean_squared_error(
        y_test,
        baseline.predict(X_test)
    )
)

baseline_rmse


0.4638700072964944

**Baseline RMSE = 0.4639**


A lower RMSE means better performance.
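As a small illustration of what the metric measures, RMSE can be computed by hand as the square root of the mean squared difference between targets and predictions. The helper name `rmse` and the four demo values below are made up for this sketch. For scale, scikit-learn documents the California Housing target as median house value in units of $100,000, so an RMSE of about 0.46 corresponds to roughly $46,000.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Made-up values: errors are -0.5, 0.5, 1.0, 0.0, so the mean squared error is 0.375.
y_true_demo = [2.0, 1.5, 3.0, 0.8]
y_pred_demo = [2.5, 1.0, 2.0, 0.8]
print(rmse(y_true_demo, y_pred_demo))  # sqrt(0.375)
```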


In [17]:
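# Hyperparameters used for the tuned model.
# Note: these values differ from study.best_trial.params printed earlier
# (for example, n_estimators=51 and max_depth=6 there).
best_params = {
    'n_estimators': 181,
    'max_depth': 10,
    'learning_rate': 0.09432201915095745,
    'subsample': 0.9286461251144378,
    'colsample_bytree': 0.561747070049646,
    'gamma': 3.477298000861585,
    'reg_lambda': 3.383229878581516,
    'reg_alpha': 0.06581290291633338,
    'min_child_weight': 9
}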


In [18]:
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

tuned_model = XGBRegressor(
    random_state=42,
    tree_method="hist",
    n_jobs=-1,
    **best_params
)

tuned_model.fit(X_train, y_train)


XGBRegressor (fitted estimator; parameter table from the notebook's HTML display)
objective: 'reg:squarederror'
colsample_bytree: 0.561747070049646
enable_categorical: False
(base_score, booster, callbacks, colsample_bylevel, colsample_bynode, device, early_stopping_rounds: no value shown; the table ends here in the export)


In [21]:
from sklearn.metrics import mean_squared_error
import numpy as np

mse = mean_squared_error(
    y_test,
    tuned_model.predict(X_test)
)

tuned_rmse = np.sqrt(mse)

tuned_rmse


np.float64(0.4894778226946998)

### Model Performance Comparison

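We evaluated two XGBoost regression models on the same 20% test split of the California Housing dataset to measure how hyperparameter tuning affects model performance. Root Mean Squared Error (RMSE) is the evaluation metric; lower values indicate better performance.

| Model | Test RMSE |
|------|------|
| Baseline XGBoost | 0.4639 |
| Tuned XGBoost (Optuna) | 0.4895 |

In this run the tuned model did not improve on the baseline: its test RMSE is higher (0.4895 vs. 0.4639). Two caveats apply. The hyperparameters used for the tuned model differ from the best trial stored in the loaded study, and the study's best RMSE (about 51,810) is on a different scale from the values in this table, which suggests its objective was evaluated on a differently scaled target. The study's score is therefore not directly comparable with these results.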
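To make the size of the gap concrete, the two RMSE values printed by the notebook can be compared directly. This is a minimal sketch using the numbers copied from the cell outputs; the variable names are chosen here for illustration.

```python
# RMSE values copied from the notebook outputs.
baseline_rmse_reported = 0.4638700072964944
tuned_rmse_reported = 0.4894778226946998

# Positive differences mean the tuned model is worse than the baseline.
abs_diff = tuned_rmse_reported - baseline_rmse_reported
rel_diff_pct = 100.0 * abs_diff / baseline_rmse_reported

print(f"Absolute difference: {abs_diff:+.4f}")
print(f"Relative difference: {rel_diff_pct:+.1f}%")
```

The tuned model's error is about 0.0256 higher, or roughly 5.5% worse than the baseline on this test split.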


## Final Summary

In this project, I built an end-to-end machine learning workflow using:

- XGBoost for regression
- Optuna for hyperparameter optimization
- MLflow for experiment tracking
- California Housing dataset as the data source