# 🧪 Optuna + Small MLPs: What to Tweak & How It Works (High-Level)

## What this notebook does
- Splits **California Housing** into **80/10/10** (train/val/test).
- Standardizes features with **train-only** stats.
- Uses **Optuna** to **minimize validation MAE** by tuning a *small* `MLPRegressor` (≤20 units/layer) **with early stopping**.
- Retrains the best configuration on **train+val**, evaluates on **test**, and saves useful artifacts (metrics, plots, CSV logs, pickled model/scaler).

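Artifacts land in `./optuna_artifacts/`, relative to the notebook's working directory (`Module2/optuna_artifacts/` when the notebook runs from `Module2`). The folder is deleted and recreated on every run:
- `final_metrics.csv` (R², MAE, MAPE for train/val/test; the final model is fit on train+val, so only the test row is a held-out estimate)
- `scatter_train.png`, `scatter_test.png`
- `trials_full.csv` (one row per Optuna trial with params + val MAE)
- `best_track.csv` (MAE per trial + best-so-far curve)
- `mae_over_trials.png` (validation MAE across trials, with the best-so-far curve)
- `best_params.json`, `final_model.pkl`, `scaler_final.pkl`, `run_info.json` (timestamp tag for the run)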

---

## 🎛️ Knobs students can play with
**Search budget**
- `n_trials` in `study.optimize(...)` – more trials → better chances of finding good configs (but more time).
- Or use time-boxing: `study.optimize(objective, timeout=180)`.

**Model size & shape**
- `n_layers` (1–3)  
- `n_units_l{i}` (4–20, step 4) – cap keeps it fast and forces compact networks.

**Optimization & regularization**
- `activation`: `"relu"` vs `"tanh"`
- `alpha` (L2 weight decay): `1e-6` to `1e-2` (log scale)
- `learning_rate_init`: `1e-4` to `1e-2` (log scale)
- `batch_size`: `{64, 128, 256}`

**Training dynamics**
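- `early_stopping=True` with `validation_fraction=0.1` and `n_iter_no_change=20` (25 for the final fit)  
  (This holds out 10% of the **training** data internally to decide when to stop; Optuna still scores each trial on the separate **external** validation set, so trials are compared on data the model never trained or stopped on.)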
- `max_iter` (e.g., 1000 while tuning, 2000 for final fit)
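
The patience idea behind `n_iter_no_change` can be sketched in a few lines of plain Python. This is a simplified illustration, not scikit-learn's exact rule: `epochs_until_stop` is a hypothetical helper, and it assumes a per-epoch validation score where higher is better.

```python
def epochs_until_stop(val_scores, n_iter_no_change=20, tol=1e-4):
    """Return the 1-based epoch at which patience-based early stopping halts.

    val_scores: per-epoch validation scores where HIGHER is better.
    """
    best = float("-inf")
    stale = 0  # consecutive epochs without an improvement larger than tol
    for epoch, score in enumerate(val_scores, start=1):
        if score > best + tol:
            best = score
            stale = 0
        else:
            stale += 1
        if stale >= n_iter_no_change:
            return epoch  # patience exhausted
    return len(val_scores)  # ran out of epochs (the max_iter case) first


# Five improving epochs, then a plateau; with a patience of 3 we stop at epoch 8.
scores = [0.1, 0.2, 0.3, 0.4, 0.5] + [0.5] * 10
print(epochs_until_stop(scores, n_iter_no_change=3))  # -> 8
```

A larger `n_iter_no_change` tolerates longer plateaus before stopping, which costs time and gives the network more room to fit the training data.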

**Objective**
- We optimize **validation MAE**. Try swapping to `MAPE` or maximizing `R²` (minimize `-R2`) to see trade-offs.
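
As a rough sketch of what swapping the objective involves, the three metrics can be written in NumPy so that each returns a value where lower is better, which lets the study keep `direction="minimize"`. The function names and the `OBJECTIVES` dictionary are illustrative assumptions; the notebook itself uses the `sklearn.metrics` functions, and scikit-learn's MAPE additionally guards against division by zero.

```python
import numpy as np


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    # Simplified: assumes y_true contains no zeros.
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Every entry is "lower is better"; R² is negated because higher R² is better.
OBJECTIVES = {
    "mae": mae,
    "mape": mape,
    "neg_r2": lambda y_true, y_pred: -r2(y_true, y_pred),
}

y_true = np.array([1.0, 2.0, 4.0])
y_pred = np.array([1.5, 2.0, 3.0])
for name, fn in OBJECTIVES.items():
    print(name, round(fn(y_true, y_pred), 4))
```

Inside `objective`, the return line would then call the chosen function on `y_val` and `y_val_pred`. MAPE weights errors on cheap houses more heavily than MAE does, so the best configuration may change.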

**Reproducibility**
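- `RANDOM_STATE` seeds the data splits, the MLP weight initialization, and the TPE sampler, so a rerun with the same value should reproduce the same trials on the same software versions.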

---

## 🔍 How to read the outputs
- **`mae_over_trials.png`**: Markers show each trial's validation MAE; the thicker line is the *best-so-far*. If that line is still dropping at the end of the run, more trials may help.
- **`trials_full.csv`**: Explore which hyperparams correlate with lower MAE (e.g., smaller `learning_rate_init` + certain `alpha`).
- **`final_metrics.csv`**: Check for overfitting. If train MAE ≪ test MAE, consider more regularization (`alpha` ↑), smaller nets, or less patience with early stopping (a smaller `n_iter_no_change`).
- **Scatter plots**: Points tightly along `y=x` indicate more accurate predictions; look for heteroscedasticity (fan shapes).
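
A small sketch of two of these checks in NumPy. The `val_mae` values are made up for illustration, while the train and test MAE are the rounded figures printed by the run recorded at the end of this notebook.

```python
import numpy as np

# Illustrative (made-up) validation MAE for five trials.
val_mae = np.array([0.40, 0.38, 0.39, 0.36, 0.37])

# Best-so-far curve: running minimum over trials.
best_so_far = np.minimum.accumulate(val_mae)

# How many trials have passed since the best value was first reached?
first_best = int(np.argmax(best_so_far == best_so_far[-1]))
trials_since_improvement = len(val_mae) - 1 - first_best

# Overfitting check using the final metrics from this notebook's run.
train_mae, test_mae = 0.3516, 0.3617
rel_gap = (test_mae - train_mae) / train_mae

print(best_so_far.tolist())
print(trials_since_improvement)
print(f"relative train/test MAE gap: {rel_gap:.1%}")
```

A gap of a few percent, as in this run, suggests the small network is not overfitting much. A long run of trials without improvement suggests the search has plateaued for the current ranges.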

---

## 🤖 What Optuna does (high level)
Optuna is an **automatic hyperparameter optimization** library. It frames tuning as:
- A **study** (the whole experiment) composed of
- multiple **trials** (one set of hyperparameters → train → score),
- where a **sampler** proposes new hyperparameters each trial.

This notebook uses **TPE (Tree-structured Parzen Estimator)**, Optuna's default sampler, created explicitly so it can be seeded:
1. After a short random warm-up (the first 10 trials with default settings), it splits past trials into a **good** group (lowest MAE) and a **not-so-good** group, and models a probability density over the hyperparameter space for each.
2. It then draws candidates from the *good* density and proposes the one where the ratio *p(good)/p(bad)* is highest, balancing **exploration** (try new areas) and **exploitation** (refine promising zones).
3. Over trials, this Bayesian-inspired process **learns where better configs live**, and it often beats grid or random search for the same budget.
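
To make the three steps above concrete, here is a deliberately simplified one-dimensional sketch in NumPy. It is not Optuna's implementation: `toy_tpe`, `kde_logpdf`, the fixed `bandwidth`, and the `gamma` split fraction are illustrative assumptions, and the real sampler handles multiple dimensions, categorical parameters, and priors differently.

```python
import numpy as np


def kde_logpdf(x, centers, bandwidth):
    """Log-density of an equal-weight Gaussian mixture centred on `centers`."""
    z = (x[:, None] - centers[None, :]) / bandwidth
    log_kernel = -0.5 * z**2 - np.log(bandwidth * np.sqrt(2.0 * np.pi))
    peak = log_kernel.max(axis=1, keepdims=True)  # for a stable log-mean-exp
    return (peak + np.log(np.exp(log_kernel - peak).mean(axis=1, keepdims=True))).ravel()


def toy_tpe(objective, low, high, n_startup=10, n_trials=40, gamma=0.25,
            n_candidates=24, bandwidth=0.5, seed=0):
    rng = np.random.default_rng(seed)

    # Warm-up: purely random trials.
    xs = list(rng.uniform(low, high, size=n_startup))
    ys = [objective(x) for x in xs]

    for _ in range(n_trials - n_startup):
        # Step 1: split past trials into "good" (lowest loss) and "bad".
        order = np.argsort(ys)
        n_good = max(1, int(np.ceil(gamma * len(xs))))
        good = np.array(xs)[order[:n_good]]
        bad = np.array(xs)[order[n_good:]]

        # Step 2: draw candidates from the good density, keep the one with
        # the highest log p(good) - log p(bad).
        cand = rng.choice(good, size=n_candidates) + rng.normal(0.0, bandwidth, n_candidates)
        cand = np.clip(cand, low, high)
        score = kde_logpdf(cand, good, bandwidth) - kde_logpdf(cand, bad, bandwidth)
        x_next = float(cand[np.argmax(score)])

        # Step 3: evaluate and add to the history so the densities update.
        xs.append(x_next)
        ys.append(objective(x_next))

    best = int(np.argmin(ys))
    return float(xs[best]), float(ys[best]), ys


best_x, best_y, history = toy_tpe(lambda x: (x - 2.0) ** 2, low=-5.0, high=5.0)
print(round(best_x, 3), round(best_y, 5))
```

The toy loss has its minimum at `x = 2`, so later proposals should cluster near that value as the good group tightens around it.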

**Key terms**
- **Study**: the whole optimization run.
- **Trial**: one attempt (train + evaluate with a specific param set).
- **Params**: the hyperparameters suggested for a trial.
- **Value**: the objective score returned (here: validation **MAE**).
- **Sampler**: strategy for proposing params (TPE is the default workhorse).

---

## 🧩 Extensions (great mini-experiments)
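- **Pruning**: stop bad trials early to save time (pruners live in `optuna.pruners`, e.g. `MedianPruner`; the objective must report intermediate values with `trial.report(...)` and check `trial.should_prune()`).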
- **Cross-validation objective**: average MAE across folds instead of a single val split.
- **Categorical search tweaks**: add/exclude activations, change unit ranges, try `batch_size=32` to see how smaller, noisier gradient updates affect the result.
- **Logging & dashboards**: `optuna-dashboard` for interactive trial exploration.

Have fun! Try small changes, run ~10–25 trials, and compare the artifacts you generate between runs. This is exactly how we iterate in real ML workflows. 🚀


In [1]:
# ONE-CELL Optuna workflow for small MLPRegressor (≤20 units/layer, early stopping)
# Artifacts saved to ./optuna_artifacts, relative to the working directory (folder is overwritten each run)

# --- Imports & setup ---
import os, shutil, json, pickle, datetime as dt, warnings
warnings.filterwarnings("ignore")

# If Optuna isn't installed in your env, uncomment:
# %pip install optuna

import optuna
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score, mean_absolute_error, mean_absolute_percentage_error

RANDOM_STATE = 42
RESULTS_DIR = "./optuna_artifacts"

# --- Recreate results directory ---
if os.path.exists(RESULTS_DIR):
    shutil.rmtree(RESULTS_DIR)
os.makedirs(RESULTS_DIR, exist_ok=True)
with open(os.path.join(RESULTS_DIR, "run_info.json"), "w") as f:
    json.dump({"run_tag": dt.datetime.now().strftime("%Y%m%d_%H%M%S")}, f, indent=2)

# --- Load & split data (80/10/10) ---
data = fetch_california_housing(as_frame=True)
X = data.frame.drop(columns=["MedHouseVal"])
y = data.frame["MedHouseVal"]

X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_STATE
)
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, random_state=RANDOM_STATE
)

# --- Scale with train-only stats ---
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s   = scaler.transform(X_val)
X_test_s  = scaler.transform(X_test)

# --- Optuna objective (minimize external VAL MAE); small nets + early stopping ---
def objective(trial):
    n_layers = trial.suggest_int("n_layers", 1, 3)
    hidden_sizes = tuple(trial.suggest_int(f"n_units_l{i+1}", 4, 20, step=4) for i in range(n_layers))
    params = {
        "hidden_layer_sizes": hidden_sizes,
        "activation": trial.suggest_categorical("activation", ["relu", "tanh"]),
        "solver": "adam",
        "alpha": trial.suggest_float("alpha", 1e-6, 1e-2, log=True),
        "learning_rate_init": trial.suggest_float("learning_rate_init", 1e-4, 1e-2, log=True),
        "batch_size": trial.suggest_categorical("batch_size", [64, 128, 256]),
        "max_iter": 1000,
        "early_stopping": True,      # internal split on TRAIN only
        "validation_fraction": 0.1,
        "n_iter_no_change": 20,
        "random_state": RANDOM_STATE,
        "shuffle": True,
    }
    model = MLPRegressor(**params).fit(X_train_s, y_train)
    y_val_pred = model.predict(X_val_s)
    return mean_absolute_error(y_val, y_val_pred)

# --- Callback to log CSVs/plot after every trial ---
def log_progress_callback(study, trial):
    df = study.trials_dataframe(attrs=("number","value","state","datetime_start","datetime_complete"))
    params_df = pd.DataFrame([t.params for t in study.trials])
    if len(params_df):
        df = pd.concat([df.reset_index(drop=True), params_df.reset_index(drop=True)], axis=1)
    df.to_csv(os.path.join(RESULTS_DIR, "trials_full.csv"), index=False)

    vals = df["value"].astype(float).values
    best_so_far = np.minimum.accumulate(vals)
    pd.DataFrame({"trial": np.arange(len(vals)), "val_mae": vals, "best_mae": best_so_far}) \
      .to_csv(os.path.join(RESULTS_DIR, "best_track.csv"), index=False)

    plt.figure(figsize=(6.5,4))
    plt.plot(np.arange(len(vals)), vals, marker="o", linewidth=1)
    plt.plot(np.arange(len(vals)), best_so_far, linewidth=2)
    plt.xlabel("Trial")
    plt.ylabel("Validation MAE")
    plt.title("Optuna: MAE per trial (line = best so far)")
    plt.tight_layout()
    plt.savefig(os.path.join(RESULTS_DIR, "mae_over_trials.png"), dpi=150, bbox_inches="tight")
    plt.close()

# --- Run study (calm search) ---
sampler = optuna.samplers.TPESampler(seed=RANDOM_STATE)
study = optuna.create_study(direction="minimize", sampler=sampler)
study.optimize(objective, n_trials=25, callbacks=[log_progress_callback], show_progress_bar=True)

# Save best params
with open(os.path.join(RESULTS_DIR, "best_params.json"), "w") as f:
    json.dump(study.best_params, f, indent=2)

# --- Retrain best model on TRAIN+VAL; evaluate on TEST; save metrics/plots/model ---
best = study.best_params
hidden_sizes = tuple(best[f"n_units_l{i+1}"] for i in range(best["n_layers"]))
final_params = {
    "hidden_layer_sizes": hidden_sizes,
    "activation": best["activation"],
    "solver": "adam",
    "alpha": best["alpha"],
    "learning_rate_init": best["learning_rate_init"],
    "batch_size": best["batch_size"],
    "max_iter": 2000,
    "early_stopping": True,
    "validation_fraction": 0.1,
    "n_iter_no_change": 25,
    "random_state": RANDOM_STATE,
    "shuffle": True,
}

# Refit scaler on TRAIN+VAL for the final model
# (pd.concat keeps the column names, so fit and transform see the same feature names)
X_trval = pd.concat([X_train, X_val])
scaler_final = StandardScaler().fit(X_trval)
X_trval_s = scaler_final.transform(X_trval)
y_trval   = np.concatenate([y_train.values, y_val.values])
X_train_sf = scaler_final.transform(X_train)
X_val_sf   = scaler_final.transform(X_val)
X_test_sf  = scaler_final.transform(X_test)

final_model = MLPRegressor(**final_params).fit(X_trval_s, y_trval)

def metrics(y_true, y_pred):
    return {
        "R2": r2_score(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
        "MAPE": mean_absolute_percentage_error(y_true, y_pred),
    }

# Evaluate and save metrics
# (train/val rows are in-sample: the final model was fit on TRAIN+VAL; only test is held out)
rows = []
for name, (Xsplit, ytrue) in {
    "train": (X_train_sf, y_train.values),
    "val":   (X_val_sf,   y_val.values),
    "test":  (X_test_sf,  y_test.values),
}.items():
    ypred = final_model.predict(Xsplit)
    rows.append({"split": name, **metrics(ytrue, ypred)})
metrics_df = pd.DataFrame(rows).round(4)
metrics_df.to_csv(os.path.join(RESULTS_DIR, "final_metrics.csv"), index=False)
print("Final metrics:\n", metrics_df.to_string(index=False))

# Scatterplot helper
def scatter_with_reference(y_true, y_pred, title, outpath):
    plt.figure(figsize=(6,6))
    plt.scatter(y_true, y_pred, alpha=0.3, s=10)
    lo = min(np.min(y_true), np.min(y_pred))
    hi = max(np.max(y_true), np.max(y_pred))
    plt.plot([lo, hi], [lo, hi], linewidth=1)
    plt.xlabel("Actual MedHouseVal")
    plt.ylabel("Predicted MedHouseVal")
    plt.title(title)
    plt.tight_layout()
    plt.savefig(outpath, dpi=150, bbox_inches="tight")
    plt.close()

# Save train & test scatterplots
scatter_with_reference(y_train.values, final_model.predict(X_train_sf),
                       "Predicted vs Actual — Train", os.path.join(RESULTS_DIR, "scatter_train.png"))
scatter_with_reference(y_test.values, final_model.predict(X_test_sf),
                       "Predicted vs Actual — Test",  os.path.join(RESULTS_DIR, "scatter_test.png"))

# Persist model & scaler
with open(os.path.join(RESULTS_DIR, "final_model.pkl"), "wb") as f:
    pickle.dump(final_model, f)
with open(os.path.join(RESULTS_DIR, "scaler_final.pkl"), "wb") as f:
    pickle.dump(scaler_final, f)

print(f"\nArtifacts saved to: {os.path.abspath(RESULTS_DIR)}")


[I 2025-10-08 14:53:52,315] A new study created in memory with name: no-name-2a7ab0d7-f305-4edf-a1a3-5f345cc5c691
  0%|          | 0/25 [00:11<?, ?it/s]

[I 2025-10-08 14:54:04,146] Trial 0 finished with value: 0.36647443102993993 and parameters: {'n_layers': 2, 'n_units_l1': 20, 'n_units_l2': 16, 'activation': 'relu', 'alpha': 4.207053950287936e-06, 'learning_rate_init': 0.00013066739238053285, 'batch_size': 64}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 0. Best value: 0.366474:   8%|▊         | 2/25 [00:23<04:24, 11.49s/it]

[I 2025-10-08 14:54:15,392] Trial 1 finished with value: 0.39011007294976646 and parameters: {'n_layers': 1, 'n_units_l1': 20, 'activation': 'relu', 'alpha': 5.337032762603957e-06, 'learning_rate_init': 0.00023270677083837802, 'batch_size': 128}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 0. Best value: 0.366474:  12%|█▏        | 3/25 [00:30<03:28,  9.48s/it]

[I 2025-10-08 14:54:22,446] Trial 2 finished with value: 0.392796156175982 and parameters: {'n_layers': 1, 'n_units_l1': 16, 'activation': 'tanh', 'alpha': 2.9204338471814107e-05, 'learning_rate_init': 0.000816845589476017, 'batch_size': 64}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 0. Best value: 0.366474:  16%|█▌        | 4/25 [00:33<02:30,  7.17s/it]

[I 2025-10-08 14:54:26,094] Trial 3 finished with value: 0.39218412450937495 and parameters: {'n_layers': 2, 'n_units_l1': 4, 'n_units_l2': 16, 'activation': 'relu', 'alpha': 0.006245139574743076, 'learning_rate_init': 0.00853618986286683, 'batch_size': 64}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 0. Best value: 0.366474:  20%|██        | 5/25 [00:37<02:00,  6.01s/it]

[I 2025-10-08 14:54:30,058] Trial 4 finished with value: 0.37872750198541627 and parameters: {'n_layers': 3, 'n_units_l1': 12, 'n_units_l2': 4, 'n_units_l3': 12, 'activation': 'tanh', 'alpha': 1.0842262717330169e-05, 'learning_rate_init': 0.0021137059440645744, 'batch_size': 256}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 0. Best value: 0.366474:  24%|██▍       | 6/25 [00:42<01:48,  5.69s/it]

[I 2025-10-08 14:54:35,125] Trial 5 finished with value: 0.3888342345218851 and parameters: {'n_layers': 1, 'n_units_l1': 20, 'activation': 'tanh', 'alpha': 0.003795853142670641, 'learning_rate_init': 0.0015696396388661157, 'batch_size': 64}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 0. Best value: 0.366474:  28%|██▊       | 7/25 [00:47<01:36,  5.37s/it]

[I 2025-10-08 14:54:39,824] Trial 6 finished with value: 0.4150107493437893 and parameters: {'n_layers': 1, 'n_units_l1': 8, 'activation': 'relu', 'alpha': 0.0020651425578959264, 'learning_rate_init': 0.0005170191786366995, 'batch_size': 128}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 0. Best value: 0.366474:  32%|███▏      | 8/25 [00:51<01:22,  4.87s/it]

[I 2025-10-08 14:54:43,623] Trial 7 finished with value: 0.38441589726398406 and parameters: {'n_layers': 3, 'n_units_l1': 4, 'n_units_l2': 20, 'n_units_l3': 16, 'activation': 'relu', 'alpha': 0.0018274508859816032, 'learning_rate_init': 0.002592475660475161, 'batch_size': 128}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 0. Best value: 0.366474:  36%|███▌      | 9/25 [00:54<01:11,  4.46s/it]

[I 2025-10-08 14:54:47,177] Trial 8 finished with value: 0.4113002413312427 and parameters: {'n_layers': 2, 'n_units_l1': 4, 'n_units_l2': 20, 'activation': 'relu', 'alpha': 1.7956984225677624e-06, 'learning_rate_init': 0.0004187594718900631, 'batch_size': 128}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 0. Best value: 0.366474:  40%|████      | 10/25 [01:06<01:41,  6.79s/it]

[I 2025-10-08 14:54:59,206] Trial 9 finished with value: 0.37012690657257485 and parameters: {'n_layers': 3, 'n_units_l1': 12, 'n_units_l2': 4, 'n_units_l3': 16, 'activation': 'relu', 'alpha': 0.0012130221181165164, 'learning_rate_init': 0.0009718319944817398, 'batch_size': 64}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 0. Best value: 0.366474:  44%|████▍     | 11/25 [01:32<02:55, 12.53s/it]

[I 2025-10-08 14:55:24,722] Trial 10 finished with value: 0.3886756978446689 and parameters: {'n_layers': 2, 'n_units_l1': 16, 'n_units_l2': 12, 'activation': 'tanh', 'alpha': 0.00018300053640667708, 'learning_rate_init': 0.00011952270129143879, 'batch_size': 256}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 0. Best value: 0.366474:  48%|████▊     | 12/25 [01:44<02:39, 12.24s/it]

[I 2025-10-08 14:55:36,306] Trial 11 finished with value: 0.41123521024337245 and parameters: {'n_layers': 3, 'n_units_l1': 12, 'n_units_l2': 4, 'n_units_l3': 4, 'activation': 'relu', 'alpha': 0.00028592111831985886, 'learning_rate_init': 0.0001317923269837779, 'batch_size': 64}. Best is trial 0 with value: 0.36647443102993993.


Best trial: 12. Best value: 0.360515:  52%|█████▏    | 13/25 [01:47<01:54,  9.56s/it]

[I 2025-10-08 14:55:39,709] Trial 12 finished with value: 0.36051466828445433 and parameters: {'n_layers': 2, 'n_units_l1': 16, 'n_units_l2': 12, 'activation': 'relu', 'alpha': 0.0005197473211490454, 'learning_rate_init': 0.006665510953435556, 'batch_size': 64}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 12. Best value: 0.360515:  56%|█████▌    | 14/25 [01:50<01:23,  7.63s/it]

[I 2025-10-08 14:55:42,893] Trial 13 finished with value: 0.3659967761149731 and parameters: {'n_layers': 2, 'n_units_l1': 16, 'n_units_l2': 12, 'activation': 'relu', 'alpha': 5.1580948836723734e-05, 'learning_rate_init': 0.009818226878189102, 'batch_size': 64}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 12. Best value: 0.360515:  60%|██████    | 15/25 [01:53<01:01,  6.17s/it]

[I 2025-10-08 14:55:45,622] Trial 14 finished with value: 0.37864407980111736 and parameters: {'n_layers': 2, 'n_units_l1': 16, 'n_units_l2': 12, 'activation': 'relu', 'alpha': 5.2931921597924574e-05, 'learning_rate_init': 0.009886061418374873, 'batch_size': 64}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 12. Best value: 0.360515:  64%|██████▍   | 16/25 [01:57<00:49,  5.51s/it]

[I 2025-10-08 14:55:49,574] Trial 15 finished with value: 0.36255921639276106 and parameters: {'n_layers': 2, 'n_units_l1': 16, 'n_units_l2': 12, 'activation': 'relu', 'alpha': 0.0004971782474948573, 'learning_rate_init': 0.005069521238203797, 'batch_size': 64}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 12. Best value: 0.360515:  68%|██████▊   | 17/25 [01:59<00:34,  4.37s/it]

[I 2025-10-08 14:55:51,337] Trial 16 finished with value: 0.38364577664347194 and parameters: {'n_layers': 2, 'n_units_l1': 8, 'n_units_l2': 8, 'activation': 'relu', 'alpha': 0.0005532709028590391, 'learning_rate_init': 0.00425627455524194, 'batch_size': 256}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 12. Best value: 0.360515:  72%|███████▏  | 18/25 [02:03<00:30,  4.30s/it]

[I 2025-10-08 14:55:55,474] Trial 17 finished with value: 0.3646730703778574 and parameters: {'n_layers': 2, 'n_units_l1': 16, 'n_units_l2': 8, 'activation': 'relu', 'alpha': 0.000666431028775116, 'learning_rate_init': 0.004570102541959654, 'batch_size': 64}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 12. Best value: 0.360515:  76%|███████▌  | 19/25 [02:07<00:26,  4.36s/it]

[I 2025-10-08 14:55:59,975] Trial 18 finished with value: 0.3874914420022282 and parameters: {'n_layers': 1, 'n_units_l1': 8, 'activation': 'tanh', 'alpha': 0.00014508608704143724, 'learning_rate_init': 0.004425557038568155, 'batch_size': 64}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 12. Best value: 0.360515:  80%|████████  | 20/25 [02:08<00:16,  3.35s/it]

[I 2025-10-08 14:56:00,987] Trial 19 finished with value: 0.3792449297877108 and parameters: {'n_layers': 3, 'n_units_l1': 12, 'n_units_l2': 16, 'n_units_l3': 4, 'activation': 'relu', 'alpha': 0.009666555694984076, 'learning_rate_init': 0.005884303830089448, 'batch_size': 256}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 12. Best value: 0.360515:  84%|████████▍ | 21/25 [02:12<00:14,  3.57s/it]

[I 2025-10-08 14:56:05,024] Trial 20 finished with value: 0.36453739268652213 and parameters: {'n_layers': 2, 'n_units_l1': 20, 'n_units_l2': 8, 'activation': 'relu', 'alpha': 0.00036812112820042684, 'learning_rate_init': 0.00274149616533467, 'batch_size': 64}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 12. Best value: 0.360515:  88%|████████▊ | 22/25 [02:18<00:12,  4.20s/it]

[I 2025-10-08 14:56:10,723] Trial 21 finished with value: 0.36784907888886864 and parameters: {'n_layers': 2, 'n_units_l1': 20, 'n_units_l2': 8, 'activation': 'relu', 'alpha': 0.00041248303237075067, 'learning_rate_init': 0.00280964826862046, 'batch_size': 64}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 12. Best value: 0.360515:  92%|█████████▏| 23/25 [02:22<00:08,  4.22s/it]

[I 2025-10-08 14:56:15,003] Trial 22 finished with value: 0.3738951743726374 and parameters: {'n_layers': 2, 'n_units_l1': 20, 'n_units_l2': 8, 'activation': 'relu', 'alpha': 0.000946360773587218, 'learning_rate_init': 0.001587358013351102, 'batch_size': 64}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 12. Best value: 0.360515:  96%|█████████▌| 24/25 [02:26<00:04,  4.20s/it]

[I 2025-10-08 14:56:19,123] Trial 23 finished with value: 0.37212259529477637 and parameters: {'n_layers': 2, 'n_units_l1': 16, 'n_units_l2': 12, 'activation': 'relu', 'alpha': 7.954826327346708e-05, 'learning_rate_init': 0.00557551331170524, 'batch_size': 64}. Best is trial 12 with value: 0.36051466828445433.


Best trial: 24. Best value: 0.358643: 100%|██████████| 25/25 [02:33<00:00,  6.15s/it]


[I 2025-10-08 14:56:26,021] Trial 24 finished with value: 0.35864318916592725 and parameters: {'n_layers': 2, 'n_units_l1': 20, 'n_units_l2': 8, 'activation': 'relu', 'alpha': 0.0003032566999511661, 'learning_rate_init': 0.0031586493225653225, 'batch_size': 64}. Best is trial 24 with value: 0.35864318916592725.
Final metrics:
 split     R2    MAE   MAPE
train 0.8011 0.3516 0.1933
  val 0.7903 0.3517 0.1986
 test 0.7843 0.3617 0.2026

Artifacts saved to: c:\Users\dww05002\Desktop\OPIM5512\Module2\optuna_artifacts
