# Hyper-parameter Optimisation of the LightGBM-CQR Benchmark  

> *Goal: sharpen prediction intervals and reduce training time **without
> sacrificing split-conformal coverage***  

---

## 1  Motivation  

The untuned LightGBM-CQR model already outperforms the linear-QR
baseline, but three pain points remain:

| Pain-point | Symptom | Effect on study |
|------------|---------|-----------------|
| **Over-capacity** | 500 trees × unlimited depth | Slow training; risk of noisy tail estimates |
| **Occasional over-coverage** | 80 % PI covers 92–100 % on some tokens | Wider-than-necessary bands ⇒ lost trading edge |
| **Heterogeneous feature scales** | Deep trees learn spiky leaf quantiles | Conformal layer must add large δ, inflating width |

Tuning offers a principled path to **tighter, faster, still-valid**
intervals.

---

## 2  Literature & documentation cues  

| Source | Key takeaway | How we incorporate it |
|--------|--------------|-----------------------|
| Ke et al. (2017) – LightGBM paper | Sweet-spot depth ≈ 6–10 and moderate `num_leaves` | Search `max_depth∈[3,12]`, `num_leaves∈[16,256]` |
| LightGBM *Parameters Tuning* guide ¹ | Bagging & column-sampling accelerate and decorrelate trees | Tune `bagging_fraction`, `feature_fraction`, `bagging_freq` |
| Romano et al. (2019) – Conformalized QR | Validity is preserved under any base model; sharper base ⇒ smaller conformal δ | Optimise pinball loss; coverage enforced post-hoc |
| Giordano & Yang (2023) | Larger `min_data_in_leaf` stabilises tail quantiles | Search `min_data_in_leaf∈[20,500]` |
| Bergstra & Bengio (2012) | Random / Bayesian > grid when search budget limited | Use **Optuna-TPE** (Bayesian) with 50–150 trials |

> ¹ *<https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html>*

---

## 3  Optimisation design  

| Component | Choice | Justification |
|-----------|--------|---------------|
| **Objective** | Mean pinball loss across τ = 0.05, 0.25, 0.50, 0.75, 0.95 | Single scalar summarises complete predictive distribution |
| **Constraint** | Split-conformal applied **after** tuning → marginal coverage ≥ nominal, assuming calibration and test points are exchangeable | Simplifies search: no need for custom constrained optimiser |
| **Search method** | Optuna TPE sampler, `n_trials = 120`, early-pruning on 3-fold CV | Empirically ~15 % lower loss than random with same budget |
| **CV scheme** | *Chronological* 3-fold (no shuffling) within training span | Respects time-order and avoids leakage |
| **Hyper-parameter space** | See Table 1 | Covers capacity, regularisation & sampling knobs |
| **Hardware** | 8-core CPU; `n_jobs=-1` inside LightGBM | Parallel tree-building + parallel Optuna trials |
| **Reproducibility** | `seed = 42` in both the Optuna sampler and LightGBM; trials archived to `tuning/lgbm_cqr_trials.csv` | Repeatable reruns when Optuna trials run sequentially (parallel trials break TPE determinism); traceable decisions |
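
The objective and CV rows above can be sketched in a few lines. This is only an illustration, not the study's actual implementation: the helper names (`pinball_loss`, `mean_pinball`, `chronological_folds`) are hypothetical, and the expanding-window layout of the folds is an assumption, since the table only says "chronological, no shuffling". The sketch uses the standard library alone so it can be read in isolation.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

TAUS = (0.05, 0.25, 0.50, 0.75, 0.95)  # quantile grid from the Objective row


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], tau: float) -> float:
    """Average pinball (quantile) loss of predictions y_pred at level tau."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def mean_pinball(y_true: Sequence[float], preds_by_tau: Dict[float, Sequence[float]]) -> float:
    """Scalar objective: pinball loss averaged over every tau in preds_by_tau."""
    losses = [pinball_loss(y_true, preds, tau) for tau, preds in preds_by_tau.items()]
    return sum(losses) / len(losses)


def chronological_folds(n_rows: int, n_folds: int = 3) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx) so every validation row follows its training rows.

    Assumption: expanding-window folds; rows are already sorted by time.
    """
    block = n_rows // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any remainder rows.
        valid_end = n_rows if k == n_folds else (k + 1) * block
        yield list(range(train_end)), list(range(train_end, valid_end))
```

A trial's score would then be the average of `mean_pinball` over the three folds, with one fitted model per value in `TAUS`.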

### Table 1  Search space  

| Symbol | LightGBM key | Range (★ = sampled log-uniformly) |
|--------|--------------|---------------------------|
| η | `learning_rate` | 0.01 – 0.15 ★ |
| L | `num_leaves` | 16 – 256 ★ |
| d | `max_depth` | 3 – 12 |
| _n<sub>leaf</sub>_ | `min_data_in_leaf` | 20 – 500 ★ |
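| λ₁ | `lambda_l1` | 0 – 2 ★ (log sampling needs a positive floor in place of 0) |
| λ₂ | `lambda_l2` | 0 – 2 ★ (log sampling needs a positive floor in place of 0) |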
| γ | `min_gain_to_split` | 0 – 0.2 |
| _f_ | `feature_fraction` | 0.5 – 1.0 |
| _b_ | `bagging_fraction` | 0.5 – 1.0 |
| _k_ | `bagging_freq` | 0 – 10 |
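
One way Table 1 might translate into Optuna's `suggest_float` / `suggest_int` calls is sketched below. The function name `suggest_params` is hypothetical, and `LAMBDA_FLOOR = 1e-8` is an assumption made here because a log-uniform range cannot start at exactly 0; the study may choose a different floor or sample the λ terms uniformly instead. The function only relies on the `trial` object exposing those two methods, so it does not import Optuna itself.

```python
from typing import Any, Dict

# Assumption: small positive floor standing in for the "0" end of the
# log-uniform lambda ranges (log sampling is undefined at 0).
LAMBDA_FLOOR = 1e-8


def suggest_params(trial: Any) -> Dict[str, Any]:
    """Draw one LightGBM configuration from the Table 1 ranges.

    `trial` is expected to behave like an optuna.trial.Trial.
    """
    return {
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.15, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 16, 256, log=True),
        "max_depth": trial.suggest_int("max_depth", 3, 12),
        "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 20, 500, log=True),
        "lambda_l1": trial.suggest_float("lambda_l1", LAMBDA_FLOOR, 2.0, log=True),
        "lambda_l2": trial.suggest_float("lambda_l2", LAMBDA_FLOOR, 2.0, log=True),
        "min_gain_to_split": trial.suggest_float("min_gain_to_split", 0.0, 0.2),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.5, 1.0),
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.5, 1.0),
        # bagging_freq = 0 switches bagging off entirely.
        "bagging_freq": trial.suggest_int("bagging_freq", 0, 10),
    }
```

Inside an objective, the returned dictionary would typically be merged with `{"objective": "quantile", "alpha": tau}` before each per-quantile fit.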

---

## 4  Procedure (to be executed)  

1. **Load** frozen feature matrix `features_v1_tail.parquet`.  
2. **Define** Optuna objective → returns 3-fold mean pinball.  
3. **Run** `study.optimize(..., n_trials=120, timeout=5400)` (timeout in seconds, i.e. 90 min).  
4. **Refit** one model per τ with the *best* hyper-params & early-stopping.  
5. **Apply conformal adjustment** on calibration slice (δₜ).  
6. **Evaluate** on rolling test windows:  
   * pinball loss,  
   * empirical coverage @ 80 %,  
   * interval width.  
7. **Compare** to untuned baseline ⇒ accept if  
   * pinball ↓ ≥ 3 % **and**  
   * coverage ∈ [0.78, 0.82].
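
Steps 5 to 7 can be illustrated with a minimal sketch of the split-conformal correction from Romano et al. (2019) and the acceptance rule. The function names are hypothetical, `alpha = 0.20` is assumed to correspond to the 80 % interval, and the lower and upper base-quantile predictions are taken as plain lists; the real pipeline may organise this differently.

```python
import math
from typing import Sequence, Tuple


def cqr_delta(y_cal: Sequence[float], lo_cal: Sequence[float],
              hi_cal: Sequence[float], alpha: float = 0.20) -> float:
    """Conformal correction delta estimated on the calibration slice."""
    # Conformity score: how far y falls outside [lo, hi] (negative if inside).
    scores = sorted(max(lo - y, y - hi) for y, lo, hi in zip(y_cal, lo_cal, hi_cal))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-based order statistic
    if rank > n:
        return math.inf  # calibration slice too small for this alpha
    return scores[rank - 1]


def coverage_and_width(y: Sequence[float], lo: Sequence[float],
                       hi: Sequence[float], delta: float) -> Tuple[float, float]:
    """Empirical coverage and mean width of the intervals [lo - delta, hi + delta]."""
    hits, total_width = 0, 0.0
    for yi, l, h in zip(y, lo, hi):
        lower, upper = l - delta, h + delta
        hits += lower <= yi <= upper
        total_width += upper - lower
    return hits / len(y), total_width / len(y)


def accept(pinball_new: float, pinball_base: float, coverage: float) -> bool:
    """Step 7: pinball loss drops by at least 3 % and coverage lies in [0.78, 0.82]."""
    rel_drop = (pinball_base - pinball_new) / pinball_base
    return rel_drop >= 0.03 and 0.78 <= coverage <= 0.82
```

A negative `delta` is possible and simply shrinks the band, which is the mechanism by which a sharper base model can reduce over-coverage.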

---

## 5  Expected outcomes  

* **Sharper intervals** – prior work reports 10–20 % narrower PI for crypto VaR after tuning depth + leaf size.  
* **Faster training** – early stopping typically halts at ≈ 600–800 trees, well below the hard cap of 4 000.  
* **Stable coverage** – conformal step guarantees validity; better base model → smaller δ → less over-coverage.

---

## 6  Versioning plan  

| Artefact | Path | Notes |
|----------|------|-------|
| Optuna trials | `tuning/lgbm_cqr_trials.csv` | full hyper-param trace |
| Best params (JSON) | `models/lgb_cqr_v2/params.json` | one file, reused for all τ |
| Trained models | `models/lgb_cqr_v2/q05.txt` … | stored per quantile |
| Metrics | `metrics/lgb_cqr_v2_pinball.csv` | same columns as v1 |
| Plots | `figures/lgb_cqr_v2_*` | auto-generated via `evaluate_lgbm_cqr()` |

Freeze `v2` once accepted → downstream QRF will benchmark against it.
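
As a rough sketch of the "one file, reused for all τ" convention, the best parameters could be written once and specialised per quantile at load time. The helper names are hypothetical; the default directory mirrors the table above, and `objective="quantile"` with `alpha` is LightGBM's standard quantile-regression setting.

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def save_best_params(best_params: Dict[str, Any],
                     out_dir: Union[str, Path] = "models/lgb_cqr_v2") -> Path:
    """Write the tuned hyper-parameters to <out_dir>/params.json and return the path."""
    path = Path(out_dir) / "params.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # sort_keys keeps the file stable across reruns, which helps diffs.
    path.write_text(json.dumps(best_params, indent=2, sort_keys=True))
    return path


def params_for_tau(path: Union[str, Path], tau: float) -> Dict[str, Any]:
    """Load the shared params and add the quantile-specific settings for one tau."""
    params = json.loads(Path(path).read_text())
    params.update({"objective": "quantile", "alpha": tau})
    return params
```

Keeping `alpha` out of the stored file means the same JSON can drive every per-quantile refit.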

---


In [7]:
# ============================================================
# 0.  Imports & environment check
# ============================================================
import os, gc, json, joblib, warnings, datetime as dt
import numpy as np
import pandas as pd

import lightgbm as lgb
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import make_scorer
from sklearn.model_selection import train_test_split

import optuna
from optuna.samplers import TPESampler
from optuna.pruners  import HyperbandPruner
from optuna_integration import LightGBMTuner  # backup (CPU only)

warnings.filterwarnings("ignore", category=UserWarning)

print("LightGBM:", lgb.__version__, "| Optuna:", optuna.__version__)

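# -------- GPU probe (trains one tiny booster on the GPU) --------
def lightgbm_has_gpu() -> bool:
    """Return True if this LightGBM build can train with device_type='gpu'."""
    rng = np.random.default_rng(0)
    probe = lgb.Dataset(rng.random((64, 2)), label=rng.random(64))
    params = {"objective": "regression", "device_type": "gpu", "verbose": -1}
    try:
        # CPU-only builds (or machines without a usable device) raise LightGBMError
        lgb.train(params, probe, num_boost_round=1)
        return True
    except lgb.basic.LightGBMError:
        return False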

gpu_available = lightgbm_has_gpu()
print("LightGBM GPU support:", gpu_available)

LightGBM: 3.3.5 | Optuna: 3.6.0
LightGBM GPU support: False
