# 7.1 Linear Quantile Regression (L-QR)

> *Purpose: a fully transparent, parametric benchmark.*  
> It measures how much predictive signal a linear relationship between
> the engineered features and 72-h returns already captures, before we
> invoke non-linear ML models.

### 1 Model specification  
\[
\hat{Q}_{\tau}\!\left(R_{t+72h}\mid\mathbf{x}_{t}\right)
  \;=\; \beta_{0,\tau} \;+\; \mathbf{x}_{t}^{\!\top}\,\boldsymbol{\beta}_{\tau},
  \qquad \tau \in \{0.05,\,0.25,\,0.50,\,0.75,\,0.95\}
\]

* Estimator : `QuantReg` from `statsmodels.api` (Koenker–Bassett).  
* Loss : pinball (check) function at each τ.  
* No regularisation → interpretability of raw coefficients.

---

### 2 Pre-processing decisions  

| Aspect | Implementation | Rationale |
|--------|----------------|-----------|
| **Missing numeric** | **within each fold**: median-impute *only* `holder_*` and `tx_per_account`; all other features are complete. | Preserves cross-sectional variation; avoids leakage. |
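| **Categoricals** | One-hot encode (`pandas.get_dummies`, drop first level). | A linear model would misread integer category codes as an ordered numeric scale; dropping the first level avoids collinearity with the intercept. |
| **Feature scaling** | `RobustScaler` on numeric predictors. | Centres on the median and scales by the interquartile range, so fat-tailed outliers do not dominate the scale estimate. |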

`sklearn.pipeline.Pipeline` is used so **identical transformations** are
learned on the train slice and applied to the test slice in every rolling
fold.
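
A minimal sketch of such a preprocessing step follows, under several assumptions: the column names `holder_count`, `momentum_12h` and `vol_regime` are hypothetical placeholders, `SimpleImputer` is assumed as the median imputer, and `OneHotEncoder(drop="first")` stands in for `pandas.get_dummies` so that the encoding can be fitted on the train slice like the other steps.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Hypothetical column names, chosen only for illustration.
impute_cols = ["holder_count", "tx_per_account"]  # columns with missing values
complete_cols = ["momentum_12h"]                  # numeric, no missing values
cat_cols = ["vol_regime"]                         # categorical

preprocess = ColumnTransformer(
    transformers=[
        # Median-impute, then robust-scale, only the columns that can be missing.
        ("num_impute",
         Pipeline([("impute", SimpleImputer(strategy="median")),
                   ("scale", RobustScaler())]),
         impute_cols),
        ("num_complete", RobustScaler(), complete_cols),
        ("cat", OneHotEncoder(drop="first"), cat_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

train = pd.DataFrame({
    "holder_count": [10.0, np.nan, 30.0, 40.0, 50.0, 1000.0],
    "tx_per_account": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "momentum_12h": [0.1, -0.2, 0.3, 0.0, -0.1, 0.2],
    "vol_regime": ["low", "high", "mid", "low", "high", "mid"],
})
test = pd.DataFrame({
    "holder_count": [np.nan, 45.0],
    "tx_per_account": [3.0, np.nan],
    "momentum_12h": [0.05, -0.05],
    "vol_regime": ["mid", "low"],
})

# Medians, scaling statistics and category levels come from the train slice only.
X_train = preprocess.fit_transform(train)
X_test = preprocess.transform(test)
```

Because every statistic is learned inside `fit_transform` on the train slice, a missing `holder_count` in the test slice is filled with the train median rather than a value computed from test data.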

---

### 3 Rolling cross-validation protocol  

* **Window** : 120 train bars (≈ 60 d) ➜ 24 calibration bars  
  (reserved for CQR later, but still part of the train fit here) ➜
  6 test bars (72 h).  
* **Tokens** : loop over 21 tokens; concatenate fold metrics.  
* **Metrics stored per fold**
  * pinball loss at each τ
  * absolute error (median only)
  * coverage of the empirical 90 % interval (bounded by τ = 0.05 and τ = 0.95)
  * width of that interval
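
The protocol above could be sketched roughly as follows. The function names `rolling_folds`, `pinball_loss` and `fold_metrics` are hypothetical. The step of 6 bars between folds (non-overlapping test windows) is an assumption, since the text does not state the stride.

```python
import numpy as np

def rolling_folds(n_bars, n_train=120, n_cal=24, n_test=6, step=6):
    """Yield (train_idx, cal_idx, test_idx) index arrays for one token.

    step=6 (non-overlapping test windows) is an assumption.
    """
    window = n_train + n_cal + n_test
    for start in range(0, n_bars - window + 1, step):
        train_idx = np.arange(start, start + n_train)
        cal_idx = np.arange(start + n_train, start + n_train + n_cal)
        test_idx = np.arange(start + n_train + n_cal, start + window)
        yield train_idx, cal_idx, test_idx

def pinball_loss(y, q_hat, tau):
    """Mean check loss of the predicted tau-quantile q_hat against y."""
    u = np.asarray(y, dtype=float) - np.asarray(q_hat, dtype=float)
    return float(np.mean(np.maximum(tau * u, (tau - 1.0) * u)))

def fold_metrics(y, preds):
    """Compute the per-fold metrics; preds maps tau -> predicted quantiles."""
    y = np.asarray(y, dtype=float)
    lo = np.asarray(preds[0.05], dtype=float)
    hi = np.asarray(preds[0.95], dtype=float)
    med = np.asarray(preds[0.50], dtype=float)
    out = {f"pinball_{tau:.2f}": pinball_loss(y, q, tau) for tau, q in preds.items()}
    out["abs_err_median"] = float(np.mean(np.abs(y - med)))
    out["coverage_90"] = float(np.mean((y >= lo) & (y <= hi)))
    out["width_90"] = float(np.mean(hi - lo))
    return out
```

For L-QR, the model in each fold would be fitted on `np.concatenate([train_idx, cal_idx])`, since the calibration bars are still part of the train fit at this stage, and scored on `test_idx`.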

---

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import RobustScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
import statsmodels.api as sm
import pandas as pd
import numpy as np
```