# Shared Across Methods

| Feature     | Type        | Meaning                                 |
| ----------- | ----------- | --------------------------------------- |
| `x_time`    | numeric     | Temporal index (trend + seasonality)    |
| `x_load`    | numeric     | System stress / demand intensity        |
| `x_quality` | numeric     | Data quality / sensor reliability proxy |
| `x_env`     | numeric     | Environmental volatility                |
| `x_group`   | categorical | Segment / customer / machine ID         |
| `x_shift`   | binary      | Regime change indicator                 |


One concrete reading of these features, for a machine-maintenance setting:

| Feature     | Interpretation                     |
| ----------- | ---------------------------------- |
| `x_time`    | Operating hours                    |
| `x_load`    | Mechanical stress                  |
| `x_quality` | Sensor confidence                  |
| `x_env`     | Temperature / vibration volatility |
| `x_group`   | Machine ID                         |
| `x_shift`   | Maintenance policy change          |
| `y`         | Expected failure cost              |

## Synthetic Data Generator
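
The generator cell below can be summarized as a data-generating process. The notation here is an assumption introduced for illustration and does not appear in the notebook: $t_i$ is `x_time`, $\ell_i$ is `x_load`, $q_i$ is `x_quality`, $e_i$ is `x_env`, $k(i)$ is the `x_group` of row $i$, and $s_i$ is `x_shift`.

$$
\begin{aligned}
g_k &\sim \mathcal{N}(0,\ 2^2), \quad k = 0, \dots, 19 \\
s_i &= \mathbb{1}[\, t_i > 0.6\, n \,], \quad n = 2000 \\
\mu_i &= 0.03\, t_i + 1.5 \sin(t_i / 20) + 2.0\, \ell_i + g_{k(i)} + 3.0\, s_i \\
\sigma_i &= 0.5 + 1.5\,(1 - q_i) + 0.8\, |e_i| \\
y_i &\sim \mathcal{N}(\mu_i,\ \sigma_i^2)
\end{aligned}
$$

The mean $\mu_i$ corresponds to `signal` and $\sigma_i$ to `noise_std` in the code.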

In [1]:
import numpy as np
import pandas as pd

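# Fix the global NumPy seed so the dataset is reproducible.
np.random.seed(42)

n = 2000      # number of rows (one per time step)
groups = 20   # number of distinct group IDs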

df = pd.DataFrame({
    "x_time": np.arange(n),
    "x_load": np.random.gamma(shape=2.0, scale=1.5, size=n),
    "x_quality": np.random.uniform(0.5, 1.0, size=n),
    "x_env": np.random.normal(0, 1, size=n),
    "x_group": np.random.randint(0, groups, size=n),
    # Regime flag: 0 for the first 60% of the timeline, 1 afterwards
    "x_shift": (np.arange(n) > n * 0.6).astype(int),
})

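# Group-level latent effect: one fixed offset per group, not stored in df.
# Uncertainty about it is epistemic (reducible with more data per group).
group_effect = np.random.normal(0, 2, groups)

# True signal: linear trend + seasonality + load effect + group offset + regime shift
signal = (
    0.03 * df["x_time"]
    + 1.5 * np.sin(df["x_time"] / 20)
    + 2.0 * df["x_load"]
    + group_effect[df["x_group"].to_numpy()]
    + 3.0 * df["x_shift"]
)

# Heteroskedastic noise (aleatoric): the standard deviation grows as
# x_quality drops and as |x_env| rises.
noise_std = 0.5 + 1.5 * (1 - df["x_quality"]) + 0.8 * np.abs(df["x_env"])

df["y"] = signal + np.random.normal(0, noise_std)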

df.head()


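|     | x_time | x_load   | x_quality | x_env     | x_group | x_shift | y         |
| --- | ------ | -------- | --------- | --------- | ------- | ------- | --------- |
| 0   | 0      | 3.590519 | 0.890945  | -0.368939 | 6       | 0       | 9.324415  |
| 1   | 1      | 2.241697 | 0.882929  | 0.213809  | 14      | 0       | 5.008318  |
| 2   | 2      | 2.073425 | 0.845828  | -0.917976 | 2       | 0       | 4.275296  |
| 3   | 3      | 2.073453 | 0.982819  | 1.73117   | 15      | 0       | 8.967254  |
| 4   | 4      | 6.974572 | 0.696778  | -0.63541  | 0       | 0       | 15.714185 |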
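
Because the true signal is known in a synthetic setting, the heteroskedastic noise can be inspected directly. The following is a minimal sketch, not part of the original notebook: it rebuilds the dataset with the same seed so that it runs on its own, then compares the residual spread across quartiles of `|x_env|`. The quartile binning and the names `residual`, `env_bin`, and `spread_by_env` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Rebuild the dataset as in the generator cell so this block is self-contained.
np.random.seed(42)
n, groups = 2000, 20
df = pd.DataFrame({
    "x_time": np.arange(n),
    "x_load": np.random.gamma(shape=2.0, scale=1.5, size=n),
    "x_quality": np.random.uniform(0.5, 1.0, size=n),
    "x_env": np.random.normal(0, 1, size=n),
    "x_group": np.random.randint(0, groups, size=n),
    "x_shift": (np.arange(n) > n * 0.6).astype(int),
})
group_effect = np.random.normal(0, 2, groups)
signal = (
    0.03 * df["x_time"]
    + 1.5 * np.sin(df["x_time"] / 20)
    + 2.0 * df["x_load"]
    + group_effect[df["x_group"].to_numpy()]
    + 3.0 * df["x_shift"]
)
noise_std = 0.5 + 1.5 * (1 - df["x_quality"]) + 0.8 * np.abs(df["x_env"])
df["y"] = signal + np.random.normal(0, noise_std.to_numpy())

# Subtracting the known signal leaves only the noise term.
residual = df["y"] - signal

# Assumed check: bin rows by |x_env| quartile and compare residual spread.
env_bin = pd.qcut(df["x_env"].abs(), q=4, labels=["q1", "q2", "q3", "q4"])
spread_by_env = residual.groupby(env_bin, observed=True).std()
print(spread_by_env)
```

If the generator behaves as described, the residual standard deviation should rise from the lowest to the highest `|x_env|` quartile. The same pattern can be checked against `x_quality` by binning on that column instead.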


In [2]:
df.to_csv("synthetic_data.csv", index=False)
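
CSV does not store column types, so `x_group` is read back as an integer column rather than a categorical one. The sketch below shows one way to restore it after loading. It assumes a small stand-in frame (`sample`, with illustrative rounded values) and a temporary directory so that it runs without the full dataset; neither is part of the original notebook.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in frame with the same columns as the generated dataset (illustrative values).
sample = pd.DataFrame({
    "x_time": np.arange(5),
    "x_load": [3.6, 2.2, 2.1, 2.1, 7.0],
    "x_quality": [0.89, 0.88, 0.85, 0.98, 0.70],
    "x_env": [-0.37, 0.21, -0.92, 1.73, -0.64],
    "x_group": [6, 14, 2, 15, 0],
    "x_shift": [0, 0, 0, 0, 0],
    "y": [9.32, 5.01, 4.28, 8.97, 15.71],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic_data.csv")
    # index=False keeps the row index out of the file, so no extra
    # "Unnamed: 0" column appears when the file is read back.
    sample.to_csv(path, index=False)
    loaded = pd.read_csv(path)

# Restore the categorical type that the CSV round trip discarded.
loaded["x_group"] = loaded["x_group"].astype("category")
print(loaded.dtypes)
```

Casting after loading keeps the category values as integers, which matches the group IDs used by the generator.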