# How Quantreo Generates “Features Info” YAML Files

This notebook illustrates the philosophy behind **Quantreo’s Feature Information pipeline**: how raw market data is transformed into structured YAML summaries before being interpreted by the **DSR agent** (Definition–Stability–Robustness).

⚠️ **Educational version only.**
This notebook is designed to explain *the logic* behind the process, without revealing the proprietary code or optimizations used in production.

---

### What the real Quantreo pipeline includes

- Multi-asset and multi-timeframe universes
- Stationarity and redundancy checks (VIF, MI filtering)
- Rolling and cross-asset robustness validation
- Parallelized computations optimized with Numba

---

This simplified version focuses on clarity, helping you understand how Quantreo structures empirical relations between features and predictive targets before DSR analysis.
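
To make the redundancy check mentioned in the list above more concrete, here is a minimal sketch of a variance inflation factor (VIF) computation. It is not Quantreo's production code: the helper name `variance_inflation_factors` and the toy data are assumptions made for illustration, and it uses the textbook definition VIF = 1 / (1 - R²), where R² comes from regressing one feature on all the others.

```python
import numpy as np
import pandas as pd


def variance_inflation_factors(features: pd.DataFrame) -> pd.Series:
    """Return the VIF of each column: 1 / (1 - R^2) from regressing it on the others."""
    clean = features.dropna()
    vifs = {}
    for col in clean.columns:
        y = clean[col].to_numpy()
        others = clean.drop(columns=col).to_numpy()
        # Add an intercept so R^2 is measured around the mean of y.
        X = np.column_stack([np.ones(len(others)), others])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        residuals = y - X @ coef
        ss_res = float(residuals @ residuals)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r_squared = 1.0 - ss_res / ss_tot
        vifs[col] = np.inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared)
    return pd.Series(vifs, name="vif")


# Hypothetical toy data: "a_noisy_copy" is almost a duplicate of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
demo = pd.DataFrame({"a": a, "b": b, "a_noisy_copy": a + 0.05 * rng.normal(size=500)})

vif = variance_inflation_factors(demo)
print(vif.round(1))
```

The two near-duplicate columns get a large VIF, while the independent column stays close to 1. A filter could then drop one of the redundant features before relations are computed.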


In [2]:
# ==============================================================
# 1. Imports
# ==============================================================
import pandas as pd
import numpy as np
import yaml

np.random.seed(42)

In [3]:
# ==============================================================
# 2. Simulate basic market data
# ==============================================================
prices = np.cumprod(1 + np.random.normal(0, 0.001, 2000))
df = pd.DataFrame({"close": prices})
df["open"] = df["close"].shift(1)
df["high"] = df[["open", "close"]].max(axis=1) * (1 + np.random.rand(len(df)) * 0.002)
df["low"] = df[["open", "close"]].min(axis=1) * (1 - np.random.rand(len(df)) * 0.002)
df.dropna(inplace=True)
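
Range-based volatility estimators assume that each bar is internally consistent, so it can be worth verifying the simulated data before building features on it. The sketch below repeats the simulation from the cell above and checks that every bar satisfies `low <= min(open, close)` and `high >= max(open, close)`. The variable names `body_top`, `body_bottom`, and `ohlc_is_consistent` are illustrative.

```python
import numpy as np
import pandas as pd

# Same simulation as in the cell above.
np.random.seed(42)
prices = np.cumprod(1 + np.random.normal(0, 0.001, 2000))
df = pd.DataFrame({"close": prices})
df["open"] = df["close"].shift(1)
df["high"] = df[["open", "close"]].max(axis=1) * (1 + np.random.rand(len(df)) * 0.002)
df["low"] = df[["open", "close"]].min(axis=1) * (1 - np.random.rand(len(df)) * 0.002)
df.dropna(inplace=True)

# Each bar must contain its own open and close.
body_top = df[["open", "close"]].max(axis=1)
body_bottom = df[["open", "close"]].min(axis=1)
ohlc_is_consistent = bool(((df["high"] >= body_top) & (df["low"] <= body_bottom)).all())

# The first bar has no previous close, so dropna() removes it.
n_bars = len(df)
print(ohlc_is_consistent, n_bars)
```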

In [5]:
# ==============================================================
# 3. Generate Quantreo-style features
# ==============================================================

"""
Quantreo groups its feature functions into logical categories:
- fe.volatility.*        → volatility estimators
- fe.trend.*             → slope and persistence measures
- fe.math.*              → statistical & distribution metrics
- fe.candle.*            → candle-based structural ratios
- fe.transformation.*    → smoothing, normalization, etc.

Below is a simplified example using Quantreo’s feature engineering tools
to create a few core variables used later in the Features Info pipeline.
"""

import quantreo.features_engineering as fe

# --- 3.1 Base transformations ---
df["close_smoothed"] = fe.transformation.mma(df=df, col="close", window_size=3)
df["returns_10"] = df["close"].pct_change(10)
df["abs_returns_10"] = np.abs(df["returns_10"])

# --- 3.2 Volatility & Trend ---
df["rs_vol_60"] = fe.volatility.rogers_satchell_volatility(
    df=df, high_col="high", low_col="low", open_col="open", close_col="close", window_size=60
)
df["rs_vol_120"] = fe.volatility.rogers_satchell_volatility(
    df=df, high_col="high", low_col="low", open_col="open", close_col="close", window_size=120
)

df["linear_slope_60"] = fe.trend.linear_slope(df=df, col="close", window_size=60)
df["linear_slope_120"] = fe.trend.linear_slope(df=df, col="close", window_size=120)

# --- 3.3 Statistical Shape & Tail Behavior ---
df["tail_returns_100"] = fe.math.tail_index(df=df, col="abs_returns_10", window_size=100, k_ratio=0.10)
df["tail_vol_100"] = fe.math.tail_index(df=df, col="rs_vol_60", window_size=100, k_ratio=0.10)
df["hurst_100"] = fe.math.hurst(df=df, col="close_smoothed", window_size=100)

df["skew_100"] = fe.math.skewness(df=df, col="returns_10", window_size=100)
df["bimodality_coef_200"] = fe.math.bimodality_coefficient(df=df, col="returns_10", window_size=200)

# --- 3.4 Candle Structure ---
df["price_dist_0_25_100"] = fe.candle.price_distribution(
    df=df, col="close", window_size=100, start_percentage=0.0, end_percentage=0.25
)
df["price_dist_75_100_100"] = fe.candle.price_distribution(
    df=df, col="close", window_size=100, start_percentage=0.75, end_percentage=1.00
)
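
The Quantreo functions above are used as black boxes. For intuition about what a feature such as `rs_vol_60` measures, the textbook Rogers-Satchell estimator can be sketched in plain pandas. The helper name `rogers_satchell_sketch` and the toy frame are hypothetical, and Quantreo's own implementation may differ in scaling or edge-case handling.

```python
import numpy as np
import pandas as pd


def rogers_satchell_sketch(df: pd.DataFrame, window_size: int) -> pd.Series:
    """Rolling Rogers-Satchell volatility from OHLC columns (illustrative only)."""
    log_hc = np.log(df["high"] / df["close"])
    log_ho = np.log(df["high"] / df["open"])
    log_lc = np.log(df["low"] / df["close"])
    log_lo = np.log(df["low"] / df["open"])
    # Per-bar variance term; non-negative whenever low <= open, close <= high.
    per_bar = log_hc * log_ho + log_lc * log_lo
    # Clip guards against tiny negative values caused by floating-point error.
    return np.sqrt(per_bar.rolling(window_size).mean().clip(lower=0))


# Hypothetical four-bar example.
toy = pd.DataFrame({
    "open":  [100.0, 101.0, 100.5, 102.0],
    "high":  [101.5, 101.8, 102.4, 102.6],
    "low":   [99.5, 100.2, 100.1, 101.0],
    "close": [101.0, 100.5, 102.0, 101.4],
})
rs = rogers_satchell_sketch(toy, window_size=2)
print(rs)
```

The first `window_size - 1` values are NaN because the rolling window is not yet full, which is also why the notebook's features start with a run of missing values.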

In [6]:
# ==============================================================
# 4. Create a target variable
# ==============================================================
df["future_rs_vol_60"] = df["rs_vol_60"].shift(-60)
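
The negative shift is what turns a feature into a forward-looking target: row `t` receives the value observed at `t + 60`. A small sketch with a shorter, hypothetical horizon makes the alignment and the resulting trailing NaNs visible.

```python
import pandas as pd

horizon = 3  # the notebook uses 60
feature = pd.Series([0.10, 0.12, 0.11, 0.15, 0.14, 0.13], name="rs_vol")
target = feature.shift(-horizon).rename("future_rs_vol")

# Side-by-side view: each row pairs today's value with the value `horizon` steps ahead.
pairs = pd.concat([feature, target], axis=1)
usable = pairs.dropna()  # the last `horizon` rows have no known future value

print(pairs)
print(len(usable))
```

Only the rows in `usable` can enter a feature-target relation, so the effective sample is shorter than the raw series by the horizon length.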

In [7]:
df.columns

Index(['close', 'open', 'high', 'low', 'close_smoothed', 'returns_10',
       'abs_returns_10', 'rs_vol_60', 'rs_vol_120', 'linear_slope_60',
       'linear_slope_120', 'tail_returns_100', 'tail_vol_100', 'hurst_100',
       'skew_100', 'bimodality_coef_200', 'price_dist_0_25_100',
       'price_dist_75_100_100', 'future_rs_vol_60'],
      dtype='object')

In [8]:


# ==============================================================
# 5. Compute simplified relations (correlation + MI proxy)
# ==============================================================
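def compute_relations(x: pd.Series, y: pd.Series) -> dict:
    """Compute the Pearson correlation and a covariance-based MI proxy.

    The proxy is a normalized covariance, not a true mutual-information estimate.
    """
    # NaNs are dropped from each series independently, so x and y can end up
    # with different lengths; their tails are then matched by position.
    x, y = x.dropna(), y.dropna()
    if len(x) != len(y):
        min_len = min(len(x), len(y))
        x, y = x.iloc[-min_len:], y.iloc[-min_len:]

    corr = x.corr(y)          # Series.corr aligns x and y on their index
    cov = np.cov(x, y)[0, 1]  # np.cov pairs values by position instead
    # Because of the two alignment rules above, the proxy can differ from corr.
    mi = cov / np.sqrt(x.var() * y.var()) if x.var() > 0 and y.var() > 0 else np.nan

    return {
        "correlation": round(float(corr), 3) if pd.notnull(corr) else None,
        "mutual_info": round(float(mi), 3) if pd.notnull(mi) else None,
    }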

# Select features to analyze
feature_cols = [
    'returns_10', 'abs_returns_10', 'rs_vol_60', 'rs_vol_120',
    'linear_slope_60', 'linear_slope_120', 'tail_returns_100', 'tail_vol_100',
    'hurst_100', 'skew_100', 'bimodality_coef_200', 'price_dist_0_25_100',
    'price_dist_75_100_100',
    'future_rs_vol_60',  # the target itself, which trivially yields 1.0
]

relations = {col: compute_relations(df[col], df["future_rs_vol_60"]) for col in feature_cols}
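
The `mutual_info` value in this demo is only a covariance-based proxy. As a point of comparison, here is a minimal sketch of a plug-in mutual information estimate built from a 2D histogram, with both series aligned on their index before any NaNs are dropped. The helper name `histogram_mutual_info`, the bin count, and the toy data are assumptions for illustration; this is a swapped-in textbook estimator, not the one used in Quantreo's pipeline.

```python
import numpy as np
import pandas as pd


def histogram_mutual_info(x: pd.Series, y: pd.Series, bins: int = 10) -> float:
    """Plug-in mutual information estimate (in nats) from a 2D histogram."""
    pair = pd.concat([x, y], axis=1).dropna()  # align on the index, then drop NaNs
    counts, _, _ = np.histogram2d(pair.iloc[:, 0], pair.iloc[:, 1], bins=bins)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal of y, shape (1, bins)
    nonzero = p_xy > 0  # skip empty cells to avoid log(0)
    return float((p_xy[nonzero] * np.log(p_xy[nonzero] / (p_x @ p_y)[nonzero])).sum())


# Hypothetical data: y_dependent is a nonlinear function of x, y_independent is noise.
rng = np.random.default_rng(1)
x = pd.Series(rng.normal(size=2000))
y_dependent = x ** 2 + 0.1 * pd.Series(rng.normal(size=2000))
y_independent = pd.Series(rng.normal(size=2000))

mi_dependent = histogram_mutual_info(x, y_dependent)
mi_independent = histogram_mutual_info(x, y_independent)
print(round(mi_dependent, 3), round(mi_independent, 3), round(x.corr(y_dependent), 3))
```

The quadratic relation is close to invisible to a linear correlation but shows up clearly in the histogram estimate, which is the main reason to report a dependence measure alongside correlation. Histogram estimates are biased upward on small samples, so the independent case will not be exactly zero.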

In [11]:
# ==============================================================
# 6. Build YAML dictionary
# ==============================================================
yaml_dict = {
    "target": "future_rs_vol_60",
    "relations": relations,
    "meta": {
        "comment": (
            "Demonstration of how Quantreo structures empirical relations "
            "between features and predictive targets before interpretation by the DSR agent."
        ),
        "generated_with": "Quantreo educational pipeline (simplified demo)",
        "note": (
            "The real implementation includes rolling robustness tests, "
            "cross-asset validation, and automatic feature ranking."
        ),
    },
}

# ==============================================================
# 7. Save example YAML file
# ==============================================================
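import os

output_path = "results/rs_vol.yaml"
os.makedirs(os.path.dirname(output_path), exist_ok=True)  # create results/ if it is missing
with open(output_path, "w") as f:
    yaml.dump(yaml_dict, f, sort_keys=False)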

print(f"✅ YAML file successfully saved to: {output_path}\n")
print(yaml.dump(yaml_dict, sort_keys=False))

✅ YAML file successfully saved to: results/rs_vol.yaml

target: future_rs_vol_60
relations:
  returns_10:
    correlation: 0.119
    mutual_info: -0.072
  abs_returns_10:
    correlation: -0.022
    mutual_info: 0.115
  rs_vol_60:
    correlation: -0.307
    mutual_info: 1.0
  rs_vol_120:
    correlation: -0.388
    mutual_info: 0.601
  linear_slope_60:
    correlation: 0.212
    mutual_info: 0.056
  linear_slope_120:
    correlation: 0.08
    mutual_info: 0.254
  tail_returns_100:
    correlation: 0.149
    mutual_info: -0.184
  tail_vol_100:
    correlation: -0.236
    mutual_info: -0.225
  hurst_100:
    correlation: 0.065
    mutual_info: -0.01
  skew_100:
    correlation: -0.213
    mutual_info: 0.223
  bimodality_coef_200:
    correlation: 0.036
    mutual_info: 0.019
  price_dist_0_25_100:
    correlation: 0.112
    mutual_info: -0.201
  price_dist_75_100_100:
    correlation: -0.047
    mutual_info: 0.055
  future_rs_vol_60:
    correlation: 1.0
    mutual_info: 1.0
meta:
  com