### SP500 Stock Demo — Notebook 05: Incremental Retraining and Compare

Use this notebook as a live “how models evolve” demo: we shift the time window forward, retrain on fresher data, and compare the new model with the previously registered version. If the new model wins, we register a new version and set it as default.

What you will show (talk track)
- Why retrain: data and regimes change; we regularly refresh models on the latest window
- How we define windows: old cutoff at max(TS) − 30 days; new cutoff shifts forward by 15 days
- What changes: data in [max(TS) − 30d, max(TS) − 15d) moves from test into train; the newest 15 days become the new test
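- How we compare: compute RMSE/R² for each model on its own test window (the registered version on the old test, the retrained model on the new test) and register only if RMSE improves; because the windows differ, treat the comparison as indicative rather than strictly like-for-like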

Outputs
- Printed cutoffs for old vs new training/test windows
- Side-by-side metrics for old (registry) vs new (freshly trained)
- Optional new registry version if improved


In [None]:
# 0) Imports and session/context
# - Attach to the active Snowflake session used by this notebook
# - Set warehouse, database, schema
# - Load ML libraries for retraining and registry
from snowflake.snowpark.context import get_active_session
from snowflake.snowpark.functions import col, lead, avg, sqrt
from snowflake.snowpark.functions import abs as sp_abs
from snowflake.snowpark.functions import pow as sp_pow
from snowflake.snowpark import Window
from snowflake.ml.modeling.pipeline import Pipeline
from snowflake.ml.modeling.xgboost import XGBRegressor
from snowflake.ml.registry import Registry

session = get_active_session()
session.sql("USE WAREHOUSE DEMO_WH_M").collect()
session.sql("USE DATABASE SP500_STOCK_DEMO").collect()
session.sql("USE SCHEMA DATA").collect()

# Note: retraining uses the same feature table `PRICE_FEATURES` as Notebook 03.
# If you enriched features (V2), ensure they are present in `PRICE_FEATURES`.


### Step 1 — Define dataset and rolling windows

- We reuse `PRICE_FEATURES` and construct the same 3‑month-ahead label (`TARGET_PCT_3M`) as in Notebook 03.
- Old window: cutoff_old = max(TS) − 30 days; train_old = TS < cutoff_old; test_old = TS ≥ cutoff_old.
- New window: cutoff_new = max(TS) − 15 days; train_new = TS < cutoff_new; test_new = TS ≥ cutoff_new.
- What’s “new data”: rows where TS ∈ [cutoff_old, cutoff_new) move from the old test into the new train; rows with TS ∈ [cutoff_new, max(TS)] form the new test.

Demo tip: call out how little code change is needed to move the window forward while leaving the feature and target logic intact.
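
To make the boundary handling concrete, here is a minimal pure-Python sketch of the same split rule (`TS < cutoff` for train, `TS >= cutoff` for test). The daily timestamps and the helper `split_by_cutoff` are hypothetical stand-ins for the hourly `PRICE_FEATURES` rows; only the cutoff arithmetic matches the notebook.

```python
from datetime import datetime, timedelta

def split_by_cutoff(timestamps, cutoff):
    """Apply the notebook's rule: train is TS < cutoff, test is TS >= cutoff."""
    train = [ts for ts in timestamps if ts < cutoff]
    test = [ts for ts in timestamps if ts >= cutoff]
    return train, test

# Hypothetical data: one timestamp per day for 60 days (the real table is hourly per ticker)
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(days=d) for d in range(60)]
max_ts = max(timestamps)

cutoff_old = max_ts - timedelta(days=30)
cutoff_new = max_ts - timedelta(days=15)

train_old, test_old = split_by_cutoff(timestamps, cutoff_old)
train_new, test_new = split_by_cutoff(timestamps, cutoff_new)

# Rows that sat in the old test set and now belong to the new training set
moved = sorted(set(test_old) & set(train_new))

print({'moved_rows': len(moved), 'first_moved': moved[0], 'last_moved': moved[-1]})
```

The moved rows start exactly at `cutoff_old` (inclusive) and stop just before `cutoff_new` (exclusive), which is why the interval is half-open on the right.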


In [None]:
# 1) Define base dataset and time windows
# - Reuse PRICE_FEATURES and create the same label as in Notebook 03
hourly = session.table('PRICE_FEATURES')
win_order = Window.partition_by('TICKER').order_by(col('TS'))
horizon_hours = 378  # ~3 months ahead label

# Build labeled dataset
ds = (
    hourly
    .with_column('FUT_CLOSE', lead(col('CLOSE'), horizon_hours).over(win_order))  # future close
    .with_column('TARGET_PCT_3M', (col('FUT_CLOSE')/col('CLOSE') - 1))           # 3M-ahead return
    .drop('FUT_CLOSE')
    .filter(col('TARGET_PCT_3M').is_not_null())
)

# Compute cutoffs relative to the labeled dataset's max TS (avoids empty test due to lead horizon)
from datetime import timedelta
from snowflake.snowpark.functions import max as sp_max
max_ts_ds = ds.select(sp_max(col('TS')).alias('mx')).collect()[0]['MX']

# Old split used in V_1 (train_old/test_old)
cutoff_old = max_ts_ds - timedelta(days=30)
train_old = ds.filter(col('TS') < cutoff_old)
test_old = ds.filter(col('TS') >= cutoff_old)

# New split (shift forward by 15 days): move some recent data from test_old into train_new
cutoff_new = max_ts_ds - timedelta(days=15)
train_new = ds.filter(col('TS') < cutoff_new)
test_new = ds.filter(col('TS') >= cutoff_new)

print({'cutoff_old': cutoff_old, 'cutoff_new': cutoff_new,
       'train_old_rows': train_old.count(), 'test_old_rows': test_old.count(),
       'train_new_rows': train_new.count(), 'test_new_rows': test_new.count()})


### Step 2 — Retrain on the new window

- Fixed XGB parameters (keeps the retrain fast/stable for demos); you can swap in the “best” config from Notebook 03.
- Train set is `TS < cutoff_new`; Test set for comparison later is `TS ≥ cutoff_new`.
- Demo tip: emphasize we do not change feature definitions or label—only the time window slides.


In [None]:
# 2) Retrain with fixed XGB params (fast/stable demo setup)
feature_cols = ['RET_1','SMA_5','SMA_20','VOL_20','RSI_PROXY','VOLUME','CLOSE']
label_col = 'TARGET_PCT_3M'
output_col = 'PREDICTED_RETURN'

xgb = XGBRegressor(
    n_estimators=200,     # consider 400–800 + early stopping for production
    max_depth=6,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    input_cols=feature_cols,
    label_cols=[label_col],
    output_cols=[output_col],
    random_state=42,
)

pipe = Pipeline(steps=[('xgb', xgb)])
model_new = pipe.fit(train_new)
print('Retrained on train_new (TS < cutoff_new).')


### Step 3 — Evaluate and compare versions

- We evaluate the previous registry version (V_1) on `test_old` and the newly trained model on `test_new`.
- Both evaluations report RMSE and R², computed in-database to avoid data movement.
- Demo tip: highlight you can add additional splits (rolling CV) and richer metrics as needed.
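
For reference, the two metrics can be written out in a few lines of plain Python. This is only a sketch of the standard definitions, not the `snowflake.ml.modeling.metrics` implementation, and the return values below are made-up numbers. Scoring both models on the same rows, as done here, is one way to get a like-for-like comparison if you want to extend the demo.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - yhat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical 3-month returns and two sets of predictions for the SAME rows
actual = [0.10, -0.05, 0.02, 0.08]
pred_old_model = [0.02, 0.01, 0.03, 0.02]
pred_new_model = [0.08, -0.02, 0.01, 0.06]

metrics_old_demo = {'rmse': rmse(actual, pred_old_model), 'r2': r2(actual, pred_old_model)}
metrics_new_demo = {'rmse': rmse(actual, pred_new_model), 'r2': r2(actual, pred_new_model)}
print({'old': metrics_old_demo, 'new': metrics_new_demo})
```

A model that always predicts the mean of the actuals scores R² = 0, so negative R² means the model is worse than that constant baseline on the evaluated rows.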


In [None]:
# 3) Evaluate and compare with previous registered version (using snowflake.ml.modeling.metrics)
from snowflake.ml.modeling.metrics import mean_squared_error, r2_score

# Helper to compute metrics with built-ins (see docs)
# https://docs.snowflake.com/en/developer-guide/snowpark-ml/reference/latest/modeling#snowflake-ml-modeling-metrics

def compute_metrics(df, y_col, yhat_col):
    rmse = mean_squared_error(df=df, y_true_col_names=y_col, y_pred_col_names=yhat_col, squared=False)
    r2 = r2_score(df=df, y_true_col_name=y_col, y_pred_col_name=yhat_col)
    return {'rmse': rmse, 'r2': r2}

# Score previous registry version on its old test window
pred_old = Registry(session=session, database_name='SP500_STOCK_DEMO', schema_name='DATA')\
    .get_model('XGB_SP500_RET3M').version('V_1').run(test_old, function_name='PREDICT')
metrics_old = compute_metrics(pred_old.select(label_col, output_col), label_col, output_col)

# Score newly retrained model on the current test window
pred_new = model_new.predict(test_new)
metrics_new = compute_metrics(pred_new.select(label_col, output_col), label_col, output_col)
print({'old': metrics_old, 'new': metrics_new})


### Step 4 — Register if improved (idempotent demo flow)

- We only log a new registry version when the new model’s RMSE improves over the old.
- Version naming increments numerically (V_1, V_2, ...). Default is updated to the new version on success.
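- Demo tip: this flow is safe to re-run; if there is no improvement, it skips registration gracefully. If the new model does beat V_1, each re-run logs another version (V_2, V_3, ...), because the comparison baseline stays fixed at V_1.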
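
The version-increment rule can be sketched in isolation. The helpers below (`version_number`, `next_version`, `latest_version`) are hypothetical names, and the sketch assumes every version follows the `V_<n>` pattern. It also shows why versions should be ordered by their numeric suffix and not as plain strings.

```python
def version_number(name):
    """Extract the numeric suffix from a name like 'V_12' (assumes the V_<n> pattern)."""
    return int(name.split('_')[-1])

def next_version(existing):
    """Return 'V_1' for an empty registry entry, else the highest suffix plus one."""
    if not existing:
        return 'V_1'
    return f"V_{max(version_number(v) for v in existing) + 1}"

def latest_version(existing):
    """Return the version with the highest numeric suffix, or None if there are none."""
    return max(existing, key=version_number) if existing else None

versions = ['V_1', 'V_10', 'V_2']
print({
    'next': next_version(versions),
    'latest_numeric': latest_version(versions),
    'latest_lexicographic': sorted(versions)[-1],  # string order puts 'V_2' after 'V_10'
})
```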


In [None]:
# 4) Register new version if improved
reg = Registry(session=session, database_name='SP500_STOCK_DEMO', schema_name='DATA')
model_name = 'XGB_SP500_RET3M'

if metrics_new['rmse'] < metrics_old['rmse']:
    models_df = reg.show_models()
    if models_df.empty or model_name not in models_df['name'].to_list():
        version = 'V_1'
    else:
        import ast, builtins
        # Parse the versions list string and take the highest numeric suffix
        versions_raw = models_df.loc[models_df['name'] == model_name, 'versions'].values[0]
        max_v = builtins.max(int(v.split('_')[-1]) for v in ast.literal_eval(versions_raw))
        version = f'V_{max_v+1}'
    mv = reg.log_model(
        model_new,
        model_name=model_name,
        version_name=version,
        conda_dependencies=['snowflake-ml-python'],
        comment='Incremental retrain (shifted window) — train_new vs test_new',
        metrics={'rmse': metrics_new['rmse'], 'r2': metrics_new['r2']},
        options={'relax_version': False}
    )
    reg.get_model(model_name).default = version
    print({'registered': version})
else:
    print('No registration: new model did not improve RMSE')


### How to reuse this notebook as a repeatable asset

What this retrains on
- Source: `SP500_STOCK_DEMO.DATA.PRICE_FEATURES`
- Label: `TARGET_PCT_3M = lead(CLOSE, 378)/CLOSE − 1` per `TICKER`, where 378 is the `lead` offset in rows (hourly bars), roughly 3 months ahead
- Old training/test: `TS < cutoff_old` / `TS ≥ cutoff_old` with `cutoff_old = max(TS) − 30d`
- New training/test: `TS < cutoff_new` / `TS ≥ cutoff_new` with `cutoff_new = max(TS) − 15d`
- New data: all rows with `TS ∈ [cutoff_old, cutoff_new)` that were previously in test now join training; the latest 15 days become the new test
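
As a small illustration of the label definition, the sketch below computes a forward return over a row offset for a single ticker in plain Python. The function name `forward_return_labels`, the prices, and the horizon of 2 are hypothetical; the notebook itself uses Snowpark's `lead` with an offset of 378 rows.

```python
def forward_return_labels(closes, horizon):
    """For each row i, return closes[i + horizon] / closes[i] - 1, or None if no future row exists."""
    labels = []
    for i, close in enumerate(closes):
        j = i + horizon
        labels.append(closes[j] / close - 1 if j < len(closes) else None)
    return labels

# Hypothetical closing prices for one ticker, ordered by TS
closes = [100.0, 102.0, 101.0, 105.0, 110.0]
labels = forward_return_labels(closes, horizon=2)
print(labels)

# Rows without a future close are dropped, as the notebook does with is_not_null()
labeled_rows = [(c, y) for c, y in zip(closes, labels) if y is not None]
print(len(labeled_rows))
```

The last `horizon` rows of each ticker never receive a label, which is why the labeled dataset ends earlier than the raw feature table.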

To rerun in future demos
- No manual config if `PRICE_FEATURES` exists and `DEMO_WH_M` is available; the notebook auto-computes `max(TS)`.
- If your account uses different objects, change once at the top:
  - Warehouse/DB/Schema in the session setup
  - Table name for features (`PRICE_FEATURES`)
- Optional knobs:
  - Shift sizes: change 30d and 15d to fit your cadence
  - Model params: use the best config from Notebook 03 or keep the fixed demo params
  - Registration rule: switch from RMSE-only to a composite (e.g., RMSE and R²)
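
One possible shape for a composite registration rule is sketched below. The function `should_register` and its parameters `min_rmse_gain` and `require_r2_gain` are illustrative assumptions, not part of the notebook; the metric dictionaries use the same `rmse`/`r2` keys as `compute_metrics`.

```python
def should_register(old, new, min_rmse_gain=0.0, require_r2_gain=True):
    """Decide whether to register the new model.

    old, new: dicts with 'rmse' and 'r2' keys.
    min_rmse_gain: required relative RMSE reduction (0.05 means at least 5% lower).
    require_r2_gain: if True, R² must also increase.
    """
    rmse_ok = new['rmse'] < old['rmse'] * (1.0 - min_rmse_gain)
    r2_ok = new['r2'] > old['r2'] if require_r2_gain else True
    return rmse_ok and r2_ok

# Hypothetical metrics
old_metrics = {'rmse': 0.10, 'r2': 0.20}
new_metrics = {'rmse': 0.09, 'r2': 0.25}

print(should_register(old_metrics, new_metrics))                      # RMSE-and-R² rule
print(should_register(old_metrics, new_metrics, min_rmse_gain=0.20))  # stricter 20% threshold
```

A relative threshold guards against registering a new version on differences that are within noise; the right value depends on how stable the metrics are across windows.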

Demo script (short)
- “We roll the window forward by 15 days; yesterday’s test becomes today’s training. We retrain on `TS < cutoff_new`, evaluate on `TS ≥ cutoff_new`, compare against the last registered version on its original test, and only register a new version if it truly improves.”


### Create versioned predictions with ground truth and per-version monitors

This section mirrors the reference flow: it scores a labeled recent slice for `V_1` and the latest registry version, persists per-version prediction tables with class/score/actuals, builds per-version baselines, and creates two monitors so Snowsight can compare V1 vs the latest.
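
The class/score columns are derived from a regression output, so it helps to see the mapping on its own. The sketch below reproduces the SQL expressions from the next cell in plain Python; the function name `to_monitor_columns` and the sample returns are hypothetical. The score is a logistic squashing of the predicted return with a fixed steepness of 5, so it should be read as a monotone pseudo-score, not a calibrated probability.

```python
import math

def to_monitor_columns(predicted_return, actual_return, steepness=5.0):
    """Map a predicted/actual return pair to the class and score columns used by the monitors."""
    return {
        'PREDICTED_RESPONSE': 1 if predicted_return > 0 else 0,
        'ACTUAL_UP': 1 if actual_return > 0 else 0,
        # Logistic transform: 0.5 at a predicted return of 0, rising toward 1 for larger returns
        'PREDICTED_SCORE': 1.0 / (1.0 + math.exp(-steepness * predicted_return)),
    }

# Hypothetical rows: (predicted return, actual return)
for predicted, actual in [(0.10, 0.04), (0.0, -0.01), (-0.20, 0.03)]:
    print(predicted, actual, to_monitor_columns(predicted, actual))
```

Because the transform is monotone, `PREDICTED_SCORE > 0.5` exactly when `PREDICTED_RESPONSE` is 1.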


In [None]:
# 5) Score labeled recent slice for V_1 and latest; persist per-version GT tables and baselines
from snowflake.snowpark.functions import col
from snowflake.ml.registry import Registry

reg = Registry(session=session, database_name='SP500_STOCK_DEMO', schema_name='DATA')
model_name = 'XGB_SP500_RET3M'

# Discover versions
models_df = reg.show_models()
versions = []
if not models_df.empty and model_name in models_df['name'].to_list():
    import ast
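    # Sort by numeric suffix so 'V_10' ranks after 'V_2' (plain string sort would not)
    versions = sorted(
        ast.literal_eval(models_df.loc[models_df['name']==model_name,'versions'].values[0]),
        key=lambda v: int(v.split('_')[-1]),
    )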
latest_version = versions[-1] if versions else None
print({'available_versions': versions, 'latest_version': latest_version})

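# Use the labeled dataset 'ds' (contains TARGET_PCT_3M)
# Anchor the 60-day slice to the labeled data's max TS (max_ts_ds and timedelta come from cell 1).
# Labels end horizon_hours rows before PRICE_FEATURES' max TS, so a cutoff taken from the raw
# table can leave ds_recent empty.
cutoff_common = max_ts_ds - timedelta(days=60)
ds_recent = ds.filter(col('TS') >= cutoff_common)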

# Helper: score version on labeled data and produce class/score/actuals table

def score_with_labels(version_name: str, out_table: str):
    mdl = reg.get_model(model_name).version(version_name)
    scored = mdl.run(ds_recent, function_name='PREDICT')
    tmp = f"TMP_{out_table}"
    scored.write.save_as_table(tmp, mode='overwrite')
    session.sql(f"""
        CREATE OR REPLACE TABLE {out_table} AS
        SELECT
          TS,
          TICKER,
          PREDICTED_RETURN,
          TARGET_PCT_3M AS ACTUAL_RETURN,
          IFF(TARGET_PCT_3M > 0, 1, 0) AS ACTUAL_UP,
          IFF(PREDICTED_RETURN > 0, 1, 0) AS PREDICTED_RESPONSE,
          1.0/(1.0 + EXP(-5 * PREDICTED_RETURN)) AS PREDICTED_SCORE
        FROM {tmp}
        WHERE TARGET_PCT_3M IS NOT NULL
    """).collect()
    return out_table

v1_table = None
try:
    v1_table = score_with_labels('V_1', 'PREDICTIONS_SP500_RET3M_V1_GT')
    print({'scored_with_gt': 'V_1', 'table': v1_table})
except Exception as e:
    print('V_1 not available; skipping V1 scoring. Reason:', str(e))

v2_table = None
if latest_version and latest_version != 'V_1':
    v2_table = score_with_labels(latest_version, 'PREDICTIONS_SP500_RET3M_V2_GT')
    print({'scored_with_gt': latest_version, 'table': v2_table})

# Baselines per-version (earliest 14 days)
session.sql("USE WAREHOUSE DEMO_WH_M").collect()
session.sql("USE DATABASE SP500_STOCK_DEMO").collect()
session.sql("USE SCHEMA DATA").collect()

if v1_table:
    session.sql(f"""
        CREATE OR REPLACE TABLE BASELINE_PREDICTIONS_V1 AS
        WITH b AS (SELECT MIN(TS) AS mn FROM {v1_table})
        SELECT p.* FROM {v1_table} p, b
        WHERE p.TS >= b.mn AND p.TS < DATEADD('day', 14, b.mn);
    """).collect()

if v2_table:
    session.sql(f"""
        CREATE OR REPLACE TABLE BASELINE_PREDICTIONS_V2 AS
        WITH b AS (SELECT MIN(TS) AS mn FROM {v2_table})
        SELECT p.* FROM {v2_table} p, b
        WHERE p.TS >= b.mn AND p.TS < DATEADD('day', 14, b.mn);
    """).collect()


In [None]:
-- 6) Create per-version monitors with performance metrics enabled
USE WAREHOUSE DEMO_WH_M;
USE DATABASE SP500_STOCK_DEMO;
USE SCHEMA DATA;

-- V1 monitor (classification-style metrics enabled)
CREATE OR REPLACE MODEL MONITOR SP500_RET3M_MONITOR_V1
WITH 
  MODEL = SP500_STOCK_DEMO.DATA.XGB_SP500_RET3M,
  VERSION = 'V_1',
  FUNCTION = 'PREDICT',
  SOURCE = SP500_STOCK_DEMO.DATA.PREDICTIONS_SP500_RET3M_V1_GT,
  BASELINE = SP500_STOCK_DEMO.DATA.BASELINE_PREDICTIONS_V1,
  WAREHOUSE = DEMO_WH_M,
  REFRESH_INTERVAL = '1 DAY',
  AGGREGATION_WINDOW = '7 DAYS',
  TIMESTAMP_COLUMN = TS,
  ID_COLUMNS = ('TICKER'),
  PREDICTION_CLASS_COLUMNS = ('PREDICTED_RESPONSE'),
  ACTUAL_CLASS_COLUMNS = ('ACTUAL_UP'),
  PREDICTION_SCORE_COLUMNS = ('PREDICTED_SCORE');

-- V2/latest monitor (adjust VERSION if your latest isn't V_2)
CREATE OR REPLACE MODEL MONITOR SP500_RET3M_MONITOR_V2
WITH 
  MODEL = SP500_STOCK_DEMO.DATA.XGB_SP500_RET3M,
  VERSION = 'V_2',
  FUNCTION = 'PREDICT',
  SOURCE = SP500_STOCK_DEMO.DATA.PREDICTIONS_SP500_RET3M_V2_GT,
  BASELINE = SP500_STOCK_DEMO.DATA.BASELINE_PREDICTIONS_V2,
  WAREHOUSE = DEMO_WH_M,
  REFRESH_INTERVAL = '1 DAY',
  AGGREGATION_WINDOW = '7 DAYS',
  TIMESTAMP_COLUMN = TS,
  ID_COLUMNS = ('TICKER'),
  PREDICTION_CLASS_COLUMNS = ('PREDICTED_RESPONSE'),
  ACTUAL_CLASS_COLUMNS = ('ACTUAL_UP'),
  PREDICTION_SCORE_COLUMNS = ('PREDICTED_SCORE');

SHOW MODEL MONITORS LIKE 'SP500_RET3M_MONITOR%';
