## SkewSentry Tutorial Notebook

This notebook shows how to use SkewSentry to check training ↔ serving feature parity end-to-end using the programmatic API.

What you'll do:
- Create a small example dataset
- Load a FeatureSpec from YAML
- Use PythonFunctionAdapter for offline/online feature functions
- Run the parity check and view text/HTML/JSON reports

Prereqs:
- You have this repo installed in editable mode (e.g., `uv pip install -e ".[dev]"`).
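
To make "offline/online feature functions" concrete, here is a minimal sketch of what such a pair could look like and how a mismatch between them shows up. The function bodies, the `spend` feature, and the truncation bug are hypothetical and are not taken from the repo's `examples/simple` modules; the comparison uses plain pandas and NumPy rather than SkewSentry itself.

```python
import numpy as np
import pandas as pd


def build_features_offline(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical batch (training) path: vectorized spend = price * qty."""
    out = df[["user_id", "ts"]].copy()
    out["spend"] = df["price"] * df["qty"]
    return out


def get_features_online(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical serving path: row by row, with a deliberate bug
    (price is truncated to an integer before multiplying)."""
    rows = []
    for rec in df.to_dict("records"):
        rows.append({
            "user_id": rec["user_id"],
            "ts": rec["ts"],
            "spend": int(rec["price"]) * rec["qty"],
        })
    return pd.DataFrame(rows)


def mismatch_rate(a: pd.Series, b: pd.Series, atol: float = 1e-9) -> float:
    """Fraction of rows where two aligned feature columns differ beyond tolerance."""
    return float((~np.isclose(a.to_numpy(), b.to_numpy(), atol=atol)).mean())


raw = pd.DataFrame({
    "user_id": [1, 2, 3],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-01"]),
    "price": [10.0, 5.5, 1.0],
    "qty": [1, 2, 1],
})

offline_df = build_features_offline(raw)
online_df = get_features_online(raw)
rate = mismatch_rate(offline_df["spend"], online_df["spend"])
print(f"spend mismatch rate: {rate:.2f}")
```

Only the row with a fractional price disagrees between the two paths. This is the kind of training/serving difference a parity check is meant to surface.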


In [None]:
import numpy as np
import pandas as pd

from skewsentry.spec import FeatureSpec
from skewsentry.adapters.python_func import PythonFunctionAdapter
from skewsentry.runner import run_check

# References to the example feature functions, written as 'module:function'
OFFLINE = 'offline_features:build_features'
ONLINE = 'online_features:get_features'

# Create a tiny dataset
np.random.seed(0)

df = pd.DataFrame({
    'user_id': [1,1,2,2,3,3,3],
    'ts': pd.to_datetime(['2024-01-01','2024-01-02','2024-01-01','2024-01-03','2024-01-01','2024-01-02','2024-01-03']),
    'price': [10, 10, 5, 5, 1, 1, 1],
    'qty':   [ 1,  2, 2, 2, 1, 1, 1],
    'country': ['UK','UK','US','US','DE','DE','DE'],
})

df.head()


   user_id         ts  price  qty country
0        1 2024-01-01     10    1      UK
1        1 2024-01-02     10    2      UK
2        2 2024-01-01      5    2      US
3        2 2024-01-03      5    2      US
4        3 2024-01-01      1    1      DE


In [None]:
# Resolve paths from the repo root so the notebook works from any working directory
from pathlib import Path
import sys

# Walk up from the current directory until the example spec is found
REL_SPEC = Path('examples') / 'simple' / 'features.yml'
REPO = next((p for p in (Path.cwd(), *Path.cwd().parents) if (p / REL_SPEC).exists()), Path.cwd())
SPEC_PATH = REPO / REL_SPEC
ROOT = SPEC_PATH.parent  # examples/simple
assert SPEC_PATH.exists(), f"Missing spec file: {SPEC_PATH}"

# Make example modules importable
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

# Ensure artifacts directory exists for outputs
(REPO / 'artifacts').mkdir(exist_ok=True)



In [None]:
# Load the spec and adapters
spec = FeatureSpec.from_yaml(str(SPEC_PATH))

offline = PythonFunctionAdapter(OFFLINE)
online = PythonFunctionAdapter(ONLINE)

report = run_check(
    spec=spec,
    data=df,
    offline=offline,
    online=online,
    sample=None,
    html_out=str(REPO / 'artifacts' / 'parity_report.html'),
    json_out=str(REPO / 'artifacts' / 'parity_report.json'),
)

report.ok, report.summary['failing_features']



In [None]:
# View a concise text summary
print(report.to_text())
print('HTML:', str(REPO / 'artifacts' / 'parity_report.html'))
print('JSON:', str(REPO / 'artifacts' / 'parity_report.json'))


### Artifacts

- JSON: `artifacts/parity_report.json`
- HTML: `artifacts/parity_report.html`

Open the HTML file in a browser to read the formatted report. If you run this check in a CI pipeline, you can attach the HTML report to the PR as an artifact.
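
One way to wire the check into CI is to turn the result into a process exit code. The sketch below relies only on the two attributes used earlier in this notebook (`report.ok` and `report.summary['failing_features']`). The `exit_code_for` helper, the `SimpleNamespace` stand-in reports, and the `spend_7d` feature name are hypothetical, and the stand-ins assume `failing_features` is a list, which this notebook does not confirm.

```python
from types import SimpleNamespace


def exit_code_for(report) -> int:
    """Return 0 when the parity check passed, 1 otherwise."""
    if report.ok:
        return 0
    # Print the failing features so they show up in the CI log.
    print("Parity check failed for:", report.summary["failing_features"])
    return 1


# Stand-in objects that mimic only the attributes used above (assumed shapes).
passing_report = SimpleNamespace(ok=True, summary={"failing_features": []})
failing_report = SimpleNamespace(ok=False, summary={"failing_features": ["spend_7d"]})

print(exit_code_for(passing_report))
print(exit_code_for(failing_report))
```

In a real CI script you would pass the object returned by `run_check` and finish with `sys.exit(exit_code_for(report))` so that a failing check fails the job.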

Tip: Set `sample` (left as `None` above) to run the check on a subset of the data for faster iteration.
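
If you would rather control the subset yourself, you can also subsample the DataFrame before passing it as `data`. The sketch below is plain pandas and does not describe how the `sample` argument behaves; the `sample_users` helper is hypothetical. It keeps every row for each selected user, on the assumption that per-user features need a user's full history to be computed.

```python
import pandas as pd


def sample_users(df: pd.DataFrame, n_users: int, seed: int = 0) -> pd.DataFrame:
    """Keep all rows for a random subset of users."""
    users = df["user_id"].drop_duplicates().sample(n=n_users, random_state=seed)
    return df[df["user_id"].isin(users)].reset_index(drop=True)


# Same shape as the tutorial dataset, reduced to the columns needed here.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 3],
    "price": [10, 10, 5, 5, 1, 1, 1],
    "qty": [1, 2, 2, 2, 1, 1, 1],
})

subset = sample_users(events, n_users=2)
print(subset["user_id"].nunique(), "users,", len(subset), "rows")
```

Sampling by user rather than by row avoids comparing features computed from partial histories, which could otherwise look like skew.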
