In [None]:
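import subprocess
import sys

# On Colab, install dependencies with the running interpreter's pip.
if "google.colab" in sys.modules:
    subprocess.run([sys.executable, "-m", "pip", "install", "-q", "pandas", "numpy", "scikit-learn", "requests", "pydantic", "jsonschema"], check=True)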


# Engineer Features from Experiments

**What:** Add normalized metrics, categorical indicators, and time-based fields to the synthetic experiment data.

**Why:** Simple features speed up baseline modeling and reveal patterns without heavy tooling.

**How:** Run the install cell if needed, make sure the cleaned data exists, then execute the cells in order. Feature engineering here means creating informative columns from existing ones, such as a normalized score or the day of the week.

**You will learn:** How to create lightweight features that can feed into models or visual summaries.

By the end of this notebook, you will have a feature table that meets the success criteria below.

### Success criteria
- You computed normalized metrics.
- You derived categorical/time features.
- You prepared a feature-rich table for downstream use.

In [None]:
from pathlib import Path


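def find_data_dir() -> Path:
    """Return the nearest data directory that contains the experiments CSV."""
    cwd = Path.cwd()
    # Look in the working directory, then up to two levels above it.
    candidates = [cwd / "data", cwd.parent / "data", cwd.parent.parent / "data"]
    for candidate in candidates:
        # Check for the file this notebook actually loads.
        if (candidate / "sample_tabular" / "experiments_sample.csv").exists():
            return candidate
    raise FileNotFoundError("data directory not found. Run scripts/generate_synthetic_data.py.")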

DATA_DIR = find_data_dir()


In [None]:
import pandas as pd

experiments = pd.read_csv(DATA_DIR / "sample_tabular" / "experiments_sample.csv")
experiments["timestamp"] = pd.to_datetime(experiments["timestamp"])

# Z-score: center on the mean and scale by the sample standard deviation.
metric = experiments["metric_value"]
experiments["metric_normalized"] = (metric - metric.mean()) / metric.std()
# Time-based feature: weekday name of each timestamp.
experiments["day_of_week"] = experiments["timestamp"].dt.day_name()
# na=False keeps the column strictly boolean when a condition is missing.
experiments["is_treatment"] = experiments["condition"].str.contains("treatment", regex=False, na=False)
experiments[["experiment_id", "condition", "metric_value", "metric_normalized", "day_of_week", "is_treatment"]].head()


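The same ideas extend to a few more lightweight features. The sketch below is not part of the original notebook: it uses made-up toy rows in place of `experiments_sample.csv`, and the `hour` and `is_weekend` columns are illustrative names chosen here. It checks that the z-score behaves as expected and shows one possible way to build indicator columns with `pd.get_dummies`.

```python
import pandas as pd

# Toy rows standing in for experiments_sample.csv; all values are made up.
toy = pd.DataFrame(
    {
        "condition": ["control", "treatment", "treatment", "control"],
        "metric_value": [1.0, 2.0, 3.0, 4.0],
        "timestamp": pd.to_datetime(
            ["2024-01-01 09:00", "2024-01-02 14:30", "2024-01-06 08:15", "2024-01-07 21:45"]
        ),
    }
)

# Z-score with the same formula as the cell above (pandas .std() defaults to ddof=1).
metric = toy["metric_value"]
toy["metric_normalized"] = (metric - metric.mean()) / metric.std()

# Extra time-based fields (illustrative, not in the original notebook).
toy["hour"] = toy["timestamp"].dt.hour
toy["is_weekend"] = toy["timestamp"].dt.dayofweek >= 5  # Monday is 0, so 5 and 6 are the weekend

# One indicator column per distinct condition value.
indicators = pd.get_dummies(toy["condition"], prefix="condition")
features = pd.concat([toy, indicators], axis=1)

# After z-scoring, the column should have mean ~0 and sample std ~1.
print(features["metric_normalized"].mean(), features["metric_normalized"].std())
```

If `metric_value` were constant, `.std()` would be zero and the normalized column would come out as NaN, so it may be worth checking for that before modeling.
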
### If you get stuck / What to try next

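**If you get stuck:** confirm that the cleaning notebook ran and that `experiments_sample.csv` exists under `data/sample_tabular`, then check that the dependencies installed.

**What to try next:** prototype simple models on the new columns, or export the features to the Streamlit app for quick views.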
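
As a rough aid for the first check, the sketch below reads only the CSV header and reports which columns used in this notebook are absent. `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names introduced here; the column list is taken from the feature cell above, and the real file may contain more.

```python
import csv
from pathlib import Path

# Columns the feature cell reads; taken from this notebook's code.
REQUIRED_COLUMNS = {"experiment_id", "condition", "metric_value", "timestamp"}


def missing_columns(csv_path: Path) -> set[str]:
    """Return the required columns that are absent from the CSV header (hypothetical helper)."""
    if not csv_path.exists():
        raise FileNotFoundError(f"{csv_path} not found; rerun the cleaning notebook.")
    with csv_path.open(newline="") as handle:
        # Only the first row is needed; an empty file yields an empty header.
        header = next(csv.reader(handle), [])
    return REQUIRED_COLUMNS - set(header)


# Example call, assuming DATA_DIR was set by the cell near the top of the notebook:
# missing_columns(DATA_DIR / "sample_tabular" / "experiments_sample.csv")
```

An empty set means every expected column is present; anything else names the columns to look for in the cleaning step.
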