# 03 — Baseline Backtest (Opening Range Strategy)

**Goal of Notebook 3:** turn the audited days from `valid_days.csv` into a clean, reproducible **baseline backtest**:
- One decision per day at **10:22**, based on the **Opening Range (OR) 09:30–10:00 (inclusive)**:
  - **Long** if the 10:22 close is in the **top 35%** of the OR
  - **Short** if it is in the **bottom 35%**
  - **No trade** otherwise
- **Risk:** SL = 25 pts, TP = 75 pts, **forced exit at 12:00**.
- **One trade max per day**.
- Compute P&L in **points**, then convert to **$** via config `point_value_usd`.  
- No slippage/fees at first (we’ll add them later as toggles).
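
As a rough sanity check on the risk settings above, the sketch below converts the stop and target into dollars and derives the break-even win rate. It assumes every trade ends at one of the two barriers (forced 12:00 exits are ignored) and uses 80 USD per point, which is only the fallback default that appears later in this notebook. The function name is hypothetical.

```python
def breakeven_win_rate(sl_pts: float, tp_pts: float) -> float:
    """Win rate at which expected PnL is zero if every trade ends at SL or TP."""
    return sl_pts / (sl_pts + tp_pts)

SL_PTS, TP_PTS = 25, 75      # baseline risk settings from the list above
POINT_VALUE_USD = 80.0       # assumed fallback value; the real one comes from config

loss_usd = SL_PTS * POINT_VALUE_USD    # dollars lost on a stop-out, per unit of size
gain_usd = TP_PTS * POINT_VALUE_USD    # dollars gained on a target hit
be = breakeven_win_rate(SL_PTS, TP_PTS)

print(f"stop-out: -${loss_usd:,.0f}  target: +${gain_usd:,.0f}  break-even win rate: {be:.0%}")
```

With a 1:3 stop-to-target ratio the break-even win rate is 25% under these assumptions; time exits and, later, costs will shift that figure.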

### 3.1 — Baseline Backtest: scope & inputs (read-first)

**This section (3.1) just:**
1) Loads config + `valid_days.csv` (from Notebook 2).  
2) Prepares a map from **year → raw CSV path** to quickly fetch minute bars per day.  
3) Shows a tiny summary so we know how many days will be simulated.

Next section (3.2): we’ll implement a **small loader** that returns the **09:30–12:00** minute bars for a single date, ready for the trade logic.
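
The keys read by the loading cell imply a config layout roughly like the one below. This is a sketch inferred from the `.get(...)` lookups in this notebook, written as Python dicts rather than YAML. The values are the notebook's own fallback defaults, the `dig` helper and the `slippage_pts` key are hypothetical, and the real `strategy.yml` / `instruments.yml` may contain more fields.

```python
# Hypothetical contents of the two config files, expressed as the dicts
# yaml.safe_load would return. Keys mirror the lookups used in this notebook.
INSTR_EXAMPLE = {
    "session": {
        "timezone": "America/New_York",
        "or_window": {"start": "09:30", "end_inclusive": "10:00"},
        "entry_time": "10:22",
        "hard_exit_time": "12:00",
    },
    "market": {"point_value_usd": 80.0, "position_size": 1.0},
    "data": {"delimiter": ";", "datetime_format": "%Y%m%d %H%M%S", "source_timezone": None},
}
STRATEGY_EXAMPLE = {
    "parameters": {
        "zones": {"top_pct": 0.35, "bottom_pct": 0.35},
        "risk": {"stop_loss_points": 25, "take_profit_points": 75},
    },
    "reporting": {"initial_capital": 100_000},
}

def dig(cfg: dict, *keys, default=None):
    """Walk nested dict keys, returning `default` as soon as one is missing."""
    node = cfg
    for k in keys:
        if not isinstance(node, dict) or k not in node:
            return default
        node = node[k]
    return node

print(dig(INSTR_EXAMPLE, "session", "or_window", "end_inclusive"))      # 10:00
print(dig(STRATEGY_EXAMPLE, "parameters", "risk", "stop_loss_points"))  # 25
print(dig(INSTR_EXAMPLE, "costs", "slippage_pts", default=0.0))         # 0.0 (section absent)
```

A helper like `dig` is one alternative to chained `.get(..., {})` calls; both approaches fall back silently, so a misspelled key yields the default rather than an error.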


In [28]:
# 3.1 — Load config, audited days, and wire up file references

from pathlib import Path
import pandas as pd
import yaml

# --- Paths (works whether you're inside notebooks/ or repo root) ---
ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()
CONFIG_DIR = ROOT / "config"
DATA_RAW_DIR = ROOT / "data" / "raw"
REPORTS_TBLS = ROOT / "reports" / "tables"

# --- Configs (single source of truth) ---
def load_yaml(p: Path):
    with open(p, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

STRATEGY = load_yaml(CONFIG_DIR / "strategy.yml")
INSTR    = load_yaml(CONFIG_DIR / "instruments.yml")

# Frequently-used fields
session   = INSTR.get("session", {})
market    = INSTR.get("market", {})
costs     = INSTR.get("costs", {})
data_cfg  = INSTR.get("data", {})
params    = STRATEGY.get("parameters", {})

OR_START  = session.get("or_window", {}).get("start", "09:30")
OR_END    = session.get("or_window", {}).get("end_inclusive", "10:00")  # inclusive
ENTRY_T   = session.get("entry_time", "10:22")
EXIT_T    = session.get("hard_exit_time", "12:00")

TOP_PCT   = params.get("zones", {}).get("top_pct", 0.35)
BOT_PCT   = params.get("zones", {}).get("bottom_pct", 0.35)
SL_PTS    = params.get("risk", {}).get("stop_loss_points", 25)
TP_PTS    = params.get("risk", {}).get("take_profit_points", 75)
POINT_VAL = market.get("point_value_usd", 80.0)

# --- Audited trading days from Notebook 2 ---
valid_path = REPORTS_TBLS / "valid_days.csv"
valid_days = pd.read_csv(valid_path, parse_dates=["date"])
valid_days = valid_days.sort_values("date").reset_index(drop=True)

# --- Map year -> raw CSV path (assumes one file per year, like your current naming) ---
import re

year_files = {}
for p in sorted(DATA_RAW_DIR.glob("*.csv")):
    # first standalone 4-digit year (19xx/20xx) in the filename, e.g. "..._M1_2020.csv"
    m = re.search(r"(?<!\d)(?:19|20)\d{2}(?!\d)", p.name)
    if m:
        year_files[int(m.group())] = p

# --- Tiny summary so we know our inputs are sane ---
summary = pd.DataFrame.from_dict({
    "valid_days_count": [len(valid_days)],
    "years_present":    [sorted(valid_days["date"].dt.year.unique().tolist())],
    "entry_time":       [ENTRY_T],
    "exit_time":        [EXIT_T],
    "or_window":        [f"{OR_START}–{OR_END} (incl)"],
    "zones(top/bot)":   [f"{TOP_PCT:.2f}/{BOT_PCT:.2f}"],
    "risk(SL/TP pts)":  [f"{SL_PTS}/{TP_PTS}"],
    "$ per point":      [POINT_VAL],
})

print("ROOT:", ROOT)
print("Raw files found by year:")
for y in sorted(year_files):
    print(f"  {y}: {year_files[y].name}")

display(summary.head(1))
display(valid_days.head(5))


ROOT: d:\Projects\OpeningRange
Raw files found by year:
  2020: DAT_ASCII_NSXUSD_M1_2020.csv
  2021: DAT_ASCII_NSXUSD_M1_2021.csv
  2022: DAT_ASCII_NSXUSD_M1_2022.csv
  2023: DAT_ASCII_NSXUSD_M1_2023.csv


| | valid_days_count | years_present | entry_time | exit_time | or_window | zones(top/bot) | risk(SL/TP pts) | $ per point |
|---|---|---|---|---|---|---|---|---|
| 0 | 902 | [2020, 2021, 2022, 2023] | 10:22 | 12:00 | 09:30–10:00 (incl) | 0.35/0.35 | 25/75 | 80.0 |


| | file | date | or_count | or_high | or_low | or_range | trade_count | expected_trade_minutes | missing_minutes | duplicate_minutes | has_entry_1022 | has_exit_1200 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | DAT_ASCII_NSXUSD_M1_2020.csv | 2020-01-03 | 31 | 8822.17 | 8752.1 | 70.07 | 151 | 151 | 0 | 0 | True | True |
| 1 | DAT_ASCII_NSXUSD_M1_2020.csv | 2020-01-06 | 31 | 8783.99 | 8712.49 | 71.5 | 151 | 151 | 0 | 0 | True | True |
| 2 | DAT_ASCII_NSXUSD_M1_2020.csv | 2020-01-07 | 31 | 8859.59 | 8816.84 | 42.75 | 151 | 151 | 0 | 0 | True | True |
| 3 | DAT_ASCII_NSXUSD_M1_2020.csv | 2020-01-08 | 31 | 8879.5 | 8831.97 | 47.53 | 151 | 151 | 0 | 0 | True | True |
| 4 | DAT_ASCII_NSXUSD_M1_2020.csv | 2020-01-09 | 31 | 9003.51 | 8977.64 | 25.87 | 151 | 151 | 0 | 0 | True | True |


### 3.2 — Day loader (returns 09:30–12:00 window and OR slice for a given date)
**What this does:**  
Given a date like `YYYY-MM-DD`, it:
1) Finds the correct year CSV in `data/raw/`.  
2) Parses timestamps (assumes NY local unless `source_timezone` is set).  
3) Slices the **Opening Range** (09:30–10:00 inclusive) and the **trade window** (09:30–12:00 inclusive).  
4) Computes quick QC (missing/duplicate minutes, OR stats, 10:22/12:00 present).  
5) Returns:
   - `win` → DataFrame for **09:30–12:00** (1-min OHLC),
   - `or_slice` → DataFrame for **09:30–10:00**,
   - `qc` → dict with checks and OR stats.

> **How to use (after running the cell):**  
> `win, or_slice, qc = load_day_window("2020-01-03")`
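
The QC counts in step 4 follow from an inclusive minute grid. The sketch below rebuilds that grid with the standard library only and runs the missing/duplicate checks on a toy set of bars; it is an illustration of the idea, not the loader itself (which uses pandas and timezone-aware indexes), and the `expected_minutes` name is hypothetical.

```python
from datetime import datetime, timedelta

def expected_minutes(day: str, start: str, end: str) -> list:
    """Every minute from start to end (both inclusive) on the given day."""
    t0 = datetime.strptime(f"{day} {start}", "%Y-%m-%d %H:%M")
    t1 = datetime.strptime(f"{day} {end}", "%Y-%m-%d %H:%M")
    n = int((t1 - t0).total_seconds() // 60) + 1   # +1 because the end bar is included
    return [t0 + timedelta(minutes=i) for i in range(n)]

or_grid  = expected_minutes("2020-01-03", "09:30", "10:00")
win_grid = expected_minutes("2020-01-03", "09:30", "12:00")
print(len(or_grid), len(win_grid))   # 31 151

# Toy "observed" bars: drop 10:22 and repeat 09:31 to trigger both checks
observed   = [t for t in win_grid if t.strftime("%H:%M") != "10:22"] + [win_grid[1]]
missing    = sorted(set(win_grid) - set(observed))
duplicates = len(observed) - len(set(observed))
print([t.strftime("%H:%M") for t in missing], duplicates)   # ['10:22'] 1
```

This is why a complete day reports 31 OR bars and 151 trade-window bars: both endpoints are counted.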


In [None]:
# 3.2 — Day loader (UPDATED: caches each year's data; PREVIEW switch)

from pathlib import Path
import pandas as pd
import pytz
import yaml
from functools import lru_cache

# Toggle quick prints/heads
PREVIEW = False  # set True only when debugging a specific day

# -- Paths (works whether you run from notebooks/ or repo root)
ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()
CONFIG_DIR = ROOT / "config"
DATA_RAW_DIR = ROOT / "data" / "raw"

# -- Load configs
def _load_yaml(p: Path):
    with open(p, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)

INSTR    = _load_yaml(CONFIG_DIR / "instruments.yml")
STRATEGY = _load_yaml(CONFIG_DIR / "strategy.yml")

session   = INSTR.get("session", {})
data_cfg  = INSTR.get("data", {})

TZ_MARKET   = session.get("timezone", "America/New_York")
OR_START    = session.get("or_window", {}).get("start", "09:30")
OR_END      = session.get("or_window", {}).get("end_inclusive", "10:00")  # inclusive
ENTRY_T     = session.get("entry_time", "10:22")
EXIT_T      = session.get("hard_exit_time", "12:00")

DELIM        = data_cfg.get("delimiter", ";")
DATETIME_FMT = data_cfg.get("datetime_format", "%Y%m%d %H%M%S")
SRC_TZ_NAME  = data_cfg.get("source_timezone")  # None ⇒ already NY local

NY = pytz.timezone(TZ_MARKET)
SRC_TZ = pytz.timezone(SRC_TZ_NAME) if SRC_TZ_NAME else None

# -- Map year -> CSV once
import re

year_files = {}
for p in sorted(DATA_RAW_DIR.glob("*.csv")):
    # first standalone 4-digit year (19xx/20xx) in the filename
    m = re.search(r"(?<!\d)(?:19|20)\d{2}(?!\d)", p.name)
    if m:
        year_files[int(m.group())] = p

def _parse_index(ts: pd.Series) -> pd.DatetimeIndex:
    idx = pd.to_datetime(ts, format=DATETIME_FMT, errors="coerce")
    if SRC_TZ is not None:
        idx = idx.dt.tz_localize(SRC_TZ, nonexistent="NaT", ambiguous="NaT").dt.tz_convert(NY)
    else:
        idx = idx.dt.tz_localize(NY, nonexistent="NaT", ambiguous="NaT")
    return pd.DatetimeIndex(idx)

@lru_cache(maxsize=8)
def _load_year_df(year: int) -> pd.DataFrame:
    """Read + type + tz-localize/convert ONCE per year, then reuse from cache."""
    fp = year_files[year]
    raw = pd.read_csv(fp, sep=DELIM, header=None)
    if raw.shape[1] != 6:
        raise ValueError(f"{fp.name}: expected 6 columns, found {raw.shape[1]}.")
    raw.columns = ["datetime","open","high","low","close","volume"]
    for c in ["open","high","low","close","volume"]:
        raw[c] = pd.to_numeric(raw[c], errors="coerce")
    idx = _parse_index(raw["datetime"])
    df = raw.drop(columns=["datetime"])
    df.index = idx
    return df[~df.index.isna()].sort_index()

def _expected_index_local(day: pd.Timestamp, start_str: str, end_str: str) -> pd.DatetimeIndex:
    start = NY.localize(pd.Timestamp.combine(day.date(), pd.Timestamp(start_str).time()))
    end   = NY.localize(pd.Timestamp.combine(day.date(), pd.Timestamp(end_str).time()))
    # "min" replaces the deprecated "T" alias (pandas >= 2.2 warns on "T")
    return pd.date_range(start=start, end=end, freq="min", tz=NY)

def load_day_window(date_str: str):
    """
    Returns:
      win:      09:30–12:00 inclusive (DataFrame with OHLC/volume)
      or_slice: 09:30–10:00 inclusive
      qc:       dict with OR stats and window integrity checks
    """
    day = pd.to_datetime(date_str).tz_localize(NY)  # anchor in NY
    y = int(day.year)
    if y not in year_files:
        raise FileNotFoundError(f"No CSV file found for year {y} in {DATA_RAW_DIR}")
    df_year = _load_year_df(y)  # <-- cached

    day_start = NY.localize(pd.Timestamp(day.date()))
    next_day  = day_start + pd.Timedelta(days=1)
    one = df_year[(df_year.index >= day_start) & (df_year.index < next_day)]

    # OR and trade window
    or_slice = one.between_time(OR_START, OR_END, inclusive="both")
    win      = one.between_time("09:30", EXIT_T, inclusive="both")

    # expected minute grid & QC
    tgt_or   = _expected_index_local(day, OR_START, OR_END)
    tgt_win  = _expected_index_local(day, "09:30", EXIT_T)

    missing  = tgt_win.difference(win.index)
    dupes    = int(win.index.duplicated().sum())
    has_1022 = any(win.index.time == pd.Timestamp(ENTRY_T).time())
    has_1200 = any(win.index.time == pd.Timestamp(EXIT_T).time())

    or_high  = float(or_slice["high"].max()) if not or_slice.empty else None
    or_low   = float(or_slice["low"].min())  if not or_slice.empty else None
    or_range = (or_high - or_low) if (or_high is not None and or_low is not None) else None

    qc = {
        "date": day.date().isoformat(),
        "file": year_files[y].name,
        "or_count": int(len(or_slice)),
        "expected_or_minutes": int(len(tgt_or)),
        "or_high": or_high,
        "or_low": or_low,
        "or_range": or_range,
        "trade_count": int(len(win)),
        "expected_trade_minutes": int(len(tgt_win)),
        "missing_minutes": int(len(missing)),
        "duplicate_minutes": dupes,
        "has_entry_1022": bool(has_1022),
        "has_exit_1200": bool(has_1200),
    }

    if PREVIEW:
        print(f"[{date_str}] File: {qc['file']}")
        print("OR slice 09:30–10:00  (rows):", len(or_slice), " | expected:", len(tgt_or))
        print("Trade win 09:30–12:00 (rows):", len(win),      " | expected:", len(tgt_win))
        print("Has 10:22?", qc["has_entry_1022"], " | Has 12:00?", qc["has_exit_1200"],
              " | Missing:", qc["missing_minutes"], " | Duplicates:", qc["duplicate_minutes"])
        if not or_slice.empty:
            print(f"OR High: {qc['or_high']}  OR Low: {qc['or_low']}  OR Range: {qc['or_range']}")
        display(win.head(3)[["open","high","low","close"]])
        display(win.tail(3)[["open","high","low","close"]])

    return win, or_slice, qc


[2020-01-03] File: DAT_ASCII_NSXUSD_M1_2020.csv
OR slice 09:30–10:00  (rows): 31  | expected: 31
Trade win 09:30–12:00 (rows): 151  | expected: 151
Has 10:22? True  | Has 12:00? True  | Missing: 0  | Duplicates: 0
OR High: 8822.17  OR Low: 8752.1  OR Range: 70.06999999999971



### 3.3 — Signal & barrier levels for one day (deterministic entry at 10:22)

**What this does (read-first):**
- Input: a trading **date** (e.g., `"2020-01-03"`).
- Uses the day loader (3.2) to get:
  - **OR slice** = 09:30–10:00 (inclusive)
  - **Trade window** = 09:30–12:00 (inclusive)
  - **QC** checks
- Computes:
  - `OR_high`, `OR_low`, `OR_range`
  - **Bottom-35% cutoff** = `OR_low + 0.35 * OR_range`
  - **Top-35% cutoff**    = `OR_high − 0.35 * OR_range`
  - **10:22 close** (`P_10:22`)
  - **Decision**: Long / Short / No trade
  - **Barriers** from entry `E = P_10:22`:
    - Long: `SL = E − 25`, `TP = E + 75`
    - Short: `SL = E + 25`, `TP = E − 75`
- Returns a `DaySignal` dataclass (one record per day) and, when `PREVIEW` is on, displays it as a small human-readable table.

> Note: This cell **only sets the signal & barriers**. The minute-by-minute execution (TP/SL/12:00) comes next (3.4).
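
The cutoff and barrier arithmetic above can be checked in isolation. The sketch below is a minimal pure-Python version of the decision rule, with a hypothetical `classify` helper; the inputs are the 2020-01-03 values that appear in this notebook's outputs.

```python
from typing import Optional, Tuple

def classify(entry: float, or_high: float, or_low: float,
             top_pct: float = 0.35, bottom_pct: float = 0.35,
             sl_pts: float = 25.0, tp_pts: float = 75.0
             ) -> Tuple[str, Optional[float], Optional[float]]:
    """Return (decision, SL, TP) for an entry price relative to the opening range."""
    or_range = or_high - or_low
    top_cut = or_high - top_pct * or_range        # above this: top zone
    bottom_cut = or_low + bottom_pct * or_range   # below this: bottom zone
    if entry > top_cut:
        return "long", entry - sl_pts, entry + tp_pts
    if entry < bottom_cut:
        return "short", entry + sl_pts, entry - tp_pts
    return "none", None, None

# 2020-01-03: OR high 8822.17, OR low 8752.10, 10:22 close 8798.09
decision, sl, tp = classify(entry=8798.09, or_high=8822.17, or_low=8752.10)
print(decision, round(sl, 2), round(tp, 2))   # long 8773.09 8873.09
```

Both comparisons are strict, so a close that lands exactly on a cutoff counts as the middle zone and produces no trade.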


In [None]:
# 3.3 — Signal & barriers (UPDATED: accepts preloaded data; PREVIEW switch)

from dataclasses import dataclass
import pandas as pd

# Use the same PREVIEW flag as 3.2
try:
    PREVIEW
except NameError:
    PREVIEW = False

# Strategy defaults (fallbacks)
try:
    TOP_PCT = STRATEGY["parameters"]["zones"]["top_pct"]
    BOT_PCT = STRATEGY["parameters"]["zones"]["bottom_pct"]
    SL_PTS  = STRATEGY["parameters"]["risk"]["stop_loss_points"]
    TP_PTS  = STRATEGY["parameters"]["risk"]["take_profit_points"]
    ENTRY_T = INSTR["session"]["entry_time"]
    EXIT_T  = INSTR["session"]["hard_exit_time"]
except Exception:
    TOP_PCT = 0.35; BOT_PCT = 0.35
    SL_PTS  = 25;   TP_PTS  = 75
    ENTRY_T = "10:22"; EXIT_T = "12:00"

@dataclass
class DaySignal:
    date: str
    decision: str                 # "long" | "short" | "none" | "no_signal_missing_1022" | "invalid_or"
    entry_time: str
    entry_price: float | None
    sl: float | None
    tp: float | None
    or_high: float | None
    or_low: float | None
    or_range: float | None
    top_cutoff: float | None
    bottom_cutoff: float | None
    has_1022: bool
    has_1200: bool
    notes: str

def _first_close_at(minute_df: pd.DataFrame, hhmm: str) -> float | None:
    try:
        target_t = pd.Timestamp(hhmm).time()
        row = minute_df.loc[minute_df.index.time == target_t]
        return float(row["close"].iloc[0]) if not row.empty else None
    except Exception:
        return None

def compute_signal_for_date(date_str: str, *, win=None, or_slice=None, qc=None) -> DaySignal:
    """If win/or_slice/qc are not provided, loads them (3.2). Otherwise uses the preloaded ones."""
    if (win is None) or (or_slice is None) or (qc is None):
        if "load_day_window" not in globals():
            raise RuntimeError("load_day_window(...) not found. Please run section 3.2 first.")
        win, or_slice, qc = load_day_window(date_str)

    # Basic OR sanity
    if or_slice.empty or qc.get("or_range") is None or qc.get("or_range") <= 0:
        sig = DaySignal(
            date=date_str, decision="invalid_or", entry_time=ENTRY_T,
            entry_price=None, sl=None, tp=None,
            or_high=qc.get("or_high"), or_low=qc.get("or_low"), or_range=qc.get("or_range"),
            top_cutoff=None, bottom_cutoff=None,
            has_1022=bool(qc.get("has_entry_1022")), has_1200=bool(qc.get("has_exit_1200")),
            notes="Opening range invalid or zero."
        )
        if PREVIEW:
            display(pd.DataFrame([sig.__dict__]).T.rename(columns={0:"value"}))
        return sig

    or_high  = float(qc["or_high"])
    or_low   = float(qc["or_low"])
    or_range = float(qc["or_range"])

    # Zone cutoffs
    bottom_cut = or_low  + BOT_PCT * or_range
    top_cut    = or_high - TOP_PCT * or_range

    # Entry price at 10:22
    e = _first_close_at(win, ENTRY_T)
    if e is None:
        sig = DaySignal(
            date=date_str, decision="no_signal_missing_1022", entry_time=ENTRY_T,
            entry_price=None, sl=None, tp=None,
            or_high=or_high, or_low=or_low, or_range=or_range,
            top_cutoff=top_cut, bottom_cutoff=bottom_cut,
            has_1022=False, has_1200=bool(qc.get("has_exit_1200")),
            notes="No 10:22 bar in trade window."
        )
        if PREVIEW:
            display(pd.DataFrame([sig.__dict__]).T.rename(columns={0:"value"}))
        return sig

    # Decision
    if e > top_cut:
        decision = "long"
        sl = e - SL_PTS
        tp = e + TP_PTS
    elif e < bottom_cut:
        decision = "short"
        sl = e + SL_PTS
        tp = e - TP_PTS
    else:
        decision = "none"
        sl = None
        tp = None

    sig = DaySignal(
        date=date_str, decision=decision, entry_time=ENTRY_T,
        entry_price=float(e), sl=float(sl) if sl is not None else None, tp=float(tp) if tp is not None else None,
        or_high=or_high, or_low=or_low, or_range=or_range,
        top_cutoff=float(top_cut), bottom_cutoff=float(bottom_cut),
        has_1022=True, has_1200=bool(qc.get("has_exit_1200")),
        notes="OK"
    )

    if PREVIEW:
        df_preview = pd.DataFrame([sig.__dict__]).T.rename(columns={0:"value"})
        display(df_preview)

    return sig


In [31]:
# Example (you can change the date):
sig = compute_signal_for_date("2020-01-03")
sig


[2020-01-03] File: DAT_ASCII_NSXUSD_M1_2020.csv
OR slice 09:30–10:00  (rows): 31  | expected: 31
Trade win 09:30–12:00 (rows): 151  | expected: 151
Has 10:22? True  | Has 12:00? True  | Missing: 0  | Duplicates: 0
OR High: 8822.17  OR Low: 8752.1  OR Range: 70.06999999999971



DaySignal(date='2020-01-03', decision='long', entry_time='10:22', entry_price=8798.09, sl=8773.09, tp=8873.09, or_high=8822.17, or_low=8752.1, or_range=70.06999999999971, top_cutoff=8797.6455, bottom_cutoff=8776.6245, has_1022=True, has_1200=True, notes='OK')

### 3.4 — Single-day execution (10:22 → 12:00): TP/SL/Hard-exit

**What this does (read-first):**
- Takes a `date` (e.g., `"2020-01-03"`).
- Uses **3.2** (loader) and **3.3** (signal) to get entry `E`, `SL`, `TP`.
- Walks minute-by-minute **from 10:23 to 12:00** (we enter at **10:22 close**, so checks start on the **next bar**).
- Detects barrier touches using **bar high/low** (more realistic than close-only).

**Tie-break rule (if a single bar could hit both SL and TP):**
- **Conservative “worse-first”**: count it as **SL** (for long: low≤SL and high≥TP ⇒ SL; for short: high≥SL and low≤TP ⇒ SL).

**Outputs:**
- A `DayExecution` record with `exit_time`, `exit_price`, `exit_reason` (`tp` / `sl` / `time`, or `no_trade` on middle-zone days), `pnl_pts`, and `pnl_usd`, the latter computed with your config's `point_value_usd` (and `position_size` if set; default 1).
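
The exit rules and the worse-first tie-break can be sketched for the long side without pandas. The `walk_long` function and the toy bars below are hypothetical; each bar is a `(time, high, low, close)` tuple standing in for the minute rows after the entry bar.

```python
from typing import Iterable, Tuple

Bar = Tuple[str, float, float, float]  # (time, high, low, close)

def walk_long(bars: Iterable[Bar], sl: float, tp: float) -> Tuple[str, float, str]:
    """Scan bars after entry; return (exit_time, exit_price, reason) for a long."""
    last_time, last_close = None, None
    for time, high, low, close in bars:
        if low <= sl:            # checked first, so a bar touching both counts as SL
            return time, sl, "sl"
        if high >= tp:
            return time, tp, "tp"
        last_time, last_close = time, close
    if last_time is None:
        raise ValueError("no bars after entry")
    return last_time, last_close, "time"   # forced exit at the last bar's close

# Entry at 100 with SL = 75 and TP = 175 (toy prices)
quiet  = [("10:23", 110.0, 95.0, 105.0), ("12:00", 120.0, 100.0, 115.0)]
both   = [("10:23", 180.0, 70.0, 100.0)]   # touches SL and TP in one bar
target = [("10:23", 110.0, 95.0, 105.0), ("10:24", 176.0, 104.0, 170.0)]

print(walk_long(quiet, 75.0, 175.0))    # ('12:00', 115.0, 'time')
print(walk_long(both, 75.0, 175.0))     # ('10:23', 75.0, 'sl')
print(walk_long(target, 75.0, 175.0))   # ('10:24', 175.0, 'tp')
```

Because one-minute bars do not reveal the order in which high and low were reached, testing the stop first is the pessimistic choice; the short side mirrors it with `high >= sl` tested before `low <= tp`.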


In [None]:
# 3.4 — Single-day execution (UPDATED: loads day once; reuses for signal + path; PREVIEW switch)

from dataclasses import dataclass
import pandas as pd

# PREVIEW flag shared with earlier cells
try:
    PREVIEW
except NameError:
    PREVIEW = False

# $/pt and position size
try:
    POINT_VAL = INSTR["market"]["point_value_usd"]
except Exception:
    POINT_VAL = 80.0
try:
    POS_SIZE = INSTR["market"].get("position_size", 1.0)
except Exception:
    POS_SIZE = 1.0

@dataclass
class DayExecution:
    date: str
    decision: str                 # long | short | none | invalid_or | no_signal_missing_1022
    entry_time: str
    entry_price: float | None
    sl: float | None
    tp: float | None
    exit_time: str | None
    exit_price: float | None
    exit_reason: str | None       # "tp" | "sl" | "time" | "no_trade" | None
    pnl_pts: float | None
    pnl_usd: float | None
    notes: str

def execute_day(date_str: str) -> DayExecution:
    # Dependencies from 3.2 and 3.3
    if "load_day_window" not in globals():
        raise RuntimeError("Please run section 3.2 first (load_day_window).")
    if "compute_signal_for_date" not in globals():
        raise RuntimeError("Please run section 3.3 first (compute_signal_for_date).")

    # Load ONCE
    win, or_slice, qc = load_day_window(date_str)
    sig = compute_signal_for_date(date_str, win=win, or_slice=or_slice, qc=qc)

    # Fast exits for no-trade / invalid
    if sig.decision in ("invalid_or", "no_signal_missing_1022"):
        out = DayExecution(
            date=date_str, decision=sig.decision, entry_time=sig.entry_time,
            entry_price=sig.entry_price, sl=sig.sl, tp=sig.tp,
            exit_time=None, exit_price=None, exit_reason=None,
            pnl_pts=None, pnl_usd=None, notes=sig.notes
        )
        if PREVIEW:
            display(pd.DataFrame([out.__dict__]).T.rename(columns={0:"value"}))
        return out

    if sig.decision == "none":
        out = DayExecution(
            date=date_str, decision="none", entry_time=sig.entry_time,
            entry_price=sig.entry_price, sl=None, tp=None,
            exit_time=sig.entry_time, exit_price=sig.entry_price, exit_reason="no_trade",
            pnl_pts=0.0, pnl_usd=0.0, notes="Middle zone at 10:22; no position."
        )
        if PREVIEW:
            display(pd.DataFrame([out.__dict__]).T.rename(columns={0:"value"}))
        return out

    # Entry happens at 10:22 close; start checks from the **next** minute
    entry_t = pd.Timestamp(sig.entry_time).time()
    path = win.loc[win.index.time > entry_t].copy()  # 10:23 ... 12:00 inclusive

    # Hard exit bar: use the configured exit time (EXIT_T from 3.2/3.3);
    # fall back to the last available bar when that bar is missing
    exit_t = pd.Timestamp(EXIT_T).time()
    exit_bars = path.index[path.index.time == exit_t]
    hard_exit_time = exit_bars[0] if len(exit_bars) else path.index.max()

    E  = float(sig.entry_price)
    SL = float(sig.sl)
    TP = float(sig.tp)

    exit_ts = None
    exit_px = None
    exit_reason = None

    # Conservative tie-break: if both TP and SL touched in same bar, count SL
    if sig.decision == "long":
        for ts, row in path.iterrows():
            hi = float(row["high"]); lo = float(row["low"])
            if (lo <= SL) and (hi >= TP):
                exit_ts, exit_px, exit_reason = ts, SL, "sl"; break
            if lo <= SL:
                exit_ts, exit_px, exit_reason = ts, SL, "sl"; break
            if hi >= TP:
                exit_ts, exit_px, exit_reason = ts, TP, "tp"; break
        if exit_ts is None:
            last = win.loc[win.index == hard_exit_time]
            exit_ts = hard_exit_time
            exit_px = float(last["close"].iloc[0]) if not last.empty else float(path.iloc[-1]["close"])
            exit_reason = "time"
        pnl_pts = exit_px - E

    elif sig.decision == "short":
        for ts, row in path.iterrows():
            hi = float(row["high"]); lo = float(row["low"])
            if (hi >= SL) and (lo <= TP):
                exit_ts, exit_px, exit_reason = ts, SL, "sl"; break
            if hi >= SL:
                exit_ts, exit_px, exit_reason = ts, SL, "sl"; break
            if lo <= TP:
                exit_ts, exit_px, exit_reason = ts, TP, "tp"; break
        if exit_ts is None:
            last = win.loc[win.index == hard_exit_time]
            exit_ts = hard_exit_time
            exit_px = float(last["close"].iloc[0]) if not last.empty else float(path.iloc[-1]["close"])
            exit_reason = "time"
        pnl_pts = E - exit_px

    pnl_usd = pnl_pts * POINT_VAL * POS_SIZE

    out = DayExecution(
        date=date_str, decision=sig.decision, entry_time=sig.entry_time,
        entry_price=E, sl=SL, tp=TP,
        exit_time=str(exit_ts.tz_convert(win.index.tz).time()) if hasattr(exit_ts, "tzinfo") else str(exit_ts),
        exit_price=exit_px, exit_reason=exit_reason,
        pnl_pts=float(pnl_pts), pnl_usd=float(pnl_usd),
        notes="Conservative tie-break; checks start after entry bar."
    )

    if PREVIEW:
        display(pd.DataFrame([out.__dict__]).T.rename(columns={0:"value"}))

    return out


In [33]:
# Example run (you can change the date):
ex = execute_day("2020-01-03")
ex



[2020-01-03] File: DAT_ASCII_NSXUSD_M1_2020.csv
OR slice 09:30–10:00  (rows): 31  | expected: 31
Trade win 09:30–12:00 (rows): 151  | expected: 151
Has 10:22? True  | Has 12:00? True  | Missing: 0  | Duplicates: 0
OR High: 8822.17  OR Low: 8752.1  OR Range: 70.06999999999971


DayExecution(date='2020-01-03', decision='long', entry_time='10:22', entry_price=8798.09, sl=8773.09, tp=8873.09, exit_time='12:00:00', exit_price=8820.81, exit_reason='time', pnl_pts=22.719999999999345, pnl_usd=1817.5999999999476, notes='Conservative tie-break; checks start after entry bar.')

### 3.5 — Batch backtest over all valid days (strict mode)

**What this does (read-first):**
- Iterates **only** the dates in `valid_days.csv` (the audited, tradable set).
- For each date: calls `execute_day` (**3.4**), which loads the day (**3.2**) and computes the signal (**3.3**), to get `exit_reason`, `pnl_pts`, `pnl_usd`, etc.
- Aggregates daily results; computes:
  - #days simulated, #signals long/short, #no-trade
  - #trades executed (long/short only), wins, losses, **win rate**
  - Sum of **PnL (pts)** and **PnL ($)**
  - **Equity curve** (using `initial_capital`), **Max Drawdown** (absolute & %)
- Saves a CSV (`reports/tables/backtest_daily.csv`) for reproducibility.

> Notes:
> - A “trade” is a day with **decision in {long, short}**. “none” are counted separately.
> - Profit is in **points**, then converted to **$** with your config’s `point_value_usd × position_size`.
> - Equity = `initial_capital + cumulative PnL ($)`.
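
The equity and drawdown definitions can be illustrated on a handful of made-up daily PnL values. The sketch below is pure Python with a hypothetical `equity_and_drawdown` helper; it measures percentage drawdown against the running peak and seeds that peak with the initial capital, which is one common convention and an assumption here.

```python
from itertools import accumulate

def equity_and_drawdown(pnl_usd, initial_capital=100_000.0):
    """Return (equity list, max drawdown in USD, max drawdown in % of the running peak)."""
    equity = [initial_capital + c for c in accumulate(pnl_usd)]
    peak, max_dd, max_dd_pct = initial_capital, 0.0, 0.0
    for eq in equity:
        peak = max(peak, eq)            # highest equity seen so far
        dd = peak - eq                  # distance below that peak
        max_dd = max(max_dd, dd)
        if peak > 0:
            max_dd_pct = max(max_dd_pct, 100.0 * dd / peak)
    return equity, max_dd, max_dd_pct

# Toy daily PnL in USD (illustrative numbers, not backtest results)
daily = [6000.0, -2000.0, -2000.0, 0.0, 6000.0, -2000.0]
equity, max_dd, max_dd_pct = equity_and_drawdown(daily)
print(equity[-1], max_dd, round(max_dd_pct, 2))   # 106000.0 4000.0 3.77
```

Seeding the peak with the initial capital means a loss on the very first day already counts as drawdown; taking the running maximum of the equity series alone would not register it.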


In [36]:
# 3.5 — Batch backtest

from pathlib import Path
import pandas as pd
import numpy as np
from tqdm.auto import tqdm
from contextlib import contextmanager, redirect_stdout, redirect_stderr
import io, IPython.display as ipd

# --- Paths
ROOT = Path.cwd().parent if Path.cwd().name == "notebooks" else Path.cwd()
REPORTS_TBLS = ROOT / "reports" / "tables"
REPORTS_TBLS.mkdir(parents=True, exist_ok=True)

# --- Load audited days
valid_path = REPORTS_TBLS / "valid_days.csv"
if "valid_days" in globals():
    vd = valid_days.copy()
else:
    vd = pd.read_csv(valid_path, parse_dates=["date"])
vd = vd.sort_values("date").reset_index(drop=True)

# --- Pull config values (with safe fallbacks)
try:
    init_capital = STRATEGY.get("reporting", {}).get("initial_capital", 100_000)
except Exception:
    init_capital = 100_000
try:
    point_val = INSTR["market"]["point_value_usd"]
except Exception:
    point_val = 80.0
try:
    pos_size = INSTR["market"].get("position_size", 1.0)
except Exception:
    pos_size = 1.0

# --- Require dependencies from 3.2–3.4
if "execute_day" not in globals():
    raise RuntimeError("Please run sections 3.2, 3.3, and 3.4 first.")

@contextmanager
def mute_everything():
    buf_out, buf_err = io.StringIO(), io.StringIO()
    # Save originals
    orig_ipd_display = ipd.display
    orig_global_display = globals().get("display", None)
    try:
        # Silence any display(...) calls
        ipd.display = lambda *a, **k: None
        if orig_global_display is not None:
            globals()["display"] = lambda *a, **k: None
        # Silence print/stdout/stderr
        with redirect_stdout(buf_out), redirect_stderr(buf_err):
            yield
    finally:
        # Restore
        ipd.display = orig_ipd_display
        if orig_global_display is not None:
            globals()["display"] = orig_global_display


# --- Iterate
dates = vd["date"].dt.strftime("%Y-%m-%d").tolist()
records = []
for d in tqdm(dates, desc="Backtesting valid days", unit="day"):
    with mute_everything():            # <- wraps all nested calls (3.3 + 3.4 + 3.2)
        ex = execute_day(d)
    records.append({
        "date": d,
        "decision": ex.decision,
        "entry_time": ex.entry_time,
        "entry_price": ex.entry_price,
        "SL": ex.sl,
        "TP": ex.tp,
        "exit_time": ex.exit_time,
        "exit_price": ex.exit_price,
        "exit_reason": ex.exit_reason,
        "pnl_pts": ex.pnl_pts,
        "pnl_usd": ex.pnl_usd,
        "notes": ex.notes,
    })

bt = pd.DataFrame.from_records(records)
bt["date"] = pd.to_datetime(bt["date"])

# --- Save raw daily results
bt_path = REPORTS_TBLS / "backtest_daily.csv"
bt.sort_values("date").to_csv(bt_path, index=False)

# --- KPIs
is_trade   = bt["decision"].isin(["long","short"])
trades     = bt[is_trade].copy()
wins       = trades["pnl_pts"] > 0
losses     = trades["pnl_pts"] < 0
n_days     = len(bt)
n_trades   = int(is_trade.sum())
n_wins     = int(wins.sum())
n_losses   = int(losses.sum())
n_flat     = int((trades["pnl_pts"] == 0).sum())
n_none     = int((bt["decision"] == "none").sum())
winrate    = (n_wins / n_trades * 100.0) if n_trades > 0 else 0.0

sum_pts    = float(trades["pnl_pts"].sum())
sum_usd    = float(trades["pnl_usd"].sum())

# --- Equity & drawdown (USD)
bt["pnl_usd_filled"] = bt["pnl_usd"].fillna(0.0)
bt["equity"] = init_capital + bt.sort_values("date")["pnl_usd_filled"].cumsum()
bt["equity_peak"] = bt["equity"].cummax()
bt["drawdown"] = bt["equity_peak"] - bt["equity"]
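max_dd = float(bt["drawdown"].max())
# Percent drawdown is measured against the running peak at each row, not the all-time peak
dd_pct = (100.0 * bt["drawdown"] / bt["equity_peak"]).where(bt["equity_peak"] > 0, 0.0)
max_dd_pct = float(dd_pct.max()) if len(dd_pct) else 0.0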

# --- Small monthly summary (USD)
bt["year"] = bt["date"].dt.year
bt["month"] = bt["date"].dt.month
monthly = bt.groupby(["year","month"]).agg(
    days=("date","count"),
    trades=("decision", lambda s: int(s.isin(["long","short"]).sum())),
    pnl_usd=("pnl_usd_filled","sum"),
)
# monthly return uses equity Δ within month
bt["equity_shift"] = bt["equity"].shift(1).fillna(init_capital)
bt_month = bt.groupby(["year","month"]).agg(
    eq_start=("equity_shift","first"),
    eq_end=("equity","last")
)
bt_month["ret_pct"] = np.where(
    bt_month["eq_start"] > 0,
    100.0 * (bt_month["eq_end"] - bt_month["eq_start"]) / bt_month["eq_start"],
    np.nan
)
monthly = monthly.join(bt_month, how="left")

# --- Summary table
summary = pd.DataFrame({
    "days_simulated":         [n_days],
    "trades_executed":        [n_trades],
    "no_trade_days":          [n_none],
    "wins":                   [n_wins],
    "losses":                 [n_losses],
    "flats":                  [n_flat],
    "winrate_%":              [round(winrate, 2)],
    "sum_pnl_pts":            [round(sum_pts, 2)],
    "sum_pnl_usd":            [round(sum_usd, 2)],
    "initial_capital":        [init_capital],
    "final_equity":           [round(float(bt['equity'].iloc[-1]) if len(bt) else init_capital, 2)],
    "max_drawdown_usd":       [round(max_dd, 2)],
    "max_drawdown_pct":       [round(max_dd_pct, 2)],
    "point_value_usd":        [point_val],
    "position_size":          [pos_size],
}).T.rename(columns={0:"value"})

print("Saved daily results to:", bt_path)
display(summary)

# --- Quick visuals: Equity + Drawdown
import matplotlib.pyplot as plt

plt.figure(figsize=(10,4))
plt.plot(bt["date"], bt["equity"])
plt.title("Equity Curve (USD)")
plt.xlabel("Date"); plt.ylabel("Equity")
plt.tight_layout(); plt.show()

plt.figure(figsize=(10,3))
plt.plot(bt["date"], bt["drawdown"])
plt.title("Drawdown (USD)")
plt.xlabel("Date"); plt.ylabel("Drawdown")
plt.tight_layout(); plt.show()

# Peek at daily trades
display(bt.head(10).sort_values("date").reset_index(drop=True))
display(monthly.reset_index().sort_values(["year","month"]))


Backtesting valid days:   8%|▊         | 74/902 [03:50<43:01,  3.12s/day]


KeyboardInterrupt: 