# EUR/PLN Tick Strategy — Double Volatility Breakout (EGARCH‑adaptive)
**Author:** <your name>

**Data window:** 2025‑09‑01 00:00:00 to 2025‑10‑06 00:00:00 (tick‑by‑tick)

**Out‑of‑sample evaluation:** 2025‑09‑22 00:00:00 to 2025‑10‑06 00:00:00

**Goal:** Build a technically correct, relatively original, and profitable strategy on the OOS window, using tools from the course.

> Tip: keep parameters ≤5 to curb data‑snooping. Add rich comments explaining each choice.

## 0. Project requirements (checklist)
- Download tick EUR/PLN data for the full window
- Implement strategy and backtest with transaction costs
- Report **gross/net P&L** and **Sharpe ratio** on the OOS window
- Provide concise email‑body bullets with the strategy idea & metrics
- Ensure no look‑ahead, minimize biases; keep parameters simple
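The no‑look‑ahead item can be checked mechanically. Scramble every observation from some cut‑off onward, then confirm that a feature's values before the cut‑off do not change. The helper `is_causal` below is a hypothetical sketch written with pandas, not part of any library. The two demo features are illustrative.

```python
import numpy as np
import pandas as pd

def is_causal(feature_fn, series: pd.Series, cut: int, seed: int = 0) -> bool:
    """True if feature_fn's output on positions [0, cut) ignores data from `cut` on.

    feature_fn maps a pd.Series to a pd.Series of the same length (hypothetical API).
    Values from `cut` onward are replaced by large random noise; a causal feature
    must give identical outputs before `cut`.
    """
    rng = np.random.default_rng(seed)
    scrambled = series.copy()
    scrambled.iloc[cut:] = rng.normal(size=len(series) - cut) * 1e3
    before = feature_fn(series).iloc[:cut].to_numpy()
    after = feature_fn(scrambled).iloc[:cut].to_numpy()
    return bool(np.allclose(before, after, equal_nan=True))

def ema_300(s: pd.Series) -> pd.Series:
    return s.ewm(span=300, adjust=False).mean()   # recursive, uses past values only

def centred_mean(s: pd.Series) -> pd.Series:
    return s.rolling(21, center=True).mean()      # peeks 10 bars ahead

demo_mid = pd.Series(4.27 + np.cumsum(np.random.default_rng(1).normal(scale=1e-4, size=1_000)))
print(is_causal(ema_300, demo_mid, cut=500), is_causal(centred_mean, demo_mid, cut=500))  # → True False
```

You can run the same check on each engineered column, such as the EMA reference or the volatility forecast, before backtesting. It catches accidental `center=True` or negative `shift` calls.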

## 1. Environment & packages

In [6]:

# Recommended: create a dedicated virtual env before running locally.
# Install as needed:
# !pip install pandas numpy pyarrow polars==0.20.* plotly arch hmmlearn scikit-learn statsmodels nbformat pytz tzdata
# Optional data helpers if you choose to pull from brokers:
# !pip install dukascopy  # or 'dukascopy-trader', depending on package choice

import os, math
import numpy as np
import pandas as pd

pd.set_option("display.width", 120)
pd.set_option("display.max_columns", 50)


## 2. Data download & loading
Choose one source (free suggestions from class): **Dukascopy** or **TrueFX**. Save raw ticks (bid/ask if available). Keep a **parquet** copy.
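If ticks arrive as CSV files, they need converting to the `ts`/`bid`/`ask` schema that `load_ticks_from_parquet` expects. The sketch below assumes a TrueFX‑style layout: no header, then pair, a `YYYYMMDD HH:MM:SS.fff` UTC timestamp, bid and ask. Check that layout and the made‑up sample rows against the actual files, since other sources' exports may differ. `read_truefx_csv` is a hypothetical helper.

```python
import io
import pandas as pd

def read_truefx_csv(src) -> pd.DataFrame:
    """Parse a TrueFX-style tick CSV into the ['ts', 'bid', 'ask'] schema used below.

    Assumed layout (no header): pair, 'YYYYMMDD HH:MM:SS.fff' in UTC, bid, ask.
    """
    df = pd.read_csv(src, header=None, names=["pair", "ts", "bid", "ask"])
    df["ts"] = pd.to_datetime(df["ts"], format="%Y%m%d %H:%M:%S.%f", utc=True)
    return df[["ts", "bid", "ask"]].sort_values("ts").reset_index(drop=True)

sample = io.StringIO(
    "EUR/PLN,20250901 07:00:00.125,4.26710,4.26790\n"
    "EUR/PLN,20250901 07:00:00.004,4.26700,4.26780\n"
)
ticks_demo = read_truefx_csv(sample)
# ticks_demo.to_parquet("./data/eurpln_ticks.parquet")  # requires pyarrow
```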

In [11]:

import os
import numpy as np
import pandas as pd

DATA_DIR = "./data"
os.makedirs(DATA_DIR, exist_ok=True)

START = pd.Timestamp('2025-09-01 00:00:00', tz='UTC')
END   = pd.Timestamp('2025-10-06 00:00:00', tz='UTC')

def load_ticks_from_parquet(path: str) -> pd.DataFrame:
    df = pd.read_parquet(path)
    if "ts" not in df.columns:
        raise ValueError("Expected 'ts' column in parquet.")
    df["ts"] = pd.to_datetime(df["ts"], utc=True)
    df = df.sort_values("ts").reset_index(drop=True)
    return df

PARQUET_PATH = os.path.join(DATA_DIR, "eurpln_ticks.parquet")
if os.path.exists(PARQUET_PATH):
    ticks = load_ticks_from_parquet(PARQUET_PATH)
else:
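    # Dummy data so the template runs; replace with a real data load.
    # NOTE: this is 5-minute spacing, not tick data, so the "1s"/"300s" windows
    # below actually span 5-minute bars when run on this placeholder.
    dates = pd.date_range(START, END, freq="5min", inclusive="left")
    noise = np.random.default_rng(42).normal(scale=0.0005, size=len(dates))  # seeded for reproducibility
    mid = 4.5 + np.cumsum(noise)
    spread = np.full(len(dates), 0.0004)  # constant 4-pip spread proxy
    ticks = pd.DataFrame({"ts": dates, "bid": mid - spread/2, "ask": mid + spread/2})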

ticks.head()


|   | ts | bid | ask |
|---|---|---|---|
| 0 | 2025-09-01 00:00:00+00:00 | 4.500135 | 4.500535 |
| 1 | 2025-09-01 00:05:00+00:00 | 4.500177 | 4.500577 |
| 2 | 2025-09-01 00:10:00+00:00 | 4.499469 | 4.499869 |
| 3 | 2025-09-01 00:15:00+00:00 | 4.500656 | 4.501056 |
| 4 | 2025-09-01 00:20:00+00:00 | 4.50058 | 4.50098 |


## 3. Cleaning & mid‑price series
- Remove obvious outliers, non‑finite values
- Create mid price; compute microstructure returns
- (Optional) resample to **1s** bars for speed, keeping highest fidelity feasible.
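`basic_clean` below only drops non‑finite values and implausible spreads. For the "obvious outliers" bullet, one option is a causal rolling‑median filter on the mid, which is an assumption and not part of the original pipeline. It flags a tick when the tick sits more than `n_mad` robust deviations from the median of the preceding ticks. `flag_mid_outliers` is hypothetical, and its window, threshold and pip floor are illustrative rather than tuned values.

```python
import numpy as np
import pandas as pd

def flag_mid_outliers(mid: pd.Series, window: int = 200, n_mad: float = 10.0,
                      min_mad: float = 1e-4) -> pd.Series:
    """Flag ticks whose mid is far from the median of the previous `window` ticks.

    Only past ticks are used (shift(1)), so the filter adds no look-ahead.
    min_mad (1 pip for EUR/PLN) floors the deviation scale when quotes are stale.
    """
    prev = mid.shift(1)
    med = prev.rolling(window, min_periods=window // 4).median()
    # Rolling median of absolute deviations from the running median: a cheap MAD proxy
    mad = (prev - med).abs().rolling(window, min_periods=window // 4).median()
    return (mid - med).abs() > n_mad * mad.clip(lower=min_mad)

# Demo on a synthetic random walk with one 500-pip spike
rng = np.random.default_rng(7)
demo = pd.Series(4.27 + np.cumsum(rng.normal(scale=5e-5, size=2_000)))
demo.iloc[1_500] += 0.05
flags = flag_mid_outliers(demo)
# On real data: ticks = ticks.loc[~flag_mid_outliers((ticks["bid"] + ticks["ask"]) / 2)]
```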

In [12]:

def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    good = (df["ask"] > df["bid"]) & ((df["ask"] - df["bid"]) < 0.02)  # 2 grosze sanity
    df = df.loc[good]
    df["mid"] = (df["bid"] + df["ask"]) / 2
    return df

ticks = basic_clean(ticks)
s1 = (ticks.set_index("ts")[["bid","ask","mid"]]
      .resample("1s")
      .last()
      .dropna()
      .reset_index())
s1.rename(columns={"ts":"time"}, inplace=True)
s1.head()


|   | time | bid | ask | mid |
|---|---|---|---|---|
| 0 | 2025-09-01 00:00:00+00:00 | 4.500135 | 4.500535 | 4.500335 |
| 1 | 2025-09-01 00:05:00+00:00 | 4.500177 | 4.500577 | 4.500377 |
| 2 | 2025-09-01 00:10:00+00:00 | 4.499469 | 4.499869 | 4.499669 |
| 3 | 2025-09-01 00:15:00+00:00 | 4.500656 | 4.501056 | 4.500856 |
| 4 | 2025-09-01 00:20:00+00:00 | 4.50058 | 4.50098 | 4.50078 |


## 4. Intraday seasonality filter (FX active hours)
Filter to active overlap hours to reduce false signals (e.g., 07:00–18:00 Europe/Warsaw). Adjust if you prefer 24/5 trading.

In [15]:

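def warsaw_active_mask(ts: pd.Series) -> pd.Series:
    """True for timestamps from 07:00 up to (not including) 18:00 Europe/Warsaw, DST-aware."""
    local = ts.dt.tz_convert("Europe/Warsaw")
    hours = local.dt.hour
    return (hours >= 7) & (hours < 18)  # `<= 18` would also keep 18:00-18:59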

s1["active"] = warsaw_active_mask(s1["time"])
s1 = s1[s1["active"]].drop(columns=["active"]).reset_index(drop=True)
s1.head()


|   | time | bid | ask | mid |
|---|---|---|---|---|
| 0 | 2025-09-01 05:00:00+00:00 | 4.506916 | 4.507316 | 4.507116 |
| 1 | 2025-09-01 05:05:00+00:00 | 4.507013 | 4.507413 | 4.507213 |
| 2 | 2025-09-01 05:10:00+00:00 | 4.507285 | 4.507685 | 4.507485 |
| 3 | 2025-09-01 05:15:00+00:00 | 4.507111 | 4.507511 | 4.507311 |
| 4 | 2025-09-01 05:20:00+00:00 | 4.507284 | 4.507684 | 4.507484 |


In [16]:
s1["ret"] = np.log(s1["mid"]).diff()
s1["mom_5s"]  = s1["mid"].pct_change(5)
s1["mom_30s"] = s1["mid"].pct_change(30)
s1["rv_60s"]  = s1["ret"].rolling(60,  min_periods=20).std()
s1["rv_300s"] = s1["ret"].rolling(300, min_periods=60).std()
s1["spread"]  = s1["ask"] - s1["bid"]
s1["spr_z"]   = (s1["spread"] - s1["spread"].rolling(600, min_periods=60).median()) / \
                (s1["spread"].rolling(600, min_periods=60).std() + 1e-12)

# The Warsaw session filter (07:00–18:00) was already applied in the previous cell.

# EMA reference for breakout
EMA_SPAN_DEFAULT = 300
s1["ref"] = s1["mid"].ewm(span=EMA_SPAN_DEFAULT, adjust=False).mean()
s1["signal_raw"] = s1["mid"] - s1["ref"]
s1 = s1.dropna().reset_index(drop=True)
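The session filter removes the overnight hours, so the first `ret` of each trading day spans the whole 18:00→07:00 gap rather than one second. That return also feeds `rv_60s` and `rv_300s`, and the P&L of any position carried overnight. A minimal sketch for locating those bars follows; the helper name is hypothetical.

```python
import pandas as pd

def session_start_mask(time: pd.Series, tz: str = "Europe/Warsaw") -> pd.Series:
    """True on the first bar of each local trading day in a session-filtered series.

    After the 07:00-18:00 filter, the log-return on that bar spans the overnight gap.
    """
    local_date = time.dt.tz_convert(tz).dt.date
    return local_date.ne(local_date.shift())

# 05:00 UTC is 07:00 in Warsaw during summer time (CEST, UTC+2)
t = pd.Series(pd.to_datetime(["2025-09-01 05:00:00", "2025-09-01 15:59:59",
                              "2025-09-02 05:00:00", "2025-09-02 05:00:01"], utc=True))
m = session_start_mask(t)
```

One option is `s1.loc[session_start_mask(s1["time"]), "ret"] = 0.0` before the rolling volatilities are computed. Another is to force the position flat on the last bar of each session. The right choice depends on whether the strategy is meant to hold positions overnight.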

## 5. Volatility model
- Compute short‑horizon realized volatility
- Fit **EGARCH(1,1)** to mid‑returns (no look‑ahead) and forecast next‑step σ
- Band = k × σ_forecast; use **double threshold** (entry/exit)

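*Note:* refitting EGARCH on an expanding window at every bar means one optimisation per bar. That is impractical here, because the full window holds up to roughly a million 1s session bars. Re‑estimating the parameters periodically (e.g. once per session) and running the variance recursion with fixed parameters in between keeps the forecasts causal at a fraction of the cost.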
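The no‑look‑ahead property becomes concrete when you run the EGARCH(1,1) variance recursion with fixed parameters in plain NumPy: σ_t depends only on returns before t. This sketch assumes an arch‑style parametrisation, ln σ²_t = ω + α(|z_{t−1}| − √(2/π)) + γ·z_{t−1} + β·ln σ²_{t−1} with z = r/σ. The demo parameters are made up, not estimates. If parameters come from a fit on returns scaled by a constant c, run the filter on `returns * c` and divide the output by c.

```python
import numpy as np

SQRT_2_OVER_PI = np.sqrt(2.0 / np.pi)  # E|z| for z ~ N(0, 1)

def egarch11_sigma(returns, omega, alpha, gamma, beta, init_log_var=None):
    """One-step-ahead EGARCH(1,1) sigma forecasts for fixed parameters.

    sigma[t] depends only on returns[:t]. Without init_log_var the recursion starts
    at omega / (1 - beta), the mean of ln s2 under Gaussian shocks (needs |beta| < 1),
    so no data is used for initialisation.
    """
    r = np.asarray(returns, dtype=float)
    log_var = np.empty(len(r))
    log_var[0] = omega / (1.0 - beta) if init_log_var is None else init_log_var
    for t in range(1, len(r)):
        z = r[t - 1] / np.exp(0.5 * log_var[t - 1])  # standardised previous return
        log_var[t] = omega + alpha * (abs(z) - SQRT_2_OVER_PI) + gamma * z + beta * log_var[t - 1]
    return np.exp(0.5 * log_var)

# Illustrative parameters: log-variance centred on ln(4e-10), i.e. sigma around 2e-5 per bar
omega_demo = (1 - 0.98) * np.log(4e-10)
r_demo = np.random.default_rng(0).standard_t(df=5, size=1_000) * 2e-5
sig = egarch11_sigma(r_demo, omega=omega_demo, alpha=0.1, gamma=-0.05, beta=0.98)
```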

In [20]:
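from arch import arch_model  # pip install arch

SCALE = 1e4  # fit on returns in basis points: raw 1s log-returns (~1e-5) are badly scaled for arch's optimiser

def egarch_forecast_series(returns: pd.Series, start_idx: int, refit_every: int = 39_600) -> pd.Series:
    """One-step-ahead EGARCH(1,1) sigma: sigma[t] uses returns[:t] only (no look-ahead).

    Parameters are re-estimated on an expanding window every `refit_every` bars
    (default: about one 11-hour session of 1s bars) and held fixed in between,
    instead of running one optimisation per bar.
    """
    sigmas = np.full(len(returns), np.nan, dtype=float)
    y = returns * SCALE
    spec = dict(vol="EGARCH", p=1, o=0, q=1, mean="Zero", dist="t")
    fallback = returns.shift(1).rolling(300, min_periods=60).std()  # trailing realised vol, past only
    for b in range(start_idx, len(returns), refit_every):
        e = min(b + refit_every, len(returns))
        try:
            res = arch_model(y.iloc[:b], **spec).fit(disp="off")
            # Re-run the variance recursion with these fixed parameters through bar e-1;
            # the conditional volatility at t depends only on returns before t.
            fixed = arch_model(y.iloc[:e], **spec).fix(res.params)
            sigmas[b:e] = np.asarray(fixed.conditional_volatility)[b:e] / SCALE
        except Exception:
            sigmas[b:e] = fallback.iloc[b:e].to_numpy()
    return pd.Series(sigmas, index=returns.index)

burn_in = 5000 if len(s1) > 6000 else max(300, int(len(s1) * 0.1))
s1["sigma"] = egarch_forecast_series(s1["ret"].fillna(0.0), start_idx=burn_in).fillna(s1["rv_300s"])
s1.dropna(inplace=True)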


Optional diagnostic target (uses future returns by construction: for analysis only, never as a trading input)

In [None]:
h = 5
s1["fut_ret_5s"] = s1["mid"].pct_change(h).shift(-h)
s1["y_5s_dir"]   = np.sign(s1["fut_ret_5s"]).fillna(0.0)

## 6. Double‑volatility breakout logic
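Signal = deviation of mid from EMA(mid). Entry bands: ± k_entry·σ; exit bands: ± k_exit·σ (k_exit < k_entry). Positions follow the breakout direction (momentum): long above the upper entry band, short below the lower one.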
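The band multiples are easier to interpret once you know how large the EMA deviation typically is. Suppose log(mid) is a random walk with per‑bar std σ, and the deviation is measured as e_t = log(mid_t) − EMA_t (close to log(mid/EMA) for small moves). With pandas' `ewm(span=..., adjust=False)`, α = 2/(span+1) and e_t = (1−α)(e_{t−1} + r_t). The stationary std of e is then σ(1−α)/√(α(2−α)), about 8.6σ for span = 300, so a 2σ band sits well inside the typical deviation. The helper below is a sketch under that random‑walk assumption. Scaling the bands by it is one option, not the notebook's method.

```python
import numpy as np
import pandas as pd

def ema_deviation_scale(span: int) -> float:
    """Stationary std of x - EMA(x), in units of the per-bar std, for a random walk x.

    Matches pandas ewm(span=span, adjust=False), where alpha = 2 / (span + 1).
    """
    alpha = 2.0 / (span + 1.0)
    return (1.0 - alpha) / np.sqrt(alpha * (2.0 - alpha))

print(round(ema_deviation_scale(300), 2))  # → 8.63

# Monte Carlo check on a simulated Gaussian random walk with unit step std
rng = np.random.default_rng(3)
x = pd.Series(np.cumsum(rng.standard_normal(200_000)))
dev = x - x.ewm(span=60, adjust=False).mean()
empirical = dev.iloc[1_000:].std()  # skip the start-up transient (dev starts at 0)
```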

In [None]:

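span = 300  # ~5 min EMA at 1s sampling
s1["ref"] = s1["mid"].ewm(span=span, adjust=False).mean()
# Log deviation from the EMA: dimensionless, like sigma (a std of log-returns).
# The raw price gap mid - ref is in PLN and is not comparable with k * sigma.
s1["signal"] = np.log(s1["mid"] / s1["ref"])

k_entry = 2.0
k_exit  = 0.8

# ~1s bars per year inside the 07:00-18:00 Warsaw session (252 days x 11 h x 3600 s).
# 252*24*3600 would assume round-the-clock bars, overstating annualised vol (undersizing).
N_year = 252*11*60*60
target_vol_ann = 0.10
def unit_risk(sigma):
    """Notional multiplier targeting ~10% annualised volatility given per-bar sigma."""
    return target_vol_ann / (sigma * math.sqrt(N_year) + 1e-8)

s1["upper_entry"] =  k_entry * s1["sigma"]
s1["lower_entry"] = -k_entry * s1["sigma"]
s1["upper_exit"]  =  k_exit  * s1["sigma"]
s1["lower_exit"]  = -k_exit  * s1["sigma"]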

pos = np.zeros(len(s1), dtype=float)
size = np.zeros(len(s1), dtype=float)

# NumPy views: positional indexing in the loop is much faster than Series.iat
sig = s1["signal"].to_numpy()
up_e, lo_e = s1["upper_entry"].to_numpy(), s1["lower_entry"].to_numpy()
up_x, lo_x = s1["upper_exit"].to_numpy(), s1["lower_exit"].to_numpy()
sigma_arr = s1["sigma"].to_numpy()

for i in range(1, len(s1)):
    s, p_prev = sig[i-1], pos[i-1]  # bar i is decided from information up to bar i-1
    if p_prev == 0:
        pos[i] = 1 if s > up_e[i-1] else (-1 if s < lo_e[i-1] else 0)
    elif p_prev > 0:
        # Test the reversal before the exit: a signal below lo_e is also below up_x,
        # so checking the exit first would make the flip unreachable.
        pos[i] = -1 if s < lo_e[i-1] else (0 if s <= up_x[i-1] else 1)
    else:  # p_prev < 0
        pos[i] = 1 if s > up_e[i-1] else (0 if s >= lo_x[i-1] else -1)
    size[i] = np.clip(unit_risk(sigma_arr[i-1]), 0, 10.0)

s1["pos_dir"] = pos
s1["pos"] = s1["pos_dir"] * size
s1.head()
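The double‑threshold rule is a small state machine. Isolating one step of it in a hypothetical pure function makes the hysteresis easy to test on hand‑picked signals. The sketch below works in units of σ and tests the reversal before the exit band, so a long can flip straight to short.

```python
def next_position(p_prev: int, s: float, sigma: float, k_entry: float, k_exit: float) -> int:
    """One step of the double-threshold (hysteresis) rule, direction only.

    Reversal is tested before the exit band: with k_exit < k_entry, a long whose
    signal drops below -k_entry*sigma is also below +k_exit*sigma, so testing the
    exit first would make the flip branch unreachable.
    """
    up_e, lo_e = k_entry * sigma, -k_entry * sigma
    up_x, lo_x = k_exit * sigma, -k_exit * sigma
    if p_prev == 0:
        return 1 if s > up_e else (-1 if s < lo_e else 0)
    if p_prev > 0:
        return -1 if s < lo_e else (0 if s <= up_x else 1)
    return 1 if s > up_e else (0 if s >= lo_x else -1)

path, p = [], 0
for s in [0.0, 2.5, 1.0, 0.5, -2.5, -1.0, -0.5]:
    p = next_position(p, s, sigma=1.0, k_entry=2.0, k_exit=0.8)
    path.append(p)
print(path)  # → [0, 1, 1, 0, -1, -1, 0]
```

The long survives the drop to 1.0σ because that is still above the exit band at 0.8σ. This gap between entry and exit levels is what damps whipsaw trades around a single threshold.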


## 7. P&L with transaction costs and bid/ask
- Trade on next tick, using bid/ask (market order assumption)
- Transaction cost = half‑spread + commission (bps)
- Compute **gross** and **net** P&L
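The cost model above can be checked with a worked example. This sketch assumes both components scale with traded notional |Δpos|. The quotes (bid 3.9998, ask 4.0002) are chosen for round numbers and are not a measured EUR/PLN spread. `one_way_cost` is a hypothetical helper.

```python
def one_way_cost(bid: float, ask: float, traded_notional: float, commission: float = 0.00005) -> float:
    """Cost, as a fraction of unit notional, of crossing half the spread plus commission.

    traded_notional is |Δposition| in units of notional; commission is a fraction
    (0.00005 = 0.5 bp). Both components scale with the traded amount.
    """
    mid = (bid + ask) / 2.0
    half_spread = (ask - bid) / 2.0
    return abs(traded_notional) * (half_spread / mid + commission)

entry = one_way_cost(3.9998, 4.0002, 1.0)  # half-spread 0.0002 on mid 4.0 → 0.5 bp, plus 0.5 bp commission
round_trip = 2 * entry
print(f"{entry * 1e4:.2f} bp one-way, {round_trip * 1e4:.2f} bp round trip")  # → 1.00 bp one-way, 2.00 bp round trip
```

At 2 bp per round trip, a breakout must capture more than that in log‑return terms before it adds to net P&L. The gap between gross and net results therefore grows with trade frequency.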

In [None]:

s1["pos_exec"] = s1["pos"].shift().fillna(0.0)

delta_pos = s1["pos_exec"].diff().fillna(s1["pos_exec"])
trade_side = np.sign(delta_pos)  # +1 buy, -1 sell, 0 none

mid = s1["mid"]
ret = np.log(mid).diff().fillna(0.0)
gross_dPnL = s1["pos_exec"] * ret

half_spread = (s1["ask"] - s1["bid"]) / 2.0
commission_bps = 0.00005  # 0.5 bps one-way
trade_cost = (np.abs(delta_pos) * (half_spread / mid)) + (commission_bps * (np.abs(np.sign(delta_pos))))
trade_cost = trade_cost.fillna(0.0)

net_dPnL = gross_dPnL - trade_cost

s1["gross_equity"] = gross_dPnL.cumsum()
s1["net_equity"]   = net_dPnL.cumsum()

s1[["time","gross_equity","net_equity"]].tail()


## 8. Metrics (Out‑of‑Sample)
OOS window: 2025‑09‑22 to 2025‑10‑06. Report Gross/Net P&L and annualized Sharpe.
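An alternative to per‑bar annualisation is to sum P&L per local trading day and annualise with √252. The result then no longer depends on how many 1s bars each session actually contains, since seconds without ticks are dropped after resampling. This is a sketch; `daily_sharpe` is a hypothetical helper. The OOS window holds only about ten trading days, so any Sharpe estimate there is very noisy.

```python
import numpy as np
import pandas as pd

def daily_sharpe(pnl: pd.Series, time: pd.Series, tz: str = "Europe/Warsaw",
                 days_per_year: int = 252) -> float:
    """Annualised Sharpe from per-bar P&L aggregated to local trading days."""
    day = time.dt.tz_convert(tz).dt.date
    daily = pnl.groupby(day.to_numpy()).sum()  # positional grouping, ignores index labels
    sd = daily.std(ddof=1)
    if len(daily) < 2 or not np.isfinite(sd) or sd == 0:
        return float("nan")
    return float(daily.mean() / sd * np.sqrt(days_per_year))

# In the notebook: daily_sharpe(oos["net_r"], oos["time"])
times = pd.Series(pd.to_datetime(
    ["2025-09-22 06:00", "2025-09-22 10:00",
     "2025-09-23 06:00", "2025-09-23 10:00",
     "2025-09-24 06:00"], utc=True))
pnl = pd.Series([0.5, 0.5, 1.5, 0.5, 3.0])  # daily totals 1, 2, 3 → mean 2, std 1
```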

In [21]:

OOS_START = pd.Timestamp('2025-09-22 00:00:00', tz='UTC')
OOS_END   = pd.Timestamp('2025-10-06 00:00:00', tz='UTC')

oos = s1[(s1["time"] >= OOS_START) & (s1["time"] < OOS_END)].copy()

oos["gross_r"] = oos["gross_equity"].diff().fillna(0.0)
oos["net_r"]   = oos["net_equity"].diff().fillna(0.0)

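# ~1s bars per year inside the 11-hour Warsaw session. Seconds without ticks are
# dropped after resampling, so per-bar annualisation is approximate.
BARS_PER_YEAR = 252 * 11 * 60 * 60

def ann_sharpe(r: pd.Series) -> float:
    mu = r.mean()
    sd = r.std(ddof=1)
    if sd == 0 or np.isnan(sd):
        return float("nan")
    return (mu / sd) * math.sqrt(BARS_PER_YEAR)

gross_pnl = float(oos["gross_r"].sum())
net_pnl   = float(oos["net_r"].sum())
gross_sr  = float(ann_sharpe(oos["gross_r"]))
net_sr    = float(ann_sharpe(oos["net_r"]))
# Count entries, exits and flips (sign changes), not every volatility-driven resize
trades    = int((np.sign(oos["pos_exec"]).diff().abs() > 0).sum())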

summary = {
    "OOS Gross PnL": gross_pnl,
    "OOS Net PnL": net_pnl,
    "OOS Gross Sharpe (ann)": gross_sr,
    "OOS Net Sharpe (ann)": net_sr,
    "Trades": trades
}
summary



Split, rolling CV, execution with bid/ask + commission + slippage

In [None]:
START = pd.Timestamp('2025-09-01 00:00:00', tz='UTC')
OOS_START = pd.Timestamp('2025-09-22 00:00:00', tz='UTC')
OOS_END   = pd.Timestamp('2025-10-06 00:00:00', tz='UTC')

ins = s1[(s1["time"] >= START) & (s1["time"] < OOS_START)].copy()
oos = s1[(s1["time"] >= OOS_START) & (s1["time"] < OOS_END)].copy()

def simulate(df, k_entry, k_exit, ema_span, commission=0.00005, slippage=0.00010):
    """Backtest the double-threshold breakout on one contiguous slice of s1.

    commission and slippage are one-way fractions of traded notional (0.5 bp, 1 bp).
    Returns (metrics dict, per-bar DataFrame).
    """
    ref = df["mid"].ewm(span=ema_span, adjust=False).mean()
    signal = np.log(df["mid"] / ref).to_numpy()  # log deviation, same units as sigma
    sigma  = df["sigma"].to_numpy()
    up_e, lo_e = k_entry*sigma, -k_entry*sigma
    up_x, lo_x = k_exit*sigma,  -k_exit*sigma

    pos = np.zeros(len(df)); size = np.zeros(len(df))
    N_year = 252*11*60*60; target_vol_ann = 0.10  # ~1s bars per year in the 11-hour session
    unit = lambda s: target_vol_ann/(s*np.sqrt(N_year) + 1e-12)

    for i in range(1, len(df)):
        sig_prev, p_prev = signal[i-1], pos[i-1]
        if p_prev == 0:
            pos[i] =  1 if sig_prev > up_e[i-1] else (-1 if sig_prev < lo_e[i-1] else 0)
        elif p_prev > 0:  # reversal tested before exit, otherwise the flip is unreachable
            pos[i] = -1 if sig_prev < lo_e[i-1] else (0 if sig_prev <= up_x[i-1] else 1)
        else:
            pos[i] =  1 if sig_prev > up_e[i-1] else (0 if sig_prev >= lo_x[i-1] else -1)
        size[i] = np.clip(unit(sigma[i-1]), 0, 10.0)

    df = df.copy()
    # pos*size is a NumPy array (no .shift); wrap it so the extra bar of latency applies
    df["pos_exec"] = pd.Series(pos*size, index=df.index).shift().fillna(0.0)

    delta = df["pos_exec"].diff().fillna(df["pos_exec"])
    mid = df["mid"]; ret = np.log(mid).diff().fillna(0.0)
    half_spread = (df["ask"] - df["bid"])/2.0
    # All cost components scale with traded notional |Δpos|
    trade_cost = (np.abs(delta)*(half_spread/mid + commission + slippage)).fillna(0.0)
    gross = df["pos_exec"]*ret; net = gross - trade_cost
    df["gross_eq"] = gross.cumsum(); df["net_eq"] = net.cumsum()

    sd = net.std(ddof=1)
    sharpe = float(net.mean()/sd*np.sqrt(N_year)) if sd > 0 else np.nan
    peak = df["net_eq"].cummax(); max_dd = float((df["net_eq"]-peak).min())
    in_pos = df["pos_exec"] != 0
    # Hit rate: share of in-position bars with positive net P&L (flat bars excluded)
    hit_rate = float((net[in_pos] > 0).mean()) if in_pos.any() else np.nan
    trades = int((np.sign(df["pos_exec"]).diff().abs() > 0).sum())  # entries, exits, flips

    return {"net_pnl": float(net.sum()), "sharpe_net": sharpe,
            "max_dd_net": max_dd, "trades": trades, "hit_rate": hit_rate}, df

# Rolling CV folds (3-day blocks; 6h validation tail)
folds = []
edges = pd.date_range(ins["time"].min()+pd.Timedelta(hours=6), ins["time"].max(), freq="3D")
prev = ins["time"].min()
for e in edges:
    tr = ins[(ins["time"]>=prev) & (ins["time"]<e - pd.Timedelta(hours=6))]
    va = ins[(ins["time"]>=e - pd.Timedelta(hours=6)) & (ins["time"]<e)]
    if len(tr)>2000 and len(va)>2000:
        folds.append((tr.index, va.index))
    prev = e

param_grid = {"k_entry":[1.5,2.0,2.5], "k_exit":[0.5,0.8,1.0], "ema_span":[120,300,600]}
rows = []
for ke in param_grid["k_entry"]:
  for kx in param_grid["k_exit"]:
    for sp in param_grid["ema_span"]:
      sc=[]
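      for tr_idx, va_idx in folds:
        # Simulate train+validation as one contiguous slice so the EMA and position state
        # are warmed up on the training bars (a separate train run would be discarded),
        # then score the validation bars only.
        _, sim = simulate(ins.loc[tr_idx.append(va_idx)], ke, kx, sp)
        inc = sim["net_eq"].diff().loc[va_idx]
        sd = inc.std(ddof=1)
        sc.append(inc.mean() / sd * np.sqrt(252*11*60*60) if sd > 0 else np.nan)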
      rows.append({"k_entry":ke,"k_exit":kx,"ema_span":sp,"cv_net_sharpe":np.nanmean(sc) if sc else np.nan})
cv = pd.DataFrame(rows).sort_values("cv_net_sharpe", ascending=False).reset_index(drop=True)
best = cv.iloc[0].to_dict(); best

Fix params and evaluate OOS + email bullets

In [None]:
best_ke, best_kx, best_sp = float(best["k_entry"]), float(best["k_exit"]), int(best["ema_span"])
metrics_is, _   = simulate(ins, best_ke, best_kx, best_sp)
metrics_oos, oos_sim = simulate(oos, best_ke, best_kx, best_sp)

import matplotlib.pyplot as plt
plt.figure(figsize=(10,4))
plt.plot(oos_sim["time"], oos_sim["gross_eq"], label="Gross")
plt.plot(oos_sim["time"], oos_sim["net_eq"],   label="Net")
plt.title("OOS Equity Curve (22.09 → 06.10)"); plt.legend(); plt.tight_layout(); plt.show()

print(f"""Email bullets:
- Strategy: EGARCH-adaptive double volatility breakout (k_entry={best_ke}, k_exit={best_kx}, EMA span={best_sp}), trading 07:00–18:00 Europe/Warsaw time.
- IS (net): Sharpe={metrics_is['sharpe_net']:.2f}, P&L={metrics_is['net_pnl']:.6f}, MaxDD={metrics_is['max_dd_net']:.6f}, Trades={metrics_is['trades']}, Hit rate={metrics_is['hit_rate']:.2%}
- OOS (22.09→06.10): Net P&L={metrics_oos['net_pnl']:.6f}, Net Sharpe={metrics_oos['sharpe_net']:.2f}, MaxDD={metrics_oos['max_dd_net']:.6f}, Trades={metrics_oos['trades']}, Hit rate={metrics_oos['hit_rate']:.2%}.
- Correctness: chronological split, rolling CV on IS, no look-ahead, bid/ask + commission + slippage included.
""")

## 9. Plot equity curve

In [None]:

import matplotlib.pyplot as plt

plt.figure(figsize=(10,4))
plt.plot(s1["time"], s1["gross_equity"], label="Gross")
plt.plot(s1["time"], s1["net_equity"], label="Net")
plt.axvspan(pd.Timestamp('2025-09-22 00:00:00', tz='UTC'),
            pd.Timestamp('2025-10-06 00:00:00', tz='UTC'), alpha=0.15)
plt.title("Equity Curve (shaded = OOS)")
plt.legend()
plt.tight_layout()
plt.show()


## 10. Email‑body bullets (auto‑fill after you run metrics)

In [None]:

print(f"""Strategy idea (bullets):
- Double volatility breakout on EUR/PLN ticks with EGARCH‑adaptive bands (entry k={k_entry}, exit k={k_exit}); reference = EMA.
- Trade only during Europe/Warsaw 07:00–18:00 to avoid thin liquidity.
- Position sizing scales inversely with forecast σ to target constant risk.
- Execution uses bid/ask; costs = half‑spread + 0.5 bps commission (configurable).

Out-of-sample results (2025‑09‑22 → 2025‑10‑06):
- Gross P&L: {gross_pnl:.6f}
- Net P&L:   {net_pnl:.6f}
- Gross Sharpe (ann): {gross_sr:.2f}
- Net Sharpe (ann):   {net_sr:.2f}

Notes on correctness & bias control:
- No look‑ahead: positions act on previous‑bar signals; OOS kept separate.
- Parameters ≤5; avoided excessive tuning; included realistic transaction costs.
""")
