#**TRADING OIL**
---

##0.REFERENCE

##1.CONTEXT

**Introduction**

This Colab notebook is a synthetic, mechanism-first laboratory for learning how to trade **oil through the futures curve**—not as a vague slogan, but as a disciplined set of ideas that connect physical reality, market structure, and tradable expressions. If you have ever heard that “contango is bearish,” or that “backwardation is bullish,” or that “roll yield matters more than spot,” this notebook is designed to turn those phrases into an operational understanding. The goal is not to predict the price of oil. The goal is to understand what oil futures are actually pricing, how the curve encodes constraints and incentives in the physical market, and how a controlled decision policy can trade that curve under explicit frictions and risk limits.

Oil is not just another asset. It is a commodity whose economics are inseparable from storage and logistics. In equities, it is easy to imagine a universal spot price and treat derivatives as an overlay. In oil, “spot” is itself a complex object: it is proxied by physical assessments, location-specific differentials, and the front portion of the futures strip. The futures curve, by contrast, is a clean market object: a set of quoted prices for maturities one month out, two months out, and so on. That curve is where the market expresses its balance sheet constraints: inventory tightness, storage capacity, financing costs, seasonal demand, refinery bottlenecks, and the value of immediate availability. Oil is the canonical setting where the curve is not a decoration—it is the market.

The notebook’s first job is to provide conceptual anchors: **spot**, **futures**, **carry**, **convenience yield**, **contango**, and **backwardation**. These are not definitions for their own sake; they are the language needed to translate the curve into economic meaning. In its simplest form, the relationship between spot and futures is governed by a cost-of-carry logic. Holding physical oil is costly: storage is not free, financing is not free, and operational constraints can become binding. But physical oil can also provide a benefit that financial holders do not receive directly: the benefit of having the commodity available when it is scarce, sometimes called **convenience yield**. When inventories are abundant and storage is easy, convenience yield tends to be low; the curve tends to slope upward (contango) because the market is effectively paying you to store oil over time—at least, paying you relative to an immediate sale. When inventories are tight, convenience yield can dominate; the curve can slope downward (backwardation) because immediate barrels are valuable and the market penalizes deferred delivery.
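
The cost-of-carry logic above can be sketched in a few lines. This is a minimal illustration, assuming a continuously compounded carry of `r + storage - conv_yield`; the function name `futures_price` and all parameter values are hypothetical and chosen only to show the sign of the slope.

```python
import math

def futures_price(spot, r, storage, conv_yield, tau):
    """Simplified cost-of-carry price for a maturity tau (in years)."""
    return spot * math.exp((r + storage - conv_yield) * tau)

# Illustrative values only (assumptions, not calibrated to any market).
spot, r, storage = 80.0, 0.045, 0.06
taus = [m / 12 for m in (1, 6, 12)]

# Ample inventory: low convenience yield, net carry positive, curve slopes up.
contango_curve = [futures_price(spot, r, storage, 0.02, t) for t in taus]
# Tight inventory: high convenience yield, net carry negative, curve slopes down.
backward_curve = [futures_price(spot, r, storage, 0.20, t) for t in taus]

print("contango:     ", [round(f, 2) for f in contango_curve])
print("backwardation:", [round(f, 2) for f in backward_curve])
```

The only thing that changes between the two curves is the convenience yield, which is the point: the regime is set by whether the benefit of holding barrels outweighs the cost of carrying them.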

Those statements are true at the level of economic intuition, but they become meaningful only when you can see them in a concrete curve and connect them to PnL. This notebook does that by building a synthetic market model with three core state variables: a **spot proxy**, an **inventory tightness proxy**, and a **convenience yield proxy** linked to inventory. From those states, it generates a full futures curve across multiple maturities. Because the curve is generated from interpretable components, you can run experiments: increase volatility, increase storage cost, change the inventory dynamics, amplify seasonality, and observe how the curve responds. This is the most important pedagogical advantage of synthetic modeling: the curve becomes a controlled instrument you can interrogate, not a black box you can only observe.

The notebook then introduces the real subject: **how to trade the curve**. Curve trading is not one strategy; it is a family of expressions that isolate different risk premia. A pure long or short in the front contract is primarily a spot-direction bet. A calendar spread (long the front month and short the second, or the reverse) tilts exposure toward roll and carry dynamics. A butterfly (long one unit of the first contract, short two of the second, long one of the third) expresses a view on curvature and localized dislocations. These are not academic toys. In oil, where term structure can be steep and unstable, the difference between being long the front outright and being long a front spread is the difference between bearing large spot risk and targeting a more interpretable mechanism: the evolution of contango/backwardation and the roll yield embedded in the strip.
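
A small sketch makes the difference between these expressions concrete. It assumes unit-sized positions over the first three contracts and two hypothetical curve moves; the dictionary `EXPRESSIONS` and the helper `pnl` are illustrative names, and the 1 : -2 : 1 butterfly weights follow the `FLY_L1_S2_L3` action defined later in the notebook.

```python
import numpy as np

# Hypothetical position vectors over contracts M1, M2, M3 (units of contracts).
EXPRESSIONS = {
    "long_front": np.array([+1.0, 0.0, 0.0]),
    "spread_1_2": np.array([+1.0, -1.0, 0.0]),   # long M1, short M2
    "fly_1_2_3": np.array([+1.0, -2.0, +1.0]),   # long M1, short 2x M2, long M3
}

def pnl(position, curve_move):
    """PnL per unit for a given vector of price changes."""
    return float(position @ curve_move)

parallel_shift = np.array([2.0, 2.0, 2.0])   # every contract up by 2
front_tightens = np.array([2.0, 1.0, 0.5])   # front rallies more than the back

for name, pos in EXPRESSIONS.items():
    print(f"{name:11s} parallel={pnl(pos, parallel_shift):+.2f} "
          f"front_tightens={pnl(pos, front_tightens):+.2f}")
```

Only the outright position earns anything from the parallel shift; the spread and the butterfly respond to changes in shape. In price terms the hedge is only approximate, because a proportional move in spot shifts each contract by a slightly different dollar amount.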

To make those expressions tradeable in a notebook, we must define an environment: an action space, a position representation, a PnL model, and frictions. The environment in this notebook is intentionally minimal but economically faithful in the dimensions that matter for learning. It tracks a portfolio, assigns transaction costs to changes in position, applies a leverage cap, and simulates daily mark-to-market. The goal is to capture the idea that curve trades are not free. You do not get to pivot between “long front spread” and “short front spread” every day without paying. Costs matter because curve edges are often thin relative to turnover. A notebook that ignores costs will teach you the wrong lesson: it will make high-frequency flip-flopping look intelligent rather than destructive.
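
The arithmetic behind "flip-flopping is destructive" is easy to sketch. The helper `flip_cost` is hypothetical; the inputs are assumptions chosen to mirror the defaults used later in the notebook's `EnvConfig` (0.75 bps fee plus 0.50 bps slippage, and legs sized at 25% of a 1,000,000 notional).

```python
def flip_cost(leg_notional, n_legs, cost_bps):
    """Cost of reversing a position: every leg is closed and reopened in the
    opposite direction, so the traded notional is twice the position notional."""
    traded_notional = 2 * n_legs * leg_notional
    return traded_notional * cost_bps / 10_000.0

leg_notional = 250_000.0   # assumed notional per spread leg
cost_bps = 1.25            # assumed all-in fee plus slippage, in basis points
account = 1_000_000.0      # assumed account size

per_flip = flip_cost(leg_notional, n_legs=2, cost_bps=cost_bps)
annual_drag = per_flip * 252   # one reversal per trading day for a year

print("cost per flip:", per_flip)
print("annual drag:", annual_drag, "=", round(100 * annual_drag / account, 2), "% of account")
```

A cost that looks negligible per trade becomes a drag of a few percent per year when the policy reverses daily, which is large relative to the edge a spread position typically targets.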

Once the market and environment exist, the notebook introduces an agentic policy layer. Here the word “agentic” is used carefully. The notebook does not ask a model to discover deep signals in raw data. It provides an explicit set of interpretable features—curve slope measures, a contango/backwardation score, a roll yield proxy, and state variables such as inventory tightness—and asks a policy to choose an action from a discrete menu of curve expressions. This is closer to how real systematic strategies are built than most people admit: the difficult part is often not computing features; it is deciding which trades to place, how often, and under what risk constraints. A rule-based baseline policy is included because it provides an indispensable control: without a baseline, you cannot tell whether any more sophisticated agent adds value. An optional LLM policy hook is included to demonstrate how a constrained language model could act as a policy function—choosing among actions and producing a rationale—while remaining bounded and auditable.

The notebook’s analytic layer is designed to teach interpretation. You are shown not just a final equity number but diagnostics: the spot path, the contango score over time, the equity curve, cumulative costs, and action counts. These are deliberately chosen. Spot reminds you that oil is volatile and that directionality can dominate outcomes. Contango score reminds you that curve regime is not static and that regime shifts matter. Equity and drawdown remind you that strategies must survive. Costs remind you that trading too often is a form of self-harm in disguise. Action counts reveal whether your policy is stable or indecisive. Together, these diagnostics transform the notebook from a story into an experiment.

A final reason this notebook matters is that oil curve trading is one of the cleanest settings to learn the general principle that applies to many alternative markets: **a large portion of returns can come from structure, not prediction**. In equities, most participants begin with directional views. In commodities, and especially in oil, direction is only one component; the curve itself is a living object shaped by physical constraints. Learning to trade oil through the curve is therefore a way to learn a professional habit: separate what you think you are trading (a narrative about supply and demand) from what you are actually trading (a set of forward prices and their relative shape), and then measure your outcomes in a model that accounts for costs and constraints.

This notebook is deliberately not a production system. It is a training harness. It is meant to be modified, stressed, and iterated. Change the storage parameter and watch contango deepen. Change the inventory mean reversion and watch regime persistence increase. Increase jumps and watch how a policy behaves when the curve becomes erratic. Increase transaction costs and watch whether the policy learns to trade less. Replace the LLM policy with your own factor-based policy and compare results. In doing so, you will learn the core conceptual objective: contango and backwardation are not labels; they are economic regimes that determine whether rolling a position is a tailwind or a headwind, and curve trading is the art of structuring exposure so that your PnL is coming from the mechanism you intended to target.

That is the notebook’s purpose: to give you a controlled environment in which you can understand oil futures as a curve, understand carry as a source of return, and understand policy choice as a trade-off between opportunity and fragility. If you finish this notebook able to explain—without handwaving—why backwardation can generate positive roll yield for a long front position, why contango can punish long-only exposure even when spot is flat, and why disciplined trading frequency matters as much as signal quality, then the notebook has done its job.


##2.LIBRARIES AND ENVIRONMENT

In [12]:
# CELL 2 — PATCHED: install + OpenAI client init (Colab-safe)
# Uses: from google.colab import userdata ; userdata.get("OPENAI_API_KEY")

!pip -q install -U openai

import os, json
import numpy as np
import pandas as pd
import math
from dataclasses import dataclass
import matplotlib.pyplot as plt

np.random.seed(7)

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

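def stable_hash(obj) -> str:
    # Built-in hash() is salted per process for str/bytes, so it changes between
    # runs; a hashlib digest is reproducible.
    import hashlib
    s = str(obj).encode("utf-8")
    return hashlib.sha256(s).hexdigest()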

# --- OpenAI client (only used if USE_LLM=True later) ---
OPENAI_MODEL = "gpt-4o-mini"   # cheap + fast; change to "gpt-4o" if you want

client = None
try:
    from google.colab import userdata
    key = userdata.get("OPENAI_API_KEY")
    if key:
        os.environ["OPENAI_API_KEY"] = key
except Exception:
    pass

try:
    from openai import OpenAI
    if os.getenv("OPENAI_API_KEY"):
        client = OpenAI()
except Exception as e:
    client = None
    print("OpenAI client init failed:", e)

print("Setup complete. OpenAI client ready:", client is not None)


Setup complete. OpenAI client ready: True


##3.SYNTHETIC OIL MARKET MODEL

###3.1.OVERVIEW

**Cell 3 — Synthetic oil market model: spot, inventory tightness, and curve construction**

Cell 3 constructs the synthetic market that the rest of the notebook trades. Its purpose is to generate a plausible oil-like environment where the futures curve is not an arbitrary shape but an interpretable consequence of state variables. The cell defines two evolving states: a spot proxy and an inventory tightness proxy. Spot is modeled as a mean-reverting process with stochastic shocks and occasional jumps. Mean reversion is important pedagogically because it prevents the simulation from drifting into unrealistic levels and creates regimes where spot can wander but tends to return toward a long-run level. Jumps exist to mimic the discrete shock character of oil: geopolitical events, supply outages, policy announcements, and sudden demand collapses can move oil in ways that are not well approximated by small Gaussian steps.

The second state, inventory tightness, is represented as a normalized inventory deviation (a z-score-like variable, `inv_z` in the code) that also mean-reverts but experiences shocks. Note the sign convention: positive values mean inventories above their long-run level (a loose market), and negative values mean tight inventories. This is a compact way to encode an economic driver of term structure: when inventory is high, physical scarcity is low; when inventory is low, immediate availability becomes valuable. The cell then links inventory to a convenience yield proxy. Convenience yield is the conceptual bridge between physical reality and forward pricing: it represents the benefit of holding the physical commodity when it is scarce. In the model, convenience yield falls linearly as inventory rises and rises as inventory tightens, clamped to the range [-0.10, 0.25] to prevent pathological values.

Given spot, interest rates, storage costs, and convenience yield, the cell generates a full futures curve through a simplified cost-of-carry relationship. A seasonal factor is included to reflect the idea that oil markets are not time-homogeneous: demand patterns, refinery schedules, and seasonal consumption can tilt the curve. The output of the cell is a snapshot object that contains spot, inventory state, convenience yield, and the vector of futures prices. Conceptually, this cell defines the “physics” of the world: it determines when contango is likely, when backwardation is likely, and how persistent each regime tends to be. Everything downstream—features, trades, PnL—depends on the quality and interpretability of this synthetic curve generator.
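
For reference, the model described above can be written compactly. The notation is an assumption introduced here to match the code's parameter names ($\kappa_S$ for `spot_kappa`, $u$ for `storage`, $a, b$ for `cy_a`, `cy_b`, $A$ for `seasonal_amp`); the $\varepsilon$ terms are independent standard normal draws and $J_t$ is a jump that equals $\sigma_J S_t \varepsilon^J_t$ with probability `jump_prob` and zero otherwise.

$$
\begin{aligned}
I_{t+1} &= I_t + \kappa_I(\mu_I - I_t)\,\Delta t + \sigma_I\sqrt{\Delta t}\,\varepsilon^I_t \\
S_{t+1} &= \max\!\big(1,\; S_t + \kappa_S(\mu_S - S_t)\,\Delta t + \sigma_S S_t\sqrt{\Delta t}\,\varepsilon^S_t + J_t\big) \\
CY_t &= \operatorname{clip}\big(a - b\,I_t,\; -0.10,\; 0.25\big) \\
F(t,\tau_m) &= S_t\, e^{(r + u - CY_t)\,\tau_m}\,\big(1 + A\sin(2\pi(\tau_m + 0.15))\big), \qquad \tau_m = m/12
\end{aligned}
$$

As written in the code, the seasonal term depends only on the maturity $\tau_m$ and not on calendar time $t$, so it acts as a fixed modulation of the curve's shape rather than a pattern that moves through the year.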


###3.2.CODE AND IMPLEMENTATION

In [2]:
# CELL 3 — Synthetic oil market model: spot + inventory + regime-driven curve
# We generate:
#  - spot S_t (mean-reverting + shocks)
#  - inventory state I_t (mean-reverting + shocks)
#  - convenience yield proxy CY_t = a - b * I_t (tight inventories => high CY)
# Then futures curve via a simplified cost-of-carry:
#  F(t,T) = S_t * exp( (r + storage - CY_t) * tau ) * seasonal_factor(tau)

@dataclass
class OilMarketConfig:
    M: int = 12                  # months on curve
    dt: float = 1/252            # daily step
    r: float = 0.045             # short rate proxy
    storage: float = 0.06        # storage/financing proxy
    spot_mu: float = 80.0        # long-run spot
    spot_kappa: float = 1.5      # mean reversion speed
    spot_sigma: float = 0.35     # annualized vol
    inv_mu: float = 0.0          # long-run inventory deviation (z)
    inv_kappa: float = 1.0
    inv_sigma: float = 0.40
    cy_a: float = 0.08           # base convenience yield
    cy_b: float = 0.10           # sensitivity to inventory (higher inventory => lower CY)
    seasonal_amp: float = 0.05   # seasonal term structure modulation
    jump_prob: float = 0.01
    jump_sigma: float = 0.05     # jump magnitude on spot

class SyntheticOilMarket:
    def __init__(self, cfg: OilMarketConfig):
        self.cfg = cfg
        self.t = 0
        self.S = cfg.spot_mu
        self.I = cfg.inv_mu

    def step(self):
        c = self.cfg
        dt = c.dt

        # inventory z-score dynamics
        dI = c.inv_kappa*(c.inv_mu - self.I)*dt + c.inv_sigma*math.sqrt(dt)*np.random.randn()
        self.I += dI

        # spot dynamics (mean reverting on level)
        dS = c.spot_kappa*(c.spot_mu - self.S)*dt + c.spot_sigma*self.S*math.sqrt(dt)*np.random.randn()

        # occasional jump (geopolitics / outages proxy)
        if np.random.rand() < c.jump_prob:
            dS += c.jump_sigma*self.S*np.random.randn()

        self.S = max(1.0, self.S + dS)
        self.t += 1

        return self.snapshot()

    def curve(self):
        c = self.cfg
        # convenience yield proxy: tight inv => high CY => backwardation tendency
        CY = c.cy_a - c.cy_b*self.I
        CY = clamp(CY, -0.10, 0.25)

        F = []
        for m in range(1, c.M+1):
            tau = m/12.0
            seasonal = 1.0 + c.seasonal_amp*math.sin(2*math.pi*(tau + 0.15))
            carry = (c.r + c.storage - CY)
            Fm = self.S * math.exp(carry * tau) * seasonal
            F.append(Fm)
        return np.array(F), CY

    def snapshot(self):
        F, CY = self.curve()
        return {
            "t": self.t,
            "spot": float(self.S),
            "inv_z": float(self.I),
            "convenience_yield": float(CY),
            "futures": F
        }

cfg = OilMarketConfig()
mkt = SyntheticOilMarket(cfg)

snap = mkt.snapshot()
print("Initial spot:", round(snap["spot"],2), "| CY:", round(snap["convenience_yield"],4))
print("Front 4 futures:", [round(float(x), 2) for x in snap["futures"][:4]])


Initial spot: 80.0 | CY: 0.08
Front 4 futures: [84.15, 84.0, 82.87, 81.09]


##4.CURVE DIAGNOSTICS

###4.1.OVERVIEW

**Cell 4 — Curve diagnostics: contango score, slope metrics, and roll-yield proxies**

Cell 4 is the interpretability layer between raw curves and trading decisions. The futures curve is a vector of prices, but a policy cannot reason effectively about a vector without compressing it into features that have economic meaning. This cell computes a small set of curve features that map directly to the vocabulary traders use: slope and roll. It measures the slope between the first and sixth contract and between the first and twelfth contract. These slopes are expressed as relative changes, so they are comparable across different price levels. A positive slope implies the curve rises with maturity, consistent with contango; a negative slope implies the curve declines with maturity, consistent with backwardation. Because the curve can be noisy, the cell also builds a single summary statistic called a contango score: a weighted blend of short and longer slope measures. This score is intended as a regime indicator rather than a precise pricing theorem. It answers a practical question: “Is the curve clearly upward, clearly downward, or near flat?”

The cell also computes a roll-yield proxy for holding a front contract and rolling forward. The simplest intuition is captured by comparing the first and second contracts. If the second contract is higher than the first, a long front-month position that must roll forward will tend to lose as it sells the cheaper expiring contract and buys the more expensive next one. If the second contract is lower than the first, the long roll tends to benefit. The cell expresses this as a simple percentage measure, and it also provides the symmetric short-side proxy.
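
A worked example under a deliberately strong assumption, that the curve's shape does not change over the holding period, shows why the sign of `F1 - F2` matters. The prices are hypothetical and the helper `static_curve_rolldown_pnl` is an illustrative name; `roll_yield_long` uses the same formula as the notebook's proxy.

```python
def roll_yield_long(f1, f2):
    """Roll-yield proxy for a long front position: (F1 - F2) / F1."""
    return (f1 - f2) / f1

def static_curve_rolldown_pnl(f1, f2):
    """PnL per unit from buying the second contract and holding it until it
    becomes the front contract, assuming the curve shape stays fixed."""
    return f1 - f2

backwardated = (82.0, 80.0)   # hypothetical (F1, F2): front above second
contango = (80.0, 82.0)       # hypothetical (F1, F2): front below second

for label, (f1, f2) in (("backwardation", backwardated), ("contango", contango)):
    print(label, round(roll_yield_long(f1, f2), 4), static_curve_rolldown_pnl(f1, f2))
```

With spot flat, the long position gains in backwardation because the contract it bought at the lower deferred price drifts up toward the front price, and loses in contango for the mirror-image reason.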

These features are not meant to be perfect representations of the full carry decomposition of commodity returns. They are meant to be didactic objects that translate curve shape into directionally correct incentives. Their primary value is that they allow policies—rule-based or LLM-based—to reason about curve regime without requiring access to the entire curve vector. In a well-structured notebook, this cell becomes the stable interface between market generation and decision-making: the market can become more complex, but the features remain interpretable and comparable, which supports experimentation and learning.


###4.2.CODE AND IMPLEMENTATION

In [4]:
# CELL 4 — Curve diagnostics: contango/backwardation, slope, roll yield proxies

@dataclass
class CurveFeatures:
    slope_1_6: float
    slope_1_12: float
    contango_score: float
    roll_yield_long_front: float
    roll_yield_short_front: float

def compute_curve_features(spot: float, F: np.ndarray) -> CurveFeatures:
    # relative slopes as fractions of F1 (0.05 means 5%); F[-1] is the 12th contract when M = 12
    f1, f6, f12 = F[0], F[5], F[-1]
    slope_1_6 = (f6 - f1) / f1
    slope_1_12 = (f12 - f1) / f1

    # contango_score: positive => contango, negative => backwardation
    contango_score = 0.7*slope_1_6 + 0.3*slope_1_12

    # roll yield proxy for holding 1M long and rolling to 2M:
    # approx: (F1 - F2)/F1 (if F2 > F1 => negative roll)
    roll_long = (F[0] - F[1]) / F[0]
    roll_short = -roll_long

    return CurveFeatures(
        slope_1_6=float(slope_1_6),
        slope_1_12=float(slope_1_12),
        contango_score=float(contango_score),
        roll_yield_long_front=float(roll_long),
        roll_yield_short_front=float(roll_short),
    )

feat = compute_curve_features(snap["spot"], snap["futures"])
print("slope(1->6):", round(feat.slope_1_6*100,2), "% | slope(1->12):", round(feat.slope_1_12*100,2), "%")
print("contango_score:", round(feat.contango_score*100,2), "% | roll_long_front:", round(feat.roll_yield_long_front*100,2), "%")


slope(1->6): -7.63 % | slope(1->12): 1.41 %
contango_score: -4.92 % | roll_long_front: 0.18 %


##5.TRADING ENVIRONMENT

###5.1.CONTEXT

**Cell 5 — Trading environment: actions as curve expressions, frictions, and portfolio dynamics**

Cell 5 defines the trading world that turns curve intuition into measurable outcomes. It introduces an explicit action space consisting of canonical curve expressions. This is the heart of the didactic design: instead of letting a policy invent arbitrary positions, it can choose among a small set of interpretable trades. These include staying flat, taking directional exposure in the front contract, taking a calendar spread that is long the front and short the second (a carry/roll expression aligned with backwardation), taking the inverse spread aligned with contango carry, and taking a butterfly that expresses curvature. Each action maps deterministically to a target position vector across maturities. This makes the environment auditable: the user can always understand what risk is being taken.

The environment then applies key constraints and frictions. It enforces a maximum gross leverage cap: if a target position would exceed allowed leverage relative to equity, it is scaled down. It applies transaction costs whenever positions change. Costs are modeled as a proportional fee plus a slippage proxy per leg, applied to the traded notional. This is essential because curve strategies often rely on relatively small expected edges; without costs, a policy that trades too frequently will look artificially good. The environment marks positions to market each step using the change in futures prices from pre-step to post-step. While simplified relative to real futures margining and settlement, this mark-to-market framework captures the key learning objective: PnL is generated by price changes in the instruments you hold, and turning over those holdings costs money.
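
The leverage cap can be illustrated in isolation. This is a sketch of the proportional scaling described above, not the environment's code; `apply_leverage_cap` is a hypothetical helper and the oversized butterfly is an assumed input chosen so that the cap binds.

```python
import numpy as np

def apply_leverage_cap(target, prices, equity, max_gross_leverage):
    """Scale a target position down proportionally if its gross notional
    exceeds max_gross_leverage * equity; otherwise return it unchanged."""
    gross = float(np.sum(np.abs(target) * prices))
    cap = max_gross_leverage * equity
    if gross <= cap:
        return target
    return target * (cap / gross)

prices = np.array([80.0, 81.0, 82.0])
target = np.array([20_000.0, -40_000.0, 20_000.0])   # hypothetical oversized butterfly

capped = apply_leverage_cap(target, prices, equity=1_000_000.0, max_gross_leverage=2.0)
print("gross before:", float(np.sum(np.abs(target) * prices)))
print("gross after: ", float(np.sum(np.abs(capped) * prices)))
print("leg ratios preserved:", capped / capped[0])
```

Scaling every leg by the same factor keeps the trade's shape (here 1 : -2 : 1) intact, so the cap reduces size without turning a butterfly into something else.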

The environment also includes a simple risk penalty term that discourages extreme gross exposure and extreme curvature exposure. The penalty enters only the reward returned to the policy; it is not deducted from equity, so the recorded equity curve reflects price PnL and costs alone. This is not a full risk model; it is a pedagogical regularizer that nudges policies away from pathological behavior. Finally, the environment records a history table containing actions, regime features, costs, and equity. That history is the raw material for diagnostics later. This cell is where the notebook becomes a true laboratory: it converts a policy’s decisions into a time series of outcomes under explicit rules, which can then be plotted, critiqued, and improved.


###5.2.CODE AND IMPLEMENTATION

In [6]:
# CELL 5 — PATCHED (self-contained): Trading environment with a local fallback for compute_curve_features

ACTIONS = ["FLAT","LONG_F1","SHORT_F1","SPREAD_L1_S2","SPREAD_S1_L2","FLY_L1_S2_L3"]

from dataclasses import dataclass
import numpy as np

# ---- Fallback: define compute_curve_features here if it doesn't already exist ----
try:
    compute_curve_features  # noqa: F821
except NameError:
    @dataclass
    class CurveFeatures:
        slope_1_6: float
        slope_1_12: float
        contango_score: float
        roll_yield_long_front: float
        roll_yield_short_front: float

    def compute_curve_features(spot: float, F: np.ndarray) -> CurveFeatures:
        f1, f6, f12 = F[0], F[5], F[-1]
        slope_1_6 = (f6 - f1) / (f1 + 1e-12)
        slope_1_12 = (f12 - f1) / (f1 + 1e-12)
        contango_score = 0.7*slope_1_6 + 0.3*slope_1_12
        roll_long = (F[0] - F[1]) / (F[0] + 1e-12)
        roll_short = -roll_long
        return CurveFeatures(
            slope_1_6=float(slope_1_6),
            slope_1_12=float(slope_1_12),
            contango_score=float(contango_score),
            roll_yield_long_front=float(roll_long),
            roll_yield_short_front=float(roll_short),
        )

@dataclass
class EnvConfig:
    notional: float = 1_000_000.0
    max_gross_leverage: float = 2.0
    fee_bps: float = 0.75
    slippage_bps: float = 0.50
    risk_aversion: float = 0.5

class OilCurveEnv:
    def __init__(self, market, env_cfg: EnvConfig):
        self.market = market
        self.ecfg = env_cfg
        self.reset()

    def reset(self):
        self.pnl = 0.0
        self.equity = self.ecfg.notional
        self.position = np.zeros(self.market.cfg.M)
        self.history = []
        return self._obs()

    def _obs(self):
        snap = self.market.snapshot()
        F = snap["futures"]
        feat = compute_curve_features(snap["spot"], F)
        return {
            "t": snap["t"],
            "spot": snap["spot"],
            "inv_z": snap["inv_z"],
            "CY": snap["convenience_yield"],
            "F": F.copy(),
            "features": feat,
            "equity": self.equity,
            "position": self.position.copy()
        }

    def _action_to_target_position(self, action: str):
        M = self.market.cfg.M
        p = np.zeros(M)

        snap = self.market.snapshot()
        F = snap["futures"]
        scale = self.ecfg.notional / (F[0] + 1e-12)
        scale *= 0.25

        if action == "FLAT":
            return p
        if action == "LONG_F1":
            p[0] = +1.0 * scale
        elif action == "SHORT_F1":
            p[0] = -1.0 * scale
        elif action == "SPREAD_L1_S2":
            p[0] = +1.0 * scale
            p[1] = -1.0 * scale
        elif action == "SPREAD_S1_L2":
            p[0] = -1.0 * scale
            p[1] = +1.0 * scale
        elif action == "FLY_L1_S2_L3":
            p[0] = +1.0 * scale
            p[1] = -2.0 * scale
            p[2] = +1.0 * scale
        return p

    def _cost(self, delta_pos: np.ndarray):
        snap = self.market.snapshot()
        F = snap["futures"]
        traded_notional = float(np.sum(np.abs(delta_pos) * F))
        bps = (self.ecfg.fee_bps + self.ecfg.slippage_bps) / 10_000.0
        return traded_notional * bps

    def _risk_penalty(self, pos: np.ndarray):
        snap = self.market.snapshot()
        F = snap["futures"]
        gross = float(np.sum(np.abs(pos) * F))
        curv = float(np.sum(np.abs(np.diff(pos, n=2))) * np.mean(F[:3]))
        return 1e-7 * gross + 1e-7 * curv

    def step(self, action: str):
        pre = self.market.snapshot()
        F_pre = pre["futures"]

        target = self._action_to_target_position(action)

        gross = float(np.sum(np.abs(target) * F_pre))
        if gross > self.ecfg.max_gross_leverage * self.equity:
            target = target * (self.ecfg.max_gross_leverage * self.equity / (gross + 1e-12))

        delta = target - self.position
        costs = self._cost(delta)

        post = self.market.step()
        F_post = post["futures"]

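        # PnL accrues on the position carried into this step; the new target is booked
        # afterwards, so an action first earns PnL on the following step (one-step lag).
        pnl_step = float(np.sum(self.position * (F_post - F_pre)))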

        self.pnl += pnl_step - costs
        self.equity = self.ecfg.notional + self.pnl
        self.position = target

        reward = (pnl_step - costs) - self.ecfg.risk_aversion*self._risk_penalty(target)*self.equity

        feat = compute_curve_features(post["spot"], F_post)
        self.history.append({
            "t": post["t"],
            "action": action,
            "spot": post["spot"],
            "CY": post["convenience_yield"],
            "slope_1_6": feat.slope_1_6,
            "contango_score": feat.contango_score,
            "roll_long_front": feat.roll_yield_long_front,
            "pnl_step": pnl_step,
            "costs": costs,
            "equity": self.equity
        })

        done = self.equity <= 0.5*self.ecfg.notional
        return self._obs(), reward, done, {"pnl_step": pnl_step, "costs": costs, "equity": self.equity}

# Recreate env
env = OilCurveEnv(mkt, EnvConfig())
obs = env.reset()
print("Env ready. Actions:", ACTIONS)


Env ready. Actions: ['FLAT', 'LONG_F1', 'SHORT_F1', 'SPREAD_L1_S2', 'SPREAD_S1_L2', 'FLY_L1_S2_L3']


##6.BASELINE POLICIES

###6.1.OVERVIEW

**Cell 6 — Baseline rule agent: regime-to-trade mapping with risk-off behavior**

Cell 6 introduces a baseline decision policy that translates curve regime into actions. This baseline is critical because it provides a reference point for any “agentic” extension. If an LLM policy or a more complex heuristic cannot beat a simple baseline in a controlled synthetic environment, it is unlikely to add value in a more realistic setting. The baseline agent reads the contango score from the curve features and applies a threshold rule. When the score is meaningfully positive, it interprets the regime as contango and prefers the spread expression that is short the front contract and long the second. The intuition is that contango makes long-front carry unattractive; the inverse spread aligns exposure with the direction of the carry incentive. When the score is meaningfully negative, it interprets backwardation and chooses the long-front spread that benefits from positive roll characteristics. When the score is near zero, it chooses to stay flat, reflecting the idea that uncertain regimes often do not compensate for costs.

The cell also adds a simple risk-off trigger based on drawdown from peak equity. If drawdown exceeds a threshold, the agent goes flat regardless of curve regime. This is a crude but important teaching mechanism: it demonstrates that policy is not merely “read the curve and trade.” Policy includes survival behavior. Professional strategies often have regime filters, stop-loss structures, or volatility scaling that reduce exposure when the strategy is under stress. The baseline agent’s drawdown trigger captures that idea in a transparent way.
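
The drawdown calculation behind the trigger can be sketched on a made-up equity path. The function `drawdown_series` and the path itself are hypothetical; the 3% threshold matches the `risk_off_dd` default used in the code below.

```python
import numpy as np

def drawdown_series(equity):
    """Fractional drawdown from the running peak at each step."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    return (running_peak - equity) / running_peak

# Hypothetical equity path, for illustration only.
equity_path = [1_000_000, 1_010_000, 990_000, 975_000, 1_000_000]

dd = drawdown_series(equity_path)
risk_off = dd > 0.03   # same 3% threshold as the baseline policy

print(np.round(dd, 4))
print(risk_off)
```

One consequence worth checking when you run the baseline: because the peak is never reset and a flat book earns nothing, an agent that goes flat on this trigger has no way to recover equity, so under this simple rule the risk-off state can persist for the rest of the run.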

Finally, the baseline agent returns not only an action but also a small explanation object describing the mode of operation and current drawdown. This matters for interpretability: it allows you to trace behavior back to policy logic. By inspecting the sequence of these explanations alongside the action counts and performance metrics, you can diagnose whether the agent is behaving sensibly, whether the threshold is too tight or too loose, and whether risk-off triggers are reducing catastrophic outcomes or unnecessarily suppressing exposure. This cell therefore establishes the minimal policy logic that connects contango/backwardation to trade expression.


###6.2.CODE AND IMPLEMENTATION

In [7]:
# CELL 6 — Baseline policy agent (rule-based): contango/backwardation + risk-aware sizing via action choice
# Policy idea:
#  - if curve in backwardation (contango_score < -threshold): prefer SPREAD_L1_S2 (positive roll for long)
#  - if curve in contango (contango_score > +threshold): prefer SPREAD_S1_L2 (benefit from short front carry)
#  - if near flat: stay FLAT
#  - (not implemented below) occasional directional trades when CY and inventory signals suggest a trend

@dataclass
class PolicyConfig:
    thresh: float = 0.01      # 1% contango_score threshold
    risk_off_dd: float = 0.03 # if equity drawdown exceeds 3%, go FLAT
    max_trades_per_window: int = 1  # declared but not read by RuleAgent.decide below

class RuleAgent:
    def __init__(self, pcfg: PolicyConfig, notional: float):
        self.pcfg = pcfg
        self.notional = notional
        self.peak_equity = notional

    def decide(self, obs):
        eq = obs["equity"]
        self.peak_equity = max(self.peak_equity, eq)
        dd = (self.peak_equity - eq) / self.peak_equity

        feat = obs["features"]
        cs = feat.contango_score

        if dd > self.pcfg.risk_off_dd:
            return "FLAT", {"mode":"risk_off", "dd":dd, "contango_score":cs}

        if cs > self.pcfg.thresh:
            return "SPREAD_S1_L2", {"mode":"contango_carry_short_front", "dd":dd, "contango_score":cs}
        if cs < -self.pcfg.thresh:
            return "SPREAD_L1_S2", {"mode":"backwardation_carry_long_front", "dd":dd, "contango_score":cs}

        return "FLAT", {"mode":"no_edge", "dd":dd, "contango_score":cs}

agent = RuleAgent(PolicyConfig(), notional=env.ecfg.notional)

a, info = agent.decide(obs)
print("Sample decision:", a, "|", info)


Sample decision: SPREAD_L1_S2 | {'mode': 'backwardation_carry_long_front', 'dd': 0.0, 'contango_score': -0.04919136752689759}


##7.OPTIONAL LLM HOOK

###7.1.OVERVIEW

**Cell 7 — Optional LLM agent hook: constrained policy selection and auditability**

Cell 7 introduces an optional language-model-driven decision layer. Its purpose is not to replace the market model or feature engineering, but to demonstrate a governance-friendly way to embed an LLM as a bounded policy. The cell defines a switch that enables or disables the LLM. When enabled, the policy function constructs a compact state description from the observation: spot level, inventory tightness proxy, convenience yield proxy, equity, current position, and the curve features that summarize regime. Crucially, it also includes the action space explicitly, so the model is instructed to choose only from a predefined menu. This constraint is the central design principle: the model is not permitted to invent trades, alter parameters, or call tools; it can only select among allowed curve expressions.

In a robust implementation, this cell also includes explicit instructions to return structured JSON with a single chosen action, a brief rationale, and a confidence measure. The rationale is not treated as truth; it is treated as an interpretability artifact. The confidence is not calibrated; it is treated as a coarse indicator the user can later analyze for consistency. The cell is also the right place to attach logging: raw model output, token usage, and any parse errors should be captured so the user can verify that the model is actually being called and is behaving within constraints. In a didactic notebook, this logging is more important than “smartness,” because it lets the user distinguish real model behavior from silent fallbacks.

The optional nature of the LLM is also pedagogically important. The notebook remains runnable and educational without external keys, and the rule baseline remains the reference. Enabling the LLM becomes an experiment: does it trade less or more than the baseline, does it react appropriately to drawdown, does it overfit to superficial signals, does it invent rationales for marginal cases, and does it respect the principle of staying flat when uncertain? This cell therefore frames the LLM not as a magic box but as a policy variant that can be tested and critiqued under the same environment and diagnostics as any other policy.
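
The bounded-policy idea can be sketched independently of any model client. The snippet below takes a raw text reply and always returns an action from a fixed menu, recording why a fallback happened. The name `parse_llm_choice` and the `ALLOWED` tuple are assumptions for illustration; `ALLOWED` copies the action names listed in the system prompt and is assumed to match the notebook's ACTIONS.

```python
import json

# Assumed to mirror the notebook's ACTIONS (names taken from the system prompt)
ALLOWED = ("FLAT", "LONG_F1", "SHORT_F1", "SPREAD_L1_S2", "SPREAD_S1_L2", "FLY_L1_S2_L3")

def parse_llm_choice(raw_text, allowed=ALLOWED, default="FLAT"):
    """Return (action, audit); action is always a member of `allowed`."""
    audit = {"raw": raw_text, "fallback_reason": None}
    try:
        obj = json.loads(raw_text)
    except (TypeError, ValueError) as exc:  # JSONDecodeError subclasses ValueError
        audit["fallback_reason"] = f"parse_error: {exc}"
        return default, audit
    if not isinstance(obj, dict):
        audit["fallback_reason"] = "not_a_json_object"
        return default, audit
    action = obj.get("action")
    if action not in allowed:
        audit["fallback_reason"] = f"action_not_allowed: {action!r}"
        return default, audit
    # Rationale and confidence are stored as audit artifacts, not trusted inputs
    audit["rationale"] = obj.get("rationale")
    audit["confidence"] = obj.get("confidence")
    return action, audit

print(parse_llm_choice('{"action": "SPREAD_S1_L2", "rationale": "contango", "confidence": 0.7}'))
print(parse_llm_choice('{"action": "BUY_EVERYTHING"}'))
print(parse_llm_choice("not json"))
```

Keeping the fallback reason in the audit record is what lets a reader tell a deliberate FLAT from a FLAT produced by a rejected or unparseable reply.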


###7.2.CODE AND IMPLEMENTATION

In [13]:
# CELL 7 — PATCHED: real LLM agent (JSON-only action selection)

USE_LLM = True  # True: call the LLM; False: use the rule-based baseline

def _obs_to_llm_state(obs):
    feat = obs["features"]
    # keep it compact: model should not need full curve
    return {
        "t": int(obs["t"]),
        "spot": float(obs["spot"]),
        "CY": float(obs["CY"]),
        "inv_z": float(obs["inv_z"]),
        "equity": float(obs["equity"]),
        "position_front3": [float(x) for x in obs["position"][:3]],
        "curve_features": {
            "slope_1_6": float(feat.slope_1_6),
            "slope_1_12": float(feat.slope_1_12),
            "contango_score": float(feat.contango_score),
            "roll_long_front": float(feat.roll_yield_long_front),
        },
        "action_space": ACTIONS
    }

SYSTEM_PROMPT = """
You are an oil futures curve trading policy inside a synthetic simulator.

Choose exactly ONE action from the provided action_space:
- FLAT
- LONG_F1
- SHORT_F1
- SPREAD_L1_S2   (long 1M, short 2M)  -> prefers backwardation / positive roll for long-front
- SPREAD_S1_L2   (short 1M, long 2M)  -> prefers contango / carry from short-front
- FLY_L1_S2_L3   (butterfly) -> shape bet (use sparingly)

Rules:
- If contango_score is strongly positive, prefer SPREAD_S1_L2.
- If contango_score is strongly negative, prefer SPREAD_L1_S2.
- If near zero, prefer FLAT (avoid churn).
- If equity is falling or risk seems elevated, prefer FLAT.
- Avoid directional LONG_F1/SHORT_F1 unless you have a clear reason based on CY/inv_z.

Return JSON only with keys:
action (string), rationale (string <= 80 words), confidence (0..1).
"""

def llm_decide_action(obs):
    # fallback if no key/client
    if client is None:
        feat = obs["features"]
        if feat.contango_score > 0.015:
            return "SPREAD_S1_L2", {"llm":"no_client_fallback_contango"}
        if feat.contango_score < -0.015:
            return "SPREAD_L1_S2", {"llm":"no_client_fallback_backwardation"}
        return "FLAT", {"llm":"no_client_fallback_flat"}

    state = _obs_to_llm_state(obs)

    resp = client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {"role":"system", "content": SYSTEM_PROMPT.strip()},
            {"role":"user", "content": json.dumps(state)}
        ],
        response_format={"type":"json_object"},
        temperature=0.2,
    )

    txt = resp.choices[0].message.content
    usage = getattr(resp, "usage", None)
    meta = {
        "raw_json": txt,
        "model": OPENAI_MODEL,
        "usage": {
            "prompt_tokens": getattr(usage, "prompt_tokens", None),
            "completion_tokens": getattr(usage, "completion_tokens", None),
            "total_tokens": getattr(usage, "total_tokens", None),
        }
    }

    try:
        obj = json.loads(txt)
        action = obj.get("action", "FLAT")
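        if action not in ACTIONS:
            # keep the rejected value so the forced FLAT is visible in the audit log
            meta["invalid_action"] = action
            action = "FLAT"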
        meta["parsed"] = obj
        return action, meta
    except Exception as e:
        meta["parse_error"] = str(e)
        return "FLAT", meta

print("LLM agent ready. USE_LLM =", USE_LLM, "| client:", client is not None, "| model:", OPENAI_MODEL)


LLM agent ready. USE_LLM = True | client: True | model: gpt-4o-mini


## 8.BACKTEST LOOP

###8.1.OVERVIEW

**Cell 8 — Backtest loop: closing the decision–market feedback cycle**

Cell 8 is the experimental engine of the notebook: it runs the environment forward in time and records outcomes. At each step, it obtains an action from the chosen policy—either the rule-based baseline or the optional LLM—and then applies that action to the environment. Applying the action triggers two events: a trade toward the target position (incurring costs) and a market step (updating spot, inventory, convenience yield, and the curve). The portfolio is then marked to market based on the change in futures prices. This is the notebook’s core feedback loop. Decisions affect positions; positions generate PnL as the curve changes; PnL updates equity; equity and regime features influence the next decision. The loop is what makes the notebook “agentic” in the meaningful sense: the policy is not producing isolated recommendations; it is controlling a system whose state evolves partly as a consequence of its own actions.

The cell exposes a time-horizon parameter (T) that sets how many steps the backtest runs, and a cadence parameter (decision_every) that controls how often the agent is asked for a new decision; between decisions, the most recent action is re-applied at every step. This is an important practical concept: in real systems, the decision frequency interacts with transaction costs. A policy that changes positions daily can behave very differently from one that changes positions weekly. In a teaching notebook, exposing this control helps users learn how turnover can dominate expected edge.

The backtest loop also collects logs. At minimum, it collects the environment’s internal history: time, action, spot, curve regime features, step PnL, costs, and equity. For LLM-driven policies, the loop is also the right place to store raw model outputs and token usage for audit. Finally, the loop can include a “done” condition representing catastrophic failure or a stop rule. Even if this is a synthetic environment, having a stop condition helps users internalize that strategies must remain solvent and that risk controls exist to prevent runaway losses. Cell 8 thus converts all prior components—market, features, environment, policy—into a single experiment that yields a dataset suitable for interpretation and iteration.
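
To make the cadence point concrete, here is a small sketch that holds the latest decision between decision steps and counts how many position changes result. The helper names and the flip-flopping input sequence are hypothetical; the sketch does not use the notebook's environment or cost model.

```python
def apply_cadence(desired_actions, decision_every):
    """Hold the most recent decision between decision steps."""
    held, current = [], None
    for step, desired in enumerate(desired_actions):
        if step % decision_every == 0:
            current = desired  # a new decision is taken only on these steps
        held.append(current)
    return held

def count_position_changes(actions, start="FLAT"):
    """Count steps where the applied action differs from the previous one."""
    changes, prev = 0, start
    for action in actions:
        if action != prev:
            changes += 1
        prev = action
    return changes

# Hypothetical noisy policy that wants to flip every single step
noisy = ["SPREAD_L1_S2", "FLAT"] * 5

for k in (1, 5):
    held = apply_cadence(noisy, k)
    print(f"decision_every={k}: position changes = {count_position_changes(held)}")
```

Since costs are charged when positions change, fewer changes means less cost drag. The result also depends on how the cadence lines up with the flips in the input, so it is worth trying several values rather than assuming a slower cadence always trades less.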


###8.2.CODE AND IMPLEMENTATION

In [15]:
# CELL 8 — PATCHED: explicit loop + LLM call logging + sanity prints

import json
import time

def run_backtest(env: OilCurveEnv, agent: RuleAgent, T=800, decision_every=1, verbose_every=50):
    obs = env.reset()
    llm_calls = 0
    decision_log = []

    for step in range(T):
        if step % decision_every == 0:
            if USE_LLM:
                action, meta = llm_decide_action(obs)
                llm_calls += 1
            else:
                action, meta = agent.decide(obs)
        # take the action
        obs, reward, done, info = env.step(action)

        # log (every step)
        row = {
            "step": step,
            "t": int(obs["t"]),
            "action": action,
            "reward": float(reward),
            "pnl_step": float(info["pnl_step"]),
            "costs": float(info["costs"]),
            "equity": float(info["equity"]),
            "contango_score": float(obs["features"].contango_score),
            "roll_long_front": float(obs["features"].roll_yield_long_front),
            "meta": meta
        }
        decision_log.append(row)

        # debug prints
        if (step % verbose_every) == 0:
            if USE_LLM:
                usage = meta.get("usage", {})
                print(
                    f"[step {step}] LLM action={action} equity={info['equity']:.2f} "
                    f"contango_score={row['contango_score']:.4f} "
                    f"tokens={usage.get('total_tokens', None)}"
                )
            else:
                print(
                    f"[step {step}] RULE action={action} equity={info['equity']:.2f} "
                    f"contango_score={row['contango_score']:.4f}"
                )

        if done:
            print(f"STOP: done=True at step={step}, equity={info['equity']:.2f}")
            break

    # Convert env history to df (trades/steps summary)
    df_hist = pd.DataFrame(env.history)

    # Also return detailed decision log (includes LLM raw JSON)
    df_log = pd.DataFrame(decision_log)

    print(f"Backtest finished. steps={len(df_hist)} llm_calls={llm_calls} USE_LLM={USE_LLM}")

    # Optional: save log to a JSONL file so you can inspect raw LLM outputs
    path = "/content/oil_curve_decisions.jsonl"
    with open(path, "w") as f:
        for r in decision_log:
            f.write(json.dumps(r) + "\n")
    print("Saved decision log:", path)

    return df_hist, df_log

df, df_log = run_backtest(env, agent, T=400, decision_every=1, verbose_every=25)

print("DF shapes:", df.shape, df_log.shape)
df.head()


[step 0] LLM action=SPREAD_L1_S2 equity=999937.61 contango_score=-0.0600 tokens=498
[step 25] LLM action=SPREAD_L1_S2 equity=1000034.51 contango_score=-0.0664 tokens=509
[step 50] LLM action=SPREAD_L1_S2 equity=999411.24 contango_score=-0.0539 tokens=510
[step 75] LLM action=SPREAD_L1_S2 equity=999195.60 contango_score=-0.0453 tokens=508
[step 100] LLM action=SPREAD_L1_S2 equity=999529.20 contango_score=-0.0563 tokens=507
[step 125] LLM action=SPREAD_L1_S2 equity=999835.25 contango_score=-0.0624 tokens=514
[step 150] LLM action=SPREAD_L1_S2 equity=1000092.73 contango_score=-0.0670 tokens=509
[step 175] LLM action=SPREAD_L1_S2 equity=1000140.37 contango_score=-0.0639 tokens=519
[step 200] LLM action=SPREAD_L1_S2 equity=1000104.12 contango_score=-0.0593 tokens=509
[step 225] LLM action=SPREAD_L1_S2 equity=999893.56 contango_score=-0.0572 tokens=509
[step 250] LLM action=SPREAD_L1_S2 equity=1000060.14 contango_score=-0.0601 tokens=511
[step 275] LLM action=SPREAD_L1_S2 equity=999549.67 co

| | t | action | spot | CY | slope_1_6 | contango_score | roll_long_front | pnl_step | costs | equity |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1954 | SPREAD_L1_S2 | 56.124244 | 0.099923 | -0.08397 | -0.060042 | 0.003435 | 0.0 | 62.389868 | 999937.610132 |
| 1 | 1955 | SPREAD_L1_S2 | 52.80803 | 0.097173 | -0.082919 | -0.058553 | 0.003207 | -106.27555 | 1.176188 | 999830.158394 |
| 2 | 1956 | SPREAD_L1_S2 | 51.906373 | 0.094264 | -0.081807 | -0.056975 | 0.002965 | -68.598738 | 3.673559 | 999757.886097 |
| 3 | 1957 | SPREAD_L1_S2 | 51.245185 | 0.09448 | -0.08189 | -0.057092 | 0.002983 | -4.952698 | 1.050687 | 999751.882712 |
| 4 | 1958 | SPREAD_L1_S2 | 49.680823 | 0.094744 | -0.081991 | -0.057236 | 0.003005 | -17.240106 | 0.79605 | 999733.846556 |


##9.VISUAL DIAGNOSTICS

###9.1.OVERVIEW

**Cell 9 — Visual diagnostics: regime behavior, equity, cost drag, and policy stability**

Cell 9 is the interpretive layer that turns the backtest log into insight. Without diagnostics, users tend to overlearn from one number or one curve. This cell intentionally plots the variables that matter for curve strategies. First, it plots the spot proxy to remind the user that oil is volatile and that even curve strategies can be dragged by directional moves if their exposures are not neutral. Second, it plots the contango score over time, with a zero line, so regime shifts between contango and backwardation become visible. This is essential because the economic interpretation of roll yield depends on regime: a strategy that is structurally long-front carry will behave differently in contango than in backwardation.

Third, the cell plots the equity curve, which is the strategy’s lived reality. The equity curve shows not only profitability but also drawdown structure. A strategy can be profitable on average but unacceptable if drawdowns are frequent or severe. Fourth, the cell plots cumulative costs. This is the most important plot for curve strategies because the edge is often subtle: many naive policies fail not because their directional intuition is wrong, but because they trade too often and pay away the edge. Seeing costs accumulate alongside equity is a direct lesson in the economics of turnover. Finally, the cell outputs action counts. Action counts are a stability diagnostic. A policy that flips between opposing actions is often responding to noise rather than to regime. If the action count is dominated by frequent switching, a user can tighten thresholds, reduce decision frequency, or impose “hysteresis” to prevent rapid regime-chasing.

Cell 9 therefore functions as a dashboard for learning. It does not try to overwhelm the user with metrics. It chooses the few plots that reveal whether the policy is aligned with the curve mechanism and whether costs are silently destroying returns. In practice, most productive iteration cycles start with this cell: users modify one assumption or rule and then re-run to see how these plots change.
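
The hysteresis idea mentioned above can be sketched as a two-threshold labeler: a regime is entered only when the score clears a wider band and is left only when the score falls back inside a narrower one. The function names, the thresholds (0.015 to enter, 0.005 to exit, 0.01 for the single-threshold comparison), and the score sequence are illustrative assumptions, not values from the notebook's policy.

```python
def label_single_threshold(scores, thresh=0.01):
    """Label each score independently with one symmetric threshold."""
    labels = []
    for s in scores:
        if s > thresh:
            labels.append("CONTANGO")
        elif s < -thresh:
            labels.append("BACKWARDATION")
        else:
            labels.append("FLAT")
    return labels

def label_with_hysteresis(scores, enter=0.015, exit_=0.005):
    """Enter a regime beyond `enter`; leave it only inside `exit_`."""
    state, labels = "FLAT", []
    for s in scores:
        if state == "CONTANGO" and s < exit_:
            state = "FLAT"
        elif state == "BACKWARDATION" and s > -exit_:
            state = "FLAT"
        if state == "FLAT":
            if s > enter:
                state = "CONTANGO"
            elif s < -enter:
                state = "BACKWARDATION"
        labels.append(state)
    return labels

def count_switches(labels):
    """Number of consecutive pairs whose labels differ."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)

# Hypothetical contango scores hovering around the single threshold
scores = [0.02, 0.012, 0.008, 0.016, 0.004, -0.02, -0.01, -0.004]
print(count_switches(label_single_threshold(scores)))
print(count_switches(label_with_hysteresis(scores)))
```

On this sequence the hysteresis labeler switches less often than the single threshold because scores that wobble between the two bands no longer change the label.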


###9.2.CODE AND IMPLEMENTATION

In [None]:
# CELL 9 — Visual diagnostics: spot, curve regime, equity, costs, action map

def plot_series(df):
    fig = plt.figure(figsize=(12, 4))
    plt.plot(df["spot"].values)
    plt.title("Synthetic Oil Spot (proxy)")
    plt.xlabel("Step")
    plt.ylabel("Price")
    plt.show()

    fig = plt.figure(figsize=(12, 4))
    plt.plot(df["contango_score"].values)
    plt.axhline(0.0)
    plt.title("Contango Score (positive=contango, negative=backwardation)")
    plt.xlabel("Step")
    plt.ylabel("Score")
    plt.show()

    fig = plt.figure(figsize=(12, 4))
    plt.plot(df["equity"].values)
    plt.title("Equity Curve (Synthetic)")
    plt.xlabel("Step")
    plt.ylabel("Equity")
    plt.show()

    fig = plt.figure(figsize=(12, 4))
    plt.plot(df["costs"].cumsum().values)
    plt.title("Cumulative Trading Costs (fees+slippage proxy)")
    plt.xlabel("Step")
    plt.ylabel("Cost")
    plt.show()

plot_series(df)

print("Action counts:")
print(df["action"].value_counts())


##10.USER MANUAL OUTPUTS

###10.1.OVERVIEW

**Cell 10 — User manual outputs: regime labeling, performance summary, and interpretive table**

Cell 10 converts the backtest into a small “user manual” artifact: it prints interpretive guidance, computes summary metrics, labels regimes, and shows a final table slice that is easy to read. The regime labeling step maps the continuous contango score into discrete regimes—contango, backwardation, or flat—using a small threshold band. This discretization is pedagogically valuable because it creates a vocabulary for discussing performance: “How did the strategy behave in contango versus backwardation?” That question is often more informative than overall PnL, especially in commodities where regime persistence and transitions matter.

The cell then computes core performance metrics: final equity, total PnL, maximum drawdown, win rate, and a simple Sharpe-like proxy computed from step returns. These metrics are not meant to be definitive; they are meant to provide a first diagnostic lens. The maximum drawdown is particularly important because curve strategies can look attractive on average but experience sharp drawdowns when regime shifts occur abruptly. The cell also summarizes the frequency of regimes and the frequency of actions. These counts are interpretive: if backwardation dominates the simulation but the policy spends most of its time in a contango-oriented spread, something is mis-specified. If the regime distribution is balanced but the policy trades nearly every step, cost drag is likely.

Finally, the cell prints a compact set of interpretive statements about contango, backwardation, and roll yield, and shows a tail slice of the log table with the key columns: time, spot, convenience yield, contango score, roll proxy, action, step PnL, costs, equity, and regime. This table is a bridge between plots and policy logic. It lets the user see a concrete sequence: the curve shifted, the policy chose an action, PnL was realized, costs were paid, and equity moved. For didactic purposes, this is often the most valuable artifact: it allows the user to read the strategy like a narrative of decisions and consequences, without relying on informal intuition. Cell 10 therefore finalizes the notebook as an inspectable experiment rather than a black-box backtest.
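
One way to answer the "contango versus backwardation" question is to split net step PnL by regime label. The sketch below does this in plain Python on a few hand-written rows; the helper `net_pnl_by_regime` and the numbers are hypothetical, and the labeling uses the same 0.005 band as the cell's regime_label. With the notebook's DataFrame, a pandas groupby on the regime column would serve the same purpose.

```python
from collections import defaultdict

def regime_label(cs, eps=0.005):
    """Map a contango score to a discrete regime using a small band."""
    if cs > eps:
        return "CONTANGO"
    if cs < -eps:
        return "BACKWARDATION"
    return "FLAT"

def net_pnl_by_regime(rows, eps=0.005):
    """Sum step PnL net of costs, and count steps, for each regime."""
    out = defaultdict(lambda: {"steps": 0, "net_pnl": 0.0})
    for r in rows:
        label = regime_label(r["contango_score"], eps)
        out[label]["steps"] += 1
        out[label]["net_pnl"] += r["pnl_step"] - r["costs"]
    return dict(out)

# Hypothetical rows, written by hand for illustration only
rows = [
    {"contango_score": -0.06, "pnl_step": 10.0, "costs": 1.0},
    {"contango_score": -0.05, "pnl_step": -4.0, "costs": 0.5},
    {"contango_score": 0.02, "pnl_step": 3.0, "costs": 2.0},
    {"contango_score": 0.001, "pnl_step": 0.0, "costs": 0.0},
]
by_regime = net_pnl_by_regime(rows)
print(by_regime)
```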


###10.2.CODE AND IMPLEMENTATION

In [None]:
# CELL 10 — User manual outputs: regime labels, roll yield intuition, performance summary

def regime_label(cs, eps=0.005):
    if cs > eps: return "CONTANGO"
    if cs < -eps: return "BACKWARDATION"
    return "FLAT"

df2 = df.copy()
df2["regime"] = [regime_label(x) for x in df2["contango_score"]]

# performance metrics
equity = df2["equity"].values
ret = np.diff(equity) / (equity[:-1] + 1e-9)

dd = 0.0
peak = equity[0]
for x in equity:
    peak = max(peak, x)
    dd = max(dd, (peak - x)/peak)

win_rate = float((df2["pnl_step"] - df2["costs"] > 0).mean())
total_pnl = float(equity[-1] - equity[0])
# annualized with sqrt(252), which assumes each step is one trading day
sharpe = float(np.mean(ret) / (np.std(ret) + 1e-9) * np.sqrt(252))

summary = {
    "steps": int(len(df2)),
    "final_equity": float(equity[-1]),
    "total_pnl": total_pnl,
    "max_drawdown": float(dd),
    "win_rate": win_rate,
    "sharpe_proxy": sharpe,
    "regime_counts": df2["regime"].value_counts().to_dict(),
    "action_counts": df2["action"].value_counts().to_dict()
}

print("USER MANUAL — WHAT TO READ")
print("- CONTANGO means far futures > near futures. Long front tends to have negative roll. Short front spreads can benefit.")
print("- BACKWARDATION means near futures > far futures. Long front tends to have positive roll. Long front spreads can benefit.")
print("- roll_long_front ~ (F1 - F2)/F1 : positive in backwardation, negative in contango.")
print("\nPERFORMANCE SUMMARY")
print(summary)

# show a small table for interpretation
cols = ["t","spot","CY","contango_score","roll_long_front","action","pnl_step","costs","equity","regime"]
display(df2[cols].tail(15))


##11.CONCLUSION

**Conclusion**

This notebook is a deliberately compact laboratory for a topic that is often taught either too abstractly or too impressionistically: trading oil by trading the **futures curve**. The essential achievement of the notebook is not the sophistication of its mathematics; it is the clarity of its mechanisms. It forces you to see the curve as an object with economic meaning, and to see trading as a sequence of constrained actions with measurable consequences. By the end of the notebook, contango and backwardation are no longer rhetorical labels. They become regimes defined by measurable curve slope, linked to a synthetic inventory and convenience yield state, and translated into concrete trade expressions with explicit PnL and cost implications.

A key lesson is that oil is the canonical commodity where “spot” is not the whole story. In oil, the most important economic tensions are temporal: do you want barrels now or later, and what does that preference imply about storage, financing, and scarcity? The curve is the marketplace where that tension is expressed. When inventory is abundant, convenience yield is low, and the curve can move into contango—an upward slope that makes carrying a long front-month position costly to roll. When inventory is tight, convenience yield rises, and the curve can move into backwardation—a downward slope that can reward a long front-month exposure through positive roll yield. These are not moral categories; they are incentives embedded in prices. The notebook’s synthetic model expresses that relationship explicitly, which means you can test your intuitions instead of repeating them.
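
The link between convenience yield, curve slope, and roll can be checked numerically with the textbook continuous-compounding cost-of-carry relation F(T) = S * exp((r + u - y) * T). This is a sketch under that assumption, not the notebook's synthetic model; the function names and all parameter values (financing rate r, storage cost u, convenience yield y) are illustrative.

```python
import math

def carry_curve(spot, r, storage, conv_yield, months=(1, 2, 3, 6, 12)):
    """Futures prices by month under F(T) = S * exp((r + u - y) * T), T in years."""
    net_carry = r + storage - conv_yield
    return {m: spot * math.exp(net_carry * m / 12.0) for m in months}

def roll_long_front(curve):
    """Roll proxy (F1 - F2) / F1: positive in backwardation, negative in contango."""
    f1, f2 = curve[1], curve[2]
    return (f1 - f2) / f1

# Abundant inventory: low convenience yield, net carry is positive
loose = carry_curve(60.0, r=0.04, storage=0.03, conv_yield=0.01)
# Tight inventory: high convenience yield, net carry is negative
tight = carry_curve(60.0, r=0.04, storage=0.03, conv_yield=0.15)

for name, curve in (("loose", loose), ("tight", tight)):
    slope = "contango" if curve[12] > curve[1] else "backwardation"
    print(f"{name}: {slope}, roll_long_front = {roll_long_front(curve):+.4f}")
```

When the convenience yield exceeds financing plus storage, the exponent turns negative, the curve slopes downward, and the long-front roll proxy becomes positive.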

The environment layer then teaches a second lesson: even in a simplified simulation, the difference between “a good idea” and “a good result” is mediated by constraints and frictions. Curve strategies often target comparatively subtle edges, so transaction costs and turnover are not footnotes; they are central determinants of whether a strategy survives. This notebook makes that visible by charging costs whenever positions change. A policy that flips between opposing curve trades frequently will often lose not because its intuition is wrong, but because it is paying away the edge. That is a professional lesson that most new traders learn painfully in real markets. Here, it is learnable safely and quickly in a synthetic environment.

The agentic policy layer is intentionally modest in its ambition, and that modesty is the right design choice for a didactic notebook. Rather than attempting to produce a “smart” model in the sense of predictive power, the notebook treats intelligence as disciplined selection among a small menu of curve expressions. A policy must decide whether it is worth being exposed at all, and if so, whether its exposure should be directional, carry-focused through a calendar spread, or shape-focused through a butterfly. The rule-based baseline demonstrates the minimal logic that connects curve regime to trade expression. The optional LLM hook demonstrates how one could embed a language model in a tightly constrained policy role: it chooses only from a predefined action set and is expected to justify its choice succinctly. This is important because it illustrates a governance-friendly pattern: language models can be used as decision components only when they are bounded, auditable, and subject to deterministic risk constraints.

What the notebook does not do is equally important. It does not claim realism in the sense of matching specific contract specs, margining rules, exchange calendars, roll schedules, or physical delivery constraints. It does not include a full carry model calibrated to real inventory data. It does not represent refinery behavior, pipeline constraints, location spreads, or product cracks. These are not shortcomings for the notebook’s purpose; they are boundaries. A laboratory must choose what to hold constant and what to represent. This notebook chooses to represent the core mechanism—inventory tightness mapping to convenience yield, mapping to curve slope, mapping to roll—because that mechanism is the conceptual engine of curve trading. Once that engine is understood, additional realism can be layered responsibly rather than superficially.

The diagnostic outputs are the notebook’s final teaching instrument. Spot alone can seduce you into thinking in headlines. The contango score shows you whether the curve regime is actually aligned with your trade. The equity curve shows you whether your policy survives regime transitions. The cost plot shows you whether activity is being mistaken for insight. The action counts show you whether the policy is coherent or indecisive. When you read these together, you can do what real systematic research requires: form hypotheses about failure modes, change one parameter or policy rule, and rerun under the same framework to see if outcomes improve for the right reasons. This is the notebook’s most important output: not a single performance number, but an experimental method.

If you treat the notebook as a user manual for curve trading, the correct stance is iterative. Run it with the baseline rule policy. Observe how often the market is in contango or backwardation and how that relates to returns. Increase storage costs and see contango become more frequent and more punishing for long-front carry. Increase the amplitude of seasonality and watch curvature trades become more relevant. Increase the jump probability and observe whether the policy should reduce exposure during violent spot moves. Increase transaction costs and test whether stricter “stay flat when unclear” rules improve robustness. If you enable the LLM, compare its stability to the rule baseline: does it trade too often, does it invent rationale for marginal signals, or does it become appropriately conservative under drawdown? These are not cosmetic questions; they are the essence of deploying any policy in a market where the curve can change quickly and costs can dominate.

The enduring knowledge you should take from this notebook is simple and powerful. First, backwardation and contango are incentives embedded in a curve shaped by physical constraints. Second, roll yield is not an abstract concept; it is a systematic component of PnL that can dominate outcomes over time. Third, curve trading is the craft of choosing expressions that isolate the mechanism you intend to monetize, while controlling the risk you do not intend to take. Fourth, the quality of a trading policy is as much about when not to trade as it is about what to trade. This notebook gives you a safe environment to learn these truths with clarity, to test them by altering assumptions, and to build the intellectual foundation for more realistic futures research later—without confusing complexity for understanding.
