## Project Overview

This Jupyter notebook replicates the equity spot-futures arbitrage spread plot (panel (c), "Equity-Spot Futures") from the research paper:
**'Segmented Arbitrage'** by Emil Siriwardane, Adi Sunderam, and Jonathan Wallen.



### Objective
The focus is on equity spot-futures arbitrage spreads using data from the **S&P 500 (SPX), Nasdaq 100 (NDX), and Dow Jones Industrial Average (DJI)**.

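We follow the methodology outlined in the paper to compute arbitrage-implied forward rates between the expiries $\tau_1$ and $\tau_2$ of the nearby and deferred futures contracts:

$$ 1 + f_{\tau_1,\tau_2,t} = \frac{F_{t,\tau_2} + E^Q_t[D_{t,\tau_2}]}{F_{t,\tau_1} + E^Q_t[D_{t,\tau_1}]} $$

Here $F_{t,\tau}$ is the date-$t$ price of the futures contract expiring at $\tau$, and $E^Q_t[D_{t,\tau}]$ is the risk-neutral expected dividend paid between $t$ and $\tau$.

The arbitrage spread is the annualized forward rate minus the three-month OIS rate:

$$ ESF_t = f_{\tau_1,\tau_2,t} - OIS3M_t $$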
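
To make the two formulas concrete, here is a minimal sketch that evaluates them for a single date. The function names, the input numbers, and the 360-day simple annualization are illustrative assumptions, not values or conventions taken from the paper or from `compute_calendar_spread_OIS3M`.

```python
def implied_forward_rate(f_near, f_far, div_near, div_far):
    """Period forward rate between the two expiries (not annualized)."""
    return (f_far + div_far) / (f_near + div_near) - 1.0


def annualized_bps(period_rate, days_between, day_count=360):
    """Scale a period rate to an annualized rate in basis points.

    The 360-day simple-interest convention is an assumption of this sketch.
    """
    return period_rate * day_count / days_between * 1e4


# Illustrative inputs only (not data from the paper)
f_near, f_far = 4000.0, 4010.0   # nearby and deferred futures prices
div_near, div_far = 15.0, 30.0   # expected dividends up to each expiry
days_between = 91                # calendar days between the two expiries
ois_3m_bps = 200.0               # three-month OIS rate in bps

forward = implied_forward_rate(f_near, f_far, div_near, div_far)
forward_bps = annualized_bps(forward, days_between)
esf_bps = forward_bps - ois_3m_bps
print(f"forward = {forward_bps:.1f} bps, spread = {esf_bps:.1f} bps")
```

Keeping the period rate and the annualization in separate functions makes it easy to swap in a different day-count convention if the one assumed here does not match the data.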

This notebook provides an explanation of the code implementation and presents the final visualization.

### Reference Paper
You can access the full paper here: [Segmented Arbitrage](https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3960980)



## 1. Overview of the Dataset

The dataset consists of **spot prices, futures contracts, and Overnight Index Swap (OIS) rates**.
This section displays a preview of the dataset and checks for missing values and overall data quality.

In [None]:
from compute_calendar_spread_OIS3M import *

print("First few rows of the dataset:")
display(merged_df.head())

print("Summary statistics of arbitrage spreads:")
display(merged_df[["SPX_arb_spread", "NDX_arb_spread", "DJI_arb_spread"]].describe())

print("Checking missing values per column:")
display(merged_df.isnull().sum())

## 2. Expected Dividends Computation

To compute expected dividends, we assume **perfect foresight dividends**, which means we use actual realized dividends to approximate expected values.

This step calculates:
- **τ₁ (exp_tau1)**: Expected dividend during the first futures contract period
- **τ₂ (exp_tau2)**: Expected dividend through the expiry of the deferred (next) contract, built as `exp_tau1` plus the `total_div` returned for the deferred contract
- **Daily dividend**: Derived from historical dividend data

```python
for idx in ["SPX", "NDX", "DJI"]:
    # τ₁: expected dividends grouped by the primary (nearby) contract field
    exp_tau1, _, daily_div, _ = compute_expected_dividend(
        merged_df, div_col=f"{idx}_Div", contract_col=f"{idx}_Contract"
    )
    # Second call groups by the deferred (next) contract field.
    # Only total_div is used below; the other outputs are discarded.
    _, _, _, total_div = compute_expected_dividend(
        merged_df, div_col=f"{idx}_Div", contract_col=f"{idx}_Contract2"
    )
    merged_df[f"{idx}_exp_tau1"] = exp_tau1
    # τ₂ is built cumulatively as exp_tau1 + total_div
    merged_df[f"{idx}_exp_tau2"] = exp_tau1 + total_div
    merged_df[f"{idx}_daily_div"] = daily_div
```
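
The perfect-foresight assumption can be illustrated in isolation. The sketch below is not the implementation of `compute_expected_dividend`; the helper `perfect_foresight_dividend`, its inputs, and the convention of counting dividends paid strictly after date `t` up to and including the expiry date are all assumptions made for the example.

```python
import pandas as pd


def perfect_foresight_dividend(daily_div: pd.Series, expiry: pd.Series) -> pd.Series:
    """Sum realized dividends paid after each date up to its contract expiry.

    daily_div: dividends per day, indexed by a sorted DatetimeIndex.
    expiry:    expiry date of the relevant contract for each row (same index).
    """
    cum = daily_div.fillna(0.0).cumsum()
    # Cumulative dividends as of each row's expiry (last observation on or before it)
    cum_at_expiry = cum.reindex(pd.DatetimeIndex(expiry), method="ffill").to_numpy()
    return pd.Series(cum_at_expiry - cum.to_numpy(), index=daily_div.index)


# Toy data: six days, two contracts expiring on Jan 4 and Jan 6 (hypothetical)
dates = pd.date_range("2024-01-01", periods=6, freq="D")
daily_div = pd.Series([0.0, 1.0, 0.0, 2.0, 0.0, 3.0], index=dates)
expiry = pd.Series(
    pd.to_datetime(["2024-01-04"] * 3 + ["2024-01-06"] * 3).to_numpy(), index=dates
)

expected_div = perfect_foresight_dividend(daily_div, expiry)
print(expected_div)
```

Working from a cumulative sum avoids a loop over dates: the dividend remaining until expiry is the cumulative total at expiry minus the cumulative total on the current date.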

In [None]:

print("Expected Dividends for SPX:")
display(merged_df[["SPX_exp_tau1", "SPX_exp_tau2", "SPX_daily_div"]].head())

print("Expected Dividends for NDX:")
display(merged_df[["NDX_exp_tau1", "NDX_exp_tau2", "NDX_daily_div"]].head())

print("Expected Dividends for DJI:")
display(merged_df[["DJI_exp_tau1", "DJI_exp_tau2", "DJI_daily_div"]].head())

## 3. Outlier Detection and Cleaning

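To ensure accuracy, we remove extreme outliers from the arbitrage spread using a rolling **mean absolute deviation (MAD)** filter.
An observation is flagged as an outlier when its absolute deviation from the centered 45-day rolling median is **at least 5 times** the rolling mean of those absolute deviations over the same window. Flagged forward rates are set to missing and the spread is recomputed.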


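```python
import numpy as np

# The '45D' offset window requires merged_df to have a DatetimeIndex
for idx in ["SPX", "NDX", "DJI"]:
    arb_series = merged_df[f"{idx}_arb_spread"]
    # Centered 45-day rolling median of the spread
    rolling_median = arb_series.rolling(window='45D', center=True).median()
    abs_dev = (arb_series - rolling_median).abs()
    # Rolling mean of the absolute deviations from the rolling median
    rolling_mad = abs_dev.rolling(window='45D', center=True).mean()
    outliers = (abs_dev / rolling_mad) >= 5
    # Blank out the flagged forward rates, then rebuild the spread from them
    merged_df.loc[outliers, f"{idx}_annualized_forward_bps"] = np.nan
    merged_df[f"{idx}_arb_spread"] = merged_df[f"{idx}_annualized_forward_bps"] - merged_df[f"{idx}_OIS_bps"]
```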
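
The same filter can be tried on a synthetic series to see what it flags. This is a minimal sketch: the helper name `flag_outliers` and the sine-wave test data are assumptions made for the example, and centered offset windows such as `'45D'` need a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd


def flag_outliers(series: pd.Series, window: str = "45D", threshold: float = 5.0) -> pd.Series:
    """Flag points whose deviation from the rolling median is large relative
    to the rolling mean absolute deviation over the same centered window."""
    rolling_median = series.rolling(window=window, center=True).median()
    abs_dev = (series - rolling_median).abs()
    rolling_mad = abs_dev.rolling(window=window, center=True).mean()
    return (abs_dev / rolling_mad) >= threshold


# Synthetic daily "spread": a smooth sine wave with one injected spike
dates = pd.date_range("2020-01-01", periods=120, freq="D")
spread = pd.Series(np.sin(np.arange(120) / 5.0), index=dates)
spread.iloc[60] += 50.0

flags = flag_outliers(spread)
print(spread[flags])
```

Because the spike itself enters the rolling mean of absolute deviations, it raises the threshold for its neighbours. That keeps nearby ordinary observations from being flagged, but it also means a cluster of consecutive outliers is harder to detect than an isolated one.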

## 4. Visualization of Arbitrage Spreads

This section plots the arbitrage spread for SPX, NDX, and DJI over time.
The arbitrage spread is defined as the difference between the implied forward rate and the OIS3M rate.

A positive spread indicates that the implied forward rate is greater than the risk-free OIS rate, suggesting **potential mispricing**.

### Reference Plot from the Paper
Below is the original figure from the research paper that we aim to replicate:

<img src="../data_manual/plot_research_paper.png" />

### Replicated Arbitrage Spread Plot
The following plot is generated from our computed data to match the paper’s findings.

In [None]:
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# Set the font before creating the figure so every text element picks it up
plt.rcParams["font.family"] = "Times New Roman"
plt.figure(figsize=(11, 7))
plt.plot(merged_df.index, merged_df["SPX_arb_spread"], label="SPX", color="blue", linewidth=1)
# dji_color is not defined in this cell; it is assumed to come from the star import above
plt.plot(merged_df.index, merged_df["DJI_arb_spread"], label="DJI", color=dji_color, linewidth=1)
plt.plot(merged_df.index, merged_df["NDX_arb_spread"], label="NDAQ", color="green", linewidth=1)
plt.xlabel("Dates", fontsize=14)
plt.xlim([datetime(2009, 11, 1), datetime(2024, 1, 1)])
plt.ylim([-60, 150])
plt.yticks(np.arange(-50, 151, 50))
plt.gca().yaxis.set_tick_params(rotation=90, labelsize=12)
plt.xticks(fontsize=12)
plt.gca().xaxis.set_major_locator(mdates.YearLocator(2))
plt.ylabel("Arbitrage Spread (bps)", fontsize=14)
plt.title("(c) Equity-Spot Futures", fontsize=14)
plt.grid(axis="y", linestyle="--", alpha=0.6)
plt.legend(fontsize=10, loc="lower right")
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
# "%-m/%-d/%Y" drops zero padding on Linux/macOS; on Windows use "%#m/%#d/%Y"
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%-m/%-d/%Y'))
plt.tight_layout()
plt.show()


## 5. Conclusion and Next Steps

### Key Insights:
- Arbitrage spreads are computed as the difference between **futures-implied risk-free rates** and **OIS3M risk-free rates**.
- Data processing includes:
  - Extracting **spot prices, futures contracts, and interest rate data**
  - Computing **expected dividends**
  - Removing **outliers in arbitrage spreads**
- The **final plot** visualizes arbitrage spreads for SPX, NDX, and DJI over time.

### Next Steps and Potential Applications:
- Investigate alternative data sources for **dividends and risk-free rates**.
- Conduct further **robustness checks** to validate computed spreads.
- Expand analysis to other arbitrage opportunities such as **put-call parity and CDS-bond spreads**.

This analysis is a starting point for **replicating the research paper's methodology** and for refining the arbitrage spread computation.