# Pair Trading - Commodity Futures

- Pairs trading is a market-neutral trading strategy that seeks to take advantage of temporary price anomalies between related securities. The notebook identifies pairs of assets whose prices have moved together historically and then trades on the expectation that any deviation from this historical pattern will be corrected.

- Trading signals are generated from the z-score of the price spread between paired assets, controlled by a z-score limit (`zscore_limit`) and a rolling window size (`window`).

- The trading strategies are combined into a top-level portfolio.


___

In [None]:
import datetime as dtm
import pandas as pd
import numpy as np

from statsmodels.tsa.stattools import coint

import os
os.environ['SIGTECH_API_KEY'] = 'ENTER_API_KEY_HERE'

import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [16, 8]

In [None]:
import sigtech.api as sig
sig.init()

In [None]:
start_date = dtm.date(2010,1,10)

___

## Create Rolling Futures

In the following cell, we initialize a futures strategy for each commodity we are interested in, using the `RollingFutureStrategy` class to set up the rolling futures positions.

We specify the currency as 'USD', set the start date for our strategy, and provide the contract code and sector for each commodity. The `rolling_rule` parameter is set to 'f_0', which indicates that we are using the adjusted front month contract.

The commodities we are initializing here are:

- **Feeder Cattle (FC)**: Feeder cattle are weaned calves that are sent to feedlots to be fattened for slaughter. Their prices can be influenced by factors such as feed prices, weather, and beef demand.

- **Live Cattle (LC)**: Live cattle are cattle that are ready for slaughter. Similar to feeder cattle, their prices can be affected by feed prices, weather, and beef demand, as well as export markets.

- **Corn (C)**: Corn is a staple crop used in food production and as a feedstock in the production of ethanol. Its prices can be influenced by weather, crop yields, and demand from the ethanol industry.

- **Rough Rice (RR)**: Rough rice is rice that has just been harvested and has not yet been milled. Its prices can be influenced by weather, crop yields, and demand from both domestic and international rice markets.


In [None]:
sig.RollingFutureStrategy?

In [None]:
# Feeder Cattle
fc = sig.RollingFutureStrategy(
    currency='USD',
    start_date=start_date,
    contract_code='FC',
    contract_sector='COMDTY',
    monthly_roll_days = '5:9',
    rolling_rule='f_0',
)

# Live Cattle
lc = sig.RollingFutureStrategy(
    currency='USD',
    start_date=start_date,
    contract_code='LC',
    contract_sector='COMDTY',
    monthly_roll_days = '5:9',
    rolling_rule='f_0',
)

# Corn
c = sig.RollingFutureStrategy(
    currency='USD',
    start_date=start_date,
    contract_code='C',
    contract_sector='COMDTY',
    monthly_roll_days = '5:9',
    rolling_rule='f_0',
)

# Rough Rice
rr = sig.RollingFutureStrategy(
    currency='USD',
    start_date=start_date,
    contract_code='RR',
    contract_sector='COMDTY',
    monthly_roll_days = '5:9',
    rolling_rule='f_0',
)

___

## Cointegration Tests

After initializing our futures strategies, the next step in our pairs trading strategy is to test for cointegration between the various pairs of commodities. Cointegration is a statistical property of two or more time series variables which indicates if a linear combination of them is stationary. In the context of pairs trading, if two assets are cointegrated, it means they move together in such a way that the spread between them is mean-reverting. This property is fundamental to the success of a pairs trading strategy.

To test for cointegration, we first retrieve the historical data for each asset and align the two series so that they cover the same dates and have the same length. We then use the `coint` function from the statsmodels library to run a cointegration test. This function returns a test statistic, a p-value, and critical values. We focus on the p-value: the probability of seeing a test statistic at least as extreme as the observed one if the two series were not cointegrated (the null hypothesis).

We define a function `test_cointegration(asset1, asset2)`, which takes two assets as inputs, performs the cointegration test, and prints the p-value. If the p-value is less than 0.05 (a common threshold in statistical testing), we reject the null hypothesis, treat the assets as cointegrated, and proceed with a pairs trading strategy. If the p-value is 0.05 or above, we cannot reject the null hypothesis of no cointegration, and we do not proceed with pairs trading for that pair.


In [None]:
def test_cointegration(asset1, asset2):
    
    # Fetch histories
    asset1_df = asset1.history().dropna()
    asset2_df = asset2.history().dropna()
    
    
    # Align both series on their shared dates so endog and exog have the same
    # length and contain no NaNs (coint cannot handle missing values)
    asset1_df, asset2_df = asset1_df.align(asset2_df, join='inner')
    
    # Run cointegration test
    score, pvalue, _ = coint(asset1_df, asset2_df)
    print('--------')
    print(f'p-value of cointegration test between {asset1.name} and {asset2.name}: {pvalue}')
    
    # Let's use an example threshold of 0.05 for the p-value
    if pvalue < 0.05:
        print('Assets are cointegrated, proceed with pairs trading strategy.')
        print('--------')

    else:
        print('Assets are not cointegrated, do not proceed with pairs trading.')
        print('--------')

In [None]:
test_cointegration(fc, lc)

In the test above, we are checking for cointegration between Feeder Cattle and Live Cattle. The output shows that the p-value of the cointegration test is less than 0.05, indicating that these two assets are cointegrated and suitable for pairs trading.


In [None]:
test_cointegration(c, rr)

Similarly, we test for cointegration between Corn and Rough Rice. The result again shows a p-value less than 0.05, suggesting these two assets are also cointegrated.


In [None]:
test_cointegration(fc, c)

In contrast, when we test for cointegration between Feeder Cattle and Corn, the p-value is greater than 0.05. This suggests that these two assets are not cointegrated and should not be used for pairs trading.

These tests help us identify which pairs of assets move together in a way that makes them suitable for pairs trading. We can use this process to test any number of asset pairs as part of our strategy development.


___

## Generate Trading Signals

Now that we've identified pairs of commodities that are cointegrated, the next step in our pairs trading strategy is to generate trading signals. These signals tell us when to enter and exit our long and short positions.

To generate these signals, we define a function `generate_pairs_trading_signals(asset1, asset2, window, zscore_limit)`. This function takes two assets, a window size, and a z-score limit as inputs. The window size is the number of periods used for calculating the moving average and standard deviation of the price spread. The z-score limit defines our threshold for entering and exiting trades.
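
In symbols, the rolling statistics can be written as follows. The notation is introduced here for illustration only: $s_t$ is the spread at time $t$ (the asset1 history minus the asset2 history) and $w$ is the window size. The sample standard deviation uses $w-1$ in the denominator, which matches the default of pandas' rolling `std`.

$$
\mu_t = \frac{1}{w}\sum_{i=0}^{w-1} s_{t-i}, \qquad
\sigma_t = \sqrt{\frac{1}{w-1}\sum_{i=0}^{w-1}\left(s_{t-i}-\mu_t\right)^2}, \qquad
z_t = \frac{s_t-\mu_t}{\sigma_t}
$$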

The function proceeds as follows:
1. Fetch the historical data for each asset.
2. Calculate the spread between the two assets.
3. Calculate the z-score of the spread. The z-score tells us how many rolling standard deviations the spread is from its moving average. A large positive z-score indicates the spread is higher than usual, suggesting that asset1 is overpriced relative to asset2; a large negative z-score suggests the opposite.
4. Generate trading signals based on the z-score. If the z-score is less than the negative of the z-score limit, we enter a long position in the spread (long asset1, short asset2). If the z-score is greater than the z-score limit, we enter a short position in the spread (short asset1, long asset2). We exit a long position once the z-score rises back to zero or above, and a short position once it falls back to zero or below.
5. The function then creates a DataFrame to hold these signals and carries forward the positions when no action is taken.
6. We combine the long and short positions to get our final positions.
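
To make steps 4 to 6 concrete, the sketch below applies the same entry, exit, and carry-forward rules to a small hand-written z-score series. The helper `positions_from_zscore` and the toy values are hypothetical and exist only for this illustration; the notebook's own function follows in the next cell.

```python
import numpy as np
import pandas as pd


def positions_from_zscore(zscore: pd.Series, limit: float) -> pd.Series:
    """Turn a z-score series into spread positions: +1 long, -1 short, 0 flat."""
    # Long leg: enter below -limit, exit at or above zero, otherwise hold
    long_pos = pd.Series(np.nan, index=zscore.index)
    long_pos[zscore < -limit] = 1.0
    long_pos[zscore >= 0] = 0.0
    long_pos = long_pos.ffill()

    # Short leg: enter above +limit, exit at or below zero, otherwise hold
    short_pos = pd.Series(np.nan, index=zscore.index)
    short_pos[zscore > limit] = -1.0
    short_pos[zscore <= 0] = 0.0
    short_pos = short_pos.ffill()

    # The two legs are never active at the same time, so they can be summed
    return long_pos + short_pos


toy_z = pd.Series([0.0, -2.5, -1.0, 0.2, 2.3, 1.0, -0.1])
toy_positions = positions_from_zscore(toy_z, limit=2.0)
print(toy_positions.tolist())
# [0.0, 1.0, 1.0, 0.0, -1.0, -1.0, 0.0]
```

The long position opened at -2.5 is held through -1.0 and closed at 0.2; the short position opened at 2.3 is held through 1.0 and closed at -0.1.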


In [None]:
def generate_pairs_trading_signals(asset1, asset2, window, zscore_limit):
    # Fetch Histories
    asset1_df = asset1.history()
    asset2_df = asset2.history()
    
    # Calculate the spread
    spread = asset1_df - asset2_df
    
    # Calculate the z-score of the spread
    spread_mean = spread.rolling(window).mean()
    spread_std = spread.rolling(window).std()
    zscore = (spread - spread_mean) / spread_std
    
    # Create signals based on the z-score
    df = pd.DataFrame()
    df['long_entry'] = zscore < -zscore_limit
    df['long_exit'] = zscore >= 0
    df['short_entry'] = zscore > zscore_limit
    df['short_exit'] = zscore <= 0
    
    # Carry forward the positions when no action is taken.
    # Assign the result of ffill() back to the column: calling ffill(inplace=True)
    # on a selected column is chained assignment and may not update df in recent pandas.
    df['positions_long'] = np.nan
    df.loc[df['long_entry'],'positions_long'] = 1
    df.loc[df['long_exit'],'positions_long'] = 0
    df['positions_long'] = df['positions_long'].ffill()
    
    df['positions_short'] = np.nan
    df.loc[df['short_entry'],'positions_short'] = -1
    df.loc[df['short_exit'],'positions_short'] = 0
    df['positions_short'] = df['positions_short'].ffill()
    
    # Combine the long and short positions to get the final positions
    df['positions'] = df['positions_long'] + df['positions_short']
    
    # Return long/short signals mapped to our assets.
    
    return pd.DataFrame({asset1.name:df['positions'], asset2.name:df['positions']*-1})

In [None]:
fc_lc_signal_df = generate_pairs_trading_signals(fc, lc, 21, 2).dropna()
c_rr_signal_df = generate_pairs_trading_signals(c, rr, 21, 2).dropna()

In [None]:
fc_lc_signal_df['2023-01-01':].plot()

In the code above, we generate trading signals for our cointegrated pairs, Feeder Cattle and Live Cattle, as well as Corn and Rough Rice. We're using a 21-day rolling window to calculate our z-scores, and a z-score limit of 2 to generate our trading signals. This means we'll enter a trade when the z-score is above 2 or below -2, and exit when the z-score crosses back over zero.

The result is a DataFrame for each pair, containing our trading signals. We can use these signals to guide our trading decisions in the next steps of our pairs trading strategy.


___

## Backtest Signals

After generating the trading signals, the next step is to backtest these signals. Backtesting involves running our strategy on historical data to see how it would have performed. This gives us a sense of how our strategy might perform in the future, although it's important to remember that past performance is not always indicative of future results.

In the code below, we backtest our pairs trading strategy for the two pairs of commodities we identified earlier: Feeder Cattle and Live Cattle, and Corn and Rough Rice.

We use the `SignalStrategy` class to perform our backtest. This class takes several parameters:
- `currency`: The currency used for the strategy.
- `signal_input`: The signal dataframe.
- `start_date`: The start date of the strategy, which we get from the first valid index of our signal dataframes.
- `rebalance_frequency`: The frequency at which the strategy rebalances. Here, we use '1BD' for one business day.

We create a new `SignalStrategy` object for each pair of commodities.


In [None]:
sig.SignalStrategy?

In [None]:
fc_lc_pairs_trading_strategy = sig.SignalStrategy(
    currency='USD',
    signal_input=fc_lc_signal_df,
    start_date=fc_lc_signal_df.first_valid_index().date(),
    rebalance_frequency='1BD',
)

c_rr_pairs_trading_strategy = sig.SignalStrategy(
    currency='USD',
    signal_input=c_rr_signal_df,
    start_date=c_rr_signal_df.first_valid_index().date(),
    rebalance_frequency='1BD',
)

The `SignalStrategy` objects represent our backtested pairs trading strategies. We can use these objects to analyze the performance of our strategies in the next steps.
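
As one possible starting point for that analysis, the sketch below computes a few summary statistics from a series of strategy values. The helper `summarise_performance` is a hypothetical name, not a SigTech function, and it assumes daily observations and 252 trading days per year. The return-to-volatility ratio makes no risk-free rate adjustment. A toy series stands in for a real strategy history.

```python
import numpy as np
import pandas as pd


def summarise_performance(history: pd.Series, periods_per_year: int = 252) -> dict:
    """Basic statistics from a series of strategy values (assumed daily)."""
    history = history.dropna()
    returns = history.pct_change().dropna()

    # Annualised geometric return over the whole sample
    years = len(returns) / periods_per_year
    annual_return = (history.iloc[-1] / history.iloc[0]) ** (1 / years) - 1

    # Annualised volatility of periodic returns
    annual_vol = returns.std() * np.sqrt(periods_per_year)

    # Drawdown: percentage drop from the running maximum
    drawdown = history / history.cummax() - 1

    return {
        'annual_return': annual_return,
        'annual_vol': annual_vol,
        'return_to_vol': annual_return / annual_vol if annual_vol > 0 else float('nan'),
        'max_drawdown': drawdown.min(),
    }


# Toy value series standing in for a strategy history
toy_history = pd.Series(
    [1000.0, 1010.0, 1005.0, 990.0, 1020.0],
    index=pd.bdate_range('2023-01-02', periods=5),
)
stats = summarise_performance(toy_history)
print(stats)
```

Assuming `history()` returns a pandas Series of strategy values, as its use elsewhere in this notebook suggests, the same helper could be applied as `summarise_performance(fc_lc_pairs_trading_strategy.history())`.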


In [None]:
fc_lc_pairs_trading_strategy.history().plot(label = 'FC-LC Pairs Trading Strategy', legend = True, figsize = [16,8])
c_rr_pairs_trading_strategy.history().plot(label = 'C-RR Pairs Trading Strategy', legend = True)

___

## Combine into Portfolio

After backtesting the signals, we can combine these strategies into a portfolio. A portfolio is a collection of financial investments, such as stocks, bonds, commodities, cash, and funds.

In this case, we combine our two pairs trading strategies into a portfolio. The code below creates a new `BasketStrategy` object which represents this portfolio. It takes several parameters:
- `currency`: The currency used for the strategy.
- `start_date`: The start date of the strategy.
- `constituent_names`: The names of the constituent strategies, which are the names of our pairs trading strategies.
- `weights`: The weights of the constituent strategies in the portfolio. Here, we assign 70% to the Feeder Cattle/Live Cattle strategy and 30% to the Corn/Rough Rice strategy.
- `rebalance_frequency`: The frequency at which the portfolio rebalances. Here, we use 'EOM' for end of month.


The `BasketStrategy` object represents our portfolio of pairs trading strategies.
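
To build intuition for what fixed weights combined with a month-end rebalance imply, the sketch below lets two value series drift with their own returns and resets them to the target weights on the last available date of each calendar month. This is a simplified illustration under stated assumptions (no costs, constituents treated as fully funded), not a description of how `BasketStrategy` is implemented. The helper `rebalanced_basket` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def rebalanced_basket(values: pd.DataFrame, weights, start_value: float = 1.0) -> pd.Series:
    """Portfolio value with holdings reset to `weights` at each month end."""
    weights = np.asarray(weights, dtype=float)

    # Last available date in each calendar month acts as the rebalance date
    month_ends = set(values.index.to_series().groupby(values.index.to_period('M')).max())

    # Gross returns per period; the first row has no prior value, so treat it as flat
    gross_returns = (values / values.shift(1)).fillna(1.0)

    holdings = weights * start_value
    nav = []
    for date, gross in gross_returns.iterrows():
        # Let each holding drift with its constituent's return
        holdings = holdings * gross.to_numpy()
        total = holdings.sum()
        if date in month_ends:
            # Reset holdings to the target weights
            holdings = weights * total
        nav.append(total)
    return pd.Series(nav, index=values.index)


toy_values = pd.DataFrame(
    {'A': [100.0, 110.0, 110.0, 121.0], 'B': [100.0, 100.0, 90.0, 90.0]},
    index=pd.to_datetime(['2023-01-30', '2023-01-31', '2023-02-01', '2023-02-02']),
)
toy_nav = rebalanced_basket(toy_values, weights=[0.7, 0.3])
print(toy_nav.round(4).tolist())
# [1.0, 1.07, 1.0379, 1.1128]
```

After the rebalance on 2023-01-31 the holdings return to 70/30 of the portfolio value, so the 10% fall in `B` on the following day is applied to a 30% allocation rather than to the drifted one.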


In [None]:
sig.BasketStrategy?

In [None]:
simple_basket = sig.BasketStrategy(
    currency='USD',
    start_date=dtm.date(2011,2,15),
    constituent_names=[
        fc_lc_pairs_trading_strategy.name,
        c_rr_pairs_trading_strategy.name,
    ],
    weights=[0.7, 0.3],
    rebalance_frequency='EOM',
)

In [None]:
simple_basket.history().plot()

___