# Example 1: Pair Trading - Commodity Futures

## Pre-requisites
### 1. Open the example notebook in Colab
Our example notebooks assume you are using Google Colab. To open this notebook in Colab:
1. Sign up or log in to [Google Colab](https://colab.research.google.com/).
1. Select `File`.
1. Select `Open notebook`.
1. Select the `GitHub` tab.
1. Then:
    1. In `Enter a GitHub URL or search by organisation or user`, enter `SigTechnologies` and select the search option.
    1. In the `Repository` drop-down menu, select `SIGTechnologies/sigtech-python`.
    1. In the `branch` drop-down menu, select `master`.
1. Select the file you want to open, in this case `examples/1_Pair_Trading_Commodity_Futures.ipynb`.

### 2. Set up your Colab environment

In [None]:
# Install our Python SDK
!pip install sigtech 

import os
import sigtech.api as sig

# Define your API key as a string. Remember to delete it before sharing your notebook with others. 
os.environ['SIGTECH_API_KEY'] = '<YOUR_API_KEY>'

# Import any additional Python libraries you require.
import datetime as dtm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import the coint function from the statsmodels.tsa.stattools module.
# The coint function tests for the presence of a cointegrating relationship between two or more time series.
from statsmodels.tsa.stattools import coint 

# Set the default figure size for matplotlib plots.
plt.rcParams['figure.figsize'] = [16, 8] 

### 3. Create a session
After installing our Python SDK, defining your API key, importing any additional Python libraries or functions you require, and setting any default parameters, initialize your session.

In [None]:
sig.init()

## Introduction to pairs trading
A pairs trading strategy is a market-neutral trading strategy that simultaneously trades two correlated financial instruments in order to profit from temporary price divergences between them. The strategy is based on the concept of cointegration, which suggests that certain assets move together in the long run, even though they may experience short-term price deviations.

Here's how backtesting a pairs trading strategy typically works:

1. **Identifying pairs** - we try to find assets that have historically exhibited a strong correlation. These assets can be from the same sector or industry, have similar business models, or belong to related markets. 

1. **Calculating the spread** - after identifying assets that could potentially exhibit cointegration, we calculate the spread between the two asset prices. The spread is typically either a simple arithmetic difference between the prices of the two assets or a z-score normalization of that difference.

1. **Generating signals** - for assets that do exhibit cointegration, we define signals, based on statistical measures, to determine when to initiate trades. These signals indicate a trading direction depending on whether the spread widens or narrows relative to its historical mean or equilibrium level.

1. **Backtesting** - finally, we backtest the performance of a strategy which trades when the spread widens significantly beyond some predetermined thresholds. If the spread becomes too wide, we short the relatively expensive asset and simultaneously go long on the relatively cheap asset. Conversely, when the spread narrows and comes back within the threshold, we close the positions. 

## Our strategy
1. **Identifying pairs** - we will investigate if there is cointegration between the prices of feeder cattle, live cattle, corn and rough rice.
1. **Calculating the spread** - we will test each pair for cointegration using the `coint` function from the `statsmodels` library, then calculate the spread as the difference between the histories of the two rolling futures strategies.
1. **Generating signals** - we will generate trading signals from the rolling z-score of the spread, using a window size and a `zscore_limit` threshold.
1. **Backtesting** - we will use the `sig.SignalStrategy` class to backtest the performance of a pairs trading strategy based on the cointegrated assets.

## 1. Identifying pairs
The assets being considered in this strategy are:

- **Feeder Cattle (FC)**: Feeder cattle are weaned calves that are sent to feedlots to be fattened for slaughter. Their prices can be influenced by factors such as feed prices, weather, and beef demand.
- **Live Cattle (LC)**: Live cattle are cattle that are ready for slaughter. Similar to feeder cattle, their prices can be affected by feed prices, weather, and beef demand, as well as export markets.
- **Corn (C)**: Corn is a staple crop used in food production and as a feedstock in the production of ethanol. Its prices can be influenced by weather, crop yields, and demand from the ethanol industry.
- **Rough Rice (RR)**: Rough rice is rice that has just been harvested and has not yet been milled. Its prices can be influenced by weather, crop yields, and demand from both domestic and international rice markets.

Our hypothesis before backtesting this strategy is that there will be some cointegration between the price of feeder cattle and live cattle and also between the price of corn and rough rice. We are also considering the possibility that we may find unexpected cointegration between the prices of one of the cattle assets and one of the grain assets.

## 2. Calculating the spread

### 2.1 Create Rolling Futures for each asset

To initialize futures strategies for each of the commodity assets we are interested in, we will use our SDK's `RollingFutureStrategy` class to set up our futures contracts.

A full explanation of each parameter of `sig.RollingFutureStrategy` can be seen by running the following code cell to view the class's docstring. In short:
- The `rolling_rule` parameter is set to `f_0`, meaning we are using the adjusted front month contract.
- The `monthly_roll_days` parameter is set to `5:9`, meaning futures contracts will be rolled between the 5th and 9th business days of the month.

In [None]:
sig.RollingFutureStrategy?
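
As a rough illustration of what a `5:9` roll window could cover, the sketch below lists the 5th to 9th business days of a month using pandas. The helper `roll_window_days` is hypothetical and is not part of the SDK. It assumes the window is inclusive and counts weekdays only, so it ignores exchange holidays, which the SDK's own calendar may treat differently.

```python
import pandas as pd

def roll_window_days(year, month, first=5, last=9):
    """Return the first-th to last-th business days (inclusive) of a month.

    Assumption: weekdays only; exchange holidays are ignored.
    """
    month_start = pd.Timestamp(year=year, month=month, day=1)
    month_end = month_start + pd.offsets.MonthEnd(0)
    business_days = pd.bdate_range(month_start, month_end)
    # Business days are 1-indexed in the description, 0-indexed in pandas.
    return business_days[first - 1:last]

window = roll_window_days(2010, 1)
print([d.strftime("%Y-%m-%d") for d in window])
```

Under these assumptions, the window for January 2010 runs from 7 January to 13 January.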

Run the following cell to create the rolling future strategies for our chosen commodities.

In [None]:
# Feeder Cattle
fc = sig.RollingFutureStrategy(
    currency='USD',
    start_date=dtm.date(2010,1,10),
    contract_code='FC',
    contract_sector='COMDTY',
    monthly_roll_days = '5:9',
    rolling_rule='f_0',
)

# Live Cattle
lc = sig.RollingFutureStrategy(
    currency='USD',
    start_date=dtm.date(2010,1,10),
    contract_code='LC',
    contract_sector='COMDTY',
    monthly_roll_days = '5:9',
    rolling_rule='f_0',
)

# Corn
c = sig.RollingFutureStrategy(
    currency='USD',
    start_date=dtm.date(2010,1,10),
    contract_code='C',
    contract_sector='COMDTY',
    monthly_roll_days = '5:9',
    rolling_rule='f_0',
)

# Rough Rice
rr = sig.RollingFutureStrategy(
    currency='USD',
    start_date=dtm.date(2010,1,10),
    contract_code='RR',
    contract_sector='COMDTY',
    monthly_roll_days = '5:9',
    rolling_rule='f_0',
)

### 2.2 Define a function which will test pairs of commodities for cointegration

After initializing our futures strategies, the next step in our pairs trading strategy is to test for cointegration between the various pairs of commodities. Cointegration is a statistical property of two or more time series variables which indicates if a linear combination of them is stationary. In the context of pairs trading, if two assets are cointegrated, it means they move together in such a way that the spread between them is mean-reverting. This property is fundamental to the success of a pairs trading strategy.

To test for cointegration, we first retrieve the historical data for each asset and then reindex them so that they share the same dates. We then use the `coint` function from the `statsmodels` library to run a cointegration test. This function returns a test statistic, a p-value, and critical values. We primarily focus on the p-value, which is the probability of seeing a test statistic at least as extreme as the one observed if the two series were not cointegrated.

We define a function `test_cointegration(asset1, asset2)`, which takes two assets as inputs, performs the cointegration test, and prints the p-value. If the p-value is less than 0.05 (a common threshold in statistical testing), we reject the null hypothesis of no cointegration, treat the assets as cointegrated, and proceed with a pairs trading strategy. If the p-value is 0.05 or above, we have no evidence of cointegration, so we do not proceed with pairs trading for that pair.

> The cointegration test used is the Engle-Granger cointegration test. The two time series being tested for cointegration are considered as endogenous (endog) and exogenous (exog) variables.
> The endogenous variable is the primary variable of interest in the cointegration test. It is the time series being tested to see if it has a long-term relationship with another time series (the exogenous variable).

In [None]:
def test_cointegration(asset1, asset2):
    
    # Fetch the history of the rolling future strategy for asset1 and asset2
    asset1_df = asset1.history().dropna()
    asset2_df = asset2.history().dropna()
    
    
    # Ensure endog and exog are of the same size by aligning them on the same dates.
    # .ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas versions.
    asset1_df = asset1_df.reindex(asset2_df.index).ffill()
    asset2_df = asset2_df.reindex(asset1_df.index).ffill()
    
    # Run cointegration test
    score, pvalue, _ = coint(asset1_df, asset2_df)
    print('--------')
    print(f'p-value of cointegration test between {asset1.name} and {asset2.name}: {pvalue}')
    
    # For this example we define the threshold of 0.05 for the p-value
    if pvalue < 0.05:
        print('Assets are cointegrated, proceed with pairs trading strategy.')
        print('--------')

    else:
        print('Assets are not cointegrated, do not proceed with pairs trading.')
        print('--------')

### 2.3 Test for cointegration between different assets

In [None]:
test_cointegration(fc, lc)

In the test above, we are checking for cointegration between Feeder Cattle and Live Cattle. The output shows that the p-value of the cointegration test is less than 0.05, indicating that these two assets are cointegrated and suitable for pairs trading.


In [None]:
test_cointegration(c, rr)

Similarly, in the test above, we tested for cointegration between Corn and Rough Rice. The result again shows a p-value less than 0.05, suggesting these two assets are also cointegrated.


In [None]:
test_cointegration(fc, c)

In contrast, when we test for cointegration between Feeder Cattle and Corn, the p-value is greater than 0.05. This suggests that these two assets are not cointegrated and should not be used for pairs trading.

These tests help us identify which pairs of assets move together in a way that makes them suitable for pairs trading. We can use this process to test any number of asset pairs as part of our strategy development.
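
The three tests above cover only some of the possible pairs. As a minimal sketch, every pair can be enumerated with `itertools.combinations`. The commented-out loop shows how each pair might then be passed to `test_cointegration`, assuming the strategy objects defined earlier in this notebook.

```python
from itertools import combinations

codes = ["FC", "LC", "C", "RR"]

# Every unordered pair of the four contract codes.
pairs = list(combinations(codes, 2))
print(pairs)

# In the notebook, each pair could then be tested, for example:
# strategies = {"FC": fc, "LC": lc, "C": c, "RR": rr}
# for first, second in pairs:
#     test_cointegration(strategies[first], strategies[second])
```

Four assets give six pairs. Testing many pairs at a 0.05 threshold increases the chance that at least one appears cointegrated by coincidence, so results from a broad scan are best treated with caution.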


## 3. Generate Trading Signals

Now that we've identified pairs of commodities that are cointegrated, the next step in our pairs trading strategy is to generate trading signals. These signals tell us when to enter and exit our long and short positions.

To generate these signals, we define a function `generate_pairs_trading_signals(asset1, asset2, window, zscore_limit)`. This function takes two assets, a window size, and a z-score limit as inputs. The window size is the number of periods used for calculating the moving average and standard deviation of the price spread. The z-score limit defines our threshold for entering and exiting trades.
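
As a small illustration of the role of the window size, the sketch below computes a rolling z-score on a synthetic spread. The data is random noise generated for this example, not a real futures spread.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
index = pd.bdate_range("2020-01-01", periods=60)
spread = pd.Series(rng.normal(size=60), index=index)  # synthetic spread

window = 21
rolling_mean = spread.rolling(window).mean()
rolling_std = spread.rolling(window).std()
zscore = (spread - rolling_mean) / rolling_std

# The first window - 1 values are NaN because the window is not yet full.
print(zscore.isna().sum())
print(zscore.dropna().head())
```

With a 21-day window, the first 20 observations have no z-score, so no signal can be generated until the 21st observation.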

The function proceeds as follows:
1. Fetch the historical data for each asset.
2. Calculate the spread between the two assets.
3. Calculate the z-score of the spread. The z-score tells us how many standard deviations the spread is from its moving average. A high positive z-score indicates the spread is higher than usual, suggesting that asset1 is overpriced relative to asset2, and vice versa for a high negative z-score.
4. Generate trading signals based on the z-score. 
    1. If the z-score is less than the negative of the `zscore_limit`, it indicates entering a long position.
    1. If the z-score is greater than or equal to 0, it indicates exiting a long position.
    1. If the z-score is greater than the `zscore_limit`, it indicates entering a short position.
    1. If the z-score is less than or equal to 0, it indicates exiting a short position.
5. The function then creates a DataFrame to hold these signals and carries forward the positions when no action is taken. 
6. We combine the long and short positions to get our final positions.


In [None]:
def generate_pairs_trading_signals(asset1, asset2, window, zscore_limit):
    # Fetch the historical data for each asset
    asset1_df = asset1.history()
    asset2_df = asset2.history()
    
    # Calculate the spread
    spread = asset1_df - asset2_df
    
    # Calculate the z-score of the spread
    spread_mean = spread.rolling(window).mean()
    spread_std = spread.rolling(window).std()
    zscore = (spread - spread_mean) / spread_std
    
    # Create signals based on the z-score
    df = pd.DataFrame()
    df['long_entry'] = zscore < -zscore_limit
    df['long_exit'] = zscore >= 0
    df['short_entry'] = zscore > zscore_limit
    df['short_exit'] = zscore <= 0
    
    # Carry forward the positions when no action is taken for long positions
    df['positions_long'] = np.nan
    df.loc[df['long_entry'],'positions_long'] = 1
    df.loc[df['long_exit'],'positions_long'] = 0
    # Assign the result back: an in-place ffill on a selected column may not update df
    df['positions_long'] = df['positions_long'].ffill()
    
    # Carry forward the positions when no action is taken for short positions
    df['positions_short'] = np.nan
    df.loc[df['short_entry'],'positions_short'] = -1
    df.loc[df['short_exit'],'positions_short'] = 0
    df['positions_short'] = df['positions_short'].ffill()
    
    # Combine the long and short positions to get the final positions
    df['positions'] = df['positions_long'] + df['positions_short']
    
    # Return long/short signals mapped to our assets.
    
    return pd.DataFrame({asset1.name:df['positions'], asset2.name:df['positions']*-1})
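
To make the entry and exit rules concrete, the sketch below applies the same logic to a short, hand-written z-score series. `positions_from_zscore` is a hypothetical helper written for this illustration. It mirrors the rules in the function above but is not part of the SDK.

```python
import numpy as np
import pandas as pd

def positions_from_zscore(zscore, zscore_limit):
    """Hypothetical helper mirroring the entry/exit rules described above."""
    long_pos = pd.Series(np.nan, index=zscore.index)
    long_pos[zscore < -zscore_limit] = 1.0   # enter long
    long_pos[zscore >= 0] = 0.0              # exit long

    short_pos = pd.Series(np.nan, index=zscore.index)
    short_pos[zscore > zscore_limit] = -1.0  # enter short
    short_pos[zscore <= 0] = 0.0             # exit short

    # Hold the previous position on days when no rule fires, then combine.
    return long_pos.ffill() + short_pos.ffill()

zscore = pd.Series([0.5, -2.5, -1.0, 0.2, 2.3, 1.0, -0.1])
positions = positions_from_zscore(zscore, zscore_limit=2)
print(positions.tolist())
# → [nan, 1.0, 1.0, 0.0, -1.0, -1.0, 0.0]
```

The first value is NaN because no short entry or exit rule has fired yet, which is why the signal DataFrames are passed through `.dropna()` before use. The long position opened at -2.5 is held at -1.0 and closed only once the z-score returns to zero or above.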

Next, we generate trading signals for our cointegrated pairs, Feeder Cattle and Live Cattle, as well as Corn and Rough Rice. We're using a 21-day rolling window to calculate our z-scores, and a z-score limit of 2 to generate our trading signals. This means we'll enter a trade when the z-score is above 2 or below -2, and exit when the z-score crosses back over zero.

The result is a DataFrame for each pair, containing our trading signals, which is compatible with our `SignalStrategy` class. 

> A `SignalStrategy` requires a `signal_input`: a DataFrame in which the column headers are the instrument names and the values are the signals for each instrument. These signals can be either a number of units *or* a weight.
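
As a minimal sketch of that layout, the DataFrame below has one column per instrument and one row per date. `ASSET_A` and `ASSET_B` are hypothetical placeholders. In this notebook, the column headers are the names of the rolling futures strategies.

```python
import pandas as pd

index = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
pair_position = pd.Series([1.0, 0.0, -1.0], index=index)

# "ASSET_A" and "ASSET_B" are hypothetical placeholders for instrument names.
signal_df = pd.DataFrame({
    "ASSET_A": pair_position,   # position in the first leg
    "ASSET_B": -pair_position,  # opposite position in the second leg
})
print(signal_df)
```

Each row sums to zero because the two legs of a pair always take opposite positions.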


In [None]:
fc_lc_signal_df = generate_pairs_trading_signals(fc, lc, 21, 2).dropna()
c_rr_signal_df = generate_pairs_trading_signals(c, rr, 21, 2).dropna()

Plotting the signal DataFrame shows when each asset in a cointegrated pair is held long or short over the course of the strategy.

In [None]:
fc_lc_signal_df['2020-01-01':].plot()

## 4. Backtesting the historical performance of each pair trading strategy

After generating the trading signals, the next step is to backtest these signals. Backtesting involves running our strategy on historical data to see how it would have performed. This gives us a sense of how our strategy might perform in the future, although it's important to remember that past performance is not always indicative of future results.

In the code below, we backtest our pairs trading strategy for the two pairs of commodities we identified earlier: Feeder Cattle and Live Cattle, and Corn and Rough Rice.

We use the `SignalStrategy` class to perform our backtest. This class takes several parameters:
- `currency`: The currency used for the strategy.
- `signal_input`: The signal dataframe.
- `start_date`: The start date of the strategy, which we get from the first valid index of our signal dataframes.
- `rebalance_frequency`: The frequency at which the strategy rebalances. Here, we use `1BD` for one business day.
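
The sketch below shows, on a toy DataFrame, how a start date can be derived from the first valid index of a signal DataFrame. The column names and values are hypothetical and chosen only for illustration.

```python
import datetime as dtm
import numpy as np
import pandas as pd

index = pd.bdate_range("2010-01-11", periods=5)
raw_signals = pd.DataFrame(
    {
        "ASSET_A": [np.nan, np.nan, 0.0, 1.0, 1.0],
        "ASSET_B": [np.nan, np.nan, 0.0, -1.0, -1.0],
    },
    index=index,
)

# Drop the leading rows that have no signal yet, then take the first date.
signals = raw_signals.dropna()
start_date = signals.first_valid_index().date()
print(start_date)
```

Here the first two rows are dropped, so the derived start date is 13 January 2010, returned as a `datetime.date`.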

We create a new `SignalStrategy` object for each pair of commodities.


In [None]:
sig.SignalStrategy?

In [None]:
fc_lc_pairs_trading_strategy = sig.SignalStrategy(
    currency='USD',
    signal_input=fc_lc_signal_df,
    start_date=fc_lc_signal_df.first_valid_index().date(),
    rebalance_frequency='1BD',
)

c_rr_pairs_trading_strategy = sig.SignalStrategy(
    currency='USD',
    signal_input=c_rr_signal_df,
    start_date=c_rr_signal_df.first_valid_index().date(),
    rebalance_frequency='1BD',
)

The `SignalStrategy` objects represent our backtested pairs trading strategies. We can use these objects to analyze the performance of our strategies.


In [None]:
fc_lc_pairs_trading_strategy.history().plot(label = 'FC-LC Pairs Trading Strategy', legend = True, figsize = [16,8])
c_rr_pairs_trading_strategy.history().plot(label = 'C-RR Pairs Trading Strategy', legend = True)

## 5. Backtesting the performance of a portfolio containing these strategies

After backtesting the signals, we can combine these strategies into a portfolio. The code below uses the `BasketStrategy` class to simulate a portfolio. It takes several parameters:

- `currency`: the currency used for the strategy.
- `start_date`: the start date of the strategy.
- `constituent_names`: the names of the constituent strategies, which are the names of our pairs trading strategies.
- `weights`: the relative weights of the constituent strategies in the portfolio.
- `rebalance_frequency`: the frequency at which the portfolio rebalances.
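
To illustrate what fixed weights with month-end rebalancing imply, the sketch below simulates a two-constituent basket in plain pandas. `basket_nav` is a hypothetical helper and is not how `BasketStrategy` is implemented. It assumes no transaction costs and treats the last available date in each month as the rebalance date.

```python
import numpy as np
import pandas as pd

def basket_nav(returns, weights, initial=1000.0):
    """NAV of a basket reset to its target weights at each month end.

    A simplified illustration only: no costs, and not the SDK's own logic.
    """
    weights = np.asarray(weights, dtype=float)
    holdings = initial * weights  # value held in each constituent
    dates = returns.index
    nav = []
    for i, date in enumerate(dates):
        # Each constituent grows by its own return, so the weights drift.
        holdings = holdings * (1.0 + returns.iloc[i].to_numpy())
        total = holdings.sum()
        nav.append(total)
        # Treat the last available date in each calendar month as month end.
        last_in_month = i + 1 == len(dates) or (
            (dates[i + 1].year, dates[i + 1].month) != (date.year, date.month)
        )
        if last_in_month:
            holdings = total * weights  # reset to the target weights
    return pd.Series(nav, index=dates, name="basket")

index = pd.to_datetime(["2021-01-28", "2021-01-29", "2021-02-01", "2021-02-02"])
returns = pd.DataFrame({"pair_1": 0.10, "pair_2": 0.0}, index=index)
nav = basket_nav(returns, weights=[0.7, 0.3])
print(nav.round(2).tolist())
```

In this toy run the first constituent gains 10% each day, so its weight drifts above 70% during January. The month-end rebalance moves part of that gain into the second constituent, which is why the February values are slightly lower than they would be without rebalancing.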

You can see a full explanation of the `BasketStrategy` class by viewing its docstring.

In [None]:
sig.BasketStrategy?

Now we can define our portfolio and backtest its historical performance.

In [None]:
simple_basket = sig.BasketStrategy(
    currency='USD',
    start_date=dtm.date(2011,2,15),
    constituent_names=[
        fc_lc_pairs_trading_strategy.name,
        c_rr_pairs_trading_strategy.name,
    ],
    weights=[0.7, 0.3], # The strategy will rebalance so that the feeder cattle/live cattle pair accounts for 70% of assets within the strategy.
    rebalance_frequency='EOM', # The strategy will rebalance at the end of each month.
)

Finally, we can plot the performance of our portfolio.

In [None]:
simple_basket.history().plot()