# Example 3: Seasonal grain basket strategy

## Prerequisites

### 1. Open the example notebook in Colab
Our example notebooks assume you are using Google Colab. To open this notebook in Colab:
1. Sign up or log in to [Google Colab](https://colab.research.google.com/).
1. Select `File`.
1. Select `Open notebook`.
1. Select the `GitHub` tab.
1. Then:
    1. In `Enter a GitHub URL or search by organisation or user`, enter `SigTechnologies` and select the search option.
    1. In the `Repository` drop-down menu, select `SIGTechnologies/sigtech-python`.
    1. In the `branch` drop-down menu, select `master`.
1. Select the file you want to open.

### 2. Set up your Colab environment

In [None]:
# Install our Python SDK
!pip install sigtech 

# Import os and our Python SDK
import os
import sigtech.api as sig

# Define your API key as a string. Remember to delete it before sharing your notebook with others. 
os.environ['SIGTECH_API_KEY'] = '<YOUR_API_KEY>'

# Import any additional Python libraries you require.
import numpy as np
import pandas as pd
from tqdm import tqdm
import datetime as dtm

# The prophet library is designed to make time series forecasting tasks more accessible and intuitive.
from prophet import Prophet
from prophet.plot import plot_plotly, plot_components_plotly

### 3. Start a session
After installing our Python SDK, defining your API key, importing the Python libraries you require, and setting any default parameters, initialize your session.

In [None]:
sig.init()

## Introduction to a seasonal grain basket strategy

This strategy leverages the seasonal patterns in the Corn, Soybean, and Wheat futures markets to make trading decisions. The strategy takes advantage of the differences in supply and pricing between the old crop and new crop months. The key components of this strategy are:

- **Seasonal patterns**: the strategy is based on the understanding that the standardized trading months for Corn, Soybean, and Wheat futures align with the seasonal patterns in planting, harvesting, and marketing the underlying crops. For example, spring is the planting season for Corn and Soybeans, while fall is the planting season for Wheat. Similarly, July is the typical harvest month for Wheat, while November and December are the harvest months for Corn and Soybeans.

- **Old crop vs. new crop**: during the planting months, the only grain available comes from the previous harvest season and is known as the "old crop." During the harvest months, the newly harvested "new crop" comes to the market, resulting in higher supply.

- **Seasonal pricing**: when a new crop is harvested and supply is higher, the grain markets tend to reflect their lowest seasonal prices. Conversely, during the old crop months when supply is typically lower, grain prices tend to be higher compared to the farther-out new-crop trading months.

### Usefulness of the prophet library

The prophet library is key to this strategy because it:

- Is robust to missing data: prophet can handle missing data points in the time series.
- Handles shifts in the trend: it can identify and adapt to shifts in the underlying trend of the time series.
- Handles outliers: prophet typically performs well even when the data contains outlier values.

## Our strategy

We aim to capitalize on seasonal patterns in Corn, Soybean, and Wheat futures by using the prophet forecasting model to generate price forecasts and then applying a mean reversion strategy based on those forecasts. First, we model a single grain and check whether our strategy outperforms the market. If it does, we combine the signals from all three grain contracts to build a diversified grain basket. In our backtests, this approach outperformed the market.

## 1. Forecast the price of one grain

The following code creates a rolling futures strategy for soybeans and then uses that to forecast the price movements of soybean futures using the prophet forecasting model.

In [None]:
soy = sig.RollingFutureStrategy(
    contract_code='S', 
    contract_sector='COMDTY'
    )

In [None]:
# Fetch the historical price data of the Soybean futures using the soy rolling futures strategy.
df_soy = soy.history().reset_index().rename({'date': 'ds', soy.name: 'y'}, axis=1) 

# Initialize a prophet forecasting model. The model will capture yearly and weekly seasonality but not daily patterns.
m_soy = Prophet(daily_seasonality=False)

# Fit the prophet model to the historical price data of Soybean futures (stored in the df_soy dataframe).
m_soy.fit(df_soy)

# Create a dataframe with future dates to make predictions. Predictions will be made for 365 days into the future.
future_soy = m_soy.make_future_dataframe(periods=365) 

# Filter the future_soy dataframe to exclude weekends (Saturday and Sunday).
future_soy = future_soy[future_soy['ds'].dt.dayofweek < 5] 

# Use the fitted prophet model (m_soy) to predict the prices of Soybean futures.
forecast_soy = m_soy.predict(future_soy) 
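As a small illustration of the data preparation above, the following sketch uses a synthetic price series (an assumption standing in for `soy.history()`, which needs an API session) to show the two-column `ds`/`y` layout that prophet expects and the effect of the weekday filter. The series name is hypothetical.

```python
import pandas as pd

# Synthetic stand-in for soy.history(): a price series indexed by date.
# The series name 'EXAMPLE STRATEGY' is hypothetical.
dates = pd.date_range('2024-01-01', periods=10, freq='D', name='date')
prices = pd.Series(range(100, 110), index=dates, name='EXAMPLE STRATEGY', dtype=float)

# prophet expects a dataframe with a 'ds' (date) column and a 'y' (value) column.
df = prices.reset_index().rename({'date': 'ds', prices.name: 'y'}, axis=1)

# Keep Monday (0) to Friday (4); drop Saturday (5) and Sunday (6).
weekdays_only = df[df['ds'].dt.dayofweek < 5]

print(list(df.columns))
print(len(df), len(weekdays_only))
```

The ten calendar days starting on Monday 1 January 2024 contain one weekend, so two rows are removed by the filter.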

To visualize our forecast and its accuracy, we can generate a plotly plot that displays the historical price data as dots or markers on the chart and the forecasted prices as lines or curves extending into the future. This plot allows us to visually inspect the model's forecasts and compare them with the actual historical data.

In [None]:
plot_plotly(m_soy, forecast_soy)

We can also generate an interactive plotly plot of the individual components of the prophet model's forecast using the `plot_components_plotly` function. The components include:

- **Trend**: the overall trend of the time series, which captures the long-term movement.
- **Seasonalities**: the yearly and weekly seasonality patterns captured by the model.
- **Holidays**: shown only if holidays or special events were included in the model.

In [None]:
plot_components_plotly(m_soy, forecast_soy)

## 2. Building the `prophet` forecasting model

Now we will train the prophet forecasting model to predict the price movements of Soybean futures for the next 5 trading days. This process is performed iteratively, allowing the model to be regularly updated with new data to adapt to changing market conditions.

First we define the `train_prophet` function, which trains the prophet forecasting model on the historical price data in the dataframe `df`. The `forecast_days` parameter sets how many calendar days ahead the model predicts; weekend dates are then removed, so the result can contain fewer than `forecast_days` rows. The function returns a dataframe (`forecast`) containing the predictions for the remaining future dates, including the forecasted prices (`yhat`) and their lower and upper bounds (`yhat_lower` and `yhat_upper`).

In [None]:
def train_prophet(df, forecast_days=5):
    m = Prophet(daily_seasonality=False)
    m.fit(df)
    future = m.make_future_dataframe(periods=forecast_days, include_history=False)
    future = future[future['ds'].dt.dayofweek < 5] # This ensures that the strategy does not trade on weekends.
    forecast = m.predict(future)
    
    return forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
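The sketch below uses only pandas and a hypothetical last training date to illustrate how the weekday filter interacts with `periods=forecast_days`, which counts calendar days. It also shows one possible alternative based on `pd.bdate_range`, which requests business days directly. Neither approach accounts for exchange holidays.

```python
import pandas as pd

last_training_date = pd.Timestamp('2024-01-05')  # hypothetical: a Friday
forecast_days = 5
first_future_day = last_training_date + pd.Timedelta(days=1)

# Mimics make_future_dataframe followed by the weekday filter:
# five calendar days after the last training date, weekends removed.
calendar_days = pd.date_range(first_future_day, periods=forecast_days, freq='D')
filtered = calendar_days[calendar_days.dayofweek < 5]

# Alternative: ask directly for five business days (exchange holidays are not excluded).
business_days = pd.bdate_range(first_future_day, periods=forecast_days)

print(len(filtered))
print(len(business_days))
```

When the training data ends on a Friday, the calendar-day approach leaves three forecast dates (Monday to Wednesday), whereas the business-day range covers Monday to Friday.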

Next, we define the `iter_prophet` function to iteratively train the prophet model using the `train_prophet` function. The `df` dataframe contains historical price data of Soybean futures (or any other commodity). The parameters are as follows:

- `retrain_after_days`: the number of rows (trading days) after which the model is retrained. This ensures that the model adapts to changing market conditions over time. The default is 5.
- `forecast_days`: the number of days into the future for which the model will make predictions. This is passed to the `train_prophet` function.
- `min_training_days`: the minimum number of rows of training data required before making the first prediction. This ensures that the model has sufficient historical data to learn meaningful patterns. The default is 750.

In [None]:
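def iter_prophet(df, retrain_after_days=5, forecast_days=5, min_training_days=750):
    # Walk forward through the data, retraining on an expanding window each time.
    iter_forecasts = []
    # Stop once the training window reaches the end of the data, so that no
    # iteration retrains on an identical, full-length dataset.
    for i in tqdm(range(0, len(df) - min_training_days, retrain_after_days)):
        # .loc slicing is label-based and inclusive; df is expected to have a default integer index.
        forecast = train_prophet(df.loc[:i+min_training_days], forecast_days)
        iter_forecasts.append(forecast)

    return pd.concat(iter_forecasts)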
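The loop above follows an expanding-window (walk-forward) pattern: each retraining step sees all data up to a cut-off and nothing after it. The following sketch lists the cut-off positions for a small hypothetical dataset. `training_cutoffs` is an illustrative helper, not part of any library, and it stops once the cut-off would pass the end of the data.

```python
def training_cutoffs(n_rows, retrain_after_days=5, min_training_days=750):
    """Return the last row position used for training at each retraining step."""
    return [
        i + min_training_days
        for i in range(0, n_rows - min_training_days, retrain_after_days)
    ]

# Hypothetical small example: 30 rows, at least 20 rows of history, retrain every 5 rows.
cutoffs = training_cutoffs(n_rows=30, retrain_after_days=5, min_training_days=20)
print(cutoffs)  # [20, 25]
```

Printing the cut-offs for a real dataset before starting the long-running cell is a cheap way to estimate how many models will be fitted.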

Finally, we run `iter_prophet` on the Soybean data. The individual forecasts, each covering the next 5 days after a retraining date, are concatenated into a single dataframe named `df_prophet_soy`.

In [None]:
# Note - this cell may take up to 20 minutes to run.
df_prophet_soy = iter_prophet(df_soy)

Next, we prepare the forecasted results and merge them with the original historical price data of Soybean futures to create a single dataframe for further analysis and trading decision-making.

In [None]:
# Shift the forecasted price data one row upward to align the forecasted prices with the correct trading days.
df_prophet_soy[['yhat', 'yhat_lower', 'yhat_upper']] = df_prophet_soy[['yhat', 'yhat_lower', 'yhat_upper']].shift(-1)

# Merge the original historical price data (df_soy) and the forecasted results (df_prophet_soy) on their common 'ds' (date) column.
# The outer join keeps the dates from both dataframes; dropna() then removes every date that lacks either a price or a forecast.
merge_df = df_soy.merge(df_prophet_soy, on='ds', how='outer').dropna().set_index('ds')

merge_df.head()
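To make the shift and merge concrete, here is a toy version with four dates and made-up numbers. It shows that `shift(-1)` moves each forecast up one row (leaving the last row empty) and that `dropna()` after the outer join keeps only the dates that have both a price and a forecast.

```python
import pandas as pd

prices = pd.DataFrame({
    'ds': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']),
    'y': [100.0, 101.0, 102.0, 103.0],
})
# Forecasts exist only for the last three dates (values are invented).
forecasts = pd.DataFrame({
    'ds': pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04']),
    'yhat': [100.5, 101.5, 102.5],
})

# Move each forecast up one row; the final row becomes NaN.
forecasts['yhat'] = forecasts['yhat'].shift(-1)

merged = prices.merge(forecasts, on='ds', how='outer').dropna().set_index('ds')
print(merged)
```

Only 2 and 3 January survive: 1 January has no forecast, and the forecast for 4 January was emptied by the shift.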

This dataframe can be visualized in a plot showing the historical price data of Soybean futures along with the forecasted prices generated by the prophet model. The `merge_df` dataframe contains both the historical price data and the forecasted results, merged on their common date (`ds`) column.

The plot has the following data:
- `y`: the historical price data of Soybean futures.
- `yhat`: the forecasted prices generated by the prophet model.
- `yhat_lower` and `yhat_upper`: the uncertainty bands around the forecast.

In [None]:
merge_df.plot()

## 3. Modelling two mean reversion strategies 

Now we will implement two mean reversion trading strategies, one based on the simple moving average (SMA) and one based on the prophet forecasts. The goal is to generate trading signals by identifying potential mean reversion opportunities in the price movements of Soybean futures.

### 3.1 Calculating the simple moving average (SMA)

First, we need to calculate the SMA for the historical price data of Soybean futures (`y`). The SMA is the average price over the specified window; it smooths out short-term price fluctuations and highlights the underlying trend.

In [None]:
merge_df['sma'] = merge_df['y'].rolling(21*3).mean() # Uses a rolling window of 63 trading days (21 trading days * 3 months)
merge_df = merge_df.dropna()

# Plot the historical price data of Soybean futures ('y'), the forecasted prices from the prophet model ('yhat'), and the calculated SMA on a single chart.
merge_df[['y', 'yhat', 'sma']].plot()
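The rolling mean only produces a value once a full window of observations is available, which is why the leading rows are dropped with `dropna()`. A toy example with a window of 3 instead of 63 (the numbers are made up):

```python
import pandas as pd

y = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# The first window - 1 entries have no complete window and are NaN.
sma = y.rolling(3).mean()

print(sma)
print(sma.dropna())
```

With a 63-day window, the first 62 rows of `merge_df` have no SMA value and are removed in the same way.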

### 3.2 Generating mean reversion signals

The `get_signal` function below calculates the mean reversion signal based on the deviation between the chosen mean reversion trend (either `sma` or `yhat` from the forecast) and the actual price data (`y`). The daily percentage difference between the chosen trend and the actual prices is calculated. This quantifies how much the price deviates from the trend. A rolling mean and standard deviation with a window of 63 trading days (approximately three months) are calculated to provide a sense of the historical average deviation and its variation from the trend. Finally, the Z-score, representing the number of standard deviations by which the daily percentage difference deviates from the historical average, is calculated.

The mean reversion signals generated are based on the Z-score. If the Z-score is less than -2, a signal of 1 is generated, indicating a potential buying signal. If the Z-score is greater than 2, a signal of -1 is generated, indicating a potential selling signal. If the Z-score is between -2 and 2, a neutral signal of 0 is generated, indicating no immediate action is recommended for mean reversion. The `signal` column stores these signals.

In [None]:
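def get_signal(df, mean_reversion_trend='sma', z_threshold=2):
    # Work on a copy so the caller's dataframe is not modified.
    df = df.copy()
    # Percentage gap between the chosen trend and the actual price.
    daily_pct_diff = (df[mean_reversion_trend] - df['y']) / df['y']
    daily_pct_diff_mean = daily_pct_diff.rolling(63).mean()
    daily_pct_diff_std = daily_pct_diff.rolling(63).std()
    z_score = (daily_pct_diff - daily_pct_diff_mean) / daily_pct_diff_std
    # 1 below -z_threshold, -1 above z_threshold, 0 otherwise.
    df['signal'] = np.where(z_score < -z_threshold, 1, np.where(z_score > z_threshold, -1, 0))
    return df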
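The Z-score logic can be checked on a small synthetic series. The sketch below is a standalone variant of the idea (the helper name `zscore_signal` and the data are assumptions for illustration), using a window of 10 instead of 63 and the same sign convention as `get_signal`: a Z-score below the negative threshold gives 1 and a Z-score above the threshold gives -1.

```python
import numpy as np
import pandas as pd

def zscore_signal(trend, price, window=63, z_threshold=2):
    """Return 1, -1 or 0 per row from the z-score of the trend-price gap."""
    pct_diff = (trend - price) / price
    z_score = (pct_diff - pct_diff.rolling(window).mean()) / pct_diff.rolling(window).std()
    # Comparisons with NaN (incomplete window or zero deviation) are False, giving 0.
    signal = np.where(z_score < -z_threshold, 1, np.where(z_score > z_threshold, -1, 0))
    return pd.Series(signal, index=price.index)

# Flat price; the trend sits on the price for ten days, then drops 10% below it.
price = pd.Series([100.0] * 11)
trend = pd.Series([100.0] * 10 + [90.0])

signal = zscore_signal(trend, price, window=10)
print(signal.tolist())
```

Only the final row produces a non-zero signal, because it is the only day on which the gap between trend and price is unusual relative to the preceding window.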

### 3.3 Backtesting the performance of the mean reversion strategies

After defining a function for generating mean reversion signals we can use `SignalStrategy` to backtest the performance of a strategy which trades based on these signals. A `SignalStrategy` requires a signal dataframe as an input (this is a pandas dataframe where the column headers are the instrument names and the values are the signals for each of the instruments). These signals can be either a number of units *or* a weight. In this case, the possible values are 1, -1 and 0, based on the Z-scores.

The strategy's start date is set to the first valid index date in the dataframe, and rebalancing is done daily. For more information on the `SignalStrategy` refer to its docstring.

In [None]:
sig.SignalStrategy?

In [None]:
df_mean_reversion_prophet_signal = get_signal(merge_df, mean_reversion_trend='yhat')
df_signal = df_mean_reversion_prophet_signal['signal'].rename(soy.name).to_frame()
prophet_strategy = sig.SignalStrategy(
    signal_input=df_signal,
    start_date=df_signal.first_valid_index().date(),
    rebalance_frequency='1BD'
)

In [None]:
df_mean_reversion_sma_signal = get_signal(merge_df, mean_reversion_trend='sma')
df_signal_sma = df_mean_reversion_sma_signal['signal'].rename(soy.name).to_frame()
sma_strategy = sig.SignalStrategy(
    signal_input=df_signal_sma,
    start_date=df_signal_sma.first_valid_index().date(),
    rebalance_frequency='1BD'
)

Finally, we can plot the performance of the mean reversion strategies.

In [None]:
sma_strategy.history().plot(legend=True)
prophet_strategy.history().plot(legend=True)
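A plot alone makes it hard to say by how much one strategy outperforms another. One way to quantify the comparison is to compute summary statistics from each strategy's history series. The helper below is a hypothetical sketch, not part of the SigTech SDK; it assumes a daily series, 252 trading days per year, and a zero risk-free rate, and it is demonstrated on synthetic data.

```python
import numpy as np
import pandas as pd

def summarize(history, periods_per_year=252):
    """Total return, annualized volatility and Sharpe ratio (zero risk-free rate)."""
    returns = history.pct_change().dropna()
    total_return = history.iloc[-1] / history.iloc[0] - 1
    ann_vol = returns.std() * np.sqrt(periods_per_year)
    sharpe = returns.mean() / returns.std() * np.sqrt(periods_per_year)
    return {'total_return': total_return, 'ann_vol': ann_vol, 'sharpe': sharpe}

# Synthetic daily history that alternates between +2% and -1% returns.
daily_returns = np.tile([0.02, -0.01], 126)
history = pd.Series(1000.0 * np.cumprod(1 + daily_returns))

stats = summarize(history)
print({k: round(float(v), 4) for k, v in stats.items()})
```

With real data, the same helper could be applied to `prophet_strategy.history()` and `sma_strategy.history()` and the resulting dictionaries compared side by side.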

In this backtest, our prophet-based strategy outperforms. Let's build a basket of three grain contracts: Wheat, Soybean, and Corn.

## 4. Building our basket of grains

The following code builds a diversified grain basket strategy by combining the mean reversion signals from three grain contracts (Corn, Soybean, and Wheat) based on the prophet signals. The performance of the strategy is then plotted to visualize its historical returns.

The `get_prophet_signal` function generates the mean reversion signals for a specific grain contract based on the prophet forecasts. The `contract_code` identifies the grain contract (e.g., `S` for Soybean, `C` for Corn, and `W` for Wheat). The function uses `RollingFutureStrategy` to build a rolling futures strategy for the grain and `history` to load its historical price data into `rfs_df`.

The function `iter_prophet()` is called to train the prophet model and generate forecasts for the specified grain contract. The resulting dataframe `df_prophet` contains the forecasted prices, lower bounds, and upper bounds for the contract. The forecasted prices and bounds are shifted up one position to align them with the correct trading days. Then, the historical price data (`rfs_df`) and the forecasted results (`df_prophet`) are merged based on their common `ds` (date) column using an outer join. Any rows with missing data resulting from the merge are dropped, and the `ds` column is set as the index. The `get_signal()` function is used to generate the mean reversion signals based on the Prophet forecasted prices (`yhat`).

Finally, a signal dataframe (`df_signal`) is created. This is required to use `SignalStrategy`. The mean reversion signals are extracted from `signal` and stored in `df_signal` and the column is then renamed using the grain contract's name.

In [None]:
def get_prophet_signal(contract_code):
    rfs = sig.RollingFutureStrategy(contract_code=contract_code, contract_sector='COMDTY')
    rfs_df = rfs.history().reset_index().rename({'date': 'ds', rfs.name: 'y'}, axis=1)
    df_prophet = iter_prophet(rfs_df)
    df_prophet[['yhat', 'yhat_lower', 'yhat_upper']] = df_prophet[['yhat', 'yhat_lower', 'yhat_upper']].shift(-1)
    merge_df = rfs_df.merge(df_prophet, on='ds', how='outer').dropna().set_index('ds')
    signal = get_signal(merge_df, mean_reversion_trend='yhat')
    df_signal = signal['signal'].rename(rfs.name).to_frame()
    return df_signal

Next, for each grain contract, the mean reversion signals are obtained using the `get_prophet_signal()` function, and the signals are appended to the `all_prophet_signals` list.

In [None]:
# Note - this cell may take up to an hour to run.
all_prophet_signals = []
for c in 'S', 'C', 'W':
    all_prophet_signals.append(get_prophet_signal(c))

To use them in our `SignalStrategy`, the mean reversion signals for all three grain contracts are combined into a single dataframe (`df_all`) by concatenating them along the columns axis. Each grain contract is given a weight of 0.33, an approximately equal allocation among the three contracts.

In [None]:
df_all = pd.concat(all_prophet_signals, axis=1) * 0.33

df_all.tail()
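A toy version of the combination step, with hypothetical instrument names and hand-written signals, shows how `pd.concat(..., axis=1)` aligns the signals by date and how the weight scales them. Using `1 / len(signals)` instead of the hard-coded 0.33 is an optional variation that makes the absolute weights sum to one when all three signals are active.

```python
import pandas as pd

dates = pd.to_datetime(['2024-01-02', '2024-01-03', '2024-01-04'])

# Hypothetical per-instrument signal frames, one column each.
signals = [
    pd.DataFrame({'SOY EXAMPLE': [1, 0, -1]}, index=dates),
    pd.DataFrame({'CORN EXAMPLE': [0, 0, 1]}, index=dates),
    pd.DataFrame({'WHEAT EXAMPLE': [-1, 1, 1]}, index=dates),
]

# Equal weight per instrument.
weight = 1 / len(signals)
basket = pd.concat(signals, axis=1) * weight

print(basket)
```

Each row of `basket` is the target allocation for one date, with one column per instrument, which matches the signal dataframe layout described for `SignalStrategy` earlier.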

Finally, a `SignalStrategy` is defined for our basket of grains using the combined mean reversion signals from all grain contracts (`df_all`). The start date is set to January 4, 2014, and rebalancing is done on a daily basis.

In [None]:
all_prophet_strategy = sig.SignalStrategy(
    signal_input=df_all,
    start_date=dtm.date(2014, 1, 4),
    rebalance_frequency='1BD'
)

The performance of our strategy can be seen in the following plot.

In [None]:
all_prophet_strategy.history().plot()