# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Case Study: Algorithmic Trading and Backtesting

## Learning Objectives

At the end of the case study, you will be able to

* understand what algorithmic trading is
* implement a trend-following trading strategy
* evaluate the strategy's performance using backtesting

### Algorithmic Trading

Algorithmic trading is a style of trading that uses a computer's ability to process data quickly and react to market changes faster than a human can. It is generally based on hard data rather than forecasts or opinions: the algorithm takes in a data stream, processes it quickly, and decides whether to buy or sell.

Some major categories of algorithmic trading are:

* **mean reversion** exploits the tendency of many asset prices to revert to some mean after a period of time. It tries to capitalize on extreme changes in the price of a security, assuming that the price will eventually bounce back to its mean.

* **momentum strategy** is based on the assumption that the price of an instrument has some inertia: if it is moving in one direction, it is likely to keep moving in that direction.

* **statistical arbitrage** is a strategy that identifies pricing discrepancies between securities through modeling techniques. The strategy is market neutral as it involves a long position and a short position of different securities.

* **trend following** aims to buy an asset when its price trend is upward and sell it when the trend turns downward. These trading decisions are based on technical analysis, market patterns, or indicators.

**Trend Following Strategy**

In this case study, we use moving averages of the price to evaluate the market. We calculate the moving average over two windows of different lengths: a short one and a long one, each spanning a fixed number of days.

A trading algorithm needs a signal generator that tells it when to react to the market. Here, we generate the signal as follows.

* First, we create a fast moving average, computed over a short time period.

* Next, we create a slow moving average, which reflects a longer-term trend.

* If the fast moving average crosses the slow moving average from below, that's a buy situation. Once we have bought, there is nothing to do until the next signal, that is, until the averages cross again.

* If the fast moving average crosses the slow moving average from above, that's a sell situation: we sell the assets we hold and wait for the next buy signal.
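
The crossover rule described above can be sketched on a small series. This is a minimal illustration, not the strategy code used later in the case study: the prices are hypothetical toy values and the window lengths (2 and 4) are chosen only to keep the example short.

```python
import pandas as pd

# Hypothetical toy prices (not real market data): a dip, a rally, then a fall
prices = pd.Series([10, 9, 8, 7, 8, 10, 12, 13, 11, 8, 6, 5], dtype=float)

fast = prices.rolling(2).mean()  # short window reacts quickly to price changes
slow = prices.rolling(4).mean()  # long window reflects the longer-term trend

# 1 while the fast average is at or above the slow average, else 0
above = (fast >= slow).astype(int)

# +1 on the day the fast average crosses the slow one from below (buy),
# -1 on the day it crosses from above (sell)
crossover = above.diff()

buy_days = crossover[crossover == 1].index.tolist()
sell_days = crossover[crossover == -1].index.tolist()
print("buy days:", buy_days, "sell days:", sell_days)
```

On these toy prices the rule produces one buy and one sell, and nothing happens on the days in between.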

### Import required packages

In [None]:
!pip -qq install yfinance --upgrade --no-cache-dir
!pip -qq install pandas_datareader

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas_datareader import data as pdr
import yfinance as yf
yf.pdr_override()
import warnings
warnings.filterwarnings("ignore")

Let's take the stock prices of Advance Auto Parts (ticker `AAP`) and apply the trend-following strategy.

In [None]:
# Read the required data from Yahoo
d_ = pdr.get_data_yahoo(['AAP'], start = "2010-01-01", end = "2019-08-05")
d_.head()

As we can see, there are several price columns for each trading day, but we will focus only on the ‘Adj Close’ column. This column gives the closing price of the company’s stock on the given day, adjusted for corporate actions such as dividends and stock splits.

In [None]:
# Closing price 
d = d_[['Adj Close']].copy()  # copy so that new columns can be added safely
d.head()

In [None]:
# Visualize series
d["Adj Close"].plot(figsize=(20,6))
plt.ylabel("Stock price")
plt.show()

The above plot shows how the adjusted closing price varies over the sample period.

In [None]:
# Create Return column by using log difference
d["Return"] = np.log(d['Adj Close']).diff()
d.head()
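
The log-difference return used above can be checked on a few numbers. The sketch below uses hypothetical prices and compares log returns with simple percentage returns; it suggests why log returns are convenient: they add up over time to the log of the overall price ratio.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, for illustration only
prices = pd.Series([100.0, 102.0, 99.0, 105.0])

log_ret = np.log(prices).diff()     # log return: log(P_t) - log(P_{t-1})
simple_ret = prices.pct_change()    # simple return: P_t / P_{t-1} - 1

# The two are linked by log_ret = log(1 + simple_ret)
print(pd.DataFrame({"log": log_ret, "simple": simple_ret}))

# Log returns are additive: their sum is the log of the total price ratio
total_log = log_ret.sum()
print(total_log, np.log(prices.iloc[-1] / prices.iloc[0]))
```

The first row is `NaN` in both columns because there is no previous price to compare against.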

In [None]:
# Visualize returns
d["Return"].plot(figsize=(20, 6))
plt.ylabel("Return")
plt.show()

Now, we calculate the fast moving average using a short window of 12 days.

In [None]:
# Create a simple moving average column with window = 12
short_window = 12
d["SMA12"] = d['Adj Close'].rolling(short_window).mean()
d.head()

Next, we calculate the slow moving average using a longer window of 26 days.

In [None]:
# Create another simple moving average with window = 26
long_window = 26
d["SMA26"] = d['Adj Close'].rolling(long_window).mean()
d.head(30)
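
A rolling mean has no value until a full window of observations is available, which is why 30 rows are displayed above: the first 25 values of `SMA26` are `NaN`. The sketch below shows the same behaviour on hypothetical toy prices with a window of 3, and how the `min_periods` argument changes it.

```python
import pandas as pd

# Hypothetical toy prices, chosen so the averages are easy to check by hand
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Default behaviour: NaN until a full window of 3 observations is available
sma3 = prices.rolling(3).mean()

# min_periods=1 averages whatever is available so far instead of returning NaN
sma3_partial = prices.rolling(3, min_periods=1).mean()

print(pd.DataFrame({"price": prices, "sma3": sma3, "sma3_partial": sma3_partial}))
```

The case study keeps the default, so no signal can be generated before both averages have a full window.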

Now, let's visualize the crossover of these simple moving averages.

In [None]:
# Visualize data series, sma12 and sma26
d[["Adj Close", "SMA12", "SMA26"]].plot(figsize=(24,10))
plt.ylabel("Stock price")
plt.show()

Now, we create a signal from the two moving averages: `Signal` is 1 on days when the short moving average is at or above the long moving average, and 0 otherwise. A change from 0 to 1 therefore means that the short moving average has crossed the long one from below.

In [None]:
# Create Signal column: 1 while SMA12 is at or above SMA26, else 0
d["Signal"] = (d["SMA12"] >= d["SMA26"]).astype(int)
d.head()

Now we take the difference of consecutive signals to generate the actual trading orders. This column tells us on which days we buy and on which days we sell the stock. We name this column `Position`.

In [None]:
# Generate trading orders
d['Position'] = d['Signal'].diff()
d[23:43]

In the above table, we can see that whenever there is a change in `Signal` column, we have different values in `Position` column. Let's see what these different values denote.

In [None]:
# Unique values for trading orders
d['Position'].unique()

In the `Position` column, 1 denotes a buy situation, -1 denotes a sell situation, and 0 denotes neither buy nor sell. The `nan` value comes from the first row, which has no previous signal to difference against.

Let's visualize this on a graph.
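
To make the meaning of these values concrete, the sketch below differences a hypothetical 0/1 signal and maps each resulting value to a readable label. The `labels` series and its wording are illustrative additions, not part of the case study's DataFrame.

```python
import pandas as pd

# Hypothetical signal: in the market on days 2-4 and again from day 7
signal = pd.Series([0, 0, 1, 1, 1, 0, 0, 1])

# Difference of consecutive signals; the first row has no predecessor, so it is NaN
position = signal.diff()

# Map each order value to a label (NaN stays NaN)
labels = position.map({1.0: "buy", -1.0: "sell", 0.0: "hold"})

print(pd.DataFrame({"signal": signal, "position": position, "label": labels}))
```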

In [None]:
# Visualize the closing price, short and long moving averages
d[["Adj Close", "SMA12", "SMA26"]].plot(figsize=(24,10))

# Visualize the buy signals
plt.plot(d.loc[d["Position"] == 1].index, d["SMA12"][d["Position"] == 1], '^', markersize=7, color='m')

# Visualize the sell signals
plt.plot(d.loc[d["Position"] == -1].index, d["SMA12"][d["Position"] == -1], 'v', markersize=7, color='k')
plt.ylabel("Stock price")
plt.show()

In the above plot, the magenta upward triangles mark the buy signals and the black downward triangles mark the sell signals.

Note that we picked 12 and 26 days arbitrarily for the short and long moving averages. To evaluate how well the strategy performs with these choices, we use backtesting.

### Backtesting the Trading Strategy

Backtesting refers to applying a trading system to historical data to verify how a system would have performed during the specified time period. For example, in the simple moving average crossover system, the trader would be able to input (or change) the lengths of the two moving averages used in the system. The trader could backtest to determine which lengths of moving averages would have performed the best on the historical data.
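
The idea of comparing moving-average lengths on historical data can be sketched as a small parameter sweep. Everything here is an assumption made for illustration: the prices are a synthetic random walk rather than market data, `crossover_log_return` is a hypothetical helper, and the strategy is scored by its cumulative log return while holding the asset only on days after the signal was on.

```python
import numpy as np
import pandas as pd

def crossover_log_return(prices, short_window, long_window):
    """Cumulative log return of a long-only SMA crossover strategy (hypothetical helper)."""
    fast = prices.rolling(short_window).mean()
    slow = prices.rolling(long_window).mean()
    signal = (fast >= slow).astype(int)
    log_ret = np.log(prices).diff()
    # Earn day t's return only if the signal was on at the close of day t-1
    return float((signal.shift(1) * log_ret).sum())

# Synthetic random-walk prices with a fixed seed (an assumption, not real data)
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 500))))

# Score every combination of short and long window lengths
results = {
    (s, l): crossover_log_return(prices, s, l)
    for s in (5, 12, 20)
    for l in (26, 50, 100)
}
best = max(results, key=results.get)
print("best windows:", best, "log return:", round(results[best], 4))
```

A pair of windows chosen this way is the best only on the data it was selected on, so it is worth checking it on a separate period before relying on it.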

A backtester usually consists of four essential components:

* A data handler, which is an interface to a set of data,

* A strategy, which generates a signal to go long or go short based on the data,

* A portfolio, which generates orders and manages Profit & Loss, and

* An execution handler, which sends the order to the broker and receives confirmation that the stock has been bought or sold.

Besides these four components, there are many more that we can add to our backtester, depending on the complexity.
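
One way to picture how the four components fit together is the skeleton below. It is a rough sketch under several assumptions: the class names and methods are hypothetical, the data handler wraps an in-memory series of toy prices, and the execution handler simply assumes that every order is filled in full at the given price.

```python
import pandas as pd

class DataHandler:
    """Interface to the data; here it wraps an in-memory price series."""
    def __init__(self, prices):
        self.prices = prices

class Strategy:
    """Generates a 0/1 signal from an SMA crossover."""
    def __init__(self, short_window, long_window):
        self.short_window = short_window
        self.long_window = long_window

    def signals(self, prices):
        fast = prices.rolling(self.short_window).mean()
        slow = prices.rolling(self.long_window).mean()
        return (fast >= slow).astype(int)

class ExecutionHandler:
    """Stands in for the broker: assumes each order fills in full at `price`."""
    def fill(self, quantity, price):
        return quantity * price  # cash spent (negative when selling)

class Portfolio:
    """Turns signals into orders and tracks cash and shares."""
    def __init__(self, cash, lot_size=100):
        self.cash = cash
        self.shares = 0
        self.lot_size = lot_size

    def rebalance(self, signal, price, execution):
        target = self.lot_size * signal   # shares we want to hold
        order = target - self.shares      # shares to buy (+) or sell (-)
        if order != 0:
            self.cash -= execution.fill(order, price)
            self.shares = target

    def value(self, price):
        return self.cash + self.shares * price

# Hypothetical toy prices (not real market data)
data = DataHandler(pd.Series([10, 9, 8, 7, 8, 10, 12, 13, 11, 8, 6, 5], dtype=float))
strategy = Strategy(short_window=2, long_window=4)
toy_portfolio = Portfolio(cash=100000.0)
execution = ExecutionHandler()

for signal, price in zip(strategy.signals(data.prices), data.prices):
    toy_portfolio.rebalance(int(signal), price, execution)

final_value = toy_portfolio.value(data.prices.iloc[-1])
print(final_value)
```

Keeping the components separate means that, for example, the strategy can be swapped out without touching the bookkeeping in the portfolio.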

**Implementation of a Simple Backtester**

We have already implemented a strategy above, and we also have a data handler: the `pandas-datareader` library, which fetches the data, together with `pandas`, which holds it.

Now, let's see how we can create a portfolio that generates orders and manages the profit and loss:

* First, we create a variable `initial_capital` to set our initial capital and a new DataFrame `positions`. We copy the index from `d` because we want to cover the same time frame for which we generated the signals.

In [None]:
# Set the initial capital
initial_capital = float(100000.0)

# Create a dataFrame `positions`
positions = pd.DataFrame(index = d.index).fillna(0.0)
positions.head()

* Next, we create a new column `AAP` in the DataFrame. On the days when the signal is 1, that is, when the short moving average is at or above the long moving average, we hold 100 shares. On the days when the signal is 0, the result of the operation `100 * d['Signal']` is 0, so we hold no shares.

 Basically, this column shows the number of shares we hold on that particular day.

In [None]:
# Hold 100 shares on days when the signal is 1
positions['AAP'] = 100 * d['Signal']
positions[20:30]

* Next, we create a new DataFrame `portfolio` to store the market value of the open position.

In [None]:
# Initialize the portfolio with value owned   
portfolio = positions.multiply(d['Adj Close'], axis=0)
portfolio[20:30]

* Next, we create a DataFrame that stores the day-to-day differences in positions (the number of shares held). It gives the number of shares we bought (positive) or sold (negative) on that particular day.

In [None]:
# Store the difference in shares owned 
pos_diff = positions.diff()
pos_diff[23:43]

* Then the backtesting begins: we add a `holdings` column to the `portfolio` DataFrame, which stores the number of shares we hold multiplied by the `Adj Close` price, i.e., the market value of our shares on that particular day.

In [None]:
# Add `holdings` to portfolio
portfolio['holdings'] = (positions.multiply(d['Adj Close'], axis=0)).sum(axis=1)
portfolio[20:30]

* The portfolio also contains a `cash` column, which is the capital that we still have left to spend. It is calculated by taking the `initial_capital` and subtracting the cumulative amount spent on buying shares, net of the proceeds from selling them.

 **Note** that here we use `pos_diff`, which gives the number of shares we bought or sold on that particular day.

In [None]:
# Add `cash` to portfolio
portfolio['cash'] = initial_capital - (pos_diff.multiply(d['Adj Close'], axis=0)).sum(axis=1).cumsum()
portfolio[20:30]

* We also add a `total` column to the `portfolio` DataFrame, which contains the sum of the cash and the value of the holdings.

In [None]:
# Add `total` to portfolio
portfolio['total'] = portfolio['cash'] + portfolio['holdings']
portfolio[20:30]

* Lastly, we also add a `returns` column to the portfolio to store the returns.

In [None]:
# Add `returns` to portfolio
portfolio['returns'] = portfolio['total'].pct_change()
portfolio[20:30]

From the above table, we can see that we start with 100000 in cash and then buy shares worth some amount; when the value of these shares increases, the total value of the portfolio also increases.
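
The bookkeeping in these steps can be followed by hand on a toy example. The sketch below mirrors the `holdings`, `cash`, `total`, and `returns` columns with hypothetical numbers: a starting capital of 1000 and a position of 10 shares held on two days.

```python
import pandas as pd

initial_capital = 1000.0                          # hypothetical starting cash
price = pd.Series([10.0, 11.0, 12.0, 9.0, 10.0])  # hypothetical prices
shares = pd.Series([0, 10, 10, 0, 0])             # shares held on each day

holdings = shares * price                         # market value of the shares held
trades = shares.diff().fillna(0)                  # shares bought (+) or sold (-) each day
cash = initial_capital - (trades * price).cumsum()
total = cash + holdings
returns = total.pct_change()

print(pd.DataFrame({"holdings": holdings, "cash": cash,
                    "total": total, "returns": returns}))
```

Buying does not change `total` on the day of the trade, since cash is simply exchanged for shares of equal value. Here the shares are bought at 11 and sold at 9, so the portfolio ends 20 below its starting capital.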

Now, let's visualize the portfolio value over the years and the results of our backtest:

In [None]:
# Plot the total portfolio
plt.figure(figsize=(24,10))
plt.plot(portfolio['total'], lw=2.)

# Plot a buy signal
plt.plot(portfolio.loc[d['Position'] == 1.0].index, portfolio.total[d['Position'] == 1.0],'^', markersize=10, color='m')

# Plot a sell signal
plt.plot(portfolio.loc[d['Position'] == -1.0].index, portfolio.total[d['Position'] == -1.0], 'v', markersize=10, color='k')
plt.ylabel('Portfolio value')
plt.show()

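Note that we standardize both series (subtract the mean and divide by the standard deviation) before plotting them together. The portfolio value and the stock price are on very different scales, and the scale of the portfolio value depends on the chosen `initial_capital`, so standardizing makes their shapes directly comparable.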

In [None]:
# Visualize the standardized original data series and backtest series
plt.figure(figsize=(20, 8))
plt.plot((portfolio['total'] - portfolio['total'].mean()) / portfolio['total'].std(), label = "Backtest")
plt.plot((d['Adj Close'] - d['Adj Close'].mean()) / d['Adj Close'].std(), label = "Data series")
plt.ylabel("Standardized value")
plt.legend()
plt.show()

From the above plot, we can see how the trading strategy performs relative to the stock itself when 12 and 26 days are used as the short and long windows for the simple moving averages.
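
As a reading aid for the comparison plot, the sketch below shows what standardization does. It uses hypothetical prices and a hypothetical buy-and-hold portfolio of 100 shares; `standardize` is an illustrative helper that applies the same formula as the plotting code.

```python
import numpy as np
import pandas as pd

def standardize(series):
    """Subtract the mean and divide by the standard deviation (z-score)."""
    return (series - series.mean()) / series.std()

# Hypothetical prices and a buy-and-hold portfolio of 100 shares
price = pd.Series([10.0, 11.0, 12.0, 9.0, 10.0])
buy_and_hold = 100000.0 + 100 * (price - price.iloc[0])

z_price = standardize(price)
z_portfolio = standardize(buy_and_hold)

# Standardization removes both the offset and the scale
print(np.allclose(z_price, z_portfolio))
```

Because a portfolio that always holds the stock would coincide with the price curve after standardization, the gaps between the two curves in the plot reflect the periods in which the strategy was out of the market.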