# Momentum Trading Strategy on the S&P 500
#### Correcting for Survivorship Bias
- Buys and holds for 1 month the top 5 performing stocks.
- Performance is ranked by the 5 highest returns over the past 12 months.

Survivorship bias: taking the S&P 500 as of today would implicitly assume that all the stocks added were always in the S&P 500 <br>and all the stocks removed were never in the index, i.e. winners were always winners and vice versa. This is corrected:
- by including the stocks removed from the S&P 500 until the date they were removed
- and excluding the added stocks until the date they were added
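
To make the bias concrete, here is a minimal sketch with made-up numbers. The tickers, returns and the `still_member` flag are all hypothetical and are not market data. It compares the average return of today's members only against the average over everything that was in the index at the start.

```python
import pandas as pd

# Hypothetical one-year returns for a five-stock "index" (made-up numbers).
# `still_member` marks which stocks are still in the index today.
universe = pd.DataFrame(
    {
        "annual_return": [0.30, 0.12, -0.45, 0.08, -0.60],
        "still_member": [True, True, False, True, False],
    },
    index=["AAA", "BBB", "CCC", "DDD", "EEE"],
)

# Biased view: average only over today's members (the survivors).
survivor_mean = universe.loc[universe["still_member"], "annual_return"].mean()

# Unbiased view: average over everything that was in the index at the start.
full_mean = universe["annual_return"].mean()

print(f"survivors only: {survivor_mean:.3f}")
print(f"full universe:  {full_mean:.3f}")
```

In this toy case the survivors average a gain while the full universe averages a loss, which is the effect the date filtering is meant to remove.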
     

## Libraries + Setup

##### Libraries and set start Date

In [None]:
import pandas as pd
import yfinance as yf
import numpy as np

In [None]:
#define a start Date
start = '2020-01-01'

##### Pull in a table of all the stocks currently in the S&P 500 <br> (the date added is already in this table)
- And start a stock list for the data import later


In [None]:
overall = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0] # Pull in the table at index 0, i.e. the first table from the given link

overall

In [None]:
#get a list of all the tickers currently in the S&P 500
stock_list = overall['Symbol']
stock_list = stock_list.to_list()
stock_list

In [None]:
#only the ones added after analysis start date
overall = overall[overall['Date added'] >= start]
overall

##### Pull in a table of all the stocks removed from the S&P 500 after the analysis start date <br> and add the stocks to the stock list for import later

In [None]:
#The Date is the Date things happened. Removed Column has the stocks removed that day
removed = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[1][['Date','Removed']]

#make the column called Date the index and then make it recognisable as a Date/Time
removed = removed.set_index(removed['Date']['Date'])  
removed.index = pd.to_datetime(removed.index)       

#filter out anything that happened before the chosen analysis start date
removed = removed[removed.index >= start] 

removed = removed['Removed'].dropna() #get rid of the pivoted headers.
removed

In [None]:
#add the removed stock to the stock list 
stock_list.extend(removed['Ticker'].to_list())
stock_list

### The Import

In [None]:
df = yf.download(stock_list, start)['Adj Close']

#if for some reason the index was not automatically recognised as a DatetimeIndex
#df.index = pd.DatetimeIndex(df.index)

### Setting up the DF

In [None]:
df

#### Setting up the filtering Functions
- Don't show values for stocks before they were added
- Don't show values for stocks after they were removed

In [None]:
#Proof of concept for filtering formulas: example is removed filter
        # A boolean mask that returns true for all the dates <= removal date
df['RHI'].index <= removed[removed['Ticker'] == 'RHI'].index[0]

In [None]:
#Look up the ticker's removal date in removed (its index holds the dates), build a boolean mask that is True
#where the main df's index <= removal date, and keep only those rows for that ticker.
#Rows after the removal date become NaN when the filtered series is assigned back to the df.
def removal_filter(ticker):
    df[ticker] = df[ticker][df[ticker].index <= removed[removed['Ticker'] == ticker].index[0]]

#same for added stocks, with the inequality flipped and the date taken from the 'Date added' column of overall
def added_filter(ticker):
    df[ticker] = df[ticker][df[ticker].index >= overall[overall['Symbol'] == ticker]['Date added'].values[0]]
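
The same masking idea can be tried on toy data without touching the downloaded frame. The sketch below is an assumption-laden illustration, not the notebook's method: `mask_outside_membership` and its `added` / `removed` dict arguments are hypothetical names, and the tickers and prices are made up. It also skips tickers that are missing from the columns, which is one way a failed download could otherwise raise a `KeyError`.

```python
import pandas as pd

def mask_outside_membership(prices, added=None, removed=None):
    """Return a copy of `prices` with NaN outside each ticker's membership window.

    `added` and `removed` are hypothetical dicts mapping ticker -> date string.
    """
    out = prices.copy()
    for ticker, date in (added or {}).items():
        if ticker in out.columns:  # skip tickers that are not in the frame
            out.loc[out.index < pd.Timestamp(date), ticker] = float("nan")
    for ticker, date in (removed or {}).items():
        if ticker in out.columns:
            out.loc[out.index > pd.Timestamp(date), ticker] = float("nan")
    return out

# Toy prices for two made-up tickers over five business days.
idx = pd.date_range("2021-01-04", periods=5, freq="B")
prices = pd.DataFrame(
    {"OLD": [10.0, 11.0, 12.0, 13.0, 14.0], "NEW": [20.0, 21.0, 22.0, 23.0, 24.0]},
    index=idx,
)

masked = mask_outside_membership(
    prices,
    added={"NEW": "2021-01-06"},    # NEW joins on the third day
    removed={"OLD": "2021-01-05"},  # OLD is kept up to and including the second day
)
print(masked)
```

Working on a copy keeps the raw prices available, which is handy when the uncorrected data is needed again later for a comparison.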

#### Filter DF

In [None]:
for removed_ticker in removed['Ticker']:
    removal_filter(removed_ticker)

for added_ticker in overall['Symbol']:
    added_filter(added_ticker)
    
df

### Now that the DF is corrected for Survivorship Bias, the Strategy Operation can be coded out

## Strategy Operation

#### Setting up the required DFs

In [None]:
#simple returns. 
retdf = df.pct_change()
retdf

In [None]:
#gross returns (1 + r) compounded to monthly. helpful for finding the return in the next period and also for rolling over 12 months to find the trailing 12-month return
monthly_retdf = (retdf +1).resample('M').prod()
monthly_retdf
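
A quick sanity check of the monthly compounding on made-up prices. This is only a sketch: it groups by a monthly `Period` instead of calling `resample`, purely to avoid differences in frequency aliases between pandas versions, and the prices are hypothetical.

```python
import pandas as pd

# Made-up daily prices spanning two calendar months:
# Fri 29 Jan, Mon 1 Feb, Tue 2 Feb, Wed 3 Feb 2021.
idx = pd.date_range("2021-01-29", periods=4, freq="B")
prices = pd.Series([100.0, 110.0, 99.0, 121.0], index=idx)

daily = prices.pct_change()  # first value is NaN

# Compound (1 + r) within each calendar month. prod() skips NaN.
monthly_gross = (daily + 1).groupby(daily.index.to_period("M")).prod()
print(monthly_gross)

# February's compounded return equals last Feb price / last Jan price.
feb_ratio = prices.iloc[-1] / prices.iloc[0]
print(feb_ratio)
```

Note that January contains only the NaN from `pct_change`, and `prod()` returns 1.0 for a month with no observations. The same applies to a stock that has been masked out: it shows up as a flat month (gross return 1.0) rather than as missing.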

In [None]:
#technically not needed as a separate DF but easier to call a var name than the long formula
anual_return = monthly_retdf.rolling(12).apply(np.prod).dropna()
anual_return

#### Define Function for the Top 5 Performers over the last 12 months and their returns over the following month <br>(assuming equal weight portfolio)

In [None]:
# Proof of concept for the top performers function
#gets the top performers in the first row. 
top = anual_return.iloc[0].nlargest(5)  
top
#found out it stores it as a series where the index holds the stock tickers and the 'name' attribute is the date.
#based on this we can use these attributes to build a get_top performers function

In [None]:
# filter the monthly returns df by:
# [date from top_series onwards],[only look at the next row],[only look at the stock tickers from the top_series]
z = monthly_retdf[top.name:][1:2][top.index]
z

In [None]:
#assuming equal weight portfolio, the mean of the columns:
#.mean(axis=1) is the return of said portfolio of assets
z.mean(axis= 1).values[0]

In [None]:
#putting the above process into a function
def top_performers(date):
    toppers = anual_return.loc[date].nlargest(5)
    relevant_return = monthly_retdf[toppers.name:][1:2][toppers.index]
    return relevant_return.mean(axis=1).values[0]
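
The selection logic can be checked by hand on a tiny made-up example. In this sketch, `next_period_return_of_top_n` and the three tickers are hypothetical, and it looks up the next row by position rather than by slicing, which is a design choice of the sketch and not something the notebook does.

```python
import pandas as pd

def next_period_return_of_top_n(trailing, monthly, date, n=2):
    """Equal-weight gross return, over the month after `date`, of the `n`
    stocks with the highest trailing return as of `date`."""
    winners = trailing.loc[date].nlargest(n).index
    pos = monthly.index.get_loc(date)  # row position of the ranking month
    return monthly.iloc[pos + 1][winners].mean()

months = pd.to_datetime(["2021-01-31", "2021-02-28"])

# Hypothetical trailing 12-month gross returns as of each month end.
trailing = pd.DataFrame(
    {"AAA": [1.50, 1.40], "BBB": [1.20, 1.30], "CCC": [0.90, 1.60]}, index=months
)
# Hypothetical gross returns earned within each month.
monthly = pd.DataFrame(
    {"AAA": [1.02, 1.10], "BBB": [0.99, 0.90], "CCC": [1.01, 1.30]}, index=months
)

# Ranking at end of January picks AAA and BBB; their February returns are averaged.
result = next_period_return_of_top_n(trailing, monthly, months[0], n=2)
print(result)
```

The ranking uses only information available at the end of January, and CCC's strong February is ignored because it was not a top performer at ranking time.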

## Compute Strategy Returns & Metrics 

In [None]:
#run the above function for each row of anual_return except the last, since there is no way to know what the return of a currently open portfolio will be

#and append all the returns to the returns list
returns = []

for date in anual_return.index[:-1]:
    returns.append(top_performers(date))

returns

In [None]:
Strategy_ret = pd.Series(returns, index = anual_return.index[:-1])
Strategy_ret.cumprod().plot()

In [46]:
Strategy_ret.prod(), Strategy_ret.std(), Strategy_ret.prod() / Strategy_ret.std() 

2.3307466805945074
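
The ratio above divides a total gross return by a monthly standard deviation, so the two are on different time scales. One possible way to put them on the same footing is to annualize both. The sketch below is an assumption, not the notebook's metric: `summarize` is a hypothetical helper, the input values are made up, and the ratio ignores any risk-free rate, so it is not a true Sharpe ratio.

```python
import numpy as np
import pandas as pd

def summarize(monthly_gross, periods_per_year=12):
    """Summary statistics for a series of monthly gross returns (1 + r)."""
    n = len(monthly_gross)
    total = monthly_gross.prod()
    annualized = total ** (periods_per_year / n) - 1
    # Volatility of the simple monthly returns, scaled to a yearly horizon.
    vol = (monthly_gross - 1).std() * np.sqrt(periods_per_year)
    return {
        "total_return": total - 1,
        "annualized_return": annualized,
        "annualized_vol": vol,
        "return_over_vol": annualized / vol,
    }

# Made-up monthly gross returns, for illustration only.
example = pd.Series([1.10, 0.95, 1.05, 1.00])
stats = summarize(example)
print(stats)
```

On the notebook's data this would be called as `summarize(Strategy_ret)`.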

## Benchmarking & Comparison

### Compare to Regular S&P 500

In [None]:
snp = yf.download("^GSPC", start)['Adj Close']
snpret = snp.pct_change()

snpcumret = (snpret+1).cumprod()
snpcumret.plot(), snpcumret
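
One thing to keep in mind when reading the two curves side by side: the benchmark's cumulative return starts at `start`, while the strategy needs 12 months of lookback, so its first return falls roughly a year later. A minimal sketch of rebasing a cumulative series to a later common start date, using a hypothetical `rebase` helper and made-up values:

```python
import pandas as pd

def rebase(cumulative, start_date):
    """Keep dates >= start_date and rescale so the series begins at 1."""
    window = cumulative.loc[cumulative.index >= pd.Timestamp(start_date)]
    return window / window.iloc[0]

# Hypothetical cumulative benchmark return that began before the comparison window.
idx = pd.date_range("2021-01-31", periods=4, freq="D")
benchmark = pd.Series([1.20, 1.26, 1.32, 1.38], index=idx)

rebased = rebase(benchmark, "2021-02-01")
print(rebased)
```

With the notebook's names this could be applied as `rebase(snpcumret, Strategy_ret.index[0])`, assuming `snpcumret` is a Series with a DatetimeIndex.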

### Compare to Strategy without correcting for survivorship bias

In [None]:
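#overall was narrowed above to stocks added after the start date, so re-read the full current constituent list for the biased universe
df3 = yf.download(pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]['Symbol'].to_list(), start)['Adj Close']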

#### Setting up Biased DF, i.e. Uncorrected for Survivorship Bias

In [None]:
df3rets =df3.pct_change()
df3mthlyrets = (df3rets+1).resample('M').prod()
df3anualrets = df3mthlyrets.rolling(12).apply(np.prod).dropna()

In [None]:
def biased_top_performers(date):
   toppers = df3anualrets.loc[date].nlargest(5)
   relevant_return = df3mthlyrets[toppers.name:][1:2][toppers.index]
   return relevant_return.mean(axis =1).values[0]

#### Same Strategy Operation and Analysis. 
Suspect something is wrong here. Returns should be much higher if you don't account for
survivorship bias.

In [None]:
biased_returns = []

for date in df3anualrets.index[:-1]:
    biased_returns.append(biased_top_performers(date))

biased_returns

In [None]:
biased_returns = pd.Series(biased_returns, index= df3anualrets.index[:-1])
biased_returns.cumprod().plot() 

In [None]:
biased_returns.cumprod()