<h3>Unsupervised Learning Trading Strategy</h3>

- Download/load S&P 500 stock price data.
- Calculate features and technical indicators for each stock.
- Aggregate to a monthly level and keep the 150 most liquid stocks.
- Calculate monthly returns over different time horizons.
- Download Fama-French factors and calculate rolling factor betas.
- For each month, fit a K-Means clustering algorithm to group similar assets based on their features.
- For each month, select assets based on their cluster and form a portfolio using Efficient Frontier max Sharpe ratio optimization.
- Visualize the portfolio returns and compare them to S&P 500 returns.

<h3>All Packages</h3>
- pandas
- numpy
- matplotlib
- statsmodels
- pandas_datareader
- datetime (standard library)
- yfinance
- pandas_ta
- scikit-learn (sklearn)
- PyPortfolioOpt

<h3>1. Download/load S&P 500 stock price data.</h3>

```python
from statsmodels.regression.rolling import RollingOLS
import pandas_datareader.data as web
import matplotlib.pyplot as plt
import statsmodels.api as sm
import pandas as pd
import numpy as np
import datetime as dt
import yfinance as yf
import pandas_ta
import warnings
warnings.filterwarnings('ignore')

# Current S&P 500 constituents from Wikipedia
sp500 = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')[0]

# Yahoo Finance writes class-share tickers with '-' (e.g. BRK.B -> BRK-B)
sp500['Symbol'] = sp500['Symbol'].str.replace('.', '-', regex=False)

# Not survivorship-bias free: only today's constituents are included
symbols_list = sp500['Symbol'].unique().tolist()

end_date = '2023-09-27'
start_date = pd.to_datetime(end_date) - pd.DateOffset(days=365*8)

# auto_adjust=False keeps the 'Adj Close' column next to the raw OHLC prices
df = yf.download(tickers=symbols_list,
                 start=start_date,
                 end=end_date,
                 auto_adjust=False).stack()

df.index.names = ['date', 'ticker']
df.columns = df.columns.str.lower()
df
```



| date | ticker | adj close | close | high | low | open | volume |
|---|---|---|---|---|---|---|---|
| 2015-09-29 | A | 31.425238 | 33.740002 | 34.060001 | 33.240002 | 33.360001 | 2252400.0 |
| 2015-09-29 | AAL | 37.361622 | 39.180000 | 39.770000 | 38.790001 | 39.049999 | 7478800.0 |
| 2015-09-29 | AAPL | 24.651133 | 27.264999 | 28.377501 | 26.965000 | 28.207500 | 293461600.0 |
| 2015-09-29 | ABBV | 36.004166 | 52.790001 | 54.189999 | 51.880001 | 53.099998 | 12842800.0 |
| 2015-09-29 | ABT | 33.302025 | 39.500000 | 40.150002 | 39.029999 | 39.259998 | 12287500.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-09-26 | XYL | 88.736298 | 89.519997 | 90.849998 | 89.500000 | 90.379997 | 1322400.0 |
| 2023-09-26 | YUM | 122.211006 | 124.010002 | 124.739998 | 123.449997 | 124.239998 | 1500600.0 |
| 2023-09-26 | ZBH | 111.534821 | 112.459999 | 117.110001 | 112.419998 | 116.769997 | 3610500.0 |
| 2023-09-26 | ZBRA | 223.960007 | 223.960007 | 226.649994 | 222.580002 | 225.970001 | 355400.0 |
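The output is in long format: one row per (date, ticker) pair, with the ticker as the second index level. That layout is what lets later cells compute indicators stock by stock with `groupby(level=1)`. The minimal sketch below reproduces the reshape on a tiny hand-made frame; the tickers `AAA`/`BBB`, the prices, and the assumption that `yf.download` returns columns as a two-level (price field, ticker) index are illustrative, not taken from a real download.

```python
import pandas as pd

# Hypothetical wide frame mimicking the assumed yf.download layout:
# rows are dates, columns are a two-level index (price field, ticker).
dates = pd.to_datetime(["2023-09-25", "2023-09-26"])
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Volume"], ["AAA", "BBB"]], names=["Price", "Ticker"]
)
wide = pd.DataFrame(
    [[10.0, 20.0, 100.0, 200.0],
     [11.0, 19.0, 150.0, 250.0]],
    index=dates,
    columns=columns,
)

# stack() moves the innermost column level (the ticker) into the row index,
# giving one row per (date, ticker) pair.
long = wide.stack()
long.index.names = ["date", "ticker"]
long.columns = long.columns.str.lower()

# All rows for one ticker (the ticker level is dropped from the result).
aaa = long.xs("AAA", level="ticker")

# Per-ticker computation: shift within each ticker so that one stock's
# previous price is never taken from another stock (1-day simple return).
prev_close = long.groupby(level="ticker")["adj close"].shift(1)
long["ret_1d"] = long["adj close"] / prev_close - 1
print(long)
```

Grouping on the ticker level before shifting or rolling is the important design choice: on a long frame a plain `shift(1)` would pair each row with the previous ticker on the same date.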


<h3>2. Calculate features and technical indicators for each stock. </h3>
<ul>
    <li>Garman-Klass Volatility</li>
    <li>RSI</li>
    <li>Bollinger Bands</li>
    <li>ATR</li>
    <li>MACD</li>
    <li>Dollar Volume</li>
</ul>
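Dollar volume and Bollinger Bands need nothing beyond pandas. The sketch below uses one common definition of each, computed per ticker on a (date, ticker) frame like the downloaded one: dollar volume as adjusted close times volume, scaled to millions, and Bollinger Bands as a 20-day rolling mean plus or minus two rolling standard deviations. The window, the multiplier, the scaling, the helper name `add_bollinger_and_dollar_volume` and the column names `bb_low`/`bb_mid`/`bb_high` are assumptions for illustration; this notebook may compute these features differently (for example with pandas_ta or on log prices).

```python
import numpy as np
import pandas as pd


def add_bollinger_and_dollar_volume(df: pd.DataFrame, window: int = 20,
                                    n_std: float = 2.0) -> pd.DataFrame:
    """Hypothetical helper: add dollar volume and Bollinger Band columns to a
    (date, ticker)-indexed frame with 'adj close' and 'volume' columns.
    window=20 and n_std=2 are assumed defaults, not the notebook's settings."""
    out = df.copy()
    # Dollar volume in millions: price times number of shares traded.
    out['dollar_volume'] = out['adj close'] * out['volume'] / 1e6

    # Rolling statistics are computed within each ticker, so a window
    # never mixes prices from different stocks.
    by_ticker = out.groupby(level='ticker')['adj close']
    mid = by_ticker.transform(lambda s: s.rolling(window).mean())
    std = by_ticker.transform(lambda s: s.rolling(window).std())
    out['bb_mid'] = mid
    out['bb_low'] = mid - n_std * std
    out['bb_high'] = mid + n_std * std
    return out


# Synthetic example: a flat stock and a steadily rising one.
dates = pd.date_range('2023-01-02', periods=30, freq='B')
frames = []
for ticker, close in {'AAA': np.full(30, 50.0), 'BBB': np.arange(1.0, 31.0)}.items():
    idx = pd.MultiIndex.from_product([dates, [ticker]], names=['date', 'ticker'])
    frames.append(pd.DataFrame({'adj close': close, 'volume': 2_000_000.0}, index=idx))
prices = pd.concat(frames).sort_index()

feat = add_bollinger_and_dollar_volume(prices)
print(feat.xs('BBB', level='ticker').tail(3))
```

The first `window - 1` rows of each ticker have no band values, because the rolling window is not yet full.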

# Garman-Klass Volatility Calculation

The Garman-Klass Volatility formula is given by:

$$
\sigma_{GK}^2 = \frac{1}{2} \left[\ln\left(\frac{\text{High}}{\text{Low}}\right)\right]^2 - \left(2 \ln 2 - 1\right) \left[\ln\left(\frac{\text{Close}}{\text{Open}}\right)\right]^2
$$

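The estimator uses each day's open, high, low and close prices. Because it exploits the intraday high-low range, it is statistically more efficient than a close-to-close estimate. All four prices must be on the same basis: mixing a dividend- or split-adjusted close with unadjusted open, high and low prices distorts the ln(Close/Open) term and can make the estimate negative.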
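A quick sanity check is to evaluate the estimator on a single daily bar. The sketch below implements the formula above for scalars or NumPy arrays (the function name `garman_klass_var` is hypothetical) and applies it to the ABBV row for 2015-09-29 from the downloaded data. When open and close lie inside [Low, High], |ln(Close/Open)| <= ln(High/Low), and because 1/2 > 2 ln 2 - 1 ≈ 0.386, the estimate cannot be negative. Swapping in the dividend-adjusted close while keeping the raw open, high and low breaks that bound, which is why the second call returns a negative "variance".

```python
import numpy as np


def garman_klass_var(open_, high, low, close):
    """Hypothetical helper: Garman-Klass daily variance for one bar or arrays of bars."""
    log_hl = np.log(high) - np.log(low)
    log_co = np.log(close) - np.log(open_)
    return 0.5 * log_hl**2 - (2 * np.log(2) - 1) * log_co**2


# ABBV, 2015-09-29: raw open, high, low, plus both close columns
o, h, l = 53.099998, 54.189999, 51.880001
raw_close, adj_close = 52.790001, 36.004166

print(round(garman_klass_var(o, h, l, raw_close), 6))  # consistent (raw) prices
print(round(garman_klass_var(o, h, l, adj_close), 6))  # mixed raw/adjusted prices
```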


```python
# Garman-Klass daily variance; the raw 'close' keeps all four prices on the same basis
df['garman_klass_vol'] = (((np.log(df['high']) - np.log(df['low']))**2) / 2
                          - (2*np.log(2) - 1) * ((np.log(df['close']) - np.log(df['open']))**2))

# 20-period RSI on adjusted close, computed separately for each ticker (index level 1)
df['rsi'] = df.groupby(level=1)['adj close'].transform(lambda x: pandas_ta.rsi(close=x, length=20))
df
```

| date | ticker | adj close | close | high | low | open | volume | garman_klass_vol | rsi |
|---|---|---|---|---|---|---|---|---|---|
| 2015-09-29 | A | 31.425238 | 33.740002 | 34.060001 | 33.240002 | 33.360001 | 2252400.0 | 0.000247 | NaN |
| 2015-09-29 | AAL | 37.361622 | 39.180000 | 39.770000 | 38.790001 | 39.049999 | 7478800.0 | 0.000307 | NaN |
| 2015-09-29 | AAPL | 24.651133 | 27.264999 | 28.377501 | 26.965000 | 28.207500 | 293461600.0 | 0.000857 | NaN |
| 2015-09-29 | ABBV | 36.004166 | 52.790001 | 54.189999 | 51.880001 | 53.099998 | 12842800.0 | 0.000936 | NaN |
| 2015-09-29 | ABT | 33.302025 | 39.500000 | 40.150002 | 39.029999 | 39.259998 | 12287500.0 | 0.000386 | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2023-09-26 | XYL | 88.736298 | 89.519997 | 90.849998 | 89.500000 | 90.379997 | 1322400.0 | 0.000077 | 26.146750 |
| 2023-09-26 | YUM | 122.211006 | 124.010002 | 124.739998 | 123.449997 | 124.239998 | 1500600.0 | 0.000053 | 36.057162 |
| 2023-09-26 | ZBH | 111.534821 | 112.459999 | 117.110001 | 112.419998 | 116.769997 | 3610500.0 | 0.000289 | 31.893252 |
| 2023-09-26 | ZBRA | 223.960007 | 223.960007 | 226.649994 | 222.580002 | 225.970001 | 355400.0 | 0.000133 | 29.494977 |
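The RSI column has no value (NaN) for the earliest dates because a 20-period RSI needs a warm-up window of price changes before it produces a value. For readers who want to see the computation without pandas_ta, here is a minimal plain-pandas sketch of Wilder's RSI applied per ticker with the same `groupby(...).transform` pattern. It assumes Wilder smoothing implemented as an exponential moving average with alpha = 1/length; `wilder_rsi` is a hypothetical helper, and its values, especially during the warm-up period, may differ slightly from `pandas_ta.rsi`, whose internal smoothing settings are not shown here.

```python
import numpy as np
import pandas as pd


def wilder_rsi(close: pd.Series, length: int = 20) -> pd.Series:
    """Hypothetical helper: RSI with Wilder-style smoothing.
    Assumption: smoothing is an EMA with alpha = 1/length, which may
    differ slightly from pandas_ta.rsi."""
    delta = close.diff()
    gain = delta.clip(lower=0)        # size of up moves, 0 on down days
    loss = (-delta).clip(lower=0)     # size of down moves, 0 on up days
    # Require `length` price changes before emitting a value.
    avg_gain = gain.ewm(alpha=1 / length, adjust=False, min_periods=length).mean()
    avg_loss = loss.ewm(alpha=1 / length, adjust=False, min_periods=length).mean()
    rs = avg_gain / avg_loss          # inf when there were no losses -> RSI 100
    return 100 - 100 / (1 + rs)


# Synthetic long frame: one stock that only rises, one that only falls.
dates = pd.date_range('2023-01-02', periods=30, freq='B')
idx = pd.MultiIndex.from_product([dates, ['UP', 'DOWN']], names=['date', 'ticker'])
up = np.arange(1.0, 31.0)
down = up[::-1]
demo = pd.DataFrame({'adj close': np.column_stack([up, down]).ravel()}, index=idx)

# Same per-ticker pattern as the notebook cell, with the plain-pandas RSI.
demo['rsi'] = demo.groupby(level='ticker')['adj close'].transform(wilder_rsi)
print(demo.xs('UP', level='ticker').tail(3))
```

A stock that only rises ends at an RSI of 100 and one that only falls ends at 0, which makes the two synthetic series a convenient check of the gain/loss bookkeeping.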
