# Steps

- Download/Load SP500 stock prices data
- Calculate different features and indicators on each stock
- Aggregate to monthly level and keep the 150 most liquid stocks
- Calculate monthly returns for different time-horizons
- Download Fama-French factors and calculate rolling factor betas
- For each month fit a k-means clustering algorithm to group similar assets based
  on their features
- For each month select assets based on the cluster and form a portfolio based on
  efficient frontier max Sharpe ratio optimization
- Visualize portfolio performance and compare with SP500


In [6]:
from statsmodels.regression.rolling import RollingOLS
import pandas_datareader.data as web
import matplotlib.pyplot as plt
import statsmodels.api as sm
import pandas as pd
import numpy as np
import datetime as dt
import yfinance as yf
import pandas_ta
import warnings

warnings.filterwarnings("ignore")

## 1. Download/Load SP500 stock prices data


In [7]:
sp500 = pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")[0]

# Yahoo Finance uses "-" where Wikipedia uses "." (literal match, not a regex).
sp500["Symbol"] = sp500["Symbol"].str.replace(".", "-", regex=False)

symbols_list = sp500["Symbol"].unique().tolist()

end_date = "2023-09-27"
start_date = pd.to_datetime(end_date) - pd.DateOffset(years=8)

df = yf.download(tickers=symbols_list, start=start_date, end=end_date).stack()

df.index.names = ["date", "ticker"]

df.columns = df.columns.str.lower()

[*********************100%%**********************]  503 of 503 completed

3 Failed downloads:
['SOLV', 'GEV', 'VLTO']: Exception("%ticker%: Data doesn't exist for startDate = 1443326400, endDate = 1695787200")
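
The symbol clean-up in the cell above can be checked on a small sample before running the full download. The sketch below is only illustrative: the four-symbol sample and the names `sample`, `normalized`, `failed`, and `usable` are hypothetical, while the three failed tickers are taken from the download log above. Passing `regex=False` makes the dot a literal character regardless of the pandas version's default; as a regular expression, `.` would match any character.

```python
import pandas as pd

# Hypothetical sample of symbols in the dotted form used by the Wikipedia table.
sample = pd.Series(["BRK.B", "BF.B", "AAPL", "GEV"])

# Replace the literal dot with a hyphen; regex=False avoids treating "." as "any character".
normalized = sample.str.replace(".", "-", regex=False)

# Tickers reported as failed in the download log above (no data in the requested window).
failed = {"SOLV", "GEV", "VLTO"}

# Keep only the symbols that are expected to return data.
usable = [symbol for symbol in normalized if symbol not in failed]

print(usable)
```

Filtering the list before calling `yf.download` is optional; it mainly keeps the log clean and avoids empty columns for those tickers.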


## 2. Calculate different features and indicators on each stock
- **Garman-Klass volatility**: Measure of volatility based on open, high, low, and close prices
- **RSI**: Relative Strength Index indicator to measure momentum of a stock
- **Bollinger Bands**: Measure of volatility based on the standard deviation of the stock price around its moving average
- **ATR**: Average True Range indicator to measure volatility of a stock
- **MACD**: Moving Average Convergence Divergence indicator to measure momentum of a stock
- **Dollar Volume**: Measure of liquidity of a stock
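
As a reference for the first indicator, the usual per-bar Garman-Klass estimate is `0.5 * ln(high/low)^2 - (2*ln(2) - 1) * ln(close/open)^2`. The sketch below applies it to a hypothetical two-bar frame; the helper name `garman_klass` and the sample prices are assumptions made for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd


def garman_klass(open_, high, low, close):
    """Per-bar Garman-Klass variance estimate from OHLC prices."""
    log_hl = np.log(high / low)      # intraday range term
    log_co = np.log(close / open_)   # open-to-close term
    return 0.5 * log_hl**2 - (2 * np.log(2) - 1) * log_co**2


# Hypothetical OHLC bars, for illustration only.
bars = pd.DataFrame(
    {
        "open": [100.0, 102.0],
        "high": [104.0, 103.0],
        "low": [99.0, 100.0],
        "close": [102.0, 101.0],
    }
)

bars["garman_klass_vol"] = garman_klass(
    bars["open"], bars["high"], bars["low"], bars["close"]
)
print(bars)
```

When a bar opens and closes at the same price, the second term vanishes and the estimate reduces to half the squared log range.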

In [8]:
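# Garman-Klass estimator: 0.5 * ln(high/low)^2 - (2*ln(2) - 1) * ln(close/open)^2.
# Only the first term is halved.
df["garman_klass_vol"] = (
    (np.log(df["high"] / df["low"]) ** 2) / 2
    - (2 * np.log(2) - 1) * (np.log(df["adj close"] / df["open"]) ** 2)
)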

df["rsi"] = df.groupby(level=1)["adj close"].transform(
    lambda x: pandas_ta.rsi(close=x, length=20)
)

def bollinger_band(close, column):
    """Return one Bollinger Band column (0 = lower, 1 = middle, 2 = upper).

    Calls pandas_ta.bbands once per ticker and returns None when it yields no result.
    """
    bands = pandas_ta.bbands(close=close, length=20)
    return bands.iloc[:, column] if bands is not None else None


df["bb_low"] = df.groupby(level=1)["adj close"].transform(
    lambda x: bollinger_band(x, 0)
)

df["bb_mid"] = df.groupby(level=1)["adj close"].transform(
    lambda x: bollinger_band(x, 1)
)

df["bb_high"] = df.groupby(level=1)["adj close"].transform(
    lambda x: bollinger_band(x, 2)
)
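
To make the three band columns less of a black box, here is a minimal sketch of the same idea in plain pandas: a 20-period rolling mean with bands two standard deviations on either side. The function name `bollinger_bands`, the synthetic price series, and the use of the population standard deviation (`ddof=0`) are assumptions for illustration; `pandas_ta.bbands` may use different defaults, so the numbers are not guaranteed to match it exactly.

```python
import numpy as np
import pandas as pd


def bollinger_bands(close, length=20, num_std=2.0):
    """Rolling mean with bands num_std rolling standard deviations away."""
    mid = close.rolling(length).mean()
    std = close.rolling(length).std(ddof=0)  # assumption: population std
    return pd.DataFrame(
        {
            "bb_low": mid - num_std * std,
            "bb_mid": mid,
            "bb_high": mid + num_std * std,
        }
    )


# Synthetic random-walk prices, for illustration only.
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, 60).cumsum())

bands = bollinger_bands(prices)
print(bands.dropna().head())
```

The first `length - 1` rows are `NaN` because the rolling window is not yet full, which is also why the earliest rows of each ticker carry no band values.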