# Note

The Machine Learning methods presented here are only some of the ways of approaching the portfolio optimization problem. There are multiple ways of incoporating ML in the financial models and it is up to you to come up with more interesting and appropriate approaches. We have tried to use multiple ML models in different sections as an example.

The general rule of thumb in the approaches is as follows:
- Use non-linear regression for predicting future values of stocks
- Use appropriate linear regression for financial models (Single Index, CAPM) according to their standard formulas

# Load your data

We present here a basic way of importing stock data which will be used in the subsequent sections.

In [11]:
import yfinance as yf
import datetime
# Define the ticker symbols for the market index and individual assets
market_index_ticker = "^GSPC"  # S&P 500 index
google_ticker = "GOOGL"  # Google
apple_ticker = "AAPL"  # Apple

# Define the start and end dates for the historical data
end_date = datetime.datetime.now()
start_date = end_date - datetime.timedelta(days=5*365)

# Fetch the historical data using yfinance
market_index_data = yf.download(
    market_index_ticker, start=start_date, end=end_date)
google_data = yf.download(google_ticker, start=start_date, end=end_date)
apple_data = yf.download(apple_ticker, start=start_date, end=end_date)

# Print the fetched data
print("Market Index Data:")
print(market_index_data.head())
print("\nGoogle Data:")
print(google_data.head())
print("\nApple Data:")
print(apple_data.head())


[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
[*********************100%***********************]  1 of 1 completed
Market Index Data:
                   Open         High          Low        Close    Adj Close  \
Date                                                                          
2018-06-27  2728.449951  2746.090088  2699.379883  2699.629883  2699.629883   
2018-06-28  2698.689941  2724.340088  2691.989990  2716.310059  2716.310059   
2018-06-29  2727.129883  2743.260010  2718.030029  2718.370117  2718.370117   
2018-07-02  2704.949951  2727.260010  2698.949951  2726.709961  2726.709961   
2018-07-03  2733.270020  2736.580078  2711.159912  2713.219971  2713.219971   

                Volume  
Date                    
2018-06-27  3803990000  
2018-06-28  3461100000  
2018-06-29  3586800000  
2018-07-02  3095040000  
2018-07-03  1911460000  

Apple Data:
                 Open       High

# Mean Variance Optimization

- Load the stock prices for your selected stocks
- Divide data into independent variable X (10 consecutive days of stock values), and dependent variable y (stock value at 10th day into future)
- Train non linear regression model to predict stock value 10 day into the future
- Use the 10 future stock values and apply mean-variance optimization using pypfopt module

In [43]:
from pypfopt import plotting
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from pypfopt import expected_returns, risk_models, EfficientFrontier


apple_prices = apple_data['Close'].tolist()
google_prices = google_data['Close'].tolist()

# Dependent variable - 10 consecutive days of stock prices
apple_x = [apple_prices[i:i+10] for i in range(len(apple_prices)-20)]
# Independent variable - stock price 10th day into the future
apple_y = [apple_prices[i+10] for i in range(10,len(apple_prices)-10)]

apple_test = [apple_prices[i:i+10] for i in range(len(apple_prices)-20,len(apple_prices)-10)]

reg = RandomForestRegressor()
reg.fit(apple_x,apple_y)
# Predict stock price for 10 future days
apple = reg.predict(apple_test)

google_x = [google_prices[i:i+10] for i in range(len(google_prices)-20)]
google_y = [google_prices[i+10] for i in range(10,len(google_prices)-10)]

google_test = [google_prices[i:i+10] for i in range(len(google_prices)-20,len(google_prices)-10)]

reg = RandomForestRegressor()
reg.fit(google_x,google_y)
google = reg.predict(google_test)

future_prices = {'apple':apple,'google':google}
future_prices = pd.DataFrame(future_prices)


# Construct covariance matrix of future stock prices
# cov_matrix = risk_models.sample_cov(future_prices)
S = risk_models.CovarianceShrinkage(future_prices).ledoit_wolf()
# plotting.plot_covariance(S, plot_correlation=True)

# Use capm to find expected returns on future prices
mu = expected_returns.capm_return(future_prices)
print(mu)
# Do mean variance optimization using efficient frontier
ef = EfficientFrontier(mu, S)
ef.min_volatility()
weights = ef.clean_weights()
print(weights)
# weights = ef.max_sharpe(risk_free_rate=0.02)
# cleaned_weights = ef.clean_weights()
# print(cleaned_weights)

apple    -0.188546
google   -0.644306
Name: mkt, dtype: float64
OrderedDict([('apple', 0.63984), ('google', 0.36016)])


# Single Index Model

- Load the stock prices for your selected stocks
- Divide data into independent variable X (10 consecutive days of market index values), and dependent variable y (market index value at 10th day into future)
- Train non linear regression model to predict market index value 10 day into the future
- Fit separate linear models each for a stock according to formula of single index model
- User 10 predicted market returns to predict returns of each stock
- Use the future returns of stocks to get stock weights

In [53]:
from pypfopt import plotting
import numpy as np
import pandas as pd
from sklearn.svm import SVR
from pypfopt import expected_returns, risk_models, EfficientFrontier

# Risk free rate to be subtracted from stock returns
risk_free_rate = 0.01

# Returns of stocks and market
apple_returns = ((apple_data['Close'] / apple_data['Close'].shift(1))-1-risk_free_rate).dropna().tolist()
google_returns = ((google_data['Close'] / google_data['Close'].shift(1))-1-risk_free_rate).dropna().tolist()
market_returns = ((market_index_data['Close'] / market_index_data['Close'].shift(1))-1-risk_free_rate).dropna().tolist()


# Future marker returns prediction
market_x = [market_returns[i:i+10] for i in range(len(market_returns)-20)]
market_y = [market_returns[i+10] for i in range(10, len(market_returns)-10)]

market_test = [market_returns[i:i+10]
              for i in range(len(market_returns)-20, len(market_returns)-10)]

reg = RandomForestRegressor()
reg.fit(market_x, market_y)
market_future = reg.predict(market_test)


# Fitting single index model regression on market return and stock return
single_index_reg = LinearRegression()
single_index_reg.fit(np.array(market_returns).reshape(-1, 1),y=apple_returns)
apple_future = single_index_reg.predict(np.array(market_future).reshape(-1, 1))

single_index_reg = LinearRegression()
single_index_reg.fit(np.array(market_returns).reshape(-1, 1),google_returns)
google_future = single_index_reg.predict(
    np.array(market_future).reshape(-1, 1))

future_returns = {'apple':apple,'google':google}
future_returns = pd.DataFrame(future_returns)


# S = risk_models.sample_cov(future_returns)
S = risk_models.CovarianceShrinkage(future_returns).ledoit_wolf()
# plotting.plot_covariance(S, plot_correlation=True)
# You don't have to provide expected returns in this case

ef = EfficientFrontier(future_returns.max(), S)
# ef.min_volatility()
# weights = ef.clean_weights()
# print(weights)
weights = ef.max_sharpe(risk_free_rate=0.01)
cleaned_weights = ef.clean_weights()
print(cleaned_weights)

OrderedDict([('apple', 0.71731), ('google', 0.28269)])


# CAPM

Same as Sinlg Index Model

In [60]:
from pypfopt import expected_returns
from pypfopt import plotting
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import Ridge
from pypfopt import expected_returns, risk_models, EfficientFrontier

risk_free_rate = 0.01

apple_returns = (
    (apple_data['Close'] / apple_data['Close'].shift(1))-1).dropna().tolist()
google_returns = (
    (google_data['Close'] / google_data['Close'].shift(1))-1).dropna().tolist()
market_returns = (
    (market_index_data['Close'] / market_index_data['Close'].shift(1))-1-risk_free_rate).dropna().tolist()

# apple_prices = apple_data['Close'].tolist()
# google_prices = google_data['Close'].tolist()
market_x = [market_returns[i:i+10] for i in range(len(market_returns)-20)]
market_y = [market_returns[i+10] for i in range(10, len(market_returns)-10)]

market_test = [market_returns[i:i+10]
               for i in range(len(market_returns)-20, len(market_returns)-10)]

reg = MLPRegressor(hidden_layer_sizes=(100,100))
reg.fit(market_x, market_y)
market_future = reg.predict(market_test)

# print(market_future)

single_index_reg = Ridge()
single_index_reg.fit(np.array(market_returns).reshape(-1, 1), y=apple_returns)
apple_future = single_index_reg.predict(np.array(market_future).reshape(-1, 1))

single_index_reg = Ridge()
single_index_reg.fit(np.array(market_returns).reshape(-1, 1), google_returns)
google_future = single_index_reg.predict(
    np.array(market_future).reshape(-1, 1))

future_returns = {'apple': apple, 'google': google}
future_returns = pd.DataFrame(future_returns)


S = risk_models.sample_cov(future_returns)
# S = risk_models.CovarianceShrinkage(future_returns).ledoit_wolf()
# plotting.plot_covariance(S, plot_correlation=True)
# You don't have to provide expected returns in this case


ef = EfficientFrontier(None, S)
ef.min_volatility()
weights = ef.clean_weights()
print(weights)
# weights = ef.max_sharpe(risk_free_rate=-0.2)
# cleaned_weights = ef.clean_weights()
# print(cleaned_weights)
# print(mu)


OrderedDict([('apple', 0.87149), ('google', 0.12851)])


# Multifactor model

- Load the market factors affecting your stocks (here we have selected 3 random factors)
- Divide data into independent variable X (10 consecutive days of factor values), and dependent variable y (factor value at 10th day into future)
- Train non linear regression model to predict market index value 10 day into the future
- Fit separate linear models each for a stock according to formula of multi factor model
- User 10 predicted factor returns to predict returns of each stock
- Use the future returns of stocks to get stock weights

In [68]:
from pypfopt import expected_returns
from pypfopt import plotting
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import Ridge
from pypfopt import expected_returns, risk_models, EfficientFrontier

# Example tickers for S&P 500, Crude Oil, and 20+ Year Treasury Bond ETFs
factor_tickers = ['SPY', 'USO', 'TLT']

# Fetch historical data for the macroeconomic factors
factor_data = yf.download(
    factor_tickers, start=start_date, end=end_date, progress=False)
factor_data = factor_data['Adj Close']

# Calculate the returns for the macroeconomic factors
factor_returns = factor_data.pct_change().dropna()
# print(factor_returns)
spy_returns = factor_returns['SPY']
uso_returns = factor_returns['USO']
tlt_returns = factor_returns['TLT']

# Train ML model to predict future price of macroeconomic factors
spy_x = [spy_returns[i:i+10] for i in range(len(spy_returns)-20)]
spy_y = [spy_returns[i+10] for i in range(10, len(spy_returns)-10)]

spy_test = [spy_returns[i:i+10]
               for i in range(len(spy_returns)-20, len(spy_returns)-10)]
reg = MLPRegressor(hidden_layer_sizes=(100, 100))
reg.fit(spy_x, spy_y)
spy_future = reg.predict(spy_test)

uso_x = [uso_returns[i:i+10] for i in range(len(uso_returns)-20)]
uso_y = [uso_returns[i+10] for i in range(10, len(uso_returns)-10)]

uso_test = [uso_returns[i:i+10]
               for i in range(len(uso_returns)-20, len(uso_returns)-10)]
reg = MLPRegressor(hidden_layer_sizes=(100, 100))
reg.fit(uso_x, uso_y)
uso_future = reg.predict(uso_test)

tlt_x = [tlt_returns[i:i+10] for i in range(len(tlt_returns)-20)]
tlt_y = [tlt_returns[i+10] for i in range(10, len(tlt_returns)-10)]

tlt_test = [tlt_returns[i:i+10]
               for i in range(len(tlt_returns)-20, len(tlt_returns)-10)]

reg = MLPRegressor(hidden_layer_sizes=(100, 100))
reg.fit(tlt_x, tlt_y)
tlt_future = reg.predict(tlt_test)

future_factors = {'SPY': spy_future, 'TLT': tlt_future, 'USO': uso_future}
future_factors = pd.DataFrame(future_factors)
# print(future_factors)

# Fit Multi Factor Linear Regression model using Macroeconomic factors as X and stock returns as y
apple_returns = (
    (apple_data['Close'] / apple_data['Close'].shift(1))-1).dropna().tolist()
google_returns = (
    (google_data['Close'] / google_data['Close'].shift(1))-1).dropna().tolist()

single_index_reg = Ridge()
single_index_reg.fit(factor_returns, apple_returns)
apple_future = single_index_reg.predict(future_factors)

single_index_reg = Ridge()
single_index_reg.fit(factor_returns, google_returns)
google_future = single_index_reg.predict(
    future_factors)

future_returns = {'apple': apple, 'google': google}
# print(list(zip(apple,google)))
future_returns = pd.DataFrame(future_returns)


S = risk_models.sample_cov(future_returns)
# S = risk_models.CovarianceShrinkage(future_returns).ledoit_wolf()
# plotting.plot_covariance(S, plot_correlation=True)
# You don't have to provide expected returns in this case

ef = EfficientFrontier(future_returns.mean(), S)
# ef.min_volatility()
# weights = ef.clean_weights()
# print(weights)
weights = ef.max_sharpe(risk_free_rate=0.0)
cleaned_weights = ef.clean_weights()
print(cleaned_weights)


OrderedDict([('apple', 0.93245), ('google', 0.06755)])


# Equity valuation model - Dividend Model

- Load dividend values of your stock
- Identify the period of change and only keep unique values in a period
- Train non linear regression model to predict next dividend value
- Calculate average dividend growth rate of your stock
- Assume a discount rate for the stock
- Use ML model to predict next dividend value
- Use the predicted dividend value, discount rate and average dividend growth rate to find stock price

In [39]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

# get dividends of the stock
apple_stock = yf.Ticker(apple_ticker)
apple_dividends = apple_stock.dividends.loc[start_date:end_date]

# google_stock = yf.Ticker(google_ticker)
# google_dividends = google_stock.dividends.loc[start_date:end_date]
# print(apple_dividends)
# Combine the dividend data into a single DataFrame
dividends = pd.DataFrame({'AAPL':apple_dividends})
# print(dividends)
# Calculate the dividend growth rates
dividend_growth_rates = dividends.pct_change().dropna()
# Since we only have dividend changes every 4 months, we remove the 0 values and get the average dividend growth rate which we consider as perpetual growth rate
avg_dividends_growth_rate = dividend_growth_rates.loc[~(
    dividend_growth_rates == 0).all(axis=1)].values.mean()
print(avg_dividends_growth_rate)

# Define the features and target variable for the machine learning model
X = np.unique(dividends.values)[:-1]  # Independent variables (dividend value for the past period)
y = np.unique(dividends.values)[1:]  # Dependent variable (dividend value for the next period)

# print(y)
# Train a machine learning model using regression to predict the next dividend value
model = RandomForestRegressor()
model.fit(X.reshape(-1,1), y)

# Use the trained model to predict the dividend value for the next period
predicted_dividend = model.predict(dividends.values[-1].reshape(1, -1))
next_period_dividend = pd.Series(predicted_dividend.squeeze(), index=dividends.columns)

print(next_period_dividend)
# Apply the DDM formula to calculate the intrinsic value of the stocks
discount_rate = 0.08  # Example discount rate
apple_stock_price = next_period_dividend['AAPL'] / \
    (discount_rate - avg_dividends_growth_rate)
# google_stock_price = google_dividends[-1] / (discount_rate - next_period_growth_rates['GOOGL'])

print("Intrinsic Value of Apple Stock:", apple_stock_price)
# print("Intrinsic Value of Google Stock:", google_stock_price)


0.05636662470288756
AAPL    0.2366
dtype: float64
Intrinsic Value of Apple Stock: 10.011265721697733


  apple_dividends = apple_stock.dividends.loc[start_date:end_date]


# Black litterman model

- Use non linear ML model to predict stock prices in future (same as we did in mean-variance section)
- Use the predicted future return of stocks as you "views" for black litterman model
- Assign confidence using accuracy of your ML model or arbitrarily based on your market understanding
- Fit the Black litterman model and calculate corresponding cov matrix and returns
- Find the optimal weights using bl cov matrix and returns

In [69]:
from pypfopt import BlackLittermanModel, plotting
from pypfopt import black_litterman, risk_models

# Fit non linear ML models to predict stock returns into future
apple_returns = (
    (apple_data['Close'] / apple_data['Close'].shift(1))-1).dropna().tolist()
google_returns = (
    (google_data['Close'] / google_data['Close'].shift(1))-1).dropna().tolist()

apple_x = [apple_returns[i:i+10] for i in range(len(apple_returns)-20)]
apple_y = [apple_returns[i+10] for i in range(10,len(apple_returns)-10)]

apple_test = [apple_returns[i:i+10] for i in range(len(apple_returns)-20,len(apple_returns)-10)]

reg = RandomForestRegressor()
reg.fit(apple_x,apple_y)
apple = reg.predict(apple_test)

google_x = [google_returns[i:i+10] for i in range(len(google_returns)-20)]
google_y = [google_returns[i+10] for i in range(10,len(google_returns)-10)]

google_test = [google_returns[i:i+10] for i in range(len(google_returns)-20,len(google_returns)-10)]

reg = RandomForestRegressor()
reg.fit(google_x,google_y)
google = reg.predict(google_test)

future_returns = {'apple':apple,'google':google}
print(apple[-1],google[-1])

# Use predicted future returns as views for BL model
viewdict = {'AAPL': apple[-1], 'GOOGL': google[-1]}
tickers = ['AAPL','GOOGL']
mcaps = {}
for t in tickers:
    stock = yf.Ticker(t)
    mcaps[t] = stock.info["marketCap"]
print(mcaps)

# prices = pd.DataFrame({'AAPL':apple_data['Close'].values,'GOOGL':google_data['Close'].values})
# S = risk_models.CovarianceShrinkage(prices).ledoit_wolf()
market_prices = yf.download("SPY", period="max")["Adj Close"]


delta = black_litterman.market_implied_risk_aversion(market_prices)
print(delta)
market_prior = black_litterman.market_implied_prior_returns(mcaps, delta, S)
print(market_prior)

# Assign confidence measure for stock returns based on some heuristics
confidences = [
    0.6,
    0.4
]

# Fit BL model
bl = BlackLittermanModel(S, pi=market_prior, absolute_views=viewdict, omega="idzorek", view_confidences=confidences)

# Get expected returns
ret_bl = bl.bl_returns()
print(ret_bl)

# Get cov matrix
S_bl = bl.bl_cov()


from pypfopt import EfficientFrontier, objective_functions

ef = EfficientFrontier(ret_bl, S_bl)
ef.add_objective(objective_functions.L2_reg)
ef.max_sharpe()
weights = ef.clean_weights()
print(weights)


0.003311451836866234 -0.0005286240617439164
{'AAPL': 2936233787392, 'GOOGL': 1557339176960}
[*********************100%***********************]  1 of 1 completed
2.57553131544276
AAPL     0.271605
GOOGL    0.233079
dtype: float64
AAPL     0.094193
GOOGL    0.086254
dtype: float64
OrderedDict([('AAPL', 0.53022), ('GOOGL', 0.46978)])




# Algorithmic trading - Using Bollinger Bands (You can use any alternative approach like EMA etc)

- Install talib to calculate Bollinger Bands
- Calculate Bollinger bands for your stock using historical data
- Divide data into independent variable X (stock prices) and dependent variable y (sell -1 if stock price > upper limit of bollinger band, buy 1 otherwise)
- Train a classifier to predict the buy or sell based on stock price
- Use the trained classifier to trade the stock in real time
- The algorithmic trading is more useful in intra day or high frequency trading scenario and is not generally done for long term portfolio holdings
- However, you can buy or sell the complete stock based on this everyday to maximize your capital

In [None]:
import numpy as np
import pandas as pd
import talib
from sklearn.ensemble import RandomForestClassifier


apple_prices = apple_data['Close']

# Calculate Bollinger Bands for Apple
apple_bb_upper, apple_bb_middle, apple_bb_lower = talib.BBANDS(
    apple_prices, timeperiod=20)


# Calculate the Bollinger Bands percentages for Apple
apple_bb_percentage = (apple_prices - apple_bb_lower) / \
    (apple_bb_upper - apple_bb_lower)

# # Combine the Bollinger Bands percentages into a single DataFrame
# bb_percentages = pd.concat(
#     [apple_bb_percentage, google_bb_percentage], axis=1).dropna()

# Define the features and target variable for the machine learning model
# Independent variables (Bollinger Bands percentages for the past period)
X = apple_bb_percentage.values[:-1]
# Target variable (-1 for sell, 1 for buy)
y = np.where(apple_prices.values[1:] > apple_bb_upper[:-1], -1, 1)

# Train a machine learning model using random forest classifier
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)

# Use the trained model to predict the trading signals for the testing set
# Ideally this will be your real time stock prices
y_pred = model.predict(X_test)

# Perform algorithmic trading based on the predicted signals (example logic)
capital = 100000  # Initial capital in USD
position = 0  # Current position (0 for neutral, 1 for long, -1 for short)

for i in range(len(y_pred)):
    if y_pred[i] == 1 and position != 1:  # Buy signal
        position = 1
        # Place a buy order based on your trading platform's API or logic

    elif y_pred[i] == -1 and position != -1:  # Sell signal
        position = -1
        # Place a sell order based on your trading platform's API or logic

    elif y_pred[i] == 0 and position != 0:  # Exit position
        position = 0
        # Close the existing position based on your trading platform's API or logic

# Calculate the final capital after the trading period
final_capital = capital  # Assume no transaction costs or slippage
# Calculate the final capital based on your trading platform's API or logic

print("Final Capital:", final_capital)
