In [53]:
from IPython.display import display, Math, Latex

import pandas as pd
import numpy as np
import numpy_financial as npf
import yfinance as yf
import matplotlib.pyplot as plt
from datetime import datetime, date

import random

## CFM 101: Group Assignment - Python Roboadvisor
### Team Number: 15
### Team Member Names: Landon Trinh, Ethan Zemelman, Jessie Deng
### Team Strategy Chosen: SAFE

## Goal
The focus of this project is to dynamically generate the safest possible portfolio given a file of unknown stock tickers. With a given list of stocks, the robo advisor will generate a portfolio valued at $750,000, which is composed of a minimum of 10 and maximum of 22 stocks. This portfolio will then be run from November 25, 2023 to December 4, 2023. Ultimately, the goal is to have the portfolio deviate as little as possible from the starting value. 

In this project, our goal will be to tell a convincing story about WHY we are picking our stocks. We will calculate and discuss statistics, display and interpret graphs, and explain our thought process.

## Process
Our robo advisor is designed to process the list of stocks and select desirable candidates through the following steps:

1. Setup

Before implementing strategies, we will define some variables and constants that will be used in later parts.

2. Filter Stock Data

Before moving on to selecting candidates for a portfolio, we filter the stock data to eliminate stocks that do not satisfy our requirements or may have been delisted. 

3. Select Candidates of Stocks

In this step, we will calculate certain measures, including but not limited to the standard deviation of returns and the beta value. We will then select candidates that possess desirable traits, such as a low beta value and a low standard deviation, to keep the expected return low and stable.

4. Select Optimal Weightings

Since stocks with a high weighting have a more prominent effect on the overall performance of a portfolio, the weight given to each stock is crucial. We will aim to achieve optimal performance by weighting the stocks in a way that allows for the lowest deviation in portfolio value.

5. Choose the Final Portfolio

To ensure that the best possible bundle of stocks is chosen, the robo advisor will create multiple portfolios based on different metrics, such as lowest standard deviation and lowest expected return. After obtaining the stocks and weightings of each portfolio, we optimize for the lowest standard deviation and lowest expected return to choose a final portfolio.

## Introduction
> "Theory will only take you so far." - J. Robert Oppenheimer

Yet, throughout their many years of education, students are taught about theory, whether it be game theory or psychoanalytic theory. These theories are often abstract and based on various assumptions. For example, in finance, we are taught to assume that people behave rationally, with the goal of improving their financial position. That is, people will not make impulse purchases; they will buy and sell stocks with the goal of earning money.

However, real-life situations have many more complications than we assume. Therefore, more people are resorting to examining hard, cold data that does not lie. Modern Portfolio Theory, proposed in 1952 by Harry Markowitz, is an example of such an approach. It is a mathematical framework used to build portfolios that maximize returns while keeping the risk below a certain level.

This project utilizes Modern Portfolio Theory as an underlying framework to build a robo advisor. The robo advisor will then perform analysis and make selections to come up with a final portfolio that meets the criteria of a "safe" portfolio. A "safe" portfolio is one that deviates as little as possible from the starting value. Moreover, it is our focus to keep the difference between the initial and final value as small as possible over a short period of time.

Diversification is a risk management strategy that mixes a wide variety of investments within a portfolio in an attempt to reduce portfolio risk. Hence, we plan to create 3 sets of 22 stocks in order to maximize diversification and reduce portfolio risk, but may adjust the number of stocks based on the list of stocks given as well as the statistics of these stocks. For example, the robo advisor may choose to have fewer than 22 stocks in a portfolio if the vast majority of stocks have extremely high beta values.

## 1. Setup
Before implementing our trading strategy, we will initialize required and useful constants as part of the rules:
- Currency of valid stocks (USD or CAD)
- Required average monthly volume (150,000 shares)
- The number of stocks we wish to purchase on the start date (10-22 stocks)
- Time interval (January 1, 2023 - October 31, 2023)
- Minimum number of trading days for month (18 days)
- Minimum stock weighting: $\frac{100}{2n}$%, $n$ = number of stocks in portfolio
- Maximum stock weighting: 20%
- Initial investment amount: 750,000 CAD
- Buying date of roboadvisor: November 25, 2023 - December 4, 2023
- Trading fee for each stock trade: $4.95 CAD
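
As a quick arithmetic check of the weighting rules above, the sketch below computes the per-stock bounds for a few portfolio sizes. The helper name `weight_bounds` is ours, introduced only for illustration, and trading fees are ignored.

```python
def weight_bounds(n, max_weight=0.20, capital=750_000):
    """Return (min_weight, max_weight, min_dollars, max_dollars) for n stocks."""
    min_weight = 1 / (2 * n)  # the (100 / 2n)% rule, written as a fraction
    return min_weight, max_weight, min_weight * capital, max_weight * capital

for n in (10, 16, 22):
    lo, hi, lo_cad, hi_cad = weight_bounds(n)
    print(f"n={n}: min {lo:.2%} (${lo_cad:,.0f}), max {hi:.0%} (${hi_cad:,.0f})")
```

For a 22-stock portfolio the minimum works out to 1/44, roughly 2.27% of the portfolio or about $17,045 of the $750,000.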

In the end, our roboadvisor should create two DataFrames:

1. ${\verb|Portfolio_Final|\\}$
- Index: Starts at 1 and ends at number of stocks in portfolio
- Headings: Ticker, Price (price of stock on Nov 25), Currency (CAD or USD), Shares, Value, Weight (adds up to 100%)

2. ${\verb|Stocks_Final|\\}$

We should output this DataFrame to a CSV file titled "Stocks_Group_15.csv"
- Index: Same as "Portfolio_Final"
- Headings: Tickers and Shares from "Portfolio_Final"
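
To make the target layout concrete, here is a minimal sketch of the two DataFrames. The tickers, prices and share counts are hypothetical placeholders, and prices are assumed to be already expressed in CAD so that the values add up in one currency.

```python
import pandas as pd

# Hypothetical tickers, prices and share counts, for illustration only
rows = [
    {"Ticker": "AAA", "Price": 50.0, "Currency": "CAD", "Shares": 6000.0},
    {"Ticker": "BBB", "Price": 100.0, "Currency": "USD", "Shares": 2500.0},
    {"Ticker": "CCC", "Price": 25.0, "Currency": "CAD", "Shares": 8000.0},
]
Portfolio_Final = pd.DataFrame(rows)
Portfolio_Final["Value"] = Portfolio_Final["Price"] * Portfolio_Final["Shares"]
Portfolio_Final["Weight"] = 100 * Portfolio_Final["Value"] / Portfolio_Final["Value"].sum()

# Index starts at 1 and ends at the number of stocks in the portfolio
Portfolio_Final.index = range(1, len(Portfolio_Final) + 1)

# Stocks_Final keeps only the tickers and share counts, with the same index
Stocks_Final = Portfolio_Final[["Ticker", "Shares"]]

# Pass "Stocks_Group_15.csv" to to_csv to write the file to disk instead
csv_text = Stocks_Final.to_csv()
```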

In [54]:
# Investment amount (CAD)
capital = 750000

# Number of stocks to buy for portfolio
num_stocks = 22

min_stocks = 10
max_stocks = 22

# Maximum and minimum weightings of each stock in portfolio
min_weight = 1 / (2 * max_stocks)
max_weight = 0.20

# Start and end date for roboadvisor
# start_date = "2023-11-25"
# end_date = "2023-12-04"

# Filtering requirements
min_trading_days = 18
min_avg_volume = 150000

## 2. Filtering
After reading in the CSV file containing stock tickers, we must filter the list of stocks to make sure they are valid stock tickers according to the following rules:

- Include stocks that have an average monthly volume of at least 150,000 shares based on Jan 1, 2023 - Oct 31, 2023 (drop any months that don't have at least 18 trading days)
- Stock denominated in USD or CAD

To accomplish this, we first read the CSV file containing all the tickers and load these tickers into a pandas DataFrame.

We then set the parameters for the filtering, which includes the start and end dates as well as filter interval. 

We use a function called get_short_months to loop over the months and identify those with fewer than 18 trading days. We then use the function filter_volume to determine whether each stock meets the trading volume requirement. The short months found by get_short_months are dropped before we calculate the average monthly volume.

We then retrieve the filtered tickers, which are the ones that satisfy the above-mentioned requirements and can therefore be used for our portfolio.
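
The filtering rule can be sketched offline on synthetic data. The example below is an illustration under our own assumptions rather than the notebook's implementation: it counts trading days per month from a single stock's daily volume (the notebook instead counts them from a market index), and the volume numbers are randomly generated.

```python
import numpy as np
import pandas as pd

# Synthetic daily volume for one hypothetical stock over business days in early 2023
days = pd.bdate_range("2023-01-01", "2023-04-30")
rng = np.random.default_rng(0)
daily_volume = pd.Series(rng.integers(8_000, 15_000, len(days)), index=days)

# Pretend the stock did not trade for part of February
daily_volume = daily_volume.drop(daily_volume.loc["2023-02-10":"2023-02-28"].index)

by_month = daily_volume.groupby(daily_volume.index.to_period("M"))
trading_days = by_month.size()
monthly_volume = by_month.sum()

# Drop months with fewer than 18 trading days, then average what is left
valid_months = trading_days[trading_days >= 18].index
avg_monthly_volume = monthly_volume.loc[valid_months].mean()
passes = avg_monthly_volume >= 150_000
```

Here February is left with only 7 trading days, so it is excluded before the average is taken.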

In [55]:
# Read in CSV ticker file
tickers = pd.read_csv("tickers_example.csv", header=None)
tickers = tickers.rename(columns={0: "ticker"})
tickers_lst = tickers["ticker"].tolist()
tickers.head()

,ticker
0,AAPL
1,ABBV
2,ABT
3,ACN
4,AGN


In [56]:
# Set parameters for filtering tickers
filter_start_date = "2023-01-01"
filter_end_date = "2023-10-31"
filter_interval = "1mo"

In [57]:
# Determines months with less than 18 trading days
def get_short_months(market_index):
    short_months = []
    for month in range(1, 11):
        trading_days = len(market_index.history(start=str(date(2023, month, 1)), end=str(date(2023, month+1, 1))))
        if trading_days < min_trading_days:
            short_months.append(month)
    return short_months

# Keeps stocks with valid average monthly volume
def filter_volume(tickers, short_months):

    # Retrieve monthly volume data for tickers
    volume_data = yf.download(tickers=tickers, interval=filter_interval, start=filter_start_date, end=filter_end_date).Volume

    # Drop short months from volume DataFrame
    # (a boolean mask is used because DataFrame.drop is not in-place)
    volume_data = volume_data[~volume_data.index.month.isin(short_months)]

    # Determine whether stocks meet the average monthly volume requirement
    # (build a new list instead of removing items from the list being iterated)
    valid_tickers = []
    for ticker in tickers:
        if (volume_data[ticker]).mean() < min_avg_volume:
            print(f"{ticker} does not meet the required minimum average monthly volume")
        else:
            valid_tickers.append(ticker)

    # Return finalized list of tickers
    return valid_tickers


# Retrieve filtered tickers
def filter_tickers(tickers):
    
    # Initialize list to separately store CAD and USD tickers
    cad_tickers = []
    usd_tickers = []
    
    for ticker in tickers:
        try:
            stock_ticker = yf.Ticker(ticker)
            base_currency = stock_ticker.fast_info["currency"]
            
            # Store ticker in appropriate list
            if base_currency == "CAD":
                cad_tickers.append(ticker)
            
            elif base_currency == "USD":
                usd_tickers.append(ticker)
    
        except Exception:
            print(f"{ticker} may be delisted")

    # Determine months that have less than 18 trading days for CAD and USD stocks
    cad_short_months = get_short_months(yf.Ticker("^GSPTSE"))
    usd_short_months = get_short_months(yf.Ticker("^GSPC"))

    # Filter out tickers that have an average monthly volume of less than 150k
    filtered_cad_tickers = filter_volume(cad_tickers, cad_short_months)
    filtered_usd_tickers = filter_volume(usd_tickers, usd_short_months)

    # Return valid tickers
    return filtered_cad_tickers, filtered_usd_tickers

cad_tickers, usd_tickers = filter_tickers(tickers_lst)
filtered_tickers = cad_tickers + usd_tickers
n_stocks = len(filtered_tickers)

AGN may be delisted
CELG may be delisted
MON may be delisted
RTN may be delisted
[*********************100%%**********************]  4 of 4 completed
[*********************100%%**********************]  32 of 32 completed


In [58]:
start_date = "2021-01-01"
end_date = "2023-01-01"

# Get exchange rate data for CAD-USD
def get_exchange_data(exchange_ticker):
    exchange_data = yf.download(exchange_ticker, start=start_date, end=end_date, interval="1d").Close
    exchange_hist = pd.DataFrame(exchange_data)
    exchange_hist.rename(columns = {"Close": "CAD-USD"}, inplace=True)
    return exchange_hist

exchange_ticker = "CADUSD=X"
exchange_hist = get_exchange_data(exchange_ticker)

[*********************100%%**********************]  1 of 1 completed


In [59]:
# Get CAD stock data
cad_stock_data = yf.download(tickers=cad_tickers, interval="1d", start=start_date, end=end_date).Close

# Get USD stock data
usd_stock_data = yf.download(tickers=usd_tickers, interval="1d", start=start_date, end=end_date).Close

# Align USD stock data and exchange data
aligned_usd_data = pd.merge(usd_stock_data, exchange_hist, left_index=True, right_index=True, how='outer').dropna()

# Convert all USD stock data to CAD
usd_converted_data = pd.DataFrame()
for usd_ticker in usd_tickers:
    close_data = aligned_usd_data[usd_ticker]
    usd_converted_data[usd_ticker] = close_data / aligned_usd_data["CAD-USD"]

# Combine into final DataFrame
stock_data = pd.concat([cad_stock_data, usd_converted_data], axis=1).dropna()
display(stock_data)


[*********************100%%**********************]  4 of 4 completed
[*********************100%%**********************]  32 of 32 completed


Date,RY.TO,SHOP.TO,T.TO,TD.TO,AAPL,ABBV,ABT,ACN,AIG,AMZN,...,PFE,PG,PM,PYPL,QCOM,TXN,UNH,UNP,UPS,USB
2021-01-04,104.680000,139.619003,25.440001,71.940002,164.636701,134.103661,138.810834,326.270967,47.237159,202.703125,...,46.830052,175.335992,103.685115,295.050942,188.923186,206.377908,444.637397,258.271346,208.388003,58.674324
2021-01-05,105.349998,141.255997,25.709999,72.309998,167.443873,136.117649,141.178924,329.647567,48.248275,205.678889,...,47.532537,177.272465,105.302658,300.238474,194.820772,208.854324,440.688862,262.649548,206.438716,59.137687
2021-01-06,106.879997,137.604996,26.299999,74.360001,160.461696,133.819481,139.713218,330.480106,50.964966,198.890127,...,46.731616,177.648595,104.566272,287.500214,191.628787,208.143919,455.363920,265.915202,204.113354,62.118701
2021-01-07,108.070000,147.089996,26.309999,74.059998,165.815417,135.152484,140.965908,333.353348,51.788821,200.250111,...,46.937975,175.859089,105.097495,297.687557,197.200277,212.702734,462.325101,269.418394,201.379863,63.314338
2021-01-08,107.980003,151.671997,26.549999,74.110001,167.489594,136.059128,141.563902,335.055288,51.521598,201.843658,...,47.094954,176.038464,105.389697,307.531444,198.679056,217.095940,460.916658,277.546940,200.898721,62.556505
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2022-12-22,127.730003,47.259998,26.680000,87.820000,180.000714,222.254544,147.057986,360.409834,84.943244,114.060807,...,70.295977,207.171671,137.297684,93.342277,150.665354,224.201150,717.511792,284.042584,239.052609,58.330413
2022-12-23,128.279999,45.830002,26.740000,88.330002,179.814846,222.416217,147.522903,362.861607,86.130029,116.253720,...,70.679537,208.124836,138.672623,94.134829,151.150287,224.161726,724.536820,286.250075,241.384990,58.910977
2022-12-28,127.709999,44.400002,26.139999,87.559998,170.469100,219.416068,145.840077,355.869791,85.058724,110.661549,...,68.706999,205.525908,136.521353,91.361379,144.676926,218.009477,714.728638,279.791667,234.807525,58.630873
2022-12-29,128.660004,47.560001,26.410000,88.540001,176.261831,221.071852,150.014984,364.980719,86.532987,114.479754,...,69.805726,207.513248,137.612338,95.957367,149.375811,224.417314,720.605043,284.526660,239.009465,59.252588


In [60]:
# Get daily returns of stocks
daily_returns = stock_data.pct_change().dropna()
daily_returns.index = daily_returns.index.tz_localize(None, ambiguous="infer").tz_localize("UTC")
daily_returns.head()

Date,RY.TO,SHOP.TO,T.TO,TD.TO,AAPL,ABBV,ABT,ACN,AIG,AMZN,...,PFE,PG,PM,PYPL,QCOM,TXN,UNH,UNP,UPS,USB
2021-01-05 00:00:00+00:00,0.0064,0.011725,0.010613,0.005143,0.017051,0.015018,0.01706,0.010349,0.021405,0.01468,...,0.015001,0.011044,0.015601,0.017582,0.031217,0.011999,-0.00888,0.016952,-0.009354,0.007897
2021-01-06 00:00:00+00:00,0.014523,-0.025847,0.022948,0.02835,-0.041699,-0.016884,-0.010382,0.002526,0.056306,-0.033007,...,-0.01685,0.002122,-0.006993,-0.042427,-0.016384,-0.003401,0.0333,0.012434,-0.011264,0.050408
2021-01-07 00:00:00+00:00,0.011134,0.068929,0.00038,-0.004034,0.033364,0.009961,0.008966,0.008694,0.016165,0.006838,...,0.004416,-0.010073,0.00508,0.035434,0.029074,0.021902,0.015287,0.013174,-0.013392,0.019248
2021-01-08 00:00:00+00:00,-0.000833,0.031151,0.009122,0.000675,0.010097,0.006708,0.004242,0.005106,-0.00516,0.007958,...,0.003344,0.00102,0.00278,0.033068,0.007499,0.020654,-0.003046,0.030171,-0.002389,-0.011969
2021-01-11 00:00:00+00:00,-0.001297,-0.009435,0.002637,0.001214,-0.01963,0.02008,-0.003219,-0.010771,-0.004696,-0.017894,...,0.021006,-0.003092,-0.005717,-0.016827,0.000309,0.004585,-0.003586,-0.003037,0.025695,0.004112


## 3. Stock Analysis

As widely known in finance, diversification plays an important role in reducing the risk of investment portfolios as it minimizes the effect that changes in a single stock can have on the entire portfolio. Since our trading strategy is to go safe, we could take advantage of this information by generating portfolios to contain as many stocks as possible. 

Since we want to diversify our portfolio as much as possible to reduce overall risk, we will choose the maximum number of stocks allowed (22 stocks). We will also place a large focus on LOW-RISK assets to reduce volatility within our portfolio.

For this section, our aim is to generate multiple portfolios each containing 22 stocks unless the statistics support it to be done in another way. For instance, in case the list of stocks contains a few stable stocks and some highly volatile ones, it may be more advantageous to take only the ones that are stable. 

Based on our CFM 101 class, we will focus on the following main statistical analyses:

#### 3.1 - Standard Deviation
- Take 22 stocks with the LOWEST standard deviations

#### 3.2 - Expected Returns
- Take 22 stocks with the LOWEST expected returns

#### 3.3 - Beta
- Take 22 stocks with the LOWEST beta values (a stock with beta less than 1 is considered less volatile than the market)

#### 3.4 - Correlation
- Get pairs of stocks that have LOWEST correlation

## 3.1 - Standard Deviation

Standard deviation is a statistic that measures the dispersion of a dataset relative to its mean with its formula given below, where $x_{i}$ is a single data point, $\overline{X}$ is the mean of the data points and $N$ is the number of data points in a given dataset: 

$$\sigma_X=\sqrt{\frac{\sum(x_i-\overline{X})^2}{N}}
$$

In finance, standard deviation is often used as a measure of the risk of investing in an asset. For example, a volatile stock will have a high standard deviation while a stable blue-chip stock will have a lower standard deviation. 

To create a safe portfolio, we want to select stocks that are less volatile, indicated by a low standard deviation. Hence, we will take a look at stocks that have the lowest standard deviations.

The get_std_returns function below calculates the standard deviation of the daily historical returns of each stock. We then sort the results from lowest to highest.

We use this measure to create one of our portfolios by extracting the 22 stocks with the lowest standard deviation of daily returns. A low standard deviation indicates a smaller spread around the mean, which is the expected return in our case, so the stock is more predictable. Therefore, a portfolio composed of these stocks is more likely to succeed, since our statistics are more likely to predict the future movements of the stocks.
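
The formula above divides by $N$ (the population form), while pandas' `.std()` divides by $N-1$ by default. The short sketch below, using made-up prices for two hypothetical tickers, shows both and checks the population form by hand. The ranking of stocks is the same under either convention, so the choice does not affect which stocks are selected.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices for two made-up tickers
prices = pd.DataFrame({
    "STABLE": [100.0, 100.5, 100.2, 100.6, 100.4, 100.7],
    "JUMPY": [100.0, 104.0, 97.0, 103.0, 96.0, 105.0],
})
returns = prices.pct_change().dropna()

sample_std = returns.std()            # pandas default: divides by N - 1
population_std = returns.std(ddof=0)  # matches the formula above: divides by N

# Manual check of the population formula for one column
x = returns["JUMPY"].to_numpy()
manual = np.sqrt(((x - x.mean()) ** 2).sum() / len(x))
```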

In [61]:
# Calculate standard deviation of returns for each stock
def get_std_returns(stocks):

    std_returns = {} 

    for stock in stocks:
        stock_std = daily_returns[stock].std()
        std_returns[stock] = stock_std

    stock_std_returns = pd.DataFrame(std_returns.items(), columns=["ticker", "std"])
    return stock_std_returns

stock_std_returns = get_std_returns(daily_returns)
sorted_stock_std_returns = stock_std_returns.sort_values(by="std").reset_index(drop=True)
display(sorted_stock_std_returns)

,ticker,std
0,RY.TO,0.009146
1,T.TO,0.009235
2,TD.TO,0.010839
3,KO,0.012305
4,PEP,0.012432
5,CL,0.012596
6,PG,0.013055
7,BMY,0.013316
8,PM,0.014393
9,ABBV,0.014618


In [None]:
# Get stocks with lowest standard deviations
lowest_std_stocks = sorted_stock_std_returns.nsmallest(max_stocks, columns="std")
lowest_std_stocks_lst = list(lowest_std_stocks["ticker"])
print(lowest_std_stocks_lst)

## 3.2 - Expected Returns

We will calculate the expected return of each stock. The expected return is calculated with:
\begin{align*}
E(X)=\overline{X}=\frac{\sum x_i}{N}
\end{align*}

where $x_i$ are the individual returns of some security $X$ and $N$ is the total number of observations (time periods for us).

When these values are averaged across a set of stocks, we simply assume equal weights for each stock.
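
One way to use equal-weight averaging for selection is to slide a fixed-size window over the stocks sorted by expected return and keep the window whose average is closest to zero. The sketch below illustrates that idea; the function name `closest_to_zero_window` and the return values are hypothetical.

```python
import pandas as pd

def closest_to_zero_window(expected_returns, window):
    """Return the block of `window` consecutive tickers whose mean return is nearest zero."""
    ordered = expected_returns.sort_values()
    best_tickers, best_abs_mean = None, float("inf")
    for start in range(len(ordered) - window + 1):
        block = ordered.iloc[start:start + window]
        abs_mean = abs(block.mean())  # equal weight for every stock in the block
        if abs_mean < best_abs_mean:
            best_tickers, best_abs_mean = list(block.index), abs_mean
    return best_tickers, best_abs_mean

# Hypothetical mean daily returns, for illustration only
er = pd.Series({"A": -0.004, "B": -0.001, "C": 0.0005, "D": 0.002, "E": 0.006})
tickers, abs_mean = closest_to_zero_window(er, window=3)
```

With these numbers the middle window (B, C, D) wins, because its negative and positive returns nearly cancel.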

In [None]:
# Calculate expected returns for each stock
def get_expected_returns(stocks):

    expected_returns = {} 

    for stock in stocks:
        stock_expected_return = daily_returns[stock].mean()
        expected_returns[stock] = stock_expected_return

    stock_expected_returns = pd.DataFrame(expected_returns.items(), columns=["ticker", "expected_return"])
    return stock_expected_returns

stock_expected_returns = get_expected_returns(filtered_tickers)
sorted_stock_expected_returns = stock_expected_returns.sort_values(by="expected_return").reset_index(drop=True)
sorted_stock_expected_returns

In [None]:
def get_minimal_returns(expected_returns):
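    # Number of sliding windows of max_stocks consecutive stocks in the sorted list
    # (at least one window, even if fewer than max_stocks stocks are available)
    combinations = max(expected_returns.shape[0] - max_stocks, 0) + 1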

    abs_expected_returns = []
    stock_sets = []
    stds = []

    for _ in range(combinations):
        # Get set of 22 stocks
        expected_returns_set = expected_returns.head(max_stocks)

        # Get stocks and the average expected return
        stock_set = list(expected_returns_set["ticker"])
        expected_return = expected_returns_set["expected_return"].mean()
        port_std = expected_returns_set["expected_return"].std()

        # Add to list
        stock_sets.append(stock_set)
        abs_expected_returns.append(abs(expected_return))
        stds.append(port_std)

        # Drop first row
        expected_returns = expected_returns.drop(expected_returns.index[0])

    plt.scatter(stds, abs_expected_returns)
    plt.title("Risk vs. Return of Stock Portfolio")
    plt.xlabel("Standard Deviation of Set of Stocks")
    plt.ylabel("Expected Return of Set of Stocks")
    plt.show()


    # Get index that produces least absolute return
    minimal_returns_index = abs_expected_returns.index(min(abs_expected_returns))
    print(abs_expected_returns[minimal_returns_index])

    # Return list of optimal stocks that produce least expected return
    stock_minimal_returns = stock_sets[minimal_returns_index]
    return stock_minimal_returns

l_exp_returns_stocks = get_minimal_returns(sorted_stock_expected_returns)
print(l_exp_returns_stocks)


In [None]:
# Graph the relationship between standard deviation vs. expected returns for individual stocks
expeceted_returns = abs(stock_expected_returns["expected_return"])
stds = stock_std_returns["std"]

plt.scatter(stds, expeceted_returns)
plt.title("Standard Deviation vs. Expected Returns of Stocks")
plt.xlabel("Standard Deviation")
plt.ylabel("Expected Returns")
plt.show()



## 3.3 Beta
The beta value of a stock measures its volatility relative to the broader market. A beta value greater than 1 indicates that the stock is more volatile than the market and a beta value less than 1 indicates a more stable stock. Since we are choosing to go safe with our portfolio, stocks need to be chosen so that the beta values are as low as possible.

The formula of beta value of a stock relative to the market is as follows, where  ${Cov(r_i,r_m)}$  is the covariance between the stock and the market index,  ${Var(r_m)}$  is the variance of the market index: 

\begin{align*}
\beta=\frac{Cov(r_i,r_m)}{Var(r_m)}
\end{align*}

where ${r_i}$ and ${r_m}$ represent the returns of the stock and of the market index, respectively. In our calculation, we use the S&P 500 as the stock market index.

The S&P 500 is a stock market index tracking the stock performance of 500 of the largest companies listed on US stock exchanges. Because of the size and diversity of these companies, this index is a reasonably accurate representation of the market.

Since we are choosing to go safe with our portfolio, stocks need to be chosen to obtain as low a beta value as possible. Moreover, as a volatile stock may lead to capital gains or losses in either direction, we look for the lowest-magnitude beta values by taking the absolute value. The aim of this part is to use the beta value as a statistical measure to quantify our various portfolios, which will allow us to choose the appropriate set of stocks.

This measure is used to create our second portfolio candidate, which is either composed of the 22 stocks with the lowest beta values if all of these values are less than one, or a minimum of 10 stocks with beta values less than one. We make this choice because a beta value greater than one indicates greater volatility than the market. Although it is important to hold a diversified portfolio, we do not want to take on more volatile stocks that could make the portfolio more risky. 
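
To illustrate the beta formula without downloading data, the sketch below builds synthetic returns with known sensitivities to a synthetic market and recovers them with $Cov(r_i,r_m)/Var(r_m)$. All series are randomly generated, and the helper `beta` is our own. It also checks that beta equals the slope of a least-squares line of stock returns on market returns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
market = pd.Series(rng.normal(0.0005, 0.01, 500))  # synthetic market returns

# Synthetic stocks built with known sensitivities to the market plus noise
defensive = 0.5 * market + rng.normal(0, 0.002, 500)
aggressive = 1.5 * market + rng.normal(0, 0.002, 500)

def beta(stock_returns, market_returns):
    """Beta = Cov(r_i, r_m) / Var(r_m)."""
    return stock_returns.cov(market_returns) / market_returns.var()

beta_defensive = beta(defensive, market)
beta_aggressive = beta(aggressive, market)

# Beta is also the slope of a least-squares line of stock returns on market returns
slope = np.polyfit(market, defensive, 1)[0]
```

Note that the covariance and the variance must be computed over the same set of dates for the ratio to be a meaningful slope.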

In [None]:
# Set S&P 500 as market index (same date window as the stock data so variance and covariance match)
market_index = yf.download(tickers="^GSPC", start=start_date, end=end_date)["Close"]
market_returns = market_index.pct_change().dropna()
market_returns.index = market_returns.index.tz_localize(None, ambiguous="infer").tz_localize("UTC")
market_variance = market_returns.var()

def get_stock_betas(stocks): 
        betas = {}

        for stock in stocks:
                stock_returns = daily_returns[stock]
                covariance = stock_returns.cov(market_returns)
                stock_beta = covariance / market_variance
                betas[stock] = stock_beta
        
        stock_betas = pd.DataFrame(betas.items(), columns=["ticker", "beta"])
        return stock_betas

stock_betas = get_stock_betas(filtered_tickers)
sorted_stock_betas = stock_betas.sort_values(by="beta").reset_index(drop=True)
sorted_stock_betas

In [None]:
# Get up to 22 stocks with beta less than 1, ranked by lowest magnitude of beta
def get_lowest_betas(stock_betas):
    low_betas = stock_betas[stock_betas["beta"] < 1]
    low_betas = low_betas.sort_values(by="beta", key=lambda b: b.abs())
    l_betas = list(low_betas["ticker"].head(max_stocks))
    return l_betas
    
lowest_betas_lst = get_lowest_betas(stock_betas)

# Tickers ordered from lowest to highest standard deviation (also used for the hybrid portfolio)
l_stock_std_lst = list(sorted_stock_std_returns["ticker"])

# Top up the list with the lowest standard deviation stocks if fewer than max_stocks qualify
if len(lowest_betas_lst) < max_stocks:
    for stock in l_stock_std_lst:
        if stock not in lowest_betas_lst:
            lowest_betas_lst.append(stock)

            if len(lowest_betas_lst) == max_stocks:
                break

print(lowest_betas_lst)

In [None]:
# Graph the relationship between beta vs. expected returns for individual stocks
betas = stock_betas["beta"]

plt.scatter(betas, expeceted_returns)
plt.title("Beta vs. Expected Returns of Stocks")
plt.xlabel("Beta")
plt.ylabel("Expected Returns")
plt.show()



## 3.4 - Correlation

In finance, correlation is a statistic that measures the degree to which two securities move in relation to each other. Correlation is closely tied to diversification. It is computed as the correlation coefficient, which takes on a value between -1 and 1.

The formula for the correlation between $X$ and $Y$ is given as follows, where $COV(X,Y)$ is the covariance of the two securities' returns and $\sigma_X$ and $\sigma_Y$ are their respective standard deviations:

$$ COR(X,Y)=\frac{COV(X,Y)}{\sigma_X \times \sigma_Y} $$

A positive correlation indicates that when one security moves up or down, the other security moves in the same direction. A negative correlation indicates that two assets move in opposite directions, while a zero correlation implies no relationship at all. 

Hence, we hope to factor in negatively correlated stocks in our portfolio since their opposite movements in prices should theoretically offset each other to produce minimal volatility within the portfolio. 
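
As an illustration of finding the least correlated pairs, the sketch below builds four synthetic return series and lists each unordered pair once by keeping only the upper triangle of the correlation matrix. The data and ticker names are made up, and the upper-triangle mask is one possible way to remove self-pairs and mirrored duplicates.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
base = rng.normal(0, 0.01, 250)

# Synthetic returns: B moves with A, C moves against A, D is unrelated
returns = pd.DataFrame({
    "A": base,
    "B": base + rng.normal(0, 0.002, 250),
    "C": -base + rng.normal(0, 0.002, 250),
    "D": rng.normal(0, 0.01, 250),
})
corr = returns.corr()

# Keep only the upper triangle so each pair appears once and self-pairs are excluded
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna().sort_values()
pairs.index.names = ["Stock 1", "Stock 2"]

lowest_pair = pairs.index[0]  # the most negatively correlated pair
```

With four stocks there are six unique pairs, and the pair at the top of the sorted list involves C, the series constructed to move against the others.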

In [None]:
# Get correlation pairs of each stock with each other
corr_pairs = daily_returns.corr()

# Sort pairs by lowest correlations
corr_pairs["Stock 1"] = corr_pairs.index
corr_pairs = corr_pairs.melt(id_vars="Stock 1", var_name="Stock 2")
corr_pairs.rename(columns = {"value" : "Correlation"}, inplace=True)
corr_pairs.sort_values(by="Correlation", inplace=True)

# Remove self-pairs and mirrored duplicates (keep each unordered pair once)
corr_pairs = corr_pairs[corr_pairs["Stock 1"] < corr_pairs["Stock 2"]].reset_index(drop=True)
display(corr_pairs.head(20))

# Loop through pairs and get least correlated pairs of unique stocks
l_corr_pairs = []
for _, pair in corr_pairs.iterrows():
    stock1 = pair["Stock 1"]
    stock2 = pair["Stock 2"]
    corr = pair["Correlation"]

    if stock1 not in l_corr_pairs and stock2 not in l_corr_pairs:
        l_corr_pairs.append(stock1)
        l_corr_pairs.append(stock2)
    
    if len(l_corr_pairs) == max_stocks:
        break

print(l_corr_pairs)


In [None]:
# Create correlation matrix
corr_matrix = daily_returns.corr()

# Create heat map for correlation matrix
plt.matshow(corr_matrix)
plt.xticks(range(daily_returns.shape[1]), daily_returns.columns, rotation=45, fontsize=10)
plt.yticks(range(daily_returns.shape[1]), daily_returns.columns)

# Add a colorbar legend
cb = plt.colorbar()

plt.title("Stock Correlation Matrix", fontsize=16)

## 3.5 - Hybrid Portfolio

Finally, we want to create a mixed portfolio that combines stocks with the lowest standard deviations, betas, and correlations, in an attempt to diversify and reduce risk.
- 22 stocks
    - 6 stocks with least correlation with each other
    - 10 stocks with lowest standard deviations
    - 6 stocks with lowest betas

In [None]:
# Number of stocks to get for each category
corr_stocks = 6
std_stocks = 10
beta_stocks = 6

hybrid_stock_lst = []

def add_unique_stocks(stock_lst, n_stocks):
    unique_stocks = []

    for stock in stock_lst:
        if stock not in hybrid_stock_lst:
            unique_stocks.append(stock)

            if len(unique_stocks) == n_stocks:
                break

    # Return whatever was found, even if fewer than n_stocks unique stocks exist
    return unique_stocks

# Add lowest correlated pairs
hybrid_stock_lst += l_corr_pairs[:corr_stocks]

# Add lowest stds
hybrid_stock_lst += add_unique_stocks(l_stock_std_lst, std_stocks)

# Add lowest beta stocks
hybrid_stock_lst += add_unique_stocks(lowest_betas_lst, beta_stocks)

print(hybrid_stock_lst)

### Preparing DataFrames For Portfolio Weightings

In [None]:
# Get DataFrames for each set of stock candidates

# 1. Lowest standard deviations
print(lowest_std_stocks_lst)
l_std_stocks_df = stock_data[lowest_std_stocks_lst]
display(l_std_stocks_df)

# 2. Lowest betas
print(lowest_betas_lst)
l_beta_stocks_df = stock_data[lowest_betas_lst]
display(l_beta_stocks_df)

# 3. Lowest returns
print(l_exp_returns_stocks)
l_er_stock_df = stock_data[l_exp_returns_stocks]
display(l_er_stock_df)

# 4. Least correlated stocks
print(l_corr_pairs)
l_corr_stocks_df = stock_data[l_corr_pairs]
display(l_corr_stocks_df)

# 5. Hybrid stock candidates
print(hybrid_stock_lst)
hybrid_stock_df = stock_data[hybrid_stock_lst]
display(hybrid_stock_df)

# Store DataFrames in a list
stock_candidates_df = [l_std_stocks_df, l_beta_stocks_df, l_er_stock_df, l_corr_stocks_df, hybrid_stock_df]

In [None]:
std_dict = {}
betas = {}
exp_returns = {}

final_closings = pd.DataFrame()
Stocks_Final = pd.DataFrame()

## 4. Portfolio Optimization
- Create random weights for each stock to generate n random portfolios
- Choose the portfolio closest to the origin of the risk-return plane

To decide which portfolio to choose, we need a measure that considers both the expected returns and the risk. We use a geometric approach: we find the distance between each portfolio and the origin, where the x-axis is the standard deviation of the portfolio and the y-axis is its expected return. The portfolio closest to the origin is the one we choose, since it has the lowest risk and a return closest to zero. With $x$ as the standard deviation and $y$ as the expected return, the formula we are using is as follows:
$$d=\sqrt{x^2+y^2}$$

In [None]:
def dist_from_origin(portfolio):
    """
    Calculates the distance of the portfolio from the origin (0,0) in the risk-return plane.
    """
    portfolio_daily_returns = portfolio.sum(axis=1).pct_change()
    portfolio_std = portfolio_daily_returns.std()
    portfolio_expected_return = abs(portfolio_daily_returns.mean())

    dist = np.sqrt(portfolio_std**2 + portfolio_expected_return**2)
    return dist
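
As a quick check of the distance measure, here is a minimal sketch on made-up data. The tickers `AAA` and `BBB` and their daily values are hypothetical and are not taken from our stock list. The steps mirror `dist_from_origin`: sum the holdings into a total value, take daily percentage changes, then combine the standard deviation and the absolute mean.

```python
import numpy as np
import pandas as pd

# Made-up daily values (in dollars) of two hypothetical holdings over five days
toy_portfolio = pd.DataFrame({
    "AAA": [100.0, 101.0, 100.5, 100.8, 100.2],
    "BBB": [200.0, 199.0, 200.5, 199.8, 200.4],
})

# Total portfolio value per day, then daily percentage change
daily_returns = toy_portfolio.sum(axis=1).pct_change()

x = daily_returns.std()        # risk: sample standard deviation of daily returns
y = abs(daily_returns.mean())  # magnitude of the average daily return
d = np.sqrt(x**2 + y**2)       # distance from the origin (0, 0)

print(f"x = {x:.6f}, y = {y:.6f}, d = {d:.6f}")
```

Because both coordinates are non-negative, the distance is never smaller than either one, so a portfolio can only score well if both its risk and its return magnitude are small.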

In [None]:
Portfolio_Final = pd.DataFrame()

def random_weights(n, minimum, maximum):
    """
    Generates a list of n random weights that sum to 1, where each weight is within the range [minimum, maximum].
    
    Parameters:
    n (int): The number of weights to generate.
    minimum (float): The minimum value for each weight. Must be in the range [0, 1].
    maximum (float): The maximum value for each weight. Must be in the range [minimum, 1].
    
    Returns:
    list: A list of n weights that sum to 1, where each weight is within the range [minimum, maximum].
    """
    if n <= 0 or minimum < 0 or maximum > 1 or minimum >= maximum or n*minimum > 1 or n*maximum < 1:
        raise ValueError("Invalid parameters. Ensure that n > 0, 0 <= minimum < maximum <= 1, and the range [min,max] can sum to 1 with n numbers.")
    
    # These conditions would imply that the weights must all be equal
    if minimum * n == 1: return [minimum] * n
    if maximum * n == 1: return [maximum] * n
    
    while True:
        # Generate n-1 random cut points in the interval [0, 1]
        numbers = np.random.uniform(0, 1, n-1).tolist()
        # Add 0 and 1 as boundaries to the list and sort it
        numbers.extend([0, 1])
        numbers.sort()
        
        # Calculate the differences between the numbers to get n segments
        segments = [numbers[i+1] - numbers[i] for i in range(n)]
        
        # Check if all segments are within [minimum, maximum]
        if all(minimum <= seg <= maximum for seg in segments):
            return segments
        # If any segment is not in the range, we continue and resample

def random_portfolios(num_portfolios, closing_prices):
    """
    Generates a list of num_portfolios number of random portfolios (each stored in a dataframe) by randomly assigning weights to each stock

    Parameters:
    num_portfolios (int): Number of random portfolios to generate
    closing_prices (pd.DataFrame): Dataframe containing closing prices for each stock

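    Returns:
    portfolios (dictionary): Dictionary containing the randomly generated portfolios.
                       Each portfolio is a dataframe containing the stocks' daily values in the portfolio based on their weights.
    expected_returns (dictionary): Absolute expected daily return (%) of each portfolio
    std_devs (dictionary): Standard deviation of daily returns (%) of each portfolio
    distances (dictionary): Distance of each portfolio from the origin of the risk-return plane
    shares_amounts (dictionary): Number of shares of each stock held in each portfolio
    weightings (dictionary): Weight assigned to each stock in each portfolio
    """

    # Drop rows with NaN values; reassign instead of modifying the caller's DataFrame in place
    closing_prices = closing_prices.dropna()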

    portfolios = {}
    expected_returns = {}
    std_devs = {}
    distances = {} # distance from origin
    weightings = {}
    shares_amounts = {}
    # TODO: take into account the trading fee

    # Create the random portfolios, each containing the stocks' daily values based on their weights
    for i in range(num_portfolios):
        weights = np.array(random_weights(closing_prices.shape[1], min_weight, max_weight))
        weightings[i] = weights

        investment_per_stock = (weights) * capital
        # Calculate how many shares to buy (based on the closing price of the first day)
        num_shares = investment_per_stock / closing_prices.iloc[0]
        shares_amounts[i] = num_shares

        # Calculate the daily value of each stock in the portfolio
        portfolio = closing_prices * num_shares
        portfolios[i] = portfolio

        # Each row in this dataframe is the total value of the portfolio on that day
        total_portfolio_value = portfolio.sum(axis=1)
        # Calculate the expected return of the portfolio
        returns = total_portfolio_value.pct_change()

        expected_return = returns.mean()
        # We just care about the magnitude of the expected return, so we take the absolute value
        expected_returns[i] = abs(expected_return)*100

        std_dev = returns.std()
        std_devs[i] = std_dev*100

        # Compute the distance of the portfolio from (0,0)
        distance = dist_from_origin(portfolio)
        distances[i] = distance
    
    return portfolios, expected_returns, std_devs, distances, shares_amounts, weightings
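
The weight sampling in `random_weights` can be sketched in a compact, self-contained form. This is only an illustration under assumptions: `sample_bounded_weights`, the retry cap `max_tries`, and the bounds 0.10 and 0.40 are hypothetical and are not the values used in this notebook. The idea is the same: n-1 random cut points split [0, 1] into n segments, and a draw is kept only if every segment lies within the bounds.

```python
import numpy as np

def sample_bounded_weights(n, minimum, maximum, rng, max_tries=100_000):
    """Hypothetical helper: rejection-sample n weights in [minimum, maximum] that sum to 1."""
    for _ in range(max_tries):
        # n - 1 sorted cut points split the interval [0, 1] into n segments
        cuts = np.sort(rng.uniform(0.0, 1.0, n - 1))
        segments = np.diff(np.concatenate(([0.0], cuts, [1.0])))
        # Accept the draw only if every segment respects the bounds
        if np.all((segments >= minimum) & (segments <= maximum)):
            return segments
    raise RuntimeError("No valid weights found; the bounds may be too tight.")

rng = np.random.default_rng(0)  # seeded so the draw is reproducible
weights = sample_bounded_weights(4, 0.10, 0.40, rng)
print(weights, weights.sum())
```

The share of accepted draws falls as the number of stocks grows or the bounds tighten, which is why this sketch caps the number of retries instead of looping indefinitely.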

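We optimize for the lowest standard deviation and an expected return closest to zero because the goal of the SAFE strategy is for the portfolio to deviate as little as possible from its starting value. A low standard deviation limits day-to-day swings, and an expected return near zero limits drift in either direction, since a gain moves the portfolio away from its starting value just as a loss does.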

In [None]:
def optimize_weights(n_rand_portfolios, closing_prices):
    rand_portfolios, exp_returns, std_devs, distances, shares_amounts, weightings = random_portfolios(n_rand_portfolios, closing_prices)

    # Pick the portfolio whose distance from the origin is smallest
    # (distances are non-negative, so min() returns the closest portfolio)
    optimal_port_i = min(distances, key=distances.get)
    print(f"The optimal portfolio has a distance of about {float(distances[optimal_port_i]):.15}.")

    optimal_weights_df = pd.DataFrame(weightings[optimal_port_i], index=closing_prices.columns, columns=["Weighting"])
    print("\nHere are the best weights for each stock:")
    display(optimal_weights_df)

    optimal_portfolio = rand_portfolios[optimal_port_i]
    optimal_portfolio["Total"] = optimal_portfolio.sum(axis=1)
    print("The optimal portfolio is:")
    display(optimal_portfolio)

    return exp_returns, std_devs, optimal_port_i

num_portfolios = 1000 # Number of random portfolios to generate

# TODO: remove the variable below after testing
final_closings = stock_data.iloc[:, :10] # Get first 10 stocks just for testing

exp_returns, std_devs, optimal_index = optimize_weights(num_portfolios, final_closings)

In [None]:
# Graph all the random portfolios on a scatterplot
plt.scatter(list(std_devs.values()), list(exp_returns.values()))
plt.xlabel("Standard Deviation (%)")
plt.ylabel("Absolute Expected Return (%)")
plt.title("Random Portfolios")
# Draw a line from the origin to the optimal portfolio
plt.plot([0, std_devs[optimal_index]], [0, exp_returns[optimal_index]], color="red")
# Change the colour of the optimal portfolio
plt.scatter(std_devs[optimal_index], exp_returns[optimal_index], color="green")
plt.show()


# Old code:
"""
# Graph the portfolios on a scatterplot
plt.scatter(portfolio_std*100, abs(portfolio_expected_returns)*100, c="blue")
plt.xlabel("Standard Deviation (%)")
plt.ylabel("Absolute Expected Return (%)")
plt.title("Random Portfolios")
# Remove the scientific notation from the axes
plt.ticklabel_format(style="plain")
# Change the scale
# plt.xlim(-0.005, 0.005)
# plt.ylim(-0.005, 0.005)
plt.show()
"""

In the scatterplot above, we can see some correlation between expected return and standard deviation. However, it is still important to consider both metrics when choosing a portfolio, since the correlation between the two is not perfect.
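
One way to put a number on this relationship is the Pearson correlation coefficient. The sketch below uses made-up values as stand-ins for the `std_devs` and `exp_returns` dictionaries; the names `toy_std_devs` and `toy_exp_returns` and their contents are hypothetical and are not results from our portfolios.

```python
import numpy as np

# Made-up stand-ins for the std_devs and exp_returns dictionaries (values in %)
toy_std_devs = {0: 0.52, 1: 0.61, 2: 0.48, 3: 0.70, 4: 0.55}
toy_exp_returns = {0: 0.031, 1: 0.040, 2: 0.035, 3: 0.044, 4: 0.029}

# Both dictionaries share the same keys in the same order, so the values line up
x = np.array(list(toy_std_devs.values()))
y = np.array(list(toy_exp_returns.values()))

# Pearson correlation coefficient between risk and absolute expected return
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 or -1 would mean one metric nearly determines the other. A value well inside that range supports weighing both metrics when selecting a portfolio.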

## Contribution Declaration

The following team members made a meaningful contribution to this assignment:

Insert Names Here.