In [3]:
from IPython.display import display, Math, Latex

import pandas as pd
import numpy as np
import numpy_financial as npf
import yfinance as yf
import matplotlib.pyplot as plt
from datetime import datetime, date

import random

## CFM 101: Group Assignment - Python Roboadvisor
### Team Number: 15
### Team Member Names: Landon Trinh, Ethan Zemelman, Jessie Deng
### Team Strategy Chosen: SAFE

## Goal
The focus of this project is to dynamically generate the safest possible portfolio given a file of unknown stock tickers. With a given list of stocks, the robo advisor will generate a portfolio valued at $750,000, which is composed of a minimum of 10 and maximum of 22 stocks. This portfolio will then be run from November 25, 2023 to December 4, 2023. Ultimately, the goal is to have the portfolio deviate as little as possible from the starting value. 

In this project, our goal will be to tell a convincing story to WHY we are picking our stocks. We will calculate and discuss statistics, display and intepret graphs, and explain our thought process. 

## Process
Our robo advisor is designed to process the list of stock and select desirable candidates through the following steps: 

1. Setup

Before implementing strategies, we will define some variables and constants that that will be used in later parts. 

2. Filter Stock Data

Before moving on to selecting candidates for a portfolio, we filter the stock data to eliminate stocks that do not satisfy our requirements or may have been delisted. 

3. Select Candidates of Stocks

In this step, we will calculate certain measures, including but not limited to standard deviation of expected return, beta value. We will then select candidates that possess these desirable traits. To ensure a low and stable expected return, some example of these desired traits include low beta value, low standard deviation. 

4. Select Optimal Weightings

Since stocks with a high weighting can have a more prominent effect the overall performance of a portfolio, it is crucial for stocks in the portfolio to be weighted. We will aim to achieve optimal performance by weighting the stocks in a portfolio in a way that allows for lowest deviation in portfolio value. 

5. Choose the Final Portfolio

To ensure that the best possible bundle of stocks are chosen, the robo advisor will create multiple portfolios based on different metrics, such as lowest standard deviation, lowest expected return. After obtaining the stocks and weighting of each portfolio, we optimize for the lowest standard deviation and lowest expected return to choose a final portfolio. 

## Introduction
> "Theory will only take you so far." - J. Robert Oppenheimer

Yet, throughout the many years of educations, students are taught about theory --- whether it be game theory, psychoanalytic theory. These theories are often abstract and based on various assumptions. For example, in finance, we are taught to assume that people behave rationally and with the goal to increase financial position. That is, people will not make impulse purchases; they will buy and sell stocks with the goal of earning money. 

However, in real life, there are much more complications to a situation than we assume. Therefore, more people are resorting to examining hard, cold data that does not lie. Proposed in 1952 by Harry Markowitz, the Modern Portfolio Theory is an example of such. It is a mathematical framework used to build portfolios that maximizes returns while keeping the risk below a certain level. Under this method, 

This project utilizes the Modern Portfolio Theory as an underlying framework to build a robo advisor. The robo advisor will then perform analysis and make selections to come up with a final portfolio that meets the criteria of a "safe" portfolio. A "safe" portfolio is one that deviates as little as possible from the starting value. Moreover, it is our focus to keep the difference between the initial and final value as little as possible over a short period of time. 

Diversification is a risk management strategy that focuses on mixing a wide variety of investments within a portfolio in attempt to reduce portfolio risk. Hence, we plan to create 3 sets of 22 stocks in order to maximize diversification and reduce portfolio risk, but may adjust the number of stocks based on the list of stock given as well as the statistics of these stocks. For example, the robo advisor may choose to have less than 22 stocks in a portfolio in case the vast majority of stocks have an extremely high beta values. 

## 1. Setup
Before implementing our trading strategy, we will initialize required and useful constants as part of the rules:
- Currency of valid stocks (USD or CAD)
- Required average monthly volume (150,000 shares)
- The number of stocks we wish to purchase on the start date (10-22 stocks)
- Time interval (Janurary 1, 2023 - October 31, 2023)
- Minimum number of trading days for month (18 days)
- Minimum stock weighting: $\frac{100}{2n}$%, $n$ = number of stocks in portfolio
- Maximum stock weighting: 20%
- Initial investment amount: 750,000 CAD
- Buying date of roboadvisor: November 25, 2023 - December 4, 2023
- Trading fee for each stock trade: $4.95 CAD

In the end, our roboadvisor should create two DataFrames:

1. ${\verb|Portfolio_Final|\\}$
- Index: Starts at 1 and ends at number of stocks in portfolio
- Headings: Ticker, Price (price of stock on Nov 25), Currency (CAD or USD), Shares, Value, Weight (adds up to 100%)

2. ${\verb|Stocks_Final|\\}$

We should output this DataFrame to a CSV file titled "Stocks_Group_15.csv"
- Index: Same as "Portfolio Final"
- Headings: Tickers and Shares from "Portfolio_Final"

In [24]:
# Investment amount (CAD)
capital = 750000

# Number of stocks to buy for portfolio
num_stocks = 22

# Maximum and minimum weightings of each stock in portfolio
min_weight = 1 / (2 * num_stocks)
max_weight = 0.20

# Start and end date for roboadvisor
# start_date = "2023-11-25"
# end_date = "2023-12-04"

# Filtering requirements
min_trading_days = 18
min_avg_volume = 150000

## 2. Filtering
After reading in the CSV file containg stock tickers, we must filter the list of stocks to make sure they are valid stock tickers according to the following rules:

- Include stocks that have an average monthly volume of at leaest 150,000 shares based on Jan 1, 2023 - Oct 31, 2023 (drop any months that don't have at least 18 trading days)
- Stock denominated in USD or CAD

To accomplish this, we first read the csv file containing all the tickers and extract these tickers to be put into a pandas data frame. 

We then set the parameters for the filtering, which includes the start and end dates as well as filter interval. 

We use a function called get_short_months to recursively check for months that have less than 18 trading days and drop these months. We then recursively use the function called filter_volume to to determine if the stock meets the trading volume requirement. The short months with less than 18 trading days that we obtain from the get_short_months function are dropped before we perform the calculation to determine the average monthly volume. 

We then retrieve the filtered tickers, which are ones that satisifies the above mentioned requirements and therefore, can be used for our portfolio. 

In [5]:
# Read in CSV ticker file
tickers = pd.read_csv("tickers_example.csv", header=None)
tickers = tickers.rename(columns={0: "ticker"})
tickers_lst = tickers["ticker"].tolist()
tickers.head()

Unnamed: 0,ticker
0,AAPL
1,ABBV
2,ABT
3,ACN
4,AGN


In [6]:
# Set parameters for filtering tickers
filter_start_date = "2023-01-01"
filter_end_date = "2023-10-31"
filter_interval = "1mo"

In [7]:
# Determines months with less than 18 trading days
def get_short_months(market_index):
    short_months = []
    for month in range(1, 11):
        trading_days = len(market_index.history(start=str(date(2023, month, 1)), end=str(date(2023, month+1, 1))))
        if trading_days < min_trading_days:
            short_months.append(month)
    return short_months

# Keeps stocks with valid average monthly volume
def filter_volume(tickers, short_months):

    # Retrieve monthly volume data for tickers
    volume_data = yf.download(tickers=tickers, interval="1mo", start=filter_start_date, end=filter_end_date).Volume

    # Drop short months from volume DataFrame
    for short_month in short_months:
        volume_data.drop(str(date(2023, short_month, 1)))

    # Determine whether stocks meets average monthly volume requirement
    for ticker in tickers:
        if (volume_data[ticker]).mean() < min_avg_volume:
            print(f"{ticker} does not meet the required minimum average monthly volume")
            tickers.remove(ticker)

    # Return finalized list of tickers
    return tickers


# Retrieve filtered tickers
def filter_tickers(tickers):
    
    # Initialize list to separately store CAD and USD tickers
    cad_tickers = []
    usd_tickers = []
    
    for ticker in tickers:
        try:
            stock_ticker = yf.Ticker(ticker)
            base_currency = stock_ticker.fast_info["currency"]
            
            # Store ticker in appropriate list
            if base_currency == "CAD":
                cad_tickers.append(ticker)
            
            elif base_currency == "USD":
                usd_tickers.append(ticker)
    
        except:
            print(f"{ticker} may be delisted")

    # Determine months that have less than 18 trading days for CAD and USD stocks
    cad_short_months = get_short_months(yf.Ticker("^GSPTSE"))
    usd_short_months = get_short_months(yf.Ticker("^GSPC"))

    # Filter months that have an average monty volume of less than 150k
    filtered_cad_tickers = filter_volume(cad_tickers, cad_short_months)
    filtered_usd_tickers = filter_volume(usd_tickers, usd_short_months)

    # Add to single list
    filtered_tickers = filtered_cad_tickers + filtered_usd_tickers
    return filtered_tickers
    
filtered_tickers = filter_tickers(tickers_lst)
print(filtered_tickers)

AGN may be delisted
CELG may be delisted
MON may be delisted
RTN may be delisted
[*********************100%%**********************]  4 of 4 completed
[*********************100%%**********************]  32 of 32 completed
['RY.TO', 'SHOP.TO', 'T.TO', 'TD.TO', 'AAPL', 'ABBV', 'ABT', 'ACN', 'AIG', 'AMZN', 'AXP', 'BA', 'BAC', 'BIIB', 'BK', 'BLK', 'BMY', 'C', 'CAT', 'CL', 'KO', 'LLY', 'LMT', 'MO', 'MRK', 'PEP', 'PFE', 'PG', 'PM', 'PYPL', 'QCOM', 'TXN', 'UNH', 'UNP', 'UPS', 'USB']


In [16]:
# Download stock data
stock_data = yf.download(tickers=filtered_tickers, interval="1d", start=filter_start_date, end=filter_end_date).Close.dropna()

# Get daily returns of stocks
daily_returns = stock_data.pct_change().dropna()
daily_returns.index = daily_returns.index.tz_localize(None, ambiguous="infer").tz_localize("UTC")
daily_returns.head()

[*********************100%%**********************]  36 of 36 completed


Unnamed: 0_level_0,AAPL,ABBV,ABT,ACN,AIG,AMZN,AXP,BA,BAC,BIIB,...,QCOM,RY.TO,SHOP.TO,T.TO,TD.TO,TXN,UNH,UNP,UPS,USB
Date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2023-01-04 00:00:00+00:00,0.010314,0.008067,0.014875,-0.003404,0.014778,-0.007924,0.023246,0.042223,0.0188,-0.006676,...,0.040392,0.008904,0.037303,0.012158,0.013003,0.036517,-0.027264,0.007997,0.01044,0.031138
2023-01-05 00:00:00+00:00,-0.010605,-0.001222,-0.003687,-0.023613,-0.005481,-0.023726,-0.027302,0.006629,-0.00205,0.00288,...,-0.019098,-0.002942,-0.035171,-0.001126,-0.027024,-0.013241,-0.028821,-0.02944,-0.018463,-0.007821
2023-01-06 00:00:00+00:00,0.036794,0.018717,0.013809,0.02369,0.016375,0.035611,0.025541,0.039075,0.009979,0.028204,...,0.054296,0.012734,0.01495,0.015408,-0.000463,0.049302,8.2e-05,0.043973,0.029395,0.014014
2023-01-09 00:00:00+00:00,0.004089,-0.029361,-0.001602,0.016864,-0.010535,0.01487,0.001532,-0.020798,-0.015112,-0.016222,...,-0.006329,0.007437,0.005044,-0.002221,-0.003126,0.008678,0.000122,-0.002594,0.015312,0.006478
2023-01-10 00:00:00+00:00,0.004456,-0.012495,0.015158,0.00431,-0.006576,0.028732,0.010638,-0.009014,0.006787,0.024534,...,0.020068,0.011416,-0.016061,0.001113,-0.001161,0.009735,-0.008285,0.003878,-0.016071,0.001073


## 3. Stock Analysis

As widely known in finance, diversification plays an important role in reducing the risk of investment portfolios as it minimizes the effect that changes in a single stock can have on the entire portfolio. Since our trading strategy is to go safe, we could take advantage of this information by generating portfolios to contain as many stocks as possible. 

Since we want to diversify our portfolio as much as possible to reduce overall risk, we will choose the maximum number of stocks allowed (22 stocks). We will also place a large focus on LOW-RISK assets to reduce volatility within our portfolio.

For this section, our aim is to generate multiple portfolios each containing 22 stocks unless the statistics support it to be done in another way. For instance, in case the list of stocks contains a few stable stocks and some highly volatile ones, it may be more advantageous to take only the ones that are stable. 

Based from our CFM 101 class, we will focus on the following main statistical analyses:
For this section, our aim is to generate 3 lists of 22 stocks that can be potential candidates for our final portfolio.

#### 3.1 - Standard Deviation
- Take 22 stocks with the LOWEST standard deviations

#### 3.2 - Beta
- Take 22 stocks with the LOWEST beta values (stock with beta less than 1 is considered less volatile than the market)

#### 3.3 - Expected Returns
- Take 22 stocks with LOWEST expected returns

#### 3.4 - Correlation
- Get pairs of stocks that have LOWEST correlation

## 3.1 - Standard Deviation

Standard deviation is a statistic that measures the dispersion of a dataset relative to its mean with its formula given below, where $x_{i}$ is a single data point, $\overline{X}$ is the mean of the data points and $N$ is the number of data points in a given dataset: 

$$\sigma_X=\sqrt{\frac{\sum(x_i-\overline{X})^2}{N}}
$$

In finance, standard deviation is often used as a measure of the risk of investing in an asset. For example, a volatile stock will have a high standard deviation while a stable blue-chip stock will have a lower standard deviation. 

To create a safe portfolio, we want to select stocks that are less volatile, indicated by a low standard deviation. Hence, we will take a look at stocks that have the lowest standard deviations.

The get_std_returns function below calculates the standard deviation of the daily historical returns of each stocks, and sorts it from lowest to highest. 

We use this measure to create one of our portfolios by extracting 22 stocks with the lowest standard deviation of expected return. Since low standard deviation indicates a smaller spread relative to the mean, which is the expected return in our case, we know that the stock is more predictable. Therefore, a portfolio composed of these stocks will be more likely to succeed since the future movements of the stocks are more likely to be able to be predicted by our statistics. 

In [97]:
# Calculate standard deviation of returns for each stock
def get_std_returns(stocks):

    std_returns = {} 

    for stock in stocks:
        stock_std = daily_returns[stock].std()
        std_returns[stock] = stock_std

    stock_std_returns = pd.DataFrame(std_returns.items(), columns=["ticker", "std"])
    stock_std_returns = stock_std_returns.sort_values(by="std").reset_index(drop=True)
    return stock_std_returns

stock_std_returns = get_std_returns(daily_returns)
stock_std_returns.head()

Unnamed: 0,ticker,std
0,KO,0.008796
1,RY.TO,0.009367
2,PG,0.009581
3,PEP,0.009822
4,CL,0.01057


In [33]:
def get_expected_return(stocks):

    expected_returns = []

    for stock in stocks:
        expected_returns.append(daily_returns[stock].mean())
    
    expected_return = sum(expected_returns) / len(expected_returns)
    return expected_return

In [34]:
# Retrieve and graph expected returns of stocks with highest standard deviation
split = len(filtered_tickers) // 2

# Get stocks with highest standard deviations
highest_std_lst = list(stock_stds.nlargest(split, columns="std")["ticker"])
highest_std_return = abs(get_expected_return(highest_std_lst))
print(f"Expected Return of Stocks with Highest Standard Deviations: {highest_std_return}")

# Get stocks with lowest standard deviations
lowest_std_stocks = stock_std_returns.nsmallest(num_stocks, columns="std")
lowest_std_stocks_lst = list(lowest_std_stocks["ticker"])

Expected Return of Stocks with Highest Standard Deviations: 9.093421851470439e-05
Expected Return of Stocks with Lowest Standard Deviations: 0.0004090682729620729
Lower has higher return


## 3.2 - Expected Returns

In [114]:
# Calculate expected returns for each stock
def get_expected_returns(stocks):

    expected_returns = {} 

    for stock in stocks:
        stock_expected_return = daily_returns[stock].mean() * 100
        expected_returns[stock] = stock_expected_return

    stock_expected_returns = pd.DataFrame(expected_returns.items(), columns=["ticker", "expected_return"])
    stock_expected_returns = stock_expected_returns.sort_values(by="expected_return").reset_index(drop=True)
    return stock_expected_returns

stock_expected_returns = get_expected_returns(filtered_tickers)
stock_expected_returns

Unnamed: 0,ticker,expected_return
0,PFE,-0.244709
1,BMY,-0.162179
2,PYPL,-0.15914
3,USB,-0.141409
4,BAC,-0.116957
5,UPS,-0.103484
6,T.TO,-0.074618
7,ABT,-0.072244
8,RY.TO,-0.069821
9,BLK,-0.069371


In [157]:
def get_minimal_returns(expected_returns):
    # Number of possible combinations
    combinations = expected_returns.shape[0] - num_stocks

    abs_expected_returns = []
    stock_sets = []

    for i in range(combinations):
        # Get set of 22 stocks
        expected_returns_set = expected_returns.head(num_stocks)

        # Get stocks and the average expected return
        stock_set = list(expected_returns_set["ticker"])
        expected_return = expected_returns_set["expected_return"].mean()

        # Add to list
        stock_sets.append(stock_set)
        abs_expected_returns.append(abs(expected_return))

        # Drop first row
        expected_returns = expected_returns.drop(expected_returns.index[0])

    # Get index that produces least absolute return
    minimal_returns_index = abs_expected_returns.index(min(abs_expected_returns))

    # Return list of optimal stocks that produce least expected return
    stock_minimal_returns = stock_sets[minimal_returns_index]
    return stock_minimal_returns

l_exp_returns_stocks = get_minimal_returns(stock_expected_returns)
print(l_exp_returns_stocks)


['BIIB', 'ABBV', 'TD.TO', 'PM', 'MO', 'KO', 'PEP', 'BK', 'MRK', 'CL', 'LMT', 'BA', 'PG', 'AIG', 'AXP', 'UNP', 'UNH', 'CAT', 'QCOM', 'ACN', 'AAPL', 'SHOP.TO']


## 3.3 Beta
The beta value of a stock measures its volatility relative to the broader market. A beta value greater than 1 indicates that the stock is more volatile than the market and a beta value less than 1 indicates a more stable stock. Since we are choosing to go safe with our portfolio, stocks need to be chosen so that the beta values are as low as possible.

The formula of beta value of a stock relative to the market is as follows, where  ${Cov(r_i,r_m)}$  is the covariance between the stock and the market index,  ${Var(r_m)}$  is the variance of the market index: 

\begin{align*}
\beta=\frac{Cov(r_i,r_m)}{Var(r_m)}
\end{align*}

where ${r_i}$ and ${r_m}$ represent the expected return of the stock index that of market index respectively. In our calculation, we use the S&P 500 as the stock market index. 

The S&P 500 is a stock market index tracking the stock performance of the largest companies listed on the US stock exchange. Because of the sizes of these companies as well as the diversity of the companies on the S&P 500, this index is a reasonably accurate representation of the market. 

Since we are choosing to go safe with our portfolio, stocks need to be chosen so as to obtain as low of a beta value as possible. Moreover, as a volatile stock may lead to capital gains/losses, we look for the lowest magnitude beta values by taking the absolute value. The aim of this part is to use beta value as a statistical measure to quantify our various portfolios which will allow us to choose the appropriate set of stocks. 

This measure is used to create our second portfolio candidate, which is either composed of the 22 stocks with the lowest beta values if all of these values are less than one, or a minimum of 10 stocks with beta values less than one. We make this choice because a beta value greater than one indicates greater volatility than the market. Although it is important to hold a diversified portfolio, we do not want to take on more volatile stocks that could make the portfolio more risky. 

In [162]:
# Set S&P 500 as market index
market_index = yf.download(tickers="^GSPC", start=filter_start_date, end=filter_end_date)["Close"]
market_returns = market_index.pct_change().dropna()
market_returns.index = market_returns.index.tz_localize(None, ambiguous="infer").tz_localize("UTC")
market_variance = market_returns.var()

def get_stock_betas(stocks): 
        betas = {}

        for stock in stocks:
                stock_returns = daily_returns[stock]
                covariance = stock_returns.cov(market_returns)
                stock_beta = covariance / market_variance
                betas[stock] = stock_beta
        
        stock_betas = pd.DataFrame(betas.items(), columns=["ticker", "beta"])
        stock_betas = stock_betas.sort_values(by="beta").reset_index(drop=True)
        return stock_betas

stock_betas = get_stock_betas(filtered_tickers)
stock_betas

[*********************100%%**********************]  1 of 1 completed


Unnamed: 0,ticker,beta
0,ABBV,0.15962
1,MRK,0.233888
2,UNH,0.282467
3,CL,0.295556
4,LMT,0.310787
5,PEP,0.354425
6,KO,0.360745
7,BMY,0.386441
8,T.TO,0.394628
9,PG,0.411532


In [169]:
# Get 22 stocks with beta less than 1 but take less if any betas are greater than 1
n_lowest_beta_stocks = stock_betas.head(num_stocks)
n_lowest_beta_stocks = n_lowest_beta_stocks[n_lowest_beta_stocks["beta"] < 1]
n_lowest_beta_stocks

Unnamed: 0,ticker,beta
0,ABBV,0.15962
1,MRK,0.233888
2,UNH,0.282467
3,CL,0.295556
4,LMT,0.310787
5,PEP,0.354425
6,KO,0.360745
7,BMY,0.386441
8,T.TO,0.394628
9,PG,0.411532


## 3.4 - Correlation

In finance, correlation is a statistic that measures the degree to which two securities move in relation to each other. Correlation is closely tied to diversification. Correlations are computed as the correlation coefficent which takes on a value between -1 and 1. 

The formula of the correlation between $X$ and $Y$ is given as follows, 

where $\sigma_X$ and $\sigma_Y$ is the deviation of the stock, and the deviation of the market index respectively: 

$$ COR(X,Y)=\frac{COV(X,Y)}{\sigma_X \times \sigma_Y} $$

A positive correlation indicates that when one security moves up or down, the other security moves in the same direction. A negative correlation indicates that two assets move in opposite directions, while a zero correlation implies no relationship at all. 

Hence, we hope to factor in negatively correlated stocks in our portfolio since their opposite movements in prices should theoretically offset each other to produce minimal volatility within the portfolio. 

In [184]:
# Get correlation matirx of each stock with the S&P 500
stock_data = yf.download(tickers=lowest_std_stocks_lst, start=filter_start_date, end=filter_end_date).Close.pct_change().dropna()

corr_matrix = stock_data.corr()
corr_matrix

# Reformat correlation matrix
# corr_matrix["Stock #1"] = corr_matrix.index
# corr_matrix = corr_matrix.melt(id_vars="Stock #1", var_name="Stock #2").reset_index(drop=True)
# corr_matrix.rename(columns={"value": "Correlation"}, inplace=True)
# corr_matrix

[*********************100%%**********************]  22 of 22 completed


  stock_data = yf.download(tickers=lowest_std_stocks_lst, start=filter_start_date, end=filter_end_date).Close.pct_change().dropna()


Unnamed: 0,AAPL,ABBV,ABT,ACN,BIIB,BLK,BMY,CL,KO,LMT,...,PEP,PFE,PG,PM,RY.TO,T.TO,TD.TO,TXN,UNH,UPS
AAPL,1.0,-0.035459,0.240633,0.458374,0.289067,0.471322,0.236583,0.080034,0.182986,0.155958,...,0.15889,0.146068,0.227274,0.235382,0.267298,0.190449,0.286406,0.510317,0.073727,0.417619
ABBV,-0.035459,1.0,0.222156,0.041724,0.190128,0.048471,0.281408,0.298471,0.323551,0.155127,...,0.303137,0.234207,0.319308,0.279562,0.115614,0.078251,0.101189,0.039581,0.222012,0.121252
ABT,0.240633,0.222156,1.0,0.1542,0.229849,0.311489,0.250768,0.336859,0.317376,0.057213,...,0.328474,0.169353,0.385285,0.338978,0.276465,0.224046,0.194579,0.213637,0.016164,0.229262
ACN,0.458374,0.041724,0.1542,1.0,0.269275,0.511963,0.01074,0.088609,0.140916,0.066005,...,0.077889,0.156124,0.176682,0.173951,0.257221,0.18581,0.28601,0.544613,0.033723,0.423825
BIIB,0.289067,0.190128,0.229849,0.269275,1.0,0.279779,0.280543,0.312367,0.274076,0.195554,...,0.238309,0.259706,0.345691,0.314909,0.244368,0.296753,0.241492,0.330713,0.237602,0.31988
BLK,0.471322,0.048471,0.311489,0.511963,0.279779,1.0,0.189294,0.178538,0.25357,0.102108,...,0.179262,0.251004,0.210054,0.348609,0.555658,0.342553,0.492601,0.5463,0.025046,0.549452
BMY,0.236583,0.281408,0.250768,0.01074,0.280543,0.189294,1.0,0.271166,0.322215,0.185104,...,0.267271,0.419457,0.338768,0.247675,0.123256,0.170499,0.20003,0.071985,0.183343,0.30498
CL,0.080034,0.298471,0.336859,0.088609,0.312367,0.178538,0.271166,1.0,0.571009,0.211497,...,0.543737,0.200415,0.644542,0.320663,0.218435,0.25347,0.157513,0.054509,0.331753,0.182897
KO,0.182986,0.323551,0.317376,0.140916,0.274076,0.25357,0.322215,0.571009,1.0,0.229569,...,0.761893,0.194019,0.601358,0.417563,0.241827,0.197921,0.208533,0.136471,0.221753,0.221656
LMT,0.155958,0.155127,0.057213,0.066005,0.195554,0.102108,0.185104,0.211497,0.229569,1.0,...,0.267982,0.204495,0.196839,0.275831,0.213321,0.117657,0.207764,0.071475,0.237878,0.100953


## 4. Portfolio Optimization
- Create random weights for each stock, and create n number of random portfolios
- Choose the portfolio with the lowest expected returns

In [None]:
std_dict = {}
betas = {}
exp_returns = {}

final_closings = pd.DataFrame()
Stocks_Final = pd.DataFrame()

In [None]:
Portfolio_Final = pd.DataFrame()

# TODO: fix random_weights function (currently doesn't obey the max_val)
def random_weights(n, min_val, max_val):
    """
    Generates a list of n number of random weights, each between min and max, that sum to 1
    """

    # Adjust the weights so they sum up to 1 with specified constraints
    # by initially giving each weight the minimum value
    remaining = 1 - n * min_val
    weights = [min_val] * n
    
    for i in range(n - 1):
        # Calculate the maximum value the next weight can take
        max_next = min(remaining, max_val - min_val)
        # Generate a random weight between min_val and max_next
        weight = min_val + random.uniform(0, max_next)
        remaining -= (weight - min_val)
        
        if remaining < 0:
            # This error will occur if the min is too high and n isn't small enough
            raise ValueError("Not enough value left to distribute among the remaining weights.")
        
        weights[i] += (weight - min_val)

    # Allocate the remaining value to the last weight
    weights[-1] += remaining

    return weights

def random_portfolios(num_portfolios, closing_prices):
    """
    Generates a list of num_portfolios number of random portfolios (each stored in a dataframe) by randomly assigning weights to each stock

    Parameters:
    num_portfolios (int): Number of random portfolios to generate
    closing_prices (pd.DataFrame): Dataframe containing closing prices for each stock

    Returns:
    portfolios (dictionary): Dictionary containing the randomly generated portfolios.
                       Each portfolio is a dataframe containing the stocks' daily values in the portfolio based on their weights.
    expected_returns (dictionary): Dictionary containing the expected returns for each portfolio
    """

    # Remove NaN values from closing prices or we may experience some issues
    closing_prices.dropna(inplace=True)

    portfolios = {}
    expected_returns = {}
    weightings = {}

    # Create the random portfolios, each containing the stocks' daily values based on their weights
    for i in range(num_portfolios):
        weights = random_weights(closing_prices.shape[1], min_weight, max_weight)
        weightings[i] = weights

        investment_per_stock = weights * capital
        # Calculate how many shares to buy (based on the closing price of the first day)
        num_shares = investment_per_stock / closing_prices.iloc[0]

        # Calculate the daily value of each stock in the portfolio
        portfolio = closing_prices * num_shares
        portfolios[i] = portfolio

        # Calculate the expected return of the portfolio
        # Each row in this dataframe is the total value of the portfolio on that day
        total_portfolio_value = portfolio.sum(axis=1)
        # Calculate the returns of the portfolio
        returns = total_portfolio_value.pct_change()

        # TODO: double check expected return calculation
        expected_return = returns.mean()
        expected_returns[i] = expected_return
    
    return portfolios, expected_returns, weightings

In [None]:
num_portfolios = 1000 # Number of random portfolios to generate

# TODO: remove the variable below after testing
final_closings = stock_data.iloc[:, :10] # Get first 10 stocks just for testing

rand_portfolios, expected_returns, weightings = random_portfolios(num_portfolios, final_closings)

# Pick the portfolio with the expected return closest to zero
# Here, we just use the min function to find the smallest absolute value in the dictionary
optimal_portfolio = min(expected_returns, key=lambda x: abs(expected_returns[x]))
print(f"The optimal portfolio has an expected return of about {float(expected_returns[optimal_portfolio]):.15%}.")

optimal_weights_df = pd.DataFrame(weightings[optimal_portfolio], index=final_closings.columns, columns=["Weighting"])
print(f"\nHere are the best weights for each stock:")
display(optimal_weights_df)

print("The optimal portfolio is:")
display(rand_portfolios[optimal_portfolio])

ValueError: operands could not be broadcast together with shapes (7500000,) (10,) 

## Contribution Declaration

The following team members made a meaningful contribution to this assignment:

Insert Names Here.