# Group Assignment - Portfolio Generation

### Team Number: 7
### Team Members: Ashton, Bodhana, Johnson
### Team Strategy: Risky

In [None]:
# Import Necessary Libraries
from IPython.display import display, Math, Latex
import pandas as pd
import yfinance as yf
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
from datetime import datetime

## Global Constants

We start by setting the following global constants:
* CSV file of tickers
* Start and end date interval we will be analyzing to assemble our final portfolio
* End date is also used as the date we generate our final portfolio
* Flat fee applied to each stock (NOT share) purchased
* Number of stocks in our portfolio
* Investment amount

Our final portfolio will consist of 10 stocks, the minimum allowed. The fewer stocks a portfolio holds, the less diversified it is and the less effectively it diversifies away non-systematic risk. In simpler terms, with fewer stocks, losses in one stock are less likely to be offset by gains in another (and vice versa), which makes the portfolio riskier.
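To make the diversification argument concrete, here is a minimal sketch based on the textbook formula for the variance of an equally weighted portfolio. It assumes every stock has the same volatility `sigma` and the same pairwise correlation `rho`; both values and the function name are illustrative and are not estimated from our data.

```python
def equal_weight_portfolio_variance(n, sigma=0.02, rho=0.3):
    """Variance of an equally weighted portfolio of n stocks.

    Assumes identical volatility (sigma) and identical pairwise
    correlation (rho) for all stocks, which real data will not satisfy.
    """
    return sigma ** 2 * (1 / n + (1 - 1 / n) * rho)

# Variance shrinks as more stocks are added, so fewer stocks means more risk
for n in (10, 25, 50):
    print(n, equal_weight_portfolio_variance(n))
```

Under these assumptions the variance falls toward `rho * sigma ** 2` as `n` grows, which is why we hold as few stocks as the rules allow.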

In [None]:
# Ticker CSV file
ticker_file_name = 'Tickers_Example.csv'

# Start and end dates for data analysis
# Start on first trading day in 2022 and end on day of simulation
# Period is reflective of current economic situation
start_date = '2022-01-04'
end_date = '2023-11-25'

# Flat fee of $4.95 CAD charged on every stock in portfolio
flat_fee = 4.95

# Number of stocks chosen (fewer stocks -> less diversification)
num_stocks = 10

# Investment amount in CAD
investment = 750_000 - (flat_fee * num_stocks)

## Summary of Process

First, it is imperative to define what counts as the "riskiest" portfolio. By "riskiest," our group aimed to generate either the extreme highest or lowest percent returns on our initial investment. However, by definition, risk only describes the possibility of loss. Our data may reveal extremely high returns for a stock at one point in time, but under different market conditions the same stock may produce great losses. Hence, we redefined risk for this assignment as **volatility**, the degree of a stock's fluctuation in the market, which covers both gains and losses. This is what we intend to maximize and measure; the greater the volatility, the greater the risk.

To do so, we first minimized diversification by selecting the minimum number of stocks: 10. Then, to select our 10 stocks, we relied on a combination of two metrics: **correlation** and **beta**, whose purposes we will discuss in depth further on.

Our ideal objective is to create a portfolio consisting of 10 stocks that exhibit perfect correlation with one another, as we want all the stocks to move in the same direction. If all the stocks perform well, it would result in massive returns for the investor; however, if they do poorly, the investor could lose a significant amount of money. In other words, we want to avoid any stocks moving in opposite directions, since that would support a diversified, risk-averse portfolio.

Apart from being correlated to one another, we want the selected stocks to be the most volatile relative to the market, but we needed to determine whether market correlation or beta was the best metric to measure this. Overall, in order to figure out whether correlation between stocks, market correlation, beta, or a combined measure would yield the best performance, we designed and tested five strategies on past stock data, before selecting a final strategy based on which one generated the most volatile portfolio.

## Processing Tickers

We generate a list of tickers from the CSV file. The raw tickers are then filtered based on the given requirements:

* Ignore tickers that do not reference a valid stock denominated in either USD or CAD
* Only include stocks in your portfolio that have at least 150,000 shares of average monthly volume, as calculated based on the time interval of January 1, 2023 to October 31, 2023.
* Drop any month that does not have at least 18 trading days
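The volume rule above can be sketched on synthetic data without calling yfinance. The series below is made up (a constant 10,000 shares per business day), and `average_monthly_volume` is a hypothetical helper rather than the function used in this notebook; it only shows how months with too few trading days drop out before averaging.

```python
import pandas as pd

def average_monthly_volume(daily_volume, minimum_days=18):
    """Average monthly volume, ignoring months with too few trading days."""
    monthly = daily_volume.groupby(pd.Grouper(freq="MS")).agg(["count", "sum"])
    full_months = monthly[monthly["count"] >= minimum_days]
    return full_months["sum"].mean()

# Synthetic data: every business day from Jan 2 to Feb 10, 2023
days = pd.bdate_range("2023-01-02", "2023-02-10")
daily_volume = pd.Series(10_000, index=days)

# January has 22 business days and is kept; February has only 8 and is dropped
avg_volume = average_monthly_volume(daily_volume)
print(avg_volume)
```

Only January survives the filter here, so the average equals January's total volume.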

In [None]:
# Get the list of tickers from the CSV file
raw_tickers = pd.read_csv(ticker_file_name, header = None)[0].tolist()
print(raw_tickers)

In [None]:
# Filter the file and return a list of valid stocks

# Start and end dates used for the filter calculations
# yfinance treats `end` as exclusive, so use 2023-11-01 to include October 31
required_start_for_filter = "2023-01-01"
required_end_for_filter = "2023-11-01"

# Minimum trading days and volume
minimum_days = 18
minimum_avg_volume = 150000

# List of valid exchange tickers
valid_us_exchanges = ['NMS', 'BTS', 'IEX', 'NAS', 'ASE', 'PCX', 'NYQ', 'OPR', 'OBB', 'PNK']
valid_can_exchanges = ['CSE', 'NEO', 'TOR', 'VAN']

# Empty list to store the valid tickers
valid_tickers = []

# Consumes a stock ticker and checks whether it is a valid stock based on assignment rules
def validate_stock(stock):

    # Accommodate any tickers that give an error
    try:

        # Look up the ticker inside the try block so failed lookups are caught too
        stock_info = yf.Ticker(stock).fast_info

        # Check if the stock is denominated in USD or CAD and is trading on a valid exchange
        if ((stock_info['currency'] == 'USD' or stock_info['currency'] == 'CAD') and
            (stock_info['exchange'] in valid_us_exchanges or stock_info['exchange'] in valid_can_exchanges)):

            # Get stock's historical data
            stock_hist = yf.Ticker(stock).history(start=required_start_for_filter, end=required_end_for_filter).dropna()

            # Create a dataframe of the number of trading days per month
            monthly_trading_days = stock_hist['Volume'].groupby(pd.Grouper(freq = 'MS')).count()

            # Create a dataframe of the stock's total monthly volume
            monthly_volume = stock_hist['Volume'].groupby(pd.Grouper(freq = 'MS')).sum()

            # Drop the months in the volumes which have less than the minimum amount of trading days
            for month in monthly_trading_days.index:

                num_of_days = monthly_trading_days.loc[month]

                if num_of_days < minimum_days:
                    monthly_volume.drop(month, inplace = True)

            # If the monthly volume requirement is met, add it to the list
            if monthly_volume.mean() >= minimum_avg_volume:
                valid_tickers.append(stock)

            else:
                print(f"{stock} did not meet the volume requirements.")

        else:
            print(f"{stock} is not in a Canadian or U.S. stock market.")

    except Exception:
        print(f"{stock} is not valid.")

for ticker in raw_tickers:
    validate_stock(ticker)

print(f'Valid Tickers: {valid_tickers}')

## Processing Close Prices of Stocks

The close prices for the valid tickers are added to a common dataframe. Since our investment is in CAD, the stocks denominated in USD will be converted to CAD using the exchange rate. This data will be used to calculate the historical percent returns of the stocks.
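As a small illustration of the conversion step, the sketch below multiplies USD close prices by a USD to CAD rate aligned on the same dates. All numbers are made up, and the forward fill for missing exchange-rate dates is an assumption added for this example; the notebook's own conversion function does not do this.

```python
import pandas as pd

# Made-up USD close prices for two trading days
usd_close = pd.Series([100.0, 102.0], index=["2023-01-03", "2023-01-04"])

# Made-up USD -> CAD rates; the FX series has no quote for 2023-01-04
usd_to_cad = pd.Series([1.36, 1.35], index=["2023-01-02", "2023-01-03"])

# Align the rates to the trading days and carry the last known rate forward
aligned_rate = usd_to_cad.reindex(usd_close.index).ffill()
cad_close = usd_close * aligned_rate
print(cad_close)
```

Without the alignment step, a date missing from the exchange-rate series would produce a NaN price for every USD stock on that day.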

In [None]:
# Add close prices to a dataframe

# Consumes a list of stock tickers and an empty dataframe and produces a dataframe of the stocks' close prices
def add_close_prices(valid_tickers, close_price_data):

    for ticker in valid_tickers:
        close_price_data[ticker] = yf.Ticker(ticker).history(start = start_date, end = end_date, interval = '1d')['Close']

    return close_price_data

In [None]:
# Convert close prices of stocks denominated in USD to CAD

# Get USD -> CAD exchange data
exchange_close_data = pd.DataFrame(yf.Ticker("USDCAD=x").history(start = start_date, end = end_date, interval = '1d')['Close'])
exchange_close_data.rename(columns = {'Close':'1 USD = ? CAD'}, inplace = True)

# Remove timestamp from indices and retain date
# Enables comparison between the close price and exchange dataframes since the indices are in the same format
exchange_close_data.index = exchange_close_data.index.strftime('%Y-%m-%d')

# Consumes the close prices and exchange dataframes
# Returns the close prices dataframe with USD converted to CAD where applicable
def convert_USDtoCAD(close_price_data, exchange_close_data):

    for col in close_price_data.columns:

        stock_info = yf.Ticker(col).fast_info

        # Check if the stock is denominated in USD and convert to CAD if so
        if (stock_info['currency'] == 'USD'):
            close_price_data[col] = close_price_data[col] * exchange_close_data['1 USD = ? CAD']

    return close_price_data

In [None]:
# Create a dataframe of the adjusted close prices (USD -> CAD)

# Consumes a list of stock tickers and exchange dataframe and produces a dataframe of the stocks' adjusted close prices
def create_close_price_dataframe(valid_tickers, exchange_close_data):

    close_price_data = pd.DataFrame()

    # Add raw close prices
    close_price_data = add_close_prices(valid_tickers, close_price_data)

    # Normalize index to index of exchange dataframe (same type)
    close_price_data.index = close_price_data.index.strftime('%Y-%m-%d')

    # Extract exchange data for ONLY trading days
    exchange_close_data = exchange_close_data[exchange_close_data.index.isin(close_price_data.index)]

    # Convert stocks denominated in USD to CAD
    convert_USDtoCAD(close_price_data, exchange_close_data)

    return close_price_data

In [None]:
# Generate STOCK CLOSE PRICES dataframe and display
close_price_data = create_close_price_dataframe(valid_tickers, exchange_close_data)
close_price_data.head()

## Processing Percent Returns of Stocks

Using the close prices, the percent returns of each stock are added to a common dataframe to later calculate correlation.

In [None]:
# Create a dataframe of percent returns

# Consumes the close price dataframe and produces a dataframe of the stocks' daily percent returns
def create_stock_return_dataframe(close_price_data):

    stock_return_data = pd.DataFrame()

    for stock in close_price_data.columns:

        # Calculate percent return of stock and add to dataframe
        stock_return_data[stock] = close_price_data[stock].pct_change()

    # Drop day one (NaN value since previous value not in date range to compare)
    stock_return_data.drop(index = stock_return_data.index[0], inplace = True)

    return stock_return_data

In [None]:
# Generate STOCK RETURN dataframe and display
stock_return_data = create_stock_return_dataframe(close_price_data)
stock_return_data.head()

## Optimize Weightings

We started from first principles: to maximize volatility, we should invest as much as allowed in the most volatile stock and then work down the list. After consulting with past CFM 101 students, we arrived at the following code to calculate our weights. It complies with the requirements that no stock exceed 25% of the portfolio (the code applies a stricter cap of 20%) and that each stock receive at least 100/(2n)% of the portfolio, where n is the number of stocks (n = 10, so the minimum is 5%).
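The same idea can be written as a small reusable function. This is a sketch under the assumption that weights are expressed as fractions of 1 and that `n * min_weight <= 1 <= n * max_weight`; the name `greedy_weights` is hypothetical and the 0.20 cap mirrors the value used in the code cell.

```python
def greedy_weights(n, max_weight, min_weight):
    """Give each stock, in order, as much weight as the limits allow."""
    weights = []
    remaining = 1.0
    for i in range(n):
        # Minimum weight that must be reserved for the stocks still to come
        reserve = min_weight * (n - i - 1)
        weight = min(max_weight, remaining - reserve)
        weight = max(weight, min_weight)  # guard against rounding below the floor
        weights.append(weight)
        remaining -= weight
    return weights

weights = greedy_weights(10, max_weight=0.20, min_weight=0.05)
print([round(w, 2) for w in weights])
```

The first stocks receive the cap, one stock absorbs the leftover weight, and every remaining stock receives the minimum.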

In [None]:
# Calculate optimal weightings for 10 stocks in order of volatility

max_weight = 20
min_weight = 100 / (2 * num_stocks)

# Weight already used for prior stocks in list
total_weight = 0

# Empty list to store the optimal weights
optimal_weights_lst = []

# Number of stocks left after one subtracted every time loop runs
stocks_left = num_stocks

while (stocks_left != 0):

    # Weight of next stock on list
    next_weight = min_weight * (stocks_left - 1)
    # Weight left after next weighting is applied
    rem_weight = 100 - next_weight - total_weight

    if (rem_weight > max_weight):
        optimal_weights_lst.append(max_weight / 100)
        total_weight += max_weight
        
    elif (rem_weight > min_weight):
        optimal_weights_lst.append(rem_weight / 100)
        total_weight += rem_weight
        
    else: # rem_weight <= min_weight
        optimal_weights_lst.append(min_weight / 100)
        total_weight += min_weight

    stocks_left -= 1

print(optimal_weights_lst)

## Strategy 0

To select the 10 most volatile stocks, a brute-force solution is not practical. The number of possibilities is x choose 10, where x is the number of candidate stocks. For example, even a relatively small x of 50 yields over 10 billion stock combinations. For a fixed portfolio size of 10, the number of combinations grows as O(x^10), which may be feasible on a dedicated machine but is impractical for the purposes of this assignment. Even an empty Python loop over roughly that many iterations was a lengthy computation.

Hence, we shifted our focus to finding a heuristic: a strategy that is not guaranteed to find the optimal solution but finds a satisfactory one in a reasonable amount of time.
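The size of the brute-force search space can be checked directly with the standard library. The candidate count of 50 is the illustrative figure from the text, not the number of tickers in our file.

```python
import math

candidates = 50       # illustrative number of candidate stocks
portfolio_size = 10   # stocks per portfolio

combinations = math.comb(candidates, portfolio_size)
print(f"{combinations:,}")  # 10,272,278,170
```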

## Strategy 1

This strategy is built on the assumption that given Stock A, all other stocks that demonstrate a high correlation to Stock A have a high correlation amongst themselves.

We consider this to be a fair assumption. In an ideal world, we would invest in a single stock, but the requirements for this assignment do not allow that. Thus, we want all 10 of our stocks to be as close to carbon copies of each other as possible. If the stocks were perfect carbon copies, a portfolio combining the first stock with a second would behave exactly like the first stock alone, and adding a third copy, a fourth, and so on would change nothing.
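How safe the assumption is depends on how high the correlations with Stock A are. Because a correlation matrix must be positive semidefinite, the correlations of B and C with A put a lower bound on the correlation between B and C. The sketch below evaluates that bound; the function name is hypothetical and the inputs are example values, not figures from our data.

```python
import math

def min_implied_correlation(corr_ab, corr_ac):
    """Lowest possible corr(B, C) given corr(A, B) and corr(A, C).

    Follows from the requirement that a 3x3 correlation matrix be
    positive semidefinite.
    """
    return corr_ab * corr_ac - math.sqrt((1 - corr_ab ** 2) * (1 - corr_ac ** 2))

for rho in (0.5, 0.9, 0.99):
    print(rho, round(min_implied_correlation(rho, rho), 4))
```

When both correlations with A are 0.9, B and C must be correlated by at least 0.62, but at 0.5 the bound is negative, so the assumption is only reliable for stocks that track A very closely.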

In [None]:
# Create a dataframe of correlations between 2 stocks or correlations matrix

# Consumes the stock return dataframe and produces a correlations matrix
def create_stock_correl_dataframe(stock_return_data):

    correl_data = pd.DataFrame()

    # Calculate correlation using the pearson method
    # Source: https://www.geeksforgeeks.org/python-pandas-dataframe-corr/
    correl_data = stock_return_data.corr(method = "pearson")

    return correl_data

In [None]:
# Generate CORRELATIONS MATRIX dataframe and display
stock_correl_data = create_stock_correl_dataframe(stock_return_data)
stock_correl_data.head()

In [None]:
# Calculate the average of the negative correlation values
# Large magnitude (close to -1) would indicate that the bottom 10 values should also be considered

neg_num_lst = []

for col in stock_correl_data:
    for val in stock_correl_data[col]:
        # Get negative correlation values and add them to empty list
        if (val < 0):
            neg_num_lst.append(val)

np.mean(neg_num_lst)

We recognize that stocks may be negatively correlated with one another, so the bottom 10 stocks by correlation might constitute a riskier portfolio than the top 10. However, the probability of this occurring is slim, as shown by the small magnitude of the average negative correlation among the randomized stocks in the given CSV file. We therefore assume that the simulation will also use randomized stocks, and we have omitted the redundant implementation for the bottom 10.

This assumption carries over to other correlations calculated in subsequent strategies.

In [None]:
# Select a list of 9 stocks that are best correlated to a singular stock

max_sum = 0
top10_dict = {}

for stock in stock_correl_data.columns:

    # Get the sum of the top 10 correlation values to each stock
    # Subtract one to get rid of the correlation value = 1 with itself
    current_col_sum = stock_correl_data[stock].nlargest(10).sum() - 1

    # If current sum is larger than the max so far
    if current_col_sum > max_sum:
        max_sum = current_col_sum
        # Add the top 10 correlation values for the stock to a list
        top10_dict = stock_correl_data[stock].nlargest(10)

In [None]:
top10_df = pd.DataFrame(top10_dict.items(), columns = ["Ticker", "Correlation"])
top10_df

In [None]:
# Final tickers from Strategy 1
s1_tickers = list(top10_df["Ticker"])
print(s1_tickers)

## Strategy 2

This strategy uses a greedy algorithm. At each step, it finds the stock that correlates the most with the current portfolio, adds it to the portfolio, and then repeats the search against the portfolio that was just created.

This does not work on all data sets because the algorithm can get trapped in a local optimum. For example, suppose there are 5 stocks, where 2 of them are very closely correlated and the other 3 are correlated with one another but not with the pair. When asked for 3 stocks, the algorithm starts with the closely correlated pair and is then forced to add a third, poorly correlated stock, even if the other 3 stocks together would form a better-correlated portfolio.

Despite this limitation, we thought that this could be a reasonable approach because in the real world, the correlations between stocks would be more random and it would be unlikely to have a single pair of closely correlated stocks with the rest moving in the opposite direction.

As depicted in the graph near the end of this file, this algorithm did not generate the most volatile portfolio for 10 stocks. Still, it would be interesting to apply this strategy to larger portfolios. The time complexity is O(n^2) in the number of candidate stocks, so it remains computationally viable.
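The selection loop can also be written iteratively. The sketch below is an illustration on synthetic returns, not the notebook's implementation: `greedy_correlated_portfolio` is a hypothetical name, the data are random numbers, and the portfolio is assumed to be equally weighted at every step.

```python
import numpy as np
import pandas as pd

def greedy_correlated_portfolio(returns, size):
    """Pick `size` tickers by repeatedly adding the stock most correlated
    with the current equally weighted portfolio."""
    corr = returns.corr()

    # Seed with the most correlated pair, ignoring the diagonal of 1s
    values = corr.to_numpy().copy()
    np.fill_diagonal(values, -np.inf)
    i, j = np.unravel_index(np.argmax(values), values.shape)
    chosen = [corr.index[i], corr.columns[j]]

    while len(chosen) < size:
        portfolio = returns[chosen].mean(axis=1)
        candidates = returns.drop(columns=chosen)
        chosen.append(candidates.corrwith(portfolio).idxmax())

    return chosen

# Synthetic returns: A, B, C share a common factor; D and E are pure noise
rng = np.random.default_rng(0)
factor = rng.normal(size=250)
demo_returns = pd.DataFrame({
    "A": factor + 0.1 * rng.normal(size=250),
    "B": factor + 0.1 * rng.normal(size=250),
    "C": factor + 0.1 * rng.normal(size=250),
    "D": rng.normal(size=250),
    "E": rng.normal(size=250),
})

picked = greedy_correlated_portfolio(demo_returns, 3)
print(picked)
```

On this data the three factor-driven stocks are selected and the two noise series are ignored.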

In [None]:
# Find pair with the highest correlation
# Combine pair into one portfolio
# Find the next stock, by taking the one with the most correlation to the portfolio

max_corr = 0

pair = []

stockPortfolio = pd.DataFrame()

for stock in stock_correl_data.columns:
    
    current_col_sum = stock_correl_data[stock].nlargest(2).sum() - 1
    
    if current_col_sum > max_corr:
        max_corr = current_col_sum
        pair = [stock, stock_correl_data[stock].nlargest(2).index[1]]

# Test code
# print(max_corr)
# print(pair)

stocks = stock_return_data

startingPortfolio = stocks[pair[0]] * 0.5 + stocks[pair[1]] * 0.5

curAssetDict = {}
curAssetDict[pair[0]] = True
curAssetDict[pair[1]] = True

def findMax(stocks, curPortfolio, numAssets, curAssetDict):

    # Stop once the portfolio holds the required number of stocks
    if numAssets == num_stocks:
        return curPortfolio

    runningMax = 0

    maxCandidate = pd.DataFrame()

    for stock in stocks.columns:

        # Skip stocks that are already in the portfolio
        if stock in curAssetDict:
            continue

        correlation = stocks[stock].corr(curPortfolio)

        if correlation > runningMax:
            runningMax = correlation
            maxCandidate = stocks[stock]

    # Create equally-weighted portfolio
    curPortfolio = curPortfolio * (numAssets / (numAssets + 1)) + (1 / (numAssets+1)) * maxCandidate

    curAssetDict[maxCandidate.name] = True

    # Return the recursive result so the final portfolio reaches the caller
    return findMax(stocks, curPortfolio, numAssets + 1, curAssetDict)

findMax(stocks, startingPortfolio, 2, curAssetDict)

In [None]:
# Final tickers from Strategy 2
s2_tickers = list(curAssetDict)
print(s2_tickers)

## Strategy 3

We can also determine which stocks move in the same direction by comparing them to the movement of a market index. In this model, the 10 stocks that are most correlated to the market in terms of percent returns will constitute the portfolio.

Because the given CSV file contains more stocks denominated in USD than in CAD, we assumed that the simulation file will follow a similar format. Accordingly, we decided to use the S&P500 index, which is regarded as one of the "best gauges of large U.S. stocks" because of its depth and diversity (according to https://www.investopedia.com/terms/s/sp500.asp).

In [None]:
# Create a dataframe of the chosen market's returns

# Consumes a market index ticker and produces a dataframe of market percent returns
def create_market_return_dataframe(m_index):

    market_return_data = m_index.history(start = start_date, end = end_date, interval = '1d')['Close'].pct_change()

    market_return_data.index = market_return_data.index.strftime('%Y-%m-%d')

    market_return_data.drop(index = market_return_data.index[0], inplace = True)

    return market_return_data

# Create a dataframe of correlations between each stock and the chosen market

# Consumes the market return and stock return dataframes
# Produces a dataframe of their correlation values
def create_market_correl_dataframe(market_return_data, stock_return_data):

    market_correl_data = pd.DataFrame({"Ticker": stock_return_data.columns})
    corr_list = []

    for col in stock_return_data.columns:

        corr_list.append(stock_return_data[col].corr(market_return_data))

    market_correl_data["Correlation"]  = corr_list
    # Sort values from highest to lowest
    market_correl_data.sort_values(by = ['Correlation'], ascending = False, ignore_index = True, inplace = True)

    return market_correl_data

In [None]:
# Select S&P500 as market index
sp500 = yf.Ticker('^GSPC')

market_return_data = create_market_return_dataframe(sp500)
market_correl_data = create_market_correl_dataframe(market_return_data, stock_return_data)

top10_df2 = market_correl_data.head(10)
top10_df2

In [None]:
# Final tickers from Strategy 3
s3_tickers = list(top10_df2["Ticker"])
print(s3_tickers)

## Strategy 4

This approach utilizes beta to determine the volatility of stocks in a given market. While correlation measures the tendency of 2 stocks to move in the same direction, it does not account for the relative size of these directional moves; is one stock losing or gaining value more than the other? Beta, on the other hand, accounts for both direction and relative volatility and can therefore be more insightful.

As with Strategy 3, we select 10 stocks, this time those with the largest beta values, which indicate greater volatility relative to the market. To calculate beta, we use the following formula, where the covariance of the stock's returns $r_i$ and the market's returns $r_M$ is divided by the variance of the market's returns.

\begin{align*}
\beta_i=\frac{\mathrm{Cov}(r_i,r_M)}{\sigma^2(r_M)}
\end{align*}
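As a quick check of the formula, the sketch below computes beta for a made-up stock whose returns are exactly twice the market's returns plus a constant, so its beta should come out as 2. The function name `estimate_beta` and the return values are illustrative only.

```python
import numpy as np

def estimate_beta(stock_returns, market_returns):
    """Sample covariance with the market divided by the market's sample variance."""
    covariance = np.cov(stock_returns, market_returns, ddof=1)[0, 1]
    return covariance / np.var(market_returns, ddof=1)

# Made-up daily returns
market = np.array([0.010, -0.020, 0.015, 0.005, -0.010])
stock = 0.001 + 2 * market

demo_beta = estimate_beta(stock, market)
print(demo_beta)
```

The constant offset does not affect the result, because beta measures only how strongly the stock's returns move with the market's.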


In [None]:
# Create dataframe of the beta of each stock in relation to the chosen market

# Consumes the market return and stock return dataframes
# Produces a dataframe of the stocks' beta values
def create_beta_dataframe(market_return_data, stock_return_data):
    
    # Calculate the variance of the market returns
    market_var = market_return_data.var()
    
    beta_data = pd.DataFrame({"Ticker": stock_return_data.columns})
    
    beta_list = []

    for col in stock_return_data.columns:
        
        # Calculate beta using formula
        beta_list.append((stock_return_data[col].cov(market_return_data)) / market_var)

    beta_data["Beta"]  = beta_list
    
    beta_data.sort_values(by = ['Beta'], ascending = False, ignore_index = True, inplace = True)

    return beta_data

In [None]:
beta_data = create_beta_dataframe(market_return_data, stock_return_data)

top10_df3 = beta_data.head(10)
top10_df3

SHOP.TO has the highest beta, but it can be considered an outlier due to the unique circumstances surrounding the business. It is a Canadian stock, which the S&P500 may not represent well, and the company's valuation has been declining, with constant layoffs contributing to significant fluctuations in its stock price.

Regardless, this strategy is the top contender because it detects SHOP.TO, which is well known as a high-risk stock and which the previous strategies overlooked.

In [None]:
# Final tickers from Strategy 4
s4_tickers = list(top10_df3["Ticker"])
s4_tickers

## Strategy 5

This strategy takes Strategy 3, where we found the 10 stocks most correlated with the S&P500, and combines it with Strategy 4, where we found the 10 stocks with the highest beta. By ensuring that the stocks are correlated with the S&P500 and ordering them by beta, we aim to build a portfolio whose stocks not only move in the same direction but also receive larger weights when their magnitude of change is larger.
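The two-step selection can be illustrated with a tiny table. The tickers and figures below are invented for the example, and a shortlist of 3 is used instead of 10 to keep it readable.

```python
import pandas as pd

# Invented metrics for four hypothetical tickers
metrics = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC", "DDD"],
    "Correlation": [0.90, 0.80, 0.30, 0.85],
    "Beta": [1.1, 1.8, 2.5, 0.9],
})

# Step 1: keep the stocks most correlated with the market
shortlist = metrics.nlargest(3, "Correlation")

# Step 2: order the shortlist by beta so the largest weights go to the highest betas
ordered = shortlist.sort_values("Beta", ascending=False)
order = ordered["Ticker"].tolist()
print(order)
```

CCC has the highest beta but is excluded in the first step because of its weak correlation with the market, which is the main difference from Strategy 4.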

In [None]:
# Reuse the return data of the 10 most correlated stocks from Strategy 3
# Calculate their betas and order them from greatest to least

correlated_stocks_w_beta = create_beta_dataframe(market_return_data, stock_return_data[s3_tickers])
correlated_stocks_w_beta

In [None]:
# Final tickers from Strategy 5
s5_tickers = correlated_stocks_w_beta.Ticker.tolist()
print(s5_tickers)

## Portfolio Generation

Next, we generate a portfolio for each strategy. Each portfolio should contain:

* Ticker
* Price
* Currency
* Shares
* Value
* Weight

In [None]:
# Consumes the list of stocks generated by a strategy, investment, and date, and produces a portfolio dataframe
def generate_portfolio(tickers, investment, date):

    portfolio = pd.DataFrame()

    shares_in_each = []
    price_of_each = []
    investment_lst = []
    currency_lst = []

    for i in range(len(tickers)):

        investment_lst.append((optimal_weights_lst[i] * investment))
        shares_in_each.append(investment_lst[i] / close_price_data.loc[date, tickers[i]])
        price_of_each.append(close_price_data.loc[date , tickers[i]])
        
        stock_info = yf.Ticker(tickers[i]).fast_info
        if (stock_info['currency'] == 'USD'):
            currency_lst.append('USD')
        else:
            currency_lst.append('CAD')

    portfolio['Ticker'] = tickers
    portfolio['Price'] = price_of_each
    portfolio['Currency'] = currency_lst
    portfolio['Shares'] = shares_in_each
    portfolio['Value'] = investment_lst
    portfolio['Weight'] = optimal_weights_lst

    portfolio.index += 1

    return portfolio

In [None]:
# Portfolio for Strategy 1
portfolio1 = generate_portfolio(s1_tickers, investment, start_date)
portfolio1

In [None]:
# Portfolio for Strategy 2
portfolio2 = generate_portfolio(s2_tickers, investment, start_date)
portfolio2

In [None]:
# Portfolio for Strategy 3
portfolio3 = generate_portfolio(s3_tickers, investment, start_date)
portfolio3

In [None]:
# Portfolio for Strategy 4
portfolio4 = generate_portfolio(s4_tickers, investment, start_date)
portfolio4

In [None]:
# Portfolio for Strategy 5
portfolio5 = generate_portfolio(s5_tickers, investment, start_date)
portfolio5

## Comparing Strategies

Using the portfolios from above, we next generated dataframes to track the growth of the investment value over time (the date range given by the start and end dates specified as global constants), as if we had invested at a past date. By analyzing this data using visual (graph) and statistical (standard deviation) metrics, we can identify the portfolio with the greatest volatility and therefore the greatest risk; this will be the final portfolio we choose for the simulation.
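The statistical comparison comes down to the standard deviation of daily percent returns of each portfolio's total value. The sketch below shows the calculation on two invented value series using only the standard library; `daily_return_volatility` is a hypothetical helper, not the function defined in this notebook.

```python
from statistics import stdev

def daily_return_volatility(values):
    """Sample standard deviation of day-over-day percent returns."""
    returns = [curr / prev - 1 for prev, curr in zip(values, values[1:])]
    return stdev(returns)

# Invented portfolio values: one swings by 10% a day, the other grows 1% a day
volatile_values = [100, 110, 99, 108.9]
steady_values = [100, 101, 102.01, 103.0301]

volatile = daily_return_volatility(volatile_values)
steady = daily_return_volatility(steady_values)
print(volatile, steady)
```

Both series end higher than they started, but only the first has a large standard deviation, which is the property we are selecting for.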

In [None]:
# Calculate total shares and value of each portfolio over an interval

# Consumes the list of stocks generated by a strategy, close price data dataframe, and the strategy's portfolio
# Produces dataframe of historical test data if the investment had been made at an earlier date with the given strategy
def create_test_portfolio(tickers, close_price_data, portfolio_df):

    test_portfolio_df = pd.DataFrame()
    # Portfolio dataframe is indexed from 1, in the same order as the tickers
    for i, t in enumerate(tickers, start = 1):
        test_portfolio_df[t] = close_price_data[t] * portfolio_df["Shares"][i]

    test_portfolio_df["Total Value"] =  test_portfolio_df.sum(axis = 1)

    return test_portfolio_df

In [None]:
# Test Portfolio for Strategy 1
test_portfolio1 = create_test_portfolio(s1_tickers, close_price_data, portfolio1)
test_portfolio1.head()

In [None]:
# Test Portfolio for Strategy 2
test_portfolio2 = create_test_portfolio(s2_tickers, close_price_data, portfolio2)
test_portfolio2.head()

In [None]:
# Test Portfolio for Strategy 3
test_portfolio3 = create_test_portfolio(s3_tickers, close_price_data, portfolio3)
test_portfolio3.head()

In [None]:
# Test Portfolio for Strategy 4
test_portfolio4 = create_test_portfolio(s4_tickers, close_price_data, portfolio4)
test_portfolio4.head()

In [None]:
# Test Portfolio for Strategy 5
test_portfolio5 = create_test_portfolio(s5_tickers, close_price_data, portfolio5)
test_portfolio5.head()

In [None]:
# Graph the test portfolios

plt.figure(figsize = (30, 5))

# Strategy 1
plt.plot(test_portfolio1.index, test_portfolio1['Total Value'], label='S1: Correlation Between 2 Stocks Portfolio')

# Strategy 2
plt.plot(test_portfolio2.index, test_portfolio2['Total Value'], label='S2: Greedy Portfolio')

# Strategy 3
plt.plot(test_portfolio3.index, test_portfolio3['Total Value'], label='S3: Correlation Between Stock and S&P500 Portfolio')

# Strategy 4
plt.plot(test_portfolio4.index, test_portfolio4['Total Value'], label = 'S4: Beta Portfolio')

# Strategy 5
plt.plot(test_portfolio5.index, test_portfolio5['Total Value'], label = 'S5: Correlation Between Stock and S&P500, Weighted by Beta')

locator = mdate.MonthLocator()
plt.gca().xaxis.set_major_locator(locator)

plt.title('Comparison of Portfolios')
plt.xlabel('Dates')
plt.ylabel('Value')

plt.legend()
plt.show()

In [None]:
# Calculate the standard deviation of each portfolio's percent returns

test_portfolios = [test_portfolio1, test_portfolio2, test_portfolio3, test_portfolio4, test_portfolio5]

# Consumes a test portfolio
# Returns the standard deviation of the percent returns of the total value of the investment over time
def get_test_percent_returns(test_portfolio):

    # Create a dataframe with the portfolio's total value
    pct_returns = pd.DataFrame()

    # Use the function's own argument rather than the global loop variable
    pct_returns['Value'] = test_portfolio['Total Value']

    # Add a column for percent returns of the total value
    pct_returns['% Returns'] = test_portfolio['Total Value'].pct_change()

    pct_returns.drop(index = pct_returns.index[0], inplace = True)

    # Calculate standard deviation of percent returns
    std_dev_of_returns = pct_returns['% Returns'].std()

    return std_dev_of_returns

pNum = 0

for portfolio in test_portfolios:
    pNum += 1
    print(f'Test Portfolio {pNum} Standard Deviation of % Returns: {(get_test_percent_returns(portfolio))}')

## Final Strategy: Strategy 4: Beta

It is evident from the graph that Strategy 4 had the highest volatility: its portfolio dropped in value the most from the initial investment and displayed several abrupt dips, despite trending in the same direction as the other portfolios. Volatility can be quantified using the standard deviation of percent returns (variance could also be used), with a higher standard deviation indicating greater volatility and hence risk. Strategy 4 also had the highest standard deviation, 0.0386, further cementing it as the "riskiest" strategy of the five tested.

## Final Portfolio Creation

In [None]:
# Our final portfolio, using strategy 4
# Regenerate with prices from the last trading day in the data
# (end_date, "2023-11-25", is not a trading day and yfinance excludes `end`, so it is not in the index)
Portfolio_Final = generate_portfolio(s4_tickers, investment, close_price_data.index[-1])

# Verifying the total and sum of weights
final_portfolio_total = round(Portfolio_Final.Value.sum() + (flat_fee * num_stocks))
final_portfolio_weights = round(Portfolio_Final.Weight.sum())

print(f'The total value of the portfolio is ${final_portfolio_total}')
print(f'The sum of the weights is: {final_portfolio_weights}')

In [None]:
Portfolio_Final

In [None]:
# Create the "Stocks_Final" dataframe
Stocks_Final = pd.DataFrame()
Stocks_Final['Ticker'] = Portfolio_Final['Ticker']
Stocks_Final['Shares'] = Portfolio_Final['Shares']

# Output the final stocks and the number of shares to a CSV
Stocks_Final.to_csv("Stocks_Group_7.csv")

## Contribution Declaration

The following team members made a meaningful contribution to this assignment:

Ashton

Bodhana

Johnson