In [1]:
#These are the libraries you can use.  You may add any libraries directly related to threading if this is a direction
#you wish to go (this is not from the course, so it's entirely on you if you wish to use threading).  Any
#further libraries you wish to use you must email me, james@uwaterloo.ca, for permission.

from IPython.display import display, Math, Latex

import pandas as pd
import numpy as np
import numpy_financial as npf
import yfinance as yf
import matplotlib.pyplot as plt
import random
import datetime  # module import; later cells call datetime.timedelta
from scipy.optimize import minimize
import time
from math import comb

## Group Assignment
### Team Number: 12
### Team Member Names: Sharuga, Derek, Alex
### Team Strategy Chosen: Market Meet

Disclose any use of AI for this assignment below (detail where and how you used it).  Please see the course outline for acceptable uses of AI.

#### Strategy
We chose the safe option as our strategy. Our strategy makes use of two main principles in portfolio management. The first is that a portfolio of stocks with low correlation to each other has a lower overall standard deviation of returns than its individual stocks would suggest. This is because the volatility inherent to each stock, better known as non-systematic risk, partially “cancels out” across stocks. The second principle is that given a set of stocks, there exists a set of weights (within minimum/maximum bounds) that creates an expected return matching the market’s return as closely as the bounds allow. Given the weights to achieve that expected return, a tracking error can be calculated.
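
As a rough illustration of the first principle, the sketch below applies the standard two-asset portfolio variance formula. The function name `two_asset_volatility` and the volatility and correlation figures are made up for illustration and do not come from our data.

```python
import math

def two_asset_volatility(w1, sigma1, sigma2, rho):
    """Standard deviation of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    variance = (
        (w1 * sigma1) ** 2
        + (w2 * sigma2) ** 2
        + 2 * w1 * w2 * rho * sigma1 * sigma2
    )
    return math.sqrt(variance)

# Hypothetical inputs: two stocks with 2% daily volatility each, equally weighted
vol_high_corr = two_asset_volatility(0.5, 0.02, 0.02, rho=0.9)
vol_low_corr = two_asset_volatility(0.5, 0.02, 0.02, rho=0.1)

print(f"rho=0.9: {vol_high_corr:.4f}, rho=0.1: {vol_low_corr:.4f}")
```

With equal weights and equal volatilities, lowering the correlation lowers the portfolio volatility, and in both cases it stays below the 2% volatility of either stock alone.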
Generating low-correlated portfolios is done as follows: correlation is measured between pairs of stocks, so given a correlation matrix we repeatedly pick the pair of tickers with the lowest correlation, removing those tickers from the matrix once they are selected. With 24 tickers this generates a list of 12 pairs of low-correlated stocks. We then pick 5 random combinations for each of 6, 7, 8, 9, 10, and 11 pairs, and one combination for 12 pairs. This gives us a healthy pool of portfolios of different sizes to work with without being too computationally intensive, because it results in a maximum of 31 (5 samples * 6 portfolio sizes + 1) portfolios to test.
We then optimize the weights of each low-correlation portfolio to minimize its tracking error against the market. Because the optimization changes the portfolio weights, it also changes the portfolio’s expected mean return. So we compare each optimized portfolio’s expected mean return with the market mean and look for the smallest discrepancy. That gives us a low-correlation portfolio (indicating generally low portfolio standard deviation) optimized for the lowest tracking error, with the closest average return to the market mean. The portfolio that meets the market best on these factors is the one we return.
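
To make the two selection metrics concrete, here is a minimal sketch on made-up return series. It uses the population standard deviation of the return differences as the tracking error, which matches the `np.std` default used in our objective; the helper names and the numbers are hypothetical.

```python
from statistics import fmean, pstdev

# Made-up daily returns for the market and two already-optimized candidate portfolios
market = [0.010, -0.005, 0.007, 0.002, -0.003]
candidates = {
    "A": [0.012, -0.004, 0.006, 0.003, -0.002],
    "B": [0.009, -0.006, 0.008, 0.002, -0.002],
}

def tracking_error(portfolio, benchmark):
    """Population standard deviation of the daily return differences."""
    return pstdev([p - b for p, b in zip(portfolio, benchmark)])

def mean_gap(portfolio, benchmark):
    """Absolute difference between the average portfolio and benchmark returns."""
    return abs(fmean(portfolio) - fmean(benchmark))

# Final selection step: the candidate whose mean return is closest to the market's
best = min(candidates, key=lambda name: mean_gap(candidates[name], market))
print(best, tracking_error(candidates[best], market))
```

In this toy data, candidate B has the same average return as the market, so it is selected even though neither candidate tracks the market perfectly day to day.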


#### Function Dictionary
validity(tickers) takes a list of tickers and filters for all the valid tickers in the list.
<HELPERS>
> check_delist(ticker) returns true if the ticker is delisted and false otherwise.

> check_currency(ticker) returns true if the currency of the ticker is either CAD or USD, false otherwise.

> check_volume(ticker) returns true if the stock has an average monthly volume of at least 100,000 shares, false otherwise.

> check_first_day(ticker) returns true if the stock has a listed value on the first day. False otherwise.

> check_last_day(ticker) returns true if the stock has a listed value on the last day. False otherwise.

> check_target_day(ticker) returns true if the stock has a listed value on the target day. False otherwise.

portfolio_generator() takes no arguments and works on the portfolios produced by make_port_list.
For each portfolio it optimizes the weights, within the minimum/maximum bounds, to minimize tracking error against the market,
fetches the close price of each stock on November 22 to use as its buy price,
and computes the number of shares to purchase with all fees taken into account. It returns a dataframe of all the
portfolios with their respective weights; the portfolio whose mean return is closest to the market's mean return
is then chosen as our final choice of tickers.

days_to_drop(markets) takes a dataframe of index prices and returns the dates that fall in any month with fewer than 18 trading days, so that those dates can be dropped.

std_rank(returns) takes a dataframe of stock returns, and returns the 24 tickers of that stock return dataframe with the lowest standard deviation of percent returns.

yfin_pull_convert_USD(ticker,start,end) takes a yf.Ticker object and returns the close prices of that ticker between the start and end date, converted to CAD if listed in USD.

gen_tickers(tlist,ilist) takes a list of tickers and a list of index tickers, generating the exchange-rate-adjusted stock values over a time period of all the tickers.

convert_pct_returns(stock_value_df) takes a dataframe of stock values, interpolates any missing values, converts the values into percent returns, and adds a column with the simple average of the index returns.

correl(data, dropvalue) takes a dataframe of percent returns, drops the columns specified in dropvalue (going to be the index tickers) and creates a correlation matrix based on the remaining columns.

low_correl_ticker_pairs(correlation_matrix) takes a correlation matrix and returns the pair of tickers with the lowest correlation.

corr_pair_extract(correl_matrix) takes a correlation matrix and continuously runs low_correl_ticker_pairs, removing the ticker pairs from the matrix along the way. 

make_port_list(stock_pairs) takes a list of stock pairs and, for each portfolio size from 6 pairs up to 12 (or however many pairs are available), randomly chooses up to 5 samples of that many pairs as candidate portfolios.

portfolio_generator() optimizes each of the portfolios generated from make_port_list for lowest tracking error. This is so we can then compare the mean returns and find the portfolio with the lowest difference between its mean return and the market mean return. USED AI.
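
The weight bounds used by portfolio_generator() can be sanity-checked without running the optimizer. A minimal sketch, assuming the same bounds as the code (a floor of 1/(2n) and a cap of 15% per stock); `weight_bounds` is a hypothetical helper and not part of the notebook.

```python
def weight_bounds(n_stocks, max_weight=0.15):
    """Return (lower, upper, feasible) for a portfolio of n_stocks.

    The weights can sum to one only if the sum of the lower bounds is at
    most one and the sum of the upper bounds is at least one.
    """
    lower = 1 / (2 * n_stocks)
    feasible = n_stocks * lower <= 1 <= n_stocks * max_weight
    return lower, max_weight, feasible

for n in (6, 12, 16, 24):
    print(n, weight_bounds(n))
```

Our smallest portfolios hold 12 stocks, so the constraint that the weights sum to one is always satisfiable under these bounds; a 6-stock portfolio would not be, since 6 * 0.15 is below 1.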

In [2]:
#October 01, 2023 to September 30, 2024

START_DATE = pd.to_datetime('2023-10-01')
END_DATE = pd.to_datetime('2024-10-01')

# date on which we will take stock Close prices
TARGET_DATE = pd.to_datetime('2024-11-22')

INVESTMENT = 1e6

CSV_FILE = "Tickers_Example.csv"

#Arbitrarily assign a seed to keep results consistent
random.seed(123)

In [3]:
# read in CSV
csv = pd.read_csv(CSV_FILE, header=None)

In [4]:
# filter out invalid stocks
def validity(tickers):

    # list to keep all of the valid stocks
    final_list = []

    for ticker in tickers:
        #only append the stock to the final list if it isn't delisted, trades in USD or CAD, has data on the first, last, and target days, and meets the monthly volume requirement
        if (not check_delist(ticker) and
            check_currency(ticker) and
            check_first_day(ticker) and
            check_last_day(ticker) and
            check_target_day(ticker) and
            check_volume(ticker)):
                final_list.append(ticker)
    return final_list

# checks if the stock has price data on the target day (TARGET_DATE)
def check_target_day(ticker):
    stock = yf.Ticker(str(ticker))
    data = stock.history(start=TARGET_DATE, period='1d')
    # time.sleep(0.3)
    if data.empty:
        return False
    return True

# checks if the stock has price data on the last day of the period (the day before END_DATE)
def check_last_day(ticker):
    stock = yf.Ticker(str(ticker))
    data = stock.history(start=END_DATE-datetime.timedelta(days=1), period='1d')
    # time.sleep(0.3)
    if data.empty:
        return False
    return True

# checks if the stock exists on the first day of the period
def check_first_day(ticker):
    stock = yf.Ticker(str(ticker))
    data = stock.history(start=START_DATE, period='1d')
    # time.sleep(0.3)
    if data.empty:
        return False
    return True

# checks if the consumed Ticker is delisted
def check_delist(ticker):
    stock = yf.Ticker(str(ticker))
    try:
        data = stock.history(period='1d')
        # time.sleep(0.3)
        if data.empty:
            #if we can't find any data on the stock, it's delisted
            return True
        else:
            #check that there is actually valid market data for this stock
            if 'Close' not in data.columns or data['Close'].isnull().all():
                return True
            return False
    except Exception as e:
        #if there is an error in finding the stock's data, we can assume that it's delisted
        return True

# checks if the consumed Ticker meets the requirement of an average monthly share volume of 100,000 shares
def check_volume(ticker):
    volume = yf.Ticker(str(ticker)).history(start=START_DATE, end=END_DATE)['Volume']
    time.sleep(0.3)
    avg_monthly_volume = volume.resample('ME').mean()
    return avg_monthly_volume.mean() >= 100000

# checks whether or not the consumed Ticker is listed in CAD or in USD
def check_currency(ticker):
    stock = yf.Ticker(str(ticker))
    currency = stock.fast_info.get('currency')
    return currency in ['USD', 'CAD']

# create a new DataFrame only containing the valid Tickers in the consumed CSV
tickers_raw = validity(csv[0])

$AGN: possibly delisted; no price data found  (period=1d) (Yahoo error = "No data found, symbol may be delisted")
$CELG: possibly delisted; no price data found  (period=1d) (Yahoo error = "No data found, symbol may be delisted")
$MON: possibly delisted; no price data found  (period=1d) (Yahoo error = "No data found, symbol may be delisted")
$RTN: possibly delisted; no price data found  (period=1d) (Yahoo error = "No data found, symbol may be delisted")


In [5]:
# finds all the dates that fall in months with fewer than 18 trading days
def days_to_drop(markets):

    markets.dropna(inplace=True)
    months = markets.resample('MS').bfill().index
    
    dates = []

    for month in months:
        temp = markets[(markets.index.month==month.month) & (markets.index.year==month.year)].index
        if len(temp) < 18:
            dates.extend(temp)
    
    return dates
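
The same month filter can be sketched without pandas or market data. This is a simplified stand-in for days_to_drop, assuming a plain list of trading dates as input; the function name and the weekday-only calendar are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def short_month_dates(trading_days, min_days=18):
    """Return every date that falls in a month with fewer than min_days trading days."""
    by_month = defaultdict(list)
    for day in trading_days:
        by_month[(day.year, day.month)].append(day)
    return [day for days in by_month.values() if len(days) < min_days for day in days]

# Hypothetical calendar: every weekday from 2024-01-01 through 2024-02-14
start = date(2024, 1, 1)
weekdays = [
    start + timedelta(days=n)
    for n in range(45)
    if (start + timedelta(days=n)).weekday() < 5
]

dropped = short_month_dates(weekdays)
print(len(dropped), dropped[0], dropped[-1])
```

January has enough weekdays to be kept, while the partial February has only 10 and is dropped in full.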

In [6]:
tickers_raw_df = pd.DataFrame()

# calculate and store daily returns for all Tickers
for ticker in tickers_raw:
    data = yf.Ticker(ticker).history(start=START_DATE, end=END_DATE)['Close'].pct_change()
    time.sleep(0.3)
    data.drop(index=data.index[0], inplace=True)
    tickers_raw_df[ticker] = data  # this code calculates daily returns

tickers_raw_df.index = pd.to_datetime(tickers_raw_df.index.strftime("%Y-%m-%d"))

# tickers_raw_df

In [7]:
def std_rank(returns):
    std_devs = returns.std()
    
    # Create a DataFrame with tickers and their respective standard deviations
    ranked_stocks = pd.DataFrame({
        "ticker": std_devs.index,
        "STD": std_devs.values
    })
    
    # Sort by standard deviation in ascending order (lowest STD at the top)
    ranked_stocks = ranked_stocks.sort_values(by="STD", ascending=True)
    
    return ranked_stocks['ticker'].values[:24].tolist()

ticker_lst = std_rank(tickers_raw_df)
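
For readers without the data at hand, the ranking step can be reproduced on a toy dictionary of returns. This sketch uses the sample standard deviation from the standard library, which matches the pandas default; the tickers and numbers are invented.

```python
from statistics import stdev

def lowest_volatility(returns_by_ticker, k):
    """Return the k tickers with the smallest sample standard deviation of returns."""
    ranked = sorted(returns_by_ticker, key=lambda t: stdev(returns_by_ticker[t]))
    return ranked[:k]

# Invented daily returns for three hypothetical tickers
toy_returns = {
    "X": [0.01, -0.01, 0.02],
    "Y": [0.001, 0.002, 0.001],
    "Z": [0.05, -0.04, 0.03],
}

print(lowest_volatility(toy_returns, 2))
```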

In [8]:
index_lst = ["XIU.TO","^GSPC"]

exch_rate = yf.download("CADUSD=x",start=TARGET_DATE, period='1d')["Close"].iloc[0] #USD per 1 CAD, first rate available from TARGET_DATE (November 22nd)

#Takes a yf.Ticker "ticker", checks if the ticker is in canadian dollars. If so, it will just do a simple API pull for the price history.
#If the price is listed in USD, a conversion operation will simply be applied to each price.
def yfin_pull_convert_USD(ticker, start, end):
    listed_currency = ticker.fast_info.get('currency')
    if listed_currency == "CAD":
        temp = ticker.history(start=start, end=end, interval="1d")["Close"]
        temp.index = pd.to_datetime(temp.index.strftime('%Y-%m-%d'))
        return temp
    elif listed_currency == "USD":
        temp = ticker.history(start=start, end=end, interval="1d")["Close"] * (1/exch_rate)
        temp.index = pd.to_datetime(temp.index.strftime('%Y-%m-%d'))
        return temp
    raise ValueError("Currency not in CAD or USD")

#Test cases
#display(yf.Ticker("NVDA").history(start=START_DATE, end=END_DATE, interval="1d")["Close"]
#yfin_pull_convert_USD(yf.Ticker("NVDA"))

#Main function to store the stock prices of the ticker
def gen_tickers(tlist, ilist):
    ret_dataframe = pd.DataFrame()
    check_month_dataframe = pd.DataFrame()
    for i_str in ilist:
        index_ticker = yf.Ticker(i_str)
        check_month_dataframe[i_str] = yfin_pull_convert_USD(index_ticker, start=START_DATE, end=END_DATE)
        ret_dataframe[i_str] = yfin_pull_convert_USD(index_ticker, start=START_DATE, end=END_DATE)
    
    for t_str in tlist:
        ticker = yf.Ticker(t_str)
        time.sleep(0.3)
        ret_dataframe[t_str] = yfin_pull_convert_USD(ticker, start=START_DATE, end=END_DATE)
    ret_dataframe.drop(days_to_drop(check_month_dataframe), inplace=True)
    return ret_dataframe

#Function call: stores a dataframe of index values and stock values
stock_values = gen_tickers(ticker_lst,index_lst)
stock_values.index = pd.to_datetime(stock_values.index.strftime('%Y-%m-%d'))
# display(stock_values)

[*********************100%***********************]  1 of 1 completed
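
The conversion rule in yfin_pull_convert_USD reduces to one division: a price quoted in USD is divided by the CADUSD rate (USD per 1 CAD) to express it in CAD. A minimal sketch on a single price, where `to_cad` and the 0.72 rate are hypothetical values chosen for illustration.

```python
def to_cad(price, currency, cad_usd_rate):
    """Express a price in CAD, given the CADUSD rate (USD per 1 CAD)."""
    if currency == "CAD":
        return price
    if currency == "USD":
        return price / cad_usd_rate
    raise ValueError("Currency not in CAD or USD")

# Hypothetical rate: 1 CAD buys 0.72 USD
rate = 0.72
print(to_cad(50.0, "CAD", rate))
print(round(to_cad(100.0, "USD", rate), 2))
```

Because the notebook applies a single rate taken at the target date to the whole price history, historical USD prices are rescaled by a constant, which leaves their percent returns unchanged.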


In [9]:
market_label = "Market Returns"
#Function that transforms a list of values into a list of percent returns.
#Also adds a column of the simple average returns of the index tickers.
def convert_pct_returns(stock_value_df):
    ret_dataframe = stock_value_df.interpolate().pct_change()
    ret_dataframe.dropna(inplace=True)
    ret_dataframe[market_label] = ret_dataframe[index_lst].mean(axis=1)
    df_col_order = index_lst + ["Market Returns"] + ticker_lst
    ret_dataframe = ret_dataframe[df_col_order]
    return ret_dataframe

stock_returns = convert_pct_returns(stock_values)
# display(stock_returns)
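
The blended market return is the simple average of the two index returns on each day. The sketch below shows that arithmetic on made-up price paths; it leaves out the interpolation step and the pandas machinery, and `pct_returns` is a hypothetical helper.

```python
def pct_returns(prices):
    """Simple period-over-period returns for a list of prices."""
    return [later / earlier - 1 for earlier, later in zip(prices, prices[1:])]

# Made-up price paths for two benchmark indices
index_a = [100.0, 101.0, 102.01]
index_b = [200.0, 198.0, 198.0]

returns_a = pct_returns(index_a)
returns_b = pct_returns(index_b)

# Blended market return: average of the two index returns on each day
market = [(a + b) / 2 for a, b in zip(returns_a, returns_b)]
print([round(r, 4) for r in market])
```

On the first day a 1% gain in one index offsets a 1% loss in the other, so the blended return is approximately zero.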

In [10]:
#Creating the correlation matrix: takes a dataframe of returns and a list of column labels to drop before computing the correlation matrix
def correl(data, dropvalue):
    data_marketdrop = data.drop(labels=dropvalue, axis=1)
    ret_corr = data_marketdrop.corr()
    return ret_corr

stock_correlations = correl(stock_returns, [market_label] + index_lst)
#Apply a format to the DataFrame
# display(stock_correlations.style.background_gradient(cmap='RdYlGn_r'))


In [11]:
#finding the ticker coordinates of the lowest correlation value (AI)
def low_correl_ticker_pairs(correlation_matrix):
    if correlation_matrix.shape[0] > 1:
        correl_pairs = correlation_matrix.unstack()
        correl_pairs = correl_pairs[correl_pairs.index.get_level_values(0) != correl_pairs.index.get_level_values(1)]
        lowest_corr_pair = correl_pairs.idxmin()
        lowest_corr_value = correl_pairs.min()
        return list(lowest_corr_pair)
    else:
        remaining_column_ticker = correlation_matrix.columns[0]
        return [remaining_column_ticker]

#iterating through the entire correlation matrix to extract every correlation pair
def corr_pair_extract(correl_matrix):
    #creating a list to store low-correlation pairs into
    low_corr_pairs = []
    while correl_matrix.shape[0] > 0:
        pair = low_correl_ticker_pairs(correl_matrix)
        #low_correl_ticker_pairs always returns a list; a single leftover ticker has no partner, so it is dropped without being stored
        if len(pair) == 1:
            correl_matrix = correl_matrix.drop(pair, axis=0)
            correl_matrix = correl_matrix.drop(pair, axis=1)
        else: 
            #removing the tickers after they are packaged into a correlation pair
            low_corr_pairs.append(pair)
            correl_matrix = correl_matrix.drop(pair, axis=0)
            correl_matrix = correl_matrix.drop(pair, axis=1)
    return low_corr_pairs

#running the correlation pairing function and storing it in variable "stock pairs"
stock_pairs = corr_pair_extract(stock_correlations)
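
The greedy pairing can be traced by hand on a four-ticker example. This is a plain-Python sketch of the same idea rather than the pandas code above: correlations are stored in a dictionary keyed by ticker pair, and the tickers and values are invented.

```python
def greedy_low_corr_pairs(corr):
    """Repeatedly take the lowest-correlation pair and remove both tickers.

    corr maps frozenset({ticker_a, ticker_b}) to their correlation.
    """
    remaining = dict(corr)
    pairs = []
    while remaining:
        best = min(remaining, key=remaining.get)
        pairs.append(sorted(best))
        # Drop every entry that involves either ticker of the chosen pair
        remaining = {pair: c for pair, c in remaining.items() if not (pair & best)}
    return pairs

# Invented correlations between four hypothetical tickers
toy_corr = {
    frozenset({"A", "B"}): 0.80,
    frozenset({"A", "C"}): 0.10,
    frozenset({"A", "D"}): 0.50,
    frozenset({"B", "C"}): 0.40,
    frozenset({"B", "D"}): 0.30,
    frozenset({"C", "D"}): 0.60,
}

pairs = greedy_low_corr_pairs(toy_corr)
print(pairs)
```

The first pick is A and C (0.10); once both are removed, only B and D remain as a pair. Note that a greedy pass like this does not guarantee the lowest total correlation across all pairs in general.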

In [12]:
#Builds candidate portfolios of 12 to 24 stocks (6 to 12 pairs) by randomly sampling stock pairs, with up to 5 portfolios per size.
def make_port_list(stock_pairs):
    ret_list = []
    #iterates through the minimum selectable stock pairs (6) and the maximum (length of the stock pairs)
    for i in range(6, len(stock_pairs)+1):
        #simple iteration, of 5 times
        for j in range(min(comb(len(stock_pairs), i), 5)):
            #generates a random list of indexes to pick from the stock pairs
            rand_list = random.sample(range(0,len(stock_pairs)), i)
            portfolio = []
            for k in rand_list:
                portfolio += stock_pairs[k]
            ret_list.append(portfolio)
    return ret_list

portfolio_list = make_port_list(stock_pairs)
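
The number of candidate portfolios follows directly from the loop bounds in make_port_list. A small sketch of that count, assuming the same minimum of 6 pairs and cap of 5 samples per size; `portfolio_count` is a hypothetical helper.

```python
from math import comb

def portfolio_count(n_pairs, min_pairs=6, samples_per_size=5):
    """Number of portfolios generated for a given number of available pairs."""
    return sum(
        min(comb(n_pairs, size), samples_per_size)
        for size in range(min_pairs, n_pairs + 1)
    )

print(portfolio_count(12))
```

With 12 pairs there are 5 samples for each of the six sizes from 6 to 11 and a single way to choose all 12, for 31 portfolios in total. With fewer than 6 pairs no portfolio is generated.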

In [13]:
# Equality constraint for the optimizer: returns 0 when the weights add up to one
def sum_weights(weights):
    return sum(weights) - 1

# Generates multiple portfolios out of 2d list of tickers given
def portfolio_generator():
    portfolio_data = []

    for portfolio in portfolio_list:
        tickers = portfolio

        # Evenly weigh each stock for now
        initial_weights = [1 / len(tickers)] * len(tickers)

        constraints = [{'type':'eq',
                        'fun': sum_weights}]

        # Sets upper and lower boundaries on the weighting
        bounds = [(1 / (2 * len(tickers)), 0.15)] * len(tickers)

        # Set the objective to minimize tracking error of the portfolio
        def objective(weights):
            return np.std(np.dot(stock_returns[tickers].values, weights) - stock_returns['Market Returns'].values)

        # Minimize tracking error
        result = minimize(objective, initial_weights, constraints=constraints, bounds=bounds)

        if result.success:
            weights = result.x

            # Calculate the amount of money spent on each stock
            portfolio_value = INVESTMENT * weights
            
            # Fetch the TARGET_DATE close price (in CAD) of each stock once and reuse it below
            prices = np.array([yfin_pull_convert_USD(yf.Ticker(ticker), start=TARGET_DATE, end=TARGET_DATE+datetime.timedelta(days=1)).values[0] for ticker in tickers])

            # Calculate the amount of each share to purchase and store in an array
            shares = portfolio_value / prices
            
            # Calculate the fees for each stock and store them in an array
            fees = np.minimum(3.95, 0.001 * shares)

            portfolio_value -= fees

            # Recalculate shares to purchase after subtracting the fees from the amount to invest into each stock
            shares = portfolio_value / prices

            portfolio_data.append({
                'tickers': tickers,  # Tickers in the portfolio
                'weights': weights,  # Optimized weights
                'Tracking Error': result.fun,  # Standard deviation (risk)
                'Difference of Mean Returns': abs(np.dot(stock_returns[tickers].values, weights).mean() - stock_returns['Market Returns'].values.mean()),
                'num_stocks': len(portfolio),  # Number of stocks in the portfolio
                'total_cost': portfolio_value.sum() + sum(fees),  # Total cost after fees
                'shares_purchased': shares,  # Shares purchased
                'portfolio_value': portfolio_value.sum(),  # Total portfolio value after fees
                'final_fees': sum(fees)
            })

    return pd.DataFrame(portfolio_data)
            

portfolio_df = portfolio_generator()
# portfolio_df['weights'].iloc[0]

# display(portfolio_df)
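
The fee adjustment inside portfolio_generator() can be followed for a single stock. A minimal sketch, assuming the fee rule as written in the code above (the smaller of a flat 3.95 or 0.001 per share); `shares_after_fees` is a hypothetical helper, and the budget and price are example figures.

```python
def shares_after_fees(budget, price, flat_fee=3.95, per_share_fee=0.001):
    """Return (shares, fee) for one stock after taking the trading fee out of the budget."""
    provisional_shares = budget / price
    # Fee is estimated on the provisional share count, as in the notebook
    fee = min(flat_fee, per_share_fee * provisional_shares)
    shares = (budget - fee) / price
    return shares, fee

# Example: a 3.125% slice of a 1,000,000 budget in a stock priced at 21.26
shares, fee = shares_after_fees(31250.0, 21.26)
print(round(shares, 3), round(fee, 4))
```

Since the fee is taken out of each stock's allocation, the value of the shares plus the fees adds back up to the original budget.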

In [14]:
temp = portfolio_df[portfolio_df['Difference of Mean Returns'] == portfolio_df['Difference of Mean Returns'].min()]


# Define columns for the final portfolio DataFrame
portfolio = []

for i in range(len(temp['tickers'].iloc[0])):
    price = yfin_pull_convert_USD(yf.Ticker(temp['tickers'].iloc[0][i]), start=TARGET_DATE, end=TARGET_DATE+datetime.timedelta(days=1)).values[0]
    portfolio.append({
        "Ticker": temp['tickers'].iloc[0][i],
        "Price": price,
        "Currency": yf.Ticker(temp['tickers'].iloc[0][i]).info['currency'],
        "Shares": temp['shares_purchased'].iloc[0][i],
        "Value": price * temp['shares_purchased'].iloc[0][i],
        "Weight": temp['weights'].iloc[0][i]})

Portfolio_Final = pd.DataFrame(portfolio)

# Reset index to start at 1
Portfolio_Final.index = Portfolio_Final.index + 1

total = (Portfolio_Final['Price'] * Portfolio_Final['Shares']).sum() + temp['final_fees'].values[0]
sum_weight = sum(Portfolio_Final['Weight']) * 100
print(f"Total value (with fees added): {total}")
print(f"Sum of Weights: {sum_weight}%")


display(Portfolio_Final)

Total value (with fees added): 1000000.0
Sum of Weights: 100.0%


| | Ticker | Price | Currency | Shares | Value | Weight |
|---|---|---|---|---|---|---|
| 1 | T.TO | 21.26 | CAD | 1469.827364 | 31248.530103 | 0.03125 |
| 2 | BLK | 1448.971003 | USD | 90.13662 | 130605.34867 | 0.130605 |
| 3 | CL | 132.698155 | USD | 235.495093 | 31249.764503 | 0.03125 |
| 4 | AAPL | 321.358248 | USD | 465.510554 | 149595.656221 | 0.149596 |
| 5 | ABT | 164.62848 | USD | 189.820195 | 31249.810179 | 0.03125 |
| 6 | AIG | 106.331875 | USD | 1138.015952 | 121007.369877 | 0.121009 |
| 7 | ACN | 501.406677 | USD | 103.433529 | 51862.262144 | 0.051862 |
| 8 | PFE | 35.858699 | USD | 871.451823 | 31249.128524 | 0.03125 |
| 9 | LMT | 758.023507 | USD | 41.22558 | 31249.958774 | 0.03125 |
| 10 | MRK | 138.653638 | USD | 225.380127 | 31249.774618 | 0.03125 |


In [15]:
Stocks_Final = Portfolio_Final[['Ticker', 'Shares']]
Stocks_Final.to_csv("Stocks_Group_12.csv", sep='\t', encoding='UTF-8')

## Contribution Declaration

The following team members made a meaningful contribution to this assignment:

Alex, Derek, & Sharuga