In [1]:
from IPython.display import display, Math, Latex

import pandas as pd
import numpy as np
import numpy_financial as npf
import yfinance as yf
import matplotlib.pyplot as plt
from datetime import datetime

## Group Assignment
### Team Number: 12
### Team Member Names: Bill Bai, Soumik Debnath, Justin Yu
### Team Strategy Chosen: Risky (RISKY OR SAFE)

Below, we define the start and end dates of the ticker data used to create our portfolio: 2020-11-26 to 2021-11-26, a 1-year interval.

We based our portfolio on 1 year of historical data because we wanted a timeframe that was not so long that our short-term risky portfolio would capture "too much" irrelevant data. For example, we use standard deviation as a factor in creating our portfolio, and too long a timeframe might cause our code to choose a stock that used to be volatile but is now "safe". We also didn't want too short an interval, since our portfolio uses correlation to choose stocks that tend to trade in the same direction; too short a timeframe would capture stocks that are correlated over a short interval purely by coincidence or because of broader market forces.

After backtesting our code many times on different timeframes and lists of tickers, we found that 1 year worked best. It captures the stocks with the most momentum, which makes our portfolio risky, while making it more likely that the stocks in our portfolio are correlated for a reason rather than by coincidence or market-wide forces.

Later on in our code, we'll also explain why we used daily data in calculating the risk metrics for our portfolio, rather than weekly or monthly data.

In [2]:
# Start and end date to base our portfolio tickers off of
start_date = "2020-11-26"
end_date = "2021-11-26"

Next we'll load the given list of tickers to choose our portfolio from.

In [3]:
# Read in tickers file and save as DF
tickers_path = "./Tickers.csv"
tickers_df = pd.read_csv(tickers_path, header=None).rename(columns={0: 'Ticker'})

# NEW CODE: Get rid of duplicates in the tickers list.
tickers_df = tickers_df.drop_duplicates()

In [4]:
# Display tickers dataframe
tickers_df

,Ticker
0,AAPL
1,ABBV
2,LOW
3,AUST
4,HOOD
5,INVALIDTIC
6,AMZN
7,AXP
8,BAC
9,BMBL


Here, we create an Info column in our tickers dataframe, where each cell contains the yf.Ticker(ticker).info dict for that ticker. We did this for efficiency: each call to .info takes a few seconds and our list of tickers is long, so it is faster to keep each info dict in memory than to make an API call every time we need a stock's info.
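The same fetch-once idea can be sketched with Python's standard `functools.lru_cache`. This is an alternative technique, not what the notebook does: `slow_fetch_info` is a hypothetical stand-in for the real `yf.Ticker(symbol).info` call so that the sketch runs offline, and the counter shows how many real fetches happen.

```python
from functools import lru_cache

# Hypothetical stand-in for a slow network call such as yf.Ticker(symbol).info;
# it only records how often it is actually invoked.
CALLS = {"count": 0}


def slow_fetch_info(symbol):
    CALLS["count"] += 1
    return {"symbol": symbol}


@lru_cache(maxsize=None)
def cached_info(symbol):
    """Fetch a ticker's info once; later calls with the same symbol reuse the stored result."""
    return slow_fetch_info(symbol)


for symbol in ["AAPL", "ABBV", "AAPL", "AAPL", "ABBV"]:
    cached_info(symbol)

print(CALLS["count"])  # 2: one real fetch per distinct symbol
```

One caveat of this approach is that every caller receives the same cached dict object, so mutating it would change the cached value.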

In [5]:
# Create an info cell for each ticker, saving time when filtering the tickers by volume and country.
info_column = {}

# Fetch each ticker's info dict once and store it under the row's index
for idx, row in tickers_df.iterrows():
    ticker = yf.Ticker(row['Ticker'])  # Look up by column label rather than position
    info_column[idx] = ticker.info
    print('.', end='')  # Progress indicator so we can tell the code is running

# Create a column for the info dicts (aligned on the dataframe's index)
tickers_df['Info'] = pd.Series(info_column)
tickers_df

.................................

,Ticker,Info
0,AAPL,"{'zip': '95014', 'sector': 'Technology', 'full..."
1,ABBV,"{'zip': '60064-6400', 'sector': 'Healthcare', ..."
2,LOW,"{'zip': '28117', 'sector': 'Consumer Cyclical'..."
3,AUST,"{'exchange': 'ASE', 'shortName': 'Austin Gold ..."
4,HOOD,"{'zip': '94025', 'sector': 'Technology', 'full..."
5,INVALIDTIC,"{'regularMarketPrice': None, 'logo_url': ''}"
6,AMZN,"{'zip': '98109-5210', 'sector': 'Consumer Cycl..."
7,AXP,"{'zip': '10285', 'sector': 'Financial Services..."
8,BAC,"{'zip': '28255', 'sector': 'Financial Services..."
9,BMBL,"{'zip': '78756-3706', 'sector': 'Technology', ..."


In the next two cells, we filter out stocks that are either delisted, have an average daily volume of less than 10,000, or aren't US-listed. This is per the requirements of the assignment.

We also create a dict called hist_dict that stores the yf.Ticker(ticker).history() dataframe for each ticker that passes the volume filter. This is also for efficiency: each .history() call takes time for Yahoo to return the data, so it's faster to store the data in memory than to request it every time we use it, which we do often since we make many calculations for our portfolio.
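The volume rule boils down to a boolean mask over average daily volume. A minimal pandas sketch with made-up volumes for three hypothetical tickers (`AAA`, `BBB`, `CCC`) shows the idea, including why a ticker with no data is dropped as well:

```python
import pandas as pd

# Toy daily volumes (hypothetical numbers) for three tickers over the same dates.
dates = pd.date_range("2021-07-02", periods=4, freq="B")
volumes = pd.DataFrame(
    {
        "AAA": [50_000, 60_000, 55_000, 45_000],
        "BBB": [8_000, 9_000, 12_000, 7_000],
        "CCC": [float("nan")] * 4,  # no data, e.g. a delisted symbol
    },
    index=dates,
)

# Average daily volume per ticker; an all-missing column averages to NaN.
avg_volume = volumes.mean()

# NaN >= 10_000 evaluates to False, so tickers without data fail the filter too.
keep = avg_volume >= 10_000
print(keep[keep].index.tolist())  # ['AAA']
```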

In [6]:
# Dict to store the ticker history for each ticker.
hist_dict = {}


def filter_us_market(df):
    """
    Consumes a dataframe of tickers and returns a list of booleans representing whether the tickers are US-listed or not.
    :param df: DataFrame containing tickers
    :return: List of booleans representing whether the tickers are US-listed.
    """
    # Initialize mask list of booleans.
    mask = []
    for idx, row in df.iterrows():
        # Check whether stock is US-listed or not, and append the boolean to mask
        info = row['Info']
        if "market" in row['Info']:
            is_us_market = info['market'] == 'us_market'
            mask.append(is_us_market)
        else:
            mask.append(False)
    # Return mask
    return mask


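def filter_volume(df):
    """
    Consumes a dataframe of tickers and returns a list of booleans representing whether each ticker's
    average daily volume between 2021-07-02 and 2021-10-23 is at least 10,000.
    As a side effect, stores the full 1-year history (start_date to end_date) of each passing ticker in hist_dict.
    :param df: DataFrame containing tickers
    :return: List of booleans representing whether each ticker has valid volume.
    """
    # Start and end date of the window used to check volume
    start = "2021-07-02"
    end = "2021-10-23"
    # Mask to filter out stocks
    mask = []
    for idx, row in df.iterrows():
        # Get the full 1-year ticker history (reused later for prices and STDs)
        ticker = yf.Ticker(row['Ticker'])
        ticker_hist = ticker.history(start=start_date, end=end_date)
        # Get the subset of the history used to check volume
        volume_hist = ticker_hist.loc[
            (ticker_hist.index >= pd.to_datetime(start)) & (ticker_hist.index <= pd.to_datetime(end))]
        # Check that the average volume is at least 10,000 (an empty history gives NaN, which fails)
        valid_volume = volume_hist['Volume'].mean() >= 10000
        mask.append(valid_volume)
        if valid_volume:
            hist_dict[row['Ticker']] = ticker_hist  # Add the ticker history dataframe to hist_dict
    return mask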

In [7]:
# Filter out non-US stocks / delisted stocks
tickers_df = tickers_df.loc[filter_us_market(tickers_df)]
# Filter out stocks without valid volume
tickers_df = tickers_df.loc[filter_volume(tickers_df)]

- AUST: No data found for this date range, symbol may be delisted


In [8]:
# Reset index on the filtered dataframe
tickers_df = tickers_df.reset_index(drop=True)
tickers_df

,Ticker,Info
0,AAPL,"{'zip': '95014', 'sector': 'Technology', 'full..."
1,ABBV,"{'zip': '60064-6400', 'sector': 'Healthcare', ..."
2,LOW,"{'zip': '28117', 'sector': 'Consumer Cyclical'..."
3,HOOD,"{'zip': '94025', 'sector': 'Technology', 'full..."
4,AMZN,"{'zip': '98109-5210', 'sector': 'Consumer Cycl..."
5,AXP,"{'zip': '10285', 'sector': 'Financial Services..."
6,BAC,"{'zip': '28255', 'sector': 'Financial Services..."
7,BMBL,"{'zip': '78756-3706', 'sector': 'Technology', ..."
8,BK,"{'zip': '10286', 'sector': 'Financial Services..."
9,SQ,"{'zip': '94103', 'sector': 'Technology', 'full..."


Next, we reformat the data so that it contains each ticker's daily closing prices over the 1-year interval. We keep the data daily rather than resampling it to weekly or monthly data.

We chose daily data to calculate the metrics for our portfolio (such as STD and correlation) because it produced the best results when backtesting our short-term risky portfolio. We decided against monthly closing prices since 1 year of historical data would give us only about 12 closing prices to base our portfolio on, which is too few. We also tried weekly data and decided against it because it didn't capture the most short-term volatile stocks. Weekly data tended to give us stocks that were volatile by coincidence (such as news causing a sharp drop or rise in the stock price) rather than stocks that are intrinsically volatile (like oil stocks, which are volatile not by coincidence but because oil is a volatile asset). Daily data gave us the most intrinsically short-term volatile stocks to build our risky portfolio.
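To make the data-point counts in this argument concrete, here is a small pandas sketch over the notebook's window. It assumes weekday dates only (market holidays are ignored, so it slightly overcounts real trading days), uses made-up placeholder prices since only the number of points matters, and groups by calendar period as a stand-in for resampling.

```python
import pandas as pd

# One year of weekday dates spanning the notebook's window (holidays ignored).
days = pd.bdate_range("2020-11-27", "2021-11-26")
prices = pd.Series(range(len(days)), index=days, dtype=float)  # placeholder values

# Keep the last price in each calendar week / month, as a resampled closing series would.
weekly = prices.groupby(prices.index.to_period("W")).last()
monthly = prices.groupby(prices.index.to_period("M")).last()

print(len(days), len(weekly), len(monthly))  # 261 53 13
```

Monthly data yields 13 points here rather than 12 only because the window starts and ends partway through November.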

In [9]:
def create_df(df):
    """
    This function iterates through the filtered list of stocks and creates a dict of daily ticker closing prices
    :param df: filtered stocks DataFrame
    :return: dict mapping each ticker to its daily closing-price Series over the 1-year interval
    """
    dic = {}
    for tick in df['Ticker']:
        # Get the stored ticker history (no resampling: we keep the daily data)
        t_hist = hist_dict[tick]
        dic[tick] = t_hist.Close
    return dic


# Create the dataframe `prices` using the function create_df
prices = pd.DataFrame(create_df(tickers_df))
prices

Date,AAPL,ABBV,LOW,HOOD,AMZN,AXP,BAC,BMBL,BK,SQ,...,JPM,IBM,ORCL,OXY,DUOL,PEP,SLB,SO,SPG,PYPL
2020-11-27,115.875648,100.089188,152.485626,,3195.340088,119.200394,28.402325,,38.898380,212.520004,...,118.250458,113.186646,56.883499,16.533474,,140.444046,21.465227,58.201691,81.473915,211.389999
2020-11-30,118.320587,99.793381,153.619385,,3168.040039,117.223442,27.589148,,38.090225,210.960007,...,114.992279,112.431152,56.844105,15.734756,,140.084671,20.395889,57.433983,78.635902,214.119995
2020-12-01,121.968102,99.278091,151.233551,,3220.080078,118.548004,28.108406,,38.421280,203.000000,...,116.806709,112.103485,57.848629,15.105765,,141.871796,20.795614,58.124916,80.940598,216.539993
2020-12-02,122.325890,100.184608,149.666016,,3203.530029,120.920341,28.470905,,38.937332,202.000000,...,119.050369,113.432411,58.183468,15.864549,,140.463470,21.358194,59.382034,83.092926,212.559998
2020-12-03,122.186752,99.946053,149.823746,,3186.729980,121.701241,28.500479,,39.063908,205.529999,...,118.269974,112.513077,58.370583,16.363747,,141.297852,21.782593,59.382034,85.359512,214.539993
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-11-18,157.869995,117.070000,247.380005,30.530001,3696.060059,176.210007,46.320000,37.970001,57.770000,230.350006,...,163.050003,116.660004,94.660004,30.639999,135.039993,163.419998,31.860001,61.630001,169.380005,200.500000
2021-11-19,160.550003,116.239998,249.520004,28.990000,3676.570068,173.539993,45.400002,36.380001,57.070000,225.139999,...,160.919998,116.050003,93.970001,29.120001,135.960007,163.809998,30.049999,62.669998,166.740005,193.610001
2021-11-22,161.020004,115.650002,252.350006,27.840000,3572.570068,170.899994,46.279999,34.299999,58.209999,211.309998,...,164.350006,116.470001,94.610001,30.010000,133.169998,164.149994,30.340000,63.119999,167.990005,189.479996
2021-11-23,161.410004,118.879997,251.910004,27.650000,3580.040039,170.850006,47.500000,34.389999,59.009998,210.550003,...,168.279999,116.790001,92.940002,31.920000,135.820007,165.250000,31.180000,63.009998,169.029999,188.050003


In the next 2 cells, we find the stock with the highest risk, measured as the standard deviation (STD) of its daily percentage returns. This stock is key to forming our portfolio, as explained in the next annotation cell.
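As a compact illustration of what the next two cells compute, here is a vectorized pandas sketch on made-up closing prices for two hypothetical tickers (`SWING`, `FLAT`): the STD of daily percentage returns per column, then `idxmax` in the role of `get_highest_std`.

```python
import pandas as pd

# Hypothetical closing prices for two tickers on three consecutive days.
closes = pd.DataFrame({"SWING": [100.0, 110.0, 99.0], "FLAT": [50.0, 50.0, 50.0]})

# STD of daily percentage returns (sample std, ddof=1, pandas' default),
# the same quantity generate_std computes column by column.
daily_std = closes.pct_change().std()

# idxmax returns the label of the largest value: the riskiest ticker.
riskiest = daily_std.idxmax()
print(riskiest)                      # SWING
print(round(daily_std["SWING"], 4))  # 0.1414
```

SWING's returns are +10% and then -10%, so its sample STD is the square root of 0.02, while FLAT never moves and has an STD of 0.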

In [10]:
def generate_std(df):
    """
    Creates a dict of tickers and the standard deviations of their daily percentage returns
    :param df: DataFrame of daily closing prices (one column per ticker)
    :return: Dict of tickers and their STDs
    """
    stdlst_dic = {}
    # Calculate the STD of daily returns for each ticker
    for tick in df.columns:
        stdlst_dic[tick] = df[tick].pct_change().std()
    return stdlst_dic


# Create slst, a dictionary containing the tickers and their STDs
slst = generate_std(prices)
slst

{'AAPL': 0.01558920547952168,
 'ABBV': 0.012687263377998066,
 'LOW': 0.01537129599705407,
 'HOOD': 0.07940157539843598,
 'AMZN': 0.015003883728443813,
 'AXP': 0.016166052773097673,
 'BAC': 0.01589839135192852,
 'BMBL': 0.04160042694409629,
 'BK': 0.015778104655159362,
 'SQ': 0.031094207967788237,
 'VZ': 0.008488789205665702,
 'CMCSA': 0.014581206859480432,
 'SHOP': 0.030291411439136825,
 'COST': 0.010578528488092568,
 'CSCO': 0.011399690529555834,
 'CVS': 0.013466679700475416,
 'GM': 0.023305596804548735,
 'GOOG': 0.014676421421111563,
 'JPM': 0.013588238474130954,
 'IBM': 0.014492199038743435,
 'ORCL': 0.014479663945859639,
 'OXY': 0.0383814592679368,
 'DUOL': 0.05012090444839221,
 'PEP': 0.008857794768454863,
 'SLB': 0.026175695300691495,
 'SO': 0.009968846479297102,
 'SPG': 0.020004639121205,
 'PYPL': 0.022811611093400495}

In [11]:
#function to find the highest std
def get_highest_std(dic):
    """
    Consumes a dictionary of tickers and their STDs and returns the ticker with the highest STD in the dict
    :param dic: Dict of tickers and their STDs
    :return: Ticker with the highest STD
    """
    std = 0
    tick = ""
    # Calculates the highest STD
    for i in dic:
        if std < dic[i]:
            std = dic[i]
            tick = i
    return tick


#checking what the highest std is
get_highest_std(slst)

'HOOD'

Next, we'll generate a correlation matrix of the daily closing prices of all the stocks.

In [12]:
#creating a correlation dataframe
corr_df = prices.corr()
corr_df

,AAPL,ABBV,LOW,HOOD,AMZN,AXP,BAC,BMBL,BK,SQ,...,JPM,IBM,ORCL,OXY,DUOL,PEP,SLB,SO,SPG,PYPL
AAPL,1.0,0.604065,0.729501,-0.430085,0.738855,0.637127,0.542324,-0.444024,0.688102,0.551181,...,0.486308,0.240625,0.777635,0.38629,-0.287407,0.835345,0.278684,0.575481,0.675865,0.222161
ABBV,0.604065,1.0,0.671638,-0.129397,0.587217,0.780943,0.698012,-0.725175,0.76811,0.235408,...,0.661266,0.624169,0.752798,0.495161,-0.763531,0.695487,0.631552,0.690341,0.742411,0.23614
LOW,0.729501,0.671638,1.0,-0.781479,0.595265,0.828198,0.898768,-0.571893,0.912022,0.307039,...,0.819064,0.425945,0.8777,0.698379,0.126201,0.814186,0.667312,0.693593,0.912982,0.002364
HOOD,-0.430085,-0.129397,-0.781479,1.0,-0.530936,-0.380785,-0.617421,0.389665,-0.542573,0.717148,...,-0.482237,0.696784,-0.465205,-0.536891,-0.102804,-0.698319,-0.518421,0.341571,-0.677509,0.713625
AMZN,0.738855,0.587217,0.595265,-0.530936,1.0,0.587539,0.425675,-0.396899,0.479693,0.378239,...,0.352018,0.354827,0.603717,0.343948,-0.029922,0.642862,0.326677,0.458268,0.560117,0.393595
AXP,0.637127,0.780943,0.828198,-0.380785,0.587539,1.0,0.919203,-0.713147,0.91717,0.357638,...,0.898568,0.699262,0.947063,0.811641,0.185179,0.770177,0.831438,0.713226,0.924863,0.298509
BAC,0.542324,0.698012,0.898768,-0.617421,0.425675,0.919203,1.0,-0.643149,0.956467,0.257841,...,0.968945,0.608249,0.887206,0.844727,0.171428,0.694248,0.87116,0.69212,0.941351,0.106035
BMBL,-0.444024,-0.725175,-0.571893,0.389665,-0.396899,-0.713147,-0.643149,1.0,-0.67501,0.322132,...,-0.596204,-0.484036,-0.691488,-0.156109,0.433321,-0.697313,-0.509424,-0.551679,-0.637898,0.277546
BK,0.688102,0.76811,0.912022,-0.542573,0.479693,0.91717,0.956467,-0.67501,1.0,0.322207,...,0.902587,0.580681,0.939955,0.754638,-0.077733,0.836054,0.786345,0.746677,0.933983,0.068357
SQ,0.551181,0.235408,0.307039,0.717148,0.378239,0.357638,0.257841,0.322132,0.322207,1.0,...,0.302243,0.180161,0.407532,0.239784,-0.022926,0.293583,0.066344,0.361541,0.30427,0.627711


In the next cells, we use correlation to identify stocks that trade in the same direction as the stock with the highest STD, in order to create a portfolio whose stocks all tend to move together. In our original code, we instead compiled a list of stocks in the same industry, achieving the opposite of diversification across industries. However, we then realized it would make more sense to choose a basket of stocks that are positively correlated with the highest-STD stock, since this achieves the same effect as a single-industry portfolio but gives us a larger variety of risky stocks, so we can pick the riskiest ones.
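The split performed by `lister` amounts to two boolean masks on the keystone stock's column of the correlation matrix. A sketch on a small made-up matrix, where the hypothetical ticker `KEY` plays the role of the highest-STD stock:

```python
import pandas as pd

# Hypothetical 4x4 correlation matrix; "KEY" stands in for the highest-STD stock.
tickers = ["KEY", "UP1", "DOWN1", "UP2"]
corr = pd.DataFrame(
    [
        [1.0, 0.7, -0.4, 0.2],
        [0.7, 1.0, -0.1, 0.5],
        [-0.4, -0.1, 1.0, -0.3],
        [0.2, 0.5, -0.3, 1.0],
    ],
    index=tickers,
    columns=tickers,
)

# Boolean masks on the keystone's column; exact zeros (and NaNs) land in neither list.
key_corr = corr["KEY"]
split = {
    "positive": key_corr[key_corr > 0].index.tolist(),
    "negative": key_corr[key_corr < 0].index.tolist(),
}
print(split)  # {'positive': ['KEY', 'UP1', 'UP2'], 'negative': ['DOWN1']}
```

As in the notebook's output, the keystone stock itself ends up in the positive list because its correlation with itself is 1.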

In [13]:
def lister(corr, highest_std):
    """
    Creates a dict of stocks that are either positively or negatively correlated with the highest-STD stock
    :param corr: Correlation matrix of all stocks
    :param highest_std: Ticker of the stock with the highest STD
    :return: Dict with a "positive" and a "negative" list of tickers (stocks with exactly zero correlation are left out)
    """
    list1 = []  # Positively correlated tickers
    list2 = []  # Negatively correlated tickers
    for tick, c in corr[highest_std].items():
        if c > 0:
            list1.append(tick)
        elif c < 0:
            list2.append(tick)
    dic = {"positive": list1, "negative": list2}
    return dic


# Create the dictionary cor_list by calling lister;
# it contains two lists: stocks correlating positively and negatively with the input stock
cor_list = lister(corr_df, get_highest_std(slst))
cor_list

{'positive': ['HOOD',
  'BMBL',
  'SQ',
  'VZ',
  'CMCSA',
  'CSCO',
  'IBM',
  'SO',
  'PYPL'],
 'negative': ['AAPL',
  'ABBV',
  'LOW',
  'AMZN',
  'AXP',
  'BAC',
  'BK',
  'SHOP',
  'COST',
  'CVS',
  'GM',
  'GOOG',
  'JPM',
  'ORCL',
  'OXY',
  'DUOL',
  'PEP',
  'SLB',
  'SPG']}

We then sort both baskets, the positively and the negatively correlated stocks, by STD in descending order, so we can choose the stocks that have the highest STD and also trade in the same direction as our keystone stock (the stock with the highest STD, which gets the largest weighting in our final portfolio).
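For distinct STD values, Python's built-in `sorted` with a key function gives the same descending order as the hand-written quicksort. A sketch using the STD values printed in the `slst` output for the positively correlated basket:

```python
# STDs copied from the slst output for the positively correlated basket.
stds = {
    "HOOD": 0.07940157539843598, "BMBL": 0.04160042694409629, "SQ": 0.031094207967788237,
    "VZ": 0.008488789205665702, "CMCSA": 0.014581206859480432, "CSCO": 0.011399690529555834,
    "IBM": 0.014492199038743435, "SO": 0.009968846479297102, "PYPL": 0.022811611093400495,
}
basket = ["HOOD", "BMBL", "SQ", "VZ", "CMCSA", "CSCO", "IBM", "SO", "PYPL"]

# sorted() returns a new list (the input is untouched), ordered by STD, largest first.
by_std = sorted(basket, key=stds.get, reverse=True)
print(by_std)
# ['HOOD', 'BMBL', 'SQ', 'PYPL', 'CMCSA', 'IBM', 'CSCO', 'SO', 'VZ']
```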

In [14]:
def quicksort(tick_lst, std_list):
    """
    Implementation of quicksort to sort a list of tickers by their STD in descending order
    :param tick_lst: Ticker list (not modified)
    :param std_list: Dict of tickers and their respective STDs
    :return: New list of tickers sorted from highest to lowest STD
    """
    sequence = list(tick_lst)  # Copy so that pop() does not modify the caller's list
    if len(sequence) <= 1:
        return sequence
    pivot = sequence.pop()
    pivot_std = std_list[pivot]

    items_greater = []
    items_lower = []
    for item in sequence:
        if std_list[item] < pivot_std:
            items_lower.append(item)
        else:
            items_greater.append(item)

    # Recurse with the std_list that was passed in (not the global slst)
    return quicksort(items_greater, std_list) + [pivot] + quicksort(items_lower, std_list)


positive = quicksort(cor_list["positive"], slst)
negative = quicksort(cor_list["negative"], slst)

positive

['HOOD', 'BMBL', 'SQ', 'PYPL', 'CMCSA', 'IBM', 'CSCO', 'SO', 'VZ']

We also have a function to ensure that our code doesn't achieve the opposite of a risky portfolio. If fewer than 10 stocks are positively correlated with the highest-STD stock, filling the portfolio with negatively correlated stocks could produce a balanced portfolio instead. To avoid this, we take the 10 riskiest stocks (or all of them, if there are fewer than 10) from each basket and compare their average STDs. We keep the basket with the higher average STD, provided it has at least 7 stocks; otherwise, we keep the larger basket. If the negatively correlated basket wins, our portfolio is built from it instead. Either way, our portfolio has as high an STD as possible and its stocks tend to trade in the same direction.
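The comparison can be sketched in plain Python with the STD values from the `slst` output, listed in the descending order the sort produces. `top_mean` is a hypothetical helper standing in for the recursive `meanstd`, and the decision rules mirror `hstd` (ties on the mean go to the positive basket in this sketch).

```python
# STDs from the slst output, in descending order for each basket.
positive_stds = [0.07940157539843598, 0.04160042694409629, 0.031094207967788237,
                 0.022811611093400495, 0.014581206859480432, 0.014492199038743435,
                 0.011399690529555834, 0.009968846479297102, 0.008488789205665702]
negative_stds = [0.05012090444839221, 0.0383814592679368, 0.030291411439136825,
                 0.026175695300691495, 0.023305596804548735, 0.020004639121205,
                 0.016166052773097673, 0.01589839135192852, 0.015778104655159362,
                 0.01558920547952168, 0.01537129599705407]  # only the first 10 are used


def top_mean(stds, n=10):
    """Mean STD of (at most) the n riskiest stocks, plus how many stocks were used."""
    top = stds[:n]
    return sum(top) / len(top), len(top)


p_mean, p_len = top_mean(positive_stds)
n_mean, n_len = top_mean(negative_stds)

# Prefer the higher average STD if that basket has at least 7 stocks;
# otherwise prefer the larger basket, breaking ties on the average STD.
if p_mean > n_mean and p_len >= 7:
    chosen = "positive"
elif n_mean > p_mean and n_len >= 7:
    chosen = "negative"
elif p_len != n_len:
    chosen = "positive" if p_len > n_len else "negative"
else:
    chosen = "positive" if p_mean >= n_mean else "negative"

print(round(p_mean, 5), round(n_mean, 5), chosen)  # 0.02598 0.02517 positive
```

The positive mean matches the `'mean'` value in the `port_list` output, which is why the positively correlated basket was kept.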

In [15]:
def meanstd(lst, count, std_list, total, lst_tick):
    """
    Recursively calculates the average STD of the first 10 stocks (the riskiest, since lst is sorted) in a list.
    :param lst: List of tickers sorted by descending STD
    :param count: Number of tickers taken so far (start at 0)
    :param std_list: Dict of tickers and their STDs
    :param total: Running sum of the STDs taken so far (start at 0)
    :param lst_tick: Accumulator list of the tickers taken so far (start with [])
    :return: Dict with the tickers taken ("list") and their average STD ("mean")
    """
    if count == 10 or len(lst) == 0:
        # Avoid dividing by zero when the input list is empty
        mean = total / count if count > 0 else 0.0
        return {"list": lst_tick, "mean": mean}
    total = total + std_list[lst[0]]
    lst_tick.append(lst[0])
    return meanstd(lst[1:], count + 1, std_list, total, lst_tick)


pm = meanstd(positive, 0, slst, 0, [])
nm = meanstd(negative, 0, slst, 0, [])

len(nm["list"])

10

In [16]:
# NEW CODE: Added the extra clauses from `elif pl == nl:` through the final `else`
def hstd(p, n):
    """
    Chooses between the positively and negatively correlated baskets
    :param p: meanstd result for the positively correlated stocks ({"list": ..., "mean": ...})
    :param n: meanstd result for the negatively correlated stocks
    :return: The chosen basket, with its opposite sorted list stored under "oppo"
    """
    pm = p["mean"]
    nm = n["mean"]
    pl = len(p["list"])
    nl = len(n["list"])
    # Store each basket's full opposite list (sorted by descending STD) for padding later
    p["oppo"] = negative
    n["oppo"] = positive

    # Prefer the basket with the higher average STD if it has at least 7 stocks
    if pm > nm and pl >= 7:
        return p
    elif nm > pm and nl >= 7:
        return n
    # Otherwise prefer the larger basket, breaking ties on the higher average STD
    elif pl == nl:
        return p if pm >= nm else n
    elif pl > nl:
        return p
    else:
        return n


port_list = hstd(pm, nm)
port_list

{'list': ['HOOD', 'BMBL', 'SQ', 'PYPL', 'CMCSA', 'IBM', 'CSCO', 'SO', 'VZ'],
 'mean': 0.02598206150182928,
 'oppo': ['DUOL',
  'OXY',
  'SHOP',
  'SLB',
  'GM',
  'SPG',
  'AXP',
  'BAC',
  'BK',
  'AAPL',
  'LOW',
  'AMZN',
  'GOOG',
  'ORCL',
  'JPM',
  'CVS',
  'ABBV',
  'COST',
  'PEP']}

If the chosen basket has fewer than 10 stocks (for example, if only 6 stocks were positively correlated and 5 negatively correlated), we append the lowest-STD stocks from the opposite basket until the portfolio has 10 stocks. Because these added stocks have the lowest STDs in the opposite basket and receive the minimum 5% weighting, they affect our portfolio's performance as little as possible.
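The padding step can be sketched as a small pure function. `pad_portfolio` is a hypothetical helper, not the notebook's `portlength`, and the input lists are copied from the `port_list` output:

```python
def pad_portfolio(basket, opposite, size=10):
    """Return a new list: the basket topped up to `size` from the low-STD end of `opposite`."""
    padded = list(basket)  # copy, so the caller's basket is left unchanged
    needed = max(size - len(padded), 0)
    # `opposite` is sorted by descending STD, so index -1 is its lowest-STD stock
    for i in range(needed):
        padded.append(opposite[-(i + 1)])
    return padded


# Chosen basket and its opposite list, copied from the port_list output
basket = ['HOOD', 'BMBL', 'SQ', 'PYPL', 'CMCSA', 'IBM', 'CSCO', 'SO', 'VZ']
opposite = ['DUOL', 'OXY', 'SHOP', 'SLB', 'GM', 'SPG', 'AXP', 'BAC', 'BK', 'AAPL',
            'LOW', 'AMZN', 'GOOG', 'ORCL', 'JPM', 'CVS', 'ABBV', 'COST', 'PEP']

result = pad_portfolio(basket, opposite)
print(result)
# ['HOOD', 'BMBL', 'SQ', 'PYPL', 'CMCSA', 'IBM', 'CSCO', 'SO', 'VZ', 'PEP']
```

Only one slot is missing here, so PEP, the lowest-STD negatively correlated stock, fills it.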

In [17]:
# NEW CODE: Changed the code in the else branch.
def portlength(port):
    """
    Appends the lowest-STD stocks from the opposite basket if the chosen basket has fewer than 10 stocks.
    :param port: Chosen basket from hstd ({"list": ..., "mean": ..., "oppo": ...})
    :return: New portfolio list of 10 tickers
    """
    lster = list(port["list"])  # Copy so the basket itself is not modified

    if len(lster) >= 10:
        return lster
    # "oppo" is sorted by descending STD, so take from its end: the lowest-STD stocks first
    l = 10 - len(lster)
    for i in range(l):
        lster.append(port["oppo"][-(i + 1)])
    return lster


finalport_lst = portlength(port_list)
finalport_lst

['HOOD', 'BMBL', 'SQ', 'PYPL', 'CMCSA', 'IBM', 'CSCO', 'SO', 'VZ', 'PEP']

In [18]:
lstofstocks = finalport_lst
lstofstocks

['HOOD', 'BMBL', 'SQ', 'PYPL', 'CMCSA', 'IBM', 'CSCO', 'SO', 'VZ', 'PEP']

Finally, we set our weights in this cell and backtest our portfolio over 2021-11-01 to 2021-11-26. Our weightings, from the highest-STD stock to the lowest, are:

35% 25% 5% 5% 5% 5% 5% 5% 5% 5%

The stock with the highest STD in our portfolio is weighted at 35%, the second-highest at 25%, and the remaining eight stocks at 5% each. Giving the largest weightings to the riskiest stocks makes our portfolio as volatile as the assignment's weighting limits allow, so a volatile move by the stock with the highest STD affects our portfolio the most.
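The share counts follow from dollars allocated divided by the purchase price. A plain-Python sketch, using the 2021-11-01 closing prices shown in the backtest table for this portfolio, checks that the holdings are worth $100,000 on the purchase date:

```python
# Dollar amounts per position: the list is ordered by decreasing STD,
# so position 0 is the riskiest stock.
allocations = [35_000, 25_000] + [5_000] * 8

# 2021-11-01 closing prices from the backtest table for this portfolio.
tickers = ['HOOD', 'BMBL', 'SQ', 'PYPL', 'CMCSA', 'IBM', 'CSCO', 'SO', 'VZ', 'PEP']
closes = [34.849998, 51.470001, 255.039993, 231.279999, 52.259998,
          119.136795, 56.099998, 61.825966, 52.950001, 161.259995]

# Shares bought = dollars allocated / closing price on the purchase date.
shares = {t: dollars / price for t, dollars, price in zip(tickers, allocations, closes)}

# On the purchase date the holdings are worth $100,000 again (up to float rounding).
start_value = sum(shares[t] * p for t, p in zip(tickers, closes))
print(round(start_value, 2))  # 100000.0
```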

In [19]:
# Using the 35% distribution

startdate = '2021-11-01'
enddate = '2021-11-26'  # yfinance treats `end` as exclusive, so the last day included is the trading day before


# loopshare takes a list of tickers, a start date, and an end date, and returns the number of shares
# bought of each stock at its first closing price in that window, for a $100,000 portfolio.
# The list is already sorted in decreasing order of standard deviation, so we can apply the
# 35% distribution method (explained in the report): $35,000 in the first stock,
# $25,000 in the second, and $5,000 in each of the rest.
def loopshare(lst, date_start, date_end):
    shares = []
    for i, tick in enumerate(lst):
        if i == 0:
            amount = 35000
        elif i == 1:
            amount = 25000
        else:
            amount = 5000
        first_close = yf.Ticker(tick).history(start=date_start, end=date_end)['Close'].iloc[0]
        shares.append(amount / first_close)
    return shares


# Number of shares of each stock, in the same order as lstofstocks
NumOfShares = loopshare(lstofstocks, startdate, enddate)


# loopClose takes a list of tickers, a start date, and an end date, and returns a list with
# the closing-price Series of each stock within that window
def loopClose(lst, start_date, end_date):
    return [yf.Ticker(tick).history(start=start_date, end=end_date)['Close'] for tick in lst]


listOfClose = loopClose(lstofstocks, startdate, enddate)

# Create the dataframe for tracking the portfolio: put the closing prices side by side,
# keeping only the dates present for every stock
finalPortfolio = pd.concat(listOfClose, join='inner', axis=1)

# Rename the columns to the corresponding stocks
finalPortfolio.columns = lstofstocks

# Portfolio value each day = sum over stocks of (shares held x closing price)
finalPortfolio['portfolio'] = finalPortfolio[lstofstocks].mul(NumOfShares, axis='columns').sum(axis=1)

Over the backtest window from 2021-11-01 to 2021-11-26, our portfolio fell nearly 18%. Backtesting on historical data shows that the portfolio is very risky, as intended. (We updated the return figure in this annotation cell to match the new ticker list we were given.)
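The percentage return is the ratio of the last portfolio value to the first, minus one. A minimal sketch with a hypothetical `period_return` helper, applied to the first and last `portfolio` values visible in the truncated table (2021-11-01 and 2021-11-12), not the full window:

```python
def period_return(values):
    """Percentage change from the first to the last value of a sequence of portfolio values."""
    return (values[-1] / values[0] - 1) * 100


# First and last 'portfolio' values visible in the displayed rows (2021-11-01 and 2021-11-12).
shown = [100000.0, 92240.037114]
print(round(period_return(shown), 2))  # -7.76
```

For the whole backtest window, the same helper can be applied to `finalPortfolio['portfolio'].tolist()`.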

In [20]:
finalPortfolio

Date,HOOD,BMBL,SQ,PYPL,CMCSA,IBM,CSCO,SO,VZ,PEP,portfolio
2021-11-01,34.849998,51.470001,255.039993,231.279999,52.259998,119.136795,56.099998,61.825966,52.950001,161.259995,100000.0
2021-11-02,34.990002,49.560001,249.009995,229.460007,52.639999,119.04245,57.619999,61.835865,52.57,162.740005,99233.994404
2021-11-03,37.040001,52.0,252.479996,230.380005,52.950001,119.938713,57.650002,61.895233,52.939999,164.300003,102723.948006
2021-11-04,37.07,51.700001,247.460007,228.220001,52.73,119.258591,57.119999,61.845757,51.849998,164.309998,102259.799398
2021-11-05,37.009998,52.0,237.380005,225.779999,53.900002,121.982246,57.07,62.439476,52.240002,166.0,102453.92423
2021-11-08,37.98,52.25,236.770004,229.419998,53.490002,122.900002,57.0,61.9645,52.330002,162.429993,103468.710404
2021-11-09,36.700001,50.84,230.779999,205.419998,53.75,120.849998,57.439999,62.686859,52.240002,163.509995,100923.514905
2021-11-10,34.490002,47.75,227.210007,204.639999,54.169998,120.220001,57.77,63.191521,52.599998,164.039993,97250.673211
2021-11-11,34.169998,38.560001,226.509995,202.029999,53.66,120.269997,56.759998,62.389999,52.450001,162.690002,92137.822555
2021-11-12,35.209999,36.549999,227.300003,208.300003,53.5,118.959999,56.82,61.889999,52.34,162.649994,92240.037114


In [21]:
# Portfolio starting date
comp_date = '2021-11-26'

Please note that while we were testing our code, Yahoo Finance sometimes updated its data unpredictably, so we couldn't always get the closing prices for the current day. This shouldn't be a problem when the code is run after 2021-11-26.

In [22]:
def price_data(lst, date):
    """Returns a dict of each ticker's closing price on the first trading day on or after `date`."""
    dic = {}
    for tick in lst:
        t_hist = yf.Ticker(tick).history(start=date)
        dic[tick] = t_hist['Close'].iloc[0]
    return dic


comp_closing = pd.DataFrame(price_data(lstofstocks, comp_date), index=[pd.to_datetime(comp_date)])


def calc_shares(df):
    """
    Returns a dict of the number of shares to buy for each ticker (column of df), using df's first row of prices:
    $35,000 in the first ticker, $25,000 in the second, and $5,000 in each of the rest.
    """
    prices_row = df.iloc[0]
    dic = {}
    for j, tick in enumerate(df.columns):
        if j == 0:
            dic[tick] = 35000 / prices_row[tick]
        elif j == 1:
            dic[tick] = 25000 / prices_row[tick]
        else:
            dic[tick] = 5000 / prices_row[tick]
    return dic


shares = calc_shares(comp_closing)

# One row per ticker: price on comp_date, shares bought, dollar value, and weight (%) of $100,000
FinalPortfolio = comp_closing.transpose().reset_index()
FinalPortfolio.columns = ["Ticker", "Prices"]
FinalPortfolio["Shares"] = FinalPortfolio["Ticker"].map(shares)
FinalPortfolio["Values"] = FinalPortfolio["Prices"] * FinalPortfolio["Shares"]
FinalPortfolio["Weight"] = (FinalPortfolio["Values"] / 100000) * 100
FinalPortfolio

,Ticker,Prices,Shares,Values,Weight
0,HOOD,27.92,1253.581658,35000.0,35.0
1,BMBL,33.830002,738.989023,25000.0,25.0
2,SQ,212.080002,23.576009,5000.0,5.0
3,PYPL,187.789993,26.625487,5000.0,5.0
4,CMCSA,51.099998,97.847361,5000.0,5.0
5,IBM,115.809998,43.174165,5000.0,5.0
6,CSCO,54.669998,91.457841,5000.0,5.0
7,SO,62.040001,80.593165,5000.0,5.0
8,VZ,51.799999,96.525098,5000.0,5.0
9,PEP,161.139999,31.028919,5000.0,5.0


In [23]:
Stocks=pd.DataFrame(FinalPortfolio["Ticker"])
Stocks["Shares"]=FinalPortfolio["Shares"]
Stocks

,Ticker,Shares
0,HOOD,1253.581658
1,BMBL,738.989023
2,SQ,23.576009
3,PYPL,26.625487
4,CMCSA,97.847361
5,IBM,43.174165
6,CSCO,91.457841
7,SO,80.593165
8,VZ,96.525098
9,PEP,31.028919


In [24]:
Stocks.to_csv('Stocks_Group_12.csv', index=False)
pd.read_csv('Stocks_Group_12.csv')

,Ticker,Shares
0,HOOD,1253.581658
1,BMBL,738.989023
2,SQ,23.576009
3,PYPL,26.625487
4,CMCSA,97.847361
5,IBM,43.174165
6,CSCO,91.457841
7,SO,80.593165
8,VZ,96.525098
9,PEP,31.028919


## How we came up with our list of tickers
When thinking about what a risky portfolio is, we focused on creating the least diversified portfolio possible out of the most volatile stocks. From the start, we wanted a portfolio of 10 volatile stocks with high STDs over a reasonable time interval.

Our original approach was to compare industries and find the industry whose stocks had the highest STD. This would create very little diversification, since stocks in the same industry tend to move together. However, this method was ineffective for several reasons. The code would be far less efficient, since we would have to read each stock's industry from its info dict, track how many industries there are, and check whether each industry even has 10 stocks. Furthermore, a single highly volatile stock could lead us to base the portfolio on its industry even if the rest of that industry is relatively safe, which would leave us with a safe portfolio.

Hence, we shifted our approach to look at each stock's correlation with the stock with the highest STD. This lets us create two groups: stocks that are positively correlated with the highest-STD stock and stocks that are negatively correlated with it. We then sort each group from highest to lowest STD and take the top 10 from each. This gives us two lists of very risky stocks that each have a history of moving together (little diversification within each list). We then take the list with the higher average STD, as long as it has at least 7 stocks. If the chosen list has fewer than 10 stocks, we fill it with stocks from the bottom (lowest-STD end) of the sorted list of oppositely correlated stocks. In other words, we narrow the list of tickers down to 10 very volatile stocks that tend to move in the same direction. Correlation works better than industry here because industries also correlate with each other, so selecting by correlation lets us include volatile stocks from several industries.


## 35% Diversification Method
In the final part of the program, we implemented a method for deciding how to allocate the 100,000 budget among the chosen stocks, which we call the 35% diversification method. Our goal as a group is to create the riskiest portfolio possible. As stated above, our criteria for a risky portfolio are: as few stocks as possible, a low trading volume for each stock, and a high standard deviation for each stock; most importantly, the stocks in the portfolio must move in the same direction.
However, the competition rules require a minimum of 10 stocks, and no single stock may make up more than 35% of the portfolio's value. Following our criteria for a high-risk portfolio, we therefore use exactly 10 stocks. The rules also require each stock to make up at least (100/(2n))% of the portfolio, where n is the number of stocks; with n = 10, every stock must therefore be weighted between 5% and 35%.
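Written out for $n = 10$, with $w_i$ denoting the percentage of the portfolio held in stock $i$ (notation introduced here for illustration), the rules above become:

$$w_{\min} = \frac{100}{2n}\,\% = \frac{100}{2 \cdot 10}\,\% = 5\,\%, \qquad 5\,\% \le w_i \le 35\,\%, \qquad \sum_{i=1}^{10} w_i = 100\,\%$$

These bounds are feasible, since $10 \times 5\,\% = 50\,\% \le 100\,\% \le 10 \times 35\,\% = 350\,\%$.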
At this point in the code, before the 35% diversification method is applied, we already have a list of 10 stocks that move in the same direction, sorted in decreasing order of standard deviation (calculated from historical data). The first stock in the list is therefore the most volatile and the last stock is the least volatile of the ten. Since the list is in decreasing order, we allocate our investment in decreasing order as well: the most money goes to the first stock and the least to the last, which maximizes risk. Because no stock can make up more than 35% or less than 5% of the portfolio, the first stock takes the full 35%; the last 8 stocks must each hold at least 5%, or 40% together, which leaves at most 100 - 35 - 40 = 25% for the second stock. The most concentrated allocation is therefore 35% for the first stock, 25% for the second, and 5% for each of the remaining 8. This guarantees that 60% of our portfolio is driven by the two riskiest stocks, and the remaining 40% by 8 less risky stocks that still move in the same direction. The result satisfies our criteria for a risky portfolio, concentrated in as few stocks as possible, without violating the rules of the assignment.
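One way to reproduce this split programmatically is a greedy fill: walk down the std-sorted list and give each stock as much weight as the 35% cap allows while still reserving the minimum weight for every stock after it. This is a minimal sketch; the function name `concentrated_weights` is hypothetical and not necessarily how the notebook computes the weights, and it assumes the tickers are already sorted by decreasing std.

```python
def concentrated_weights(n: int = 10, cap: float = 35.0) -> list:
    """Greedy allocation, in percent, for tickers sorted by decreasing std."""
    floor = 100 / (2 * n)                 # minimum weight rule: (100/(2n))%
    if n * cap < 100:
        raise ValueError("cap is too small to reach 100% with n stocks")

    weights, remaining = [], 100.0
    for i in range(n):
        stocks_after = n - i - 1
        # Give this stock as much as possible while leaving every later
        # stock at least the floor weight
        w = min(cap, remaining - floor * stocks_after)
        weights.append(w)
        remaining -= w
    return weights


weights = concentrated_weights()
budget = 100_000                          # the portfolio budget from the assignment
dollars = [budget * w / 100 for w in weights]
print(weights)
print(dollars)
```

With the default arguments this yields 35%, 25%, and then 5% for each of the remaining 8 stocks. Because each step reserves exactly the floor weight for the stocks still to come, the weights never drop below the minimum and always sum to 100% once the feasibility check passes.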

## Contribution Declaration

The following team members made a meaningful contribution to this assignment:

Bill:
Filtration code, annotation in the actual code and overall error checking.

Soumik:
Correlation code and generating the actual list of tickers to be used. Also worked on the final written portion.

Justin:
Code for calculating the weights and outputting them to the CSV. Also worked on the final written portion.

<p style="color:red">
What we changed: <br>
Firstly, we added a single line of code to drop the duplicate tickers in the list of tickers given to us. <br><br>
We also added a few conditions to the <i>hstd()</i> function so that it handles more cases in the ticker list given to us, for example when both the negatively and the positively correlated lists have fewer than 7 tickers. There was no bug in this method, but we added the extra conditions in case a given ticker list has fewer than 13 tickers. <br><br>
Lastly, we changed everything under the else branch of the function <i>portlength()</i>. Originally the condition was <code>if i == 10</code> when it should have been <code>i == (10 - len(port))</code>, so whenever the list had fewer than 10 stocks, nothing was returned and the code ran into a NoneType error. We removed the if statement entirely and printed the list outside the for loop, which fixed the bug.
</p>