# Portfolio Analysis with Statistics

To access stock data, we use the [yfinance library](https://github.com/ranaroussi/yfinance). We first download the historical monthly stock prices for the chosen tickers (slightly modifying the code in the library tutorial). The data comes as a pandas DataFrame with multi-level column headers, so we also flatten the levels, moving the ticker level from the columns into the rows, for simpler access.

In [265]:
import yfinance as yf
import numpy as np
import scipy.stats as stats
import pandas as pd
ticks = ["AAPL", "MSFT", "LABU"]
data = yf.download(tickers = ticks, interval = "1mo", group_by = 'ticker', auto_adjust = True, threads = True)

[*********************100%***********************]  3 of 3 completed


In [276]:
sp500_raw = yf.download(tickers = "^GSPC", interval = "1mo", group_by = 'ticker', auto_adjust = True, threads = True)

[*********************100%***********************]  1 of 1 completed


In [274]:
# flatten the column levels: move the ticker level into the rows (long format)
stocks_raw = data.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
stocks_raw = stocks_raw.sort_values(by=['Date', 'Ticker'])
stocks_raw.tail(10)

Date,Ticker,Close,High,Low,Open,Volume
2021-04-01,MSFT,251.59903,262.583675,237.501592,237.920623,568661600.0
2021-05-01,AAPL,124.398697,133.842662,122.042698,131.81609,1711935000.0
2021-05-01,LABU,63.0,81.25,50.0,80.150002,71122100.0
2021-05-01,MSFT,249.104782,253.764037,237.521544,252.816213,495084900.0
2021-06-01,AAPL,133.110001,134.639999,123.129997,125.080002,1416442000.0
2021-06-01,LABU,78.839996,79.139999,58.630001,64.089996,51240900.0
2021-06-01,MSFT,265.019989,267.850006,243.0,251.229996,447269400.0
2021-06-25,AAPL,133.110001,133.889999,132.809998,133.460007,70783750.0
2021-06-25,LABU,78.839996,79.139999,75.442902,77.0,1961534.0
2021-06-25,MSFT,265.019989,267.25,264.76001,266.230011,25611110.0
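
The reshaping step can be illustrated on a small frame. This is a minimal sketch, assuming a two-level column layout of (ticker, field) like the one `group_by = 'ticker'` produces; the tickers `AAA` and `BBB` and all prices are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame mimicking the (ticker, field) column layout
dates = pd.to_datetime(["2021-05-01", "2021-06-01"])
cols = pd.MultiIndex.from_product([["AAA", "BBB"], ["Open", "Close"]])
wide = pd.DataFrame(
    [[1.0, 2.0, 10.0, 20.0],
     [3.0, 4.0, 30.0, 40.0]],
    index=dates,
    columns=cols,
)

# Move the ticker level from the columns into the rows,
# then turn it into an ordinary 'Ticker' column
long = (
    wide.stack(level=0)
        .rename_axis(["Date", "Ticker"])
        .reset_index(level=1)
        .sort_values(by=["Date", "Ticker"])
)
print(long)
```

Each date now appears once per ticker, and the price fields are plain single-level columns.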


Since we only want one stock price per month, we drop the last row of each asset if it does not fall on the first day of the month (here, the 2021-06-25 rows). We also keep only the last 5 years of data so that the returns reflect each company's recent performance (profitability in the 1990s does not entail profitability in the 2020s). Because we need the previous month's price to compute the current month's return, we keep an extra month (a total of 61 months).

In [277]:
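# snap the last available date to the first day of its month
latest = stocks_raw.index[-1] - pd.DateOffset(day = 1)
# go back 5 years, plus one extra month for the first return
begin = latest - pd.DateOffset(years = 5) - pd.DateOffset(months = 1)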
stocks = stocks_raw.loc[(stocks_raw.index <= latest) & (stocks_raw.index >= begin)].copy()
sp500 = sp500_raw.loc[(sp500_raw.index <= latest) & (sp500_raw.index >= begin)].copy()
stocks.head()

Date,Ticker,Close,High,Low,Open,Volume
2016-05-01,AAPL,23.18079,23.382746,20.768929,21.813527,3602686000.0
2016-05-01,LABU,36.286392,36.86111,22.017576,30.707682,104283800.0
2016-05-01,MSFT,48.487541,48.487541,45.248939,45.742963,530704800.0
2016-06-01,AAPL,22.327021,23.796027,21.369481,23.125748,3117991000.0
2016-06-01,LABU,27.962915,39.526602,21.740127,35.691859,110038400.0


To compute the (percent) return for a given month, we subtract last month's price from the current price and divide by last month's price. We can vectorize this by taking the array of (open) prices without the first observation, subtracting the array of (open) prices without the last observation, and dividing by the latter.

In [284]:
for t in ticks:
    mask = stocks['Ticker'] == t
    opens = stocks.loc[mask, 'Open']
    # turn into np arrays to avoid index alignment issues
    nofirst = opens.iloc[1:].to_numpy()
    nolast = opens.iloc[:-1].to_numpy()
    # add back the index before assignment
    stocks.loc[mask, 'PercReturns'] = pd.Series((nofirst - nolast) / nolast, index = opens.index[1:])

spnofirst = np.array(sp500['Open'].iloc[1:])
spnolast = np.array(sp500['Open'].iloc[:len(sp500) - 1])
spreturns = (spnofirst - spnolast) / spnolast
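
As a sanity check on the formula, the shifted-array computation can be compared with pandas' built-in `pct_change`. This is a minimal sketch on made-up prices, not the notebook's data.

```python
import numpy as np
import pandas as pd

# Made-up monthly open prices for a single asset
opens = pd.Series([100.0, 110.0, 99.0, 99.0])

# Shifted-array version: (current - previous) / previous
nofirst = opens.iloc[1:].to_numpy()
nolast = opens.iloc[:-1].to_numpy()
manual = (nofirst - nolast) / nolast

# Built-in version; the first entry is NaN because it has no previous price
builtin = opens.pct_change().iloc[1:].to_numpy()

print(manual)
print(np.allclose(manual, builtin))
```

Both approaches give one fewer return than there are prices, which is why an extra month of prices is kept.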

To make calculating the mean returns and covariances easier, we pivot the dataframe so that each ticker's percent returns form their own column. Note that converting to a NumPy array drops the ticker labels, but the column order (alphabetical by ticker: AAPL, LABU, MSFT) is preserved.

In [291]:
stockpivot = stocks.pivot(columns = 'Ticker', values = 'PercReturns').iloc[1:]
returnarr = stockpivot.to_numpy()
returnmean = returnarr.mean(axis = 0)
returncov = np.cov(m = returnarr, rowvar = False)
invcov = np.linalg.lstsq(a = returncov, b = np.eye(len(ticks)), rcond = None)[0]
returncov

array([[0.00766766, 0.01096201, 0.0030766 ],
       [0.01096201, 0.07438112, 0.00741464],
       [0.0030766 , 0.00741464, 0.0025471 ]])
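
The `invcov` line solves `returncov @ X = I` in the least-squares sense, which for a non-singular matrix should agree with the ordinary inverse. The sketch below checks this on the rounded covariance values copied from the output above, so small differences from the notebook's full-precision result are expected.

```python
import numpy as np

# Rounded covariance matrix copied from the printed output
returncov = np.array([
    [0.00766766, 0.01096201, 0.0030766],
    [0.01096201, 0.07438112, 0.00741464],
    [0.0030766, 0.00741464, 0.0025471],
])

# Least-squares solution of returncov @ X = I
invcov = np.linalg.lstsq(returncov, np.eye(3), rcond=None)[0]

# Direct inverse for comparison
direct = np.linalg.inv(returncov)

print(np.allclose(invcov, direct))
print(np.allclose(returncov @ invcov, np.eye(3)))
```

Using `lstsq` instead of `inv` has the practical advantage of still returning a (pseudo-inverse) result if the covariance matrix happens to be singular or nearly so.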

In [296]:
betas = np.zeros(len(ticks))
alphas = np.zeros(len(ticks))
unsyserr = np.zeros(len(ticks))
for i in np.arange(len(ticks)):
    betas[i], alphas[i], r, p, se = stats.linregress(spreturns, returnarr[:,i])
    unsyserr[i] = np.sum((returnarr[:,i] - alphas[i] - betas[i]*spreturns)**2) / (len(spreturns) - 2)

In [299]:
simdf = pd.DataFrame(data = {'alpha': alphas, 'beta': betas, 'eps': unsyserr}, index = stockpivot.columns.values)
simdf

Ticker,alpha,beta,eps
AAPL,0.016725,1.246643,0.004229
LABU,-0.001609,3.880041,0.041071
MSFT,0.019163,0.806234,0.001098
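
The regression loop above fits each asset's returns against the index returns: `linregress` returns the slope (beta) first and the intercept (alpha) second, and `eps` is the residual variance with `n - 2` degrees of freedom. The sketch below reproduces those steps on synthetic data; the "true" alpha of 0.005, beta of 1.5, and the noise levels are arbitrary assumptions chosen for illustration.

```python
import numpy as np
import scipy.stats as stats

# Synthetic monthly returns (assumed parameters, not real data)
rng = np.random.default_rng(0)
market = rng.normal(0.01, 0.04, size=60)
asset = 0.005 + 1.5 * market + rng.normal(0.0, 0.02, size=60)

# Regress asset returns on market returns
fit = stats.linregress(market, asset)
beta, alpha = fit.slope, fit.intercept

# Residual variance: sum of squared residuals over n - 2
resid = asset - alpha - beta * market
resid_var = np.sum(resid**2) / (len(market) - 2)

print(alpha, beta, resid_var)
```

With only 60 observations the estimates land near, but not exactly on, the parameters used to generate the data.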

