#Downloading Stock Prices from Yahoo Finance Using YFinance API


*   This Colab Notebook shows how to download stock prices from Yahoo Finance and create a stock return series for analysis
*   Steps:
  * Use YFinance API to download historical stock prices for MSFT
  * Download data from Jan 2016 - Dec 2020
  * Convert Daily Stock Data to Monthly Values
  * Convert Monthly Prices into Monthly Stock Returns
  * Upload Fama-French Risk Factors
  * Run FF Regression to determine Alpha and Factor Loadings



###Install packages and libraries
* Note: You may have to run the pip install twice to properly install YFinance

In [4]:
!pip install yfinance



In [5]:
import yfinance as yf
import pandas as pd
import statsmodels.api as sm

###Get data from Yahoo Finance using YFinance API (Example: Ticker = "MSFT")
* Collect daily historical stock prices for MSFT from end of 2015 to beginning of 2021

In [6]:
msft = yf.Ticker('msft')
df_msft = msft.history(start="2015-12-01", end="2021-01-05")

#Display dataframe containing data for MSFT
df_msft

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2015-12-01,49.136528,49.877052,49.037188,49.868023,39952800,0.0,0
2015-12-02,49.958333,50.536303,49.723534,49.858994,47274900,0.0,0
2015-12-03,50.111841,50.364702,48.703037,48.946869,38627800,0.0,0
2015-12-04,48.874630,50.780127,48.856568,50.491142,43963700,0.0,0
2015-12-07,50.382775,50.545329,49.931235,50.400837,30709800,0.0,0
...,...,...,...,...,...,...,...
2020-12-28,222.124943,223.688577,220.709763,222.629669,17933500,0.0,0
2020-12-29,223.965677,224.826660,221.263961,221.828049,17403200,0.0,0
2020-12-30,222.896850,223.292715,219.175805,219.383621,20272300,0.0,0
2020-12-31,219.403424,220.689960,217.404345,220.115967,20942100,0.0,0


###Convert daily data to End of Month Data for the period

In [7]:
# Convert daily data to monthly data by keeping the last trading day of each month
df_msft = df_msft.loc[df_msft.groupby(df_msft.index.to_period('M')).apply(lambda x: x.index.max())]
df_msft

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2015-12-31,50.608551,50.744010,50.048639,50.102825,27334100,0.0,0
2016-01-29,49.425509,49.750618,48.766262,49.750618,83611700,0.0,0
2016-02-29,46.706068,46.978940,46.078471,46.278576,31654000,0.0,0
2016-03-31,49.980492,50.562612,49.898631,50.235168,26360500,0.0,0
2016-04-29,44.886936,45.705544,44.886936,45.359909,48411700,0.0,0
...,...,...,...,...,...,...,...
2020-09-30,205.041342,209.236335,203.866742,207.607697,33829100,0.0,0
2020-10-30,200.866078,201.645847,197.036293,199.849411,36953700,0.0,0
2020-11-30,211.882166,212.535318,208.655926,211.852478,33064800,0.0,0
2020-12-31,219.403424,220.689960,217.404345,220.115967,20942100,0.0,0
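The month-end selection above can also be sketched on synthetic data. This is a minimal illustration, not the notebook's own method: it uses `groupby(...).tail(1)` instead of the `apply`/`loc` combination, and the prices are made-up values on weekdays rather than real MSFT data.

```python
import pandas as pd

# Synthetic daily closes on weekdays (made-up values, not real MSFT prices)
dates = pd.bdate_range("2020-01-01", "2020-03-31")
daily = pd.DataFrame({"Close": [100.0 + i for i in range(len(dates))]}, index=dates)
daily.index.name = "Date"

# Keep the last available row of each calendar month.
# The original dates are preserved, so each row is dated on the
# last trading day, which is not always the calendar month end.
monthly = daily.groupby(daily.index.to_period("M")).tail(1)
print(monthly)
```

Because the rows keep their original dates, the February row here is dated 2020-02-28 (the 29th fell on a Saturday), the same pattern as the 2016-01-29 and 2016-04-29 rows in the MSFT output.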


###Create Monthly Stock Returns
* Use Close Price of last day in each month
* Skip first row (NA for return)
* Only include data up to the end of 2020 (<2021)

In [8]:
# Compute monthly returns using pandas pct_change()
df_msft['ret'] = df_msft['Close'].pct_change()
# Skip the first row, whose return is NA
df_msft = df_msft[1:]
# Keep only dates before 2021
df_msft = df_msft.loc[df_msft.index < '2021-01-01 00:00:00']

#Display resulting dataframe that include stock returns
df_msft

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,ret
2016-01-29,49.425509,49.750618,48.766262,49.750618,83611700,0.0,0,-0.00703
2016-02-29,46.706068,46.97894,46.078471,46.278576,31654000,0.0,0,-0.069789
2016-03-31,49.980492,50.562612,49.898631,50.235168,26360500,0.0,0,0.085495
2016-04-29,44.886936,45.705544,44.886936,45.359909,48411700,0.0,0,-0.097049
2016-05-31,47.866236,48.544022,47.701372,48.544022,37653100,0.0,0,0.070197
2016-06-30,46.455707,46.986942,46.254202,46.86787,28527800,0.0,0,-0.034528
2016-07-29,51.529925,51.987888,51.337582,51.914616,30558700,0.0,0,0.10768
2016-08-31,53.132171,53.270413,52.809597,52.957058,20860300,0.0,0,0.02008
2016-09-30,53.058446,53.242773,52.846471,53.086094,29910800,0.0,0,0.002437
2016-10-31,55.445475,55.685098,55.224281,55.224281,26434700,0.0,0,0.040278
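As a check on what `pct_change()` computes, the sketch below reproduces the first two returns by hand from the month-end Close prices shown earlier (Dec 2015, Jan 2016, Feb 2016). The variable names are illustrative only.

```python
import pandas as pd

# Month-end Close prices taken from the tables above
close = pd.Series(
    [50.102825, 49.750618, 46.278576],
    index=pd.to_datetime(["2015-12-31", "2016-01-29", "2016-02-29"]),
    name="Close",
)

# pct_change() is the simple return: P_t / P_(t-1) - 1
ret = close.pct_change()
manual = close / close.shift(1) - 1

# The first entry is NA because there is no prior month to compare against
print(pd.DataFrame({"pct_change": ret, "manual": manual}))
```

The first row has no prior price, which is why the notebook downloads data starting in December 2015 and then drops that row.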


###Upload CSV file containing FF Risk Factors for 2016-2020 (Monthly data)

In [9]:
ff_factors = pd.read_csv('FF-Factors-2016-2020.csv', parse_dates = ['dateff'],  index_col=['dateff'])

###Print out dataframe of FF Risk Factors
* Rename index to "Date" to match "Date" index for MSFT stock data

In [10]:
#ff_factors.rename(columns={'dateff':'date'}, inplace=True)
ff_factors.index.rename('Date', inplace=True)
ff_factors.head()

Date,mktrf,smb,hml,rf,umd
2016-01-29,-0.0577,-0.0339,0.0207,0.0001,0.0134
2016-02-29,-0.0008,0.0081,-0.0057,0.0002,-0.0406
2016-03-31,0.0696,0.0075,0.011,0.0002,-0.0513
2016-04-29,0.0092,0.0067,0.0321,0.0001,-0.0626
2016-05-31,0.0178,-0.0019,-0.0165,0.0001,0.0214
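The load-and-rename step can be tried without the CSV file by reading from an in-memory string. This sketch assumes the file has a `dateff` column followed by the factor columns, with ISO-formatted dates; the actual layout of `FF-Factors-2016-2020.csv` is not shown in this notebook, so treat that as an assumption. The two data rows are copied from the output above.

```python
import io
import pandas as pd

# Assumed CSV layout (the real file's date format may differ)
csv_text = """dateff,mktrf,smb,hml,rf,umd
2016-01-29,-0.0577,-0.0339,0.0207,0.0001,0.0134
2016-02-29,-0.0008,0.0081,-0.0057,0.0002,-0.0406
"""

# Parse the date column and use it as the index, as in the cell above
ff = pd.read_csv(io.StringIO(csv_text), parse_dates=["dateff"], index_col="dateff")

# Rename the index so it matches the "Date" index of the stock data
ff = ff.rename_axis("Date")
print(ff)
```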


###Join the two dataframes based on "Date" index

In [11]:
all = df_msft.join(ff_factors, how='outer')

#Print out combined dataframe
all

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,ret,mktrf,smb,hml,rf,umd
2016-01-29,49.425509,49.750618,48.766262,49.750618,83611700,0.0,0,-0.00703,-0.0577,-0.0339,0.0207,0.0001,0.0134
2016-02-29,46.706068,46.97894,46.078471,46.278576,31654000,0.0,0,-0.069789,-0.0008,0.0081,-0.0057,0.0002,-0.0406
2016-03-31,49.980492,50.562612,49.898631,50.235168,26360500,0.0,0,0.085495,0.0696,0.0075,0.011,0.0002,-0.0513
2016-04-29,44.886936,45.705544,44.886936,45.359909,48411700,0.0,0,-0.097049,0.0092,0.0067,0.0321,0.0001,-0.0626
2016-05-31,47.866236,48.544022,47.701372,48.544022,37653100,0.0,0,0.070197,0.0178,-0.0019,-0.0165,0.0001,0.0214
2016-06-30,46.455707,46.986942,46.254202,46.86787,28527800,0.0,0,-0.034528,-0.0005,0.0059,-0.0145,0.0002,0.0411
2016-07-29,51.529925,51.987888,51.337582,51.914616,30558700,0.0,0,0.10768,0.0395,0.0251,-0.0129,0.0002,-0.0308
2016-08-31,53.132171,53.270413,52.809597,52.957058,20860300,0.0,0,0.02008,0.005,0.0117,0.0311,0.0002,-0.0317
2016-09-30,53.058446,53.242773,52.846471,53.086094,29910800,0.0,0,0.002437,0.0025,0.0213,-0.0121,0.0002,-0.0053
2016-10-31,55.445475,55.685098,55.224281,55.224281,26434700,0.0,0,0.040278,-0.0202,-0.0441,0.0412,0.0002,0.0069
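The join works here because both dataframes are indexed by the same last-trading-day dates. The sketch below shows what could happen if they were not: it uses a hypothetical factor row dated on the calendar month end (2016-01-31) instead of the trading day (2016-01-29). That mismatch is invented for illustration and does not occur in the data above.

```python
import pandas as pd

stock = pd.DataFrame(
    {"ret": [-0.00703, -0.069789]},
    index=pd.to_datetime(["2016-01-29", "2016-02-29"]),
)
# Hypothetical: the first factor date is a calendar month end, not a trading day
factors = pd.DataFrame(
    {"mktrf": [-0.0577, -0.0008]},
    index=pd.to_datetime(["2016-01-31", "2016-02-29"]),
)

# An outer join keeps unmatched dates from both sides and fills the gaps with NaN
outer = stock.join(factors, how="outer")
# An inner join keeps only dates present in both dataframes
inner = stock.join(factors, how="inner")

# One possible workaround: align on the calendar month instead of the exact day
stock_m = stock.copy()
stock_m.index = stock.index.to_period("M")
factors_m = factors.copy()
factors_m.index = factors.index.to_period("M")
by_month = stock_m.join(factors_m, how="inner")

print(outer, inner, by_month, sep="\n\n")
```

Checking the joined dataframe for NaN values (for example with `isna().sum()`) before running a regression is a cheap way to catch this kind of misalignment.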


### Run FF Regression to Explain Monthly Stock Returns of MSFT
* See Lecture 5/6 notes on Fama French Regressions
 * [Ret(MSFT) - Rf] = Alpha + B1(MktRet-Rf) + B2(SMB) + B3(HML) + e
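To make the specification above concrete, the sketch below fits the same three-factor form on synthetic data with known coefficients, using plain NumPy least squares rather than statsmodels. The factor values, `true_alpha`, and `true_betas` are made up for illustration; the point is only that adding a column of ones lets the first coefficient play the role of Alpha.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60  # five years of monthly observations

# Synthetic factor returns (hypothetical, not real Fama-French data)
factors = rng.normal(0.0, 0.03, size=(n, 3))  # columns: mktrf, smb, hml
true_alpha = 0.01
true_betas = np.array([1.1, -0.3, -0.4])
noise = rng.normal(0.0, 0.001, size=n)

# Excess return generated from the three-factor equation
excess_ret = true_alpha + factors @ true_betas + noise

# Prepend a column of ones so the first coefficient is the intercept (Alpha)
X = np.column_stack([np.ones(n), factors])
coef, *_ = np.linalg.lstsq(X, excess_ret, rcond=None)
alpha_hat, betas_hat = coef[0], coef[1:]

print("alpha:", alpha_hat)
print("betas:", betas_hat)
```

The estimates should land close to the values used to generate the data, which is a quick sanity check that the regression is set up as intended before moving to real returns.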

In [12]:
import statsmodels.api as sm
#MSFT Regression
y = all["ret"] - all["rf"]
X = all[['mktrf' , 'smb' , 'hml']] 
# Use statsmodels
X = sm.add_constant(X) # adding a constant
model = sm.OLS(y, X).fit()

#Print Regression Statistics
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.560
Model:                            OLS   Adj. R-squared:                  0.536
Method:                 Least Squares   F-statistic:                     23.76
Date:                Thu, 07 Apr 2022   Prob (F-statistic):           4.75e-10
Time:                        19:00:40   Log-Likelihood:                 116.47
No. Observations:                  60   AIC:                            -224.9
Df Residuals:                      56   BIC:                            -216.6
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0114      0.005      2.268      0.0

