# Obtain Dataset from Yahoo Finance

We obtain our dataset from the Yahoo Finance website. The yfinance module lets us pull stock data into the notebook from Python code. yfinance only works with pandas version 0.24.2 or newer, so pandas must be upgraded to a recent version before running this notebook.
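
The version requirement above can be checked before any download is attempted. The following is a minimal sketch, not part of the original notebook: the helper names `version_tuple` and `meets_minimum` are hypothetical, and the parsing assumes version strings that start with dot-separated numbers.

```python
import re

MIN_PANDAS = (0, 24, 2)  # minimum pandas version stated in the text above


def version_tuple(version):
    """Turn a version string such as '0.24.2' into a tuple of ints.

    Only the leading numeric components are kept, so a suffix such as
    'rc1' is ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum=MIN_PANDAS):
    """Return True if the version string is at least the minimum version."""
    return version_tuple(version) >= minimum


print(meets_minimum("0.22.0"))  # False: the version used with zipline
print(meets_minimum("0.24.2"))  # True: the minimum for yfinance
```

In this notebook one would pass `pd.__version__` to `meets_minimum` after importing pandas; if the result is `False`, upgrading with `pip install --upgrade pandas` should resolve it.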

In [1]:
# Modules needed to extract data 
import pandas as pd
from pandas_datareader import data as pdr
import yfinance as yf 

## Store stock data as CSV

zipline and yfinance are not compatible because zipline requires an older version of pandas. To work around this, we use this separate notebook to obtain the dataset: we save the data to CSV files here and then open those CSV files in the main notebook. The functions for saving and reading CSV files are the same in both versions of pandas. For the project itself, we work with pandas version 0.22.0, since that is the version compatible with zipline.

In [2]:
# Patch pandas_datareader so that get_data_yahoo() downloads through yfinance
yf.pdr_override()

# Conditions for our stock data
startDate = '2018-01-01'
endDate = '2018-12-31'
intervalCycle = '1mo'

# Small wrapper class that downloads the price history for one ticker
class StockDownLoad:
    def __init__(self, stock, start, end, interval):
        self.stock = stock
        self.start = start
        self.end = end
        self.interval = interval
    
    def GetData(self):  # Download the data from Yahoo Finance as a dataframe
        return pdr.get_data_yahoo(self.stock, self.start, self.end, interval=self.interval)
        
AAPLStock = StockDownLoad('AAPL',startDate, endDate, intervalCycle).GetData()
AMZNStock = StockDownLoad('AMZN',startDate, endDate, intervalCycle).GetData()
BAStock = StockDownLoad('BA',startDate, endDate, intervalCycle).GetData()
FBStock = StockDownLoad('FB',startDate, endDate, intervalCycle).GetData()
GOOGStock = StockDownLoad('GOOG',startDate, endDate, intervalCycle).GetData()
MAStock = StockDownLoad('MA',startDate, endDate, intervalCycle).GetData()
MSFTStock = StockDownLoad('MSFT',startDate, endDate, intervalCycle).GetData()
NVDAStock = StockDownLoad('NVDA',startDate, endDate, intervalCycle).GetData()
UNHStock = StockDownLoad('UNH',startDate, endDate, intervalCycle).GetData()
VStock = StockDownLoad('V',startDate, endDate, intervalCycle).GetData()

[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded


In [3]:
# Zipline only works with lowercase column names, so convert the columns of each dataframe to lowercase
# List of all the stock dataframes
StockData = [AAPLStock, AMZNStock, BAStock, FBStock, GOOGStock, MAStock, MSFTStock,
            NVDAStock, UNHStock, VStock]
# Lowercase the column names and drop rows with missing values
for stock in StockData:
    stock.columns = map(str.lower, stock.columns)
    stock.dropna(inplace=True)

In [4]:
# Save each dataframe to a CSV file named after its ticker
stocks = ["AAPL", "AMZN", "BA", "FB", "GOOG", "MA", "MSFT", "NVDA", "UNH", "V"]

for data, stock in zip(StockData,stocks):
    data.to_csv('{}.csv'.format(stock))
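
The CSV files written above are meant to be opened in the main notebook. The following is a minimal sketch of that round trip, not the main notebook's actual code: the helper name `load_stock_csv` is hypothetical, the file name `SAMPLE.csv` stands in for a real ticker file, and the values in `sample` are placeholders rather than real prices.

```python
import os
import tempfile

import pandas as pd

# Placeholder values for illustration only; these are not real prices.
sample = pd.DataFrame(
    {"open": [1.0, 2.0], "close": [1.5, 2.5], "volume": [100, 200]},
    index=pd.to_datetime(["2018-01-01", "2018-02-01"]),
)
sample.index.name = "Date"


def load_stock_csv(path):
    """Read one saved stock CSV and restore the dates as the index."""
    # The first column holds the dates written by to_csv, so use it as
    # the index and parse it back into datetimes.
    return pd.read_csv(path, index_col=0, parse_dates=True)


# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "SAMPLE.csv")
    sample.to_csv(csv_path)
    loaded = load_stock_csv(csv_path)

print(loaded)
```

In the main notebook the same helper would be called with a path such as `'AAPL.csv'`. Passing `index_col=0` and `parse_dates=True` matters because `to_csv` stores the date index as ordinary text in the first column.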
