# Obtain Dataset from Yahoo Finance

We obtain our dataset from Yahoo Finance. The yfinance module lets us pull stock data into the notebook directly from Python code. It only works with pandas version 0.24.2 and newer, so pandas must be upgraded to the latest version before this notebook is run.
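As a rough illustration of the version requirement above, the following sketch compares a version string against the stated minimum of 0.24.2. The helper names `version_tuple` and `meets_minimum` are hypothetical and are not part of pandas or yfinance.

```python
import re

MIN_PANDAS = (0, 24, 2)  # minimum pandas version stated above


def version_tuple(version):
    """Turn a version string such as '0.24.2' into a tuple of three ints."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits, so '0rc1' becomes 0
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.0' to three components
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum=MIN_PANDAS):
    """Return True when version is at least the given minimum."""
    return version_tuple(version) >= minimum
```

In the notebook this could be called as `meets_minimum(pd.__version__)` right after pandas is imported.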

In [27]:
# Modules needed to extract data 
import pandas as pd
from pandas_datareader import data as pdr
import yfinance as yf 

## Store stock data as CSV

The zipline and yfinance modules are not compatible, because zipline depends on an older version of pandas. To work around this, we use this separate notebook to obtain the dataset: we save the data to CSV files here and then open those files in the main notebook. The functions for saving and reading CSV files are the same in both versions of pandas. For the rest of the project we work with pandas version 0.22.0, since that is the version compatible with zipline.
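To make the CSV hand-off concrete, here is a minimal sketch of how the main notebook might read one of the saved files back. The helper `load_stock_csv` and the stand-in file `SAMPLE.csv` are assumptions for illustration; the real files are the per-ticker CSVs written later in this notebook.

```python
import os
import tempfile

import pandas as pd


def load_stock_csv(path):
    """Read one saved stock CSV back into a DataFrame indexed by date."""
    # The 'Date' column written by to_csv becomes a parsed datetime index
    return pd.read_csv(path, index_col="Date", parse_dates=True)


# Tiny stand-in file so the sketch runs without the real downloads
sample = pd.DataFrame(
    {"open": [1.0, 2.0], "close": [1.5, 2.5], "volume": [100, 200]},
    index=pd.to_datetime(["2013-01-02", "2013-01-03"]),
)
sample.index.name = "Date"

sample_path = os.path.join(tempfile.mkdtemp(), "SAMPLE.csv")
sample.to_csv(sample_path)

restored = load_stock_csv(sample_path)
```

Only `to_csv` and `read_csv` cross the boundary between the two notebooks, so neither environment needs the other's pandas version.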

In [31]:
# Download the stock data from yahoo finance in pandas dataframe
yf.pdr_override()
AAPLStock = pdr.get_data_yahoo("AAPL",
                               start = '2013-01-01',
                               end = '2018-12-31',
                               interval = '1d')

AMZNStock = pdr.get_data_yahoo("AMZN",
                               start = '2013-01-01',
                               end = '2018-12-31',
                               interval = '1d')

BAStock = pdr.get_data_yahoo("BA",
                             start = '2013-01-01',
                             end = '2018-12-31',
                             interval = '1d')

FBStock = pdr.get_data_yahoo("FB",
                             start = '2013-01-01',
                             end = '2018-12-31',
                             interval = '1d')

GOOGStock = pdr.get_data_yahoo("GOOG",
                          start = '2013-01-01',
                          end = '2018-12-31',
                          interval = '1d')

MAStock = pdr.get_data_yahoo("MA",
                        start = '2013-01-01',
                        end = '2018-12-31',
                        interval = '1d')

MSFTStock = pdr.get_data_yahoo("MSFT",
                          start = '2013-01-01',
                          end = '2018-12-31',
                          interval = '1d')

NVDAStock = pdr.get_data_yahoo("NVDA",
                          start = '2013-01-01',
                          end = '2018-12-31',
                          interval = '1d')

UNHStock = pdr.get_data_yahoo("UNH",
                         start = '2013-01-01',
                         end = '2018-12-31',
                         interval = '1d')

VStock = pdr.get_data_yahoo("V",
                       start = '2013-01-01',
                       end = '2018-12-31',
                       interval = '1d')

[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
[*********************100%***********************]  1 of 1 downloaded
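The ten download calls above differ only in the ticker symbol, so they could also be written as a loop. The sketch below is one possible shape, not the notebook's actual code: `download_all` is a hypothetical helper, and `fetch` stands for whichever download function is in use, for example `pdr.get_data_yahoo` as called above.

```python
def download_all(tickers, fetch, start="2013-01-01", end="2018-12-31",
                 interval="1d"):
    """Call fetch once per ticker and collect the results in a dict."""
    frames = {}
    for ticker in tickers:
        # Same keyword arguments as the individual calls above
        frames[ticker] = fetch(ticker, start=start, end=end,
                               interval=interval)
    return frames
```

A call such as `download_all(["AAPL", "AMZN"], pdr.get_data_yahoo)` would then return a dict keyed by ticker. This keeps the date range in one place, at the cost of replacing the named variables with dict lookups.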


In [32]:
# Zipline only works with lowercase column names, so convert the
# column names of every dataframe to lowercase
# All the stock dataframes
StockData = [AAPLStock, AMZNStock, BAStock, FBStock, GOOGStock, MAStock, MSFTStock,
            NVDAStock, UNHStock, VStock]
# Iterate over the dataframes and convert their columns (VStock included)
for stock in StockData:
    stock.columns = map(str.lower, stock.columns)
    

             open   high    low  close  adj close     volume
Date
2013-01-02  27.44  28.18  27.42  28.00      28.00   69846400
2013-01-03  27.88  28.47  27.59  27.77      27.77   63140600
2013-01-04  28.01  28.93  27.83  28.76      28.76   72715400
2013-01-07  28.69  29.79  28.65  29.42      29.42   83781800
2013-01-08  29.51  29.60  28.86  29.06      29.06   45871300
2013-01-09  29.67  30.60  29.49  30.59      30.59  104787700
2013-01-10  30.60  31.45  30.28  31.30      31.30   95316400
2013-01-11  31.28  31.96  31.10  31.72      31.72   89598000
2013-01-14  32.08  32.21  30.62  30.95      30.95   98892800
2013-01-15  30.64  31.71  29.88  30.10      30.10  173242600
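The column conversion above modifies each dataframe in place. A non-mutating variant is sketched below, assuming only the standard `DataFrame.rename` method; the helper name `lowercase_columns` and the sample frame are hypothetical.

```python
import pandas as pd


def lowercase_columns(frame):
    """Return a copy of frame whose column labels are all lowercase."""
    # rename applies the callable to every column label
    return frame.rename(columns=str.lower)


# Small illustrative frame with Yahoo-style capitalised column names
raw = pd.DataFrame({"Open": [27.44], "Adj Close": [28.00], "Volume": [69846400]})
lowered = lowercase_columns(raw)
```

Here `raw` keeps its original labels, which can be convenient when the untouched download is still needed elsewhere.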


In [33]:
# Save each dataframe to a CSV file named after its ticker
# (the order of stocks must match the order of StockData)
stocks = ["AAPL", "AMZN", "BA", "FB", "GOOG", "MA", "MSFT", "NVDA", "UNH", "V"]

for data, stock in zip(StockData,stocks):
    data.to_csv('{}.csv'.format(stock))


AAPL
              open    high     low   close  adj close     volume
Date                                                            
2013-01-02   79.12   79.29   77.38   78.43      55.47  140129500
2013-01-03   78.27   78.52   77.29   77.44      54.77   88241300
2013-01-04   76.71   76.95   75.12   75.29      53.25  148583400
2013-01-07   74.57   75.61   73.60   74.84      52.93  121039100
2013-01-08   75.60   75.98   74.46   75.04      53.07  114676800
2013-01-09   74.64   75.00   73.71   73.87      52.25  101901100
2013-01-10   75.51   75.53   73.65   74.79      52.89  150286500
2013-01-11   74.43   75.05   74.15   74.33      52.57   87626700
2013-01-14   71.81   72.50   71.22   71.68      50.69  183551900
2013-01-15   71.19   71.28   69.05   69.42      49.09  219193100
2013-01-16   70.66   72.78   70.36   72.30      51.13  172701200
2013-01-17   72.90   72.96   71.72   71.81      50.79  113419600
2013-01-18   71.22   71.75   70.91   71.43      50.52  118230700
2013-01-22   72.08  

NVDA
              open    high     low   close  adj close    volume
Date                                                           
2013-01-02   12.56   12.73   12.51   12.72      11.82  11970900
2013-01-03   12.72   12.87   12.58   12.73      11.83   7472200
2013-01-04   12.75   13.19   12.71   13.15      12.22  13124200
2013-01-07   13.14   13.18   12.68   12.77      11.87  15268300
2013-01-08   12.80   12.84   12.40   12.49      11.61  11660600
2013-01-09   12.59   12.65   12.13   12.21      11.35  17375500
2013-01-10   12.32   12.38   12.16   12.23      11.37  12659200
2013-01-11   12.28   12.29   12.09   12.21      11.35  12829300
2013-01-14   12.29   12.29   12.06   12.20      11.34   7642100
2013-01-15   12.14   12.14   11.91   11.98      11.14   9397200
2013-01-16   11.96   12.19   11.96   12.09      11.24   8434400
2013-01-17   12.13   12.30   12.10   12.25      11.39  14518400
2013-01-18   12.25   12.25   12.02   12.17      11.31   9927200
2013-01-22   12.16   12.27   12.05 