# Yahoo Finance data gather

#### by Vinicius Lima

In this notebook you will apply a few things you learned in the [FinanceHub's Python lectures](https://github.com/Finance-Hub/FinanceHubMaterials/tree/master/Python%20Lectures):

* You will use and manipulate different kinds of variables in Python, such as strings, booleans, floats, dictionaries, and lists;
* You will use `pandas.DataFrame` objects and methods, which are very useful for manipulating financial time series;
* You will use if statements, loops, and list comprehensions; and
* You will use two tools made specifically for downloading Yahoo Finance data.

## Introduction

In this lecture, you will learn how to extract data from Yahoo Finance using two methods.

The first one uses the `pandas_datareader` library to gather Apple's stock data.

https://learndatasci.com/tutorials/python-finance-part-yahoo-finance-api-pandas-matplotlib/

In [19]:
from pandas_datareader import data
import pandas as pd

## Data gather

The following method is quite simple. Keep in mind, however, that it **does not work** for every stock, nor for other data such as currencies or commodities. Those cases are covered by the second method.

In [4]:
# We would like all available data from 2010-01-01 until 2019-12-31.
start_date = '2010-01-01'
end_date = '2019-12-31'

# Use pandas_datareader.data.DataReader to load the desired data. As simple as that.
# Note that the second argument of 'DataReader' names the source the data is downloaded from.
panel_data = data.DataReader('AAPL', 'yahoo', start_date, end_date)
panel_data

Date,High,Low,Open,Close,Volume,Adj Close
2010-01-04,30.642857,30.340000,30.490000,30.572857,123432400.0,26.601469
2010-01-05,30.798571,30.464285,30.657143,30.625713,150476200.0,26.647457
2010-01-06,30.747143,30.107143,30.625713,30.138571,138040000.0,26.223597
2010-01-07,30.285715,29.864286,30.250000,30.082857,119282800.0,26.175119
2010-01-08,30.285715,29.865715,30.042856,30.282858,111902700.0,26.349140
...,...,...,...,...,...,...
2019-12-24,284.890015,282.920013,284.690002,284.269989,12119700.0,284.269989
2019-12-26,289.980011,284.700012,284.820007,289.910004,23280300.0,289.910004
2019-12-27,293.970001,288.119995,291.119995,289.799988,36566500.0,289.799988
2019-12-30,292.690002,285.220001,289.459991,291.519989,36028600.0,291.519989


## Filtering data

Note that the data has multiple columns, so for the next part we isolate only what we need to analyse. At this point, you can start to bring in tools that you learned in other lectures and make use of all this data.

In [5]:
# Now that we have the data we need, we can start to manipulate it. Let's isolate the Close prices.
close = panel_data['Close']

# Then we can get all weekdays from the start to end date.
all_weekdays = pd.date_range(start_date, end_date, freq = 'B')

# With the dates set, we align them with the close prices we have, using the reindex method.
close = close.reindex(all_weekdays)
pd.DataFrame(close)

Date,Close
2010-01-01,
2010-01-04,30.572857
2010-01-05,30.625713
2010-01-06,30.138571
2010-01-07,30.082857
...,...
2019-12-25,
2019-12-26,289.910004
2019-12-27,289.799988
2019-12-30,291.519989
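
The alignment step can be illustrated on a tiny series. This is a minimal sketch with made-up prices (the values and the variable names `prices`, `weekdays`, and `aligned` are illustrative, not taken from the download above). It shows that `freq='B'` generates every Monday-to-Friday date, so a market holiday that falls on a weekday appears as a NaN row after `reindex`.

```python
import pandas as pd

# Hypothetical prices for three trading days; 2020-01-01 (a Wednesday)
# is a market holiday, so no price exists for it.
prices = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.to_datetime(["2019-12-31", "2020-01-02", "2020-01-03"]),
    name="Close",
)

# 'B' is the business-day frequency: Monday to Friday, ignoring exchange holidays.
weekdays = pd.date_range("2019-12-31", "2020-01-03", freq="B")

# reindex keeps the values whose dates match and inserts NaN for the others.
aligned = prices.reindex(weekdays)
print(aligned)
```

This is why 2010-01-01 and 2019-12-25 show up without a price in the output above: both are weekdays on which the market was closed.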


In [6]:
# As you can see, there are NaNs in our data. We can fill the gaps by replacing them
# with the latest available price.
close = close.ffill()
pd.DataFrame(close)

Date,Close
2010-01-01,
2010-01-04,30.572857
2010-01-05,30.625713
2010-01-06,30.138571
2010-01-07,30.082857
...,...
2019-12-25,284.269989
2019-12-26,289.910004
2019-12-27,289.799988
2019-12-30,291.519989
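
Note that the first row (2010-01-01) is still empty after the forward fill. A minimal sketch with made-up values suggests why: a forward fill only copies earlier prices forward, so a gap at the very start has nothing to copy from. Dropping that row with `dropna()` is one possible way to handle it, shown here as an assumption rather than as the notebook's own approach.

```python
import numpy as np
import pandas as pd

# Four business days starting on Friday 2010-01-01, with hypothetical prices.
idx = pd.date_range("2010-01-01", periods=4, freq="B")
close = pd.Series([np.nan, 30.57, np.nan, 30.14], index=idx, name="Close")

# Forward fill: each NaN takes the most recent earlier value, if there is one.
filled = close.ffill()

# The leading NaN has no earlier value, so it survives; here we simply drop it.
cleaned = filled.dropna()
print(filled)
print(cleaned)
```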


In [21]:
# Next we can get some more information from this data using the "describe" method.
pd.DataFrame(close.describe())

Statistic,Close
count,2607.0
mean,110.315829
std,56.108225
min,27.435715
25%,64.781429
50%,99.519997
75%,150.445
max,293.649994


## Second Method 

The previous method is simple and can be effective depending on your goals; however, it is limited because Yahoo deprecated some features.

This second method uses a library created as an alternative. It keeps the simplicity, and the returned data is in the same format as that of **pandas_datareader**, so if you already have some work built on that library you won't need to change it.

https://aroussi.com/post/python-yahoo-finance

First we import this library called **yfinance**.

In [8]:
import yfinance as yf

The library is built around the **Ticker** class, whose input parameter is the symbol of the market data you want. In this case we will use Apple so we can compare with the previous method.

In [25]:
# The core of the library is the Ticker class.
apple = yf.Ticker("AAPL")

In [27]:
# After we instantiate it, we can start working with its data.
# We can get some general information about the stock with info.
apple.info

{'zip': '95014',
 'sector': 'Technology',
 'fullTimeEmployees': 137000,
 'longBusinessSummary': 'Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. It also sells various related services. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple TV, Apple Watch, Beats products, HomePod, iPod touch, and other Apple-branded and third-party accessories. It also provides digital content stores and streaming services; AppleCare support services; and iCloud, a cloud service, which stores music, photos, contacts, calendars, mail, documents, and others. In addition, the company offers various service, such as Apple Arcade, a game subscription service; Apple Card, a co-branded credit card; Apple News+, a subscription news and magazine service; and Apple Pay, a cashless payment service, as well 
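
Since `info` is shown above as a plain Python dictionary, the usual dictionary tools apply to it. The sketch below uses a small stand-in dictionary holding three of the entries shown above, so it runs without a network connection. The key `some_missing_key` is purely hypothetical and only illustrates how `get` behaves when a field is absent, which can happen because the set of fields may differ between tickers.

```python
# Stand-in for apple.info, holding three of the entries shown above.
info = {
    "zip": "95014",
    "sector": "Technology",
    "fullTimeEmployees": 137000,
}

# Direct access works for keys that exist.
sector = info["sector"]

# get() returns a default instead of raising KeyError when a key is absent.
# 'some_missing_key' is a hypothetical name used only for illustration.
missing = info.get("some_missing_key", "n/a")

# A dict comprehension keeps only the text-valued entries.
text_fields = {key: value for key, value in info.items() if isinstance(value, str)}
print(sector, missing, sorted(text_fields))
```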

For more quantitative data, we use the **history** method.

This method has some useful parameters, listed below:

- period: data period to download (use either `period` or `start` and `end`). Valid periods are: 1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max
- interval: data interval (intraday data cannot go back further than the last 60 days). Valid intervals are: 1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo
- start: if not using `period`, the download start date as a string (YYYY-MM-DD) or datetime
- end: if not using `period`, the download end date as a string (YYYY-MM-DD) or datetime
- prepost: whether to include pre- and post-market data in the results (default is False)
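
The rules in this list can be written down as a small check before calling `history`. The helper below, `history_kwargs`, is hypothetical (it is not part of yfinance); it only encodes the list above: use either `period` or `start`/`end`, and pick values from the valid sets. It is a sketch under those assumptions and makes no network call.

```python
# Valid values copied from the parameter list above.
VALID_PERIODS = {"1d", "5d", "1mo", "3mo", "6mo", "1y", "2y", "5y", "10y", "ytd", "max"}
VALID_INTERVALS = {"1m", "2m", "5m", "15m", "30m", "60m", "90m",
                   "1h", "1d", "5d", "1wk", "1mo", "3mo"}


def history_kwargs(period=None, start=None, end=None, interval="1d", prepost=False):
    """Hypothetical helper: build keyword arguments for Ticker.history."""
    if period is not None and (start is not None or end is not None):
        raise ValueError("use either period or start/end, not both")
    if period is not None and period not in VALID_PERIODS:
        raise ValueError(f"invalid period: {period}")
    if interval not in VALID_INTERVALS:
        raise ValueError(f"invalid interval: {interval}")

    kwargs = {"interval": interval, "prepost": prepost}
    if period is not None:
        kwargs["period"] = period
    else:
        if start is not None:
            kwargs["start"] = start
        if end is not None:
            kwargs["end"] = end
    return kwargs


# Intended use (needs yfinance and a network connection):
#   apple.history(**history_kwargs(period="1mo", interval="1d"))
print(history_kwargs(start="2010-01-01", end="2019-12-31"))
```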

In [26]:
apple.history(start="2010-01-01", end="2019-12-31")

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits
2010-01-04,26.53,26.66,26.40,26.60,123432400,0.0,0.0
2010-01-05,26.67,26.80,26.51,26.65,150476200,0.0,0.0
2010-01-06,26.65,26.75,26.20,26.22,138040000,0.0,0.0
2010-01-07,26.32,26.35,25.98,26.18,119282800,0.0,0.0
2010-01-08,26.14,26.35,25.99,26.35,111902700,0.0,0.0
...,...,...,...,...,...,...,...
2019-12-23,280.53,284.25,280.37,284.00,24643000,0.0,0.0
2019-12-24,284.69,284.89,282.92,284.27,12119700,0.0,0.0
2019-12-26,284.82,289.98,284.70,289.91,23280300,0.0,0.0
2019-12-27,291.12,293.97,288.12,289.80,36566500,0.0,0.0


## Mass download of market data

We can download multiple tickers at once using **download**.

In [29]:
data = yf.download("AAPL GOOG MSFT", start="2010-01-01", end="2019-12-31")

[*********************100%***********************]  3 of 3 completed


In [31]:
data.T

Field,Ticker,2010-01-04,2010-01-05,2010-01-06,2010-01-07,2010-01-08,2010-01-11,2010-01-12,2010-01-13,2010-01-14,2010-01-15,...,2019-12-16,2019-12-17,2019-12-18,2019-12-19,2019-12-20,2019-12-23,2019-12-24,2019-12-26,2019-12-27,2019-12-30
Adj Close,AAPL,26.60147,26.64746,26.2236,26.17512,26.34914,26.1167,25.81962,26.18382,26.03218,25.59713,...,279.86,280.41,279.74,280.02,279.44,284.0,284.27,289.91,289.8,291.52
Adj Close,GOOG,312.2048,310.8299,302.9943,295.9407,299.886,299.4326,294.1375,292.4488,293.8237,288.9171,...,1361.17,1355.12,1352.62,1356.04,1349.59,1348.84,1343.56,1360.4,1351.89,1336.14
Adj Close,MSFT,24.36073,24.3686,24.21905,23.96717,24.13247,23.8255,23.66807,23.88846,24.3686,24.28989,...,155.53,154.69,154.37,155.71,157.41,157.41,157.38,158.67,158.96,157.59
Close,AAPL,30.57286,30.62571,30.13857,30.08286,30.28286,30.01571,29.67429,30.09286,29.91857,29.41857,...,279.86,280.41,279.74,280.02,279.44,284.0,284.27,289.91,289.8,291.52
Close,GOOG,312.2048,310.8299,302.9943,295.9407,299.886,299.4326,294.1375,292.4488,293.8237,288.9171,...,1361.17,1355.12,1352.62,1356.04,1349.59,1348.84,1343.56,1360.4,1351.89,1336.14
Close,MSFT,30.95,30.96,30.77,30.45,30.66,30.27,30.07,30.35,30.96,30.86,...,155.53,154.69,154.37,155.71,157.41,157.41,157.38,158.67,158.96,157.59
High,AAPL,30.64286,30.79857,30.74714,30.28572,30.28572,30.42857,29.96714,30.13286,30.06571,30.22857,...,280.79,281.77,281.9,281.18,282.65,284.25,284.89,289.98,293.97,292.69
High,GOOG,313.5796,312.7477,311.7614,303.8611,300.4987,301.1014,297.9632,293.0914,295.9905,295.6718,...,1364.68,1365.0,1360.47,1358.1,1363.64,1359.8,1350.26,1361.327,1364.53,1353.0
High,MSFT,31.1,31.1,31.08,30.7,30.88,30.76,30.4,30.52,31.1,31.24,...,155.9,155.71,155.48,155.77,158.49,158.12,157.71,158.73,159.55,159.02
Low,AAPL,30.34,30.46428,30.10714,29.86429,29.86572,29.77857,29.48857,29.15714,29.86,29.41,...,276.98,278.8,279.12,278.95,278.56,280.37,282.92,284.7,288.12,285.22
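
The transposed output suggests that the downloaded frame has two-level column labels: the price field first, then the ticker. The sketch below rebuilds a small frame with that layout by hand, using a few of the values shown above, to illustrate how one field or one ticker can be selected. The layout is an assumption based on the output in this notebook and may differ in other yfinance versions. The name `frame` is illustrative.

```python
import pandas as pd

# Two-level columns in the (field, ticker) layout seen in the output above.
columns = pd.MultiIndex.from_product([["Close", "High"], ["AAPL", "MSFT"]])
dates = pd.to_datetime(["2019-12-27", "2019-12-30"])
frame = pd.DataFrame(
    [[289.80, 158.96, 293.97, 159.55],
     [291.52, 157.59, 292.69, 159.02]],
    index=dates,
    columns=columns,
)

# Selecting the first level gives one column per ticker.
close = frame["Close"]

# A (field, ticker) tuple selects a single series.
msft_close = frame[("Close", "MSFT")]

# xs() on the second level gives all fields for one ticker.
aapl = frame.xs("AAPL", axis=1, level=1)
print(close)
print(msft_close)
print(aapl)
```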


## Exercise

Suppose we would like to plot the MSFT time series. We would also like to see how the stock behaves compared to a short-term and a longer-term moving average of its price.

A simple moving average of the original time series is calculated by taking, for each date, the average of the last W prices (including the price on the date of interest). pandas provides `rolling()`, a built-in method of Series that returns a rolling object for a user-defined window, e.g. 20 days.

Once a rolling object has been obtained, a number of functions can be applied to it, such as `sum()`, `std()` (to calculate the standard deviation of the values in the window) or `mean()`.
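
As a hint, here is a minimal sketch of `rolling()` on a made-up series of five prices with a window of W = 3 (the values and names are illustrative only). The first W - 1 entries of the result are NaN because the window is not yet full.

```python
import pandas as pd

# Hypothetical prices, used only to show the mechanics of rolling().
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], name="Close")

# A rolling object over a 3-observation window.
window = prices.rolling(window=3)

# Simple moving average: mean of the last 3 prices, including the current one.
sma = window.mean()

# Rolling standard deviation over the same window.
vol = window.std()
print(sma)
print(vol)
```

For the exercise, the same calls can be applied to the MSFT close prices with two different window lengths.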

Try to plot the MSFT time series together with a short-term and a longer-term moving average.