# Summary - Web Scraping and APIs

In Python, if you think "There must be an easier way to do this!" then you are probably right.

Below is an example of custom code to pull finance data. It uses the Python requests library and regular expressions to parse the pages.

However, there is an API for this data through the [Pandas DataReader](https://pandas-datareader.readthedocs.io/en/latest/remote_data.html) or through Quandl.

I prefer Quandl to Pandas DataReader (PDR). It is more stable and has additional functions. To use Quandl, you need to register for a free account at www.quandl.com.
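
The examples later in this tutorial pass an API key directly in the code. A minimal sketch of one alternative is to read the key from an environment variable so it stays out of the notebook. The helper `load_api_key` and the variable name `QUANDL_API_KEY` are hypothetical choices for this sketch, not part of Quandl or Pandas DataReader.

```python
import os


def load_api_key(var_name="QUANDL_API_KEY"):
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

The returned string can then be passed wherever the examples below use a hard-coded key.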

# Tutorial Overview
This tutorial is divided into 4 parts:
1. A custom web scraping solution (not recommended)
2. Example with Pandas DataReader
3. Example with Quandl
4. Your turn!

In [9]:
# https://flinhong.com/2017/10/30/query-yahoo-finance-historical-data-via-python-requests/

import re
import sys
import time
import datetime
import requests
import pandas as pd


def get_cookie_value(r):
    return {'B': r.cookies['B']}

def get_page_data(symbol):
    url = "https://finance.yahoo.com/quote/%s/?p=%s" % (symbol, symbol)
    r = requests.get(url)
    cookie = get_cookie_value(r)
    lines = r.content.decode('unicode-escape').strip().replace('}', '\n')
    return cookie, lines.split('\n')

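def find_crumb_store(lines):
    # Return the first line that mentions CrumbStore.
    for l in lines:
        if re.findall(r'CrumbStore', l):
            return l
    # Fail here with a clear message; returning None would only break later
    # in split_crumb_store with a less helpful error.
    raise ValueError("Did not find CrumbStore")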

def split_crumb_store(v):
    return v.split(':')[2].strip('"')

def get_cookie_crumb(symbol):
    cookie, lines = get_page_data(symbol)
    crumb = split_crumb_store(find_crumb_store(lines))
    return cookie, crumb

def get_data(symbol, start_date, end_date, cookie, crumb):
    url = "https://query1.finance.yahoo.com/v7/finance/download/%s?period1=%s&period2=%s&interval=1d&events=history&crumb=%s" % (symbol, start_date, end_date, crumb)
    response = requests.get(url, cookies=cookie)
    return response

def get_now_epoch():
    return int(time.time())

def convert_to_dataframe(data, symbol):
    data = data.content.decode("utf-8").splitlines()
    data = [i.strip(',') for i in data]
    data = pd.DataFrame(data)
    data = data[0].str.split(',', expand=True)
    cols = data.iloc[0,:].values.tolist()
    data.columns = cols
    data = data[1:]
    data['symbol'] = symbol
    data['Date'] = pd.DatetimeIndex(data['Date'])
    data[['Open','High','Low','Close','Adj Close','Volume']] = data[['Open','High','Low','Close','Adj Close','Volume']].astype(float)
    return data

def download_quotes(symbols):

    if not isinstance(symbols, list):
        symbols = [symbols]
        
    start_date = 0
    end_date = get_now_epoch()
    
    stocks = []
    for symbol in symbols:
        cookie, crumb = get_cookie_crumb(symbol)
        data = get_data(symbol, start_date, end_date, cookie, crumb)
        stocks.append(convert_to_dataframe(data, symbol))
        
    return pd.concat(stocks, axis=0)
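
The scraper above always requests the full history because `start_date` is fixed at 0 (the Unix epoch). A minimal sketch of how a calendar date could be converted into the epoch seconds used for `period1` and `period2` is shown below. The helper name `date_to_epoch` is hypothetical, and treating the date as midnight UTC is an assumption of this sketch.

```python
import datetime


def date_to_epoch(year, month, day):
    """Return seconds since 1970-01-01 UTC for midnight UTC on the given date."""
    dt = datetime.datetime(year, month, day, tzinfo=datetime.timezone.utc)
    return int(dt.timestamp())


# Example: a start date of January 1, 2000 instead of 0.
start_date = date_to_epoch(2000, 1, 1)
```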


In [10]:
df = download_quotes(['^GSPC','AMZN'])
df.head()



| | Date | Open | High | Low | Close | Adj Close | Volume | symbol |
|---|---|---|---|---|---|---|---|---|
| 1 | 1970-01-02 | 92.059998 | 93.540001 | 91.790001 | 93.0 | 93.0 | 8050000.0 | ^GSPC |
| 2 | 1970-01-05 | 93.0 | 94.25 | 92.529999 | 93.459999 | 93.459999 | 11490000.0 | ^GSPC |
| 3 | 1970-01-06 | 93.459999 | 93.809998 | 92.129997 | 92.82 | 92.82 | 11460000.0 | ^GSPC |
| 4 | 1970-01-07 | 92.82 | 93.379997 | 91.93 | 92.629997 | 92.629997 | 10010000.0 | ^GSPC |
| 5 | 1970-01-08 | 92.629997 | 93.470001 | 91.989998 | 92.68 | 92.68 | 10670000.0 | ^GSPC |
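
The `convert_to_dataframe` function above splits the CSV text by hand. As a sketch of a shorter route, `pd.read_csv` can parse the same kind of text directly and infer the numeric types. The sample text below reuses the first two rows of the output above, and the function name `csv_text_to_dataframe` is hypothetical.

```python
import io

import pandas as pd

# Sample text in the same shape as the downloaded CSV (two rows only).
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
1970-01-02,92.059998,93.540001,91.790001,93.0,93.0,8050000
1970-01-05,93.0,94.25,92.529999,93.459999,93.459999,11490000
"""


def csv_text_to_dataframe(text, symbol):
    """Parse CSV text into a DataFrame and tag every row with its symbol."""
    # parse_dates converts the Date column to datetime64 while reading.
    df = pd.read_csv(io.StringIO(text), parse_dates=["Date"])
    df["symbol"] = symbol
    return df


df = csv_text_to_dataframe(csv_text, "^GSPC")
```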


# Example with Pandas DataReader

In [17]:
import pandas_datareader.data as web
symbol = 'WIKI/AMZN'
api_key = "upASDYvVpTLeNUhQxSgE"
df = web.DataReader(symbol, 'quandl', api_key = api_key)
df.tail()

| Date | Open | High | Low | Close | Volume | ExDividend | SplitRatio | AdjOpen | AdjHigh | AdjLow | AdjClose | AdjVolume |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2015-04-28 | 438.51 | 439.0 | 428.04 | 429.31 | 4140483.0 | 0.0 | 1.0 | 438.51 | 439.0 | 428.04 | 429.31 | 4140483.0 |
| 2015-04-27 | 443.86 | 446.99 | 437.41 | 438.56 | 5430949.0 | 0.0 | 1.0 | 443.86 | 446.99 | 437.41 | 438.56 | 5430949.0 |
| 2015-04-24 | 439.0 | 452.65 | 439.0 | 445.1 | 17176904.0 | 0.0 | 1.0 | 439.0 | 452.65 | 439.0 | 445.1 | 17176904.0 |
| 2015-04-23 | 390.21 | 391.88 | 386.15 | 389.99 | 7979985.0 | 0.0 | 1.0 | 390.21 | 391.88 | 386.15 | 389.99 | 7979985.0 |
| 2015-04-22 | 391.91 | 394.28 | 388.0 | 389.8 | 3474724.0 | 0.0 | 1.0 | 391.91 | 394.28 | 388.0 | 389.8 | 3474724.0 |
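
The `tail()` output above lists the newest date first, which suggests the frame comes back in reverse chronological order. Here is a minimal sketch of putting such a frame into chronological order with `sort_index`, which is usually what you want before plotting or computing day-over-day changes. The prices below are synthetic and for illustration only.

```python
import pandas as pd

# Synthetic frame indexed by date, newest row first.
df = pd.DataFrame(
    {"Close": [3.0, 2.0, 1.0]},
    index=pd.to_datetime(["2015-04-24", "2015-04-23", "2015-04-22"]),
)
df.index.name = "Date"

# sort_index returns a new frame ordered from oldest to newest date.
chronological = df.sort_index()
```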


# Example with Quandl

In [4]:
import quandl
df = quandl.get("WIKI/AMZN", trim_start="1975-01-01", authtoken="upASDYvVpTLeNUhQxSgE")
df.tail()

In [8]:
df.loc['2012-02-15']['Close']

184.47
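
The lookup above works because the frame is indexed by date. Below is a small sketch of two common lookups on a date-indexed frame: a single value by row and column label, and a slice covering a range of dates. The numbers are synthetic and are not real market data.

```python
import pandas as pd

# Synthetic prices for five consecutive days.
dates = pd.date_range("2012-02-13", periods=5, freq="D")
prices = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0, 13.0, 14.0],
        "Close": [10.5, 11.5, 12.5, 13.5, 14.5],
    },
    index=dates,
)

# Single value: pass the row label and the column label to .loc together.
close_on_day = prices.loc[pd.Timestamp("2012-02-15"), "Close"]

# Date range: label slices on a DatetimeIndex include both endpoints.
window = prices.loc["2012-02-14":"2012-02-16"]
```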

# Your Turn!


1\. What was the "Open" price of Amazon stock (AMZN) on February 15, 2012?

2\. What was the highest volume of Amazon stock traded in one day?


3\. On what day did this occur?

4\. What was Amazon's biggest one-day stock price gain in dollar terms? In percentage terms?

5\. Create a line chart showing Amazon's stock price over time.

6\. Create a line chart showing the average monthly stock price of Amazon (AMZN) vs. the S&P 500 (^GSPC) since January 2000, using Adj Close.

hint: df['column'].dt.to_period("M") can convert a date to a month

7\. Create the same chart as in question 6, but normalize it so that both series are set to 1.0 in January 2000.
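
As a starting point for the last two questions, here is a rough sketch of the hint in action on synthetic data: group daily prices by month with `to_period("M")`, then divide by the first month so every series starts at 1.0. The tickers `AAA` and `BBB` and all prices are made up, and the column layout (Date, symbol, Adj Close) is assumed to match the frame returned by the custom scraper above.

```python
import pandas as pd

# Synthetic daily prices for two hypothetical tickers.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2000-01-03", "2000-01-04", "2000-02-01", "2000-02-02"] * 2
        ),
        "symbol": ["AAA"] * 4 + ["BBB"] * 4,
        "Adj Close": [1.0, 3.0, 4.0, 8.0, 10.0, 30.0, 10.0, 20.0],
    }
)

# Convert each date to its calendar month.
df["Month"] = df["Date"].dt.to_period("M")

# Average price per month, with one column per symbol.
monthly = df.groupby(["Month", "symbol"])["Adj Close"].mean().unstack("symbol")

# Divide every row by the first month so each series starts at 1.0.
normalized = monthly / monthly.iloc[0]
```

Calling `normalized.plot()` would then draw one line per symbol, assuming matplotlib is installed.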

# Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.
* Describe three examples where using an API for data would be better than downloading the data as spreadsheets.

# Further Reading
This section provides more resources on the topic if you are looking to go deeper.

## Books
* Python for Finance, by Yves Hilpisch. http://shop.oreilly.com/product/0636920032441.do
  * [Git Repo](https://github.com/yhilpisch/py4fi2nd)
* Mastering Python for Finance, by James Ma Weiming. https://www.amazon.com/dp/1789346460/
  * [Git Repo](https://github.com/PacktPublishing/Mastering-Python-for-Finance-Second-Edition)
* Python for Finance Cookbook, by Eryk Lewinson. https://www.packtpub.com/data/python-for-finance-cookbook
  * [Git Repo](https://github.com/erykml/Python-for-Finance-Cookbook)

## APIs
* Pandas DataReader. https://pandas-datareader.readthedocs.io/en/latest/ 
* Quandl. https://www.quandl.com/tools/python
    
# Summary

In this tutorial, you were introduced to pulling data using web scraping, Pandas DataReader, and Quandl. Specifically, you learned:
* How to use Pandas DataReader and Quandl to access stock data as a Pandas DataFrame.

# Next

Use the easiest option to get the data, then spend your effort on doing amazing analysis with it.