# Lecture 9, 5/2

I set up a GitHub repository for the notes in Jupyter notebook format. You can find it at https://github.com/Fay-Wu/134Note. You can clone it to your own notebook environment and play with the code.

# Web Services and Data Interfaces

Data is central to data science, so the ability to find and gather useful data is extremely valuable. As we saw with the NBA data, reverse engineering a website can be challenging. We will look at other sources of data and at the software terminology around web services that provide data interfaces.


## Frontend and Backend

There are many layers to a web service we interact with every day. Take our Jupyter notebook, for example:

* We open the URL for the course Jupyter notebook.
* We enter commands into a notebook.
* An instance of the Python "kernel" runs the line(s) of code,
* takes the output and sends it back to the user, and
* the running notebook displays the output.

Some things happen in our browser, and some things happen on the servers. What we see and interact with is called the __frontend__, and our requests are taken care of by the __backend__. The frontend and backend communicate using a __protocol__ (see [Wikipedia on Hypertext Transfer Protocol](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol)). So,

* The Jupyter notebook web interface is the frontend.
* The Python kernel is the backend.
* The frontend and the backend communicate via HTTP (Hypertext Transfer Protocol).

Take the example of downloading the NBA data using `wget`.

* We enter a command containing `wget ...nba_url...` in the notebook.
* The code is sent to the Python kernel.
* The Python kernel executes the code (running on the backend).
* The Python kernel retrieves the content from `...nba_url...`.
* The content is sent back to the running notebook.
* The notebook interface is updated with the output.

### Frontend

The frontend is where information is displayed and interactions occur. Modern web browsers understand JavaScript. In fact, HTML, CSS, and JavaScript make up the core technologies of the frontend.

In the URL bar, enter `javascript:alert('hello')`. Any browser should work, except Chrome in incognito mode. The browser has a JavaScript engine that interprets your JavaScript code. The JavaScript language is the basis for many frontend libraries; the IPython widgets (for selecting basketball players) are one such example.

### Backend

The backend is the business end of a web service. It is usually made up of a server (a real or virtual computer), an application (that interprets your requests), and a database (where information is stored). We usually do not see what goes on in the backend; we are only allowed access through the means of communication that the service provider offers.
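
To make the three backend pieces concrete, here is a minimal sketch that uses only the Python standard library. It is a toy, not how Jupyter or any real service is built: `FAKE_DB`, `ToyApplication`, and `ask_backend` are hypothetical names, and the record in the "database" is made up. The server runs on the local machine, and the function `ask_backend` plays the role of the frontend by sending an HTTP GET request.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import ProxyHandler, build_opener

# Database: where the information is stored (a made-up record).
FAKE_DB = {"/players/1": {"name": "Player One", "points": 20}}


class ToyApplication(BaseHTTPRequestHandler):
    """Application: interprets each GET request and looks up the answer."""

    def do_GET(self):
        record = FAKE_DB.get(self.path)
        status = 200 if record is not None else 404
        payload = record if record is not None else {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet: no per-request log lines


def ask_backend(path):
    """Start the toy server, send one GET request, and return (status, data)."""
    # Server: port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), ToyApplication)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Talk to localhost directly, ignoring any proxy settings.
    opener = build_opener(ProxyHandler({}))
    url = f"http://127.0.0.1:{server.server_address[1]}{path}"
    try:
        try:
            with opener.open(url) as response:
                return response.status, json.loads(response.read().decode("utf-8"))
        except HTTPError as err:  # non-2xx answers arrive as exceptions
            return err.code, json.loads(err.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()


status, data = ask_backend("/players/1")
```

The caller only sees the status code and the JSON data; how the lookup happened stays hidden behind the HTTP interface, which is the point of the frontend/backend split.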



## Application programming interface (API)

Many web services provide an interface to their "backend": a direct, programmable interface to the hosting website. We used NBA's API, although it is unpublished, to pull JSON data by reverse engineering their site.

Since web service providers thrive on selling data and on other people building on top of their data, they often provide their own API.

### One prepackaged option: `pandas_datareader`

[Pandas datareader](https://pandas-datareader.readthedocs.io/en/latest/) is a package dedicated to interfacing with various web data sources. Its documentation lists the following [data sources](https://pandas-datareader.readthedocs.io/en/latest/remote_data.html) that `pandas_datareader` can interface with.

* Google Finance
* Morningstar
* IEX
* Robinhood
* Enigma
* [Quandl](https://www.quandl.com/)
* [St.Louis FED (FRED)](https://fred.stlouisfed.org/)
* Kenneth French’s data library
* World Bank
* OECD (Organisation for Economic Co-operation and Development)
* Eurostat
* Thrift Savings Plan
* Nasdaq Trader symbol definitions
* [Stooq](https://stooq.com/db/h/)
* [MOEX](https://www.moex.com/en/)

That is an impressive list of sources; however, since the package depends on each source's existing API (application programming interface), things break when a data source website (e.g., Google Finance) makes changes.
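
Because a remote source can stop working at any time, it can help to keep the last good copy of the data around. The following is a minimal sketch of that idea, not something `pandas_datareader` provides: `load_with_fallback`, `fetch`, and `cache` are hypothetical names, and `fetch` stands in for any call that hits a remote API.

```python
def load_with_fallback(fetch, cache):
    """Try the remote source first; fall back to the last good copy."""
    try:
        data = fetch()
    except Exception as err:  # remote APIs fail in many ways; this sketch catches them all
        if "last_good" not in cache:
            raise  # nothing cached yet, so the failure cannot be hidden
        print(f"remote source failed ({err!r}); using cached copy")
        return cache["last_good"]
    cache["last_good"] = data  # remember the most recent successful result
    return data


def working_source():
    return [1, 2, 3]  # stand-in for data returned by a remote API


def broken_source():
    raise ConnectionError("the data source changed its API")


cache = {}
first = load_with_fallback(working_source, cache)
second = load_with_fallback(broken_source, cache)
```

In a notebook, the cache could just as well be a CSV file saved next to the notebook; a dictionary keeps the sketch short.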

In [None]:
import pandas_datareader.data as web

## Quandl is a data-selling business that also provides some free data.
## Dataset codes (symbols) can be downloaded as a zip file from the URL below,
## or they can be queried with the API:
## https://help.quandl.com/article/92-how-do-i-download-the-quandl-codes-of-all-the-datasets-in-a-given-database

symbol = 'WIKI/AAPL'  # or 'AAPL.US' gives the stock info of Apple

df = web.DataReader(symbol, 'quandl', '2013-01-01', '2015-02-01')
df.reset_index(inplace=True)
df.head()

#### (Off-topic) Bokeh: Interactive visualization library

We visualize the output using a more feature-rich package, [Bokeh](https://bokeh.pydata.org/en/latest/). Bokeh renders plots in the web browser and makes them interactive.

In [None]:
from bokeh.io import push_notebook, show, output_notebook
from bokeh.layouts import row
from bokeh.plotting import figure
output_notebook()

Open, High, Low, Close (OHLC) data is often visualized with candlesticks. If you zoom into the plot (use the zoom button on the right), each data point looks like a candlestick.
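
As a minimal sketch of what one candlestick encodes, the hypothetical helper `candle` below turns a single OHLC row into its parts: the thick body spans open to close, the thin wick spans low to high, and the direction says whether the price rose or fell. The prices are made up for illustration.

```python
def candle(open_, high, low, close):
    """Describe one OHLC row as a candlestick."""
    if close > open_:
        direction = "up"
    elif close < open_:
        direction = "down"
    else:
        direction = "flat"
    return {
        "direction": direction,
        "body": (min(open_, close), max(open_, close)),  # the thick box
        "wick": (low, high),                             # the thin line
    }


rows = [  # (open, high, low, close); made-up prices
    (10.0, 12.0, 9.5, 11.5),
    (11.5, 11.8, 10.2, 10.4),
]
candles = [candle(*row) for row in rows]
```

The boolean masks `inc` and `dec` in the plotting code make the same up/down split, but for a whole column of prices at once.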

In [None]:
from math import pi

inc = df.Close > df.Open
dec = df.Open > df.Close
w = 12*60*60*1000 # half day in ms

# note: Bokeh 3.0 and later use width/height instead of plot_width/plot_height
p = figure(x_axis_type="datetime", 
           plot_width=750, plot_height=400, 
           title = "AAPL Candlestick")

p.segment(df.Date, df.High, df.Date, df.Low, color="black")  # thin line from low to high
p.vbar(df.Date[inc], w, df.Open[inc], df.Close[inc], fill_color="#D5E1DD", line_color="black")  # increasing days: light gray box
p.vbar(df.Date[dec], w, df.Open[dec], df.Close[dec], fill_color="#F2583E", line_color="black")  # decreasing days: red box
## note the big jump around June 2014 (Apple's 7-for-1 stock split)
show(p)

In [None]:
# Using the adjusted prices, the huge jump is gone

inc = df.AdjClose > df.AdjOpen
dec = df.AdjOpen > df.AdjClose
w = 12*60*60*1000 # half day in ms

TOOLS = "pan,wheel_zoom,box_zoom,reset,save"

p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=750, plot_height=400, title = "AAPL Candlestick")
p.xaxis.major_label_orientation = pi/4
p.grid.grid_line_alpha=0.3

p.segment(df.Date, df.AdjHigh, df.Date, df.AdjLow, color="black")
p.vbar(df.Date[inc], w, df.AdjOpen[inc], df.AdjClose[inc], fill_color="#D5E1DD", line_color="black")
p.vbar(df.Date[dec], w, df.AdjOpen[dec], df.AdjClose[dec], fill_color="#F2583E", line_color="black")

show(p)  # open a browser
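
The jump in the unadjusted plot is a change of price scale, not a change in value. As a minimal sketch of the idea behind adjusted prices, the hypothetical helper `adjust_for_split` divides every price before a split by the split ratio. The closing prices are made up, and the sketch ignores dividends, which adjusted series typically also account for.

```python
def adjust_for_split(prices, split_index, ratio):
    """Divide every price before the split by the split ratio."""
    return [
        price / ratio if i < split_index else price
        for i, price in enumerate(prices)
    ]


# Made-up closing prices; a 7-for-1 split takes effect at index 3.
raw = [630.0, 644.0, 651.0, 93.0, 94.0]
adjusted = adjust_for_split(raw, split_index=3, ratio=7)
```

After the adjustment, the series moves smoothly across the split date, which is why the second candlestick plot has no jump.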


### Custom package: Quandl API

[Quandl](https://www.quandl.com/) is a financial data provider. Much of their data is sold as products; however, some is provided freely to the community.
If you browse their data and choose a free dataset (e.g., the Zillow Home Value Index), a side menu shows how to pull the data through different libraries.

* [Quandl API documentation](https://docs.quandl.com/)
* [Quickstart page for python](https://www.quandl.com/tools/python)
* [WIKI data](https://www.quandl.com/databases/WIKIP) is free stock data; its documentation tab also shows your API key

There is a wealth of other free data:
* [Search for free data](https://www.quandl.com/search?query=&filter[]=Free)
* [Free data from Zillow: Percent of homes decreasing in value -Bardstown, KY](https://www.quandl.com/data/ZILLOW/M632_PHDVAH-Zillow-Home-Value-Index-Metro-Percent-Of-Homes-Decreasing-In-Values-All-Homes-Bardstown-KY)
* On the right-side menu under "EXPORT DATA", locate the link for "python"
* My code reads something like `quandl.get("ZILLOW/M632_PHDVAH", authtoken="...myapikey...")`

To download `myapikeys.py` into your directory, run

`wget https://gist.githubusercontent.com/syoh/639d1db55370937465c09d0f5732b68d/raw/47ed2fec0f621a2cc163ccf210bb20666f4ec19f/myapikeys.py`

in your terminal, from the directory where you want the file saved.

In [None]:
import myapikeys as m  # my API keys are saved here
# remember to put myapikeys.py in the same directory as this notebook
import quandl
## API key is needed
out = quandl.get("ZILLOW/M632_PHDVAH", authtoken=m.apikeys['quandl'])
out.head()

In [None]:
p = figure(x_axis_type="datetime", 
           plot_width=750, plot_height=400, 
           title = "Zillow: Proportion of homes decreasing in value")
p.line(out.index.values, out.Value)

show(p)

In [None]:
quandl.ApiConfig.api_key = None
quandl.ApiConfig.api_key = m.apikeys['quandl']

# get the table for daily stock prices and,
# filter the table for selected tickers, columns within a time range
# set paginate to True because Quandl limits tables API to 10,000 rows per call

symbols = ['AA','AXP','BA','BAC','CAT',
           'CSCO','CVX','DD','DIS','GE',
           'HD','HPQ','IBM','INTC','JNJ',
           'JPM','KFT','KO','MCD','MMM',
           'MRK','MSFT','PFE','PG','T',
           'TRV','UTX','VZ','WMT','XOM']

data = quandl.get_table('WIKI/PRICES', ticker = symbols, 
                        qopts = { 'columns': ['ticker', 'date', 'adj_close'] }, 
                        date = { 'gte': '2015-12-31', 'lte': '2016-12-31' }, 
                        paginate=True)
data.tail()
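
When a service caps the number of rows per call, the client has to keep asking for the next page until nothing is left. The following is a generic sketch of cursor-style pagination, not Quandl's actual implementation of `paginate=True`: `fetch_all` and `fake_fetch_page` are hypothetical names, and the rows are made up.

```python
def fetch_all(fetch_page):
    """Collect rows from a source that returns one page per call."""
    rows, cursor = [], None
    while True:
        page_rows, cursor = fetch_page(cursor)
        rows.extend(page_rows)
        if cursor is None:  # the source signals that there are no more pages
            return rows


# A stand-in for a remote API that serves at most 2 rows per call.
FAKE_ROWS = [("AA", 1), ("AXP", 2), ("BA", 3), ("BAC", 4), ("CAT", 5)]


def fake_fetch_page(cursor, page_size=2):
    """Return (rows, next_cursor); next_cursor is None on the last page."""
    start = cursor or 0
    end = start + page_size
    next_cursor = end if end < len(FAKE_ROWS) else None
    return FAKE_ROWS[start:end], next_cursor


all_rows = fetch_all(fake_fetch_page)
```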

### Custom package: Yelp API

[Yelp](https://www.yelp.com/developers/documentation/v3) needs no introduction. Yelp allows applications to programmatically interact with their data. The table of [endpoints](https://www.yelp.com/developers/documentation/v3/get_started) outlines how you can query their site. For example, if you wanted to search for different businesses, you can use [`business_search`](https://www.yelp.com/developers/documentation/v3/business_search) endpoint.

In order to use Yelp's API, you need an API key. An API key can be thought of as a single string that serves as both login and password. API keys can usually be revoked; until revoked, however, they work as your login credentials, so be careful how you store them: e.g., do not store them in a public GitHub repository!
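
One common way to keep keys out of a repository is to read them from environment variables instead of writing them into the code. The following is a minimal sketch of that approach; the helper `get_api_key` and the variable name `YELP_API_KEY` are hypothetical and are not part of Yelp's tooling or of the `myapikeys` module used in these notes.

```python
import os


def get_api_key(name, environ=os.environ):
    """Return the key stored under `name`, or fail with a clear message."""
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running this notebook")
    return key


# Usage with a stand-in environment, so the sketch runs without a real key.
demo_key = get_api_key("YELP_API_KEY", {"YELP_API_KEY": " not-a-real-key "})
```

In real use, you would call `get_api_key("YELP_API_KEY")` after setting the variable in your shell, so the key never appears in the notebook itself.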

Yelp's Python code samples can be used if you are programming in Python: [github](https://github.com/Yelp/yelp-fusion#code-samples). They spare you from dealing with the web protocol directly: i.e., you don't have to construct a GET URL string and use `wget`!

* [Interface libraries](https://github.com/Yelp/yelp-fusion/tree/master/fusion) for different backends
* [Yelp python example](https://github.com/Yelp/yelp-fusion/blob/master/fusion/python/sample.py) for accessing Yelp's data

In [None]:
## sample is sample.py from 
## https://github.com/Yelp/yelp-fusion/blob/master/fusion/python/sample.py
import sample as s

This local module is a sample python file available from [Yelp's own repository](https://github.com/Yelp/yelp-fusion/blob/master/fusion/python/sample.py).

In [None]:
# try running this line to see the file
#! cat sample.py

Note the functions calling `requests.request('GET', url, ...)`. This line does essentially what `wget` was doing, but with functions in Python.
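
To show what "constructing a GET URL string" involves, here is a minimal sketch that builds such a request with the standard library and does not send it. The endpoint URL and the `Bearer` authorization header follow my reading of Yelp's Fusion documentation and should be checked against it; `build_search_request` is a hypothetical helper and the key is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(api_key, term, location, limit=3):
    """Build (but do not send) a GET request for a business search."""
    # urlencode escapes spaces and commas so the values are safe inside a URL.
    query = urlencode({"term": term, "location": location, "limit": limit})
    url = "https://api.yelp.com/v3/businesses/search?" + query
    return Request(url, headers={"Authorization": f"Bearer {api_key}"})


req = build_search_request("PLACEHOLDER_KEY", "restaurants", "isla vista, ca")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` with a valid key would send the request; the sample module does the equivalent with the `requests` library.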

Let's search for "restaurants" in "isla vista, ca"

In [None]:
## API key is needed
s.API_KEY = m.apikeys['yelp']
s.query_api('restaurants','isla vista, ca')

Search limit defaults to 3; we can change it to 10. Then search for your favorite food:

In [None]:
s.SEARCH_LIMIT = 10 ## otherwise defaults to 3 results
out = s.search(s.API_KEY, 'mexican', 'goleta, ca');
#out

### Custom package: FRED API

[Federal Reserve Economic Data (FRED)](https://fred.stlouisfed.org/) provides extensive economic data. 

* [API documentation](https://research.stlouisfed.org/docs/api/) including other offerings
* [GeoFRED](https://geofred.stlouisfed.org/)
* [FRED API documentation](https://research.stlouisfed.org/docs/api/fred)
* [Third-party python package for FRED](https://github.com/avelkoski/FRB)

Some providers may offer a package that is not already installed. Such packages can be installed to your user directory by running `pip install --user FRB` in your terminal.

After installing the package and obtaining an API key, data can be retrieved using the library. The following retrieves the series in a category, as described on [this page](https://research.stlouisfed.org/docs/api/fred/category_series.html).

In [None]:
import json
from fred import Fred

## API key is needed
fr = Fred(api_key=m.apikeys['fred'], response_type='dict')

params = {
         'limit':2,
         'tag_names':'trade;goods'
         }

res = fr.category.series(125, params=params)

res


Let's download [real gross domestic product series](https://fred.stlouisfed.org/series/A191RL1Q225SBEA) using the installed API.

In [None]:
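# note: in the FRED API, realtime_start/realtime_end select the real-time (vintage) period;
# observation_start/observation_end restrict the dates of the observations themselves
params = {
    'realtime_start':'1947-04-01', 
    'realtime_end':'2018-01-01'
}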
out = fr.series.observations('A191RL1Q225SBEA', response_type='df', 
                         params=params)
#out
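
Plotting needs real dates and numbers, while the raw FRED JSON delivers both as strings and, as far as I know, marks a missing value with `"."`. The following is a minimal sketch of that conversion on a made-up payload shaped like a FRED response; `parse_observations` is a hypothetical helper, and the `FRB` package may already do this conversion when `response_type='df'` is used.

```python
from datetime import date


def parse_observations(payload):
    """Turn FRED-style observations into (date, float-or-None) pairs."""
    parsed = []
    for obs in payload["observations"]:
        value = None if obs["value"] == "." else float(obs["value"])
        parsed.append((date.fromisoformat(obs["date"]), value))
    return parsed


sample_payload = {  # made-up numbers, for illustration only
    "observations": [
        {"date": "2017-01-01", "value": "1.8"},
        {"date": "2017-04-01", "value": "."},
        {"date": "2017-07-01", "value": "3.0"},
    ]
}
series = parse_observations(sample_payload)
```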

In [None]:
p = figure(x_axis_type="datetime", 
           plot_width=750, plot_height=400, 
           title = "Real gross domestic product series")
p.line(out.date, out.value)

show(p)