In [164]:
import requests, json, os, warnings, re
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import altair as alt
from altair_data_server import data_server

# Web and Cloud Computing (DATA 534): Lab 2
## General Lab Instructions


- This assignment is to be completed in python, submitting both a `.ipynb` file (you can add your answers directly to this one) along with a rendered `.md`.
- I added an Intro section to help you with the basics for this lab.

# Intro

You have been using the numpy package a lot, haven't you? Let's see the people that have been contributing to this package. Using Github API, let's request the list of contributors of the numpy package. Let's use all the default parameters for now.  From the documentation, we can see that we make all our requests to `api.github.com`. But how to request the contributors specifically? Let's go back to the documentation: [list contributors](https://docs.github.com/en/rest/reference/repos#list-repository-contributors) (please read it - it is just like three lines). Read it? Really? Are you sure? Ok, let's move on.

So the API is saying that, to request the contributors we make a `GET` request `GET /repos/:owner/:repo/contributors`.
The `:` is placeholder for the specific values that you want. In the case of numpy, `:owner` = `numpy` and `:repo`=`numpy`. Let's make our request. 



In [3]:
domain = "https://api.github.com" # The domain name
request_contributors = "/repos/numpy/numpy/contributors" # our request

response = requests.get(domain + request_contributors)

print("Status: {} - {}".format(response.status_code, response.reason))

Status: 200 - OK


Good, it seems that our request was successful. Let's check it out the data we got. 

In [4]:
response.json()[0] 

{'login': 'charris',
 'id': 77272,
 'node_id': 'MDQ6VXNlcjc3Mjcy',
 'avatar_url': 'https://avatars.githubusercontent.com/u/77272?v=4',
 'gravatar_id': '',
 'url': 'https://api.github.com/users/charris',
 'html_url': 'https://github.com/charris',
 'followers_url': 'https://api.github.com/users/charris/followers',
 'following_url': 'https://api.github.com/users/charris/following{/other_user}',
 'gists_url': 'https://api.github.com/users/charris/gists{/gist_id}',
 'starred_url': 'https://api.github.com/users/charris/starred{/owner}{/repo}',
 'subscriptions_url': 'https://api.github.com/users/charris/subscriptions',
 'organizations_url': 'https://api.github.com/users/charris/orgs',
 'repos_url': 'https://api.github.com/users/charris/repos',
 'events_url': 'https://api.github.com/users/charris/events{/privacy}',
 'received_events_url': 'https://api.github.com/users/charris/received_events',
 'type': 'User',
 'site_admin': False,
 'contributions': 6198}

Look how cool is that, the `response` object already has a built-in JSON decoder. 
Above we printed just the first contributor to be easier to read. But let's save this JSON response into a variable, and check out how many contributiors we have.

In [5]:
data = response.json()
print("Number of contributors:", len(data))

Number of contributors: 30


Hmmm, weird! Something is not right here. Is it possible that a open-source project as big as `numpy` has only 30 contributors? Let's investigate this. In the [repository's page](https://github.com/numpy/numpy) we can see that the number of contributors is over 1200! 

So, what is going on here? Why didn't the API returned all the contributors? Well, Github uses [Pagination](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#pagination) (again, please read this - it is very short). So, basically, by default, Github uses pagination with 30 elements per page. Oh, now things start make sense. Also, there should be pagination information in the `response`'s headers, more precisely, in the `link` field, let's check it out.

In [6]:
response.headers['link']

'<https://api.github.com/repositories/908607/contributors?page=2>; rel="next", <https://api.github.com/repositories/908607/contributors?page=15>; rel="last"'

So, in the headers of the response, the API provides the hyperlink to the next page of results. Cool!

But we also saw in [Pagination](https://docs.github.com/en/rest/overview/resources-in-the-rest-api#pagination) that we can change this default. In order to do that, all we need to do is to pass parameters to the server. This can be done by including `?param1=value` after your request. For example, we could use:

1. domain: `https://api.github.com`
2. request: `/repos/numpy/numpy/contributors`
3. parameter: `?per_page=100`

which results in https://api.github.com/repos/numpy/numpy/contributors?per_page=100 (go ahead, click it!).

Although you can just build the url and pass it to `requests.get`, `requests` makes it easier for you to deal with parameters. Let's change the number of parameter in our request now using requests. 

In [87]:
# This is what we defined before
domain = "https://api.github.com" # The domain name
request_contributors = "/repos/numpy/numpy/contributors" # our request

# Now, we can define the set of parameter as a dictionary 
params = {"per_page": 100}

# Now we call requests
response2 = requests.get(domain + request_contributors, params)

print("Status: {} - {}".format(response2.status_code, response2.reason))

int

Cool, the request was successful. Let's check our new data now.

In [30]:
data = response2.json()
len(data)

100

It worked, now we have 100 contributors in the first page! But how did `requests` handle this for us? No mystery. Let's check the url used by `requests`.

In [27]:
response2.url

'https://api.github.com/repos/numpy/numpy/stats/contributors?per_page=100'

Exactly the one we built above. All `requests` did was to merge everything together for us, so our code can be more organized. But still, pretty cool!

In [9]:
# Let's check the number of contributions of the first 10 users
for repo in data[0:10]:
    print(repo['login'] + " - " + repo['contributions'].__str__())

charris - 5478
teoliphant - 2065
mattip - 1809
cournape - 1525
seberg - 1370
eric-wieser - 1238
pearu - 1106
rgommers - 926
pv - 817
mwiebe - 759


Ok, now you know how to make a request to Github and also how to pass parameters with your request. One last thing before you jump to your exercises. Here we only worked with the `GET` request. But you will frequently need to use other verbs, such as

1. `PUT` - usually used to change things
2. `POST` - usually used to create things
3. `DELETE` - usually used to remove things

Also, different APIs will work differently. So, you'll always need to consult the documentation. In fact, Github has a version 4 API that is not a REST API, but a [GraphQL API](https://developer.github.com/v4/), which seems to be increasing in popularity.

## Exercise 0 - Warm up.

We could also fetch some statistics per contributor, take a look [here](http://docs2.lfe.io/v3/repos/statistics/#contributors). Your job is to get the contributors list of the same `numpy` repository with additions, deletions, and commits counts using the Github API.

In [84]:
# TODO

domain = "https://api.github.com" # The domain name
request_contributors = "/repos/numpy/numpy/stats/contributors" # our request

# Now, we can define the set of parameter as a dictionary 
params = {"per_page": 100}

# Now we call requests
response2 = requests.get(domain + request_contributors)

data2 = response2.json()

for i in data2:
    print(i['author']['login'])
    sum_a = 0
    sum_d = 0
    sum_c = 0
    for k in i['weeks']:
        sum_a += int(k['a'])
        sum_d += int(k['d'])
        sum_c += (k['c'])
    print("Additions: ", sum_a)
    print("Deletions: ", sum_a)
    print("Commits: ", sum_a)

cmarmo
Additions:  0
Deletions:  0
Commits:  0
fperez
Additions:  0
Deletions:  0
Commits:  0
MarsBarLee
Additions:  0
Deletions:  0
Commits:  0
liang3zy22
Additions:  0
Deletions:  0
Commits:  0
thomasjpfan
Additions:  0
Deletions:  0
Commits:  0
Mousius
Additions:  0
Deletions:  0
Commits:  0
arinkverma
Additions:  0
Deletions:  0
Commits:  0
cowlicks
Additions:  0
Deletions:  0
Commits:  0
bonn0062
Additions:  0
Deletions:  0
Commits:  0
j-towns
Additions:  0
Deletions:  0
Commits:  0
hoodmane
Additions:  0
Deletions:  0
Commits:  0
touqir14
Additions:  0
Deletions:  0
Commits:  0
chatcannon
Additions:  0
Deletions:  0
Commits:  0
nouiz
Additions:  0
Deletions:  0
Commits:  0
jbrockmendel
Additions:  0
Deletions:  0
Commits:  0
zerothi
Additions:  0
Deletions:  0
Commits:  0
ericmariasis
Additions:  0
Deletions:  0
Commits:  0
chanley
Additions:  0
Deletions:  0
Commits:  0
zoj613
Additions:  0
Deletions:  0
Commits:  0
bharatr21
Additions:  0
Deletions:  0
Commits:  0
gfyoung
Addit

# Exercise 1

In this exercise we are going to build a very simple python wrapper for the [Alpha Vantage API](https://www.alphavantage.co/). It is a free API to collect stock market data, however it still requires authentication. So, get your [free api key today! ](https://www.alphavantage.co/support/#api-key) Just fill in the form, and get your key (careful, the key is generated in the same webpage, just below the "Get free api key" button). Save this key, it is kind of important for this exercise. Cool! We are set. 

Just one important note, here we are dealing with a real API, not just a Toy one. They are kind enough to give us free access to their API, and the only thing they ask is for you to **"make API requests sparingly (up to 5 API requests per minute and 500 requests per day) to achieve the best server-side performance"**. So, let's try to follow this request.

A useful way to work with api keys is to set enviromment variables with the value of the key. Then you can just retrieve them using python `os` package. To create new environment variables:

- on windows: open Powershell and type: `[Environment]::SetEnvironmentVariable('variable_name', 'variable_value', 'User')`, where `variable_name` is the name for your variable (e.g., ALPHA_API), `variable_value` is your api key, and `Users` is specyfing the scope of the variable (e.g., if you want your variable available to all the users you would use `Machine` instead. But here we don't want that). Note that the variable will be available only in the next sessions.

- on Mac/Linux: `export variable_name=variable_value`. This will make the variable available only in the current session. As far as I know, to add a variable persistently in linux/Mac you need to store in the .bash_profile (but you might want to check that out).

- To access an environment variable in a python session you run:
    ```
        import os
        os.environ['variable_name']
    ```

Although there are some wrappers already defined for Alpha Vantage API, this is for you to learn. In the future, if you find a useful API that has no wrapper in either R or Python, you could make your own wrapper, create a package and publish it. The community would be grateful. 

The idea of a wrapper is to create a set of functions to handle the requests for us, so it is easier to work with the API in a specific language.

Now, take a look at the API [documentation](https://www.alphavantage.co/documentation/).

## Exercise 1.0 - A free helper method.

Create a function to build and return your request. I'll give you this one for free. Feel free to change this function if you want to.

In [99]:
def data_request(params):
    url = "https://www.alphavantage.co/query"
    params['apikey'] = "0VGGFWJJ4OK67OE3"
    params['outputsize'] = 'full'

    return requests.get(url, params)

For all the following exercises use `outputsize= "full"` (if relevant).

## Exercise 1.1

Let's make a wrapper function to retrieve a [Stock Time Series](https://www.alphavantage.co/documentation/#time-series-data). For this exercise, you can focus only in the [daily](https://www.alphavantage.co/documentation/#daily), [weekly](https://www.alphavantage.co/documentation/#weekly), and [monthly](https://www.alphavantage.co/documentation/#monthly), periods. Write a function that receives the name of a stock, and the frequency and returns the data in a pandas' dataframe.  Then, using this function to retrieve the data, plot the time series of weekly `low` price, weekly `close` price and weekly `high` price, all in the same plot. You can pick whatever stock you want. ([Hint](https://pandas.pydata.org/pandas-docs/stable/user_guide/visualization.html)). 

*Remember the server's guideline. Don't make too many requests.*

In [182]:
def Get_TimeSeries(stock, period):
    """
    A wrapper function to obtain the the time series of a the desired stock from the Alpha Vantage API.
    
    Parameters:
    -----------
    stock: the desired stock symbol (e.g., "GOOG", "AMD", "FB", "INTC").
    period: "daily", "weekly", "monthly"
    
    Returns:
    --------
    A pandas dataframe with the columns `open`, `high`, `low`, `close`, and `volume`.
    """
    
    # Process here the parameter `period`
    # which affects the function parameter in the API
    
    # TODO
    if period == 'daily':
        fun = "TIME_SERIES_DAILY"
    elif period == 'weekly':
        fun = "TIME_SERIES_WEEKLY"
    elif period == 'monthly':
        fun = "TIME_SERIES_MONTHLY"
    else:
        print('Incorrect period')
    
    # Now let's create a dictionary to hold the parameters of 
    # our request.
    
    # TODO
    params = {'function': fun, 'symbol': stock} 
    
    # Requesting the data
    
    # TODO
    
    resp = data_request(params)
    
    # Check if the request was successful, if not raise an
    # exception
    
    # TODO
    if resp.status_code == 200:
        print('Request call was 200 status')
    else:
        print('Issue with request, status was not 200.')
        print(resp.reason)
        raise Exception("An error occurred")
    
    
    # Create the pandas dataframe (see: pd.read_json() )
    
    # TODO
    d = resp.json()
    data = pd.DataFrame(d[list(d.keys())[1]])

    data = data.transpose()
    data = data.rename(columns={'1. open':'open','2. high':'high', '3. low':'low','4. close': 'close', '5. volume':'volume'})
    return data

In [183]:
data = Get_TimeSeries('AAPL', 'weekly')

alt.Chart(data.reset_index()).mark_line().encode(
    x=alt.X('index:T', title='Date'), 
    y=alt.Y('close:Q', title='Closing Price')).properties(title = 'Stock Price Over Time')

Request call was 200 status


## Exercise 1.2

Similarly to the previous exercise, let's make a wrapper function but this time to retrieve the [Foreign Exchange](https://www.alphavantage.co/documentation/#fx) data. Don't forget providing a docstring for your function. Again, you can focus on processing only `daily`, `weekly`, and `monthly`. As in Exercise 1.1, your function must return a pandas dataframe. Plot the monthly `low`, monthly `close` and monthly `high` time series from `USD` to `CAD`. Look carefully at the documentation so you can properly process the parameters.

*Remember the server's guideline. Don't make too many requests.*

In [239]:
# TODO
def Get_FXseries(from_symbol, to_symbol, period):
    """
    A wrapper function to obtain FX data for a given period from the Alpha Vantage API.
    
    Parameters:
    -----------
    from_symbol: the desired currency to convert from (eg. USD, EUR, CAD)
    to_symbol: the desired currency to convert to (eg. USD, EUR, CAD)
    period: "daily", "weekly", "monthly"
    
    Returns:
    --------
    A pandas dataframe with the columns `open`, `high`, `low`, `close`.
    """

    if period == 'daily':
        fun = "FX_DAILY"
    elif period == 'weekly':
        fun = "FX_WEEKLY"
    elif period == 'monthly':
        fun = "FX_MONTHLY"
    else:
        print('Incorrect period')
    
    params = {'function': fun, 'from_symbol': from_symbol, 'to_symbol': to_symbol} 
    

    resp2 = data_request(params)
    
    # Check if the request was successful, if not raise an
    # exception

    if resp2.status_code == 200:
        print('Request call was 200 status')
    else:
        print('Issue with request, status was not 200.')
        print(resp2.reason)
        raise Exception("An error occurred")
    
    d = resp2.json()
    data = pd.DataFrame(d[list(d.keys())[1]])

    data = data.transpose()
    data = data.rename(columns={'1. open':'open','2. high':'high', '3. low':'low','4. close': 'close'})
    
    return data

In [238]:
data2 = Get_FXseries('USD', 'CAD', 'weekly')

high = alt.Chart(data2.reset_index()).mark_line(color='green').encode(
    x=alt.X('index:T', title='Date'), 
    y=alt.Y('high:Q', title='FX Rate', scale=alt.Scale(zero=False)), tooltip=['index:T', 'high:Q']).properties(title='High')

low = alt.Chart(data2.reset_index()).mark_line(color='red').encode(
    x=alt.X('index:T', title='Date'), 
    y=alt.Y('low:Q', title='FX Rate', scale=alt.Scale(zero=False)), tooltip=['index:T', 'low:Q']).properties(title='Low')

close = alt.Chart(data2.reset_index()).mark_line(color='blue').encode(
    x=alt.X('index:T', title='Date'), 
    y=alt.Y('close:Q', title='FX Rate', scale=alt.Scale(zero=False)), tooltip=['index:T', 'close:Q']).properties(title='Close')

combine = (high + low + close).properties(title='All Layered')

alt.vconcat(alt.hconcat(high, low, close), combine) 

## Exercise 1.3 (Optional)

Modify your function in Exercise 1.2 to also process a [realtime exchange rate](https://www.alphavantage.co/documentation/#currency-exchange) request in case the user passes `period="live"`.

In [14]:
# TODO

## Exercise 1.4 (Optional)

Create a wrapper function for the [cryptocurrency](https://www.alphavantage.co/documentation/#digital-currency). Again, you can consider only daily, weekly, and monthly, frequencies.

In [15]:
# TODO

## Exercise 1.5 - Last part!!!

This time, just so we can also have some technical indicators to play with, create a wrapper function to retrieve some [technical indicators data](https://www.alphavantage.co/documentation/#technical-indicators). Note that some of the indicators require different parameters. So, to keep it simple, let's focus only on technical indicator with the same required parameters as `SMA`. You can assume that it is the user's responsability to use the function with an appropriate technical indicator. As usual, your function must return a pandas dataframe. Do not forget to provide a docstring for your function. Retrieve two technical indicators using your function and plot their time series (in different plots).

*Remember the server's guideline. Don't make too many requests.*

In [252]:
# TODO
def Get_T_indicators(function, symbol, period):
    """
    A wrapper function to obtain technical indicator data from the Alpha Vantage API.
    
    Parameters:
    -----------
    function: the indicator type (eg. SMA, EMA)
    symbol: symbol of ticker (eg. AAPL, GOOGL, MSFT)
    period: "daily", "weekly", "monthly"
    
    Returns:
    --------
    A pandas dataframe with the columns as closing value of the technical indicator.
    """
    
    series_type = 'close'
    time_period = 200
    
    params = {'function': function, 'symbol': symbol, 'interval': period, 'time_period': time_period, 'series_type': series_type} 
    

    resp3 = data_request(params)
    
    # Check if the request was successful, if not raise an
    # exception

    if resp3.status_code == 200:
        print('Request call was 200 status')
    else:
        print('Issue with request, status was not 200.')
        print(resp3.reason)
        raise Exception("An error occurred")

    d = resp3.json()
    data = pd.DataFrame(d[list(d.keys())[1]])

    data = data.transpose()    
    return data


In [266]:
data3 = Get_T_indicators('SMA', 'GOOGL', 'monthly')

val = list(data3.columns)[0] + ':Q'

alt.Chart(data3.reset_index()).mark_line().encode(
    x=alt.X('index:T', title='Date'), 
    y=alt.Y(val, title='Closing Indicator Value', )).properties(
    title = list(data3.columns)[0] + ' Indicator Over Time')


In [267]:
from canvasutils.submit import submit, convert_notebook

# Uncomment and run if you want to automatically convert your notebook
convert_notebook("Lab 2 - DATA 534.ipynb")

Notebook successfully converted! 


## Final Comments

This lab is just to give you a taste of what you can do with APIs.There are tons of APIs out in the wild. Take a look at [this Github page](https://github.com/toddmotto/public-apis) for example. Now, APIs are not just for connecting to data sources. For example, there's a [Google Calendar API](https://developers.google.com/google-apps/calendar/) for programmatically interacting with Google Calendar, an [eBay API](https://go.developer.ebay.com/what-ebay-api) for interacting with eBay, etc. The Slack API allows a user to programmatically join channels. The GitHub API also isn't just for grabbing data; you can use it to programmatically open issues, make commits, etc. This is actually how we use it here in MDS. Imagine if we had to create repositories for each student manually? 

Always take a look at the documentation of the API you are going to use and have fun!