# Week 7 - APIs

**Optional Reading:** Data Wrangling with Python, Chapter 13 (pages 361 - 375)
<img align="right" style="padding-right:10px;" src="figures_7/Data_Wrangling_Book.jpg" ><br>

**Overview:**<br>

* Application Programming Interface (API)

  * What is an API?
  * Who creates APIs?
  * When would I use an API?
    
* Use Case: REST API

  * NASA: Astronomy Picture of the Day (APOD)
     * APOD: Current Day
     * APOD: Specific Day
     
* Use Case: Streaming API

  * Twitter's streaming API
     * Specific number of 'live tweets'
     * Streaming of Twitter data   

# Application Programming Interface (API)

An application programming interface (API) is "a set of protocols used by programmers to create applications for a specific operating system or to interface between the different modules of an application." <br> 
https://www.dictionary.com

It sounds complicated based on that definition, but it is not.


## What is an API?
In basic terms, APIs allow applications to communicate with each other over the internet and govern access to information.
 
<img align="left" style="padding-right:10px;" src="figures_7/API-communication.png" ><br>
https://medium.com/@perrysetgo/what-exactly-is-an-api-69f36968a41f

## Who creates APIs?
"Large tech companies, especially social media companies frequently make their aggregate data available to the public, but APIs are also maintained by government organizations, conferences, publishing houses, software startups, fan groups, eSports leagues and even individuals, in order to share anything from social media content to trivia questions, rankings, maps, song lyrics, recipes, parts lists and more.

In short, any person or organization that collects data might have an interest in making that data available for use by a different app."

https://medium.com/@perrysetgo/what-exactly-is-an-api-69f36968a41f

## When would I use an API?
One of the most common APIs that students use to gain an understanding of how to work with an API is Twitter. 

Imagine that, as a student, you are asked to complete an analysis of all the tweets mentioning #RegisUniversity. Twitter's internal systems store that information, but you do not have access to those systems. 

You have a couple of options at this point:
   * Search a variety of social media websites for a Twitter employee and ask for their help in retrieving this information.
   * Contact Twitter directly and request access to their systems.
   * Ask Twitter to send you a copy of the data you are looking for.
   * Use any number of less legal methods of obtaining your data.
   
   <strong> OR </strong> <br><br>
   
   * You could use the API that Twitter provides for public use.  

# Use Case: REST API
There are different types of APIs. The most common type is known as a REST API. <b>REST</b> stands for <b>Re</b>presentational <b>S</b>tate <b>T</b>ransfer.

A REST API uses HTTP requests to GET, PUT, POST and DELETE data. 
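
As a rough illustration of how the four verbs map onto operations, the sketch below builds (but never sends) one request per verb with the standard library's `urllib.request.Request`. The endpoint `https://api.example.com/pictures/42` and the names `RESOURCE`, `VERB_MEANINGS`, and `build_requests` are hypothetical and stand in for any REST resource.

```python
from urllib.request import Request

# Hypothetical resource URL, used only for illustration.
RESOURCE = "https://api.example.com/pictures/42"

# Each HTTP verb conventionally maps to one kind of operation on a resource.
VERB_MEANINGS = {
    "GET": "read the resource",
    "PUT": "replace (or create) the resource at this URL",
    "POST": "submit new data to be processed or stored",
    "DELETE": "remove the resource",
}


def build_requests(url):
    """Build one unsent Request object per verb."""
    return [Request(url, method=verb) for verb in VERB_MEANINGS]


for req in build_requests(RESOURCE):
    verb = req.get_method()
    print(f"{verb:6} {req.full_url} -> {VERB_MEANINGS[verb]}")
```

Nothing is transmitted here; the objects only describe what would be sent. The rest of this section uses GET exclusively.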

## NASA: Astronomy Picture of the Day (APOD)
NASA releases a lot of data to the general public, including an Astronomy Picture of the Day (APOD). To get the picture you need to issue an HTTP GET request and parse the JSON that is returned. 

This task is made easier with the requests package. Although requests is not part of the Python 3 standard library, it is widely used in industry.

### APOD: Current Day

<div class="alert alert-block alert-success">
<b>Installation - requests</b> <br>
    pip install requests 
</div>

In [None]:
pip install requests

In [None]:
import requests

Using the requests package is very easy. 

For example, NASA gives us an example URL with a demo security key that can be used to query the APOD interface. 
Generally speaking, the creator of an API will require individuals to register and receive their own credentials to use their API. Given the popularity of the APOD API, NASA has provided the public with a guest credential, 'DEMO_KEY', which is rate-limited by IP address (https://api.nasa.gov/ lists 30 requests per hour and 50 per day).

The request url, at this point, would look like: https://api.nasa.gov/planetary/apod?api_key=DEMO_KEY

Everything after the '?' is the query string: one or more parameters, each with its associated value. In this case, NASA expects a parameter called "api_key", to which we will supply the guest credential 'DEMO_KEY'.
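
To make the structure of the URL concrete, here is a minimal sketch that assembles the same request URL from its parts and then takes it apart again, using only the standard library's `urllib.parse`. The variable names are illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = "https://api.nasa.gov/planetary/apod"
params = {"api_key": "DEMO_KEY"}

# Joining the base URL and the encoded parameters with '?' gives the request URL.
request_url = f"{base_url}?{urlencode(params)}"
print(request_url)

# Going the other way: split the URL and decode the query string into a dict.
query = urlsplit(request_url).query
print(parse_qs(query))
```

When there is more than one parameter, `urlencode` joins the name=value pairs with '&'.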

Many times these variables (or 'parameters') will come from user input and so should not be hard-coded. Below is an example of making a request, with the parameter(s) contained in a dictionary variable:

In [None]:
# building the query parameters for our request
apiKey = {'api_key': 'DEMO_KEY'}

# sending the request and storing the response in a variable
result = requests.get('https://api.nasa.gov/planetary/apod', params=apiKey)

The response object that was returned from our request has specific attributes associated with it. For example, its url attribute holds the full URL that was used in the original request:

In [None]:
print(result.url)

To look at the entire text of the response that was returned from our request:

In [None]:
print(result.text)

Hmmm... That's a bit messy and hard to read.  Remember from above, we know that the response is JSON. Fortunately for us, JSON data can go straight into a dictionary. Requests can handle JSON data through the response's json() method.

In [None]:
dict_current = result.json()
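
Calling `result.json()` does roughly what `json.loads(result.text)` would do. The sketch below shows the conversion on a hypothetical, shortened response body so that it runs without a network connection; real APOD responses carry more fields.

```python
import json

# Hypothetical, shortened response body (not actual APOD output).
sample_text = '{"date": "2015-01-01", "media_type": "image", "title": "Example title"}'

# json.loads turns the JSON text into a regular Python dictionary.
sample_dict = json.loads(sample_text)
print(type(sample_dict).__name__)
print(sample_dict["date"])
```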

Let's get a list of all the keys in our JSON dictionary.

In [None]:
dict_current.keys()

Let's try to access several of the keys within our dictionary.

In [None]:
dict_current['date']

In [None]:
dict_current['explanation']

So far so good, but we still haven't seen a picture. One would assume that "Picture of the Day" would have an actual picture associated with the data. 

After consulting the documentation for the APOD API at https://github.com/nasa/apod-api/blob/master/README.md, we are able to determine that 'url' contains a link to the APOD.

In [None]:
dict_current['url']

Time to actually view the picture. Notice that this url is pointing to a jpg image. Luckily for us, we have already seen how to view jpg images during this course. 

In prior weeks, we used Image(filename = 'some_filename.jpg').  This week, we will be using Image() with a url.

<div class="alert alert-block alert-info">
<b>Be patient!</b> It might take a bit to get the picture to display
</div>

In [None]:
from IPython.display import Image

Image(dict_current['url'])
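
On some days the APOD is a video rather than a picture, in which case the response's 'media_type' field is 'video' and 'url' does not point to an image file. A small guard like the one sketched below can be applied before calling Image(). The helper name `pick_image_url` and both sample entries are hypothetical.

```python
def pick_image_url(apod):
    """Return a displayable image URL, or None if the entry is not an image."""
    if apod.get("media_type") != "image":
        return None
    return apod.get("url")


# Hypothetical entries shaped like APOD responses.
image_entry = {"media_type": "image", "url": "https://example.com/picture.jpg"}
video_entry = {"media_type": "video", "url": "https://example.com/embed/video"}

print(pick_image_url(image_entry))
print(pick_image_url(video_entry))
```

With the real response, `pick_image_url(dict_current)` could be checked for None before passing the result to Image().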

### APOD: Specific Day

By changing the values that we used in our original request, we should be able to view an APOD for a specific date.  

The documentation on NASA's API page (https://api.nasa.gov/) details the query parameters for the APOD API.

<img align="left" style="padding-right:10px;" src="figures_7/APOD-parameters.png" ><br>


Let's try using the datetime package within python and the above information to get different pictures.

In [None]:
import datetime
now = datetime.datetime.now()
print(now.date())

To get a different day/week/month, etc. we can use the timedelta() function:

In [None]:
yesterday = now - datetime.timedelta(days = 1)
print(yesterday.date())
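
The same subtraction extends to several days at once. As a small sketch, the hypothetical helper `previous_days` below lists the dates preceding a given day; a fixed start date is used so the output is reproducible.

```python
import datetime


def previous_days(start, count):
    """Return `count` dates ending the day before `start`, most recent first."""
    return [start - datetime.timedelta(days=offset) for offset in range(1, count + 1)]


# A fixed start date keeps the output reproducible; date.today() works the same way.
start = datetime.date(2015, 1, 3)
for day in previous_days(start, 3):
    print(day.isoformat())
```

Note that the subtraction handles the month and year boundaries automatically.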

So, to get yesterday's picture:

In [None]:
# using a dictionary for the query parameters
data = {'api_key':'DEMO_KEY', 'date':yesterday.date()}
data

In [None]:
# using the params argument in our request
result = requests.get('https://api.nasa.gov/planetary/apod', params=data)

# create a dictionary for yesterday's picture
dict_yesterday = result.json()

# verify the date
print(dict_yesterday['date'])

Viewing yesterday's picture

In [None]:
Image(dict_yesterday['url'])

Okay, one more: how about we get the picture from a specific date? Let's use Jan 01, 2015.

<div class="alert alert-block alert-warning">
<b>Note:</b> Some dates do not have an APOD. If the date you choose isn't available, NASA will return the APOD for the current date. https://apod.nasa.gov/apod/archivepix.html has a list of the dates with available pictures.
</div>

We need to make a <strong>date</strong> object to pass to the APOD API.

In [None]:
# date() takes the year, month, and day as integers; it prints as yyyy-mm-dd
my_date = datetime.date(2015, 1, 1)
my_date

In [None]:
# same process as above
data = {'api_key': 'DEMO_KEY', 'date': my_date}
results = requests.get('https://api.nasa.gov/planetary/apod', params=data)
dict_my_date = results.json()
Image(dict_my_date['url'])

In [None]:
new_date = datetime.date(2010, 1, 1)

# same process as above
data = {'api_key': 'DEMO_KEY', 'date': new_date}
results = requests.get('https://api.nasa.gov/planetary/apod', params=data)
dict_new_date = results.json()
Image(dict_new_date['url'])
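
Since the same steps are repeated for every date, the parameter building can be wrapped in a helper. The sketch below is one possible shape, not part of the APOD API: `apod_params` and `FIRST_APOD` are hypothetical names, and the earliest date (1995-06-16) is taken from NASA's API documentation and treated here as an assumption.

```python
import datetime

# First APOD date according to NASA's API documentation (an assumption here).
FIRST_APOD = datetime.date(1995, 6, 16)


def apod_params(day, api_key="DEMO_KEY", today=None):
    """Build the query parameters for one APOD request, rejecting out-of-range dates."""
    today = today or datetime.date.today()
    if not FIRST_APOD <= day <= today:
        raise ValueError(f"{day.isoformat()} is outside {FIRST_APOD} to {today}")
    # isoformat() produces the yyyy-mm-dd string the API expects
    return {"api_key": api_key, "date": day.isoformat()}


print(apod_params(datetime.date(2015, 1, 1)))
```

The returned dictionary can be passed as the params argument of requests.get(), exactly like the `data` dictionaries above.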

# Use Case: Streaming API
In the prior use case, we covered REST APIs.  Now we are going to take a look at streaming APIs. The main difference between these two types of APIs is the timing/relevance of the data being returned: a REST API returns data once per request, whereas a streaming API keeps the connection open and returns 'live' data as it is produced. 

One of the easiest streaming APIs to get started with is Twitter's.

## Twitter's streaming API
To fully reproduce this section of the FTE, you will need to obtain your own credentials from Twitter. You might have already done this for a prior MSDS class.  If so, you can reuse those credentials for this exercise.

To create a Twitter developer account, if you do not already have one, follow these steps:

* Go to https://developer.twitter.com/en/apps and log in with your Twitter user account.
* Click “Create an app”
* Fill out the form, and click “Create”
* A pop up window will appear for reviewing Developer Terms. Click the “Create” button again.
* On the next page, click on the “Keys and Access Tokens” tab, and copy your “API key” and “API secret” from the Consumer API keys section.
* Scroll down to the Access token & access token secret section and click “Create”. Then copy your “Access token” and “Access token secret”.

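Once you have established your Twitter developer credentials, you will need to install the tweepy package. The code in this section uses the Tweepy 3.x interface; Tweepy 4.0 removed the wait_on_rate_limit_notify parameter and merged StreamListener into Stream, so later versions need adjustments.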

### Specific number of 'live tweets'

<div class="alert alert-block alert-success">
<b>Installation - tweepy</b> <br>
    pip install tweepy
</div>

In [1]:
import tweepy
import json

<div class="alert alert-block alert-danger">
<b>Important:</b> Replace the credential values in the following cell with your own Twitter credentials. The auth.csv will not be provided to you. Please notice that the individual credential fields are stored as strings.
</div>

In [7]:
# Replace these placeholder strings with your own Twitter credentials.
# Avoid sharing real credentials in a notebook or committing them to version control.

# setting up some variables for Twitter
consumer_key = 'YOUR_API_KEY'
consumer_secret = 'YOUR_API_SECRET'
access_token = 'YOUR_ACCESS_TOKEN'
access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

Now that we have the individual user's credentials established and stored in variables, we can continue with setting up the connection to Twitter using tweepy.

Parameters that we will use:
* 'wait_on_rate_limit=True' makes the API object automatically wait for rate limits to replenish
* 'wait_on_rate_limit_notify=True' makes the API object print a notification when Tweepy is waiting for rate limits to replenish

In [11]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True, compression=True)
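
To make "waiting for rate limits to replenish" more concrete: Twitter's rate-limited responses include an x-rate-limit-reset header holding the reset time in epoch seconds, and a client can sleep until that moment. The sketch below only illustrates the idea and is not Tweepy's actual implementation; `seconds_until_reset` and the sample values are hypothetical.

```python
import time


def seconds_until_reset(reset_epoch, now=None, padding=1):
    """Seconds to sleep until the rate-limit window resets (never negative)."""
    now = time.time() if now is None else now
    # a small padding avoids retrying a moment too early
    return max(0, int(reset_epoch - now) + padding)


# Hypothetical values: the window resets 90 seconds after 'now'.
print(seconds_until_reset(reset_epoch=1_000_090, now=1_000_000))
```

A client would pass the result to time.sleep() before retrying the request.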

Time to go get a tweet. The following loop will allow us to gather a specific number of tweets. 

Since we didn't specify any restrictions on the types of tweets to gather, our output can include retweets, original posts, etc. from the home timeline of the account whose credentials are being used.

In [18]:
for status in tweepy.Cursor(api.home_timeline).items(2):
    print('*** Tweet ***\n',status._json)

We can see in the above output that we got 2 tweets back, which is good since we asked for 2 tweets to be returned. We can also see that the dates for these 2 tweets are recent. However, we still aren't "streaming" live data.
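
The raw JSON for a tweet is long, so it is often easier to pull out only a few fields. The sketch below assumes the dictionary follows the layout of Twitter's v1.1 tweet object ('created_at', 'text', a nested 'user' dictionary, and 'retweeted_status' on retweets). The helper `summarize_tweet` and the sample tweet are hypothetical.

```python
def summarize_tweet(tweet):
    """Pick a few commonly used fields out of a tweet dictionary."""
    user = tweet.get("user", {})
    return {
        "created_at": tweet.get("created_at"),
        "screen_name": user.get("screen_name"),
        "text": tweet.get("text"),
        # retweets carry the original tweet under 'retweeted_status'
        "is_retweet": "retweeted_status" in tweet,
    }


# Hypothetical, heavily shortened tweet; real payloads contain many more fields.
sample_tweet = {
    "created_at": "Thu Jan 01 12:00:00 +0000 2015",
    "text": "Hello from the API",
    "user": {"screen_name": "example_user"},
}

print(summarize_tweet(sample_tweet))
```

Inside the loop above, `summarize_tweet(status._json)` could be printed in place of the full dictionary.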

### Streaming of Twitter data

The first thing we will do is create a user defined listener class. (http://docs.tweepy.org/en/latest/streaming_how_to.html) 

For our example, we will only work with the on_status and on_error methods associated with tweepy's StreamListener class. 

More specifically, we will:
   * Create a listener that prints the text of any tweet that comes from our API (on_status method)
   * Handle errors that come from our API (on_error method)

Here is our very basic user defined StreamListener:

In [None]:
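class StreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # called for each incoming tweet; print its text
        print(status.text)

    def on_error(self, status_code):
        # 420 means we are being rate limited; returning False disconnects the stream
        if status_code == 420:
            return False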

Time to "open the gates" and start streaming data.  To make things a bit more interesting, let's filter our tweepy data to only consider tweets in English (languages=["en"]) and related to Google (track=["google"]).

<div class="alert alert-block alert-danger">
<b>Important:</b> You will have to kill the following cell at some point!  <i>This will run as long as you allow it to.</i>
</div>

In [None]:
stream_listener = StreamListener()
stream = tweepy.Stream(auth=api.auth, listener=stream_listener)

stream.filter(track=["google"],languages=["en"])
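
Rather than interrupting the kernel, a listener can also stop itself after a fixed number of tweets. In Tweepy 3.x, returning False from on_status is understood to disconnect the stream (an assumption worth checking against the installed version), so on_status could delegate to a small counter. The class `StatusLimiter` below is a hypothetical, Tweepy-free sketch of that counter, driven by a simulated list of texts instead of live tweets.

```python
class StatusLimiter:
    """Collect status texts and report when a fixed limit has been reached."""

    def __init__(self, limit):
        self.limit = limit
        self.texts = []

    def handle(self, text):
        """Store one text; return False once the limit is reached, else True."""
        self.texts.append(text)
        return len(self.texts) < self.limit


# Simulated stream of status texts standing in for live tweets.
limiter = StatusLimiter(limit=2)
for text in ["first", "second", "third"]:
    if not limiter.handle(text):
        break

print(limiter.texts)
```

In a listener, on_status would call `limiter.handle(status.text)` and return its result, so the stream ends after the chosen number of tweets.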