ISRC Python Workshop: Using APIs

___Getting data using APIs___

<hr>

@author: Zhiya Zuo

@email: zhiya-zuo@uiowa.edu

---

#### Introduction

APIs, application programming interfaces, are services designed to make software development easier. APIs come in many different forms, including software libraries and database systems. Generally, you can think of APIs as Lego pieces used for specific models. I found a somewhat brief but interesting read on APIs [here](https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-apis-application-programming-interfaces-5-apis-a-data-scientist-must-know/).

Before we dive into playing with APIs using Python, let's first try some simple examples.

##### [Google Maps Geocoding API](https://developers.google.com/maps/documentation/geocoding/start)

We use this whenever we navigate with Google Maps. When we search for a place by typing its name, we get its location. Instead of using the app, let's use the underlying API directly:

API Request Format:

https://maps.googleapis.com/maps/api/geocode/outputFormat?parameters

where:
- outputFormat can either be [JSON](https://www.json.org/) or [XML](https://www.w3schools.com/xml/default.asp)
- parameters: `address` of interest AND your [API key](https://stackoverflow.com/questions/1453073/what-is-an-api-key).

As a simple example, let's try to search where the university is by: https://maps.googleapis.com/maps/api/geocode/json?address=university+of+iowa.

And now we get:
```json
{
   "results" : [
      {
         "address_components" : [
            {
               "long_name" : "Iowa City",
               "short_name" : "Iowa City",
               "types" : [ "locality", "political" ]
            },
            {
               "long_name" : "Johnson County",
               "short_name" : "Johnson County",
               "types" : [ "administrative_area_level_2", "political" ]
            },
            {
               "long_name" : "Iowa",
               "short_name" : "IA",
               "types" : [ "administrative_area_level_1", "political" ]
            },
            {
               "long_name" : "United States",
               "short_name" : "US",
               "types" : [ "country", "political" ]
            },
            {
               "long_name" : "52242",
               "short_name" : "52242",
               "types" : [ "postal_code" ]
            }
         ],
         "formatted_address" : "Iowa City, IA 52242, USA",
         "geometry" : {
            "location" : {
               "lat" : 41.6626963,
               "lng" : -91.5548998
            },
            "location_type" : "GEOMETRIC_CENTER",
            "viewport" : {
               "northeast" : {
                  "lat" : 41.6640452802915,
                  "lng" : -91.55355081970851
               },
               "southwest" : {
                  "lat" : 41.6613473197085,
                  "lng" : -91.55624878029151
               }
            }
         },
         "place_id" : "ChIJ__-_SfJB5IcRYTLnsT_j0Us",
         "types" : [ "establishment", "point_of_interest", "university" ]
      }
   ],
   "status" : "OK"
}
```

This is the returned object in JSON format, which stores the values of different attributes as key-value pairs. For example, the ___formatted address___ is ___Iowa City, IA 52242, USA___.

At the time of writing, an API key is not needed here if we only make a couple of requests. If you want to build an app or query more often, you will need to pay attention to the [rate limit](https://developers.google.com/maps/documentation/geocoding/usage-limits).

##### [Weather API](https://openweathermap.org/api)

As a second example, let's try to get some weather data. This time, we will need an API key!

Whenever we go to a webpage with a list of API choices, we should first find the one we really want. Suppose we want the current weather data: we go to the [___api doc___ for that API](https://openweathermap.org/current). Let's try the first method, getting weather by city name:

API call:

- https://api.openweathermap.org/data/2.5/weather?q={city}

- https://api.openweathermap.org/data/2.5/weather?q={city},{country}

Parameters:
___q___: city name and country code separated by a comma; use [___ISO 3166 country codes___](https://en.wikipedia.org/wiki/ISO_3166-1#Officially_assigned_code_elements)

Examples of API calls:

- https://api.openweathermap.org/data/2.5/weather?q=London

- https://api.openweathermap.org/data/2.5/weather?q=London,uk

This time, we get an error without an API key, saying that:

```json
{"cod":401, "message": "Invalid API key. Please see http://openweathermap.org/faq#error401 for more info."}
```

Note that this is also a JSON object, with an error code of 401 and an error message.

_I just found that it takes up to 10 minutes for new accounts' keys to be activated. For this reason, let's use my key: `a236f384f5bced47bbba86335cdb1d2a`, which will be deleted after this workshop_

Let's try to get an API key [here](http://openweathermap.org/appid). After creating an account, you can find your API key [here](https://home.openweathermap.org/api_keys). For privacy and security reasons, I will save my API key locally in a file called `weather_keys.csv`. Now that you have the key, you can open the following URL in your browser:

https://api.openweathermap.org/data/2.5/weather?q=Shanghai&APPID=apikey

In our case, it should be https://api.openweathermap.org/data/2.5/weather?q=Shanghai&APPID=a236f384f5bced47bbba86335cdb1d2a

Looking at the structure of the API call, we see that the parameters follow a `?` sign and are separated from each other by a `&` sign.

Overall we see that this is very easy and straightforward.

---

#### Send API requests in Python

While using APIs is pretty simple, we might not want to do all this copying and pasting manually. Python can help us send requests and parse results automatically, with less human supervision.

To do this, we first need to know how to send requests. We will use an amazing package called [`requests`](http://docs.python-requests.org/en/master/). If you do not have it, please install it with `pip` or `conda`:

```bash
$ pip install requests
```

or 

```bash
$ conda install requests
```

In [6]:
# Let's load the library first
import requests

Using weather as an example, we should first know the request URL (where the request goes) and the inputs it needs (e.g., API key and city name). In our case, we know both our API key and the city to query, so we can do the following.

In [9]:
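weather_url = "https://api.openweathermap.org/data/2.5/weather"
city_name = "Shanghai"
# Read the API key saved earlier (assumes the file holds only the key)
with open("weather_keys.csv") as key_file:
    apikey = key_file.read().strip()
print(weather_url)
print(city_name)
print(apikey)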

https://api.openweathermap.org/data/2.5/weather
Shanghai
a236f384f5bced47bbba86335cdb1d2a


Now, we should let `requests` do its work.

In [11]:
r = requests.get(weather_url, params={'q': city_name, 'APPID': apikey})
r.url # `requests` helps us encode the URL in the correct format

'https://api.openweathermap.org/data/2.5/weather?q=Shanghai&APPID=a236f384f5bced47bbba86335cdb1d2a'
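
To see how the pieces of such a URL fit together, the standard library can take it apart again. The following is a minimal sketch using `urllib.parse`; the variable names are only for illustration, and `apikey` is a placeholder rather than a real key.

```python
from urllib.parse import urlsplit, parse_qs

# Placeholder URL in the same shape as the one built by `requests`
url = "https://api.openweathermap.org/data/2.5/weather?q=Shanghai&APPID=apikey"

parts = urlsplit(url)
# Everything before the "?" is the endpoint the request goes to
endpoint = parts.scheme + "://" + parts.netloc + parts.path
# Everything after the "?" is a list of "&"-separated key=value pairs
params = parse_qs(parts.query)

print(endpoint)
print(params)
```

Note that `parse_qs` returns a list for each key, because a query string may repeat the same parameter.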

In [14]:
r.status_code # 200 means success

200
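
Requests do not always succeed, as the 401 response in the introduction showed. One possible way to guard against that is sketched below. `check_weather_payload` is a hypothetical helper, not part of `requests` or of the weather API, and it assumes that successful responses carry a `cod` of 200, as in the response printed in this section.

```python
import json

def check_weather_payload(payload):
    """Return the payload if it signals success, otherwise raise an error."""
    # Assumption: "cod" may arrive as a number or as a string, so normalise it
    cod = int(payload.get("cod", 0))
    if cod != 200:
        raise RuntimeError("API error %d: %s" % (cod, payload.get("message", "")))
    return payload

# The error response from the introduction, parsed into a dict
error_text = '{"cod":401, "message": "Invalid API key. Please see http://openweathermap.org/faq#error401 for more info."}'
try:
    check_weather_payload(json.loads(error_text))
except RuntimeError as err:
    print(err)
```

With a live response object, `r.raise_for_status()` offers a similar check based on the HTTP status code.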

As a side note, the `requests.get` method here means we want to use the `GET` method, as opposed to the `POST` method. The former is meant for obtaining data, whereas the latter is meant for sending data to the server, for example to create or modify something. See [this post](https://www.w3schools.com/tags/ref_httpmethods.asp) for more details.

To get the JSON response, we call the `r.json()` method.

In [12]:
result = r.json()
result

{'base': 'stations',
 'clouds': {'all': 0},
 'cod': 200,
 'coord': {'lat': 31.23, 'lon': 121.49},
 'dt': 1518033600,
 'id': 1796236,
 'main': {'humidity': 74,
  'pressure': 1025,
  'temp': 272.67,
  'temp_max': 273.15,
  'temp_min': 272.15},
 'name': 'Shanghai',
 'sys': {'country': 'CN',
  'id': 7452,
  'message': 0.0044,
  'sunrise': 1517956912,
  'sunset': 1517996094,
  'type': 1},
 'visibility': 10000,
 'weather': [{'description': 'clear sky',
   'icon': '01n',
   'id': 800,
   'main': 'Clear'}],
 'wind': {'deg': 350, 'speed': 3}}

A JSON object is converted into a `dict`, the Python data structure that holds key-value pairs. To access certain values, we index it like any other `dict`.

In [15]:
result['name']

'Shanghai'
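
Nested values work the same way. The sketch below uses a trimmed copy of the response above, so it runs without a network connection. The `rain` key is only an illustrative example of a key that is absent from this particular response.

```python
# A trimmed copy of the response shown above
result = {
    'coord': {'lat': 31.23, 'lon': 121.49},
    'name': 'Shanghai',
    'weather': [{'description': 'clear sky', 'icon': '01n', 'id': 800, 'main': 'Clear'}],
}

latitude = result['coord']['lat']                  # a dict inside a dict
description = result['weather'][0]['description']  # first item in a list of dicts
rain = result.get('rain', {})                      # .get avoids a KeyError for absent keys

print(latitude, description, rain)
```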

In [18]:
for key, value in result['main'].items():
    print(key, value) # default temperature is in Kelvin

temp 272.67
pressure 1025
humidity 74
temp_min 272.15
temp_max 273.15
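
Kelvin values and raw timestamps are not very readable, so a small conversion step can help. This is a minimal sketch with hypothetical helper names; the two input values are copied from the response above, and `dt` is treated as seconds since 1970-01-01 UTC.

```python
from datetime import datetime, timezone

def kelvin_to_celsius(kelvin):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return kelvin - 273.15

def kelvin_to_fahrenheit(kelvin):
    """Convert a temperature from Kelvin to degrees Fahrenheit."""
    return kelvin_to_celsius(kelvin) * 9 / 5 + 32

temp = 272.67      # result['main']['temp'] from the response above
dt = 1518033600    # result['dt'] from the response above

print(round(kelvin_to_celsius(temp), 2))
print(round(kelvin_to_fahrenheit(temp), 2))
print(datetime.fromtimestamp(dt, tz=timezone.utc).isoformat())
```

OpenWeatherMap's documentation also describes a `units` request parameter (for example `units=metric`), which may be simpler than converting after the fact.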


---

#### Use packages: Twitter API as an example

Many web services have ready-made Python packages that wrap their APIs. With these convenient tools, we can get started right away by following their documentation and examples, without building requests by hand. We will use the <a href="https://apps.twitter.com/" target="_blank">Twitter API</a> as an example, through the `python-twitter` package. We first install this package as shown [here](https://python-twitter.readthedocs.io/en/latest/installation.html).

Then, we have to register a Twitter Developer account and register an app. Let's go to https://dev.twitter.com/ and put an app together. <a href="https://python-twitter.readthedocs.io/en/latest/getting_started.html" target="_blank">Here</a>'s a quick start on how you can do this. After we obtain the *__consumer key__*, *__consumer secret__*, *__access token__*, and *__access token secret__*, we are ready to retrieve some data from Twitter!

In [22]:
## suppress warnings
import warnings
warnings.filterwarnings('ignore')

I saved my own keys in a text file (`twitter_keys.csv`) with four lines, one value per line. The lines hold the raw values in the order shown by these assignments:
```
consumer_key = "your_consumer_key"        
consumer_secret = "your_consumer_secret"
access_token = "your_access_token"
access_secret = "your_access_secret"
```

In [25]:
with open("./twitter_keys.csv", "r") as twitter_keys:
    # One credential per line; skip blank lines such as a trailing newline
    keys = [line.strip() for line in twitter_keys if line.strip()]
consumer_key, consumer_secret, access_token, access_secret = keys

In [26]:
## load the twitter package, which is a well-written Python package for Twitter APIs
import twitter
api = twitter.Api(consumer_key=consumer_key,
                  consumer_secret=consumer_secret,                  
                  access_token_key=access_token,
                  access_token_secret=access_secret)

## check status
api.VerifyCredentials()

User(ID=2740697738, ScreenName=zhiyzuo)

Let's try some simple tasks, starting with getting my own statuses:

In [28]:
statuses = api.GetUserTimeline(screen_name="zhiyzuo")
for s in statuses:
    print(s.text)

RT @jcdl2018: We have extended the #jcdl2018 deadline for panels, posters, and demonstrations to February 2, 2018. https://t.co/OrA407HlcT
Say Trello to boards in @Bitbucket Cloud. https://t.co/bZFOhGIDqH #BitbucketTrends
The state and evolution of U.S. iSchools: From talent acquisitions to research... https://t.co/t5wY6YvxQl
RT @kangzhao: Our paper on @iSchools published--The state and evolution of U.S. iSchools: from talent acquisitions to research https://t.co…
@JASIST 😆
#fabric #myowntwitterapp fun app!
test my fabric composer
Test pic http://t.co/C8CJbKg19b
Test url twitter api http://t.co/3eXsFUEZPo
This is a test tweet for the api.
Test tweet


We can also use our `user id`

In [30]:
statuses = api.GetUserTimeline(user_id="2740697738")
for s in statuses:
    print(s.created_at)

Sat Jan 27 07:24:13 +0000 2018
Thu Sep 14 17:55:58 +0000 2017
Tue May 23 15:15:19 +0000 2017
Tue May 23 15:04:10 +0000 2017
Tue May 23 14:56:41 +0000 2017
Mon Oct 31 16:40:09 +0000 2016
Mon Oct 31 04:03:32 +0000 2016
Wed Dec 10 17:11:50 +0000 2014
Wed Dec 10 17:01:40 +0000 2014
Fri Dec 05 22:28:49 +0000 2014
Fri Dec 05 17:54:31 +0000 2014
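
These timestamps are plain strings. For any analysis over time it helps to turn them into `datetime` objects. A minimal sketch, assuming the format seen in the output above and an English locale for the weekday and month names:

```python
from datetime import datetime

created_at = "Sat Jan 27 07:24:13 +0000 2018"   # first line of the output above

# %a weekday, %b month, %d day, %H:%M:%S time, %z UTC offset, %Y year
parsed = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")

print(parsed.isoformat())
print(parsed.year, parsed.month, parsed.day)
```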


You can also get a friend list

In [31]:
friends = api.GetFriends()
for f in friends:
    print(f.name)

/r/datasets
JCDL 2018
Yann LeCun
Duncan Watts
Tiago Peixoto
Command Line Magic
Unix tool tip
Data Science Fact
ASIS&T SIG/MET
Nick Street
Yuanyang Liu
Kristina Bigsby
Mendeley Support
Xun Zhou
New York Internships
Santa Fe Institute
Microsoft Academic
Network Science PhDs
JASIST
ASIS&T
iSchools iConference
iSchools
WASD Keyboards
Yong-Yeol Ahn
Lada Adamic
Jure Leskovec
Network Fact
Dan Larremore
Simon DeDeo
Jevin West
Aaron Clauset
John Jairo Rios R.
Michael Lash
Kang Zhao
sharelatex
Lincoln Wang
Xi Wang
qix
Overleaf
Andrew Ng
Iowa Memorial Union
University of Iowa
kevin Garnet


More interestingly, let's go get some tweets from Twitter. Let's try to search for any popular tweets (limit to 20) related to `uiowa` since 12/01/2014 in English.
- See https://dev.twitter.com/rest/public/search for more information on how to construct a query
- How to set `lang` parameter -> https://dev.twitter.com/rest/reference/get/help/languages

In [32]:
results = api.GetSearch(
    raw_query="q=uiowa&result_type=popular&since=2014-12-01&count=20&lang=en")

We only got 15 results though.

In [41]:
len(results)

15
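
Writing the raw query string by hand is error-prone once there are several parameters. As a sketch, the same string can be assembled from a `dict` with the standard library; `search_params` is just an illustrative name.

```python
from urllib.parse import urlencode

# The same parameters as in the raw query used above
search_params = {
    "q": "uiowa",
    "result_type": "popular",
    "since": "2014-12-01",
    "count": 20,
    "lang": "en",
}

# urlencode joins the key=value pairs with "&" and escapes special characters
raw_query = urlencode(search_params)
print(raw_query)
```

The resulting string can then be passed as `raw_query` in the same way as the hand-written one.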

Show all the text in the retrieved tweets, with the user screen name highlighted:

In [33]:
for tw in results:
    # \033[41m switches to a red background and \033[0m resets the style
    print("%s. Tweeted by \033[41m%s\033[0m" % (tw.text, tw.user.screen_name))

Walking in a Hawkeye wonderland. ☃️❄️ https://t.co/ex7PThPrcI. Tweeted by uiowa
#Giantviruses may play an intriguing role in evolution of life on Earth @uiowa https://t.co/MLrza20qk1. Tweeted by physorg_com
Still in awe of @UIDM on becoming only the second school in Dance Marathon history to raise more than $3 million do… https://t.co/qX1UHwwskE. Tweeted by uiowa
HAWKEYE GAMEDAY!!
vs. Mich State
  
⏰: 8:05 pm (CT) 
📍: CHA
📺: @ESPN
🎟: https://t.co/8phFJZyKDd
📱:… https://t.co/tztfuMoQal. Tweeted by IowaHoops
.@UIDM makes history. They raised  $3,011,015.24 For The Kids this year, becoming the second school ever to raise m… https://t.co/1PbBrNRHhe. Tweeted by uiowa
Weather update: Classes are on as scheduled. The university follows its Extreme Weather Protocol:… https://t.co/lpgUS3Pmpt. Tweeted by uiowa
It's #NationalSigningDay! See the #UIResearch that's able to predict where a recruit will commit:… https://t.co/qM3of5O020. Tweeted by
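
The highlighting in the cell above relies on ANSI escape codes, which terminals and notebooks interpret as styling instructions instead of printing them. A small sketch that names the two codes; `highlight` is a hypothetical helper for illustration.

```python
RED_BACKGROUND = "\033[41m"   # ANSI escape code: switch to a red background
RESET = "\033[0m"             # ANSI escape code: back to the default style

def highlight(text):
    """Wrap text in escape codes so that it is displayed on a red background."""
    return RED_BACKGROUND + text + RESET

print("Tweeted by " + highlight("uiowa"))
```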

Finally, we can save this text to a file for further analysis. Note that we may want to remove all the newlines within each tweet.

In [38]:
tweet_list = [tw.text.replace('\n', ' ') for tw in results]
tweet_list[0]

'Walking in a Hawkeye wonderland. ☃️❄️ https://t.co/ex7PThPrcI'

In [39]:
import numpy as np

In [40]:
np.savetxt('sample-data/sample_tweets.csv', tweet_list, encoding='utf-8', fmt='%s')
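
Tweets often contain commas and quotes, which can confuse tools that later read the file as CSV. One alternative, sketched here with the standard `csv` module, quotes such fields automatically. An in-memory buffer stands in for a real file so the example runs anywhere; the two tweets are copied from the timeline output earlier in this section.

```python
import csv
import io

tweet_list = [
    "RT @jcdl2018: We have extended the #jcdl2018 deadline for panels, posters, and demonstrations to February 2, 2018. https://t.co/OrA407HlcT",
    "This is a test tweet for the api.",
]

# Stand-in for a real file opened with newline='' and encoding='utf-8'
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text"])                      # header row
writer.writerows([tw] for tw in tweet_list)    # one tweet per row

# Read the data back to confirm that the commas inside tweets survived
buffer.seek(0)
rows = list(csv.reader(buffer))
print(rows[1:])
```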

---

#### Conclusions

In this workshop, we went through some examples of using APIs to get various types of data in Python. The last Twitter example is relatively superficial and does not go deep enough to get meaningful data for social media analysis. Here I would like to recommend reading more materials, especially those on ___streaming API___:

- http://adilmoujahid.com/posts/2014/07/twitter-analytics/
- http://socialmedia-class.org/twittertutorial.html


Further, the [`tweepy`](http://www.tweepy.org/) package seems to be pretty popular as well.