ISRC Python Workshop: Using APIs

___Getting data using APIs___

<hr>

@author: Zhiya Zuo

@email: zhiya-zuo@uiowa.edu

---

#### Introduction

APIs (application programming interfaces) are services designed to make software development easier. They come in many different forms, including software libraries and database systems. Generally, you can think of APIs as Lego pieces used for specific models. I found a somewhat brief but interesting read on APIs [here](https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-apis-application-programming-interfaces-5-apis-a-data-scientist-must-know/).

Before we dive into playing with APIs using Python, let's first try some simple examples.

##### [Open Geocoding APIs](https://developer.mapquest.com/documentation/open/geocoding-api/)

We can use this API product to convert between addresses and latitude/longitude.

API Request Format:

http://open.mapquestapi.com/geocoding/v1/address?key=KEY&location=LOCATION

where:
- `KEY` is the [API key](https://stackoverflow.com/questions/1453073/what-is-an-api-key)
- `LOCATION`: address name (e.g., the University of Iowa)
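
As a minimal sketch of how the request format above can be filled in, the snippet below assembles the URL with Python's standard library. The helper name `build_geocode_url` and the placeholder key `YOUR_KEY` are assumptions made for illustration; nothing is sent over the network.

```python
from urllib.parse import urlencode, quote

# Base endpoint taken from the request format above.
GEOCODE_URL = "http://open.mapquestapi.com/geocoding/v1/address"


def build_geocode_url(key, location):
    """Return the full request URL for an API key and a location (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use '+').
    query = urlencode({"key": key, "location": location}, quote_via=quote)
    return f"{GEOCODE_URL}?{query}"


# "YOUR_KEY" is a placeholder, not a working key.
url = build_geocode_url("YOUR_KEY", "The University of Iowa")
print(url)
```

Letting a library encode the parameters avoids having to replace spaces and other special characters by hand.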

The first thing we need to do is to register for API keys: [link here](https://developer.mapquest.com/plan_purchase/steps/business_edition/business_edition_free/register)

As a simple example, let's try to search where the university is by: http://open.mapquestapi.com/geocoding/v1/address?key=f08MixE5Tcq8uFS8ltbu3xeJ7qPF4SRw&location=The%20University%20of%20Iowa

And now we get:
```json
{"info":{"statuscode":0,"copyright":{"text":"\u00A9 2019 MapQuest, Inc.","imageUrl":"http://api.mqcdn.com/res/mqlogo.gif","imageAltText":"\u00A9 2019 MapQuest, Inc."},"messages":[]},"options":{"maxResults":-1,"thumbMaps":true,"ignoreLatLngInput":false},"results":[{"providedLocation":{"location":"The University of Iowa"},"locations":[{"street":"200 Hawkins Drive","adminArea6":"","adminArea6Type":"Neighborhood","adminArea5":"Iowa City","adminArea5Type":"City","adminArea4":"Johnson County","adminArea4Type":"County","adminArea3":"IA","adminArea3Type":"State","adminArea1":"US","adminArea1Type":"Country","postalCode":"52245","geocodeQualityCode":"P1XXX","geocodeQuality":"POINT","dragPoint":false,"sideOfStreet":"N","linkId":"0","unknownInput":"","type":"s","latLng":{"lat":41.659429,"lng":-91.548385},"displayLatLng":{"lat":41.659429,"lng":-91.548385},"mapUrl":"http://open.mapquestapi.com/staticmap/v5/map?key=f08MixE5Tcq8uFS8ltbu3xeJ7qPF4SRw&type=map&size=225,160&locations=41.65942875,-91.5483848030346|marker-sm-50318A-1&scalebar=true&zoom=15&rand=-925387094"},{"street":"21 North Clinton Street","adminArea6":"","adminArea6Type":"Neighborhood","adminArea5":"Iowa City","adminArea5Type":"City","adminArea4":"Johnson County","adminArea4Type":"County","adminArea3":"IA","adminArea3Type":"State","adminArea1":"US","adminArea1Type":"Country","postalCode":"52242","geocodeQualityCode":"P1XXX","geocodeQuality":"POINT","dragPoint":false,"sideOfStreet":"N","linkId":"0","unknownInput":"","type":"s","latLng":{"lat":41.661337,"lng":-91.536149},"displayLatLng":{"lat":41.661337,"lng":-91.536149},"mapUrl":"http://open.mapquestapi.com/staticmap/v5/map?key=f08MixE5Tcq8uFS8ltbu3xeJ7qPF4SRw&type=map&size=225,160&locations=41.6613368,-91.5361491|marker-sm-50318A-2&scalebar=true&zoom=15&rand=-1788318072"}]}]}
```

This is the returned object in JSON format, which uses key-value pairs to store the values of different attributes. For example, the ___street___ of the first location is ___200 Hawkins Drive___.
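
To make the key-value structure concrete, here is a small sketch that parses a heavily trimmed copy of the response with the standard `json` module. Only a handful of fields are kept, so the string `raw` is an abridged illustration rather than the full output.

```python
import json

# A trimmed copy of the response above; most fields are omitted for brevity.
raw = """
{"results": [{"providedLocation": {"location": "The University of Iowa"},
  "locations": [
    {"street": "200 Hawkins Drive", "postalCode": "52245",
     "latLng": {"lat": 41.659429, "lng": -91.548385}},
    {"street": "21 North Clinton Street", "postalCode": "52242",
     "latLng": {"lat": 41.661337, "lng": -91.536149}}
  ]}]}
"""

data = json.loads(raw)  # JSON objects become dicts, JSON arrays become lists

# "results" is a list with one entry per query; each entry holds candidate locations.
locations = data["results"][0]["locations"]
for loc in locations:
    print(loc["street"], loc["latLng"]["lat"], loc["latLng"]["lng"])
```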

Note that an API key is not needed here if we only make a couple of requests. If you want to build an app or send queries more often, you will need to pay attention to the [rate limit](https://developer.mapquest.com/user/me/plan).

##### [Weather API](https://openweathermap.org/api)

As a second example, let's try to get some weather data. This time, we will need an API key!

Whenever we go to a webpage with a list of API choices, we should first find the one we really want. Suppose we want the current weather data; we then go to the [___api doc___ for that API](https://openweathermap.org/current). Let's try the first method, getting weather by city name:

API call:

- https://api.openweathermap.org/data/2.5/weather?q={city}

- https://api.openweathermap.org/data/2.5/weather?q={city},{country}

Parameters:
___q___: city name and country code separated by a comma; use [___ISO 3166 country codes___](https://en.wikipedia.org/wiki/ISO_3166-1#Officially_assigned_code_elements)

Examples of API calls:

- https://api.openweathermap.org/data/2.5/weather?q=London

- https://api.openweathermap.org/data/2.5/weather?q=London,uk

This time, we get an error without an API key, saying that:

```json
{"cod":401, "message": "Invalid API key. Please see http://openweathermap.org/faq#error401 for more info."}
```

Note that this is also a JSON object, with an error code of 401 and an error message.
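
As a rough sketch, an error body like the one above can be inspected programmatically. The helper `describe_response` is hypothetical, and the assumption that `cod` always holds an HTTP status code is based only on the two responses shown in this workshop.

```python
import json
from http import HTTPStatus

# The error body shown above, copied verbatim.
error_body = (
    '{"cod":401, "message": "Invalid API key. '
    'Please see http://openweathermap.org/faq#error401 for more info."}'
)


def describe_response(body):
    """Summarize a JSON response body (hypothetical helper for illustration)."""
    payload = json.loads(body)
    code = int(payload["cod"])  # cast defensively in case the code is a string
    if code == HTTPStatus.OK:
        return "OK"
    # HTTPStatus maps a numeric code to its standard phrase, e.g. 401 -> Unauthorized.
    return f"{code} {HTTPStatus(code).phrase}: {payload.get('message', '')}"


print(describe_response(error_body))
```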

_I just found that it takes up to 10 minutes for new accounts' keys to be activated. For this reason, let's use my key: `a236f384f5bced47bbba86335cdb1d2a`, which will be deleted after this workshop_

Let's try to get an API key [here](http://openweathermap.org/appid). After creating an account, you can find your API key [here](https://home.openweathermap.org/api_keys). For privacy and security reasons, I will save my API key locally in a file called `weather_keys.csv`. Now that you have the key, you can open the following URL in your browser:

https://api.openweathermap.org/data/2.5/weather?q=Shanghai&APPID=apikey

In our case, it should be https://api.openweathermap.org/data/2.5/weather?q=Shanghai&APPID=a236f384f5bced47bbba86335cdb1d2a

Looking at the structure of the API call, we see that different parameters are separated by an `&` sign.
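
The same structure can be taken apart in code. This short sketch uses the standard library to split the placeholder URL from above (with `apikey` standing in for a real key) into its path and parameters.

```python
from urllib.parse import urlsplit, parse_qs

# Placeholder URL from above; "apikey" stands in for a real key.
url = "https://api.openweathermap.org/data/2.5/weather?q=Shanghai&APPID=apikey"

parts = urlsplit(url)
# Everything after "?" is the query string; parse_qs splits it on "&" and "=".
params = parse_qs(parts.query)

print(parts.path)
print(params)
```

Note that `parse_qs` returns a list for every parameter, because the same name may appear more than once in a query string.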

The output is:
```json
{"coord":{"lon":121.49,"lat":31.23},"weather":[{"id":701,"main":"Mist","description":"mist","icon":"50d"}],"base":"stations","main":{"temp":286.11,"pressure":1024,"humidity":87,"temp_min":283.15,"temp_max":289.82},"visibility":6000,"wind":{"speed":4,"deg":220,"gust":9},"clouds":{"all":48},"dt":1552610491,"sys":{"type":1,"id":9659,"message":0.0044,"country":"CN","sunrise":1552601099,"sunset":1552644100},"id":1796236,"name":"Shanghai","cod":200}
```

Overall we see that this is very easy and straightforward.

---

#### Send API requests in Python

While using APIs is pretty simple, we might not want to do all this copying and pasting manually. Python can help us send requests and parse results automatically, with less human supervision.

To do this, we first need to know how to send requests. We will use an amazing package called [`requests`](http://docs.python-requests.org/en/master/). If you do not have it, please install it with `pip` or `conda`:

```bash
$ pip install requests
```

or 

```bash
$ conda install requests
```

In [1]:
# Let's load the library first
import requests

Using weather as an example, we should first know the request URL (where the request goes) and its inputs (e.g., API key and city name). In our case, we know our API key and the city to query, so we can do the following.

In [4]:
apikey = 'a236f384f5bced47bbba86335cdb1d2a'

In [5]:
weather_url = "https://api.openweathermap.org/data/2.5/weather"
city_name = "Shanghai"
print(weather_url)
print(city_name)
print(apikey)

https://api.openweathermap.org/data/2.5/weather
Shanghai
a236f384f5bced47bbba86335cdb1d2a


Now, we should let `requests` do its work.

In [6]:
r = requests.get(weather_url, params={'q': city_name, 'APPID': apikey})
r.url  # `requests` helps us encode the URL in the correct format

'https://api.openweathermap.org/data/2.5/weather?q=Shanghai&APPID=a236f384f5bced47bbba86335cdb1d2a'

In [7]:
r.status_code # 200 means success

200

As a side note, `requests.get` sends the request with the `GET` method, as opposed to the `POST` method. The former is used to obtain data, whereas the latter is used to submit or modify data. See [this post](https://www.w3schools.com/tags/ref_httpmethods.asp) for more details.

To get the JSON response, we call the `r.json()` method.
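
To illustrate the difference without sending anything, the sketch below builds two request objects with the standard library's `urllib.request` (not `requests`, which this workshop uses everywhere else). The URL `https://example.com/api` is a made-up placeholder.

```python
from urllib.request import Request

# Hypothetical endpoint; the requests are only constructed, never sent.
url = "https://example.com/api"

# Parameters in the URL and no body: a GET request.
get_req = Request(url + "?q=Shanghai")

# Supplying a body makes urllib default to POST.
post_req = Request(url, data=b"q=Shanghai")

print(get_req.get_method(), post_req.get_method())
```

In a `GET` request the parameters travel in the URL, while a `POST` request carries them in the request body.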

In [8]:
result = r.json()
result

{'coord': {'lon': 121.49, 'lat': 31.23},
 'weather': [{'id': 701,
   'main': 'Mist',
   'description': 'mist',
   'icon': '50d'}],
 'base': 'stations',
 'main': {'temp': 286.1,
  'pressure': 1024,
  'humidity': 87,
  'temp_min': 283.15,
  'temp_max': 289.82},
 'visibility': 6000,
 'wind': {'speed': 4, 'deg': 220, 'gust': 9},
 'clouds': {'all': 48},
 'dt': 1552610612,
 'sys': {'type': 1,
  'id': 9659,
  'message': 0.0043,
  'country': 'CN',
  'sunrise': 1552601099,
  'sunset': 1552644100},
 'id': 1796236,
 'name': 'Shanghai',
 'cod': 200}

The JSON object is converted into a `dict`, the Python data structure that holds key-value pairs. To access certain values, we index it like any other `dict`.

In [9]:
result['name']

'Shanghai'

In [10]:
for key, value in result['main'].items():
    print(key, value)  # default temperature is in Kelvin

temp 286.1
pressure 1024
humidity 87
temp_min 283.15
temp_max 289.82
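
Since the temperatures come back in Kelvin, a small conversion step is often handy. The sketch below reuses the values printed above; the helper names are made up for illustration, and treating `sunrise` as a Unix timestamp in UTC is an assumption based on how the numbers look.

```python
from datetime import datetime, timezone

# Temperature values copied from the output above (in Kelvin).
main = {"temp": 286.1, "temp_min": 283.15, "temp_max": 289.82}


def kelvin_to_celsius(kelvin):
    return kelvin - 273.15


def kelvin_to_fahrenheit(kelvin):
    return kelvin_to_celsius(kelvin) * 9 / 5 + 32


for key, kelvin in main.items():
    print(f"{key}: {kelvin_to_celsius(kelvin):.2f} °C / {kelvin_to_fahrenheit(kelvin):.2f} °F")

# Assumption: 'sunrise' is a Unix timestamp (seconds since 1970-01-01 UTC).
sunrise = datetime.fromtimestamp(1552601099, tz=timezone.utc)
print(sunrise.isoformat())
```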


---

#### Use packages: Twitter API as an example

Many web services have their own API packages ready to use. With these convenient tools, we can get started right away by following their documentation and examples, without much manual effort. We will be using <a href="https://apps.twitter.com/" target="_blank">Twitter API</a> as an example. We will first install the `python-twitter` package as shown [here](https://python-twitter.readthedocs.io/en/latest/installation.html).

Then, we have to register for a Twitter Developer account and register an app. Let's go to https://dev.twitter.com/ and put an app together. <a href="https://python-twitter.readthedocs.io/en/latest/getting_started.html" target="_blank">Here</a>'s a quick start on how you can do this. After we obtain the *__consumer key__*, *__consumer secret__*, *__access token__*, and *__access token secret__*, we are ready to retrieve some data from Twitter!

In [14]:
## suppress warnings
import warnings
warnings.filterwarnings('ignore')

I saved my own keys in a four-line text file, in the order shown below:
```
consumer_key = "your_consumer_key"        
consumer_secret = "your_consumer_secret"
access_token = "your_access_token"
access_secret = "your_access_secret"
```

In [19]:
with open("./twitter_keys.csv", "r") as twitter_keys:
    keys = twitter_keys.read()
    consumer_key, consumer_secret, access_token, access_secret = \
        keys.split("\n")[:-1]
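
The `keys.split("\n")[:-1]` idiom relies on the file ending with exactly one trailing newline. A slightly more forgiving variant is sketched below, assuming the file holds one key per line; `read_keys` is a hypothetical helper and the values written to the temporary file are placeholders.

```python
import os
import tempfile


def read_keys(path):
    """Return the non-empty, stripped lines of a key file (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Demonstrate with a throwaway file holding placeholder values.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "twitter_keys.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write("your_consumer_key\nyour_consumer_secret\n"
                "your_access_token\nyour_access_secret\n\n")

    keys = read_keys(path)
    if len(keys) != 4:
        raise ValueError(f"expected 4 keys, found {len(keys)}")
    consumer_key, consumer_secret, access_token, access_secret = keys

print(consumer_key, access_secret)
```

Checking the number of keys up front gives a clearer error than a failed tuple unpacking.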

In [20]:
## load the twitter package, which is a well-written Python package for Twitter APIs
import twitter
api = twitter.Api(consumer_key=consumer_key,
                  consumer_secret=consumer_secret,                  
                  access_token_key=access_token,
                  access_token_secret=access_secret)

## check status
api.VerifyCredentials()

User(ID=2740697738, ScreenName=zhiyzuo)

Let's try some simple tasks, starting with getting my own statuses.

In [21]:
statuses = api.GetUserTimeline(screen_name="zhiyzuo")
for s in statuses:
    print(s.text)

Paper with @kangzhao and @ChaoqunNi on faculty hiring within iSchools  is out: https://t.co/XYvd0xqauZ …; @iSchools… https://t.co/uiX9QCknhX
I'm using Overleaf, the free online collaborative LaTeX editor - it's awesome and easy to use! https://t.co/fcNXTDoXPs
RT @kangzhao: Paper with @zhiyzuo: "The More Multidisciplinary the Better?--The Prevalence and Interdisciplinarity of Research Collaboratio…
@ulfaslak @ScienceNews no, that’s dr. phalange!
@ulfaslak @m_rosvall @suneman Congrats !
RT @jcdl2018: We have extended the #jcdl2018 deadline for panels, posters, and demonstrations to February 2, 2018. https://t.co/OrA407HlcT
Say Trello to boards in @Bitbucket Cloud. https://t.co/bZFOhGIDqH #BitbucketTrends
The state and evolution of U.S. iSchools: From talent acquisitions to research... https://t.co/t5wY6YvxQl
RT @kangzhao: Our paper on @iSchools published--The state and evolution of U.S. iSchools: from talent acquisitions to research https://t.co…
@JASIST 😆
#fabric #myowntwitterapp fun ap

We can also use our `user id`

In [22]:
statuses = api.GetUserTimeline(user_id="2740697738")
for s in statuses:
    print(s.created_at)

Sat Feb 02 02:51:28 +0000 2019
Fri Dec 28 18:02:15 +0000 2018
Sat Jul 07 04:39:16 +0000 2018
Mon Jun 25 22:04:10 +0000 2018
Fri Jun 22 18:48:15 +0000 2018
Sat Jan 27 07:24:13 +0000 2018
Thu Sep 14 17:55:58 +0000 2017
Tue May 23 15:15:19 +0000 2017
Tue May 23 15:04:10 +0000 2017
Tue May 23 14:56:41 +0000 2017
Mon Oct 31 16:40:09 +0000 2016
Mon Oct 31 04:03:32 +0000 2016
Wed Dec 10 17:11:50 +0000 2014
Wed Dec 10 17:01:40 +0000 2014
Fri Dec 05 22:28:49 +0000 2014
Fri Dec 05 17:54:31 +0000 2014
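
The `created_at` values are plain strings. If we want to sort or filter by time, one option is to parse them into `datetime` objects, as in this sketch; the format string is inferred from the output above rather than taken from the Twitter documentation.

```python
from datetime import datetime

# Format inferred from strings such as "Sat Feb 02 02:51:28 +0000 2019".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

created = datetime.strptime("Sat Feb 02 02:51:28 +0000 2019", TWITTER_TIME_FORMAT)

print(created.isoformat())
print(created.year, created.strftime("%A"))
```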


You can also get a friend list

In [23]:
friends = api.GetFriends()
for f in friends:
    print(f.name)

Liangfei Qiu
Rong Su
GSOM at Clark Univ.
Yuqing Ren
Twitter API
Amin Vahedian
PHD Comics
Nitesh Chawla
Inside Higher Ed
The Chronicle of Higher Education
JCDL Conference
Quantitative Science Studies
ISSI President
Associate Deans
Tippie College of Business
Ann Melissa Campbell
INFORMS
INFORMS2019
Peter Fennell
Chaomei Chen
Academic JobTracker
Shangguan Wang
Weiguo Fan
IOWAInformsStuChap
Jason Dou
CSS
yrCSS
Bodo Winter
Penny Skateboards
Skateboarding
ASIS&T SIG SI
Jiepu Jiang
Tippie Analytics
Blei Lab
Ulf Aslak
Center For Open Science
Time.Graphics
GHTorrent
Complexity Challenge
Brian Uzzi
Yang
Andrej Karpathy
NeurIPS Conference
IC2S2
USC ISI
Albert-László Barabási
Roberta Sinatra
Dashun Wang
Kristina Lerman
Not just Google Scholar's Digest
Loet Leydesdorff
Vincent Larivière
Cassidy R. Sugimoto
Lutz Bornmann
Ludo Waltman
ASIS&T Social Media Special Interest Group
Emilio Ferrara
David Mimno
/r/datasets
JCDL 2018
Duncan Watts
Command Line Magic
Unix tool tip
Data Science Fact
ASIS&T SIG/M

More interestingly, let's go get some tweets from Twitter. Let's try to search for any popular tweets (limit to 20) related to `uiowa` since 12/01/2014 in English.
- See https://dev.twitter.com/rest/public/search for more information on how to construct a query
- How to set `lang` parameter -> https://dev.twitter.com/rest/reference/get/help/languages
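
A query string for this search can be written by hand, but it can also be assembled from a dictionary. The sketch below does so with the standard library; the name `search_params` is just an illustration.

```python
from urllib.parse import urlencode

# Search criteria described above: popular English tweets about uiowa since 2014-12-01.
search_params = {
    "q": "uiowa",
    "result_type": "popular",
    "since": "2014-12-01",
    "count": 20,
    "lang": "en",
}

# urlencode joins the pairs with "&" and escapes special characters if needed.
raw_query = urlencode(search_params)
print(raw_query)
```

Keeping the parameters in a dictionary makes it easy to change one of them, such as `count` or `lang`, without editing a long string.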

In [24]:
results = api.GetSearch(
    raw_query="q=uiowa&result_type=popular&since=2014-12-01&count=20&lang=en")

We only got 14 results though.

In [25]:
len(results)

14

Show all the text in the retrieved tweets, with the user screen name highlighted.

In [26]:
for tw in results:
    # \033[41m switches to a red background and \033[0m resets the style
    print("%s. Tweeted by \033[41m%s\033[0m" % (tw.text, tw.user.screen_name))

We'll have a slice of that 👇 #PiDay https://t.co/Gfzhl5n9Ly. Tweeted by [41muiowa[0m
That smile you get on your face when you realize you're #AlwaysAHawkeye. https://t.co/5wiAl9sbiX. Tweeted by [41muiowa[0m
In a first of its kind study, #uiowa researchers proved that early intervention for children with hearing loss can… https://t.co/7dcmQ4k63G. Tweeted by [41muiowa[0m
Hawkeyes are #B1GTourney champions! Congratulations, @IowaWBB! It's time to go dancing. #FightForIowa 🖤💛🏀 https://t.co/cS4P336JfI. Tweeted by [41muiowa[0m
Judges have a history of finding ways to avoid big sentences in white collar economic crimes, with the exception of… https://t.co/KeGgfeiUSs. Tweeted by [41mmayawiley[0m
Creator and co-star of the hit Netflix series Love, Hawkeye Paul Rust said it best this weekend when he visited Iow… https://t.co/jLpGSp3XCd. Tweeted by [41muiowa[0m
On #InternationalWomensDay we're celebrating the many incredibly talented and inspirational women of #uiowa. 🖤💛 https://t.co/

Finally, we can save this text to a file for further analysis. Note that we may want to remove all the newlines within each tweet.

In [27]:
tweet_list = [tw.text.replace('\n', ' ') for tw in results]
tweet_list[0]

"We'll have a slice of that 👇\xa0#PiDay https://t.co/Gfzhl5n9Ly"

In [28]:
import numpy as np

In [29]:
np.savetxt('sample-data/sample_tweets.csv', tweet_list, encoding='utf-8', fmt='%s')
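
If NumPy is not available, the same one-tweet-per-line file can be written with plain Python. This is a minimal sketch with made-up sample strings; `save_lines` and `load_lines` are hypothetical helpers, and the file goes to a temporary directory rather than `sample-data/`.

```python
import os
import tempfile


def save_lines(path, lines):
    """Write one item per line, collapsing any whitespace runs (including newlines)."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(" ".join(line.split()) + "\n")


def load_lines(path):
    """Read the file back into a list of strings."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


# Made-up sample text standing in for tweets.
sample = ["first tweet\nwith a line break", "second tweet"]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "sample_tweets.csv")
    save_lines(out_path, sample)
    restored = load_lines(out_path)

print(restored)
```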

---

#### Conclusions

In this workshop, we went through some examples of using APIs to get various types of data in Python. The last Twitter example is relatively superficial and does not go deep enough to get meaningful data for social media analysis. Here I would like to recommend reading more materials, especially those on ___streaming API___:

- http://adilmoujahid.com/posts/2014/07/twitter-analytics/
- http://socialmedia-class.org/twittertutorial.html


Further, the [`tweepy`](http://www.tweepy.org/) package seems to be pretty popular as well.