# Working with Data APIs

**Adapted from: Sam Maurer // maurer@berkeley.edu // Oct. 3, 2016**

This notebook provides a demonstration of data-access APIs that operate over the web. See README.md for setup instructions.

In Part 1, we'll load and parse results from an API feed of earthquake data.  
In Part 2, we'll add query parameters to the workflow, using the Google Maps Geocoding API as an example.  
In Part 3, we'll use an authenticated API to query public Twitter posts. 

# Part 1: Reading from an automated data feed

### USGS real-time earthquake feeds

This is an API for near-real-time data about earthquakes. Data is provided in JSON format over the web. No authentication is needed, and there's no way to customize the output. Instead, the API has a separate endpoint for each permutation of the data that users might want.

**API documentation:**  
http://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php

**Sample API endpoint, for magnitude 4.5+ earthquakes in past day:**  
http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/4.5_day.geojson  
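
Because each combination of magnitude threshold and time window gets its own endpoint, the feed URLs follow a fixed pattern. Here is a minimal sketch that builds a summary-feed URL. It assumes the `{magnitude}_{period}.geojson` naming seen in the sample endpoint and the magnitude/period options listed on the documentation page; `feed_url` is a hypothetical helper, not part of any library.

```python
# Hypothetical helper: builds a USGS summary-feed URL from the naming pattern
# used by endpoints such as .../summary/4.5_day.geojson (assumed from the docs).
BASE = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/"
MAGNITUDES = {"significant", "4.5", "2.5", "1.0", "all"}
PERIODS = {"hour", "day", "week", "month"}


def feed_url(magnitude: str, period: str) -> str:
    """Return the GeoJSON summary-feed URL for a magnitude/period combination."""
    if magnitude not in MAGNITUDES:
        raise ValueError(f"unknown magnitude option: {magnitude!r}")
    if period not in PERIODS:
        raise ValueError(f"unknown period option: {period!r}")
    return f"{BASE}{magnitude}_{period}.geojson"


print(feed_url("2.5", "week"))
```

Validating against a fixed set of options catches typos before any request is sent, since a misspelled feed name would only show up as an HTTP error.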


In [1]:
%matplotlib inline

import pandas as pd

import json    # library for working with JSON-formatted text strings
import requests  # library for accessing content from web URLs

import pprint  # library for making Python data structures readable
pp = pprint.PrettyPrinter()

In [2]:
# download data on magnitude 2.5+ quakes from the past week

endpoint_url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_week.geojson"
response = requests.get(endpoint_url)
results = response.text

# what is the data type of the results?
print(type(results))

<class 'str'>


In [3]:
# print the first 500 characters to see a sample of the data

print(results[:500])

{"type":"FeatureCollection","metadata":{"generated":1519322314000,"url":"https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_week.geojson","title":"USGS Magnitude 2.5+ Earthquakes, Past Week","status":200,"api":"1.5.8","count":315},"features":[{"type":"Feature","properties":{"mag":4.2,"place":"115km S of San Pedro de Atacama, Chile","time":1519307135460,"updated":1519319434040,"tz":-240,"url":"https://earthquake.usgs.gov/earthquakes/eventpage/us2000d6d1","detail":"https://earthquake.us


In [4]:
# it looks like the results are a string with JSON-formatted data inside

# parse the string into a Python dictionary
data = json.loads(results)

print(type(data))

<class 'dict'>
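
To see the shape of the parsed dictionary without a network call, here is a sketch that runs `json.loads` on a tiny hand-written string with the same top-level layout as the feed (`type`, `metadata`, `features`). The sample values are made up for illustration.

```python
import json

# Made-up sample with the same top-level structure as the USGS feed
sample = """
{"type": "FeatureCollection",
 "metadata": {"title": "Sample feed", "count": 2},
 "features": [
   {"type": "Feature",
    "properties": {"mag": 4.2, "place": "Somewhere", "title": "M 4.2 - Somewhere"},
    "geometry": {"type": "Point", "coordinates": [-68.1, -24.0, 190.5]}},
   {"type": "Feature",
    "properties": {"mag": 2.7, "place": "Elsewhere", "title": "M 2.7 - Elsewhere"},
    "geometry": {"type": "Point", "coordinates": [-117.5, 35.7, 8.0]}}
 ]}
"""

data = json.loads(sample)

# the metadata block reports how many features the feed contains
print(data["metadata"]["count"], len(data["features"]))  # 2 2

# each feature has 'properties' (attributes) and 'geometry' (location)
first = data["features"][0]
print(first["properties"]["title"])  # M 4.2 - Somewhere
```

With `requests`, `response.json()` performs the same parsing step as `json.loads(response.text)`.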


In [None]:
# print the most recent quake

quakes = data['features']
print(quakes[0])

In [None]:
# print it more clearly

pp.pprint(quakes[0]['geometry'])
pp.pprint(quakes[0]['properties'])
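
The `time` and `updated` properties are large integers: in the sample output above, `"time":1519307135460`. The USGS documentation describes these as milliseconds since the Unix epoch. A sketch converting one to a readable UTC timestamp with the standard library (`usgs_time_to_utc` is a hypothetical helper name):

```python
from datetime import datetime, timezone


def usgs_time_to_utc(ms: int) -> datetime:
    """Convert a USGS timestamp (milliseconds since 1970-01-01 UTC) to a datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# value taken from the sample feed output above
t = usgs_time_to_utc(1519307135460)
print(t.isoformat())
```

In the notebook, `usgs_time_to_utc(quakes[0]['properties']['time'])` would give the time of the most recent quake.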

In [None]:
# pull out the title from each earthquake listing

for q in quakes:
    print(q['properties']['title'])

In [None]:
# pull out magnitudes and depths into a Pandas dataframe, using
# a more compact Python syntax for iterating through lists

d = {'magnitude': [q['properties']['mag'] for q in quakes],
     'depth': [q['geometry']['coordinates'][2] for q in quakes]}

df = pd.DataFrame.from_dict(d)

# how many earthquakes were loaded into the dataframe?
print(len(df))
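
GeoJSON point coordinates are ordered longitude, latitude, then (in this feed) depth in kilometers, which is why the cell above reads depth from index 2. Here is a sketch of pulling all three, plus magnitude, into flat records; the made-up `features` list stands in for `quakes`, and `to_records` is a hypothetical helper.

```python
# Made-up features standing in for data['features'] from the feed
features = [
    {"properties": {"mag": 4.2}, "geometry": {"coordinates": [-68.1, -24.0, 190.5]}},
    {"properties": {"mag": 2.7}, "geometry": {"coordinates": [-117.5, 35.7, 8.0]}},
]


def to_records(features):
    """Flatten GeoJSON features into dicts; coordinates are [lon, lat, depth]."""
    records = []
    for f in features:
        lon, lat, depth = f["geometry"]["coordinates"][:3]
        records.append({"longitude": lon, "latitude": lat,
                        "depth": depth, "magnitude": f["properties"]["mag"]})
    return records


records = to_records(features)
print(records[0])
```

`pd.DataFrame(records)` turns the list into a dataframe, and `df.plot(x='longitude', y='latitude', kind='scatter')` is one way to start on the first end-of-class exercise.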

In [None]:
# print the first few lines of data

print(df.head())

In [None]:
# print some descriptive statistics

print(df.describe())

In [None]:
# plot the depth vs. magnitude

df.plot(x='magnitude', y='depth', kind='scatter')

In [None]:
# save the dataframe to disk

df.to_csv('usgs_earthquake_data.csv')

print('file saved')

In [None]:
# read it back later

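# DataFrame.from_csv has been removed from pandas; read_csv with
# index_col=0 restores the index column written by to_csv
new_df = pd.read_csv('usgs_earthquake_data.csv', index_col=0)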

print(new_df.head())

# Part 2: Querying an API endpoint

### Google Maps Geocoding API

Google has lots of APIs that let you access its services through code instead of through GUI apps. This one from Google Maps lets you look up the latitude-longitude coordinates of street addresses.

It works similarly to the earthquakes example, but with query parameters added to the URL endpoint!

**API documentation:**  
https://developers.google.com/maps/documentation/geocoding/intro

**API endpoint:**  
https://maps.googleapis.com/maps/api/geocode/json

**API endpoint with query parameters:**  
https://maps.googleapis.com/maps/api/geocode/json?address=Wurster+Hall
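
The query string is just the parameter dictionary encoded as `key=value` pairs, with spaces turned into `+`. The standard library can produce the same encoding as the `requests` call used in the cells below; a sketch:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

endpoint = "https://maps.googleapis.com/maps/api/geocode/json"
params = {"address": "Wurster Hall"}

# urlencode escapes special characters and joins pairs with '&'
url = endpoint + "?" + urlencode(params)
print(url)  # https://maps.googleapis.com/maps/api/geocode/json?address=Wurster+Hall

# parsing the query string back recovers the original parameters
print(parse_qs(urlsplit(url).query))  # {'address': ['Wurster Hall']}
```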

In [5]:
import json    # library for working with JSON-formatted text strings
import requests  # library for accessing content from web URLs

import pprint  # library for making Python data structures readable
pp = pprint.PrettyPrinter()

In [6]:
# we have to encode the search query so that it can be passed as a URL,
# with spaces and other special characters escaped (e.g. spaces become '+')

endpoint = 'https://maps.googleapis.com/maps/api/geocode/json'

params = {'address': 'young library uky'}

url = requests.Request('GET', endpoint, params=params).prepare().url
print(url)

https://maps.googleapis.com/maps/api/geocode/json?address=young+library+uky


In [11]:
# download and parse the results

response = requests.get(url)
results = response.text
data = json.loads(results)


#https://maps.googleapis.com/maps/api/geocode/json?address=young+library+uky&key= 'AIzaSyD1LS-O4_KvtNOTIinGa5mQnczeqb3sDBY' 

print(data)

{'results': [{'address_components': [{'long_name': '401', 'short_name': '401', 'types': ['street_number']}, {'long_name': 'Hilltop Avenue', 'short_name': 'Hilltop Ave', 'types': ['route']}, {'long_name': 'Lexington', 'short_name': 'Lexington', 'types': ['locality', 'political']}, {'long_name': 'Fayette County', 'short_name': 'Fayette County', 'types': ['administrative_area_level_2', 'political']}, {'long_name': 'Kentucky', 'short_name': 'KY', 'types': ['administrative_area_level_1', 'political']}, {'long_name': 'United States', 'short_name': 'US', 'types': ['country', 'political']}, {'long_name': '40506', 'short_name': '40506', 'types': ['postal_code']}, {'long_name': '0001', 'short_name': '0001', 'types': ['postal_code_suffix']}], 'formatted_address': '401 Hilltop Ave, Lexington, KY 40506, USA', 'geometry': {'location': {'lat': 38.0328721, 'lng': -84.5017179}, 'location_type': 'ROOFTOP', 'viewport': {'northeast': {'lat': 38.0342210802915, 'lng': -84.50036891970849}, 'southwest': {'lat
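
Google's Geocoding API now expects an API key, sent as a `key` query parameter alongside `address`. One way to keep the key out of the notebook is to read it from an environment variable. In this sketch `GOOGLE_MAPS_API_KEY` is a hypothetical variable name and `geocode_url` a hypothetical helper; it only builds the URL and sends no request.

```python
import os
from urllib.parse import urlencode

ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, key=None):
    """Build a Geocoding request URL; the key defaults to a hypothetical env var."""
    params = {"address": address}
    key = key or os.environ.get("GOOGLE_MAPS_API_KEY")
    if key:
        params["key"] = key
    return ENDPOINT + "?" + urlencode(params)


print(geocode_url("young library uky", key="YOUR-KEY-HERE"))
```

With `requests`, the same dictionary can be passed directly as `requests.get(ENDPOINT, params=params)`.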

In [12]:
# print it more nicely

pp.pprint(data)

{'results': [{'address_components': [{'long_name': '401',
                                      'short_name': '401',
                                      'types': ['street_number']},
                                     {'long_name': 'Hilltop Avenue',
                                      'short_name': 'Hilltop Ave',
                                      'types': ['route']},
                                     {'long_name': 'Lexington',
                                      'short_name': 'Lexington',
                                      'types': ['locality', 'political']},
                                     {'long_name': 'Fayette County',
                                      'short_name': 'Fayette County',
                                      'types': ['administrative_area_level_2',
                                                'political']},
                                     {'long_name': 'Kentucky',
                                      'short_name': 'KY',
               

In [13]:
# pull out the lat-lon coordinates

for r in data['results']:
    coords = r['geometry']['location']
    print(coords['lat'], coords['lng'])

38.0328721 -84.5017179
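
The Geocoding response also carries a top-level `status` field: `'OK'` when results were found, while values such as `'ZERO_RESULTS'` or `'REQUEST_DENIED'` signal a problem. Checking it before looping avoids confusing empty output. A sketch with a made-up response dictionary (`extract_coords` is a hypothetical helper):

```python
def extract_coords(data):
    """Return (lat, lng) tuples from a parsed Geocoding response, or raise."""
    status = data.get("status")
    if status != "OK":
        raise RuntimeError(f"geocoding failed with status {status!r}: "
                           f"{data.get('error_message', '')}")
    return [(r["geometry"]["location"]["lat"], r["geometry"]["location"]["lng"])
            for r in data["results"]]


# made-up response with the same shape as the output above
sample = {"status": "OK",
          "results": [{"geometry": {"location": {"lat": 38.0328721, "lng": -84.5017179}}}]}
print(extract_coords(sample))  # [(38.0328721, -84.5017179)]
```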


### Exercises

1. Search for some other addresses or landmarks!
2. Take a look at the [API documentation](https://developers.google.com/maps/documentation/geocoding/intro). What are the usage limits?

# Part 3: Querying an API with authentication

### Twitter REST APIs

Twitter's APIs also operate over the web, but they require a back-and-forth authentication process at the beginning of each connection. It's easier to have a Python library handle this than to create the query URLs ourselves.

The Twitter "REST" APIs perform stand-alone operations: you submit a query and receive results, like in earlier examples. ([REST](https://en.wikipedia.org/wiki/Representational_state_transfer) is a particular set of guidelines that many APIs follow.) Twitter also has a "streaming" API that continues sending results in real time until you disconnect.

**API documentation:**  
https://dev.twitter.com/rest/public  
https://dev.twitter.com/overview/api/tweets

**Documentation for the Python helper library**:  
https://github.com/geduldig/TwitterAPI

In [17]:
from TwitterAPI import TwitterAPI

import pprint  # library for making Python data structures readable
pp = pprint.PrettyPrinter()

In [18]:
# import API credentials from keys.py file in the
# same directory as this notebook

from keys import *
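
The `from keys import *` line assumes a `keys.py` file that defines the four names used in the next cell. Here is a placeholder sketch of that file; the values are stand-ins, and real ones come from your Twitter developer app settings.

```python
# keys.py: placeholder values only; replace with your own credentials
# and keep this file out of version control (e.g. list it in .gitignore)
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
```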

In [19]:
# set up an API connection using credentials from the keys file

api = TwitterAPI(consumer_key, consumer_secret, 
                 access_token, access_token_secret)

print("Connection is set up but not tested")

Connection is set up but not tested


### Making a simple data request

In [20]:
# most recent tweet from @UKAthletics' timeline

endpoint = 'statuses/user_timeline'
params = {
    'screen_name': 'UKAthletics', 
    'count': 1
}
r = api.request(endpoint, params)

for tweet in r.get_iterator():
    print(tweet['text'])

RT @UKTix: Purchase or renew 2018 @UKFootball season tickets and receive great benefits and incentives! It's that easy!

BUY: https://t.co/…


In [21]:
# what other data is there?

pp.pprint(tweet)

{'contributors': None,
 'coordinates': None,
 'created_at': 'Thu Feb 22 15:22:19 +0000 2018',
 'entities': {'hashtags': [],
              'symbols': [],
              'urls': [],
              'user_mentions': [{'id': 933121999,
                                 'id_str': '933121999',
                                 'indices': [3, 9],
                                 'name': 'UK Ticket Office',
                                 'screen_name': 'UKTix'},
                                {'id': 360022514,
                                 'id_str': '360022514',
                                 'indices': [34, 45],
                                 'name': 'Kentucky Football',
                                 'screen_name': 'UKFootball'}]},
 'favorite_count': 0,
 'favorited': False,
 'geo': None,
 'id': 966694651326615552,
 'id_str': '966694651326615552',
 'in_reply_to_screen_name': None,
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': None,
 'in_rep
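
Twitter's `created_at` values, like `'Thu Feb 22 15:22:19 +0000 2018'` above, are plain strings. `datetime.strptime` can turn them into timezone-aware datetimes for sorting or plotting. The format string in this sketch is inferred from that sample value, and `parse_created_at` is a hypothetical helper.

```python
from datetime import datetime, timezone

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"  # inferred from the sample value


def parse_created_at(value: str) -> datetime:
    """Parse a Twitter 'created_at' string into an aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


ts = parse_created_at("Thu Feb 22 15:22:19 +0000 2018")
print(ts.isoformat())  # 2018-02-22T15:22:19+00:00
```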

### Other API endpoints allow different types of searches

In [22]:
# search for public tweets about #BBN

endpoint = 'search/tweets'
params = {
    'q': '#BBN', 
    'count': 5
}
r = api.request(endpoint, params)

for tweet in r.get_iterator():
    print(tweet['text'] + '\n')

The #UPAC, #AMF influence peddling has been mimicked by Fabrice Benoit at #Osler with his wife Nathaly Marcoux who… https://t.co/u9WRZ89U8F

@RCMingione @Coach_Mingione would love to keep this precious family in the #BBN.  If I know him (and I don't) I'll… https://t.co/UlZ4xjM0Th

Is it just me or do most SEC teams lose the game after the #Kentucky game? Damn son don’t metaphorically blow your wad #bbn

Uhm yes please. #BBN https://t.co/a9bWkVrpXU

RT @JEdwar247: 23 offers and counting. The #Kentucky Junior Day visitor list is LOADED [VIP] - https://t.co/ByYcDuZoMy #BBN



In [28]:
# search for public tweets in Hindi

endpoint = 'search/tweets'
params = {
    'q': '*', 
    'lang': 'hi', 
    'count': 5
} 
r = api.request(endpoint, params)

for tweet in r.get_iterator():
    print(tweet['text'] + '\n')

RT @HindiNews18: कर्नाटक में 'सीक्रेट मीटिंग', BJP की नईया पार लगाने आगे आए रेड्डी भाई!
https://t.co/NBgoLJdSck
@BJP4Karnataka @INCKarnataka

RT @ihirarafique: YE WALI TWEET YAAD AGAYE HAFEEZ KO DEKH KAR 😂😭

RT @AgrwalRani: समय समय पर #राजनीतिक #पार्टियों द्वारा #वादे बहुत किये गए पर कोई भी पार्टी पूरी तरह #खरी नही #उतरी अपने वादों पर #समझ तो गए…

Mazaq Mazaq mein 50 kar gaya banda 🔥

@madhukishwar @pbhushan1 Are Madam,sochiye jis aadmi ko Kejriwal jaise chor aur deshdrohi ne laat maarke nikal diya… https://t.co/t1WL2sXkdr



In [24]:
# search for public tweets geotagged near the UK campus

endpoint = 'search/tweets'
params = {
    'q': '*', 
    'geocode': '38.034,-84.500,0.5km', 
    'count': 10
} 
r = api.request(endpoint, params)

for tweet in r.get_iterator():
    print(tweet['text'] + '\n')

There’s still time to sign up for our last event of Raise the Dough tonight at 6pm! There is a… https://t.co/8aX10jJJ33

This our world u just livin in it @ Kappa Delta - Epsilon Omega Chapter https://t.co/29f6Q3gosu

Pop-up Research Rescue @UKLittleLibrary 6:30 - 9pm  @UKLibraries https://t.co/STS1MNpqwg

It’s like Christmas Eve anticipation in the studio!!  New Reformers arriving tomorrow!! 🎉💪🏻🥂… https://t.co/w0Lz2QWBMe

Lots of bright and creative minds in the University of Kentucky College of Communication and Information! 

We had… https://t.co/zvDv8lB7Mf

fuckinrealtyrone is such a punk...that’s why I kicked him in his… https://t.co/JCAOxxSHFI

We are set up and excited to talk to eager @uk_ci students at their departments Career Fair! https://t.co/i23kbxEFxZ
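
The `geocode` parameter packs latitude, longitude, and radius into one comma-separated string, with the radius suffixed by a unit (`km` here; the Twitter documentation also allows `mi`). A small hypothetical helper makes the pieces explicit:

```python
def twitter_geocode(lat: float, lon: float, radius: float, unit: str = "km") -> str:
    """Build a 'latitude,longitude,radius<unit>' string for search/tweets."""
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    return f"{lat},{lon},{radius}{unit}"


print(twitter_geocode(38.034, -84.500, 0.5))  # 38.034,-84.5,0.5km
```

The result drops the trailing zeros of `'38.034,-84.500,0.5km'` used above but describes the same circle.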



### Exercises

1. Try some different search queries!
2. Display some more data fields in addition to the tweet text
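
For the second exercise, here is one way to pull a few more fields from each tweet dictionary. The field names (`created_at`, `user`/`screen_name`, `retweet_count`, `favorite_count`, `entities`/`hashtags`) are those of the standard v1.1 tweet object; `.get` is used so a missing field doesn't raise. The `summarize` helper and the sample tweet are made up for illustration.

```python
def summarize(tweet: dict) -> str:
    """Return a one-line summary of a v1.1 tweet dictionary."""
    user = tweet.get("user", {}).get("screen_name", "?")
    tags = [h["text"] for h in tweet.get("entities", {}).get("hashtags", [])]
    return (f"{tweet.get('created_at', '?')} @{user} "
            f"RT:{tweet.get('retweet_count', 0)} "
            f"fav:{tweet.get('favorite_count', 0)} "
            f"tags:{','.join(tags) or '-'}")


# made-up tweet with only the fields the helper reads
sample = {"created_at": "Thu Feb 22 15:22:19 +0000 2018",
          "user": {"screen_name": "example_user"},
          "retweet_count": 3, "favorite_count": 0,
          "entities": {"hashtags": [{"text": "BBN", "indices": [0, 4]}]}}
print(summarize(sample))
```

In the notebook, `for tweet in r.get_iterator(): print(summarize(tweet))` would print one summary line per search result.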

In [29]:
# search for public tweets about climate change

endpoint = 'search/tweets'
params = {
    'q': 'climate change',  
    'count': 5
} 
r = api.request(endpoint, params)

for tweet in r.get_iterator():
    print(tweet['text'] + '\n')

RT @ProfStrachan: Very good read

#Nuclear power's dismal economics and very slow build rate makes it virtually a non starter when it comes…

RT @MousseauJim: The temperature goes up in New York for one Day and this proves Climate Change. 😂😂😂😂😂😂😂😂😂😂 You’ve got to be fucking kiddin…

RT @Julia_Panzer: Looking for an internship with purpose? Do you want to contribute to mitigating climate change or delivering on the SDGs?…

Seven climate change myths put about by big oil companies - CityMetric https://t.co/Vv0XruEcqW

Cities ramming through climate change policies despite limited tools and mounting costs for consumers - Financial P… https://t.co/BmWir5ew5c



### Bonus: Streaming live tweets in real time 

In [30]:
# Twitter limits simultaneous connections to the streaming API,
# so this part may not work using the demo API keys during class

endpoint = 'statuses/filter'
params = {'locations': '-180,-90,180,90'}
r = api.request(endpoint, params)
LIMIT = 20

# 'enumerate' lets us count tweets as we receive them

for i, tweet in enumerate(r.get_iterator()):
    print(tweet['created_at'])
    print(tweet['place']['full_name'] + ', ' + tweet['place']['country'])
    print(tweet['text'] + '\n')
    if i >= LIMIT - 1:  # stop once LIMIT tweets have been printed
        break

# close the streaming connection
r.close()

Thu Feb 22 18:39:46 +0000 2018
Rio de Janeiro, Brasil, Brasil
Pique sem pique

Thu Feb 22 18:39:46 +0000 2018
Barcelona, España, España
@TheHallOfStars @BTS_twt @_BTSEsp @BTS_Spain PARK JIMIN 

[ #iHeartAwards #BestFanArmy #BTSARMY ]   
[ #THOSFans @BTS_twt ]

Thu Feb 22 18:39:46 +0000 2018
North West, England, United Kingdom
From @VivaBananarama smallest fan to there biggest fan @hailthehairking driving 9 hours in his pimped up #SUV to… https://t.co/WkX0DcKqMr

Thu Feb 22 18:39:46 +0000 2018
Bodrum, Türkiye, Türkiye
I'm at Küdür in Muğla, Bodrum https://t.co/b5DdZYowTs

Thu Feb 22 18:39:46 +0000 2018
Elvan, Etimesgut, Türkiye
I'm at Kahve Aşkına in Ankara, Etimesgut https://t.co/4MRgp0J7Tr

Thu Feb 22 18:39:46 +0000 2018
Massachusetts, USA, United States
@fajarodgers @SherryTerm @benshapiro Like putty.

Thu Feb 22 18:39:46 +0000 2018
Los Angeles, CA, United States
Just Seattle things I don’t miss. https://t.co/I7CvuzSgqn

Thu Feb 22 18:39:46 +0000 2018
الرياض, المملكة العربية السعودية
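
The streaming `locations` parameter is a bounding box written as southwest longitude, southwest latitude, northeast longitude, northeast latitude, so `'-180,-90,180,90'` covers the whole globe. A hypothetical helper that builds the string and checks the corner order:

```python
def bbox(sw_lon: float, sw_lat: float, ne_lon: float, ne_lat: float) -> str:
    """Build a 'locations' bounding box string: SW corner first, then NE corner."""
    if not (-180 <= sw_lon < ne_lon <= 180 and -90 <= sw_lat < ne_lat <= 90):
        raise ValueError("expected SW corner below and left of NE corner")
    return ",".join(str(v) for v in (sw_lon, sw_lat, ne_lon, ne_lat))


print(bbox(-180, -90, 180, 90))        # -180,-90,180,90
print(bbox(-84.6, 37.9, -84.4, 38.1))  # a small box around Lexington, KY
```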

### Exercises for the remainder of class

1. Make a scatter plot of the lat-lon coordinates of earthquakes.

2. Using the geocoding example as a starting point, try searching the Google Maps Directions API or Elevation API instead. Descriptions are in the [API documentation](https://developers.google.com/maps/documentation/geocoding/intro).


### For next time...

In the next class, you will try out another API that provides data you're interested in. You will be asked to try connecting to it using Python code, and performing some basic operations on the data.  To come prepared for next time, please explore some of the transportation-related APIs that may be valuable, and choose one that is of interest to you.

Here are a few to get you started.

**Public Transit:**  
https://www.programmableweb.com/news/how-smart-cities-are-using-apis-public-transport-apis/2014/05/22

**Long-Distance Travel:**  
http://www.olery.com/blog/the-best-travel-apis-discover-contribute/

**Transportation:**  
https://www.programmableweb.com/category/transportation/api


Start by reading the public transit page, because that provides a nice overview of the types of applications out there, and some of the issues in using them.  These lessons often apply to traffic and transportation more generally.  

Keep in mind that many different organizations provide APIs, with different motivations and different quality of what they provide.  If it is a private company, what is their business model?  What is the underlying source of the data, and what might that imply about how representative it is of the real world?  There is a ton of stuff out there.  How do we go about sorting out what is useful to us and what is not?  Spend some time exploring these and thinking about these questions.  
