In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

In [2]:
import pandas as pd

In [3]:
import json
import requests

In [4]:
import urllib.request

# OMDb example
### use `urllib.request`

Let's start by looking at [OMDb API](https://www.omdbapi.com/).

The OMDb API is a free web service for obtaining movie information. All content and images on the site are contributed and maintained by users.

The Python package [urllib](https://docs.python.org/3/howto/urllib2.html) can be used to fetch resources from the internet.

OMDb tells us what kinds of requests we can make. We are going to do a title search. As you can see below, we have an additional parameter "&Season=1" which does not appear in the parameter tables. If you read through the change log, you will see it documented there. 

Using the urllib and json packages allows us to call an API and store the results locally.
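Before making the call, it can help to see how the request URL is put together. The sketch below is one way to build it from its parts with the standard library. The helper name `build_omdb_url` and the placeholder `'YOUR_KEY'` are made up for illustration; the parameter names `t`, `Season` and `apikey` are the ones used in the URLs in this notebook.

```python
from urllib.parse import urlencode, quote


def build_omdb_url(title, season, api_key, base_url='http://www.omdbapi.com/'):
    """Build a title-search URL (hypothetical helper, not part of OMDb)."""
    # quote_via=quote encodes spaces as %20, matching the URLs in this notebook
    query = urlencode({'t': title, 'Season': season, 'apikey': api_key},
                      quote_via=quote)
    return base_url + '?' + query


# 'YOUR_KEY' is a placeholder, not a working key
example_url = build_omdb_url('Game of Thrones', 1, 'YOUR_KEY')
print(example_url)
```

Building the URL this way keeps the key out of the string literal, so it can come from a variable read out of a key file.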

### key file

In [6]:
with open("config_secret.json") as key_file:
    api_keys = json.load(key_file)

api_keys

{'DPLA_api_key': '7a694bf0cb053c8367b48751e511cc70',
 'OMDb_api_key': '2385c828'}

In [8]:
my_key = api_keys['OMDb_api_key']
my_key
len(my_key)
type(my_key)
#int(my_key)

'2385c828'

8

str

### get data and make json

In [None]:
# original
data = json.loads(urllib.request.urlopen('http://www.omdbapi.com/?t=Game%20of%20Thrones&Season=1').read().decode('utf8'))

In [9]:
data = json.loads(urllib.request.urlopen('http://www.omdbapi.com/?t=Game%20of%20Thrones&Season=1&apikey=' + my_key).read().decode('utf8'))

What should we expect the type to be for the variable data?

In [10]:
print(type(data))

<class 'dict'>

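To see why the result is a dictionary, here is a small sketch that runs `json.loads` on a hand-written string instead of a live response. The string is a shortened stand-in using values copied from this notebook's output, with the episode list cut down to one entry.

```python
import json

# shortened stand-in for the text the API sends back
raw_text = '''{"Title": "Game of Thrones", "Season": "1", "totalSeasons": "8",
 "Episodes": [{"Title": "Winter Is Coming", "Episode": "1", "imdbRating": "9.1"}]}'''

parsed = json.loads(raw_text)

print(type(parsed))                  # a JSON object becomes a dict
print(type(parsed['Episodes']))      # a JSON array becomes a list
print(type(parsed['totalSeasons']))  # a quoted number stays a str
```

Note that values such as the season count arrive as strings, so they need converting before any arithmetic.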

### explore data

What do you think the data will look like?

In [11]:
data.keys()

dict_keys(['Title', 'Season', 'totalSeasons', 'Episodes', 'Response'])

In [12]:
data

{'Title': 'Game of Thrones',
 'Season': '1',
 'totalSeasons': '8',
 'Episodes': [{'Title': 'Winter Is Coming',
   'Released': '2011-04-17',
   'Episode': '1',
   'imdbRating': '9.1',
   'imdbID': 'tt1480055'},
  {'Title': 'The Kingsroad',
   'Released': '2011-04-24',
   'Episode': '2',
   'imdbRating': '8.8',
   'imdbID': 'tt1668746'},
  {'Title': 'Lord Snow',
   'Released': '2011-05-01',
   'Episode': '3',
   'imdbRating': '8.7',
   'imdbID': 'tt1829962'},
  {'Title': 'Cripples, Bastards, and Broken Things',
   'Released': '2011-05-08',
   'Episode': '4',
   'imdbRating': '8.8',
   'imdbID': 'tt1829963'},
  {'Title': 'The Wolf and the Lion',
   'Released': '2011-05-15',
   'Episode': '5',
   'imdbRating': '9.1',
   'imdbID': 'tt1829964'},
  {'Title': 'A Golden Crown',
   'Released': '2011-05-22',
   'Episode': '6',
   'imdbRating': '9.2',
   'imdbID': 'tt1837862'},
  {'Title': 'You Win or You Die',
   'Released': '2011-05-29',
   'Episode': '7',
   'imdbRating': '9.2',
   'imdbID': 't

In [13]:
data['Episodes']

[{'Title': 'Winter Is Coming',
  'Released': '2011-04-17',
  'Episode': '1',
  'imdbRating': '9.1',
  'imdbID': 'tt1480055'},
 {'Title': 'The Kingsroad',
  'Released': '2011-04-24',
  'Episode': '2',
  'imdbRating': '8.8',
  'imdbID': 'tt1668746'},
 {'Title': 'Lord Snow',
  'Released': '2011-05-01',
  'Episode': '3',
  'imdbRating': '8.7',
  'imdbID': 'tt1829962'},
 {'Title': 'Cripples, Bastards, and Broken Things',
  'Released': '2011-05-08',
  'Episode': '4',
  'imdbRating': '8.8',
  'imdbID': 'tt1829963'},
 {'Title': 'The Wolf and the Lion',
  'Released': '2011-05-15',
  'Episode': '5',
  'imdbRating': '9.1',
  'imdbID': 'tt1829964'},
 {'Title': 'A Golden Crown',
  'Released': '2011-05-22',
  'Episode': '6',
  'imdbRating': '9.2',
  'imdbID': 'tt1837862'},
 {'Title': 'You Win or You Die',
  'Released': '2011-05-29',
  'Episode': '7',
  'imdbRating': '9.2',
  'imdbID': 'tt1837863'},
 {'Title': 'The Pointy End',
  'Released': '2011-06-05',
  'Episode': '8',
  'imdbRating': '9.0',
  'i

We now have a dictionary object of our data. We can use Python to manipulate it in a variety of ways. For example, we can print the title and IMDb rating of each episode.

In [14]:
for episode in data['Episodes']:
  print(episode['Title'], episode['imdbRating'])

Winter Is Coming 9.1
The Kingsroad 8.8
Lord Snow 8.7
Cripples, Bastards, and Broken Things 8.8
The Wolf and the Lion 9.1
A Golden Crown 9.2
You Win or You Die 9.2
The Pointy End 9.0
Baelor 9.6
Fire and Blood 9.5
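Because the ratings come back as strings, they need to be converted before we can compare or average them. The sketch below uses a hand-copied subset of the episodes above rather than the full `data['Episodes']` list, so it runs without calling the API.

```python
# a few episodes copied from the output above
episodes = [
    {'Title': 'Winter Is Coming', 'imdbRating': '9.1'},
    {'Title': 'Lord Snow', 'imdbRating': '8.7'},
    {'Title': 'Baelor', 'imdbRating': '9.6'},
    {'Title': 'Fire and Blood', 'imdbRating': '9.5'},
]

# convert the rating strings to floats before doing arithmetic
ratings = [float(ep['imdbRating']) for ep in episodes]
average_rating = sum(ratings) / len(ratings)

# pick the episode with the highest numeric rating
best = max(episodes, key=lambda ep: float(ep['imdbRating']))

print(best['Title'], best['imdbRating'])
print(round(average_rating, 2))
```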


### make DataFrame
We can use pandas to convert the episode information to a dataframe.

In [15]:
df = pd.DataFrame.from_dict(data['Episodes'])

In [16]:
df

Unnamed: 0,Title,Released,Episode,imdbRating,imdbID
0,Winter Is Coming,2011-04-17,1,9.1,tt1480055
1,The Kingsroad,2011-04-24,2,8.8,tt1668746
2,Lord Snow,2011-05-01,3,8.7,tt1829962
3,"Cripples, Bastards, and Broken Things",2011-05-08,4,8.8,tt1829963
4,The Wolf and the Lion,2011-05-15,5,9.1,tt1829964
5,A Golden Crown,2011-05-22,6,9.2,tt1837862
6,You Win or You Die,2011-05-29,7,9.2,tt1837863
7,The Pointy End,2011-06-05,8,9.0,tt1837864
8,Baelor,2011-06-12,9,9.6,tt1851398
9,Fire and Blood,2011-06-19,10,9.5,tt1851397


### write to json file
And, we can save our data locally to use later.

In [None]:
with open('omdb_api_data.json', 'w') as f:
    json.dump(data, f)
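Reading the file back is the mirror image of writing it. This sketch writes a small sample dictionary to a temporary folder and loads it again to confirm nothing is lost; the sample values are copied from the output above, and the temporary folder is used only so the example leaves no files behind.

```python
import json
import os
import tempfile

sample = {'Title': 'Game of Thrones', 'Season': '1', 'totalSeasons': '8'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'omdb_api_data.json')

    # write the dictionary out as JSON
    with open(path, 'w') as f:
        json.dump(sample, f)

    # read it back into a new dictionary
    with open(path) as f:
        reloaded = json.load(f)

print(reloaded == sample)
```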

# DPLA example
### use `requests`

Let's try an API that requires an API key!

"The [Digital Public Library of America](https://dp.la/) brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world. It strives to contain the full breadth of human expression, from the written word, to works of art and culture, to records of America’s heritage, to the efforts and data of science."

And, they have an [API](https://dp.la/info/developers/codex/api-basics/).

In order to use the API, you need to [request a key](https://dp.la/info/developers/codex/policies/#get-a-key). You can do this with an HTTP POST request.


If you are using **OS X or Linux**, replace "YOUR_EMAIL@example.com" in the cell below with your email address and execute the cell. This will send the request to DPLA, and they will email your API key to the address you provided. To successfully query the API, you must include the ?api_key= parameter followed by the 32-character hash.

In [None]:
# execute this on OS X or Linux by removing '#' on the next line and executing the cell
#! curl -v -XPOST http://api.dp.la/v2/api_key/YOUR_EMAIL@example.com

If you are on **Windows 7 or 10**, [open PowerShell](http://www.tenforums.com/tutorials/25581-windows-powershell-open-windows-10-a.html). Replace "YOUR_EMAIL@example.com" in the cell below with your email address. Copy the code and paste it at the command prompt in PowerShell. This will send the request to DPLA, and they will email your API key to the address you provided. To successfully query the API, you must include the ?api_key= parameter followed by the 32-character hash.

In [None]:
#execute this on Windows by running the line below, without the leading '#', in PowerShell
#Invoke-WebRequest -Uri ("http://api.dp.la/v2/api_key/YOUR_EMAIL@example.com") -Method POST -Verbose -usebasicparsing

You will get a response similar to what is shown below and will receive an email fairly quickly from DPLA with your key.

    shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
    *   Trying 52.2.169.251...
    * Connected to api.dp.la (52.2.169.251) port 80 (#0)
    > POST /v2/api_key/YOUR_EMAIL@example.com HTTP/1.1
    > Host: api.dp.la
    > User-Agent: curl/7.43.0
    > Accept: */*
    > 
    < HTTP/1.1 201 Created
    < Access-Control-Allow-Origin: *
    < Cache-Control: max-age=0, private, must-revalidate
    < Content-Type: application/json; charset=utf-8
    < Date: Thu, 20 Oct 2016 20:53:24 GMT
    < ETag: "8b66d9fe7ded79e3151d5a22f0580d99"
    < Server: nginx/1.1.19
    < Status: 201 Created
    < X-Request-Id: d61618751a376452ac3540b3157dcf48
    < X-Runtime: 0.179920
    < X-UA-Compatible: IE=Edge,chrome=1
    < Content-Length: 89
    < Connection: keep-alive
    < 
    * Connection #0 to host api.dp.la left intact
    {"message":"API key created and sent via email. Be sure to check your Spam folder, too."}

It is good practice not to put your keys in your code. You can store them in a file and read them in from there. If you are pushing your code to GitHub, make sure you put your key files in .gitignore.

I created a file on my drive called "config_secret.json". The contents of the file look like this:

{
	"DPLA_api_key" : "my DPLA api key here",
	"OMDb_api_key" : "my OMDb api key here"
}

I can then write code to read the information in.

A template called config_secret_template.json has been provided for you to add your keys to.

In [17]:
with open("config_secret.json") as key_file:
    key = json.load(key_file)

In [18]:
key

{'DPLA_api_key': '7a694bf0cb053c8367b48751e511cc70',
 'OMDb_api_key': '2385c828'}
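If the key file is missing or a key name is misspelled, the error you get can be hard to read. The sketch below wraps the loading step in a small helper that fails with a clearer message. The function name `load_api_key` is made up for this example, and the demonstration uses a throwaway file with a fake key value.

```python
import json
import os
import tempfile


def load_api_key(name, path='config_secret.json'):
    """Return one key from the JSON key file (hypothetical helper)."""
    if not os.path.exists(path):
        raise FileNotFoundError(
            f'{path} not found; copy config_secret_template.json and add your keys')
    with open(path) as key_file:
        keys = json.load(key_file)
    if name not in keys:
        raise KeyError(f'{name} is not in {path}')
    return keys[name]


# demonstration with a throwaway file and a made-up key value
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'config_secret.json')
    with open(demo_path, 'w') as f:
        json.dump({'DPLA_api_key': 'not-a-real-key'}, f)
    demo_key = load_api_key('DPLA_api_key', demo_path)

print(demo_key)
```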

### perform query

Then, when I create my API query, I can use a variable in place of my actual key.

The Requests library allows us to build URLs with different parameters. You build the parameters as a dictionary that contains key/value pairs for everything after the '?' in your URL.

In [None]:
#import requests

In [19]:
# we are specifying our url and parameters here as variables
#url = 'http://api.dp.la/v2/items/'
url = 'http://api.dp.la/v2/items'
params = {'api_key' : key['DPLA_api_key'], 'q' : 'goats+AND+cats'}

In [20]:
# we are creating a response object, r
r = requests.get(url, params=params)

In [21]:
type(r)

requests.models.Response

In [22]:
# we can look at the url that was created by requests with our specified variables
r.url

'http://api.dp.la/v2/items?api_key=7a694bf0cb053c8367b48751e511cc70&q=goats%2BAND%2Bcats'
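Notice the `%2B` in the URL above. That is the percent-encoded form of a literal `+`, so the plus signs we typed were sent as plus characters rather than as spaces. The sketch below uses the standard library's `urlencode`, which applies the same kind of encoding to a parameter dictionary, to compare the two spellings. Whether the DPLA search treats both forms the same way is not something this notebook checks.

```python
from urllib.parse import urlencode

# a literal '+' in the value is escaped as %2B
with_plus = urlencode({'q': 'goats+AND+cats'})

# a space in the value is encoded as '+'
with_spaces = urlencode({'q': 'goats AND cats'})

print(with_plus)    # q=goats%2BAND%2Bcats
print(with_spaces)  # q=goats+AND+cats
```

If the intent is to send spaces between the search terms, writing the value with spaces and letting the library encode it is the safer habit.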

In [23]:
# we can check the status code of our request
r.status_code

200

[HTTP Status Codes](http://www.restapitutorial.com/httpstatuscodes.html)
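Python's standard library also knows the names of the status codes, which is handy when logging results. This is a minimal sketch; the helper names `describe_status` and `is_success` are made up for illustration.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the code with its standard reason phrase."""
    return f'{code} {HTTPStatus(code).phrase}'


def is_success(code):
    """Codes in the 2xx range mean the request succeeded."""
    return 200 <= code < 300


for code in (200, 201, 404):
    print(describe_status(code), is_success(code))
```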

In [24]:
# we can look at the content of the response
print(r.content)

b'{"count":30,"start":1,"limit":10,"docs":[{"id":"1e3a1a91d96428e4ca063b2386e60b7b","@context":"http://dp.la/api/items/context","@id":"http://dp.la/api/items/1e3a1a91d96428e4ca063b2386e60b7b","aggregatedCHO":"#sourceResource","dataProvider":"Bradley University","ingestDate":"2020-10-05T18:52:09.191Z","ingestType":"item","isShownAt":"http://collections.carli.illinois.edu/cdm/ref/collection/bra_jack/id/630","object":"http://collections.carli.illinois.edu/utils/getthumbnail/collection/bra_jack/id/630","originalRecord":{"stringValue":"<record \\nxmlns=\\"http://www.openarchives.org/OAI/2.0/\\" xmlns:xsi=\\"http://www.w3.org/2001/XMLSchema-instance\\">\\n  <header>\\n    <identifier>\\n      urn:dpla-repox.carli.illinois.edu:carli_bra_jack:oai:collections.carli.illinois.edu:bra_jack/630\\n    </identifier>\\n    <datestamp>2020-10-04</datestamp>\\n    <setSpec>carli_bra_jack</setSpec>\\n  </header>\\n  <metadata>\\n    <oai_qdc:qualifieddc \\n    xsi:schemaLocation=\\"http://worldcat.org/xm

### look at some data items

In [25]:
my_results = json.loads(r.content)
my_results

{'count': 30,
 'start': 1,
 'limit': 10,
 'docs': [{'id': '1e3a1a91d96428e4ca063b2386e60b7b',
   '@context': 'http://dp.la/api/items/context',
   '@id': 'http://dp.la/api/items/1e3a1a91d96428e4ca063b2386e60b7b',
   'aggregatedCHO': '#sourceResource',
   'dataProvider': 'Bradley University',
   'ingestDate': '2020-10-05T18:52:09.191Z',
   'ingestType': 'item',
   'isShownAt': 'http://collections.carli.illinois.edu/cdm/ref/collection/bra_jack/id/630',
   'object': 'http://collections.carli.illinois.edu/utils/getthumbnail/collection/bra_jack/id/630',
   'originalRecord': {'stringValue': '<record \nxmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n  <header>\n    <identifier>\n      urn:dpla-repox.carli.illinois.edu:carli_bra_jack:oai:collections.carli.illinois.edu:bra_jack/630\n    </identifier>\n    <datestamp>2020-10-04</datestamp>\n    <setSpec>carli_bra_jack</setSpec>\n  </header>\n  <metadata>\n    <oai_qdc:qualifieddc \n    xsi:sche

In [26]:
my_results['count']
my_results['start']
my_results['limit']

30

1

10

In [27]:
my_results['docs'][0]['id']
my_results['docs'][0]['object']
my_results['docs'][0]['provider']['name']

'1e3a1a91d96428e4ca063b2386e60b7b'

'http://collections.carli.illinois.edu/utils/getthumbnail/collection/bra_jack/id/630'

'Illinois Digital Heritage Hub'

### get more than default number of items

By default, DPLA returns 10 items at a time. We can see from the count value that our query has 30 results. DPLA gives us a parameter, page_size, that we can set to get up to 500 items at a time.



In [28]:
params = {'api_key' : key['DPLA_api_key'], 'q' : 'goats+AND+cats', 'page_size': 500}
r = requests.get(url, params=params)
#print(r.content)
my_results_large = json.loads(r.content)
my_results_large

{'count': 30,
 'start': 1,
 'limit': 500,
 'docs': [{'id': '1e3a1a91d96428e4ca063b2386e60b7b',
   '@context': 'http://dp.la/api/items/context',
   '@id': 'http://dp.la/api/items/1e3a1a91d96428e4ca063b2386e60b7b',
   'aggregatedCHO': '#sourceResource',
   'dataProvider': 'Bradley University',
   'ingestDate': '2020-10-05T18:52:09.191Z',
   'ingestType': 'item',
   'isShownAt': 'http://collections.carli.illinois.edu/cdm/ref/collection/bra_jack/id/630',
   'object': 'http://collections.carli.illinois.edu/utils/getthumbnail/collection/bra_jack/id/630',
   'originalRecord': {'stringValue': '<record \nxmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">\n  <header>\n    <identifier>\n      urn:dpla-repox.carli.illinois.edu:carli_bra_jack:oai:collections.carli.illinois.edu:bra_jack/630\n    </identifier>\n    <datestamp>2020-10-04</datestamp>\n    <setSpec>carli_bra_jack</setSpec>\n  </header>\n  <metadata>\n    <oai_qdc:qualifieddc \n    xsi:sch

In [29]:
my_results_large['docs'][27]['id']
my_results_large['docs'][27]['object']
my_results_large['docs'][27]['provider']['name']

'6ffb448af5c545f556afe5b627b8a665'

'http://dlg.galileo.usg.edu/do:ugabma_bmahm_hm-euban-0001'

'Digital Library of Georgia'
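Indexing with `['provider']['name']` works as long as every record has those fields. As a hedged sketch, assuming some records might be missing a field, `dict.get` lets us pull the same three values without raising a `KeyError`. The first sample record is copied from the output above; the second is made up to show the missing-field case.

```python
def summarize_doc(doc):
    """Pick out id, object and provider name, tolerating missing fields."""
    provider = doc.get('provider', {})
    return {
        'id': doc.get('id'),
        'object': doc.get('object'),
        'provider': provider.get('name'),
    }


sample_docs = [
    {'id': '1e3a1a91d96428e4ca063b2386e60b7b',
     'object': 'http://collections.carli.illinois.edu/utils/getthumbnail/collection/bra_jack/id/630',
     'provider': {'name': 'Illinois Digital Heritage Hub'}},
    {'id': 'made-up-id-without-object'},  # hypothetical record with missing fields
]

summaries = [summarize_doc(doc) for doc in sample_docs]
for summary in summaries:
    print(summary)
```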

# SeeClickFix example
### looping to pull multiple pages

If we were working with an API that limited us to only 100 items at a time, we could write a loop to pull our data.

The file [seeclickfix_api.py](./seeclickfix_api.py) in the api folder of this repo is an example of how you can pull multiple pages of data from an API. It uses the [SeeClickFix API](http://dev.seeclickfix.com/). "[SeeClickFix](https://seeclickfix.com/) allows you to play an integral role in public services — routing neighborhood concerns like potholes and light outages to the right official with the right information."

In [30]:
# %load seeclickfix_api.py
'''
Pull one page of 100 results from seeclickfix using the global PARAMS
value if the parameters are not supplied. If there are more than 100
results, make another pull passing parameters that include the next page to
be pulled.

Nicole Donnelly 30May2016, updated 10May2017
'''

import requests
import json

'\nPull one page of 100 results from seeclickfix using the global PARAMS\nvalue if the parameters are not supplied. If there are more than 100\nresults, make another pull passing parameters that include the next page to\nbe pulled.\n\nNicole Donnelly 30May2016, updated 10May2017\n'

In [43]:
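def get_seeclickfix(page=1, pulled=0, search_params=None):

    # build the default parameters inside the function. A dict used as a
    # default argument is shared between calls, so the 'page' value added
    # further down would carry over into the next call.
    if search_params is None:
        search_params = {'place_url': 'district-of-columbia',
                         'after': '2019-10-01T00:00:00Z',
                         'per_page': 100}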

    # base_url for seeclickfix api to build the request url
    base_url = 'https://seeclickfix.com/api/v2/issues'

    # send a get request with the url and parameters
    myResponse = requests.get(url=base_url, params=search_params)

    # For successful API call, response code will be 200 (OK)
    if myResponse.ok:

        # Loading the response data into a dict variable
        data = json.loads(myResponse.content.decode('utf-8'))
        print('page just read ', data['metadata']['pagination']['page'])

        # get the total search result count and set it to count_all. the
        # API only allows 100 results per page
        count_all = data['metadata']['pagination']['entries']

        # track the number of items we have pulled with our requests
        pulled = pulled + 100

        # create a file name that reflects which page of results it contains
        # and write that file
        #file_name = 'tutorial_output/seeclickfix%d.json' % page
        #with open(file_name, 'w') as outfile:
            #json.dump(data, outfile)

        # check to see if we pulled all the results. If not, increment the
        # page count, update the parameters dictionary to include the page
        # number, and run the process again.
        if pulled < count_all:
            page += 1
            page_param = {'page': page}
            search_params.update(page_param)
            #you can print the params to monitor progress
            print('next search:')
            print(search_params)
            get_seeclickfix(page, pulled, search_params)

    else:
        # If response code is not ok (200), print the resulting http error
        # code with description
        myResponse.raise_for_status()
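The same paging logic can also be written as a plain loop instead of a function that calls itself. The sketch below is one way to do that, not the approach used in seeclickfix_api.py. To keep it runnable without a network connection, the request is passed in as `fetch_page`, a hypothetical stand-in for the `requests.get` plus `json.loads` step, and `fake_fetch` imitates an API with 250 results. It relies on the `pages` value that the API reports under `metadata` and `pagination`.

```python
def pull_all_pages(fetch_page, search_params):
    """Collect the issues from every page using a loop."""
    params = dict(search_params)  # copy so the caller's dict is not changed
    issues = []
    page = 1
    while True:
        params['page'] = page
        data = fetch_page(params)
        issues.extend(data['issues'])
        # stop once the last page reported by the API has been read
        if page >= data['metadata']['pagination']['pages']:
            break
        page += 1
    return issues


def fake_fetch(params):
    """Made-up stand-in for the API call: 250 results, 100 per page."""
    per_page = params['per_page']
    start = (params['page'] - 1) * per_page
    stop = min(start + per_page, 250)
    return {
        'issues': [{'id': i} for i in range(start, stop)],
        'metadata': {'pagination': {'page': params['page'], 'pages': 3}},
    }


all_issues = pull_all_pages(fake_fetch, {'per_page': 100})
print(len(all_issues))
```

A loop avoids Python's recursion limit when a query spans a very large number of pages, and returning the list makes the results available to the rest of the notebook.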

### work through individual steps
##### get data and make json

In [31]:
base_url = 'https://seeclickfix.com/api/v2/issues'
search_params={'place_url': 'district-of-columbia', 'after': '2016-10-01T00:00:00Z', 'per_page': 100}
myResponse = requests.get(url=base_url, params=search_params)
data = json.loads(myResponse.content.decode('utf-8'))

In [32]:
data.keys()

dict_keys(['issues', 'metadata', 'errors'])

In [33]:
data['errors']

{}

In [34]:
data['metadata']

{'pagination': {'entries': 5486,
  'page': 1,
  'per_page': 100,
  'pages': 55,
  'next_page': 2,
  'next_page_url': 'https://seeclickfix.com/api/v2/issues?after=2016-10-01T00%3A00%3A00Z&page=2&per_page=100&place_url=district-of-columbia',
  'previous_page': None,
  'previous_page_url': None}}
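The `pages` value follows directly from `entries` and `per_page`: divide and round up. A quick check of that arithmetic, with `pages_needed` as a made-up helper name:

```python
import math


def pages_needed(entries, per_page):
    """Number of requests needed to read every entry."""
    return math.ceil(entries / per_page)


print(pages_needed(5486, 100))  # 55, matching 'pages' in the metadata above
```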

In [35]:
data

{'issues': [{'id': 9123318,
   'status': 'Open',
   'summary': 'Bulk trash not picked up from December 2, which was original date scheduled. ',
   'description': 'Dresser has been sitting for a month since scheduled pickup ',
   'rating': 1,
   'lat': 38.8527805,
   'lng': -76.965927,
   'address': '3017 30th St Se Washington, DC 20020, USA',
   'created_at': '2020-12-30T11:41:02-05:00',
   'acknowledged_at': None,
   'closed_at': None,
   'reopened_at': None,
   'updated_at': '2020-12-30T11:41:24-05:00',
   'shortened_url': None,
   'url': 'https://seeclickfix.com/api/v2/issues/9123318',
   'point': {'type': 'Point', 'coordinates': [-76.965927, 38.8527805]},
   'private_visibility': False,
   'html_url': 'https://seeclickfix.com/issues/9123318',
   'request_type': {},
   'comment_url': 'https://seeclickfix.com/api/v2/issues/9123318/comments',
   'flag_url': 'https://seeclickfix.com/api/v2/issues/9123318/flag',
   'transitions': {'close_url': 'https://seeclickfix.com/api/v2/issues/9123

In [36]:
count_all = data['metadata']['pagination']['entries']
count_all

5486

In [37]:
data['metadata']['pagination']['page']

1

##### update search parameters

In [38]:
search_params
search_params.update({'page' : 2})
search_params

{'place_url': 'district-of-columbia',
 'after': '2016-10-01T00:00:00Z',
 'per_page': 100}

{'place_url': 'district-of-columbia',
 'after': '2016-10-01T00:00:00Z',
 'per_page': 100,
 'page': 2}
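Keep in mind that `update` changes the dictionary in place, which is why the second printout above includes `page`. If you want to keep the original parameters untouched, one option is to build a new dictionary instead. The names `base_params` and `page_two_params` are made up for this sketch.

```python
base_params = {'place_url': 'district-of-columbia',
               'after': '2016-10-01T00:00:00Z',
               'per_page': 100}

# unpack the original into a new dict and add the page number
page_two_params = {**base_params, 'page': 2}

print('page' in base_params)     # False, the original is unchanged
print(page_two_params['page'])   # 2
```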

### run it

In [44]:
get_seeclickfix()

page just read  1
next search:
{'place_url': 'district-of-columbia', 'after': '2019-10-01T00:00:00Z', 'per_page': 100, 'page': 2}
page just read  2
next search:
{'place_url': 'district-of-columbia', 'after': '2019-10-01T00:00:00Z', 'per_page': 100, 'page': 3}
page just read  3
next search:
{'place_url': 'district-of-columbia', 'after': '2019-10-01T00:00:00Z', 'per_page': 100, 'page': 4}
page just read  4
next search:
{'place_url': 'district-of-columbia', 'after': '2019-10-01T00:00:00Z', 'per_page': 100, 'page': 5}
page just read  5
next search:
{'place_url': 'district-of-columbia', 'after': '2019-10-01T00:00:00Z', 'per_page': 100, 'page': 6}
page just read  6
next search:
{'place_url': 'district-of-columbia', 'after': '2019-10-01T00:00:00Z', 'per_page': 100, 'page': 7}
page just read  7
next search:
{'place_url': 'district-of-columbia', 'after': '2019-10-01T00:00:00Z', 'per_page': 100, 'page': 8}
page just read  8
next search:
{'place_url': 'district-of-columbia', 'after': '2019-10-01T