# Working with APIs in Python

Making API requests in Python can be really simple. The standard library includes a lower-level module, `urllib`, that can make the same kinds of web requests, but it's not as friendly as the `requests` module, which we'll be using.

In [2]:
import requests

## Authentication

You'll have to authenticate each request to the Harvard Art Museums (HAM) API with an API key. Other APIs may require different kinds of authentication (sometimes very complicated auth! Look for libraries at that point), but HAM has some pretty simple authentication, which makes things easy for us. You can sign up for a key [here](https://www.harvardartmuseums.org/collections/api).

In [3]:
APIKEY = "b0cde630-ce66-11e8-951c-b3d75228cc98" # Enter your API key here

## Basic request

We're going to start off with a basic request to the API. This API, like many others, has a variety of endpoints, each with its own URL, slightly modified from a base URL. We'll worry about the general case in a bit. For now, let's look at a basic API request.

In this example, we'll re-create the first example in the [Object endpoint documentation](https://github.com/harvardartmuseums/api-docs/blob/master/object.md), which will give each of you the records for 10 objects that have never been viewed online in the museum's collections.

In [4]:
url = "https://api.harvardartmuseums.org/object"
parameters = {
    "q":"totalpageviews:0",
    "size":10,
    "apikey":APIKEY
}
R = requests.get(url,params=parameters)
R.json()

{'info': {'next': 'https://api.harvardartmuseums.org/object?q=totalpageviews%3A0&size=10&apikey=b0cde630-ce66-11e8-951c-b3d75228cc98&page=2',
  'page': 1,
  'pages': 5622,
  'totalrecords': 56216,
  'totalrecordsperquery': 10},
 'records': [{'accessionmethod': 'Transfer',
   'accessionyear': 2011,
   'accesslevel': 1,
   'century': '20th century',
   'classification': 'Photographs',
   'classificationid': 17,
   'colorcount': 0,
   'commentary': None,
   'contact': 'am_moderncontemporary@harvard.edu',
   'contextualtextcount': 0,
   'copyright': None,
   'creditline': 'Harvard Art Museums/Fogg Museum, Transfer from the Carpenter Center for the Visual Arts, American Professional Photographers Collection',
   'culture': 'American',
   'datebegin': 1940,
   'dated': 'c. 1945',
   'dateend': 1950,
   'dateoffirstpageview': None,
   'dateoflastpageview': None,
   'department': 'Department of Photographs',
   'description': None,
   'dimensions': 'image: 10.16 x 12.7 cm (4 x 5 in.)',
   'div

### Refresher on Dictionaries

Python dictionaries are sets of key / value pairs, where a value can be accessed by its key. You're essentially naming a value in a container, so you can easily call it up later.

Dictionaries have very fast lookups, so you can get a value from its key very quickly, no matter how large the dictionary is. Since Python 3.7, dictionaries keep their key / value pairs in insertion order when you iterate through them. In older versions of Python there was no guarantee about the order, so don't rely on it if your code needs to run there.

We're just going to be looking up data in dictionaries, so here's a quick refresher on the syntax:

In [5]:
parameters['q']

'totalpageviews:0'

In [6]:
parameters['apikey'] # This also works when we've set the value to another variable

'b0cde630-ce66-11e8-951c-b3d75228cc98'

In [7]:
parameters['q'] = "totalpageviews:1" # You can also set the value of a key like you would a variable
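
A few other dictionary operations tend to come up when working with API responses. The sketch below uses a small stand-in dictionary (`example_parameters` is a made-up name, not part of the notebook) so it runs without an API key:

```python
# A small stand-in for the `parameters` dictionary used above
example_parameters = {"q": "totalpageviews:0", "size": 10}

# `in` checks whether a key exists
has_sort = "sort" in example_parameters

# `.get()` returns a fallback instead of raising a KeyError when the key is missing
sort_value = example_parameters.get("sort", "not set")

# `.items()` walks through the key / value pairs
pairs = ["{}={}".format(key, value) for key, value in example_parameters.items()]

print(has_sort)
print(sort_value)
print(pairs)
```

`.get()` is handy with API records, where a field you expect may be missing from some of the results.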

## Making a Request

The request syntax is so simple, you might have missed it. Let's query again for objects with only one pageview, and take a closer look.

In [8]:
R = requests.get(url,params=parameters)

### Formatted parameters

That request has created a response object, which contains not only the data that we get from the Harvard Art Museums, but also information on the request we sent, like the URL that it used. Notice that `requests` has turned our parameter dictionary into a query string at the end of our URL.

If you've been working with API requests or web scraping before, you might be used to seeing URLs get constructed like this:

```python
url = "https://api.harvardartmuseums.org/object?q=" + query + "&apikey=" + apikey
```

If you have, I'm sure you'll appreciate how much simpler this is, especially when dealing with more query parameters.

In [9]:
R.url

'https://api.harvardartmuseums.org/object?q=totalpageviews%3A1&size=10&apikey=b0cde630-ce66-11e8-951c-b3d75228cc98'
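
If you're curious how a dictionary becomes that query string, here is a rough sketch using the standard library's `urllib.parse.urlencode`. This is only an illustration of the idea, not a claim about how `requests` does it internally, and `"YOUR-API-KEY"` is a placeholder:

```python
from urllib.parse import urlencode

base_url = "https://api.harvardartmuseums.org/object"
example_parameters = {"q": "totalpageviews:1", "size": 10, "apikey": "YOUR-API-KEY"}

# urlencode joins the pairs with "&" and percent-encodes special characters,
# which is why the ":" in the query shows up as "%3A"
query_string = urlencode(example_parameters)
full_url = base_url + "?" + query_string

print(full_url)
```

The encoding step is the part that is easy to get wrong when gluing strings together by hand.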

### Taking a look at the results

Response objects have a built-in method, `.json()`, which parses the JSON text received in the response body into native Python data structures, like lists, dictionaries, numbers and strings. We can use this method to see a dictionary representation of what we've gotten from the API request.

In [10]:
R.json()

{'info': {'next': 'https://api.harvardartmuseums.org/object?q=totalpageviews%3A1&size=10&apikey=b0cde630-ce66-11e8-951c-b3d75228cc98&page=2',
  'page': 1,
  'pages': 2396,
  'totalrecords': 23954,
  'totalrecordsperquery': 10},
 'records': [{'accessionmethod': 'Transfer',
   'accessionyear': 2011,
   'accesslevel': 1,
   'century': '20th century',
   'classification': 'Photographs',
   'classificationid': 17,
   'colorcount': 0,
   'commentary': None,
   'contact': 'am_moderncontemporary@harvard.edu',
   'contextualtextcount': 0,
   'copyright': None,
   'creditline': 'Harvard Art Museums/Fogg Museum, Transfer from the Carpenter Center for the Visual Arts, American Professional Photographers Collection',
   'culture': 'American',
   'datebegin': 1945,
   'dated': 'c. 1950',
   'dateend': 1955,
   'dateoffirstpageview': '2012-08-24',
   'dateoflastpageview': '2012-08-24',
   'department': 'Department of Photographs',
   'description': None,
   'dimensions': 'image: 10.16 x 12.7 cm (4 x 
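
To see the kind of conversion `.json()` performs, here is a small sketch with the standard library's `json` module. The `raw_text` string is a made-up, heavily shortened stand-in for a response body, not real API output:

```python
import json

# A made-up, shortened stand-in for the text an API might send back
raw_text = '{"info": {"page": 1, "pages": 3}, "records": [{"title": "Example record", "totalpageviews": 0, "description": null}]}'

# json.loads turns JSON text into dictionaries, lists, numbers, strings and None
data = json.loads(raw_text)
first_record = data["records"][0]

print(type(data).__name__)
print(first_record["title"])
print(first_record["description"])
```

Note that JSON's `null` becomes Python's `None`, which is why so many fields in the output above show `None`.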

## Changing our request

Let's say we're not interested in the most obscure parts of the collection (pot sherds, apparently), but rather in the most popular parts of the collection. There are a few ways we might go about doing this. One way might be to sort our search results by `totalpageviews`, and see what the top 10 are.

To do that, we can go back to the [API documentation](https://github.com/harvardartmuseums/api-docs/blob/master/object.md) and look for hints about what we might be able to do.

In [11]:
parameters = {
    "size":10,
    "apikey":APIKEY,
    "sort": "totalpageviews",
    "sortorder": "desc"
}
R = requests.get(url,params=parameters)
R.json()

{'info': {'next': 'https://api.harvardartmuseums.org/object?size=10&apikey=b0cde630-ce66-11e8-951c-b3d75228cc98&sort=totalpageviews&sortorder=desc&page=2',
  'page': 1,
  'pages': 23295,
  'totalrecords': 232947,
  'totalrecordsperquery': 10},
 'records': [{'accessionmethod': 'Bequest',
   'accessionyear': 1951,
   'accesslevel': 1,
   'century': '19th century',
   'classification': 'Paintings',
   'classificationid': 26,
   'colorcount': 10,
   'colors': [{'color': '#64af7d',
     'css3': '#5f9ea0',
     'hue': 'Green',
     'percent': 0.2979781420765,
     'spectrum': '#4fb94f'},
    {'color': '#64c896',
     'css3': '#66cdaa',
     'hue': 'Green',
     'percent': 0.21289617486339,
     'spectrum': '#47b853'},
    {'color': '#323219',
     'css3': '#2f4f4f',
     'hue': 'Brown',
     'percent': 0.19814207650273,
     'spectrum': '#3db657'},
    {'color': '#7d7d4b',
     'css3': '#696969',
     'hue': 'Green',
     'percent': 0.056775956284153,
     'spectrum': '#6cbd45'},
    {'color

### Looking at the results

Often, you'll want to look at some specific aspect of the data you're getting. Since the API returns everything, you'll have to format the output in some friendly, readable format. In this next cell, we'll iterate through the results and print them out in a nicer format.

We're being pretty low-level with the text formatting here. One important key to understanding this bit is that `"\t"` is an escape sequence that stands for a tab character, which is hard to type directly into a string.

Feel free to play around with this cell to format it more to your liking.

In [12]:
records = R.json()['records']
print("views\tartwork")
print()
for record in records:
    print("{}\t{}".format(record['totalpageviews'],record['title']))
    # `.format` puts its arguments sequentially in the string calling it wherever there are {} pairs
    # It does a lot more than that, with more advanced documentation here: 
    # https://docs.python.org/3.4/library/string.html#id1
    print()

views	artwork

23814	Self-Portrait Dedicated to Paul Gauguin

18504	The Gare Saint-Lazare: Arrival of a Train

13989	Bahram Gur Fights the Horned Wolf (painting, verso; text, recto), illustrated folio from a manuscript of the Great Ilkhanid Shahnama (Book of Kings)

12878	Odalisque with a Slave

11825	A Mother and Child and Four Studies of Her Right Hand, 1904; verso:  Self-Portrait Standing, 1903

10838	Jeanne-Antoinette Poisson, Marquise de Pompadour

9172	Red Boats, Argenteuil

8907	Court of Gayumars (painting, recto; text, verso), folio from a manuscript of the Shahnama by Firdawsi

7257	Self-Portrait in Tuxedo

6967	Mother and Child
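
If you're on Python 3.6 or newer, f-strings are another way to do this kind of formatting. The sketch below right-aligns the view counts in a fixed-width column instead of using tabs. `format_row` and `sample_records` are names made up for this example, and the two records only carry the fields we print:

```python
# Two records copied from the output above, trimmed down to the fields we need
sample_records = [
    {"totalpageviews": 23814, "title": "Self-Portrait Dedicated to Paul Gauguin"},
    {"totalpageviews": 18504, "title": "The Gare Saint-Lazare: Arrival of a Train"},
]

def format_row(record, width=8):
    """Right-align the view count in a fixed-width column, then add the title."""
    return f"{record['totalpageviews']:>{width}}  {record['title']}"

rows = [format_row(record) for record in sample_records]
for row in rows:
    print(row)
```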



The top result from this query is a Van Gogh painting titled "Self-Portrait Dedicated to Paul Gauguin." You can grab just the first object by accessing the records list (which is indexed from 0):

In [13]:
topResult = R.json()['records'][0]
topResult

{'accessionmethod': 'Bequest',
 'accessionyear': 1951,
 'accesslevel': 1,
 'century': '19th century',
 'classification': 'Paintings',
 'classificationid': 26,
 'colorcount': 10,
 'colors': [{'color': '#64af7d',
   'css3': '#5f9ea0',
   'hue': 'Green',
   'percent': 0.2979781420765,
   'spectrum': '#4fb94f'},
  {'color': '#64c896',
   'css3': '#66cdaa',
   'hue': 'Green',
   'percent': 0.21289617486339,
   'spectrum': '#47b853'},
  {'color': '#323219',
   'css3': '#2f4f4f',
   'hue': 'Brown',
   'percent': 0.19814207650273,
   'spectrum': '#3db657'},
  {'color': '#7d7d4b',
   'css3': '#696969',
   'hue': 'Green',
   'percent': 0.056775956284153,
   'spectrum': '#6cbd45'},
  {'color': '#969664',
   'css3': '#808080',
   'hue': 'Green',
   'percent': 0.043715846994536,
   'spectrum': '#84c441'},
  {'color': '#afaf7d',
   'css3': '#bdb76b',
   'hue': 'Green',
   'percent': 0.042622950819672,
   'spectrum': '#9ecb3b'},
  {'color': '#4b9664',
   'css3': '#2e8b57',
   'hue': 'Green',
   'perc

You can easily access properties from the object record:

In [14]:
topResult['title']

'Self-Portrait Dedicated to Paul Gauguin'

Try adding an additional code cell to the notebook below to access information about Van Gogh. If that's easy, try displaying all HAM works by Van Gogh and filtering to only records with an image associated.

## More endpoints to love

You might notice, looking at the documentation, that we've only been accessing the "object" API endpoint, when there are many other endpoints that we could ask for information.

A note on terminology: an API endpoint is one place that you can go to ask specific questions about a certain part of a dataset or service. Many APIs, especially commercial APIs, contain many, many endpoints, to facilitate all sorts of different activity on a platform.

For example, you can take a look at the [reddit API documentation](https://www.reddit.com/dev/api/) (which we won't be using, this is just an example), to see all of the different endpoints that an application might need to serve as an alternative front end for reddit. 

Endpoints on the same API are likely to behave similarly, but they will all serve different purposes. Looking at our HAM endpoints, it looks like they all follow the same basic formulation: https://api.harvardartmuseums.org/RESOURCE_TYPE . We can use this to our advantage, and create a function to query any endpoint easily.

In [15]:
def ham_query(apikey, endpoint, **kwargs):
    """Sends kwargs to the specified endpoint, using apikey for authentication"""
    params = kwargs
    params['apikey'] = apikey
    url = "https://api.harvardartmuseums.org/{}".format(endpoint)
    R = requests.get(url,params=params)
    return R

In [16]:
response = ham_query(APIKEY, "gallery", floor=2)

In [17]:
response.json()

{'info': {'next': 'https://api.harvardartmuseums.org/gallery?floor=2&apikey=b0cde630-ce66-11e8-951c-b3d75228cc98&page=2',
  'page': 1,
  'pages': 3,
  'totalrecords': 24,
  'totalrecordsperquery': 10},
 'records': [{'floor': 2,
   'galleryid': 2200,
   'gallerynumber': '2200',
   'id': 2200,
   'labeltext': 'A sustained commitment to the tenets of neoclassicism persisted after the French Revolution, and artists such as Jacques-Louis David and his school continued to adhere to the sculptural and archaeological approach to form that they had helped popularize in the previous century. However, their style was soon challenged by romanticism, which proposed a radically different kind of representation. Its development over the first half of the nineteenth century produced myriad reformulations and contradictions in its definition. But at its height, romantic painting was characterized by a bold and vibrant palette and by loose and expressive brushstrokes that often obscured any careful prep

### Boy, that's convenient!

That function works because Python has this neat ability to take arbitrary arguments in functions, if you tell it to. Essentially, there are two special arguments in function definitions: `*args` and `**kwargs`. These make available `args` and `kwargs` objects, respectively, in your function. `args` is a tuple of the extra positional arguments, and `kwargs` is a dictionary of the extra keyword arguments. This makes it so that you don't have to specify all of the arguments your function can take; you can just give it general rules for sequences or key / value pairs of data as input.

You might be wondering why you wouldn't just use a dictionary or list instead of those arguments. In our case, it's mostly a stylistic choice, and one that saves us a couple of key strokes.
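
Here's a tiny sketch of what Python collects into each one. `describe_arguments` is a throwaway function written for this example:

```python
def describe_arguments(*args, **kwargs):
    """Return the positional and keyword arguments exactly as Python collected them."""
    return args, kwargs

positional, keyword = describe_arguments("object", 10, culture="American", size=5)
print(type(positional).__name__, positional)
print(type(keyword).__name__, keyword)

# A dictionary you already have can be unpacked into keyword arguments with **
saved_query = {"floor": 2, "size": 3}
_, unpacked = describe_arguments(**saved_query)
print(unpacked == saved_query)
```

The `**saved_query` form means a function like `ham_query` can accept either keywords typed out by hand or a dictionary of parameters built up elsewhere.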

In [19]:
# Try out some other endpoints!

In [20]:
# Here's an example: all of the current exhibits with their begin and end dates
response = ham_query(APIKEY, "exhibition", status="current")
current_exhibits = response.json()['records']
current_exhibits
print()
for exhibit in current_exhibits:
    print("{} ({} to {})".format(exhibit['title'],exhibit['begindate'],exhibit['enddate'])) 


John Russell: Australia's French Impressionist (2018-07-21 to 2018-11-11)
Animal-Shaped Vessels from the Ancient World: Feasting with Gods, Heroes, and Kings (2018-09-07 to 2019-01-06)
Lorenzo Lotto Portraits / Lorenzo Lotto Retratos (2018-06-19 to 2019-02-10)
Harvey Quaytman: A Retrospective (2018-10-17 to 2019-01-27)
The Chiaroscuro Woodcut in Renaissance Italy (2018-06-03 to 2019-01-20)
Diana Extended Loan at Huntington Library (1984-07-01 to None)
Drawing in Tintoretto's Venice (2018-10-12 to 2019-05-26)
Delacroix (2018-09-17 to 2019-01-06)
The Construction of the World - Art and the Economy (2018-10-11 to 2019-02-03)
Adam and Eve (2018-09-01 to 2019-01-06)


## Individual Objects and IIIF

The HAM object API can provide more information (such as `exhibition`, `citation`, `publication`, and `marks`) if you ask for a specific object by its objectid. For some records that have been extensively annotated (often those with `verificationlevel` == 4) the lists for these properties can contain hundreds of entries.

In [21]:
objectid = topResult['objectid']
objectid
parameters = {
    "apikey": APIKEY
}
objectUrl = url + "/" + str(objectid)
R = requests.get(objectUrl, params=parameters)
topResultFull = R.json()
print(topResultFull['verificationlevel'] == 4)
topResultFull

True


{'accessionmethod': 'Bequest',
 'accessionyear': 1951,
 'accesslevel': 1,
 'century': '19th century',
 'classification': 'Paintings',
 'classificationid': 26,
 'colorcount': 10,
 'colors': [{'color': '#64af7d',
   'css3': '#5f9ea0',
   'hue': 'Green',
   'percent': 0.2979781420765,
   'spectrum': '#4fb94f'},
  {'color': '#64c896',
   'css3': '#66cdaa',
   'hue': 'Green',
   'percent': 0.21289617486339,
   'spectrum': '#47b853'},
  {'color': '#323219',
   'css3': '#2f4f4f',
   'hue': 'Brown',
   'percent': 0.19814207650273,
   'spectrum': '#3db657'},
  {'color': '#7d7d4b',
   'css3': '#696969',
   'hue': 'Green',
   'percent': 0.056775956284153,
   'spectrum': '#6cbd45'},
  {'color': '#969664',
   'css3': '#808080',
   'hue': 'Green',
   'percent': 0.043715846994536,
   'spectrum': '#84c441'},
  {'color': '#afaf7d',
   'css3': '#bdb76b',
   'hue': 'Green',
   'percent': 0.042622950819672,
   'spectrum': '#9ecb3b'},
  {'color': '#4b9664',
   'css3': '#2e8b57',
   'hue': 'Green',
   'perc

When we printed the 10 most popular records above (under Looking at the results), you may have noticed a sharp dropoff after the first few records. Our Van Gogh painting is particularly popular, with ~5000 more views than the second most popular record and more than 3x as many as the tenth most popular. This particular Art Museums record is used as the default image asset for the demo installation of [Project Mirador](http://projectmirador.org/demo/), an image viewer for [IIIF (International Image Interoperability Framework)](https://iiif.io/).

We're not going to go deep into IIIF in this workshop, but want to mention that IIIF is both a community of developers and a collection of APIs and API-compliant tools that you can use to share, manipulate, and display visual materials. The [Image API](https://iiif.io/api/image/2.1/) and [Presentation API](https://iiif.io/api/presentation/2.1/) are the most used outputs as of now, though there are also APIs for Authentication, Search, and beta versions for other media (video and VR).

(Explanation of Image and Presi APIs (manifest, canvas especially) - images)

Within our topResultFull object, there is an images list, which contains IIIF baseurls as well as Image Delivery Service URLs:

In [22]:
images = topResultFull['images']
images

[{'baseimageurl': 'https://nrs.harvard.edu/urn-3:HUAM:DDC251942_dynmc',
  'copyright': 'President and Fellows of Harvard College',
  'displayorder': 1,
  'format': 'image/jpeg',
  'height': 2550,
  'idsid': 47174896,
  'iiifbaseuri': 'https://ids.lib.harvard.edu/ids/iiif/47174896',
  'imageid': 429030,
  'publiccaption': None,
  'renditionnumber': 'DDC251942',
  'width': 2087},
 {'baseimageurl': 'https://nrs.harvard.edu/urn-3:HUAM:DDC000072_dynmc',
  'copyright': 'President and Fellows of Harvard College',
  'displayorder': 2,
  'format': 'image/jpeg',
  'height': 2550,
  'idsid': 18737483,
  'iiifbaseuri': 'https://ids.lib.harvard.edu/ids/iiif/18737483',
  'imageid': 185978,
  'publiccaption': None,
  'renditionnumber': 'DDC000072',
  'width': 2088},
 {'baseimageurl': 'https://nrs.harvard.edu/urn-3:HUAM:DDC251934_dynmc',
  'copyright': 'President and Fellows of Harvard College',
  'displayorder': 3,
  'format': 'image/jpeg',
  'height': 2550,
  'idsid': 47174892,
  'iiifbaseuri': 'htt

This particular record has 6 images associated with it. Try opening some of the urls:

In [23]:
for index, image in enumerate(images, start=1):
    print("image{} baseimageurl: {}\nimage{} iiifbaseuri: {}".format(index,image['baseimageurl'],index,image['iiifbaseuri']))

image1 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:DDC251942_dynmc
image1 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/47174896
image2 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:DDC000072_dynmc
image2 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/18737483
image3 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:DDC251934_dynmc
image3 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/47174892
image4 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:30033_dynmc
image4 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/43182083
image5 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:50493_dynmc
image5 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/43183405
image6 baseimageurl: https://nrs.harvard.edu/urn-3:HUAM:50849_dynmc
image6 iiifbaseuri: https://ids.lib.harvard.edu/ids/iiif/43183422


You'll notice that the `baseimageurls` use Harvard's Name Resolution service, which redirects to an Image Delivery Service URL. We're more interested in the `iiifbaseuris` because we can manipulate IIIF resources using the Image API. Try opening one of those.

The IIIF Image API spec requires that we pass not just a baseurl, but a well-formed IIIF-compliant URI to get an image. Let's check out that [documentation](https://iiif.io/api/image/2.1/) and see what else we need to construct one of those.

From the docs:

>The IIIF Image API URI for requesting an image must conform to the following URI Template:
>
>`{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`
>
>For example:
>
>`http://www.example.org/image-service/abcd1234/full/full/0/default.jpg`
>
>The parameters of the Image Request URI include region, size, rotation, quality and format, which define the characteristics of the returned image. These are described in detail in Image Request Parameters.

The `iiifbaseuri`s include up through the `{identifier}`, but we need to include additional parameters to get the server to actually render the image for us. These parameters are passed within the URI itself, rather than in a query string appended after a delimiter (usually `?`), which is what we've been using `requests` to do. Let's write a function that can generate IIIF URIs for us. Because all of the parameters we want to insert are required, we won't use `**kwargs` - instead we'll set default params which you can override by passing in new ones.

In [24]:
def iiif_query(baseuri, region="full", size="full", rotation=0, quality="default", format="jpg", info=False):
    """Creates a valid IIIF URL, with the option to request image information"""
    # Make sure the base URI ends with exactly one slash before adding to it
    if not baseuri.endswith("/"):
        baseuri += "/"
    if info:
        return baseuri + "info.json"
    return baseuri + "{}/{}/{}/{}.{}".format(region, size, rotation, quality, format)

In [25]:
for image in images:
    print(iiif_query(image['iiifbaseuri']))

https://ids.lib.harvard.edu/ids/iiif/47174896/full/full/0/default.jpg
https://ids.lib.harvard.edu/ids/iiif/18737483/full/full/0/default.jpg
https://ids.lib.harvard.edu/ids/iiif/47174892/full/full/0/default.jpg
https://ids.lib.harvard.edu/ids/iiif/43182083/full/full/0/default.jpg
https://ids.lib.harvard.edu/ids/iiif/43183405/full/full/0/default.jpg
https://ids.lib.harvard.edu/ids/iiif/43183422/full/full/0/default.jpg


Now we have some valid image URLs that you can open in your browser!

This is nice, but the Image API lets us do a lot more just by passing in some parameters. Maybe we want to generate some square, grayscale images for a gallery:

In [26]:
for image in images:
    print(iiif_query(image['iiifbaseuri'], quality="gray", region="square"))

https://ids.lib.harvard.edu/ids/iiif/47174896/square/full/0/gray.jpg
https://ids.lib.harvard.edu/ids/iiif/18737483/square/full/0/gray.jpg
https://ids.lib.harvard.edu/ids/iiif/47174892/square/full/0/gray.jpg
https://ids.lib.harvard.edu/ids/iiif/43182083/square/full/0/gray.jpg
https://ids.lib.harvard.edu/ids/iiif/43183405/square/full/0/gray.jpg
https://ids.lib.harvard.edu/ids/iiif/43183422/square/full/0/gray.jpg


Let's try requesting only the right half of an image, in black and white, and getting back a PNG:

In [27]:
# images[-1] is the last image in the list (the one the loop above ended on)
print(iiif_query(images[-1]['iiifbaseuri'], quality='bitonal', region="pct:50,0,100,100", format="png"))

https://ids.lib.harvard.edu/ids/iiif/43183422/pct:50,0,100,100/full/0/bitonal.png


Feel free to try to manipulate the images in other ways as well! That's it for our quick introduction to the Image API.
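
As one more idea, the `size` and `rotation` slots can be combined. In version 2.1 of the Image API, a size of the form `w,` asks for a given width with the height scaled to match, and rotation is given in degrees clockwise. The sketch below uses a standalone helper, `build_iiif_url` (a made-up name, so this block runs on its own), to build a URL for a 300-pixel-wide image rotated 90 degrees:

```python
def build_iiif_url(baseuri, region="full", size="full", rotation=0, quality="default", fmt="jpg"):
    """Assemble an IIIF Image API URL from a base URI and the five request parameters."""
    return "{}/{}/{}/{}/{}.{}".format(baseuri.rstrip("/"), region, size, rotation, quality, fmt)

# "300," means: 300 pixels wide, height scaled to keep the aspect ratio
thumbnail_url = build_iiif_url(
    "https://ids.lib.harvard.edu/ids/iiif/47174896",
    size="300,",
    rotation=90,
)
print(thumbnail_url)
```

Whether a particular server supports every size, rotation, and quality value depends on its compliance level, so it's worth opening the result in your browser to check.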

Later in the session we'll consume these resources using [Mirador](http://projectmirador.org/), an image viewer which uses the IIIF Image and Presentation APIs. If you're interested in the Presentation API, check out [this documentation](https://iiif.io/api/presentation/2.1/) to learn how IIIF manifests structure sequences of canvases which image viewers then present to end users. You can find a HAM object's manifest in the `seeAlso` property, or by appending the object ID to a baseurl:

In [28]:
print(topResult['seeAlso'])
print('https://iiif.harvardartmuseums.org/manifests/object/{}'.format(topResult['id']))

[{'id': 'https://iiif.harvardartmuseums.org/manifests/object/299843', 'type': 'IIIF Manifest', 'format': 'application/json', 'profile': 'http://iiif.io/api/presentation/2/context.json'}]
https://iiif.harvardartmuseums.org/manifests/object/299843


## More stuff!

So far, we've only been getting limited sets of object data. But what if there were a big query we wanted to make? Let's try it out on "Unidentified culture" materials in the museum.

In [29]:
unknown = ham_query(APIKEY, "object", culture="Unidentified culture", size=100).json()

Looking at our previous queries, it looks like we've got some information about our query in the "info" section. Let's take a look at that...

In [30]:
unknown['info']

{'next': 'https://api.harvardartmuseums.org/object?culture=Unidentified%20culture&size=100&apikey=b0cde630-ce66-11e8-951c-b3d75228cc98&page=2',
 'page': 1,
 'pages': 7,
 'totalrecords': 607,
 'totalrecordsperquery': 100}

### Iterating through pages

It looks like we have 7 pages of data to get, and our response gives us a "next" url for easy iteration. Nice!

However, let's look at how we would iterate even without this convenience factor.

In [31]:
unknown.keys()

dict_keys(['info', 'records'])

It looks like we have two components to our response, info and records. Since `info` is request specific, we're just after `records`, and we'll want to combine them all. 

We could set this up in a regular loop, which would query the API as fast as it can. In practice that means many queries per second, limited more by network speed than by processor speed. This can put a strain on the API endpoint, so it's good practice to build in a delay when making many requests. Some APIs specify how many requests per second you're allowed to make, and some don't. Even a fraction of a second of delay in your code will help make sure that you don't accidentally get yourself banned from the API.

In [32]:
import time

In [33]:
unknown_records = []
keepGoing = True
page = 1

while keepGoing:
    R = ham_query(APIKEY, "object", culture="Unidentified culture", size=100, page=page)
    time.sleep(0.5)
    response = R.json()
    unknown_records.extend(response['records'])
    if response['info']['pages'] == page:
        keepGoing = False
    else:
        page += 1

In [34]:
len(unknown_records)

607
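
For comparison, here is a sketch of the other approach: following the "next" URL from each response. It assumes that the last page's `info` has no "next" entry, which you should check against a real response before relying on it. `collect_records` and `fetch_page` are names made up for this example, and the two fake pages stand in for real API responses so the block runs without a network connection:

```python
import time

def collect_records(first_url, fetch_page, delay=0.5):
    """Follow info['next'] links until a page has none, gathering every record."""
    records = []
    url = first_url
    while url:
        page = fetch_page(url)
        records.extend(page["records"])
        # Assumption: the last page has no "next" entry in its info block
        url = page["info"].get("next")
        if url:
            time.sleep(delay)  # pause between requests to go easy on the API
    return records

# Two made-up pages standing in for real API responses
fake_pages = {
    "page-1": {"info": {"next": "page-2"}, "records": [{"id": 1}, {"id": 2}]},
    "page-2": {"info": {}, "records": [{"id": 3}]},
}

all_records = collect_records("page-1", fake_pages.__getitem__, delay=0)
print(len(all_records))
```

Against the real API, `fetch_page` could be something like `lambda url: requests.get(url).json()`, since the "next" URLs shown above already include the query parameters.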

## Exporting

Now we have some cool data, but maybe we want to do something with it outside of Python. It's common to see CSV data traded around, since it's just a plain text spreadsheet file, so most things can parse it. Let's make one of those! We could use the relatively low level `csv` library, but instead, let's use a higher level library, `pandas`.
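
For reference, here's roughly what the lower-level `csv` route might look like. This sketch writes to an in-memory buffer instead of a file so it leaves nothing behind, and `sample_records` is a hand-trimmed stand-in for two of our records with only three fields each:

```python
import csv
import io

# Hand-trimmed stand-ins for two records; real records have many more fields
sample_records = [
    {"title": "Praying Monk", "classification": "Drawings", "totalpageviews": 0},
    {"title": "Fragment from a Vessel", "classification": "Fragments", "totalpageviews": 2},
]
fieldnames = ["title", "classification", "totalpageviews"]

buffer = io.StringIO()
# extrasaction="ignore" skips any keys that aren't listed in fieldnames
writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
writer.writeheader()
writer.writerows(sample_records)

csv_text = buffer.getvalue()
print(csv_text)
```

With `csv` you choose the columns and handle nested values (like the `colors` lists) yourself, which is part of why a higher level library is appealing here.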

In [35]:
import pandas as pd # Common invocation of pandas. Gotta save those 4 keystrokes.

### "Be a dataframe!" - us

Pandas thinks of things in terms of dataframes, which will be familiar if you work in R. Basically, they're really efficient arrays of data. They also translate really well to a tabular format.

To make an iterable object into a dataframe, sometimes you can just get away with shouting "Hey you! Be a dataframe!" at it (in code). Since we have a list of dictionaries with consistent keys, there's a good chance this process will do something smart for us:

In [36]:
pd.DataFrame(unknown_records)

Unnamed: 0,accessionmethod,accessionyear,accesslevel,century,classification,classificationid,colorcount,colors,commentary,contact,...,technique,techniqueid,title,titlescount,totalpageviews,totaluniquepageviews,url,verificationlevel,verificationleveldescription,worktypes
0,Gift,2002.0,1,16th-17th century,Drawings,21,0,,,am_europeanamerican@harvard.edu,...,,,Praying Monk,1,0,0,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '125', 'worktype': 'drawing'}]"
1,Transfer,2011.0,1,19th century,Photographs,17,0,,,am_europeanamerican@harvard.edu,...,Albumen silver print,110.0,Untitled (full length portrait of woman standi...,1,5,4,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '259', 'worktype': 'photograph..."
2,Transfer,2011.0,1,19th-20th century,Photographs,17,0,,,am_moderncontemporary@harvard.edu,...,Gelatin silver print,123.0,Untitled (group of nine people in bathing cost...,1,4,4,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '259', 'worktype': 'photograph..."
3,Transfer,2011.0,1,20th century,Photographs,17,0,,,am_moderncontemporary@harvard.edu,...,Gelatin silver print,123.0,Untitled (bride and groom cutting the cake),1,0,0,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '259', 'worktype': 'photograph'}]"
4,Transfer,2011.0,1,20th century,Photographs,17,0,,,am_moderncontemporary@harvard.edu,...,Gelatin silver print,123.0,"Joachim, Ecumenical Patriarch of Constantinopl...",1,70,52,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '259', 'worktype': 'photograph'}]"
5,,,1,19th-20th century,Architectural Elements,133,0,,,am_asianmediterranean@harvard.edu,...,Cast,233.0,Detail of the West Frieze of the Parthenon: Re...,1,11,7,https://www.harvardartmuseums.org/collections/...,2,Adequate. Object is adequately described but i...,"[{'worktypeid': '15', 'worktype': 'architectur..."
6,,,1,19th-20th century,Architectural Elements,133,0,,,am_asianmediterranean@harvard.edu,...,Cast,233.0,Detail of the West Frieze of the Parthenon: Mo...,1,0,0,https://www.harvardartmuseums.org/collections/...,2,Adequate. Object is adequately described but i...,"[{'worktypeid': '15', 'worktype': 'architectur..."
7,Transfer,1977.0,1,,Fragments,94,0,,,am_asianmediterranean@harvard.edu,...,,,Fragment from a Bronze Age Vessel,1,0,0,https://www.harvardartmuseums.org/collections/...,0,Unchecked. Object information has not been ver...,"[{'worktypeid': '319', 'worktype': 'sherd'}]"
8,Transfer,1977.0,1,,Fragments,94,0,,,am_asianmediterranean@harvard.edu,...,,,Fragment from a Bronze Age Vessel,1,0,0,https://www.harvardartmuseums.org/collections/...,0,Unchecked. Object information has not been ver...,"[{'worktypeid': '319', 'worktype': 'sherd'}]"
9,Transfer,1977.0,1,,Fragments,94,0,,,am_asianmediterranean@harvard.edu,...,,,Fragment from a Vessel,1,2,1,https://www.harvardartmuseums.org/collections/...,0,Unchecked. Object information has not been ver...,"[{'worktypeid': '319', 'worktype': 'sherd'}]"


What do you know! It worked. But let's take a look at a more hands-on approach to the same thing.

In [37]:
pd.DataFrame.from_dict(unknown_records)

Unnamed: 0,accessionmethod,accessionyear,accesslevel,century,classification,classificationid,colorcount,colors,commentary,contact,...,technique,techniqueid,title,titlescount,totalpageviews,totaluniquepageviews,url,verificationlevel,verificationleveldescription,worktypes
0,Gift,2002.0,1,16th-17th century,Drawings,21,0,,,am_europeanamerican@harvard.edu,...,,,Praying Monk,1,0,0,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '125', 'worktype': 'drawing'}]"
1,Transfer,2011.0,1,19th century,Photographs,17,0,,,am_europeanamerican@harvard.edu,...,Albumen silver print,110.0,Untitled (full length portrait of woman standi...,1,5,4,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '259', 'worktype': 'photograph..."
2,Transfer,2011.0,1,19th-20th century,Photographs,17,0,,,am_moderncontemporary@harvard.edu,...,Gelatin silver print,123.0,Untitled (group of nine people in bathing cost...,1,4,4,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '259', 'worktype': 'photograph..."
3,Transfer,2011.0,1,20th century,Photographs,17,0,,,am_moderncontemporary@harvard.edu,...,Gelatin silver print,123.0,Untitled (bride and groom cutting the cake),1,0,0,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '259', 'worktype': 'photograph'}]"
4,Transfer,2011.0,1,20th century,Photographs,17,0,,,am_moderncontemporary@harvard.edu,...,Gelatin silver print,123.0,"Joachim, Ecumenical Patriarch of Constantinopl...",1,70,52,https://www.harvardartmuseums.org/collections/...,3,Good. Object is well described and information...,"[{'worktypeid': '259', 'worktype': 'photograph'}]"
5,,,1,19th-20th century,Architectural Elements,133,0,,,am_asianmediterranean@harvard.edu,...,Cast,233.0,Detail of the West Frieze of the Parthenon: Re...,1,11,7,https://www.harvardartmuseums.org/collections/...,2,Adequate. Object is adequately described but i...,"[{'worktypeid': '15', 'worktype': 'architectur..."
6,,,1,19th-20th century,Architectural Elements,133,0,,,am_asianmediterranean@harvard.edu,...,Cast,233.0,Detail of the West Frieze of the Parthenon: Mo...,1,0,0,https://www.harvardartmuseums.org/collections/...,2,Adequate. Object is adequately described but i...,"[{'worktypeid': '15', 'worktype': 'architectur..."
7,Transfer,1977.0,1,,Fragments,94,0,,,am_asianmediterranean@harvard.edu,...,,,Fragment from a Bronze Age Vessel,1,0,0,https://www.harvardartmuseums.org/collections/...,0,Unchecked. Object information has not been ver...,"[{'worktypeid': '319', 'worktype': 'sherd'}]"
8,Transfer,1977.0,1,,Fragments,94,0,,,am_asianmediterranean@harvard.edu,...,,,Fragment from a Bronze Age Vessel,1,0,0,https://www.harvardartmuseums.org/collections/...,0,Unchecked. Object information has not been ver...,"[{'worktypeid': '319', 'worktype': 'sherd'}]"
9,Transfer,1977.0,1,,Fragments,94,0,,,am_asianmediterranean@harvard.edu,...,,,Fragment from a Vessel,1,2,1,https://www.harvardartmuseums.org/collections/...,0,Unchecked. Object information has not been ver...,"[{'worktypeid': '319', 'worktype': 'sherd'}]"


`pd.DataFrame.from_dict` gives you more control over the conversion process, so you can provide more options if things don't look how you expect them to.

As a side note, we do have some data structures in here that don't make a lot of sense in a tabular format. Look at `worktypes` at the very end. That's a list, and each cell has list data in it. We won't be able to do much with that in Excel or some other tabular data processing tool, but it also won't break anything for us. It just looks weird. Within the dataframe, they still work like lists though, so you can access the data while you're still in Python if you're clever about it.
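
As a rough sketch of what "clever" might look like, here is a small helper that pulls the worktype names out of one of those list cells. The sample rows are hypothetical, trimmed stand-ins for the records above, and `worktype_names` is a name made up for this example.

```python
# Hypothetical sample rows shaped like the records above (values abbreviated).
sample_records = [
    {"title": "Praying Monk",
     "worktypes": [{"worktypeid": "125", "worktype": "drawing"}]},
    {"title": "Fragment from a Vessel",
     "worktypes": [{"worktypeid": "319", "worktype": "sherd"}]},
    {"title": "Untitled", "worktypes": None},  # assumption: some records lack the field
]

def worktype_names(worktypes):
    """Return the 'worktype' names from a list of worktype dictionaries."""
    if not isinstance(worktypes, list):
        return []
    return [w["worktype"] for w in worktypes if "worktype" in w]

for record in sample_records:
    print(record["title"], "->", worktype_names(record.get("worktypes")))

# With a dataframe built from records like these, the same helper could be
# applied to the whole column: df["worktypes"].apply(worktype_names)
```

This only works while the cells really are lists. Once the data has been written to a file and read back, they will usually be plain strings.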

### Exporting

From here, our export process is really easy. We just say "Hey you! Be a CSV file now!", and so it shall be.

In [38]:
df = pd.DataFrame(unknown_records)
df.to_csv("unknown_ham_records.csv", index=False)  # index=False leaves the row numbers out of the file
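
One thing to keep in mind about CSV: it has no notion of lists, so a list-valued cell like `worktypes` is written out as its string form. The sketch below uses only the standard library `csv` module (not pandas) and a made-up sample row to show the round trip, and how `ast.literal_eval` can turn such a string back into a list if you need it.

```python
import ast
import csv
import io

# Hypothetical sample row, shaped like the records above
rows = [
    {"title": "Fragment from a Vessel",
     "worktypes": [{"worktypeid": "319", "worktype": "sherd"}]},
]

# Write to an in-memory CSV; the list is stored as its string form.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "worktypes"])
writer.writeheader()
writer.writerows(rows)

# Read it back: every cell is now a plain string.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
print(type(loaded[0]["worktypes"]))

# ast.literal_eval safely parses Python literals (lists, dicts, strings, numbers).
restored = ast.literal_eval(loaded[0]["worktypes"])
print(restored[0]["worktype"])
```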

# Data collected!

Now we've got some interesting data, and we can throw it into a program like Tableau or Excel to visualize it or further explore it. We could also continue to explore it in Python, but that's a workshop for another day.

# Reverse the flow!

Now that we've got some information from the Harvard Art Museum API, let's look at how we can send that information somewhere else to add content to another site. 

Here, we'll use the API for Omeka. Omeka is a content management system, like WordPress, but focused on making the collections of libraries, archives, and museums more easily accessible on the web. It's built around the concept of items, and focuses on describing those items, collecting them sensibly, and incorporating them into online narratives.

The site we'll be using is the site that we use for testing our Omeka service here: http://demo.omeka-dev.fas.harvard.edu/

You'll find documentation for the API here: http://omeka.readthedocs.io/en/latest/Reference/api/index.html

Our goal for this portion will be to use the documentation and what we've already learned to create items in Omeka representing each of the objects in our dataframe.

Before we get started on that, we'll want to see how the API represents items, so we can copy that when creating new ones.

In [41]:
omeka_api_key = '11db6a2b70226f1c55b63a6df75e8093e9bcd01a' # We'll give you a key to use for the site

In [42]:
R = requests.get('http://demo.omeka.fas.harvard.edu/api/items/378', params={'key':omeka_api_key})
demo_item = R.json()

In [43]:
demo_item

{'added': '2018-11-07T04:05:04+00:00',
 'collection': {'id': 18,
  'resource': 'collections',
  'url': 'http://demo.omeka.fas.harvard.edu/api/collections/18'},
 'element_texts': [{'element': {'id': 50,
    'name': 'Title',
    'resource': 'elements',
    'url': 'http://demo.omeka.fas.harvard.edu/api/elements/50'},
   'element_set': {'id': 1,
    'name': 'Dublin Core',
    'resource': 'element_sets',
    'url': 'http://demo.omeka.fas.harvard.edu/api/element_sets/1'},
   'html': False,
   'text': 'Complete scroll recto (seq. 1)'},
  {'element': {'id': 48,
    'name': 'Source',
    'resource': 'elements',
    'url': 'http://demo.omeka.fas.harvard.edu/api/elements/48'},
   'element_set': {'id': 1,
    'name': 'Dublin Core',
    'resource': 'element_sets',
    'url': 'http://demo.omeka.fas.harvard.edu/api/element_sets/1'},
   'html': False,
   'text': 'https://iiif.lib.harvard.edu/manifests/drs:459660579'},
  {'element': {'id': 55,
    'name': 'Original @id',
    'resource': 'elements',
   

## Well that looks weird

There are a lot of different fields, and some of them have a few different layers. It's a lot less flat than our HAM response. Let's take a look at it in a different way...

In [71]:
"{0!s: >30}".format(type(demo_item))

"                <class 'dict'>"

In [81]:
for k,v in demo_item.items():
    print("{0: >18}:\t{1} type".format(k, type(v)))
    # I just went a bit wild with string formatting in the line above, documentation here:
    # https://docs.python.org/3/library/string.html#formatstrings
    # Basically, I right-justified the first format target with spaces to a width of 18 characters.
    # You don't need to know or ever use this, but I think it's cool and handy.

                id:	<class 'int'> type
               url:	<class 'str'> type
            public:	<class 'bool'> type
          featured:	<class 'bool'> type
             added:	<class 'str'> type
          modified:	<class 'str'> type
         item_type:	<class 'NoneType'> type
        collection:	<class 'dict'> type
             owner:	<class 'dict'> type
             files:	<class 'dict'> type
              tags:	<class 'list'> type
     element_texts:	<class 'list'> type
extended_resources:	<class 'dict'> type
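
The loop above only looks at the top level. If you want to peek one layer further down, a short recursive helper can do it. This is a sketch: `outline` is a name invented for this example, and `sample` is a hand-trimmed stand-in for `demo_item`, not the full response.

```python
def outline(value, depth=0, max_depth=2):
    """Print the type of each entry and, for dicts and lists, peek one level deeper."""
    indent = "  " * depth
    if isinstance(value, dict) and depth < max_depth:
        for key, inner in value.items():
            print("{0}{1}: {2}".format(indent, key, type(inner).__name__))
            outline(inner, depth + 1, max_depth)
    elif isinstance(value, list) and value and depth < max_depth:
        # Only look at the first entry; list entries usually share a shape.
        print("{0}[0]: {1}".format(indent, type(value[0]).__name__))
        outline(value[0], depth + 1, max_depth)

# Hand-trimmed stand-in for demo_item
sample = {
    "id": 378,
    "public": True,
    "collection": {"id": 18, "resource": "collections"},
    "element_texts": [{"element": {"id": 50, "name": "Title"}, "html": False,
                       "text": "Complete scroll recto (seq. 1)"}],
}
outline(sample)
```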


So, this thing has some basic properties, and something that sounds interesting: "element_texts". Each piece of metadata about an Item in Omeka is referred to as an "element", so this sounds promising. Since it's a list, let's look at the first item in that list.

In [84]:
demo_item['element_texts'][0]

{'element': {'id': 50,
  'name': 'Title',
  'resource': 'elements',
  'url': 'http://demo.omeka.fas.harvard.edu/api/elements/50'},
 'element_set': {'id': 1,
  'name': 'Dublin Core',
  'resource': 'element_sets',
  'url': 'http://demo.omeka.fas.harvard.edu/api/element_sets/1'},
 'html': False,
 'text': 'Complete scroll recto (seq. 1)'}

### Element stuff

So there are some different parts of each element text, including the element, element set, text, and what looks like a boolean flag for whether or not this text should be rendered as HTML. That's kind of complicated and annoying. Let's think about this real hard for a little bit so we don't have to think about it again.
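
To make that structure a little easier to read, here is a minimal sketch that boils an item's `element_texts` down to a plain `{name: text}` dictionary. The helper name `element_texts_by_name` is made up for this example, and `sample_item` is an abbreviated copy of the item shown above.

```python
def element_texts_by_name(item):
    """Map each element's name to its text, e.g. {'Title': '...'}."""
    return {
        entry["element"]["name"]: entry["text"]
        for entry in item.get("element_texts", [])
    }

# Abbreviated copy of the item shown above
sample_item = {
    "element_texts": [
        {"element": {"id": 50, "name": "Title"},
         "element_set": {"id": 1, "name": "Dublin Core"},
         "html": False,
         "text": "Complete scroll recto (seq. 1)"},
        {"element": {"id": 48, "name": "Source"},
         "element_set": {"id": 1, "name": "Dublin Core"},
         "html": False,
         "text": "https://iiif.lib.harvard.edu/manifests/drs:459660579"},
    ]
}
print(element_texts_by_name(sample_item))
```

If an item has the same element more than once, later values overwrite earlier ones in this simple version.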

## New function!

Since making items isn't exactly intuitive, let's make a quick function to construct items from dictionaries. We're relying on some things specific to this site, namely the IDs of each element, and for a more general solution we'd want to do something more nuanced than hard coding those IDs into our workflow. For now, though, this is a workable solution.

Also, you'll notice we're only specifying the element ID, because it turns out that's all you actually need when creating a new Item in Omeka.

In [44]:
def make_item(element_texts):
    """
    Takes a dictionary with format {element_id:element_text, ...}
    """
    base_item = {
        'element_texts':[],
        'featured': False,
        'public': True,
    }
    for _id, text in element_texts.items():
        element = {
            'element': { 'id': int(_id) },
            'text': text,
            'html': True
        }
        base_item['element_texts'].append(element)
    return base_item

In [45]:
test = {
    50: 'A Test Item',
    41: "The description of the test item. It might be a bit longer, which is fine since it won't be used as a page title or anything."
}
test_item = make_item(test)
print(test_item)

{'element_texts': [{'element': {'id': 50}, 'text': 'A Test Item', 'html': True}, {'element': {'id': 41}, 'text': "The description of the test item. It might be a bit longer, which is fine since it won't be used as a page title or anything.", 'html': True}], 'featured': False, 'public': True}
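
As for the "something more nuanced" mentioned earlier, one possible direction is to build a name-to-ID lookup from an item that already exists on the site, so you can write `'Title'` instead of `50`. This is only a sketch. Both helper names are hypothetical, `existing_item` is a trimmed stand-in for a real response, and the lookup only knows about elements that the existing item happens to use.

```python
def element_ids_by_name(item):
    """Build a {element name: element id} lookup from an existing item's JSON."""
    return {
        entry["element"]["name"]: entry["element"]["id"]
        for entry in item.get("element_texts", [])
    }

def texts_by_name_to_ids(texts_by_name, lookup):
    """Translate {'Title': 'text'} into the {element_id: 'text'} form make_item expects."""
    return {lookup[name]: text for name, text in texts_by_name.items()}

# Trimmed stand-in for an item fetched from the API
existing_item = {
    "element_texts": [
        {"element": {"id": 50, "name": "Title"}, "text": "A Test Item"},
        {"element": {"id": 41, "name": "Description"}, "text": "Some description"},
    ]
}
lookup = element_ids_by_name(existing_item)
print(texts_by_name_to_ids({"Title": "Another item"}, lookup))
```

A name that is not in the lookup raises a `KeyError`, which is probably what you want: better to fail loudly than to create an item with a missing field.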


## POST new data

Now that we have content to upload, let's take a look at how we'll do that. You'll notice we're using a different HTTP method to send data to the URL: we're POSTing, which usually means we're adding something new. We can still use our `params` argument, but our data goes in the `json` argument.

The `json` argument is a convenience. Since sending JSON is such a common task, `requests` turns the dictionary into a JSON string and sets the `Content-Type` header for you, so you can pass in the dictionary object you've created as-is.

We'll still get a response, but in this case, we'll get a representation of the item that we just created, as long as it was created successfully. We know this from the documentation, which tells us what response to expect from each kind of query we can send to the items API endpoint.
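
If you're curious what actually travels over the wire, here is a rough standard-library sketch of the body that results from serializing an item dictionary. It doesn't send anything. It only shows the serialization step, with a shortened, made-up payload.

```python
import json

# Shortened payload in the same shape make_item produces
payload = {
    "element_texts": [{"element": {"id": 50}, "text": "A Test Item", "html": True}],
    "featured": False,
    "public": True,
}

# Roughly what ends up in the request body when a dictionary is sent as JSON
body = json.dumps(payload)
headers = {"Content-Type": "application/json"}
print(body)

# Python's True/False become JSON true/false, and the data round-trips cleanly
print(json.loads(body) == payload)
```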

In [46]:
R = requests.post('http://demo.omeka.fas.harvard.edu/api/items',json=test_item, params={'key':omeka_api_key})

In [47]:
R.json()

{'added': '2018-11-07T00:12:21+00:00',
 'collection': None,
 'element_texts': [{'element': {'id': 50,
    'name': 'Title',
    'resource': 'elements',
    'url': 'http://demo.omeka.fas.harvard.edu/api/elements/50'},
   'element_set': {'id': 1,
    'name': 'Dublin Core',
    'resource': 'element_sets',
    'url': 'http://demo.omeka.fas.harvard.edu/api/element_sets/1'},
   'html': True,
   'text': 'A Test Item'},
  {'element': {'id': 41,
    'name': 'Description',
    'resource': 'elements',
    'url': 'http://demo.omeka.fas.harvard.edu/api/elements/41'},
   'element_set': {'id': 1,
    'name': 'Dublin Core',
    'resource': 'element_sets',
    'url': 'http://demo.omeka.fas.harvard.edu/api/element_sets/1'},
   'html': True,
   'text': "The description of the test item. It might be a bit longer, which is fine since it won't be used as a page title or anything."},
  {'element': {'id': 64,
    'name': 'UUID',
    'resource': 'elements',
    'url': 'http://demo.omeka.fas.harvard.edu/api/elem

## Functions in functions

We can make another function that takes a dictionary representing our item in a pretty convenient way and adds it directly to Omeka. It calls the `make_item` function we already wrote, so we don't have to repeat that logic here. Keeping the two functions separate also means we can still use `make_item` on its own for some other purpose, like creating several items and then adding them all at once.

Each of the functions has a short string right after the definition, surrounded by triple quotes. This lets you have a multi-line string, and is almost always used the way we're using it here, to define a docstring for a function. This is basically a standard for the helper text you'll see with shift+tab on your functions. Try it when we use the functions!

In [54]:
def add_item_to_omeka(element_texts):
    """Add an item to our Omeka site"""
    item = make_item(element_texts)
    R = requests.post('http://demo.omeka.fas.harvard.edu/api/items',json=item, params={'key':omeka_api_key})
    return R.json()

In [56]:
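def get_manifest(record):
    """Get a manifest from a HAM object record, or None if there isn't one"""
    try:
        for see in record['seeAlso']:
            if see['type'] == "IIIF Manifest":
                # Fetch and parse the manifest this record points to
                return requests.get(see['id']).json()
    except Exception:
        # Missing keys, network problems, or invalid JSON all mean "no manifest"
        pass
    return None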

In [57]:
def get_canvas(manifest):
    """Get the first canvas from a IIIF manifest"""
    try:
        return manifest['sequences'][0]['canvases'][0]
    except (KeyError, IndexError, TypeError):  # missing data, or manifest is None
        return None
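
To see what these two functions are digging through without touching the network, here is a sketch that runs the same lookups against hand-made data. The record and manifest below are hypothetical and heavily trimmed (the `example.org` URLs are placeholders), and `find_manifest_url` and `first_canvas` are names invented for this example.

```python
# Hypothetical, heavily trimmed stand-ins for a HAM record and an IIIF manifest.
sample_record = {
    "title": "Fragment from a Vessel",
    "seeAlso": [
        {"type": "Other", "id": "https://example.org/other"},
        {"type": "IIIF Manifest", "id": "https://example.org/manifest"},
    ],
}
sample_manifest = {
    "thumbnail": {"@id": "https://example.org/thumb.jpg"},
    "sequences": [{"canvases": [{"@id": "https://example.org/canvas/1"}]}],
}

def find_manifest_url(record):
    """Return the URL of the first IIIF manifest listed in a record, or None."""
    for see in record.get("seeAlso") or []:
        if see.get("type") == "IIIF Manifest":
            return see.get("id")
    return None

def first_canvas(manifest):
    """Return the first canvas of the first sequence, or None if it is missing."""
    try:
        return manifest["sequences"][0]["canvases"][0]
    except (KeyError, IndexError, TypeError):
        return None

print(find_manifest_url(sample_record))
print(first_canvas(sample_manifest))
print(first_canvas(None))
```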

We need another library for the next function, so we're importing it here. Normally, it's a good practice to move any new libraries you find you'll need to the top of your script or notebook, but we're breaking rules* today!

**really more of best practices*

In [58]:
import json 

In [59]:
def record_to_omeka(record):
    """Take a Harvard Art Museum object record and add it to our Omeka site"""
    
    # Set up info we will add to the item
    title = record['title']
    description = record['description']
    manifest = get_manifest(record)
    canvas = get_canvas(manifest)
    
    # Set up element texts, using element IDs figured out from other item JSON representations
    element_texts = {
        '50': title,              # DC:Title
        '41': description,        # DC:Description
        '39': 'Your name here!',  # DC:Creator
        '56': json.dumps(canvas)  # IIIF:Json Data
    }
    
    # Add the item to Omeka
    added_item = add_item_to_omeka(element_texts)
    
    # If there's a manifest, there's a thumbnail to add to enable the image viewer
    if manifest is not None:
        
        # get the thumbnail from the manifest as raw bytes
        thumbnail = requests.get(manifest['thumbnail']['@id']).content
        
        # Set up file data
        files = {
            "file":(manifest['thumbnail']['@id'], thumbnail, "application/octet-stream"),
            "data":(None, json.dumps({'item':{'id':added_item['id']}}))
        }
        
        # Post data to file endpoint of the API
        R = requests.post(
            "https://demo.omeka.fas.harvard.edu/api/files", 
            params={"key":omeka_api_key}, 
            files=files
        )
    
    # Return the JSON representation of the new item
    return added_item

In [60]:
added_item = record_to_omeka(unknown_records[7])
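
One thing to watch for: `record['description']` raises a `KeyError` if a record lacks that key, and the value can also be empty. Below is a sketch of a more forgiving way to build the element texts, assuming you would rather leave a field out than send an empty one. The function name `build_element_texts` is hypothetical; the element IDs are the same site-specific ones used above.

```python
import json

def build_element_texts(record, canvas=None, creator="Your name here!"):
    """Build {element_id: text}, leaving out fields the record does not have."""
    candidates = {
        "50": record.get("title"),        # DC:Title
        "41": record.get("description"),  # DC:Description
        "39": creator,                    # DC:Creator
        "56": json.dumps(canvas) if canvas is not None else None,  # IIIF:Json Data
    }
    # Keep only the entries that actually have some text
    return {key: text for key, text in candidates.items() if text}

# Hypothetical record with no description and no canvas
print(build_element_texts({"title": "Fragment from a Vessel", "description": None}))
```

The result can be passed to `add_item_to_omeka` in place of the hand-built dictionary.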

# Mischief Managed!

Now we have the tools at our disposal to take content from the Harvard Art Museums and put it into an Omeka site, or export it so we can analyze it somewhere else. We're ready for a grand data heist!*

**not a real heist, we are using freely available data that th