# Uploading data

This is a part of the workshop that we've tried to run in the past, but ran out of time for. Still, it's a nice follow-up to the workshop content, using the Harvard Art Museums as an example of how you can use APIs not only to gather data, but also to interact programmatically with web services to accomplish tasks.

# Reverse the flow!

Now that we've got some information from the Harvard Art Museums API, let's look at how we can send that information somewhere else to add content to another site.

Here, we'll use the API for Omeka. Omeka is a content management system, like WordPress, but focused on making the collections of libraries, archives, and museums more easily accessible on the web. It's built around the concept of items, and focuses on describing those items, collecting them sensibly, and incorporating them into online narratives.

The site we'll be using is the site that we use for testing our Omeka service here: http://demo.omeka-dev.fas.harvard.edu/

You'll find documentation for the API here: http://omeka.readthedocs.io/en/latest/Reference/api/index.html

Our goal for this portion will be to use the documentation and what we've already learned to create items in Omeka representing each of the places in our dataframe.

Before we get started on that, we'll want to see how the API represents items, so we can copy that when creating new ones.

In [None]:
omeka_api_key = '11db6a2b70226f1c55b63a6df75e8093e9bcd01a' # We'll give you a key to use for the site

In [None]:
import requests  # skip this line if requests is already imported
R = requests.get('http://demo.omeka.fas.harvard.edu/api/items/378', params={'key':omeka_api_key})
demo_item = R.json()

In [None]:
demo_item

## Well that looks weird

There are a lot of different fields, and some of them have a few different layers. It's a lot less flat than our HAM response. Let's take a look at it in a different way...

In [None]:
for k,v in demo_item.items():
    print("{0: >18}:\t{1} type".format(k, type(v)))
    # I just went a bit wild with string formatting in the line above, documentation here:
    # https://docs.python.org/2/library/string.html#formatstrings
    # Basically, I right justified the first format target with spaces to a width of 18 characters
    # You don't need to know or ever use this, but I think it's cool and handy.

So, this thing has some basic properties, and something that sounds interesting: "element_texts". Each piece of metadata about an Item in Omeka is referred to as an "element", so this sounds promising. Since it's a list, let's look at the first item in that list.

In [None]:
demo_item['element_texts'][0]

### Element stuff

So there are some different parts of each element text, including the element, element set, text, and what looks like a boolean flag for whether or not this text should be rendered as HTML. That's kind of complicated and annoying. Let's think about this real hard for a little bit so we don't have to think about it again.
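
To make that structure easier to read, one option is to flatten each element text into a small tuple. The sketch below runs on a hand-made sample rather than the live response. The `name` keys inside `element` and `element_set` are assumptions about what the API returns, so check them against your own `demo_item` output.

```python
# Hand-made sample shaped like one Omeka item; the exact keys are assumptions.
sample_item = {
    'id': 378,
    'element_texts': [
        {
            'html': False,
            'text': 'A Test Item',
            'element_set': {'id': 1, 'name': 'Dublin Core'},
            'element': {'id': 50, 'name': 'Title'},
        },
        {
            'html': False,
            'text': 'A short description.',
            'element_set': {'id': 1, 'name': 'Dublin Core'},
            'element': {'id': 41, 'name': 'Description'},
        },
    ],
}

def summarize_element_texts(item):
    """Return (element id, element set name, element name, text) tuples for an item."""
    rows = []
    for element_text in item.get('element_texts', []):
        element = element_text.get('element', {})
        element_set = element_text.get('element_set', {})
        rows.append((
            element.get('id'),
            element_set.get('name'),
            element.get('name'),
            element_text.get('text'),
        ))
    return rows

for row in summarize_element_texts(sample_item):
    print(row)
```

The first value in each tuple is the element ID, which is the piece we need when building new items.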

## New function!

Since making items isn't exactly intuitive, let's make a quick function to construct items from dictionaries. We're relying on some things specific to this site, namely the IDs of each element, and for a more general solution we'd want to do something more nuanced than hard coding those IDs into our workflow. For now, though, this is a workable solution.

Also, you'll notice we're only specifying the element ID, because it turns out that's all you actually need when creating a new Item in Omeka.

In [None]:
def make_item(element_texts):
    """
    Takes a dictionary with format {element_id: element_text, ...}
    and returns a dictionary shaped like an Omeka item, ready to POST
    """
    base_item = {
        'element_texts':[],
        'featured': False,
        'public': True,
    }
    for _id, text in element_texts.items():
        element = {
            'element': { 'id': int(_id) },
            'text': text,
            'html': True
        }
        base_item['element_texts'].append(element)
    return base_item

In [None]:
test = {
    50: 'A Test Item',
    41: "The description of the test item. It might be a bit longer, which is fine since it won't be used as a page title or anything."
}
test_item = make_item(test)
print(test_item)

## POST new data

Now that we have content to upload, let's take a look at how we'll do that. You'll notice we're using a different method of sending data to the URL: we're POSTing data, which usually means we're adding something new. We can still use our `params` argument, but our data goes in the `json` argument.

The `requests` module offers `json` as a convenient parameter, so you don't have to turn your dictionary into a string yourself to use it as a data payload. Since this is such a common task, it's built into the method call, and we can pass the dictionary object we've created directly.
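
For a sense of what the `json` argument saves us, here is roughly the same request assembled by hand with only the standard library. This is a sketch of the idea, not a line-for-line account of what `requests` does internally: serialize the dictionary to a JSON string, encode it to bytes, and set the `Content-Type` header. The request is only built, never sent, and the key is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# A small item payload in the shape make_item produces
item = {
    'element_texts': [
        {'element': {'id': 50}, 'text': 'A Test Item', 'html': True},
    ],
    'featured': False,
    'public': True,
}

# Query parameters, like the `params` argument in requests
query = urllib.parse.urlencode({'key': 'YOUR_API_KEY'})  # placeholder key
url = 'http://demo.omeka.fas.harvard.edu/api/items?' + query

# The body, like the `json` argument in requests: dict -> JSON string -> bytes
body = json.dumps(item).encode('utf-8')

request = urllib.request.Request(
    url,
    data=body,
    headers={'Content-Type': 'application/json'},
    method='POST',
)

# Nothing is sent here; we only inspect the request we built
print(request.get_method())
print(request.full_url)
print(request.data.decode('utf-8'))
```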

We'll still get a response, but in this case, we'll get a representation of the item that we just created, as long as it was created successfully. We know this from the documentation, which tells us what response to expect from each kind of query we can send to the items API endpoint.

In [None]:
R = requests.post('http://demo.omeka.fas.harvard.edu/api/items',json=test_item, params={'key':omeka_api_key})

In [None]:
R.json()
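
If something goes wrong, `R.json()` will hold an error rather than a new item, so it can help to check the status code first. This is a minimal sketch, assuming a successful create returns a 2xx status and that errors come back as JSON with a `message` key (check the API documentation for the exact behavior). `FakeResponse` is a hypothetical stand-in so the example runs without touching the site; a real `requests` response has the same `status_code` attribute and `json()` method.

```python
class FakeResponse:
    """Hypothetical stand-in for a requests response, for demonstration only."""

    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


def new_item_id(response):
    """Return the ID of a newly created item, or raise if the request failed."""
    if not 200 <= response.status_code < 300:
        # Assumption: error responses carry a JSON body with a 'message' key
        message = response.json().get('message', 'no message given')
        raise RuntimeError(
            'Omeka returned {0}: {1}'.format(response.status_code, message)
        )
    return response.json()['id']


created = FakeResponse(201, {'id': 379, 'public': True})
print(new_item_id(created))

rejected = FakeResponse(403, {'message': 'Invalid key.'})
try:
    new_item_id(rejected)
except RuntimeError as error:
    print(error)
```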

## Functions in functions

We can make another function to take a dictionary that represents our item in a pretty convenient way and add that directly to Omeka. We're using the function that we made to create an item within this function, so we don't have to add that functionality to this function too. We might want to keep these functions separate, in case we want to use the `make_item` function on its own for some other purpose, like creating several items and then adding them all at once.

Each of the functions has a short string right after the definition, surrounded by triple quotes. This lets you have a multi-line string, and is almost always used the way we're using it here, to define a docstring for a function. This is basically a standard for the helper text you'll see with shift+tab on your functions. Try it when we use the functions!
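
Outside of a notebook, the same helper text is available through the function's `__doc__` attribute, or by calling the built-in `help()` on the function. A small illustration with a throwaway function (`greet` is made up for this example):

```python
import inspect

def greet(name):
    """
    Return a friendly greeting for name.

    The first line is a summary; further lines can add detail.
    """
    return 'Hello, {0}!'.format(name)

# The raw docstring, including its original indentation
print(greet.__doc__)

# inspect.getdoc cleans up the indentation for display
print(inspect.getdoc(greet))
```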

In [None]:
def add_item_to_omeka(element_texts):
    """Add an item to our Omeka site"""
    item = make_item(element_texts)
    R = requests.post('http://demo.omeka.fas.harvard.edu/api/items',json=item, params={'key':omeka_api_key})
    return R.json()

In [None]:
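def get_manifest(record):
    """Get a manifest from a HAM object record, or None if there isn't one"""
    try:
        for see in record['seeAlso']:
            if see['type'] == "IIIF Manifest":
                # Fetch and parse the manifest JSON from the linked URL
                return requests.get(see['id']).json()
    except (KeyError, TypeError, ValueError, requests.RequestException):
        # Missing or malformed fields, a failed request, or a non-JSON response
        pass
    return None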

In [None]:
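def get_canvas(manifest):
    """Get the first canvas from a IIIF manifest, or None if there isn't one"""
    try:
        return manifest['sequences'][0]['canvases'][0]
    except (KeyError, IndexError, TypeError):
        # TypeError covers manifest being None when no manifest was found
        return None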
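
These two helpers lean on the shape of the data: a HAM record lists related resources under `seeAlso`, and a IIIF manifest nests its canvases inside `sequences`. The sketch below walks a hand-made record and manifest with that shape, with no network calls. The sample values and URLs are invented, and `find_manifest_url` and `first_canvas` are illustrative names, not part of the workshop code.

```python
# Hand-made stand-ins; real records and manifests carry many more fields.
sample_record = {
    'title': 'Sample object',
    'seeAlso': [
        {'type': 'Other Link', 'id': 'https://example.org/other'},
        {'type': 'IIIF Manifest', 'id': 'https://example.org/manifest/1'},
    ],
}
sample_manifest = {
    'sequences': [
        {'canvases': [
            {'@id': 'https://example.org/canvas/1', 'label': 'Front'},
            {'@id': 'https://example.org/canvas/2', 'label': 'Back'},
        ]},
    ],
}

def find_manifest_url(record):
    """Return the URL of the record's IIIF manifest, or None if there isn't one."""
    for see in record.get('seeAlso') or []:
        if see.get('type') == 'IIIF Manifest':
            return see.get('id')
    return None

def first_canvas(manifest):
    """Return the first canvas of the first sequence, or None if it's missing."""
    try:
        return manifest['sequences'][0]['canvases'][0]
    except (KeyError, IndexError, TypeError):
        return None

print(find_manifest_url(sample_record))
print(first_canvas(sample_manifest)['label'])
print(first_canvas({'sequences': []}))
print(first_canvas(None))
```

Trying the helpers on empty or missing input, as in the last two lines, is a quick way to confirm that a record without images won't stop the whole upload.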

We need another library for the next function, so we're importing it here. Normally, it's a good practice to move any new libraries you find you'll need to the top of your script or notebook, but we're breaking rules* today!

**really more of best practices*

In [None]:
import json 

In [None]:
def record_to_omeka(record):
    """Take a Harvard Art Museums object record and add it to our Omeka site"""
    
    # Set up info we will add to the item
    title = record['title']
    description = record['description']
    manifest = get_manifest(record)
    canvas = get_canvas(manifest)
    
    # Set up element texts, using element IDs figured out from other item JSON representations
    element_texts = {
        '50': title,              # DC:Title
        '41': description,        # DC:Description
        '39': 'Your name here!',  # DC:Creator
        '56': json.dumps(canvas)  # IIIF:Json Data
    }
    
    # Add the item to Omeka
    added_item = add_item_to_omeka(element_texts)
    
    # If there's a manifest, there's a thumbnail to add to enable the image viewer
    if manifest is not None:
        
        # Get the thumbnail from the manifest as raw bytes
        thumbnail = requests.get(manifest['thumbnail']['@id']).content
        
        # Set up file data
        files = {
            "file":(manifest['thumbnail']['@id'], thumbnail, "application/octet-stream"),
            "data":(None, json.dumps({'item':{'id':added_item['id']}}))
        }
        
        # Post data to file endpoint of the API
        R = requests.post(
            "https://demo.omeka.fas.harvard.edu/api/files", 
            params={"key":omeka_api_key}, 
            files=files
        )
    
    # Return the JSON representation of the new item
    return added_item

In [None]:
added_item = record_to_omeka(unknown_records[7])
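
The `files` dictionary is the least obvious part of that function. In `requests`, each value can be a tuple of `(filename, content, content_type)`, and a filename of `None` sends the part as a plain form field. The sketch below only builds that dictionary so its shape is easy to inspect. `build_file_payload` is a hypothetical helper, and taking the filename from the last segment of the URL path is an assumption rather than something the API requires.

```python
import json
import posixpath
from urllib.parse import urlparse

def build_file_payload(item_id, file_url, content):
    """Build a `files` dictionary for attaching a file to an existing item."""
    # Use the last path segment as the filename (an assumption; adjust as needed)
    filename = posixpath.basename(urlparse(file_url).path) or 'upload'
    return {
        # (filename, raw bytes, content type)
        'file': (filename, content, 'application/octet-stream'),
        # (None, string): a plain form field naming the item that owns the file
        'data': (None, json.dumps({'item': {'id': item_id}})),
    }

payload = build_file_payload(
    item_id=379,
    file_url='https://example.org/images/thumb.jpg?width=150',
    content=b'not really an image',
)
print(payload['file'][0])
print(payload['data'][1])
```

A dictionary built this way could be passed as the `files` argument of `requests.post`, in place of the one assembled inline above.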

# Mischief Managed!

Now we have the tools at our disposal to take content from the Harvard Art Museums and put it into an Omeka site, or export it so we can analyze it somewhere else. We're ready for our big data heist!*

**Not a real heist, we are using freely available data that the museum has generously made available. Please do not steal any physical art.*