# API Documentation

## 1. Introduction

### 1.1. Installation of Python3

In this tutorial, I use examples to show how to use Python to retrieve information from the [UMedia](https://umedia.lib.umn.edu/) site. The examples were written with Python3 (more specifically, version 3.7.4).

The official website of Python3 is: https://www.python.org/downloads/.

This third-party guide explains how to install Python3 on different operating systems: https://realpython.com/installing-python/.

### 1.2. Required packages

Once the environment is set up, we import the two packages this tutorial needs:

The package named 'json' is a Python built-in module that works as a JSON encoder and decoder. With this module, users can both convert Python data into JSON text and parse JSON text back into Python data.

The package called 'requests' is a third-party Python module for sending all kinds of HTTP requests. It is not part of the standard library, so it may need to be installed first (for example with `pip install requests`). It is an easy-to-use library with many features, ranging from passing parameters in URLs to sending custom headers and SSL verification. In this tutorial, we use it to send simple HTTP requests.

In [2]:
import requests
import json

## 2. Formatting the URLs

Creating the URLs is an important step in using the API to download information from UMedia, because every request to the website is made through a URL. If the API we discuss later is a car, then the URL is its steering wheel: without it, the API would never know which direction to go.

We have a separate tutorial on how to format URLs for UMedia content, with tools, examples, and tips. See the following two pages:

https://github.com/liu00222/api_document/blob/master/json_umedia/json_umedia.md
https://github.com/liu00222/api_document/blob/master/IIIF_umedia/construct_iiif_umedia.md

## 3. Using the Packages

### 3.1. Download json from UMedia

In this part, we focus on how to use the relevant packages and their built-in functions to download data from UMedia. As an example, we use a UMedia search restricted to the collection Digitizing Immigrant Letters.

As described in part 2, we build the URL and store it in a variable called "my_url". Note that Python concatenates strings with the "+" operator.

In [3]:
base_url = "https://umedia.lib.umn.edu/search.json?"
my_filter = "facets%5Bcollection_name_s"
my_key = "%5D%5B%5D=Digitizing+Immigrant+Letters"

my_url = base_url + my_filter + my_key

Also notice that, by default, the URL we just created returns only the first 20 items that are most relevant to the search. In other words, if we use the URL exactly as formed above, we download only the first twenty items.

To download more items, we can append a "rows" parameter to the end of the URL. For example, to request 50 items:

In [4]:
my_item_num = "&rows=50"

my_url_50 = my_url + my_item_num
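
The percent-encoded pieces above can also be generated rather than typed by hand. The following is a minimal sketch using the standard library's `urllib.parse.urlencode`; the helper name `build_search_url` is hypothetical and not part of any UMedia tooling, and the facet and `rows` parameter names are simply the ones used in the cells above.

```python
from urllib.parse import urlencode

BASE_URL = "https://umedia.lib.umn.edu/search.json"


def build_search_url(collection, rows=None):
    """Build a UMedia search URL for one collection (hypothetical helper)."""
    # urlencode percent-encodes the square brackets and turns spaces into "+".
    params = [("facets[collection_name_s][]", collection)]
    if rows is not None:
        params.append(("rows", rows))
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("Digitizing Immigrant Letters"))
print(build_search_url("Digitizing Immigrant Letters", rows=50))
```

The first call produces the same string as "my_url", and the second the same string as "my_url_50".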

To keep things simple, we will keep using the 20-item URL (the variable 'my_url') in this documentation. After creating the URL, we can send the request to download the data. This is done with the "get()" function in the "requests" package:

In [5]:
r = requests.get(my_url)

The variable "r" that receives the result is a response object. Its "text" attribute holds the body of the response as one raw JSON string, which is hard for a human to read and cannot be indexed like a Python list or dictionary. This is where the imported "json" package comes in. We can use its "loads" function to parse the string into Python objects:

In [6]:
data = json.loads(r.text)
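
A request can fail (for example, because of a mistyped URL), in which case the body may not be valid JSON. Below is a minimal sketch of checking the status code before parsing. The function name `parse_items` is hypothetical, and the response object is a made-up stand-in with the same `status_code` and `text` attributes as a real `requests` response, so the example runs without a network connection.

```python
import json
from types import SimpleNamespace


def parse_items(response):
    """Return the parsed JSON body, or raise if the request did not succeed."""
    if response.status_code != 200:
        raise RuntimeError(f"Request failed with status {response.status_code}")
    return json.loads(response.text)


# Stand-in for the object returned by requests.get(); the JSON text is made up.
fake_response = SimpleNamespace(
    status_code=200,
    text='[{"id": "a1", "title": "First letter"}, {"id": "b2", "title": "Second letter"}]',
)

items = parse_items(fake_response)
print(type(items).__name__, len(items))
print(items[0]["title"])
```

With a real response, the call would be `parse_items(r)`. The "requests" package also provides `r.raise_for_status()` and `r.json()`, which cover the same two steps.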

Now all the information is stored in the variable called "data", which is a list of dictionaries. This may be a little confusing. What does the content of "data" actually look like? Let's explore it before working with it:

Let's first check how many items are stored in 'data'. This can be done with the 'len' function, whose name is short for 'length':

In [9]:
print(len(data))

20


To access the first item in 'data', we use: 

In [1]:
data[0]

This displays the whole dictionary for the first item (the output is long, so it is omitted here).

Why do we use the number zero to get the first item? This is due to how indexing works in Python (and in many other programming languages): the index of a list starts from 0.

This leads to another thing to keep in mind. Suppose we have a list with 20 items; then we can access each item by list\[0\], ..., list\[19\]. Notice that we can't put '20' in the brackets, because that would refer to the 21st item! Since there are only 20 items in the list, Python raises an error, as in the following:

In [13]:
# This works: index 19 is the 20th (last) item
data[19]

# This raises an IndexError: there is no 21st item
data[20]

IndexError: list index out of range
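
If the number of items is not known in advance, the error can be handled instead of letting it stop the program. This is a small sketch with a made-up three-item list standing in for "data"; the helper name `get_item` is hypothetical.

```python
def get_item(items, index):
    """Return items[index], or None when the index is out of range."""
    try:
        return items[index]
    except IndexError:
        return None


letters = ["first", "second", "third"]  # made-up stand-in for "data"

print(get_item(letters, 0))                 # first item
print(get_item(letters, len(letters) - 1))  # last valid index
print(get_item(letters, len(letters)))      # one past the end, so None
```

The last valid index is always `len(items) - 1`, which is why `data[19]` works for 20 items and `data[20]` does not.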

The next question is: what information exactly is stored in each item?

To answer this question, we need to look inside the inner dictionary. We can use the following code to store the keys of the dictionary in a variable called "keys" and print them out with a for-loop:

In [14]:
keys = list(data[0].keys())

for key in keys:
    print(key)

id
object
set_spec
collection_name
collection_name_s
collection_description
title
title_s
title_t
title_search
title_sort
description
date_created
date_created_ss
date_created_sort
creator
creator_ss
creator_sort
contributor
contributor_ss
notes
types
format
format_name
dimensions
subject
subject_ss
language
city
state
country
continent
parent_collection
parent_collection_name
contributing_organization
contributing_organization_name
contributing_organization_name_s
contact_information
local_identifier
dls_identifier
persistent_url
local_rights
page_count
record_type
first_viewer_type
viewer_type
attachment
document_type
featured_collection_order
date_added
date_added_sort
transcription
_version_
type
collection
is_compound
parent_id
thumb_url
thumb_cdn_url


From the results above, we can see many keys that we can use to get the relevant information, such as the title, creator, transcription, page count, type, and date added. Here is an example of printing the title and the transcription of the first item:

In [15]:
print(data[0]["title"])

Alessandro Sisca (Riccardo Cordiferro) - letter to Lucia Fazio, 1900-10-18


In [16]:
print(data[0]["transcription"])

Boston, Mass., 18 Ottobre 1900 Carissima Lucia, Sai perchè t’accludo questa lettera di Falcidia? Per dimostrarti coi fatti che io ho scritto a Grella, prima pre- gandolo, [poscia?] insultando, per farti toccare con mano che io non mi ero mica messo d’accordo con lui onde abbandonarti, per farti in- fire ancora una volta palese la mia innocenza. E in tutte le nostre quistioni, assieme sempre così, sap- pilo. Ma quando ti correggerai? Quando muterai sistema? Intanto ieri scrissi a Grella un’altra lettera dicendogliene di cotte e di crude. Io ti consiglio per il tuo bene e per la mia pace di essere in av- venire un pochino più seria. Se Grella ti sembra un cafone, Fal- cidia un uomo tozzo, [Pascucci?] un delinguente, Mignone un guar- diano di porci, Marziale un ricattaro[?] ecc., ecc., ti potresti però anche sba- gliare e non dovresti esser così leggiera sul dar dei giudizii. Tanto meno avresti dovuto dire che io mi ero abbaccato[?] con Grella, e che so io. Prima di scrivere e di parlare 
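
Indexing a dictionary with a key it does not contain raises a `KeyError`. Assuming that some records may lack a "transcription" key (this tutorial has not checked every item), the dictionary method `get` returns a default value instead. The record below is made up for illustration.

```python
item = {"title": "Sample letter", "page_count": 2}  # made-up record without a transcription

# get() returns the second argument when the key is missing.
transcription = item.get("transcription", "")

if transcription:
    print(transcription)
else:
    print("No transcription for: " + item["title"])
```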

To print out the titles of all 20 items we grabbed from UMedia, we can use a for-loop: 

In [None]:
for i in range(len(data)):
    print(data[i]["title"])
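
The loop above walks through the list by index. Python can also iterate over the items directly, which is often easier to read. Here is a short sketch with made-up records standing in for "data":

```python
sample_data = [
    {"title": "Letter A"},
    {"title": "Letter B"},
    {"title": "Letter C"},
]  # made-up stand-in for "data"

# Collect all titles into a new list.
titles = [item["title"] for item in sample_data]

# enumerate() supplies a running number alongside each title.
for number, title in enumerate(titles, start=1):
    print(f"{number}. {title}")
```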

Now we can work with our "data" variable to get the information we need. As an example, I will answer the following question: how can I store the transcripts of all the items I just got from UMedia on my local machine?

In Python3, we can use the "open()" function to create a text file. Here we pass it two string arguments. The first specifies the name of the file. The second is the mode, which specifies what we want to do with the file. To write to the file, we use the mode "w+", which opens the file for both writing and reading, creates it if it does not exist, and empties it if it does. More about the "open()" function and its usage can be found here: 
https://docs.python.org/3/library/functions.html. 

The following line of code does three things: 

1. create a file object called "f"
2. create a text file called "letters.txt" on the local computer (or empty it if it already exists)
3. specify that what we need to do with this file is to write things into it 

In [24]:
f = open("letters.txt", "w+")

After opening (or creating) the file, we can write to it with the "write()" method of the file object. To answer the question above, we use a for-loop to write the title and transcript of each item into the file:

In [25]:
for i in range(len(data)):
    f.write(data[i]["title"] + "\n" + data[i]["transcription"] + "\n\n")

Now if you check your local machine, there will be a file called "letters.txt" in the same directory as the code that you just ran. Once the file has been closed in the next step, you can open it and see the title along with the transcript of each item.

Remember to close every file you open. Closing flushes any text still held in the write buffer to disk and releases the file handle back to the operating system. The code to do this is:

In [26]:
f.close()
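
An alternative that cannot forget the closing step is a "with" block, which closes the file automatically when the block ends. The sketch below uses made-up records and writes to the system's temporary directory so that it runs on its own. The function name `save_transcripts` is hypothetical, and passing `encoding="utf-8"` is a suggestion (the transcripts contain accented characters), not something the original cells do.

```python
import os
import tempfile

sample_data = [
    {"title": "Letter A", "transcription": "Sample text."},
    {"title": "Letter B"},  # no transcription in this made-up record
]


def save_transcripts(items, path):
    """Write each title and transcription to a UTF-8 text file; return the item count."""
    with open(path, "w", encoding="utf-8") as f:
        for item in items:
            f.write(item["title"] + "\n" + item.get("transcription", "") + "\n\n")
    # The file is already closed here, even if an error occurred inside the block.
    return len(items)


output_path = os.path.join(tempfile.gettempdir(), "letters_sample.txt")
written = save_transcripts(sample_data, output_path)
print(written, "items written to", output_path)
```

With the real data, the call would be `save_transcripts(data, "letters.txt")`.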

### 3.2. Download Objects from UMedia

In this part, we will talk about how to use the API in Python to download objects from UMedia. We will mainly focus on how to download static images in this documentation. 

To download objects such as images, we need another built-in Python module called "shutil". This module provides many high-level operations on files and collections of files, in particular functions that support file copying and removal. For more details on this module, see: 
https://docs.python.org/3/library/shutil.html

Run the following code first before starting: 

In [17]:
import shutil

Then, as in the last part, we need to form the URL for the image we want to download. As an example, we use the image introduced in the [How to construct IIIF calls to UMedia](https://github.com/liu00222/api_document/blob/master/IIIF_umedia/construct_iiif_umedia.md) documentation: 

In [None]:
image_url = "https://cdm16022.contentdm.oclc.org/digital/iiif/p16022coll208/4833/full/full/0/default.jpg"

Next, we send a request for the image URL. 

In [None]:
resp = requests.get(image_url, stream = True)

Notice that, unlike the request for the JSON URL in the previous part, here we set stream to 'True'. With streaming enabled, "requests" does not download the whole body immediately; the binary image data stays available through the raw stream ("resp.raw"), which we copy into a file below. 

Then, we open a local file to store the image that we grabbed: 

In [None]:
local_file = open('local_image.jpg', 'wb')

Also notice that the second argument of the "open" function is 'wb' instead of 'w+'. This is because we need to write binary data into the file, which is indicated by the 'b' in 'wb'. 

Before writing the data into the 'local_image.jpg' file we just created, we set decode_content to True. This tells the raw stream to undo any compression (such as gzip) that the server applied to the response; without it, the saved file could contain compressed bytes that an image viewer cannot open. 

In [None]:
resp.raw.decode_content = True

Now we can write the image data into the file. This is done in one line of code, using the 'copyfileobj' function in the shutil module. Afterwards, we close the local file so that all the data is written to disk, and delete the response object: 

In [None]:
# write the data into the local_file
shutil.copyfileobj(resp.raw, local_file)

# close the file so all the data is written to disk
local_file.close()

# remove the image response object
del resp
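
The copy step can be tried without a network connection, because 'copyfileobj' accepts any binary file-like object as its source. In this sketch an in-memory `io.BytesIO` stands in for "resp.raw"; the bytes are made up and do not form a valid image, and the helper name `save_stream` is hypothetical.

```python
import io
import os
import shutil
import tempfile


def save_stream(source, path):
    """Copy a binary file-like object to a local file and return the file size in bytes."""
    with open(path, "wb") as local_file:
        shutil.copyfileobj(source, local_file)
    return os.path.getsize(path)


# Stand-in for resp.raw: any object whose read() method returns bytes works.
fake_raw = io.BytesIO(b"\xff\xd8\xff" + b"\x00" * 100)

target = os.path.join(tempfile.gettempdir(), "local_image_sample.jpg")
print(save_stream(fake_raw, target), "bytes written")
```

With a real response, the call would be `save_stream(resp.raw, "local_image.jpg")`, after setting `resp.raw.decode_content = True` as described above.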

After running all the lines of code above, you will find the image called "local_image.jpg" in the same directory as the code. Now you know how to download images from UMedia. 

This is the end of the tutorial. Thank you for reading! 