# Working with APIs in Python

Making API requests in Python can be really simple. The standard library includes a lower-level module called `urllib` that can also make the kinds of web requests that we want, but it's not as friendly as the `requests` module, which we'll be using.

In [2]:
import requests

## Authentication

You'll have to authenticate each request to GeoNames with your username. Other APIs may require a key or token of some kind, but GeoNames has some very simple authentication, which makes things easy for us. You can sign up for a username [here](http://www.geonames.org/login).

In [28]:
USERNAME = 'cdc43339' # Enter your own username here

## Query Parameters

We're going to be using the same base URL for all of our requests, but we're also going to want to use several different query parameters to search through the GeoNames database. We have a full list of available parameters [here](), but for now we're going to set up the simplest keyword search, using a Python dictionary to contain our parameter / value pairs.

In [31]:
url = "http://api.geonames.org/searchJSON"
Q = {
    'q':'Cambridge',
    'username':USERNAME
}

### Refresher on Dictionaries

Python dictionaries are sets of key / value pairs, where a value can be accessed by its key. You're essentially naming a value in a container, so you can easily call it up later.

Dictionaries have very fast lookups, so you can get a value from its key very quickly, no matter how large the dictionary is. Since Python 3.7 they also preserve the order in which keys were inserted, but it's still best to look values up by key rather than rely on their position.

We're just going to be looking up data in dictionaries, so here's a quick refresher on the syntax:

In [32]:
Q['q']

'Cambridge'

In [33]:
Q['username'] # This also works when we've set the value to another variable

'cdc43339'

In [34]:
Q['q'] = 'Somerville' # You can also set the value of a key like you would a variable
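
Two more lookups are handy when a key might be missing, which happens fairly often in API responses. This is a minimal sketch using a made-up dictionary named `place` rather than our `Q`:

```python
# A made-up record for illustration; not real GeoNames output.
place = {'name': 'Somerville', 'countryCode': 'US'}

# `in` checks whether a key exists without raising an error.
has_state = 'adminCode1' in place

# .get() returns a default instead of raising KeyError for a missing key.
state = place.get('adminCode1', 'unknown')

print(has_state, state)
```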

## Making a Request

We have a base url, and we have our query parameters in a dictionary, so now we can make our request, like so:

In [35]:
R = requests.get(url,params=Q)

### Formatted parameters

That call has returned a response object, which contains not only the data that we get from GeoNames, but also information on the request we sent, like the URL that it used. Notice that `requests` has turned our query parameter dictionary into a query string at the end of our URL.

If you've been working with API requests or web scraping before, you might be used to seeing URLs get constructed like this:

```python
url = "http://api.geonames.org/searchJSON?q=" + query + "&username=" + username
```

If you have, I'm sure you'll appreciate how much simpler this is, especially when dealing with more query parameters.

In [36]:
R.url

'http://api.geonames.org/searchJSON?q=Somerville&username=cdc43339'
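
To see roughly what is happening to the dictionary, here is a sketch using the standard library's `urllib.parse.urlencode`. It illustrates query-string encoding in general and is not the code that `requests` actually runs; `demo_user` is a placeholder username.

```python
from urllib.parse import urlencode

base_url = "http://api.geonames.org/searchJSON"
params = {'q': 'New Bedford', 'username': 'demo_user'}  # placeholder values

# urlencode joins key=value pairs with '&' and escapes unsafe characters,
# so the space in 'New Bedford' becomes '+'.
query_string = urlencode(params)
full_url = base_url + "?" + query_string
print(full_url)
```

Escaping is the main reason to avoid building URLs by string concatenation: a place name containing a space or an ampersand would otherwise break the URL.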

### Taking a look at the results

Response objects have a built-in method, `.json()`, which parses the JSON text received in the response into native Python data structures, like lists, dictionaries, numbers and strings. We can use this method to see a pretty friendly view of what we've gotten as a response to our request.

In [37]:
R.json()

{'geonames': [{'adminCode1': 'MA',
   'adminCodes1': {'ISO3166_2': 'MA'},
   'adminName1': 'Massachusetts',
   'countryCode': 'US',
   'countryId': '6252001',
   'countryName': 'United States',
   'fcl': 'P',
   'fclName': 'city, village,...',
   'fcode': 'PPL',
   'fcodeName': 'populated place',
   'geonameId': 4951257,
   'lat': '42.3876',
   'lng': '-71.0995',
   'name': 'Somerville',
   'population': 80318,
   'toponymName': 'Somerville'},
  {'adminName1': '',
   'countryCode': 'AQ',
   'countryId': '6697173',
   'countryName': 'Antarctica',
   'fcl': 'T',
   'fclName': 'mountain,hill,rock,... ',
   'fcode': 'ISL',
   'fcodeName': 'island',
   'geonameId': 6625691,
   'lat': '-65.37611',
   'lng': '-64.30702',
   'name': 'Somerville Island',
   'population': 0,
   'toponymName': 'Somerville, islote'},
  {'adminCode1': '07',
   'adminCodes1': {'ISO3166_2': 'VIC'},
   'adminName1': 'Victoria',
   'countryCode': 'AU',
   'countryId': '2077456',
   'countryName': 'Australia',
   'fcl':
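
The `.json()` method does roughly what the standard library's `json.loads` does to a string. Here is a small sketch with a hand-written JSON string, abbreviated and not a real response:

```python
import json

# A hand-written JSON string standing in for the body of a response.
raw_text = '{"geonames": [{"name": "Somerville", "population": 80318, "lat": "42.3876"}]}'

parsed = json.loads(raw_text)

print(type(parsed))              # the outer JSON object becomes a dict
print(type(parsed['geonames']))  # the JSON array becomes a list
print(parsed['geonames'][0]['population'] + 1)  # JSON numbers become Python numbers
```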

## Changing our request

Looking at the response, the Somerville in Massachusetts seems to be at the top of the pile. That's great for us, but sometimes you actually are looking for "Paris, TX" and not "Paris, France". In that case, you'll want to narrow your results. Let's try that with our request.

We can specify the country we're talking about, and we can also specify the state abbreviation under "adminCode1", which is a general purpose place to put abbreviations for things like states and provinces. Finally, we can specify the kind of place that we want with "featureCode" set to "PPL" for "populated place", so we don't get various entries like schools and economic regions.

All of these parameters are being drawn from the [GeoNames search API documentation](http://www.geonames.org/export/geonames-search.html), but it's also useful to look at the search results that you receive for filtering ideas. Usually, the information that tells you that a result is irrelevant to your query is information that you can put into a filter. That's how I found the "adminCode1" and "featureCode" parameters.

In [39]:
Q['country'] = 'US'
Q['adminCode1'] = 'MA' 
Q['featureCode'] = 'PPL' # These keys didn't exist until we set them to something! We can create keys this way.
Q

{'adminCode1': 'MA',
 'country': 'US',
 'featureCode': 'PPL',
 'q': 'Somerville',
 'username': 'cdc43339'}

In [41]:
R = requests.get(url,params=Q)

In [43]:
R.url

'http://api.geonames.org/searchJSON?q=Somerville&username=cdc43339&country=US&adminCode1=MA&featureCode=PPL'

In [44]:
R.json()

{'geonames': [{'adminCode1': 'MA',
   'adminCodes1': {'ISO3166_2': 'MA'},
   'adminName1': 'Massachusetts',
   'countryCode': 'US',
   'countryId': '6252001',
   'countryName': 'United States',
   'fcl': 'P',
   'fclName': 'city, village,...',
   'fcode': 'PPL',
   'fcodeName': 'populated place',
   'geonameId': 4951257,
   'lat': '42.3876',
   'lng': '-71.0995',
   'name': 'Somerville',
   'population': 80318,
   'toponymName': 'Somerville'},
  {'adminCode1': 'MA',
   'adminCodes1': {'ISO3166_2': 'MA'},
   'adminName1': 'Massachusetts',
   'countryCode': 'US',
   'countryId': '6252001',
   'countryName': 'United States',
   'fcl': 'P',
   'fclName': 'city, village,...',
   'fcode': 'PPL',
   'fcodeName': 'populated place',
   'geonameId': 4955085,
   'lat': '42.39593',
   'lng': '-71.12255',
   'name': 'West Somerville',
   'population': 0,
   'toponymName': 'West Somerville'},
  {'adminCode1': 'MA',
   'adminCodes1': {'ISO3166_2': 'MA'},
   'adminName1': 'Massachusetts',
   'countryC

## Try it out!

Now it's your turn. Try limiting your search to a particular part of the state of Massachusetts! You can always look back at the [API Documentation](http://www.geonames.org/export/geonames-search.html) for ideas.

*Hint: [this website](http://boundingbox.klokantech.com/) might be of assistance. I've found the DublinCore output to be pretty readable.*

In [45]:
Q

{'adminCode1': 'MA',
 'country': 'US',
 'featureCode': 'PPL',
 'q': 'Somerville',
 'username': 'cdc43339'}

## Working with our results

We can set up our results as their own object that we can work with. Don't worry, you're not re-requesting the data every time you work with the response object `R`. The online request gets made once, when the object is created, and from then on the response object is just information about that request stored in memory.

In [46]:
result = R.json()

In [47]:
type(result)

dict

In [48]:
result['geonames'][0] # Here is the first result, since list indexing starts counting from 0

{'adminCode1': 'MA',
 'adminCodes1': {'ISO3166_2': 'MA'},
 'adminName1': 'Massachusetts',
 'countryCode': 'US',
 'countryId': '6252001',
 'countryName': 'United States',
 'fcl': 'P',
 'fclName': 'city, village,...',
 'fcode': 'PPL',
 'fcodeName': 'populated place',
 'geonameId': 4951257,
 'lat': '42.3876',
 'lng': '-71.0995',
 'name': 'Somerville',
 'population': 80318,
 'toponymName': 'Somerville'}

In [None]:
print(result['geonames'][0]['lat'], result['geonames'][0]['lng']) # Prints the lat and the long from the first result
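
Note that GeoNames returns `lat` and `lng` as strings, as the quotes in the output above show. If you need numbers, say for plotting or distance calculations, you can convert them. Here is a sketch using a trimmed copy of the first result:

```python
# A trimmed copy of the first result shown above.
first = {'name': 'Somerville', 'lat': '42.3876', 'lng': '-71.0995'}

# float() turns the coordinate strings into numbers.
lat = float(first['lat'])
lng = float(first['lng'])

print(lat, lng)
```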

## Making a function

That's all well and good, but if we had to write all that code for every lat/long pair that we wanted to get, we wouldn't be saving ourselves any time. However, we can create a function to do our search for us, and return the information that we're interested in.

If you're still pretty new to Python, you may not have made many functions, so we'll go over the basics.

When you set up a function, you start with something like this:

```python
def somefunction(argument1, argument2, argument3=None):
```

This says "I'm making a function called `somefunction`, and when you run this function, you can specify some parameters that I'm going to call `argument1`, `argument2`, and `argument3`. Also, you don't have to specify `argument3`. It'll default to `None` if you don't specify it."

Within the context of your function, you can do whatever you want with those inputs. In our case, we're using them to make an API request. The code inside is up to you; the important thing is that you end with a `return` statement. That's what makes your function spit out something. In our case, we want the latitude and longitude back from our function.

You'll notice that we're returning two values, separated by a comma, but not enclosed by anything. When you do that in a return statement, Python interprets that as "return a tuple", which is a kind of container, very similar to a list, but not set up to be modifiable. We can get away with using this very simple data container, so we'll use it.

In [None]:
def search_us_place(placename,state_abbrev):
    url = "http://api.geonames.org/searchJSON"
    Q = {
        'q':placename,
        'country':'US',
        'username':USERNAME,
        'adminCode1':state_abbrev
    }
    results = requests.get(url,params=Q).json()
    top_result = results['geonames'][0]
    return top_result['lat'], top_result['lng']

In [None]:
test = search_us_place('Worcester','MA')
print(type(test))
print(test)
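
One thing this function doesn't handle is a search with no matches: `results['geonames']` would be an empty list, and indexing it with `[0]` would raise an `IndexError`. Below is one possible guard, written as a separate helper so you can try it without a network connection. The name `first_lat_lng` and the choice to return `None` are assumptions for this sketch, not part of the GeoNames API.

```python
def first_lat_lng(results):
    """Return (lat, lng) from the first match, or None if there are no matches."""
    matches = results.get('geonames', [])
    if not matches:
        return None
    top = matches[0]
    return top['lat'], top['lng']

# Hand-made stand-ins for parsed responses (placeholder coordinates).
found = {'geonames': [{'lat': '42.0', 'lng': '-71.0'}]}
empty = {'geonames': []}

# A returned tuple can be unpacked straight into two variables.
lat, lng = first_lat_lng(found)
print(lat, lng)
print(first_lat_lng(empty))
```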

## Try it out!

Try making your own function to get some data that is of interest to you. If you want an idea to implement, imagine that you have a lot of location names in English, and you want to get the Mandarin names. How would you write a function that returns those names from a search query in English?

Don't forget to check the [documentation](http://www.geonames.org/export/geonames-search.html), and you might have to do a bit of Googling.

In [None]:
def myfunc():
    return 0

## Applying our function to a dataset

Now that we have a function to give us lat/long pairs from a city name and state, we can actually apply it to a dataset. I've prepared a small dataset: a list of botanical gardens and arboretums in Massachusetts ([from Wikipedia](https://en.wikipedia.org/wiki/List_of_botanical_gardens_and_arboretums_in_Massachusetts)). 

The way I like to do that is with the Pandas module in Python. Pandas is a module that has a lot of tools for working with tabular data, and is a favorite among data scientists. We're using it today because it makes working with CSV files really really easy.

In [None]:
import pandas as pd

### Don't panic...

This cell is going to produce an error. Take a look at the error message and our csv file to see if you can figure out what the problem is. You can look ahead to the next cell for the fix, but that'd be cheating!

In [None]:
df = pd.read_csv('botanical_gardens.csv')
df.head()

In [None]:
df = pd.read_csv('botanical_gardens.csv',sep='\t') # Working file opener with separator as tabs
df.head()
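
If you're ever unsure what separates the columns in a file, you can try a couple of candidate delimiters on its first line. This is a sketch using the standard library's `csv` module on an in-memory sample; the two rows are invented for illustration and are not taken from `botanical_gardens.csv`, and `count_columns` is a made-up helper name.

```python
import csv
import io

# An invented two-column, tab-separated sample.
sample = "Name\tCity\nExample Garden\tBoston\n"

def count_columns(text, delimiter):
    """Parse the first row of text with the given delimiter and count its fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return len(next(reader))

print(count_columns(sample, ','))   # one big field: wrong delimiter
print(count_columns(sample, '\t'))  # two fields: right delimiter
```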

### Being nice to APIs

A lot of APIs, especially smaller APIs, aren't built to withstand a bunch of users hammering away at them as fast as their processors will allow. APIs like some of Google's have explicit rate limits, some of which are quite high, because the back ends are so robust. For our purposes, we'll go nice and easy with a 1 second delay between requests.

You'll also notice that there are some new print statements. If you're going to be running a function a whole bunch of times, it's good to get feedback on where it is in its loop, and so if things go wrong, you know where they went wrong.

In [None]:
import time

def search_us_place(placename,state_abbrev):
    url = "http://api.geonames.org/searchJSON"
    Q = {
        'q':placename,
        'country':'US',
        'username':USERNAME,
        'adminCode1':state_abbrev
    }
    print('making request for {}, {}'.format(placename,state_abbrev))
    R = requests.get(url,params=Q)
    print(R.url)
    results = R.json()
    top_result = results['geonames'][0]
    print('got result for {}, {}'.format(placename,state_abbrev))
    time.sleep(1)
    return top_result['lat'], top_result['lng']

In [None]:
search_us_place('Boston','MA') # Now we'll run this new function
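
Another way to be kind to an API is to avoid asking the same question twice. If several rows share a city, a cache means only the first lookup makes a request. Here is a sketch using `functools.lru_cache` from the standard library; `fake_lookup` is a stand-in that records calls instead of contacting GeoNames.

```python
from functools import lru_cache

calls = []  # records every time the function body actually runs

@lru_cache(maxsize=None)
def fake_lookup(placename, state_abbrev):
    """Stand-in for a real API call; logs each real invocation."""
    calls.append((placename, state_abbrev))
    return (placename, state_abbrev)  # placeholder return value

for city in ['Boston', 'Worcester', 'Boston', 'Boston']:
    fake_lookup(city, 'MA')

print(len(calls))  # only the distinct cities triggered a "request"
```

If you decorated the real `search_us_place` this way, repeated cities would also skip the `time.sleep(1)`, since a cached result is returned without running the function body.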

### Applying functions to our csv

Another nice thing about pandas is how efficiently the library applies functions to dataframes. We want to apply our function to the spreadsheet. Since all of our data is in the same state, we can send `'MA'` as the state abbreviation every time.

We can accomplish that with a lambda function, which is just an anonymous function to use in a place where you need a function, but you don't think you'll need to reuse that function. We're using it to apply a function with a consistent parameter, but you can use it for simpler things like adding a number or a prefix to a column.

We're setting the results of that function to the variable `latLngs`.

In [None]:
latLngs = df.City.apply(lambda x: search_us_place(x,'MA'))
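
If lambdas are new to you, this sketch shows the same idea on a plain list with `map`, alongside `functools.partial`, which is another way to fix one argument of a function ahead of time. `describe` is a made-up function standing in for `search_us_place`, and the `partial` version is an alternative pattern rather than what the cell above does.

```python
from functools import partial

def describe(placename, state_abbrev):
    """Made-up stand-in for a function that takes a place and a state."""
    return '{}, {}'.format(placename, state_abbrev)

cities = ['Boston', 'Worcester']

# A lambda wraps the two-argument function in a one-argument function.
with_lambda = list(map(lambda x: describe(x, 'MA'), cities))

# partial does the same by fixing state_abbrev ahead of time.
describe_ma = partial(describe, state_abbrev='MA')
with_partial = list(map(describe_ma, cities))

print(with_lambda)
print(with_partial)
```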

### What is our output?

If you look at our `latLngs` output, you'll notice that it's pretty similar to other columns in our existing dataframe. That's because it's the same sort of thing, except that our lat/longs are still tuples. We could have a column of tuples, but wouldn't it be nice to have two columns, one for latitude and one for longitude?

In [None]:
latLngs

In [None]:
df.City

### Zip up the results

You'll find a more detailed explanation of why this works at the bottom of this notebook, but for now, just know that this is a way to set multiple columns from output like what we have from applying our function.

In [None]:
df['lat'],df['lng'] = zip(*latLngs)

In [None]:
df.head()

## Saving the results

Saving a csv file is incredibly easy: you can do it with just a filename. However, it does add the index as an extra column of numbers, which we don't need in our dataset, so I'm choosing to omit that.

In [None]:
df.to_csv('botanical_gardens_with_location.csv',index=False) # index=False leaves out the index column

## Try it out!

Using the function that you made earlier, add a new column (or columns!) to our dataframe, then save it with an appropriate filename.

In [None]:
# Put your code here, adding more cells as needed

# Data enriched!

That's it, we've gone through the whole process of using an API to enrich data in python. To review, we:

* Explored the API we wanted to use
* Built a function to make a request we wanted
* Imported our csv data
* Ran our function on our dataset
* Saved our output

Whatever API you use to enrich a dataset, you can typically use some variation of this workflow. I recommend using a notebook like this one, especially for early explorations, so you can leave a record of your exploration process, and re-use portions of code elsewhere.

# Reverse the flow!

We can also use APIs to add data from our datasets to a website, too.

Here, we'll use the API for Omeka. Omeka is a content management system, like WordPress, but focused on making the collections of libraries, archives, and museums more easily accessible on the web. It's built around the concept of items, and focuses on describing those items, collecting them sensibly, and incorporating them into online narratives.

The site we'll be using is the site that we use for testing our Omeka service here: http://testing.omeka-dev.fas.harvard.edu/

You'll find documentation for the API here: http://omeka.readthedocs.io/en/latest/Reference/api/index.html

Our goal for this portion will be to use the documentation and what we've already learned to create items in Omeka representing each of the places in our dataframe.

Before we get started on that, we'll want to see how the API represents items, so we can copy that when creating new ones.

In [None]:
omeka_api_key = '' # We'll give you a key to use for the site

In [None]:
# We're not using the key yet, since we're just viewing public information
R = requests.get('http://demo.omeka.fas.harvard.edu/api/items/36')
demo_item = R.json()

In [None]:
demo_item

## New function!

Since making items isn't exactly intuitive, let's make a quick function to construct items from dictionaries. We're relying on some things specific to this site, namely the IDs of each element, and for a more general solution we'd want to do something more nuanced than hard-coding those IDs into our workflow. For now, though, this is a workable solution.

In [None]:
def make_item(element_texts):
    """
    Takes a dictionary with format {element_id:element_text, ...}
    """
    base_item = {
        'element_texts':[],
        'featured': False,
        'public': True,
    }
    for _id, text in element_texts.items():
        element = {
            'element': { 'id': int(_id) },
            'text': text,
            'html': True
        }
        base_item['element_texts'].append(element)
    return base_item

In [None]:
test = {
    50: 'A Test Item',
    41: "The description of the test item. It might be a bit longer, which is fine since it won't be used as a page title or anything."
}
test_item = make_item(test)
print(test_item)

## POST new data

Now that we have content to upload, let's take a look at how we'll do that. We're using a different method of sending data to the url, you'll notice. We're POSTing data, which usually means we're adding something new. We can still use our `params` argument, but our data is in our `json` argument.

The `requests` module has this as a convenient parameter, so you don't have to turn your dictionary into a string to use it as a data payload. Since this is such a common task, `requests` has built it into this method call so we can just use the dictionary object we've created.

We'll still get a response, but in this case, we'll get a representation of the item that we just created, as long as it was created successfully. We know this from the documentation, which tells us what response to expect from each kind of query we can send to the items API endpoint.

In [None]:
R = requests.post('http://demo.omeka.fas.harvard.edu/api/items',json=test_item, params={'key':omeka_api_key})

In [None]:
R.json()
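
For a sense of what the `json` argument saves you, this sketch serializes a small payload by hand with the standard library. When you pass `json=`, `requests` takes care of this step and also sets the `Content-Type: application/json` header; the payload here is a made-up example in the shape that `make_item` produces.

```python
import json

# A made-up payload in the same shape that make_item() returns.
payload = {
    'element_texts': [
        {'element': {'id': 50}, 'text': 'A Test Item', 'html': True}
    ],
    'featured': False,
    'public': True,
}

# json.dumps turns the dictionary into the text that travels in the request body.
body = json.dumps(payload)
print(type(body))
print(body)

# Parsing the text again recovers the same structure.
round_trip = json.loads(body)
```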

## Functions in functions

We can make another function to take a dictionary that represents our item in a pretty convenient way and add that directly to Omeka. We're using the function that we made to create an item within this function, so we don't have to add that functionality to this function too. We might want to keep these functions separate, in case we want to use the `make_item` function on its own for some other purpose, like creating several items and then adding them all at once.

The second function is designed to go from a single row from our spreadsheet directly to the creation of an Item in Omeka. You can change the items that you add by changing this function. In fact, I encourage you to do so, since we'll all be adding items, and it'll be hard to tell which ones are yours. 

In [None]:
def add_item_to_omeka(element_texts):
    item = make_item(element_texts)
    R = requests.post('http://demo.omeka.fas.harvard.edu/api/items',json=item, params={'key':omeka_api_key})
    return R.json()

In [None]:
def row_to_omeka(row):
    title = row['Name']
    description = "Located in {}, MA ({},{})".format(row['City'],row['lat'],row['lng'])
    element_texts = {
        '50': title,
        '41': description,
        '39': 'Your name here!'
    }
    response = add_item_to_omeka(element_texts) # Pass the dictionary itself; ** would try to unpack it into keyword arguments
    return response

In [None]:
df.apply(row_to_omeka,axis=1)

# Addendum: The zip() function

In case you're curious, this is how that funky line we used to assign multiple columns worked:

```python
df['lat'],df['lng'] = zip(*latLngs)
```

The `zip()` function returns an iterator (a `zip` object), which is a Python object that won't show you its contents when printed, but instead produces them one at a time as you loop over it. The `range()` function is similarly lazy about producing its values.

In [None]:
print(zip([1,2],[3,4],[5,6],[7,8]))

So, to look inside, we have to iterate through and print the contents. Notice that I'm providing these lists as individual arguments. Zip takes as many arguments as you care to give it, through the `*args` syntax in its function definition. This means that there will be a tuple `args` in the function that something happens to. In this case, we're taking the first value of each list and making one tuple from those values, then taking the second value of each list and making another tuple. This will work no matter how many items are in each list, as long as the lists are the same length; if they aren't, `zip()` stops at the end of the shortest one.

In [None]:
for z in zip([1,2],[3,4],[5,6],[7,8]):
    print(z)

You may be thinking "Great! I have a list of lists, I'll just put that in as my argument!"

Let's see what happens when you do that:

In [None]:
test = [[1,2],[3,4],[5,6],[7,8]]
for z in zip(test):
    print(z)

That doesn't look like what we want. What we did in that line was to pass the whole list as a single argument, when what we wanted was to pass each list item as its own argument, like this:

In [None]:
for z in zip(test[0],test[1],test[2],test[3]):
    print(z)

However, that would be a lot of typing for a long list. Fortunately, we can prefix our list with `*` to do just what we want: it unpacks the list so that each item is passed to the function as its own argument.

In [None]:
for z in zip(*test):
    print(z)

But how does that help our problem? Well, our collection of lat/long coordinates is iterable in the same way that a list is, so in this case we can use it in the same way. This next block is just a demonstration of that iterability.

In [None]:
for ll in latLngs:
    print(ll)

And now, we have all of the pieces together, so we can see that the output of our `zip()` function is in fact two tuples, each with one of our desired columns of data. Since they're each the same length as our dataframe (being derived from it), we can turn them into new columns just by setting the new columns equal to them.

In [None]:
for z in zip(*latLngs):
    print(z)
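
Putting it together without pandas: here is a sketch with a plain list of made-up coordinate pairs, showing that `zip(*pairs)` yields one tuple per column, which can be unpacked into two names just like the dataframe assignment.

```python
# Made-up (lat, lng) pairs standing in for latLngs.
pairs = [('42.0', '-71.0'), ('42.1', '-71.1'), ('42.2', '-71.2')]

# Unpacking the pairs into zip regroups the values by position.
lats, lngs = zip(*pairs)

print(lats)
print(lngs)
```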