# Lab 7

## APIs

In the previous lab we saw how you can use the web as a source of data, even when the information is not published for the purpose of data analysis (i.e., it is there for you to read in a browser).

Sometimes, though, you are lucky and the data source you're after will have an **API** set up so that you don't need to scrape it. An **API** is a detailed and rigorous set of rules that you must use in order to get something from a server.

In particular, many APIs on the web give you data back if you write the _right_ URL.

Consider searching "something" on http://www.duckduckgo.com. If you take a look at the result page, you will see that the url is something like:  
https://duckduckgo.com/?q=something&t=h_&ia=about

The part before the `?` is the _base url_, the human-readable name of the _server_ that is giving you the data. What comes after the `?` defines the various arguments of the duckduckgo API. `q` probably stands for "query", and after the `=` is where you put the terms you are looking for; `&` separates the arguments; `t` and `ia` define other aspects of the behaviour of the search engine.

You can pass more arguments and change the behaviour of the server. For example, in duckduckgo we can append `format=xml` to get the data in an XML format (instead of the HTML format with all the fancy visualization stuff). We may want to do so because the XML format **is** intended for programmatic data extraction, and that is exactly what we are trying to do. Try to browse to:

https://duckduckgo.com/?q=something&t=h_&ia=about&format=xml
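The same URL can be assembled and taken apart in R rather than typed by hand. This is a minimal sketch using `httr`'s `modify_url()` and `parse_url()` (the package is introduced further down); it only manipulates the URL and does not contact the server, and the argument values are just the ones from the example above.

```r
library(httr)

# Build the URL from its parts: base url plus a named list of arguments.
url <- modify_url(
  "https://duckduckgo.com/",
  query = list(q = "something", format = "xml")
)
url

# Split it again: the server name and the arguments come back separately.
parts <- parse_url(url)
parts$hostname
parts$query
```

Keeping the arguments in a named list means that changing a query is a matter of changing one list element, not editing a long string.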

Try a couple of other websites: you will notice that the `?argument=value` format is very common. Websites offering an API for accessing their data often provide a lot of documentation about how to use it.

For example, duckduckgo's API is explained here: https://duckduckgo.com/api

A great resource for learning what APIs are is https://zapier.com/learn/apis/.

### httr

The tool of the trade for interacting with APIs in R is the library `httr`. Get familiar with it by reading the [introduction](http://httr.r-lib.org/articles/quickstart.html). If you are unfamiliar with how HTTP works (the underlying network protocol that rules the web), also read the two resources suggested at the start of the introduction. Or maybe ask a peer to explain it to you a bit.

### API in the zoo

The website [numbersapi](http://www.numbersapi.com) offers a fun example to try out a RESTful API. (You can use *different* APIs, take a look at [programmableweb](http://programmableweb.com) for a selection of available ones, and I actually encourage you to choose a different example if you are already familiar with the concept.)

Using some tool to deal with strings (I like `glue`, but you can do this stuff with the base `paste` if you are more comfy) write some examples of interaction with numbersapi (tl;dr, write some query strings and feed them to `httr`'s `GET`: e.g., `GET("http://www.colourlovers.com/api/color/6B4106?format=xml")`).
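As a starting point, here is a minimal sketch of building numbersapi query strings with base R. The variable names are only illustrative; the URL pattern `[number]/[type]` is the one numbersapi uses in the taxi-plate example later in this lab.

```r
base_url <- "http://numbersapi.com"
number <- 1729

# Base R: join the pieces with "/" as the separator.
math_query <- paste(base_url, number, "math", sep = "/")

# sprintf() works too and keeps the URL template readable.
trivia_query <- sprintf("%s/%d/trivia", base_url, number)

# With glue the first one would read: glue::glue("{base_url}/{number}/math")
math_query
trivia_query
```

Either string can then be passed to `httr::GET()` to actually run the query.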

_kudos if you use an API that allows you to POST, PUT, DELETE, ... instead of just GETting._

```r
library(httr)
test <- GET("http://www.colourlovers.com/api/color/6B4106?format=xml")
test
```

```
Response [http://www.colourlovers.com/api/color/6B4106?format=xml]
  Date: 2022-10-08 06:28
  Status: 403
  Content-Type: text/html; charset=UTF-8
  Size: 7.09 kB
<!DOCTYPE html>
<html lang="en-US">
<head>
    <title>Just a moment...</title>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=Edge" />
    <meta name="robots" content="noindex,nofollow" />
    <meta name="viewport" content="width=device-width,initial-scale=1" />
    <link href="/cdn-cgi/styles/challenges.css" rel="stylesheet" />
    
...
```
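Note the `Status: 403` in the response above: the server refused the request and sent back an HTML "Just a moment..." page instead of the XML we asked for. It is worth checking the status before touching the body. The helper below is a hypothetical sketch (the name `status_category()` is ours, not part of `httr`) that sorts a status code into the standard HTTP classes.

```r
# Hypothetical helper: map an HTTP status code to its coarse class.
status_category <- function(code) {
  if (code >= 200 && code < 300) {
    "success"
  } else if (code >= 300 && code < 400) {
    "redirection"
  } else if (code >= 400 && code < 500) {
    "client error"
  } else if (code >= 500 && code < 600) {
    "server error"
  } else {
    "other"
  }
}

# The request above came back with status 403.
status_category(403)

# With a live response object you would write:
# status_category(httr::status_code(test))
```

`httr` also ships `http_error()` and `stop_for_status()`, which cover the common case of "stop if anything went wrong".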

### API in the zoo but tamed

Dealing with strings is not ideal: you need to tweak them every time you want to perform a different query, and that opens the door to errors. Also, our credo is that every time we have to repeat some task, it's better to write a function for it.

Thus, write wrapper functions to perform the queries you wrote in the previous exercise. Try and write both very specific, _atomic_, functions that do just one very specific thing, and some more general function that can do more than one thing combining the more atomic functions together.

For example, if you are using the numbersapi, write both something like `get_integer_math()` that only allows you to query `[integer]/math` (e.g., Ramanujan's taxi plate [1729](http://numbersapi.com/1729/math)); and something like `get_number_type()` which allows for different type specifications (trivia, math, date, or year).

_kudos if these functions check their inputs (e.g., `get_integer_math` checks whether the input is indeed an integer and returns an error otherwise) and handle any errors raised by the APIs._
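One possible shape for the atomic function, as a sketch rather than a model solution. The helpers `is_whole_number()` and `integer_math_url()` are hypothetical names introduced here; the `httr` calls (`GET()`, `stop_for_status()`, `content()`) are the real ones. The input check runs before any network access, so bad input fails fast.

```r
# Hypothetical helper: TRUE only for a single, finite, whole number.
is_whole_number <- function(x) {
  is.numeric(x) && length(x) == 1 && is.finite(x) && x == round(x)
}

# Build the URL only; no network access, so it is easy to check by eye.
integer_math_url <- function(number) {
  if (!is_whole_number(number)) {
    stop("`number` must be a single whole number, e.g. 1729.", call. = FALSE)
  }
  # format() avoids scientific notation such as "1e+05" in the URL.
  paste("http://numbersapi.com", format(number, scientific = FALSE),
        "math", sep = "/")
}

# Atomic wrapper: query [integer]/math and return the body as text.
get_integer_math <- function(number) {
  response <- httr::GET(integer_math_url(number))
  httr::stop_for_status(response)  # turn HTTP errors (4xx, 5xx) into R errors
  httr::content(response, as = "text", encoding = "UTF-8")
}

integer_math_url(1729)
```

A more general `get_number_type()` could reuse the same check and add a `type` argument validated with `match.arg()`.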


### API authorized

Not all APIs are free and completely open to everybody. Some of them require an authentication/authorization step: the server that is responding to your query wants to know who you are, because certain services are available only to some users.

A common method of authentication is called `OAuth` or `OAuth2.0` (if you are very curious, see [here](https://en.wikipedia.org/wiki/OAuth)). `httr` has functions to create and handle OAuth tokens. See this [paragraph](http://httr.r-lib.org/articles/api-packages.html#authentication) or read more [here](https://support.rstudio.com/hc/en-us/articles/217952868-Generating-OAuth-tokens-for-a-server-using-httr).

Register on https://developer.twitter.com/, obtain the _secrets_, register an app (it does not matter which website you provide) and write the code to connect to it (see [here](https://developer.twitter.com/) for more details).
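A rough sketch of what the connection code might look like with `httr`, assuming the app gives you an OAuth 1.0 key and secret (check the developer site for what your app actually requires). The environment variable names `TWITTER_KEY` and `TWITTER_SECRET` and the helper `read_secret()` are our own assumptions; `oauth_app()`, `oauth_endpoints()` and `oauth1.0_token()` are `httr` functions.

```r
# Hypothetical helper: read a credential from an environment variable, so
# that secrets never end up written in the script itself.
read_secret <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set.", call. = FALSE)
  }
  value
}

# Sketch: create an OAuth 1.0 token for the app you registered.
# Calling this opens a browser window to authorize the app.
connect_twitter <- function() {
  app <- httr::oauth_app(
    "twitter",
    key = read_secret("TWITTER_KEY"),       # assumed variable name
    secret = read_secret("TWITTER_SECRET")  # assumed variable name
  )
  httr::oauth1.0_token(httr::oauth_endpoints("twitter"), app)
}

# Usage (not run here, needs real credentials):
# token <- connect_twitter()
# httr::GET(some_endpoint_url, httr::config(token = token))
```

Whatever you do, do not paste the secrets into a script that you commit or share.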

### API in the wild

Sometimes you are fortunate enough that the information you are looking for is provided by a website through an API. This is a somewhat open-ended exercise, the first one you encounter in these labs, but now you are grown up. We ask you to _pull_ data from an API of your choice, do some simple wrangling (and some super simple analysis if you really want) and visualize your result.

The focus of the exercise is on the interaction with the API you chose, not so much the visualization. We would like to see that you did some complex query in a programmatic fashion (that is, not by hand-writing the query but using a function to do that for you).
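To make "programmatic" concrete, here is a minimal sketch in base R: a function turns a named list of parameters into a query URL, and a loop generates one URL per search term. The endpoint `https://example.org/api/search` and its parameters are placeholders, not a real API; swap in the base url and arguments of the API you chose.

```r
# Hypothetical helper: turn a named list of parameters into a full query URL.
build_query_url <- function(base_url, params) {
  stopifnot(is.list(params), !is.null(names(params)), all(nzchar(names(params))))
  # Percent-encode each value so that spaces and symbols survive the trip.
  encoded <- vapply(
    params,
    function(value) utils::URLencode(as.character(value), reserved = TRUE),
    character(1)
  )
  paste0(base_url, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

# One URL per search term, generated rather than typed by hand.
terms <- c("kiwi", "tui", "new zealand")
urls <- vapply(
  terms,
  function(term) {
    build_query_url("https://example.org/api/search",  # placeholder endpoint
                    list(q = term, format = "json"))
  },
  character(1),
  USE.NAMES = FALSE
)
urls
```

In practice `httr::GET(url, query = params)` does the encoding for you; the point here is that the queries come out of a function and a vector of inputs.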

Some possible APIs you can use are (in order of what-Giulio-likes):

* digitalnz [API](https://digitalnz.org/developers): get cultural, education and gov data from NZ. It contains many different APIs. Some of them require authentication and/or subscription (a good one is the DOC [campsites and huts](https://api.doc.govt.nz) API).
* [Geonet](https://api.geonet.org.nz/) for geological hazard information
* [Trademe](https://developer.trademe.co.nz/) very nice one, but not super easy to get the authentication done (these [queries](https://developer.trademe.co.nz/api-reference/catalogue-methods/) do not require auth).
* all of these [reference](https://www.programmableweb.com/category/reference/api) APIs are (probably, I did not check ALL of them) good examples.
* [GDELT](https://blog.gdeltproject.org/gdelt-geo-2-0-api-debuts/) This is a rewarding, yet **tough**, API. Take a look at [leaflet](https://rstudio.github.io/leaflet/json.html) for an idea on how to use what you get back from GDELT. Also, Alex Bresler has started working on an [R wrapper](https://github.com/abresler/gdeltr2) that may get you inspired.
* [car2go](https://github.com/car2go/openAPI) (requires [registration](https://github.com/car2go/openAPI/wiki/Access-protected-Functions-via-OAuth-1.0#registration-as-consumer))
* [Quandl](https://www.quandl.com/docs/api)
* [Lufthansa](https://developer.lufthansa.com/docs) most methods are open, so a good one if you don't want to deal with authentication.

##### Let's do the JSON dance

Many websites return JSON files when you query them. JSON is similar to XML (a close cousin of HTML) in that it is a **tree** (neither tabular nor relational) data format. Roughly speaking, it is a list of lists of lists of ... `purrr` is very handy when you want to extract information from a JSON file. But do see also the `jsonlite` package, [intro here](https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html). If you are lost trying to wrangle the results you get from these APIs, consider working through Jenny Bryan's [tutorial on purrr](https://jennybc.github.io/purrr-tutorial/ex26_ny-food-market-json.html).
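A small sketch of the "list of lists" idea, using `jsonlite` to parse and `purrr` to extract. The JSON string is made up for illustration (it is not the output of any real API); in practice you would get the text from `httr::content(response, as = "text")`.

```r
library(jsonlite)
library(purrr)

# Made-up JSON payload, standing in for an API response body.
json_text <- '{"campsites": [{"name": "Hut A", "beds": 12},
                             {"name": "Hut B", "beds": 6}]}'

# simplifyVector = FALSE keeps the tree as nested lists.
parsed <- fromJSON(json_text, simplifyVector = FALSE)
items <- parsed$campsites

# Pull one field out of every element, then assemble a table.
huts <- data.frame(
  name = map_chr(items, "name"),
  beds = map_dbl(items, "beds"),
  stringsAsFactors = FALSE
)
huts
```

For simple, regular payloads like this one, calling `fromJSON()` with its default settings returns a data frame directly; the `purrr` route pays off when the tree is deeper or irregular.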

(this is it for this week, we will speak about databases next week)