# PC Session 2

**Authors:**
Jonathan Chassot, [Helge Liebert](https://hliebert.github.io/), and [Anthony Strittmatter](http://www.anthonystrittmatter.com)

# Web APIs

## Requirements

In [1]:
## Libraries
library(xml2)
library(jsonlite)
library(httr)
library(rvest)

## Examples

You query an API by sending an HTTP GET request for the data you want. The request is specified as a *uniform resource identifier* (URI) string, much like the URL (*uniform resource locator*) you enter when visiting a website in your browser. For details on how to construct a query, consult the documentation of the API you want to use.
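To make the parts of such a request concrete, the sketch below splits a query URI into its components with `httr::parse_url()`. The URI is the Kiva search endpoint used later in this tutorial, and the particular combination of parameters is only an illustration. Nothing is sent over the network.

```r
library(httr)

## Split a URI into its components (no request is sent)
parts <- parse_url("https://api.kivaws.org/v1/loans/search.json?sector=Agriculture&country_code=VN")

parts$scheme    # protocol
parts$hostname  # server that hosts the API
parts$path      # API method (endpoint) and response format
parts$query     # named list of query parameters
```

Everything after the `?` is the query: a list of `name=value` pairs separated by `&`.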

This tutorial mostly relies on the API of `kiva.org`, a crowdfunding site for small business loans to entrepreneurs in developing countries. Links to the API documentation are given below.

In [2]:
## Request specific info from KIVA API

## Examples
## https://build.kiva.org/api
## https://build.kiva.org/docs/getting_started


## Get the 20 most recent loans
newloans <- fromJSON("https://api.kivaws.org/v1/loans/newest.json")
head(newloans)

ERROR: Error in open.connection(con, "rb"): Could not resolve host: api.kivaws.org


Simple queries can be passed directly to the `jsonlite::fromJSON()` function. You can vary the method of your request and specify additional parameters to narrow it down. Methods and parameters are listed in the documentation.

In [7]:
## Only (sector == Agriculture), returned in html, look at it in your browser
## https://api.kivaws.org/v1/loans/search.html?sector=Agriculture

## Only (sector == Agriculture) & (country == Vietnam)
vnagsectorloans <- fromJSON("https://api.kivaws.org/v1/loans/search.json?sector=Agriculture&country=VN")
head(vnagsectorloans)

$paging
$paging$page
[1] 1

$paging$total
[1] 409634

$paging$page_size
[1] 20

$paging$pages
[1] 20482


$loans
        id              name languages      status funded_amount basket_amount
1  1687502      Mary's Group        en fundraising            50             0
2  1687503    Robert's Group        en fundraising             0             0
3  1687479     Norah's Group        en fundraising            50             0
4  1687497   Leonard's Group        en fundraising            50             0
5  1687498   Margret's Group        en fundraising             0             0
6  1687499      Jane's Group        en fundraising             0             0
7  1687494             Chren        en fundraising             0             0
8  1687482     Henry's Group        en fundraising             0             0
9  1680232              Jane        en fundraising             0             0
10 1687480    Stella's Group        en fundraising             0             0
11 1687477      Ma

In [8]:
## All lenders for a particular loan id
loans <- fromJSON("http://api.kivaws.org/v1/loans/38239/lenders.json")
str(loans)

List of 2
 $ paging :List of 4
  ..$ page     : int 1
  ..$ total    : int 31
  ..$ page_size: int 50
  ..$ pages    : int 1
 $ lenders:'data.frame':	31 obs. of  6 variables:
  ..$ lender_id   : chr [1:31] "carol1867" "liliane9851" "alancheuk" "larah4877" ...
  ..$ name        : chr [1:31] "Carol" "Liliane" "Alan" "Larah" ...
  ..$ image       :'data.frame':	31 obs. of  2 variables:
  .. ..$ id         : int [1:31] 189255 167024 2161104 881365 726677 726677 726677 383590 726677 726677 ...
  .. ..$ template_id: int [1:31] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ whereabouts : chr [1:31] "Dallas TX" "Dallas TX" "Vancouver British Columbia" "New York NY" ...
  ..$ country_code: chr [1:31] "US" "US" "CA" "US" ...
  ..$ uid         : chr [1:31] "carol1867" "liliane9851" "alancheuk" "larah4877" ...


JSON (*JavaScript Object Notation*), like XML (*Extensible Markup Language*), is a tree-like nested data format. Both are popular response formats for APIs. JSON was explicitly developed for this kind of data exchange.
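The nesting can be explored without network access. The following is a minimal sketch that parses a small JSON string shaped like the lenders response. The values are made up for illustration and do not come from the API.

```r
library(jsonlite)

## Small JSON document mimicking the structure of the lenders response
## (values are illustrative, not real API data)
txt <- '{
  "paging": {"page": 1, "total": 2, "page_size": 50, "pages": 1},
  "lenders": [
    {"lender_id": "a1", "name": "A", "image": {"id": 10, "template_id": 1}},
    {"lender_id": "b2", "name": "B", "image": {"id": 20, "template_id": 1}}
  ]
}'

parsed <- fromJSON(txt)
parsed$paging$total      # value nested two levels deep
parsed$lenders$name      # an array of objects becomes a data frame
parsed$lenders$image$id  # a nested object becomes a nested data frame
```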

In [28]:
toJSON(loans, pretty = TRUE)

{
  "paging": {
    "page": [1],
    "total": [31],
    "page_size": [50],
    "pages": [1]
  },
  "lenders": [
    {
      "lender_id": "carol1867",
      "name": "Carol",
      "image": {
        "id": 189255,
        "template_id": 1
      },
      "whereabouts": "Dallas TX",
      "country_code": "US",
      "uid": "carol1867"
    },
    {
      "lender_id": "liliane9851",
      "name": "Liliane",
      "image": {
        "id": 167024,
        "template_id": 1
      },
      "whereabouts": "Dallas TX",
      "country_code": "US",
      "uid": "liliane9851"
    },
    {
      "lender_id": "alancheuk",
      "name": "Alan",
      "image": {
        "id": 2161104,
        "template_id": 1
      },
      "whereabouts": "Vancouver British Columbia",
      "country_code": "CA",
      "uid": "alancheuk"
    },
    {
      "lender_id": "larah4877",
      "name": "Larah",
      "image": {
        "id": 881365,
        "template_id": 1
      },
      "whereabouts": "New York NY",
      "coun

Setting the `flatten = TRUE` option, or calling the `jsonlite::flatten()` function on the result, simplifies the structure of some nested elements.

In [None]:
## Simplify structure
loans <- fromJSON("http://api.kivaws.org/v1/loans/38239/lenders.json",
                  flatten = TRUE)
str(loans)
loans <- as.data.frame(loans$lenders) # the records are stored in the `lenders` element
toJSON(loans, pretty = TRUE)
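The effect of flattening is easiest to see side by side. This offline sketch uses a small made-up JSON array (not real API data) and compares the column names with and without `flatten = TRUE`.

```r
library(jsonlite)

## Illustrative records with one nested object per entry
txt <- '[
  {"lender_id": "a1", "image": {"id": 10, "template_id": 1}},
  {"lender_id": "b2", "image": {"id": 20, "template_id": 1}}
]'

nested <- fromJSON(txt)                  # `image` is a data frame inside a column
flat   <- fromJSON(txt, flatten = TRUE)  # `image` is split into prefixed columns

names(nested)
names(flat)
names(flatten(nested))                   # same column names as `flat`
```

Flattening only resolves nested data frames. List columns with a varying number of entries per row need separate treatment.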

The Kiva API also has a method that lists all available methods. The API is well documented on their website. 

In [31]:
## List of all API methods
methods <- fromJSON("https://api.kivaws.org/v1/methods.json", flatten = TRUE)
methods

id,http_method,uri,argument_count,status,oauth,documentation.en.summary
GET*|methods,GET,/methods,1,production,,Lists all methods available via the Kiva API.
GET*|methods|:ids,GET,/methods/:ids,2,production,,Returns documentation for specific methods of the Kiva API.
GET*|lending_actions|recent,GET,/lending_actions/recent,1,production,,Lists the 100 most recent loans made on Kiva by public lenders.
GET*|loans|search,GET,/loans/search,22,production,,Search and sort loan listings by multiple criteria.
GET*|loans|newest,GET,/loans/newest,4,production,,Returns a simple list of the most recent fundraising loans.
GET*|loans|:ids,GET,/loans/:ids,2,production,,Returns detailed information for multiple loans.
GET*|lenders|:lender_id|loans,GET,/lenders/:lender_id/loans,5,production,,Returns loans belonging to a particular lender.
GET*|teams|:id|loans,GET,/teams/:id/loans,5,production,,Returns loans belonging to a particular team.
GET*|loans|:id|repayments,GET,/loans/:id/repayments,3,beta,,"Returns the expected repayment schedule for a loan, only for fund raising loans."
GET*|my|account,GET,/my/account,1,production,True,Returns private account information for the Kiva User as authorized.


## Request specific info from KIVA API

The code in the next block constructs a query string from different string variables as the base components.

In [15]:
## Parameters
baseurl <- "https://api.kivaws.org/v1/"
method <- "loans/search.json?"
## method <- "loans/search.xml?"
## method <- "loans/search.html?"
country <- "VN,KH"
sector <- "Agriculture"
type <- "individuals"
status <- "funded"
sortby <- "newest"

## Construct URL
query <- paste0("country_code=", country, "&",
                "sector=", sector, "&",
                "borrower_type=", type, "&",
                "status=", status, "&",
                "sort_by=", sortby)

uri <- paste0(baseurl, method, query)
uri
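As the number of parameters grows, pasting strings by hand becomes error-prone. A possible alternative, sketched here with base R only, is to keep the parameters in a named list and assemble the query from it. The helper `build_query()` is a hypothetical name introduced for this illustration, not part of any package.

```r
## Hypothetical helper: build a query string from a named list
build_query <- function(params) {
    values <- vapply(params,
                     function(v) URLencode(as.character(v), reserved = TRUE),
                     character(1))
    paste(names(params), values, sep = "=", collapse = "&")
}

params <- list(country_code  = "VN,KH",
               sector        = "Agriculture",
               borrower_type = "individuals",
               status        = "funded",
               sort_by       = "newest")

uri <- paste0("https://api.kivaws.org/v1/loans/search.json?", build_query(params))
uri
```

Unlike the manual version, this percent-encodes reserved characters, so the comma in `VN,KH` is sent as `%2C`. `httr::modify_url()` with its `query` argument offers similar functionality.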

Sometimes you may need to construct the request more explicitly. This is also useful for catching errors when embedding your requests in a program.

In [16]:
## Send HTTP GET request, handle response content, library(httr)
response <- GET(uri)
response
if (response$status_code == 200) {
    jsontable <- content(response, as = "text")
} else {
    stop("HTTP response not OK!")
}
jsontable

Response [https://api.kivaws.org/v1/loans/search.json?country_code=VN,KH&sector=Agriculture&borrower_type=individuals&status=funded&sort_by=newest]
  Date: 2019-01-18 17:51
  Status: 200
  Content-Type: application/json; charset=UTF-8
  Size: 13.7 kB
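Inside a longer program you may prefer a failed request to produce a warning instead of stopping the whole script. The wrapper below is one possible sketch of this idea. `get_json()` and its `getter` argument are hypothetical names introduced here; the `getter` argument exists only so that a failure can be simulated without network access.

```r
library(httr)
library(jsonlite)

## Hypothetical wrapper: returns parsed JSON, or NULL (with a warning) on failure.
## `getter` defaults to httr::GET and can be swapped out for testing.
get_json <- function(uri, getter = httr::GET) {
    response <- tryCatch(getter(uri), error = function(e) {
        warning("Request failed: ", conditionMessage(e), call. = FALSE)
        NULL
    })
    if (is.null(response)) return(NULL)
    if (http_error(response)) {
        warning("HTTP status ", status_code(response), call. = FALSE)
        return(NULL)
    }
    fromJSON(content(response, as = "text", encoding = "UTF-8"), flatten = TRUE)
}

## Simulate a connection failure without touching the network
result <- suppressWarnings(
    get_json("https://api.kivaws.org/v1/loans/newest.json",
             getter = function(u) stop("could not resolve host"))
)
is.null(result)
```

Connection problems (no response at all) and HTTP errors (a response with a status of 400 or above) are handled separately because they surface in different ways.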


The API only returns a single page consisting of 20 loans per request. To get more, you need to request additional pages. The metadata is returned in the `paging` list of the returned object. The loans are contained in the `loans` element.
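The paging metadata tells you how many requests are needed. As a small worked example, the numbers below are the ones the `paging` element reports for the Vietnam/Cambodia search in this section.

```r
## Paging metadata as reported for the search in this section
paging <- list(page = 1, total = 18706, page_size = 20, pages = 936)

## Number of requests needed to retrieve every record
n_requests <- ceiling(paging$total / paging$page_size)
n_requests

## Records to expect on the last page
last_page <- paging$total - (paging$pages - 1) * paging$page_size
last_page
```

The last page is usually shorter than the others, which matters for any completeness check that compares the number of returned rows with the page size.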

In [17]:
## Parse json data
data <- fromJSON(jsontable, flatten = TRUE)
str(data)
names(data)
data$paging
data <- data$loans
head(data)
dim(data)

List of 2
 $ paging:List of 4
  ..$ page     : int 1
  ..$ total    : int 18706
  ..$ page_size: int 20
  ..$ pages    : int 936
 $ loans :'data.frame':	20 obs. of  25 variables:
  ..$ id                      : int [1:20] 1669243 1668812 1687507 1681736 1676989 1669603 1673510 1679375 1671900 1679344 ...
  ..$ name                    : chr [1:20] "Le" "Ba" "Sorn" "Thao" ...
  ..$ status                  : chr [1:20] "funded" "funded" "funded" "funded" ...
  ..$ funded_amount           : int [1:20] 900 900 1025 450 875 650 875 450 1550 1025 ...
  ..$ activity                : chr [1:20] "Livestock" "Farming" "Farming" "Poultry" ...
  ..$ sector                  : chr [1:20] "Agriculture" "Agriculture" "Agriculture" "Agriculture" ...
  ..$ themes                  :List of 20
  .. ..$ : chr [1:2] "Job Creation" "Growing Businesses"
  .. ..$ : chr [1:2] "Job Creation" "Growing Businesses"
  .. ..$ : NULL
  .. ..$ : chr "Vulnerable Groups"
  .. ..$ : NULL
  .. ..$ : chr "Vulnerable Groups"


id,name,status,funded_amount,activity,sector,themes,use,partner_id,posted_date,⋯,tags,description.languages,image.id,image.template_id,location.country_code,location.country,location.town,location.geo.level,location.geo.pairs,location.geo.type
1669243,Le,funded,900,Livestock,Agriculture,"Job Creation , Growing Businesses",to expand the cage and buy more buffaloes for raising.,394,2019-01-17T17:40:08Z,⋯,"user_favorite, #Animals , #Elderly",en,3029213,1,VN,Vietnam,"Le Thuy, Quang Binh",town,17.106491 106.676292,point
1668812,Ba,funded,900,Farming,Agriculture,"Job Creation , Growing Businesses","to invest in irrigation systems and some cultivars such as bananas, oranges and mangoes.",394,2019-01-17T17:40:07Z,⋯,"user_favorite, #Elderly",en,3028717,1,VN,Vietnam,"Quang Ninh, Quang Binh",town,17.239458 106.461625,point
1687507,Sorn,funded,1025,Farming,Agriculture,,to pay for additional seedlings and fertilizer.,9,2019-01-15T11:40:02Z,⋯,"user_favorite , #Parent , #Schooling , #Biz Durable Asset, #Trees",en,3052460,1,KH,Cambodia,Kampong Cham,town,12 105.5,point
1681736,Thao,funded,450,Poultry,Agriculture,Vulnerable Groups,to purchase baby poultry to raise and sell in the future.,121,2019-01-09T12:20:03Z,⋯,"user_favorite, #Animals , #Widowed , #Elderly",en,3040939,1,VN,Vietnam,Thanh Hoá,town,19.806692 105.785182,point
1676989,Quyên,funded,875,Poultry,Agriculture,,to purchase baby poultry to raise and sell in future.,121,2019-01-08T12:40:04Z,⋯,"#Woman Owned Biz, #Animals , #Parent , #Repeat Borrower",en,3045817,1,VN,Vietnam,Thanh Hoá,town,19.806692 105.785182,point
1669603,Thúy,funded,650,Cattle,Agriculture,Vulnerable Groups,to purchase baby cattle to raise and sell in the future.,121,2019-01-07T10:20:03Z,⋯,"user_favorite , #First Loan , #Woman Owned Biz, #Animals",en,3045061,1,VN,Vietnam,01 Như Thanh,town,19.636971 105.577476,point


Again, we can also simply pass the URI directly.

In [21]:
## Even more simple, pass URI directly
data <- fromJSON(uri, flatten = TRUE)
data <- data$loans
#str(data)
names(data)
dim(data)

There are a few nested list columns in the returned data frame that we need to flatten. The anonymous function below collapses each list element into a single string.

In [20]:
## Nested elements need to be flattened
data$tags <- sapply(data$tags, function(x) paste(unlist(x), collapse = ", "))
data$themes <- sapply(data$themes, function(x) paste(unlist(x), collapse = ", "))
data$description.languages <- sapply(data$description.languages, function(x) paste(unlist(x), collapse = ", "))
head(data)

id,name,status,funded_amount,activity,sector,themes,use,partner_id,posted_date,⋯,tags,description.languages,image.id,image.template_id,location.country_code,location.country,location.town,location.geo.level,location.geo.pairs,location.geo.type
1669243,Le,funded,900,Livestock,Agriculture,"Job Creation, Growing Businesses",to expand the cage and buy more buffaloes for raising.,394,2019-01-17T17:40:08Z,⋯,"user_favorite, #Animals, #Elderly",en,3029213,1,VN,Vietnam,"Le Thuy, Quang Binh",town,17.106491 106.676292,point
1668812,Ba,funded,900,Farming,Agriculture,"Job Creation, Growing Businesses","to invest in irrigation systems and some cultivars such as bananas, oranges and mangoes.",394,2019-01-17T17:40:07Z,⋯,"user_favorite, #Elderly",en,3028717,1,VN,Vietnam,"Quang Ninh, Quang Binh",town,17.239458 106.461625,point
1687507,Sorn,funded,1025,Farming,Agriculture,,to pay for additional seedlings and fertilizer.,9,2019-01-15T11:40:02Z,⋯,"user_favorite, #Parent, #Schooling, #Biz Durable Asset, #Trees",en,3052460,1,KH,Cambodia,Kampong Cham,town,12 105.5,point
1681736,Thao,funded,450,Poultry,Agriculture,Vulnerable Groups,to purchase baby poultry to raise and sell in the future.,121,2019-01-09T12:20:03Z,⋯,"user_favorite, #Animals, #Widowed, #Elderly",en,3040939,1,VN,Vietnam,Thanh Hoá,town,19.806692 105.785182,point
1676989,Quyên,funded,875,Poultry,Agriculture,,to purchase baby poultry to raise and sell in future.,121,2019-01-08T12:40:04Z,⋯,"#Woman Owned Biz, #Animals, #Parent, #Repeat Borrower",en,3045817,1,VN,Vietnam,Thanh Hoá,town,19.806692 105.785182,point
1669603,Thúy,funded,650,Cattle,Agriculture,Vulnerable Groups,to purchase baby cattle to raise and sell in the future.,121,2019-01-07T10:20:03Z,⋯,"user_favorite, #First Loan, #Woman Owned Biz, #Animals",en,3045061,1,VN,Vietnam,01 Như Thanh,town,19.636971 105.577476,point
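The same transformation can be tried on a toy data frame. This sketch uses made-up values and a hypothetical helper `collapse_col()`. It uses `vapply()` instead of `sapply()` so that the result is always a character vector, even for a data frame with zero rows.

```r
## Toy data frame with a list column (values are illustrative)
toy <- data.frame(id = 1:3)
toy$themes <- list(c("Job Creation", "Growing Businesses"), NULL, "Vulnerable Groups")

## Hypothetical helper: collapse each list element to one string, NULL becomes ""
collapse_col <- function(col) {
    vapply(col, function(x) paste(unlist(x), collapse = ", "), character(1))
}

toy$themes <- collapse_col(toy$themes)
toy$themes
```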


## Simple script to collect more information

This script reads the metadata and iterates over all pages to get all data for a specific search query.

We first set the parameters, then request the data. Response tables have a fixed page length, so you need to send multiple requests, iterating over the page numbers.

In [24]:
## Get all data, multiple requests, iterate over pages

## Note: very simple proof of concept
## (should check http response for error and have better tests)
## (for large queries, it is more efficient to write results to file immediately)

## Parameters
baseurl <- "https://api.kivaws.org/v1/"
method  <- "loans/search.json?"
country <- "VN"
sector  <- "Agriculture"
type    <- "individuals"
status  <- "funded"
sortby  <- "oldest" # (o/w duplicates may occur when new entries are added)
pagelength <- 20 # max page length allowed is 500

## Construct URL
query <- paste0("country_code=", country, "&",
                "sector=", sector, "&",
                "borrower_type=", type, "&",
                "status=", status, "&",
                ## "per_page=", pagelength, "&"
                "sort_by=", sortby)
uri <- paste0(baseurl, method, query)

## Get maxpagenumber and other information for iteration
response <- fromJSON(uri, flatten = TRUE)
response$paging
maxpages <- response$paging$pages
records  <- response$paging$total
columns  <- ncol(response$loans)

## Open csv, write header (create the output directory if it is missing)
dir.create("Data", showWarnings = FALSE)
header <- names(response$loans)
write.table(t(header), file = "Data/kiva.csv", sep = ";",
            col.names = FALSE, row.names = FALSE)

## Or collect in data frame (don't do this for large jobs)
## data <- data.frame(matrix(nrow = 0, ncol = columns))
## names(data) <- header

## Simple helper function to flatten columns
unnest <- function(col) paste(unlist(col), collapse = ", ")


## Iterate over pages, limit to first three for test
for (p in seq(1, maxpages, by = 1)[1:3]) {

    ## Info
    print(paste0(p, "/", maxpages))

    ## Append page to uri
    pquery <- paste0(uri, "&page=", p)

    ## Get data, assert completeness
    loans <- fromJSON(pquery, flatten = TRUE)$loans
    stopifnot(nrow(loans) == pagelength)
    stopifnot(ncol(loans) == columns)

    ## Fix nested list columns
    loans$tags <- sapply(loans$tags, unnest)
    ## loans$themes <- sapply(loans$themes, unnest) # missing for older records
    loans$description.languages <- sapply(loans$description.languages, unnest)
 
    ## Collect loans in data frame
    ## data <- rbind(data, loans)

    ## Better to append to file
    write.table(loans, "Data/kiva.csv", sep = ";", append = TRUE,
                col.names = FALSE, row.names = FALSE)

}

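## Read the collected pages back from file and inspect them
data <- read.table("Data/kiva.csv", sep = ";", header = TRUE, quote = "\"",
                   comment.char = "", stringsAsFactors = FALSE)
head(data)
dim(data)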

[1] "1/169"
[1] "2/169"
[1] "3/169"


id,name,status,funded_amount,activity,sector,use,partner_id,posted_date,loan_amount,⋯,tags,description.languages,image.id,image.template_id,location.country_code,location.country,location.town,location.geo.level,location.geo.pairs,location.geo.type
49518,Thi Tinh,funded,900,Animal Sales,Agriculture,Chăn nuôi và buôn bán.,85,2008-05-23T01:30:10Z,900,⋯,,vi,162791,1,VN,Vietnam,Bac Ninh,town,21.121444 106.11105,point
49513,Thi Duyen,funded,275,Agriculture,Agriculture,Business and agriculture.,85,2008-05-29T18:18:50Z,275,⋯,,vi,162782,1,VN,Vietnam,Bac Ninh,town,21.121444 106.11105,point
49508,Thi Thuy Ha,funded,175,Agriculture,Agriculture,Agriculture.,85,2008-06-01T16:10:08Z,175,⋯,,vi,162774,1,VN,Vietnam,Bac Ninh,town,21.121444 106.11105,point
49514,Thi Tien,funded,150,Agriculture,Agriculture,Breeding.,85,2008-06-01T16:10:08Z,150,⋯,,vi,162783,1,VN,Vietnam,Bac Ninh,town,21.121444 106.11105,point
49515,Thi Meo,funded,150,Agriculture,Agriculture,Agriculture.,85,2008-06-01T16:10:08Z,150,⋯,,vi,162784,1,VN,Vietnam,Bac Ninh,town,21.121444 106.11105,point
58190,Ba,funded,75,Poultry,Agriculture,Raise chicken.,41,2008-07-27T20:50:17Z,75,⋯,,vi,184979,1,VN,Vietnam,Ham Thuan Nam,town,10.850294 107.905781,point
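The iteration logic can be tested without calling the API. In the sketch below, `fetch_page()` is a hypothetical stand-in that serves 45 fake records in pages of 20; a real version would call `fromJSON()` on the page URI instead. Pages are collected in a pre-allocated list and bound once at the end, which avoids growing a data frame inside the loop.

```r
## Hypothetical stand-in for the API: 45 fake records served in pages of 20
fetch_page <- function(p, page_size = 20, total = 45) {
    first <- (p - 1) * page_size + 1
    last  <- min(p * page_size, total)
    ids   <- seq(first, last)
    list(paging = list(page = p, total = total, page_size = page_size,
                       pages = ceiling(total / page_size)),
         loans  = data.frame(id = ids, funded_amount = 25 * ids))
}

## The first request provides the metadata, the loop collects the remaining pages
first_page <- fetch_page(1)
maxpages   <- first_page$paging$pages
pages      <- vector("list", maxpages)
pages[[1]] <- first_page$loans
for (p in seq_len(maxpages)[-1]) {
    pages[[p]] <- fetch_page(p)$loans
}

## Bind once at the end and check completeness against the metadata
all_loans <- do.call(rbind, pages)
stopifnot(nrow(all_loans) == first_page$paging$total,
          !anyDuplicated(all_loans$id))
dim(all_loans)
```

Checking the total row count against the metadata, instead of requiring every page to have exactly `page_size` rows, also works for a shorter last page.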


The next example uses a different API, provided by `theyworkforyou.com`. The website publishes data about UK politics and parliament. You need to request an API key to be authorized to use the service. The basic plan is free for educational or charitable purposes.

In [46]:
## TheyWorkForYou.com Example
apikey <- "G3WVqtBtKAbdGVqrd8BKajm8"
base <- "https://www.theyworkforyou.com/api/"
format <- "js"
func <- "getMPs?"
query <- paste0("key=", apikey, "&", "output=", format)
uri <- paste0(base, func, query)
uri
## listofmps <- fromJSON(uri) # problem with encoding, maybe xml is better
response <- GET(uri)
response <- content(response, as = "raw")
listofmps <- fromJSON(rawToChar(response))
str(listofmps)
head(listofmps)


'data.frame':	650 obs. of  6 variables:
 $ member_id   : chr  "41371" "41372" "41373" "41374" ...
 $ person_id   : chr  "24709" "24807" "24710" "10069" ...
 $ name        : chr  "Bridget Phillipson" "Chi Onwurah" "Julie Elliott" "Nick Brown" ...
 $ party       : chr  "Labour" "Labour" "Labour" "Labour" ...
 $ constituency: chr  "Houghton and Sunderland South" "Newcastle upon Tyne Central" "Sunderland Central" "Newcastle upon Tyne East" ...
 $ office      :List of 650
  ..$ :'data.frame':	5 obs. of  4 variables:
  .. ..$ dept     : chr  "Speaker's Committee on the Electoral Commission" "Public Accounts Committee" "Committee on Privileges" "Committee on Standards" ...
  .. ..$ position : chr  "Member" "Member" "Member" "Member" ...
  .. ..$ from_date: chr  "2017-07-10" "2017-09-11" "2017-10-26" "2017-10-26" ...
  .. ..$ to_date  : chr  "9999-12-31" "9999-12-31" "9999-12-31" "9999-12-31" ...
  ..$ :'data.frame':	1 obs. of  4 variables:
  .. ..$ dept     : chr ""
  .. ..$ position : chr "S

member_id,person_id,name,party,constituency,office
41371,24709,Bridget Phillipson,Labour,Houghton and Sunderland South,"Speaker's Committee on the Electoral Commission, Public Accounts Committee , Committee on Privileges , Committee on Standards , European Statutory Instruments Committee , Member , Member , Member , Member , Member , 2017-07-10 , 2017-09-11 , 2017-10-26 , 2017-10-26 , 2018-07-18 , 9999-12-31 , 9999-12-31 , 9999-12-31 , 9999-12-31 , 9999-12-31"
41372,24807,Chi Onwurah,Labour,Newcastle upon Tyne Central,", Shadow Minister (Department for Business, Energy and Industrial Strategy) (Industrial Strategy), 2016-10-10 , 9999-12-31"
41373,24710,Julie Elliott,Labour,Sunderland Central,"Digital, Culture, Media and Sport Committee, Regulatory Reform Committee , Member , Member , 2017-09-11 , 2017-11-06 , 9999-12-31 , 9999-12-31"
41374,10069,Nick Brown,Labour,Newcastle upon Tyne East,", Public Accounts Commission , Opposition Chief Whip (Commons), Member , 2016-10-06 , 2017-11-16 , 9999-12-31 , 9999-12-31"
41375,24870,Justin Tomlinson,Conservative,North Swindon,", The Parliamentary Under-Secretary of State for Work and Pensions, 2018-07-09 , 9999-12-31"
41376,11592,Sharon Hodgson,Labour,Washington and Sunderland West,", Shadow Minister (Public Health), 2018-01-09 , 9999-12-31"
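When sharing scripts, it is common practice to keep API keys out of the code. One way to do this, sketched below, is to read the key from an environment variable. The variable name `TWFY_API_KEY`, the helper `get_api_key()`, and the placeholder value are assumptions made for this illustration.

```r
## Hypothetical helper: read the key from an environment variable
## (set it outside the script, for example in ~/.Renviron)
get_api_key <- function(var = "TWFY_API_KEY") {
    key <- Sys.getenv(var, unset = "")
    if (!nzchar(key)) {
        stop("Environment variable ", var, " is not set", call. = FALSE)
    }
    key
}

## Demonstration with a placeholder value instead of a real key
Sys.setenv(TWFY_API_KEY = "placeholder-key")
uri <- paste0("https://www.theyworkforyou.com/api/getMPs?",
              "key=", get_api_key(), "&output=js")
uri
```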
