# Chapter 23 - first half

The main goals of this in-class activity are:

- get practice with curl in the terminal
- learn how to use curl in an ipynb setting, with %%bash
- practice with learning about APIs
- practice with GET requests and file extensions
- practice with POST

In the reading, you learned about APIs for GitHub and TMDB. With this activity and the homework you'll learn about Kiva loans.

## Understanding Paginated Results

**Step 1**

1. Read the Terms of Use Agreement: https://www.kiva.org/legal/terms
2. Read the Code of Conduct: https://www.kiva.org/build/code-of-conduct
3. Study the API. For this worksheet, we've already given you the information you need, so you can skip this step for now.

The API has changed in the past year to become more complicated, but you can still access data from Kiva using the old API approach. That old approach is described at the following link. Recall that we learned about the Wayback Machine in the reading questions:

https://web.archive.org/web/20190629032937/https://build.kiva.org/api

**Step 2**

Start playing with how to get data from http://api.kivaws.org

API methods can be tested easily with almost any browser. As an example, try out the loans/search method using HTML output:

    http://api.kivaws.org/v1/loans/search.html?status=fundraising

API calls with the .html extension are designed for testing or debugging. If the browser or tool you are using easily supports viewing XML output you might try using the .xml extension instead:

    http://api.kivaws.org/v1/loans/search.xml?status=fundraising

Try changing up some of the parameters and see how the search results change. What URL would you use to access the same data in JSON format?

*Solution cell*

    http://api.kivaws.org/v1/loans/search.json?status=fundraising


From the reading, it should be clear to you that we are using a *query parameter* corresponding to a Python dictionary `{'status':'fundraising'}`. Just like in the reading, we can use `&` to provide the URI with multiple query parameters at once. Here are two more that we commonly use (and that come up on your homework):

- page
  -  A number for the page of data to return (results are segmented into pages).
- per_page
  -  A number telling how many results per page you want to see

Both of these parameters will hopefully be familiar to you from times you have used a search engine like Google.
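
To make the link between a Python dictionary and a query string concrete, here is a minimal sketch using the standard library's `urllib.parse.urlencode`. The parameter values are only examples, and the variable names `base`, `params`, and `uri` are chosen for illustration.

```python
from urllib.parse import urlencode

# Base URI for the loans/search method (JSON output), as used in this worksheet.
base = "http://api.kivaws.org/v1/loans/search.json"

# Query parameters as a Python dictionary; the values here are only examples.
params = {"status": "fundraising", "page": 2, "per_page": 100}

# urlencode joins the key=value pairs with "&"; we add the "?" ourselves.
uri = base + "?" + urlencode(params)
print(uri)
```

No request is sent here; the code only builds the string that you could paste into a browser.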

In our example link above, the default is to take you to Page 1 (out of 145 at the time of writing):

    http://api.kivaws.org/v1/loans/search.html?status=fundraising

You can see "Page 1 out of 145" in the top line of the results. Note that, by the time you go to this link, it might have changed, if more loans were made and the data source updated correspondingly.
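
The page count is the total number of results divided by the page size, rounded up. The sketch below assumes the figures from the sample output in this worksheet (2889 total results, 20 results per page by default); the helper name `page_count` is hypothetical, and the live numbers will differ.

```python
import math

def page_count(total_results, per_page):
    """Number of pages needed to show total_results items, per_page at a time."""
    return math.ceil(total_results / per_page)

print(page_count(2889, 20))   # 145
print(page_count(2889, 100))  # 29
```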

To get the second page of results instead of the first, you can add a query parameter using `page`:

    http://api.kivaws.org/v1/loans/search.html?status=fundraising&page=2
    
Note that the first line now says "Page 2 out of 145". You can also change how many results you want to see per page, just like a search engine. For example, to see 100 results per page, you would do:

    http://api.kivaws.org/v1/loans/search.html?status=fundraising&per_page=100

Note that when showing 100 results per page, you only need 29 pages to get through all the results, instead of 145. How would you modify this URI to show you the third page, with 50 results per page?

*Solution cell*



Here are some more parameters that the loans/search method can take:

- status
  -  Any of: fundraising,funded,in_repayment,paid,defaulted
- gender
  -  Any of: male,female
- sector
  -  Matches against a sector name such as agriculture
- region
  -  Any of: na,ca,sa,af,as,me,ee
- country_code
  -  Matches a two-letter ISO country code.
- partner
  -  Matches one or more partner IDs.
- q
  -  A general search string to match against various properties of loans
- sort_by
  -  Any of: popularity,loan_amount,oldest,expiration,newest,amount_remaining,repayment_term

Here's how you'd make a request for all loans in Cambodia or Mongolia that are actively paying back, sorted by the amount of the loan:

    http://api.kivaws.org/v1/loans/search.html?country_code=kh,mn&sort_by=loan_amount&status=in_repayment
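
One way to check a URI with several query parameters is to split it back into its parts. This is a small sketch using the standard library's `urllib.parse`; it sends no request and only inspects the string.

```python
from urllib.parse import urlsplit, parse_qs

uri = ("http://api.kivaws.org/v1/loans/search.html"
       "?country_code=kh,mn&sort_by=loan_amount&status=in_repayment")

parts = urlsplit(uri)
print(parts.path)   # /v1/loans/search.html

# parse_qs returns a dictionary mapping each parameter name to a list of values.
params = parse_qs(parts.query)
for name, values in params.items():
    print(name, values)
```

Each parameter name should appear exactly once, and the comma-separated country codes stay together as a single value.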

Please come up with three more examples that make use of the first eight parameters above (that is, everything except `page`).

*Solution cell*


*Solution cell*


*Solution cell*


## Curl

In all the examples above, you needed to manually copy and paste the URLs into a web browser to test your results. That approach is tedious and hard to reproduce or automate.

One step in the right direction (to at least have a reproducible workflow all in one document) is to use `curl`. For example, with the first link given above, we have

    curl --get --url "http://api.kivaws.org/v1/loans/search.html?status=fundraising"

We can run this command in Jupyter via the following cell:

In [3]:
%%bash

curl --get --url "http://api.kivaws.org/v1/loans/search.html?status=fundraising"

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
	<title>Kiva API: v1/loans/search.html</title>
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
	<link rel="shortcut icon" href="favicon.ico" type="image/x-icon" />
	<style type="text/css">
	body {
		background-color: #CCCCCC;
		color: #333333;
		font: .875em/1.357 arial,helvetica,sans-serif;
	}

	h1, h2, h3, h4 {
		color: #4B9123;
	}

	table {
		width: 100%;
		min-width: 500px;
		max-width: 1500px;
	}
	tr:nth-child(even) td {
		background: #C3DCAD;
	};

	tr:nth-child(odd) td {
		background: #FFFFFF;
	}

	#content {
		background-color: #FFFFFF;
		border: 1px solid #DDDDDD;
		padding: 5px 10px;
	}
	#footer {
		font-size: 90%;
		margin-top: 10px;
		color: #888888;
	}

	</style>
	</head>
<body>
	<div id="header">
	</div>

		
	<div id="content">
	
Page 1 out of 145 (2889 total results)
<table>
	<h2>loans</h2>


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5406  100  5406    0     0  10142      0 --:--:-- --:--:-- --:--:-- 10142


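Please use three `%%bash` cells below to run `curl` commands for the three links provided above in the examples about the query parameters `page=` and `per_page=`. Put each URL in quotes, because an unquoted `&` tells the shell to run the command in the background, which cuts the URL short.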

In [None]:
%%bash

### BEGIN SOLUTION 
curl 
### END SOLUTION


In [None]:
%%bash

### BEGIN SOLUTION 

### END SOLUTION


In [None]:
%%bash

### BEGIN SOLUTION 

### END SOLUTION


## Programmatic

While `curl` is fun and powerful, we have not illustrated yet how to actually store the results of a `curl` GET command for use in a program. Thankfully, we previously learned how to get results from the web into Python native data structures using the `requests` module. Please run the following cell.

In [5]:
import requests
import json
import io
from lxml import etree

We return now to our Kiva loan problem.

**Step 3**

Write, in a global cell, code that requests the data as XML and obtains the root Element of the XML tree.

In [25]:
url = "http://api.kivaws.org/v1/loans/search.xml"
searchTerms = {'status': 'fundraising'}
resp = requests.get(url, params=searchTerms)
print(resp.status_code)

200


In [26]:
xmldata = etree.parse(io.BytesIO(resp.content)).getroot()

In [7]:
for child in xmldata:
    print(child.tag, child.attrib, child.text)

paging {} None
loans {'type': 'list'} None
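
The `paging` child printed above holds the page bookkeeping. As a rough sketch of how to read it, the code below parses a small inline sample instead of a live response; the sample values are taken from example output elsewhere in this worksheet. It uses the standard library's `xml.etree.ElementTree` so that it runs without `lxml`; the same `find` call and iteration also work on an `lxml` element.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample in the shape of a loans response (not a live result).
sample = (
    "<response>"
    "<paging><page>1</page><total>2881</total>"
    "<page_size>20</page_size><pages>145</pages></paging>"
    '<loans type="list"></loans>'
    "</response>"
)

root = ET.fromstring(sample)

# Top-level children, as in the loop above.
print([child.tag for child in root])

# Collect the paging fields into a dictionary of integers.
paging = {field.tag: int(field.text) for field in root.find("paging")}
print(paging)
```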


**Step 4** Make the above a **function** `getRootKivas()` with no parameters that returns a Python data structure, or `None`, if there was a problem.

In [None]:
### BEGIN SOLUTION 

### END SOLUTION

**Step 5** Refine your function to take and use a **page** parameter, so `getRootKivas(p)` gives the results from page `p`. Please solve this by creating a URI with a `?` and bringing `p` into a format string.

In [None]:
### BEGIN SOLUTION 
def getRootKivas(p):
    endpoint = 'loans/newest.json?page={}'
    kiva_api = 'http://api.kivaws.org/v1/'
    
    r = requests.get(kiva_api + endpoint.format(p))
    if r.status_code == 200:
        newest_page = json.loads(r.text)
        print(newest_page['paging'])
    else:
        print("Error getting page")
    
    return r
### END SOLUTION

In [None]:
r = getRootKivas(42)
r.request.url

**5b** Now solve the same problem by passing a dictionary to `get`. Here is an example. Please write a general function.

In [None]:
kiva_api = 'http://api.kivaws.org/v1/'
endpoint = 'loans/newest.json'
getargs = {'page': 42}
r = requests.get(kiva_api + endpoint, params=getargs)
n = json.loads(r.text)
print(n['paging'])
print()
print(n['loans'][0])

In [None]:
### BEGIN SOLUTION 
def getRootKivas(p):
    endpoint = 'loans/newest.json'
    kiva_api = 'http://api.kivaws.org/v1/'
    getargs = {'page': p}
    r = requests.get(kiva_api + endpoint, params=getargs)
    if r.status_code == 200:
        newest_page = json.loads(r.text)
        #print(newest_page['paging'])
    else:
        return None
    
    return r
### END SOLUTION

**Step 6** Refine your function further to take and use a **per_page** parameter, so `getRootKivas(p,n)` gives the results from page `p`, when each page has `n` results.

In [None]:
### BEGIN SOLUTION 
def getRootKivas(p,n):
    endpoint = 'loans/newest.json'
    kiva_api = 'http://api.kivaws.org/v1/'
    getargs = {'page': p, 'per_page':n}
    r = requests.get(kiva_api + endpoint, params=getargs)
    if r.status_code != 200:
        return None
    
    return r
### END SOLUTION

**Step 7** Refine your function further to take and use a **sector** parameter, so `getRootKivas(p,n,s)` gives the results from page `p`, when each page has `n` results, and the sector is `s` (e.g., 'Agriculture').

In [None]:
### BEGIN SOLUTION 
def getRootKivas(p,n,s):
    endpoint = 'loans/newest.json'
    kiva_api = 'http://api.kivaws.org/v1/'
    getargs = {'page': p, 'per_page':n, 'sector': s}
    r = requests.get(kiva_api + endpoint, params=getargs)
    if r.status_code != 200:
        return None
    
    return r
### END SOLUTION


In [None]:
# Testing cell
r = getRootKivas(2,10,'agriculture')
n = json.loads(r.text)
print(n['paging'])

**Step 8** Refine your function further to take and use **theme** and **status** parameters, so `getRootKivas(p,n,sec,theme,stat)` gives the results from page `p`, when each page has `n` results, the sector is `sec`, the theme is `theme` (e.g., 'Higher Education'), and the status is `stat` (e.g., 'funded').

In [None]:
### BEGIN SOLUTION 
def getRootKivas(p,n,sec,theme,stat):
    endpoint = 'loans/newest.json'
    kiva_api = 'http://api.kivaws.org/v1/'
    getargs = {'page': p, 'per_page':n, 'sector': sec, 'themes': theme, 'status': stat}
    r = requests.get(kiva_api + endpoint, params=getargs)
    if r.status_code != 200:
        return None
    
    return r
### END SOLUTION

In [None]:
# Testing cell
r = getRootKivas(1,10,'agriculture','Higher Education','funded')
n = json.loads(r.text)
print(n['paging'])

**Step 9** Refine your function further to take a parameter for the endpoint_type. In all the examples above, it was `newest`, but if you poke around on the Kiva website, you will see that `search` would also have worked. Think about what other types work.

In [None]:
### BEGIN SOLUTION 
def getRootKivas(p,n,sec,theme,stat,endpoint_type):
    endpoint = 'loans/{}.json'.format(endpoint_type)
    kiva_api = 'http://api.kivaws.org/v1/'
    getargs = {'page': p, 'per_page':n, 'sector': sec, 'themes': theme, 'status': stat}
    r = requests.get(kiva_api + endpoint, params=getargs)
    if r.status_code != 200:
        return None
    
    return r
### END SOLUTION

**Step 10** Can you make your function even more general? For example, every invocation above seeks data along the path `http://api.kivaws.org/v1/loans/`. What else in that path can be modified to be a parameter?

In [None]:
### BEGIN SOLUTION 

### END SOLUTION

## A more in-depth example

In the example above, we assumed the data would come to us in XML form. We now generalize that, then show how to build in query parameters.

**Q1** Please write a function 

    kiva_newest(baseurl, apiobject, method, form = 'json')
    
that takes four string parameters (where the fourth is optional), builds a correct URL, and executes a `requests.get`, returning the result (or `None` if there is a problem). The fourth parameter is the format of the data. Please refer to the examples above.

In [7]:
### BEGIN SOLUTION 
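def kiva_newest(baseurl, apiobject, method, form = 'json'):
    """
    Parameters: baseurl (e.g. "https://api.kivaws.org/v1"), apiobject
        (e.g. "loans"), method (e.g. "newest"), form (data format, e.g. "json")
    Return: response object, or None if the request did not succeed
    """
    url = "{}/{}/{}.{}".format(baseurl, apiobject, method, form)
    r = requests.get(url)
    if r.status_code != 200:
        return None
    return r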
### END SOLUTION

In [13]:
# Testing cell

baseurl = "https://api.kivaws.org/v1"
apiobject = "loans"
method = "newest"

resp = kiva_newest(baseurl, apiobject, method)
print(resp.text)


{"paging":{"page":1,"total":2881,"page_size":20,"pages":145},"loans":[{"id":1961740,"name":"Emarencia","description":{"languages":["en"]},"status":"fundraising","funded_amount":25,"basket_amount":0,"image":{"id":3443346,"template_id":1},"activity":"Farm Supplies","sector":"Agriculture","use":"to access premium seeds and high quality fertilizer for one acre of maize, in addition to advice and insurance, optimizing for increased productivity and profits.","location":{"country_code":"KE","country":"Kenya","town":"Kabondo West, Homa Bay","geo":{"level":"town","pairs":"-0.44775 34.891382","type":"point"}},"partner_id":596,"posted_date":"2020-04-23T22:50:08Z","planned_expiration_date":"2020-06-07T22:50:08Z","loan_amount":150,"borrower_count":1,"lender_count":1,"bonus_credit_eligibility":false,"tags":[]},{"id":1961737,"name":"Agnaeta","description":{"languages":["en"]},"status":"fundraising","funded_amount":0,"basket_amount":0,"image":{"id":3500446,"template_id":1},"activity":"Bricks","sector

In [14]:
resp = kiva_newest(baseurl, apiobject, method,'json')
print(resp.text)


{"paging":{"page":1,"total":2881,"page_size":20,"pages":145},"loans":[{"id":1961740,"name":"Emarencia","description":{"languages":["en"]},"status":"fundraising","funded_amount":25,"basket_amount":0,"image":{"id":3443346,"template_id":1},"activity":"Farm Supplies","sector":"Agriculture","use":"to access premium seeds and high quality fertilizer for one acre of maize, in addition to advice and insurance, optimizing for increased productivity and profits.","location":{"country_code":"KE","country":"Kenya","town":"Kabondo West, Homa Bay","geo":{"level":"town","pairs":"-0.44775 34.891382","type":"point"}},"partner_id":596,"posted_date":"2020-04-23T22:50:08Z","planned_expiration_date":"2020-06-07T22:50:08Z","loan_amount":150,"borrower_count":1,"lender_count":1,"bonus_credit_eligibility":false,"tags":[]},{"id":1961737,"name":"Agnaeta","description":{"languages":["en"]},"status":"fundraising","funded_amount":0,"basket_amount":0,"image":{"id":3500446,"template_id":1},"activity":"Bricks","sector

In [15]:
resp = kiva_newest(baseurl, apiobject, method,'xml')
print(resp.text)


<?xml version="1.0" encoding="UTF-8" ?><response>
<paging><page>1</page><total>2881</total><page_size>20</page_size><pages>145</pages></paging><loans type="list"><loan><id>1961740</id><name>Emarencia</name><description><languages type="list"><language>en</language></languages></description><status>fundraising</status><funded_amount>25</funded_amount><basket_amount>0</basket_amount><image><id>3443346</id><template_id>1</template_id></image><activity>Farm Supplies</activity><sector>Agriculture</sector><use>to access premium seeds and high quality fertilizer for one acre of maize, in addition to advice and insurance, optimizing for increased productivity and profits.</use><location><country_code>KE</country_code><country>Kenya</country><town>Kabondo West, Homa Bay</town><geo><level>town</level><pairs>-0.44775 34.891382</pairs><type>point</type></geo></location><partner_id>596</partner_id><posted_date>2020-04-23T22:50:08Z</posted_date><planned_expiration_date>2020-06-07T22:50:08Z</planned_

By now we should be familiar with the query parameters `page` and `per_page`. Another useful query parameter is `ids_only`, which can be either `true` or `false`. Please take a moment to familiarize yourself with this parameter, e.g., by playing with the following URI:

https://api.kivaws.org/v1/loans/newest.json?page=3&ids_only=true

For more practice (in a different setting than the previous problem), please solve the following. Pay careful attention to the test invocation to understand the parameters.

**Q2** Please write a function with five parameters

    kiva_query(result,page,pp,ids_only,endpoint)

that uses `endpoint` and `result` to create a correct endpoint path for *newest* loans, then creates a dictionary for the other three parameters suitable for query parameters to go with the Kiva API. Please pass the URI and query parameters to `requests.get`, returning the result. Once you have a working version, please think about how to generalize it to make all arguments optional.

In [30]:
### BEGIN SOLUTION
def kiva_query(result='json', page=None, pp=None, ids_only=None,endpoint="https://api.kivaws.org/v1/loans"):
    """
    Returns response object
    """
    D = {}
    if page:
        D['page'] = page
    if pp:
        D['per_page'] = pp
    if ids_only:
        D['ids_only'] = ids_only
    assert result in ['json', 'html', 'xml', 'rss']
    s = "{}/newest.{}"
    #print(s.format(endpoint, result))
    r = requests.get(s.format(endpoint, result), params=D)
    #print(r.request.path_url)
    return r
### END SOLUTION

In [37]:
my_end="https://api.kivaws.org/v1/loans"

r = kiva_query(result='xml', page=5, pp=10, ids_only='false',endpoint=my_end)
r.request.path_url

'/v1/loans/newest.xml?page=5&per_page=10&ids_only=false'
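
The query string above contains only the parameters that were supplied. A minimal sketch of that pattern follows, using a hypothetical helper `build_params` that is not part of the worksheet's solution. It tests `is not None` rather than truthiness, which is an assumption about the desired behavior: a value such as `0` would be kept instead of silently dropped.

```python
def build_params(page=None, pp=None, ids_only=None):
    """Return a dict holding only the query parameters that were supplied."""
    candidates = {"page": page, "per_page": pp, "ids_only": ids_only}
    return {name: value for name, value in candidates.items() if value is not None}

print(build_params(page=5, pp=10, ids_only="false"))
print(build_params(page=3))
print(build_params())
```

The resulting dictionary can be passed as `params=` to `requests.get`, exactly as in the solution above.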

In [38]:
stripparser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(io.BytesIO(r.content), stripparser)
root = tree.getroot()

In [39]:
root = etree.parse(io.BytesIO(r.content)).getroot()

In [40]:
print(etree.tostring(root, pretty_print=True).decode("utf-8"))

<response>
<paging><page>5</page><total>2879</total><page_size>10</page_size><pages>288</pages></paging><loans type="list"><loan><id>1961695</id><name>Jackson</name><description><languages type="list"><language>en</language></languages></description><status>fundraising</status><funded_amount>25</funded_amount><basket_amount>0</basket_amount><image><id>3443225</id><template_id>1</template_id></image><activity>Farming</activity><sector>Agriculture</sector><use>to access premium seeds and high-quality fertilizer for 1.5 acres of maize, in addition to advice and insurance, to optimize increased productivity and profits.</use><location><country_code>KE</country_code><country>Kenya</country><town>Segero/Barsombe, Uasin Gishu</town><geo><level>town</level><pairs>0.902718 35.335652</pairs><type>point</type></geo></location><partner_id>596</partner_id><posted_date>2020-04-23T20:20:09Z</posted_date><planned_expiration_date>2020-06-07T20:20:09Z</planned_expiration_date><loan_amount>175</loan_amou

**Q3** Please use the result from the previous problem, and XPath, to extract a list `sector_list` of `sectors` that appear in your query, e.g. "Agriculture", etc.

In [41]:
### BEGIN SOLUTION 
sector_list = root.xpath('/response/loans/*/sector/text()')
### END SOLUTION

In [42]:
print(sector_list)

['Agriculture', 'Agriculture', 'Agriculture', 'Food', 'Agriculture', 'Health', 'Agriculture', 'Agriculture', 'Agriculture', 'Education']
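
To experiment with the path expression without calling the API, you can run it against a small hand-written sample. The sketch below is only an illustration: the loan ids are made up, and it uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath (no `text()`), so the text is read from each matched element instead. The `Counter` tally at the end is an optional extra.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample in the shape of a loans response; ids are made up.
sample = (
    "<response><loans type='list'>"
    "<loan><id>1</id><sector>Agriculture</sector></loan>"
    "<loan><id>2</id><sector>Food</sector></loan>"
    "<loan><id>3</id><sector>Agriculture</sector></loan>"
    "</loans></response>"
)
root = ET.fromstring(sample)

# Paths are relative to the element they are called on (here, <response>).
sectors = [elem.text for elem in root.findall("./loans/loan/sector")]
print(sectors)

# Tally how often each sector appears, most frequent first.
print(Counter(sectors).most_common())
```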


## Practice with POST

Please visit the following link in Google Chrome and use the drop-down menus to select all years from 2002 to 2015

http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php

Now use Dev Tools so you can see what POST is really doing, and write down what you learned. 

**Q4** Write a function `makePostDict(from_year,to_year)` that takes the from year and to year (as strings) and returns a correct dictionary that could be sent with the POST.


In [None]:
### BEGIN SOLUTION 
def makePostDict(from_year,to_year):
    cpiargs = {'fromYear': from_year, 'toYear': to_year, '_submit_check': "1"}
    return cpiargs
### END SOLUTION

In [None]:
# Testing cell
D = makePostDict('2004','2011')
print(D)

In [None]:
# Testing cell
endpoint = 'http://httpbin.org/post'
r = requests.post(endpoint, data=D)
r.request.body
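
When a dictionary is passed as `data=`, `requests` form-encodes it into the request body. As a sketch of what that encoding looks like, the code below reproduces it with the standard library's `urllib.parse.urlencode`, with no network access. The field names are the ones used in the `makePostDict` solution above.

```python
from urllib.parse import urlencode, parse_qsl

post_dict = {"fromYear": "2004", "toYear": "2011", "_submit_check": "1"}

# Form encoding: name=value pairs joined with "&".
body = urlencode(post_dict)
print(body)

# Decoding the body recovers the original name/value pairs.
print(dict(parse_qsl(body)) == post_dict)
```

Unlike a GET request, these pairs travel in the body of the request rather than in the URL.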

**Q5** Following what you learned in the book, formulate a `curl` POST for the example of 2002 to 2015.

In [None]:
%%bash

### BEGIN SOLUTION 

### END SOLUTION


**Q6** Write a function `get_inflation(from_year,to_year)` that uses the `requests` module to issue a POST request whose body is obtained via a call to `makePostDict`, returning the result of the `requests.post` invocation.

In [None]:
### BEGIN SOLUTION 
def get_inflation(from_year,to_year):
    cpiurl = 'http://www.rateinflation.com/consumer-price-index/usa-historical-cpi.php'
    cpiargs = makePostDict(from_year,to_year)
    r = requests.post(cpiurl, data=cpiargs)
    return r

### END SOLUTION

In [None]:
r = get_inflation('2004','2011')
r.status_code
r.text
r.request.url
r.request.body