# Chapter 1: Interaction with the web

## 1. The Web architecture

Webpages and APIs offer an enormous amount of data for researchers, be it literary texts, statistics or tweets. They are now central to research in Digital Humanities. The difference between a webpage and an API lies in how the data is presented: a webpage, the kind of content you use every day, wraps its data in HTML, a markup language oriented towards display, whereas the content of an API is described by its markup, in RDF formats, XML or JSON.

The World Wide Web is organized around HTTP. [HTTP](https://httpwg.github.io/specs/rfc7540.html) defines the way computers, be they servers or clients, communicate with each other. There are four methods you should know:
- GET: this is the base method for HTTP communication. You can pass parameters to say what you want to see. You use it when you search or when you open a webpage.
- POST: this sends data to the server, to update or save information. You use it when you sign up or sign in on a website.
- DELETE: this removes information from the server.
- PUT: this saves a new resource on a server, or replaces the one already stored at that address.

Of those methods, you use the first about 90% of the time in your everyday browsing, the second 9.99%, and the other two maybe one day a year. What you may not know, though, is that the websites you use most probably rely on those other two, and on some of the ones listed on the [W3C](http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html) website, every second or minute you spend on large websites. Most of the time, these websites use what is called a REST API.

![REST API](images/rest.png)
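To see the four methods listed earlier side by side, here is a minimal sketch that builds one request per method without sending anything over the network. It uses `urllib.request` from the standard library only because its `Request` object can be inspected before it is sent. The `example.org` address and the `poems` resource are made up for illustration and do not belong to any real API.

```python
from urllib.request import Request

# Hypothetical addresses, used only to illustrate the four methods
collection = "http://example.org/api/poems"
resource = "http://example.org/api/poems/1"

examples = [
    # GET: ask for data, with a parameter in the URL
    Request(collection + "?author=vergil"),
    # POST: send data to the server
    Request(collection, data=b"title=Aeneid", method="POST"),
    # PUT: store a resource at a given address
    Request(resource, data=b"title=Aeneid", method="PUT"),
    # DELETE: remove the resource at a given address
    Request(resource, method="DELETE"),
]

# Nothing is sent here: we only look at what each request would be
for req in examples:
    print(req.get_method(), req.full_url)
```

The requests are only constructed, never opened, so running this cell has no effect on any server.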

Now that we have a general idea of how the web is constructed, let's get started with Python.

## 2. Python and the web : getting a page

As you have seen, Python is extremely modular. That means we will be using modules to query the web. There are many possibilities, but the one we will use is `requests`. To import a library, write the following:

In [1]:
import requests

This line allows you to query the web. For example, the following lines of code query the [CTS API](http://cite-architecture.github.io/cts_spec/) of Perseus. Execute them:

In [None]:
url = "http://services2.perseids.org/exist/restxq/cts?request=GetCapabilities&inv=latin"
response = requests.get(url)
print(response)

Can you explain what we just did? Or why the printed result is `<Response [200]>`?

Explanation: after setting up a `url` variable, we used the function `get()` of the `requests` library. This function takes a string representing a URL as its first parameter and performs a GET query on that URL, according to the HTTP standard. We then receive a response from the server. This response carries several pieces of information:
- a code, which expresses the status of the request. You might not know 200, because it means "everything went well", but I am sure you have seen 404 around. For more codes, see the list on [Wikipedia](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes).
- headers, which tell us about the content of the response.
- a body, which is where the HTML of a webpage would be.
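If you want to look up what a status code means without leaving Python, the standard library ships the official names in `http.HTTPStatus`. The sketch below is one possible way to use it; the helper name `describe` and the wording of the five families are our own choices, not part of `requests`.

```python
from http import HTTPStatus

# The first digit of a status code tells you its family
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code):
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(code)        # raises ValueError for unknown codes
    family = FAMILIES[code // 100]
    return f"{code} {status.phrase} ({family})"

for code in (200, 404, 500):
    print(describe(code))
```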

The result of a query with `requests` contains all of this information:
- `response.status_code` represents the status of the query.
- `response.headers` is a dictionary containing the headers.
- `response.text` is the content of the body.
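As a sketch of how these three attributes can be read together, the helper below builds a one-line summary of a response. To keep it runnable without a network connection, it is tried on a stand-in object with invented values rather than on a real answer from a server; the name `summarize` and the fake content are assumptions made for this illustration.

```python
from types import SimpleNamespace

def summarize(response, preview=40):
    """Summarize the status, content type and first characters of a response."""
    content_type = response.headers.get("Content-Type", "unknown")
    start = response.text[:preview]
    return f"{response.status_code} | {content_type} | {start}"

# Stand-in for a real response: same attribute names, made-up values
fake_response = SimpleNamespace(
    status_code=200,
    headers={"Content-Type": "application/xml"},
    text="<reply>made-up body, for illustration only</reply>",
)

print(summarize(fake_response))
```

The same function should accept the `response` object returned by `requests.get()`, since it only relies on the three attributes listed above.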

In [None]:
# Let's see our headers:
print(response.headers)

In [None]:
# And the first few characters of our text:
print(response.text[0:100])

Great! It works! So maybe it's time for you to make a request.

**DIY**

We will use the API of Perseids to get the famous first verse of the Aeneid. The text of the answer is contained between two `tei:l` tags. Can you query this page: "http://services2.perseids.org/exist/restxq/cts?request=GetPassage&inv=nemo&urn=urn:cts:latinLit:phi0959.phi001.perseus-lat2:1.1.1"?

In [None]:
# Write your code here
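The URL in the exercise above packs three parameters into the part after the `?`. As a rough sketch, the standard library module `urllib.parse` can take such a URL apart and put its query string back together, which may help you read long API addresses. Nothing is sent over the network here.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = ("http://services2.perseids.org/exist/restxq/cts"
       "?request=GetPassage&inv=nemo"
       "&urn=urn:cts:latinLit:phi0959.phi001.perseus-lat2:1.1.1")

# Split the URL into its components, then the query string into parameters
parts = urlsplit(url)
params = parse_qs(parts.query)

print("server:", parts.netloc)
print("path:  ", parts.path)
for name, values in params.items():
    # parse_qs returns a list for each name, since a name may be repeated
    print(name, "=", values[0])

# Rebuild the query string from a plain dictionary.
# safe=":" keeps the colons of the URN readable instead of turning them into %3A.
rebuilt = urlencode({name: values[0] for name, values in params.items()}, safe=":")
print(rebuilt == parts.query)
```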

## 3. Passing parameters to the web

## 4. Getting JSON out of APIs

## 5. Handling errors

## Exercises
1\. "Arma virumque cano"

Using the following URL, retrieve the first line of the Aeneid without XML markup (Hint: use regular expressions!)

In [None]:
# Use this URL
url = "http://services2.perseids.org/exist/restxq/cts?request=GetPassage&inv=nemo&urn=urn:cts:latinLit:phi0959.phi001.perseus-lat2:1.1.1"
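If regular expressions on markup are new to you, here is a minimal sketch of the technique on a hand-written sample. The `sample` string is invented for this illustration and is not the real answer of the API, so you will still need to apply the idea to `response.text` yourself.

```python
import re

# Invented sample, only shaped like an XML reply with a tei:l element
sample = '<reply><tei:l n="1">first <tei:hi>verse</tei:hi> of a poem</tei:l></reply>'

# Step 1: capture everything between the opening and closing tei:l tags
match = re.search(r"<tei:l\b[^>]*>(.*?)</tei:l>", sample, flags=re.DOTALL)
inner = match.group(1) if match else ""

# Step 2: remove any tag that is still nested inside the captured text
plain = re.sub(r"<[^>]+>", "", inner).strip()

print(plain)
```

Regular expressions are fine for a quick look at a small reply like this one; for larger or more deeply nested documents, an XML parser is usually the more robust choice.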


2\. Second exercise

In [19]:
# You can ignore this cell; it is only here to make the page nicer.

from IPython.display import HTML
def css_styling():
    # Read the custom stylesheet and close the file afterwards
    with open("styles/custom.css", "r") as css_file:
        styles = css_file.read()
    return HTML(styles)
css_styling()

---

<p><small><a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Python Programming for the Humanities</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="http://fbkarsdorp.github.io/python-course" property="cc:attributionName" rel="cc:attributionURL">http://fbkarsdorp.github.io/python-course</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>. Based on a work at <a xmlns:dct="http://purl.org/dc/terms/" href="https://github.com/fbkarsdorp/python-course" rel="dct:source">https://github.com/fbkarsdorp/python-course</a>.</small></p>