# PCPP1 | Working with RESTful APIs

## 1.1.1.1 Python Professional Course Series: RESTful APIs

**After completing this course you will know:**

-   the basic concepts of network programming, REST, network sockets, and client-server communication;
-   how to use and create sockets in Python, and how to establish and close the connection with a server;
-   what JSON and XML files are, and how they can be used in network communication;
-   what HTTP methods are, and how to say anything in HTTP;
-   how to build a sample testing environment;
-   what CRUD is;
-   how to build a simple REST client, and how to fetch and remove data from a server, add new data to it, and update the already-existing data.

## 1.1.1.2 Networks, layers and the Internet: introduction

## Some words about REST

REST isn't actually a word - it's an acronym. It comes from three words of equal importance:

-   **RE**presentational
-   **S**tate
-   **T**ransfer

### Representational
**RE** stands for _Representational_. It means that our machinery **stores, transmits and receives representations**, while the term representation reflects the way in which **data or states are retained inside the system and presented to the users** (humans or computers).

REST uses a very curious way of representing its data - it's always **text**. Pure, plain text.

"It must be a joke," you may think now. "How is it possible to send and receive all kinds of data using plain text?"

It's a very good question. Probably the best question that can be put now. REST is focused on a very specific kind of data - the data which reflects **states**.

### State

**S** stands for _State_. The word _state_ is key to understanding what REST is and what it could be used for.

We think that your knowledge of classes and objects can be very helpful here. We want you to use it. Imagine any object. The object contains a set (the most preferable set is a non-empty one) of **properties**. We can say that the values of all the object's properties constitute its state. If any of the properties changes its value, this inevitably entails the effect of changing the whole object's state. Such a change is often called a **transition**.

Now imagine that the object is stored somewhere else, not on your computer, but on a server located over the hill and far away. Of course, you can access the server's resources using the network, but you can't just get the object and transfer it into your computer. Why not? Because it has to be accessible to many (maybe a few, maybe a million) users. It must stay on the server.

Imagine that you want to (or you must) **affect the object's state through the network**. No, you are not able to invoke any of its methods. Sorry, that's impossible. You can't do it directly. But you can do it using REST.

### Transfer

**T** stands for _Transfer_. The network (not only the Internet) is able to act as a **carrier allowing you to transmit states' representations to and from the server**.

Note: not the object, but its states, or actions able to change the states, are subject to the transfer. We can say (it's a very poor analogy, but it will work here) that transferring the states enables you to achieve results similar to those caused by method invocations.
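To make the idea a little more tangible, here is a minimal sketch of a state travelling as plain text. The `Thermostat` class, its properties, and the choice of JSON as the text format are our own assumptions made for illustration only - they aren't part of REST itself.

```python
import json


class Thermostat:
    """A hypothetical server-side object; its properties make up its state."""

    def __init__(self, target_temp, mode):
        self.target_temp = target_temp
        self.mode = mode

    def to_representation(self):
        # The state (the values of all properties) expressed as plain text.
        return json.dumps(vars(self), sort_keys=True)

    def apply_representation(self, text):
        # A received representation causes a transition to a new state.
        for name, value in json.loads(text).items():
            setattr(self, name, value)


server_side = Thermostat(target_temp=21, mode="heat")
text = server_side.to_representation()
print(text)

# Nobody invokes a method remotely: a changed representation is sent instead.
server_side.apply_representation('{"mode": "off", "target_temp": 18}')
print(server_side.to_representation())
```

The object never leaves its place - only the text describing its state is passed around.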

  

**Representational State Transfer** - We hope the term is less mysterious now. Don't be afraid - we won't leave you alone with your doubts. There is a long road ahead of us.

## 1.1.1.3 Network sockets - a basic means of network programming

## BSD sockets

The sockets we want to tell you about have nothing to do with electricity - we're not going to plug anything into them and we won't draw energy from them.

A socket (in the sense that interests us now) is a kind of **end-point**. An end-point is **a point from which data can be read and to which data can be sent**. Your Python program can connect to the end-point and use it to exchange messages with another program working somewhere far away on the Internet.

The history of sockets started in 1983 at the University of California in Berkeley, where the concept was formulated and where the first successful implementation was carried out.

The resulting solution was a universal set of functions suitable for implementation in nearly all operating systems and available in all modern programming languages. It was named BSD sockets - the name was borrowed from _Berkeley Software Distribution_, the name of a Unix-class operating system, where the sockets were deployed for the very first time.

After some amendments, the standard was adopted by POSIX (a standard of contemporary Unix-class operating systems) as **POSIX sockets**.

We can say that all modern OSs implement BSD sockets in a more or less accurate way. Despite their differences, the general idea remains the same and this is what we are going to tell you about.

We don't want our course to be a schooling on network programming, so be aware that we'll present to you only the absolutely essential information on how network traffic is managed. We focus - as always - on programming in Python. By the way: BSD sockets were originally implemented in the "C" programming language, which is a good reason to start our "C" course.

The main idea behind BSD sockets is closely connected to the Unix philosophy summed up in the words _everything is a file_. A socket may often be treated as a very specific kind of file. Writing to a socket results in sending data through the network. Reading from a socket enables you to receive data coming from the network.
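The file-like nature of a socket can be seen in a small experiment. This is only a sketch: it uses `socket.socketpair()` to obtain two sockets that are already connected to each other (so no real network is involved), and `makefile()` to wrap them in ordinary Python file objects. The variable names are ours.

```python
import socket

# socketpair() returns two already-connected sockets, so no network is needed.
left, right = socket.socketpair()

# makefile() wraps a socket in an ordinary Python file object.
writer = left.makefile("w", encoding="utf-8", newline="\n")
reader = right.makefile("r", encoding="utf-8", newline="\n")

writer.write("hello through a socket\n")  # writing sends the data
writer.flush()

line = reader.readline()                  # reading receives the data
print(line.strip())

writer.close()
reader.close()
left.close()
right.close()
```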

By the way, **MS Windows reimplements BSD sockets in the form of WinSock**. Fortunately, you won't feel the difference when programming in Python. Python hides such differences very thoroughly. We like Python for this (and not only for this).

Be prepared to assimilate many new terms and notions. Are you ready?

## 1.1.1.4 Domains, addresses, ports, protocols and services

## Socket domains

Initially, BSD sockets were designed to organize communication in two different domains (not to be confused with internet domains like pythoninstitute.org - these terms have nothing in common). The two domains were:

-   **Unix domain** (_Unix_ for short) - a part of the BSD socket API used for communication between programs working within one operating system (i.e., simultaneously present in the same computer system);
-   **Internet domain** (_INET_ for short) - a part of the BSD socket API used for communication between programs working within different computer systems, connected together using a TCP/IP network (note: this doesn't preclude the use of INET sockets for communication between processes working in the same system).
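In Python, the two domains appear as the constants `AF_INET` and `AF_UNIX` of the `socket` module. The sketch below only creates one socket of each kind and prints its domain; note that `AF_UNIX` isn't defined on every platform, which is why we check for it first.

```python
import socket

# An INET-domain socket: able to reach programs on other machines
# (or on the same one).
inet_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
inet_family = inet_sock.family
print(inet_family)
inet_sock.close()

# A Unix-domain socket: only for programs running within the same system.
# AF_UNIX is not defined on every platform, so we check before using it.
if hasattr(socket, "AF_UNIX"):
    unix_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    print(unix_sock.family)
    unix_sock.close()
```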

In the next part, we'll deal with sockets working in the INET domain.

## Socket address

The two programs wanting to exchange their data must be able to identify each other - to be precise, they must have the ability to clearly indicate the socket they want to connect through.

INET domain sockets are identified (addressed) by pairs of values:

-   the **IP address** of the computer system inside which the socket is located;
-   the **port number** (also referred to as the service number).
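You can see such a pair with your own eyes. In the sketch below we attach a socket to the local machine (`127.0.0.1`) using `bind()`, a method we haven't discussed, and ask for port `0`, which tells the operating system to pick any free port. Both choices are assumptions made only to keep the example independent of any real server.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Port 0 asks the operating system to pick any free port for us.
sock.bind(("127.0.0.1", 0))

# getsockname() returns the socket's own address: an (IP address, port) pair.
ip_address, port = sock.getsockname()
print(ip_address, port)

sock.close()
```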

![](./images/66_socket_address.png)

## IP address

An IP address (more precisely: **IPv4** address) is **a 32-bit long value used to identify computers connected to any TCP/IP network**. The value is usually presented as four numbers from the range 0..255 (i.e., each eight bits long) joined together with dots (e.g., 87.98.239.87).
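The relation between the dotted form and the 32-bit value can be checked with the standard `ipaddress` module. This is just an illustration using the sample address from the text.

```python
import ipaddress

address = ipaddress.IPv4Address("87.98.239.87")

# The four dotted numbers are simply the four bytes of one 32-bit integer.
as_int = int(address)
as_bytes = address.packed

print(as_int)                      # the whole address as a single number
print(list(as_bytes))              # the same address as four 8-bit values
print(as_int.bit_length() <= 32)   # it always fits in 32 bits
```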

There is also a newer IP standard, named **IPv6**, using 128 bits for the same purpose. Because it was still far less widespread (according to data published in August 2016, less than 20% of computers in the world were reachable by IPv6 addressing), we will limit our considerations to IPv4.

## Socket/service number

The socket/service number is **a 16-bit long integer number identifying a socket within a particular system**. As you may have guessed already, there are 65,536 (2 \*\* 16) possible socket/service numbers.

The term _service number_ comes from the fact that many standard network services use the same, constant port numbers, e.g., **the HTTP protocol, a carrier of data used by REST, usually uses port 80**.
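The 16-bit limit is easy to verify. In the sketch below we use the `struct` module to squeeze a number into exactly 16 bits; the format string `"!H"` means "network byte order, unsigned 16-bit integer". It is only a demonstration of the range, not something you need to do when using sockets.

```python
import struct

# "!H" means: network byte order, unsigned 16-bit integer.
http_port = struct.pack("!H", 80)
highest_port = struct.pack("!H", 2 ** 16 - 1)
print(http_port, highest_port)

try:
    struct.pack("!H", 2 ** 16)   # one past the largest 16-bit value
except struct.error as exc:
    print("does not fit in 16 bits:", exc)
```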

  

## Protocol

A protocol is **a standardized set of rules allowing processes to communicate with each other**. We may say that a protocol is a kind of _network savoir-vivre_ specifying the rules of behaviour for all participants.

## 1.1.1.5 Domains, addresses, ports, protocols and services

## Protocol stack

A protocol stack is **a multilayer** (hence the name) **set of cooperating protocols providing a unified repertoire of services**. The TCP/IP protocol stack is designed to cooperate with networks based on the IP protocol (the IP networks).

The conceptual model of network services describes the protocol stack in a way where the most basic, **elementary services are located at the bottom of the stack**, while the most advanced and abstract ones lie at the top.

It is assumed that any higher layer implements its functionalities using services provided by the adjoining lower layer (note: it is the same as in the other parts of the operating system, e.g., your program implements its functionality using OS services and OS services implement their functionalities using hardware facilities).

  

## IP

The IP (_Internet Protocol_) is one of the lowest parts of the TCP/IP protocol stack. Its functionality is very simple - it is able to **send a packet of data (a datagram) between two network nodes**.

![](./images/67_network_node.png)

IP is a very unreliable protocol. It doesn't guarantee that:

-   any of the sent datagrams will reach the target (moreover, if a datagram is lost, the loss may remain undetected);
-   the datagram will reach the target intact;
-   a pair of sent datagrams will reach the target in the same order as they were sent.

The upper layers are able to compensate for all of these weaknesses of IP.

## TCP

The TCP (_Transmission Control Protocol_) sits in the upper part of the TCP/IP protocol stack, directly above IP. It **uses datagrams** (provided by the lower layers) and **handshakes** (an automated process of synchronizing the flow of data) **to construct a reliable communication channel able to transmit and receive single characters**.

Its functionality is very complex, as it guarantees that:

-   a stream of data reaches the target, or the sender is informed that communication has failed;
-   data reaches the target intact.

## UDP

The UDP (_User Datagram Protocol_) lies in the upper part of the TCP/IP protocol stack, at the same level as TCP. It doesn't use handshakes, which has two serious consequences:

-   it is faster than TCP (due to lower overhead);
-   it is less reliable than TCP.

This means that:

-   TCP is a first-choice protocol for applications where data safety is more important than efficiency (e.g., WWW, REST, mail transfer, etc.);
-   UDP is better suited **for applications where response time is crucial** (DNS, DHCP, etc.).

## 1.1.1.6 Clients and servers - two sides of network communication

## Connection-oriented vs. connectionless communication

A form of communication which **demands some preliminary steps to establish the connection and other steps to finish it** is _connection-oriented communication_.

Usually, the two parties involved in the process aren't symmetrical, i.e., their roles and routines are different. Both sides of the communication are aware that the other party is connected.

A phone call is a perfect example of connection-oriented communication.

Look:

-   the roles are strictly defined: there is a caller and there is a callee;
-   the caller must dial the callee's number and wait till the network routes the connection;
-   the caller must wait for the callee to answer the call (the callee may reject the connection, or just not answer the call);
-   the actual communication won't start until all the previous steps are completed successfully;
-   the communication ends when either of the parties hangs up.

  
  

TCP/IP networks use the following names for both sides of the communication:

-   the side that initiates the connection (caller) is named **client**;
-   the side that answers the client (callee) is named **server**.

Connection-oriented communications are usually built on top of TCP.
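The phone-call scenario can be replayed entirely on your own computer. The sketch below is an assumption-laden toy: both the server and the client live in one script, the server runs in a separate thread, and it uses `bind()`, `listen()` and `accept()`, which belong to the server's side and aren't discussed here. Port `0` lets the operating system choose a free port.

```python
import socket
import threading

# Server side: prepare a listening socket on the local machine.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))          # port 0: let the OS choose a free port
listener.listen(1)
server_address = listener.getsockname()


def answer_one_call():
    # The callee picks up: accept() returns a new socket for this connection.
    connection, _client_address = listener.accept()
    with connection:
        data = connection.recv(1024)
        connection.sendall(b"server heard: " + data)


server_thread = threading.Thread(target=answer_one_call)
server_thread.start()

# Client side: the caller dials, talks, listens, and hangs up.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server_address)
client.sendall(b"hello")

chunks = []
while True:
    chunk = client.recv(1024)
    if not chunk:                        # the server has closed the connection
        break
    chunks.append(chunk)
reply = b"".join(chunks)
client.close()

server_thread.join()
listener.close()
print(reply)
```

Note that nothing can be sent until `connect()` and `accept()` have both done their jobs - just as nobody talks before the callee picks up the phone.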

A communication which **can be established ad-hoc** (snap - just like that) is _connectionless communication_. Both parties usually have equal rights, but neither of the parties is aware of the other side's state.

Using walkie-talkies is a very good analogy for connectionless communication, because:

-   either of the parties of communication may initiate the communication at any time; it only requires pushing the _talk_ button;
-   talking into the mic doesn't guarantee that anybody will hear (it's necessary to wait for an incoming answer to be sure).

Connectionless communications are usually built on top of UDP.
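For comparison, here is a walkie-talkie-style exchange, again limited to the local machine. It is a sketch under the same assumptions as before (a local address, a port chosen by the operating system); `SOCK_DGRAM` selects a datagram socket working on top of UDP. The timeout is there because, in general, nobody promises that a datagram will arrive.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)                 # do not wait forever if nothing arrives
receiver_address = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# No connect(), no accept(): we just push the "talk" button.
sender.sendto(b"anybody there?", receiver_address)

message, sender_address = receiver.recvfrom(1024)
print(message, sender_address)

sender.close()
receiver.close()
```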

Okay. Taking such a dose of theory requires some practice as soon as possible. Let's do it.

## 1.2.1.1 How to use sockets in Python

## How to fetch a document from a server using Python

We are going to write our first program making use of network sockets. Of course, we'll harness Python for this purpose.

Here are our goals:

-   we want to write **a program which reads the address of a WWW site** (e.g., pythoninstitute.org) using the standard `input()` function and **fetches the root document** (the main HTML document of the WWW site) of the specified site;
-   the program **outputs the document** to the screen;
-   the program **uses TCP to connect to the HTTP server**.

Our program has to perform the following steps:

1.  **create a new socket** able to handle connection-oriented transmissions based on TCP;
2.  **connect the socket to the HTTP server** of a given address;
3.  **send a request to the server** (the server wants to know what we want from it)
4.  **receive the server's response** (it will contain the requested root document of the site)
5.  **close the socket** (end the connection)

This is our road map. Let's follow the route.
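Before we go through the steps one by one, here is a bird's-eye sketch of the whole route. The function name `exchange_with_server()` and its parameters are hypothetical - we made them up for this overview. The request is passed in as ready-made bytes, because the exact wording of an HTTP request is a separate topic.

```python
import socket


def exchange_with_server(server_addr, request, port=80):
    """Hypothetical helper: send `request` (bytes) and return the raw reply."""
    # Step 1: create a socket for connection-oriented (TCP) transmission.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Step 2: connect the socket to the server.
        sock.connect((server_addr, port))
        # Step 3: send the request.
        sock.sendall(request)
        # Step 4: receive the response until the server closes the connection.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        # Step 5: close the socket, whatever happened above.
        sock.close()
```

The receiving loop assumes that the server closes the connection once it has sent everything; if it doesn't, the loop will keep waiting.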

![](./images/68_request_response.png)

## Importing a socket

We are in need - we need a socket. How do we obtain a socket? Can we order it from an Internet store? Is it free?

Yes, it's free. As you probably suspect, we need a specialized module. Python offers just such a module. You won't be surprised if we tell you that the module is named `socket`, will you?

This is what we'll put at the top of our code:

```
import socket

```  
  

## Obtaining user input

We also need **the name of the HTTP server** we're going to connect to. In fact, it's not our problem. The user knows it better. Let's ask him or her:

```
import socket

server_addr = input("What server do you want to connect to? ")

```  
  

The user input may take two different forms:

-   it can be **the domain name of the server** (like __www.pythoninstitute.org__, but without the leading __http://__)
-   it can be **the IP address of the server** (like __87.98.235.184__), but it must be said firmly that this variant is potentially ambiguous. Why? Because **there can be more than one HTTP server located at the same IP address** - the server you reach may not be the server you intended to connect to.

It may sound cynical - it's not our problem which of these two ways our users choose. They know better. The customer is always right.
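If you'd like to know which of the two forms the user has chosen, a check like the one below may help. It is only a sketch: the function `describe_server_input()` is hypothetical, and stripping a leading `http://` is our own courtesy towards users who type it anyway.

```python
import ipaddress


def describe_server_input(text):
    """Hypothetical helper: tidy the user's input and say which form it has."""
    cleaned = text.strip()
    # Drop a leading "http://", which is not a part of the server's name.
    if cleaned.lower().startswith("http://"):
        cleaned = cleaned[len("http://"):]
    cleaned = cleaned.rstrip("/")
    try:
        ipaddress.IPv4Address(cleaned)   # raises ValueError if not an address
        kind = "IP address"
    except ValueError:
        kind = "domain name"
    return cleaned, kind


print(describe_server_input("http://www.pythoninstitute.org/"))
print(describe_server_input("87.98.235.184"))
```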

## 1.2.1.2 How to create a socket in Python

## The socket module: creating a socket

The `socket` module contains all the tools we need to deal with sockets. We aren't going to present all its capabilities - as we mentioned before, we aren't and won't be focusing on network programming. We want to show you how the TCP/IP works and how it is able to act as **a carrier for REST**.

We can say that TCP/IP is interesting for us only to the extent that it is able to transport HTTP traffic, and HTTP is interesting for us only to the extent that it is able to act as a relay for REST. If you want to become fully familiar with networks, you may need to continue your reading using another of our courses.

The socket module provides a class named `socket` (what a coincidence!) which encapsulates a bundle of properties and activities related to the actual sockets' behaviour. This means that the first step is to **create an object of the class** - this is how we carry out the creation:

```
import socket

server_addr = input("What server do you want to connect to? ")
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
```

As you can see, the constructor takes two arguments, both declared within the module. Let us tell you about them:

-   the former argument is a domain code (we may use the `AF_INET` symbol here to **specify the Internet socket domain** - do you remember? We told you about Unix and INET domains in the previous section); as different domains require completely different socket behaviour, the target domain has to be known at the moment of creation;
-   the latter argument is a socket type code (we may use the `SOCK_STREAM` symbol here to **specify a high-level socket able to act as a character device** - a device that can handle single characters, as we are interested in transferring data byte by byte, not as fixed-size blocks; e.g., a terminal is a character device, while a disk isn't).

Such a socket is prepared to work on top of TCP protocol - it's the default socket configuration.

If you want to create a socket to cooperate with another protocol, like UDP, you will need to pass different arguments to the constructor.
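A short experiment shows both facts: what the constructor assumes when it gets no arguments at all, and what a UDP-oriented socket looks like. `SOCK_DGRAM` is the type code of a datagram socket; the variable names are ours.

```python
import socket

# With no arguments the constructor falls back to AF_INET and SOCK_STREAM.
default_sock = socket.socket()
default_is_inet = default_sock.family == socket.AF_INET
default_is_stream = default_sock.type == socket.SOCK_STREAM
print(default_is_inet, default_is_stream)

# The same constructor with a different type code: a datagram socket for UDP.
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_is_datagram = udp_sock.type == socket.SOCK_DGRAM
print(udp_is_datagram)

default_sock.close()
udp_sock.close()
```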

As you can see, the newly created socket object will be referenced by a variable named `sock`. No, it's not about the clothes. Really.

## 1.2.1.3 How to connect to a server

## Connecting to a server

If we use a socket on the client's side, we are ready to make use of it. The server, however, has a few more steps to take. In general, servers are usually more complex than clients (as one server serves many clients simultaneously) - this is the moment where our telephone analogies stop working.

The configured socket (just like ours) is able to be connected to its counterpart on the server's side. Look at the code at the end of this section - this is how we perform the connection.

The `connect()` method does what it promises - it tries to connect your socket to the service of the specified address and port (service) number.

Note: we make use of the variant where the two values are passed to the method as elements of a tuple. This is why you see two pairs of parentheses there. Omitting one of them will obviously cause an error.
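If you're curious what the error looks like, the sketch below deliberately makes the mistake. The address used here is just a placeholder - the exception is raised before any connection attempt is made, so nothing is actually sent anywhere.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Wrong: two separate arguments instead of one (address, port) tuple.
    sock.connect("127.0.0.1", 80)
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
finally:
    sock.close()
```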

Note: the form of the target service address (a pair consisting of the actual address and port number) is **specific for the INET domain**. Don't expect it to look the same in other domains.

You may ask - why 80? Can I put something else instead? Not if you want to reach an ordinary HTTP server. 80 is the well-known service number for HTTP. Any Internet browser will try to connect to port number 80 by default when it speaks plain HTTP, so we do it, too.

Is it possible that the connection attempt will fail? Of course it is. There are lots of possible reasons: a malformed address of the service, a non-existent server, a connection error, and more. How can we discover such unpleasant events?

If something goes wrong, the `connect()` method (and any other method whose results may be unsuccessful) **raises an exception**. Let us postpone the issue for a moment. For the moment we can assume that everything goes smoothly.
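For the impatient, here is a rough preview of what catching these exceptions might look like. Treat it as a sketch, not as the final word: the function `try_connect()`, its messages, and the five-second timeout are our own assumptions. All the exception classes used here are subclasses of `OSError`, which is why the most general clause comes last.

```python
import socket


def try_connect(server_addr, port=80, timeout=5.0):
    """Hypothetical helper: report how a connection attempt ended."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)             # give up instead of waiting forever
    try:
        sock.connect((server_addr, port))
    except socket.gaierror:
        return "address could not be resolved"
    except socket.timeout:
        return "server did not answer in time"
    except ConnectionRefusedError:
        return "server refused the connection"
    except OSError as exc:
        return f"connection failed: {exc}"
    else:
        return "connected"
    finally:
        sock.close()
```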

Yes, we know. The awakening from this dream can be painful.

The connection is ready. The server has accepted our connection and is very curious about what it will hear from us. Don't let it wait too long.

But... what do we really want to tell the server anyway? How do we talk to the HTTP server to be sure that it understands us? We have to speak in HTTP, of course.

```
import socket

server_addr = input("What server do you want to connect to? ")
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((server_addr, 80))
```

## 1.2.1.4 How to say something in HTTP

## The GET method

The HTTP protocol is one of the simplest Internet protocols, but it is still too complex to discuss fully here. For now, we'll tell you how to get a root document from the WWW site. Of course, we'll tell you more about it later.

A conversation with the HTTP server consists of **requests (sent by the client) and responses (sent by the server)**.

HTTP defines a set of acceptable requests - these are **the request methods or HTTP words**. The method asking the server to send a particular document of a given name is called `GET` (it's rather self-explanatory, isn't it?).

To get a root document from a site named _www.site.com_ the client should send the request containing a correctly formed `GET` method description:

```
GET / HTTP/1.1\r\n
Host: www.site.com\r\n
Connection: close\r\n
\r\n
```


The `GET` method requires:

-   a line containing the method name (i.e., `GET`) followed by the name of the resource the client wants to receive; the root document is specified as a single slash (i.e., `/`); the line must also include the HTTP protocol version (i.e., `HTTP/1.1`) and must end with the characters **`\r\n`**; note: all lines must end the same way;
-   a line containing the name of the site (e.g., _www.site.com_) preceded by the parameter name (i.e., `Host:`)
-   a line containing a parameter named `Connection:` along with its value `close`, which forces the server to close the connection after the first request is served; it will simplify our client's code;
-   an empty line is **a request terminator**.

It doesn’t look very clear, but it doesn't exceed our capabilities, does it?
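The four lines can be assembled in Python like this. The function `build_get_request()` is a hypothetical helper of our own; it returns bytes, because bytes are what travels through a socket.

```python
def build_get_request(site, resource="/"):
    """Hypothetical helper: assemble the GET request described above."""
    lines = [
        f"GET {resource} HTTP/1.1",   # method, resource, protocol version
        f"Host: {site}",              # the name of the site
        "Connection: close",          # ask the server to close afterwards
        "",                           # the empty line terminates the request
    ]
    # Every line, including the empty one, ends with \r\n.
    text = "".join(line + "\r\n" for line in lines)
    return text.encode("ascii")


request = build_get_request("www.site.com")
print(request)
```

Printing the result shows the `\r\n` characters explicitly, which makes it easy to check that no line has lost its ending.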

Okay, we know now that HTTP won't be our favourite language, but how can we send such a request to the server? It's simple. We have to invoke a method from within the socket object.