# Create a REST API with Flask to expose the data
Tutorial created by [Thomas Belhalfaoui](https://www.linkedin.com/in/belhalfaoui/?originalSubdomain=fr) for [Ironhack](https://www.ironhack.com/).

## 1. Very quick introduction to HTTP

What happens when you type the address of a Website in your browser?

<img src="http.png" width="1000">

Well, there is a program that runs indefinitely on some remote computer (it is called a Web server application).

> Be careful: the word _server_ actually has two different meanings:
> * The server machine,
> * The server application (a program that runs on this machine).

The address (URL) you enter in the browser (e.g. `http://bechdeltest.com:8080/page1`) contains several elements, among which the most important ones are:
* The **hostname** or **domain** (`bechdeltest.com`), which gets converted to an IP address (`208.113.152.224`). You can also write the IP address directly instead of the hostname. The hostname or IP address makes it possible to find the **server machine** among all machines on the Internet.

* The **port number** (`8080`), which makes it possible to find the **server application** among all applications on the server machine.

* The **route** (`/page1`), which makes it possible to find the right **function** that will generate the page, among all functions coded in the server application.

When you (you are called the **client**) type the URL in a browser, the corresponding function of the corresponding application is called and it returns some text. This text is actually HTML code that is then interpreted by your browser as a page and displayed to you!
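
To make these three elements concrete, here is a minimal sketch that splits the example URL with Python's standard `urllib.parse` module. The variable names are only illustrative.

```python
from urllib.parse import urlsplit

# Split the example URL from the text into its components.
parts = urlsplit("http://bechdeltest.com:8080/page1")

hostname = parts.hostname  # identifies the server machine
port = parts.port          # identifies the server application
route = parts.path         # identifies the function to call

print(hostname, port, route)
```

The port comes back as an integer, while the hostname and the route are strings.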

## 2. Create a simple Web server with Flask

### 2.1. Very first example

Let's create a Python file that will contain our server. Call it `app1.py` and fill it with the following code:
```python
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "Hello!"
```

Let's try it first!
1. Open a terminal.
2. Activate a virtual environment if you have one (e.g. type `source venv/bin/activate` if your environment is in the `venv` folder). In any case, make sure that when you type `python` in your terminal, it is the right version of Python that shows up.
3. Type: 
```bash
flask --app app1 run --port 8080
```
4. Open a browser and go to [http://localhost:8080/](http://localhost:8080/). The name `localhost` is a special host name: it is a shorthand that means "your local computer". You can also use the corresponding (special) IP address, which is 127.0.0.1: [http://127.0.0.1:8080/](http://127.0.0.1:8080/).
5. You should see a page with the word "Hello!", which is the string returned by the `hello_world` function!

Let's take a look at what happened:
* The Python script `app1.py` is running indefinitely. It means that if you want to stop it, you have to hit CTRL+C (also CTRL on Macs, not Command) to kill it.
* With the `--port 8080` option, we chose the port we want to use. The port number must be unique on a machine (you cannot have two programs with the same port number). It is used from the outside when somebody connects to your computer, so that it knows which program to route the call to. This is why we append the port number to the hostname (or IP address), separated by a colon (e.g. `localhost:8080`). The default Web port (for HTTP) is 80, so these two are equivalent: just `localhost` or `localhost:80`.

Let's analyze the content of the `app1.py` file.
* First, we see a global variable `app = Flask(__name__)`. It is an object which refers to our Flask application (the one that we run with the `flask` command). On a side note, `__name__` is a special Python variable that contains the name of the module (here `app1` because the file is named `app1.py`). We pass this name to Flask, so that the name of the application is also `app1`.
* Then, the line `@app.route("/")` calls a method of `app` to declare a route. Actually, the `@` sign in Python means it is a _decorator_: it applies to the function that is defined just below and gives it some information. In this case, this is how we declare the mapping between the route `/` and the `hello_world` function.
* The route `/` is a special route called the _root_ of the server. It means that by default, when you just enter the URL of the server followed by nothing (or just by `/`), the function mapped to it (here `hello_world`) will be called. It is the homepage of a website.
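
To get an intuition of what such a decorator does, here is a toy sketch in plain Python. It is NOT how Flask is implemented: the `routes` dictionary, the `route` decorator and the `dispatch` function are hypothetical names that only mimic the idea of mapping a route to a function.

```python
# A toy registry that mimics the idea behind `@app.route` (NOT Flask's real code).
routes = {}

def route(path):
    """Return a decorator that records `func` as the handler for `path`."""
    def decorator(func):
        routes[path] = func
        return func  # the function itself is left unchanged
    return decorator

@route("/")
def hello_world():
    return "Hello!"

def dispatch(path):
    """Call the function registered for `path`, or report that none exists."""
    handler = routes.get(path)
    if handler is None:
        return "404 Not Found"
    return handler()

print(dispatch("/"))
print(dispatch("/missing"))
```

The decorator runs once, when the file is loaded, and only stores the mapping. The decorated function is called later, each time a matching request comes in.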

### 2.2. Create another route
Tip: You can add the `--debug` argument when you run Flask (e.g. `flask --app app1 run --port 8080 --debug`). Then, as soon as you modify `app1.py`, Flask will reload it automatically. Otherwise, you would have to kill Flask (CTRL+C from the terminal) and rerun it.

Now, let's add a new `morning` route:
```python
@app.route("/morning")
def good_morning():
    return "Good morning!"
```
Go to [http://localhost:8080/morning](http://localhost:8080/morning). You should see a blank page with "Good morning!". 

### 2.3. Route parameters

#### 2.3.1. One route parameter
What if we want to pass some information to the page? Or, to put it differently: how can we pass an argument to the function behind a route?

Let's add a new `evening` route:
```python
@app.route("/evening/<firstname>")
def good_evening(firstname):
    return f"Good evening, {firstname}!"
```
Then go to http://localhost:8080/evening/Andy. You should see a blank page with "Good evening, Andy!".

What we did here is define a parameter `firstname` with `<firstname>`. What is written in the URL after the slash gets automatically passed to the `good_evening` function through the `firstname` argument. We can then use its value, for instance to display it.

#### 2.3.2. Multiple route parameters

Of course, we can add as many parameters as we want on the same route. Let's try this:
```python
@app.route("/greetings/<period_of_day>/<firstname>")
def greetings(period_of_day, firstname):
    return f"Good {period_of_day}, {firstname}!"
```
Go to [http://localhost:8080/greetings/afternoon/Andy](http://localhost:8080/greetings/afternoon/Andy).

NB: What happens if you go to [http://localhost:8080/greetings/](http://localhost:8080/greetings/) or [http://localhost:8080/greetings/evening](http://localhost:8080/greetings/evening)?

It shows a _404 Not Found_ error. So we can see that **route parameters are mandatory**, i.e. the route does not exist if you don't specify all route parameters.

#### 2.3.3. Non-string route parameters
Last example: we want to create a route that computes the sum of two integers. By default, the route arguments are strings. But we can tell Flask to convert them to another type, here integer.

```python
@app.route("/add/<int:first>/<int:second>")
def add(first, second):
    return str(first + second)
```
Go to http://localhost:8080/add/3/5 and you should see the result.

NB: What happens if you go to [http://localhost:8080/add/3/hello](http://localhost:8080/add/3/hello)? Well, Flask tries to convert `hello` to an integer but obviously fails. So it returns again a _404 Not Found_ error without even calling the `add` function.

### 2.4. URL parameters

#### 2.4.1. First (optional) URL parameter
What if we want to add some optional parameters to a route?

We have just seen that it is impossible to achieve with route parameters, since they are mandatory. This is one reason to look at _URL parameters_ (sometimes also called _query parameters_).

Let's add the following function and route:
```python
from flask import request  # put this import at the top of `app1.py`

@app.route("/afternoon")
def good_afternoon():
    firstname = request.args['firstname']
    return f"Good afternoon, {firstname}!"
```
Go to http://localhost:8080/afternoon?firstname=Andy and you should see the result.

So what is this weird URL? Well, everything after the question mark (`?`) is what is called _URL parameters_.

Yes, the names are confusing. Everything before the question mark is the _route_ (hence _route parameters_)
and the whole address (everything before and after the question mark) is called the _URL_ (hence _URL parameters_).

And the syntax of the URL parameters is simple: `name=value`. Then we use the special global variable `request` (that we have imported beforehand with `from flask import request`). It contains a dictionary called `request.args` that contains all the URL parameters (the name of the variable is the key in the dictionary and the value of the variable is the value in the dictionary).

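Now, what happens if you go to [http://localhost:8080/afternoon](http://localhost:8080/afternoon) without any parameter? You get a `BadRequestKeyError` (more on it in the side note on errors below). It tells us that the key `firstname` does not exist in the dictionary `request.args`, which is indeed the case. How could we get rid of this error and make the parameter optional?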

It is simple, we just need to use the `get` method which works on dictionaries and allows us to specify a default value in the case when the key does not exist:

```python
@app.route("/afternoon2")
def good_afternoon2():
    firstname = request.args.get('firstname', 'you')
    return f"Good afternoon, {firstname}!"
```
Then, go to http://localhost:8080/afternoon2 and http://localhost:8080/afternoon2?firstname=Andy. Both should work as expected.

#### 2.4.2. A side note on errors
When you go to [http://localhost:8080/afternoon](http://localhost:8080/afternoon), you see the following error:
> werkzeug.exceptions.BadRequestKeyError: 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
> KeyError: 'firstname'

First, of course (as always in programming), errors are very important! They are here to help you understand what went wrong. So don't forget to look at them and try to understand them.

Then, notice that you see the error in two different places:
* In the terminal where you started Flask (this is the **server**),
* And in your browser (this is the **client**).

Actually, the error is displayed in the browser only because you ran Flask with the `--debug` option. If you remove it, then you only get a _400 Bad Request_ error (try it!).

This is because in real life (in production), for security reasons, we do not want to give too much information to the client about the internals of our code. Indeed, it would give information to an attacker and may ease their work in trying to attack our server. Or it may reveal sensitive information from our code that we do not want to expose.

But, in any case, the errors are always displayed in the server logs (i.e. in the terminal). This is where you should always look :)

#### 2.4.3. Multiple URL parameters
Of course, now we would like to add multiple URL parameters. It is fairly easy:

```python
@app.route("/substract")
def difference():
    first = int(request.args.get('first', '0'))
    second = int(request.args.get('second', '0'))
    return str(first - second)
```
When you go to [http://localhost:8080/substract?first=15&second=4](http://localhost:8080/substract?first=15&second=4) you should see the result.

We can check that parameters are indeed optional: if we omit one of them, it is set to the default value, which here is zero.

Make sure it works by going to [http://localhost:8080/substract?second=5](http://localhost:8080/substract?second=5) or even [http://localhost:8080/substract](http://localhost:8080/substract).

You probably noticed the new character `&` which is used to separate the two parameters. This is how we can add as many URL parameters as we want, separated by `&`, e.g. `name1=value1&name2=value2&name3=value3`.

Notice also that URL parameters are strings too. This time, we make the conversion from string to integer ourselves, with the `int` function (alternatively, `request.args.get` accepts a `type` argument, e.g. `request.args.get('first', 0, type=int)`, which performs the conversion for us).
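
To see what the URL parameters look like independently of Flask, here is a minimal sketch that parses the same URL with the standard `urllib.parse` module. The `third` parameter is hypothetical and only shows the default value at work.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://localhost:8080/substract?first=15&second=4"

# Everything after the `?` is the query string.
query = urlsplit(url).query

# parse_qs returns a dict mapping each name to a LIST of values,
# because the same name may appear several times in a URL.
params = parse_qs(query)

# Values are strings, so we convert them ourselves, with "0" as the default.
first = int(params.get("first", ["0"])[0])
second = int(params.get("second", ["0"])[0])
third = int(params.get("third", ["0"])[0])  # absent parameter, falls back to 0

print(first - second)
```

Flask does this parsing for us and exposes the result as `request.args`.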

### 2.5. From Web server to API

What is the difference between the Web server we just created and an API? Well, there is (almost) none.

> **An API is actually a Web server that, instead of returning text that is easy for a human to understand, returns data in a format that is easy for another program to understand (e.g. JSON)**.

On a side note, API means _Application Programming Interface_.

To rephrase it, a classical Web server is machine-to-human communication, whereas an API is machine-to-machine communication. The API aims at being called by another program to get some information.

The idea is very simple: we create a Web server (same steps as before), but instead of returning text, we return a dictionary. Flask automatically converts this dictionary into JSON (i.e. into the text version of the dictionary).
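
What "converting a dictionary into JSON" means can be illustrated with Python's standard `json` module. This is only a sketch of the idea: the exact text produced by Flask (key order, spacing) may differ.

```python
import json

payload = {"message": "Hello!", "hey": "I'm an API!"}

# Server side: dictionary -> JSON text (roughly what Flask does for us).
text = json.dumps(payload)

# Client side: JSON text -> dictionary (roughly what a client library does).
result = json.loads(text)

print(text)
print(result["hey"])
```

The JSON text is what travels over the network. Both ends only need to agree on this format, not on the programming language.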

Let's add a route to our Flask application:

```python
@app.route("/hello")
def hello_api():
    return {"message": "Hello!", "hey": "I'm an API!"}
```

Of course, you can go to your browser, type http://localhost:8080/hello and see the result in JSON format.

But there is more. The whole purpose of the API is that we can call it from another program. So let's try! We will call the API from here, from our Jupyter Notebook, using the `requests` library (the same you would use for instance to get any Web page content if you do some scraping).

```python
import requests

response = requests.get("http://localhost:8080/hello")

# The `response` object contains all the information about the response.
# Let's look first at the raw text (the same as the one we see in the browser)
response.text
```

```
'{"hey":"I\'m an API!","message":"Hello!"}\n'
```

```python
# The Requests library can automatically convert this text back into a Python dictionary
result = response.json()
result
```

```
{'hey': "I'm an API!", 'message': 'Hello!'}
```

```python
# Then we can get the value of each field as we want.
result['hey']
```

```
"I'm an API!"
```

## 3. A REST API to expose my data

Here is our scenario:
* We collected some data and put it into a database (could be also multiple databases, possibly from different technologies).
* We want to create an API that anyone from the "outside" can call to get our data in a clean and usable form (it can be someone from another team in our company or anyone external in the case of open data).

Our API will:
* Retrieve the data from whatever database it is stored in,
* Possibly transform it: merge, aggregate, filter.
* Format it in a way that is easy to understand for someone external.

### 3.1. Presentation of the Bechdel dataset

In this part of the tutorial, we will work on the Bechdel database. It is a mix of two data sources:
* The Bechdel test dataset: https://bechdeltest.com
* A subset of the IMDB datasets: https://developer.imdb.com/non-commercial-datasets/

The dataset contains around 10 000 movies (or TV show episodes), along with the people related to them (actors and actresses, film directors, etc.). Each movie is given a rating from 0 to 3, which corresponds to the number of criteria that the movie fulfills.

As a reminder, the Bechdel test, named after the American cartoonist Alison Bechdel, is a measure of the representation of women in a film (or other fiction). The test asks whether the film:
1. Has at least two (named) women characters
2. Who talk to each other
3. About something other than a man.

More on this on Wikipedia: https://en.wikipedia.org/wiki/Bechdel_test

Here is the Entity-Relation (ER) diagram of the database:

<img src="bechdel_er_diagram.png" width="300"/>

### 3.2. Import the data into a database

A MySQL dump of the data is in the [bechdel.sql](bechdel.sql) file, in the form of SQL queries (`CREATE`, `INSERT`, ...).

To import it into your local MySQL database, choose your favorite options (see the SQL lesson). Here are two possible ones:

Option 1 - Use _MySQL Workbench_:
* Connect to your local MySQL server (which must be running),
* Choose _Server_ > _Data import_ > _Import from self-contained file_ and choose the `bechdel.sql` file.
* In _Default Target Schema_ click _New_ and name it `bechdel`.
* Choose _Start import_.

Option 2 - In a terminal:
```bash
echo "CREATE DATABASE bechdel;" | mysql -u root -p
mysql -u root -p bechdel < bechdel.sql
```
Replace `root` with the username of your database if different. You can remove the `-p` if your database does not have a password.

You should obtain the 5 tables presented above. You can check it by generating an ER diagram (e.g. with the _reverse-engineer_ feature in MySQL Workbench) and comparing it to the one above.

### 3.3. What is a _REST_ API? What does it mean in our case?

The goal of a REST API is to expose our data in a way that is easy for someone to retrieve. REST is about conventions, so that everyone can easily understand how an API works. But not all of these conventions are strict, nor are they written down in a single book. So the way we present it here is one possible choice; other options may be equally valid, as long as they make sense.

The most important thing is to define the _resources_ that will be exposed. Note that:
* Each resource type will have its own route. In our case:
    * `/movies` for movies,
    * `/people` for people.<br/><br/>

* The REST resources do not necessarily have to match the entities (i.e. tables) from the SQL database (oftentimes, we will join several SQL tables together to create one REST resource). This is because:
    * SQL is about **optimizing storage** (do not store redundant information),
    * Whereas REST API is about **optimizing readability** (present the information in the most easy-to-use fashion).<br/><br/>

* Usually, the same route is used to retrieve all resources of a given type or just a single one. For instance:
    * Calling `/movies` will return all movies,
    * Calling `/movies/53825` will return the movie with ID `53825` (yes, the convention is to keep _movies_ in the plural form, even if it returns only a single one).

### 3.4. Before we start: how to retrieve the data from the SQL database?

The first thing we need to do is to find how to execute SQL queries from Python, to get the data we need.

You may have seen the Pandas `read_sql` function, which is very handy when it comes to simply getting a whole SQL table into a Pandas dataframe. Sadly, it will be of no use here, since:
* Sometimes, we will need only one row of the table,
* We will need to make joins (to get all the information about movies or people).

So let's use another library: `pymysql`. Of course, you need to install it first (`pip install pymysql` or `conda install pymysql` from a terminal, whichever package manager you are using).

> NB: We could have used another library like `sqlalchemy`, which provides an ORM (Object-Relational Mapper). But it would add yet another layer of abstraction, so we prefer to avoid it.

Here is a simple example of how to run an SQL query with `pymysql`:

```python
import pymysql

db_conn = pymysql.connect(
    host="localhost",
    user="root",  # also add password="yourpassword" if you have set up a password on your MySQL database
    database="bechdel",
    cursorclass=pymysql.cursors.DictCursor  # This makes pymysql return the result in a nice dictionary form
)

# A cursor is an object that we can execute a query on and that will handle data fetching for us.
# NB: A cursor is specific to one query and should NOT be reused for another query.
#
# The `with` block is a handy way of automatically closing the cursor as soon as we exit the block
with db_conn.cursor() as cursor:
    cursor.execute("SELECT * FROM Movies LIMIT 1")
    # Here `fetchone` retrieves only the first row from the results (in the form of a dictionary).
    # So we should only use it with queries that return 1 row (otherwise we would waste time querying multiple
    # rows and throwing them away).
    result = cursor.fetchone()

# Don't forget to close the connection as soon as you have finished. Otherwise, it will unnecessarily use
# resources on the MySQL database.
db_conn.close()
```

```python
# The result is indeed a dictionary that represents one row from the Movies table.
result
```

```
{'movieId': 1,
 'movieType': 'short',
 'primaryTitle': 'Carmencita',
 'originalTitle': 'Carmencita',
 'isAdult': 1,
 'startYear': 1894,
 'endYear': None,
 'runtimeMinutes': '1'}
```

## 4. The first real API route! 🙌

### 4.1. Return a single movie based on its ID
Now, let's wrap the SQL code in a Flask API.

Create a new `app2.py` file:

```python
from flask import Flask
import pymysql

app = Flask(__name__)

@app.route("/movies/<int:movie_id>")
def movie(movie_id):
    db_conn = pymysql.connect(host="localhost", user="root", database="bechdel",
                              cursorclass=pymysql.cursors.DictCursor)
    with db_conn.cursor() as cursor:
        cursor.execute("SELECT * FROM Movies WHERE movieId=%s", (movie_id, ))
        movie = cursor.fetchone()
    db_conn.close() 
    return movie
```
Let's try it! Go to http://localhost:8080/movies/17009710 in your browser (or you can use `requests` in Python if you prefer). You should see JSON corresponding to the film _Anatomie d'une chute_ by Justine Triet. Looks good, doesn't it?

> Maybe you noticed that we used the `%s` placeholder in the query instead of the movie ID, and that we gave the actual ID through the second parameter of the `execute` function (and **NOT** through direct string formatting or concatenation in Python!). This is for security reasons, to prevent SQL injections (more on this: https://en.wikipedia.org/wiki/SQL_injection).
>
> In short, if the user sends a weird movie ID (not an integer but a specially crafted string), it would be mixed with the rest of the query and could result in displaying unwanted data or even deleting information. When you pass the parameters to the `execute` function, it isolates them from the query and makes sure it is safe.
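
The difference between the two approaches can be illustrated with a minimal sketch. To keep it runnable without a MySQL server, it uses Python's built-in `sqlite3` module as a stand-in (an assumption of this example); note that `sqlite3` uses `?` as a placeholder where `pymysql` uses `%s`.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (movieId INTEGER, primaryTitle TEXT)")
conn.executemany("INSERT INTO Movies VALUES (?, ?)",
                 [(1, "Carmencita"), (17009710, "Anatomie d'une chute")])

malicious_id = "1 OR 1=1"  # a "weird movie ID" sent by an attacker

# UNSAFE: the input is pasted into the query and becomes part of the SQL.
unsafe_rows = conn.execute(
    f"SELECT * FROM Movies WHERE movieId={malicious_id}").fetchall()

# SAFE: the input is passed separately and treated as a plain value.
safe_rows = conn.execute(
    "SELECT * FROM Movies WHERE movieId=?", (malicious_id,)).fetchall()

conn.close()
print(len(unsafe_rows), len(safe_rows))
```

In the unsafe version, the `OR 1=1` part changes the meaning of the query, so every row of the table is returned. In the safe version, the whole string is compared as a single value and the structure of the query cannot be altered.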

### 4.2. Add the Bechdel score
This is already good, but we are missing the Bechdel score (which is the whole purpose of this dataset)!

How could we add it? Well, the simplest way is to add a `JOIN` in our query. Try to write it down (and test it in MySQL Workbench).

Here is the answer (replace the `SELECT` query in `app2.py` with this one):
```sql
SELECT * FROM Movies M
JOIN Bechdel B ON B.movieId = M.movieId 
WHERE M.movieId=%s
```

> Tip: You can use triple double-quotes (e.g. `"""Blah blah"""`) in Python to create strings that spread across multiple lines.

### 4.3. Add movie genre(s)
What about the movie genre(s)? Can we do the same as for the Bechdel score?

Well... not really. Since one movie can have multiple genres, doing a `JOIN` would duplicate the data, and this is not what we want (for each genre of the film, the title, year, etc. would be repeated). Instead, we want to add a `genres` field that is a list (of strings).

To do so, we will simply make another SQL query in Python. You can add this code in your route (just before closing the connection) and test it by going again to http://localhost:8080/movies/17009710:

```python
with db_conn.cursor() as cursor:
    cursor.execute("SELECT * FROM MoviesGenres WHERE movieId=%s", (movie_id, ))
    genres = cursor.fetchall()
movie['genres'] = [g['genre'] for g in genres]
```

Notice the use of the `fetchall` function. It works the same as `fetchone` but is suitable for queries that return multiple rows: it fetches all the rows in a list (each row still being a dictionary).

But be careful! Use `fetchall` only if the number of rows is not too big (typically a few dozen at most). Otherwise, for instance if you try `fetchall` on a `SELECT` query that returns 10 million rows, it will try to fetch them all at once, so it will most probably exhaust the memory of your MySQL server and/or your Flask application...
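
When a query may return many rows, one alternative is to read them in batches with `fetchmany`, which is part of the same Python database API as `fetchone` and `fetchall`. Here is a minimal sketch, again using the built-in `sqlite3` module as a stand-in for MySQL (an assumption of this example); `BATCH_SIZE` is a hypothetical value.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (movieId INTEGER)")
conn.executemany("INSERT INTO Movies VALUES (?)", [(i,) for i in range(250)])

BATCH_SIZE = 100  # hypothetical value, to be tuned to your memory budget

cursor = conn.execute("SELECT movieId FROM Movies ORDER BY movieId")
batch_sizes = []
while True:
    rows = cursor.fetchmany(BATCH_SIZE)  # at most BATCH_SIZE rows at a time
    if not rows:
        break  # no more rows
    batch_sizes.append(len(rows))  # process the batch here
cursor.close()
conn.close()

print(batch_sizes)
```

With `pymysql`, the default cursor classes buffer the whole result set on the client as soon as the query runs, so batching with `fetchmany` only saves memory in combination with an unbuffered cursor class such as `pymysql.cursors.SSDictCursor`.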

### 4.4. Add the people

Now we can do the same with people (actors and actresses, directors, etc.).

Try to write the Python and SQL code to make it work. Then you can compare to what we propose (code to be added at the end of the route, before closing the connection):

```python
with db_conn.cursor() as cursor:
    cursor.execute("""
        SELECT * FROM MoviesPeople MP
        JOIN People P on P.personId = MP.personId
        WHERE MP.movieId=%s
    """, (movie_id, ))
    people = cursor.fetchall()
movie['people'] = people
```

Let's test it! http://localhost:8080/movies/17009710

### 4.5. Some cleanup and formatting

You may have noticed that:
* Some `null` fields are returned. We would like to get rid of them.
* Some fields seem useless or redundant and could be removed.
* Some fields could be renamed for more clarity.
* Some fields could be added for more clarity.

We choose to:
* Remove the `endYear` field, which is very rarely filled, and therefore rename `startYear` to simply `year`,
* Remove the `job` field, which is either null or redundant with the `category` (renamed `role`). Instead, we could have chosen to build a field from both values (e.g. if `job`, which is more specific, is not null, then use `job`; otherwise use `category`). We leave this refinement for you to do if you want to.
* Rename `primaryTitle` to `englishTitle`, `rating` to `bechdelScore` and `primaryName` to `name`.
* Add a boolean `bechdelTest` field which is `True` if the movie passes the Bechdel test and `False` otherwise.

To do so, the easiest way is to specify directly the fields we want to add and/or rename in the SQL queries.

Let's replace the first (movie) query by this one:
```sql
SELECT
    M.movieId,
    M.originalTitle,
    M.primaryTitle AS englishTitle,
    B.rating AS bechdelScore,
    M.runtimeMinutes,
    M.startYear AS year,
    M.movieType,
    M.isAdult
FROM Movies M
JOIN Bechdel B ON B.movieId = M.movieId 
WHERE M.movieId=%s
```

And replace the last (people) query by that one:
```sql
SELECT
    P.personId,
    P.primaryName AS name,
    P.birthYear,
    P.deathYear,
    MP.category AS role
FROM MoviesPeople MP
JOIN People P on P.personId = MP.personId
WHERE MP.movieId=%s
```

About the `null` fields, we can create a small function to remove them:

```python
def remove_null_fields(obj):
    return {k:v for k, v in obj.items() if v is not None}
```

Then we can use this function in the first (movie) part:
```python
movie = remove_null_fields(movie)
```
And in the last (people) part:
```python
movie['people'] = [remove_null_fields(p) for p in people]
```
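
The queries above do not produce the boolean `bechdelTest` field mentioned in the list of choices. One possible way to add it is in Python, as in the sketch below. It assumes that a movie passes the test when it fulfills all 3 criteria, i.e. when `bechdelScore` equals 3. The `add_bechdel_test` helper is a hypothetical name, and the values in the sample row are made up for the illustration.

```python
def remove_null_fields(obj):
    return {k: v for k, v in obj.items() if v is not None}

def add_bechdel_test(movie):
    """Return a copy of `movie` with a boolean `bechdelTest` field.

    Assumption: a movie passes the test when it meets all 3 criteria,
    i.e. when `bechdelScore` equals 3.
    """
    result = dict(movie)
    result["bechdelTest"] = result.get("bechdelScore") == 3
    return result

# Hypothetical row (made-up values), shaped like the output of the movie query.
row = {"movieId": 1, "englishTitle": "Carmencita", "bechdelScore": 3,
       "year": 1894, "runtimeMinutes": None}

movie = add_bechdel_test(remove_null_fields(row))
print(movie)
```

In the route, this would be applied to `movie` right after `remove_null_fields`.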

### 4.6. Handling errors

What happens if you go to [http://localhost:8080/movies/999](http://localhost:8080/movies/999)?

You get this error (in the terminal where you launched Flask from):
```
  File "app2.py", line 32, in movie
    movie = remove_null_fields(movie)
  File "app2.py", line 8, in remove_null_fields
    return {k:v for k, v in obj.items() if v is not None}
AttributeError: 'NoneType' object has no attribute 'items'
```

It is important to look at it closely to understand what it tells us.

In the `movie` function, when we call `remove_null_fields`, the variable `movie` (called `obj` inside the `remove_null_fields` function) is `None` (and therefore `obj.items()` throws an error since we cannot do `.anything` on a `None`).

But where does this `None` come from? Well, there is no movie with ID `999`. And if we look at [the documentation of the `fetchone` function](https://peps.python.org/pep-0249/#fetchone), we see that it returns `None` when the `SELECT` query returns no result, which is the case here.

> NB: When any Python error occurs within a route function, Flask returns by default a _500 Internal Server Error_, which roughly means: "something went wrong on the server".
>
> It is perfectly fine to do so if a _truly_ unexpected error happens. In this case, we should look at the logs in the terminal and try to find the cause.

But here, it is perfectly normal and expected that someone queries the `/movies` route with an ID that does not exist. Therefore, we should return a _404 Not Found_ error, to tell that no corresponding resource has been found.

We can do so very easily with:
```python
from flask import abort

# ... then, inside the `movie` function, before calling `remove_null_fields(movie)`:
if not movie:
    abort(404)
```

### 4.7. Authentication

Our first route works well now, but even... too well! If we deploy it in production, then anybody could access the data. If the data is open data, then it is probably something we want. But if there is personal or sensitive data, then definitely, we want to restrict access to our API. How can we do so?

Let's install a library called `Flask-BasicAuth` that serves that purpose (do a `pip install Flask-BasicAuth` or `conda install Flask-BasicAuth`).

Then, we have to slightly change the beginning of our app:

```python
import json

from flask import Flask
from flask_basicauth import BasicAuth

app = Flask(__name__)
app.config.from_file("flask_config.json", load=json.load)
auth = BasicAuth(app)
```

We also need to add, just after each `@app.route(...)`:
```python
@auth.required
```


Notice that we load a file called `flask_config.json`, which looks like this:
```json
{
    "BASIC_AUTH_USERNAME": "ironhack",
    "BASIC_AUTH_PASSWORD": "ilovedata"
}
```

We chose to do so (instead of putting the credentials directly in the `app2.py` file) for security reasons. Indeed, we most probably would like to commit `app2.py` in a Git repository, but we do not want the credentials to be committed along with it.

This way, we can add `flask_config.json` to our `.gitignore` file and keep it local.

The best practice is to also create another file, say `flask_config.template.json` (this one we commit), which looks like:
```json
{
    "BASIC_AUTH_USERNAME": "myusername",
    "BASIC_AUTH_PASSWORD": "mypassword"
}
```
This way, the person who clones the repository simply has to duplicate and modify this template file.
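
It can help to know what the client actually sends with this kind of authentication. In HTTP Basic authentication, the username and password are joined with a colon, base64-encoded and sent in the `Authorization` header of each request. Here is a minimal sketch using only the standard library, with the example credentials from `flask_config.json`:

```python
import base64

username, password = "ironhack", "ilovedata"  # values from flask_config.json

# HTTP Basic authentication: "username:password", base64-encoded,
# sent in the `Authorization` header of each request.
token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
headers = {"Authorization": f"Basic {token}"}

# base64 is an encoding, not encryption: anyone can decode it.
decoded = base64.b64decode(token).decode("utf-8")

print(headers)
print(decoded)
```

In practice you rarely build this header by hand: with the `requests` library, `requests.get(url, auth=("ironhack", "ilovedata"))` does it for you. Since the credentials are only encoded and not encrypted, Basic authentication should be used over HTTPS in production.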

> NB: Instead of adding `@auth.required` for each route, we could also protect all routes at once, simply by adding one variable to the configuration file: `"BASIC_AUTH_FORCE": true`.
>
> See https://flask-basicauth.readthedocs.io for more information. 

## 6. The second route: return all movies

Now it is time to write the second route: `/movies` (without ID). We expect it to return all the movies that exist in our database.

Of course, if you paid attention to what we said before, **you should NOT return all of the nearly 10K movies at once**! We will implement a pagination mechanism to batch the results.

### 6.1. Pagination

#### 6.1.1. First paginated route
Here is the simplest example that returns all movies with a pagination mechanism:

```python
from flask import request
    ...

PAGE_SIZE = 100

@app.route("/movies")
def movies():
    page = int(request.args.get('page', 0))
    db_conn = pymysql.connect(host="localhost", user="root", database="bechdel",
                              cursorclass=pymysql.cursors.DictCursor)
    with db_conn.cursor() as cursor:
        cursor.execute("""
            SELECT * FROM Movies
            ORDER BY movieId
            LIMIT %s
            OFFSET %s
        """, (PAGE_SIZE, page * PAGE_SIZE))
        movies = cursor.fetchall()
    db_conn.close()
    return {'movies': movies}
```

You can test it with:
* http://localhost:8080/movies
* http://localhost:8080/movies?page=1
* http://localhost:8080/movies?page=2
* etc.

Notice a couple of important things:
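* Depending on its version, Flask may not accept a list as a return value (support for returning lists as JSON was added in Flask 2.2), whereas a dictionary always gets converted to JSON. Returning `{'movies': movies}` rather than just `movies` works everywhere and leaves room for additional fields later on.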

* We use two SQL clauses: `LIMIT` and `OFFSET`:
    - `LIMIT` sets the number of results that are returned (i.e. PAGE_SIZE),
    - `OFFSET` decides how many of the first items we skip (it is `page * PAGE_SIZE`).<br/><br/>

* The order of the results in SQL is not guaranteed to be consistent from query to query. So the meaning of "the X first items" can change from query to query! This is why we must not forget the `ORDER BY` clause, so that items are always sorted the same way.

* We use URL parameters instead of route parameters, i.e. we do `/movies?page=5` and not `/movies/5`. Of course, the first reason is to avoid confusion with the previous route `/movies/<movie_id>` that returns a single movie. But also in the REST paradigm, route parameters are reserved for actual resources (i.e. movies, not batches of movies).

* The `page` parameter is optional. If not specified, then it defaults to `0`.
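
The effect of `LIMIT` and `OFFSET` is easy to observe on a tiny table. The sketch below uses the built-in `sqlite3` module as a stand-in for MySQL (an assumption of this example, with `?` placeholders instead of `%s`) and a deliberately small page size; `get_page` is a hypothetical helper.

```python
import sqlite3

PAGE_SIZE = 3  # small value so that the pages are easy to read

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (movieId INTEGER)")
conn.executemany("INSERT INTO Movies VALUES (?)", [(i,) for i in range(1, 9)])

def get_page(page):
    """Return the movie IDs of the given page (pages are numbered from 0)."""
    rows = conn.execute(
        "SELECT movieId FROM Movies ORDER BY movieId LIMIT ? OFFSET ?",
        (PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()
    return [row[0] for row in rows]

pages = [get_page(p) for p in range(4)]
conn.close()
print(pages)
```

With 8 rows and a page size of 3, the last non-empty page is shorter than the others, and any page beyond it is simply empty.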

#### 6.1.2. Enhancements to the pagination mechanism

We can think of several enhancements to our pagination mechanism:
1. It is a good practice to **let the client choose the number of results per page** (i.e. page size). Of course, it cannot be larger than some maximum size, that we hardcode.

2. Another good practice is to give the client:
    * The URL they need to call to get the next page,
    * The URL of the last page.

Let's do it with this updated version of the `movies` route:
```python
import math
from flask import request
...

MAX_PAGE_SIZE = 100  # both the default and the maximum page size

@app.route("/movies")
def movies():
    page = int(request.args.get('page', 0))
    page_size = int(request.args.get('page_size', MAX_PAGE_SIZE))
    page_size = min(page_size, MAX_PAGE_SIZE)

    db_conn = pymysql.connect(host="localhost", user="root", database="bechdel",
                              cursorclass=pymysql.cursors.DictCursor)
    with db_conn.cursor() as cursor:
        cursor.execute("""
            SELECT * FROM Movies
            ORDER BY movieId
            LIMIT %s
            OFFSET %s
        """, (page_size, page * page_size))
        movies = cursor.fetchall()

    with db_conn.cursor() as cursor:
        cursor.execute("SELECT COUNT(*) AS total FROM Movies")
        total = cursor.fetchone()
        # Pages are numbered from 0, so the last page index is the page count minus 1
        last_page = max(math.ceil(total['total'] / page_size) - 1, 0)

    db_conn.close()
    return {
        'movies': movies,
        'next_page': f'/movies?page={page+1}&page_size={page_size}',
        'last_page': f'/movies?page={last_page}&page_size={page_size}',
    }
```

A couple of remarks:
* The function `math.ceil` rounds up to the next integer (e.g. `math.ceil(3.1) == 4`), which is what we want in order to account for a partially filled last page (when the total number of items is not a multiple of the page size).

* The `PAGE_SIZE` constant becomes `MAX_PAGE_SIZE`. It is both the default value if `page_size` is not given and the maximum value (this is why we take the `min`).

* We get the total number of films with a separate (fairly fast) SQL `COUNT` query.
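
The page arithmetic can be checked in isolation. The sketch below assumes pages are numbered from 0 (as in the route, where `page` defaults to `0`), so the index of the last non-empty page is the number of pages minus one. The helper names `last_page_index` and `pagination_links` are hypothetical, and returning `None` for `next_page` on the last page is one possible design choice among others.

```python
import math


def last_page_index(total, page_size):
    """Index of the last non-empty page, with pages numbered from 0."""
    number_of_pages = math.ceil(total / page_size)
    return max(number_of_pages - 1, 0)


def pagination_links(page, page_size, total):
    """Build the 'next_page' and 'last_page' URLs for the /movies route."""
    last_page = last_page_index(total, page_size)
    # No next page once the client has reached the last one
    if page < last_page:
        next_page = f"/movies?page={page + 1}&page_size={page_size}"
    else:
        next_page = None
    return {
        "next_page": next_page,
        "last_page": f"/movies?page={last_page}&page_size={page_size}",
    }


print(last_page_index(10, 3))  # 3 (pages 0, 1, 2 and 3)
print(last_page_index(9, 3))   # 2 (pages 0, 1 and 2)
print(pagination_links(0, 3, 10)["next_page"])  # /movies?page=1&page_size=3
print(pagination_links(3, 3, 10)["next_page"])  # None
```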

#### 6.1.3. The real movies route with pagination

Now take some time on your own to write a `/movies` route that combines:
* What we just saw about pagination,
* What we did before for the single-movie route (adding Bechdel score, people and genres).

You will also add an `include_details` parameter which takes value `0` or `1` (default is `0`, i.e. skip the details) to let the user decide explicitly whether or not to include the people and the genres (but the Bechdel score will always be included). Indeed, the user may choose to skip the details to speed the query up.

You will find a working example in [app2_answers.py](app2_answers.py).

Some remarks:

* To retrieve people and genres, you may be tempted to make one SQL query per film. It is possible but highly suboptimal in terms of time (there is a large overhead for each query). So the best option is to make a single query using `WHERE movieId IN (%s, %s, %s, %s)` to retrieve all people / genres that match the list of movie IDs.

* Then you need to group the people (and genres) by movie ID, so that you can get them back all at once later on. To do so, we use `defaultdict` (more specifically `defaultdict(list)`). It is simply a dictionary, but when a key does not exist, it acts as if it were there, with the default value `[]`. So you can always do `the_dict['somekey'].append(something)`.
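
As an illustration of these two remarks, here is a minimal sketch that builds the `IN (...)` placeholders and groups rows with `defaultdict(list)`. The table name `MoviePeople`, the column `name` and the rows themselves are hypothetical (they stand in for what a `DictCursor` would return), so the example runs without a database.

```python
from collections import defaultdict

movie_ids = [21148, 21156, 21165]

# One "%s" per movie ID, e.g. "%s, %s, %s" for three IDs.
# Only the placeholders are formatted into the query, never the values.
placeholders = ", ".join(["%s"] * len(movie_ids))
query = f"SELECT movieId, name FROM MoviePeople WHERE movieId IN ({placeholders})"
# With pymysql, this would be run as: cursor.execute(query, movie_ids)

# Hypothetical rows, shaped like the dictionaries a DictCursor returns
rows = [
    {"movieId": 21148, "name": "Person A"},
    {"movieId": 21156, "name": "Person B"},
    {"movieId": 21148, "name": "Person C"},
]

# Group the names by movie ID: missing keys start as an empty list
people_by_movie = defaultdict(list)
for row in rows:
    people_by_movie[row["movieId"]].append(row["name"])

print(query)
print(people_by_movie[21148])  # ['Person A', 'Person C']
print(people_by_movie[21165])  # [] (no people found for this movie)
```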

## 7. API documentation - SwaggerUI and OpenAPI

> The **OpenAPI Specification** is a specification for a machine-readable interface definition language for describing, producing, consuming and visualizing web services (source: https://en.wikipedia.org/wiki/OpenAPI_Specification).

We will use it to generate a Web page which will serve two purposes:
1. It is the **documentation** of the API for anyone who would like to use it.
2. One can **call** the API directly from this page with a nice visual interface.

The specification can be written in JSON or YAML format. We chose [YAML](https://en.wikipedia.org/wiki/YAML) and we wrote it in this file: [static/openapi.yaml](static/openapi.yaml).

We will use a popular tool called `Swagger UI` (`Swagger` also happens to be the previous name of `OpenAPI`...), which renders the nice Web page that we want.

And in Flask, we can (there are other ways of course) install `flask_swagger_ui` (`pip install flask_swagger_ui` or `conda install flask_swagger_ui`). Then we add these lines at the beginning of `app2.py`:

```python
from flask_swagger_ui import get_swaggerui_blueprint
...
swaggerui_blueprint = get_swaggerui_blueprint(
    base_url='/docs',
    api_url='/static/openapi.yaml',
)
app.register_blueprint(swaggerui_blueprint)
```

Two remarks:
* The files in the `static` folder are automatically served by Flask. This is why we put our `openapi.yaml` file there.

* It adds a new `/docs` route that will display the Swagger UI. Note that it is not protected with authentication (which is probably desirable, since we want the documentation to remain open).

Now, let's try! Go to http://localhost:8080/docs and play with the UI!
<br/><br/>

We recommend you take some time to read and understand the `openapi.yaml` file.

Note that:
* The `info` block contains general information about the API (name, description, etc.).

* In the `paths` block, each route is described by its own sub-block, with its expected inputs and outputs, but also the possible errors.

* The `components` block contains objects (here `Movie` and `Person`) that can be reused multiple times (e.g. with `$ref: '#/components/schemas/Movie'`). It avoids duplicating the description of the objects.

You can copy-paste the contents of `openapi.yaml` into https://editor.swagger.io, which is a very nice tool to facilitate the creation and editing of OpenAPI specification files (which is otherwise quite tedious...). It will also tell you where the mistake is if you make any (which is very handy!).

## 8. Calling the API with Requests

Now, if we put ourselves in the shoes of an external user who would like to call our API from their own program, they can for instance use Python and the Requests library, as in the code at the end of this section.

But the whole point of making a REST API is that it is language-agnostic. It means that the client can use any language they want (not necessarily the one _you_ used to develop it) to make the query. So it makes your data available to anyone :)

```python
import requests
from requests.auth import HTTPBasicAuth

response = requests.get("http://localhost:8080/movies?page=50&page_size=3",
                        auth=HTTPBasicAuth(username="ironhack", password="ilovedata"))
response_json = response.json()
response_json['movies']
```

```
[{'bechdelScore': 3,
  'englishTitle': 'Min and Bill',
  'isAdult': 1,
  'movieId': 21148,
  'movieType': 'movie',
  'originalTitle': 'Min and Bill',
  'runtimeMinutes': '69',
  'year': 1930},
 {'bechdelScore': 1,
  'englishTitle': 'Morocco',
  'isAdult': 1,
  'movieId': 21156,
  'movieType': 'movie',
  'originalTitle': 'Morocco',
  'runtimeMinutes': '92',
  'year': 1930},
 {'bechdelScore': 3,
  'englishTitle': 'Murder!',
  'isAdult': 1,
  'movieId': 21165,
  'movieType': 'movie',
  'originalTitle': 'Murder!',
  'runtimeMinutes': '104',
  'year': 1930}]
```

## 9. Final considerations

### 9.1. Possible improvements
This is of course a simplistic API and a lot could be improved.

We leave it as an exercise for you to practice :)

#### 9.1.1. Other routes
It would be nice to add two other routes to retrieve people, the same way we did it for movies: `/people` and `/people/<person_id>`.

#### 9.1.2. Filtering
A nice feature to have would be to allow filtering in the `/movies` (resp. `/people`) route. Instead of returning all movies (resp. people), the user could choose to filter on one or multiple fields.

For instance, a call could look like:
```
http://localhost:8080/movies?originalTitle=anatomie&year=2023
```

Of course, it remains to be defined how the filtering exactly behaves. We could for instance use:
* Exact match for integer fields like `year`,
* And "contains" for string fields like `originalTitle` (in SQL it would be: `originalTitle LIKE '%anatomie%'`).
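
One possible way to implement this behavior is sketched below. It only builds the `WHERE` clause and its parameters, so that the values are still passed separately to `cursor.execute` (as we did with `LIMIT` and `OFFSET`). The whitelist `FILTERABLE_FIELDS` and the helper `build_filter_clause` are hypothetical names, and restricting filters to a fixed list of columns is an assumption made to avoid injecting arbitrary column names into the query.

```python
# Hypothetical whitelist: column name -> kind of match
FILTERABLE_FIELDS = {"year": "exact", "originalTitle": "contains"}


def build_filter_clause(args):
    """Turn URL parameters into a SQL WHERE clause and its parameters."""
    conditions, params = [], []
    for field, kind in FILTERABLE_FIELDS.items():
        if field not in args:
            continue
        if kind == "exact":
            conditions.append(f"{field} = %s")
            params.append(int(args[field]))
        else:
            # "contains": surround the value with SQL wildcards
            conditions.append(f"{field} LIKE %s")
            params.append(f"%{args[field]}%")
    where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
    return where, params


# In a Flask route, `args` would be `request.args`
where, params = build_filter_clause({"originalTitle": "anatomie", "year": "2023"})
print(where)   # WHERE year = %s AND originalTitle LIKE %s
print(params)  # [2023, '%anatomie%']
```

The resulting clause would then be inserted between `SELECT * FROM Movies` and `ORDER BY movieId`, with `params` placed before the `LIMIT` and `OFFSET` values.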

#### 9.1.3. Read-write
Our API is read-only. But we may want to allow some people to add or update data.

To do so, we would create new routes with different _methods_. All the routes we created so far use the `GET` method. But there are other ones:
* `POST` and `PUT` are used to create or update an object.
* `DELETE` is used to delete an object.

HTTP methods can be seen as "words" that we put before the URL. In other words, when you type `http://localhost:8080/movies/17009710` in your browser, under the hood, it makes this HTTP query:
```
GET http://localhost:8080/movies/17009710
```

So the same route name `/movies/<movie_id>` can be reused with another method, e.g. to update the movie or create a new one if it does not exist yet:
```
POST http://localhost:8080/movies/17009710
```

In Flask, it would look like:
```python
@app.route("/movies/<int:movie_id>", methods=["POST"])
def post_movie(movie_id):
    # Placeholder body: read the data sent by the client, then create or update the movie
    ...
```
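
Seen from the client side, the method is just one more field of the request. The sketch below uses the standard library module `urllib.request` (rather than Requests, only to stay dependency-free) to build a `GET` and a `POST` request for the same URL without sending them. The JSON body `{"year": 2023}` is a hypothetical payload.

```python
import json
import urllib.request

url = "http://localhost:8080/movies/17009710"

# Without data or an explicit method, urllib builds a GET request
get_request = urllib.request.Request(url)

# Same URL, different method: the JSON body carries the (hypothetical) new values
body = json.dumps({"year": 2023}).encode("utf-8")
post_request = urllib.request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url)
# To actually send a request: urllib.request.urlopen(post_request)
```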

### 9.2. Real-life production setup
Before actually deploying it in production, there would be some mandatory steps to follow.

We mention the most important ones here, even if they are beyond the scope of this course.

#### 9.2.1. Public binding

By default, our API is "bound" to the local IP address (`127.0.0.1`, aka `localhost`), which means it is only accessible from our own computer.

If we deploy it in production, we must "open" it to the outside world, i.e. "bind" it to the public IP address of our computer (or of whatever server we deploy it to).

Fortunately, there is a special IP address, `0.0.0.0`, that stands for all the network interfaces of the machine, including the public one. So you just need to run:
```bash
flask --app app2 run --host 0.0.0.0 --port 8080
```

#### 9.2.2. Real backend (WSGI)
You may have noticed a warning from Flask:
> WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.

Indeed, the performance of the server we run with the `flask` command is quite poor and it is not reliable. It is designed for development and debugging.

In real life, we would use another server (in Python, such servers are called WSGI servers). A popular one is **Gunicorn**, which we highly recommend. More about it: https://docs.gunicorn.org

#### 9.2.3. HTTPS

You may also have noticed that the API is in HTTP, not HTTPS. It means the connection between the server and the client is not encrypted, which is a *very bad practice*.

Before putting the API in production, we must enable HTTPS.

This part can be handled by Gunicorn itself or by a reverse proxy placed in front of it (but first, you would need to obtain a certificate).

#### 9.2.4. Authentication
The _basic authentication_ we used (login and password) is not very secure. Indeed, everybody is using the same login and password.

So the first mandatory step would be to create a separate login and password for each user. Then we could even switch to a more secure type of authentication, such as [OAuth2](https://en.wikipedia.org/wiki/OAuth).

#### 9.2.5. API versioning
What happens if you deploy your API and then decide to change a route (rename, change parameters, etc.)? Well, the code written by your clients who call your API breaks.

To avoid this, it is important to version your API.

It simply means that instead of `http://localhost:8080/movies/17009710` you have something like `http://localhost:8080/v1/movies/17009710`.

This way, you can have different versions in production at the same time. This gives your clients some time to update their code. And eventually, you can shut down the old versions, after having properly warned them.
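
One way to do this in Flask (a sketch, certainly not the only option) is to group the routes of each version in a `Blueprint` with its own `url_prefix`. The two versions below and their response fields are hypothetical; they only illustrate that a renamed field in `v2` does not break clients that still call `v1`.

```python
from flask import Blueprint, Flask

app = Flask(__name__)

# Version 1 of the API: everything lives under /v1
v1 = Blueprint("v1", __name__, url_prefix="/v1")


@v1.route("/movies/<int:movie_id>")
def get_movie_v1(movie_id):
    return {"movieId": movie_id, "apiVersion": 1}


# Version 2 renames a field, without touching version 1
v2 = Blueprint("v2", __name__, url_prefix="/v2")


@v2.route("/movies/<int:movie_id>")
def get_movie_v2(movie_id):
    return {"id": movie_id, "apiVersion": 2}


app.register_blueprint(v1)
app.register_blueprint(v2)

# Quick check with Flask's built-in test client (no server needed)
client = app.test_client()
print(client.get("/v1/movies/17009710").get_json())
print(client.get("/v2/movies/17009710").get_json())
```

Shutting down an old version then amounts to no longer registering its blueprint.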