## Extra topics: Web


fisa (@fisadev, fisadev@gmail.com)

https://github.com/fisadev/python-basic-course

# Class 4 overview

Topic: web

- The web vs the rest
- HTTP, details on requests/responses
- Body contents: HTML, CSS, JS, JSON, ...
- Server side vs client side (rendering, apps, etc)?
- Browser dev tools
- Stateless, cookies
- Web APIs, JSON
- HTTPS
- HTTP 2.0
- Usual infrastructure
- Example with a very simple microframework

## The web vs the rest

- Client + server
- Request + response
- No longer everything in the same process
- Connection dies after each cycle, server can restart, multiple servers, etc. Forget about state
- Mix of different languages: server logic, presentation logic, content, visual style
- You cannot trust incoming requests anymore

![](./request_response.svg)


Stuff like this? Forget about it:

(pseudo code, just to get the idea)

```python
class ClientEditorWindow:
    def __init__(self):
        self.add_button(some_position, some_text, color, alignment)

    def on_pick_client_click(self):
        self.current_client = Clients.objects.get(id=client_id_picker.value)

    def on_save_click(self):
        if all_fields_filled:
            self.current_client.name = textbox_name.text
            self.current_client.birth = date_picker_birth.date
            self.current_client.save()
        else:
            message("Missing fields!")
```

- Accessing window widgets from the server logic? nope
- Keeping state between actions? nope (or not that simple)
- Retrying by just "doing nothing"? not actually
- One single language for logic+ui+style??? Pfffff...

Instead, your code will look more like this:

(still pseudo code, just to get the idea)

website.py:

```python
def view_client(request, client_id):
    client = Client.objects.get(id=client_id)
    return response("client_page.html", replacing_with_data_from=client)

def save_client(request, client_id, sent_data):
    client = Client.objects.get(id=client_id)  # yes! again!
    client.name = sent_data.name
    client.birth = convert_to_date(sent_data.birth)
    client.save()
    return response("client_page.html", replacing_with_data_from=client)  # again!
```

plus the content to display! (still pseudo code)

client_page.html:

```html
<h3>client details: {{name of client here}}</h3>
<form action="url that points to save_client() code">
    <.... widgets to display name, birth, with the client data ...>
    <... save button ...>
</form>
```

plus logic for the browser! (still pseudo code)

client_page.js:

```javascript
function on_save_click() {
    if (all data filled) {
        send_request_with_form_data()
    } else {
        alert("missing fields!!");
    }
}
```

plus style! (still pseudo code)

client_page.css:

```css
#client_form #save_button {
    color: #FF0000;
    vertical-align: middle;
}
```

## HTTP

- It's the protocol used in the connections
- Usually runs over TCP/IP
- Specifies the format for request and responses
- Stateless: each request/response cycle is independent

## HTTP steps

### If you are the client:

1. Establish a TCP/IP connection with a machine and a port
2. Send text over the connection, in the format of a request
3. Wait and read text over the connection, parse it with the response format
4. Close the connection

### If you are the server:

1. Keep a program running, listening to a port in your machine (server)
2. When someone connects through that port via TCP/IP, read text over the connection and parse it with the request format
3. Do whatever you need to solve the request, then send text over the connection, in the format of a response
4. Close the connection
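
The client and server steps above can be sketched with plain sockets. This is only a rough illustration, assuming a server that answers a single request with a fixed text; the function name `serve_one_request` and the response body are made up for the example, and a real server would parse the request and loop forever.

```python
import socket
import threading


def serve_one_request(server_socket):
    """Server steps 2 to 4: accept a connection, read the request, answer, close."""
    connection, _address = server_socket.accept()
    with connection:
        request_text = connection.recv(4096).decode("utf-8")
        # A real server would parse request_text and decide what to answer.
        body = "hello from the server"
        response_text = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain; charset=utf-8\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n"
            f"{body}"
        )
        connection.sendall(response_text.encode("utf-8"))


# Server step 1: listen on a port (port 0 lets the OS pick a free one).
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.bind(("127.0.0.1", 0))
server_socket.listen(1)
port = server_socket.getsockname()[1]
server_thread = threading.Thread(target=serve_one_request, args=(server_socket,))
server_thread.start()

# Client steps 1 to 4: connect, send request text, read response text, close.
with socket.create_connection(("127.0.0.1", port)) as client_socket:
    client_socket.sendall(b"GET /teoria/ HTTP/1.1\r\nHost: localhost\r\n\r\n")
    chunks = []
    while True:
        chunk = client_socket.recv(4096)
        if not chunk:  # the server closed the connection
            break
        chunks.append(chunk)
raw_response = b"".join(chunks).decode("utf-8")

server_thread.join()
server_socket.close()

status_line = raw_response.split("\r\n")[0]
print(status_line)
```

Both sides only ever see text going through a TCP connection; everything else (verbs, headers, status codes) is a convention about how that text is formatted.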

## Sample request

```
GET /teoria/ HTTP/1.1
Host: www.loiprocesos.com
Accept-Charset: utf-8
```

Parts:

- **Request line**: Verb (GET/POST/DELETE/...), resource (url) of the thing you want to interact with, and protocol version (in this case: "get the /teoria/ page using http v1.1")
- **Headers**: metadata about what you want and how you want it, who you are, etc.
- **Body**: stuff being sent, like files, form data, etc

Note: the url in the request line **doesn't** include the server or the port. Those were already used to open the TCP/IP connection. The Host header is there to tell the server which site you want to use, in case there are many sites listening on the same port.
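
To make the three parts concrete, here is a minimal sketch that splits a raw request into request line, headers and body. The helper `parse_request` and the sample request are invented for the example, and real parsers handle many more edge cases (repeated headers, bytes instead of text, etc).

```python
def parse_request(raw_request):
    """Split raw HTTP request text into request line parts, headers and body."""
    # An empty line separates the request line + headers from the body.
    head, _separator, body = raw_request.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # Request line: verb, resource and protocol version.
    verb, resource, version = lines[0].split(" ")

    # Headers: one "Name: value" pair per line.
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return verb, resource, version, headers, body


# Hypothetical request, similar to what a browser sends when submitting a form.
raw_request = (
    "POST /clients/42/ HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 9\r\n"
    "\r\n"
    "name=Fisa"
)

verb, resource, version, headers, body = parse_request(raw_request)
print(verb, resource, version)
print(headers["host"])
print(body)
```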

## Sample response

```
HTTP/1.1 200 OK
Server: gunicorn/19.9.0
Date: Thu, 27 Dec 2018 04:12:50 GMT
Content-Type: text/html; charset=utf-8

27F6
<html>
    <h3>Procesos industriales: Teoría</h3>
    ... the page ...
</html>
```

Parts:

- **Response line**: http version plus "type" of result (code and description). 200=ok, 404=not found, 500=server error, etc.
- **Headers**: metadata of what you are getting as a response (data type, encoding, date, etc)
- **Body**: the actual stuff you requested (in this case, the web page)
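
Python ships the standard status codes in `http.HTTPStatus`, which is handy to see how the response line maps to a code, a description and a "family" of results. A small sketch; the helper `describe_status_line` is made up for the example.

```python
from http import HTTPStatus

# The first digit of the code tells you the kind of result.
FAMILIES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status_line(status_line):
    """Return (version, code, standard description, family) of a response line."""
    version, code, _reason = status_line.split(" ", 2)
    status = HTTPStatus(int(code))
    return version, status.value, status.phrase, FAMILIES[status.value // 100]


print(describe_status_line("HTTP/1.1 200 OK"))
print(describe_status_line("HTTP/1.1 404 Not Found"))
print(describe_status_line("HTTP/1.1 500 Internal Server Error"))
```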

## Body contents

- Technically anything. Just bytes, whatever format you want
- Usually in websites? HTML, javascript, CSS, images, videos, and any user downloadable file


All in the same response?

**No**

Websites usually return an HTML, which includes stuff that says "here goes an image, which is at /another/url". Then the browser does another request to get the image. And so on, for all the javascript, css, images, etc.

Usually to see a webpage, the browser ends up doing a lot of requests.
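
As a rough idea of what the browser does after getting the HTML, this sketch uses the standard library's `html.parser` to list the extra urls a page would trigger requests for. The class `ResourceFinder` and the sample page are invented for the example; real browsers look at many more tags and attributes.

```python
from html.parser import HTMLParser


class ResourceFinder(HTMLParser):
    """Collects the urls of resources referenced by an HTML page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.urls.append(attrs["href"])


# Hypothetical page returned by the first request.
page = """
<html>
    <head>
        <link rel="stylesheet" href="/static/css/client_details.css"/>
        <script src="/static/js/client_details.js"></script>
    </head>
    <body>
        <img src="/images/clients/fisa.jpg" />
    </body>
</html>
"""

finder = ResourceFinder()
finder.feed(page)
for url in finder.urls:
    print("the browser would now request:", url)
```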

## HTML

- Defines the contents to be shown
- Usually generated on the server
- But the javascript running in the browser can generate HTML on the fly too (modifying the page)
- It looks like XML (tags, attributes, nesting), but it isn't strict XML: browsers accept and try to fix broken markup
- Sadly, not quite strict


```html
<html>
    <head>
        <title>Client: Fisa</title>
        <link type="text/css" rel="stylesheet" href="/static/css/client_details.css"/>
        <script src="/static/vendor/js/client_details.js"></script>
    </head>

    <body>
        <h3>Client: Fisa</h3>
        <img src="/images/clients/fisa.jpg" />

        <form method="POST">
            <p>Name: <input class="client_form" maxlength="254" name="client_name" type="text" /></p>
            <p>Birth: <input class="client_form" name="client_birth" type="date" /></p>
            <button id="save_button" type="submit">Save!</button>
        </form>

    </body>
</html>
```

- `<head>` includes headers, metadata about the page and extra files required (javascript, css)
- `<body>` is the actual page the user sees. Visible tags include titles, paragraphs, images, forms, inputs, buttons, and much more.
- tags can define ids (should be unique!!!) and classes (to be shared)

HTML does not say **how** things are shown. Style is the job of CSS.
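
Since HTML is usually generated on the server, here is a minimal sketch of that idea using only the standard library. The template text and the function `render_client_page` are made up for the example; real sites use a template engine, but the principle is the same: fill a text template with data, escaping anything that came from users.

```python
from html import escape
from string import Template

# A tiny stand-in for a template file like client_page.html.
CLIENT_PAGE = Template(
    "<h3>Client: $name</h3>\n"
    "<p>Birth: $birth</p>"
)


def render_client_page(name, birth):
    """Build the HTML for a client, escaping the data so it can't inject tags."""
    return CLIENT_PAGE.substitute(name=escape(name), birth=escape(birth))


print(render_client_page("Fisa", "1990-01-01"))

# Data coming from users can contain HTML. Escaping keeps it as plain text.
print(render_client_page("<b>Fisa</b>", "1990-01-01"))
```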

## CSS

- Defines the looks of the visual stuff defined in the HTML
- Uses selectors to identify stuff
- Has strange logic in many areas, css isn't easy to master

```css
/* any h3 titles, and any inputs inside p inside forms: */
h3, form p input {
    font-size: 12px;
    color: blue;
}

/* any input with class="client_form": */
input.client_form {
    text-align: center;
}

/* that specific thing with id="save_button":*/
#save_button {
    margin: 100px;
}

```

## Javascript

- Executes logic **in the browser**
- Can interact with the HTML and CSS, and modify it on the fly!
- Can do requests in background, to get or send data without the webpage refreshing

```javascript
// this is super pseudo code, but with valid syntax

function emails_data_is_ready(data) {
    var emails_list = get_html_part("#emails_list");
    
    emails_list.html = "";
    for (var email of data.emails) {
        emails_list.html += "<p>" + email.title + "</p>";
    }
}

function refresh_emails_list() {
    do_a_request("/get_emails_list/").when_ready_call(emails_data_is_ready);    
}

setInterval(refresh_emails_list, 5000);

```

- Javascript is quite inconsistent (WAT examples)
- Weakly typed, and always tries to guess and give you something if you do anything wrong (errors pass silently!!)
- But it's the only thing to program in the browser!
- Luckily, improving (even copying stuff from python)
- Lots of new frameworks, every week or so
- Some people are even using it in the server side

## Client side vs server side

There's a lot of stuff that can go on either side. Do I get emails as data and create HTML on the fly? Or do I create the emails list HTML on the server and just refresh the page? Depends on a lot of factors:

- Language expertise. Do you have frontend developers?
- Are you doing simple-ish more or less static pages? or a complex web app with a lot of interactivity? (news.site.com vs gmail.com)
- ...

Sometimes the same logic is even repeated on both sides (like validating input data. Remember: you can't trust the requests).
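
Even if the javascript already checked the form, the server has to check again, because anyone can send a request by hand. A minimal sketch of server side validation, assuming the form sends `client_name` and `client_birth` as text; the function `validate_client_form` is invented for the example.

```python
from datetime import date


def validate_client_form(form_data):
    """Return (cleaned_data, errors) for the data sent by the client form."""
    cleaned = {}
    errors = {}

    name = form_data.get("client_name", "").strip()
    if not name:
        errors["client_name"] = "this field is required"
    elif len(name) > 254:
        errors["client_name"] = "too long"
    else:
        cleaned["client_name"] = name

    try:
        # Everything arrives as text, so the date needs to be converted.
        cleaned["client_birth"] = date.fromisoformat(form_data.get("client_birth", ""))
    except ValueError:
        errors["client_birth"] = "expected a date like 1990-12-31"

    return cleaned, errors


print(validate_client_form({"client_name": "Fisa", "client_birth": "1990-12-31"}))
print(validate_client_form({"client_name": "", "client_birth": "yesterday"}))
```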

## Browser dev tools

(F12 in Chrome)

- Inspect the HTML, CSS, etc
- Debug javascript interactively
- Javascript console! try stuff
- Edit the HTML, CSS to test stuff
- Record and analyze requests done, times, and more
- See logging (client side logging)

## Stateless problems, cookie solutions

- Every request/response is independent
- No state is stored in HTTP or the connections...
- ... how do I know that this request to "see email id=1000", is the same user that did the request "login as fisa"?
- Cookies: like the disco wristband


## "Solutions" that don't work:

- IP? No, they change, and there can be multiple devices under the same public IP.
- URL rewriting (putting the session id in every url)? Problems with sharing, bookmarks, etc.

## The usual solution: cookies


1. You receive a normal request to visit some url from your site
2. You create a "session" in your database, with an id
3. You return the response, with an extra header that says "hey, save this id in a little file. And every future request you do to me, include the value of that little file in the headers".
4. Every time the browser sends you a new request, the id will come in the headers. If it's in the database, you know which session it is from.

In your db, you can store any useful info related to that session. For example, you can save "he logged in with user=fisa", and so you know that session is mine.

This can be used mainly for login/logout features. But also for any other kind of session (like having a shopping cart even when not logged in, and remembering the stuff it had).
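
The four steps can be sketched in a few lines. This is only an illustration: the `sessions` dict stands in for the database, and `handle_request` and the cookie name `session_id` are made up for the example. Web frameworks already do all of this for you.

```python
import secrets
from http.cookies import SimpleCookie

sessions = {}  # stands in for the sessions table in your database


def handle_request(request_headers):
    """Return (session_data, response_headers) for an incoming request."""
    # Step 4: look for the id in the headers sent by the browser.
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    session_id = cookie["session_id"].value if "session_id" in cookie else None

    response_headers = {}
    if session_id not in sessions:
        # Steps 2 and 3: unknown visitor, create a session and ask the
        # browser to store its id and send it back in future requests.
        session_id = secrets.token_hex(16)
        sessions[session_id] = {}
        response_headers["Set-Cookie"] = f"session_id={session_id}; HttpOnly"

    return sessions[session_id], response_headers


# First request: no cookie yet, so a session is created.
first_session, first_headers = handle_request({})
first_session["user"] = "fisa"  # e.g. after a successful login

# The browser stores the cookie and sends it back in the next request.
cookie_value = first_headers["Set-Cookie"].split(";")[0]
second_session, second_headers = handle_request({"Cookie": cookie_value})
print(second_session)
```

The id is long and random on purpose: anyone who can guess or steal it can pretend to be that session.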

![](./cookies-0.svg)


![](./cookies-1.svg)


![](./cookies-2.svg)


![](./cookies-3.svg)


![](./cookies-4.svg)


## Issues with cookies


- No portability of session: the cookie lives in one browser, so the session doesn't follow you to another browser or device
- Privacy abuses (ads, facebook, etc, when integrating services from different servers)

## Web APIs

- The browser isn't the only client
- If the browser is the client, still, HTML isn't the only answer

Web APIs allow **structured data** to be fetched or sent via HTTP request/responses.

Usually the data is in JSON format.

Examples using [httpie](https://httpie.org/) in the shell (or just navigating the urls in the browser)

```
http get https://api.github.com/users/fisadev
http get https://api.github.com/repos/django/django
```

Or just using requests (`pip install requests`):

```python
import requests

response = requests.get("https://api.github.com/users/fisadev")
data = response.json()

data['name'], data['public_repos']
# ('Juan Pedro Fisanotti', 78)
```

## Web APIs

- You can use them to interact with other sites (get and post data)
- You can use them to interact with your own site (from javascript in the browser, from a mobile app, etc)
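
JSON itself is just text in the body of the request or response. A small sketch of both directions with the standard `json` module; the response body and the payload here are invented for the example, not real data from any API.

```python
import json

# Getting data: the body of a response is text, json.loads turns it into
# normal python dicts, lists, strings and numbers.
response_body = '{"name": "Some User", "public_repos": 3, "languages": ["python", "javascript"]}'
data = json.loads(response_body)
print(data["name"], data["public_repos"], data["languages"][0])

# Sending data: json.dumps turns python data into text, ready to be used
# as the body of a request.
payload = json.dumps({"title": "Bug report", "labels": ["web"]})
print(payload)
```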

## HTTPS

- HTTP is plain text. Anyone in the middle can see the data you are posting and getting (emails, photos, passwords, etc)!!!!
- HTTPS == HTTP, but over an encrypted connection. Problem solved :)
- Needs signed certificates in the server. In the past, that was super expensive and problematic. But now, use [LetsEncrypt](https://letsencrypt.org/)! Free, open, and by far the best tools in the market.
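
From the client side, Python's `ssl` module shows what "encrypted connection plus signed certificates" means in practice. A minimal sketch that only builds the settings and doesn't connect anywhere:

```python
import ssl

# Default settings for a client that wants to talk to an HTTPS server.
context = ssl.create_default_context()

# The server must present a certificate signed by a trusted authority...
print(context.verify_mode == ssl.CERT_REQUIRED)

# ...and that certificate must be for the host name we asked for.
print(context.check_hostname)

# A client would then wrap its TCP socket before sending any HTTP text:
#     context.wrap_socket(tcp_socket, server_hostname="www.example.com")
# From there on, the same request/response text travels encrypted.
```

Libraries like requests do this for you whenever the url starts with `https://`.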

## HTTP 2.0

- Plain old HTTP 1.0 opens and closes the TCP connection for **every single** request/response. HTTP 1.1 can keep the connection open, but still handles one request at a time on it.
- That's slow! And even more if using HTTPS!
- HTTP 2.0 == Open connection, do multiple request/response cycles over it (even at the same time), and **then** close the connection.
- Faster
- It does not solve the state issue. Connections still aren't tied to users or sessions. It's just to get multiple stuff at once.
- State is still solved with cookies.
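
Python's standard library doesn't speak HTTP 2.0, so this sketch uses HTTP 1.1 keep-alive instead to show the basic idea of several request/response cycles over one single TCP connection (one after the other, not in parallel like HTTP 2.0 allows). The handler `EchoHandler` and the paths are made up for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep connections open

    def do_GET(self):
        # Include the client's TCP port, to see which connection was used.
        body = f"{self.path} {self.client_address[1]}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output clean


server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One connection, three request/response cycles.
connection = http.client.HTTPConnection("127.0.0.1", server.server_port)
client_ports = []
for path in ("/page.html", "/style.css", "/logo.png"):
    connection.request("GET", path)
    response = connection.getresponse()
    answered_path, client_port = response.read().decode("utf-8").split(" ")
    client_ports.append(client_port)
connection.close()

server.shutdown()
server.server_close()

# The same client port every time means the same TCP connection was reused.
print(client_ports)
```

Note that nothing about the user is remembered between the three requests: reusing the connection saves time, it doesn't add state.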

## Usual infrastructure

- Your program (website) knows how to generate responses from requests
- But it does not know how to do HTTPS
- Or how to enqueue requests if there are many
- Or do network security
- Or scaling
- Etc...

So, enter the "public" server (reverse proxy, front end server, etc):

- Many instances of your program running, waiting for requests but **hidden** in your network, not public
- One public server using Nginx or something like that, listens for requests, enqueues them, talks to your program instances to get the responses.
- That server does the "public talking", https, network security, scaling, etc.
- You can add/remove/restart instances of your program as you need. They are "cattle".

![](./usual-infra.svg)
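
A toy sketch of the "one public server, many hidden instances" idea. Here the instances are just python functions and the public server hands out requests in turns (round robin); the names `RoundRobinProxy` and `make_instance` are invented for the example, and real reverse proxies like Nginx talk to the instances over the network and do much more.

```python
from itertools import cycle


def make_instance(name):
    """Create a fake instance of your program: takes a request, returns a response."""
    def app(request_path):
        return f"{name} answered {request_path}"
    return app


class RoundRobinProxy:
    """The public face: receives every request and passes it to the next instance."""

    def __init__(self, instances):
        self._instances = cycle(list(instances))

    def handle(self, request_path):
        instance = next(self._instances)
        return instance(request_path)


proxy = RoundRobinProxy([make_instance("app-1"), make_instance("app-2")])
answers = [proxy.handle("/teoria/") for _ in range(3)]
for answer in answers:
    print(answer)
```

Since no instance keeps state about the users, it doesn't matter which one answers each request. That's what allows adding, removing or restarting them freely.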


## Very simple example using Flask

To the editor!
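
As a starting point, something like this could be the first version of the site (`pip install flask`). It's only a sketch, not necessarily what was written during the class: the `clients` dict with its sample data, the urls and the view names are assumptions made for the example, with the dict standing in for a database.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical data, standing in for a database.
clients = {1: {"name": "Some Client", "birth": "1990-12-31"}}


@app.route("/clients/<int:client_id>/", methods=["GET"])
def view_client(client_id):
    client = clients.get(client_id)
    if client is None:
        return jsonify(error="client not found"), 404
    return jsonify(client)


@app.route("/clients/<int:client_id>/", methods=["POST"])
def save_client(client_id):
    client = clients.get(client_id)  # yes! again!
    if client is None:
        return jsonify(error="client not found"), 404
    client["name"] = request.form["name"]
    return jsonify(client)

# app.run() would start Flask's development server. It's left out here so
# the views can be tried without opening a port, using Flask's test client:
test_client = app.test_client()
print(test_client.get("/clients/1/").get_json())
print(test_client.post("/clients/1/", data={"name": "Fisa"}).get_json())
print(test_client.get("/clients/999/").status_code)
```

Each view is just a function that receives the request data and returns a response, like in the pseudo code from the beginning of the class.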