# Networking, HTTP, web services

## ISO/OSI model

![](img/iso.png)

## TCP/IP model in relation to ISO

![](img/tcp_ip.png)

As application developers, we are interested in:

 - Transport protocols (TCP or UDP) - implemented by OPERATING SYSTEM libraries and kernel
 - Application protocols (like HTTP or HTTPS) - implemented by our APPLICATION

## TCP and UDP

  - both are associated with IP address and port number
  - UDP messages (also called datagrams) are "fire and forget": delivery of messages is not guaranteed
  - TCP guarantees ordering and delivery of data (acknowledgements, retransmission, error checking)
    
Cases for using UDP:

  - Gaming network code
  - Telemetry data collection from thin (IoT) devices
  - Gathering image frames from monitoring cameras
  - Other cases when deliverability / order of messages is not critical, but performance is

![](img/tcp_udp_protocol.png)
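
The "fire and forget" nature of UDP can be seen in a minimal sketch like the one below. It sends one datagram over the loopback interface; the payload and the use of port 0 (let the OS choose a free port) are assumptions made only to keep the example self-contained.

```python
import socket

# Receiver: bind a UDP socket to the loopback interface.
# Port 0 asks the OS to pick any free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)  # never wait forever for a datagram that may be lost
receiver_addr = receiver.getsockname()

# Sender: no connect(), no handshake, just send a datagram to address + port.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"temperature=21.5", receiver_addr)

# recvfrom() returns one whole datagram and the address it came from.
data, sender_addr = receiver.recvfrom(1024)
print(data, "from", sender_addr)

sender.close()
receiver.close()
```

Note that the sender gets no confirmation at all: if nobody were listening, `sendto()` would still succeed.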

## Addressing

IPv4 addresses:

  - dotted decimal notation - denotes a single address  
    * 192.168.1.1       
    * 127.0.0.1       
    * 10.10.1.55
       
  - prefix notation - denotes a group of addresses (subnetwork)
    * 192.147.0.0/24


IPv6 addresses:

  - hexadecimal notation
    * 2001:db8:85a3:8d3:1319:8a2e:370:7348
    
  - prefix notation
    * 2001:db8:1234::/48
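
Both notations can be explored with the standard `ipaddress` module. The sketch below reuses the example networks from the lists above; the individual host addresses tested against them are made up for illustration.

```python
import ipaddress

# Prefix notation: /24 means the first 24 bits are fixed, 8 bits are left for hosts.
network = ipaddress.ip_network("192.147.0.0/24")
print(network.num_addresses)        # 256
print(network[0], network[-1])      # 192.147.0.0 192.147.0.255

# Membership test: does a single address belong to the subnetwork?
inside = ipaddress.ip_address("192.147.0.17") in network
outside = ipaddress.ip_address("192.147.1.17") in network
print(inside, outside)              # True False

# The same API works for IPv6.
v6 = ipaddress.ip_network("2001:db8:1234::/48")
v6_inside = ipaddress.ip_address("2001:db8:1234:ffff::1") in v6
print(v6.prefixlen, v6_inside)      # 48 True
```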


Port number:

  - 16-bit unsigned number (0-65535).

Host names, like "google.com", are NOT IP addresses. They are resolved to IP addresses by DNS, an application-level protocol.


Connections are ALWAYS made to an IP ADDRESS (and port).


### Network interfaces

- a piece of hardware (can be virtual or emulated) that provides network communication
- network addresses (including IP addresses) belong to an interface
- a single interface can have more than one address
- an address always belongs to some network interface
- a single machine can have multiple interfaces


### LOOPBACK INTERFACE (LOCALHOST)

- 127.0.0.1
- ::1
- localhost (can have more aliases, like localhost.localdomain)

A special address (and hostname) that refers to the current machine.
It is important that it is a separate _interface_.

A common error related to this: containers or virtual machines running on your local PC cannot connect to the host machine by specifying "localhost" or "127.0.0.1", because that address refers to the container or VM itself, not to the host.

### PORT NUMBERS

- 0 - 65535
- under 1024 - reserved ("well-known" or system ports) - do not use them for your application
- 1024 - 49151 - "registered" ports: some of them are registered with IANA for specific services
- 49152 - 65535 - dynamic (ephemeral) ports



## SOCKETS

Abstraction of a "data tunnel" between network endpoints.

- server "listens" for accepting connections, client "connects" to remote address and port
- after communication is established, both sides can read from socket and send data to it.
- communication is bidirectional

![](img/sockets.png)

Note that we did not specify what is actually sent and received. That is up to the application and is defined by the _application-level protocol_.

More details:

- by default, data sent and received is "raw" (bytes)
- reads can be "blocking" or "non-blocking"
- another important parameter for sockets is "timeout"
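
The difference between blocking reads, timeouts and non-blocking reads can be sketched without any server by using `socket.socketpair()`, which returns two already-connected sockets. The timeout value of 0.2 seconds is an arbitrary choice for the demo.

```python
import socket

# Two connected sockets inside one process, so no server is needed here.
left, right = socket.socketpair()

left.settimeout(0.2)   # blocking read, but give up after 0.2 seconds
try:
    left.recv(1024)    # nothing was sent yet, so this waits and then fails
    timed_out = False
except socket.timeout:
    timed_out = True
print("timed out:", timed_out)

right.sendall(b"ping")           # data is raw bytes, not str
received = left.recv(1024)
print(received)

left.setblocking(False)          # non-blocking mode: never wait
try:
    left.recv(1024)              # no data available right now
    would_block = False
except BlockingIOError:
    would_block = True
print("would block:", would_block)

left.close()
right.close()
```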

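```python
# in python:
# server: accept TCP connections on port 9000 and greet each client
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("0.0.0.0", 9000))  # 0.0.0.0 = listen on all IPv4 interfaces
s.listen(1)

while True:
    conn, addr = s.accept()  # blocks until a client connects
    print("Received connection from", addr)
    conn.sendall(f"Hello {addr}\n".encode())  # sendall() keeps sending until all bytes are written
    conn.close()
```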
    


```python
# client:
import socket

s = socket.socket()
s.connect(("127.0.0.1", 9000))
print(s.recv(1024))
s.close()
```

```
b"Hello ('127.0.0.1', 52548)\n"
```


To be able to serve multiple connections at once, in python we can

 - .accept() connections in loop
 - open a new thread with connection handler for processing
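
A minimal sketch of the accept-loop-plus-threads approach is shown below. The function names `handle` and `serve` are made up for this example, and the server binds to port 0 (an OS-chosen free port) so that the sketch can also act as its own client.

```python
import socket
import threading


def handle(conn, addr):
    # Runs in its own thread, so a slow client does not block the others.
    with conn:
        conn.sendall(f"Hello {addr}\n".encode())


def serve(listener):
    # Accept connections in a loop and hand each one to a new thread.
    while True:
        try:
            conn, addr = listener.accept()
        except OSError:
            break  # listener was closed, stop serving
        threading.Thread(target=handle, args=(conn, addr), daemon=True).start()


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen()
port = listener.getsockname()[1]
threading.Thread(target=serve, args=(listener,), daemon=True).start()

# Quick check from the same process: connect as a client and read the greeting.
with socket.create_connection(("127.0.0.1", port), timeout=2) as client:
    greeting = client.recv(1024)
print(greeting)
```

One thread per connection is easy to reason about, but it does not scale to thousands of simultaneous clients.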

But it is usually better to use higher-level frameworks, as they tend to be more optimized. Operating systems provide additional mechanisms (such as `select`, `poll` and `epoll`) for checking communication state:

- whether new connections arrived
- whether new data appeared available to read, etc.

Usually, as Python programmers, we don't need to go that deep. Raw socket programming in Python is rare.

### Possible problems with sockets and troubleshooting

Debugging and troubleshooting tools:

  - `ping`
  - `traceroute`  
  - `netstat -p`
  - `lsof -i` or `lsof -i -n`
  - `telnet` command
  - netcat (`nc`) command
  

Possible problems:

  - sockets in TIME_WAIT state - a normal state on the side that closed the connection first; large numbers of them usually mean many short-lived connections or connections dropped because of server exceptions
  - number of open files / sockets exceeded: check `ulimit -a`. The default limit of open files is often 1024.
  - "blocking" connection - check socket timeouts. Very widespread problem. Default is NO TIMEOUT.
  - small packets arriving with a delay (for 1-byte packets for example): use the TCP_NODELAY flag
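
The socket-level settings mentioned in this list can be applied as in the sketch below. The 5 second timeout is an arbitrary value chosen for illustration, and `SO_REUSEADDR` is included as a common companion setting for servers that need to restart while old connections are still in TIME_WAIT.

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Fail after 5 seconds instead of blocking forever on connect() / recv().
s.settimeout(5.0)

# Disable Nagle's algorithm so small writes are sent immediately.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Allow rebinding the port while old connections are still in TIME_WAIT.
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

timeout_value = s.gettimeout()
nodelay_enabled = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
print(timeout_value, nodelay_enabled)
s.close()
```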

## NAME RESOLUTION (DNS)

Name resolution translates hostnames to IP addresses. In Python:

- socket.gethostbyname()
- socket.getaddrinfo()

![](img/dns.png)
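
As a rough illustration, `socket.getaddrinfo()` can be wrapped in a small helper that returns only the IP addresses. The helper name `resolve` and the default port 80 are assumptions made for this sketch.

```python
import socket


def resolve(host, port=80):
    """Return the sorted, de-duplicated IP addresses that `host` resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


# An IP address literal resolves to itself, no DNS query is needed.
print(resolve("127.0.0.1"))
```

Calling `resolve("localhost")` typically returns `127.0.0.1` and/or `::1`, depending on how the machine is configured. A real hostname triggers an actual lookup and needs network access.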




### Important tools for name resolution troubleshooting

  - ping
  - nslookup (can point to specific name resolution server)
  - hosts file (/etc/hosts, c:\Windows\System32\drivers\etc\hosts)
   


### Modern DNS

Due to privacy concerns, a number of tools and standards related to name resolution are emerging:

 - DNS over TLS (DoT)
 - DNS over HTTPS (DoH)
 - DNSCrypt (encrypts and authenticates traffic between client and resolver, which prevents forgery)


# HTTP

Application-level protocol for serving hypertext content.


## Time for demo!



## Basic request components

- outside of request itself
   * server network address (host IP address and port)
   * scheme (https / http)
   
- inside http request:
   * method (GET, POST, DELETE, OPTIONS, etc)
   * request path
   * request query string
   * request headers
   * request body
   
- inside http response:
   * status code
   * status message
   * response headers
   * response body
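
To make these components visible, the sketch below writes an HTTP request by hand over a plain socket and splits the response into its parts. The tiny local server (the `Handler` class and its `hello` payload) is an assumption that exists only so the example has something to talk to.

```python
import http.server
import socket
import threading


class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        payload = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

request = (
    "GET /?name=demo HTTP/1.1\r\n"   # method, path + query string, protocol version
    f"Host: {host}:{port}\r\n"        # request headers, one per line
    "Connection: close\r\n"
    "\r\n"                            # empty line ends the headers; GET has no body
)

with socket.create_connection((host, port), timeout=5) as conn:
    conn.sendall(request.encode("ascii"))
    raw = b""
    while chunk := conn.recv(4096):   # read until the server closes the connection
        raw += chunk

# Response = status line + headers, an empty line, then the body.
head, _, body = raw.partition(b"\r\n\r\n")
status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
version, status_code, status_message = status_line.split(" ", 2)
print(status_code, status_message)
print(header_lines)
print(body)

server.shutdown()
server.server_close()
```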
   


## Cookies

Responses sometimes contain a `Set-Cookie` header. The browser stores this information and sends it back in the `Cookie` header of subsequent requests to the same website (or a part of it).

This is the main mechanism for identifying users and sessions on the web.
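
The round trip can be sketched with the standard `http.cookies` module. The cookie name and value below are made up; a real browser additionally applies domain, path and expiry rules before sending a cookie back.

```python
from http.cookies import SimpleCookie

# A Set-Cookie header value as a server might send it (values are made up).
set_cookie = "sessionid=abc123; Path=/; HttpOnly; Max-Age=3600"

jar = SimpleCookie()
jar.load(set_cookie)

morsel = jar["sessionid"]
print(morsel.value)        # abc123
print(morsel["path"])      # /
print(morsel["max-age"])   # 3600

# What the client sends back on later requests: only name=value pairs.
cookie_header = "; ".join(f"{name}={m.value}" for name, m in jar.items())
print(cookie_header)       # sessionid=abc123
```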

```python
# in python:

import requests

# always pass a timeout: by default requests waits forever
result = requests.get('https://google.com', timeout=10).text
```

## Content types and encodings

for responses:

- text/plain
- text/html
- application/octet-stream
- application/json
- image/jpeg
- many others

for requests:

- multipart/form-data
- application/x-www-form-urlencoded
- application/json
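
The same data looks quite different on the wire depending on the content type. The sketch below encodes one made-up form as `application/x-www-form-urlencoded` and as `application/json`, then decodes both again.

```python
import json
from urllib.parse import parse_qs, urlencode

form = {"name": "Ann Lee", "note": "tea & coffee"}

# application/x-www-form-urlencoded: key=value pairs, special characters escaped
urlencoded_body = urlencode(form).encode("ascii")
print(urlencoded_body)

# application/json: the same data as a JSON document, UTF-8 encoded
json_body = json.dumps(form).encode("utf-8")
print(json_body)

# The receiving side must pick the decoder that matches Content-Type.
decoded_form = parse_qs(urlencoded_body.decode("ascii"))
decoded_json = json.loads(json_body)
print(decoded_form)
print(decoded_json)
```

Note that `parse_qs` returns a list for every key, because a form field may legally appear more than once.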


# SSL (TLS) and HTTPS

- HTTPS is HTTP sent over a TLS-encrypted connection: TLS is a wrapper between HTTP and the transport layer (TCP)
- TLS (SSL) can be used not only for HTTP, but for any other socket-based communication.
- It relies on PKI (public key infrastructure) concepts and infrastructure


Steps:
 - Server acquires a _certificate_ from a CERTIFICATE AUTHORITY (CA)
 - Server certificate is the _*SERVER PUBLIC KEY SIGNED BY CA*_ with additional information (CN = server name)
 - Client already has the certificates of trusted CAs (ROOT CAs), provided by the OS or a separate package (in Python: certifi).
 - When a connection is established, the server supplies its certificate. The client checks that the server name matches the certificate's CN (or, in modern certificates, its subjectAltName) and verifies that the certificate is valid.
 - Client and server negotiate session-level encryption parameters and generate a symmetric session encryption key
 - All further communication between server and client is encrypted with the negotiated session key
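
On the client side, most of these steps are handled by the standard `ssl` module. The sketch below shows the default verification settings; the helper `fetch_certificate_subject` is a hypothetical name, and calling it needs network access to whatever host you pass in.

```python
import socket
import ssl

# The default context loads the trusted root CA certificates available on the
# system and turns on both certificate verification and host name checking.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # server certificate must be valid
print(context.check_hostname)                    # certificate must match the host name


def fetch_certificate_subject(host, port=443):
    """Connect to host:port over TLS and return the certificate subject."""
    with socket.create_connection((host, port), timeout=5) as raw_sock:
        # server_hostname is the name the certificate is checked against.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            print(tls_sock.version())            # negotiated protocol version
            return tls_sock.getpeercert()["subject"]
```

If verification fails (unknown CA, expired certificate, wrong host name), `wrap_socket` raises an `ssl.SSLError` subclass instead of returning a connection.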



### HTTPS in python:

Most frameworks do NOT support HTTPS directly (and using a certificate directly in the application is actually discouraged).

Usually web frameworks receive unwrapped HTTP requests.


The process of unwrapping SSL to the underlying protocol is called SSL termination.

SSL termination is usually performed at the generic webserver (nginx) or load balancer level (haproxy, or a container orchestration framework). Reasons: multiple HOSTs on a single webserver, load balancing, centralized webserver log collection, DDoS prevention, etc.

### Typical webservice stack for python webserver frameworks

- load balancer (haproxy)
- generic webserver (NGINX, apache, lighttpd). Usually it also serves static files.
- python fastcgi / http server (uwsgi, gunicorn) that preforks python application workers
- python web application processes


### CGI, FASTCGI, WSGI

These represent an evolution of web servers:

 - Static pages 
 - Dynamic pages ( CGI )
 - mod_python - embeds python into webserver code to run python applications (almost the same as CGI)
 - Dedicated processes serving dynamic content on-demand (fastcgi)
 - WSGI - python-specific interface standard similar to fastcgi ( PEP 3333 )
 - Modern async frameworks handle HTTP requests themselves 

Example of CGI environment variables (from the CGI wiki page). Some are common and defined by the OS, some are set by the server:

```
COMSPEC="C:\Windows\system32\cmd.exe"
DOCUMENT_ROOT="C:/Program Files (x86)/Apache Software Foundation/Apache2.4/htdocs"
GATEWAY_INTERFACE="CGI/1.1"
HOME="/home/SYSTEM"
HTTP_ACCEPT="text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
HTTP_ACCEPT_CHARSET="ISO-8859-1,utf-8;q=0.7,*;q=0.7"
HTTP_ACCEPT_ENCODING="gzip, deflate, br"
HTTP_ACCEPT_LANGUAGE="en-us,en;q=0.5"
HTTP_CONNECTION="keep-alive"
HTTP_HOST="example.com"
HTTP_USER_AGENT="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:67.0) Gecko/20100101 Firefox/67.0"
PATH="/home/SYSTEM/bin:/bin:/cygdrive/c/progra~2/php:/cygdrive/c/windows/system32:..."
PATHEXT=".COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC"
PATH_INFO="/foo/bar"
PATH_TRANSLATED="C:\Program Files (x86)\Apache Software Foundation\Apache2.4\htdocs\foo\bar"
QUERY_STRING="var1=value1&var2=with%20percent%20encoding"
REMOTE_ADDR="127.0.0.1"
REMOTE_PORT="63555"
REQUEST_METHOD="GET"
REQUEST_URI="/cgi-bin/printenv.pl/foo/bar?var1=value1&var2=with%20percent%20encoding"
SCRIPT_FILENAME="C:/Program Files (x86)/Apache Software Foundation/Apache2.4/cgi-bin/printenv.pl"
SCRIPT_NAME="/cgi-bin/printenv.pl"
SERVER_ADDR="127.0.0.1"
SERVER_ADMIN="(server admin's email address)"
SERVER_NAME="127.0.0.1"
SERVER_PORT="80"
SERVER_PROTOCOL="HTTP/1.1"
SERVER_SIGNATURE=""
SERVER_SOFTWARE="Apache/2.4.39 (Win32) PHP/7.3.7"
SYSTEMROOT="C:\Windows"
TERM="cygwin"
WINDIR="C:\Windows"
```

```python
# WSGI application in python is just a callable with 2 positional parameters.


def web_app(env, start_response):
    print(env)
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b"Welcome to the machine\n"]


# A class works too: calling WebApp(env, start_response) returns an iterable instance.
class WebApp:
    def __init__(self, env, start_response):
        print(env)
        self.env = env
        self.callback = start_response

    def __iter__(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.callback(status, response_headers)
        return iter([b"Have a cigar\n"])
```
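
One way to see what a WSGI server does with such a callable is to call it by hand. The sketch below is self-contained: `hello_app` and `call_app` are hypothetical names, and the environ dictionary is filled in by `wsgiref.util.setup_testing_defaults` from the standard library.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from WSGI\n"]


def call_app(app, path="/"):
    """Call a WSGI app the way a server would and collect the result."""
    environ = {}
    setup_testing_defaults(environ)   # fills in a minimal, valid WSGI environ
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    # The app returns an iterable of byte strings; join them into the body.
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


print(call_app(hello_app, "/test"))

# To serve it over real HTTP with the standard library (blocks until interrupted):
# from wsgiref.simple_server import make_server
# make_server("127.0.0.1", 8000, hello_app).serve_forever()
```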

        
        
        

WSGI environment variables:

```
{ 
'wsgi.errors': <gunicorn.http.wsgi.WSGIErrorsWrapper object at 0x7f2734f35a60>, 
'wsgi.version': (1, 0), 
'wsgi.multithread': False, 
'wsgi.multiprocess': False, 
'wsgi.run_once': False, 
'wsgi.file_wrapper': <class 'gunicorn.http.wsgi.FileWrapper'>, 
'wsgi.input_terminated': True, 
'SERVER_SOFTWARE': 'gunicorn/20.0.4', 
'wsgi.input': <gunicorn.http.body.Body object at 0x7f2734f35f70>, 
'gunicorn.socket': 
    <socket.socket fd=12, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 8000), raddr=('127.0.0.1', 55766)>, 
'REQUEST_METHOD': 'GET', 
'QUERY_STRING': '', 
'RAW_URI': '/test', 
'SERVER_PROTOCOL': 'HTTP/1.1', 
'HTTP_HOST': '127.0.0.1:8000', 
'HTTP_USER_AGENT': 'curl/7.68.0', 
'HTTP_ACCEPT': '*/*', 
'wsgi.url_scheme': 'http', 
'REMOTE_ADDR': '127.0.0.1', 
'REMOTE_PORT': '55766', 
'SERVER_NAME': '127.0.0.1', 
'SERVER_PORT': '8000', 
'PATH_INFO': '/test', 
'SCRIPT_NAME': ''
}
```



Note that all this data is pretty basic: you still need to parse the query string yourself, request headers arrive as flat `HTTP_*` keys, and the request body has to be read from the `wsgi.input` stream.
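
A rough sketch of the manual work involved is shown below. The helper name `read_request` and the hand-made environ dictionary are assumptions for illustration; a real environ comes from the WSGI server.

```python
import io
from urllib.parse import parse_qs


def read_request(environ):
    """Extract query parameters, headers and body from a WSGI environ dict."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # Request headers show up as HTTP_* keys, e.g. HTTP_USER_AGENT.
    headers = {
        key[5:].replace("_", "-").title(): value
        for key, value in environ.items()
        if key.startswith("HTTP_")
    }
    # The body is a stream; read exactly CONTENT_LENGTH bytes from it.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length) if length else b""
    return query, headers, body


# A hand-made environ for illustration only.
environ = {
    "QUERY_STRING": "var1=value1&var2=with%20percent%20encoding",
    "HTTP_USER_AGENT": "curl/7.68.0",
    "CONTENT_LENGTH": "5",
    "wsgi.input": io.BytesIO(b"hello"),
}
query, headers, body = read_request(environ)
print(query)
print(headers)
print(body)
```

This kind of boilerplate is exactly what web frameworks take off your hands.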

# WEBSERVER FRAMEWORKS

- Prepare request data for consumption
- Organize your request handling code in a structured way
- Supply additional batteries that shorten common request-related tasks and make them reusable

popular frameworks:

  - synchronous (django, pyramid, flask, bottle, falcon, etc.)
  - asynchronous (aiohttp, fastapi, sanic, tornado, etc) 

There are also MANY standalone libraries that supply pluggable functionality for specific tasks.

Examples: marshmallow, sqlalchemy, itsdangerous, deform, jinja, genshi, many others.
 

## Anatomy of web framework

- request parser
- routing
- template engine
- modularity and code organization
- data validation and XSS attack prevention
- session handling
- configuration management
- built-in ORM



The combination of ORM + request handler + template engine implements the MVC (model-view-controller) paradigm.

MVC frameworks usually aim to be a generic solution for classic single-server web applications.
At the other end of the spectrum are microframeworks such as bottle, and frameworks tailored for specific tasks such as microservices. Vivid examples are fastapi, which is specifically designed as an API framework, and DRF, which is tailored for providing REST services.


```python
from bottle import route, request, default_app

@route('/')
def index():
    name = request.query.get('name', 'anonymous')
    return f'Hello, {name}!'  # do not do things like that in production, XSS!
                              # Use templating engine with safeguards

app = default_app()   # you can create application object explicitly and add routes on top of it
```
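
The XSS risk comes from placing user input into HTML unescaped. As a minimal sketch of the safeguard that template engines apply automatically, the standard `html.escape` function can be used; the function name `greeting` is made up, and the example is kept framework-free so it runs without bottle installed.

```python
import html


def greeting(name):
    """Build the response body with user input escaped for an HTML context."""
    return f"Hello, {html.escape(name)}!"


print(greeting("anonymous"))
# The script tag is rendered as text instead of being executed by the browser.
print(greeting("<script>alert(1)</script>"))
```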

## Useful tools

- curl (the best tool for web request analysis)
- python and the requests library
- web browser in developer mode
- various http bins (be wary about passwords and sensitive data though! It is better to self-host them)
- load testing: ab (Apache bench), gobench, locust, yandex-tank



# THE END