# Tutorial

## Background

`proxy.py` was released on 20th August, 2013 as a single-file HTTP proxy server implementation with no external dependencies. See the [first commit](https://github.com/abhinavsingh/proxy.py/commit/75044a72d9c7b4b8910ba551006b801eafdf3c47) and read the [introductory blog post](https://abhinavsingh.com/proxy-py-a-lightweight-single-file-http-proxy-server-in-python/) to understand why `proxy.py` was created.

## Introduction

Today, `proxy.py` has matured into a full-blown networking library focused on being lightweight, extensible, and able to deliver maximum performance. Unlike other Python servers, `proxy.py` doesn't need a `WSGI` or `uWSGI` frontend, which in turn usually has to be placed behind a reverse proxy such as `Nginx` or `Apache`. Of course, `proxy.py` can still be placed directly behind a load balancer _(optionally one capable of speaking the HAProxy PROXY protocol)_.

## The Concept Of Work

The `proxy.py` core is built around a high-level concept of `work`.

- A running instance can receive `work` from one or multiple `sources`
  - For example, once `proxy.py` starts, each accepted client connection is a `work` item coming from a TCP socket `source`
- Handlers can be written to process various types of `work`
  - For example, `HttpProtocolHandler` handles `work` that arrives as HTTP client connections
- A client connection can come from a variety of `sources`
  - TCP sockets
  - UDP sockets
  - Unix sockets
  - Raw sockets
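
The relationship between `sources`, `work` and handlers can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the `proxy.py` API: `WorkHandler`, `UppercaseHandler` and `run` are hypothetical names, and a list stands in for a socket that yields accepted connections.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterable, List


class WorkHandler(ABC):
    """Hypothetical handler interface: processes one unit of work."""

    @abstractmethod
    def handle(self, work: Any) -> Any:
        ...


class UppercaseHandler(WorkHandler):
    """Toy handler that upper-cases a bytes payload."""

    def handle(self, work: bytes) -> bytes:
        return work.upper()


def run(source: Iterable[Any], handler: WorkHandler) -> List[Any]:
    """Pull each work item from the source and pass it to the handler."""
    return [handler.handle(work) for work in source]


# Any iterable can play the role of a source in this sketch.
results = run([b"hello", b"world"], UppercaseHandler())
print(results)
```

Swapping the list for another iterable, or `UppercaseHandler` for another handler, changes what gets processed without touching `run`, which is the separation the concept of `work` is meant to provide.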

In fact, `work` can be any unit of processing. It doesn't have to be a client connection. For example:

- A file on disk can act as the `source` and each line in that file as the `work` definition
- Imagine tailing a file on disk as `source` and processing each line as a separate `work` object
- If you want, each line in the file can also be a URL to be scraped or downloaded
- If you want, your `work` handlers can append new URLs _(discovered by scraping previous URL entries)_ back to the file, creating an infinite feedback loop with the `work` processing core.

And just like that we have created a web scraper!!!
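
A minimal sketch of that feedback loop, using only the standard library, might look like the following. Everything here is an assumption made for illustration: `FAKE_WEB` and `fake_scrape` replace real HTTP fetches, `drain` is a hypothetical helper, and a `limit` is added so the otherwise unbounded loop terminates.

```python
import os
import tempfile
from typing import Callable, Dict, List

# Hypothetical link graph standing in for real HTTP fetches.
FAKE_WEB: Dict[str, List[str]] = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/"],
    "http://c.example/": ["http://a.example/"],
}


def fake_scrape(url: str) -> List[str]:
    """Pretend to fetch `url` and return the links found on it."""
    return FAKE_WEB.get(url, [])


def drain(path: str, handler: Callable[[str], List[str]], limit: int = 100) -> List[str]:
    """Treat each line of `path` as work; append newly discovered lines to it."""
    processed: List[str] = []
    while len(processed) < limit:
        with open(path, encoding="utf-8") as source:
            lines = [line.strip() for line in source if line.strip()]
        pending = lines[len(processed):limit]
        if not pending:
            break  # nothing was appended since the previous pass
        known = set(lines)
        for url in pending:
            for found in handler(url):
                if found not in known:
                    known.add(found)
                    # Feed the discovery back into the source file.
                    with open(path, "a", encoding="utf-8") as sink:
                        sink.write(found + "\n")
            processed.append(url)
    return processed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "urls.txt")
    with open(path, "w", encoding="utf-8") as seed:
        seed.write("http://a.example/\n")
    visited = drain(path, fake_scrape)

print(visited)
```

The file is re-read on every pass instead of being tailed, which keeps the sketch deterministic. A long-running version would more likely watch the file for appended lines.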

To extend this generic concept, now imagine a distributed queue as the `source` of our `work`, where each published message in the queue is our `work` payload.  Some examples of such `sources` can be:
- A `Redis` channel
- Google Cloud PubSub channel
- Kafka queues

And just like that we have created a distributed `work` executor!!!
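
To make the idea concrete without a broker, the sketch below uses the standard library's `queue.Queue` as a stand-in for the distributed queue and threads as stand-ins for separate executor instances. The `worker` function, the squaring "work", and the `SENTINEL` shutdown signal are all assumptions chosen for illustration. A real deployment would subscribe to Redis, PubSub or Kafka instead.

```python
import queue
import threading
from typing import List, Optional

SENTINEL = None  # published once per worker to tell it to stop


def worker(source: "queue.Queue[Optional[int]]", results: List[int], lock: threading.Lock) -> None:
    """Consume payloads from the source until the sentinel arrives."""
    while True:
        payload = source.get()
        if payload is SENTINEL:
            break
        with lock:
            results.append(payload * payload)  # the "work": square the payload


source: "queue.Queue[Optional[int]]" = queue.Queue()
results: List[int] = []
lock = threading.Lock()

workers = [
    threading.Thread(target=worker, args=(source, results, lock))
    for _ in range(3)
]
for thread in workers:
    thread.start()

for number in range(5):  # publish five work payloads
    source.put(number)
for _ in workers:  # one stop signal per worker
    source.put(SENTINEL)
for thread in workers:
    thread.join()

print(sorted(results))
```

Results are sorted before printing because the order in which workers finish is not deterministic.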

## HttpParser

The `HttpParser` class is at the heart of everything related to HTTP. It is used by the web server and proxy server cores and by their plugin ecosystem. As the name suggests, it can parse both HTTP request and response packets. It can also parse HTTP lookalike protocols such as ICAP and SIP. Most importantly, remember that `HttpParser` was originally written to handle HTTP packets arriving in the context of a proxy server, and to date its default behavior favors that use case.

> Let's start by parsing an HTTP web request using `HttpParser`

```python
from proxy.http.methods import httpMethods
from proxy.http.parser import HttpParser, httpParserTypes, httpParserStates
from proxy.common.constants import HTTP_1_1

request = HttpParser(httpParserTypes.REQUEST_PARSER)
request.parse(b'GET / HTTP/1.1\r\nHost: localhost\r\n\r\n')

assert request.state == httpParserStates.COMPLETE
assert request.method == httpMethods.GET
assert request.version == HTTP_1_1
# A web (origin-form) request line carries no host, so none is populated.
assert request.host is None
assert request.port == 80
assert request._url is not None
assert request._url.remainder == b'/'
assert request.has_header(b'host')
assert request.header(b'host') == b'localhost'
assert len(request.headers) == 1

print(request.build())
```

```
b'GET / HTTP/1.1\r\nHost: localhost\r\n\r\n'
```


> Next, let's parse an HTTP proxy request using `HttpParser`

```python
# Reuses the imports from the first example.
request = HttpParser(httpParserTypes.REQUEST_PARSER)
request.parse(b'GET http://httpbin.org/get HTTP/1.1\r\nHost: httpbin.org\r\n\r\n')

assert request.state == httpParserStates.COMPLETE
assert request.method == httpMethods.GET
assert request.version == HTTP_1_1
# The absolute URL in the request line populates host and port.
assert request.host == b'httpbin.org'
assert request.port == 80
assert request._url is not None
assert request._url.remainder == b'/get'
assert request.has_header(b'host')
assert request.header(b'host') == b'httpbin.org'
assert len(request.headers) == 1

print(request.build())
print(request.build(for_proxy=True))
```

```
b'GET /get HTTP/1.1\r\nHost: httpbin.org\r\n\r\n'
b'GET http://httpbin.org:80/get HTTP/1.1\r\nHost: httpbin.org\r\n\r\n'
```


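Notice how `request.build()` and `request.build(for_proxy=True)` differ for an HTTP proxy request: the former emits only the path (`/get`) in the request line, while the latter emits the full URL, including the port. Also notice that the `request.host` field was populated for the HTTP proxy request but not for the earlier HTTP web request.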

> To conclude, let's parse a HTTPS proxy request

```python
# Reuses the imports from the first example.
request = HttpParser(httpParserTypes.REQUEST_PARSER)
request.parse(b'CONNECT httpbin.org:443 HTTP/1.1\r\nHost: httpbin.org:443\r\n\r\n')

assert request.state == httpParserStates.COMPLETE
assert request.method == httpMethods.CONNECT
assert request.version == HTTP_1_1
assert request.host == b'httpbin.org'
assert request.port == 443
assert request._url is not None
# A CONNECT target is just host:port, so there is no path remainder.
assert request._url.remainder is None
assert request.has_header(b'host')
assert request.header(b'host') == b'httpbin.org:443'
assert len(request.headers) == 1

print(request.build())
print(request.build(for_proxy=True))
```

```
b'CONNECT / HTTP/1.1\r\nHost: httpbin.org:443\r\n\r\n'
b'CONNECT httpbin.org:443 HTTP/1.1\r\nHost: httpbin.org:443\r\n\r\n'
```
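
The three examples above correspond to the three request-target forms defined by the HTTP/1.1 specification: origin-form (`/`), absolute-form (`http://httpbin.org/get`) and authority-form (`httpbin.org:443`). The standard-library sketch below shows one way to tell them apart. It is not how `HttpParser` is implemented: `Target` and `classify` are hypothetical names, the input is assumed to be well formed, and, unlike `HttpParser` in the first example, the sketch leaves the port unset for origin-form targets.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Target(NamedTuple):
    form: str
    host: Optional[str]
    port: Optional[int]
    path: Optional[str]


def classify(method: str, target: str) -> Target:
    """Classify a request target; assumes a well-formed request line."""
    if method == "CONNECT":
        # authority-form: "host:port", no path component
        host, _, port = target.rpartition(":")
        return Target("authority", host, int(port), None)
    if target.startswith("/"):
        # origin-form: the host is only available from the Host header
        return Target("origin", None, None, target)
    # absolute-form: a full URL, as sent to a forward proxy
    parts = urlsplit(target)
    default_port = 443 if parts.scheme == "https" else 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return Target("absolute", parts.hostname, parts.port or default_port, path)


print(classify("GET", "/"))
print(classify("GET", "http://httpbin.org/get"))
print(classify("CONNECT", "httpbin.org:443"))
```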
