# Advanced Python
# Lambdas

Lambdas are also called anonymous functions.

In Python an anonymous function is created with the `lambda` keyword.

It may or may not be assigned a name afterwards.

## Syntax

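Limitations:
* A lambda body can only contain a single expression; statements (e.g. `return`, `assert`, assignments) are not allowed.
* A lambda is usually written on a single line, although the expression may span several lines inside parentheses.
* Lambda does not support type annotations.

A lambda can also be invoked immediately after it is defined.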

Possible arguments:
* Positional arguments
* Named arguments (keyword arguments)
* Variable list of arguments
* Variable list of keyword arguments
* Keyword-only arguments
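
The argument kinds listed above can be sketched with one lambda each. The variable names are illustrative only; binding lambdas to names is done here purely to keep the example short.

```python
# One lambda per argument kind from the list above.
positional = lambda x, y: x + y                # positional arguments
with_default = lambda x, y=10: x + y           # named (keyword) argument with a default
var_positional = lambda *args: sum(args)       # variable list of arguments
var_keyword = lambda **kwargs: sorted(kwargs)  # variable list of keyword arguments
keyword_only = lambda x, *, scale: x * scale   # keyword-only argument

print(positional(1, 2))          # 3
print(with_default(1, y=5))      # 6
print(var_positional(1, 2, 3))   # 6
print(var_keyword(b=1, a=2))     # ['a', 'b']
print(keyword_only(2, scale=3))  # 6
```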

In [2]:
lambda x, y: x + y

<function __main__.<lambda>(x, y)>

Lambdas can be used as an Immediately Invoked Function Expression.

> Note: Python is not JavaScript, so doing this is not encouraged :)

In [3]:
(lambda x, y: x + y)(2, 2)

4

A lambda function can be a higher-order function by taking a function (normal or lambda) as an argument.

In [4]:
calculator = lambda x, y, operation: print(operation(x, y))

calculator(2, 2, lambda a, b: a + b)
calculator(2, 2, lambda a, b: a - b)

4
0


Under the hood, Python treats normal and lambda functions almost identically; the main difference is the function's name (`<lambda>`).

We'll see this later when looking at bytecode.
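
A quick way to observe the naming difference without looking at bytecode is to compare the two function objects directly. This is a small sketch; `add` and `add_lambda` are names made up for the comparison.

```python
def add(x, y):
    return x + y

add_lambda = lambda x, y: x + y  # bound to a name only for this comparison

# Both are instances of the same function type...
print(type(add) is type(add_lambda))  # True

# ...but only the `def` version carries a meaningful name,
# which is what shows up in tracebacks and reprs.
print(add.__name__)         # add
print(add_lambda.__name__)  # <lambda>
```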

Python style recommendations ([PEP-8](https://peps.python.org/pep-0008/#programming-recommendations)):
> Always use a `def` statement instead of an assignment statement that binds a lambda expression directly to an identifier. 

# Multiple return values and sequence unpacking

In Python you can return multiple comma-separated values from a function.

It will return those values as a `tuple`.

In [5]:
def multiple_returns():
    a = 2
    b = 3
    c = 4
    return a, b, c

result = multiple_returns()
type(result)

tuple

You can unpack those values into separate variables if you wish.

In [6]:
x, y, z = multiple_returns()
print(x)
print(y)
print(z)

2
3
4


You can use `*` to collect the "leftover" values into a list.

In [7]:
x, *y = multiple_returns()
print(x)
print(y)

2
[3, 4]


You can use `_` to "skip" values you don't need. `_` is an ordinary variable name that is used as a throwaway by convention.

In [8]:
_, _, z = multiple_returns()
print(z)

4
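
Unpacking is not limited to function results. The sketch below applies the same rules to a list, a string, and a swap of two variables; all names are illustrative.

```python
# The starred name can sit in the middle and collects whatever is left over.
first, *middle, last = [1, 2, 3, 4, 5]
print(first, middle, last)  # 1 [2, 3, 4] 5

# Unpacking works for any iterable; `*_` discards the rest.
head, *_ = "abc"
print(head)  # a

# Swapping two variables is tuple packing plus unpacking.
a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1
```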


# `with` statement

`with` works similarly to `try-with-resources` in Java and ensures proper acquisition and release of resources.

Instead of explicitly closing resources, e.g., when reading files:

In [9]:
file = open('./hello.py', 'r')
try:
    print(file.readlines())
finally:
    file.close()

['print("hello, world!")']


You can use `with` for cleaner code. It allows you to enter the file context and execute code within it, while making sure the file resources are freed upon exiting the context.

In [10]:
with open('./hello.py', 'r') as file:
    print(file.readlines())

['print("hello, world!")']


As with most things in the Python data model, any object can work inside a `with` statement if it implements two dunder methods, `__enter__()` and `__exit__()`:
* `__enter__()` acquires the resource. Its return value is bound to the name after `as`, so it usually returns the acquired resource (or `self`).
* `__exit__()` releases the acquired resource. It is called when the block exits, even if an exception was raised.

In [11]:
class FileReader():
    def __init__(self, file_name):
        self.file_name = file_name
    
    def __enter__(self):
        print("Entering filereader context")
        self.file = open(self.file_name, 'r')
        return self.file
    
    def __exit__(self, *args):
        print("Exiting filereader context")
        self.file.close()

In [12]:
with FileReader('./hello.py') as file:
    print(file.readlines())
print("Outside of file reader context")

Entering filereader context
['print("hello, world!")']
Exiting filereader context
Outside of file reader context


An object that implements `__enter__()` and `__exit__()` is called a *context manager*.
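
`__exit__()` also receives details about any exception raised inside the block, and a truthy return value suppresses that exception. A minimal sketch, using a hypothetical `SuppressZeroDivision` class that is not part of the examples above:

```python
class SuppressZeroDivision:
    """Hypothetical context manager that records and swallows ZeroDivisionError."""

    def __enter__(self):
        self.caught = None
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # The three arguments are all None when the block finished normally.
        if exc_type is ZeroDivisionError:
            self.caught = exc_value
            return True   # a truthy return value suppresses the exception
        return False      # any other exception propagates to the caller


with SuppressZeroDivision() as guard:
    1 / 0  # raises inside the block, but __exit__ swallows it

print(type(guard.caught).__name__)  # ZeroDivisionError
```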

You can simplify creating custom context objects using the [contextlib](https://docs.python.org/3/library/contextlib.html) standard library module.

In [13]:
from contextlib import contextmanager

class FileReaderContext():
    def __init__(self, file_name):
        self.file_name = file_name
    
    @contextmanager
    def read_file(self):
        print("Entering filereader context")
        # Open before `try`: if open() fails there is nothing to close.
        file = open(self.file_name, 'r')
        try:
            yield file
        finally:
            print("Exiting filereader context")
            file.close()

In [14]:
reader = FileReaderContext('./hello.py')

with reader.read_file() as file:
    print(file.readlines())
print("Outside of file reader context")

Entering filereader context
['print("hello, world!")']
Exiting filereader context
Outside of file reader context


`read_file` is a generator function. When `read_file` is executed, it opens the file and hands it to the caller using `yield`.

After the code inside the `with` block is executed, program control returns to the `read_file` function.

The `read_file` function resumes its execution and executes the code following the `yield` statement, which releases the acquired resources.
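
The order of events described above can be made visible with a plain function decorated with `@contextmanager`. This is a sketch only; `tracked` and `events` are illustrative names.

```python
from contextlib import contextmanager

events = []

@contextmanager
def tracked(name):
    events.append(f"enter {name}")     # runs when the with block is entered
    try:
        yield name                     # the yielded value is bound by `as`
    finally:
        events.append(f"exit {name}")  # runs when the with block is left

with tracked("demo") as value:
    events.append(f"inside {value}")

print(events)  # ['enter demo', 'inside demo', 'exit demo']
```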

# Useful tools for data testing tasks

## `datetime` module

The `datetime` module has 6 *main* classes:

* `date`: Gregorian calendar date. Its attributes are `year`, `month` and `day`.
* `time`: time, independent of any particular day, assuming that every day has exactly `24*60*60` seconds. Its attributes are `hour`, `minute`, `second`, `microsecond`, and `tzinfo`.
* `datetime`: combination of date and time, with the attributes `year`, `month`, `day`, `hour`, `minute`, `second`, `microsecond`, and `tzinfo`.
* `timedelta`: duration expressing the difference between two date, time, or datetime instances to microsecond resolution.
* `tzinfo`: abstract base class for time zone information objects.
* `timezone`: class that implements the `tzinfo` abstract base class as a fixed offset from UTC.

### date

In [None]:
from datetime import date

simple_date = date(2023, 1, 25)

print(f"Date is {simple_date}")
print(f"Current date is {date.today()}")
print(f"Stringified date: {simple_date.isoformat()}")

Date is 2023-01-25
Current date is 2024-11-07
Stringified date: 2023-01-25


### time

In [17]:
from datetime import time

simple_time = time(13, 24, 56)

print(f"Time  is {simple_time}")
print(f"Stringified time: {simple_time.isoformat()}")

Time  is 13:24:56
Stringified time: 13:24:56


### datetime

In [19]:
from datetime import datetime

a = datetime(2023, 1, 26)
print(a)

a = datetime(2023, 1, 26, 23, 1, 26, 123456)
print(a)

print(a.timestamp())

print(f"Today is: {datetime.now()}")

print(datetime.now().isoformat())

2023-01-26 00:00:00
2023-01-26 23:01:26.123456
1674766886.123456
Today is: 2024-11-07 18:56:52.570816
2024-11-07T18:56:52.570841


### timedelta

In [20]:
from datetime import datetime, timedelta

time_now = datetime.now()

date_after_2yrs = time_now + timedelta(days=730)

date_before_2days = time_now - timedelta(days=2)

print(time_now)
print(date_after_2yrs)
print(date_before_2days)

2024-11-07 18:58:43.670716
2026-11-07 18:58:43.670716
2024-11-05 18:58:43.670716
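
A `timedelta` is also what you get back when you subtract two dates or datetimes. The short sketch below uses fixed dates so the results are reproducible.

```python
from datetime import date, timedelta

# Subtracting two dates yields a timedelta (2024 is a leap year).
gap = date(2024, 3, 1) - date(2024, 2, 1)
print(gap)       # 29 days, 0:00:00
print(gap.days)  # 29

# A timedelta is normalized into days, seconds and microseconds.
shift = timedelta(hours=36)
print(shift.days, shift.seconds)  # 1 43200
print(shift.total_seconds())      # 129600.0
```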


### formatting date

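The `strftime()` method converts a date, time or datetime object to a string in the given format. The `%-` directives used below (e.g. `%-m`, `%-I`) strip leading zeros; they are a platform extension and are not available on Windows.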

In [21]:
from datetime import datetime as dt

now = dt.now()
print(f"Without formatting: {now}")

s = now.strftime("%A %m %-Y")
print(f"Example 1: {s}")

s = now.strftime("%a %-m %y")
print(f"Example 2: {s}")

s = now.strftime("%-I %p %S")
print(f"Example 3: {s}")

s = now.strftime("%H:%M:%S")
print(f"Example 4: {s}")

Without formatting: 2024-11-07 18:59:57.094445
Example 1: Thursday 11 2024
Example 2: Thu 11 24
Example 3: 6 PM 57
Example 4: 18:59:57
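
The reverse direction, parsing a string back into a `datetime`, uses `datetime.strptime()` with the same format codes. A minimal sketch with a fixed input string:

```python
from datetime import datetime

text = "2024-11-07 18:59:57"
fmt = "%Y-%m-%d %H:%M:%S"

parsed = datetime.strptime(text, fmt)  # string -> datetime
print(repr(parsed))                    # datetime.datetime(2024, 11, 7, 18, 59, 57)

# Formatting the parsed value with the same format gives the original text back.
print(parsed.strftime(fmt) == text)    # True
```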


### timezone

In [22]:
%%capture
%pip install pytz

In [25]:
from datetime import datetime
from pytz import timezone

fmt = "%Y-%m-%d %H:%M:%S %Z%z"  # named `fmt` to avoid shadowing the built-in format()

now_utc = datetime.now(timezone('UTC'))
print(now_utc.strftime(fmt))

timezones = ['Europe/Kiev', 'America/New_York']

for tzone in timezones:
    now_world = now_utc.astimezone(timezone(tzone))
    print(now_world.strftime(fmt))

2024-11-07 17:03:16 UTC+0000
2024-11-07 19:03:16 EET+0200
2024-11-07 12:03:16 EST-0500
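
For fixed UTC offsets the standard library's own `datetime.timezone` class is enough, without `pytz`. The sketch below hard-codes a `+02:00` offset labelled `EET` as an assumption for illustration; unlike a real time zone database entry, a fixed offset knows nothing about daylight saving time.

```python
from datetime import datetime, timedelta, timezone

fmt = "%Y-%m-%d %H:%M:%S %Z%z"

# A fixed moment in UTC, so the output is reproducible.
utc_moment = datetime(2024, 11, 7, 17, 3, 16, tzinfo=timezone.utc)

# Fixed offset of +02:00 with a display name (no DST rules attached).
eet = timezone(timedelta(hours=2), "EET")

print(utc_moment.strftime(fmt))                  # 2024-11-07 17:03:16 UTC+0000
print(utc_moment.astimezone(eet).strftime(fmt))  # 2024-11-07 19:03:16 EET+0200
```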


## JSON processing

Python comes with a built-in package called [json](https://docs.python.org/3/library/json.html) for working with JSON data.

### Serialization
Python objects to JSON conversion:

| Python      | JSON        |
|-------------|-------------|
| dict        | object      |
| list, tuple | array       |
| str         | string      |
| int, float  | number      |
| True, False | true, false |
| None        | null        |

In [26]:
data = {
	"id": "0001",
	"type": "donut",
	"name": "Cake",
	"ppu": 0.55,
	"batters":
		{
			"batter":
				[
					{ "id": "1001", "type": "Regular" },
					{ "id": "1002", "type": "Chocolate" },
					{ "id": "1003", "type": "Blueberry" },
					{ "id": "1004", "type": "Devil's Food" }
				]
		},
	"topping":
		[
			{ "id": "5001", "type": "None" },
			{ "id": "5002", "type": "Glazed" },
			{ "id": "5005", "type": "Sugar" },
			{ "id": "5007", "type": "Powdered Sugar" },
			{ "id": "5006", "type": "Chocolate with Sprinkles" },
			{ "id": "5003", "type": "Chocolate" },
			{ "id": "5004", "type": "Maple" }
		]
}

#### Serialize to string

In [27]:
import json

json.dumps(data)

'{"id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55, "batters": {"batter": [{"id": "1001", "type": "Regular"}, {"id": "1002", "type": "Chocolate"}, {"id": "1003", "type": "Blueberry"}, {"id": "1004", "type": "Devil\'s Food"}]}, "topping": [{"id": "5001", "type": "None"}, {"id": "5002", "type": "Glazed"}, {"id": "5005", "type": "Sugar"}, {"id": "5007", "type": "Powdered Sugar"}, {"id": "5006", "type": "Chocolate with Sprinkles"}, {"id": "5003", "type": "Chocolate"}, {"id": "5004", "type": "Maple"}]}'

#### Serialize to any file-type destinations

Use the `indent` keyword argument to pretty-print the output.

In [28]:
with open("data.json", "w") as file:
    json.dump(data, file, indent=4)
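
Reading the file back is the mirror image: `json.load()` takes a file object. The sketch below writes to a temporary directory so it leaves nothing behind; `record` is a shortened stand-in for the `data` dictionary above.

```python
import json
import os
import tempfile

record = {"id": "0001", "type": "donut", "ppu": 0.55}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.json")
    with open(path, "w") as file:
        json.dump(record, file, indent=4)  # serialize to a file
    with open(path) as file:
        restored = json.load(file)         # deserialize from a file

print(restored == record)  # True

# `sort_keys` gives a stable key order, which helps when diffing files.
print(json.dumps(record, indent=2, sort_keys=True))
```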

### Deserialization
JSON to Python objects conversion:

| JSON          | Python      |
|---------------|-------------|
| object        | dict        |
| array         | list        |
| string        | str         |
| number (int)  | int         |
| number (real) | float       |
| true, false   | True, False |
| null          | None        |

> Note: the serialization and deserialization mappings are not symmetric, so you may not get the "exact" object back if you serialize it in one part of your application and deserialize it in another.

In [29]:
important_data = (1,2,3)
encoded = json.dumps(important_data)
decoded = json.loads(encoded)

print(important_data == decoded)
print(type(important_data))
print(type(decoded))

False
<class 'tuple'>
<class 'list'>


### Custom types

#### Serialization

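There are two ways to serialize custom types:

* an encoder function, passed as `default`
* an encoder class (a `json.JSONEncoder` subclass), passed as `cls`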

In [30]:
import datetime
import json

user = {
    "id": 1,
    "name": "Mykola",
    "createdAt": datetime.datetime.now()
}

try:
    json.dumps(user)
except TypeError as error:
    print(error)

Object of type datetime is not JSON serializable


In [31]:
def encode_datetime(dt):
    if isinstance(dt, (datetime.date, datetime.datetime)):
        return dt.isoformat()
    else:
        type_name = dt.__class__.__name__
        raise TypeError(f"Object of type '{type_name}' is not JSON serializable")

json.dumps(user, default=encode_datetime)

'{"id": 1, "name": "Mykola", "createdAt": "2024-11-07T19:10:36.795644"}'

In [32]:
class DateTimeEncoder(json.JSONEncoder):
    def default(self, dt):
        if isinstance(dt, (datetime.date, datetime.datetime)):
            return dt.isoformat()
        else:
            return super().default(dt)

json.dumps(user, cls=DateTimeEncoder)

'{"id": 1, "name": "Mykola", "createdAt": "2024-11-07T19:10:36.795644"}'

#### Deserialization

As in other languages, you need to detect a custom type by some marker, for example a field name.

In [33]:
import datetime

json_data = """{"id": 1, "name": "Mykola", "createdAt": "2024-11-07T19:10:36.795644"}"""

def decode_datetime(dct):
    if "createdAt" in dct:
        dct["createdAt"] = datetime.datetime.fromisoformat(dct["createdAt"])
    return dct

json.loads(json_data, object_hook=decode_datetime)

{'id': 1,
 'name': 'Mykola',
 'createdAt': datetime.datetime(2024, 11, 7, 19, 10, 36, 795644)}
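
Instead of relying on a field name, the marker can be an explicit type tag written by the encoder. The following is a sketch under that assumption: the `__type__` key is a made-up convention for this example, not something the `json` module defines.

```python
import json
from datetime import datetime

def encode_tagged(obj):
    """Encoder function: wrap datetimes in a dict that carries a type tag."""
    if isinstance(obj, datetime):
        return {"__type__": "datetime", "value": obj.isoformat()}
    raise TypeError(f"Object of type '{type(obj).__name__}' is not JSON serializable")

def decode_tagged(dct):
    """object_hook: called for every decoded JSON object, innermost first."""
    if dct.get("__type__") == "datetime":
        return datetime.fromisoformat(dct["value"])
    return dct

original = {"id": 1, "createdAt": datetime(2024, 11, 7, 19, 10, 36, 795644)}
encoded = json.dumps(original, default=encode_tagged)
restored = json.loads(encoded, object_hook=decode_tagged)

print(restored == original)  # True
```

The tag makes the round trip independent of field names, at the cost of a JSON layout that other consumers have to know about.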

## JSON Schema validation

A [JSON Schema](https://json-schema.org/) is a JSON document defining the schema of some JSON data.

It is a valid JSON document with key/value pairs. Each key has a special meaning and is used to define the schema of some JSON data.

It's a good way to validate your data, especially if you receive similar objects from different sources.

Two common schema keywords are `$schema` and `$id`. `$schema` declares the “[draft](https://json-schema.org/specification-links.html)” the schema is written against, and `$id` gives the schema a URI that identifies it. If `$schema` is not specified, the latest draft will be used.

In [34]:
user_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "number"},
        "name": {"type": "string"},
        "createdAt": {"type": "string"},
    },
    "required": ["id", "name", "createdAt"],
    "additionalProperties": False
}

Python has the [jsonschema](https://pypi.org/project/jsonschema/) library to validate a JSON *instance* against a schema. Its `validate` function checks an instance in a single call.

In [1]:
%%capture
%pip install jsonschema

In [36]:
from jsonschema import validate, ValidationError
import json

json_from_datasource = """{"id": 1, "name": "Mykola", "createdAt": "2023-01-25T12:29:25.996855"}"""
json_data = json.loads(json_from_datasource)

validate(instance=json_data, schema=user_schema)
# No error, the JSON is valid.

In [37]:
try:
    validate(instance={"id": "1", "name": "Mykola", "createdAt": "2023-01-25T12:29:25.996855"}, schema=user_schema)
except ValidationError as error:
    print(error)

'1' is not of type 'number'

Failed validating 'type' in schema['properties']['id']:
    {'type': 'number'}

On instance['id']:
    '1'


In [38]:
try:
    validate(instance={"name": "Mykola"}, schema=user_schema)
except ValidationError as error:
    print(error)

'id' is a required property

Failed validating 'required' in schema:
    {'type': 'object',
     'properties': {'id': {'type': 'number'},
                    'name': {'type': 'string'},
                    'createdAt': {'type': 'string'}},
     'required': ['id', 'name', 'createdAt'],
     'additionalProperties': False}

On instance:
    {'name': 'Mykola'}


In [39]:
try:
    validate(instance={"id": 1, "name": "Mykola", "createdAt": "2023-01-25T12:29:25.996855", "job": "Engineer"}, schema=user_schema)
except ValidationError as error:
    print(error)

Additional properties are not allowed ('job' was unexpected)

Failed validating 'additionalProperties' in schema:
    {'type': 'object',
     'properties': {'id': {'type': 'number'},
                    'name': {'type': 'string'},
                    'createdAt': {'type': 'string'}},
     'required': ['id', 'name', 'createdAt'],
     'additionalProperties': False}

On instance:
    {'id': 1,
     'name': 'Mykola',
     'createdAt': '2023-01-25T12:29:25.996855',
     'job': 'Engineer'}


If you have a valid JSON schema and want to use it to validate many JSON documents, then it’s recommended to use a `Validator`.

You can also use it to validate the schema definition itself against a given draft spec.

In [40]:
from jsonschema import Draft202012Validator

Draft202012Validator.check_schema(user_schema)
# No output means the schema is valid, otherwise `SchemaError` will be raised.

In [41]:
draft_202012_validator = Draft202012Validator(user_schema)

draft_202012_validator.validate(json_data)
# No output, the JSON is valid.
# Otherwise, ValidationError will be raised as above

In [9]:
# If you only need a boolean instead of an exception
draft_202012_validator.is_valid(json_data)

True

## Working with HTTP

Two main libraries:
* [urllib.request](https://docs.python.org/3/library/urllib.request.html): part of the standard library. Low-level API for accessing (mostly HTTP) URLs.
* [requests](https://requests.readthedocs.io/en/latest/): high-level API. One love :)

`requests` has a lot of features. Below is just a showcase of how easy it is to use.

In [42]:
%%capture
%pip install requests

In [None]:
import requests

response = requests.get("https://jsonplaceholder.typicode.com/todos")
print(response)

<Response [404]>


In [49]:
print(response.status_code)

404


### HTTP error codes as exceptions

In [47]:
import requests
from requests.exceptions import HTTPError

for url in ['https://jsonplaceholder.typicode.com/todos', 'https://jsonplaceholder.typicode.com/invalid']:
    try:
        response = requests.get(url)

        # If the response was successful, no Exception will be raised
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
    except Exception as err:
        print(f'Other error occurred: {err}')
    else:
        print('Success!')

Success!
HTTP error occurred: 404 Client Error: Not Found for url: https://jsonplaceholder.typicode.com/invalid


### Working with content

In [51]:
response = requests.get("https://jsonplaceholder.typicode.com/todos/1")
# raw response bytes
print(response.content)

b'{\n  "userId": 1,\n  "id": 1,\n  "title": "delectus aut autem",\n  "completed": false\n}'


In [52]:
# response as string
string = response.text
json.loads(string)

{'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}

If you don't need custom processing, use the built-in `json()` method.

In [53]:
response.json()

{'userId': 1, 'id': 1, 'title': 'delectus aut autem', 'completed': False}

### Query parameters

In [54]:
response = requests.get(
    'https://jsonplaceholder.typicode.com/comments',
    params={'postId': 1},
)
decoded = response.json()
decoded[0]["name"]

'id labore ex et quam laborum'
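
To see roughly what `params` does to the URL, the query string can be built by hand with the standard library. This is an approximation for illustration, not the exact code path `requests` uses; with `requests` itself you can inspect the final URL through `response.url`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base = "https://jsonplaceholder.typicode.com/comments"

# Encode the parameters and append them as the query string.
url = f"{base}?{urlencode({'postId': 1})}"
print(url)  # https://jsonplaceholder.typicode.com/comments?postId=1

# Parsing the query string back gives the parameters as lists of strings.
print(parse_qs(urlsplit(url).query))  # {'postId': ['1']}
```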

`requests` supports all HTTP methods, e.g. `POST` and `PUT`. The request body can be passed in two ways:

* `data` - encodes a dictionary as `application/x-www-form-urlencoded`
* `json` - encodes the value as JSON

```python
requests.post('https://httpbin.org/post', data={'key':'value'})
requests.put('https://httpbin.org/put', json={'key':'value'})
```
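
The difference between the two body encodings can be sketched with the standard library alone, without sending anything. The output is roughly what ends up in the request body; the exact bytes `requests` produces are an assumption here.

```python
import json
from urllib.parse import urlencode

payload = {"key": "value"}

form_body = urlencode(payload)   # what `data=` sends: form encoding
json_body = json.dumps(payload)  # what `json=` sends: a JSON document

print(form_body)  # key=value
print(json_body)  # {"key": "value"}
```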

# Debugging in Python

It is easy to debug in an IDE.

But there may be situations where you have a Python script, a terminal, a possible bug and no IDE :)

Python has a built-in debugger [pdb](https://docs.python.org/3/library/pdb.html) that you can use both in the REPL and as a standalone module.

It's not as powerful as some IDE debuggers and works best when your classes implement `__repr__()` :) but if nothing else is available, it is a great way to see what's going on.

## *invasive* and *noninvasive* debugging

If you know where the problem is and can (or want to) modify your code, you can use the built-in `breakpoint()` function to enable debugging where you want to. This works in Python 3.7+.
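
`breakpoint()` does not start `pdb` directly; it calls `sys.breakpointhook()`, whose default implementation starts `pdb`. The sketch below swaps in a hypothetical `recording_hook` so the mechanism can be shown without an interactive session.

```python
import sys

hits = []

def recording_hook(*args, **kwargs):
    """Stand-in for the debugger: just remember that breakpoint() was reached."""
    hits.append("breakpoint reached")

original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    breakpoint()  # calls sys.breakpointhook() instead of starting pdb
finally:
    sys.breakpointhook = original_hook  # restore the normal behavior

print(hits)  # ['breakpoint reached']
```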

You can also control if the interpreter should execute the `breakpoint()` function by setting the `PYTHONBREAKPOINT` environment variable.

`PYTHONBREAKPOINT=0` disables the `breakpoint()` function.

> Note: As debugging is interactive, you need to run the examples in a terminal with the venv activated instead of in the notebook.

A `breakpoint()` call in a file stops execution at that line and starts the debugger:

```bash
python debugging/invasive_debugging.py
```

You can also run a script or even some functions from a module using `pdb` without code modifications.

If you want to run `pdb` in REPL you can start it by running `python`, then inside the REPL:

```python
import pdb
import your_module

pdb.run("your_module.some_function()")
```

Or just run `python -m pdb your_module.py` in terminal.

This will enable `pdb` from the beginning of your script. For example:

```bash
python -m pdb debugging/noninvasive_debugging.py
```

## Commands

You can navigate your debugging environment using `pdb` commands. Some are listed here; the rest are listed in the documentation.

| Command | Action                                                                                                                                                                                                   |
|---------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `p`       | Print the value of an expression.                                                                                                                                                                        |
| `pp`      | Pretty-print the value of an expression.                                                                                                                                                                 |
| `n`       | Continue execution until the next line in the current function is reached or it returns.                                                                                                                 |
| `s`       | Execute the current line and stop at the first possible occasion (either in a function that is called or in the current function).                                                                       |
| `c`       | Continue execution and only stop when a breakpoint is encountered.                                                                                                                                       |
| `unt`     | Continue execution until the line with a number greater than the current one is reached. With a line number argument, continue execution until a line with a number greater or equal to that is reached. |
| `l`       | List source code for the current file. Without arguments, list `11` lines around the current line or continue the previous listing.                                                                        |
| `ll`      | List the whole source code for the current function or frame.                                                                                                                                            |
| `b`       | With no arguments, list all breaks. With a line number argument, set a breakpoint at this line in the current file.                                                                                      |
| `w`       | Print a stack trace, with the most recent frame at the bottom. An arrow indicates the current frame, which determines the context of most commands.                                                      |
| `u`       | Move the current frame count (default one) levels up in the stack trace (to an older frame).                                                                                                             |
| `d`       | Move the current frame count (default one) levels down in the stack trace (to a newer frame).                                                                                                            |
| `h`       | List of available commands                                                                                                                                                                               |
| `q`       | Quit the debugger                                                                                                                                                                                        |

### Printing

Allows evaluation of any valid Python expression.

```bash
python debugging/printing.py
```
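
The script itself is not shown in this notebook. If you don't have it at hand, a hypothetical stand-in along the following lines defines the names used by the commands in this section (`filename`, `head`, `tail`, `get_path`, `os`). Everything in it is an assumption made for illustration.

```python
# Hypothetical stand-in for debugging/printing.py; the real script is not shown here.
import os


def get_path(filename):
    """Return the directory part of a file path."""
    head, tail = os.path.split(filename)
    # breakpoint()  # uncomment to stop here and inspect head and tail
    return head


filename = "/tmp/example/script.py"  # illustrative path only
print(f"path = {get_path(filename)}")
```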

Inside the debugger you can run any of the following commands:

```python
p filename
p head,tail
p 'filename: ' + filename
p get_path
p getattr(get_path, '__doc__')
pp [os.path.split(p)[1] for p in os.path.sys.path]
```

> Note: last command is pretty print :)

### Navigation

* `n` continues to the next line or until the function returns. "Step over" in a debugging UI.
* `s` executes the current line and stops at the first possible occasion. "Step into" in a debugging UI.
* `c` continues execution until the next breakpoint.
* `unt` continues until the given line number.

You can use this script to experiment:

```bash
python debugging/stepping.py
```

### Dynamic breakpoints

You can tell `pdb` to create breakpoints on a line number or a function name, and even specify a condition that must be true for the breakpoint to trigger.

```
b(reak) [ ([filename:]lineno | function) [, condition] ]
```

You can experiment with

```bash
python debugging/breakpoint.py
```

Inside the debugger, set `b util:4` (and check the variables `p filename, head, tail` inside the function body) or `b util.get_path` (and check the function args with `p filename` or with the `a` command, which lists function arguments). Use `c` to navigate to the breakpoint.

`b` without arguments lists all available breakpoints. You can disable and enable them by their number, e.g., `disable 1` or `enable 1`, or even remove them using `clear`, e.g., `clear 1`.

## Tracking value changes

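You can use `display` and `undisplay` to track and untrack expressions. Each time execution stops in the current frame, the debugger prints the value of every displayed expression that has changed. `display` does not stop execution by itself, so combine it with breakpoints or stepping.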

You can experiment with

```bash
python debugging/display.py
```

by setting a breakpoint on line 6 (`b 6`) and when you reach it ask debugger to `display char`. When you continue (`c`) you'll see the result.

# Cool things you might never use :)

Because frameworks got you covered. But just in case you'll need it somewhere.

## Coroutines

Coroutines are generalizations of subroutines (a.k.a. functions). They are used for cooperative multitasking, where a routine voluntarily yields control periodically or when idle so that several routines can make progress concurrently. The differences between a coroutine and a function are:
* Unlike functions, coroutines have many entry points for suspending and resuming execution. A coroutine can suspend its execution, transfer control to another coroutine, and later resume from the point where it left off.
* Unlike functions, there is no main function that calls coroutines in a particular order and coordinates the results. Coroutines are cooperative, which means they link together to form a pipeline. One coroutine may consume input data and send it to another that processes it. Finally, there may be a coroutine that displays the result.

Coroutines are commonly used with an **event loop** (which Python’s `asyncio` is built upon).

### Coroutines vs Threads

* Thread
    * an operating system (or run time environment) switches between threads according to the scheduler.
* Coroutine
    * the programmer and the programming language decide when to switch coroutines. Coroutines multitask cooperatively by suspending and resuming at points set by the programmer.

### Recap on generators

Generators let us pull values from a function one at a time, pausing its execution between values.

In [56]:
def fibonacci(limit):
    n2 = 1
    if limit >= 1:
        print("pausing execution and return n2")
        yield n2
        print("resume execution after n2")
        n1 = 0
        for _ in range(1, limit):
            n = n1 + n2
            print("pausing execution and return n")
            yield n
            print("resume execution after n")
            n1, n2 = n2, n

In [57]:
fib = fibonacci(5)
print(next(fib))

pausing execution and return n2
1


In [58]:
print(next(fib))

resume execution after n2
pausing execution and return n
1


In [59]:
print(next(fib))

resume execution after n
pausing execution and return n
2


In [60]:
print(next(fib))
print(next(fib))
try:
    print(next(fib))
except StopIteration:
    print("Finished")

resume execution after n
pausing execution and return n
3
resume execution after n
pausing execution and return n
5
resume execution after n
Finished
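
Generators can also receive data: `send()` resumes the generator and makes the paused `yield` expression evaluate to the sent value. That is enough to build the kind of coroutine pipeline described earlier. A minimal sketch; the `coroutine` decorator and the stage names are made up for this example.

```python
def coroutine(func):
    """Prime a generator-based coroutine so it is ready to receive values."""
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first `yield`
        return gen
    return start

@coroutine
def collect(results):
    """Sink stage: append every received value to `results`."""
    while True:
        value = yield
        results.append(value)

@coroutine
def keep_even(target):
    """Filter stage: forward only even numbers to `target`."""
    while True:
        value = yield
        if value % 2 == 0:
            target.send(value)

results = []
pipeline = keep_even(collect(results))
for number in range(6):
    pipeline.send(number)  # push data into the pipeline

print(results)  # [0, 2, 4]
```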


## Asyncio - native (new) coroutines :)

* simple coroutines - traditional generator coroutine (no async io).
* native coroutines - `asyncio` using latest `async`/`await` implementation.

Before `async`/`await` was introduced in Python 3.5, `asyncio` used generator-based coroutines (`@asyncio.coroutine` with `yield from`). That style was deprecated in Python 3.8 and removed in Python 3.11.

New coroutines created with `async def` are implemented using the new `__await__()` magic method ([more in python data model](https://docs.python.org/3/reference/datamodel.html#coroutines)).

`asyncio` is evolving pretty fast and has additional cool things like tasks, futures, etc.
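
As a small taste of tasks, the sketch below schedules two coroutines with `asyncio.create_task()` so that they run concurrently. `fetch` and `demo` are illustrative names, and `asyncio.sleep()` stands in for real I/O. The example is written for a plain script; in a notebook, `await demo()` replaces `asyncio.run(demo())`.

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for real I/O
    return f"{name} done"

async def demo() -> list:
    # create_task() schedules both coroutines right away, so they overlap.
    first = asyncio.create_task(fetch("first", 0.2))
    second = asyncio.create_task(fetch("second", 0.1))
    # Awaiting in this order keeps the results in this order,
    # even though the second task finishes earlier.
    return [await first, await second]

results = asyncio.run(demo())
print(results)  # ['first done', 'second done']
```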

In [61]:
import asyncio
import random
import time

async def concurent_task_1(n: int) -> str:
    i = random.randint(0, 10)
    print(f"concurent_task_1({n}) sleeping for {i} seconds.")
    await asyncio.sleep(i)
    result = f"result{n}-1"
    print(f"Returning concurent_task_1({n}) == {result}.")
    return result

async def concurent_task_2(n: int, arg: str) -> str:
    i = random.randint(0, 10)
    print(f"concurent_task_2{n, arg} sleeping for {i} seconds.")
    await asyncio.sleep(i)
    result = f"result{n}-2 derived from {arg}"
    print(f"Returning concurent_task_2{n, arg} == {result}.")
    return result

async def chain(n: int) -> None:
    start = time.perf_counter()
    p1 = await concurent_task_1(n)
    p2 = await concurent_task_2(n, p1)
    end = time.perf_counter() - start
    print(f"-->Chained result{n} => {p2} (took {end:0.2f} seconds).")

async def main(*args):
    await asyncio.gather(*(chain(n) for n in args))

We can run it from a normal Python script like this:

```python
random.seed(111)

args = [1, 2, 3]

start = time.perf_counter()
# this is important for "normal" scripts
asyncio.run(main(*args))
end = time.perf_counter() - start
print(f"Program finished in {end:0.2f} seconds.")
```

But as we are already running inside an existing event loop in Jupyter, we can do it another way :)

In [62]:
random.seed(111)

args = [1, 2, 3]

start = time.perf_counter()
# Works in Jupyter as we already run inside asyncio
await main(*args)
end = time.perf_counter() - start
print(f"Program finished in {end:0.2f} seconds.")

concurent_task_1(1) sleeping for 3 seconds.
concurent_task_1(2) sleeping for 5 seconds.
concurent_task_1(3) sleeping for 7 seconds.
Returning concurent_task_1(1) == result1-1.
concurent_task_2(1, 'result1-1') sleeping for 3 seconds.
Returning concurent_task_1(2) == result2-1.
concurent_task_2(2, 'result2-1') sleeping for 6 seconds.
Returning concurent_task_2(1, 'result1-1') == result1-2 derived from result1-1.
-->Chained result1 => result1-2 derived from result1-1 (took 6.00 seconds).
Returning concurent_task_1(3) == result3-1.
concurent_task_2(3, 'result3-1') sleeping for 6 seconds.
Returning concurent_task_2(2, 'result2-1') == result2-2 derived from result2-1.
-->Chained result2 => result2-2 derived from result2-1 (took 11.00 seconds).
Returning concurent_task_2(3, 'result3-1') == result3-2 derived from result3-1.
-->Chained result3 => result3-2 derived from result3-1 (took 13.00 seconds).
Program finished in 13.00 seconds.


## Threads
### Note on GIL

Why so much hassle for concurrency? Why not good old threads?

In CPython, multithreading uses actual OS threads (POSIX threads on Unix, native threads on Windows), but because of the Global Interpreter Lock (GIL) only **one thread** executes Python bytecode at a time. Threads therefore give you concurrency, but no parallelism for pure-Python CPU-bound code.

The GIL prevents context switches from happening in the middle of C code. Basically, it makes any C code into a critical section, except when that C code explicitly releases the GIL (as blocking I/O calls do). This greatly simplifies the task of writing extension modules as well as the Python core.

The designers of Python made a design decision that extension writers would not have to take care of locking. Thus, Python is intended to be simple/easy to integrate with any C library. In order to remove the GIL, you’d have to go into all existing C code and write explicit locking/unlocking code, and you’d have to do this with every new C library as well.

Other than that, threads are pretty similar to Java's.

> Note: well, and you kinda can't use them for CPU-intensive work :)

> Note 2: It's usually too much of a headache to use threads, so `asyncio` wins :)

#### Basic run

In [73]:
from threading import Thread

def run_thread(n_max: int = 1_000_000) -> None:
    n = 0
    while n < n_max:
        n += 1


my_thread = Thread(target=run_thread, args=(10_000_000,))
my_thread.start()

#### Extending Thread

In [74]:
class MyThread(Thread):
    def __init__(self, n_max: int = 1_000_000) -> None:
        super().__init__()
        self.n_max = n_max

    def run(self) -> None:
        n = 0
        while n < self.n_max:
            n += 1


my_thread = MyThread(n_max=1_000_000)
my_thread.start()
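
A subclass is also a convenient place to keep a result, since `run()` cannot return a value to the caller. A minimal sketch, assuming a hypothetical `SumThread` class that is not part of the original notebook:

```python
from threading import Thread
from typing import Optional


class SumThread(Thread):
    """Hypothetical subclass that keeps its result on the instance."""

    def __init__(self, n_max: int) -> None:
        super().__init__()
        self.n_max = n_max
        self.result: Optional[int] = None

    def run(self) -> None:
        # The return value of run() is discarded, so store the result instead
        self.result = sum(range(self.n_max))


worker = SumThread(n_max=1_000)
worker.start()
worker.join()  # wait until run() has finished before reading the result
print(worker.result)
```

Reading `worker.result` before `join()` returns would be a race: it may still be `None`.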

#### Get data from Thread using Queue

In [75]:
import time
from threading import Thread
from queue import Queue


def run_thread(result_queue: Queue) -> None:
    print("thread doing work...")
    time.sleep(2)
    func_result = "result"
    # put result in queue
    result_queue.put(func_result)

In [76]:
func_result_queue: Queue = Queue(maxsize=0)

thread = Thread(target=run_thread, args=(func_result_queue,))
thread.start()

func_result = func_result_queue.get()
print(func_result, "from queue")

thread doing work...
result from queue
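
The same queue can collect results from several threads at once, because `Queue.put` and `Queue.get` are thread-safe. A small sketch with a hypothetical `square` worker:

```python
from queue import Queue
from threading import Thread


def square(number: int, results: Queue) -> None:
    # Queue.put is thread-safe, so no extra locking is needed here
    results.put((number, number * number))


results: Queue = Queue()
threads = [Thread(target=square, args=(n, results)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All workers are done, so exactly one item per thread is waiting in the queue
squares = dict(results.get() for _ in threads)
print(sorted(squares.items()))
```

Each worker tags its result with its input, because the order in which items arrive in the queue is not guaranteed.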


### ThreadPoolExecutor :)
<img src="images/dicaprio.jpeg" style="background:none; border:none; box-shadow:none; display:inline; margin:0; vertical-align:middle;" width="500px">

In [77]:
from concurrent.futures import ThreadPoolExecutor
import time

def task(id):
    print(f'Starting the task {id}...')
    time.sleep(2)
    return f'Done with task {id}'

start = time.perf_counter()

with ThreadPoolExecutor() as executor:
    f1 = executor.submit(task, 1)
    f2 = executor.submit(task, 2)

    print(f1.result())
    print(f2.result())

finish = time.perf_counter()
print(f"It took {finish-start} second(s) to finish.")

Starting the task 1...Starting the task 2...

Done with task 1
Done with task 2
It took 2.005678737012204 second(s) to finish.
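
Besides `submit`, the executor offers `map`, and `concurrent.futures.as_completed` lets you handle futures as they finish. A short sketch reusing the shape of the `task` function above, with a shorter sleep chosen only to keep the example quick:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def task(task_id: int) -> str:
    time.sleep(0.1)
    return f"Done with task {task_id}"


with ThreadPoolExecutor(max_workers=3) as executor:
    # map() yields results in the order of the inputs
    ordered = list(executor.map(task, [1, 2, 3]))

    # as_completed() yields futures as they finish, in no guaranteed order
    futures = [executor.submit(task, i) for i in (1, 2, 3)]
    finished = sorted(f.result() for f in as_completed(futures))

print(ordered)
print(finished)
```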


## Multiprocessing

For true parallelism you need to "write parallel code": split a larger task into many smaller jobs that can be executed in parallel. You also need to implement coordination, communication, and synchronization between processes, which is not always easy.

> Note: It's a complex topic so we'll just glance over a few examples.

### Spin up a process

If you are running Linux it might be easy to run processes in Jupyter :)

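On other operating systems, where `spawn` is the default start method, the child process has to import the target function, so process creation only works from a regular module, guarded by `if __name__ == '__main__':`.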

```python
import time
from multiprocessing import Process

def do_stuff(sleep_secs: int) -> None:
    print("doing stuff...")
    time.sleep(sleep_secs)
    print("stuff done")

if __name__ == '__main__':
    proc = Process(target=do_stuff, args=(10,))
    proc.start()
    proc.join()
```

We can create multiple sub-processes (though they are "heavier" than threads and especially coroutines).

[*Multiprocessing contexts*](https://docs.python.org/3/library/multiprocessing.html) let you select how a child process starts (what it inherits from the parent process). There are three choices:

* `spawn`: Starts an entirely new Python process. The new process will not inherit unnecessary objects from the parent. In particular, it does not copy thread locks. This method is the default for macOS and Windows.
* `fork`: The child is a copy of the parent process. While it does not copy threads, it does copy thread locks. It was the default on Unix (other than macOS) up to Python 3.13; since Python 3.14 the default there is `forkserver`. This method is considered thread-unsafe and, in particular, may cause crashes in subprocesses on macOS.
* `forkserver`: When first started, it creates a fresh Python process and a server. Whenever we want to start a new process, we connect to the server and request a fork of the initially created fresh Python process.
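
To check which of these start methods your platform offers, you can ask the `multiprocessing` module directly. The sketch below also uses `get_context`, which returns an object bound to one start method without changing the global default; it starts no process, so it is safe to run anywhere:

```python
import multiprocessing as mp

# Start methods supported on the current platform
available = mp.get_all_start_methods()
print(available)

# A context object is bound to one start method and exposes the same API
# as the multiprocessing module itself (ctx.Process, ctx.Pool, ctx.Queue, ...)
ctx = mp.get_context("spawn")  # "spawn" is available on every platform
print(ctx.get_start_method())
```

Unlike `set_start_method`, which should be called at most once per program, `get_context` can be used as often as needed.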

```python
import time
import multiprocessing as mp

def do_stuff(sleep_secs: int) -> None:
    print("doing stuff...")
    time.sleep(sleep_secs)
    print("stuff done")

if __name__ == '__main__':
    mp.set_start_method("fork")
    proc = mp.Process(target=do_stuff, args=(10,))
    proc.start()
    proc.join()
```

But generally, creating and managing processes is a pain (and if you read the docs you'll see why :)).

Most of the time, processes are used to offload a CPU-intensive task from the main process (possibly for parallel computing).

A `Pool` takes care of process creation and communication for us, and its interface is designed for submitting tasks, much like `ThreadPoolExecutor` but for processes.

```python
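import time
from multiprocessing import Pool


def task(id):
    print(f'Starting the task {id}...')
    time.sleep(2)
    return f'Done with task {id}'


if __name__ == '__main__':
    start = time.perf_counter()

    with Pool(processes=2) as pool:
        # apply() blocks until the task is done; apply_async() returns an
        # AsyncResult right away, so both tasks run in parallel
        r1 = pool.apply_async(task, (1,))
        r2 = pool.apply_async(task, (2,))

        # get() waits for the result, much like Future.result()
        print(r1.get())
        print(r2.get())

    finish = time.perf_counter()
    print(f"It took {finish-start} second(s) to finish.")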
```

There's also a `ProcessPoolExecutor` class in the `concurrent.futures` package. It offers the same `Executor` interface as `ThreadPoolExecutor` (`submit`, `map`), but runs the tasks in worker processes instead of threads.