## Data-Oriented Programming and AsyncIO

References

- https://towardsdatascience.com/data-oriented-programming-with-python-ef478c43a874/
- https://medium.com/@moraneus/mastering-pythons-asyncio-a-practical-guide-0a673265cf04
- https://medium.com/velotio-perspectives/an-introduction-to-asynchronous-programming-in-python-af0189a88bbb
- https://realpython.com/python-async-features/
- https://realpython.com/async-io-python/
- https://docs.python.org/3/library/asyncio-task.html

## Data-Oriented

A recap of *Data-Oriented Programming* by Yehonathan Sharvit (book published in 2022). The book uses JavaScript and Java; this treatment uses Python.

![DOD Book Cover](gfx/dod-book-cover.png)

Python is a hybrid language that supports both object-oriented programming (OOP) and functional programming (FP).

### Principles are language-agnostic

1. Separate code from data, so that code resides in functions whose behavior does not depend on data encapsulated in the function's context.
2. Data is represented with generic data structures, such as maps (or dictionaries) and arrays (or lists).
3. Data should never change! Instead of mutating data, a new version of it is created.
4. The expected shape of data (its schema) is represented as (meta)data that is kept separately from the main data representation.

In [None]:
# Principle #1
from dataclasses import dataclass

# A natural way of adhering to this principle in Python is to use top-level functions (for code)
# and data classes that only have fields (for data).

@dataclass     # <- a decorator: a callable that takes a class (or function) and returns a modified one;
               # here it adds generated special methods such as __init__() and __repr__() to the class
class AuthorData:
    """Class for keeping track of an author in the system"""

    first_name: str
    last_name: str
    n_books: int

# The code that deals with full name calculation is separate from the code that deals with the creation of author data.
def calculate_name(first_name: str, last_name: str):
    return f"{first_name} {last_name}"

author_data = AuthorData("Isaac", "Asimov", 500)
calculate_name(author_data.first_name, author_data.last_name)

In [None]:
# Principle #2
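# the "class" defines a schema (how information is organized), so the expected
# shape of the data is part of its representation and typos fail loudly
class FullName:
    def __init__(self, first_name, last_name, suffix):
        self.first_name = first_name
        self.last_name = last_name
        self.suffix = suffix

try:
    obj = FullName(fist_name="Jane", last_name="Doe", suffix="II")  # typo: "fist_name"
except TypeError as err:
    print(f"TypeError: {err}")  # reports the unexpected keyword argument 'fist_name'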

In [None]:
# Principle #2
# using a generic data structure is easier, but can lead to errors

# The existence of data schema at a class level makes it easy to discover the expected data shape.
# When data is represented with generic data structures, data schema is not part of the
# data representation.

names = []
names.append({"first_name": "Jane", "last_name": "Doe", "suffix": "III"})
names.append({"first_name": "Isaac", "last_name": "Asimov"})
names.append({"fist_name": "John", "last_name": "Smith"}) # error, "fist_name" should be "first_name"

print(f"{names[2].get('first_name')} {names[2].get('last_name')}")
# no schema: the typo goes unnoticed and "None Smith" is printed (a silent error)

In [None]:
# Principle #3
from dataclasses import dataclass

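# Python's built-in immutable types include int, float, complex, bool, str, bytes,
# tuple, frozenset and range (decimal.Decimal is immutable too). dict, list and set are mutable.
# frozen=True makes instances immutable: assigning to a field raises FrozenInstanceError.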
@dataclass(frozen=True)
class AuthorData:
    """Class for keeping track of an author in the system"""

    first_name: str
    last_name: str
    n_books: int
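Principle #3 can be seen in action by trying to mutate a frozen instance and then deriving a new version with `dataclasses.replace()`, which copies an instance with selected fields changed. A minimal sketch, reusing the `AuthorData` shape from the cell above (redefined so the block runs on its own; `author_v1` and `author_v2` are illustrative names):

```python
from dataclasses import FrozenInstanceError, dataclass, replace

@dataclass(frozen=True)
class AuthorData:
    """Immutable author record (same fields as the cell above)."""
    first_name: str
    last_name: str
    n_books: int

author_v1 = AuthorData("Isaac", "Asimov", 500)

try:
    author_v1.n_books = 501  # mutation is rejected on a frozen dataclass
except FrozenInstanceError as err:
    print(f"cannot mutate: {err}")

# Instead of mutating, create a new version; the original stays untouched.
author_v2 = replace(author_v1, n_books=501)
print(author_v1.n_books, author_v2.n_books)  # 500 501
print(author_v1 is author_v2)                # False
```

Old versions remain valid and unchanged, which is what makes them safe to share.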

### Free concurrency safety

When mutable data is shared between threads, race conditions can occur. Immutable data cannot be modified, so many threads can read it at the same time without locks.
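A minimal sketch of that free safety, assuming a shared, read-only settings tuple (`SETTINGS` and `with_worker_id` are hypothetical names): each worker thread derives its own new tuple from the shared one, and because nothing can modify `SETTINGS`, no lock is needed around it.

```python
from concurrent.futures import ThreadPoolExecutor

SETTINGS = ("timeout=5", "retries=3")  # shared, immutable data (hypothetical example values)

def with_worker_id(worker_id: int) -> tuple[str, ...]:
    # Derive a new tuple from the shared one; SETTINGS itself can never be modified,
    # so threads reading it concurrently need no lock.
    return SETTINGS + (f"worker={worker_id}",)

with ThreadPoolExecutor(max_workers=4) as pool:
    versions = list(pool.map(with_worker_id, range(8)))

print(SETTINGS)     # ('timeout=5', 'retries=3')
print(versions[3])  # ('timeout=5', 'retries=3', 'worker=3')
```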

In [None]:
# Principle #3
# list is mutable and tuple is immutable, as we expand both objects,
# list identity remains the same whereas a brand new tuple is created with a different identity
list1 = [1, 2, 3]
tuple1 = (1, 2, 3)

print(id(list1))   # e.g. 1859329589504 (actual ids differ on every run)
print(id(tuple1))  # e.g. 1859328732288

list1 += [4, 5]    # extends the existing list in place
tuple1 += (4, 5)   # builds a new tuple and rebinds the name

print(id(list1))   # e.g. 1859329589504 (identity did not change)
print(id(tuple1))  # e.g. 1859329720944 (identity changed)
# Copying the contents of an immutable object into a new object on every "modification"
# costs additional memory and CPU time, especially for very large collections.
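Principle #4 has no cell of its own above. A minimal sketch, assuming a hand-rolled `validate()` helper and an `AUTHOR_SCHEMA` dict (both hypothetical; in practice a schema language such as JSON Schema is the usual choice): the schema is ordinary data kept apart from the records, and validating against it catches the `fist_name` typo that slipped through silently in the generic-dictionary cell.

```python
from typing import Any

# Principle #4: the schema is plain data (a dict), kept apart from the records it describes.
# Type names are stored as strings so the schema itself stays generic, serializable data.
AUTHOR_SCHEMA: dict[str, Any] = {
    "required": ["first_name", "last_name"],
    "properties": {"first_name": "str", "last_name": "str", "suffix": "str"},
}

TYPES = {"str": str, "int": int}  # maps schema type names to Python types

def validate(record: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record matches the schema."""
    errors = [f"missing field: {key}" for key in schema["required"] if key not in record]
    for key, value in record.items():
        type_name = schema["properties"].get(key)
        if type_name is None:
            errors.append(f"unexpected field: {key}")
        elif not isinstance(value, TYPES[type_name]):
            errors.append(f"{key} should be {type_name}")
    return errors

print(validate({"first_name": "Jane", "last_name": "Doe", "suffix": "III"}, AUTHOR_SCHEMA))
# []
print(validate({"fist_name": "John", "last_name": "Smith"}, AUTHOR_SCHEMA))
# ['missing field: first_name', 'unexpected field: fist_name']
```

Because the records stay plain dictionaries, the same generic functions still work on them; the schema only adds a check where one is wanted.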

## Getting Things Done (Leveraging What You Have)

- I/O-bound vs CPU-bound tasks
- Turn serial operations into parallel ones

## Concurrency Models

Before exploring asyncio, it’s worth taking a moment to compare async I/O with other concurrency models to see how it fits.

### Terms

- **Parallelism** consists of executing multiple operations at the same time (for example, on several threads or CPU cores, or with SIMD instructions).
- **Multiprocessing** is a means of achieving parallelism that entails spreading tasks over a computer’s central processing unit (CPU) cores. Multiprocessing is well-suited for CPU-bound tasks, such as tightly bound for loops and mathematical computations.
- **Concurrency** is a slightly broader term than parallelism, suggesting that multiple tasks have the ability to run in an overlapping manner. Concurrency doesn’t necessarily imply parallelism.
- **Threading** is a concurrent execution model in which multiple threads take turns executing tasks. A single process can contain multiple threads. Python’s relationship with threading is complicated due to the global interpreter lock (GIL).

Threading works well for I/O-bound tasks, because the GIL is released while a thread waits on I/O, but each thread adds memory and context-switching overhead.
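To make the I/O-bound case concrete, here is a minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`, with `time.sleep()` standing in for a blocking network call (`fake_download` and the URLs are hypothetical). Five 0.5-second waits overlap, so the batch takes roughly as long as one wait instead of five.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(url: str) -> str:
    # Stand-in for an I/O-bound call: time.sleep releases the GIL while waiting,
    # just as real blocking network or disk I/O does.
    time.sleep(0.5)
    return f"downloaded {url}"

urls = [f"https://example.com/page/{i}" for i in range(5)]  # hypothetical URLs

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    pages = list(pool.map(fake_download, urls))
elapsed = time.perf_counter() - start

print(pages[0])           # downloaded https://example.com/page/0
print(f"{elapsed:.1f}s")  # typically close to 0.5s, versus about 2.5s for a serial loop
```

For CPU-bound work the GIL prevents this overlap, which is why multiprocessing is the better fit there.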

## Asynchronous Programming

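Asynchronous programming is a form of concurrent (not necessarily parallel) programming in which a unit of work can run separately from the primary flow of the application: it starts, pauses while it waits (for example, on I/O), and lets other work run in the meantime.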

![Async Programming](gfx/async-programming.png)

Async I/O isn't a new concept. It exists in, or is being built into, other languages and runtimes such as Go, C#, and Rust.

## How Does Python Do Multiple Things at Once?

![Programming Models](gfx/programming-models.png)

With async I/O, the OS does not schedule the concurrent work. As far as the OS is concerned, there is one process with a single thread inside it; the event loop in that thread switches between tasks, so the program can still make progress on multiple things at once.
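A small sketch that checks this claim, with hypothetical coroutines `job` and `main`: three coroutines wait concurrently, yet each reports the same thread (`MainThread` when run as a script), because the event loop interleaves them inside one thread instead of handing them to the OS as separate threads. In a notebook, use `await main()` in place of `asyncio.run(main())`.

```python
import asyncio
import threading

async def job(name: str, delay: float) -> tuple[str, str]:
    await asyncio.sleep(delay)  # yields to the event loop; no new thread is created
    return name, threading.current_thread().name

async def main() -> list[tuple[str, str]]:
    # All three jobs wait at the same time, interleaved by the single event loop.
    return await asyncio.gather(job("a", 0.03), job("b", 0.01), job("c", 0.02))

results = asyncio.run(main())
print(results)  # [('a', 'MainThread'), ('b', 'MainThread'), ('c', 'MainThread')] as a script
```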

# **Async I/O ISN'T threading!!!!**


![Concurrency vs Parallelism](gfx/concurrency-vs-parallelism.png)


## Understanding AsyncIO

- **asyncio** is the concurrency module introduced in Python 3.4 (the **async**/**await** syntax followed in Python 3.5). It is designed to use coroutines and futures to simplify asynchronous code and make it almost as readable as synchronous code.
- **asyncio** is all about writing code that can do multiple things at once, without actually doing them at the same time.

---

- **Event Loop**: The central execution mechanism provided by asyncio. It runs and schedules tasks, handles I/O events, and decides which coroutine runs next.
- **Coroutines**: Functions declared with **async def** (calling one returns a coroutine object). They can be paused at **await** points and resumed later, which lets other work proceed while an I/O operation is pending.
- **Futures**: Low-level awaitable objects that represent the eventual result of an operation that has not completed yet.
- **Tasks**: Coroutines wrapped in a Task object (a subclass of Future) and scheduled to run on the event loop, for example with asyncio.create_task().
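The four building blocks can be seen together in this minimal sketch (`double` and `main` are hypothetical names). It shows that calling a coroutine function only creates a coroutine object, that `asyncio.create_task()` wraps it in a Task which is also a Future, and that a bare Future created with `loop.create_future()` is filled in later by the event loop. Written to run as a script; in a notebook, use `await main()`.

```python
import asyncio

async def double(x: int) -> int:
    await asyncio.sleep(0.01)
    return x * 2

async def main() -> tuple[int, str]:
    coro = double(21)              # coroutine object: calling double() does not run it yet
    print(type(coro).__name__)     # coroutine

    task = asyncio.create_task(coro)          # wrap the coroutine in a Task and schedule it
    print(isinstance(task, asyncio.Future))   # True: Task is a subclass of Future

    loop = asyncio.get_running_loop()
    fut = loop.create_future()                     # a bare Future with no result yet
    loop.call_later(0.01, fut.set_result, "done")  # the event loop fills it in later

    return await task, await fut

result = asyncio.run(main())
print(result)  # (42, 'done')
```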

## AsyncIO Explained

Chess master Judit Polgár hosts a chess exhibition in which she plays multiple amateur players. She has two ways of conducting the exhibition: synchronously and asynchronously.

Assumptions:

- 24 opponents
- Judit makes each chess move in 5 seconds
- Opponents each take 55 seconds to make a move
- Games average 30 pair-moves (60 moves total)

Synchronous version: Judit plays one game at a time, never several at the same time, until the game is complete. Each game takes (55 + 5) * 30 == 1800 seconds, or 30 minutes. The entire exhibition takes 24 * 30 == 720 minutes, or 12 hours.

Asynchronous version: Judit moves from table to table, making one move at each table. She leaves the table and lets the opponent make their next move during the wait time. One move on all 24 games takes Judit 24 * 5 == 120 seconds, or 2 minutes. The entire exhibition is now cut down to 120 * 30 == 3600 seconds, or just 1 hour.
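The arithmetic above can be checked with a scaled-down simulation. A minimal sketch under stated assumptions: one exhibition second is mapped to 0.1 ms of real time (`SCALE`), only 3 pair-moves per game are played to keep the run short, Judit's own move uses blocking `time.sleep()` (she can only move at one table at a time), and each opponent's thinking time uses `await asyncio.sleep()`, which frees the event loop. All function names are hypothetical. Written to run as a script; in a notebook, the `asyncio.run()` call needs the nest_asyncio workaround used later in this notebook.

```python
import asyncio
import time

SCALE = 0.0001               # assumption: 1 exhibition second = 0.1 ms of real time
JUDIT_MOVE = 5 * SCALE       # Judit needs 5 seconds per move
OPPONENT_MOVE = 55 * SCALE   # each opponent needs 55 seconds per move

def play_game_sync(pair_moves: int) -> None:
    for _ in range(pair_moves):
        time.sleep(JUDIT_MOVE)     # Judit moves
        time.sleep(OPPONENT_MOVE)  # Judit stays at this table while the opponent thinks

def exhibition_sync(opponents: int, pair_moves: int) -> float:
    start = time.perf_counter()
    for _ in range(opponents):     # one game at a time, start to finish
        play_game_sync(pair_moves)
    return time.perf_counter() - start

async def play_game_async(pair_moves: int) -> None:
    for _ in range(pair_moves):
        # Judit's own move blocks on purpose: she can only make one move at a time.
        time.sleep(JUDIT_MOVE)
        # While this opponent thinks, control returns to the event loop,
        # so Judit can move at the other tables.
        await asyncio.sleep(OPPONENT_MOVE)

async def run_all_games(opponents: int, pair_moves: int) -> None:
    await asyncio.gather(*(play_game_async(pair_moves) for _ in range(opponents)))

def exhibition_async(opponents: int, pair_moves: int) -> float:
    start = time.perf_counter()
    asyncio.run(run_all_games(opponents, pair_moves))
    return time.perf_counter() - start

sync_time = exhibition_sync(24, 3)
async_time = exhibition_async(24, 3)
print(f"sync:  {sync_time / SCALE / 60:.1f} exhibition minutes")
print(f"async: {async_time / SCALE / 60:.1f} exhibition minutes")
```

With these settings the synchronous run corresponds to 24 × 3 × 60 = 4,320 exhibition seconds (72 minutes) and the asynchronous one to roughly 3 × 120 = 360 (6 minutes, plus some scheduling overhead), the same 12× ratio as the full 30-move calculation.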

Python's asynchronous syntax uses two keywords: **async** and **await**.

- The **async** def syntax construct introduces either a coroutine function or an asynchronous generator.
- The **await** keyword suspends the surrounding coroutine until the awaitable object it is given (a coroutine, Task, or Future) completes, and hands control back to the event loop so that other tasks can run in the meantime. This non-blocking behavior is what makes asynchronous programming efficient, especially for I/O-bound tasks.
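The two kinds of function that **async def** can introduce, and **await** suspending a coroutine, can be seen side by side in this minimal sketch (`fetch_value`, `countdown`, and `main` are hypothetical names). `inspect.iscoroutinefunction()` and `inspect.isasyncgenfunction()` tell the two kinds apart. Written to run as a script; in a notebook, use `await main()`.

```python
import asyncio
import inspect

async def fetch_value(x: int) -> int:   # coroutine function
    await asyncio.sleep(0.01)           # suspend here; the event loop can run other tasks
    return x

async def countdown(n: int):            # asynchronous generator (async def + yield)
    while n > 0:
        await asyncio.sleep(0.01)
        yield n
        n -= 1

async def main() -> tuple[int, list[int]]:
    value = await fetch_value(42)              # await a coroutine
    values = [v async for v in countdown(3)]   # consume an async generator
    return value, values

print(inspect.iscoroutinefunction(fetch_value))  # True
print(inspect.isasyncgenfunction(countdown))     # True
result = asyncio.run(main())
print(result)  # (42, [3, 2, 1])
```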

asyncio is part of the Python standard library, so it ships with Python and there is nothing extra to install.

In [None]:
# synchronous code
import time

def say_hello():
    time.sleep(2)
    print("Hello, World!")

say_hello()

In [None]:
# asynchronous code

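# Jupyter already runs an event loop, so asyncio.run() would raise a RuntimeError there.
# The third-party nest_asyncio package (pip install nest_asyncio) patches asyncio to allow it;
# alternatively, use a top-level `await say_hello()` in a notebook cell.
import nest_asyncio
nest_asyncio.apply()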

import asyncio

async def say_hello():
    await asyncio.sleep(2)
    print("Hello, World!")

asyncio.run(say_hello())
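A single `say_hello()` call takes two seconds either way; the benefit appears once several coroutines wait at the same time. A minimal sketch using `asyncio.gather()` (the `name` and `delay` parameters are additions for illustration): three one-second waits overlap, so the whole run takes about one second rather than three. In a notebook this relies on the `nest_asyncio.apply()` call above.

```python
import asyncio
import time

async def say_hello(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # non-blocking wait: other coroutines run meanwhile
    return f"Hello, {name}!"

async def main() -> list[str]:
    # gather() schedules all three coroutines on the event loop and waits for every result
    return await asyncio.gather(
        say_hello("Alice", 1), say_hello("Bob", 1), say_hello("Carol", 1)
    )

start = time.perf_counter()
greetings = asyncio.run(main())
elapsed = time.perf_counter() - start
print(greetings)          # ['Hello, Alice!', 'Hello, Bob!', 'Hello, Carol!']
print(f"{elapsed:.1f}s")  # close to 1 second, not 3
```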

## Basically, it swaps a blocking I/O call for a non-blocking one

In [None]:
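import asyncio
import inspect

# synchronous function
def f():
    return "hello"

# asynchronous function (coroutine function)
async def g():
    await asyncio.sleep(1)  # pause g() here and let the event loop run other tasks
    result = f()            # a regular function is called directly: its result is not awaitable
    return result

print(type(f), inspect.iscoroutinefunction(f))  # <class 'function'> False
print(type(g), inspect.iscoroutinefunction(g))  # <class 'function'> True

coro = g()                # calling g() only creates a coroutine object; nothing runs yet
print(type(coro))         # <class 'coroutine'>
print(asyncio.run(coro))  # hello (in Jupyter this relies on nest_asyncio.apply() above)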

## There are hybrid approaches that use both synchronous and asynchronous functions
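Two common bridges, sketched minimally with hypothetical `blocking_io` and `main` functions: `asyncio.run()` lets synchronous code start asynchronous code, and `asyncio.to_thread()` (Python 3.9+) lets asynchronous code call a blocking synchronous function in a worker thread without stalling the event loop.

```python
import asyncio
import time

def blocking_io(delay: float) -> str:
    time.sleep(delay)  # a regular blocking call, e.g. from a library without async support
    return f"slept {delay}s"

async def main() -> list[str]:
    # asyncio.to_thread runs each blocking call in a worker thread, so the event loop
    # stays free and the native coroutine (asyncio.sleep) runs alongside them.
    return await asyncio.gather(
        asyncio.to_thread(blocking_io, 0.3),
        asyncio.to_thread(blocking_io, 0.3),
        asyncio.sleep(0.3, result="async sleep done"),
    )

start = time.perf_counter()
results = asyncio.run(main())  # synchronous code entering asynchronous code
elapsed = time.perf_counter() - start
print(results)  # ['slept 0.3s', 'slept 0.3s', 'async sleep done']
print(f"{elapsed:.1f}s")  # close to 0.3s, since all three waits overlap
```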