# Asynchronous programming for faster speed!

Imagine you are working on an assembly line where your job is to stick a logo on a product. This task takes very little time, about 1 minute, but the step right before yours, building the product, takes 1 hour. You could be sticking logos on 60 products an hour, but you have to wait for each product to be finished first. Wouldn't it be great if several products could be built at the same time, so that you could be as productive as possible?!

Well, this is exactly what asynchronous tasks will help you with!


## What you are going to learn in this course 🧐🧐

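This course will teach you how to write asynchronous code using the `asyncio` library, which ships with Python's standard library and is one of the most popular options out there! Here is the outline:

* What is asynchronous programming?
* Why do we need asynchronous programming?
* How to write asynchronous programs with `asyncio`
* An applied example with a real API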

## What is asynchronous programming?

Asynchronous programming is a style of concurrent programming in which a unit of work is allowed to run separately from the main flow of the application. When the work is complete, it notifies the main program of its completion or failure. Its benefits include improved application performance and better responsiveness.

In practice, this means that an asynchronous piece of code starts running but does not block the rest of the code: while one task is waiting (for a network response, for example), other tasks can make progress. With `asyncio` this all happens on a single thread, so the tasks are not literally executed at the same instant (that would be running them *in parallel*); instead, their waiting periods overlap, which we call running them *concurrently*.

This is great when some tasks take much longer than others, and when different parts of your code can be executed independently of each other without breaking anything!
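To make the difference between *concurrent* and *parallel* concrete, here is a minimal sketch (the coroutine names `worker` and `main` are made up for illustration): two coroutines take turns on a single thread, and the one with the shorter wait finishes first even though it started second.

```python
import asyncio
import threading

events = []  # (task name, step, thread id), in the order they happen

async def worker(name: str, delay: float) -> None:
    events.append((name, "start", threading.get_ident()))
    # While this task waits, the event loop is free to run the other one
    await asyncio.sleep(delay)
    events.append((name, "end", threading.get_ident()))

async def main() -> None:
    # gather runs both coroutines concurrently on the same event loop
    await asyncio.gather(worker("A", 0.2), worker("B", 0.1))

asyncio.run(main())
for name, step, _ in events:
    print(name, step)
```

Running it prints `A start`, `B start`, `B end`, `A end`, and all four events are recorded with the same thread identifier: the tasks overlap in time without ever running in parallel.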

## Why do we need asynchronous programming?

As you code more and gain experience in data science, you will learn to remember that data science stands at the crossroads of statistics and computer science. And computer science means, well, computers!

> Remember that there is a "computer" in "computer science".
— Peter Norvig

We'll get back to this in future lectures, but keep in mind that when we deal with computers, we are using physical components to make calculations and to transmit data to each other, and these operations take time. Here are approximate execution times per operation (orders of magnitude) for various components of a computer:

- CPU (Central Processing Unit) ≈ 1 ns
- Memory ≈ 100 ns
- Disk ≈ 20 μs
- Network ≈ 150 ms
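A quick back-of-the-envelope calculation with these rough figures (taken from the list above, not measured here) shows how lopsided they are:

```python
# Rough latencies from the list above, converted to seconds
latencies = {
    "CPU": 1e-9,
    "Memory": 100e-9,
    "Disk": 20e-6,
    "Network": 150e-3,
}

# How many CPU operations fit into the time of each kind of operation
ratios = {name: seconds / latencies["CPU"] for name, seconds in latencies.items()}

for name, ratio in ratios.items():
    print(f"{name:<8} ~{ratio:,.0f}x a CPU operation")
```

A single network round trip is therefore worth roughly 150 million CPU operations, which is why waiting on the network dominates the loop described next.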

Now let's imagine we are trying to make many API calls (wild, I know) in a loop over a list of URLs. Here is what happens at each iteration:

1. Enter the loop body (CPU)
2. Read the next URL from the list to build the query (Memory)
3. Send the query (Network)
4. Wait for the response (Network)
5. Save the response (Memory)
6. Move on to the next iteration of the loop (CPU)

The time we spend waiting to receive the API's response is far longer than any of the other operations (we could be sticking lots of logos, but we are waiting for the products to be ready), which makes the whole process very slow.
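To feel that cost, here is a minimal sketch of the synchronous loop in which `time.sleep` stands in for the network wait. `fake_api_call`, `fetch_all`, the example URLs and the 0.15 s latency (the network figure above) are all hypothetical.

```python
import time

def fake_api_call(url: str, latency: float = 0.15) -> str:
    # Hypothetical stand-in for a real request: time.sleep blocks the whole
    # program, just like waiting on the network in a synchronous loop
    time.sleep(latency)
    return f"response from {url}"

def fetch_all(urls, latency: float = 0.15) -> list:
    responses = []
    for url in urls:
        # Each iteration waits for the previous call to finish before starting
        responses.append(fake_api_call(url, latency))
    return responses

if __name__ == "__main__":
    urls = [f"https://example.com/item/{i}" for i in range(10)]
    start = time.perf_counter()
    fetch_all(urls)
    # Expect about 10 x 0.15 s = 1.5 s: the waits simply add up
    print(f"{time.perf_counter() - start:.2f} s")
```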

Now, if we make the part of the code that deals with the API asynchronous, we no longer have to wait for one response before sending the next query, so the total execution time of the loop becomes much shorter! This works because most APIs run on servers that can process several calls at the same time.

## Asynchronous programming

Let's move on to practice! We will start with generic examples that explain the basics of asynchronous programming, then give more concrete examples of loops in which we expect downtime, and finish with a quick demonstration using an actual API.

### Synchronous program

Notebooks are very useful for trying things out and exploring data; however, when writing programs that call APIs, fill databases, etc., we rarely use them. Instead, data scientists, and developers more generally, write scripts (`.py` files) that contain Python code meant to be run as a whole.

[This first program](src/async1.py) only uses synchronous code, so every task happens one after the other, as we are used to.

```python
!python src/async1.py  # runs the Python file async1.py located in the src folder
```

```
Hello!
Goodbye!
```


### Our first asynchronous program

Let's introduce the `asyncio` library, which lets Python run asynchronous programs! The program can be found [here](src/async2.py).

```python
!python src/async2.py
```

```
  main()
Goodbye!
```


Here we see that the last statement of our program was executed, but the body of `main()` never ran. Python explains why with a `RuntimeWarning: coroutine 'main' was never awaited` warning (only its last line, `main()`, survives in the output above): we never awaited `main()`!
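The mistake is easy to reproduce. In this sketch (written in the spirit of src/async2.py, not copied from it), calling `main()` only creates a coroutine object and none of its code runs:

```python
import asyncio

async def main():
    print("Hello!")  # never printed: the coroutine is created but never run

# Calling a coroutine function does not execute its body;
# it only returns a coroutine object
coro = main()
print(type(coro).__name__)        # coroutine
print(asyncio.iscoroutine(coro))  # True

# Closing the unused coroutine explicitly avoids the
# "coroutine 'main' was never awaited" RuntimeWarning
coro.close()
print("Goodbye!")
```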

Indeed, defining a function with `async def` turns `main()` into a **coroutine function**: calling it does not run its body, it only creates a **coroutine** object, which behaves quite differently from the result of a regular function call. To actually run this coroutine, we need to pass it to `asyncio.run`, as we do in [this program](src/async3.py).

```python
!python src/async3.py
```

```
Hello!
Goodbye!
```


Now everything works! But we haven't done anything more than running the two steps one after the other, which is not the goal of asynchronous programming. What we want is to be able to run some tasks while others are idle!

Under the hood, `asyncio.run` creates what we call an **event loop**: a scheduler that keeps track of all the pending tasks and decides which one to run next, depending on which tasks are ready to continue immediately and which ones are waiting for something else to finish.
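To make the event loop a bit less abstract, here is a simplified sketch of what `asyncio.run(main())` does for you: create a loop, let it drive the coroutine to completion, then close it. The real `asyncio.run` also cancels leftover tasks and cleans up before closing, so prefer it in practice; `main()` here is a made-up example.

```python
import asyncio

async def main() -> str:
    print("Hello!")
    return "done"

# Manual version: roughly what asyncio.run does, minus its cleanup steps
loop = asyncio.new_event_loop()
try:
    # The loop runs main() until the coroutine returns, and hands back its result
    manual_result = loop.run_until_complete(main())
finally:
    loop.close()

# Usual version: asyncio.run creates and closes its own event loop
result = asyncio.run(main())

print(manual_result, result)
```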


### Waiting for other tasks to finish

In [this program](src/async4.py), we start showing how the event loop handles coroutines that we would like to run concurrently. Notice that we use the keyword `await`, which pauses the current coroutine until the awaited coroutine has finished; this keyword can only be used inside an `async` function.

```python
!python src/async4.py
```

```
OK
Goodbye!
Hello!
```
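One way to write a program that produces exactly this output is sketched below. It is a hypothetical reconstruction, since src/async4.py is not reproduced here, and the `say` helper that also records each message is our own addition.

```python
import asyncio

messages = []  # record of what was printed, in order

def say(message: str) -> None:
    print(message)
    messages.append(message)

async def bye() -> None:
    say("OK")
    await asyncio.sleep(1)  # bye() pauses here for one second
    say("Goodbye!")

async def main() -> None:
    # Awaiting bye() directly: main() is suspended until bye() has
    # completely finished, so nothing else runs during the one-second wait
    await bye()
    say("Hello!")

asyncio.run(main())
```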


Here `bye()` waits for one second before printing `Goodbye!`, and during that second the processor has nothing to do! Ideally, we would like to take advantage of that downtime to keep running the rest of `main()`, which is exactly the point of asynchronous programming!

### Taking advantage of other tasks' downtime

To use the downtime of some tasks to run others, we need to create a task with `asyncio.create_task` (how very original, right!?). This schedules a coroutine on the event loop and tells Python that this piece of code may spend some time doing nothing, because it is waiting for, let's say, an API to return some results, and that in the meantime it can move on with the execution of other code.

[This program](src/async5.py) will show you how we can do this.

```python
!python src/async5.py
```

```
Hello!
OK
```


So here, `main()` keeps running without waiting for `bye()` to finish, and `bye()` gets to run whenever `main()` hands control back to the event loop (at an `await`). In this case, `bye()` is able to print `OK`, but while it is waiting for 1 second, `main()` finishes. When `main()` returns, `asyncio.run` cancels the tasks that are still pending, so `bye()` never gets to print `Goodbye!`.

In the [following example](src/async6.py) we will show you how the two different asynchronous programs can alternate waiting for each other.

```python
!python src/async6.py
```

```
Hello!
OK
Goodbye!
```


Now, because we make `main()` wait for 2 seconds, `bye()` has enough time to resume and finish!
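The behaviour of the last two programs can be reproduced with one parameterized sketch. It is again hypothetical: the real src/async5.py and src/async6.py may differ, and the `wait` parameter and `say` helper are ours. The only change between the two runs is how long `main()` waits after creating the task.

```python
import asyncio

log = []  # record of what was printed, in order

def say(message: str) -> None:
    print(message)
    log.append(message)

async def bye(delay: float = 1.0) -> None:
    say("OK")
    await asyncio.sleep(delay)
    say("Goodbye!")

async def main(wait: float, bye_delay: float = 1.0) -> None:
    # create_task schedules bye() on the event loop; it starts at main()'s next await.
    # Keeping a reference to the task prevents it from being garbage-collected.
    task = asyncio.create_task(bye(bye_delay))
    say("Hello!")
    await asyncio.sleep(wait)  # main() pauses here, letting bye() run
    # When main() returns, asyncio.run cancels tasks that are still pending

asyncio.run(main(wait=0.5))  # Hello!, OK (bye() is cancelled during its 1 s sleep)
asyncio.run(main(wait=2))    # Hello!, OK, Goodbye!
```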

To summarize, the `await` keyword is essential: it marks the points where a coroutine may pause and hand control back to the event loop, which can then switch to another task.

### Making asynchronous loops and saving results

If you want to run many tasks in a loop, starting new ones while others are waiting, and collect all the results at the end, you can use `asyncio.gather`, which returns a list aggregating the results of all the coroutines, in the order in which they were passed!
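Here is a small sketch (the `fake_query` coroutine and its delays are made up) showing that `asyncio.gather` hands back results in the order the coroutines were passed, even though they finish in a different order:

```python
import asyncio

async def fake_query(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # simulated waiting time, e.g. for an API
    return f"{name} done after {delay}s"

async def main() -> list:
    # gather runs the coroutines concurrently and returns their results as a
    # list, in the order of the arguments rather than the order they finish
    return await asyncio.gather(
        fake_query("slow", 0.3),
        fake_query("fast", 0.1),
        fake_query("medium", 0.2),
    )

results = asyncio.run(main())
print(results)
# ['slow done after 0.3s', 'fast done after 0.1s', 'medium done after 0.2s']
```

The whole batch takes about 0.3 s, the length of the slowest query, rather than the 0.6 s the three waits would add up to.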

Let's take a look at [this example](src/async7.py) for an application demo.

```python
import asyncio
from src.async7 import main
import time
import src.config  # configuration file so that the logs from our asynchronous code are displayed

# In a notebook, the kernel's event loop is already running
loop = asyncio.get_event_loop()

start = time.time()
task = loop.create_task(main())  # schedules main() on the running loop and returns immediately
end = time.time()
print(end - start)  # only measures the scheduling, not the queries themselves
```

```
0.00012493133544921875
```

```
[2022-01-12 13:17:16 UTC]	INFO	Starting query...	(src.async7)
[2022-01-12 13:17:18 UTC]	INFO	The process took 2.0041871070861816 seconds	(src.async7)
```


```python
task.result()
```

```
['response',
 'response',
 'response',
 ...
```

In this example, we simulate 100 API calls that take 2 seconds each. Run synchronously, this would have taken at least 200 seconds (100 × 2 s). Run asynchronously, the whole process took just over 2 seconds, because all the waiting periods overlapped. Note that the tiny number printed by the cell only measures how long it took to *schedule* the task on the notebook's event loop; the actual duration is the one reported in the log.
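Outside a notebook, where no event loop is already running, the same experiment fits in a short script. This is a sketch in the spirit of src/async7.py, not its actual code: `fake_api_call` is a made-up coroutine that uses `asyncio.sleep` in place of a real request. Because `asyncio.run` only returns once every call has finished, the measured time covers the whole batch.

```python
import asyncio
import time

async def fake_api_call(latency: float) -> str:
    await asyncio.sleep(latency)  # stands in for waiting on the network
    return "response"

async def main(n_calls: int = 100, latency: float = 2.0) -> list:
    # All n_calls coroutines wait at the same time instead of one after the other
    return await asyncio.gather(*(fake_api_call(latency) for _ in range(n_calls)))

if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())  # blocks until every call has finished
    elapsed = time.perf_counter() - start
    # Expect 100 results in a little over 2 s, not 200 s
    print(len(results), f"{elapsed:.2f} s")
```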

This will be extremely useful when querying APIs to get massive amounts of data!

### Applied example

This [last example](src/async8.py) applies asynchronous API calls to the Teleport API.
We need a new library to make the HTTP requests asynchronous as well (so that, like `asyncio.sleep`, they hand control back to the event loop while waiting for a response).

This library is called `tornado`, and we will use its `AsyncHTTPClient` class. You can find more documentation on it [here](https://www.tornadoweb.org/en/stable/httpclient.html#).
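If you would rather not install `tornado`, a comparable effect can be sketched with the standard library alone. Note that this swaps in a different technique: each ordinary blocking `urllib` request runs in a worker thread via `asyncio.to_thread` (Python 3.9+), while an `asyncio.Semaphore` caps how many requests are in flight. `fetch_json`, `search_cities` and the limit of 10 are our own illustrative choices; the search URL is the one used by the synchronous loop in this section.

```python
import asyncio
import json
import urllib.parse
import urllib.request

# Same search endpoint as the synchronous loop in this section
SEARCH_URL = "https://api.teleport.org/api/cities/?search={}"

def fetch_json(url: str) -> dict:
    # Ordinary blocking request made with the standard library
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)

async def search_cities(cities, fetch=fetch_json, max_concurrent: int = 10) -> dict:
    # Hypothetical helper: cap the number of requests in flight at once
    semaphore = asyncio.Semaphore(max_concurrent)

    async def search_one(city: str):
        url = SEARCH_URL.format(urllib.parse.quote(city))  # "Sao Paulo" -> "Sao%20Paulo"
        async with semaphore:
            # to_thread runs the blocking call in a worker thread,
            # so the event loop stays free to start other requests
            return await asyncio.to_thread(fetch, url)

    results = await asyncio.gather(*(search_one(city) for city in cities))
    return dict(zip(cities, results))
```

In a script you would call `asyncio.run(search_cities(["Tokyo", "Paris"]))`. In a notebook, whose event loop is already running, `asyncio.run` raises a `RuntimeError`, so write `results = await search_cities(["Tokyo", "Paris"])` in a cell instead.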

```python
import asyncio
from src.async8 import search
import time
import src.config  # configuration file so that the logs from our asynchronous code are displayed

# In a notebook, the kernel's event loop is already running
loop = asyncio.get_event_loop()

cities = [
    "Tokyo", "Delhi", "Shanghai", "Sao Paulo", "Mexico City", "Cairo",
    "Mumbai", "Beijing", "Dhaka", "Osaka", "New York", "Karachi",
    "Buenos Aires", "Chongqing", "Istanbul", "Kolkata", "Manila", "Lagos",
    "Rio de Janeiro", "Tianjin", "Kinshasa", "Guangzhou", "Los Angeles", "Moscow",
    "Shenzhen", "Lahore", "Bangalore", "Paris", "Bogota", "Jakarta",
    "Chennai", "Lima", "Bangkok", "Seoul", "Nagoya", "Hyderabad",
    "London", "Tehran", "Chicago", "Chengdu", "Nanjing", "Wuhan",
    "Ho Chi Minh City", "Luanda", "Ahmedabad", "Kuala Lumpur", "Xi'an", "Hong Kong",
    "Dongguan", "Hangzhou", "Foshan", "Shenyang", "Riyadh", "Baghdad",
    "Santiago", "Surat", "Madrid", "Suzhou", "Pune", "Harbin",
    "Houston", "Dallas", "Toronto", "Dar es Salaam", "Miami", "Belo Horizonte",
    "Singapore", "Philadelphia", "Atlanta", "Fukuoka", "Khartoum", "Barcelona",
    "Johannesburg", "Saint Petersburg", "Qingdao", "Dalian", "Washington, D.C.", "Yangon",
    "Alexandria", "Jinan", "Guadalajara",
]

# URL-encode the spaces in city names (e.g. "Sao Paulo" -> "Sao%20Paulo")
cities = [city.replace(" ", "%20") for city in cities]
print(cities[:5])

start = time.time()
task = loop.create_task(search(cities))  # schedules search() and returns immediately
end = time.time()
print(end - start)  # only measures the scheduling, not the requests themselves
```

```
['Tokyo', 'Delhi', 'Shanghai', 'Sao%20Paulo', 'Mexico%20City']
0.0001010894775390625
```

```
[2022-01-12 14:48:55 UTC]	INFO	Starting query...	(src.async8)
[2022-01-12 14:49:08 UTC]	INFO	The process took 13.323923110961914 seconds	(src.async8)
```


```python
task.result()
```

```
[{'_embedded': {'city:search-results': [{'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:1850147/'}},
     'matching_alternate_names': [{'name': 'Tokyo'}, {'name': 'tokyo'}],
     'matching_full_name': 'Tokyo, Tokyo, Japan'},
    {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:1863440/'}},
     'matching_alternate_names': [],
     'matching_full_name': 'Hachiōji, Tokyo, Japan'},
    {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:1857871/'}},
     'matching_alternate_names': [],
     'matching_full_name': 'Machida, Tokyo, Japan'},
    {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:1864518/'}},
     'matching_alternate_names': [],
     'matching_full_name': 'Chōfu, Tokyo, Japan'},
    {'_links': {'city:item': {'href': 'https://api.teleport.org/api/cities/geonameid:1850692/'}},
     'matching_alternate_names': [],
     'matching_full_name': 'Nishi-Tokyo-shi,
```

Let's now do the same thing using synchronous programming!

```python
import time

import requests

# Reuse the URL-encoded `cities` list defined in the asynchronous cell above
print(cities[:5])

result = {}
start = time.time()
for city in cities:
    url = f"https://api.teleport.org/api/cities/?search={city}"
    r = requests.get(url)  # blocks until the response arrives
    result[city] = r.json()
end = time.time()
print(end - start)
```

```
['Tokyo', 'Delhi', 'Shanghai', 'Sao%20Paulo', 'Mexico%20City']
12.797714233398438
```


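In this particular run, however, the timings do not show a speedup: the synchronous loop took about 12.8 seconds, while the asynchronous version took about 13.3 seconds. The cause is not visible from the notebook alone: it may come from how `search` in `src/async8.py` issues its requests (if each request is awaited before the next one is sent, they still run one after the other) or from limits on the client or server side. It is worth checking that the requests are really sent concurrently before concluding which approach is faster for this API.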

## Resources

* [Medium article on asynchronous programming](https://medium.com/velotio-perspectives/an-introduction-to-asynchronous-programming-in-python-af0189a88bbb#:~:text=Asynchronous%20programming%20is%20a%20type,failure%20of%20the%20worker%20thread.)
* [asyncio video tutorial](https://www.youtube.com/watch?v=t5Bo1Je9EmE)
* [asyncio vs other asynchronous programming techniques](https://www.youtube.com/watch?v=bs9tlDFWWdQ)