<div style="text-align: center;" >
<h1 style="margin-top: 0.2em; margin-bottom: 0.1em;">Parallelism</h1>
<img src="sources/parallel_meme.jpg" alt="How can I finish this assignment faster?!" style="width:400px;height:400px;">
</div>
<br>

***If you're looking at this notebook online (GitHub), the images won't load correctly. To view them, please download/pull the notebook along with the 'sources' folder and view the notebook on your local machine.***

***Please use the following link in order to take part in the in-session quizzes:***<br>
https://strawpoll.com/wAg3QEJAdy8

In [1]:
import time               # used to track run times
import threading          # used for multi threading
import os                 # used to check for available computing cores
import requests           # used for 'standard' web scraping
import aiohttp            # used for asynchronous (concurrent) web scraping
import asyncio            # used for writing async functions
import json               # work with json files
import multiprocessing    # used for multi processing

***Note: Depending on the IDE you use, asyncio works a bit differently. This has to do with something called the 'event loop'. Check out this [post](https://stackoverflow.com/questions/55409641/asyncio-run-cannot-be-called-from-a-running-event-loop-when-using-jupyter-no) for more info regarding asyncio and Jupyter.***

***Note: Depending on your operating system (OS), multi processing in Python works differently. This is due to some fundamental differences in how the operating systems start new processes. If you want some more info on the differences between Windows/Mac and Linux, check out this [resource](https://pythonforthelab.com/blog/differences-between-multiprocessing-windows-and-linux/).***

# Concurrency vs. Parallelism

Oftentimes it is not really noticeable for us humans whether something is run concurrently or in parallel. We're just too slow...<br>
In simple terms, 'concurrency' means that different parts of a greater whole can be run out of order, or interleaved with each other, while still producing the same result. 'Parallelism' means that different parts of a greater whole are run at the exact same time in, well..., parallel.

## Multi Threading - Concurrency

Multiple threads take turns running, one after the other.<br>
This happens so fast, though, that you might easily assume that they are run in parallel. Good examples of this are Input/Output (IO) tasks, where the different tasks are (mostly) independent of each other. If you want to retrieve the content of a web page, for example, you have to query the server for the information you need (the input). The server then returns the answer or information (the output) to you. While you are waiting for the server to respond, your machine is basically having a grand ol' time chilling and relaxing, since there is nothing else to do but wait. Now, we can't have that of course! Every free minute needs to be invested into work, work, WORK...!<br>
So... Instead of just letting our machine lounge around, we tell it to send another query to another web page while it is waiting for the first web page to answer. And while it is still waiting we tell it to send another query, and another, and another... until our machine wishes there were something like machine rights or a labor union for computers.<br>
I hope you kinda get the point. We are not doing this stuff at the very same time but rather one step after the other. Since it only takes our machine a split second to send out a new query, though, we might get the impression that everything is working in parallel. Once the web pages return their answers, these can, for example, be stored and then printed.<br>

Think of it like this:<br>
You open your favorite browser and decide to watch a movie on the platform of your choice. So you enter the address of the chosen web page. After you hit 'search' you realize that you do not know which movie you wanna watch. So you open a second tab and look for the best movies of your chosen genre while the first tab with your streaming service is still loading. You decide on a movie and once you go back to your first tab, it's finished loading and you're good to go!

<div style="text-align: center;" >
<img src="sources/multi_threading.png" alt="Multi Threading Vis">
</div>

### Basic Example

To start things off we'll just have a look at a simple function that counts down from a specified integer number. First we do this without any 'fancy pants stuff' and afterwards we'll see what a multi threading approach would look like.

In [2]:
count = 50_000_000

def countdown(n):
    while n > 0:
        n -= 1

In [3]:
# Without multi threading
start = time.time()
countdown(count)
end = time.time()

print('Seconds:', end - start)

Seconds: 4.295830249786377


If we were to perform the same task using multi threading would the run time be:<br>
A: faster<br>
B: the same<br>
C: slower<br>
D: I don't know<br>

Click [here](https://strawpoll.com/wAg3QEJAdy8) to vote!

In [4]:
# Let's see some speed up
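
One way this cell could be filled in is sketched below. It is only a minimal sketch and not necessarily what was shown in the session: the smaller `count` and the choice of exactly two threads are assumptions made to keep the demo quick. Keep in mind that on a standard CPython build the global interpreter lock (GIL) lets only one thread execute Python code at a time, so a pure counting loop like this one should not be expected to get faster with threads.

```python
import threading
import time

def countdown(n):
    while n > 0:
        n -= 1

# Assumption: a smaller count than the notebook's 50_000_000 to keep the demo quick
count = 2_000_000

# Sequential baseline
start = time.perf_counter()
countdown(count)
sequential = time.perf_counter() - start

# Two threads, each counting down half of the total
t1 = threading.Thread(target=countdown, args=(count // 2,))
t2 = threading.Thread(target=countdown, args=(count // 2,))

start = time.perf_counter()
t1.start()
t2.start()
t1.join()   # wait until the first thread is done
t2.join()   # wait until the second thread is done
threaded = time.perf_counter() - start

print('Sequential seconds:', sequential)
print('Threaded seconds:  ', threaded)
```

Run it a few times and compare the two numbers with your vote above. The exact timings depend on your machine and Python version.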


### Web Requests

For this part we are gonna deal with something a bit more advanced. The code below is copied and adapted from the [`aiohttp`](https://docs.aiohttp.org/en/stable/index.html) library's documentation. First, let's try our 'standard' approach though:

In [None]:
websites = ['https://en.wikipedia.org/', 'https://www.python.org/', 'https://stackoverflow.com/', 'https://stackexchange.com/', 'https://www.uni-konstanz.de/']

In [None]:
# Without multi threading
start = time.time()
for site in websites:
    request = requests.get(site)
    print(request)
    html = request.text
    print('First char of website html:', html[0])

end = time.time()

print('Seconds:', end - start)

If we were to perform the same task using multi threading would the run time be:<br>
A: faster<br>
B: the same<br>
C: slower<br>
D: I don't know<br>

Click [here](https://strawpoll.com/wAg3QEJAdy8) to vote!

In [None]:
# Let's see some speed up
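
The idea of "sending the next query while still waiting for the first answer" can be sketched with `asyncio` alone. This is a hedged sketch, not the `aiohttp` code from the session: `fake_fetch` and its `delay` are hypothetical stand-ins that simulate waiting for a server with `asyncio.sleep`, so the example runs without a network connection. Note that `asyncio` does all of this in a single thread and simply switches tasks while one of them is waiting.

```python
import asyncio
import time

websites = ['https://en.wikipedia.org/', 'https://www.python.org/', 'https://stackoverflow.com/']

async def fake_fetch(url, delay=0.2):
    # Hypothetical stand-in for a real request:
    # asyncio.sleep simulates the time spent waiting for the server
    await asyncio.sleep(delay)
    return f'<html>response from {url}</html>'

async def fetch_all(urls):
    # Start all 'requests' and wait until every one of them has finished;
    # the results come back in the same order as the urls
    return await asyncio.gather(*(fake_fetch(url) for url in urls))

start = time.perf_counter()
# Inside Jupyter an event loop is already running, so use
# `pages = await fetch_all(websites)` there instead of asyncio.run
pages = asyncio.run(fetch_all(websites))
elapsed = time.perf_counter() - start

for page in pages:
    print('First char of website html:', page[0])
print('Seconds:', elapsed)
```

Each simulated request waits 0.2 seconds, yet the total run time stays well below the sum of all waits because the waiting overlaps. With `aiohttp`, the body of `fake_fetch` would be replaced by a real request made through an `aiohttp.ClientSession`.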


### Race Condition

A race condition is a situation in which the result depends on the order or timing in which multiple threads (or processes) access the same shared data. If one of them finishes sooner or later than expected, for example by reading a value before another thread has finished updating it, this can lead to wrong results. Have a look at the following code and then try to answer the question down below.

![Race Condition Example](sources/race_condition.png)

What is the final result of x, if we run the cell above:<br>
A: 20<br>
B: 25<br>
C: None (we encounter an error)<br>
D: I don't know<br>

Click [here](https://strawpoll.com/wAg3QEJAdy8) to vote!
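
To get a feeling for how such a problem can show up in practice, here is a small sketch that is unrelated to the quiz code above. All names in it (`counter`, `unsafe_increment`, `safe_increment`, `run_threads`) are made up for illustration, and the `time.sleep` call is only there to make the unlucky timing very likely. Each thread reads a shared value, pauses, and then writes back the value it read plus one.

```python
import threading
import time

counter = 0
lock = threading.Lock()

def unsafe_increment():
    global counter
    current = counter        # 1. read the shared value
    time.sleep(0.01)         # 2. pause, giving other threads time to read the same value
    counter = current + 1    # 3. write back, possibly overwriting another thread's update

def safe_increment():
    global counter
    with lock:               # only one thread at a time may run this block
        current = counter
        time.sleep(0.01)
        counter = current + 1

def run_threads(target, n_threads=5):
    # Reset the counter, run n_threads threads and return the final value
    global counter
    counter = 0
    threads = [threading.Thread(target=target) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

unsafe_result = run_threads(unsafe_increment)
safe_result = run_threads(safe_increment)

print('Without lock:', unsafe_result)   # usually less than 5, depends on timing
print('With lock:   ', safe_result)     # 5
```

The lock makes the read-pause-write sequence indivisible, so no update gets lost. The price is that the threads now have to wait for each other.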

***
## Multi Processing - Parallelism

Multiple cores, each capable of running its own threads, are running at exactly the same time.<br>
This happens so fast, though, that you might easily assume that they are run in parallel and well, yes, they are! Good examples for this are tasks that are a bit more taxing and/or can be split among multiple cores. Think of running a classification model, for example. The goal is to classify a huge amount of data according to some rule set, and the data can easily be broken into smaller subsets.<br>
You simply provide multiple cores with the same instructions but slightly different data, aka different subsets of your original data. Then each core starts classifying its respective subset of the data, and once they are done you can collect the results from all cores and put them together. The beautiful part about this is that you can also provide different instructions to different cores while providing all of your cores with the same data.<br>

Think of it like this:<br>
You have two calculators and the motor skills to use them both at the exact same time.<br>
In one scenario you have a list of values (your data) and you want to perform some mathematical operation on those values. In order to halve the time needed to perform this operation, split the list of values in two and use one calculator to run the operations on the first half and the other calculator to run the operations on the second half.<br>
In the other scenario you have a list of values (your data) and you want to perform two different mathematical operations, A and B, on those values. So, naturally, you use one calculator to run operation A on all values and the other calculator to run operation B on all values, thereby halving the time needed to complete the task.

<div style="text-align: center;" >
<img src="sources/multi_processing.png" alt="Multi Processing Vis">
</div>
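
The first calculator scenario (same operation, data split in two) can be sketched with `multiprocessing.Pool`. This is a minimal sketch under a few assumptions: the helper names `split_into_chunks`, `sum_of_squares` and `parallel_sum_of_squares` are made up for illustration, and summing squares is just a placeholder for a more taxing operation. Depending on your OS, running this inside a notebook may fail because the worker function cannot be found by the new processes. In that case, putting the code into a separate `.py` file and running it as a script usually helps.

```python
import multiprocessing
import os

def split_into_chunks(values, n_chunks):
    """Split a list into n_chunks contiguous parts of (almost) equal size."""
    size, rest = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `rest` chunks get one extra element
        end = start + size + (1 if i < rest else 0)
        chunks.append(values[start:end])
        start = end
    return chunks

def sum_of_squares(chunk):
    # The 'same instructions' every worker runs on its own subset of the data
    return sum(v * v for v in chunk)

def parallel_sum_of_squares(values, n_workers):
    chunks = split_into_chunks(values, n_workers)
    with multiprocessing.Pool(processes=n_workers) as pool:
        # Each chunk is handed to a worker process; results come back in order
        partial_sums = pool.map(sum_of_squares, chunks)
    # Put the partial results back together
    return sum(partial_sums)

if __name__ == '__main__':
    data = list(range(1_000))
    print('Available cores:', os.cpu_count())
    print('Sum of squares:', parallel_sum_of_squares(data, n_workers=2))
```

The `if __name__ == '__main__':` guard matters here: on systems that start worker processes by re-importing the main file, it keeps the workers from trying to create pools of their own.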

### Basic Example

We're gonna look at the same example as above. First without any multi blabla and then with a multi processing approach.

In [None]:
# Without multi processing
start = time.time()
countdown(count)
end = time.time()

print('Seconds:', end - start)

If we were to perform the same task using multi processing would the run time be:<br>
A: faster<br>
B: the same<br>
C: slower<br>
D: I don't know<br>

Click [here](https://strawpoll.com/wAg3QEJAdy8) to vote!
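
A possible multi processing version of the countdown is sketched below. Again, this is only a sketch and not necessarily the code from the session: the smaller `count`, the helper name `timed_parallel_countdown` and the use of two processes are assumptions. Each process gets its own Python interpreter, so the two halves of the countdown can really run at the same time on two cores.

```python
import multiprocessing
import time

def countdown(n):
    while n > 0:
        n -= 1

def timed_parallel_countdown(total, n_processes=2):
    """Split the countdown among n_processes and return (seconds, exit codes)."""
    share = total // n_processes
    processes = [
        multiprocessing.Process(target=countdown, args=(share,))
        for _ in range(n_processes)
    ]
    start = time.perf_counter()
    for p in processes:
        p.start()
    for p in processes:
        p.join()   # wait until every process has finished
    seconds = time.perf_counter() - start
    return seconds, [p.exitcode for p in processes]

if __name__ == '__main__':
    # Assumption: a smaller count than the notebook's 50_000_000 to keep the demo quick
    count = 2_000_000

    start = time.perf_counter()
    countdown(count)
    print('Sequential seconds:', time.perf_counter() - start)

    seconds, exit_codes = timed_parallel_countdown(count)
    print('Two processes seconds:', seconds)
```

Starting a process takes noticeably longer than starting a thread, so for very small counts the start-up cost can outweigh the gain. Try different values of `count` and compare the timings with your vote above.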