# GIL, Multithreading, and Multiprocessing

The Python **Global Interpreter Lock**, or **GIL**, is, in simple terms, a lock that allows only one thread at a time to take control of the Python interpreter in CPython. The lock is necessary mainly because CPython's memory management is not thread-safe (this is not the case for IronPython or Jython, the .NET and Java implementations of Python). The GIL is infamous for hurting the performance of multithreaded Python programs, because it lets only one thread in a process run Python code at a time. But it is a myth that the Python threading module is useless, and it is a myth that threading only slows execution down. In some circumstances, threading does speed up execution. Here we look at when and how to use threading in Python properly.

### Single Threaded Example

First, let's see how to compute the squares and cubes of a list of numbers using a single thread. Here I use the **time.sleep** function to demonstrate the kind of scenario in which multithreading can be useful.


In [2]:
from utils.timer import DecoTimer
import time

def calc_square(arr):
    print("calculate square of numbers")
    for n in arr:
        time.sleep(0.2)
        print(f'square: {n*n}')
        
def calc_cube(arr):
    print("calculate cube of numbers")
    for n in arr:
        time.sleep(0.2)
        print(f'cube: {n*n*n}')

In [3]:
arr = [2, 4, 8, 9]

with DecoTimer("Testing Single Threaded"):
    calc_square(arr)
    calc_cube(arr)

>>>>Starting Function Testing Single Threaded...
calculate square of numbers
square: 4
square: 16
square: 64
square: 81
calculate cube of numbers
cube: 8
cube: 64
cube: 512
cube: 729
<<Finished function Testing Single Threaded in 1.628377914428711 seconds


Here you can see the overall runtime is about 1.6 seconds, which matches the eight **time.sleep(0.2)** calls running one after another.
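The `DecoTimer` helper comes from the author's own `utils.timer` module, which is not shown in this section. To run the examples without it, a minimal stand-in could look like the sketch below. The class name `SimpleTimer` and its message format are assumptions modelled on the output above, not the original implementation.

```python
import time


class SimpleTimer:
    """Hypothetical stand-in for DecoTimer: times the body of a `with` block."""

    def __init__(self, name):
        self.name = name
        self.elapsed = None

    def __enter__(self):
        print(f">>>>Starting Function {self.name}...")
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self._start
        print(f"<<Finished function {self.name} in {self.elapsed} seconds")
        return False  # do not swallow exceptions raised inside the block


with SimpleTimer("Sleep Demo") as timer:
    time.sleep(0.1)
```

Swapping `DecoTimer` for a helper like this should leave the timing comparisons in the rest of this section unchanged, since only wall-clock time around the block is measured.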

### Multithreading Example

Now, let's try to run the square and cube computations in separate threads.

In [4]:
from threading import Thread

with DecoTimer("Testing Multithreaded Example"):
    t1 = Thread(target=calc_square, args=(arr,))
    t2 = Thread(target=calc_cube, args=(arr,))
    
    # Run square and cube in separate threads
    t1.start()
    t2.start()
    
    # Wait until t1 and t2 complete
    t1.join()
    t2.join()
    

>>>>Starting Function Testing Multithreaded Example...
calculate square of numbers
calculate cube of numbers
square: 4
cube: 8
square: 16
cube: 64
square: 64cube: 512

square: 81
cube: 729
<<Finished function Testing Multithreaded Example in 0.8118569850921631 seconds


Note that the multithreaded example finishes in about 0.8 seconds, which makes it almost twice as fast as the single-threaded example. But doesn't the GIL allow only one thread to run at a time? How does multithreading speed up the execution? Let's look at another single-threaded vs. multithreaded comparison, but this time we use a long for-loop in place of the **time.sleep** calls.

In [9]:
def calc_square_cpu_bounded(arr):
    print("calculate square of numbers")
    for n in arr:
        for _ in range(8000000):  # busy loop instead of time.sleep
            pass
        print(f'square: {n*n}')
        
def calc_cube_cpu_bounded(arr):
    print("calculate cube of numbers")
    for n in arr:
        for _ in range(8000000):
            pass
        print(f'cube: {n*n*n}')

In [10]:
with DecoTimer("Testing Single Threaded without timer.sleep"):
    calc_square_cpu_bounded(arr)
    calc_cube_cpu_bounded(arr)

>>>>Starting Function Testing Single Threaded without timer.sleep...
calculate square of numbers
square: 4
square: 16
square: 64
square: 81
calculate cube of numbers
cube: 8
cube: 64
cube: 512
cube: 729
<<Finished function Testing Single Threaded without timer.sleep in 1.9267339706420898 seconds


In [8]:
with DecoTimer("Testing Multithreaded Example without timer.sleep"):
    t1 = Thread(target=calc_square_cpu_bounded, args=(arr,))
    t2 = Thread(target=calc_cube_cpu_bounded, args=(arr,))
    
    # Run square and cube in separate threads
    t1.start()
    t2.start()
    
    # Wait until t1 and t2 complete
    t1.join()
    t2.join()

>>>>Starting Function Testing Multithreaded Example without timer.sleep...
calculate square of numbers
calculate cube of numbers
square: 4cube: 8

square: 16
cube: 64
square: 64
cube: 512
square: 81
cube: 729
<<Finished function Testing Multithreaded Example without timer.sleep in 2.0313053131103516 seconds


Notice that the multithreaded version did not speed things up here; it was actually slightly slower than the single-threaded one. The busy loop is CPU-bound: it executes Python bytecode the whole time, so a thread must hold the GIL to make progress, and the two threads end up taking turns while paying extra for handing the lock back and forth. The **time.sleep** function, on the other hand, releases the GIL for the full delay, which lets the other thread acquire the lock and continue its work.

**I/O-bound** tasks, or tasks that wait on external systems, behave much like functions that call time.sleep, so threads can overlap their waiting and finish sooner. CPU-intensive tasks, however, gain nothing from CPython threads because of the GIL.
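To make the I/O case concrete, here is a small sketch using `concurrent.futures.ThreadPoolExecutor` from the standard library. The function `fake_download` is hypothetical; it uses `time.sleep` as a stand-in for network latency, on the assumption that a real I/O call would release the GIL in a similar way while it waits.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item_id):
    """Hypothetical I/O-bound task; time.sleep stands in for network latency."""
    time.sleep(0.2)  # the GIL is released while the thread sleeps
    return item_id * 10


item_ids = [1, 2, 3, 4]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() yields results in the same order as the inputs
    results = list(pool.map(fake_download, item_ids))
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f} seconds")
```

Run one after another, the four calls would take at least 0.8 seconds. With four worker threads the waits overlap, so the total should stay close to the duration of a single call.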

### Multiprocessing Example

#### Thread vs Process

Both threads and processes are independent sequences of execution. The main difference is that threads of the same process run in a shared memory space, whereas processes run in isolated memory spaces.

The threading module uses threads, and the multiprocessing module uses processes. Because processes have separate memory, sharing objects between them takes extra work. Threads share memory, so precautions are needed to keep two threads from modifying the same data at the same time. Inside the interpreter, the GIL plays this role for CPython's own internal state, such as object reference counts; it does not by itself make multi-step operations in your own code atomic.
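For data owned by your own program, the usual precaution is an explicit lock. The sketch below is illustrative only: the names `counter`, `counter_lock`, and `add_many` are made up for this example. It relies on the fact that a read-modify-write such as `counter += 1` is not guaranteed to be atomic across threads, even with the GIL in place.

```python
from threading import Lock, Thread

counter = 0
counter_lock = Lock()


def add_many(times):
    """Increment the shared counter `times` times under a lock."""
    global counter
    for _ in range(times):
        with counter_lock:  # only one thread may update the counter at a time
            counter += 1


threads = [Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

Because every increment happens while holding `counter_lock`, the final value equals the total number of increments regardless of how the threads are scheduled.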

Let's see how the multiprocessing example performs on the square and cube tasks.

In [11]:
from multiprocessing import Process
with DecoTimer("Testing Multiprocessing Example"):
    p1 = Process(target=calc_square, args=(arr,))
    p2 = Process(target=calc_cube, args=(arr,))
    
    # Run square and cube in separate processes
    p1.start()
    p2.start()
    
    p1.join()
    p2.join()

>>>>Starting Function Testing Multiprocessing Example...
calculate square of numbers
calculate cube of numbers
square: 4
cube: 8
square: 16
cube: 64
square: 64
cube: 512
square: 81
cube: 729
<<Finished function Testing Multiprocessing Example in 0.8436489105224609 seconds


The execution time is about half that of the single-threaded example (1.6 seconds).

In [12]:
from multiprocessing import Process
with DecoTimer("Testing Multiprocessing Example without timer.sleep"):
    p1 = Process(target=calc_square_cpu_bounded, args=(arr,))
    p2 = Process(target=calc_cube_cpu_bounded, args=(arr,))
    
    # Run square and cube in separate processes
    p1.start()
    p2.start()
    
    p1.join()
    p2.join()

>>>>Starting Function Testing Multiprocessing Example without timer.sleep...
calculate square of numbers
calculate cube of numbers
cube: 8
square: 4
cube: 64
square: 16
cube: 512
square: 64
cube: 729
square: 81
<<Finished function Testing Multiprocessing Example without timer.sleep in 1.0965428352355957 seconds


The CPU-bound version also speeds up: about 1.1 seconds versus 1.9 seconds for the single-threaded run, since each process runs its own interpreter with its own GIL.

#### Memory Space Comparison

In [16]:
square_result = []

def calc_square_global(arr):
    global square_result
    for n in arr:
        time.sleep(0.2)
        square_result.append(n * n)
    print('square_result within calc_square_global function: ', square_result)

with DecoTimer("Multithreading Example"):
    t = Thread(target=calc_square_global, args=(arr,))
    t.start()
    t.join()
    print('square_result outside calc_square_global function: ', square_result)
    


>>>>Starting Function Multithreading Example...
square_result within calc_square_global function:  [4, 16, 64, 81]
square_result outside calc_square_global function:  [4, 16, 64, 81]
<<Finished function Multithreading Example in 0.8124549388885498 seconds


In [17]:
cube_result = []

def calc_cube_global(arr):
    global cube_result
    for n in arr:
        time.sleep(0.2)
        cube_result.append(n * n * n)
        
    print('cube_result within calc_cube_global function: ', cube_result)
    
with DecoTimer("Multiprocessing Example"):
    p = Process(target=calc_cube_global, args=(arr,))
    p.start()
    p.join()
    print('cube_result outside calc_cube_global function: ', cube_result)

>>>>Starting Function Multiprocessing Example...
cube_result within calc_cube_global function:  [8, 64, 512, 729]
cube_result outside calc_cube_global function:  []
<<Finished function Multiprocessing Example in 0.8199930191040039 seconds


Here you can see that multithreading uses a shared memory space, so the global variable is modified. The multiprocessing module, on the other hand, runs the process in a separate memory space, so the child process only updates its own copy of the global variable, and the list in the main process stays empty.
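As a brief preview, one way to get a result back from a child process is to send it through a `multiprocessing.Queue` instead of relying on a global. This is a minimal sketch, assuming the code runs as a script; the function name `calc_cube_to_queue` is hypothetical. The `if __name__ == "__main__":` guard matters on platforms that start child processes by re-importing the main module.

```python
from multiprocessing import Process, Queue


def calc_cube_to_queue(arr, queue):
    """Compute the cubes in the child process and send them back via the queue."""
    queue.put([n * n * n for n in arr])


def main():
    queue = Queue()
    p = Process(target=calc_cube_to_queue, args=([2, 4, 8, 9], queue))
    p.start()
    cube_result = queue.get()  # read before join() so the child can flush its data
    p.join()
    print("cube_result in the parent process:", cube_result)


if __name__ == "__main__":
    main()
```

The list is pickled in the child and rebuilt in the parent, so the parent receives a copy of the result, not a shared object.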

In the next module, I want to demonstrate more examples of multithreading and multiprocessing. Then, I will introduce how to share data between processes using Value, Array, and Queue.