### Docs

- Great comparison: https://www.youtube.com/watch?v=AZnGRKFUU0c
- Great Code comparison: https://builtin.com/data-science/multithreading-multiprocessing
- Good lock based examples: https://www.turing.com/kb/python-multiprocessing-vs-multithreading
- OS Level metrics (Best real-world examples): https://www.youtube.com/watch?v=BhnB45Rf3dg

- Some Myths: https://medium.com/contentsquare-engineering-blog/multithreading-vs-multiprocessing-in-python-ece023ad55a
- ThreadPoolExecutor: https://www.digitalocean.com/community/tutorials/how-to-use-threadpoolexecutor-in-python-3
  

### MultiThreading

- Best for I/O-bound tasks; threads are lightweight and cheap to spawn
- It's concurrent but not parallel: in standard CPython builds the GIL lets only one thread execute Python bytecode at a time
- The running thread changes whenever it blocks on I/O or when the switch interval elapses (5 ms by default, see `sys.getswitchinterval()`)
- Since I/O needs waiting time anyway, other threads can run during that wait -> **Hence great for I/O Tasks**
- You can share variables -> but you need locks (in case of concurrent updates)
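
The overlap of I/O waits can be seen in a minimal sketch. `time.sleep` stands in for a blocking network call here, and `fake_io` and `run_threaded` are hypothetical helper names, not part of the notebook.

```python
import threading
import time

def fake_io(delay, results, index):
    # time.sleep is a stand-in for a blocking network or disk call;
    # like real blocking I/O, it releases the GIL while waiting
    time.sleep(delay)
    results[index] = delay  # each thread writes to its own slot, so no lock is needed

def run_threaded(delays):
    results = [None] * len(delays)
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i, delay in enumerate(delays)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

results, elapsed = run_threaded([0.2, 0.2, 0.2, 0.2])
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

Run sequentially, the four waits would take about 0.8 s in total; with threads the elapsed time should be close to the longest single wait, because the waits overlap.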

In [1]:
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

In [2]:
global_res = []
def api_call(url, retry_duration, lock):
    # global_res is only mutated (append), never rebound, so no `global` statement is needed

    with lock: # equivalent to lock.acquire() ... lock.release()
        # hit the api
        global_res.append({"res": url+str(retry_duration)})
    
    return {"res": url+str(retry_duration)}

In [3]:
lock = threading.Lock()
thread1 = threading.Thread(target=api_call, args=("https://google.com", 10, lock))
thread2 = threading.Thread(target=api_call, args=("https://facebook.com", 5, lock))
thread3 = threading.Thread(target=api_call, args=("https://reddit.com", 5, lock))
thread4 = threading.Thread(target=api_call, args=("https://instagram.com", 15, lock))

thread1.start()
thread2.start()
thread3.start()
thread4.start()

thread1.join()
thread2.join()
thread3.join()
thread4.join()

print(global_res)

[{'res': 'https://google.com10'}, {'res': 'https://facebook.com5'}, {'res': 'https://reddit.com5'}, {'res': 'https://instagram.com15'}]
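
The lock matters most for read-modify-write updates such as a shared counter. The sketch below is illustrative only; `Counter` and `hammer` are hypothetical names that do not appear in the notebook.

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # `self.value += 1` is a read, an add, and a write;
        # holding the lock keeps another thread from interleaving between them
        with self._lock:
            self.value += 1

def hammer(counter, times):
    for _ in range(times):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```

With the lock in place the total is always 4 x 10,000. Without it, updates can be lost whenever two threads read the same old value before either writes back.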


In [4]:
global_res = []
with ThreadPoolExecutor(3) as executor:
    lock = threading.Lock()
    
    urls = ["https://google.com","https://facebook.com","https://reddit.com","https://instagram.com"]

    # submit tasks to thread pool
    futures = [executor.submit(api_call, url, 10, lock) for url in urls] 
    # wait for all tasks to complete
    results = [future.result() for future in as_completed(futures)]

    print(results)
    print(global_res)
    


[{'res': 'https://reddit.com10'}, {'res': 'https://facebook.com10'}, {'res': 'https://google.com10'}, {'res': 'https://instagram.com10'}]
[{'res': 'https://google.com10'}, {'res': 'https://facebook.com10'}, {'res': 'https://reddit.com10'}, {'res': 'https://instagram.com10'}]
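
As the first output line shows, `as_completed` yields futures in completion order, not submission order. One common way to keep track of which result belongs to which input is a future-to-input dictionary. This is a sketch that reuses a lock-free version of `api_call`; `future_to_url` and `results_by_url` are names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def api_call(url, retry_duration):
    # Placeholder for a real request, mirroring the notebook's api_call
    return {"res": url + str(retry_duration)}

urls = ["https://google.com", "https://facebook.com", "https://reddit.com", "https://instagram.com"]

with ThreadPoolExecutor(max_workers=3) as executor:
    # Remember which URL each future was submitted for
    future_to_url = {executor.submit(api_call, url, 10): url for url in urls}

    results_by_url = {}
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            results_by_url[url] = future.result()
        except Exception as exc:
            # A failed task re-raises its exception from result()
            results_by_url[url] = {"error": str(exc)}

# Rebuild the results in the original submission order
ordered = [results_by_url[url] for url in urls]
print(ordered)
```

If completion order is not needed at all, `executor.map(fn, *iterables)` is a simpler option, since it returns results in input order.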


### MultiProcessing

- Best for CPU-bound tasks; processes are heavy to spawn since each needs its own memory space
- Each process has its own memory space -> NO variable sharing by default
- Even if you reference a global, as in the example below -> each process appends to its own copy, so the parent's `global_res` stays empty
- **Share variables?** -> Use `multiprocessing.Manager`

In [5]:
import multiprocessing
from concurrent.futures import ProcessPoolExecutor, as_completed

In [6]:
global_res = []
def api_call(url, retry_duration):
    # No lock and no `global` statement: each process appends to its own copy of global_res
    # hit the api
    global_res.append({"res": url+str(retry_duration)})
    
    return {"res": url+str(retry_duration)}

In [7]:

process1 = multiprocessing.Process(target=api_call, args=("https://google.com", 10))
process2 = multiprocessing.Process(target=api_call, args=("https://facebook.com", 5))
process3 = multiprocessing.Process(target=api_call, args=("https://reddit.com", 5))
process4 = multiprocessing.Process(target=api_call, args=("https://instagram.com", 15))

process1.start()
process2.start()
process3.start()
process4.start()

process1.join()
process2.join()
process3.join()
process4.join()

print(global_res)

[]


#### MultiProcessing Manager / ProcessPool


- **IMP**: All code that creates or starts processes needs to be under `if __name__ == '__main__':`
- Worker functions can live in imported modules, but the entry-point script that launches the processes must have the guard
- https://realpython.com/if-name-main-python/

The `if __name__ == '__main__':` guard is required when using `multiprocessing` (or `ProcessPoolExecutor`, which is built on it) with the spawn or forkserver start methods; spawn is the default on Windows and macOS. Under these methods each child process re-imports the main module, so the guard ensures that the code which creates and starts processes runs only in the main process, preventing unintended recursive execution when a new process is spawned.

**Limitations of Jupyter for multiprocessing**
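- Jupyter Notebooks do not handle multiprocessing well: functions defined in notebook cells live in an interactive `__main__` that spawned worker processes cannot import, so the workers typically die and the pool reports `BrokenProcessPool` (as in the outputs in this section).
- Run multiprocessing code as a standalone Python script with the `if __name__ == "__main__":` guard, or move the worker functions into an importable `.py` module.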
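
A minimal standalone-script sketch of the guard, assuming it is saved to a file (say, a hypothetical `cpu_pool.py`) and run with `python cpu_pool.py` rather than inside a notebook. `sum_of_squares` is an illustrative CPU-bound function, not part of the notebook.

```python
from concurrent.futures import ProcessPoolExecutor

def sum_of_squares(n):
    # CPU-bound work: a pure-Python loop with no I/O waits to overlap
    return sum(i * i for i in range(n))

def main():
    inputs = [10, 100, 1_000, 10_000]
    with ProcessPoolExecutor(max_workers=3) as executor:
        # map() returns results in the same order as the inputs
        results = list(executor.map(sum_of_squares, inputs))
    for n, total in zip(inputs, results):
        print(n, total)

# Only the main process runs main(); a child that re-imports this file
# sees a different __name__ and skips the block, so it never spawns a pool itself
if __name__ == "__main__":
    main()
```

The worker function sits at module level so that child processes can import it by name, while everything that starts processes sits behind the guard.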

In [8]:
def api_call(url, retry_duration):
    return {"res": url+str(retry_duration)}

if __name__ == "__main__":
    urls = [
        "https://google.com",
        "https://facebook.com",
        "https://reddit.com",
        "https://instagram.com",
    ]

    with ProcessPoolExecutor(max_workers=3) as executor:
        # Submit tasks to the process pool
        futures = [executor.submit(api_call, url, 10) for url in urls]

        # Collect and print results as tasks complete
        results = []
        for future in as_completed(futures):
            try:
                results.append(future.result())
            except Exception as e:
                print(f"Task failed with exception: {e}")

    print("Results:", results)
        


Task failed with exception: A process in the process pool was terminated abruptly while the future was running or pending.
Task failed with exception: A process in the process pool was terminated abruptly while the future was running or pending.
Task failed with exception: A process in the process pool was terminated abruptly while the future was running or pending.
Task failed with exception: A process in the process pool was terminated abruptly while the future was running or pending.
Results: []


#### Using `multiprocessing.Manager`

- https://medium.com/@amitkumaryadav27/multiprocessing-and-multiprocessing-manager-to-share-an-object-with-processes-in-python-946b88552b84
- https://superfastpython.com/multiprocessing-manager-example/

In [9]:
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import Manager

def api_call(url, retry_duration):
    # Simulate hitting an API and returning a result
    return {"res": url + str(retry_duration)}

# Defined at module level so worker processes can import it by name;
# the shared list is passed in as an argument (manager proxies are picklable)
def api_call_and_store(url, retry_duration, shared_res):
    result = api_call(url, retry_duration)
    shared_res.append(result)
    return result

if __name__ == "__main__":
    with Manager() as manager:  # Manager for shared data structures
        global_res = manager.list()  # Shared list for results

        urls = ["https://google.com", "https://facebook.com", "https://reddit.com", "https://instagram.com"]

        with ProcessPoolExecutor(3) as executor:
            # Submit tasks to process pool, handing each one the shared list
            futures = [executor.submit(api_call_and_store, url, 10, global_res) for url in urls]
            
            # Collect results
            results = [future.result() for future in as_completed(futures)]

        # Print results
        print("Results (from return values):", results)
        print("Global Results (shared list):", list(global_res))  # Convert shared list to regular list for display


BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

### Python Singleton

In [45]:
import random

In [73]:
class DatabaseConnection:
    _conn = None

    @staticmethod
    def get_connection():
        if DatabaseConnection._conn is None:
            DatabaseConnection._conn = f"SOME CONNECTION - {random.randint(1,10)}"
        return DatabaseConnection._conn


In [74]:
for _ in range(5):
    print(DatabaseConnection.get_connection())

SOME CONNECTION - 4
SOME CONNECTION - 4
SOME CONNECTION - 4
SOME CONNECTION - 4
SOME CONNECTION - 4
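
The lazy check in `get_connection` is not protected against two threads calling it at the same time. One possible thread-safe variant, sketched here with double-checked locking, adds a class-level lock; the `_lock` attribute and the `worker` helper are assumptions added for illustration.

```python
import random
import threading

class DatabaseConnection:
    _conn = None
    _lock = threading.Lock()

    @classmethod
    def get_connection(cls):
        # Fast path: skip the lock once the connection exists
        if cls._conn is None:
            with cls._lock:
                # Re-check inside the lock: another thread may have created it meanwhile
                if cls._conn is None:
                    cls._conn = f"SOME CONNECTION - {random.randint(1, 10)}"
        return cls._conn

seen = []
seen_lock = threading.Lock()

def worker():
    conn = DatabaseConnection.get_connection()
    with seen_lock:
        seen.append(conn)

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(set(seen)))  # 1
```

All eight threads receive the same connection string, because only the first thread through the lock creates it.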


In [7]:
class RunNumbers:
    def __init__(self, n=10):
        self.counter = 0
        self.n = n
        self.lock = threading.Lock()  # created once, before any thread starts
    
    def print_nums(self):
        # The lock makes the print and the increment a single atomic step
        with self.lock:
            print(f"Printing Number {self.counter} by thread {threading.current_thread().name}")
            self.counter += 1
    
    def run(self):
        t1 = threading.Thread(target=self.print_nums, name="THREAD 3")
        t2 = threading.Thread(target=self.print_nums, name="THREAD 2")
        t3 = threading.Thread(target=self.print_nums, name="THREAD 1")

        t1.start()
        t2.start()
        t3.start()

        t1.join()
        t2.join()
        t3.join()


In [8]:
run = RunNumbers()
run.run()

Printing Number 0 by thread THREAD 3
Printing Number 1 by thread THREAD 2
Printing Number 2 by thread THREAD 1


In [9]:

class NumberPrinter:
    def __init__(self, n, num_threads):
        self.n = n
        self.num_threads = num_threads
        self.lock = threading.Lock()
        self.condition = threading.Condition(self.lock)
        self.current = 1
        self.turn = 0  # Keeps track of which thread's turn it is

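    def print_number(self, thread_id):
        while True:
            with self.condition:
                # Sleep until it is this thread's turn or all numbers are printed
                while self.current <= self.n and self.turn != thread_id:
                    self.condition.wait()
                
                # Done: leave the loop (exiting the `with` block releases the lock)
                if self.current > self.n:
                    break
                
                print(f"Thread-{thread_id}: {self.current}")
                self.current += 1
                # Hand the turn to the next thread (round-robin) and wake all waiters
                self.turn = (self.turn + 1) % self.num_threads
                self.condition.notify_all()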


In [11]:
n = 15
num_threads = 3
printer = NumberPrinter(n, num_threads)

threads = []
for i in range(num_threads):
    thread = threading.Thread(target=printer.print_number, args=(i,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

Thread-0: 1
Thread-1: 2
Thread-2: 3
Thread-0: 4
Thread-1: 5
Thread-2: 6
Thread-0: 7
Thread-1: 8
Thread-2: 9
Thread-0: 10
Thread-1: 11
Thread-2: 12
Thread-0: 13
Thread-1: 14
Thread-2: 15
