# Introduction
Multiprocessing in Python runs multiple processes simultaneously, enabling real parallelism on multi-core CPUs. With multithreading, all threads live inside one process, and in CPython the GIL (Global Interpreter Lock) lets only one thread execute Python bytecode at a time. Multiprocessing instead uses independent processes, each with its own interpreter and GIL, that can run on separate CPU cores, which improves performance for CPU-bound tasks.

# Why Multiprocessing?
- Multithreading is lightweight and fast to spawn, but in Python the GIL prevents threads from executing Python code in parallel, so CPU-intensive threaded code gains little from extra cores.

- Multiprocessing sidesteps the GIL by creating separate processes, each with its own Python interpreter and memory space.

- Well suited to heavy CPU-bound work, or whenever you need true parallelism.

- It can also speed up I/O-bound work, such as running many downloads at once, although threads or async I/O are usually lighter-weight options there.
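To see the GIL limitation in practice, here is a rough benchmark sketch: the same pure-Python, CPU-bound function (count_primes, a hypothetical helper that is just busy work) runs four times in a thread pool and then in a process pool. Timings depend on the machine and Python build; on a standard (GIL-enabled) CPython build with at least four cores, the process pool should finish noticeably faster, while the thread pool takes about as long as running the jobs one after another.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """Hypothetical busy work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        is_prime = True
        d = 2
        while d * d <= n:
            if n % d == 0:
                is_prime = False
                break
            d += 1
        if is_prime:
            count += 1
    return count


def timed_run(executor_cls, limits):
    """Run count_primes over `limits` with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=len(limits)) as executor:
        results = list(executor.map(count_primes, limits))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    limits = [50_000] * 4  # four identical CPU-bound jobs
    _, thread_time = timed_run(ThreadPoolExecutor, limits)
    _, process_time = timed_run(ProcessPoolExecutor, limits)
    print(f"threads:   {thread_time:.2f} s")
    print(f"processes: {process_time:.2f} s")
```

The two executors share one interface, so swapping the class is the only change between the runs.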

# Key Concepts

### Processes vs Threads

In [1]:
import pandas as pd
pd.set_option('display.max_colwidth', None)
df = pd.read_csv('csv_files/Aspect-Threads-Processes.csv')
df

| Aspect | Threads | Processes |
|---|---|---|
| Memory | Shared within a process | Separate memory space |
| Creation Speed | Faster (lighter) | Slower (heavier) |
| Parallelism | Limited by GIL in Python | True parallelism on multiple CPUs |
| Use Case | I/O-bound, lightweight tasks | CPU-bound, heavy computation |


- "Spawning" a process simply means starting (creating) a new process that runs independently of the parent that launched it.
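Python's multiprocessing also uses "spawn" as the name of one start method. The start method decides how a child is created: "fork" copies the running parent (long the default on Linux), while "spawn" starts a fresh interpreter that re-imports the main module (the default on Windows and macOS). Defaults vary by platform and Python version, so this sketch prints the current one and then pins "spawn" through a context; square is a hypothetical worker.

```python
import multiprocessing


def square(n):
    """Trivial worker used to exercise a process pool."""
    return n * n


if __name__ == "__main__":
    # Start method used when nothing has been configured explicitly
    print("default start method:", multiprocessing.get_start_method())

    # A context fixes the start method for objects created from it,
    # without changing the global default for the rest of the program.
    ctx = multiprocessing.get_context("spawn")
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]
```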

# Python’s Multiprocessing Module
- Provides a way to create and manage processes through an API that closely mirrors the threading module, so it is easy to learn if you already know threading.

- Key methods:

    - multiprocessing.Process(target=..., args=...) - create a process object that will call target(*args)

    - .start() - start the process

    - .join() - wait for the process to finish
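A minimal sketch that uses all three calls together: two processes run a hypothetical greet worker with different arguments, and the exitcode attribute is read after .join() to confirm each child exited cleanly (0 means success).

```python
import multiprocessing
import os


def greet(name):
    """Worker: runs in a separate process with its own PID."""
    print(f"hello {name} from PID {os.getpid()}")


if __name__ == "__main__":
    workers = [
        multiprocessing.Process(target=greet, args=(name,))  # note the trailing comma
        for name in ["alice", "bob"]
    ]
    for p in workers:
        p.start()  # launch the child; returns immediately
    for p in workers:
        p.join()   # block until that child exits
    print("parent PID", os.getpid())
    print("exit codes:", [p.exitcode for p in workers])
```

Starting every process before joining any of them is what lets the children run at the same time; calling .join() right after each .start() would make them run one by one.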

# Detailed Example: Parallel Downloading of Images
### Step 1: Import and Setup

In [2]:
import multiprocessing
import requests
import os

### Step 2: Define a download function
- Downloads an image file from a URL and saves it with a given name.

In [3]:
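def download_file(url, name):
    print(f"Started downloading {name}")
    # A timeout stops a stalled connection from blocking this process forever
    response = requests.get(url, timeout=30)
    # Raise on HTTP errors (4xx/5xx) instead of saving an error page as an image
    response.raise_for_status()
    with open(name, 'wb') as f:
        f.write(response.content)
    print(f"Finished downloading {name}")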

### Step 3: Create a list of image URLs
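- Use a base URL from an image service (here picsum.photos) that returns a random image of the requested size.

- Use a list comprehension to generate distinct URLs by varying a parameter (here the random query value; IDs or image size would also work), so each request fetches a different image.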

In [4]:
urls = [f"https://picsum.photos/2000/3000?random={i}" for i in range(10)]

### Step 4: Sequential Download (for comparison)

In [6]:
import time
start = time.time()
for i, url in enumerate(urls):
    download_file(url, f"file_{i}.jpeg")
print('total_time: ', time.time() - start)

Started downloading file_0.jpeg
Finished downloading file_0.jpeg
Started downloading file_1.jpeg
Finished downloading file_1.jpeg
Started downloading file_2.jpeg
Finished downloading file_2.jpeg
Started downloading file_3.jpeg
Finished downloading file_3.jpeg
Started downloading file_4.jpeg
Finished downloading file_4.jpeg
Started downloading file_5.jpeg
Finished downloading file_5.jpeg
Started downloading file_6.jpeg
Finished downloading file_6.jpeg
Started downloading file_7.jpeg
Finished downloading file_7.jpeg
Started downloading file_8.jpeg
Finished downloading file_8.jpeg
Started downloading file_9.jpeg
Finished downloading file_9.jpeg
total_time:  23.999173164367676


- Downloads the files one at a time, so the total time is roughly the sum of the individual downloads (about 24 s in this run), which hurts most with large images.

### Step 5: Multiprocessing to Download in Parallel

In [7]:
if __name__ == "__main__":
    start = time.time()
    processes = []
    for i, url in enumerate(urls):
        p = multiprocessing.Process(target=download_file, args=(url, f"file_{i}.jpeg"))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()
    print('total_time: ', time.time() - start)  

Started downloading file_0.jpegStarted downloading file_1.jpeg

Started downloading file_2.jpeg
Started downloading file_3.jpeg
Started downloading file_4.jpeg
Started downloading file_5.jpeg
Started downloading file_6.jpeg
Started downloading file_7.jpeg
Started downloading file_8.jpeg

Started downloading file_9.jpegFinished downloading file_3.jpeg
Finished downloading file_6.jpeg
Finished downloading file_7.jpeg
Finished downloading file_5.jpeg
Finished downloading file_0.jpeg
Finished downloading file_1.jpeg
Finished downloading file_9.jpeg
Finished downloading file_2.jpeg
Finished downloading file_4.jpeg
Finished downloading file_8.jpeg
total_time:  8.253820419311523


- Processes are started almost simultaneously.

- Each process downloads a file independently.

- .join() ensures the main program waits until all downloads complete.

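- Result: the total time dropped from about 24 s to about 8 s in this run. Because downloading is I/O-bound, the gain comes mainly from overlapping the waits on the network rather than from using more CPU cores.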
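The overlap effect can be reproduced offline. In this sketch, fake_download is a hypothetical stand-in for download_file that just sleeps to mimic waiting on the network, so no URLs or network access are needed; run_sequential and run_parallel are likewise illustrative helpers.

```python
import multiprocessing
import time


def fake_download(name, delay=0.5):
    """Hypothetical stand-in for download_file: sleeps to mimic network waiting."""
    time.sleep(delay)
    return name


def run_sequential(names):
    start = time.perf_counter()
    for name in names:
        fake_download(name)
    return time.perf_counter() - start


def run_parallel(names):
    start = time.perf_counter()
    procs = [multiprocessing.Process(target=fake_download, args=(n,)) for n in names]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    names = [f"file_{i}.jpeg" for i in range(4)]
    print(f"sequential: {run_sequential(names):.2f} s")  # roughly 4 x 0.5 s
    print(f"parallel:   {run_parallel(names):.2f} s")    # roughly 0.5 s plus start-up cost
```

The parallel time never drops all the way to a single delay, because starting each process has its own overhead.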

# Using concurrent.futures with Multiprocessing
- Offers a higher-level abstraction that manages a pool of worker processes for you.

- Use ProcessPoolExecutor for multiprocessing (similar to ThreadPoolExecutor for multithreading).

### Example Using concurrent.futures:

In [8]:
import time
from concurrent.futures import ProcessPoolExecutor

def download_file_wrapper(args):
    # Unpack one (url, name) tuple and call download_file from Step 2
    url, name = args
    download_file(url, name)

if __name__ == "__main__":
    start = time.time()
    urls = [f"https://picsum.photos/2000/3000?random={i}" for i in range(60)]
    names = [f"file_{i}.jpeg" for i in range(60)]

    # Pair URLs with file names for mapping
    args = list(zip(urls, names))

    with ProcessPoolExecutor() as executor:
        # map runs download_file_wrapper for each tuple across the worker processes;
        # consuming the results with list() re-raises any exception from a worker
        list(executor.map(download_file_wrapper, args))

    print('total time', time.time() - start)


Started downloading file_0.jpegStarted downloading file_2.jpegStarted downloading file_1.jpeg


Started downloading file_3.jpeg
Finished downloading file_3.jpeg
Started downloading file_4.jpeg
Finished downloading file_2.jpeg
Started downloading file_5.jpeg
Finished downloading file_5.jpeg
Started downloading file_6.jpeg
Finished downloading file_1.jpeg
Started downloading file_7.jpeg
Finished downloading file_0.jpeg
Started downloading file_8.jpeg
Finished downloading file_7.jpeg
Started downloading file_9.jpeg
Finished downloading file_4.jpeg
Started downloading file_10.jpeg
Finished downloading file_10.jpeg
Started downloading file_11.jpeg
Finished downloading file_6.jpeg
Started downloading file_12.jpeg
Finished downloading file_9.jpeg
Started downloading file_13.jpeg
Finished downloading file_8.jpeg
Started downloading file_14.jpeg
Finished downloading file_14.jpeg
Started downloading file_15.jpeg
Finished downloading file_11.jpeg
Started downloading file_16.jpeg
Finished download

### Benefits
- Simplifies creating process pools, managing their lifecycle, and retrieving results.

- Automatically starts, reuses, and shuts down worker processes, making code cleaner and easier to maintain.
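For result retrieval beyond map, the executor also offers submit(), which returns a Future, and as_completed(), which yields futures as they finish. A minimal sketch using a hypothetical risky_square worker, showing that an exception raised inside a worker process is re-raised in the parent when future.result() is called:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def risky_square(n):
    """Hypothetical worker that fails for negative input."""
    if n < 0:
        raise ValueError(f"negative input: {n}")
    return n * n


def collect(numbers):
    """Submit each job, then gather results (or errors) as they finish."""
    results, errors = {}, {}
    with ProcessPoolExecutor(max_workers=2) as executor:
        futures = {executor.submit(risky_square, n): n for n in numbers}
        for future in as_completed(futures):
            n = futures[future]
            try:
                results[n] = future.result()  # re-raises the worker's exception here
            except ValueError as exc:
                errors[n] = str(exc)
    return results, errors


if __name__ == "__main__":
    results, errors = collect([1, 2, -3, 4])
    print(results)  # insertion order follows completion order
    print(errors)
```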

### Important Tips and Best Practices
- Always protect the entry point of your multiprocessing code with a main guard:

In [9]:
if __name__ == "__main__":
    # multiprocessing code here
    pass

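This prevents unwanted recursive process creation. On Windows and macOS the default start method is "spawn", so every child process imports the main module; without the guard, each child would try to re-run the process-creating code, which Python stops with a RuntimeError. Some IDEs run scripts in ways that cause the same problem.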

- Avoid creating an excessive number of processes (e.g., thousands): each one carries its own interpreter and memory, so too many can hang or crash your system.
Keep the process count reasonable, often close to the number of CPU cores (os.cpu_count()).

- Multiprocessing pays off most when the task is CPU-bound. For I/O-bound tasks like downloading files, multithreading or async programming are usually lighter-weight alternatives, but multiprocessing remains a simple way to parallelize them.
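One way to follow these tips is to size each pool for its workload: processes capped at the core count for CPU-bound jobs, and a thread pool for I/O-bound ones. In this sketch, cube and fake_fetch are hypothetical stand-ins, and the thread count of 16 is an arbitrary illustrative value rather than a recommendation.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_workers():
    """Worker count for CPU-bound jobs: one per logical core.

    os.cpu_count() can return None, so fall back to 1.
    """
    return os.cpu_count() or 1


def cube(n):
    """CPU-bound stand-in (trivially small here)."""
    return n ** 3


def fake_fetch(i):
    """Hypothetical stand-in for an I/O call such as an HTTP request."""
    time.sleep(0.1)
    return i


if __name__ == "__main__":
    # CPU-bound: cap the process pool at the core count instead of one process per task
    with ProcessPoolExecutor(max_workers=cpu_workers()) as pool:
        print(list(pool.map(cube, range(10))))

    # I/O-bound: threads are cheap and spend most time waiting, so a larger pool is common
    with ThreadPoolExecutor(max_workers=16) as pool:
        print(list(pool.map(fake_fetch, range(32))))
```

ProcessPoolExecutor already defaults to the machine's processor count when max_workers is omitted; passing it explicitly just makes the cap visible in the code.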

# Summary
- Multiprocessing means running multiple Python processes simultaneously, allowing true parallelism beyond the Python GIL constraint.

- It is useful for CPU-intensive tasks or parallel workloads like downloading many files.

- The multiprocessing module makes it easy to create and control processes with an API that mirrors threading.

- Using concurrent.futures.ProcessPoolExecutor provides a convenient interface for multiprocessing pools.

- Always use an `if __name__ == "__main__":` guard in multiprocessing scripts.

- Avoid spawning too many processes to prevent system overload.