# ⚙️ Mastering Multiprocessing in Python: True Parallelism for CPU-Bound Tasks

**Welcome!** This notebook explores Python's `multiprocessing` module, the standard library's solution for achieving true parallelism by creating and managing multiple independent processes. Unlike threading (in CPython), multiprocessing bypasses the Global Interpreter Lock (GIL), making it ideal for CPU-bound tasks that can benefit from multiple cores.

**Target Audience:** Python developers needing to accelerate CPU-intensive computations, process large datasets in parallel, or fully utilize multi-core processors.

**Learning Objectives:**
*   Understand the process model and its difference from threading (separate memory).
*   Create, start, and manage `Process` objects.
*   Learn various Inter-Process Communication (IPC) mechanisms: `Queue`, `Pipe`, `Value`, `Array`, `Manager`.
*   Use synchronization primitives (`Lock`, `Event`, `Semaphore`, etc.) between processes.
*   Effectively manage pools of worker processes using `multiprocessing.Pool`.
*   Utilize the high-level `concurrent.futures.ProcessPoolExecutor`.
*   Understand different process start methods (`fork`, `spawn`, `forkserver`).
*   Identify best practices, pitfalls (IPC overhead, serialization), and performance considerations.

## 1. Introduction: Why Multiprocessing?

As established previously, CPython's Global Interpreter Lock (GIL) prevents threads within the same process from executing Python bytecode simultaneously on multiple CPU cores. This makes standard threading unsuitable for speeding up CPU-bound tasks.

**Multiprocessing provides a solution:** It creates *separate processes*, each with its own Python interpreter and memory space. Since each process has its own GIL, multiple processes can run Python code **in parallel** on different CPU cores.
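
The separate memory space is easy to observe directly. The following is a minimal sketch, not part of the original notebook: the names `counter`, `increment_in_child`, and `run_isolation_demo` are hypothetical and chosen only for illustration. A child process changes a module-level variable, and the parent's copy stays untouched.

```python
from multiprocessing import Process

counter = 0  # module-level state; every process works on its own copy


def increment_in_child() -> None:
    """Runs in the child and changes only the child's copy of `counter`."""
    global counter
    counter += 100


def run_isolation_demo() -> int:
    """Starts a child that mutates `counter`, then returns the parent's value."""
    child = Process(target=increment_in_child)
    child.start()
    child.join()
    return counter  # still the parent's original value


if __name__ == "__main__":
    print(f"Parent counter after the child ran: {run_isolation_demo()}")
```

Because the parent never sees the child's change, any result a child produces has to be sent back explicitly through one of the IPC mechanisms covered in Section 3.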

**Key Use Case: CPU-Bound Tasks**
The primary benefit of multiprocessing is accelerating tasks that are limited by CPU speed, such as:
*   Complex mathematical calculations
*   Image/video processing
*   Data analysis and manipulation on large datasets
*   Scientific simulations
*   Machine learning model training (often handled by libraries using multiprocessing internally)

**Analogy: Hiring More Independent Workers**

Instead of having multiple employees (threads) in one office sharing the same (sometimes bottlenecked) resources and rules (GIL), multiprocessing is like setting up completely separate, independent workshops (processes). Each workshop has its own tools and interpreter (memory and GIL) and can work fully in parallel on its assigned task. However, communication and sharing materials between workshops (IPC) requires more effort (sending messages, using shared storage) than workers talking in the same room.

## 2. Creating and Managing Processes (`multiprocessing.Process`)

The interface is very similar to `threading.Thread`.

**Steps:**
1.  Define a target function for the process.
2.  Create a `Process` instance, specifying `target` and `args`/`kwargs`.
3.  Call `start()` to spawn the new process.
4.  Call `join()` to wait for the process to finish.

**Important:** Due to how processes are often created (especially `spawn` on Windows/macOS), the main part of your script that creates processes **must** be protected by `if __name__ == '__main__':`. This prevents infinite process creation when the child process re-imports the main script.

In [1]:
from multiprocessing import Process, current_process
import os
import time
import logging

# Basic logging config (might show duplicate messages from child processes
# depending on start method and OS, more robust logging needed for prod)
logging.basicConfig(level=logging.INFO, 
                    format='[%(levelname)s] (%(processName)s - PID %(process)d) %(message)s',
                    force=True)

def cpu_bound_task(task_id: int, count_to: int):
    """Simulates a CPU-intensive task."""
    process_name = current_process().name
    pid = os.getpid()
    logging.info(f"Starting task {task_id}...")
    result = 0
    for i in range(count_to):
        result += i * i # Some calculation
    logging.info(f"Finished task {task_id}. Result sum part: {result % 1000}") # Print part of result

# --- Main Guard --- 
if __name__ == "__main__":
    logging.info("Main process starting.")
    
    start_time = time.perf_counter()
    
    processes: list[Process] = []
    num_processes = min(os.cpu_count() or 1, 4) # Use up to 4 cores for demo
    count_target = 10_000_000 # Number large enough to take some time
    
    logging.info(f"Creating {num_processes} processes...")
    for i in range(num_processes):
        # Create process
        # Optional: pass daemon=True to have the child terminated when the parent exits (use with caution)
        process = Process(target=cpu_bound_task, args=(i + 1, count_target), 
                          name=f"WorkerProc-{i+1}")
        processes.append(process)
        # Start the process
        process.start()
        logging.info(f"Started process {process.name} (PID: {process.pid})")
        
    # Wait for all processes to complete
    logging.info("Waiting for processes to join...")
    for process in processes:
        process.join()
        logging.info(f"Process {process.name} finished with exit code {process.exitcode}")

    end_time = time.perf_counter()
    logging.info("All processes finished.")
    print(f"\nTotal execution time (multiprocessing): {end_time - start_time:.2f} seconds")
    
    # Compare with sequential execution (conceptual)
    # start_seq = time.perf_counter()
    # cpu_bound_task(0, count_target * num_processes) # Roughly equivalent work
    # end_seq = time.perf_counter()
    # print(f"Estimated sequential time: {end_seq - start_seq:.2f} seconds")

[INFO] (MainProcess - PID 10985) Main process starting.
[INFO] (MainProcess - PID 10985) Creating 4 processes...
[INFO] (MainProcess - PID 10985) Started process WorkerProc-1 (PID: 10997)
[INFO] (WorkerProc-1 - PID 10997) Starting task 1...
[INFO] (MainProcess - PID 10985) Started process WorkerProc-2 (PID: 11000)
[INFO] (WorkerProc-2 - PID 11000) Starting task 2...
[INFO] (MainProcess - PID 10985) Started process WorkerProc-3 (PID: 11003)
[INFO] (WorkerProc-3 - PID 11003) Starting task 3...
[INFO] (MainProcess - PID 10985) Started process WorkerProc-4 (PID: 11006)
[INFO] (MainProcess - PID 10985) Waiting for processes to join...
[INFO] (WorkerProc-4 - PID 11006) Starting task 4...
[INFO] (WorkerProc-4 - PID 11006) Finished task 4. Result sum part: 0
[INFO] (WorkerProc-3 - PID 11003) Finished task 3. Result sum part: 0
[INFO] (WorkerProc-1 - PID 10997) Finished task 1. Result sum part: 0
[INFO] (MainProcess - PID 10985) Process WorkerProc-1 finished with exit code 0
[INFO] (WorkerProc-


Total execution time (multiprocessing): 2.56 seconds


## 3. Inter-Process Communication (IPC)

Since processes have separate memory spaces, they cannot directly access each other's variables. The `multiprocessing` module provides several ways to communicate and share data:

### 3.1 `multiprocessing.Queue`
*   A **process-safe** queue, similar interface to `queue.Queue`.
*   Data put onto the queue is pickled, transferred between processes (via OS mechanisms like pipes), and unpickled.
*   Good for passing moderately sized, pickleable objects between processes (task distribution, result collection).
*   **Note:** Can be slower than the thread-oriented `queue.Queue` because every item is serialized and copied between processes.

In [2]:
from multiprocessing import Process, Queue, current_process
import queue  # provides the queue.Empty exception caught below
import time
import os

# Note: Functions run by processes must be defined at the top level
# or be importable. Lambdas often don't work well.
def worker_puts_squares(numbers: list, q: Queue):
    """Calculates squares and puts them onto the queue."""
    pid = os.getpid()
    proc_name = current_process().name
    for n in numbers:
        result = n * n
        q.put((pid, proc_name, n, result))
        time.sleep(0.01) # Simulate work
    print(f"({proc_name}) Finished putting squares.")

if __name__ == "__main__":
    print("\n--- Multiprocessing Queue Demo ---")
    
    task_queue = Queue() # Process-safe queue
    data = list(range(10))
    
    # Create and start a process to put data onto the queue
    producer_process = Process(target=worker_puts_squares, args=(data, task_queue),
                               name="ProducerProc")
    producer_process.start()
    
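    # Main process gets data from the queue.
    # Joining before draining is only safe here because the payload is tiny:
    # a child that has put a lot of data on a Queue does not exit until that
    # data has been flushed to the pipe, so join() could otherwise deadlock.
    producer_process.join()
    print("Producer process finished.")
    
    # qsize() is approximate and raises NotImplementedError on macOS
    print(f"Approximate queue size: {task_queue.qsize()}")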
    
    print("Retrieving results from queue:")
    while not task_queue.empty():
        try:
            pid, proc_name, original, square = task_queue.get(timeout=1)
            print(f"  Got: (Original={original}, Square={square}) from {proc_name} (PID {pid})")
        except queue.Empty:
            print("Queue became empty while getting.")
            break
        except Exception as e:
            print(f"Error getting from queue: {e}")
            break
            
    print("Queue demo finished.")


--- Multiprocessing Queue Demo ---
(ProducerProc) Finished putting squares.
Producer process finished.
Approximate queue size: 10
Retrieving results from queue:
  Got: (Original=0, Square=0) from ProducerProc (PID 11024)
  Got: (Original=1, Square=1) from ProducerProc (PID 11024)
  Got: (Original=2, Square=4) from ProducerProc (PID 11024)
  Got: (Original=3, Square=9) from ProducerProc (PID 11024)
  Got: (Original=4, Square=16) from ProducerProc (PID 11024)
  Got: (Original=5, Square=25) from ProducerProc (PID 11024)
  Got: (Original=6, Square=36) from ProducerProc (PID 11024)
  Got: (Original=7, Square=49) from ProducerProc (PID 11024)
  Got: (Original=8, Square=64) from ProducerProc (PID 11024)
  Got: (Original=9, Square=81) from ProducerProc (PID 11024)
Queue demo finished.
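
A more robust pattern drains the queue before joining the producer and uses a sentinel to mark the end of the stream, instead of relying on `empty()` or `qsize()`. This is a minimal sketch under that assumption: `SENTINEL`, `produce_squares`, and `collect_squares` are hypothetical names, not part of the notebook.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical end-of-stream marker chosen for this sketch


def produce_squares(numbers, q):
    """Child: puts (n, n*n) pairs on the queue, then the sentinel."""
    for n in numbers:
        q.put((n, n * n))
    q.put(SENTINEL)


def collect_squares(numbers):
    """Parent: consumes until the sentinel arrives, and only then joins."""
    q = Queue()
    producer = Process(target=produce_squares, args=(list(numbers), q))
    producer.start()

    results = []
    while True:
        item = q.get()  # blocks until the producer has sent something
        if item is SENTINEL:
            break
        results.append(item)

    producer.join()  # safe: everything the child queued has been consumed
    return results


if __name__ == "__main__":
    print(collect_squares(range(5)))
```

With a single producer the items arrive in the order they were put, so the consumer needs no extra bookkeeping. With several producers, one sentinel per producer is a common convention.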


### 3.2 `multiprocessing.Pipe`
*   Returns a pair of `Connection` objects representing the two ends of a pipe.
*   Each connection object has `send()` and `recv()` methods.
*   Primarily for two-way communication between **two** specific processes.
*   Data is pickled/unpickled.

In [3]:
from multiprocessing import Process, Pipe
from multiprocessing.connection import Connection
import time

def sender(conn: Connection):
    """Sends messages through one end of the pipe."""
    print("Sender: Sending messages...")
    conn.send("Hello from sender!")
    time.sleep(0.5)
    conn.send([1, 2, {'data': 'payload'}])
    time.sleep(0.5)
    conn.send(None) # Signal end
    conn.close()
    print("Sender: Done and closed.")

def receiver(conn: Connection):
    """Receives messages from the other end of the pipe."""
    print("Receiver: Waiting for messages...")
    while True:
        try:
            msg = conn.recv() # Blocks until message received
            if msg is None: # Check for end signal
                print("Receiver: Received None, stopping.")
                break
            print(f"Receiver: Received --> {msg}")
        except EOFError: # Raised when nothing is left to receive and the other end has been closed
            print("Receiver: Connection closed by sender.")
            break
    conn.close()
    print("Receiver: Closed.")

if __name__ == "__main__":
    print("\n--- Multiprocessing Pipe Demo ---")
    
    # Create the pipe (returns two connection objects)
    parent_conn, child_conn = Pipe()
    
    # Create processes, passing one end of the pipe to each
    p_sender = Process(target=sender, args=(child_conn,), name="SenderProc")
    p_receiver = Process(target=receiver, args=(parent_conn,), name="ReceiverProc")
    
    p_receiver.start()
    p_sender.start()
    
    p_sender.join()
    p_receiver.join()
    
    # Close connections in the main process too (though child processes closed theirs)
    parent_conn.close()
    child_conn.close()
    
    print("Pipe demo finished.")


--- Multiprocessing Pipe Demo ---
Receiver: Waiting for messages...
Sender: Sending messages...
Receiver: Received --> Hello from sender!
Receiver: Received --> [1, 2, {'data': 'payload'}]
Sender: Done and closed.Receiver: Received None, stopping.

Receiver: Closed.
Pipe demo finished.
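
Since `Pipe()` is duplex by default, the same connection can carry requests one way and replies the other. The sketch below illustrates that request/response shape; `echo_upper` and `ask` are hypothetical names introduced here, and `None` is an assumed stop signal as in the demo above.

```python
from multiprocessing import Pipe, Process


def echo_upper(conn):
    """Child: answers each request with its upper-cased text until told to stop."""
    while True:
        msg = conn.recv()
        if msg is None:  # assumed stop signal
            break
        conn.send(msg.upper())
    conn.close()


def ask(words):
    """Parent: sends each word and waits for the matching reply."""
    parent_conn, child_conn = Pipe()  # duplex=True by default
    worker = Process(target=echo_upper, args=(child_conn,))
    worker.start()
    child_conn.close()  # the parent only uses its own end

    replies = []
    for word in words:
        parent_conn.send(word)
        replies.append(parent_conn.recv())

    parent_conn.send(None)
    worker.join()
    parent_conn.close()
    return replies


if __name__ == "__main__":
    print(ask(["ping", "pong"]))
```

Closing the unused end in the parent is what allows `recv()` to raise `EOFError` if the child ever exits early, rather than blocking forever.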


### 3.3 Shared Memory (`Value`, `Array`)
*   Allows processes to share data more directly using shared memory blocks managed by the OS.
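*   Requires a fixed C data type for the shared object, given either as an `array`-style type code (such as `'i'` or `'d'`) or as a `ctypes` type.
*   Faster than Queue/Pipe for simple data types as it avoids pickling overhead.
*   **Compound updates need a lock.** `Value` and `Array` are created with an internal lock by default (available via `get_lock()`), but read-modify-write operations such as `value += 1` are not atomic. Hold that lock, or a separate `multiprocessing.Lock`, around such updates to prevent race conditions when multiple processes modify the shared memory concurrently.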

In [4]:
from multiprocessing import Process, Value, Array, Lock
import time

# Type codes: 'i' for signed int, 'd' for double-precision float, 'c' for char
# See ctypes documentation for more types

def modify_shared(shared_val: Value, shared_arr: Array, lock: Lock):
    """Modifies shared memory objects using a lock."""
    for _ in range(500): # Fewer iterations for demo speed
        with lock: # Acquire lock
            shared_val.value += 1
            for i in range(len(shared_arr)):
                shared_arr[i] += 0.5
        # Lock automatically released
        time.sleep(0.001) # Tiny sleep to increase chance of context switch

if __name__ == "__main__":
    print("\n--- Multiprocessing Shared Memory Demo ---")
    
    # Create shared objects
    shared_int = Value('i', 0) # Shared signed integer, initial value 0
    shared_double_array = Array('d', [10.0, 20.0, 30.0]) # Shared double array
    
    # Create a lock for synchronization
    lock = Lock()
    
    print(f"Initial Value: {shared_int.value}")
    print(f"Initial Array: {list(shared_double_array)}")

    p1 = Process(target=modify_shared, args=(shared_int, shared_double_array, lock))
    p2 = Process(target=modify_shared, args=(shared_int, shared_double_array, lock))

    p1.start()
    p2.start()

    p1.join()
    p2.join()

    # Value should be 2 * 500 = 1000
    # Each array element should be initial + 2 * 500 * 0.5 = initial + 500
    print(f"Final Value: {shared_int.value}") 
    print(f"Final Array: {list(shared_double_array)}")
    
    print("Shared memory demo finished.")


--- Multiprocessing Shared Memory Demo ---
Initial Value: 0
Initial Array: [10.0, 20.0, 30.0]
Final Value: 1000
Final Array: [510.0, 520.0, 530.0]
Shared memory demo finished.
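
When only a single `Value` needs protecting, the lock that comes with it can be used instead of passing a separate `Lock`. A minimal sketch, with `add_many` and `count_with_builtin_lock` as hypothetical names:

```python
from multiprocessing import Process, Value


def add_many(counter, times):
    """Increments the shared counter, holding its built-in lock for each update."""
    for _ in range(times):
        with counter.get_lock():  # makes the read-modify-write atomic
            counter.value += 1


def count_with_builtin_lock(workers=4, times=1000):
    counter = Value("i", 0)  # lock=True is the default
    procs = [Process(target=add_many, args=(counter, times)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(f"Final count: {count_with_builtin_lock()}")
```

A separate `Lock`, as in the demo above, remains the better fit when several shared objects must change together as one unit.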


### 3.4 Server Process (`multiprocessing.Manager`)
*   Starts a separate manager process.
*   This manager process can host shared Python objects like lists, dicts, queues, etc. (`manager.list()`, `manager.dict()`).
*   Other processes communicate with the manager process via proxies to access and modify these shared objects.
*   More flexible than `Value`/`Array` (supports complex Python objects), but slower due to communication overhead with the manager process.
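*   Handles synchronization for individual method calls on managed objects (e.g., `managed_list.append()` is process-safe). Compound operations such as `managed_dict[key] += 1` consist of several calls and still need a lock (e.g., `manager.Lock()`).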

In [5]:
from multiprocessing import Process, Manager
import time
import os

def worker_modifies_managed(managed_dict, managed_list, worker_id):
    """Modifies objects hosted by the Manager process."""
    pid = os.getpid()
    print(f"Worker {worker_id} (PID {pid}) starting.")
    
    # Modifications via proxy objects are sent to the manager process
    managed_dict[worker_id] = pid
    managed_list.append(f"Data from {worker_id}")
    
    # Reading is also via proxy
    print(f"Worker {worker_id}: Current dict view = {dict(managed_dict)}")
    time.sleep(0.5)
    print(f"Worker {worker_id} finished.")

if __name__ == "__main__":
    print("\n--- Multiprocessing Manager Demo ---")
    
    # Create a Manager object (starts a server process)
    with Manager() as manager:
        print("Manager process started.")
        
        # Create managed shared objects
        shared_dict = manager.dict() 
        shared_list = manager.list()
        
        processes = []
        for i in range(3):
            p = Process(target=worker_modifies_managed, 
                        args=(shared_dict, shared_list, i+1))
            processes.append(p)
            p.start()
            
        # Wait for worker processes to finish
        for p in processes:
            p.join()
            
        # Access the final state of the managed objects (from main process)
        print("\n--- Final Managed Objects State ---")
        # Convert proxies to regular types for printing if needed
        print(f"Final Dict: {dict(shared_dict)}")
        print(f"Final List: {list(shared_list)}")
        
    # Manager process is automatically shut down when exiting the 'with' block
    print("Manager demo finished.")


--- Multiprocessing Manager Demo ---
Manager process started.
Worker 1 (PID 11073) starting.
Worker 2 (PID 11076) starting.
Worker 1: Current dict view = {1: 11073}
Worker 2: Current dict view = {1: 11073, 2: 11076}
Worker 3 (PID 11083) starting.
Worker 3: Current dict view = {1: 11073, 2: 11076, 3: 11083}
Worker 1 finished.
Worker 2 finished.
Worker 3 finished.

--- Final Managed Objects State ---
Final Dict: {1: 11073, 2: 11076, 3: 11083}
Final List: ['Data from 1', 'Data from 2', 'Data from 3']
Manager demo finished.
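
The workers above each write to a different key, so no locking is needed. When several processes update the same entry, the read and the write are separate proxy calls and must be grouped under a lock. The sketch below assumes a shared `"hits"` counter; `bump` and `count_hits` are hypothetical names.

```python
from multiprocessing import Manager, Process


def bump(shared, lock, times):
    """Increments shared['hits']; the lock groups the read and the write."""
    for _ in range(times):
        with lock:
            shared["hits"] += 1  # two proxy calls: __getitem__, then __setitem__


def count_hits(workers=3, times=50):
    with Manager() as manager:
        shared = manager.dict({"hits": 0})
        lock = manager.Lock()  # a lock hosted by the manager process
        procs = [Process(target=bump, args=(shared, lock, times)) for _ in range(workers)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return shared["hits"]  # read before the manager shuts down


if __name__ == "__main__":
    print(f"Total hits: {count_hits()}")
```

Without the lock, two workers can read the same old value and one increment is lost, which is the same race as with unprotected shared memory.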


## 4. Synchronization Between Processes

Similar to `threading`, the `multiprocessing` module provides synchronization primitives like `Lock`, `RLock`, `Semaphore`, `Event`, `Condition`. Their API is nearly identical, but they operate across process boundaries.

As shown in the `Value`/`Array` example (Section 3.3), these are crucial when using shared memory (`Value`, `Array`) to prevent race conditions.
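
The other primitives follow the same pattern. As a rough sketch, an `Event` can hold workers back until the parent gives a start signal, and a `Semaphore` can cap how many of them run a section at once. All function and variable names here (`gated_worker`, `run_gated_workers`, `go`, `slots`, and so on) are hypothetical, and the `sleep` stands in for real work.

```python
import time
from multiprocessing import Event, Process, Semaphore, Value


def gated_worker(go, slots, active, peak, done):
    """Waits for the start signal, then 'works' while holding a semaphore slot."""
    go.wait()  # Event: block until the parent calls go.set()
    with slots:  # Semaphore: limits how many workers are in this block at once
        with active.get_lock():
            active.value += 1
            peak.value = max(peak.value, active.value)
        time.sleep(0.05)  # stand-in for real work
        with active.get_lock():
            active.value -= 1
            done.value += 1


def run_gated_workers(num_workers=4, max_concurrent=2):
    go = Event()
    slots = Semaphore(max_concurrent)
    active = Value("i", 0)  # workers currently inside the semaphore
    peak = Value("i", 0)    # highest value `active` ever reached
    done = Value("i", 0)    # workers that have finished
    procs = [
        Process(target=gated_worker, args=(go, slots, active, peak, done))
        for _ in range(num_workers)
    ]
    for p in procs:
        p.start()
    go.set()  # release all waiting workers
    for p in procs:
        p.join()
    return done.value, peak.value


if __name__ == "__main__":
    finished, max_seen = run_gated_workers()
    print(f"{finished} workers finished; at most {max_seen} ran concurrently")
```

The observed peak never exceeds `max_concurrent`, although it may be lower on a busy machine.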

## 5. Process Pools (`multiprocessing.Pool`)

Manages a pool of worker processes. Useful for distributing tasks across available CPU cores without manually managing individual `Process` objects.

**Key Methods:**
*   `pool.map(func, iterable, chunksize=None)`: Applies `func` to each item in `iterable`. Chops iterable into chunks. Blocks until all results are ready. Returns results in order.
*   `pool.imap(func, iterable, chunksize=1)`: Like `map`, but returns a lazy iterator that yields results in input order as they become available. More memory efficient for large inputs or result sets.
*   `pool.imap_unordered(func, iterable, chunksize=1)`: Like `imap`, but results are yielded as soon as they complete, regardless of input order.
*   `pool.apply(func, args=(), kwds={})`: Executes `func(*args, **kwds)` in ONE worker process. Blocks until complete. Useful for single tasks.
*   `pool.apply_async(func, args=(), kwds={}, callback=None, error_callback=None)`: Asynchronous version of `apply`. Returns an `AsyncResult` object immediately. `callback(result)` is called on success, `error_callback(exception)` on error.
*   `pool.map_async(...)`, `pool.starmap_async(...)`: Asynchronous versions of map/starmap.
*   `pool.close()`: Prevents new tasks from being submitted.
*   `pool.join()`: Waits for worker processes to exit (must call `close()` or `terminate()` first).

**Important:** Objects passed to/returned from pool workers must be pickleable.

In [6]:
from multiprocessing import Pool
import time
import os

def calculate_square(n):
    # Simulate CPU work
    # time.sleep(0.01)
    pid = os.getpid()
    # print(f"PID {pid} calculating square of {n}") # Can be noisy
    return n * n

if __name__ == "__main__":
    print("\n--- Multiprocessing Pool Demo ---")
    numbers_to_square = list(range(20)) # Example data
    
    # Determine pool size (often based on CPU count)
    pool_size = os.cpu_count()
    print(f"Creating Pool with {pool_size} workers.")
    
    start_pool = time.perf_counter()
    
    # Use Pool as a context manager (__exit__ calls terminate(), so collect all results inside the block)
    with Pool(processes=pool_size) as pool:
        print("Using pool.map()...")
        # map() distributes work and collects results in order
        results_map = pool.map(calculate_square, numbers_to_square, chunksize=5)
        print(f"  Results (map): {results_map}")
        
        print("\nUsing pool.imap_unordered()...")
        # imap_unordered yields results as they finish (order not guaranteed)
        results_imap = []
        for result in pool.imap_unordered(calculate_square, numbers_to_square, chunksize=5):
            # print(f"  Got result (imap): {result}") # Can be noisy
            results_imap.append(result)
        print(f"  Results (imap_unordered, sorted): {sorted(results_imap)}") # Sort for comparison
        
        # Example with apply_async (for individual tasks)
        print("\nUsing pool.apply_async()...")
        async_results = []
        for num in [100, 200, 300]:
             res_obj = pool.apply_async(calculate_square, args=(num,))
             async_results.append(res_obj)
             
        # Retrieve results from AsyncResult objects
        final_async_results = [ar.get() for ar in async_results] 
        print(f"  Results (apply_async): {final_async_results}")
        
    # Pool workers are terminated here (the context manager calls terminate(), not close()/join())
    
    end_pool = time.perf_counter()
    print(f"\nPool execution finished in {end_pool - start_pool:.2f} seconds")


--- Multiprocessing Pool Demo ---
Creating Pool with 16 workers.
Using pool.map()...
  Results (map): [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]

Using pool.imap_unordered()...
  Results (imap_unordered, sorted): [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]

Using pool.apply_async()...
  Results (apply_async): [10000, 40000, 90000]

Pool execution finished in 0.17 seconds
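
Two items from the method list are not exercised above: `starmap` for functions that take several arguments, and the way an exception raised in a worker resurfaces in the parent. The following sketch covers both; `power`, `safe_divide`, and `run_pool_examples` are hypothetical names.

```python
from multiprocessing import Pool


def power(base, exponent):
    return base ** exponent


def safe_divide(a, b):
    return a / b  # raises ZeroDivisionError in the worker when b == 0


def run_pool_examples():
    with Pool(processes=2) as pool:
        # starmap unpacks each tuple into positional arguments
        powers = pool.starmap(power, [(2, 3), (3, 2), (10, 0)])

        ok = pool.apply_async(safe_divide, (6, 3))
        bad = pool.apply_async(safe_divide, (1, 0))

        quotient = ok.get(timeout=10)
        try:
            bad.get(timeout=10)  # the worker's exception is re-raised here
            error_name = None
        except ZeroDivisionError as exc:
            error_name = type(exc).__name__
    return powers, quotient, error_name


if __name__ == "__main__":
    print(run_pool_examples())
```

All results are fetched inside the `with` block on purpose, because leaving the block stops the workers.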


## 6. Modern Approach: `concurrent.futures.ProcessPoolExecutor`

Provides a high-level interface similar to `ThreadPoolExecutor`, but uses processes instead of threads. Often preferred for its simpler API compared to manually managing `Pool` or `Process` objects.

**Key Methods:**
*   `executor.submit(fn, *args, **kwargs)`: Submits a callable to be executed. Returns a `Future` object.
*   `executor.map(func, *iterables, timeout=None, chunksize=1)`: Similar to `Pool.map`, applies `func` to items from iterables. Returns an iterator yielding results.
*   `executor.shutdown(wait=True, *, cancel_futures=False)`: Signals the executor to stop accepting new tasks and shuts down (waits for pending tasks if `wait=True`). Automatically called when using a context manager.

In [7]:
from concurrent.futures import ProcessPoolExecutor
import time
import os

# Function needs to be defined at top level or importable for ProcessPoolExecutor
def calculate_cube(n):
    pid = os.getpid()
    # print(f"PID {pid} calculating cube of {n}")
    # time.sleep(0.01) # Simulate work
    return n * n * n

if __name__ == "__main__": # Essential for ProcessPoolExecutor too
    print("\n--- ProcessPoolExecutor Demonstration ---")
    numbers_to_cube = list(range(15))
    
    start_ppe = time.perf_counter()
    
    # Use context manager for automatic shutdown
    # max_workers defaults to os.cpu_count()
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as executor:
        print("Submitting tasks using executor.map...")
        # map returns results in order, waits for all
        results = list(executor.map(calculate_cube, numbers_to_cube, chunksize=4))
        print(f"  Results (map): {results}")
        
        # Example using submit
        # print("\nSubmitting tasks using executor.submit...")
        # futures = [executor.submit(calculate_cube, num) for num in [100, 200, 300]]
        # submit_results = [f.result() for f in futures]
        # print(f"  Results (submit): {submit_results}")
        
    # Executor automatically shut down here
    
    end_ppe = time.perf_counter()
    print(f"\nProcessPoolExecutor finished in {end_ppe - start_ppe:.2f} seconds")


--- ProcessPoolExecutor Demonstration ---
Submitting tasks using executor.map...
  Results (map): [0, 1, 8, 27, 64, 125, 216, 343, 512, 729, 1000, 1331, 1728, 2197, 2744]

ProcessPoolExecutor finished in 0.15 seconds
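
For tasks that may fail individually, `submit()` combined with `as_completed()` lets each result or exception be handled as it arrives. This is a minimal sketch with hypothetical names (`checked_isqrt`, `run_submit_demo`); a negative input is used as the assumed failure case.

```python
import math
from concurrent.futures import ProcessPoolExecutor, as_completed


def checked_isqrt(n):
    """Integer square root; rejects negative input with a ValueError."""
    if n < 0:
        raise ValueError(f"negative input: {n}")
    return math.isqrt(n)


def run_submit_demo(values):
    results, errors = {}, {}
    with ProcessPoolExecutor(max_workers=2) as executor:
        # Map each Future back to the input that produced it
        future_to_input = {executor.submit(checked_isqrt, v): v for v in values}
        for future in as_completed(future_to_input):
            value = future_to_input[future]
            try:
                results[value] = future.result()  # re-raises worker exceptions
            except ValueError as exc:
                errors[value] = str(exc)
    return results, errors


if __name__ == "__main__":
    print(run_submit_demo([16, 25, -4]))
```

Futures complete in arbitrary order, so the dictionary keyed by `Future` is what ties each outcome back to its input.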


## 7. Process Start Methods

`multiprocessing` can use different methods to start child processes, configurable via `multiprocessing.set_start_method()` (must be called only once, ideally within the `if __name__ == '__main__':` block before creating processes/pools).

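*   **`fork` (Historical default on Unix/Linux):** Creates a child process by duplicating the parent process's memory space. It was the default on POSIX systems other than macOS through Python 3.13; Python 3.14 changed that default to `forkserver`.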
    *   **Pros:** Very fast startup.
    *   **Cons:** Can be problematic with threaded applications (only the thread calling `fork` exists in the child), issues with shared resources like file descriptors or locks acquired before forking, potential for copy-on-write performance hits.
*   **`spawn` (Default on Windows/macOS, available on Unix):** Starts a fresh Python interpreter process. The child process only inherits necessary resources to run the target function, plus specified arguments.
    *   **Pros:** Cleaner separation, avoids issues related to forking threaded applications.
    *   **Cons:** Slower startup as a new interpreter needs to start and the necessary code/data needs to be imported/pickled.
*   **`forkserver` (Available on Unix):** Starts a server process when the first process is created. Subsequent processes are forked from this server process.
    *   **Pros:** Avoids issues with forking from threaded processes, potentially faster than `spawn` for subsequent process creation.
    *   **Cons:** Still slower initial startup than `fork`.

**Recommendation:** Although `fork` has long been the default on Linux (through Python 3.13), `spawn` or `forkserver` are often considered safer, especially in complex or threaded applications, despite the startup overhead. `spawn` is generally the most portable.

In [8]:
import multiprocessing as mp
import os  # needed for os.getpid() / os.getppid() in info_func
import sys

def info_func(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

if __name__ == "__main__":
    print("--- Process Start Method Demo ---")
    # Get available methods
    print(f"Available start methods: {mp.get_all_start_methods()}")
    # Get current default method
    print(f"Default start method: {mp.get_start_method()}")

    # --- Set start method (Optional - do this BEFORE creating processes/pools) ---
    # Generally only needed if you want to override the default for your OS.
    # try:
    #     if sys.platform != 'win32': # spawn is default on Windows
    #          mp.set_start_method('spawn')
    #          print(f"Start method set to: {mp.get_start_method()}")
    #     else:
    #          print("Cannot set start method easily after context might be initialized on Windows.")
    # except (RuntimeError, ValueError) as e:
    #     # RuntimeError if called more than once or after context initialized
    #     # ValueError if method not available
    #     print(f"Could not set start method: {e}")
    
    print("\nRunning info_func in main process:")
    info_func('Main line')
    
    p = mp.Process(target=info_func, args=('Child process',))
    p.start()
    p.join()
    
    print("Start method demo finished.")

--- Process Start Method Demo ---
Available start methods: ['fork', 'spawn', 'forkserver']
Default start method: fork

Running info_func in main process:
Main line
module name: __main__
parent process: 7099
process id: 10985
Child process
module name: __main__
parent process: 10985
process id: 11167
Start method demo finished.
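
When only part of a program needs a particular start method, `multiprocessing.get_context()` returns a context object with the same API as the module, without changing the global default. A minimal sketch, assuming `spawn` as the chosen method; `report_pid` and `run_with_context` are hypothetical names.

```python
import multiprocessing as mp
import os


def report_pid(q):
    """Child: sends its own PID back to the parent."""
    q.put(os.getpid())


def run_with_context(method="spawn"):
    ctx = mp.get_context(method)  # leaves the global default untouched
    q = ctx.Queue()               # create IPC objects from the same context
    child = ctx.Process(target=report_pid, args=(q,))
    child.start()
    child_pid = q.get(timeout=30)
    child.join()
    return child_pid, child.exitcode


if __name__ == "__main__":
    pid, code = run_with_context("spawn")
    print(f"Parent PID {os.getpid()}, child PID {pid}, exit code {code}")
```

Queues, locks, and processes should all come from the same context, since objects created for one start method may not work with another.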


## 8. Best Practices & Enterprise Considerations

1.  **Use for CPU-Bound Tasks:** Multiprocessing excels where the GIL limits threading.
2.  **Guard Main Module:** Always protect the script entry point with `if __name__ == '__main__':`.
3.  **Prefer High-Level APIs:** Use `concurrent.futures.ProcessPoolExecutor` or `multiprocessing.Pool` over manual `Process` management where possible, as they handle worker lifecycle and task distribution.
4.  **Minimize IPC:** Inter-process communication (Queues, Pipes, Managers) involves overhead (serialization, OS calls). Design to minimize data transfer between processes. Pass simple data or references if possible.
5.  **Pickleable Objects:** Ensure data passed between processes (via Queues, Pipes, Pool arguments/results) is pickleable. Complex objects, closures, or generators might not be.
6.  **Resource Management:** Ensure processes release resources (files, connections) properly, especially if using `daemon=True` or `pool.terminate()`.
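7.  **Error Handling:** Exceptions in child processes don't automatically propagate to the parent. A bare `Process` reports failure only through a non-zero `exitcode`; `Pool` and executor results re-raise the exception when you retrieve them (`.get()`, `.result()`, or iterating `map` results).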
8.  **Synchronization:** Use `multiprocessing` locks/events when using shared memory (`Value`, `Array`) or coordinating processes.
9.  **Consider Start Methods:** Be aware of the implications of `fork` vs. `spawn` vs. `forkserver`, especially regarding resource inheritance and compatibility with other libraries (like some GUI toolkits or network libraries).
10. **Logging:** Configuring logging across multiple processes requires care (e.g., using separate files per process, a queue handler to send logs to a central logging process, or process-aware formatters).
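
The queue-handler option from the logging item can be sketched with the standard library's `QueueHandler` and `QueueListener`. This is one possible arrangement, not a production setup: `ListHandler`, `log_from_worker`, `run_logging_demo`, and the `demo.worker` logger names are hypothetical, and the in-memory handler stands in for a real file or stream handler.

```python
import logging
import logging.handlers
from multiprocessing import Process, Queue


class ListHandler(logging.Handler):
    """Collects formatted messages in memory; a stand-in for a file handler."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def log_from_worker(log_queue, worker_id):
    """Child: sends its log records to the parent through the queue."""
    logger = logging.getLogger(f"demo.worker{worker_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicates via inherited root handlers
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.info("hello from worker %d", worker_id)


def run_logging_demo(num_workers=3):
    log_queue = Queue()
    collector = ListHandler()
    # The listener thread in the parent forwards queued records to the handler
    listener = logging.handlers.QueueListener(log_queue, collector)
    listener.start()

    workers = [Process(target=log_from_worker, args=(log_queue, i)) for i in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    listener.stop()  # processes the remaining records, then stops the thread
    return sorted(collector.messages)


if __name__ == "__main__":
    for line in run_logging_demo():
        print(line)
```

Only the parent touches the real handler, so output from different processes cannot interleave within a line.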

## 9. Pitfalls and Common Interview Questions

**Common Pitfalls:**

*   **Forgetting `if __name__ == '__main__':`:** Leading to recursive process creation or errors, especially on Windows/macOS with `spawn`.
*   **Serialization (Pickling) Errors:** Trying to pass unpickleable objects (lambdas, complex local objects, generators) between processes.
*   **IPC Overhead:** Excessive communication between processes becoming a bottleneck.
*   **Race Conditions with Shared Memory:** Using `Value` or `Array` without proper locking.
*   **Deadlocks:** Processes waiting for resources/locks held by each other.
*   **Resource Leaks:** Child processes not cleaning up resources properly.
*   **Zombie Processes:** Parent process not `join()`ing child processes, leaving them in the system process table (less common with high-level APIs).
*   **Complexity:** Managing IPC and synchronization can be significantly more complex than in threading.
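
The pickling pitfall can be checked before any process is started. The helper below is a hypothetical convenience (`is_picklable` is not a library function) that simply attempts `pickle.dumps` and reports whether it succeeded.

```python
import math
import pickle


def is_picklable(obj) -> bool:
    """Returns True if `obj` survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


if __name__ == "__main__":
    candidates = {
        "importable function (math.sqrt)": math.sqrt,
        "lambda": lambda x: x * x,
        "generator": (i for i in range(3)),
        "plain data": {"values": [1, 2, 3]},
    }
    for label, obj in candidates.items():
        print(f"{label}: picklable={is_picklable(obj)}")
```

Functions are pickled by reference to their importable name, which is why top-level functions work as process targets while lambdas do not.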

**Common Interview Questions:**

1.  Why would you use multiprocessing instead of threading in Python?
2.  How does multiprocessing bypass the GIL?
3.  What does the `if __name__ == '__main__':` guard do, and why is it essential for multiprocessing?
4.  How can processes share data? Describe at least two IPC mechanisms (`Queue`, `Pipe`, `Value`, `Array`, `Manager`).
5.  What are the trade-offs between different IPC methods (e.g., Queue vs. Shared Memory)?
6.  What is `multiprocessing.Pool` used for?
7.  What is the difference between `pool.map()` and `pool.apply_async()`?
8.  What is `concurrent.futures.ProcessPoolExecutor`?
9.  What are process start methods (`fork`, `spawn`), and what are their implications?
10. Can you pass complex objects like database connections between processes directly? Why or why not? (Usually not - pickling issues, resource handles are process-specific).

## 10. Challenge: Parallel Prime Number Calculation

**Goal:** Find all prime numbers within a given range using multiple processes to speed up the calculation.

**Tasks:**

1.  **Primality Test Function:** Create a reasonably efficient function `is_prime(n: int) -> bool` that checks if a number `n` is prime. (For numbers > 2, check divisibility only by odd numbers up to `sqrt(n)`).
2.  **Define Range and Chunk Size:** Choose a large range (e.g., 1 to 2,000,000) and decide how to split it among worker processes (e.g., determine chunk size based on number of CPU cores).
3.  **Worker Function:** Create a function `find_primes_in_range(start: int, end: int) -> List[int]` that takes a start and end value and returns a list of all prime numbers found within that sub-range `[start, end)` by calling `is_prime()`.
4.  **Parallel Execution:**
    *   Use `concurrent.futures.ProcessPoolExecutor` (recommended) or `multiprocessing.Pool`.
    *   Divide the total range into chunks.
    *   Submit the `find_primes_in_range` function to the executor/pool for each chunk.
5.  **Collect Results:** Gather the lists of primes returned from each worker process.
6.  **Combine and Sort:** Combine all the lists into a single list of primes found in the total range. Sort the final list.
7.  **Output:** Print the total number of primes found and maybe the first/last few primes.
8.  **Timing:** Measure the time taken for the parallel execution.

**(Bonus):** Implement the same calculation sequentially and compare the execution time.
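
Tasks 2 and 4 hinge on splitting the range into half-open chunks. A minimal sketch of one way to do this follows; the helper name `split_range` and its parameters are assumptions for illustration, not part of the challenge statement. It distributes the remainder one element at a time, so chunk sizes differ by at most one.

```python
from typing import List, Tuple


def split_range(start: int, stop: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Split [start, stop) into up to n_chunks contiguous half-open ranges."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    total = max(0, stop - start)
    base, extra = divmod(total, n_chunks)
    chunks = []
    lo = start
    for i in range(n_chunks):
        # The first `extra` chunks each take one leftover element
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer numbers than chunks; nothing left to hand out
        chunks.append((lo, lo + size))
        lo += size
    return chunks


print(split_range(1, 11, 3))  # [(1, 5), (5, 8), (8, 11)]
```

Equal-width chunks are not equal-cost here, because larger numbers need more trial divisions. Creating more chunks than worker processes is one way to even out the load.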

In [9]:
# --- Solution Space for Challenge ---
import math
import time
import os
from concurrent.futures import ProcessPoolExecutor, as_completed
from typing import List, Tuple
import logging

logging.basicConfig(level=logging.INFO, 
                    format='[%(levelname)s] (%(processName)s) %(message)s',
                    force=True)

# 1. Primality Test Function
def is_prime(n: int) -> bool:
    """Checks if a number is prime."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    # Check odd divisors up to and including floor(sqrt(n));
    # math.isqrt avoids float rounding issues for very large n
    limit = math.isqrt(n)
    for i in range(3, limit + 1, 2):
        if n % i == 0:
            return False
    return True

# 3. Worker Function (also returns the chunk bounds so each result can be logged per chunk)
def find_primes_in_range(start: int, end: int) -> Tuple[int, int, List[int]]:
    """Finds primes in the range [start, end) and returns (start, end, primes_list)."""
    # logging.info(f"Checking range [{start}, {end})...") # Can be very noisy
    primes = [n for n in range(start, end) if is_prime(n)]
    # logging.info(f"Finished range [{start}, {end}). Found {len(primes)} primes.")
    return start, end, primes

# --- Main Execution Guard ---
if __name__ == "__main__":
    # 2. Define Range and Chunk Size
    MAX_NUMBER = 1_000_000 # Adjust as needed (2M might take a while)
    NUM_PROCESSES = os.cpu_count() or 1  # cpu_count() can return None
    # One chunk per process; ceiling division so the chunks cover the whole range
    chunk_size = (MAX_NUMBER + NUM_PROCESSES - 1) // NUM_PROCESSES
    
    logging.info(f"Calculating primes up to {MAX_NUMBER}")
    logging.info(f"Using {NUM_PROCESSES} processes with chunk size ~{chunk_size}")

    # 4. Parallel Execution & 5. Collect Results
    start_time_mp = time.perf_counter()
    all_found_primes = []
    tasks = []

    with ProcessPoolExecutor(max_workers=NUM_PROCESSES) as executor:
        # Create tasks for each chunk
        for i in range(NUM_PROCESSES):
            range_start = i * chunk_size + 1  # Chunks start at 1, chunk_size + 1, ...
            # Clamp the end so the last chunk stops at MAX_NUMBER (inclusive)
            range_end = min(MAX_NUMBER + 1, range_start + chunk_size)

            # Skip empty ranges, which can occur when chunking is uneven
            if range_start >= range_end:
                continue
                 
            logging.info(f"Submitting task for range [{range_start}, {range_end})")
            future = executor.submit(find_primes_in_range, range_start, range_end)
            tasks.append(future)

        # Collect results as they complete
        logging.info("Waiting for results...")
        for future in as_completed(tasks):
            try:
                s, e, primes_in_chunk = future.result()
                logging.info(f"Received {len(primes_in_chunk)} primes from range [{s}, {e})")
                all_found_primes.extend(primes_in_chunk)
            except Exception as exc:
                # Log any exception raised by the worker function
                logging.error(f"A task generated an exception: {exc}")
    
    # 6. Combine and Sort (already combined, just sort)
    all_found_primes.sort()
    
    end_time_mp = time.perf_counter()

    # 7. Output
    print("\n--- Parallel Calculation Results ---")
    print(f"Total primes found up to {MAX_NUMBER}: {len(all_found_primes)}")
    print(f"First 10 primes: {all_found_primes[:10]}")
    print(f"Last 10 primes: {all_found_primes[-10:]}")
    print(f"Time taken (multiprocessing): {end_time_mp - start_time_mp:.2f} seconds")

    # Bonus: Sequential Comparison
    # print("\n--- Sequential Calculation (for comparison) ---")
    # start_time_seq = time.perf_counter()
    # _, _, sequential_primes = find_primes_in_range(1, MAX_NUMBER + 1)
    # end_time_seq = time.perf_counter()
    # print(f"Total primes found (sequential): {len(sequential_primes)}")
    # print(f"Time taken (sequential): {end_time_seq - start_time_seq:.2f} seconds")


[INFO] (MainProcess) Calculating primes up to 1000000
[INFO] (MainProcess) Using 16 processes with chunk size ~62500
[INFO] (MainProcess) Submitting task for range [1, 62501)
[INFO] (MainProcess) Submitting task for range [62501, 125001)
[INFO] (MainProcess) Submitting task for range [125001, 187501)
[INFO] (MainProcess) Submitting task for range [187501, 250001)
[INFO] (MainProcess) Submitting task for range [250001, 312501)
[INFO] (MainProcess) Submitting task for range [312501, 375001)
[INFO] (MainProcess) Submitting task for range [375001, 437501)
[INFO] (MainProcess) Submitting task for range [437501, 500001)
[INFO] (MainProcess) Submitting task for range [500001, 562501)
[INFO] (MainProcess) Submitting task for range [562501, 625001)
[INFO] (MainProcess) Submitting task for range [625001, 687501)
[INFO] (MainProcess) Submitting task for range [687501, 750001)
[INFO] (MainProcess) Submitting task for range [750001, 812501)
[INFO] (MainProcess) Submitting task for range [812501, 87


--- Parallel Calculation Results ---
Total primes found up to 1000000: 78498
First 10 primes: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
Last 10 primes: [999863, 999883, 999907, 999917, 999931, 999953, 999959, 999961, 999979, 999983]
Time taken (multiprocessing): 0.73 seconds
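
The count reported above can be cross-checked independently. The sketch below uses a Sieve of Eratosthenes, which is a different technique from the trial division used in the solution, and the function name `sieve_primes` is an assumption for illustration. It runs in a single process and is meant only as a sanity check on the parallel result.

```python
from typing import List


def sieve_primes(limit: int) -> List[int]:
    """Return all primes <= limit using a Sieve of Eratosthenes."""
    if limit < 2:
        return []
    # is_candidate[n] == 1 means n has not been crossed out yet
    is_candidate = bytearray([1]) * (limit + 1)
    is_candidate[0] = is_candidate[1] = 0
    p = 2
    while p * p <= limit:
        if is_candidate[p]:
            # Cross out multiples of p, starting at p * p
            count = len(range(p * p, limit + 1, p))
            is_candidate[p * p :: p] = bytes(count)
        p += 1
    return [n for n in range(limit + 1) if is_candidate[n]]


primes = sieve_primes(1_000_000)
print(len(primes))  # 78498
print(primes[-1])   # 999983
```

Both values agree with the totals printed by the parallel run.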


## 11. Conclusion

Python's `multiprocessing` module unlocks true parallelism, allowing you to bypass the GIL and fully utilize multiple CPU cores for computationally intensive tasks. While the basic `Process` interface mirrors `threading.Thread`, the key difference is that each process has its own memory space, so communication requires IPC mechanisms such as Queues, Pipes, shared memory (`Value`, `Array`), or Managers.

The high-level `multiprocessing.Pool` and `concurrent.futures.ProcessPoolExecutor` significantly simplify distributing tasks across worker processes. Choosing the right IPC method, understanding serialization (pickling) limitations, and guarding the entry point with `if __name__ == '__main__':` are crucial for successful multiprocessing.

By applying multiprocessing effectively, you can achieve substantial performance gains for CPU-bound problems in Python.