# MultiProcessing:

### Process:
- A process is a running instance of a program. Every running Python program is a process with at least one thread, the main thread, which executes the program's instructions. In CPython, each process is a separate instance of the interpreter executing Python bytecode.

### Thread vs Process:

#### i. Process:
- A process is an independent program in execution with its own memory space.
- Processes do not share memory by default; data has to be passed between them explicitly.
- Processes are more heavyweight compared to threads.
- Processes communicate using inter-process communication (IPC) mechanisms.
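
As a minimal sketch of one IPC mechanism, the example below sends numbers to a child process through a `multiprocessing.Queue` and reads the squares back through a second queue. The names `square_worker`, `in_q`, and `out_q` are illustrative and not part of the original notes, and using `None` as a stop signal is an assumption made for this sketch.

```python
import multiprocessing

def square_worker(in_q, out_q):
    """Read numbers from in_q until None arrives; put each square on out_q."""
    while True:
        item = in_q.get()
        if item is None:  # assumed sentinel meaning "no more work"
            break
        out_q.put(item * item)

if __name__ == "__main__":
    in_q = multiprocessing.Queue()
    out_q = multiprocessing.Queue()
    worker = multiprocessing.Process(target=square_worker, args=(in_q, out_q))
    worker.start()

    for n in (2, 3, 4):
        in_q.put(n)
    in_q.put(None)  # tell the worker to stop

    # Read the results before join() so the output queue is drained first
    results = [out_q.get() for _ in range(3)]
    worker.join()
    print(results)  # [4, 9, 16]
```

The data is pickled and copied between the two processes, which is why this costs more than sharing a variable between threads.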

#### ii. Thread:
- A thread is a unit of execution inside a process; all threads of a process share its memory space.
- Multiple threads within a process share resources like memory and file handles.
- Threads are lightweight and faster to create and manage compared to processes.
- Threads communicate easily within the same process without needing IPC.
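
The difference in memory sharing can be sketched as follows: a thread changes the parent's data directly, while a child process only changes its own copy. The names `counter`, `increment`, `run_in_thread`, and `run_in_process` are hypothetical helpers introduced for this illustration.

```python
import multiprocessing
import threading

counter = {"value": 0}  # module-level state owned by the parent process

def increment():
    counter["value"] += 1

def run_in_thread():
    """Run increment() in a thread; the thread mutates the parent's dict."""
    t = threading.Thread(target=increment)
    t.start()
    t.join()
    return counter["value"]

def run_in_process():
    """Run increment() in a child process; only the child's copy changes."""
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    return counter["value"]

if __name__ == "__main__":
    print("after thread: ", run_in_thread())   # 1
    print("after process:", run_in_process())  # still 1
```

The second value stays at 1 because the increment happened in the child's memory space, not in the parent's.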

## 1. Scenario:
- Imagine a scenario where you need to perform a computationally heavy task, such as calculating the sum of squares of a large list of numbers. Without parallelization, this task runs in a single thread, and it can take a long time if the data is large.

### 1.1 Solution without MultiProcessing:

In [3]:
import time

def calculate_sum_of_squares(start, end):
    total = 0
    for i in range(start, end):
        total += i * i
    return total

def main():
    start_time = time.time()

    # Simulate a heavy computational task
    total = calculate_sum_of_squares(1, 100000000)

    end_time = time.time()
    print(f"Total sum of squares: {total}")
    print(f"Execution Time (Single Threaded): {end_time - start_time:.4f} seconds")

# Run the main function
main()


Total sum of squares: 333333328333333350000000
Execution Time (Single Threaded): 5.0102 seconds


### Performance Issue (Without Multiprocessing):
- The calculate_sum_of_squares function processes the numbers one after another on a single CPU core.
- The remaining cores stay idle, so the run time grows in proportion to the size of the range.
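
As a sanity check on the printed total, the same sum can be computed in closed form with the identity 1² + 2² + ... + n² = n(n+1)(2n+1)/6. The sketch below is only meant for verifying results (for example, after parallelizing the loop); the helper name `closed_form_sum_of_squares` is illustrative and it assumes `start >= 1`.

```python
def closed_form_sum_of_squares(start, end):
    """Sum of i*i for i in range(start, end), via n(n+1)(2n+1)/6."""
    def up_to(n):
        # 1^2 + 2^2 + ... + n^2; the product is always divisible by 6
        return n * (n + 1) * (2 * n + 1) // 6
    return up_to(end - 1) - up_to(start - 1)

# Cross-check against the explicit loop on a small range
assert closed_form_sum_of_squares(1, 1000) == sum(i * i for i in range(1, 1000))

print(closed_form_sum_of_squares(1, 100000000))  # 333333328333333350000000
```

The value matches the output of the single-process run above.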

### 1.2 Solution with MultiProcessing:
- Now, let’s parallelize the task using the multiprocessing module. We will split the task into smaller chunks and distribute the work across multiple processes to improve performance.

In [None]:
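# Note: in my case this code did not work as expected inside the notebook, so I ran it as a script in VS Code (file name: MultiProcessingDemo).
# A likely cause: worker processes may fail to import functions that are defined in a notebook cell.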

import time
import multiprocessing

def calculate_sum_of_squares(start, end):
    total = 0
    for i in range(start, end):
        total += i * i
    return total

def main():
    start_time = time.time()

    # Split the task into 4 chunks and use multiprocessing
    chunks = [(1, 25000000), (25000000, 50000000), (50000000, 75000000), (75000000, 100000000)]
    # Use starmap to run one chunk per worker process;
    # the with-block shuts the pool down once the results are in
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.starmap(calculate_sum_of_squares, chunks)

    # Sum the results from each process
    total = sum(results)

    end_time = time.time()
    print(f"Total sum of squares: {total}")
    print(f"Execution Time (Multiprocessing): {end_time - start_time:.4f} seconds")

if __name__ == "__main__":
    # Run the main function
    main()


### How Multiprocessing Solves the Performance Issue:
- Parallelization: The task is split into 4 smaller chunks, and each chunk is processed by a separate process in parallel.
- Process Pool: The multiprocessing.Pool creates a pool of 4 worker processes, and starmap is used to distribute the chunks across these processes.
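- Faster Execution: Because the chunks are computed in parallel on separate CPU cores, the total execution time drops well below the single-process time. With 4 free cores the ideal speedup is about 4x, minus the overhead of starting the processes and collecting the results.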
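
The chunk boundaries above are hard-coded for 4 workers. As a sketch under stated assumptions, they could instead be derived from the number of workers. `make_chunks` and `parallel_sum_of_squares` are hypothetical helpers, and `os.cpu_count()` is used here only as a rough default for the number of worker processes.

```python
import multiprocessing
import os

def calculate_sum_of_squares(start, end):
    total = 0
    for i in range(start, end):
        total += i * i
    return total

def make_chunks(start, end, n_chunks):
    """Split range(start, end) into n_chunks contiguous (start, end) pairs."""
    size, extra = divmod(end - start, n_chunks)
    chunks = []
    lo = start
    for k in range(n_chunks):
        hi = lo + size + (1 if k < extra else 0)  # spread the remainder
        chunks.append((lo, hi))
        lo = hi
    return chunks

def parallel_sum_of_squares(start, end, workers=None):
    """Sum the squares in range(start, end) using one chunk per worker."""
    workers = workers or os.cpu_count() or 1
    chunks = make_chunks(start, end, workers)
    with multiprocessing.Pool(processes=workers) as pool:
        return sum(pool.starmap(calculate_sum_of_squares, chunks))

if __name__ == "__main__":
    print(make_chunks(1, 100000000, 4))
    print(parallel_sum_of_squares(1, 1000001))
```

The chunks are contiguous and non-overlapping, so summing the partial results gives the same total as the single loop.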

### Advantage of MultiProcessing:
- Parallel Execution: Runs tasks in parallel across multiple CPU cores.
- Bypasses GIL: Works around the Global Interpreter Lock (GIL) for CPU-bound tasks.
- Improved Performance: Speeds up execution for tasks that require heavy computation.
- Isolation: Each process has its own memory space, avoiding issues like memory corruption.
- Fault Tolerance: A crash in one process does not corrupt the memory of the other processes.
- Scalability: Easily scales to take advantage of multiple processors or cores.

## MultiThreading vs MultiProcessing:

### i. MultiThreading:
- Concurrency: Runs multiple threads within a single process.
- Shared Memory: Threads share the same memory space, which makes communication between them easier.
- GIL Limitation: Python's Global Interpreter Lock (GIL) prevents true parallelism for CPU-bound tasks, meaning only one thread executes Python bytecode at a time.
- Best for I/O-bound tasks: Ideal for tasks like network requests, file operations, or database queries.
- Lightweight: Threads consume less memory compared to processes.

In [1]:
import threading
import time

def task(num):
    for i in range(10000000):
        # CPU-bound busy work; the sleep below would simulate I/O-bound waiting instead
        # time.sleep(0.01)
        i += 1

if __name__ == '__main__':
    # Start measuring time
    start_time = time.time()

    # Create 12 threads
    threads = []
    for i in range(12):
        t = threading.Thread(target=task, args=(i,))
        threads.append(t)
        t.start()

    # Wait for all threads to finish
    for t in threads:
        t.join()

    # End measuring time
    end_time = time.time()

    print(f"All threads completed in {end_time - start_time:.2f} seconds.")


All threads completed in 3.72 seconds.
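
The loop above is CPU-bound, so the GIL keeps the 12 threads from running in parallel. For contrast, the "Best for I/O-bound tasks" point can be sketched with `time.sleep` standing in for a blocking I/O call; while a thread sleeps it releases the GIL and the other threads proceed. The helper names `io_task` and `run_threads` are illustrative.

```python
import threading
import time

def io_task(delay):
    """Stand-in for a blocking I/O call; sleeping releases the GIL."""
    time.sleep(delay)

def run_threads(n_threads, delay):
    """Run n_threads sleeping tasks concurrently; return elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=io_task, args=(delay,)) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    elapsed = run_threads(12, 0.5)
    print(f"12 threads waiting 0.5 s each finished in {elapsed:.2f} s")
```

Run one after another, the same 12 waits would take about 6 seconds; with threads the waits overlap, so the total stays close to a single wait.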


### ii. MultiProcessing:
- Parallelism: Runs multiple processes in separate memory spaces, each with its own Python interpreter and GIL.
- No GIL Limitation: Allows true parallelism, ideal for CPU-bound tasks like calculations.
- Higher Memory Overhead: Each process has its own memory space, which can lead to higher memory consumption.
- Best for CPU-bound tasks: Ideal for tasks that require heavy computation, like mathematical operations.
- Isolation: Processes are more isolated, making it easier to avoid issues like shared memory corruption.

In [None]:
# Note: in my case this code did not work as expected inside the notebook, so I ran it as a script in VS Code (file name: MultiProcessingOverThreading).

import multiprocessing
import time

def task(num):
    for i in range(10000000):
        # CPU-bound busy work; the sleep below would simulate I/O-bound waiting instead
        # time.sleep(0.01)
        i += 1

if __name__ == '__main__':
    # Start measuring time
    start_time = time.time()

    # Create 12 processes
    processes = []
    for i in range(12):
        p = multiprocessing.Process(target=task, args=(i,))
        processes.append(p)
        p.start()

    # Wait for all processes to finish
    for p in processes:
        p.join()

    # End measuring time
    end_time = time.time()

    print(f"All processes completed in {end_time - start_time:.2f} seconds.")


### Effect of GIL on MultiProcessing:
The Global Interpreter Lock (GIL) does not limit parallelism between processes: it only serializes the threads inside a single process.

#### i. Independent Interpreter Instances:

- Each process runs its own instance of the Python interpreter, so each process has its own GIL.
- This means the GIL of one process does not interfere with the GIL of another.

#### ii. True Parallelism:

- In multiprocessing, multiple processes can run on different CPU cores simultaneously, achieving true parallelism.
- This makes multiprocessing ideal for CPU-bound tasks where the GIL would otherwise limit performance in multithreading.

#### iii. Resource Isolation:

- Processes have separate memory spaces and resources, so each GIL only affects the threads of its own process.
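
One way to observe the separate interpreter instances is to compare process IDs: threads report the parent's PID, while pool workers report their own. This is a minimal sketch; `pids_from_threads`, `pids_from_processes`, and `worker_pid` are hypothetical helper names.

```python
import multiprocessing
import os
import threading

def worker_pid(_):
    """Return the ID of the OS process that executed this call."""
    return os.getpid()

def pids_from_threads(n_threads):
    """Collect the PID seen by each of n_threads threads."""
    pids = []
    threads = [
        threading.Thread(target=lambda: pids.append(os.getpid()))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return set(pids)

def pids_from_processes(n_tasks):
    """Collect the PIDs of the pool workers that handled n_tasks tasks."""
    with multiprocessing.Pool(processes=4) as pool:
        return set(pool.map(worker_pid, range(n_tasks)))

if __name__ == "__main__":
    print("parent PID:         ", os.getpid())
    print("PIDs from threads:  ", pids_from_threads(8))    # only the parent's PID
    print("PIDs from processes:", pids_from_processes(8))  # worker PIDs, not the parent's
```

The pool may report fewer than 4 distinct PIDs if some workers pick up several tasks, but none of them matches the parent's PID.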