
Q1. What is multiprocessing in Python? Why is it useful?


Answer:


Multiprocessing in Python refers to running several processes at the same time so that tasks execute in parallel. It takes advantage of multiple CPUs or CPU cores, which makes heavy computational work finish sooner than it would in a single process.

Python provides a multiprocessing module that supports creating processes. Each process in the multiprocessing module has its own memory space, and can execute code concurrently, independent of the other processes.

Key features of the multiprocessing module:

Process creation: Allows the creation of separate processes to run tasks in parallel.
Shared data: Supports shared data across processes using queues, pipes, and shared memory.
Synchronization primitives: Includes mechanisms like locks, semaphores, and events to coordinate between processes.
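
As a minimal sketch of the shared data and synchronization features listed above, the code below lets several processes increment one shared counter. The names `add_many` and `run_counter_demo` are hypothetical and chosen only for illustration; `multiprocessing.Value` and `multiprocessing.Lock` are part of the standard library.

```python
import multiprocessing


# Hypothetical helper names, used only for this illustration.
def add_many(counter, lock, times):
    """Increment the shared counter `times` times, one locked update at a time."""
    for _ in range(times):
        with lock:  # only one process may update the counter at once
            counter.value += 1


def run_counter_demo(workers=4, times=1000):
    """Start `workers` processes that all increment the same shared integer."""
    counter = multiprocessing.Value("i", 0)  # shared C int, initial value 0
    lock = multiprocessing.Lock()
    processes = [
        multiprocessing.Process(target=add_many, args=(counter, lock, times))
        for _ in range(workers)
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()
    return counter.value


if __name__ == "__main__":
    # With the lock held for every update, the total is workers * times.
    print("Final count:", run_counter_demo())
```

The explicit lock matters here because `counter.value += 1` is a read followed by a write; without the lock, two processes could read the same value and one update would be lost.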

Why Is Multiprocessing Useful?

Multiprocessing is particularly useful in the following scenarios:

Bypassing the Global Interpreter Lock (GIL): In CPython, the Global Interpreter Lock (GIL) allows only one thread to execute Python bytecode at a time within a single process. This prevents true parallelism for CPU-bound tasks when using threading. Multiprocessing creates separate processes, each with its own Python interpreter and memory space, so each process has its own GIL and CPU-bound work can run in parallel across cores.

Improved Performance for CPU-bound Tasks: CPU-bound tasks, such as mathematical computations, data processing, or image manipulation, can significantly benefit from multiprocessing. By distributing these tasks across multiple CPU cores, the workload can be processed faster.

Parallel Processing: Multiprocessing allows the execution of multiple tasks at the same time. It is ideal for tasks that can be split into smaller, independent sub-tasks that can be executed concurrently.

Task Isolation: Each process in multiprocessing runs in its own memory space, meaning that one process's memory is independent of another's. This isolation reduces the risks of memory corruption and race conditions that often occur in multithreading.

Scalability: Multiprocessing makes it easier to scale an application across the CPUs or CPU cores of a single machine. This is particularly useful when working with large datasets or computationally expensive algorithms.

In [1]:
# Example:
import multiprocessing

def print_square(num):
    print(f"Square: {num * num}")

def print_cube(num):
    print(f"Cube: {num * num * num}")

if __name__ == "__main__":
    # Creating two processes
    process1 = multiprocessing.Process(target=print_square, args=(10,))
    process2 = multiprocessing.Process(target=print_cube, args=(10,))

    # Start the processes
    process1.start()
    process2.start()

    # Wait for processes to complete
    process1.join()
    process2.join()

    print("Both processes completed.")


Square: 100Cube: 1000

Both processes completed.


Use Cases:

Data processing: Distributing data processing tasks across multiple cores, such as processing large datasets, image processing, etc.
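Web scraping: Running multiple web scraping tasks concurrently, each fetching data from different URLs. Because fetching is I/O-bound, threads are often sufficient here; separate processes help most when parsing the downloaded pages is CPU-heavy.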
Simulations: Running simulations or complex mathematical models in parallel to reduce processing time.

Multiprocessing in Python is a powerful way to achieve true parallelism by utilizing multiple processors. It is especially useful for CPU-bound tasks, allowing programs to bypass the limitations of Python's GIL, and it improves performance by executing tasks concurrently across multiple processes.

Q2. What are the differences between multiprocessing and multithreading?

Answer:

The differences between multiprocessing and multithreading are:

1. Global Interpreter Lock (GIL):

Multiprocessing: Each process has its own Python interpreter and memory space, so the GIL is not an issue. This allows for true parallelism on multi-core machines, making multiprocessing ideal for CPU-bound tasks.

Multithreading: In Python, the GIL prevents more than one thread from executing Python bytecode simultaneously in a single process, limiting true parallelism. Therefore, threading is more suitable for I/O-bound tasks (e.g., file I/O, network requests) where the threads can run concurrently while waiting for I/O operations to complete.

2. Memory:

Multiprocessing: Processes do not share memory by default. Each process has its own memory space, so ordinary Python objects in one process cannot be modified by another. Sharing data between processes requires explicit mechanisms such as queues, pipes, or shared memory, which introduces overhead, and data placed in shared memory still needs synchronization.

Multithreading: Threads run in the same memory space, so data is shared easily among them. This reduces memory usage but introduces risks such as race conditions and requires synchronization tools (e.g., locks, semaphores) to prevent conflicts when accessing shared data.
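
The memory difference described above can be observed directly. The sketch below is illustrative only: `shared_items` and `append_item` are hypothetical names. A thread appends to a module-level list and the change is visible afterwards; a child process runs the same function, but it modifies its own copy, so the parent's list stays as it was.

```python
import multiprocessing
import threading

# Hypothetical module-level state, used only for this illustration.
shared_items = []


def append_item(value):
    """Append to the module-level list of whichever process runs this call."""
    shared_items.append(value)


if __name__ == "__main__":
    # A thread shares the parent's memory, so its append is visible here.
    thread = threading.Thread(target=append_item, args=("from thread",))
    thread.start()
    thread.join()
    print("After thread: ", shared_items)

    # A child process works on its own copy; the parent's list is unchanged.
    process = multiprocessing.Process(target=append_item, args=("from process",))
    process.start()
    process.join()
    print("After process:", shared_items)
```

Both lines print a list containing only `'from thread'`, because the append made in the child process never reaches the parent.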

3. Performance:

Multiprocessing: Since each process runs independently, this approach takes full advantage of multiple CPU cores. Multiprocessing is ideal for CPU-bound tasks, where the processing power of multiple cores can be utilized for parallelism. However, creating processes incurs more overhead due to separate memory and interpreter instances.

Multithreading: Threading introduces less overhead than creating processes because threads share memory. While multithreading is constrained by the GIL for CPU-bound tasks, it can improve performance in I/O-bound tasks by allowing multiple threads to perform different I/O operations concurrently.
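
One way to check the performance claim above is to time the same CPU-bound function under a thread pool and a process pool. This is a rough sketch, not a benchmark: `count_primes` and `timed_map` are hypothetical helpers, and the timings depend on the machine, the number of cores, and the Python build. `ThreadPool` from `multiprocessing.pool` is used because it offers the same `map()` interface as `multiprocessing.Pool`.

```python
import multiprocessing
import time
from multiprocessing.pool import ThreadPool


# Hypothetical helper names, used only for this illustration.
def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_map(pool_factory, func, items, workers=4):
    """Map func over items with the given pool type; return (results, seconds)."""
    start = time.perf_counter()
    with pool_factory(workers) as pool:
        results = pool.map(func, items)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    limits = [30_000] * 4
    thread_results, thread_secs = timed_map(ThreadPool, count_primes, limits)
    process_results, process_secs = timed_map(multiprocessing.Pool, count_primes, limits)

    assert thread_results == process_results  # same answers either way
    print(f"threads:   {thread_secs:.2f}s")
    print(f"processes: {process_secs:.2f}s")
```

On a multi-core machine running a CPython build with the GIL, the process pool would generally be expected to finish sooner for this kind of work, although process start-up cost can outweigh the gain for very small inputs.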

4. Communication and Synchronization:

Multiprocessing: Communication between processes is more complex and requires inter-process communication (IPC) mechanisms like pipes, queues, or shared memory. This can be slower due to the overhead of passing data between separate memory spaces.

Multithreading: Communication between threads is easier since threads share memory. However, synchronization is more critical because multiple threads accessing the same resources can lead to race conditions. Proper use of locks and other synchronization tools is necessary.
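
As a minimal sketch of inter-process communication with a queue, the code below has one producer process send squared numbers back to the parent. The names `produce_squares` and `drain` are hypothetical, and using `None` as an end-of-data marker is a convention assumed for this example, not something the module requires.

```python
import multiprocessing


# Hypothetical helper names, used only for this illustration.
def produce_squares(numbers, queue):
    """Put the square of each number on the queue, then a None sentinel."""
    for n in numbers:
        queue.put(n * n)
    queue.put(None)  # assumed convention: None means "no more items"


def drain(queue):
    """Collect items from the queue until the None sentinel arrives."""
    # iter(callable, sentinel) keeps calling queue.get() until it returns None.
    return list(iter(queue.get, None))


if __name__ == "__main__":
    result_queue = multiprocessing.Queue()
    producer = multiprocessing.Process(
        target=produce_squares, args=([1, 2, 3], result_queue)
    )
    producer.start()
    results = drain(result_queue)  # read before join() so the producer is never left blocked
    producer.join()
    print("Received:", results)
```

Each item is pickled in the producer and unpickled in the parent, which is the data-passing overhead mentioned above.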

5. Use Case Suitability:

Multiprocessing is well-suited for CPU-bound tasks such as mathematical computations, data processing, simulations, and scientific computations where high CPU utilization across multiple cores is essential.

Multithreading is more appropriate for I/O-bound tasks like web scraping, file I/O, and network requests where the GIL is less of an issue because threads spend much of their time waiting for external resources.

In [2]:
# Example of Multiprocessing:

import multiprocessing

def print_square(num):
    print(f"Square: {num * num}")

if __name__ == "__main__":
    process = multiprocessing.Process(target=print_square, args=(5,))
    process.start()
    process.join()


Square: 25


In [3]:
# Example of Multithreading:

import threading

def print_square(num):
    print(f"Square: {num * num}")

if __name__ == "__main__":
    thread = threading.Thread(target=print_square, args=(5,))
    thread.start()
    thread.join()


Square: 25


In summary:

Multiprocessing is ideal for CPU-bound tasks and offers true parallelism by utilizing multiple processors or cores. Each process is isolated, which increases reliability but comes with higher memory overhead.

Multithreading is more suitable for I/O-bound tasks and involves lower overhead, but it is limited by the GIL, preventing true parallelism in CPU-heavy operations. Threads share memory, which simplifies communication but requires careful synchronization.

Q3. Write Python code to create a process using the multiprocessing module.

In [4]:
# Answer:

import multiprocessing

# Define a function that the new process will execute
def print_square(num):
    print(f"Square of {num} is {num * num}")

if __name__ == "__main__":
    # Create a new process
    process = multiprocessing.Process(target=print_square, args=(5,))

    # Start the process
    process.start()

    # Wait for the process to finish
    process.join()

    print("Process has completed.")


Square of 5 is 25
Process has completed.


Q4. What is a multiprocessing pool in Python? Why is it used?

Answer:

A multiprocessing pool in Python is a fixed set of worker processes, provided by the multiprocessing module, to which tasks can be submitted. The pool distributes the submitted tasks among its workers, so you can parallelize work without creating and managing each process yourself.

The Pool class in multiprocessing is used to create a pool of worker processes, which can execute tasks in parallel. You can define the number of processes in the pool, and tasks are distributed among them either synchronously or asynchronously.

Reasons for Using a Multiprocessing Pool:

Simplifies Task Parallelization: The pool allows you to divide tasks across multiple worker processes without manually creating and managing individual processes. You can parallelize tasks over multiple processes using high-level functions such as map(), apply(), apply_async(), and map_async().

Efficient Use of System Resources: Instead of creating a large number of individual processes, a pool of processes can be reused to handle multiple tasks. This reduces the overhead associated with creating and destroying processes repeatedly.

Parallel Execution: The pool makes it easy to execute multiple tasks in parallel, distributing them across different CPU cores and speeding up the execution of CPU-bound or I/O-bound tasks.

Improved Performance: By leveraging multiple CPU cores, the pool improves the performance of work that can be split into smaller sub-tasks, for example processing large datasets, running multiple simulations, or performing tasks that are independent of each other.

When to Use a Multiprocessing Pool:
When there are multiple tasks that can be executed in parallel and don't depend on each other.
When dealing with CPU-bound tasks that can benefit from the use of multiple CPU cores.
When processing large datasets by splitting the data into smaller chunks and processing them in parallel.
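
The last case, splitting a dataset into chunks, might look like the sketch below. The helpers `chunk`, `sum_of_squares`, and `parallel_sum_of_squares` are hypothetical names, and the chunk size of 250 is an arbitrary choice for illustration.

```python
import multiprocessing


# Hypothetical helper names, used only for this illustration.
def chunk(data, size):
    """Split data into consecutive slices of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(values):
    """Work applied to one chunk: the sum of the squares of its values."""
    return sum(v * v for v in values)


def parallel_sum_of_squares(data, chunk_size=250, workers=4):
    """Process each chunk in a worker, then combine the partial sums."""
    with multiprocessing.Pool(processes=workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunk(data, chunk_size))
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1000))
    total = parallel_sum_of_squares(data)
    assert total == sum_of_squares(data)  # matches the single-process answer
    print("Total:", total)
```

Sending whole chunks rather than single numbers keeps the number of inter-process messages small, which matters when the work per item is cheap.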

In [6]:
# Example:
import multiprocessing

# Define the function to compute the square of a number
def square(num):
    return num * num

if __name__ == "__main__":
    # Create a pool of 4 worker processes
    with multiprocessing.Pool(processes=4) as pool:
        # Use pool.map to execute the square function in parallel on the list of numbers
        numbers = [1, 2, 3, 4, 5]
        results = pool.map(square, numbers)

    print("Squared numbers:", results)


Squared numbers: [1, 4, 9, 16, 25]


Q5. How can we create a pool of worker processes in Python using the multiprocessing module?

Answer:

To create a pool of worker processes in Python using the multiprocessing module, you can utilize the Pool class. Here’s a step-by-step guide and example code to demonstrate how to set up a pool of worker processes and use it to execute tasks in parallel.

Step-by-Step Guide to Create a Pool of Worker Processes

Import the multiprocessing Module: Start by importing the multiprocessing module.

Define the Task Function: Create a function that represents the task you want to execute in parallel.

Create a Pool of Processes: Use the Pool class to create a pool of worker processes. You can specify the number of processes in the pool.

Use Pool Methods to Execute Tasks: Use methods like map(), apply(), or apply_async() to distribute tasks across the worker processes.

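Close and Join the Pool: After submitting all tasks, call close() to prevent any more tasks from being submitted to the pool and join() to wait for all worker processes to complete. When the pool is used as a context manager (with multiprocessing.Pool(...) as pool:), leaving the block calls terminate() instead, so collect all results inside the block; a blocking call such as map() has already returned them by that point.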
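
The steps above can be sketched with an explicit close() and join(), without the context manager. The names `cube` and `run_with_explicit_shutdown` are hypothetical; the `try`/`finally` is a defensive choice for this example so that workers are cleaned up even if submitting a task fails.

```python
import multiprocessing


# Hypothetical helper names, used only for this illustration.
def cube(num):
    """Task function executed by the workers."""
    return num ** 3


def run_with_explicit_shutdown(numbers, workers=2):
    """Create a pool, submit tasks, then close and join it by hand."""
    pool = multiprocessing.Pool(processes=workers)  # create the pool
    try:
        # Submit every task without blocking; each call returns an AsyncResult.
        pending = [pool.apply_async(cube, (n,)) for n in numbers]
        pool.close()  # no further tasks may be submitted
        pool.join()   # wait for the workers to finish and exit
        return [result.get() for result in pending]
    finally:
        pool.terminate()  # cleans up the workers if an error occurred earlier


if __name__ == "__main__":
    print("Cubed numbers:", run_with_explicit_shutdown([1, 2, 3]))
```

Results come back in submission order here because they are read from the `pending` list in order, regardless of which worker finished first.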

In [7]:
# Example Code:

import multiprocessing

# Define a function that the workers will execute
def square(num):
    return num * num

if __name__ == "__main__":
    # Create a pool of 4 worker processes
    with multiprocessing.Pool(processes=4) as pool:
        # List of numbers to process
        numbers = [1, 2, 3, 4, 5]

        # Use pool.map to execute the square function in parallel
        results = pool.map(square, numbers)

    print("Squared numbers:", results)


Squared numbers: [1, 4, 9, 16, 25]


Q6. Write a Python program to create 4 processes, where each process prints a different number, using the multiprocessing module.

In [9]:
# Answer:
import multiprocessing

# Define a function that will be executed by each process
def print_number(num):
    print(f"Process {num}: {num}")

if __name__ == "__main__":
    # Create a list of numbers to be printed by different processes
    numbers = [1, 2, 3, 4]

    # Create a list to hold the process references
    processes = []

    # Create and start a process for each number
    for number in numbers:
        process = multiprocessing.Process(target=print_number, args=(number,))
        processes.append(process)  # Add the process to the list
        process.start()  # Start the process

    # Wait for all processes to complete
    for process in processes:
        process.join()

    print("All processes have completed.")


Process 1: 1
Process 2: 2Process 3: 3

Process 4: 4
All processes have completed.
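
In the output above, two lines run together because the processes write to the same output stream at the same time. One possible way to keep each line intact, sketched here as an assumption rather than a required part of the answer, is to pass a shared `multiprocessing.Lock` to every process and hold it while printing.

```python
import multiprocessing


def print_number(num, lock):
    """Print one line while holding the lock so output is not interleaved."""
    with lock:
        # flush=True pushes the whole line out before the lock is released.
        print(f"Process {num}: {num}", flush=True)


if __name__ == "__main__":
    lock = multiprocessing.Lock()  # shared by all four processes
    processes = [
        multiprocessing.Process(target=print_number, args=(number, lock))
        for number in [1, 2, 3, 4]
    ]
    for process in processes:
        process.start()
    for process in processes:
        process.join()

    print("All processes have completed.")
```

The lock keeps each line whole but does not fix the order in which the processes print; that still depends on scheduling.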


Thank You!