
1. Discuss the scenarios where multithreading is preferable to multiprocessing and scenarios where multiprocessing is a better choice.

## Scenarios Favoring Multithreading

* I/O-bound Tasks : When tasks spend most of their time waiting on input/output (reading or writing files, network requests), other threads can run while one is blocked.

* Shared Memory Needs : When tasks need to share a significant amount of data. Threads live in the same address space, so nothing has to be copied or serialized.

* Low Memory Overhead : When memory usage needs to be minimized. A thread is cheaper to create than a process.

* Responsive User Interfaces : In GUI applications (desktop or web), where background threads keep the main UI thread free to handle events.

* Lightweight Tasks : For short tasks that need quick context switching and low resource usage.


## Scenarios Favoring Multiprocessing

* CPU-bound Tasks : When tasks require significant CPU resources, such as heavy computations.

* Isolation : When tasks require strong isolation for security or stability.

* Avoiding the Global Interpreter Lock (GIL) : In CPython, the GIL lets only one thread execute Python bytecode at a time, which limits multithreaded CPU-bound code. Each process has its own interpreter and its own GIL.

* Heavy Data Processing : In scenarios where large amounts of data need to be processed, like data analytics or image processing.

* Long-running Processes : When tasks are expected to run for a long time and should be able to fail or be terminated independently of the rest of the program.

## Summary

* Use Multithreading for I/O-bound tasks, shared memory needs, low memory overhead, responsive UIs, and lightweight tasks.

* Use Multiprocessing for CPU-bound tasks, isolation, avoiding GIL limitations, heavy data processing, and long-running processes.
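As a rough illustration of the I/O-bound case, the sketch below simulates blocking I/O with `time.sleep` and runs the same tasks sequentially and on a thread pool. The names `fake_download`, `run_sequential`, and `run_threaded`, the delay, and the task count are assumptions chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(task_id, delay=0.1):
    """Stand-in for a blocking I/O call (hypothetical); sleeping releases the GIL."""
    time.sleep(delay)
    return task_id

def run_sequential(n):
    # Each simulated wait happens one after another.
    return [fake_download(i) for i in range(n)]

def run_threaded(n):
    # One worker per task so all the simulated waits can overlap.
    with ThreadPoolExecutor(max_workers=n) as executor:
        return list(executor.map(fake_download, range(n)))

if __name__ == "__main__":
    for label, runner in (("sequential", run_sequential), ("threaded", run_threaded)):
        start = time.perf_counter()
        results = runner(4)
        elapsed = time.perf_counter() - start
        print(f"{label}: results={results}, elapsed={elapsed:.2f}s")
```

On most machines the threaded run should take roughly one delay, while the sequential run takes about four. A CPU-bound function would not be expected to show the same gain under threads in standard CPython because of the GIL.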

2. Describe what a process pool is and how it helps in managing multiple processes efficiently.

* A process pool is a design pattern that maintains a collection of pre-instantiated processes to handle tasks concurrently.

## Key Features of a Process Pool

* Reusability : Processes in a pool are reused for multiple tasks, which reduces the overhead associated with creating and destroying processes.

* Load Balancing : A process pool can distribute tasks among available processes, ensuring that the workload is balanced.

* Controlled Concurrency : By limiting the number of active processes in the pool, you can manage system resources effectively and prevent overload.

* Simplified Management : The pool abstracts the complexity of managing multiple processes.

* Error Handling : An exception raised inside a task is reported back to the caller when the result is retrieved, so a failed task can be retried or handled differently without affecting the entire application.

## How it Works

* Initialization : The pool is created with a fixed number of processes. This number can be based on system resources, the nature of the tasks, or empirical performance data.

* Task Submission : When a task needs to be executed, it is submitted to the pool. The pool assigns the task to an available process.

* Task Execution : The assigned process executes the task. If all processes are busy, the task waits in a queue until one becomes available.

* Result Retrieval : Once the task is complete, the result can be retrieved from the pool. The process then becomes available for another task.
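The four steps above can be sketched with `multiprocessing.Pool`. The worker function `cube`, the helper `run_pool`, the pool size of 3, and the 30-second timeout are assumptions made for this example.

```python
import multiprocessing

def cube(n):
    """Hypothetical CPU task used only for illustration."""
    return n ** 3

def run_pool(values, workers=3):
    # Initialization: create a fixed number of worker processes.
    with multiprocessing.Pool(processes=workers) as pool:
        # Task submission: apply_async returns an AsyncResult immediately;
        # tasks beyond the number of free workers wait in the pool's queue.
        pending = [pool.apply_async(cube, (v,)) for v in values]
        # Result retrieval: get() blocks until that task has finished.
        return [p.get(timeout=30) for p in pending]

if __name__ == "__main__":
    print(run_pool(range(1, 7)))
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by re-importing the main module, unguarded pool creation would run again in every child.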

3. Explain what multiprocessing is and why it is used in Python programs.

* Multiprocessing in Python refers to the ability to run multiple processes simultaneously, allowing a program to leverage multiple CPU cores.

* This is particularly useful for CPU-bound tasks that require substantial processing power.

* The Python multiprocessing module provides a way to create and manage separate processes, enabling concurrent execution and improved performance.

## Why Use Multiprocessing in Python

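* CPU-Bound Tasks : Heavy computations can be split across several cores.
* Bypassing the GIL : Each process has its own interpreter and its own GIL.
* Improved Performance : Work that parallelizes well can finish faster on multi-core machines.
* Robustness : A crash in one process does not take down the others.
* Concurrency : Independent tasks can make progress at the same time.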

## Use Cases for Multiprocessing

* Data Processing Pipelines : Parallelize data transformation tasks for faster execution.

* Machine Learning : Train models on large datasets concurrently to speed up training times.

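* Web Scraping : Collect data from multiple sources simultaneously, improving the speed of data collection. The downloads themselves are I/O-bound, so processes help most when parsing the pages is CPU-heavy.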

* Scientific Simulations : Run multiple simulations in parallel to analyze different scenarios or parameters.
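A minimal sketch of the CPU-bound case, assuming a deliberately naive prime counter as the workload. It uses `concurrent.futures.ProcessPoolExecutor`, which is built on the `multiprocessing` module; `count_primes`, `count_in_parallel`, and the chosen limits are hypothetical names and values.

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    """Count primes below `limit` by trial division (intentionally CPU-heavy)."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count

def count_in_parallel(limits):
    # Each limit is handled in a separate process, so the work can run on
    # several cores at once instead of contending for one interpreter's GIL.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(count_primes, limits))

if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000]
    print(dict(zip(limits, count_in_parallel(limits))))
```

Whether this beats a plain loop depends on the core count and on how large each task is relative to the cost of starting the workers.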



In [None]:
# 4. Write a Python program using multithreading where one thread adds numbers to a list, and another thread removes numbers from the list. Implement a mechanism to avoid race conditions using threading.Lock.

import threading
import time
import random

shared_list = []
lock = threading.Lock()

def add_numbers():
  for i in range(10):
    time.sleep(random.uniform(0.1, 0.5))
    with lock:
      shared_list.append(i)
      print(f"Added : {i} Current List : {shared_list}")

def remove_numbers():
  for i in range(10):
    time.sleep(random.uniform(0.1, 0.5))
    with lock:
      if shared_list:
        removed = shared_list.pop(0)
        print(f"Removed : {removed} Current List : {shared_list}")
      else:
        print("List is empty, nothing to remove")

add_thread = threading.Thread(target = add_numbers)
remove_thread = threading.Thread(target = remove_numbers)

add_thread.start()
remove_thread.start()

add_thread.join()
remove_thread.join()

print("Final list : ", shared_list)

List is empty, nothing to remove
Added : 0 Current List : [0]
Removed : 0 Current List : []
Added : 1 Current List : [1]
Removed : 1 Current List : []
Added : 2 Current List : [2]
Removed : 2 Current List : []
Added : 3 Current List : [3]
Added : 4 Current List : [3, 4]
Removed : 3 Current List : [4]
Added : 5 Current List : [4, 5]
Removed : 4 Current List : [5]
Added : 6 Current List : [5, 6]
Added : 7 Current List : [5, 6, 7]
Removed : 5 Current List : [6, 7]
Added : 8 Current List : [6, 7, 8]
Added : 9 Current List : [6, 7, 8, 9]
Removed : 6 Current List : [7, 8, 9]
Removed : 7 Current List : [8, 9]
Removed : 8 Current List : [9]
Final list :  [9]


5. Describe the methods and tools available in Python for safely sharing data between threads and processes.

* In Python, sharing data safely between threads and processes involves using various methods and tools to avoid issues like race conditions and data corruption.

## For Threading

1. Locks :

* threading.Lock : A simple locking mechanism to ensure that only one thread can access a resource at a time.

* threading.RLock : A reentrant lock that the same thread can acquire multiple times without blocking itself. It must be released once for each acquire.

2. Semaphores :

* threading.Semaphore : A counter-based synchronization tool that allows up to a fixed number of threads to access a resource concurrently.

3. Condition Variables :

* threading.Condition : Allows threads to wait for certain conditions to be met. Threads can wait for notifications when a condition changes.

4. Events :

* threading.Event : A simple flag that can be set or cleared, allowing one thread to signal one or more other threads.

5. Queues :

* queue.Queue : A thread-safe FIFO queue that allows threads to communicate and share data safely. This is often the preferred method for sharing data between threads.
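As a sketch of the queue-based approach, the example below passes items from a producer thread to a consumer thread through `queue.Queue`. The `SENTINEL` marker, the helper `run_pipeline`, the queue size, and the doubling step are assumptions made for illustration.

```python
import queue
import threading

SENTINEL = None  # marker telling the consumer that no more items will arrive

def producer(q, items):
    for item in items:
        q.put(item)      # blocks if the bounded queue is full
    q.put(SENTINEL)

def consumer(q, results):
    while True:
        item = q.get()   # blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item * 2)

def run_pipeline(items):
    q = queue.Queue(maxsize=5)  # bounded, so the producer cannot run far ahead
    results = []
    threads = [
        threading.Thread(target=producer, args=(q, items)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    print(run_pipeline(range(10)))
```

No explicit lock is needed in this sketch because the queue handles its own synchronization and only the consumer touches `results`.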

## For Multiprocessing

1. Processes and Manager :

* multiprocessing.Process : Allows you to create separate processes that run concurrently.

* multiprocessing.Manager : Provides a way to create shared objects (like lists, dictionaries) that can be accessed by multiple processes.

2. Locks and Semaphores :

* multiprocessing.Lock and multiprocessing.Semaphore : Similar to threading, these are used for synchronizing access to shared resources among processes.

3. Queues :

* multiprocessing.Queue : A process-safe FIFO queue for inter-process communication, allowing data to be shared between processes.

4. Pipes :

* multiprocessing.Pipe : Returns a pair of connection objects that form a communication channel between two processes. The channel is two-way by default.

5. Shared Memory (Python 3.8+):

* multiprocessing.shared_memory: Allows you to create shared memory blocks that can be accessed by multiple processes, providing a fast way to share data.
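A minimal sketch of inter-process communication with `multiprocessing.Process` and `multiprocessing.Queue`. The helper names `square_all`, `worker`, and `run_workers`, and the worker labels, are hypothetical.

```python
import multiprocessing

def square_all(numbers):
    """Pure helper so the work itself can be checked in isolation."""
    return [n * n for n in numbers]

def worker(name, numbers, out_queue):
    # Objects put on a multiprocessing.Queue are pickled and sent to the reader.
    out_queue.put((name, square_all(numbers)))

def run_workers(chunks):
    out_queue = multiprocessing.Queue()
    procs = [
        multiprocessing.Process(target=worker, args=(f"worker-{i}", chunk, out_queue))
        for i, chunk in enumerate(chunks)
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining: a child that has queued data may not
    # exit until that data has been consumed.
    results = dict(out_queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(run_workers([[1, 2, 3], [4, 5, 6]]))
```

Unlike threads, the child processes do not share the parent's lists, which is why the results travel back through the queue.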


## Choosing the Right Tool

* For Threading: Use queue.Queue for passing messages, Lock for protecting shared resources, and Condition or Event for more complex synchronization.

* For Multiprocessing: Use multiprocessing.Queue for communication, Manager for shared state, and Lock for mutual exclusion.

6. Discuss why it's crucial to handle exceptions in concurrent programs and the techniques available for doing so.

Handling exceptions in concurrent programs is crucial for several reasons:

1. System Stability :

* An unhandled exception in a thread or process ends only that worker, so the rest of the program may carry on with missing results or inconsistent shared state. Proper handling ensures that the application remains stable.

2. Resource Management :

* In concurrent environments, resources (like files, sockets, or database connections) may be shared. If exceptions occur and are not handled, resources may not be released properly, leading to leaks or deadlocks.

3. Debugging :

* When exceptions are caught and logged appropriately, it becomes easier to diagnose and fix issues. Without proper exception handling, tracking down the source of an error can be challenging.

4. Graceful Shutdown:

* Handling exceptions allows you to clean up resources and perform necessary shutdown procedures, ensuring that the program exits cleanly even in the event of an error.

5. Communication of Errors:

* In concurrent systems, errors in one thread or process may need to be communicated to others. Handling exceptions allows you to propagate error states effectively.

## Techniques for Handling Exceptions in Concurrent Programs

* Try-Except Blocks :

Use try and except statements within threads or processes to catch exceptions locally. This allows you to handle errors specific to that context.

Example :

In [None]:
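import threading

def worker():
  try:
    # Code that may raise an exception goes here
    pass

  except Exception as e:
    # Catch the error inside the thread so it is reported instead of lost
    print(f"Error in worker : {e}")

thread = threading.Thread(target = worker)
thread.start()
thread.join()  # Wait for the worker to finish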
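
* Futures :

Another technique, sketched here under assumptions, is to let `concurrent.futures` carry the exception back to the caller: `Future.result()` re-raises in the calling thread whatever was raised in the worker. The task `risky_divide` and the helper `run_all` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def risky_divide(n):
    """Hypothetical task that fails when n is zero."""
    return 10 / n

def run_all(values):
    outcomes = {}
    with ThreadPoolExecutor(max_workers=3) as executor:
        futures = {executor.submit(risky_divide, v): v for v in values}
        for future, value in futures.items():
            try:
                # result() re-raises, in the calling thread, any exception
                # that was raised inside the worker.
                outcomes[value] = future.result()
            except ZeroDivisionError as exc:
                outcomes[value] = f"failed: {exc}"
    return outcomes

if __name__ == "__main__":
    print(run_all([1, 2, 0, 5]))
```

One failing task does not stop the others here: each future is handled separately, so the caller decides per task whether to log, retry, or abort.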

In [None]:
# 7. Create a program that uses a thread pool to calculate the factorial of numbers from 1 to 10 concurrently. Use concurrent.futures.ThreadPoolExecutor to manage the threads.

import concurrent.futures
import math

def calculate_factorial(n):
  return math.factorial(n)

def main():
  numbers = range(1, 11)

  with concurrent.futures.ThreadPoolExecutor() as executor:

    futures = {executor.submit(calculate_factorial, n): n for n in numbers}

    for future in concurrent.futures.as_completed(futures):
      n = futures[future]
      try:
        result = future.result()
        print(f"Factorial of {n} is {result}")
      except Exception as e:
        print(f"Error calculating factorial of {n}: {e}")

if __name__ == "__main__":
  main()

Factorial of 8 is 40320
Factorial of 1 is 1
Factorial of 5 is 120
Factorial of 2 is 2
Factorial of 9 is 362880
Factorial of 7 is 5040
Factorial of 4 is 24
Factorial of 3 is 6
Factorial of 6 is 720
Factorial of 10 is 3628800


In [None]:
# 8. Create a Python program that uses multiprocessing.Pool to compute the square of numbers from 1 to 10 in parallel. Measure the time taken to perform this computation using a pool of different sizes (e.g., 2, 4, 8 processes).

import multiprocessing
import time

def compute_square(n):
  return n * n

def main():
  numbers = range(1, 11)

  pool_sizes = [2, 4, 8]

  for pool_size in pool_sizes:
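    # perf_counter() is a monotonic, high-resolution clock suited to timing
    start_time = time.perf_counter()

    with multiprocessing.Pool(processes=pool_size) as pool:
      results = pool.map(compute_square, numbers)

    # The measurement includes the cost of starting and stopping the pool
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time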

    print(f"Pool size: {pool_size}, Results: {results}, Time taken: {elapsed_time:.4f} seconds")

if __name__ == "__main__":
  main()

Pool size: 2, Results: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100], Time taken: 0.0534 seconds
Pool size: 4, Results: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100], Time taken: 0.0722 seconds
Pool size: 8, Results: [1, 4, 9, 16, 25, 36, 49, 64, 81, 100], Time taken: 0.1284 seconds
