In [1]:
# Role of metaclasses in custom type creation
#  Example demonstration: interface class + derived classes
#  Practical use case in framework design or ORMs

In [2]:
# What is a metaclass?
# A metaclass is something that creates classes.
# - A class creates objects
# - A metaclass creates classes

# You use metaclasses when you want to control or modify class creation automatically, like:
# - enforcing rules (“every class must have a method called save()”)
# - auto-registering classes (plugin systems)
# - creating APIs / ORMs (Django models are a classic example)
# - validating class attributes at definition time
# - automatically adding methods or fields
# Metaclasses help frameworks do “magic” safely and consistently.

In [3]:
# Everything in Python is an object, including classes.

# x = 10
# # x is an object of class int

# class A:
#     pass
# # A is an object too, created by the metaclass "type"

# So the default metaclass in Python is:
# type


In [4]:
# Step 1: Show that classes are created by type

class A:
    pass

print(type(A))
# OUTPUT: <class 'type'>


<class 'type'>


In [5]:
# Step 2: Create a class dynamically using type

# Normally we write:
class Person:
    def greet(self):
        return "Hello"

# But you can create the same class using type:
def greet(self):
    return "Hello"

Person = type(
    "Person",          # Class name
    (),                # Base classes (empty tuple means no custom base)
    {"greet": greet}   # Attributes/methods of the class
)

p = Person()
print(p.greet())
# OUTPUT: Hello

Hello


In [6]:
class Person:
    def greet(self):
        return "Hello"

p = Person()
print(p.greet())


Hello


In [7]:
# Calling p.greet() is equivalent to passing the instance explicitly as self:
greet(p)

'Hello'

In [8]:
# Any plain function whose first parameter receives the instance behaves the same way
def abc(self):
    return "Hello"

In [9]:
print(abc(p))

Hello


In [None]:
# Defining a Custom Metaclass
# A custom metaclass is created by subclassing type.

class MyMeta(type):
    pass
# To use it:
class MyClass(metaclass=MyMeta):
    pass
# At this point, MyMeta controls how MyClass is created.

In [10]:
# The __new__ Method in a Metaclass
# The most important method in a metaclass is __new__.
# It runs when the class itself is being created, not when instances are created.

class DebugMeta(type):
    def __new__(cls, name, bases, namespace):
        print(f"Creating class: {name}")
        return super().__new__(cls, name, bases, namespace)
# Usage:

class Sample(metaclass=DebugMeta):
    pass


Creating class: Sample
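
To make the timing concrete, the sketch below logs each metaclass hook into a list (the names `TimingMeta`, `events`, and `Widget` are hypothetical). `__new__` and `__init__` on the metaclass run once, when the `class` statement executes; `__call__` on the metaclass runs each time an instance is requested.

```python
events = []  # hypothetical log of which hook ran, in order

class TimingMeta(type):
    def __new__(mcls, name, bases, namespace):
        events.append(f"meta.__new__ {name}")     # class object is being built
        return super().__new__(mcls, name, bases, namespace)

    def __init__(cls, name, bases, namespace):
        events.append(f"meta.__init__ {name}")    # class object already exists
        super().__init__(name, bases, namespace)

    def __call__(cls, *args, **kwargs):
        events.append(f"meta.__call__ {cls.__name__}")  # an instance is requested
        return super().__call__(*args, **kwargs)

class Widget(metaclass=TimingMeta):
    pass

definition_events = list(events)  # snapshot taken before any instance exists
Widget()
Widget()

print(definition_events)
print(events[len(definition_events):])
```

The snapshot shows that both class-level hooks have already run before the first `Widget()` call.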


In [11]:
# Enforcing Rules at Class Creation Time
# Metaclasses can enforce rules on class definitions.

class RequireMethodMeta(type):
    def __new__(cls, name, bases, namespace):
        if "process" not in namespace:
            raise TypeError("Classes must define a 'process' method")
        return super().__new__(cls, name, bases, namespace)

# Usage:
class ValidClass(metaclass=RequireMethodMeta):
    def process(self):
        pass


In [12]:
class InvalidClass(metaclass=RequireMethodMeta):
    pass

TypeError: Classes must define a 'process' method
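
Note that `RequireMethodMeta` looks only at the class body (`namespace`), so a subclass that inherits `process` from a parent would also be rejected. A possible variation is sketched below, assuming that inherited methods should count and that a class can opt out with a hypothetical `abstract = True` marker in its own body. The names `RequireProcessMeta`, `Base`, `Worker`, and `FastWorker` are made up for illustration.

```python
class RequireProcessMeta(type):
    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Hypothetical opt-out: only honoured when set in this class body,
        # so subclasses do not inherit the exemption.
        if namespace.get("abstract", False):
            return cls
        # getattr follows the MRO, so an inherited process() is accepted.
        if not callable(getattr(cls, "process", None)):
            raise TypeError(f"{name} must define or inherit a 'process' method")
        return cls

class Base(metaclass=RequireProcessMeta):
    abstract = True          # exempt: this class only anchors the hierarchy

class Worker(Base):
    def process(self):
        return "working"

class FastWorker(Worker):    # inherits process(), so it is accepted
    pass

try:
    class Broken(Base):      # no process() anywhere in its MRO
        pass
except TypeError as exc:
    error_message = str(exc)

print(FastWorker().process())
print(error_message)
```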

In [13]:
# Metaclasses vs Abstract Base Classes
# Metaclasses and ABCs solve related but different problems.
# ABCs declare which methods a subclass must implement; the check runs when the class is instantiated.
# Metaclasses control how the class object itself is built; their checks run when the class is defined.

# This is why ABCs themselves are implemented using metaclasses internally.
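
A small sketch of the timing difference (the class names `Shape` and `Incomplete` are hypothetical). An incomplete ABC subclass can be defined without error and fails only on instantiation, and `type(Shape)` shows that the ABC machinery is itself a metaclass, `ABCMeta`.

```python
from abc import ABC, ABCMeta, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        ...

# Defining an incomplete subclass succeeds: no check happens here.
class Incomplete(Shape):
    pass

try:
    Incomplete()             # the ABC check runs at instantiation
except TypeError as exc:
    abc_error = str(exc)

print(type(Shape) is ABCMeta)   # ABCs are built by the ABCMeta metaclass
print(abc_error)
```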

In [15]:
# Practical Example: Automatic Class Registration
# Metaclasses are often used to automatically register classes.

class RegistryMeta(type):
    registry = {}

    def __new__(cls, name, bases, namespace):
        new_class = super().__new__(cls, name, bases, namespace)
        cls.registry[name] = new_class
        return new_class
# Usage:

class PluginA(metaclass=RegistryMeta):
    pass

class PluginB(metaclass=RegistryMeta):
    pass

print(RegistryMeta.registry)

{'PluginA': <class '__main__.PluginA'>, 'PluginB': <class '__main__.PluginB'>}


In [16]:
# You can:
# auto-register plugins
# build frameworks
# eliminate manual bookkeeping

In [17]:
# When Metaclasses Should Be Used
# Metaclasses are appropriate when:
# Rules must be enforced at class definition time
# Classes must be automatically modified or registered
# Framework-level behavior is required

# They should be avoided for:
# Simple behavior reuse
# Instance-level customization
# Problems solvable with inheritance or decorators

# Metaclasses are powerful but should be used sparingly.
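
For plain registration, a metaclass is often more than is needed. Below is a sketch of the same registry idea using `__init_subclass__`, a standard hook available since Python 3.6. This is a swapped-in alternative, not the approach used in the cells above, and the names `Plugin`, `CsvPlugin`, and `JsonPlugin` are hypothetical.

```python
class Plugin:
    registry = {}

    def __init_subclass__(cls, **kwargs):
        # Called automatically each time a subclass of Plugin is defined.
        super().__init_subclass__(**kwargs)
        Plugin.registry[cls.__name__] = cls

class CsvPlugin(Plugin):
    pass

class JsonPlugin(Plugin):
    pass

print(sorted(Plugin.registry))
```

The base class itself is not registered, because the hook fires only for subclasses.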

In [None]:
# Exercise: Building a Rule-Enforcing Metaclass
# Create a custom metaclass that:

# Requires all classes to define a run() method
# Prints the class name when the class is created
# Requirements:

# Use __new__ in the metaclass
# Raise an error for invalid classes
# Demonstrate with one valid and one invalid class
# Objective:

# Understand class creation control
# Observe metaclass execution timing
# Apply enforcement logic correctly

# Example Demonstration: Interface Class and Derived Classes

## What Is an Interface-Like Design in Python?

Python does not have interfaces as a separate language construct (unlike Java or C#). Instead, Python achieves the same goal using **Abstract Base Classes (ABCs)**.

An interface-like design means:

* Defining **what methods must exist**
* Not defining **how those methods work**
* Allowing multiple implementations with different internal logic
* Enforcing correctness at class creation or instantiation time

This pattern is extremely common in real-world Python systems.

---

## Problem Statement Without an Interface

Consider a system where different components are expected to perform the same action.

```python
class EmailService:
    def send(self, message):
        print("Sending email:", message)

class SMSService:
    def deliver(self, message):
        print("Sending SMS:", message)
```

Issues with this approach:

* Method names are inconsistent
* There is no enforcement of behavior
* Code using these classes must handle each type differently
* Errors appear only when the mismatched method is actually called

This makes the system fragile and hard to extend.
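
A sketch of how the mismatch surfaces, assuming a hypothetical `notify_all` helper that expects every service to expose `send`:

```python
class EmailService:
    def send(self, message):
        print("Sending email:", message)

class SMSService:
    def deliver(self, message):
        print("Sending SMS:", message)

def notify_all(services, message):
    """Hypothetical client code that assumes a common send() method."""
    failures = []
    for service in services:
        try:
            service.send(message)
        except AttributeError as exc:
            # Nothing flagged SMSService earlier; the call itself fails.
            failures.append(type(service).__name__)
            print("Failed:", exc)
    return failures

failed = notify_all([EmailService(), SMSService()], "Welcome")
print(failed)
```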

---

## Defining an Interface Using an Abstract Base Class

An Abstract Base Class defines the **required contract**.

```python
from abc import ABC, abstractmethod

class NotificationService(ABC):

    @abstractmethod
    def send(self, message):
        pass
```

Explanation:

* `NotificationService` defines the expected behavior
* Any derived class must implement `send`
* The base class itself cannot be instantiated

This acts as an interface.

---

## Creating Concrete Implementations

### Email Notification Implementation

```python
class EmailNotification(NotificationService):

    def send(self, message):
        print(f"Email sent: {message}")
```

### SMS Notification Implementation

```python
class SMSNotification(NotificationService):

    def send(self, message):
        print(f"SMS sent: {message}")
```

Explanation:

* Both classes implement the same method
* Internal behavior differs
* External usage remains consistent

---

## Using the Interface in Client Code

```python
def notify(service: NotificationService, message):
    service.send(message)
```

Usage:

```python
email = EmailNotification()
sms = SMSNotification()

notify(email, "Welcome")
notify(sms, "Your OTP is 1234")
```

Key observation:

* Client code depends on the interface, not implementations
* New implementations can be added without changing existing logic

---

## Enforcement of the Contract

If a derived class does not implement the required method, it fails early.

```python
class PushNotification(NotificationService):
    pass
```

Defining this class succeeds, but attempting to instantiate it raises a `TypeError` because `send` is not implemented.

This prevents incomplete implementations from entering the system.
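
A minimal check of this behavior; the definitions are repeated so the snippet runs on its own, and `push_error` is just a variable for inspecting the exception:

```python
from abc import ABC, abstractmethod

class NotificationService(ABC):
    @abstractmethod
    def send(self, message):
        pass

class PushNotification(NotificationService):
    pass                      # send() is missing

try:
    PushNotification()        # defining the class was fine; creating it is not
except TypeError as exc:
    push_error = exc
    print("Rejected:", exc)
```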

---

## Why This Pattern Matters in Real Applications

This interface-based design enables:

* Clean separation of concerns
* Easier testing using mock implementations
* Safe extension of systems
* Clear architectural boundaries

It is widely used in:

* Notification systems
* Payment gateways
* Logging frameworks
* Storage backends
* Plugin architectures
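
As an illustration of the testing benefit, a test can pass in a fake implementation that records messages instead of sending them. `FakeNotification` and `register_user` are hypothetical names, and the interface is repeated so the snippet is self-contained.

```python
from abc import ABC, abstractmethod

class NotificationService(ABC):
    @abstractmethod
    def send(self, message):
        pass

class FakeNotification(NotificationService):
    """Test double: stores messages in memory instead of delivering them."""
    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)

def register_user(name, notifier: NotificationService):
    # Hypothetical business logic that depends only on the interface.
    notifier.send(f"Welcome, {name}")

fake = FakeNotification()
register_user("Asha", fake)
print(fake.sent)
```

Because `register_user` depends only on the interface, the test needs no network access and no patching.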

---

## Script-Based Demonstration: Interface and Derived Classes

This code must be saved as `interface_demo.py` and executed from the terminal using:

```
python interface_demo.py
```

It should not be run inside a Jupyter Notebook.

```python
from abc import ABC, abstractmethod

class DataExporter(ABC):

    @abstractmethod
    def export(self, data):
        pass

class CSVExporter(DataExporter):

    def export(self, data):
        print("Exporting data to CSV:", data)

class JSONExporter(DataExporter):

    def export(self, data):
        print("Exporting data to JSON:", data)

def run_export(exporter: DataExporter, data):
    exporter.export(data)

csv_exporter = CSVExporter()   # named to avoid shadowing the stdlib csv module
json_exporter = JSONExporter()

run_export(csv_exporter, [1, 2, 3])
run_export(json_exporter, {"a": 1, "b": 2})
```

Observation:

* The same function works with multiple implementations
* Behavior changes without changing the calling code
* The interface enforces correctness

---

## Combining ABCs with Shared Logic

Abstract base classes can also include shared behavior.

```python
from abc import ABC, abstractmethod

class BaseExporter(ABC):

    def validate(self, data):
        if not data:
            raise ValueError("No data to export")

    @abstractmethod
    def export(self, data):
        pass
```

Derived classes can reuse `validate` while implementing `export`.

This balances flexibility with code reuse.
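
For instance, a concrete exporter might call the inherited `validate` before doing its own work. This is a sketch: `ListExporter` is a hypothetical subclass, and the base class is repeated for completeness.

```python
from abc import ABC, abstractmethod

class BaseExporter(ABC):
    def validate(self, data):
        if not data:
            raise ValueError("No data to export")

    @abstractmethod
    def export(self, data):
        pass

class ListExporter(BaseExporter):
    def export(self, data):
        self.validate(data)                      # shared logic from the base
        return ", ".join(str(item) for item in data)

exporter = ListExporter()
print(exporter.export([1, 2, 3]))

try:
    exporter.export([])
except ValueError as exc:
    empty_error = str(exc)
    print("Rejected:", empty_error)
```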

---

## Exercise: Designing an Interface with Multiple Implementations

Create an interface for a payment system.

Requirements:

* Abstract base class `PaymentGateway`
* Abstract method `process(amount)`
* Two implementations:

  * Credit card payment
  * Wallet payment
* A function that accepts any payment gateway and processes a payment

Objective:

* Apply interface-based design
* Enforce method implementation
* Use polymorphism correctly

The solution should demonstrate that:

* All implementations follow the same contract
* Client code remains unchanged when adding new gateways

---


# Practical Use Case in Framework Design or ORMs

## Why Frameworks and ORMs Need Strong Structure

Frameworks and ORMs are not simple scripts. They are **extensible systems** where:

* New components are added over time
* Multiple developers work independently
* User-written code must integrate safely
* Errors must be detected early

To achieve this, frameworks rely heavily on:

* Interfaces (ABCs)
* Controlled class creation
* Convention enforcement
* Automatic registration of components

This section demonstrates how **ABCs and metaclasses** are used together in a realistic framework-style design.

---

## Realistic Problem Scenario

Assume a framework that supports **multiple database backends**, such as:

* SQLite
* PostgreSQL
* MySQL

Each backend must:

* Connect to the database
* Execute queries
* Close connections

The framework must:

* Enforce a common interface
* Prevent incomplete implementations
* Automatically register supported backends

---

## Step 1: Defining the Interface Using an Abstract Base Class

```python
from abc import ABC, abstractmethod

class DatabaseBackend(ABC):

    @abstractmethod
    def connect(self):
        pass

    @abstractmethod
    def execute(self, query):
        pass

    @abstractmethod
    def close(self):
        pass
```

Explanation:

* The interface defines **what operations are mandatory**
* No backend-specific logic exists here
* All implementations must follow the same contract

---

## Step 2: Why an Interface Alone Is Not Enough

Even with an ABC:

* Developers can forget to register new backends
* The framework has no centralized view of available implementations
* Manual registration leads to errors and duplication

Frameworks solve this by **automating registration**.

This is where metaclasses are used.

---

## Step 3: Metaclass for Automatic Backend Registration

```python
from abc import ABCMeta

class BackendRegistryMeta(ABCMeta):
    registry = {}

    def __new__(cls, name, bases, namespace):
        new_class = super().__new__(cls, name, bases, namespace)

        if name != "BaseBackend":
            cls.registry[name] = new_class

        return new_class
```

Explanation:

* The metaclass derives from `ABCMeta` rather than plain `type`, so it can be combined with classes that inherit from `ABC` without a metaclass conflict
* Every backend class is registered at class creation time
* No manual registration code is needed
* The framework always knows available backends

---

## Step 4: Combining ABC and Metaclass

```python
class BaseBackend(DatabaseBackend, metaclass=BackendRegistryMeta):
    pass
```

Explanation:

* `DatabaseBackend` enforces method implementation
* `BackendRegistryMeta` controls class creation
* The base class itself is excluded from registration

This pattern is extremely common in ORMs and frameworks.
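
One detail matters when combining the two: `ABC` already uses the metaclass `ABCMeta`, and Python requires a derived class's metaclass to be a subclass of the metaclasses of all its bases. A registry metaclass that derives from plain `type` therefore cannot be attached to an ABC subclass, while one that derives from `ABCMeta` can. A minimal sketch with hypothetical names (`Interface`, `PlainMeta`, `CompatibleMeta`):

```python
from abc import ABC, ABCMeta, abstractmethod

class Interface(ABC):
    @abstractmethod
    def connect(self):
        pass

class PlainMeta(type):          # unrelated to ABCMeta
    pass

class CompatibleMeta(ABCMeta):  # extends the metaclass that ABC already uses
    pass

try:
    class Broken(Interface, metaclass=PlainMeta):
        pass
except TypeError as exc:
    conflict_message = str(exc)
    print("Conflict:", conflict_message)

class Works(Interface, metaclass=CompatibleMeta):
    pass

print(type(Works).__name__)
```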

---

## Step 5: Concrete Backend Implementations

### SQLite Backend

```python
class SQLiteBackend(BaseBackend):

    def connect(self):
        print("Connecting to SQLite")

    def execute(self, query):
        print(f"Executing SQLite query: {query}")

    def close(self):
        print("Closing SQLite connection")
```

### PostgreSQL Backend

```python
class PostgresBackend(BaseBackend):

    def connect(self):
        print("Connecting to PostgreSQL")

    def execute(self, query):
        print(f"Executing PostgreSQL query: {query}")

    def close(self):
        print("Closing PostgreSQL connection")
```

Explanation:

* Both backends implement the same interface
* Both are automatically registered
* The framework does not need to know implementation details

---

## Step 6: Framework-Level Usage

```python
def get_backend(name):
    backend_class = BackendRegistryMeta.registry.get(name)
    if not backend_class:
        raise ValueError("Unsupported backend")
    return backend_class()
```

Usage:

```python
backend = get_backend("SQLiteBackend")
backend.connect()
backend.execute("SELECT * FROM users")
backend.close()
```

Key observation:

* Backend selection is dynamic
* Client code depends only on the interface
* New backends can be added without changing framework logic

---

## Script-Based Demonstration: ORM-Style Backend System

This code must be saved as `orm_backend_framework_demo.py` and executed from the terminal using:

```
python orm_backend_framework_demo.py
```

It should not be run inside a Jupyter Notebook.

```python
from abc import ABC, ABCMeta, abstractmethod

# The registry metaclass extends ABCMeta so it is compatible with ABC-based classes.
class BackendRegistryMeta(ABCMeta):
    registry = {}

    def __new__(cls, name, bases, namespace):
        new_class = super().__new__(cls, name, bases, namespace)
        if name != "BaseBackend":
            cls.registry[name] = new_class
        return new_class

class DatabaseBackend(ABC):

    @abstractmethod
    def connect(self):
        pass

    @abstractmethod
    def execute(self, query):
        pass

    @abstractmethod
    def close(self):
        pass

class BaseBackend(DatabaseBackend, metaclass=BackendRegistryMeta):
    pass

class SQLiteBackend(BaseBackend):

    def connect(self):
        print("SQLite connected")

    def execute(self, query):
        print(f"SQLite executing: {query}")

    def close(self):
        print("SQLite closed")

class MySQLBackend(BaseBackend):

    def connect(self):
        print("MySQL connected")

    def execute(self, query):
        print(f"MySQL executing: {query}")

    def close(self):
        print("MySQL closed")

def get_backend(name):
    return BackendRegistryMeta.registry[name]()

backend = get_backend("SQLiteBackend")
backend.connect()
backend.execute("SELECT * FROM products")
backend.close()
```

---

## Why ORMs Use This Pattern

Popular ORMs and frameworks use similar techniques (Django's model classes, for example, are built by a metaclass) to:

* Enforce model structure
* Register models automatically
* Validate schema definitions
* Generate queries dynamically
* Support multiple databases cleanly

Examples include:

* Database backends
* Field definitions
* Query expressions
* Migration systems

This pattern scales well as frameworks grow.
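
As a rough illustration of "enforce model structure" and "register models automatically", the sketch below collects declared fields at class creation time. It is a simplified, hypothetical design (`Field`, `ModelMeta`, `Model`, and `User` are made-up names), not the implementation used by any particular ORM.

```python
class Field:
    """Hypothetical column descriptor: only stores a type name."""
    def __init__(self, column_type):
        self.column_type = column_type

class ModelMeta(type):
    models = {}   # table name -> model class

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Collect every Field declared in the class body.
        cls._fields = {
            attr: value for attr, value in namespace.items()
            if isinstance(value, Field)
        }
        if bases:  # skip the root class, which has no explicit bases
            if not cls._fields:
                raise TypeError(f"Model {name} declares no fields")
            mcls.models[name.lower()] = cls
        return cls

class Model(metaclass=ModelMeta):
    @classmethod
    def select_sql(cls):
        columns = ", ".join(cls._fields)
        return f"SELECT {columns} FROM {cls.__name__.lower()}"

class User(Model):
    id = Field("INTEGER")
    name = Field("TEXT")

print(User.select_sql())
print(list(ModelMeta.models))
```

The model author writes only field declarations; validation, registration, and query text all come from information gathered when the class is defined.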

---

## Exercise: Designing a Mini ORM Component

Design a mini framework for data storage.

Requirements:

* Abstract base class `StorageEngine`
* Methods: `save(data)`, `load()`
* Use a metaclass to auto-register implementations
* Implement two engines:

  * In-memory storage
  * File-based storage
* Provide a factory function to select storage by name

Objective:

* Apply ABCs for interface enforcement
* Apply metaclasses for registration
* Simulate real framework behavior

The solution should show that:

* All engines follow the same contract
* New engines are auto-discovered
* Client code remains unchanged

---

In [18]:
# Module 4A: Multi-threading 
# • Understanding Threads and the GIL 
# • threading module essentials 
# • Synchronization using Locks, Events, Semaphores, Queues 
# • Thread-safe data access and deadlock avoidance

In [19]:
# What is a thread?
# A thread is a smaller unit of work that runs inside a program.
# A single program (process) can run multiple threads that share the same memory.

# What is multi-threading?
# Multithreading means running more than one thread in the same program so tasks can overlap.

# Why do we need threads?
# Threads are useful when your program spends time waiting, like:
# - waiting for network (API calls)
# - waiting for files/database
# - waiting/sleeping
# While one thread is waiting, another thread can do work.

# Important Terms:
# 1. Process: a running program with its own memory (heavier)
# 2. Thread: lightweight worker inside a process (shares memory)
# 3. Concurrency: tasks overlap in time (switching happens)
# 4. Parallelism: tasks run at the exact same time on multiple CPU cores

In [20]:
# Normal program (single-thread)

import time

def task():
    print("Task started")
    time.sleep(2)  # Pretend we are waiting for I/O
    print("Task finished")

start = time.time()
task()
task()
end = time.time()

print("Time taken:", end - start)
# OUTPUT (approx):
# Task started
# Task finished
# Task started
# Task finished
# Time taken: ~4.0


Task started
Task finished
Task started
Task finished
Time taken: 4.007073879241943


In [21]:
# Same thing with threads (multithreading)

import threading
import time

def task():
    print("Task started")
    time.sleep(2)   # waiting work (I/O-like)
    print("Task finished")

start = time.time()

t1 = threading.Thread(target=task)  # create thread 1 (it will run task)
t2 = threading.Thread(target=task)  # create thread 2

t1.start()  # start thread 1
t2.start()  # start thread 2

t1.join()   # wait until thread 1 finishes
t2.join()   # wait until thread 2 finishes

end = time.time()
print("Time taken:", end - start)

# OUTPUT (approx, order can vary):
# Task started
# Task started
# Task finished
# Task finished
# Time taken: ~2.0


Task started
Task started
Task finished
Task finished
Time taken: 2.005908250808716


In [22]:
# Core Thread Concepts
# start() vs join()
# start() begins the thread’s work.
# join() makes the main program wait until the thread completes.

import threading
import time

def task():
    print("Worker: starting")
    time.sleep(1)
    print("Worker: done")

t = threading.Thread(target=task)

print("Main: before start")
t.start()
print("Main: after start (worker may still be running)")
t.join()
print("Main: after join (worker must be finished now)")

# OUTPUT (order may slightly vary, but join guarantees final order):
# Main: before start
# Worker: starting
# Main: after start (worker may still be running)
# Worker: done
# Main: after join (worker must be finished now)


Main: before start
Worker: starting
Main: after start (worker may still be running)
Worker: done
Main: after join (worker must be finished now)


In [23]:
# 2) Passing arguments to a thread

import threading

def greet(name):
    print("Hello", name)

t = threading.Thread(target=greet, args=("Asha",))  # args must be a tuple
t.start()
t.join()

# OUTPUT:
# Hello Asha


Hello Asha


In [25]:
# 3) Getting results back from threads
# Threads don’t directly “return” like a normal function call.
# Use a Queue (thread-safe) to collect results.

import threading
from queue import Queue

def square(n, out_q):
    out_q.put(n * n)  # safely store result for main thread

q = Queue()

t1 = threading.Thread(target=square, args=(3, q))
t2 = threading.Thread(target=square, args=(4, q))

t1.start(); t2.start()
t1.join();  t2.join()

print("Results:", q.get(), q.get())
# OUTPUT (order may vary):
# Results: 9 16

Results: 9 16
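
An alternative worth knowing, not used in the cell above: the standard library's `concurrent.futures.ThreadPoolExecutor` runs callables in worker threads and hands back their return values, and `executor.map` yields results in input order. A minimal sketch:

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n              # a normal return value, no queue needed

with ThreadPoolExecutor(max_workers=2) as executor:
    # map() preserves the order of the inputs, regardless of which
    # worker thread finishes first.
    results = list(executor.map(square, [3, 4]))

print("Results:", results)
```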


In [1]:
# 4) Naming threads (helps debugging)

import threading
import time

def task():
    print("Running in:", threading.current_thread().name)
    time.sleep(0.5)

t = threading.Thread(target=task, name="Worker-1")
t.start()
t.join()

# OUTPUT:
# Running in: Worker-1


Running in: Worker-1


In [None]:
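# # 5) Daemon threads (background helper threads)
# # A daemon thread will not keep the program alive.
# # Once only daemon threads are left, Python exits without waiting for them.
# # Note: inside a notebook the kernel process keeps running after the cell ends,
# # so the daemon thread below keeps printing. Run this as a script to see the exit behavior.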

# import threading
# import time

# def background():
#     while True:
#         print("Background working...")
#         time.sleep(0.5)

# t = threading.Thread(target=background, daemon=True)  # daemon thread
# t.start()

# time.sleep(1.2)
# print("Main ends now. Program may exit even though background loop is infinite.")

# # OUTPUT (approx):
# # Background working...
# # Background working...
# # Main ends now. Program may exit even though background loop is infinite.


Background working...
Background working...
Background working...
Main ends now. Program may exit even though background loop is infinite.


Background working...
Background working...
Background working...


In [2]:
# Synchronization: When threads share data
# When multiple threads access shared data, you can get race conditions (wrong results).

In [3]:
# 6) Race condition example

import threading

counter = 0

def increment():
    global counter
    for _ in range(100_000):
        counter += 1  # not an atomic operation in Python

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)

t1.start(); t2.start()
t1.join();  t2.join()

print("Counter:", counter)
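# OUTPUT: May be less than 200000 because of the race condition.
# Some CPython versions rarely switch threads in the middle of this loop, so a run can still
# print 200000 (as below); the code is unsafe either way.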


Counter: 200000


In [5]:
# 7) Fix using Lock

import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100_000):
        with lock:          # only one thread can enter this block at a time
            counter += 1

t1 = threading.Thread(target=increment)
t2 = threading.Thread(target=increment)

t1.start(); t2.start()
t1.join();  t2.join()

print("Counter:", counter)
# OUTPUT: Counter: 200000

# Locks protect shared mutable state.
# Use a lock whenever multiple threads write to the same data.
# "with lock:" is the safest pattern because the lock is released even if an exception occurs.

Counter: 200000


In [6]:
# 8) Avoiding deadlocks
# A deadlock happens when:
# thread A holds lock1 and waits for lock2
# thread B holds lock2 and waits for lock1
# Both wait forever.

# Rule: If you must use multiple locks, always acquire them in the same order everywhere.
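
A sketch of that rule, assuming two hypothetical locks that guard two hypothetical account balances. Both transfer directions take `lock_a` before `lock_b`, so neither thread can hold one lock while waiting for the other in the opposite order.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
balances = {"a": 100, "b": 100}

def transfer(src, dst, amount):
    # Same acquisition order everywhere: lock_a first, then lock_b,
    # no matter which direction the money moves.
    with lock_a:
        with lock_b:
            balances[src] -= amount
            balances[dst] += amount

def a_to_b():
    for _ in range(1000):
        transfer("a", "b", 1)

def b_to_a():
    for _ in range(1000):
        transfer("b", "a", 1)

t1 = threading.Thread(target=a_to_b)
t2 = threading.Thread(target=b_to_a)
t1.start(); t2.start()
t1.join();  t2.join()

print(balances)
```

If `b_to_a` instead took `lock_b` first, each thread could end up holding one lock and waiting forever for the other.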

In [7]:
# Thread coordination tools

In [9]:
# 9) Event (one thread signals another)
# Use Event when one thread should wait until another thread signals “go”.

import threading
import time

ready = threading.Event()

def worker():
    print("Worker: waiting for signal")
    ready.wait()  # blocks until event is set
    print("Worker: got signal, starting work")

t = threading.Thread(target=worker)
t.start()

time.sleep(1)
print("Main: sending signal")
ready.set()

t.join()

# OUTPUT:
# Worker: waiting for signal
# Main: sending signal
# Worker: got signal, starting work


Worker: waiting for signal
Main: sending signal
Worker: got signal, starting work


In [10]:
# 10) Semaphore (limit how many threads can run a section)
# Useful when you want “max N threads doing this at once”.

import threading
import time

sem = threading.Semaphore(2)  # allow only 2 threads at a time

def limited_work(i):
    with sem:
        print("Start:", i)
        time.sleep(1)
        print("End:", i)

threads = [threading.Thread(target=limited_work, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()

# OUTPUT (one possible ordering; at most 2 threads are inside the block at once, and prints from different threads may interleave):
# Start: 0
# Start: 1
# End: 0
# End: 1
# Start: 2
# Start: 3
# End: 2
# End: 3


Start:Start: 1
 0
End: 1
Start: 2
End: 0
Start: 3
End:End: 3
 2


In [13]:
# 11) Queue (best for producer-consumer patterns)
# Queue is thread-safe and the cleanest way to pass work/results.

import threading
from queue import Queue

q = Queue()

def producer():
    for item in ["A", "B", "C"]:
        q.put(item)                 # add work
        print("Produced:", item)
    q.put(None)                     # special marker to stop consumer

def consumer():
    while True:
        item = q.get()              # take work (waits if empty)
        if item is None:
            print("Consumer: stopping")
            break
        print("Consumed:", item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)

t1.start(); t2.start()
t1.join();  t2.join()

# OUTPUT (one possible ordering; producer and consumer lines may interleave):
# Produced: A
# Produced: B
# Produced: C
# Consumed: A
# Consumed: B
# Consumed: C
# Consumer: stopping


Produced:Consumed: A
 A
Produced: B
Produced: C
Consumed: B
Consumed: C
Consumer: stopping
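
With several consumers, one stop marker per consumer is needed. `Queue.task_done()` together with `Queue.join()` additionally lets the main thread wait until every item has been processed. A sketch with hypothetical names (`work`, `processed`, `processed_lock`):

```python
import threading
from queue import Queue

work = Queue()
processed = []
processed_lock = threading.Lock()

def consumer():
    while True:
        item = work.get()
        if item is None:          # sentinel: this consumer should stop
            work.task_done()
            break
        with processed_lock:      # protect the shared list
            processed.append(item * 2)
        work.task_done()          # mark this item as fully handled

consumers = [threading.Thread(target=consumer) for _ in range(2)]
for t in consumers:
    t.start()

for item in [1, 2, 3, 4]:
    work.put(item)
for _ in consumers:
    work.put(None)                # one sentinel per consumer

work.join()                       # blocks until every put() has a task_done()
for t in consumers:
    t.join()

print(sorted(processed))
```

The results are sorted before printing because the order in which the two consumers finish items is not deterministic.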


# Understanding Threads and the GIL

## What Is a Thread?

A **thread** is a unit of execution within a process.

* A **process** owns memory and resources
* A **thread** shares the process memory
* Multiple threads can exist inside a single process

In Python:

* All threads within a process share the same memory space
* Threads run concurrently and can affect each other through that shared memory

This shared-memory model makes threads lightweight, but it also introduces coordination challenges.

---

## Why Threads Are Used

Threads are commonly used to:

* Handle multiple tasks concurrently
* Keep applications responsive
* Overlap waiting operations (I/O)
* Simplify designs that require shared state

Threads are especially useful when tasks spend time **waiting**, not computing.

---

## Creating Threads in Python

Python provides the `threading` module for working with threads.

```python
import threading

def task():
    print("Task running")

t = threading.Thread(target=task)
t.start()
t.join()
```

Explanation:

* `Thread` creates a new thread
* `start()` begins execution in a separate thread
* `join()` waits for the thread to finish

At this point, the program has executed code concurrently, even if briefly.

---

## Multiple Threads Running Together

```python
import threading

def task(name):
    print(f"Task {name} running")

threads = []

for i in range(3):
    t = threading.Thread(target=task, args=(i,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
```

Observation:

* All threads share the same process
* Execution order is not guaranteed
* Output order may vary between runs

This nondeterminism is normal in multithreaded programs.

---

## Shared Memory in Threads

Threads share:

* Global variables
* Heap objects
* Module-level state

```python
counter = 0

def increment():
    global counter
    counter += 1
```

All threads see and modify the same `counter`.

This sharing is powerful but dangerous without proper synchronization.

---

## Race Conditions (Conceptual)

A **race condition** occurs when:

* Multiple threads access shared data
* At least one thread modifies the data
* Execution order affects the result

```python
counter = 0

def increment():
    global counter
    for _ in range(100_000):
        counter += 1
```

Running this in multiple threads may produce incorrect results due to overlapping operations.

This problem exists even before considering the GIL.
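
One way to see why the increment is not a single step is to inspect its bytecode with the standard `dis` module. The exact instruction names differ between CPython versions, but the global is read and written by separate instructions, and an update can be lost if another thread runs between them.

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# Collect the instruction names for the function body.
opnames = [instr.opname for instr in dis.get_instructions(increment)]
print(opnames)

# The read and the write of the global are distinct instructions.
print("LOAD_GLOBAL" in opnames, "STORE_GLOBAL" in opnames)
```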

---

## Where the GIL Fits In

The Global Interpreter Lock (GIL) controls **execution of Python bytecode**, not thread creation.

Key points already established earlier:

* Only one thread executes Python bytecode at a time
* Threads still exist and are scheduled
* The GIL protects Python’s internal memory management

Now the focus is on **how this affects real threaded programs**.

---

## Threads With CPU-Bound Work

CPU-bound work is computation-heavy and spends most of its time executing Python code.

```python
def cpu_task():
    total = 0
    for i in range(10_000_000):
        total += i
```

If multiple threads run this function:

* Threads compete for the GIL
* Only one thread makes progress at a time
* CPU cores are not fully utilized

On standard (GIL-enabled) CPython, threads do not provide a speedup for CPU-bound Python code.

---

## Threads With I/O-Bound Work

I/O-bound work spends time waiting:

* Network requests
* Disk reads
* Sleeping
* External APIs

```python
import time

def io_task():
    time.sleep(2)
```

During I/O:

* The thread releases the GIL
* Another thread can run
* Waiting time overlaps

This is where threads are effective.

---

## Script-Based Demonstration: CPU-Bound Threads

This code must be saved as `thread_cpu_demo.py` and executed from the terminal using:

```
python thread_cpu_demo.py
```

It should not be run inside a Jupyter Notebook.

```python
import threading
import time

def cpu_task():
    count = 0
    for _ in range(10_000_000):
        count += 1

start = time.time()

threads = [
    threading.Thread(target=cpu_task),
    threading.Thread(target=cpu_task)
]

for t in threads:
    t.start()

for t in threads:
    t.join()

print("Time taken:", time.time() - start)
```

Observation:

* Execution time is close to single-thread execution
* CPU cores are not effectively used
* The GIL limits parallel execution

---

## Script-Based Demonstration: I/O-Bound Threads

This code must be saved as `thread_io_demo.py` and executed from the terminal using:

```
python thread_io_demo.py
```

It should not be run inside a Jupyter Notebook.

```python
import threading
import time

def io_task():
    time.sleep(2)

start = time.time()

threads = []
for _ in range(5):
    t = threading.Thread(target=io_task)
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print("Time taken:", time.time() - start)
```

Observation:

* Total time is close to 2 seconds
* Threads overlap waiting time
* Threading improves throughput for I/O-bound tasks

---

## Thread Scheduling and Context Switching

CPython switches between threads:

* When the running thread has held the GIL for the switch interval (5 ms by default, see `sys.getswitchinterval()`) and another thread is waiting
* When a thread performs blocking I/O

This switching:

* Provides fairness
* Keeps applications responsive
* Does not guarantee execution order

Developers should never rely on a specific thread order.
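
In CPython 3, the interval after which a running thread is asked to give up the GIL is exposed through `sys.getswitchinterval()` and `sys.setswitchinterval()`. A small sketch that reads it, changes it, and restores it:

```python
import sys

original = sys.getswitchinterval()   # seconds; CPython's default is 0.005
print("Switch interval:", original)

sys.setswitchinterval(0.001)         # request more frequent switches
shorter = sys.getswitchinterval()

sys.setswitchinterval(original)      # restore the previous setting
restored = sys.getswitchinterval()
print(shorter, restored)
```

Tuning this value is rarely needed; it mainly helps when reproducing timing-sensitive bugs.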

---

## Common Misconceptions About Threads and the GIL

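* On standard (GIL-enabled) CPython, threads do not run Python bytecode in parallel on multiple cores
* Threads are not useless; they are effective for I/O
* The GIL does not prevent concurrency, only CPU parallelism
* Removing the GIL requires deep changes to how the interpreter manages memory; CPython 3.13 added an optional, experimental free-threaded build (PEP 703), but the default build still has a GIL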

Understanding these points prevents incorrect architectural decisions.

---

## When Threads Are the Right Choice

Threads are suitable when:

* Tasks are I/O-bound
* Shared memory simplifies design
* Latency matters more than throughput
* External systems dominate execution time

Examples:

* Web request handling
* Network clients
* Log ingestion
* API orchestration

---

## When Threads Are the Wrong Choice

Threads should be avoided when:

* Work is CPU-heavy
* Computation dominates execution
* True parallelism is required

In these cases, the following are more appropriate:

* Multiprocessing
* Native extensions
* Vectorized libraries

---

## Exercise: Thread Behavior Analysis

Create two Python scripts.

Script 1:

* Use multiple threads to perform a CPU-heavy loop
* Measure execution time

Script 2:

* Use multiple threads to perform a blocking operation (sleep or I/O)
* Measure execution time

Objective:

* Observe how the GIL affects CPU-bound threads
* Observe how threads overlap I/O-bound work
* Decide when threading is appropriate

The results should clearly show different behavior depending on workload type.

---

# `threading` Module Essentials

## Purpose of the `threading` Module

The `threading` module provides **high-level tools** for working with threads in Python. It allows programs to:

* Run multiple tasks concurrently
* Share memory between threads
* Coordinate execution safely
* Manage thread lifecycle explicitly

Unlike low-level thread APIs, `threading` is designed to be:

* Readable
* Portable
* Safer by default

Understanding these essentials is required before working with synchronization and thread safety.

---

## Creating and Starting a Thread

The most basic unit in the `threading` module is the `Thread` class.

```python
import threading

def task():
    print("Task running")

t = threading.Thread(target=task)
t.start()
t.join()
```

Explanation:

* `target` is the callable executed by the thread
* `start()` schedules the thread for execution
* `join()` blocks until the thread finishes

A thread does nothing until `start()` is called.

---

## Passing Arguments to Threads

Threads often need input data.

```python
import threading

def task(name):
    print(f"Task {name} running")

t = threading.Thread(target=task, args=("A",))
t.start()
t.join()
```

Explanation:

* `args` is a tuple of positional arguments
* Keyword arguments can be passed using `kwargs`
* The function signature remains unchanged
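
Keyword arguments work the same way. A short sketch, where the parameters `repeat` and `prefix` and the list `messages` are illustrative additions:

```python
import threading

messages = []

def task(name, repeat=1, prefix="Task"):
    for _ in range(repeat):
        line = f"{prefix} {name} running"
        messages.append(line)
        print(line)

t = threading.Thread(
    target=task,
    args=("A",),                            # positional arguments
    kwargs={"repeat": 2, "prefix": "Job"},  # keyword arguments
)
t.start()
t.join()
```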

---

## Running Multiple Threads

```python
import threading

def worker(i):
    print(f"Worker {i} started")

threads = []

for i in range(3):
    t = threading.Thread(target=worker, args=(i,))
    threads.append(t)
    t.start()

for t in threads:
    t.join()
```

Observation:

* Threads run concurrently
* Output order is not guaranteed
* All threads complete before the program exits

Thread execution order should never be relied upon.

---

## Thread Naming

Threads can be given names to improve debugging and logging.

```python
import threading

def task():
    print(threading.current_thread().name)

t = threading.Thread(target=task, name="WorkerThread")
t.start()
t.join()
```

Explanation:

* Each thread has a name
* Names help identify threads in logs and debuggers
* Default names are auto-generated if not specified

---

## Getting the Current Thread

```python
import threading

def task():
    current = threading.current_thread()
    print(current.name, current.ident)

t = threading.Thread(target=task)
t.start()
t.join()
```

Explanation:

* `current_thread()` returns the running thread object
* `ident` is an integer identifier that is unique among live threads (it may be reused after a thread exits)
* Useful for diagnostics and logging

---

## Daemon vs Non-Daemon Threads

Threads can be either **daemon** or **non-daemon**.

* Non-daemon threads keep the program alive
* Daemon threads do not block program exit

```python
import threading
import time

def background_task():
    while True:
        print("Running in background")
        time.sleep(1)

t = threading.Thread(target=background_task, daemon=True)
t.start()
```

Explanation:

* The program exits even if the daemon thread is still running
* Daemon threads are suitable for background helpers
* Important work should not be done in daemon threads
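
When a background loop does work that must finish cleanly, one common alternative to a daemon thread is a regular thread that checks a stop flag. The sketch below assumes a `threading.Event` named `stop_requested`; the names and timings are illustrative.

```python
import threading
import time

stop_requested = threading.Event()
ticks = []

def background_task():
    # Loop until the main thread asks for shutdown.
    while not stop_requested.is_set():
        ticks.append(time.time())
        # wait() doubles as an interruptible sleep.
        stop_requested.wait(timeout=0.05)

t = threading.Thread(target=background_task)  # non-daemon on purpose
t.start()

time.sleep(0.2)        # let the worker run briefly
stop_requested.set()   # request shutdown
t.join()               # wait for the loop to exit cleanly

print(f"Worker stopped after {len(ticks)} iterations")
```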

---

## Checking Thread State

```python
import threading
import time

def task():
    time.sleep(1)

t = threading.Thread(target=task)
t.start()

print(t.is_alive())
t.join()
print(t.is_alive())
```

Explanation:

* `is_alive()` checks if the thread is still running
* Useful for monitoring thread lifecycle

---

## Thread Lifecycle Summary

A thread typically goes through these states:

* Created
* Started
* Running
* Finished

Once a thread finishes:

* It cannot be restarted
* A new `Thread` object must be created
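
The restart rule can be observed directly. A small sketch, where `restart_error` is an illustrative variable used to capture the exception:

```python
import threading

def task():
    pass

t = threading.Thread(target=task)
t.start()
t.join()

try:
    t.start()  # second start() on the same object
except RuntimeError as exc:
    restart_error = exc
    print("Cannot restart:", exc)
else:
    restart_error = None

# A fresh Thread object is needed to run the task again.
t2 = threading.Thread(target=task)
t2.start()
t2.join()
```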

---

## Common Mistake: Forgetting `join()`

```python
import threading

def task():
    print("Task running")

threading.Thread(target=task).start()
print("Main thread finished")
```

Explanation:

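* The main thread's code does not wait for the worker: "Main thread finished" may print before "Task running"
* The interpreter still waits for non-daemon threads at exit, but main-thread code that depends on the worker's result can run too early and see incomplete data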
* `join()` ensures predictable shutdown
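
The effect on results can be sketched as follows. The `results` list and the 0.1 second delay are illustrative; the snapshot taken before `join()` is very likely empty, but that depends on timing.

```python
import threading
import time

results = []

def task():
    time.sleep(0.1)          # simulate slow work
    results.append("done")

t = threading.Thread(target=task)
t.start()

before_join = list(results)  # snapshot taken while the worker is likely still sleeping
t.join()                     # wait for the worker
after_join = list(results)

print("Before join:", before_join)
print("After join: ", after_join)
```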

---

## Script-Based Demonstration: Thread Lifecycle

This code must be saved as `thread_lifecycle_demo.py` and executed from the terminal using:

```
python thread_lifecycle_demo.py
```

It should not be run inside a Jupyter Notebook.

```python
import threading
import time

def worker():
    print("Worker started")
    time.sleep(2)
    print("Worker finished")

t = threading.Thread(target=worker, name="Worker-1")
print("Thread created")

t.start()
print("Thread started")

t.join()
print("Thread joined, program exiting")
```

Observation:

* Thread creation does not start execution
* `start()` triggers execution
* `join()` ensures completion before exit

---

## Practical Guidelines for Using Threads

Threads should be:

* Short-lived when possible
* Clearly named
* Properly joined
* Used primarily for I/O-bound tasks

Avoid:

* Long-running daemon threads doing critical work
* Relying on execution order
* Modifying shared data without synchronization

Synchronization mechanisms will be covered next.

---

## Exercise: Basic Thread Management

Create a Python script that:

* Starts three threads
* Each thread prints its name and sleeps for a different duration
* The main thread waits for all threads to complete
* Prints a final message after all threads finish

Objective:

* Practice thread creation and lifecycle management
* Observe nondeterministic execution order
* Use `join()` correctly

The output should demonstrate that:

* Threads run concurrently
* The main thread waits for all workers
* Program exits cleanly

---

# Synchronization Using Locks, Events, Semaphores, and Queues

## Why Synchronization Is Needed

When multiple threads run at the same time, they often **share data**. Shared data can be:

* Variables
* Lists or dictionaries
* Files
* Network connections
* Any object in memory

Without coordination, multiple threads may:

* Read and write data at the same time
* Interfere with each other’s operations
* Produce incorrect or inconsistent results

**Synchronization** is the set of techniques used to control how threads access shared resources so that execution remains correct and predictable.

---

## The Core Problem: Race Conditions

A **race condition** occurs when:

* Multiple threads access shared data
* At least one thread modifies the data
* The final result depends on the execution order

Example:

```python
counter = 0

def increment():
    global counter
    counter += 1
```

If multiple threads run `increment()` at the same time:

* The operations can overlap
* Some increments may be lost
* The final value may be incorrect

This happens even though each line of code looks simple.
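
Lost updates from a bare `counter += 1` can be hard to reproduce on demand. The sketch below makes the race visible by splitting the update into an explicit read and write, with `time.sleep(0)` in between to invite a thread switch. This is a deliberately exaggerated illustration, and the exact final value varies between runs.

```python
import threading
import time

counter = 0
ITERATIONS = 1_000

def unsafe_increment():
    global counter
    for _ in range(ITERATIONS):
        current = counter      # step 1: read
        time.sleep(0)          # give other threads a chance to run here
        counter = current + 1  # step 2: write back a possibly stale value

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

expected = 2 * ITERATIONS
print(f"Expected {expected}, got {counter}")
```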

---

## Why the GIL Does Not Solve This Problem

Even with the Global Interpreter Lock:

* Threads can switch between bytecode instructions
* Operations like `counter += 1` are not atomic
* Partial updates can still interleave

Therefore:

* The GIL protects Python’s internals
* It does not protect your application data

Synchronization is still required.
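
The non-atomic nature of `counter += 1` can be seen in its bytecode. A minimal sketch using the standard `dis` module; the exact instruction names vary between Python versions, so no specific listing is shown here.

```python
import dis

counter = 0

def increment():
    global counter
    counter += 1

# List the bytecode instructions behind the single source line.
opnames = [instr.opname for instr in dis.get_instructions(increment)]
print(opnames)
```

The listing contains a separate load and store of the global, and a thread switch can happen between them.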

---

## Lock: The Most Basic Synchronization Tool

A **Lock** ensures that only **one thread at a time** can execute a specific section of code.

Locks are used to protect **critical sections**, which are code blocks that access shared data.

---

## Creating and Using a Lock

```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    with lock:
        counter += 1
```

Explanation:

* A thread must acquire the lock before entering the `with` block
* Other threads wait until the lock is released
* The lock is released automatically when the block exits

This guarantees that no increment is lost, provided every piece of code that touches `counter` uses the same lock.

---

## Lock Behavior in Practice

```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100_000):
        with lock:
            counter += 1

threads = [
    threading.Thread(target=increment),
    threading.Thread(target=increment)
]

for t in threads:
    t.start()

for t in threads:
    t.join()

print(counter)
```

Observation:

* The final value is correct
* Performance is slower than unsynchronized code
* Correctness is more important than speed

---

## Event: Signaling Between Threads

An **Event** is used for **communication**, not protection.

It allows:

* One thread to signal something has happened
* Other threads to wait until the signal is set

An Event has two states:

* Unset (False)
* Set (True)

---

## Using an Event

```python
import threading
import time

event = threading.Event()

def wait_for_event():
    print("Waiting for event")
    event.wait()
    print("Event received")

def trigger_event():
    time.sleep(2)
    event.set()

threading.Thread(target=wait_for_event).start()
threading.Thread(target=trigger_event).start()
```

Explanation:

* `wait()` blocks until the event is set
* `set()` wakes all waiting threads
* Useful for coordination, not mutual exclusion
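
`wait()` also accepts a timeout and reports whether the event was set. A short sketch, where the event name `ready` and the result variables are illustrative:

```python
import threading

ready = threading.Event()

# wait() with a timeout returns False if the event was not set in time.
timed_out_result = ready.wait(timeout=0.1)
print("Set within timeout?", timed_out_result)

ready.set()
set_result = ready.wait(timeout=0.1)   # returns immediately once set
print("Set within timeout?", set_result)

ready.clear()                          # reset so the event can be reused
print("Is set after clear?", ready.is_set())
```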

---

## Semaphore: Controlling Access Count

A **Semaphore** controls how many threads can access a resource **at the same time**.

Unlike a Lock:

* A Lock allows only one thread
* A Semaphore allows a fixed number of threads

---

## Using a Semaphore

```python
import threading
import time

semaphore = threading.Semaphore(2)

def access_resource(name):
    with semaphore:
        print(f"{name} accessing resource")
        time.sleep(1)
        print(f"{name} leaving resource")

threads = [
    threading.Thread(target=access_resource, args=(f"Thread-{i}",))
    for i in range(4)
]

for t in threads:
    t.start()

for t in threads:
    t.join()
```

Observation:

* Only two threads access the resource at once
* Others wait until a slot is available
* Useful for connection pools and rate limits
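
The limit can be verified by tracking how many threads are inside the guarded block at once. In this sketch, `active`, `peak`, and `state_lock` are illustrative helpers added for measurement only.

```python
import threading
import time

LIMIT = 2
semaphore = threading.Semaphore(LIMIT)
state_lock = threading.Lock()   # protects the two counters below
active = 0
peak = 0

def access_resource():
    global active, peak
    with semaphore:
        with state_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.1)          # simulate using the resource
        with state_lock:
            active -= 1

threads = [threading.Thread(target=access_resource) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Peak concurrent users: {peak} (limit {LIMIT})")
```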

---

## Queue: Thread-Safe Data Exchange

A **Queue** is a thread-safe data structure designed for **producer-consumer** patterns.

Unlike lists:

* Queues handle synchronization internally
* No manual locks are needed
* Threads can safely add and remove items

---

## Basic Queue Usage

```python
import threading
import queue
import time

q = queue.Queue()

def producer():
    for i in range(5):
        q.put(i)
        print(f"Produced {i}")
        time.sleep(0.5)

def consumer():
    while True:
        item = q.get()
        print(f"Consumed {item}")
        q.task_done()

threading.Thread(target=producer).start()
threading.Thread(target=consumer, daemon=True).start()
```

Explanation:

* `put()` adds an item safely
* `get()` removes an item safely
* `task_done()` tells the queue that one retrieved item has been fully processed, which is what `q.join()` waits for
* Queues handle locking internally
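
`task_done()` becomes useful together with `Queue.join()`. The sketch below assumes a small pool of three daemon workers and two queues named `tasks` and `results`; all names are illustrative.

```python
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        item = tasks.get()
        try:
            results.put(item * item)
        finally:
            tasks.task_done()   # one call per get(), even if processing fails

for _ in range(3):
    threading.Thread(target=worker, daemon=True).start()

for n in range(5):
    tasks.put(n)

tasks.join()   # blocks until task_done() was called for every item

squares = sorted(results.get() for _ in range(5))
print(squares)
```

The results are sorted before printing because the order in which workers finish is not guaranteed.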

---

## Why Queues Are Preferred Over Manual Locks

Queues:

* Reduce complexity
* Prevent common synchronization bugs
* Scale well with multiple producers and consumers
* Encourage clean architectural patterns

In most cases, using a Queue is safer than sharing a list with a Lock.

---

## Script-Based Demonstration: Producer-Consumer Pattern

This code must be saved as `queue_producer_consumer.py` and executed from the terminal using:

```
python queue_producer_consumer.py
```

It should not be run inside a Jupyter Notebook.

```python
import threading
import queue
import time

q = queue.Queue()

def producer():
    for i in range(3):
        print(f"Producing {i}")
        q.put(i)
        time.sleep(1)
    q.put(None)

def consumer():
    while True:
        item = q.get()
        if item is None:
            q.task_done()  # account for the sentinel as well
            break
        print(f"Consuming {item}")
        q.task_done()

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)

t1.start()
t2.start()

t1.join()
t2.join()
```

---

## Choosing the Right Synchronization Tool

| Tool      | Purpose                            |
| --------- | ---------------------------------- |
| Lock      | Protect shared data                |
| Event     | Signal between threads             |
| Semaphore | Limit concurrent access            |
| Queue     | Safe data exchange between threads |

Choosing the correct tool simplifies design and reduces bugs.

---

## Exercise: Synchronization in Practice

Create a Python script that:

* Uses a Lock to protect a shared counter
* Uses an Event to signal worker threads to start
* Uses a Semaphore to limit concurrent execution
* Uses a Queue to pass tasks between threads

Objective:

* Practice each synchronization primitive
* Understand their different roles
* Observe safe, predictable behavior

The solution should demonstrate:

* Correct data updates
* Proper coordination
* No race conditions

---

# Thread-Safe Data Access and Deadlock Avoidance

## What Does “Thread-Safe” Mean?

A piece of code is **thread-safe** if it behaves correctly when accessed by multiple threads at the same time.

Correct behavior means:

* Data remains consistent
* No updates are lost
* No unexpected crashes occur
* Results do not depend on execution order

Thread safety is not automatic. It must be **designed deliberately**.

---

## Why Thread Safety Is Necessary

Threads share memory. This means:

* Multiple threads can read the same variable
* Multiple threads can modify the same object
* Operations can interleave unpredictably

Without thread safety:

* Data corruption can occur
* Bugs appear randomly
* Problems are difficult to reproduce

Thread safety protects **shared state**.

---

## Identifying Shared Data

Shared data includes:

* Global variables
* Objects referenced by multiple threads
* Class attributes
* Data structures passed between threads

Local variables inside a function are **not shared**: each call gets its own names. The objects they refer to are shared only if another thread also holds a reference to them.

Understanding what is shared is the first step toward safety.

---

## Example of Unsafe Data Access

```python
counter = 0

def increment():
    global counter
    counter += 1
```

If multiple threads call `increment()`:

* `counter += 1` is broken into multiple steps
* Threads can interrupt each other
* The final result may be incorrect

This is not thread-safe.

---

## Making Data Access Thread-Safe with Locks

The simplest way to make code thread-safe is to use a **Lock**.

```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    with lock:
        counter += 1
```

Explanation:

* Only one thread enters the critical section
* Other threads wait
* Data integrity is preserved

This pattern is the foundation of thread-safe design.

---

## Critical Sections

A **critical section** is:

* The smallest piece of code that accesses shared data
* The section that must not run concurrently

Good practice:

* Keep critical sections short
* Lock only what is necessary
* Avoid unnecessary blocking

This improves both safety and performance.

---

## Thread-Safe Data Structures

Some Python data structures are thread-safe, either fully or for specific simple operations.

Examples:

* `queue.Queue` (fully thread-safe)
* `collections.deque` (thread-safe appends and pops)

However:

* Compound operations are not automatically safe
* Assumptions about safety can lead to bugs

Explicit synchronization is still required for complex logic.
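
A typical compound operation is "check, then act". The sketch below guards a check-and-insert on a dictionary with one lock; `cache`, `expensive`, and `compute_calls` are hypothetical names used only for illustration.

```python
import threading

cache = {}
cache_lock = threading.Lock()
compute_calls = []

def expensive(key):
    compute_calls.append(key)   # record how often the value is computed
    return key.upper()

def get_or_compute(key):
    # The membership test and the assignment form one compound operation,
    # so both happen under the same lock.
    with cache_lock:
        if key not in cache:
            cache[key] = expensive(key)
        return cache[key]

threads = [threading.Thread(target=get_or_compute, args=("a",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(cache, "computed", len(compute_calls), "time(s)")
```

Without the lock, two threads could both see the key as missing and compute the value twice.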

---

## Avoiding Shared State When Possible

The safest thread-safe code is code that:

* Avoids shared mutable state
* Uses message passing instead of shared variables
* Communicates through Queues

This approach reduces the need for locks and simplifies reasoning.

---

## What Is a Deadlock?

A **deadlock** occurs when:

* Two or more threads are waiting for each other
* Each thread holds a resource the other needs
* No thread can proceed

The program stops making progress indefinitely.

---

## Simple Deadlock Example

```python
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def task_one():
    with lock_a:
        time.sleep(1)
        with lock_b:
            print("Task one complete")

def task_two():
    with lock_b:
        time.sleep(1)
        with lock_a:
            print("Task two complete")
```

Explanation (when `task_one` and `task_two` run in separate threads at the same time):

* `task_one` holds `lock_a` and waits for `lock_b`
* `task_two` holds `lock_b` and waits for `lock_a`
* Both threads are blocked forever

This is a deadlock.

---

## Why Deadlocks Are Dangerous

Deadlocks:

* Do not raise errors
* Do not crash the program
* Appear as “hung” applications
* Are difficult to debug

Preventing deadlocks is easier than fixing them.

---

## Deadlock Avoidance Rule #1: Lock Ordering

Always acquire multiple locks in the **same order**.

```python
def safe_task():
    with lock_a:
        with lock_b:
            print("Safe execution")
```

If all threads follow the same order:

* Circular waiting cannot occur
* Deadlocks are avoided

This is the most important deadlock prevention technique.
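
One way to enforce a single order is to sort the locks before acquiring them. The helper below is a sketch of that idea, not a standard library API: `acquire_in_order` is a hypothetical function, and sorting by `id()` is just one possible ordering key.

```python
import threading
from contextlib import ExitStack

lock_a = threading.Lock()
lock_b = threading.Lock()
completed = []

def acquire_in_order(stack, *locks):
    # Sort by id() so every caller uses the same global order,
    # regardless of the order in which the locks were passed.
    for lock in sorted(locks, key=id):
        stack.enter_context(lock)

def task(name, first, second):
    with ExitStack() as stack:
        acquire_in_order(stack, first, second)
        completed.append(name)   # both locks are held here

t1 = threading.Thread(target=task, args=("one", lock_a, lock_b))
t2 = threading.Thread(target=task, args=("two", lock_b, lock_a))  # reversed arguments
t1.start()
t2.start()
t1.join()
t2.join()

print(sorted(completed))
```

`ExitStack` releases the locks in reverse order when the `with` block exits.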

---

## Deadlock Avoidance Rule #2: Minimize Lock Scope

Holding locks longer than necessary increases deadlock risk.

Good practice:

* Acquire lock as late as possible
* Release lock as early as possible
* Avoid long operations inside locks

This reduces contention and waiting.

---

## Deadlock Avoidance Rule #3: Avoid Nested Locks When Possible

Nested locks increase complexity.

Instead of:

```python
with lock_a:
    with lock_b:
        ...
```

Consider:

* Merging critical sections
* Using a single lock
* Redesigning data flow

Simpler locking leads to safer code.

---

## Deadlock Avoidance Rule #4: Use Timeouts

Locks can be acquired with timeouts.

```python
acquired = lock.acquire(timeout=1)
if acquired:
    try:
        print("Lock acquired")
    finally:
        lock.release()
```

Explanation:

* Threads do not wait forever
* Timeouts allow recovery logic
* Useful in high-availability systems
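
Timeouts are most useful when the failure branch is handled as well. The sketch below tries to take two locks and gives up cleanly if the second is unavailable; `try_both` is a hypothetical helper, and the main thread holds `lock_b` only to simulate contention.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()

def try_both(timeout=0.1):
    """Return True if both locks were taken and the work ran, else False."""
    if not lock_a.acquire(timeout=timeout):
        return False
    try:
        if not lock_b.acquire(timeout=timeout):
            return False        # give up; the finally below releases lock_a
        try:
            return True         # critical section would go here
        finally:
            lock_b.release()
    finally:
        lock_a.release()

lock_b.acquire()                # simulate another thread holding lock_b
blocked = try_both()
lock_b.release()
free = try_both()

print("While lock_b was held:", blocked)
print("After lock_b was released:", free)
```

A caller that receives `False` can retry later, log the contention, or fall back to another strategy.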

---

## Using Context Managers for Safety

Always prefer `with lock:` syntax.

Benefits:

* Automatic release
* Exception-safe behavior
* Clear and readable code

This reduces mistakes that lead to deadlocks.
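
The exception-safety benefit can be checked directly. A minimal sketch, where `risky_update` is an illustrative function that always fails inside the critical section:

```python
import threading

lock = threading.Lock()

def risky_update():
    with lock:
        raise ValueError("something went wrong inside the critical section")

try:
    risky_update()
except ValueError:
    pass

# The lock was released even though the block raised.
print("Lock still held?", lock.locked())

# The with-statement above behaves like this manual form:
lock.acquire()
try:
    pass  # critical section
finally:
    lock.release()
```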

---

## Script-Based Demonstration: Deadlock and Safe Access

This code must be saved as `deadlock_demo.py` and executed from the terminal using:

```
python deadlock_demo.py
```

It should not be run inside a Jupyter Notebook.

```python
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def safe_task():
    with lock_a:
        time.sleep(0.5)
        with lock_b:
            print("Safe task completed")

t1 = threading.Thread(target=safe_task)
t2 = threading.Thread(target=safe_task)

t1.start()
t2.start()

t1.join()
t2.join()
```

Observation:

* Both threads complete successfully
* Lock ordering prevents deadlock

---

## Real-World Best Practices

* Protect shared data with locks
* Prefer queues over shared state
* Keep critical sections small
* Follow consistent lock ordering
* Avoid unnecessary nested locks
* Design for simplicity

Thread-safe code is easier to reason about when structure is clean.

---

## Exercise: Safe Access and Deadlock Prevention

Create a Python script that:

* Uses two shared resources
* Protects each resource with a lock
* Ensures all threads acquire locks in the same order
* Demonstrates safe execution without deadlock

Objective:

* Practice identifying critical sections
* Apply deadlock avoidance rules
* Observe predictable behavior

The solution should clearly show that:

* Data remains consistent
* Threads do not block indefinitely
* Execution completes successfully

---

In [14]:
# Module 4B: Multi-processing
# • Using the multiprocessing module
# • Process Pools and shared memory objects
# • Inter-process communication (IPC) and data serialization
# • Best use cases for processes vs threads

In [15]:
# What is Multiprocessing?
# Multiprocessing means running your work using multiple processes (separate Python programs) at the same time.
# A process is a running program with its own memory
# Each process can run on a different CPU core
# This gives true parallel execution for CPU-heavy work

# Why do we need multiprocessing?
# Because Python threads don’t speed up CPU-bound code well in CPython due to the GIL.
# Multiprocessing solves this by using separate processes, each with its own GIL.

# Use multiprocessing when the work is:
# - heavy calculations
# - large loops
# - data processing
# - CPU-bound tasks (compression, encryption, simulations)

# Key Terms:
# 1. Process: independent worker program
# 2. Main process: your original Python program
# 3. Child process: a new process created by main
# 4. IPC (Inter-Process Communication): ways to share data between processes (Queue, Pipe, shared memory)
# 5. Pickling: Python converts objects to bytes to send them to another process (important limitation)

In [17]:
# # The simplest multiprocessing example (start + join)

# import multiprocessing
# import time

# def task():
#     print("Child process: starting work")
#     time.sleep(2)  # simulate work
#     print("Child process: finished work")

# if __name__ == "__main__":
#     # This guard is required on Windows/macOS (spawn mode)
#     start = time.time()

#     p = multiprocessing.Process(target=task)  # create a new process to run task()
#     p.start()                                 # start the child process
#     p.join()                                  # wait for child process to finish

#     end = time.time()
#     print("Time taken:", end - start)

# # OUTPUT (approx):
# # Child process: starting work
# # Child process: finished work
# # Time taken: ~2.0


In [None]:
# B) Why multiprocessing helps CPU-bound work (real benefit)

# CPU-heavy function
import multiprocessing
import time

def cpu_task(n):
    total = 0
    for i in range(n):
        total += i
    return total

# But processes can’t directly “return” like normal calls, so we use a Queue to get results.

In [None]:
# Using Queue to collect results

import multiprocessing
import time

def cpu_task(n, out_q):
    total = 0
    for i in range(n):
        total += i
    out_q.put(total)  # send result back to main process

if __name__ == "__main__":
    start = time.time()

    q = multiprocessing.Queue()  # process-safe queue

    p1 = multiprocessing.Process(target=cpu_task, args=(50_000_000, q))
    p2 = multiprocessing.Process(target=cpu_task, args=(50_000_000, q))

    p1.start()
    p2.start()

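    # Read results before joining: a child that has put data on a queue
    # may not exit until that data has been consumed.
    r1 = q.get()  # first result to arrive (from whichever process finishes first)
    r2 = q.get()  # second result to arrive

    p1.join()
    p2.join()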

    end = time.time()
    print("Two results received:", r1, r2)
    print("Time taken:", end - start)


In [None]:
# The easiest high-level multiprocessing tool: Pool

# When you have many inputs and want to run the same function on them, use Pool.

# C) Pool.map()
import multiprocessing
import os

def square(x):
    # os.getpid() shows which process is running this
    print("Squaring", x, "in process", os.getpid())
    return x * x

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(square, numbers)  # run square() on each item in parallel

    print("Results:", results)

# OUTPUT (pids, line order, and the assignment of items to workers will differ between runs):
# Squaring 1 in process 12345
# Squaring 2 in process 12346
# Squaring 3 in process 12345
# Squaring 4 in process 12346
# Squaring 5 in process 12345
# Results: [1, 4, 9, 16, 25]


# Pool creates a fixed number of worker processes
# map distributes tasks across workers
# Returns results in the same order as inputs

In [None]:
# Passing data to processes

# D) Why normal variables don’t update across processes
# Processes do not share memory. Example:

import multiprocessing

counter = 0

def increment():
    global counter
    counter += 1  # this changes counter only inside the child process

if __name__ == "__main__":
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()

    print("Counter in main process:", counter)
    # OUTPUT: Counter in main process: 0


# global ≠ shared across processes
# Changes in a child process never affect the parent unless you explicitly share data

# To share data between processes, use:
# multiprocessing.Value
# multiprocessing.Array
# multiprocessing.Queue
# multiprocessing.Manager

In [None]:
# Sharing data properly between processes

# E) Use multiprocessing.Value for simple shared numbers

import multiprocessing

def increment(shared_counter):
    # shared_counter is a special object stored in shared memory
    with shared_counter.get_lock():      # lock to avoid race conditions
        shared_counter.value += 1

if __name__ == "__main__":
    counter = multiprocessing.Value("i", 0)  # "i" = integer

    p1 = multiprocessing.Process(target=increment, args=(counter,))
    p2 = multiprocessing.Process(target=increment, args=(counter,))

    p1.start(); p2.start()
    p1.join();  p2.join()

    print("Shared counter:", counter.value)
    # OUTPUT: Shared counter: 2


# Normal globals are not shared between processes
# Use multiprocessing.Value for shared scalars
# Always protect shared memory with a lock
# This pattern enables safe inter-process mutation

In [None]:
# F) Use multiprocessing.Queue for messages/results

import multiprocessing

def worker(out_q):
    out_q.put("Hello from child process")

if __name__ == "__main__":
    q = multiprocessing.Queue()

    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    msg = q.get()   # read message from child
    p.join()

    print("Received:", msg)
    # OUTPUT: Received: Hello from child process

# Use multiprocessing.Queue to pass data between processes
# put() sends data
# get() receives data
# join() waits for the child process to exit

In [20]:
# Common rules to be followed:

# 1) Always use if __name__ == "__main__":
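# Without it, start methods that re-import the main module (spawn, the default on Windows and macOS)
# re-run the process-creation code in every child; current Python versions detect this and raise a RuntimeError.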

# 2) Only send “picklable” objects
# Processes communicate by serializing data (pickling).
# Some objects cannot be pickled (open file handles, lambdas, functions defined inside other functions).

# 3) Processes are heavier than threads
# - Higher startup cost
# - More memory usage
# - So use multiprocessing mainly for CPU-bound work.

# Using the `multiprocessing` Module

## Why Multiprocessing Is Needed

So far, execution has been discussed using **threads**. Threads share memory and are limited by the **Global Interpreter Lock (GIL)** when executing CPU-bound Python code.

Multiprocessing exists to solve a different problem.

Multiprocessing allows a program to:

* Run **multiple processes**
* Use **multiple CPU cores**
* Execute Python code truly in parallel
* Avoid the GIL entirely

Each process has its **own Python interpreter and memory space**.

---

## Process vs Thread (Starting From Basics)

### Thread

* Runs inside a process
* Shares memory with other threads
* Limited by the GIL for CPU-bound work
* Lightweight and fast to create

### Process

* Independent execution unit
* Own memory space
* Own Python interpreter
* Can run on a separate CPU core
* Heavier than threads but truly parallel

Multiprocessing is the correct tool for **CPU-bound workloads**.

---

## What Is the `multiprocessing` Module?

The `multiprocessing` module is part of Python’s standard library.

It provides:

* Process creation similar to threads
* Inter-process communication (IPC)
* Shared memory primitives
* Process pools for parallel execution

Its API is intentionally similar to the `threading` module, making it easier to learn.

---

## Basic Concept: Running Code in Another Process

When using multiprocessing:

* The main program starts one or more child processes
* Each child process runs independently
* Memory is not shared automatically

This isolation prevents race conditions but introduces communication challenges.

---

## Creating a Process (Minimal Example)

```python
import multiprocessing

def task():
    print("Running in a separate process")

if __name__ == "__main__":
    p = multiprocessing.Process(target=task)
    p.start()
    p.join()
```

Explanation:

* `Process` creates a new process
* `start()` launches it
* `join()` waits for it to finish
* The `if __name__ == "__main__"` guard is mandatory

---

## Why the `__main__` Guard Is Required

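With start methods that launch a fresh interpreter (`spawn`, the default on Windows and macOS):

* A new process imports the main module
* Without the guard, the process creation code runs again during that import
* Current Python versions detect this and raise a `RuntimeError` rather than spawning processes recursively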

The guard ensures that:

* Process creation runs only in the main program
* Child processes execute only the target function

This is a critical rule in multiprocessing.

---

## Running Multiple Processes

```python
import multiprocessing

def worker(name):
    print(f"Worker {name} running")

if __name__ == "__main__":
    processes = []

    for i in range(3):
        p = multiprocessing.Process(target=worker, args=(i,))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()
```

Observation:

* Multiple processes run concurrently
* Output order is not guaranteed
* Each process runs independently

---

## Memory Isolation Between Processes

Processes do **not share memory by default**.

```python
import multiprocessing

counter = 0

def increment():
    global counter
    counter += 1
    print(counter)

if __name__ == "__main__":
    p = multiprocessing.Process(target=increment)
    p.start()
    p.join()
    print("Main counter:", counter)
```

Explanation:

* The child process modifies its own copy of `counter`
* The main process remains unchanged
* No shared state exists automatically

This behavior avoids many threading problems but requires explicit communication.

---

## Passing Data to Processes

Arguments can be passed to processes at creation time.

```python
import multiprocessing

def square(n):
    print(n * n)

if __name__ == "__main__":
    p = multiprocessing.Process(target=square, args=(5,))
    p.start()
    p.join()
```

Data passed as arguments is **copied** (serialized) into the child process.
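
Whether an object can be serialized can be checked with `pickle` before handing it to a process. The helper `can_pickle` below is a hypothetical convenience function, not part of `multiprocessing`; it only tests `pickle.dumps`.

```python
import math
import pickle
import threading

def can_pickle(obj):
    """Return True if obj survives pickle.dumps, else False."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

print(can_pickle({"n": 5, "values": [1, 2, 3]}))  # plain data
print(can_pickle(math.sqrt))                      # importable function
print(can_pickle(lambda x: x * x))                # anonymous function
print(can_pickle(threading.Lock()))               # OS-level resource
```

Plain data and importable functions serialize; lambdas and lock objects do not.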

---

## Returning Results From a Process

Processes cannot return values directly like function calls.

Instead, multiprocessing provides:

* Queues
* Pipes
* Shared memory objects

---

## Using a Queue for Inter-Process Communication

```python
import multiprocessing

def worker(q):
    q.put("Result from process")

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()
```

Explanation:

* `Queue` is process-safe
* Data is serialized and transferred between processes
* This is the most common IPC mechanism

---

## Multiple Processes Using a Queue

```python
import multiprocessing

def worker(num, q):
    q.put(num * num)

if __name__ == "__main__":
    q = multiprocessing.Queue()
    processes = []

    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i, q))
        processes.append(p)
        p.start()

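    # Collect exactly one result per process. Reading before join() avoids
    # blocking on children that still hold unflushed queue data, and avoids
    # relying on q.empty(), which is not reliable across processes.
    for _ in processes:
        print(q.get())

    for p in processes:
        p.join()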
```

Observation:

* Results arrive in unpredictable order
* All processes execute independently
* The queue safely collects results

---

## Shared Memory Objects

When data must be shared, multiprocessing provides controlled shared memory.

### Shared Value

```python
import multiprocessing

def increment(shared_value):
    with shared_value.get_lock():
        shared_value.value += 1

if __name__ == "__main__":
    value = multiprocessing.Value("i", 0)
    processes = []

    for _ in range(5):
        p = multiprocessing.Process(target=increment, args=(value,))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    print("Final value:", value.value)
```

Explanation:

* `Value` creates shared memory
* A lock is used to ensure safety
* Shared memory must still be synchronized

---

## Shared Arrays

```python
import multiprocessing

def modify(arr):
    arr[0] += 1

if __name__ == "__main__":
    arr = multiprocessing.Array("i", [0, 1, 2])
    p = multiprocessing.Process(target=modify, args=(arr,))
    p.start()
    p.join()
    print(list(arr))
```

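Shared arrays suit small, fixed-size numeric state. When several processes write to the same array, updates should still be wrapped in `arr.get_lock()`.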

---

## Process Pools: Managing Multiple Workers Easily

Manually managing processes becomes complex at scale.

A **process pool**:

* Manages a fixed number of worker processes
* Distributes tasks automatically
* Reuses processes efficiently

---

## Using `multiprocessing.Pool`

```python
import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(square, [1, 2, 3, 4, 5])
        print(results)
```

Explanation:

* A pool of worker processes is created
* Tasks are distributed automatically
* Results are collected in order

This is the most common multiprocessing pattern.

---

## When to Use `map`, `apply`, and `apply_async`

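* `map`: parallel version of built-in `map`; blocks and returns a list in input order
* `apply`: runs one call in a worker and blocks until it returns
* `apply_async`: submits one call and returns immediately with a result handle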

```python
with multiprocessing.Pool(2) as pool:
    result = pool.apply(square, (5,))
    print(result)
```
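
To make the difference concrete, the sketch below runs the same `square` function through all three calls. The helper name `run_all_three`, the pool size, and the inputs are illustrative assumptions, not part of the original material.

```python
import multiprocessing

def square(n):
    return n * n

def run_all_three():
    with multiprocessing.Pool(processes=2) as pool:
        # map: blocks until every input is processed; results keep input order
        mapped = pool.map(square, [1, 2, 3])

        # apply: runs a single call in one worker and blocks until it returns
        applied = pool.apply(square, (5,))

        # apply_async: returns an AsyncResult immediately; get() blocks later
        pending = pool.apply_async(square, (6,))
        async_value = pending.get(timeout=10)

    return mapped, applied, async_value

if __name__ == "__main__":
    print(run_all_three())
    # OUTPUT: ([1, 4, 9], 25, 36)
```

`apply` is rarely useful on its own because it blocks like a normal call; it mainly helps to understand what `apply_async` adds.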

---

## CPU-Bound Performance Advantage

Multiprocessing enables true parallelism.

```python
import multiprocessing
import time

def cpu_task():
    count = 0
    for _ in range(10_000_000):
        count += 1

if __name__ == "__main__":
    start = time.time()

    processes = [
        multiprocessing.Process(target=cpu_task),
        multiprocessing.Process(target=cpu_task)
    ]

    for p in processes:
        p.start()

    for p in processes:
        p.join()

    print("Time taken:", time.time() - start)
```

Observation:

* Multiple CPU cores are utilized
* Execution is typically faster than threading for CPU-bound work, provided at least two cores are free

---

## Costs of Multiprocessing

Multiprocessing is powerful but has trade-offs:

* Higher memory usage
* Slower startup time
* Serialization overhead
* More complex debugging

It should not be used blindly.
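
One way to see these costs is to time a trivial task both ways. This is a rough sketch under assumed names (`tiny_task`, `run_sequential`, `run_pool`) and an arbitrary input size; on most machines the pool version is expected to be slower here because startup and serialization outweigh the work, but exact timings vary.

```python
import multiprocessing
import time

def tiny_task(n):
    return n + 1

def run_sequential(data):
    return [tiny_task(n) for n in data]

def run_pool(data):
    with multiprocessing.Pool(processes=4) as pool:
        return pool.map(tiny_task, data)

if __name__ == "__main__":
    data = list(range(1000))

    start = time.perf_counter()
    sequential = run_sequential(data)
    print("Sequential:", time.perf_counter() - start)

    start = time.perf_counter()
    pooled = run_pool(data)
    print("Pool:      ", time.perf_counter() - start)

    # Both approaches compute the same values; only the cost differs
    print("Same results:", sequential == pooled)
```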

---

## When to Use Multiprocessing

Use multiprocessing when:

* Work is CPU-bound
* Tasks are independent
* Data can be partitioned
* Parallel execution is required

Avoid multiprocessing when:

* Tasks are small and short-lived
* Data sharing is frequent
* Startup overhead dominates execution

---

## Exercise: Parallel Computation Using Multiprocessing

Create a Python script that:

* Computes the square of numbers from 1 to 10
* Uses a process pool
* Prints the results
* Uses the `__main__` guard correctly

Objective:

* Practice process creation
* Use multiprocessing safely
* Observe parallel execution

The solution should clearly show:

* Independent processes running
* Correct result collection
* Clean program termination

---

# Process Pools and Shared Memory Objects

## Why This Topic Exists

When learning multiprocessing for the first time, two problems appear very quickly:

1. **Creating and managing many processes manually is hard**
2. **Processes do not share memory by default**

To solve these:

* **Process Pools** manage worker processes for you
* **Shared Memory Objects** allow controlled data sharing between processes

This topic starts from absolute basics and builds step by step.

---

## Recap: What a Process Is (Very Brief)

* A process has its **own memory**
* A process is scheduled by the operating system and **can run on a different CPU core** from other processes
* Processes do **not share variables automatically**
* Communication must be explicit

This is very different from threads.

---

## Problem 1: Manual Process Management Does Not Scale

You already know how to create processes like this:

```python
p = multiprocessing.Process(target=task)
p.start()
p.join()
```

This is fine for:

* 1 or 2 processes
* Simple demonstrations

But it becomes messy when:

* You need 10, 50, or 100 tasks
* You want to reuse processes
* You want results collected automatically

This is why **Process Pools** exist.

---

## What Is a Process Pool?

A **Process Pool** is:

* A fixed number of worker processes
* Created once
* Reused to execute many tasks

Instead of manually creating processes:

* You submit tasks to the pool
* The pool assigns tasks to available workers
* Results are collected for you

Think of it as a **worker team** instead of hiring a new worker for every task.

---

## Creating a Process Pool (Basic Example)

```python
import multiprocessing

def task(n):
    return n * n

if __name__ == "__main__":
    pool = multiprocessing.Pool(processes=4)
    results = pool.map(task, [1, 2, 3, 4, 5])
    pool.close()
    pool.join()
    print(results)
```

Explanation:

* `processes=4` creates 4 worker processes
* `map()` distributes work across workers
* Results are returned as a list
* The pool is closed and joined cleanly

This is the **most common pool usage pattern**.

---

## Why `map()` Feels Familiar

`pool.map()` works like Python’s built-in `map()`:

```python
map(function, iterable)
```

But instead of running sequentially:

* Each item may run in parallel
* Tasks are distributed automatically
* CPU cores are utilized efficiently

This makes learning pools much easier.

---

## Using a Pool with a Context Manager (Recommended)

```python
import multiprocessing

def task(n):
    return n * n

if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        results = pool.map(task, range(6))
    print(results)
```

Explanation:

* The pool is created when the `with` block is entered
* When the block exits, the pool is terminated automatically, so collect all results inside the block
* This is safer and cleaner than manual close/join

Always prefer this pattern.

---

## Blocking vs Non-Blocking Pool Calls

### Blocking Call (`map`)

```python
results = pool.map(task, data)
```

* Waits until all tasks finish
* Returns results in order
* Simple and safe

---

### Non-Blocking Call (`apply_async`)

```python
result = pool.apply_async(task, (5,))
print(result.get())
```

Explanation:

* Task runs in background
* `get()` waits for the result
* Useful for more complex workflows

Beginners should start with `map()`.
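
For reference, a slightly larger sketch of the non-blocking style: several calls are submitted first and collected afterwards. The helper name `submit_many` and the 10-second timeout are assumptions chosen for illustration.

```python
import multiprocessing

def task(n):
    return n * n

def submit_many(inputs):
    with multiprocessing.Pool(processes=2) as pool:
        # Submit everything first; each call returns an AsyncResult immediately
        pending = [pool.apply_async(task, (n,)) for n in inputs]

        # The parent could do other work here while the workers run

        # get() blocks until that particular task is done
        return [p.get(timeout=10) for p in pending]

if __name__ == "__main__":
    print(submit_many([1, 2, 3, 4]))
    # OUTPUT: [1, 4, 9, 16]
```

Results are gathered inside the `with` block because the pool is shut down when the block exits.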

---

## Problem 2: Processes Do Not Share Memory

Consider this example:

```python
counter = 0

def increment():
    global counter
    counter += 1
```

With multiprocessing:

* Each process has its **own copy** of `counter`
* Changes do not affect other processes
* This avoids race conditions but limits coordination

To share data, Python provides **shared memory objects**.

---

## What Are Shared Memory Objects?

Shared memory objects allow:

* Multiple processes to access the same memory
* Controlled, synchronized updates
* Explicit data sharing

Multiprocessing provides:

* `Value` for single values
* `Array` for sequences
* Managers for complex objects

---

## Shared `Value`: Single Shared Variable

```python
import multiprocessing

def increment(shared_value):
    with shared_value.get_lock():
        shared_value.value += 1

if __name__ == "__main__":
    value = multiprocessing.Value("i", 0)
    processes = []

    for _ in range(5):
        p = multiprocessing.Process(target=increment, args=(value,))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    print("Final value:", value.value)
```

Explanation:

* `"i"` is the type code for a signed C integer
* `shared_value.value` accesses the data
* A lock ensures safe updates
* Without the lock, concurrent increments can overwrite each other (lost updates)

---

## Why Locks Are Still Needed

Even though memory is shared:

* Multiple processes may write at the same time
* `value += 1` is a read, an add, and a write, so it is not atomic
* Locks prevent overlapping updates

Shared memory does **not** mean automatic safety.
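
A sketch that stresses this point: several processes each add to the same `Value` many times. The names `add_many` and `run`, the worker count, and the loop size are illustrative assumptions. With the lock held around each update the total is exact; if the `with` line is removed, the total usually ends up lower, although the exact shortfall differs from run to run.

```python
import multiprocessing

def add_many(shared_value, times):
    for _ in range(times):
        # Hold the lock so the read, add, and write happen as one step
        with shared_value.get_lock():
            shared_value.value += 1

def run(workers=4, times=10_000):
    total = multiprocessing.Value("i", 0)
    processes = [
        multiprocessing.Process(target=add_many, args=(total, times))
        for _ in range(workers)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return total.value

if __name__ == "__main__":
    print("Final value:", run())
    # OUTPUT: Final value: 40000
```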

---

## Shared `Array`: Shared Sequence of Values

```python
import multiprocessing

def modify(arr):
    arr[0] += 1

if __name__ == "__main__":
    arr = multiprocessing.Array("i", [1, 2, 3])
    p = multiprocessing.Process(target=modify, args=(arr,))
    p.start()
    p.join()
    print(list(arr))
```

Explanation:

* Shared arrays behave like fixed-length lists holding a single C type
* Changes are visible across processes
* Best used for numeric data
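
When more than one process writes to the array, the same locking rule applies as for `Value`. A minimal sketch, assuming a hypothetical helper `add_to_slot` and six workers that each add 10 to one of three slots:

```python
import multiprocessing

def add_to_slot(arr, index, amount):
    # Array carries its own lock by default; hold it for the read-modify-write
    with arr.get_lock():
        arr[index] += amount

def run():
    arr = multiprocessing.Array("i", [0, 0, 0])
    processes = [
        multiprocessing.Process(target=add_to_slot, args=(arr, i % 3, 10))
        for i in range(6)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return list(arr)

if __name__ == "__main__":
    print(run())
    # OUTPUT: [20, 20, 20]
```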

---

## Using a Manager for Complex Shared Objects

Managers allow sharing:

* Lists
* Dictionaries
* Sets (only on newer Python versions that provide `manager.set()`)
* Custom objects (with limitations)

```python
import multiprocessing

def add_item(shared_list):
    shared_list.append("data")

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared_list = manager.list()
        p = multiprocessing.Process(target=add_item, args=(shared_list,))
        p.start()
        p.join()
        print(shared_list)
```

Explanation:

* Managers run a separate server process that holds the real object
* Slower than `Value`/`Array`, because every operation is a message to that server
* Easier for beginners and complex data
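
The same idea works for dictionaries. The sketch below is an illustration under assumed names (`record_square`, `run`): each process writes one key into a `manager.dict()`, and the parent copies the result into a plain `dict` before the manager shuts down.

```python
import multiprocessing

def record_square(shared_dict, n):
    # Each assignment is forwarded to the manager's server process
    shared_dict[n] = n * n

def run(numbers):
    with multiprocessing.Manager() as manager:
        shared_dict = manager.dict()
        processes = [
            multiprocessing.Process(target=record_square, args=(shared_dict, n))
            for n in numbers
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        # Copy into a plain dict while the manager is still alive
        return dict(shared_dict)

if __name__ == "__main__":
    print(run([1, 2, 3]))
```

The key order in the printed dictionary depends on which process writes first, so it can differ between runs.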

---

## Shared Memory vs Queues

| Use Case              | Best Tool |
| --------------------- | --------- |
| Return results        | Queue     |
| Stream tasks          | Queue     |
| Share counters        | Value     |
| Share numeric arrays  | Array     |
| Share complex objects | Manager   |
| Avoid shared state    | Queue     |

Queues are usually safer and simpler than shared memory.

---

## Script-Based Demonstration: Pool + Shared Value

This code must be saved as `pool_shared_value_demo.py` and executed from the terminal using:

```
python pool_shared_value_demo.py
```

It should not be run inside a Jupyter Notebook.

```python
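import multiprocessing

shared_value = None  # set inside each worker by init_worker()

def init_worker(value):
    # Runs once per worker process and stores the shared Value globally
    global shared_value
    shared_value = value

def increment(_):
    with shared_value.get_lock():
        shared_value.value += 1

if __name__ == "__main__":
    value = multiprocessing.Value("i", 0)

    # A Value cannot be sent as a task argument (that raises RuntimeError),
    # so it is handed to the workers when they are created
    with multiprocessing.Pool(4, initializer=init_worker, initargs=(value,)) as pool:
        pool.map(increment, range(10))

    print("Final value:", value.value)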
```

---

## Common Beginner Mistakes

* Forgetting `if __name__ == "__main__"`
* Expecting global variables to be shared
* Using shared memory without locks
* Creating too many processes
* Using multiprocessing for tiny tasks

Avoiding these saves hours of debugging.

---

## When to Use Process Pools

Use process pools when:

* Tasks are CPU-bound
* Tasks are independent
* Same function runs on many inputs
* Results can be collected centrally

Avoid pools when:

* Tasks are very short
* Data transfer dominates computation
* Shared state is complex

---

## Exercise: Pool + Shared Data

Create a Python script that:

* Uses a process pool
* Squares numbers from 1 to 10
* Stores results in a shared list using a Manager
* Prints the final shared list

Objective:

* Practice pool creation
* Practice shared data usage
* Observe process coordination

The solution should demonstrate:

* Parallel execution
* Correct data sharing
* Clean shutdown

---

In [21]:
# Inter-Process Communication (IPC) and Data Serialization

# 1) First, what is the problem?
# When you use multiprocessing, you create separate processes.
# Each process has its own memory (its own private variables).

# So if a child process changes a normal Python variable, the main process will not see it.
# That is why we need IPC.


# What is IPC?
# IPC (Inter-Process Communication) means:
# Ways for one process to send data to another process.
    
# Common IPC tools in Python:
# - Queue (most common, easiest)
# - Pipe (two-way connection between two processes)
# - Shared memory (Value, Array, shared_memory) (fast for simple data)
# - Manager (shared dict/list across processes, slower but convenient)

In [22]:
# What is Data Serialization?
# When a process sends data to another process, Python must convert that data into a form that can travel between processes.
# This conversion is called serialization.

# In Python multiprocessing, serialization usually happens using pickle.

# So:
# Serialization = Convert Python object → bytes (sendable form)
# Deserialization = Convert bytes → Python object (usable form)

# Note:
# IPC often requires serialization, because processes don’t share normal memory.

# Why is Serialization needed?
# Processes are like separate rooms.
# To send something from Room A to Room B, you can’t “point to the same object”.
# You must pack the data and send it.
# That “packing” is serialization.

In [None]:
# A) IPC using multiprocessing.Queue (Most common)
# Goal:
# Child process calculates a result and sends it back to main process.

import multiprocessing

def worker(out_q):
    # This function runs inside the child process
    result = 10 * 10                 # do some work in child
    out_q.put(result)                # send result to main process via Queue

if __name__ == "__main__":
    out_q = multiprocessing.Queue()  # Queue is safe for processes (IPC tool)

    p = multiprocessing.Process(
        target=worker,               # child process will run worker()
        args=(out_q,)                # give the Queue to child
    )

    p.start()                        # start child process
    value = out_q.get()              # main process waits and receives data
    p.join()                         # ensure child has finished

    print("Received from child:", value)
    # OUTPUT: Received from child: 100

# What this example shows:
# - Processes do not share memory
# - Data must be passed explicitly using IPC tools

# multiprocessing.Queue:
# - is safe to use from several processes
# - get() blocks until an item is available
# - serializes objects automatically

In [None]:
# B) IPC using multiprocessing.Pipe (Two processes talk directly)
# Pipe is like a private phone line between two endpoints.

import multiprocessing

def worker(conn):
    # conn is one end of the pipe (child side)
    conn.send("Hello from child")    # send message to parent
    conn.close()                     # close this end when done

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    # parent_conn stays in main process
    # child_conn will be passed to child process

    p = multiprocessing.Process(target=worker, args=(child_conn,))
    p.start()

    msg = parent_conn.recv()         # receive message from child (blocks until available)
    p.join()

    print("Message:", msg)
    # OUTPUT: Message: Hello from child


# What this example shows:
# - multiprocessing.Pipe creates a direct communication channel
# - Processes still do not share memory; data must be explicitly sent

# Pipes are:
# - usually faster than queues for one-to-one communication
# - suitable for simple request/response patterns

# Pipe vs Queue:
# Pipe
# - point-to-point (two endpoints)
# - low overhead
# - manual control
#
# Queue
# - many producers and many consumers
# - safer for complex workflows
# - higher overhead

# When to prefer Pipe:
# - Simple communication between exactly two processes
# - Slightly lower overhead than Queue
# But Queue is easier for beginners and supports many producers/consumers.
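
A request/response pattern over a pipe might look like the sketch below. It relies on `Pipe()` being two-way by default; the names `square_server` and `ask`, and the use of `None` as a stop signal, are assumptions made for this illustration.

```python
import multiprocessing

def square_server(conn):
    # Answer requests until the parent sends None as a stop signal
    while True:
        request = conn.recv()
        if request is None:
            break
        conn.send(request * request)
    conn.close()

def ask(numbers):
    parent_conn, child_conn = multiprocessing.Pipe()  # duplex by default
    p = multiprocessing.Process(target=square_server, args=(child_conn,))
    p.start()

    answers = []
    for n in numbers:
        parent_conn.send(n)                 # request
        answers.append(parent_conn.recv())  # response

    parent_conn.send(None)                  # tell the child to stop
    p.join()
    return answers

if __name__ == "__main__":
    print(ask([2, 3, 4]))
    # OUTPUT: [4, 9, 16]
```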

In [24]:
# Serialization
# C) What does serialization look like?
# We can demonstrate with pickle (same idea multiprocessing uses).

import pickle

data = {"name": "Asha", "scores": [10, 20, 30]}

packed = pickle.dumps(data)     # serialize: Python object -> bytes
print(type(packed))
# OUTPUT: <class 'bytes'>

unpacked = pickle.loads(packed) # deserialize: bytes -> Python object
print(unpacked)
# OUTPUT: {'name': 'Asha', 'scores': [10, 20, 30]}

# Important Caution:
# Never unpickle data from untrusted sources
# Pickle can execute arbitrary code during loading
# It is not secure against malicious input

<class 'bytes'>
{'name': 'Asha', 'scores': [10, 20, 30]}


In [None]:
# Not everything can be serialized (pickled)

# Some objects cannot be easily sent to another process:
# - open file handles
# - sockets (directly)
# - database connections
# - generators
# - lambdas (not supported by the standard pickle module)
# - nested/local functions (not supported by the standard pickle module)

In [None]:
# Example of a common serialization failure
# (Do not run this in production; this is for learning.)
# Note: multiprocessing.Queue pickles objects in a background feeder thread,
# so q.put(f) itself would not raise here. Calling pickle directly shows the error.

import pickle

if __name__ == "__main__":
    f = open("sample.txt", "w")      # file handle (cannot be pickled)

    try:
        pickle.dumps(f)              # the same serialization step a Queue performs
    except TypeError as e:
        print("Serialization failed:", e)
    finally:
        f.close()


In [None]:
# Shared Memory vs Serialization
# Queue/Pipe usually serialize (pickle) data.

# Shared memory tools avoid copying:
# - multiprocessing.Value
# - multiprocessing.Array
# - multiprocessing.shared_memory (advanced)

# Use shared memory for:
# - large numeric data
# - performance-critical cases

# Use Queue/Pipe for:
# - general Python objects
# - simple IPC
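
For the advanced option, `multiprocessing.shared_memory` (Python 3.8+), a minimal sketch could look like this. The child attaches to the block by name and edits the bytes in place, so only the short name string is pickled. The helper names `double_bytes` and `run` and the four-byte payload are assumptions for illustration.

```python
from multiprocessing import Process, shared_memory

def double_bytes(name, size):
    # Attach to the existing block by name; the data itself is not copied
    shm = shared_memory.SharedMemory(name=name)
    for i in range(size):
        shm.buf[i] = shm.buf[i] * 2
    shm.close()

def run():
    data = bytes([1, 2, 3, 4])
    shm = shared_memory.SharedMemory(create=True, size=len(data))
    try:
        shm.buf[:len(data)] = data
        p = Process(target=double_bytes, args=(shm.name, len(data)))
        p.start()
        p.join()
        # The block may be larger than requested, so slice to the real size
        return list(shm.buf[:len(data)])
    finally:
        shm.close()
        shm.unlink()   # free the block once nobody needs it

if __name__ == "__main__":
    print(run())
    # OUTPUT: [2, 4, 6, 8]
```

No lock is used here because only one process writes; with several writers, a separate `multiprocessing.Lock` would be needed.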

In [26]:
# # E) Shared memory example (no pickling for the shared value itself)

# import multiprocessing

# def worker(shared_num):
#     # shared_num is stored in shared memory
#     shared_num.value += 5            # child modifies the shared value directly

# if __name__ == "__main__":
#     num = multiprocessing.Value("i", 10)  # shared integer starting at 10

#     p = multiprocessing.Process(target=worker, args=(num,))
#     p.start()
#     p.join()

#     print("Shared value:", num.value)
#     # OUTPUT: Shared value: 15


# Best Use Cases for Processes vs Threads

## Why This Comparison Matters

When building concurrent or parallel Python programs, one of the most important design decisions is choosing between:

* **Threads**
* **Processes**

Choosing the wrong one can lead to:

* Poor performance
* Unnecessary complexity
* Wasted system resources
* Hard-to-debug bugs

This topic explains **from first principles** when and why to use each approach.

---

## Starting From the Basics

Before comparing, it is important to clearly understand what threads and processes are.

### Thread (Basic Definition)

* A thread runs inside a process
* Threads share the same memory
* Threads are lightweight
* Context switching is fast

### Process (Basic Definition)

* A process has its own memory
* Processes do not share data automatically
* Each process has its own Python interpreter
* Processes can run on separate CPU cores

These fundamental differences drive all use cases.

---

## The Role of the GIL in Python

In CPython (standard builds, where the GIL is enabled):

* Only one thread executes Python bytecode at a time
* This is enforced by the Global Interpreter Lock (GIL)

Implication:

* Threads do **not** run Python code in parallel on multiple cores
* Processes **do** run in parallel

This single fact is central to choosing between threads and processes.

---

## CPU-Bound Workloads

### What Is CPU-Bound Work?

CPU-bound work:

* Spends most time doing computation
* Uses CPU heavily
* Has little waiting

Examples:

* Image processing
* Data compression
* Numerical simulations
* Cryptographic calculations

---

### Threads for CPU-Bound Work

Threads:

* Compete for the GIL
* Run one at a time
* Do not scale across cores

Result:

* Multiple threads may be slower than one thread
* No true parallelism

Threads are a poor choice for CPU-bound work.

---

### Processes for CPU-Bound Work

Processes:

* Each process has its own GIL
* Can run simultaneously on multiple cores
* Scale with CPU availability

Result:

* True parallel execution
* Significant speedup

Processes are the correct choice for CPU-bound workloads.

---

## I/O-Bound Workloads

### What Is I/O-Bound Work?

I/O-bound work:

* Spends most time waiting
* Waits on disk, network, APIs, or databases

Examples:

* Web scraping
* File downloads
* Network services
* Logging systems

---

### Threads for I/O-Bound Work

Threads:

* Release the GIL during I/O
* Allow other threads to run
* Overlap waiting time

Result:

* High throughput
* Efficient resource usage
* Low overhead

Threads are an excellent choice for I/O-bound workloads.
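
A small sketch of this overlap, using `time.sleep` as a stand-in for a network wait. The names `fake_download` and `download_all` and the 0.5-second delay are illustrative assumptions. Four simulated downloads should finish in roughly the time of one, though the measured value varies slightly.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(n):
    time.sleep(0.5)          # stands in for a network wait; the GIL is released
    return f"file-{n}"

def download_all(count):
    with ThreadPoolExecutor(max_workers=count) as executor:
        # executor.map keeps results in input order
        return list(executor.map(fake_download, range(1, count + 1)))

if __name__ == "__main__":
    start = time.perf_counter()
    print(download_all(4))
    print("Elapsed:", round(time.perf_counter() - start, 2))
```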

---

### Processes for I/O-Bound Work

Processes:

* Also work for I/O-bound tasks
* Have higher overhead
* Require IPC for data sharing

Result:

* Often unnecessary complexity
* Slower startup and higher memory usage

Threads are usually preferred for I/O-bound work.

---

## Memory Sharing Considerations

### Threads

* Share memory naturally
* Easy data access
* High risk of race conditions
* Require careful synchronization

### Processes

* Do not share memory
* Safer isolation
* Explicit IPC required
* Fewer accidental data corruption bugs

Choice depends on whether shared state is required.

---

## Fault Isolation and Stability

### Threads

* A hard crash in one thread (for example, a segfault in a C extension) takes down the whole process
* Bugs can corrupt shared memory
* Harder to isolate failures

### Processes

* Process crashes are isolated
* One process failing does not crash others
* Better fault tolerance

For critical systems, processes provide stronger safety guarantees.
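
Process isolation can be observed through exit codes. In this sketch (function names are assumptions), one child fails with an unhandled exception while another finishes normally. The failing child prints a traceback to stderr, but the parent and the healthy child are unaffected.

```python
import multiprocessing

def crashing_task():
    raise RuntimeError("simulated failure")

def healthy_task(q):
    q.put("still running")

def run():
    q = multiprocessing.Queue()
    bad = multiprocessing.Process(target=crashing_task)
    good = multiprocessing.Process(target=healthy_task, args=(q,))
    bad.start()
    good.start()

    message = q.get(timeout=10)   # the healthy child still delivers its result
    bad.join()
    good.join()
    return bad.exitcode, good.exitcode, message

if __name__ == "__main__":
    print(run())
    # OUTPUT: (1, 0, 'still running')
```

A non-zero `exitcode` is how the parent detects that a child failed.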

---

## Resource Usage Comparison

| Aspect          | Threads             | Processes |
| --------------- | ------------------- | --------- |
| Startup cost    | Low                 | High      |
| Memory usage    | Low                 | High      |
| CPU parallelism | No (limited by GIL) | Yes       |
| Data sharing    | Easy                | Explicit  |
| Fault isolation | Weak                | Strong    |
| Debugging       | Hard                | Easier    |

---

## Practical Use Case Scenarios

### Use Threads When:

* Work is I/O-bound
* Many tasks spend time waiting
* Shared memory simplifies design
* Latency is critical
* Tasks are lightweight

Examples:

* Web crawlers
* Network clients
* API gateways
* Concurrent file readers

---

### Use Processes When:

* Work is CPU-bound
* Parallel execution is required
* Tasks are independent
* Fault isolation is important
* Memory safety matters

Examples:

* Data analytics pipelines
* Video encoding
* Scientific computing
* Batch processing jobs

---

## Combining Threads and Processes

Many real systems use **both**.

Example:

* Processes handle CPU-heavy tasks
* Each process uses threads for I/O
* This maximizes CPU usage and responsiveness

This hybrid model is common in production systems.
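
A compact sketch of the hybrid model, using `concurrent.futures`. Every name here (`fetch`, `crunch`, `process_batch`, `run`) and the tiny workload are assumptions for illustration: each worker process overlaps its simulated I/O with threads, then does the computation itself.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def fetch(item):
    time.sleep(0.1)              # simulated I/O wait
    return item

def crunch(value):
    return sum(i * i for i in range(value))   # stands in for CPU-heavy work

def process_batch(batch):
    # Runs inside one worker process: threads overlap the I/O waits
    with ThreadPoolExecutor(max_workers=len(batch)) as threads:
        fetched = list(threads.map(fetch, batch))
    return [crunch(v) for v in fetched]

def run(batches):
    # Separate processes spread the CPU work across cores
    with ProcessPoolExecutor(max_workers=2) as processes:
        return list(processes.map(process_batch, batches))

if __name__ == "__main__":
    print(run([[10, 20], [30, 40]]))
    # OUTPUT: [[285, 2470], [8555, 20540]]
```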

---

## Simple Comparison Example (Conceptual)

```
I/O-bound -> threads are a good choice
CPU-bound -> processes are a good choice
```

This simple rule solves most beginner decisions.

---

## Script-Based Demonstration: Threads vs Processes

This code must be saved as `threads_vs_processes_demo.py` and executed from the terminal using:

```
python threads_vs_processes_demo.py
```

It should not be run inside a Jupyter Notebook.

```python
import threading
import multiprocessing
import time

def cpu_task():
    total = 0
    for _ in range(10_000_000):
        total += 1

def run_threads():
    threads = [threading.Thread(target=cpu_task) for _ in range(2)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("Threads time:", time.time() - start)

def run_processes():
    processes = [multiprocessing.Process(target=cpu_task) for _ in range(2)]
    start = time.time()
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print("Processes time:", time.time() - start)

if __name__ == "__main__":
    run_threads()
    run_processes()
```

---

## Common Beginner Mistakes

* Using threads for CPU-heavy tasks
* Expecting threads to use multiple cores
* Sharing large data across processes
* Overusing multiprocessing for simple I/O tasks
* Ignoring startup overhead

Understanding trade-offs prevents these mistakes.

---

## Decision Guide (Simple Rule)

Ask these questions:

1. Is the task CPU-heavy?

   * Yes → Use processes
   * No → Continue

2. Is the task mostly waiting for I/O?

   * Yes → Use threads
   * No → Re-evaluate design

This rule works for most real-world cases.

---

## Exercise: Choosing the Right Tool

Given the following tasks, decide whether to use threads or processes:

1. Downloading files from multiple URLs
2. Calculating prime numbers
3. Reading multiple log files from disk
4. Image resizing
5. Sending API requests

Objective:

* Apply workload analysis
* Choose appropriate concurrency model
* Justify the choice

---

In [27]:
# Module 4C: Async Programming 
# • asyncio framework overview 
# • Writing async coroutines and tasks 
# • Integrating async I/O with APIs and databases 
# • Combining async, threading, and multiprocessing safely

In [28]:
# Async Programming
# Async programming helps when your program spends a lot of time waiting (network, DB, file I/O).
# Instead of creating many threads, async uses one thread and switches between tasks while they wait.

# What problem does async solve?
# Many real programs do this:
# - call an API → wait
# - query DB → wait
# - download something → wait
# If you do these one-by-one, total time becomes very large.


# Key Terms:
# a) Event loop
# The event loop is like a manager:
# - it runs async tasks
# - it pauses tasks that are waiting
# - it resumes tasks when they are ready
# In asyncio, the event loop is managed by asyncio.run(...).

# b) Coroutine (async def)
# A coroutine is a special function declared with async def.
# It can pause itself using await.

# c) await
# await means:
# “Pause here until this operation finishes, and let the event loop run other tasks meanwhile.”

# d) Task
# A Task is a coroutine that has been scheduled to run concurrently by the event loop.

In [29]:
# asyncio framework overview
# The asyncio module provides tools to write async code in Python.
# It provides:
# - event loop management
# - support for running coroutines written with async/await
# - high-level APIs for async I/O
# - tools like asyncio.sleep, asyncio.gather, etc. for common patterns

In [None]:
# Why normal code is slow (sequential)

import time

def download_simulation(n):
    time.sleep(2)  # blocking sleep (pretend it's network wait)
    return f"file-{n}"

start = time.time()

print(download_simulation(1))
print(download_simulation(2))
print(download_simulation(3))

end = time.time()
print("Time taken:", end - start)
# OUTPUT (approx):
# file-1
# file-2
# file-3
# Time taken: ~6.0

# Note: each call blocks, so the total time adds up.

file-1
file-2
file-3
Time taken: 6.015554666519165


In [None]:
# Async version

# A) Write an async coroutine

import asyncio
import time

async def download_simulation(n):
    # asyncio.sleep is non-blocking (it yields control to event loop)
    await asyncio.sleep(2)
    return f"file-{n}"

async def main():
    start = time.time()

    # schedule 3 downloads concurrently
    results = await asyncio.gather(
        download_simulation(1),
        download_simulation(2),
        download_simulation(3),
    )

    for r in results:
        print(r)

    end = time.time()
    print("Time taken:", end - start)

asyncio.run(main())   # inside a Jupyter notebook, use `await main()` instead (a loop is already running)

# OUTPUT (approx):
# file-1
# file-2
# file-3
# Time taken: ~2.0


In [None]:
# Writing async coroutines and tasks
# Calling an async function gives you a coroutine object; it does NOT run immediately.

import asyncio

async def hello():
    await asyncio.sleep(1)
    return "Hello"

async def main():
    c = hello()             # coroutine created, not running yet
    print(c)
    # OUTPUT: <coroutine object hello at ...>
    print(await c)          # awaiting it runs it (and avoids a "never awaited" warning)
    # OUTPUT: Hello

asyncio.run(main())


In [None]:
# Task (scheduled to run)
# To start it concurrently, you wrap it in a Task.

import asyncio

async def hello():
    await asyncio.sleep(1)
    return "Hello"

async def main():
    t = asyncio.create_task(hello())  # scheduled immediately
    result = await t                  # wait for task to finish
    print(result)
    # OUTPUT: Hello

asyncio.run(main())


# asyncio.create_task():
# - schedules a coroutine on the event loop right away
# - allows it to run concurrently with other tasks

# await task:
# - waits for the task’s completion
# - retrieves its return value

| Code                       | Behavior                                               |
| -------------------------- | ------------------------------------------------------ |
| `c = hello()`              | Coroutine created, **not running**                     |
| `t = create_task(hello())` | Coroutine **scheduled**; it starts at the next `await` |
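
The practical effect of scheduling can be seen by timing two styles side by side. In this sketch the coroutine names `sequential` and `concurrent` and the 0.5-second delay are assumptions. Awaiting the coroutines one after another should take about twice as long as creating both tasks first, though measured times vary slightly.

```python
import asyncio
import time

async def hello(delay):
    await asyncio.sleep(delay)
    return f"done after {delay}s"

async def sequential():
    # Each await finishes before the next coroutine even starts
    return [await hello(0.5), await hello(0.5)]

async def concurrent():
    # Both tasks are scheduled first, so their waits overlap
    t1 = asyncio.create_task(hello(0.5))
    t2 = asyncio.create_task(hello(0.5))
    return [await t1, await t2]

async def main():
    for runner in (sequential, concurrent):
        start = time.perf_counter()
        results = await runner()
        print(runner.__name__, results, round(time.perf_counter() - start, 1))

if __name__ == "__main__":
    asyncio.run(main())
```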


In [None]:
# Integrating async I/O with APIs and databases

# Async is best when using libraries that support async, like:
# - HTTP clients: aiohttp, httpx (async mode)
# - DB drivers: asyncpg (Postgres), aiomysql, motor (MongoDB)
# - Redis: redis.asyncio

# Async works best when the library provides awaitable operations.
# If the library is blocking (for example, requests or a synchronous DB driver), each call blocks the event loop.

import asyncio

async def fake_api_call(user_id):
    print("API: fetching user", user_id)
    await asyncio.sleep(1)          # pretend network wait (non-blocking)
    return {"id": user_id, "name": "User" + str(user_id)}

async def fake_db_save(user):
    print("DB: saving", user["id"])
    await asyncio.sleep(1)          # pretend DB wait (non-blocking)
    return True

async def main():
    # run multiple API calls concurrently
    users = await asyncio.gather(
        fake_api_call(1),
        fake_api_call(2),
        fake_api_call(3),
    )

    # then save them concurrently too
    saves = await asyncio.gather(*(fake_db_save(u) for u in users))

    print("Saved:", saves)
    # OUTPUT: Saved: [True, True, True]

asyncio.run(main())


In [32]:
# Combining async, threading, and multiprocessing safely
# The simple rule (very important)
# - Async is great for I/O-bound
# - Multiprocessing is best for CPU-bound
# - Threads are useful for blocking I/O libraries that have no async support

# Why do we combine them?
# Because sometimes you have:
# - async web server (asyncio)
# - but you must call a blocking library (thread)
# - and you must run heavy computation (process)

In [None]:
# A) Running blocking code safely inside asyncio (use a thread)
# If you call blocking functions directly, you freeze the event loop.
# So you offload blocking work to a thread.

import asyncio
import time

def blocking_work():
    time.sleep(2)           # blocking sleep
    return "blocking done"

async def main():
    # run blocking_work in a separate thread so event loop stays free
    result = await asyncio.to_thread(blocking_work)

    print(result)
    # OUTPUT: blocking done

asyncio.run(main())

# Use asyncio.to_thread() (Python 3.9+) to run blocking functions in async programs
# Ideal for:
# - legacy synchronous code
# - CPU-light but blocking I/O
#
# This keeps the event loop free to run other tasks.

In [None]:
# B) Running CPU-heavy work safely inside asyncio (use a process)
# CPU-heavy work blocks the event loop, because a coroutine only gives up control at an await.
# So push CPU work into a separate process.

import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    total = 0
    for i in range(n):
        total += i
    return total

async def main():
    loop = asyncio.get_running_loop()

    # create a process pool for CPU heavy tasks
    with ProcessPoolExecutor() as pool:
        # run cpu_heavy in another process
        result = await loop.run_in_executor(pool, cpu_heavy, 50_000_000)

    print("CPU result:", result)

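if __name__ == "__main__":   # needed on platforms that start worker processes by re-importing this file
    asyncio.run(main())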


| Task type         | Best tool             |
| ----------------- | --------------------- |
| I/O-bound         | `asyncio`             |
| Blocking sync I/O | `asyncio.to_thread()` |
| CPU-bound         | `ProcessPoolExecutor` |
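
To show that the event loop stays free while worker processes compute, the sketch below runs two CPU jobs through `run_in_executor` alongside a small coroutine. The `heartbeat` coroutine, the input sizes, and the pool size are assumptions made for this illustration.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    return sum(range(n))

async def heartbeat(ticks):
    # Keeps running on the event loop while the CPU jobs are in progress
    for _ in range(ticks):
        await asyncio.sleep(0.1)
    return ticks

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        jobs = [
            loop.run_in_executor(pool, cpu_heavy, n)
            for n in (1_000_000, 2_000_000)
        ]
        # gather returns results in the order the awaitables were passed
        *totals, ticks = await asyncio.gather(*jobs, heartbeat(3))
    return totals, ticks

if __name__ == "__main__":
    print(asyncio.run(main()))
    # OUTPUT: ([499999500000, 1999999000000], 3)
```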


In [None]:
# Decision Guide:
# Use asyncio when:
# - many API calls
# - many DB queries (async driver)
# - many sockets, web scraping, web servers

# Use threads when:
# - you must call blocking libraries inside async
# - file operations or legacy code would block the event loop

# Use processes when:
# - CPU heavy tasks (encryption, compression, ML preprocessing, large loops)

In [34]:
# Exercise: Writing Your Own Async Tasks
# Create an asyncio program that:
# - Defines three coroutines
# - Each coroutine waits for a different time
# - Each returns a string
# - All are scheduled using create_task
# - Results are collected using gather

# Objective:
# - Practice writing coroutines
# - Practice creating tasks
# - Observe concurrent execution

In [35]:
# Exercise: API + Async Thinking
# Create an asyncio program that:
# - Simulates calling three APIs
# - Each waits for a different time
# - Prints start and end messages
# - Runs all calls concurrently

# Objective:
# - Practice async I/O thinking
# - Understand waiting vs blocking
# - Observe overlapping execution

In [None]:
# Exercise: Hybrid Concurrency Practice
# Create a program that:
# - Runs two async tasks concurrently
# - Calls a blocking function using to_thread
# - Runs a CPU-heavy function using a process pool
# - Prints execution order clearly

# Objective:
# - Practice correct delegation
# - Keep the event loop responsive
# - Use each model for its strength

---

# Project Problem Statement

## Build a Polite and Efficient Async Web Scraper Using Python

---

## Background

Modern applications often need to retrieve data from multiple web pages. When this is done using traditional synchronous HTTP requests, the program becomes slow because each request blocks execution until it completes.

Asynchronous programming lets a program keep several requests in flight at once, so the time one request spends waiting on the network is used to make progress on the others.

In this project, you will build an **asynchronous web scraper** using Python that demonstrates how concurrency improves performance while following responsible scraping practices.

---

## Objective

Design and implement a Python-based asynchronous web scraper that:

* Sends multiple HTTP requests concurrently
* Reuses a single HTTP client session
* Limits the number of concurrent requests
* Handles network errors gracefully
* Prints meaningful information for each fetched page

The program must be run from the **terminal**, not from a Jupyter Notebook: a notebook already runs its own event loop, so `asyncio.run()` raises a `RuntimeError` there.

---

## Website to Scrape

Use the following website for scraping:

```
https://quotes.toscrape.com
```

Reasons for choosing this site:

* Designed specifically for web scraping practice
* No authentication required
* Lightweight and predictable HTML structure
* Widely accepted for educational demonstrations

You will scrape multiple paginated pages such as:

```
https://quotes.toscrape.com/page/1/
https://quotes.toscrape.com/page/2/
https://quotes.toscrape.com/page/3/
```

---

# Step-by-Step Problem Breakdown

---

## Step 1: Understand the Scraping Problem

**Task**
Explain what a web scraper is and why asynchronous programming is suitable for web scraping.

**Hint**
Focus on network waiting time and how concurrency reduces idle time during HTTP requests.

---

## Step 2: Install Required Libraries

**Task**
Prepare the Python environment to support asynchronous HTTP requests.

**Hint**
Only one external library, `aiohttp`, is required. Install it from the command line with `pip`.

---

## Step 3: Create an Async Function to Fetch One Page

**Task**
Write an asynchronous function that sends an HTTP GET request and returns the HTML content of a page.

**Hint**
Use `async def` and `await`, and avoid blocking operations.

---

## Step 4: Reuse a Single HTTP Client Session

**Task**
Modify the fetch logic so that all requests reuse a single HTTP client session.

**Hint**
Creating a new session per request is inefficient. Use a shared session and manage it with a context manager.
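One possible shape for Steps 3 and 4 is sketched below, assuming `aiohttp` is installed. The names `fetch_page`, `fetch_first_pages`, and `BASE_URL` are illustrative and not part of the project specification. The pages are deliberately fetched one after another here so that the example isolates session reuse; concurrency is a separate step.

```python
import asyncio

import aiohttp

# Assumed URL pattern, based on the paginated pages listed for the target site.
BASE_URL = "https://quotes.toscrape.com/page/{}/"


async def fetch_page(session, url: str) -> str:
    """Fetch one page through the shared session and return its HTML."""
    async with session.get(url) as response:
        response.raise_for_status()  # turn 4xx/5xx responses into exceptions
        return await response.text()


async def fetch_first_pages(count: int) -> list:
    """Open a single ClientSession and reuse it for every request."""
    # The context manager closes the session (and its connection pool) on exit.
    async with aiohttp.ClientSession() as session:
        pages = []
        for number in range(1, count + 1):
            pages.append(await fetch_page(session, BASE_URL.format(number)))
        return pages
```

In a script, `asyncio.run(fetch_first_pages(3))` would return the HTML of the first three pages. Passing the session in as a parameter keeps `fetch_page` free of setup logic and makes it easy to exercise with a stub session.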

---

## Step 5: Fetch Multiple Pages Concurrently

**Task**
Fetch multiple pages from the target website concurrently.

**Hint**
Create a list of asynchronous tasks and execute them using `asyncio.gather()`.
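The pattern in this hint can be tried without any network access. The sketch below uses a hypothetical `fake_fetch` coroutine, with `asyncio.sleep()` standing in for the HTTP request, to show that `asyncio.gather()` runs the coroutines concurrently and returns results in the same order as its arguments.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float) -> str:
    """Hypothetical stand-in for an HTTP request: waits, then returns text."""
    await asyncio.sleep(delay)
    return f"content of {url}"


async def main() -> list:
    urls = [f"https://quotes.toscrape.com/page/{n}/" for n in (1, 2, 3)]
    # Calling a coroutine function only creates a coroutine object;
    # gather() schedules all of them and waits for every result.
    coroutines = [fake_fetch(url, delay=0.2) for url in urls]
    return await asyncio.gather(*coroutines)


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(f"{len(results)} pages in {elapsed:.2f}s")
```

Three waits of 0.2 seconds each complete in roughly 0.2 seconds in total rather than 0.6, which is the effect the real scraper should reproduce.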

---

## Step 6: Add Error Handling and Timeouts

**Task**
Ensure the scraper does not crash when a request fails or times out.

**Hint**
Use `try`/`except` blocks and configure request timeouts.
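A minimal sketch of this idea follows, using only the standard library. `flaky_fetch`, `safe_fetch`, and the `example.test` URLs are hypothetical; `asyncio.wait_for()` supplies the timeout. With `aiohttp`, the equivalent would likely be catching `aiohttp.ClientError` and passing an `aiohttp.ClientTimeout` to the session.

```python
import asyncio


async def flaky_fetch(url: str) -> str:
    """Hypothetical stand-in for a request; behaviour depends on the URL."""
    if "slow" in url:
        await asyncio.sleep(1.0)  # simulates a server that hangs
    if "broken" in url:
        raise ConnectionError("connection refused")
    await asyncio.sleep(0.01)
    return "<html>ok</html>"


async def safe_fetch(url: str, timeout: float = 0.1):
    """Return (url, html) on success or (url, None) on failure."""
    try:
        html = await asyncio.wait_for(flaky_fetch(url), timeout=timeout)
        return url, html
    except asyncio.TimeoutError:
        print(f"timeout: {url}")
    except ConnectionError as exc:
        print(f"failed: {url} ({exc})")
    return url, None


async def main() -> list:
    urls = [
        "https://example.test/ok",
        "https://example.test/slow",
        "https://example.test/broken",
    ]
    return await asyncio.gather(*(safe_fetch(u) for u in urls))


results = asyncio.run(main())
```

Because each failure is handled inside `safe_fetch`, one bad URL does not cancel the other requests, and the caller can filter out entries whose second element is `None`.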

---

## Step 7: Limit Concurrent Requests

**Task**
Prevent the scraper from sending too many requests at the same time.

**Hint**
Use `asyncio.Semaphore` to control the maximum number of concurrent requests.
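To see the limit in action, the following sketch counts how many simulated requests are in flight at once. The limit of 2, the `limited_fetch` name, and the `asyncio.sleep()` placeholder for the HTTP call are all assumptions made for illustration.

```python
import asyncio

MAX_CONCURRENT = 2  # assumed limit; choose what the target site tolerates
active = 0          # simulated requests currently in flight
peak = 0            # highest value `active` ever reached


async def limited_fetch(semaphore: asyncio.Semaphore, url: str) -> str:
    global active, peak
    async with semaphore:  # waits here while MAX_CONCURRENT slots are taken
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.05)  # hypothetical stand-in for the HTTP request
        active -= 1
    return url


async def main() -> list:
    # Create the semaphore inside the running event loop.
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    urls = [f"https://quotes.toscrape.com/page/{n}/" for n in range(1, 7)]
    return await asyncio.gather(*(limited_fetch(semaphore, u) for u in urls))


pages = asyncio.run(main())
print(f"fetched {len(pages)} pages, peak concurrency {peak}")
```

All six coroutines are scheduled immediately, but only two pass the `async with semaphore` line at any moment, so the server never sees more than two simultaneous requests.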

---

## Step 8: Display Useful Output

**Task**
For each fetched page, print the URL, response status, and size of the downloaded content.

**Hint**
This output helps verify concurrency and successful execution.
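One way to produce such a line is sketched below. `describe_page` is a hypothetical helper; with `aiohttp`, the status code would typically come from `response.status` and the text from `await response.text()`.

```python
def describe_page(url: str, status: int, html: str) -> str:
    """Build one report line: URL, HTTP status, and body size in bytes."""
    size = len(html.encode("utf-8"))  # byte count, not character count
    return f"{url} | status={status} | {size} bytes"


line = describe_page("https://quotes.toscrape.com/page/1/", 200, "<html>é</html>")
print(line)
```

Measuring the encoded length matters for pages with non-ASCII characters, where the number of bytes exceeds the number of characters.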

---

## Step 9: Run the Script From the Terminal

**Task**
Execute the scraper as a standalone Python script.

**Hint**
Use `asyncio.run(main())` and save the file as `async_web_scraper.py`.

---

# Constraints

* Do not use the `requests` library
* Do not block the event loop
* Reuse a single HTTP session
* Follow polite scraping practices, such as limiting the number of concurrent requests
* Do not scrape authenticated or private data

---

## Expected Learning Outcomes

After completing this project, you should be able to:

* Explain why asynchronous programming improves web scraping performance
* Use `asyncio` and `aiohttp` effectively
* Implement concurrency control in network-based programs
* Build a solid foundation for production-grade async scrapers

---

# Happy Learning