# Multiprocessing

Multiprocessing means running multiple processes, each with its own Python interpreter and memory space, which the operating system can schedule on different CPU cores.

Because each process has its own interpreter, it also has its own Global Interpreter Lock (GIL). CPU-bound Python code in separate processes can therefore run truly in parallel instead of taking turns.


#### ⚖️ Multithreading vs Multiprocessing

| Feature | Multithreading | Multiprocessing |
|---------|----------------|-----------------|
| Parallelism | ❌ Python code does not run in parallel (the GIL lets only one thread execute Python bytecode at a time) | ✅ True parallelism (each process is independent) |
| Memory sharing | Shared memory (threads live in the same process) | Separate memory (isolated; data must be sent between processes) |
| Best for | I/O-bound tasks | CPU-bound tasks |
| Overhead | Low | Higher (starting processes and passing data between them costs more) |
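To see the "Parallelism" row in practice, here is a minimal sketch that runs the same CPU-bound function four times, first in a thread pool and then in a process pool, using the standard-library `concurrent.futures` module. The function `busy_sum` and the workload size are illustrative choices. Exact timings depend on your machine; on a multi-core machine with standard CPython the process pool should finish noticeably faster. Save it as a `.py` file and run it with `python`, because process pools need an importable `__main__` module.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def busy_sum(n: int) -> int:
    """Pure-Python CPU-bound work: sum of squares of 0..n-1."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(executor_cls, jobs):
    """Run busy_sum over jobs with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=len(jobs)) as pool:
        results = list(pool.map(busy_sum, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":  # required for process pools under the "spawn" start method
    jobs = [2_000_000] * 4
    thread_results, thread_time = timed_run(ThreadPoolExecutor, jobs)
    process_results, process_time = timed_run(ProcessPoolExecutor, jobs)
    assert thread_results == process_results  # same answers, different speed
    print(f"threads:   {thread_time:.2f} s")
    print(f"processes: {process_time:.2f} s")
```

Both executors expose the same `map` interface, so switching between threads and processes is a one-word change.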

#### 🧪 Example: Multiprocessing in Python

We'll simulate a CPU-heavy task: summing the squares of the first 10 million integers, once in each of two processes.

🧠 CPU-bound version using multiprocessing

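```python
import multiprocessing
import time

def compute():
    """CPU-bound work: sum the squares of 0..9,999,999."""
    print("Start computing...")
    total = 0
    for i in range(10_000_000):
        total += i * i
    print(total)
    print("Done computing!")

# The __main__ guard is required with the "spawn" start method (the default
# on Windows and macOS), where each child re-imports this module. In a Jupyter
# notebook on those platforms, define compute() in a separate .py module.
if __name__ == "__main__":
    start = time.time()

    # Create two processes that each run compute() independently
    p1 = multiprocessing.Process(target=compute)
    p2 = multiprocessing.Process(target=compute)

    p1.start()
    p2.start()

    # Wait for both to finish before measuring the elapsed time
    p1.join()
    p2.join()

    print(f"Total time: {time.time() - start:.2f} seconds")
```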


```text
Start computing...
Start computing...
333333283333335000000
Done computing!
333333283333335000000
Done computing!
Total time: 1.07 seconds
```


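#### Expected Behavior

- Two processes are created, each running `compute()` in its own interpreter.
- On a machine with at least two free cores, they run in parallel, so the total time is close to that of a single run (plus process start-up overhead) rather than the sum of both.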

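### ✅ Key Benefits

- Bypasses the GIL, which makes it a good fit for CPU-bound work such as number crunching, data processing, and machine learning.
- Each process has its own memory, which reduces the risk of shared-state bugs; the trade-off is that data passed between processes must be serialized (pickled).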
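A minimal sketch of what "its own memory" means in practice: a child process that changes a module-level variable only changes its own copy, so results have to be sent back explicitly, here through the return value of `multiprocessing.Pool.map`. The names `counter`, `bump` and `square` are illustrative. As with the earlier example, run it as a `.py` file.

```python
import multiprocessing

counter = 0  # module-level state; each process ends up with its own copy


def bump() -> None:
    """Runs in a child process: modifies the child's copy of counter only."""
    global counter
    counter += 100
    print(f"child sees counter = {counter}")


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    print(f"parent sees counter = {counter}")  # still 0: memory is not shared

    # To get data back, return it: Pool pickles arguments and results
    # and sends them between processes for you.
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3, 4]))  # [1, 4, 9, 16]
```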

## How to Identify a Task as I/O-bound or CPU-bound

To decide whether to use multithreading, multiprocessing, or something else (such as `asyncio`), you need to understand what type of work your task is doing.


#### Ask: What is the task waiting on?

| If most of the time goes to... | Then it's... |
|--------------------------------|--------------|
| Disk I/O (reading/writing files) | I/O-bound |
| Network responses (APIs, web scraping) | I/O-bound |
| User input (GUIs, CLI input) | I/O-bound |
| Database queries | I/O-bound |
| Heavy calculations (math, ML, image/video processing) | CPU-bound |
| Data transformations (sorting, hashing, encoding) | CPU-bound |

### Think in terms of behavior
🔧 I/O-bound behavior:

- The task feels slow, but the CPU is mostly idle
- It waits for files to be read or written
- It waits for web APIs to respond
- Example: a web scraper that hits 100 URLs

🧮 CPU-bound behavior:

- CPU usage sits near 100% on one core (for single-threaded Python code)
- Run time grows with the amount of computation, and a faster CPU makes it finish sooner
- Example: sorting a large list, compressing images, training a model
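One way to check which behavior you are seeing is to compare CPU time (`time.process_time()`, which only advances while this process is using the CPU) with wall-clock time (`time.perf_counter()`). A ratio near 1 suggests CPU-bound work; a ratio near 0 means the task spent most of its time waiting. The helpers `cpu_ratio` and `classify` and the 0.5 cut-off are illustrative assumptions, not a standard threshold, and the measurement covers only the current process, so work done in child processes is not counted.

```python
import time


def cpu_ratio(func, *args) -> float:
    """Return CPU time / wall time for a single call of func(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else 0.0


def classify(func, *args, threshold: float = 0.5) -> str:
    """Label a task using an assumed 50% CPU-utilization threshold."""
    return "CPU-bound" if cpu_ratio(func, *args) >= threshold else "I/O-bound"


def waiting_task() -> None:
    time.sleep(0.5)  # stands in for a network or disk wait


def computing_task() -> None:
    sum(i * i for i in range(3_000_000))  # pure-Python arithmetic


print(classify(waiting_task))    # I/O-bound
print(classify(computing_task))  # usually CPU-bound on an idle machine
```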

#### Quick Rule of Thumb:
| Task | Strategy |
|------|----------|
| Waiting a lot? (e.g., `time.sleep()`, `requests.get()`) | Use multithreading |
| Computing a lot? (e.g., math, loops, pandas) | Use multiprocessing |
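The "Waiting a lot?" row can be checked with a small sketch: ten simulated requests that each sleep for 0.2 s. The function `fake_request` is hypothetical; `time.sleep` stands in for a call like `requests.get` so the example runs offline. Run one after another they take about 2 s; in a thread pool the waits overlap, because a thread releases the GIL while it sleeps or waits on I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(i: int) -> str:
    """Hypothetical stand-in for a network call: almost all waiting."""
    time.sleep(0.2)
    return f"response-{i}"


def sequential(n: int) -> list[str]:
    return [fake_request(i) for i in range(n)]


def threaded(n: int) -> list[str]:
    # One thread per request; pool.map keeps results in input order
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(fake_request, range(n)))


for fn in (sequential, threaded):
    start = time.perf_counter()
    results = fn(10)
    print(f"{fn.__name__}: {len(results)} results in {time.perf_counter() - start:.2f} s")
```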

#### Example Scenario:

Task: you are writing a script that:

1. Reads a large list of URLs from a file.
2. Downloads the HTML content of each URL.
3. Parses the HTML to extract article titles.
4. Writes the titles to a CSV file.

### Step-by-Step Analysis:

| Step | What it does | Type of work | Why? |
|------|--------------|--------------|------|
| 1. Read URLs from file | File I/O | I/O-bound | Reading from disk is I/O |
| 2. Download HTML pages | Network I/O | I/O-bound | Waiting on remote servers |
| 3. Parse HTML | CPU work (parsing) | CPU-bound, but usually light | Parsing (e.g., with BeautifulSoup) uses the CPU, but it is usually cheap compared with the network wait unless pages are huge or the analysis is heavy |
| 4. Write to CSV | File I/O | I/O-bound | Writing to disk |

✅ Conclusion: this is mostly an I/O-bound task, so multithreading (at least for the download step) is the natural fit.
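Put together, the analysis points to threads for the download step and plain sequential code for the rest. This sketch makes several assumptions so it runs offline with only the standard library: `fake_download` is a hypothetical stand-in that sleeps instead of making an HTTP request (real code might use `urllib.request.urlopen` or `requests.get`), the URL "file" and the CSV output are in-memory `io.StringIO` objects, and titles are extracted with `html.parser` rather than BeautifulSoup.

```python
import csv
import io
import time
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser


def fake_download(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET: sleeps to mimic network latency."""
    time.sleep(0.2)
    return f"<html><head><title>Article at {url}</title></head><body></body></html>"


class TitleParser(HTMLParser):
    """Collects the text inside <title> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_title(html: str) -> str:
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def scrape(url_file, csv_out) -> list[str]:
    # Step 1 (I/O): read one URL per line, skipping blank lines
    urls = [line.strip() for line in url_file if line.strip()]
    # Step 2 (I/O-bound): overlap the network waits with a thread pool
    with ThreadPoolExecutor(max_workers=10) as pool:
        pages = list(pool.map(fake_download, urls))
    # Step 3 (light CPU): parsing is cheap, so sequential code is fine
    titles = [extract_title(page) for page in pages]
    # Step 4 (I/O): write url,title rows
    writer = csv.writer(csv_out)
    writer.writerow(["url", "title"])
    writer.writerows(zip(urls, titles))
    return titles


if __name__ == "__main__":
    url_file = io.StringIO("\n".join(f"https://example.com/post/{i}" for i in range(20)))
    out = io.StringIO()
    start = time.perf_counter()
    scrape(url_file, out)
    print(f"scraped 20 pages in {time.perf_counter() - start:.2f} s")
    print(out.getvalue().splitlines()[:3])
```

Only step 2 is parallelized; adding processes for step 3 would add overhead without much benefit, since parsing is a small share of the total time here.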

### Sharing Data Between Threads Safely

Let's walk through an example where multiple threads run different functions and share data safely through a thread-safe mechanism.

You have:

- A function that produces data (producer)
- A function that processes data (consumer)
- A shared queue between them

#### Using `Queue` for Safe Thread Communication

Python's `queue.Queue` is thread-safe: `put()` and `get()` do their own locking, and `get()` blocks until an item is available, so threads can exchange data without manual locks.

### Example: Producer/Consumer with Shared Data

```python
import threading
import time
from queue import Queue

data_queue = Queue()

def producer():
    for i in range(5):
        print(f"Producer: generating item {i}")
        data_queue.put(f"item-{i}")
        time.sleep(1)  # Simulate delay
    data_queue.put(None)  # Signal end of data
    print("Producer: done")

def consumer():
    while True:
        item = data_queue.get()
        if item is None:
            print("Consumer: no more items")
            break
        print(f"Consumer: processing {item}")
        time.sleep(2)  # Simulate processing delay
    print("Consumer: done")

# Create threads
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)

# Start threads
t1.start()
t2.start()

# Wait for both to finish
t1.join()
t2.join()

print("Main thread: all done.")
```


```text
Producer: generating item 0
Consumer: processing item-0
Producer: generating item 1
Consumer: processing item-1
Producer: generating item 2
Producer: generating item 3
Consumer: processing item-2
Producer: generating item 4
Producer: done
Consumer: processing item-3
Consumer: processing item-4
Consumer: no more items
Consumer: done
Main thread: all done.
```
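The producer/consumer example uses a single `None` sentinel, which works because there is exactly one consumer. With several consumers, only one would receive it and the rest would block forever on `get()`. A minimal sketch of the usual fix is to put one sentinel per consumer. The names `NUM_CONSUMERS`, `work_queue` and `results` are illustrative, and a second `Queue` collects results so no manual lock is needed.

```python
import threading
from queue import Queue

NUM_CONSUMERS = 3
work_queue: Queue = Queue()
results: Queue = Queue()


def producer(n_items: int) -> None:
    for i in range(n_items):
        work_queue.put(i)
    # One sentinel per consumer, so every consumer gets its own stop signal
    for _ in range(NUM_CONSUMERS):
        work_queue.put(None)


def consumer(name: str) -> None:
    while True:
        item = work_queue.get()
        if item is None:
            break
        results.put((name, item * item))


threads = [threading.Thread(target=producer, args=(10,))]
threads += [threading.Thread(target=consumer, args=(f"c{k}",)) for k in range(NUM_CONSUMERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All threads have finished, so draining the results queue here is safe
squares = []
while not results.empty():
    squares.append(results.get()[1])
squares.sort()
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Because the queue is FIFO and all sentinels are added after the real items, every item is handed to exactly one consumer before any consumer sees its stop signal.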
