# Concurrency Management in Python

Welcome to this guide to concurrency in Python. This notebook shows how to structure a program so that several tasks make progress at once, which can cut total running time when tasks spend time waiting or can be spread across CPU cores.

**What you'll learn:**
- How to run multiple tasks at the same time (concurrency)
- Three different approaches: Threading, Multiprocessing, and AsyncIO
- When to use each approach with real-world examples
- Best practices to write safe and efficient concurrent code

## Table of Contents
1. [Introduction to Concurrency](#1-introduction-to-concurrency)
2. [Threading - Running Tasks Simultaneously](#2-threading---running-tasks-simultaneously)
   - [Basic Threading](#2-threading---running-tasks-simultaneously)
   - [ThreadPoolExecutor (Recommended)](#21-threadpoolexecutor---the-professional-way)
3. [Multiprocessing - True Parallel Processing](#3-multiprocessing---true-parallel-processing)
   - [Basic Multiprocessing](#3-multiprocessing---true-parallel-processing)
   - [ProcessPoolExecutor (Recommended)](#31-processpoolexecutor---the-professional-way-for-cpu-tasks)
4. [AsyncIO - Asynchronous Programming](#4-asyncio---asynchronous-programming)
5. [When to Use Each Approach](#5-when-to-use-each-approach---decision-guide)
6. [Best Practices and Common Pitfalls](#6-best-practices-and-common-pitfalls)

## 1. Introduction to Concurrency

**What is Concurrency?**
Imagine you're cooking dinner. Instead of:
1. Boiling water for pasta (wait 10 minutes)
2. Then chopping vegetables (5 minutes)
3. Then cooking sauce (8 minutes)

You could do all three tasks "concurrently" - start the water boiling, chop vegetables while it heats, then start the sauce. This saves time!

**In Programming:**
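Concurrency lets a program make progress on several tasks during the same period instead of finishing each one before starting the next. When the tasks also execute at the same instant on different CPU cores, that is parallelism.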

**Python's Three Main Approaches:**

| Approach | Real-World Analogy | Best For | Example Use Case |
|----------|-------------------|----------|------------------|
| **Threading** | One chef switching between tasks | I/O operations (waiting for files, network) | Downloading multiple files |
| **Multiprocessing** | Multiple chefs working independently | CPU-intensive tasks | Mathematical calculations |
| **AsyncIO** | One very efficient chef with a task scheduler | Many I/O operations efficiently | Web server handling many requests |

## 2. Threading - Running Tasks Simultaneously

**What is Threading?**
Think of threading like a single person (your program) who can switch quickly between multiple tasks. The person shares the same workspace (memory) but can work on different things.

**Key Concepts:**
- **Thread**: A separate flow of execution within your program
- **Shared Memory**: All threads can access the same variables
- **Good for I/O-bound tasks**: When your program waits for files, network, or user input

**Real-world example**: While downloading a file (waiting), your program can update the UI or process other requests.

In [1]:
import threading
import time

def print_numbers():
    """This function prints numbers 0-4, with a 0.5 second pause between each"""
    for i in range(5):
        print(f"Number: {i}")
        time.sleep(0.5)  # Simulate some work/waiting

def print_letters():
    """This function prints letters a-e, with a 0.5 second pause between each"""
    for letter in "abcde":
        print(f"Letter: {letter}")
        time.sleep(0.5)  # Simulate some work/waiting

# Step 1: Create threads (like hiring two workers for different tasks)
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)

# Step 2: Start both threads (both workers start working simultaneously)
print("Starting both tasks...")
thread1.start()
thread2.start()

# Step 3: Wait for both threads to finish (wait for both workers to complete)
thread1.join()  # Wait for numbers to finish
thread2.join()  # Wait for letters to finish

print("Both tasks completed!")

# Without threading, this would take 5 seconds (2.5 + 2.5)
# With threading, it takes about 2.5 seconds (both run simultaneously)

Starting both tasks...
Number: 0
Letter: a
Number: 1
Letter: b
Number: 2
Letter: c
Number: 3
Letter: d
Number: 4
Letter: e
Both tasks completed!
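
The timing claim in the closing comments of the cell can be checked directly. The following is a minimal sketch rather than part of the original notebook: the helper `slow_task` and the shortened delays are assumptions, chosen so the comparison finishes quickly.

```python
import threading
import time

def slow_task(steps=3, delay=0.05):
    """Stand-in for I/O-bound work: sleep `delay` seconds, `steps` times."""
    for _ in range(steps):
        time.sleep(delay)

# Sequential: the second call starts only after the first one returns.
start = time.perf_counter()
slow_task()
slow_task()
sequential_seconds = time.perf_counter() - start

# Threaded: both calls wait at the same time, so the waits overlap.
start = time.perf_counter()
threads = [threading.Thread(target=slow_task) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_seconds = time.perf_counter() - start

print(f"Sequential: {sequential_seconds:.2f}s")
print(f"Threaded:   {threaded_seconds:.2f}s")
```

The threaded run should take roughly half as long, because `time.sleep` releases the GIL while it waits.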


## 3. Multiprocessing - True Parallel Processing

**What is Multiprocessing?**
Think of multiprocessing like hiring multiple chefs to work in separate kitchens. Each chef (process) has their own complete kitchen (memory space) and can work independently without interfering with others.

**Key Concepts:**
- **Process**: A completely separate program instance with its own memory
- **True Parallelism**: Multiple processes can run simultaneously on different CPU cores
- **Good for CPU-bound tasks**: Mathematical calculations, data processing, image manipulation

**Real-world example**: Processing thousands of images - each process handles different images independently.

In [None]:
import time
from multiprocessing import Process, current_process

def calculate_factorial(n):
    """Calculate factorial and demonstrate CPU-intensive work"""
    print(f"🔢 {current_process().name} calculating factorial of {n}")
    
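    # Multiply step by step. The short sleep only slows the loop down so the
    # interleaving is visible; real CPU-bound work would not sleep.
    result = 1
    for i in range(1, n + 1):
        result *= i
        time.sleep(0.01)
    
    print(f"✅ {current_process().name} completed: {n}! = {result}")
    return result  # Note: Process discards the target's return value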

# The guard stops child processes from re-running this block on platforms
# that start processes with "spawn" (the default on Windows and macOS)
if __name__ == "__main__":
    print("🚀 Starting multiprocessing calculation...\n")

    # Create processes for different calculations
    process1 = Process(target=calculate_factorial, args=(5,), name="Worker-1")
    process2 = Process(target=calculate_factorial, args=(6,), name="Worker-2")
    process3 = Process(target=calculate_factorial, args=(4,), name="Worker-3")

    # Start all processes
    process1.start()
    process2.start()
    process3.start()

    # Wait for all to complete
    process1.join()
    process2.join()
    process3.join()

    print("\n🎉 All calculations completed!")

🚀 Starting multiprocessing calculation...

🔢 Worker-1 calculating factorial of 5
🔢 Worker-2 calculating factorial of 6
🔢 Worker-3 calculating factorial of 4
✅ Worker-1 completed: 5! = 120
✅ Worker-3 completed: 4! = 24
✅ Worker-2 completed: 6! = 720

🎉 All calculations completed!
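
A `Process` discards whatever its target function returns, so the factorials computed in the cell are printed but never reach the parent process. One common way to send results back is a `multiprocessing.Queue`. The sketch below is illustrative only: `factorial_worker` and the `(n, result)` tuple layout are assumptions, not part of the original example.

```python
from multiprocessing import Process, Queue

def factorial(n):
    """Plain iterative factorial."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def factorial_worker(n, results):
    """Runs in a child process: compute n! and send it to the parent."""
    results.put((n, factorial(n)))

if __name__ == "__main__":
    results = Queue()
    inputs = [5, 6, 4]
    processes = [Process(target=factorial_worker, args=(n, results)) for n in inputs]
    for p in processes:
        p.start()

    # Read one result per process before joining, so no child is left
    # blocked on the queue while the parent waits for it to exit.
    collected = dict(results.get() for _ in inputs)

    for p in processes:
        p.join()

    for n in inputs:
        print(f"{n}! = {collected[n]}")
```

The `if __name__ == "__main__":` guard matters on platforms that start child processes with the spawn method, where the main module is imported again in each child.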


## 4. AsyncIO - Asynchronous Programming

**What is AsyncIO?**
Think of AsyncIO like a very efficient waiter at a restaurant. Instead of waiting for one customer's order to be prepared before taking another order, the waiter takes multiple orders and checks back when each is ready.

**Key Concepts:**
- **Asynchronous**: Tasks can pause and resume without blocking other tasks
- **Single-threaded**: An event loop runs all coroutines in one thread and switches between them only at `await` points
- **Good for I/O-bound tasks**: Network requests, file operations, database queries

**Real-world example**: A web server handling hundreds of simultaneous user requests efficiently.

In [3]:
import asyncio

async def fetch_data(api_name, delay):
    """Simulates fetching data from an API"""
    print(f"🌐 Starting API call to {api_name}")
    await asyncio.sleep(delay)  # Simulate network delay
    print(f"✅ Received data from {api_name} (took {delay}s)")
    return f"Data from {api_name}"

async def main():
    """Main async function that coordinates all API calls"""
    print("🚀 Starting concurrent API calls...\n")
    
    # Start all API calls concurrently
    results = await asyncio.gather(
        fetch_data("Weather API", 1),
        fetch_data("News API", 2),
        fetch_data("Stock API", 1.5)
    )
    
    print("\n📋 All API calls completed!")
    for result in results:
        print(f"   - {result}")

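# Run the async program. Top-level `await` works in Jupyter/IPython;
# in a regular script, use asyncio.run(main()) instead.
await main()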

🚀 Starting concurrent API calls...

🌐 Starting API call to Weather API
🌐 Starting API call to News API
🌐 Starting API call to Stock API
✅ Received data from Weather API (took 1s)
✅ Received data from Stock API (took 1.5s)
✅ Received data from News API (took 2s)

📋 All API calls completed!
   - Data from Weather API
   - Data from News API
   - Data from Stock API
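
Outside a notebook, `await` cannot appear at module level, so the usual entry point is `asyncio.run()`. The sketch below is a script-style variant with shortened delays (an assumption made so it finishes quickly), and `fetch` is a hypothetical stand-in for a real network call. It also shows that `asyncio.gather` returns results in the order the awaitables were passed, not in completion order.

```python
import asyncio
import time

async def fetch(name, delay):
    """Hypothetical stand-in for a network call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.perf_counter()
    # All three coroutines wait concurrently inside one event loop.
    results = await asyncio.gather(
        fetch("weather", 0.10),
        fetch("news", 0.20),
        fetch("stock", 0.15),
    )
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)                     # order matches the call order
print(f"Elapsed: {elapsed:.2f}s")  # close to the longest delay, not the sum
```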


## 5. When to Use Each Approach - Decision Guide

**Quick Decision Tree:**

1. **Is your task waiting for something external (files, network, database)?**
   - YES → Go to step 2
   - NO → Use **Multiprocessing** (CPU-bound task)

2. **Do you need to handle many (100+) simultaneous operations?**
   - YES → Use **AsyncIO** (most efficient for many I/O operations)
   - NO → Use **Threading** (simpler for few I/O operations)
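
The decision tree above can be written down as a small function. This is only a sketch of the rule of thumb: the function name `choose_concurrency_model` and its parameters are hypothetical, and the 100-operation threshold is the guide's approximate cutoff rather than a hard limit.

```python
def choose_concurrency_model(waits_on_io, simultaneous_operations=1):
    """Return the model suggested by the decision tree.

    waits_on_io: True if the task mostly waits on files, network, or a database.
    simultaneous_operations: how many operations are in flight at once.
    """
    if not waits_on_io:
        return "multiprocessing"  # CPU-bound work
    if simultaneous_operations >= 100:
        return "asyncio"          # many concurrent I/O operations
    return "threading"            # a handful of I/O operations

print(choose_concurrency_model(waits_on_io=True, simultaneous_operations=10))
print(choose_concurrency_model(waits_on_io=True, simultaneous_operations=1000))
print(choose_concurrency_model(waits_on_io=False))
```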

**Detailed Comparison:**

| Scenario | Best Choice | Why? | Example |
|----------|-------------|------|---------|
| Downloading 10 files | **Threading** | Simple, good for moderate I/O | Web scraping |
| Processing 1000 images | **Multiprocessing** | Uses all CPU cores | Image resizing |
| Web server (1000+ users) | **AsyncIO** | Handles many connections efficiently | REST API |
| Mathematical calculations | **Multiprocessing** | Bypasses Python's GIL limitation | Data analysis |
| Reading multiple databases | **Threading** or **AsyncIO** | Depends on scale | Data collection |

## 6. Best Practices and Common Pitfalls

**✅ DO:**
- **Start simple**: Use ThreadPoolExecutor or ProcessPoolExecutor instead of manual thread/process management
- **Profile first**: Measure your code before optimizing - you might not need concurrency!
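- **Handle exceptions**: An exception raised in a worker is stored on its future and only surfaces when you call `result()`, so either catch errors inside the task or wrap `result()` in try/except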
- **Use context managers**: `with ThreadPoolExecutor() as executor:` automatically cleans up resources

**❌ DON'T:**
- **Share mutable state without protection**: Avoid mutating global variables from several threads - when state must be shared, guard it with a lock
- **Mix approaches unnecessarily**: Pick one concurrency model per problem
- **Assume faster**: More threads/processes ≠ faster performance (overhead exists!)
- **Forget the GIL**: In standard CPython builds, the Global Interpreter Lock lets only one thread execute Python bytecode at a time, so threads do not speed up CPU-bound code

**🛠️ Common Debugging Tips:**
- **Deadlocks**: When threads wait for each other forever - use timeouts
- **Race conditions**: When threads compete for resources - use locks
- **Resource exhaustion**: Don't create unlimited threads/processes - use pools
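
The lock and timeout advice above can be illustrated with a shared counter. This is a minimal sketch: the names `counter`, `counter_lock`, and `increment` are made up for the example, and the thread and iteration counts are arbitrary.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def increment(times):
    """Add 1 to the shared counter `times` times, one locked update at a time."""
    global counter
    for _ in range(times):
        # `counter += 1` is a read-modify-write; holding the lock keeps
        # other threads from interleaving with it.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000

# Where a deadlock is possible, acquire with a timeout instead of
# blocking forever; acquire() returns False if the timeout expires.
if counter_lock.acquire(timeout=1.0):
    try:
        print("lock acquired")
    finally:
        counter_lock.release()
```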

**📝 Code Example - Error Handling:**
```python
from concurrent.futures import ThreadPoolExecutor
import requests

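def safe_download(url):
    """Fetch a URL and return a status string instead of raising"""
    try:
        response = requests.get(url, timeout=5)
        return f"✅ {url}: {response.status_code}"
    except requests.RequestException as e:
        # Covers connection errors, timeouts, and invalid URLs
        return f"❌ {url}: {e}"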

# Always handle errors in concurrent code!
with ThreadPoolExecutor(max_workers=3) as executor:
    urls = ["http://example.com", "http://invalid-url"]
    results = list(executor.map(safe_download, urls))
    for result in results:
        print(result)
```
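
An alternative to catching errors inside the worker is to let them propagate: an exception raised in a task is stored on its `Future` and re-raised when `result()` is called. The sketch below needs no network access; `parse_port` is a hypothetical task invented for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_port(text):
    """Hypothetical task: convert text to a port number, raising on bad input."""
    port = int(text)  # raises ValueError for non-numeric text
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port

inputs = ["8080", "not-a-port", "443"]
outcomes = {}

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(parse_port, text): text for text in inputs}
    for future, text in futures.items():
        try:
            # result() re-raises any exception the worker raised.
            outcomes[text] = future.result()
        except ValueError as exc:
            outcomes[text] = f"failed: {exc}"

for text, outcome in outcomes.items():
    print(f"{text!r} -> {outcome}")
```

Handling the error at the call site keeps the task function simple and lets the caller decide whether to retry, log, or abort.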

## 2.1 ThreadPoolExecutor - The Professional Way

**Why use ThreadPoolExecutor?**
While basic threading works, managing threads by hand is error-prone. `ThreadPoolExecutor` maintains a pool of worker threads for you. It:
- Creates worker threads on demand and reuses them across tasks
- Caps the number of threads at `max_workers`
- Captures an exception raised in a task and re-raises it when you call `result()`
- Joins the worker threads automatically when used in a `with` block

**When to use this:**
- When you have many I/O-bound tasks (file reading, web requests, database queries)
- When you want simple, clean code without manual thread management

In [4]:
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def download_file(file_name, size_mb):
    """Simulates downloading a file"""
    print(f"📥 Starting download: {file_name} ({size_mb}MB)")
    time.sleep(size_mb * 0.5)  # Simulate download time (0.5 seconds per MB)
    print(f"✅ Completed download: {file_name}")
    return f"{file_name} downloaded successfully"

# List of files to download
files_to_download = [
    ("document.pdf", 2),
    ("image.jpg", 1),
    ("video.mp4", 4),
    ("music.mp3", 3)
]

print("🚀 Starting concurrent downloads...\n")

# Using ThreadPoolExecutor (max 3 downloads at once)
with ThreadPoolExecutor(max_workers=3) as executor:
    # Submit all download tasks
    futures = [executor.submit(download_file, name, size) for name, size in files_to_download]
    
    # Collect results as they complete
    for future in as_completed(futures):
        result = future.result()
        print(f"📋 Result: {result}")

print("\n🎉 All downloads completed!")

🚀 Starting concurrent downloads...

📥 Starting download: document.pdf (2MB)
📥 Starting download: image.jpg (1MB)
📥 Starting download: video.mp4 (4MB)
✅ Completed download: image.jpg
📥 Starting download: music.mp3 (3MB)
📋 Result: image.jpg downloaded successfully
✅ Completed download: document.pdf
📋 Result: document.pdf downloaded successfully
✅ Completed download: video.mp4
📋 Result: video.mp4 downloaded successfully
✅ Completed download: music.mp3
📋 Result: music.mp3 downloaded successfully

🎉 All downloads completed!


**When to use ThreadPoolExecutor:**
- When you want to run many I/O-bound tasks concurrently with a simple interface.
- It manages thread creation, scheduling, and joining for you.
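
The executor offers two ways to collect results, and they differ in ordering. The following sketch contrasts them; `wait_and_return` and the delay values are assumptions made for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def wait_and_return(delay):
    """Sleep for `delay` seconds, then return it (simulated I/O)."""
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]

with ThreadPoolExecutor(max_workers=3) as executor:
    # map(): results come back in input order, whatever finishes first.
    in_input_order = list(executor.map(wait_and_return, delays))

    # submit() + as_completed(): results come back as each task finishes.
    futures = [executor.submit(wait_and_return, d) for d in delays]
    in_completion_order = [f.result() for f in as_completed(futures)]

print(in_input_order)
print(in_completion_order)
```

`map()` is convenient when each result must line up with its input. `as_completed()` suits cases where each result should be handled as soon as it is ready.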

## 3.1 ProcessPoolExecutor - The Professional Way for CPU Tasks

**Why use ProcessPoolExecutor?**
While basic multiprocessing works, `ProcessPoolExecutor` is like having a professional project manager for CPU-intensive work:
- Automatically manages multiple processes for you
- Distributes work efficiently across CPU cores
- Handles process creation, communication, and cleanup
- Sidesteps Python's GIL (Global Interpreter Lock): each worker process has its own interpreter, so CPU-bound work can run in parallel

**When to use this:**
- CPU-intensive tasks (math calculations, data processing, image manipulation)
- When you want to utilize all CPU cores efficiently
- When you need simple, clean code for parallel processing

In [5]:
from concurrent.futures import ProcessPoolExecutor, as_completed
import time

def process_data(data_chunk, chunk_id):
    """Simulates CPU-intensive data processing"""
    print(f"🔄 Worker {chunk_id} processing {len(data_chunk)} items...")
    
    # Simulate CPU-intensive work (mathematical calculations)
    result = sum(x ** 2 for x in data_chunk)
    time.sleep(1)  # Simulate processing time
    
    print(f"✅ Worker {chunk_id} completed processing")
    return {"chunk_id": chunk_id, "result": result, "items_processed": len(data_chunk)}

# Large dataset to process
large_dataset = list(range(1, 21))  # Numbers 1-20
chunk_size = 5

# Split data into chunks for parallel processing
data_chunks = [
    large_dataset[i:i + chunk_size] 
    for i in range(0, len(large_dataset), chunk_size)
]

# The guard stops worker processes from re-running this block on platforms
# that start processes with "spawn" (the default on Windows and macOS)
if __name__ == "__main__":
    print(f"🚀 Processing {len(large_dataset)} items using {len(data_chunks)} workers...\n")

    # Using ProcessPoolExecutor (defaults to one worker process per CPU)
    with ProcessPoolExecutor() as executor:
        # Submit all processing tasks
        futures = [
            executor.submit(process_data, chunk, i + 1)
            for i, chunk in enumerate(data_chunks)
        ]

        # Collect results as they complete
        total_result = 0
        for future in as_completed(futures):
            result = future.result()
            total_result += result["result"]
            print(f"📊 Chunk {result['chunk_id']}: {result['items_processed']} items → {result['result']}")

    print(f"\n🎯 Final result: {total_result}")
    print("🎉 All data processing completed!")

🚀 Processing 20 items using 4 workers...

🔄 Worker 1 processing 5 items...
🔄 Worker 2 processing 5 items...
🔄 Worker 3 processing 5 items...
🔄 Worker 4 processing 5 items...
✅ Worker 2 completed processing
✅ Worker 1 completed processing
✅ Worker 3 completed processing
✅ Worker 4 completed processing
📊 Chunk 2: 5 items → 330
📊 Chunk 3: 5 items → 855
📊 Chunk 1: 5 items → 55
📊 Chunk 4: 5 items → 1630

🎯 Final result: 2870
🎉 All data processing completed!


**When to use ProcessPoolExecutor:**
- For CPU-bound tasks that benefit from true parallelism.
- When you want a simple interface for running functions in separate processes.
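
When each task is the same function applied to a different chunk, `executor.map` is a compact alternative to `submit` plus `as_completed`. The sketch below assumes two worker processes and reuses the 1-20 dataset from the cell above; the helper names `sum_of_squares` and `split_into_chunks` are hypothetical.

```python
from concurrent.futures import ProcessPoolExecutor

def sum_of_squares(chunk):
    """CPU-bound task; defined at module level so it can be pickled."""
    return sum(x * x for x in chunk)

def split_into_chunks(items, chunk_size):
    """Split a list into consecutive slices of at most `chunk_size` items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

if __name__ == "__main__":
    chunks = split_into_chunks(list(range(1, 21)), 5)
    with ProcessPoolExecutor(max_workers=2) as executor:
        # map() returns the partial sums in the same order as `chunks`.
        partial_sums = list(executor.map(sum_of_squares, chunks))
    print(partial_sums)
    print(sum(partial_sums))
```

Arguments and return values cross process boundaries by pickling, so for very small tasks the transfer cost can outweigh the gain from parallelism.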