# Chapter 11: Iterators, Generators, and Decorators

Python's elegance lies not just in its readable syntax, but in its sophisticated protocols that enable memory-efficient data processing and clean separation of concerns. Three features—iterators, generators, and decorators—form the backbone of advanced Python programming, allowing you to handle infinite data streams, create lazy evaluation pipelines, and inject cross-cutting functionality without cluttering business logic.

This chapter explores the iterator protocol that powers Python's `for` loops, the generator functions that enable memory-efficient data processing, and the decorator pattern that modifies function behavior. These tools are essential for writing Pythonic code that scales from kilobytes to gigabytes of data while maintaining clean, reusable architectures.

## 11.1 Iterators: The Iterator Protocol

An **iterator** is an object that represents a stream of data, returning one element at a time when requested. Understanding iterators is fundamental because they underpin every `for` loop, comprehension, and iterable operation in Python.

### The Iterator Protocol

Python's iterator protocol consists of two methods:
1.  `__iter__()`: Returns an iterator. An iterator returns itself; an iterable container returns a fresh iterator.
2.  `__next__()`: Returns the next value, or raises `StopIteration` when exhausted

```python
from typing import Iterator, Self  # Self requires Python 3.11+

class Countdown:
    """
    Custom iterator counting down from start to 0.
    
    Demonstrates the iterator protocol explicitly.
    """
    
    def __init__(self, start: int) -> None:
        if start < 0:
            raise ValueError("Start must be non-negative")
        self.start: int = start
        self.current: int = start
    
    def __iter__(self) -> Self:
        """
        Return the iterator object itself.
        
        This makes the class both an iterable (has __iter__) 
        and an iterator (has __next__).
        """
        self.current = self.start  # Reset for reuse (also restarts any iteration in progress)
        return self
    
    def __next__(self) -> int:
        """
        Return next value or raise StopIteration.
        
        StopIteration signals to the for loop that iteration is complete.
        """
        if self.current < 0:
            raise StopIteration
        
        num: int = self.current
        self.current -= 1
        return num

# Usage
counter: Countdown = Countdown(5)

# Manual iteration
iterator: Iterator[int] = iter(counter)  # Calls __iter__
print(next(iterator))  # 5 (calls __next__)
print(next(iterator))  # 4
print(next(iterator))  # 3

# For loop iteration (automatic protocol)
for num in Countdown(3):
    print(num)  # 3, 2, 1, 0
```

**Key Insight:** The `for` loop is syntactic sugar for:
```python
iterator = iter(obj)  # Calls __iter__()
while True:
    try:
        item = next(iterator)  # Calls __next__()
    except StopIteration:
        break
    # Process item
```

### Iterable vs. Iterator

Critical distinction:
*   **Iterable**: Object with `__iter__()` method (can be looped over). Examples: `list`, `str`, `dict`, file objects.
*   **Iterator**: Object with both `__next__()` and `__iter__()` (stateful, produces values on demand). Every iterator is also an iterable because its `__iter__` returns the iterator itself.

```python
from typing import Iterable, Iterator

# List is iterable but not an iterator
numbers: list[int] = [1, 2, 3]
print(hasattr(numbers, '__iter__'))  # True (iterable)
print(hasattr(numbers, '__next__'))  # False (not iterator)

# Getting iterator from iterable
it: Iterator[int] = iter(numbers)  # Calls numbers.__iter__()
print(hasattr(it, '__next__'))     # True (now it's an iterator)

# Iterators are consumed (one-shot)
print(list(it))  # [1, 2, 3]
print(list(it))  # [] (exhausted, cannot rewind)
```

### Custom Iterables (Separating Iterable from Iterator)

For complex iterators, separate the iterable (factory) from the iterator (state machine):

```python
from typing import Iterator, Optional, Self  # Self requires Python 3.11+

class FileLineIterable:
    """
    Iterable that yields lines from a file matching a pattern.
    
    Iterable: Represents the collection/resource
    Iterator: Maintains state during iteration
    """
    
    def __init__(self, filepath: str, pattern: str) -> None:
        self.filepath: str = filepath
        self.pattern: str = pattern
    
    def __iter__(self) -> 'FileLineIterator':
        # Return fresh iterator each time
        return FileLineIterator(self.filepath, self.pattern)

class FileLineIterator:
    """Iterator with state (file handle, current position)."""
    
    def __init__(self, filepath: str, pattern: str) -> None:
        self.filepath: str = filepath
        self.pattern: str = pattern
        self.file_handle: Optional[object] = None
        self.line_number: int = 0
    
    def __iter__(self) -> Self:
        return self
    
    def __next__(self) -> str:
        if self.file_handle is None:
            self.file_handle = open(self.filepath, 'r', encoding='utf-8')
        
        while True:
            line: str = self.file_handle.readline()
            self.line_number += 1
            
            if not line:  # EOF
                self.file_handle.close()
                self.file_handle = None
                raise StopIteration
            
            if self.pattern in line:
                return line.strip()

# Usage - iterable can be reused, iterator maintains state
log_lines: FileLineIterable = FileLineIterable("app.log", "ERROR")

# First iteration
for line in log_lines:
    print(f"Found: {line}")

# Second iteration (fresh iterator created automatically)
for line in log_lines:  # Works because __iter__ returns new iterator
    print(f"Found again: {line}")
```
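
The two-class split above can often be collapsed by writing `__iter__` as a generator function, so that each call builds a fresh iterator without a hand-written `__next__`. The following is a minimal sketch of that idea, not a drop-in replacement: the class name `MatchingLines` and the temporary demo file are assumptions made only so the example runs anywhere.

```python
import os
import tempfile
from collections.abc import Iterator


class MatchingLines:
    """Reusable iterable (hypothetical name): every loop starts a fresh pass."""

    def __init__(self, filepath: str, pattern: str) -> None:
        self.filepath: str = filepath
        self.pattern: str = pattern

    def __iter__(self) -> Iterator[str]:
        # This method contains `yield`, so calling it returns a new generator
        # (an iterator) each time. The `with` block closes the file when the
        # generator finishes or is closed early.
        with open(self.filepath, "r", encoding="utf-8") as handle:
            for line in handle:
                if self.pattern in line:
                    yield line.strip()


# Demo on a throwaway file so the sketch does not depend on app.log.
with tempfile.NamedTemporaryFile(
    "w", suffix=".log", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
    demo_path: str = tmp.name

errors = MatchingLines(demo_path, "ERROR")
first_pass: list[str] = list(errors)
second_pass: list[str] = list(errors)  # Works again: __iter__ built a new generator
os.remove(demo_path)

print(first_pass)
print(second_pass)
```

The trade-off is that the generator version hides the iterator state in local variables, which is shorter but gives callers no attributes (such as a line counter) to inspect mid-iteration.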

### The `collections.abc` Abstract Base Classes

Use abstract base classes to properly type-check iterables:

```python
from collections.abc import Iterator, Iterable, Generator

def process_stream(data: Iterable[int]) -> None:
    """
    Accepts any iterable (list, tuple, generator, custom iterator).
    
    Using Iterable[int] signals that the function only needs to loop over the data.
    """
    total: int = 0
    for item in data:
        total += item
    print(f"Total: {total}")

def get_stream() -> Iterator[int]:
    """
    Returns iterator (stateful, single-pass).
    
    Using Iterator indicates this can only be consumed once.
    """
    return iter([1, 2, 3])
```
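
The same abstract base classes also work at runtime with `isinstance`, which makes the iterable/iterator distinction easy to check. The sketch below is illustrative only; `LegacySequence` is a hypothetical class added to show one caveat: objects that support iteration solely through `__getitem__` are not recognized as `Iterable`, even though `iter()` accepts them.

```python
from collections.abc import Generator, Iterable, Iterator

numbers: list[int] = [1, 2, 3]
numbers_iter = iter(numbers)
squares = (n * n for n in numbers)

# (is it Iterable?, is it an Iterator?)
list_checks = (isinstance(numbers, Iterable), isinstance(numbers, Iterator))
iter_checks = (isinstance(numbers_iter, Iterable), isinstance(numbers_iter, Iterator))
# Generators are iterators, and also match the more specific Generator ABC
gen_checks = (isinstance(squares, Iterator), isinstance(squares, Generator))


class LegacySequence:
    """Hypothetical class using only the old __getitem__ sequence protocol."""

    def __getitem__(self, index: int) -> int:
        if index >= 3:
            raise IndexError(index)
        return index * 10


legacy = LegacySequence()
legacy_is_iterable: bool = isinstance(legacy, Iterable)  # Has no __iter__ method
legacy_values: list[int] = list(legacy)                  # iter() still works

print(list_checks, iter_checks, gen_checks)
print(legacy_is_iterable, legacy_values)
```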

## 11.2 Generators: Lazy Evaluation with yield

**Generators** are a special type of iterator that simplify the iterator protocol using the `yield` keyword. They automatically implement `__iter__` and `__next__`, maintain local variable state between calls, and raise `StopIteration` automatically when the function returns.

### Basic Generator Functions

```python
from typing import Generator

def fibonacci(n: int) -> Generator[int, None, None]:
    """
    Generate first n Fibonacci numbers.
    
    Generator function: contains yield, returns generator iterator.
    Type hint: Generator[YieldType, SendType, ReturnType]
    """
    a: int = 0
    b: int = 1
    count: int = 0
    
    while count < n:
        yield a  # Suspends here, returns a to caller
        # Resumes here on next iteration
        a, b = b, a + b
        count += 1
    # Implicit StopIteration when function returns

# Usage
fib_gen: Generator[int, None, None] = fibonacci(10)

# Convert to list (eager evaluation - consumes generator)
numbers: list[int] = list(fibonacci(5))  # [0, 1, 1, 2, 3]

# Lazy evaluation (memory efficient)
for num in fibonacci(1000000):  # Doesn't store million numbers in memory
    if num > 1000:
        break
    print(num)
```

**Generator Lifecycle:**
1.  Calling `fibonacci(10)` returns a generator object (doesn't execute code yet)
2.  Calling `next()` starts/resumes execution until `yield`
3.  State (local variables `a`, `b`, `count`) is frozen between calls
4.  When the function returns, `StopIteration` is raised automatically; an uncaught exception inside the generator propagates to the caller instead
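
The lifecycle steps above can be observed directly with `inspect.getgeneratorstate` from the standard library. This is a small sketch; the generator `two_steps` is a hypothetical example, not part of the chapter's code.

```python
import inspect
from collections.abc import Generator


def two_steps() -> Generator[str, None, None]:
    yield "first"
    yield "second"


gen = two_steps()
states: list[str] = [inspect.getgeneratorstate(gen)]  # Created, no code has run yet

values: list[str] = [next(gen)]                        # Runs up to the first yield
states.append(inspect.getgeneratorstate(gen))          # Paused at a yield

values.append(next(gen))
try:
    next(gen)                                          # Function body returns here
except StopIteration:
    states.append(inspect.getgeneratorstate(gen))      # Finished for good

print(states)
print(values)
```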

### Generator State and Memory Efficiency

Generators excel at processing large datasets that don't fit in memory:

```python
from typing import Generator
import csv

def read_large_csv(filepath: str) -> Generator[dict[str, str], None, None]:
    """
    Lazily yield rows from a large CSV file.
    
    Memory usage remains constant regardless of file size.
    """
    with open(filepath, 'r', newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield row  # Only one row in memory at a time

# Process 10GB file with minimal memory (process_row is a placeholder for your own handler)
for row in read_large_csv('huge_dataset.csv'):
    if float(row['value']) > 1000:
        process_row(row)
```

**Comparison:**
*   **List approach**: `return [row for row in reader]` → Memory = size of entire file
*   **Generator approach**: `yield row` → Memory = size of one row
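
One way to make this difference visible without a 10GB file is to count how many rows are actually read. The sketch below uses an in-memory `io.StringIO` as a stand-in for a large file; the helper `iter_rows` and the `stats` dictionaries are assumptions introduced for the demonstration.

```python
import csv
import io
from collections.abc import Iterator


def iter_rows(stream: io.StringIO, stats: dict[str, int]) -> Iterator[dict[str, str]]:
    """Yield CSV rows one at a time, counting how many were actually read."""
    for row in csv.DictReader(stream):
        stats["rows_read"] += 1
        yield row


# 1,000 data rows held in memory stand in for a large file on disk.
data: str = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(1000))

# Eager: list() pulls every row before any filtering can happen.
eager_stats: dict[str, int] = {"rows_read": 0}
all_rows = list(iter_rows(io.StringIO(data), eager_stats))

# Lazy: next() stops the pipeline at the first matching row.
lazy_stats: dict[str, int] = {"rows_read": 0}
first_match = next(
    row
    for row in iter_rows(io.StringIO(data), lazy_stats)
    if float(row["value"]) > 50
)

print(eager_stats["rows_read"], lazy_stats["rows_read"], first_match)
```

The lazy version reads only as far as it needs to, which is the property that keeps memory flat when the source is much larger than RAM.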

### Bidirectional Communication: send(), throw(), close()

Generators can receive data from the caller, enabling coroutine-like behavior:

```python
from typing import Generator

def running_average() -> Generator[float, float, None]:
    """
    Generator that receives values and yields running average.
    
    Type: Generator[YieldType, SendType, ReturnType]
    """
    total: float = 0.0
    count: int = 0
    average: float = 0.0
    
    while True:
        # yield sends out current average, receives new value via send()
        new_value: float = yield average
        total += new_value
        count += 1
        average = total / count

# Usage
avg: Generator[float, float, None] = running_average()
next(avg)  # Prime the generator (advance to first yield)

print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0

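# Exception injection: the exception is raised at the paused yield.
# running_average() does not catch it, so it propagates back to the caller
# and the generator finishes.
try:
    avg.throw(ValueError("Invalid input"))
except ValueError as exc:
    print(f"Propagated: {exc}")

# Cleanup
avg.close()  # Raises GeneratorExit at a paused yield; a no-op on a finished generator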
```
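
For `throw()` and `close()` to do useful work, the generator itself needs `try`/`except` or `try`/`finally` around its `yield`. The following is a sketch of one possible arrangement, assuming a hypothetical variant named `tolerant_average` that skips bad input and records when it is closed.

```python
from collections.abc import Generator

events: list[str] = []


def tolerant_average() -> Generator[float, float, None]:
    """Running average that survives injected ValueErrors (illustrative)."""
    total: float = 0.0
    count: int = 0
    average: float = 0.0
    try:
        while True:
            try:
                value = yield average
            except ValueError as exc:
                # throw() raised this at the paused yield; record it and
                # loop back to the yield without changing the average.
                events.append(f"skipped: {exc}")
                continue
            total += value
            count += 1
            average = total / count
    finally:
        # close() raises GeneratorExit at the paused yield, so this runs.
        events.append("closed")


avg = tolerant_average()
next(avg)                                           # Prime to the first yield
first: float = avg.send(10)
after_throw: float = avg.throw(ValueError("bad reading"))  # Returns the next yielded value
second: float = avg.send(20)
avg.close()

print(first, after_throw, second, events)
```

Note that `throw()` returns whatever the generator yields next, so a generator that handles the exception keeps the conversation going instead of ending it.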

**Practical Example: Stateful Logger**
```python
from typing import Generator
from datetime import datetime

def log_processor() -> Generator[None, str, None]:
    """Accumulate log messages and batch write."""
    buffer: list[str] = []
    
    while True:
        message: str = yield
        timestamp: str = datetime.now().isoformat()
        buffer.append(f"[{timestamp}] {message}")
        
        if len(buffer) >= 10:
            write_to_disk(buffer)  # Placeholder for your own persistence function
            buffer.clear()

logger = log_processor()
next(logger)  # Prime

logger.send("User logged in")
logger.send("Database connection established")
```

### Delegation with yield from (Sub-generators)

`yield from` delegates iteration to another iterable, transparently passing values and exceptions:

```python
from typing import Generator

def sub_generator(start: int, end: int) -> Generator[int, None, None]:
    """Yield range of numbers."""
    for i in range(start, end):
        yield i

def main_generator() -> Generator[int, None, None]:
    """Delegate to sub-generators."""
    print("Phase 1")
    yield from sub_generator(1, 3)
    
    print("Phase 2")
    yield from sub_generator(10, 13)
    
    print("Done")

# Output: Phase 1, 1, 2, Phase 2, 10, 11, 12, Done
for val in main_generator():
    print(val)
```

**Benefits of `yield from`:**
*   Eliminates boilerplate `for item in sub_gen: yield item`
*   Propagates `send()` and `throw()` to sub-generator
*   Returns value from sub-generator (Python 3.3+)

```python
from typing import Generator

def sub_gen() -> Generator[int, None, str]:
    yield 1
    yield 2
    return "Finished"

def main_gen() -> Generator[int, None, None]:
    result: str = yield from sub_gen()
    print(result)  # "Finished"
    yield 3
```

## 11.3 Generator Expressions: Memory-Efficient Comprehensions

**Generator expressions** provide a concise, memory-efficient way to create generators using syntax similar to list comprehensions but with parentheses instead of brackets.

### Syntax and Behavior

```python
from typing import Generator
import sys

# List comprehension (eager, stores all values in memory)
squares_list: list[int] = [x**2 for x in range(1000000)]
print(f"List size: {sys.getsizeof(squares_list)} bytes")  # ~8MB for the list's pointer array alone (int objects not counted)

# Generator expression (lazy, stores only iterator)
squares_gen: Generator[int, None, None] = (x**2 for x in range(1000000))
print(f"Generator size: {sys.getsizeof(squares_gen)} bytes")  # Roughly 100-200 bytes, depending on Python version

# Generator expressions are single-use
gen = (x for x in range(5))
print(list(gen))  # [0, 1, 2, 3, 4]
print(list(gen))  # [] (exhausted)
```

### Chaining Generators (Pipelines)

Generator expressions excel at creating data processing pipelines without intermediate memory allocation:

```python
from typing import Generator
import re

def read_lines(filename: str) -> Generator[str, None, None]:
    """Yield lines from file."""
    with open(filename, 'r') as f:
        for line in f:
            yield line.strip()

# Pipeline: File -> Filter -> Transform -> Aggregate
# Without storing intermediate lists

log_lines: Generator[str, None, None] = read_lines('server.log')

# Filter: Only ERROR lines
errors: Generator[str, None, None] = (line for line in log_lines if 'ERROR' in line)

# Transform: Extract timestamp and message
pattern = re.compile(r'(\d{4}-\d{2}-\d{2}) (.*)')
parsed: Generator[tuple[str, str], None, None] = (
    (match.group(1), match.group(2)) 
    for line in errors 
    if (match := pattern.search(line))
)

# Execute pipeline (lazy evaluation happens here)
for date, message in parsed:
    print(f"[{date}] {message}")

# Memory usage remains constant regardless of file size
```

**Performance Comparison:**
```python
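# `normalize`, `process`, and `huge_list` are placeholders for your own code.

# Approach 1: Nested lists (high memory, intermediate storage)
result = [process(x) for x in [normalize(y) for y in huge_list]]

# Approach 2: Generator pipeline (low memory, streaming).
# Nothing runs until `result` is iterated.
result = (process(x) for x in (normalize(y) for y in huge_list))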
```

### When to Use What

| Feature | Syntax | Memory | Use Case |
|---------|--------|--------|----------|
| List Comprehension | `[x for x in data]` | High (all data) | Need random access, multiple iterations, or len() |
| Generator Expression | `(x for x in data)` | Low (iterator) | Single pass, large datasets, pipelining |
| Generator Function | `def gen(): yield x` | Low (iterator) | Complex logic, stateful iteration, reuse |

**Best Practice:** Use generator expressions for simple transformations, generator functions for complex logic requiring multiple statements.
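
A few common idioms follow from this guidance. The sketch below, using made-up sample data, shows a generator expression passed as the sole argument to a function (no extra parentheses needed), short-circuiting with `any()`, and bounding an infinite stream with `itertools.islice`.

```python
from itertools import count, islice

readings: list[int] = [3, 8, 15, 4, 23, 42]

# A generator expression as the only argument needs no second pair of parentheses.
total_squares: int = sum(r * r for r in readings)

# any() stops consuming as soon as one item is truthy.
checked: list[int] = []


def is_large(value: int) -> bool:
    checked.append(value)  # Record which items were actually examined
    return value > 10


found: bool = any(is_large(r) for r in readings)

# count(1) never ends; islice takes only the first three results.
even_squares: list[int] = list(
    islice((n * n for n in count(1) if n % 2 == 0), 3)
)

print(total_squares, found, checked, even_squares)
```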

## 11.4 Decorators: Modifying Function Behavior

**Decorators** are callables that take a function or class as input, extend or alter its behavior, and return a replacement (usually a wrapped function, or the modified class). They enable separation of cross-cutting concerns (logging, authentication, caching) from business logic.

### Function Decorators Fundamentals

```python
from typing import Callable, Any
from functools import wraps
import time

def timer_decorator(func: Callable[..., Any]) -> Callable[..., Any]:
    """
    Decorator that measures function execution time.
    
    Args:
        func: The function to be decorated
        
    Returns:
        Wrapped function with timing logic
    """
    @wraps(func)  # Preserves metadata (__name__, __doc__)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start_time: float = time.perf_counter()  # Monotonic, high-resolution clock
        result: Any = func(*args, **kwargs)
        end_time: float = time.perf_counter()
        print(f"{func.__name__} executed in {end_time - start_time:.4f}s")
        return result
    return wrapper

# Application using @ syntax (syntactic sugar)
@timer_decorator
def slow_function(n: int) -> int:
    """Calculate sum of range (simulated slow operation)."""
    time.sleep(0.1)
    return sum(range(n))

# Equivalent to: slow_function = timer_decorator(slow_function)

result: int = slow_function(100000)  # Prints timing info
```

**Without `@wraps`:**
*   `slow_function.__name__` becomes "wrapper"
*   `slow_function.__doc__` becomes None
*   Introspection breaks

**With `@wraps(func)`:**
*   Metadata is copied from original function to wrapper
*   Debugging and documentation remain intact

### Decorators with Arguments (Decorator Factories)

To accept arguments, create a decorator factory that returns the actual decorator:

```python
from typing import Callable, Any
from functools import wraps
import random
import time

def retry(max_attempts: int = 3, delay: float = 1.0) -> Callable:
    """
    Decorator factory that creates retry logic.
    
    Args:
        max_attempts: Maximum number of retry attempts
        delay: Seconds to wait between retries
    """
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            attempts: int = 0
            while attempts < max_attempts:
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    attempts += 1
                    if attempts == max_attempts:
                        raise
                    print(f"Attempt {attempts} failed: {e}. Retrying...")
                    time.sleep(delay)
            return None  # Reached only if max_attempts <= 0
        return wrapper
    return decorator

# Usage with arguments
@retry(max_attempts=5, delay=2.0)
def unreliable_api_call() -> dict:
    """Simulate flaky API."""
    if random.random() < 0.7:
        raise ConnectionError("Network timeout")
    return {"status": "success"}

# Equivalent to: unreliable_api_call = retry(max_attempts=5, delay=2.0)(unreliable_api_call)
```
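
Because the retry example depends on randomness and sleeping, the three-level structure (factory, decorator, wrapper) may be easier to study in a deterministic form. The sketch below uses a hypothetical factory named `repeat`; it is an illustration of the pattern, not a replacement for `retry`.

```python
from functools import wraps
from typing import Any, Callable


def repeat(times: int) -> Callable[[Callable[..., Any]], Callable[..., list[Any]]]:
    """Level 1 (factory): receives the arguments, returns a decorator."""
    def decorator(func: Callable[..., Any]) -> Callable[..., list[Any]]:
        # Level 2 (decorator): receives the function being decorated.
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> list[Any]:
            # Level 3 (wrapper): runs on every call and can see `times`.
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


@repeat(times=3)
def double(value: int) -> int:
    return value * 2


# The @ line above is shorthand for calling the factory, then the decorator.
manual = repeat(times=2)(double.__wrapped__)

print(double(4))
print(manual(4))
```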

### Preserving Function Signatures with ParamSpec

For type-safe decorators that preserve exact signatures:

```python
from typing import Callable, ParamSpec, TypeVar
from functools import wraps
import logging

P = ParamSpec('P')  # Captures parameter specification
R = TypeVar('R')    # Captures return type

def log_call(func: Callable[P, R]) -> Callable[P, R]:
    """
    Type-safe decorator preserving function signature.
    
    Uses ParamSpec (Python 3.10+) to maintain parameter types.
    """
    @wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        logging.info(f"Calling {func.__name__} with args={args}, kwargs={kwargs}")
        result: R = func(*args, **kwargs)
        logging.info(f"{func.__name__} returned {result}")
        return result
    return wrapper

@log_call
def add(x: int, y: int) -> int:
    return x + y

# A type checker sees add as: (x: int, y: int) -> int
# reveal_type() is understood by type checkers; at runtime it needs
# `from typing import reveal_type` (Python 3.11+).
# reveal_type(add)
```
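
`ParamSpec` helps static type checkers; at runtime, the closest counterpart is `inspect.signature`, which follows the `__wrapped__` attribute set by `functools.wraps`. The sketch below assumes a hypothetical pass-through decorator named `traced`.

```python
import inspect
from functools import wraps
from typing import Any, Callable


def traced(func: Callable[..., Any]) -> Callable[..., Any]:
    @wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        return func(*args, **kwargs)
    return wrapper


@traced
def add(x: int, y: int) -> int:
    return x + y


# By default, signature() follows __wrapped__ back to the original function.
followed: list[str] = list(inspect.signature(add).parameters)

# With follow_wrapped=False it reports the wrapper's own parameters.
raw: list[str] = list(inspect.signature(add, follow_wrapped=False).parameters)

print(followed)
print(raw)
```

Tools such as `help()` and many documentation generators rely on this behavior, which is one more reason to apply `@wraps` consistently.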

### Class Decorators

Decorators can also modify classes, adding methods, registering subclasses, or enforcing invariants:

```python
from typing import Type, TypeVar, Callable
import json

T = TypeVar('T')

def auto_repr(cls: Type[T]) -> Type[T]:
    """
    Class decorator that automatically adds __repr__ method.
    
    Generates representation showing class name and attributes.
    """
    def __repr__(self: T) -> str:
        attributes: str = ', '.join(
            f"{k}={v!r}" 
            for k, v in self.__dict__.items() 
            if not k.startswith('_')
        )
        return f"{self.__class__.__name__}({attributes})"
    
    cls.__repr__ = __repr__
    return cls

def json_serializable(cls: Type[T]) -> Type[T]:
    """
    Add to_json method to class.
    """
    def to_json(self: T) -> str:
        return json.dumps(self.__dict__, default=str)
    
    cls.to_json = to_json
    return cls

@auto_repr
@json_serializable
class User:
    def __init__(self, name: str, email: str) -> None:
        self.name: str = name
        self.email: str = email

user = User("Alice", "alice@example.com")
print(user)  # User(name='Alice', email='alice@example.com')
print(user.to_json())  # {"name": "Alice", "email": "alice@example.com"}
```

### Stacking Decorators

Multiple decorators are applied bottom-up (closest to function first):

```python
@decorator_a
@decorator_b
@decorator_c
def func():
    pass

# Equivalent to: func = decorator_a(decorator_b(decorator_c(func)))
```

**Execution Order:**
1.  `decorator_c` wraps original function
2.  `decorator_b` wraps result of step 1
3.  `decorator_a` wraps result of step 2
4.  When called: `decorator_a` logic runs first (outer), then `decorator_b`, then `decorator_c`, then original function, then unwinds back up
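
The order described above can be confirmed by recording each step in a list. This is a minimal sketch; the `tag` decorator factory and the `calls` log are hypothetical names used only for the demonstration.

```python
from functools import wraps
from typing import Any, Callable

calls: list[str] = []


def tag(label: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        calls.append(f"decorate:{label}")  # Logged once, when the decorator is applied
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            calls.append(f"enter:{label}")
            result = func(*args, **kwargs)
            calls.append(f"exit:{label}")
            return result
        return wrapper
    return decorator


@tag("a")
@tag("b")
@tag("c")
def work() -> str:
    calls.append("work")
    return "done"


work()
print(calls)
```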

### Practical Decorator Patterns

**1. Caching/Memoization:**
```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fibonacci(n: int) -> int:
    """LRU cache decorator stores recent results."""
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# First call: computed
# Subsequent calls with same n: returned from cache
```
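
Functions wrapped by `lru_cache` expose `cache_info()` and `cache_clear()`, which make the caching visible. The sketch below uses the name `fib` (an assumption, to avoid clashing with the generator version earlier in the chapter).

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


value: int = fib(10)
info = fib.cache_info()        # Named tuple: hits, misses, maxsize, currsize

fib(10)                        # Served entirely from the cache
info_after = fib.cache_info()

fib.cache_clear()              # Empties the cache and resets the counters
info_cleared = fib.cache_info()

print(value, info, info_after, info_cleared)
```

Each distinct `n` is computed once (a miss), and every repeated sub-call is a hit, which is what turns the exponential recursion into linear work. Arguments must be hashable for the cache to accept them.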

**2. Access Control:**
```python
from typing import Callable
from functools import wraps

def require_auth(func: Callable) -> Callable:
    """Decorator ensuring user is authenticated."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        if not getattr(self, 'is_authenticated', False):
            raise PermissionError("Authentication required")
        return func(self, *args, **kwargs)
    return wrapper

class AdminPanel:
    @require_auth
    def delete_user(self, user_id: int) -> None:
        """Protected method."""
        pass
```

**3. Rate Limiting:**
```python
import time
from collections import deque
from functools import wraps
from typing import Callable

def rate_limit(max_calls: int, period: int) -> Callable:
    """Limit function calls to max_calls per period seconds."""
    def decorator(func: Callable) -> Callable:
        calls: deque[float] = deque()
        
        @wraps(func)
        def wrapper(*args, **kwargs):
            now: float = time.time()
            
            # Remove calls outside the time window
            while calls and now - calls[0] > period:
                calls.popleft()
            
            if len(calls) >= max_calls:
                raise RuntimeError(f"Rate limit exceeded: {max_calls} calls per {period}s")
            
            calls.append(now)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@rate_limit(max_calls=5, period=60)
def api_request() -> dict:
    """Limited to 5 calls per minute."""
    return fetch_data()
```

## Summary

Advanced Python features enable you to write code that is both efficient and elegant. You have mastered the **iterator protocol** (`__iter__` and `__next__`), understanding how Python's `for` loops work under the hood and how to create custom data streams. **Generators** and **generator expressions** provide memory-efficient lazy evaluation, allowing you to process infinite or massive datasets with constant memory usage while maintaining clean, imperative syntax through `yield`.

You have explored bidirectional generator communication with `send()` and sub-generator delegation with `yield from`. **Decorators** allow you to extract cross-cutting concerns—logging, retry logic, caching, authentication—into reusable, composable components that modify function behavior without altering source code. Using `functools.wraps` and `ParamSpec`, you ensure decorators preserve function metadata and type signatures.

These patterns separate *what* code does from *how* it does it, creating systems that are modular, testable, and resource-efficient. However, data processing requires persistence. In the next chapter, we explore Python's facilities for interacting with the file system, handling various data formats, and ensuring resources are properly managed through context managers.

**Next Chapter**: Chapter 12: File I/O and Data Persistence.

