# Generators

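Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the `yield` statement whenever they want to return data. Each time `next()` is called on a generator, it resumes where it left off, with its local variables intact.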

### 1. Creating a Generator

In [28]:
def my_generator(start, end):
    current = start
    while current < end:
        yield current
        current += 1

for number in my_generator(1, 5):
    print(number)

1
2
3
4
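
The `for` loop above calls `next()` on the generator behind the scenes. As a minimal sketch of what happens under the hood, the same generator can be stepped by hand. The variable names `first`, `second`, and `exhausted` are only illustrative.

```python
def my_generator(start, end):
    current = start
    while current < end:
        yield current
        current += 1

gen = my_generator(1, 3)

# Each next() call resumes the function body until the next yield
first = next(gen)
second = next(gen)

# Once the loop condition fails, the function returns and
# the generator signals exhaustion with StopIteration
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)  # 1 2 True
```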


### 2. Generator Expression

A generator expression is a compact generator notation that looks similar to a list comprehension, but instead of creating a list, it returns a generator object. This generator object can be iterated over to produce items on-the-fly, which is more memory-efficient than creating a list when dealing with large datasets.

The syntax of a generator expression is similar to that of a list comprehension, but it uses parentheses `()` instead of square brackets `[]`.

Consider a scenario where we have a large sequence of numbers, and we want to create a generator that yields only the even numbers. To keep the output short, this example uses `range(10)` as a stand-in.

In [29]:
# Stand-in for a large sequence of numbers
numbers = range(10)

# Generator expression to filter even numbers
even_numbers = (num for num in numbers if num % 2 == 0)

# Iterating over the generator
for even in even_numbers:
    print(even)

0
2
4
6
8


This generator expression keeps only the even numbers from the `numbers` range. Since it generates items one-by-one, it doesn't require storing all the even numbers in memory at once.
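
To make the memory claim concrete, here is a rough comparison using `sys.getsizeof`. It is only a sketch: `getsizeof` reports the size of the container object itself (not the integers it refers to), and the exact byte counts vary between Python versions, so only the relative difference is meaningful. The size `n` is an arbitrary choice.

```python
import sys

n = 1_000_000

# The list comprehension materializes every even number up front
as_list = [num for num in range(n) if num % 2 == 0]

# The generator expression stores only its current state
as_gen = (num for num in range(n) if num % 2 == 0)

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(list_size > gen_size)  # True

# Both produce the same values when consumed
total = sum(as_gen)
print(total == sum(as_list))  # True
```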

You can combine generator expressions to create more complex pipelines.

In [30]:
# Generator for squares of even numbers
squares_of_even = (x * x for x in (num for num in range(10) if num % 2 == 0))

# Iterating over the combined generator
for square in squares_of_even:
    print(square)

0
4
16
36
64
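
Nested generator expressions can become hard to read. One alternative, sketched below, is to give each stage its own name; the names `evens` and `squares` are illustrative. The sketch also shows that a generator can be consumed only once.

```python
# Stage 1: filter even numbers (nothing is computed yet)
evens = (num for num in range(10) if num % 2 == 0)

# Stage 2: square whatever stage 1 produces
squares = (x * x for x in evens)

# Consuming the last stage pulls items through the whole pipeline
total = sum(squares)
print(total)  # 120

# The pipeline is now exhausted; iterating again yields nothing
leftover = list(squares)
print(leftover)  # []
```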


### 3. Delegating Generators (yield from)

The `yield from` expression is used to delegate part of a generator’s operations to another generator. It creates a sub-iterator from the iterable expression following the `yield from` statement and delegates operations to it. This can simplify code that yields values from multiple iterables.

#### Benefits of Using `yield from`

1. **Code Simplification:**
   - It removes the need for explicit loops to yield values from sub-generators.
   - Makes the code more readable and maintainable.

2. **Enhanced Generator Functionality:**
   - It allows sub-generators to receive values sent to the delegating generator with `send()`.
   - It forwards exceptions thrown into the delegating generator with `throw()` to the sub-generator.
   - It captures the sub-generator's `return` value as the value of the `yield from` expression.
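
The two groups of benefits listed above can be sketched as follows. All function names here are hypothetical and chosen only for illustration. The first pair shows that `yield from` replaces an explicit loop; the second pair shows a sub-generator's `return` value arriving in the delegating generator.

```python
def with_loops():
    # Explicit loops re-yield every item by hand
    for value in range(3):
        yield value
    for value in "ab":
        yield value

def with_yield_from():
    # Same items, no explicit loops
    yield from range(3)
    yield from "ab"

def summing_child():
    total = 0
    for value in (1, 2, 3):
        total += value
        yield value
    return total  # delivered to the delegating generator

def parent(log):
    result = yield from summing_child()
    log.append(result)

log = []
values = list(parent(log))
print(list(with_loops()) == list(with_yield_from()))  # True
print(values, log)  # [1, 2, 3] [6]
```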

Consider a scenario where we have multiple nested generators:

In [31]:
def sub_generator1():
    yield from range(1, 4)

def sub_generator2():
    yield from range(4, 7)

def main_generator():
    yield from sub_generator1()
    yield from sub_generator2()

for num in main_generator():
    print(num)

1
2
3
4
5
6


Here:
- `main_generator` delegates to `sub_generator1` first, yielding values `1, 2, 3`.
- Then it delegates to `sub_generator2`, yielding values `4, 5, 6`.



#### Example: Traversing a Tree Structure

Imagine we have a hierarchical tree structure representing a company's organizational chart. We want to traverse this tree and perform some operations, like listing all employees in a depth-first manner.

**Step 1: Define the Tree Structure**

First, let's define a simple tree structure using classes.

In [32]:
class Employee:
    '''
    Represents a node in the tree, with methods to add subordinates.
    '''
    def __init__(self, name, position):
        self.name = name
        self.position = position
        self.subordinates = []

    def add_subordinate(self, subordinate):
        self.subordinates.append(subordinate)

**Step 2: Create the Organization Tree**

Next, we'll create an example organizational chart.

In [33]:
# We build a sample organizational chart with a CEO, CTO, CFO, and their respective subordinates.

# CEO
ceo = Employee("Alice", "CEO")

# CTO and direct reports
cto = Employee("Bob", "CTO")
cto.add_subordinate(Employee("Charlie", "Dev Manager"))
cto.add_subordinate(Employee("Diana", "QA Manager"))

# CFO and direct reports
cfo = Employee("Eve", "CFO")
cfo.add_subordinate(Employee("Frank", "Accountant"))

# Add CTO and CFO as direct reports to CEO
ceo.add_subordinate(cto)
ceo.add_subordinate(cfo)

**Step 3: Traversing the Tree with `yield from`**

Now, we create a generator function to traverse the tree using `yield from` to handle subordinates.

In [34]:
def traverse_employee_tree(employee):
    '''
    Recursively traverses the tree, yielding each employee's information. 
    The `yield from` statement delegates the iteration to the subordinates, 
    making the recursion straightforward.
    '''
    yield f"{employee.name} - {employee.position}"
    for subordinate in employee.subordinates:
        yield from traverse_employee_tree(subordinate)

**Using the Traversal Function**

Here is how we can use the traversal function in a Jupyter Notebook to list all employees in the organization.

In [35]:
# Traverse the organizational chart and print each employee
for employee_info in traverse_employee_tree(ceo):
    print(employee_info)

Alice - CEO
Bob - CTO
Charlie - Dev Manager
Diana - QA Manager
Eve - CFO
Frank - Accountant
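
A possible variation, shown here only as a sketch, is to pass the current depth down the recursion so the hierarchy can be printed with indentation. The function `traverse_with_depth` and its `depth` parameter are not part of the example above; they are assumptions made for illustration. The `Employee` class and the tree are repeated so the block runs on its own.

```python
class Employee:
    def __init__(self, name, position):
        self.name = name
        self.position = position
        self.subordinates = []

    def add_subordinate(self, subordinate):
        self.subordinates.append(subordinate)

def traverse_with_depth(employee, depth=0):
    # Yield the node together with its depth, then delegate to each subtree
    yield depth, employee
    for subordinate in employee.subordinates:
        yield from traverse_with_depth(subordinate, depth + 1)

# Same organizational chart as above
ceo = Employee("Alice", "CEO")
cto = Employee("Bob", "CTO")
cto.add_subordinate(Employee("Charlie", "Dev Manager"))
cto.add_subordinate(Employee("Diana", "QA Manager"))
cfo = Employee("Eve", "CFO")
cfo.add_subordinate(Employee("Frank", "Accountant"))
ceo.add_subordinate(cto)
ceo.add_subordinate(cfo)

# Two spaces of indentation per level
lines = [
    f"{'  ' * depth}{emp.name} - {emp.position}"
    for depth, emp in traverse_with_depth(ceo)
]
print("\n".join(lines))
```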


### 4. Sending Values to Generators

Generators can receive values via `send()` and handle exceptions via `throw()`. 

In [36]:
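# Receiving Values
def accumulator():
    total = 0
    while True:
        x = yield total
        if x is None:
            break
        total += x
    return total  # becomes the value of `yield from` in the delegator

def delegator():
    result = yield from accumulator()
    print('Accumulator Result:', result)

gen = delegator()
print(next(gen))      # Start the generator
print(gen.send(10))   # Send 10
print(gen.send(20))   # Send 20
try:
    gen.send(None)    # End the accumulator; the delegator then finishes
except StopIteration:
    print('Generator finished')

0
10
30
Accumulator Result: 30
Generator finished


Here:
- `accumulator` is a sub-generator that sums values sent to it.
- `delegator` delegates to `accumulator` using `yield from`.
- When `None` is sent, `accumulator` returns the total, which becomes the value of the `yield from` expression in `delegator`.
- Once `delegator` finishes, the final `send()` raises `StopIteration`, which the `try`/`except` block catches.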
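
The `throw()` half of this section can be sketched in a similar way. The generator `resettable_counter` below is hypothetical: it treats a `ValueError` thrown into it as a request to start counting from zero again. Because the outer generator uses `yield from`, the exception is forwarded to the sub-generator, which handles it at its paused `yield`.

```python
def resettable_counter():
    """Counts upward; a ValueError thrown in resets the count to zero."""
    count = 0
    while True:
        try:
            yield count
            count += 1
        except ValueError:
            count = 0  # recover instead of terminating

def delegator():
    # throw() on this generator is forwarded to the sub-generator
    yield from resettable_counter()

gen = delegator()
seen = [next(gen), next(gen), next(gen)]

# The exception is raised at the paused yield inside resettable_counter;
# throw() returns the next value the generator yields after handling it
after_reset = gen.throw(ValueError("reset"))
following = next(gen)
gen.close()

print(seen, after_reset, following)  # [0, 1, 2] 0 1
```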

### 5. Combining Delegation and Exception Handling

Imagine we have a task processing system where tasks are processed in multiple steps. Each step might raise specific exceptions that need to be handled gracefully. We'll use one generator to process each task and handle its errors, and another generator to delegate to it for every task in turn.

**Task Processing System**

**Step 1: Define Task Processing Generators**

We'll define two generators: one for processing individual tasks (`task_processor`) and another for managing the overall task processing flow (`task_manager`).

In [None]:
def task_processor(task):
    try:
        # Simulating task steps
        yield f"Processing {task} - Step 1"
        if task == "task_with_error":
            raise ValueError("Simulated error in task")
        yield f"Processing {task} - Step 2"
    except ValueError as e:
        yield f"Handled error: {e}"
    yield f"Processing {task} - Final Step"

def task_manager(tasks):
    for task in tasks:
        yield from task_processor(task)

**Step 2: Define the Task List**

We'll define a list of tasks, including one that will raise an exception.


In [None]:
tasks = ["task1", "task2", "task_with_error", "task3"]

**Step 3: Running the Task Processing System**

We'll create an instance of the `task_manager` generator and iterate over it. The simulated error is caught inside `task_processor`, so processing continues with the remaining tasks.


In [None]:
def run_task_processing(tasks):
    task_gen = task_manager(tasks)
    for output in task_gen:
        print(output)

run_task_processing(tasks)

Processing task1 - Step 1
Processing task1 - Step 2
Processing task1 - Final Step
Processing task2 - Step 1
Processing task2 - Step 2
Processing task2 - Final Step
Processing task_with_error - Step 1
Handled error: Simulated error in task
Processing task_with_error - Final Step
Processing task3 - Step 1
Processing task3 - Step 2
Processing task3 - Final Step
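
In the example above, the error is handled inside the sub-generator. If a sub-generator does not catch an exception, it propagates out through `yield from` into the delegating generator, which can handle it there. The following is a minimal sketch of that alternative; `fragile_processor` and `guarded_manager` are hypothetical names, not part of the system above.

```python
def fragile_processor(task):
    yield f"{task}: started"
    if task == "bad":
        # Not caught here, so it surfaces at the `yield from` in the manager
        raise RuntimeError("unhandled failure")
    yield f"{task}: finished"

def guarded_manager(tasks):
    for task in tasks:
        try:
            yield from fragile_processor(task)
        except RuntimeError as exc:
            # Report the failure and move on to the next task
            yield f"{task}: skipped ({exc})"

results = list(guarded_manager(["a", "bad", "b"]))
for line in results:
    print(line)
```

With this design the manager decides what a failure means, while each processor stays free of recovery logic.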


### 6. Using Generators with Parallel Processing

In [None]:
# Importing the script
from modules.log_processor import process_log_file

# Define the file paths
input_file_path = './data/large_log_file.txt'
output_file_path = './data/processed_errors.txt'

# Process the log file
process_log_file(input_file_path, output_file_path)

# Verify the output
with open(output_file_path, 'r') as file:
    print(file.read())

21-05-2024 12:01:00 - ERROR - This is an error message.
21-05-2024 12:03:00 - ERROR - Another error occurred.
21-05-2024 12:05:00 - ERROR - Yet another error message.
21-05-2024 12:07:00 - ERROR - Error with additional details.
21-05-2024 12:09:00 - ERROR - Final error message.
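
The `modules.log_processor` module is not shown here, so its actual implementation is unknown. As a rough sketch of how a generator-based log filter could be structured, the block below streams lines through two stages and writes only the `ERROR` entries. The function names (`read_lines`, `only_errors`, `process_log`) are assumptions, the `INFO` lines in the sample input are made up for illustration, and the sketch is sequential rather than parallel.

```python
import io

def read_lines(handle):
    # File objects yield one line at a time, so the file is never fully loaded
    for line in handle:
        yield line.rstrip("\n")

def only_errors(lines):
    # Lazily keep lines whose level field is ERROR
    return (line for line in lines if " - ERROR - " in line)

def process_log(source, sink):
    """Hypothetical stand-in: copy ERROR lines from source to sink."""
    count = 0
    for line in only_errors(read_lines(source)):
        sink.write(line + "\n")
        count += 1
    return count

# In-memory stand-ins for the input and output files
source = io.StringIO(
    "21-05-2024 12:00:00 - INFO - Service started.\n"
    "21-05-2024 12:01:00 - ERROR - This is an error message.\n"
    "21-05-2024 12:02:00 - INFO - Heartbeat.\n"
    "21-05-2024 12:03:00 - ERROR - Another error occurred.\n"
)
sink = io.StringIO()

written = process_log(source, sink)
print(written)
print(sink.getvalue(), end="")
```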



### 7. Benefits of Using Generators

- **Memory Efficiency:** Generators allow processing of large files without loading the entire file into memory.
- **Modularity:** Each step of the pipeline is modular, making it easy to modify or extend.
- **Lazy Evaluation:** Generators compute values on-the-fly, so work is done only for the items that are actually consumed.
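
Lazy evaluation can be observed directly. In this sketch, an unbounded generator records every value it computes in a list called `computed` (an illustrative name), and `itertools.islice` takes only the first three results.

```python
import itertools

computed = []

def squares():
    # Unbounded: would run forever if fully consumed
    for n in itertools.count(1):
        computed.append(n)  # record that work was done for n
        yield n * n

first_three = list(itertools.islice(squares(), 3))

print(first_three)  # [1, 4, 9]
print(computed)     # [1, 2, 3]
```

Only three values were ever computed, even though the generator itself has no upper bound.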