
---

## 💡 **Before You Start**  
In **data engineering**, generators are used because:  
- Data is often too large to fit in memory.  
- Lazy evaluation computes each value only when it is requested, so no time or memory is spent on items nobody consumes.  
- Generators chain into clean, pipeline-style code.

---

## 🏁 **Level 1: Beginner — Fundamental Generator Usage**  

---

### 🧑‍💻 **Q1: Generate Numbers 1 to N**

**Task:**  
Write a generator function `generate_numbers(n)` that yields numbers from `1` to `n` one by one.

💡 **Why:** This mirrors scenarios like streaming row IDs or sequential event timestamps.

**Example:**
```python
gen = generate_numbers(5)
print(list(gen))  # Output: [1, 2, 3, 4, 5]
```
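One possible solution, as a minimal sketch: loop over `range` and `yield` each value instead of building a list.

```python
def generate_numbers(n):
    """Yield the integers 1..n one at a time (nothing is yielded if n < 1)."""
    for i in range(1, n + 1):
        yield i


gen = generate_numbers(5)
print(next(gen))  # 1
print(list(gen))  # [2, 3, 4, 5]: the generator resumes where it left off
```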

---

### 🧑‍💻 **Q2: Infinite Even Number Stream**

**Task:**  
Create a generator `generate_even_numbers()` that produces an **infinite sequence of even numbers** starting from `2`. The consumer, not the generator, decides when to stop iterating.

💡 **Why:** Many data systems process streams of unknown or unbounded size, which generators handle naturally because they produce one item at a time.

**Example:**
```python
gen = generate_even_numbers()
for _ in range(5):
    print(next(gen), end=' ')  # Output: 2 4 6 8 10
```
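A minimal sketch of one solution. Because the generator never ends, the consumer must limit it; `itertools.islice` is one standard way to take a fixed number of items (calling `list()` on the generator directly would never return).

```python
from itertools import islice


def generate_even_numbers():
    """Yield 2, 4, 6, ... forever; the caller decides when to stop."""
    n = 2
    while True:
        yield n
        n += 2


# islice lets the consumer take a fixed number of items from an infinite generator
print(list(islice(generate_even_numbers(), 5)))  # [2, 4, 6, 8, 10]
```

The standard library's `itertools.count(2, 2)` produces the same infinite sequence.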

---

### 🧑‍💻 **Q3: Square Values of a List**

**Task:**  
Write a generator `square_numbers(lst)` that takes a list of numbers and lazily yields their squares.

💡 **Why:** Mimics data transformation pipelines — e.g., normalizing numerical fields.

**Example:**
```python
lst = [1, 2, 3, 4]
gen = square_numbers(lst)
print(list(gen))  # Output: [1, 4, 9, 16]
```
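A minimal sketch: each square is computed only when the consumer asks for the next value.

```python
def square_numbers(lst):
    """Lazily yield the square of each number in lst."""
    for x in lst:
        yield x * x


print(list(square_numbers([1, 2, 3, 4])))  # [1, 4, 9, 16]

# The same lazy behavior as a one-line generator expression
squares = (x * x for x in [1, 2, 3, 4])
```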

---

### 🧑‍💻 **Q4: Chunk a Large Dataset**

**Task:**  
Write a generator `chunker(iterable, chunk_size)` that splits large data into smaller chunks of size `chunk_size`.

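💡 **Why:** Chunking is fundamental to batch processing: multipart uploads to S3 or GCS, Kafka producer batches, and bulk loads into warehouses such as Snowflake all move data in bounded pieces rather than all at once.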

**Example:**
```python
lst = [1, 2, 3, 4, 5, 6, 7]
gen = chunker(lst, 3)
for chunk in gen:
    print(chunk)  # Prints [1, 2, 3], then [4, 5, 6], then [7], one per line
```
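A minimal sketch: `itertools.islice` pulls up to `chunk_size` items from one shared iterator, so the function works on lists, open files, and other generators alike, and holds only one chunk in memory at a time.

```python
from itertools import islice


def chunker(iterable, chunk_size):
    """Yield lists of up to chunk_size items; the last chunk may be shorter."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    it = iter(iterable)  # a single iterator, so each islice continues where the last stopped
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:  # source exhausted
            return
        yield chunk


for chunk in chunker(range(1, 8), 3):
    print(chunk)
```

On Python 3.12+, `itertools.batched(iterable, n)` provides the same behavior built in, but yields tuples rather than lists.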

---

### 🧑‍💻 **Q5: CSV Row Reader**

**Task:**  
Write a generator `read_csv(file_path)` that reads a CSV file line by line, yielding each row as a list.

💡 **Why:**  
When dealing with huge files like logs or transaction records, you never want to load the whole file into memory.

**Example:**
```python
# Suppose file content is:
# John,30,NY
# Alice,25,LA

gen = read_csv('users.csv')
print(next(gen))  # Output: ['John', '30', 'NY']
```
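A minimal sketch using the standard `csv` module, which, unlike a plain `line.split(",")`, correctly handles quoted fields that contain commas. The file stays open while the generator is suspended and is closed when the rows are exhausted or the generator's `close()` method is called.

```python
import csv


def read_csv(file_path):
    """Yield each row of a CSV file as a list of strings, one row at a time."""
    # newline="" is the mode the csv module documentation recommends for csv.reader
    with open(file_path, newline="") as f:
        for row in csv.reader(f):
            yield row
```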

---

## ⚡️ **Level 2: Intermediate — Transformation & Filtering**

---

### 🧑‍💻 **Q6: Filter Positive Numbers**

**Task:**  
Write a generator `filter_positive_numbers(numbers)` that yields only **positive integers** from a given list.

💡 **Why:** Filtering raw data early saves compute and network bandwidth in every later stage.

**Example:**
```python
lst = [-1, 4, -2, 5, 0]
gen = filter_positive_numbers(lst)
print(list(gen))  # Output: [4, 5]
```
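A minimal sketch. Only values strictly greater than zero pass, so `0` is dropped, matching the example above.

```python
def filter_positive_numbers(numbers):
    """Yield only values strictly greater than zero."""
    for x in numbers:
        if x > 0:
            yield x


print(list(filter_positive_numbers([-1, 4, -2, 5, 0])))  # [4, 5]
```

The equivalent generator expression is `(x for x in numbers if x > 0)`.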

---

### 🧑‍💻 **Q7: Paginated API Simulation**

**Task:**  
Create a generator `mock_api_paginator(data_list, page_size)` that yields one "page" at a time from a list.

💡 **Why:** APIs such as AWS S3 (object listings) and BigQuery (query results) return large result sets one page at a time instead of all at once.

**Example:**
```python
data = list(range(1, 11))
gen = mock_api_paginator(data, 4)
print(next(gen))  # Output: [1, 2, 3, 4]
```
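A minimal sketch, assuming `data_list` is an in-memory list that supports `len()` and slicing. A real paginated API would instead loop on a continuation token until none is returned; slicing stands in for that here.

```python
def mock_api_paginator(data_list, page_size):
    """Yield successive pages (sublists) of at most page_size items."""
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    for start in range(0, len(data_list), page_size):
        # Slicing past the end is safe in Python, so the last page may be shorter
        yield data_list[start:start + page_size]


pages = mock_api_paginator(list(range(1, 11)), 4)
for page_number, page in enumerate(pages, start=1):
    print(page_number, page)
```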

---

### 🧑‍💻 **Q8: Infinite Fibonacci Sequence**

**Task:**  
Write `fibonacci_sequence()` — a generator that yields an infinite Fibonacci series:  
`0, 1, 1, 2, 3, 5, 8, ...`

💡 **Why:** Lazy evaluation is the only practical way to represent a series with no end: the consumer takes as many terms as it needs.
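A minimal sketch: the generator keeps only the last two terms as state and is consumed with `itertools.islice`.

```python
from itertools import islice


def fibonacci_sequence():
    """Yield 0, 1, 1, 2, 3, 5, 8, ... forever."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b  # advance the pair of consecutive terms


print(list(islice(fibonacci_sequence(), 7)))  # [0, 1, 1, 2, 3, 5, 8]
```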

---

### 🧑‍💻 **Q9: Unique ID Generator**

**Task:**  
Write `unique_id_generator(prefix)` that yields:  
`prefix_1, prefix_2, prefix_3...`

💡 **Why:** Used to create synthetic primary keys or filenames.
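A minimal sketch built on `itertools.count`. The IDs are unique only within one generator object: a new generator (for example after a process restart) starts again at `1`, so durable keys would need a persisted counter or UUIDs from the `uuid` module.

```python
from itertools import count


def unique_id_generator(prefix):
    """Yield prefix_1, prefix_2, ... forever."""
    for n in count(1):
        yield f"{prefix}_{n}"


ids = unique_id_generator("order")
print(next(ids), next(ids), next(ids))  # order_1 order_2 order_3
```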

---

### 🧑‍💻 **Q10: Date Range Generator**

**Task:**  
Write `date_range(start_date, end_date)` that yields every date from `start_date` to `end_date`, inclusive.

💡 **Why:** Generating date partitions for data lake queries or backfilling.

**Example:**
```python
from datetime import date
gen = date_range(date(2024, 1, 1), date(2024, 1, 4))
print([str(d) for d in gen])
# Output: ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']
```
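A minimal sketch using `datetime.timedelta`, which handles month and year boundaries automatically. The `dt=YYYY-MM-DD` partition path in the usage loop is a hypothetical naming scheme for illustration.

```python
from datetime import date, timedelta


def date_range(start_date, end_date):
    """Yield each date from start_date to end_date, both inclusive."""
    current = start_date
    while current <= end_date:
        yield current
        current += timedelta(days=1)


# Hypothetical use: build one partition path per day, crossing a month boundary
for day in date_range(date(2024, 1, 30), date(2024, 2, 2)):
    print(f"dt={day.isoformat()}")
```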

---

## 🚀 **Level 3: Advanced — Real-World Streaming & Patterns**

---

### 🧑‍💻 **Q11: Sliding Window over a Sequence**

**Task:**  
Write `sliding_window(sequence, window_size)` that yields overlapping sublists (windows).

💡 **Why:** Used in **time-series anomaly detection** and **rolling averages**.

**Example:**
```python
gen = sliding_window([1, 2, 3, 4, 5], 3)
print(list(gen))  # Output: [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```
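A minimal sketch using `collections.deque` with `maxlen`, which discards the oldest item on each append, so memory stays at one window regardless of stream length. The sensor readings in the usage lines are made-up sample data.

```python
from collections import deque


def sliding_window(sequence, window_size):
    """Yield each run of window_size consecutive items as a list."""
    if window_size < 1:
        raise ValueError("window_size must be >= 1")
    window = deque(maxlen=window_size)  # oldest item drops off automatically
    for item in sequence:
        window.append(item)
        if len(window) == window_size:
            yield list(window)  # copy, so callers don't see later mutations


# Rolling average over a small stream of sample readings
readings = [10, 12, 11, 30, 12]
print([sum(w) / len(w) for w in sliding_window(readings, 3)])
```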

---

### 🧑‍💻 **Q12: Apply Transformation to a Stream**

**Task:**  
Write `apply_transformation(generator, func)` to yield `func(item)` for every item in the input generator.

💡 **Why:** Fundamental for transformation stages in ETL pipelines.
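A minimal sketch. Python's built-in `map(func, iterable)` is also lazy and behaves the same way; writing it as a generator makes the stage explicit.

```python
def apply_transformation(generator, func):
    """Yield func(item) for each item pulled from generator."""
    for item in generator:
        yield func(item)


raw_prices = (p for p in ["10.5", "3", "7.25"])
prices = apply_transformation(raw_prices, float)
print(list(prices))  # [10.5, 3.0, 7.25]
```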

---

### 🧑‍💻 **Q13: Flatten Nested Lists**

**Task:**  
Write `flatten(nested_list)` that flattens nested lists:  
Example: `[[1, 2], [3, 4]]` → `1, 2, 3, 4`

💡 **Why:** Flattening nested records for schema normalization.
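A minimal recursive sketch using `yield from`. It assumes only `list` objects count as nesting; strings, tuples, and dicts are yielded as-is. Extremely deep nesting can hit Python's recursion limit, and for exactly one level of nesting `itertools.chain.from_iterable` is a simpler alternative.

```python
def flatten(nested_list):
    """Yield the non-list items of an arbitrarily nested list, left to right."""
    for item in nested_list:
        if isinstance(item, list):
            yield from flatten(item)  # delegate to a recursive generator for the sublist
        else:
            yield item


print(list(flatten([[1, 2], [3, [4, [5]]]])))  # [1, 2, 3, 4, 5]
```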

---

### 🧑‍💻 **Q14: Log File Tailer**

**Task:**  
Write `tail(file_path)` that watches a log file and yields new lines as they are written.

💡 **Why:** Log monitoring without loading entire files — similar to `tail -f`.
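A minimal polling sketch modeled on `tail -f`. The `poll_interval` parameter is an assumption added for illustration. This version does not handle log rotation or truncation, and a line that is still being written may be returned in pieces.

```python
import os
import time


def tail(file_path, poll_interval=0.5):
    """Yield lines appended to file_path after the generator starts; runs forever."""
    with open(file_path) as f:
        f.seek(0, os.SEEK_END)  # skip existing content, like tail -f
        while True:
            line = f.readline()
            if line:
                yield line
            else:
                time.sleep(poll_interval)  # nothing new yet; wait and check again
```

Because the generator never finishes on its own, the consumer stops it by breaking out of its loop or calling `close()`, which also closes the file.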

---

### 🧑‍💻 **Q15: Build Generator Pipelines**

**Task:**  
Design a sequence of generators where each generator processes the output of the previous one, e.g.:  
- Generator1: reads lines from a file.  
- Generator2: filters valid JSON lines.  
- Generator3: extracts fields.

💡 **Why:** Used in ETL pipelines, Kafka consumers, and Spark-like frameworks.
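A minimal sketch of the three stages described above. The function names `read_lines`, `parse_json_lines`, and `extract_fields` and the sample records are hypothetical. `read_lines` accepts any iterable of strings, so an open file object (which yields lines) works as its input just as the in-memory sample does.

```python
import json


def read_lines(lines):
    """Stage 1: yield raw lines without trailing newlines (e.g. from an open file)."""
    for line in lines:
        yield line.rstrip("\n")


def parse_json_lines(lines):
    """Stage 2: yield only lines that parse as JSON objects."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if isinstance(record, dict):
            yield record


def extract_fields(records, fields):
    """Stage 3: keep only the requested fields (missing ones become None)."""
    for record in records:
        yield {field: record.get(field) for field in fields}


raw = ['{"user": "john", "age": 30, "city": "NY"}', "not json", '{"user": "alice"}']
pipeline = extract_fields(parse_json_lines(read_lines(raw)), ["user", "age"])
for row in pipeline:  # nothing runs until this loop pulls the first row
    print(row)
```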

---

## 🔥 **Level 4: Expert — Handling Exhaustion, Reuse & Streaming Challenges**

---

### 🧑‍💻 **Q16: Generator Class Implementation**

**Task:**  
Reproduce generator-style lazy iteration with an iterator class that implements `__iter__` and `__next__` (raising `StopIteration` when it runs out).

💡 **Why:** Understand the difference between a **generator function** and an **iterator class**.
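A minimal sketch; the class name `NumberRange` is hypothetical. It behaves like `generate_numbers(n)` from Q1, but the state (`current`) and the stop condition that a generator function handles automatically must be written out by hand.

```python
class NumberRange:
    """Iterator class yielding 1..n, then stopping."""

    def __init__(self, n):
        self.n = n
        self.current = 0  # last value returned

    def __iter__(self):
        # An iterator returns itself, which is why it can be traversed only once
        return self

    def __next__(self):
        if self.current >= self.n:
            raise StopIteration
        self.current += 1
        return self.current


numbers = NumberRange(3)
print(list(numbers))  # [1, 2, 3]
print(list(numbers))  # []: exhausted, just like a generator
```

To get something reusable, define an iterable whose `__iter__` returns a fresh iterator on every call (writing `__iter__` itself as a generator function is an easy way to do that).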

---

### 🧑‍💻 **Q17: Random Sampler Without Repeats**

**Task:**  
Write a generator `random_sampler(data)` that randomly yields items from a list, without repeating, until all items are exhausted.

💡 **Why:** Sampling subsets from large datasets for testing or training.
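A minimal sketch using a lazy Fisher-Yates shuffle: each step picks a random remaining item, swaps it to the end of the pool, and pops it, so every item appears exactly once. The optional `seed` parameter is an addition for reproducible tests, not part of the task. The pool is a full copy of `data`; for streams too large to copy, reservoir sampling is the usual way to draw a fixed-size sample.

```python
import random


def random_sampler(data, seed=None):
    """Yield the items of data in random order, each exactly once."""
    pool = list(data)  # copy, so the caller's list is not mutated
    rng = random.Random(seed)
    while pool:
        i = rng.randrange(len(pool))
        # Swap the chosen item to the end so pop() is O(1)
        pool[i], pool[-1] = pool[-1], pool[i]
        yield pool.pop()


print(list(random_sampler(["a", "b", "c", "d"], seed=1)))
```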

---

### 🧑‍💻 **Q18: Merge Two Sorted Generators**

**Task:**  
Write `merge_sorted_streams(gen1, gen2)` that lazily merges two sorted generators.

💡 **Why:** Useful when merging two log files or two sorted data streams.
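A minimal two-way merge sketch: it holds one "head" item from each stream, emits the smaller, and refills from that stream, so memory use is constant. The standard library's `heapq.merge(*iterables)` does the same lazily for any number of sorted inputs.

```python
def merge_sorted_streams(gen1, gen2):
    """Lazily merge two ascending streams into one ascending stream."""
    it1, it2 = iter(gen1), iter(gen2)
    done = object()  # sentinel marking an exhausted stream
    a, b = next(it1, done), next(it2, done)
    while a is not done and b is not done:
        if a <= b:  # ties take gen1 first, keeping the merge stable
            yield a
            a = next(it1, done)
        else:
            yield b
            b = next(it2, done)
    # At most one stream still has items; drain it
    if a is not done:
        yield a
        yield from it1
    if b is not done:
        yield b
        yield from it2


print(list(merge_sorted_streams(iter([1, 4, 7]), iter([2, 3, 8, 9]))))  # [1, 2, 3, 4, 7, 8, 9]
```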

---

### 🧑‍💻 **Q19: Read File with Backpressure Simulation**

**Task:**  
Write a generator `read_file_in_chunks(file_path, chunk_size)` that pauses and resumes according to signals from the consumer, simulating "backpressure".

💡 **Why:** Mimics pull-based systems such as Kafka consumers, which fetch new data only as fast as they can process it instead of being flooded by the producer.
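A minimal sketch, assuming the "consumer signal" is a new chunk size passed with `generator.send()`; that protocol is one interpretation, not something the task specifies. Pausing needs no special code: the generator is suspended at `yield` and reads nothing more until the consumer resumes it.

```python
def read_file_in_chunks(file_path, chunk_size):
    """Yield byte chunks from file_path.

    Assumed protocol (illustrative): gen.send(n) resumes reading and changes
    the chunk size to n bytes; next(gen) keeps the current size.
    """
    with open(file_path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return  # end of file
            # Execution pauses here; no further reads happen until the consumer resumes
            requested = yield chunk
            if requested is not None:
                chunk_size = requested
```

A slow consumer can call `next(gen)` for the first chunk, do its work, and then call `gen.send(1024)` to receive smaller 1 KiB chunks while it is overloaded.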

---

### 🧑‍💻 **Q20: Mini Data Pipeline Simulation**

**Task:**  
Create a full data pipeline using generators:
- Stage 1: read CSV.
- Stage 2: filter rows.
- Stage 3: transform data.
- Stage 4: output results.

💡 **Why:** This mirrors the extract, filter, transform, and load stages of a real-world ETL job.
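A minimal end-to-end sketch. The stage names (`read_rows`, `filter_rows`, `transform_rows`, `output_rows`), the `min_age` rule, and the `name,age,city` columns are hypothetical choices for illustration. `io.StringIO` stands in for an open CSV file so the example runs on its own; the sink just prints, where a real job would write to a file or database.

```python
import csv
import io


def read_rows(lines):
    """Stage 1: parse CSV lines (an open file or any iterable of strings) into dicts."""
    yield from csv.DictReader(lines)


def filter_rows(rows, min_age):
    """Stage 2: drop rows whose age is missing, non-numeric, or below min_age."""
    for row in rows:
        age = row.get("age") or ""  # empty field or short row both become ""
        if age.isdigit() and int(age) >= min_age:
            yield row


def transform_rows(rows):
    """Stage 3: normalize types and casing."""
    for row in rows:
        yield {"name": row["name"].title(), "age": int(row["age"]), "city": row["city"].upper()}


def output_rows(rows):
    """Stage 4: sink that pulls every row through the pipeline; returns the row count."""
    count = 0
    for row in rows:
        print(row)
        count += 1
    return count


sample = io.StringIO("name,age,city\njohn,30,ny\nalice,17,la\nbob,,sf\n")
written = output_rows(transform_rows(filter_rows(read_rows(sample), min_age=18)))
```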

---