
### 🧑‍💻 **Q1: Generate Numbers 1 to N**

**Task:**  
Write a generator function `generate_numbers(n)` that yields numbers from `1` to `n` one by one.

💡 **Why:** This mirrors scenarios like streaming row IDs or sequential event timestamps.

**Example:**
```python
gen = generate_numbers(5)
print(list(gen))  # Output: [1, 2, 3, 4, 5]
```

In [6]:
def generate_numbers(n):
    for i in range(1, n + 1):  # range excludes its stop value, so use n + 1
        yield i
        
gen = generate_numbers(5)

for i in gen:
    print(i)

1
2
3
4
5


### 🧑‍💻 **Q2: Infinite Even Number Stream**

**Task:**  
Create a generator `generate_even_numbers()` that produces an **infinite sequence of even numbers** starting from `2`. The generator never stops on its own, so the consumer decides when to stop iterating.

💡 **Why:** Many data systems process streaming data with unknown size, and generators fit perfectly.

**Example:**
```python
gen = generate_even_numbers()
for _ in range(5):
    print(next(gen), end=' ')  # Output: 2 4 6 8 10
```


In [15]:
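def generate_even_numbers():
    # Infinite stream: the generator never returns on its own
    num = 2
    while True:
        yield num
        num += 2


gen = generate_even_numbers()
# The consumer controls stopping: take only the first 7 values
for _ in range(7):
    print(next(gen), end=' ')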

2 4 6 8 10 12 14 
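
Because the stream never ends, the consumer has to bound it. One possible way to do that is `itertools.islice`, sketched below. The helper name `evens` is hypothetical and only stands in for the generator from this exercise; `itertools.count` is used here as a shortcut for the counting loop.

```python
from itertools import count, islice

def evens():
    """Hypothetical stand-in: an endless stream of even numbers starting at 2."""
    # count(2, 2) yields 2, 4, 6, ... and never stops by itself
    yield from count(2, 2)

# The consumer bounds the stream: islice stops pulling after 5 items
first_five = list(islice(evens(), 5))
print(first_five)  # [2, 4, 6, 8, 10]
```

Calling `list(evens())` without a bound would never finish, which is why the slice is applied before materializing anything.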

### 🧑‍💻 **Q3: Square Values of a List**

**Task:**  
Write a generator `square_numbers(lst)` that takes a list of numbers and lazily yields their squares.

💡 **Why:** Mimics data transformation pipelines — e.g., normalizing numerical fields.

**Example:**

```python
lst = [1, 2, 3, 4]
gen = square_numbers(lst)
print(list(gen))  # Output: [1, 4, 9, 16]
```

In [16]:
def square_numbers(lst):
    for i in lst:
        yield i**2
        
gen = square_numbers([1,2,3,4])
for i in gen:
    print(i,end=' ')

1 4 9 16 

### 🧑‍💻 **Q4: Chunk a Large Dataset**

**Task:**  
Write a generator `chunker(iterable, chunk_size)` that splits large data into smaller chunks of size `chunk_size`.

💡 **Why:** Chunking is fundamental for batch processing — Snowflake, S3, GCS, Kafka all use this.

**Example:**
```python
lst = [1, 2, 3, 4, 5, 6, 7]
gen = chunker(lst, 3)
for chunk in gen:
    print(chunk)  # Output: [1,2,3], [4,5,6], [7]
```

In [17]:
def chunker(iterable, chunk_size):
    chunk = []
    
    for item in iterable:
        chunk.append(item)
        if len(chunk) == chunk_size:
            yield chunk
            chunk=[]
    if chunk:
        yield chunk

In [18]:
lst = [1, 2, 3, 4, 5, 6, 7]
gen = chunker(lst, 3)
for chunk in gen:
    print(chunk)

[1, 2, 3]
[4, 5, 6]
[7]
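
The same chunking idea can also be sketched with `itertools.islice`, which pulls at most `chunk_size` items per step from any iterable. The name `chunker_islice` and the `ValueError` check are assumptions added for illustration, not part of the exercise.

```python
from itertools import islice

def chunker_islice(iterable, chunk_size):
    """Hypothetical variant of chunker built on itertools.islice."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(iterable)  # one shared iterator, so each islice continues where the last stopped
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:  # source exhausted
            return
        yield chunk

chunks = list(chunker_islice(range(1, 8), 3))
print(chunks)  # [[1, 2, 3], [4, 5, 6], [7]]
```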


---

### 🧑‍💻 **Q5: CSV Row Reader**

**Task:**  
Write a generator `read_csv(file_path)` that reads a CSV file line by line, yielding each row as a list.

💡 **Why:**  
When dealing with huge files like logs or transaction records, you never want to load the whole file into memory.

**Example:**
```python
# Suppose file content is:
# John,30,NY
# Alice,25,LA

gen = read_csv('users.csv')
print(next(gen))  # Output: ['John', '30', 'NY']
```


In [17]:
import csv

def read_csv(file_path):
    # newline='' lets the csv module handle line endings itself
    with open(file_path, newline='') as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            yield row

In [18]:
path = 'C:/Users/Mahbub/Desktop/Data Engineering/Python/data/info.csv'

gen = read_csv(path)

for i in gen:
    print(i)

['John', '30', 'NY']
['Mahbub', '30', 'NA']
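
If the file has a header line, rows can be yielded as dictionaries instead of lists. The sketch below uses `csv.DictReader`; the function name `read_csv_dicts`, the header `name,age,city`, and the in-memory sample are all assumptions made so the example runs without a file on disk.

```python
import csv
import io

def read_csv_dicts(file_obj):
    """Hypothetical variant: yield each row as a dict keyed by the header line."""
    reader = csv.DictReader(file_obj)  # the first line is consumed as field names
    for row in reader:
        yield row

# io.StringIO stands in for an open file; the header and rows are made up
sample = io.StringIO("name,age,city\nJohn,30,NY\nAlice,25,LA\n")
rows = list(read_csv_dicts(sample))
print(rows[0]["name"], rows[0]["city"])  # John NY
```

As with `csv.reader`, every value comes back as a string, so numeric fields such as `age` still need an explicit conversion.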


### 🧑‍💻 **Q6: Filter Positive Numbers**

**Task:**  
Write a generator `filter_positive_numbers(numbers)` that yields only **positive integers** from a given list.

💡 **Why:** Filtering raw data before further processing saves compute and network.

**Example:**
```python
lst = [-1, 4, -2, 5, 0]
gen = filter_positive_numbers(lst)
print(list(gen))  # Output: [4, 5]
```

In [23]:
def filter_positive_numbers(lst):
    for i in lst:
        if i>0:
            yield i
            
lst = [-1, 4, -2, 5, 0]            
gen = filter_positive_numbers(lst)
print(list(gen))  

[4, 5]


### 🧑‍💻 **Q7: Paginated API Simulation**

**Task:**  
Create a generator `mock_api_paginator(data_list, page_size)` that yields one "page" at a time from a list.

💡 **Why:** APIs like AWS S3 or BigQuery use pagination when the result set is too large.

**Example:**
```python
data = list(range(1, 11))
gen = mock_api_paginator(data, 4)
print(next(gen))  # Output: [1, 2, 3, 4]
```

In [32]:
def mock_api_paginator(data_list, page_size):
    for start in range(0, len(data_list), page_size):
        yield data_list[start:start + page_size]
        
        
data = list(range(1, 11))
gen = mock_api_paginator(data, 4)
for i in gen:
    print(i)

[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10]



### 🧑‍💻 **Q8: Infinite Fibonacci Sequence**

**Task:**  
Write `fibonacci_sequence()` — a generator that yields an infinite Fibonacci series:  
`0, 1, 1, 2, 3, 5, 8, ...`

💡 **Why:** Lazy evaluation is key when the end of the series is unknown.

---

In [40]:
def fibonacci_sequence():
    first = 0
    second = 1
    while True:
        yield first
        # Advance the pair: (F(n), F(n+1)) -> (F(n+1), F(n+2))
        first, second = second, first + second

In [41]:
gen = fibonacci_sequence()

# Print the first nine Fibonacci numbers
for _ in range(9):
    print(next(gen))

0
1
1
2
3
5
8
13
21
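
An infinite series can also be bounded by value rather than by count. A minimal sketch using `itertools.takewhile`; the short name `fib` is a hypothetical stand-in for the generator in this exercise, and the limit of 50 is arbitrary.

```python
from itertools import takewhile

def fib():
    """Hypothetical stand-in for the Fibonacci generator in this exercise."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Keep pulling terms while they stay below 50, then stop
below_50 = list(takewhile(lambda x: x < 50, fib()))
print(below_50)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```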



### 🧑‍💻 **Q9: Unique ID Generator**

**Task:**  
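Write `unique_id_generator(prefix)` that yields sequential IDs:  
`prefix_1, prefix_2, prefix_3, ...`  
The solution below also accepts optional `start` and `width` arguments and zero-pads the counter (e.g. `user_001`).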

💡 **Why:** Used to create synthetic primary keys or filenames.

---

In [44]:
def unique_id_generator(prefix, start=1, width=3):
    num = start
    while True:
        yield f"{prefix}_{str(num).zfill(width)}"
        num += 1


In [45]:
gen = unique_id_generator("user", start=1, width=3)

print(next(gen))  # user_001
print(next(gen))  # user_002
print(next(gen))  # user_003


user_001
user_002
user_003


### 🧑‍💻 **Q10: Date Range Generator**

**Task:**  
Write `date_range(start_date, end_date)` that yields every date from `start_date` to `end_date`, inclusive.

💡 **Why:** Generating date partitions for data lake queries or backfilling.

**Example:**
```python
from datetime import date
gen = date_range(date(2024, 1, 1), date(2024, 1, 4))
print([d.isoformat() for d in gen])
# Output: ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04']
```

In [46]:
from datetime import date, timedelta

def date_range(start_date, end_date):
    current = start_date
    # Inclusive of end_date
    while current <= end_date:
        yield current
        current += timedelta(days=1)


In [47]:
start = date(2024, 1, 1)
end = date(2024, 1, 5)

# Use `day` as the loop variable so the imported `date` class is not shadowed
for day in date_range(start, end):
    print(day.strftime('%Y-%m-%d'))


2024-01-01
2024-01-02
2024-01-03
2024-01-04
2024-01-05
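
To connect this to the partitioning use case, the dates can be turned into partition paths as they are generated. This is only a sketch: the function name `partition_paths`, the `events` prefix, and the `dt=YYYY-MM-DD` layout are assumptions chosen for illustration, not a convention taken from the exercise.

```python
from datetime import date, timedelta

def partition_paths(start_date, end_date, prefix="events"):
    """Hypothetical helper: yield one partition path per day, inclusive of end_date."""
    current = start_date
    while current <= end_date:
        # isoformat() renders a date as YYYY-MM-DD
        yield f"{prefix}/dt={current.isoformat()}"
        current += timedelta(days=1)

paths = list(partition_paths(date(2024, 1, 1), date(2024, 1, 3)))
print(paths)
# ['events/dt=2024-01-01', 'events/dt=2024-01-02', 'events/dt=2024-01-03']
```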


---

### 🧑‍💻 **Q11: Sliding Window over a Sequence**

**Task:**  
Write `sliding_window(sequence, window_size)` that yields overlapping sublists (windows).

💡 **Why:** Used in **time-series anomaly detection** and **rolling averages**.

**Example:**
```python
gen = sliding_window([1, 2, 3, 4, 5], 3)
# Output: [1,2,3], [2,3,4], [3,4,5]
```

---

In [6]:
def sliding_window(sequence, window_size):
    if window_size > len(sequence):
        return 
    for i in range(len(sequence)-window_size+1):
        yield sequence[i:i+window_size]

In [7]:
gen = sliding_window([1, 2, 3, 4, 5], 3)
for i in gen:
    print(i)

[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
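
The slicing approach needs `len()` and indexing, so it only works on sequences. For a true stream, one possible alternative is a bounded `collections.deque`, sketched below. The name `sliding_window_iter` and the `ValueError` check are assumptions added for this example.

```python
from collections import deque

def sliding_window_iter(iterable, window_size):
    """Hypothetical variant that accepts any iterable, not only sequences."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    window = deque(maxlen=window_size)  # the oldest item drops out automatically
    for item in iterable:
        window.append(item)
        if len(window) == window_size:
            yield list(window)  # copy, so later appends do not mutate yielded windows

# iter(...) hides the list behind a plain iterator to mimic a stream
windows = list(sliding_window_iter(iter([1, 2, 3, 4, 5]), 3))
print(windows)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```

Only `window_size` items are held in memory at a time, which matters when the source is a long-running stream.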


---

### 🧑‍💻 **Q12: Apply Transformation to a Stream**

**Task:**  
Write `apply_transformation(generator, func)` to yield `func(item)` for every item in the input generator.

💡 **Why:** Fundamental for transformation stages in ETL pipelines.

---

In [11]:
def square(x):
    return x**2

def apply_transformation(generator,func):
    for i in generator:
        yield func(i)

def number_stream(lst):
    for i in lst:
        yield i
        
        
lst = [1,2,3,4,5] 

gen1 = number_stream(lst)
gen2 = apply_transformation(gen1,square)

for i in gen2:
    print(i)

1
4
9
16
25
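
For comparison, Python's built-in `map` is also lazy in Python 3 and applies a function to each item only when the result is requested. A small sketch, with a generator expression standing in for the input stream:

```python
def square(x):
    return x ** 2

numbers = (n for n in [1, 2, 3, 4, 5])  # generator expression as the source stream
squared = map(square, numbers)          # nothing is computed yet

first = next(squared)   # computes only the first item
rest = list(squared)    # drains the remaining items
print(first)  # 1
print(rest)   # [4, 9, 16, 25]
```

Writing the generator by hand, as in this exercise, is still useful when the stage needs extra logic such as error handling or logging around each item.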


---

### 🧑‍💻 **Q13: Flatten Nested Lists**

**Task:**  
Write `flatten(nested_list)` that flattens nested lists:  
Example: `[[1, 2], [3, 4]]` → `1, 2, 3, 4`

💡 **Why:** Flattening nested records for schema normalization.

---

In [12]:
def flatten(lst):
    for item in lst:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

nested = [1, [2, 3], [4, [5, 6]], 7]
flat = list(flatten(nested))
print(flat)  

[1, 2, 3, 4, 5, 6, 7]
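
Nested records are not always plain lists. The sketch below is one possible generalization that recurses into any iterable while treating strings and bytes as single values; the name `flatten_any` is hypothetical and this behavior goes beyond what the exercise asks for.

```python
from collections.abc import Iterable

def flatten_any(items):
    """Hypothetical variant: recurse into any iterable except str and bytes."""
    for item in items:
        # Strings are iterable too; without this guard "ab" would be split into characters
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten_any(item)
        else:
            yield item

flat = list(flatten_any([1, (2, 3), [4, [5, 6]], "ab"]))
print(flat)  # [1, 2, 3, 4, 5, 6, 'ab']
```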


---

### 🧑‍💻 **Q14: Log File Tailer**

**Task:**  
Write `tail(file_path, max_lines)` that watches a log file and yields new lines as they are written, stopping after `max_lines` lines.

💡 **Why:** Log monitoring without loading entire files, similar to `tail -f`.

---

In [17]:
import time


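def tail(file_path, max_lines):
    with open(file_path, 'r') as f:
        line_count = 0
        f.seek(0, 2)  # jump to the end of the file; only new lines are yielded
        # Use < (not <=) so exactly max_lines lines are yielded
        while line_count < max_lines:
            line = f.readline()
            if not line:
                time.sleep(1)  # nothing new yet; wait before polling again
                continue

            yield line.strip()
            line_count += 1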
            

            
file_path = 'C:/Users/Mahbub/Desktop/Data Engineering/Python/data/info.log'
gen = tail(file_path,3)

for i in gen:
    print(i)


Hossain
Abeer
Imran


---

### 🧑‍💻 **Q15: Build Generator Pipelines**

**Task:**  
Design a sequence of generators where each generator processes the output of the previous one, e.g.:  
- Generator1: reads lines from a file.  
- Generator2: filters valid JSON lines.  
- Generator3: extracts fields.

💡 **Why:** Used in ETL pipelines, Kafka consumers, and Spark-like frameworks.

---

In [19]:


def read_lines(file_path):
    with open(file_path, 'r') as f:
        # Iterating the file object stops at end of file;
        # a bare `while True` with readline() would yield '' forever
        for line in f:
            yield line.strip()

In [21]:
import json

def filter_valid_json(lines):
    for line in lines:
        try:
            json_obj = json.loads(line)
            yield json_obj
        except json.JSONDecodeError:
            continue

In [23]:
def extract_fields(json_lines,field_names):
    for json_obj in json_lines:
        extracted = {}
        for field in field_names:
            extracted[field] = json_obj.get(field)
        yield extracted

In [24]:
file_path = 'C:/Users/Mahbub/Desktop/Data Engineering/Python/data/log_file.json'

pipeline1 = read_lines(file_path)

In [26]:
pipeline2 = filter_valid_json(pipeline1)

In [None]:
fields = ['id','timestamp','message']
pipeline3 = extract_fields(pipeline2,fields)


for i in pipeline3:
    print(i)

{'id': 1, 'timestamp': '2024-04-17T10:00:00', 'message': 'System started'}
{'id': 2, 'timestamp': '2024-04-17T10:05:00', 'message': 'User logged in'}
{'id': 3, 'timestamp': '2024-04-17T10:10:00', 'message': None}
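
The three stages can be tried without a file on disk by feeding them an in-memory stream. This is a self-contained sketch: `io.StringIO` stands in for the log file, the sample lines are made up, and the `isinstance(obj, dict)` check is an added assumption so that valid JSON that is not an object (for example a bare number) does not break the field extraction.

```python
import io
import json

def read_lines(file_obj):
    # Stage 1: yield stripped lines until the stream is exhausted
    for line in file_obj:
        yield line.strip()

def filter_valid_json(lines):
    # Stage 2: keep only lines that parse as JSON objects
    for line in lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            yield obj

def extract_fields(records, field_names):
    # Stage 3: project each record onto the requested fields (missing ones become None)
    for record in records:
        yield {name: record.get(name) for name in field_names}

raw = io.StringIO(
    '{"id": 1, "message": "System started"}\n'
    'not json\n'
    '\n'
    '{"id": 2}\n'
)
pipeline = extract_fields(filter_valid_json(read_lines(raw)), ["id", "message"])
results = list(pipeline)
print(results)
# [{'id': 1, 'message': 'System started'}, {'id': 2, 'message': None}]
```

Nothing is read until `list(pipeline)` starts pulling; each line then flows through all three stages before the next one is read.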
