## 1. Introduction to Iterators

### What is an Iterator?

An iterator is an object that implements the iterator protocol, consisting of:

- `__iter__()` method: Returns the iterator object itself
- `__next__()` method: Returns the next item from the iterator

### Why are Iterators Important in Data Engineering?

- **Memory Efficiency**: Process large datasets without loading everything into memory
- **Lazy Evaluation**: Compute values on-demand
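
Both points can be seen in a small sketch. It compares a fully built list with a generator expression over the same values. The names `eager_squares` and `lazy_squares` are illustrative, and the exact byte counts vary by Python version and platform.

```python
import sys

# Eager: every value is computed and stored up front
eager_squares = [n * n for n in range(100_000)]

# Lazy: nothing is computed until a value is requested
lazy_squares = (n * n for n in range(100_000))

# getsizeof reports the container only, not the integers it references
list_size = sys.getsizeof(eager_squares)
gen_size = sys.getsizeof(lazy_squares)
print(f"List container size: {list_size} bytes")
print(f"Generator size: {gen_size} bytes")

# Values are produced one at a time, on demand
first = next(lazy_squares)
second = next(lazy_squares)
print(first, second)
```

The generator stays small however large the range is, because it holds only the state needed to produce the next value.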

In [None]:
# Basic example: Understanding the difference
numbers_list = [1, 2, 3, 4, 5]  # This is an iterable
numbers_iter = iter(numbers_list)  # This is an iterator

print(f"List: {numbers_list}")
print(f"Iterator: {numbers_iter}")
print(f"Next from iterator: {next(numbers_iter)}")
print(f"Next from iterator: {next(numbers_iter)}")
print(f"Next from iterator: {next(numbers_iter)}")
print(f"Next from iterator: {next(numbers_iter)}")
print(f"Next from iterator: {next(numbers_iter)}")

# The iterator is now exhausted. Uncomment the next line to see a sixth call raise StopIteration:
# print(f"Next from iterator: {next(numbers_iter)}")


List: [1, 2, 3, 4, 5]
Iterator: <list_iterator object at 0x0000018763B7BE20>
Next from iterator: 1
Next from iterator: 2
Next from iterator: 3
Next from iterator: 4
Next from iterator: 5


StopIteration: 

## 2. Iterables vs Iterators

### Iterables

An iterable is any object that `iter()` can turn into an iterator, so it can be used in a `for` loop. Lists, tuples, strings, dictionaries, and sets are all iterables.

In [2]:
# Examples of iterables
my_list = [1, 2, 3]
my_string = "hello"
my_dict = {'a': 1, 'b': 2}

# All of these work in for loops
for item in my_list:
    print(item)

for char in my_string:
    print(char)

for key in my_dict:
    print(key, my_dict[key])

1
2
3
h
e
l
l
o
a 1
b 2


### Testing if an object is iterable

In [3]:
from collections.abc import Iterable, Iterator

def check_iterable_iterator(obj):
    print(f"Object: {obj}")
    print(f"Is iterable: {isinstance(obj, Iterable)}")
    print(f"Is iterator: {isinstance(obj, Iterator)}")
    print("-" * 30)

# Test different objects
check_iterable_iterator([1, 2, 3])
check_iterable_iterator(iter([1, 2, 3]))
check_iterable_iterator("hello")
check_iterable_iterator(42)

Object: [1, 2, 3]
Is iterable: True
Is iterator: False
------------------------------
Object: <list_iterator object at 0x000001876397F220>
Is iterable: True
Is iterator: True
------------------------------
Object: hello
Is iterable: True
Is iterator: False
------------------------------
Object: 42
Is iterable: False
Is iterator: False
------------------------------
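
One caveat, sketched below: `isinstance(obj, Iterable)` only detects objects that define `__iter__`. Python's `iter()` also accepts objects that support integer indexing through `__getitem__`, so calling `iter()` and catching `TypeError` is the more complete check. The `Squares` class and the `is_iterable` helper are hypothetical names used only for this illustration.

```python
from collections.abc import Iterable

class Squares:
    """Hypothetical class: supports indexing but defines no __iter__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)  # tells iter() where the sequence ends
        return index * index

def is_iterable(obj):
    """Ask iter() directly; it also accepts the __getitem__ protocol."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True

squares = Squares()
print(f"isinstance check: {isinstance(squares, Iterable)}")
print(f"iter() check: {is_iterable(squares)}")
print(f"Items: {list(squares)}")
print(f"42 iterable? {is_iterable(42)}")
```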


## 3. The Iterator Protocol in Action

### Manual iteration using next()

In [4]:
# Creating an iterator from a list
data = ['apple', 'banana', 'cherry']
data_iter = iter(data)

try:
    while True:
        item = next(data_iter)
        print(f"Processing: {item}")
except StopIteration:
    print("Iteration complete!")

Processing: apple
Processing: banana
Processing: cherry
Iteration complete!
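
As a variation on the manual loop above, the built-in `next()` accepts an optional second argument that is returned instead of raising `StopIteration`. The sketch below uses a unique sentinel object so that no real data value can be mistaken for the end marker. The names `_DONE` and `processed` are illustrative.

```python
data = ['apple', 'banana', 'cherry']
data_iter = iter(data)

_DONE = object()  # illustrative sentinel: cannot collide with real items
processed = []

while True:
    item = next(data_iter, _DONE)  # returns _DONE instead of raising
    if item is _DONE:
        break
    processed.append(item)
    print(f"Processing: {item}")

print("Iteration complete!")
```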


### What happens in a for loop (behind the scenes)


In [5]:
# This for loop...
for item in [1, 2, 3]:
    print(item)

# ...is equivalent to this:
iterable = [1, 2, 3]
iterator = iter(iterable)
try:
    while True:
        item = next(iterator)
        print(item)
except StopIteration:
    pass

1
2
3
1
2
3


--- 

## How Does `for` Work?

- Python’s `for` statement iterates over an object using the **iterator protocol**
- **Iterators** are the objects that do the iterating; they implement the iterator protocol
- A `for` loop calls `iter()` on an iterable to create an iterator object
- The iterator object is responsible for returning each item to the loop
- A `for` loop calls `next()` on the iterator object to fetch each item
- `next()` raises a `StopIteration` exception when nothing is left in the iterator; the `for` loop catches it and ends


Calling `iter()` on a list gives a list iterator object.

But why use a separate object for iteration?
- The position of the current item has to be tracked somewhere, and Python stores that state in the iterator object.
- Keeping this state separate from the iterable lets several iterators work over the same iterable at the same time.
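
A minimal sketch of that separation: two iterators created from the same list advance independently, and calling `iter()` on an iterator returns the iterator itself. The variable names are illustrative.

```python
letters = ['a', 'b', 'c']

# Each iter() call on the list creates a new iterator with its own position
first_iter = iter(letters)
second_iter = iter(letters)

a1 = next(first_iter)   # first_iter moves to 'a'
a2 = next(first_iter)   # first_iter moves to 'b'
b1 = next(second_iter)  # second_iter is unaffected and starts at 'a'

print(a1, a2, b1)
print(f"Same object? {first_iter is second_iter}")
print(f"iter(iterator) is iterator? {iter(first_iter) is first_iter}")
```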

---




## 4. Creating Custom Iterators

### Method 1: Using a Class


In [14]:

class NumberRange:
    """Custom iterator that generates numbers in a range"""
    
    def __init__(self, start, end, step=1):
        self.start = start
        self.end = end
        self.step = step
        self.current = start
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        else:
            current = self.current
            self.current += self.step
            return current

# Using our custom iterator
print("Custom NumberRange iterator:")
for num in NumberRange(1, 10):
    print(num)

Custom NumberRange iterator:
1
2
3
4
5
6
7
8
9
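
An iterator whose `__iter__` returns `self`, as `NumberRange` does, can be consumed only once. One common way around this, sketched here with the hypothetical classes `RangeIterator` and `ReusableRange`, is to split the iterable from its iterator so that every `__iter__` call starts a fresh pass.

```python
class RangeIterator:
    """Hypothetical one-shot iterator that tracks the current position."""

    def __init__(self, start, end, step):
        self.current = start
        self.end = end
        self.step = step

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.end:
            raise StopIteration
        value = self.current
        self.current += self.step
        return value


class ReusableRange:
    """Hypothetical iterable: holds the parameters, not the position."""

    def __init__(self, start, end, step=1):
        self.start = start
        self.end = end
        self.step = step

    def __iter__(self):
        # A new iterator per call, so each loop starts from the beginning
        return RangeIterator(self.start, self.end, self.step)


one_shot = RangeIterator(1, 4, 1)
one_shot_first = list(one_shot)
one_shot_second = list(one_shot)  # already exhausted
print(one_shot_first, one_shot_second)

reusable = ReusableRange(1, 4)
reusable_first = list(reusable)
reusable_second = list(reusable)  # fresh iterator each time
print(reusable_first, reusable_second)
```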


In [15]:
class LogLineCounter:
    """Iterator that counts lines while reading a file"""
    
    def __init__(self, filename):
        self.filename = filename
        self.file = None
        self.line_count = 0
    
    def __iter__(self):
        self.file = open(self.filename, 'r')
        self.line_count = 0
        return self
    
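    def __next__(self):
        # Guard against next() before iter() or after the file was closed
        if self.file is None or self.file.closed:
            raise StopIteration
        line = self.file.readline()
        if line:
            self.line_count += 1
            return f"Line {self.line_count}: {line.strip()}"
        # End of file: release the handle before signalling completion
        self.file.close()
        raise StopIteration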

# Example usage (create a sample file first)
with open('sample.log', 'w') as f:
    f.write("ERROR: Database connection failed\n")
    f.write("INFO: Retrying connection\n")
    f.write("SUCCESS: Connected to database\n")

# Use the iterator
for line in LogLineCounter('sample.log'):
    print(line)

Line 1: ERROR: Database connection failed
Line 2: INFO: Retrying connection
Line 3: SUCCESS: Connected to database


### Method 2: Using Generator Functions (Recommended)


In [19]:
def number_range(start, end, step=1):
    """Generator function - much simpler than class-based iterator"""
    current = start
    while current < end:
        yield current
        current += step


for num in number_range(0,20,2):
    print(num)


0
2
4
6
8
10
12
14
16
18
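
Calling a generator function does not run its body. It returns a generator object, which is itself an iterator. The sketch below repeats `number_range` so that it runs on its own, and shows that the generator pauses at each `yield` and resumes on the next `next()` call.

```python
from collections.abc import Iterator

def number_range(start, end, step=1):
    """Same generator as above, repeated so this example is self-contained"""
    current = start
    while current < end:
        yield current
        current += step

gen = number_range(0, 6, 2)

is_iterator = isinstance(gen, Iterator)
returns_self = iter(gen) is gen
print(f"Is iterator: {is_iterator}")
print(f"iter(gen) is gen: {returns_self}")

# Each next() runs the body up to the next yield, then pauses
first = next(gen)
second = next(gen)
remaining = list(gen)  # consumes whatever is left
print(first, second, remaining)
```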


In [None]:
def add(a,b):
    return a+b


In [None]:
# Using the generator
print("Generator function:")
for num in number_range(1, 6):
    print(num)

Generator function:
1
2
3
4
5


In [10]:

# Data Engineering Example: CSV Row Processor
def process_csv_rows(filename, skip_header=True):
    """Generator that processes CSV rows one at a time"""
    with open(filename, 'r') as file:
        if skip_header:
            # The None default avoids a StopIteration (a RuntimeError inside a generator) on an empty file
            next(file, None)  # Skip header row
        
        for line_num, line in enumerate(file, start=1):
            # Process each row
            row_data = line.strip().split(',')
            yield {
                'line_number': line_num,
                'data': row_data,
                'processed': True
            }

# Create sample CSV
with open('sample.csv', 'w') as f:
    f.write("name,age,city\n")
    f.write("Alice,25,New York\n")
    f.write("Bob,30,San Francisco\n")
    f.write("Charlie,35,Chicago\n")

# Process CSV using generator
for row in process_csv_rows('sample.csv'):
    print(f"Line {row['line_number']}: {row['data']}")

Line 1: ['Alice', '25', 'New York']
Line 2: ['Bob', '30', 'San Francisco']
Line 3: ['Charlie', '35', 'Chicago']
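
Splitting on `','` breaks when a field itself contains a comma inside quotes. As a possible alternative, the sketch below wraps the standard library's `csv.DictReader` in a generator, which keeps the one-row-at-a-time behaviour while handling quoted fields. The function name `read_csv_rows` and the file `sample_quoted.csv` are assumptions made for this example.

```python
import csv
import os
import tempfile

def read_csv_rows(filename):
    """Yield one dict per data row, keyed by the header names."""
    # newline='' is the setting the csv module documentation recommends
    with open(filename, 'r', newline='') as file:
        for row in csv.DictReader(file):
            yield row

# Sample file with a quoted field that contains a comma
path = os.path.join(tempfile.mkdtemp(), 'sample_quoted.csv')
with open(path, 'w', newline='') as f:
    f.write('name,age,city\n')
    f.write('Alice,25,"New York, NY"\n')
    f.write('Bob,30,Chicago\n')

rows = list(read_csv_rows(path))
for row in rows:
    print(row['name'], row['age'], row['city'])
```

`DictReader` consumes the header row itself, so no `skip_header` flag is needed in this version.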
