# ⚙️ Generators in Python – From List Comprehensions to Lazy Iterators

---

## 🎯 What Is a Generator?

A **generator** is a special kind of iterable that **produces items on the fly**, rather than storing them all in memory.  
This makes them **lazy, efficient**, and perfect for **large data** or **infinite streams**.

> Generators don't **compute values immediately** — they **wait until you ask** for the next one.

---

## 🪄 From List Comprehension to Generator Expression

You already know list comprehensions:


```python
squares = [x**2 for x in range(5)]
```

Now replace the square brackets `[]` with parentheses `()`:

```python
squares_gen = (x**2 for x in range(5))
```

This **does not compute any squares yet**. It creates a generator object, a **lazy pipeline** that produces each value only when you iterate over it. (Only the outermost iterable, `range(5)`, is evaluated immediately.)

---

### 🧪 Let’s Prove It:

```python
gen = (x**2 for x in range(3))
print(gen)        # <generator object ...>
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 4
```

```text
<generator object <genexpr> at 0x00000210DC7564D0>
0
1
4
```
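Once the last value has been produced, the generator is exhausted: one more `next()` call raises `StopIteration`. A small sketch of that behavior, plus the optional default argument of `next()`:

```python
gen = (x**2 for x in range(3))
values = [next(gen), next(gen), next(gen)]  # [0, 1, 4]

# A fourth call has nothing left to produce
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

# Passing a default makes next() return it instead of raising
fallback = next(gen, None)

print(values, exhausted, fallback)  # [0, 1, 4] True None
```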


---

## 🔁 Convert Generator to List (If Needed)

You can force evaluation and turn it into a list:

```python
gen = (x**2 for x in range(5))
squares = list(gen)  # Triggers computation
```

> But doing this removes the memory advantage. Only do it when necessary.
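Converting also consumes the generator, so a second conversion comes back empty. A quick check:

```python
gen = (x**2 for x in range(5))

first = list(gen)   # consumes every value
second = list(gen)  # nothing left: the generator is already exhausted

print(first)   # [0, 1, 4, 9, 16]
print(second)  # []
```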

---

## ⚡ Why Are Generators Powerful?

| Feature         | Generator                                  | List                                            |
|-----------------|--------------------------------------------|-------------------------------------------------|
| Memory usage    | ✅ Low (holds only its current state)       | ❌ High (holds every item at once)               |
| Performance     | ✅ First item is ready immediately          | ❌ Whole list is built before the first item is used |
| Infinite series | ✅ Possible                                 | ❌ Impossible (a list must be finite)            |
| Reusability     | ❌ One-time use (exhausted after one pass)  | ✅ Can be iterated many times                    |
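Two rows of the table can be checked directly. The sketch below compares container sizes with `sys.getsizeof` (which counts only the container, not the integers it refers to, so it understates the list's real footprint) and uses `itertools.count` with `itertools.islice` to take a finite slice of an infinite generator.

```python
import sys
from itertools import count, islice

n = 1_000_000
as_list = [x**2 for x in range(n)]   # all one million values exist now
as_gen = (x**2 for x in range(n))    # only the paused generator exists

list_bytes = sys.getsizeof(as_list)  # grows with n (the list's pointer array)
gen_bytes = sys.getsizeof(as_gen)    # small and independent of n
print(f"list: {list_bytes:,} bytes, generator: {gen_bytes:,} bytes")

# count() never stops, so this generator is infinite;
# islice() pulls only the first five values from it.
evens = (x for x in count() if x % 2 == 0)
first_five = list(islice(evens, 5))
print(first_five)  # [0, 2, 4, 6, 8]
```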

---

## 🧮 Example: `sum()` with Generator vs List

### ❌ List (wastes memory):

```python
sum([x**2 for x in range(10**6)])
```

```text
333332833333500000
```

### ✅ Generator (no list built):

```python
sum(x**2 for x in range(10**6))
```

```text
333332833333500000
```


✔️ No million-element intermediate list  
✔️ Lower peak memory  
✔️ Clean code
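One way to see the memory difference, rather than take it on faith, is the standard-library `tracemalloc` module. This sketch measures the peak traced allocation of each version. Exact numbers depend on the Python version and platform, so none are shown, but the list version should peak far higher because it keeps every square alive at once. `peak_bytes` is just an illustrative helper name.

```python
import tracemalloc

def peak_bytes(fn):
    """Run fn() and return the peak memory (in bytes) traced while it ran."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

n = 10**6
list_peak = peak_bytes(lambda: sum([x**2 for x in range(n)]))
gen_peak = peak_bytes(lambda: sum(x**2 for x in range(n)))

print(f"list comprehension peak: {list_peak:,} bytes")
print(f"generator peak:          {gen_peak:,} bytes")
```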

---

## 🧰 Writing Generator Functions

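Any function that contains the `yield` keyword is a **generator function**: calling it does not run the body right away, it returns a generator object.

```python
def countdown(n):
    # Each loop iteration hands one value back to the caller, then pauses
    while n > 0:
        yield n
        n -= 1
```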

### 🔄 Usage:

```python
for i in countdown(5):
    print(i)
```

```text
5
4
3
2
1
```


---

## 🧠 What Happens Inside?

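Each time `yield` runs:
- The function **pauses**, keeping its local variables and its position in the code.
- The yielded value is **sent out** to the caller (a `for` loop, `next()`, `sum()`, and so on).
- On the next request for a value, execution **resumes right after the `yield`**.
- When the function body finishes, the generator raises `StopIteration`, which ends the loop.

This lets you build **stateful, lazy iterators**.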
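To watch the pause-and-resume cycle, the sketch below records each step of a small generator in a list and checks its state with `inspect.getgeneratorstate`. The generator `traced` is a hypothetical example written only for this demonstration.

```python
import inspect

log = []

def traced():
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 2
    log.append("finished")

gen = traced()
states = [inspect.getgeneratorstate(gen)]      # 'GEN_CREATED': body has not run yet
log_before = list(log)                         # [] because nothing has executed

first = next(gen)                              # runs up to the first yield -> 1
states.append(inspect.getgeneratorstate(gen))  # 'GEN_SUSPENDED': paused at a yield
second = next(gen)                             # resumes right after the first yield -> 2

rest = list(gen)                               # runs to the end; StopIteration ends list()
states.append(inspect.getgeneratorstate(gen))  # 'GEN_CLOSED'

print(states)
print(log)  # ['started', 'resumed after first yield', 'finished']
```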

---

## 🔧 Generator Function vs Class

### ✅ Functional Style (with `yield`)

```python
def even_numbers(limit):
    for i in range(limit):
        if i % 2 == 0:
            yield i
```


---

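### ✅ Class-based Iterator

```python
class EvenNumbers:
    def __init__(self, limit):
        self.limit = limit
        self.current = 0  # the position is stored explicitly on the instance

    def __iter__(self):
        # The object is its own iterator, so it can only be looped over once
        return self

    def __next__(self):
        # Advance to the next even value, or signal the end of iteration
        while self.current < self.limit:
            val = self.current
            self.current += 1
            if val % 2 == 0:
                return val
        raise StopIteration
```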

Usage:

```python
for num in EvenNumbers(10):
    print(num)
```

```text
0
2
4
6
8
```

> The functional version with `yield` is simpler and preferred in most use cases.
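A common middle ground combines both styles: a class whose `__iter__` is itself a generator function. Each loop then gets a fresh generator, so the object can be iterated more than once, unlike `EvenNumbers`, whose `__iter__` returns `self` and is spent after one pass. `EvenRange` is a hypothetical name used only for this sketch.

```python
class EvenRange:
    """Reusable iterable: every call to __iter__ creates a new generator."""

    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        # Because this method contains `yield`, calling it returns a generator
        for i in range(self.limit):
            if i % 2 == 0:
                yield i

evens = EvenRange(10)
first_pass = list(evens)
second_pass = list(evens)  # works again: a brand-new generator is created

print(first_pass)   # [0, 2, 4, 6, 8]
print(second_pass)  # [0, 2, 4, 6, 8]
```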

---

## 🧪 Built-in Functions That Work with Generators

Generators can be passed to any built-in that accepts an iterable, including:
```python
sum(), max(), min(), any(), all(), list(), set(), tuple(), sorted()
```

Example:

```python
even_sum = sum(x for x in range(1000) if x % 2 == 0)
```
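A few behaviors are worth knowing when you pass generators to these built-ins: `any()` and `all()` stop consuming as soon as the answer is known, `min()` and `max()` raise `ValueError` on an empty iterable unless you pass `default=`, and `sorted()` always returns a new list. A small sketch:

```python
numbers = (x for x in range(10))

# any() stops at the first True, leaving the rest of the generator unconsumed
has_big = any(x > 3 for x in numbers)
remaining = list(numbers)   # 0..4 were consumed; 4 was the first x > 3
print(has_big, remaining)   # True [5, 6, 7, 8, 9]

# min()/max() on an empty generator raise ValueError unless default= is given
smallest = min((x for x in range(0)), default=None)
print(smallest)             # None

# sorted() materializes the generator into a new list
ordered = sorted(x % 3 for x in range(6))
print(ordered)              # [0, 0, 1, 1, 2, 2]
```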

---

## ✅ Summary

| Concept               | Description                                |
|------------------------|--------------------------------------------|
| Generator expression  | `(x for x in iterable)`                     |
| Generator function    | Uses `yield` to return values lazily       |
| Memory efficiency     | Only stores one item at a time             |
| One-time use          | Can't rewind a generator                   |
| Conversion            | Use `list()`, `tuple()` to force materialization |
| Best for              | Streams, large datasets, pipelines         |

---

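> 🧠 Generators let you write **clean**, **memory-efficient**, and **scalable** data pipelines in Python. The same lazy-iteration idea appears across the ecosystem: `pandas` can read files in chunks, `TensorFlow` can build input pipelines from Python generators, `asyncio` coroutines grew out of generators, and web frameworks can stream responses from them.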



## 🧩 Alternative: Using `__getitem__` for Iteration

You don’t always need `__iter__` and `__next__` to make an object iterable.  
If a class defines only **`__getitem__()`**, Python falls back to the older sequence protocol: it calls `obj[0]`, `obj[1]`, `obj[2]`, and so on, and stops as soon as `__getitem__` raises `IndexError`.

---

### 🧪 Example: Iterable Class with `__getitem__`


```python
class Squares:
    def __init__(self, limit):
        self.limit = limit

    def __getitem__(self, index):
        # Raising IndexError is what tells the for loop to stop
        if index >= self.limit:
            raise IndexError(index)
        return index ** 2
```

Usage:

```python
s = Squares(5)

for val in s:
    print(val)
```

```text
0
1
4
9
16
```

✔️ This works with any `for` loop, `list()`, `sum()`, etc.  
✔️ It's **lazy** — each item is generated on demand.  
✔️ Simpler than using `__iter__` and `__next__` for finite sequences.
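The fallback has two consequences worth seeing in code. Because each `for` loop (or `iter()` call) starts again at index 0, a `__getitem__`-based object can be iterated repeatedly. However, `isinstance(obj, collections.abc.Iterable)` returns `False`, since that check looks only for `__iter__`; calling `iter()` is the reliable test. And if `__getitem__` never raises `IndexError`, the loop never ends.

```python
from collections.abc import Iterable

class Squares:
    def __init__(self, limit):
        self.limit = limit

    def __getitem__(self, index):
        if index >= self.limit:
            raise IndexError(index)  # ends iteration
        return index ** 2

s = Squares(5)

first = list(s)   # [0, 1, 4, 9, 16]
again = list(s)   # [0, 1, 4, 9, 16]: each pass restarts at index 0
total = sum(s)    # 30

# The ABC check misses the sequence fallback...
abc_says_iterable = isinstance(s, Iterable)  # False

# ...but iter() accepts the object, which is the reliable test
try:
    iter(s)
    iter_works = True
except TypeError:
    iter_works = False

print(first, again, total, abc_says_iterable, iter_works)
```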

---

### ⚠️ When to Use `__getitem__`

| Use `__getitem__` if...       | Use `yield` / `__next__` if...           |
|-------------------------------|------------------------------------------|
| You need simple index-based access | You need more control/state in logic |
| Sequence is finite and indexed     | Sequence is stateful / dynamic       |
| You want to mimic list-like behavior | You’re building real generators     |

---

> 🔎 Tip: If your iterable **can be indexed**, `__getitem__` might be easier and faster to implement than `yield` or `__next__`.


---

## 📦 1. What is a `Dataset`?

`torch.utils.data.Dataset` is the **standard abstract base class** for datasets in PyTorch.

You implement:
- `__len__()` → to report the total number of samples (optional in the base class, but expected by `DataLoader`'s default samplers)
- `__getitem__(index)` → to retrieve a single sample by index

✅ Supports:
- Random access
- Shuffling
- Batching via DataLoader

---

### 🧪 Example: Custom Dataset (Map-style)

```python
import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self):
        self.data = list(range(10))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]
```

Usage:

```python
ds = MyDataset()
for x in ds:
    print(x)
```

```text
0
1
2
3
4
5
6
7
8
9
```


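✔️ The `for` loop works even though `Dataset` defines no `__iter__`: Python falls back to calling `__getitem__(0)`, `__getitem__(1)`, and so on until `self.data[10]` raises `IndexError`, exactly as in the `Squares` example  
✔️ Suitable for structured data like CSV, images, tabular formats  
✔️ Can be shuffled, batched, and split into subsets (e.g. with `torch.utils.data.Subset`)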
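Why do `__len__` and `__getitem__` make shuffling and batching possible? Below is a simplified, pure-Python model of map-style loading: draw a (possibly shuffled) order of indices from `range(len(dataset))`, then fetch samples in groups with `dataset[i]`. It is only an illustration, not `DataLoader`'s actual implementation; `simple_loader` is a hypothetical helper, and the stand-in class mirrors `MyDataset` without importing torch.

```python
import random

class MyDataset:
    """Same map-style contract as the torch example, without torch."""

    def __init__(self):
        self.data = list(range(10))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]

def simple_loader(dataset, batch_size, shuffle=False, seed=0):
    # __len__ tells us which indices exist...
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    # ...and __getitem__ lets us fetch them in any order
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]

batches = list(simple_loader(MyDataset(), batch_size=4, shuffle=True))
print([len(b) for b in batches])  # [4, 4, 2]
```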

---

## 🔁 2. What is an `IterableDataset`?

`torch.utils.data.IterableDataset` is for **streaming data** where you **don’t know the length** or **can’t index directly**.

You implement:
- `__iter__()` → must return an **iterator or generator**

✅ Supports:
- Large files
- Infinite streams (e.g. sensors, web API)
- Generators or yield-based logic

---

### 🧪 Example: Streaming Dataset


```python
from torch.utils.data import IterableDataset

class StreamDataset(IterableDataset):
    def __init__(self, start=0):
        self.start = start

    def __iter__(self):
        for i in range(self.start, self.start + 5):
            yield i
```

Usage:

```python
ds = StreamDataset()
for x in ds:
    print(x)
```

```text
0
1
2
3
4
```



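✔️ Suitable for large files, logs, databases  
✔️ Can be used with multiprocessing (`num_workers > 0`), but every worker gets its own copy of the dataset, so unless you split the work (e.g. via `torch.utils.data.get_worker_info()`), each sample is yielded once per worker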
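Because each DataLoader worker gets its own copy of an `IterableDataset`, the usual fix is to give each worker a disjoint slice of the stream. A minimal sketch of that splitting, kept free of torch so it runs anywhere: `shard_bounds` is a hypothetical helper, and the comment shows where it might plug into `__iter__` using `torch.utils.data.get_worker_info()`, which returns `None` in the main process and an object with `id` and `num_workers` inside a worker.

```python
def shard_bounds(start, end, worker_id, num_workers):
    """Return the [lo, hi) slice of [start, end) assigned to one worker."""
    per_worker = -(-(end - start) // num_workers)  # ceiling division
    lo = start + worker_id * per_worker
    hi = min(lo + per_worker, end)
    return lo, max(lo, hi)  # an empty slice if this worker has nothing left

# Inside a StreamDataset-like __iter__ this might look like:
#     info = torch.utils.data.get_worker_info()
#     if info is None:                      # single-process loading
#         lo, hi = self.start, self.start + 5
#     else:                                 # one slice per worker
#         lo, hi = shard_bounds(self.start, self.start + 5, info.id, info.num_workers)
#     yield from range(lo, hi)

# Simulate 2 workers over [0, 5): together they cover every item exactly once
shards = [shard_bounds(0, 5, w, 2) for w in range(2)]
print(shards)  # [(0, 3), (3, 5)]
```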

---

## 🧠 Comparison Table

| Feature                | `Dataset`                              | `IterableDataset`             |
|------------------------|----------------------------------------|-------------------------------|
| Index-based            | ✅ Yes (`__getitem__`)                  | ❌ No                          |
| Requires `__len__()`   | ✅ Expected by the default samplers     | ❌ No                          |
| Shuffling allowed      | ✅ Yes (`shuffle=True`)                 | ❌ Only with custom logic      |
| Supports `DataLoader`  | ✅ Yes                                  | ✅ Yes                         |
| Best for               | Tabular/image datasets                 | Streams/logs/lazy generators  |
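One common form of that custom logic is a shuffle buffer: hold up to `buffer_size` items from the stream and emit a random one each time the buffer is full. The result is only approximately shuffled (how far an item can move is limited by the buffer size), but memory stays bounded. `shuffle_buffer` is a hypothetical helper, not a PyTorch API.

```python
import random

def shuffle_buffer(stream, buffer_size, seed=0):
    """Yield items from `stream` in an approximately shuffled order."""
    rng = random.Random(seed)
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # Swap a random element to the end, then pop it in O(1)
            i = rng.randrange(len(buffer))
            buffer[i], buffer[-1] = buffer[-1], buffer[i]
            yield buffer.pop()
    # Stream finished: flush whatever is left, in random order
    rng.shuffle(buffer)
    yield from buffer

mixed = list(shuffle_buffer(range(20), buffer_size=5))
print(sorted(mixed) == list(range(20)))  # True: same items, new order
```

Inside an `IterableDataset`, `__iter__` could `yield from shuffle_buffer(...)` over its raw stream.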

---

## 🔥 Real-World Use Cases

| Use Case                 | Recommended Dataset Type |
|--------------------------|--------------------------|
| Training from CSV files  | `Dataset` (map-style)    |
| Image classification     | `Dataset` with transforms|
| Log streaming            | `IterableDataset`        |
| Live sensor data         | `IterableDataset`        |
| File reading with `yield`| `IterableDataset`        |

---

### ⚠️ Note on `DataLoader`

Both types of datasets can be passed to `DataLoader`:



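```python
from torch.utils.data import DataLoader

# Map-style Dataset: DataLoader can shuffle indices and build batches
dl = DataLoader(MyDataset(), batch_size=4, shuffle=True)

# IterableDataset: batches follow stream order; passing shuffle=True raises a ValueError
dl = DataLoader(StreamDataset(), batch_size=4)
```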
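For an `IterableDataset`, batching simply groups consecutive items in stream order. A rough pure-Python model of that grouping using `itertools.islice` (not DataLoader's real code; `batched_stream` is a hypothetical name):

```python
from itertools import islice

def batched_stream(iterable, batch_size):
    """Group consecutive items into lists of up to batch_size, in order."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:  # the stream is exhausted
            return
        yield batch

print(list(batched_stream(range(5), 4)))  # [[0, 1, 2, 3], [4]]
```

With `StreamDataset` and `batch_size=4`, `DataLoader` forms the same two groups; its default collate function then turns each group of integers into a tensor.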

---

## ✅ Summary

| Concept        | `Dataset`                                | `IterableDataset`                    |
|----------------|------------------------------------------|--------------------------------------|
| Style          | Index-based                              | Generator-style                      |
| Lazy loading   | ⚠️ Per sample, if `__getitem__` loads it  | ✅ Yes, as the stream is consumed     |
| Random access  | ✅ Yes                                    | ❌ No                                 |
| Infinite data  | ❌ Not ideal (needs a finite `__len__`)   | ✅ Perfect                            |
| Use case       | Images, CSV, etc.                        | Logs, streams, pipes                 |

---

> 💡 Use `Dataset` when you know the size and want indexing/shuffling.  
> Use `IterableDataset` when your data is **infinite**, **lazy**, or **streamed**.

