## Goal

- Write a **generator function**
- It should read a **very large file** (for example, a 100GB file)
- It should read the file **line by line**

The entire file should **NOT** be loaded into memory at once.

---

## Deep Dive (Why This Is Important)

### What Is the Problem?

Suppose you write:

```python
lines = open("bigfile.txt").readlines()
```

This tries to load the entire file into RAM at once.

If the file is very large (e.g., 100GB) and your RAM is smaller, the program will typically fail with a `MemoryError` or slow the machine to a crawl as it starts swapping. This is why reading a large file all at once is unsafe.


## Problem with Loading the Whole File

Issues:

- The entire file is loaded into RAM
- If the file is 100GB and the system has only 16GB RAM
  → the program will crash

---

## Why a Generator Is Needed Here

When using a generator with `yield`:

- One line is read from the file
- The line is processed immediately
- The line can then be released from memory, because nothing holds on to it
- The next line is read

The full contents of the file are **never held in memory at the same time**.

This approach is called **Lazy Evaluation**.
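
The laziness can be observed directly. The following is a minimal sketch, not tied to any file: `lazy_lines` and the `events` list are hypothetical names introduced only for illustration. Each read is recorded, so you can see that nothing runs until a value is requested.

```python
events = []  # records each read so the timing is visible


def lazy_lines(source):
    """Yield items from source one at a time, recording each read."""
    for item in source:
        events.append(f"read {item}")
        yield item


gen = lazy_lines(["a", "b", "c"])  # creates the generator; the body has not run
events_before = list(events)       # still empty at this point

first = next(gen)                  # runs the body up to the first yield
events_after_first = list(events)  # exactly one read has happened

print(events_before)       # []
print(first)               # a
print(events_after_first)  # ['read a']
```

Only the item that was asked for has been read; "b" and "c" are untouched until the next call to `next()` or the next loop iteration.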


```python
def read_large_file(file_path):
    with open(file_path, "r") as f:
        for line in f:
            yield line
```

```python
# process() is a placeholder for your own per-line logic
for line in read_large_file("bigfile.txt"):
    process(line)
```
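
To try the pattern without a real 100GB file, here is a small self-contained sketch. It assumes a three-line temporary file as a stand-in for `bigfile.txt`, and the counting logic stands in for `process`; both are illustrative choices, not part of the original example.

```python
import os
import tempfile


def read_large_file(file_path):
    with open(file_path, "r") as f:
        for line in f:
            yield line


# Create a tiny stand-in file (assumption: a real input would be far larger)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    sample_path = tmp.name

line_count = 0
longest = ""
for line in read_large_file(sample_path):
    text = line.rstrip("\n")       # each yielded line keeps its trailing newline
    line_count += 1
    if len(text) > len(longest):   # keep only a running result, not every line
        longest = text

os.remove(sample_path)

print(line_count)  # 3
print(longest)     # alpha
```

The loop keeps only a counter and one string, so its memory use is the same whether the file has three lines or three billion.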

## Key Characteristics

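- One line is processed at a time
- RAM usage stays roughly constant: it is bounded by the longest line plus a small read buffer, not by the file size
- The same code works for a 1GB or a 100GB file, as long as the consumer does not accumulate the lines itself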
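
One way to check the memory claim is to measure it. The sketch below uses the standard-library `tracemalloc` module to compare peak Python allocations for `readlines()` against line-by-line iteration. The helper names and the 50,000-line sample file are assumptions made for this illustration, and the exact byte counts will vary by Python version and platform.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, path):
    """Return the peak traced memory (in bytes) while func(path) runs."""
    tracemalloc.start()
    func(path)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def count_with_readlines(path):
    with open(path, "r") as f:
        return len(f.readlines())   # builds a list holding every line


def count_with_streaming(path):
    with open(path, "r") as f:
        return sum(1 for _ in f)    # holds one line at a time


# Build a sample file of a few megabytes (assumption: small stand-in for a big file)
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    for i in range(50_000):
        tmp.write(f"line {i}: some sample text for measurement\n")
    sample_path = tmp.name

peak_readlines = peak_bytes(count_with_readlines, sample_path)
peak_streaming = peak_bytes(count_with_streaming, sample_path)
os.remove(sample_path)

print(f"readlines peak: {peak_readlines:,} bytes")
print(f"streaming peak: {peak_streaming:,} bytes")
```

The `readlines()` peak grows with the file, while the streaming peak should stay near the size of the read buffer regardless of how many lines are written.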

---

## Why This Is the Standard in Big Data Processing

This approach is used because:

- Big data is often **larger than available RAM**
- Generators process data as a **stream**, not all at once
- The same technique is commonly used in:
  - Log processing
  - CSV / JSONL parsing
  - ETL pipelines
  - Machine Learning data loading
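
As an illustration of the streaming style in the use cases above, generators can be chained so that each record flows through every stage before the next one is read. This is a minimal sketch of a JSONL log filter: the function names, the `level` and `msg` fields, and the in-memory `io.StringIO` sample are all assumptions for the example; a real pipeline would pass an open file object instead.

```python
import io
import json


def parse_jsonl(lines):
    """Parse one JSON object per non-empty line, lazily."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def only_errors(records):
    """Pass through only records whose level is ERROR."""
    for record in records:
        if record.get("level") == "ERROR":
            yield record


# In-memory stand-in for a large JSONL log file
sample = io.StringIO(
    '{"level": "INFO", "msg": "started"}\n'
    '{"level": "ERROR", "msg": "disk full"}\n'
    '\n'
    '{"level": "ERROR", "msg": "timeout"}\n'
)

# Each stage pulls one item at a time from the stage before it
error_messages = [record["msg"] for record in only_errors(parse_jsonl(sample))]

print(error_messages)  # ['disk full', 'timeout']
```

Only the final list of matching messages is stored. If even that could be too large, the last step can be a plain `for` loop that writes each result out as it arrives.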

---

## One-Line Rule to Remember

> **If data is bigger than RAM, use generators.**
