### A no-nonsense guide to generators!!

Adapted from: Three Keys to Leveling Up your Python by Aaron Maxwell (aaron@powerfulpython.com)

### Make your Python code scalable

```python
def fetch_squares(max_root):
    squares = []
    for x in range(max_root):
        squares.append(x**2)
    return squares
MAX = 5
for square in fetch_squares(MAX):
    do_something_with(square)
```
The code above has two problems:
- `fetch_squares` builds the **entire list** in memory before the loop sees the first value
- As `MAX` grows, the list can outgrow available RAM, forcing the program to page to disk (and therefore become sluggish)

In [86]:
def gen_squares(max_root):
    for x in range(max_root):
        yield x**2
squares = gen_squares(5)
print([sq for sq in squares])

[0, 1, 4, 9, 16]
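
To make the memory difference concrete, here is a minimal sketch that compares the size of the list returned by `fetch_squares` with the size of the generator object returned by `gen_squares`. The value of `n` is an illustrative assumption, and `sys.getsizeof` only measures the container itself (not the integers it refers to), so treat the numbers as a rough indication rather than a full memory profile.

```python
import sys

def fetch_squares(max_root):
    # Eager version: builds and returns the whole list
    return [x**2 for x in range(max_root)]

def gen_squares(max_root):
    # Lazy version: yields one square at a time
    for x in range(max_root):
        yield x**2

n = 100_000  # illustrative size, not taken from the original text

list_size = sys.getsizeof(fetch_squares(n))  # grows with n
gen_size = sys.getsizeof(gen_squares(n))     # small, whatever n is

print(list_size > gen_size)  # True
```

The generator object stays small because it only stores where it is in the loop, not the values it has yet to produce.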


#### Key points about generator functions:
- A function is a generator function if and only if its body contains the `yield` keyword
- Calling a generator function returns a generator object; the body only runs as values are requested
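
The two points above can be checked directly. The sketch below uses a hypothetical generator function, `announce_and_yield`, that records when its body runs. It shows that calling the function only creates the generator object, and that the body starts executing on the first `next()` call.

```python
import inspect

events = []

def announce_and_yield():
    # Hypothetical generator function, used only for this demonstration
    events.append("body started")
    yield 1
    events.append("body resumed")
    yield 2

gen = announce_and_yield()

is_gen_func = inspect.isgeneratorfunction(announce_and_yield)
ran_on_call = list(events)   # snapshot: nothing has run yet
first = next(gen)            # runs the body up to the first yield
after_first = list(events)   # snapshot: the body has now started

print(is_gen_func, ran_on_call, first, after_first)
# True [] 1 ['body started']
```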


In [68]:
type(squares)

generator

Some notes about generators:
- A generator object scales like any other lazy iterator: it produces one value at a time instead of holding them all in memory
- Every generator object is an iterator, but not every iterator is a generator object
- Writing a generator function is the easiest way to create an iterator
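
As a rough illustration of these notes, the sketch below compares a generator object with a hand-written iterator class. `Countdown` and `countdown` are hypothetical names introduced only for this example. Both satisfy the iterator protocol, but only the object created by a generator function is a generator, and the generator version takes far less code.

```python
import types
from collections.abc import Iterator

def gen_squares(max_root):
    for x in range(max_root):
        yield x**2

class Countdown:
    """Hypothetical hand-written iterator: counts down from start to 1."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

def countdown(start):
    # The same behaviour written as a generator function
    while start > 0:
        yield start
        start -= 1

squares = gen_squares(3)
manual = Countdown(3)

print(isinstance(squares, Iterator), isinstance(squares, types.GeneratorType))
# True True
print(isinstance(manual, Iterator), isinstance(manual, types.GeneratorType))
# True False
print(list(manual), list(countdown(3)))
# [3, 2, 1] [3, 2, 1]
```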

#### The next() thing

In [88]:
squares = gen_squares(5)
print(next(squares), next(squares), next(squares), next(squares), next(squares))
# next() accepts a default to return once the generator is exhausted; here it is None
if next(squares, None) is None:
    print('thats all folks')

0 1 4 9 16
thats all folks
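
For comparison, here is a small sketch of what happens when no default is supplied: `next()` raises `StopIteration` once the generator is exhausted. A `for` loop or comprehension handles that exception for you, which is why looping over a generator simply ends.

```python
def gen_squares(max_root):
    for x in range(max_root):
        yield x**2

squares = gen_squares(2)
values = [next(squares), next(squares)]

try:
    next(squares)  # no default supplied, so exhaustion raises
    exhausted = False
except StopIteration:
    exhausted = True

# A comprehension (like a for loop) stops cleanly at StopIteration
looped = [sq for sq in gen_squares(2)]

print(values, exhausted, looped)
# [0, 1] True [0, 1]
```

One caveat on the design choice: a `None` default is only a reliable end marker when the generator can never yield `None` itself.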


In [89]:
def gen_up_to(limit):
    n = 0
    while n <= limit:
        yield n
        n += 1
limit = 10
it = gen_up_to(limit)

In [90]:
next(it, None)

0

### Use case 1 : `pattern matching` log files with a `small memory footprint`

In [91]:
def matching_lines(pattern, path):
    '''
    Generator function that yields the lines of a log file containing pattern.
    :param pattern: substring to look for in each line
    :param path: file with one record per line
    :return: yields each matching line with its trailing newline stripped
    '''
    # Iterating over the file object reads one line at a time
    with open(path) as log_file:
        for line in log_file:
            if pattern in line:
                yield line.rstrip('\n')
pattern, path = 'WARNING', 'log.txt'

In [92]:
print([line for line in matching_lines(pattern, path)])



### Use case 2 : `Transforming` log files to `dictionaries`

In [93]:
def parse_log_records(lines):
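    '''
    Generator function that parses log lines into dictionaries.
    :param lines: iterable of strings formatted as "LEVEL: message"
    :return: yields a dict with two keys, "level" and "message"
    '''
    for line in lines:
        # maxsplit=1 keeps any later ": " inside the message
        level, message = line.split(": ", 1)
        yield {"level": level, "message": message}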

In [94]:
print([record for record in parse_log_records(matching_lines(pattern, path))])
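
Since `log.txt` is not included here, the following sketch runs the same two-stage pipeline against a temporary file. The sample contents are hypothetical and assume each line follows a `LEVEL: message` layout, which is what `line.split(": ", 1)` expects.

```python
import os
import tempfile

def matching_lines(pattern, path):
    with open(path) as log_file:
        for line in log_file:
            if pattern in line:
                yield line.rstrip('\n')

def parse_log_records(lines):
    for line in lines:
        level, message = line.split(": ", 1)
        yield {"level": level, "message": message}

# Hypothetical log contents, assumed to use a "LEVEL: message" layout
sample = (
    "INFO: service started\n"
    "WARNING: disk almost full\n"
    "WARNING: retrying: attempt 2\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "log.txt")
    with open(sample_path, "w") as f:
        f.write(sample)
    # Each line flows through both generators before the next one is read
    records = list(parse_log_records(matching_lines("WARNING", sample_path)))

print(records)
# [{'level': 'WARNING', 'message': 'disk almost full'},
#  {'level': 'WARNING', 'message': 'retrying: attempt 2'}]
```

Because each stage is a generator, only one line is held in memory at a time until `list()` collects the results.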



##### With the two building blocks `parse_log_records` and `matching_lines`, let's proceed to build a class `Logs`, which comes in quite handy during data engineering.

In [95]:
class Logs:
    def __init__(self, log_file_path):
        self.log_file_path = log_file_path

    def records(self):
        # Stream every record; strip the newline so messages match warnings()
        with open(self.log_file_path) as log_lines:
            stripped = (line.rstrip('\n') for line in log_lines)
            yield from parse_log_records(stripped)

    def warnings(self):
        # Stream only the records whose line contains "WARNING"
        log_lines = matching_lines("WARNING", self.log_file_path)
        yield from parse_log_records(log_lines)

In [96]:
logs = Logs('log.txt')

In [101]:
print([record for record in logs.records()])



In [102]:
print([record for record in logs.warnings()])

