# Generator

- **Generator**: A function that returns a [lazy iterator](https://en.wikipedia.org/wiki/Lazy_evaluation).

- This chapter is organized as follows.  
  1. Basic usage of generator  
  2. Advantages of generator  
  3. How to use the generator properly    
    - Index Slicing  
    - Iterate elegantly  
    - Length of the generator    

  4. Other examples of generator  
  5. Appendix - *generator* and *list*, which is faster?  
  6. Reference

## 1. Basic usage of generator

- Basic example of **yield** and **generator expression**

### (A) yield

In [108]:
# number_generator is a generator function: calling it returns a generator object
def number_generator(N: int):
    for i in range(N):
        yield i

n = 5
numbers = number_generator(n)
print(numbers)

# next() consumes the first element (0) and advances the generator by one step
print(next(numbers))

# The loop resumes where next() left off, so it starts from the second element (1)
for i in numbers:
    print(i, end=' ')


# This raises StopIteration because the generator is already exhausted
# A generator object can't be rewound or copied; call number_generator(n) again to get a fresh one
print(next(numbers))

<generator object number_generator at 0x7f8cdba7cf20>
0
1 2 3 4 

StopIteration: 
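
If raising ```StopIteration``` is not desirable, the built-in ```next``` accepts an optional second argument that is returned once the generator is exhausted. The sketch below redefines ```number_generator``` so that it runs on its own; using ```None``` as the default is only an illustrative choice.

```python
def number_generator(N: int):
    for i in range(N):
        yield i

numbers = number_generator(2)
first = next(numbers, None)    # first element
second = next(numbers, None)   # second element
third = next(numbers, None)    # generator is exhausted, so the default is returned
print(first, second, third)
```

This only works cleanly when the default value can never be a real element of the stream.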

### (B) Generator expression  
- A generator expression looks like a list comprehension, but it is wrapped in parentheses ```()``` instead of square brackets ```[]```

In [56]:
numbers = (x for x in range(5))  # generator expression
print(numbers)
print(sum(numbers))

# Do not confuse this with a list comprehension, which uses [] and builds the whole list at once
numbers_list = [x for x in range(5)]
print(type(numbers_list))

<generator object <genexpr> at 0x7f8cf5725c80>
10
<class 'list'>
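
When a generator expression is the only argument of a function call, the extra pair of parentheses can be dropped. A small sketch (the word list is made up for illustration):

```python
# The generator expression is the sole argument, so sum(...) needs no inner parentheses
total = sum(x * x for x in range(5))

# The same applies to other functions that consume an iterable
longest = max(len(word) for word in ["a", "bcd", "ef"])

print(total, longest)
```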


## 2. Advantages of the generator  
#### 1. Memory Efficient  
   A list keeps all of its elements in memory at the same time.  
   A generator object, in contrast, produces only the next element on demand, so it needs very little memory no matter how many elements it yields.  
   
#### 2. Infinite Stream  
   An infinite stream can't be stored in memory as a whole, but a generator yields only one item at a time.  
   So when you handle an infinite stream of data, a generator is the recommended tool.  
   
#### 3. Pipelining Generators  
   Generators can be chained so that each stage consumes the output of the previous one, which makes them useful for building data pipelines.

In [78]:
# 1. Memory Efficient
import sys
N = 10**7
square_list = [i ** 2 for i in range(N)]
square_gen = (i**2 for i in range(N))
# Note: sys.getsizeof reports only the container itself, not the int objects the list refers to
print(f'size with list: {sys.getsizeof(square_list)} bytes')
print(f'size with generator: {sys.getsizeof(square_gen)} bytes')

size with list: 81528048 bytes
size with generator: 112 bytes
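
To see that the generator's size does not depend on the number of elements, the measurement can be repeated for several values of ```N```. This is a rough sketch: the exact byte counts depend on the Python version and platform, so only the trend matters. The names ```list_sizes``` and ```gen_sizes``` are introduced here for illustration.

```python
import sys

list_sizes = {}
gen_sizes = {}
for n in (10, 1_000, 100_000):
    # The list grows with n; the generator object stays the same size
    list_sizes[n] = sys.getsizeof([i ** 2 for i in range(n)])
    gen_sizes[n] = sys.getsizeof(i ** 2 for i in range(n))
    print(f'N={n}: list {list_sizes[n]} bytes, generator {gen_sizes[n]} bytes')
```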


In [109]:
# 2. Infinite Stream
def infinite_generator():
    num = 0
    while True:
        yield num
        num += 1

inf_gen = infinite_generator()
for n in inf_gen:
    print(n, end=', ')
    if n == 100:
        print('...   end of infinite numbers...')
        break

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, ...   end of infinite numbers...
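
Instead of breaking out of the loop by hand, a finite prefix of an infinite generator can be taken with ```itertools.islice```. The sketch below also uses ```itertools.count```, a ready-made infinite counter from the standard library; ```infinite_generator``` is repeated so that the block runs on its own.

```python
from itertools import count, islice

def infinite_generator():
    num = 0
    while True:
        yield num
        num += 1

# Take only the first five items; the rest of the stream is never produced
first_five = list(islice(infinite_generator(), 5))
print(first_five)

# itertools.count(start, step) yields start, start + step, ... without end
evens = list(islice(count(start=0, step=2), 5))
print(evens)
```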


In [107]:
# 3. Pipelining Generators
# sum of squares of the Fibonacci numbers
# pipeline: fibonacci -> square -> sum
def fibonacci(rg):
    x, y = 0, 1
    for _ in range(rg):
        x, y = y, x+y
        yield x
        
        
def square(nums):
    # 'nums' can be any iterable (here a generator); each element is squared lazily, one at a time
    for num in nums:
        yield num ** 2
        

num_range = 10
print(f'sum of squares of the first {num_range} Fibonacci numbers: {sum(square(fibonacci(num_range)))}')

sum of squares of the first 10 Fibonacci numbers: 4895
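
Pipeline stages do not have to be generator functions; generator expressions can be chained in the same way. The sketch below repeats ```fibonacci``` and adds a filtering stage that is not part of the original example, purely to show a three-stage pipeline.

```python
def fibonacci(rg):
    x, y = 0, 1
    for _ in range(rg):
        x, y = y, x + y
        yield x

# Nothing is computed while the stages are being defined
squares = (num ** 2 for num in fibonacci(10))          # stage 2: square
even_squares = (sq for sq in squares if sq % 2 == 0)   # stage 3: keep even values (illustrative)

# sum() pulls values through all stages, one element at a time
result = sum(even_squares)
print(result)
```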


## 3. How to use the generator properly  


### (A) Index Slicing  

- You can't index or slice a generator object.
- Trying to do so raises ```TypeError: 'generator' object is not subscriptable```

In [23]:
n = 5
numbers = number_generator(n)
index = 1

# error: indexing is not supported
print(numbers[index])
    
# error: slicing is not supported either
for i in numbers[index:]:
    print(i)

TypeError: 'generator' object is not subscriptable

- If you need indexing or slicing, there are two options.  
    (1) convert the generator to a list  
    (2) use ```itertools.islice```

In [45]:
index = 1
n = 5
numbers = number_generator(n)


# (1) generator to list
num_list = list(numbers)
print(f'(1) generator to list: {num_list[index]}')

# (2) itertools.islice
# itertools.islice(iterable, stop)
# itertools.islice(iterable, start, stop[, step])
from itertools import islice
# 'numbers' was exhausted by list() above, so a fresh generator is passed here
sliced = islice(number_generator(n), index, index+1)  # named 'sliced' to avoid shadowing the built-in slice
index_element = next(sliced)
print(f'(2) itertools.islice: {index_element}')

(1) generator to list: 1
(2) itertools.islice: 1


### (B) A ```for``` loop is an elegant way to iterate over a generator object


In [None]:
for element in iterable:
    pass

Since a ```for``` loop **automatically** iterates until the iterable is exhausted and handles StopIteration internally, the exception never reaches your code.

A ```for``` loop behaves roughly like the code below.

In [None]:
iterator = iter(iterable)
while True:
    try:
        element = next(iterator)
    except StopIteration:
        break
    pass  # the loop body goes here, outside the try block
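
The equivalence can be checked with a small runnable comparison. This is a sketch that repeats ```number_generator``` and collects the elements from both loop styles; the ```collected_*``` names are introduced only for this illustration.

```python
def number_generator(N: int):
    for i in range(N):
        yield i

# Style 1: plain for loop
collected_for = []
for element in number_generator(3):
    collected_for.append(element)

# Style 2: explicit iter()/next() with StopIteration handling
collected_manual = []
iterator = iter(number_generator(3))
while True:
    try:
        element = next(iterator)
    except StopIteration:
        break
    collected_manual.append(element)

# A generator is its own iterator, so iter() returns the same object
gen = number_generator(3)
same_object = iter(gen) is gen

print(collected_for, collected_manual, same_object)
```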

### (C) Length of the generator object

- You can't call ```len``` on a generator object directly.
- There are 2 ways to get the length of a generator  
    (1) convert the ```generator``` to a ```list``` and use the ```len``` function  
    (2) build another generator that yields 1 per element and pass it to ```sum```

In [52]:
# len() doesn't work on a generator object
n = 5
numbers = number_generator(n)
try:
    len_numbers = len(numbers)
except TypeError:
    print("TypeError: object of type 'generator' has no len()")
    

# (1) convert generator to list
numbers = number_generator(n)
numbers_list = list(numbers)
len_numbers = len(numbers_list)
print(f'(1) len of numbers: {len_numbers}')


# (2) sum function: counts the elements one by one without building a list
numbers = number_generator(n)
len_numbers = sum(1 for _ in numbers)
print(f'(2) len of numbers: {len_numbers}')

TypeError: object of type 'generator' has no len()
(1) len of numbers: 5
(2) len of numbers: 5
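
Both approaches consume the generator, so nothing is left to iterate over afterwards. The sketch below makes that visible; if both the length and the elements are needed, converting to a list once is the simpler route.

```python
def number_generator(N: int):
    for i in range(N):
        yield i

numbers = number_generator(5)
length = sum(1 for _ in numbers)   # counting walks through every element
leftover = list(numbers)           # the generator is now exhausted

# Keep the elements and the length together by materializing once
items = list(number_generator(5))
items_length = len(items)

print(length, leftover, items, items_length)
```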


## 4. Other examples


### (A) batch function

In [2]:
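# input: a sequence that supports len() and slicing (e.g. list, tuple, str), batch size
# output: generator object that yields one batch at a time
def batch(iterable, batch_size: int):
    length = len(iterable)
    for i in range(0, length, batch_size):
        yield iterable[i:min(length, i+batch_size)]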
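
The ```batch``` function above relies on ```len``` and slicing, so it can't take a generator as input. A possible variant for arbitrary iterables is sketched below with ```itertools.islice```; the name ```batch_any``` is hypothetical and not part of the original notebook.

```python
from itertools import islice

def batch_any(iterable, batch_size: int):
    iterator = iter(iterable)
    while True:
        # Pull at most batch_size items from the shared iterator
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return  # the input is exhausted
        yield chunk

batches = list(batch_any((x for x in range(7)), 3))
print(batches)
```

Python 3.12 and later also provide ```itertools.batched```, which does the same job and yields tuples.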

### (B) Read large file  

In [111]:
# (1) yield
def reader(file):
    # 'with' makes sure the file is closed when the generator finishes or is closed
    with open(file, 'r') as f:
        for row in f:
            yield row

        
# (2) generator expression
# example from https://realpython.com/introduction-to-python-generators/#creating-data-pipelines-with-generators
# file = "data.csv"
# lines = (line for line in open(file, 'r'))

# imagine that 'lines' holds the lines of a large file
lines = ['permalink,company,numEmps,category,city,state,fundedDate,raisedAmt,raisedCurrency,round',
'digg,Digg,60,web,San Francisco,CA,1-Dec-06,8500000,USD,b',
'digg,Digg,60,web,San Francisco,CA,1-Oct-05,2800000,USD,a',
'facebook,Facebook,450,web,Palo Alto,CA,1-Sep-04,500000,USD,angel',
'facebook,Facebook,450,web,Palo Alto,CA,1-May-05,12700000,USD,a',
'photobucket,Photobucket,60,web,Palo Alto,CA,1-Mar-05,3000000,USD,a',
]
# split each line into a list of fields, lazily
list_line = (s.rstrip().split(",") for s in lines)

# extract column names
cols = next(list_line)

company_dicts = (dict(zip(cols, data)) for data in list_line)
print(f'company_dicts : {company_dicts}')
for c in company_dicts:
    print(c)

company_dicts : <generator object <genexpr> at 0x7f8cc779c820>
{'permalink': 'digg', 'company': 'Digg', 'numEmps': '60', 'category': 'web', 'city': 'San Francisco', 'state': 'CA', 'fundedDate': '1-Dec-06', 'raisedAmt': '8500000', 'raisedCurrency': 'USD', 'round': 'b'}
{'permalink': 'digg', 'company': 'Digg', 'numEmps': '60', 'category': 'web', 'city': 'San Francisco', 'state': 'CA', 'fundedDate': '1-Oct-05', 'raisedAmt': '2800000', 'raisedCurrency': 'USD', 'round': 'a'}
{'permalink': 'facebook', 'company': 'Facebook', 'numEmps': '450', 'category': 'web', 'city': 'Palo Alto', 'state': 'CA', 'fundedDate': '1-Sep-04', 'raisedAmt': '500000', 'raisedCurrency': 'USD', 'round': 'angel'}
{'permalink': 'facebook', 'company': 'Facebook', 'numEmps': '450', 'category': 'web', 'city': 'Palo Alto', 'state': 'CA', 'fundedDate': '1-May-05', 'raisedAmt': '12700000', 'raisedCurrency': 'USD', 'round': 'a'}
{'permalink': 'photobucket', 'company': 'Photobucket', 'numEmps': '60', 'category': 'web', 'city': 
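
The standard library's ```csv.DictReader``` follows the same lazy pattern: it is an iterator that reads one row at a time and maps it to the column names. The sketch below uses ```io.StringIO``` as a stand-in for a real file and a trimmed-down copy of the sample data; the threshold of 100 employees is an arbitrary choice for illustration.

```python
import csv
import io

# Stand-in for a large CSV file on disk (assumption: three of the original columns)
fake_file = io.StringIO(
    "permalink,company,numEmps\n"
    "digg,Digg,60\n"
    "facebook,Facebook,450\n"
    "photobucket,Photobucket,60\n"
)

rows = csv.DictReader(fake_file)   # yields one dict per row, on demand
big = (row["company"] for row in rows if int(row["numEmps"]) > 100)

big_companies = list(big)          # the file is only read at this point
print(big_companies)
```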

## 5. Appendix - *generator* and *list*, which is faster?  

- We saw the memory efficiency of the generator compared to the list in **2-1 (Memory Efficient)**  
- In terms of *speed*, however, the list is *faster* than the generator.

In [82]:
import cProfile
N = 10**8
# list
print('List: ')
cProfile.run('sum([i ** 2 for i in range(N)])')
# nums = list(number_generator(N))
# cProfile.run('sum(nums)')

# generator
print('Generator: ')
cProfile.run('sum((i ** 2 for i in range(N)))')
# nums = number_generator(N)
# cProfile.run('sum(nums)')

List: 
         5 function calls in 17.865 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1   15.396   15.396   15.396   15.396 <string>:1(<listcomp>)
        1    0.693    0.693   17.865   17.865 <string>:1(<module>)
        1    0.000    0.000   17.865   17.865 {built-in method builtins.exec}
        1    1.776    1.776    1.776    1.776 {built-in method builtins.sum}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}


Generator: 
         100000005 function calls in 23.781 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
100000001   17.441    0.000   17.441    0.000 <string>:1(<genexpr>)
        1    0.000    0.000   23.781   23.781 <string>:1(<module>)
        1    0.000    0.000   23.781   23.781 {built-in method builtins.exec}
        1    6.340    6.340   23.781   23.781 {built-in method builtins.sum}
 

As you see above, the generator is *slower* than the list in this run.  
Specifically, the list profile reports that ```listcomp``` (the list comprehension) was called **1** time.  
In contrast, the generator profile reports that ```genexpr``` (the generator expression) was called **100000001** times.  
This is because the generator *produces only one element at a time* and has to be resumed for every element, which makes it *slower* than the list here.  
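
A profiler adds overhead to every recorded call, so the gap measured with ```cProfile``` is likely to be larger than in normal execution. As a cross-check, the same comparison can be sketched with ```timeit```. The value of ```N``` and the repeat count below are arbitrary, much smaller than in the run above, and the timings will differ from machine to machine.

```python
import timeit

N = 10**5  # kept small so the example finishes quickly

# Each statement is executed 20 times; 'globals' makes N visible to the timed code
list_time = timeit.timeit('sum([i ** 2 for i in range(N)])', globals={'N': N}, number=20)
gen_time = timeit.timeit('sum(i ** 2 for i in range(N))', globals={'N': N}, number=20)

print(f'list:      {list_time:.4f} s')
print(f'generator: {gen_time:.4f} s')

# Both variants compute the same value; only time and memory use differ
same_result = sum([i ** 2 for i in range(N)]) == sum(i ** 2 for i in range(N))
```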

## 6. References
- https://www.programiz.com/python-programming/iterator
- https://www.daleseo.com/python-yield/
- https://docs.python.org/3/library/itertools.html#itertools.islice
- https://realpython.com/introduction-to-python-generators/
- https://stackoverflow.com/questions/393053/length-of-generator-output
- https://medium.com/free-code-camp/python-list-comprehensions-vs-generator-expressions-cef70ccb49db