## Generator
A generator is a function that behaves like an iterator: it produces its values one at a time, on demand, instead of building them all up front.

How do a *list* and a *generator* differ?

In [None]:
### List
nums = [x**2 for x in range(1000000)]

### Generator
def gen_squares():
    for x in range(1000000):
        yield x**2

squares = gen_squares()
# squares

<generator object gen_squares at 0x7953cdd0eb50>

#### List comprehension
In the example below, we build a list of numbers from 0 to n-1. All the numbers are stored in memory inside the `numbers` list.  

What is the problem?  
If n is very large, storing every number in a list uses a lot of RAM. We may not be able to afford keeping all n integers in memory at once.

In [None]:
### Build and return a list
def get_numbers(n):
    num = 0
    numbers = []
    while num < n:
        numbers.append(num)
        num += 1
    return numbers

## Build the full list of numbers from 0 to 999999 in memory
a = get_numbers(1000000)
## Calculate the sum of the numbers in the list
sum_of_nums = sum(a)
sum_of_nums

499999500000

In [None]:
a[2000]

2000

#### Generator
A generator function produces the same sequence of numbers, but yields one number at a time instead of building and storing the full list.
This is more memory efficient, especially for large n.

In [None]:
## Generator that yields items instead of returning a list
def get_numbers(n):
    num = 0
    while num < n:
        yield num
        num += 1

## Calculate the sum of the numbers from 0 to 999999
b = get_numbers(1000000)
sum_of_nums = sum(b)
sum_of_nums

499999500000
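The memory claim can be checked roughly with `sys.getsizeof`. The sketch below is an illustration, not part of the original notebook: the names `as_list` and `as_gen` are made up here, and `sys.getsizeof` reports only the size of the container object itself (not the integers a list refers to), so the real gap is even larger than the numbers suggest. Exact byte counts depend on the Python version.

```python
import sys

n = 1_000_000

as_list = [x for x in range(n)]  # all n values exist in memory right away
as_gen = (x for x in range(n))   # nothing is computed until we iterate

list_size = sys.getsizeof(as_list)  # list container only, excludes the int objects
gen_size = sys.getsizeof(as_gen)    # the generator object itself

print("list bytes:     ", list_size)
print("generator bytes:", gen_size)
```

The generator's size stays the same no matter how large `n` is, because it only stores where it is in the loop.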

In [None]:
## Generators are iterators and don't store all their values in memory like a list does.
## Because of that, you cannot access an element by index, e.g. b[2000]

b[2000]

TypeError: 'generator' object is not subscriptable
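If you do need the item at a given position, one option is `itertools.islice`, which steps through the generator until it reaches that position. This is a sketch of one possible approach, not the notebook's own method; note that it consumes every value up to and including the one requested.

```python
from itertools import islice

def get_numbers(n):
    num = 0
    while num < n:
        yield num
        num += 1

b = get_numbers(1_000_000)

# islice(b, 2000, 2001) skips the first 2000 values and yields the next one
item = next(islice(b, 2000, 2001))
print(item)

# The generator has moved forward, so the next value is the one after `item`
following = next(b)
print(following)
```

Unlike `a[2000]` on a list, this is not a free lookup: the 2000 earlier values are generated and discarded along the way.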

Here a list comprehension builds a complete list of all 4-letter substrings (4-mers) of the DNA string. The entire list is stored in memory immediately.

To convert it to a generator expression, replace the square brackets `[]` with parentheses `()`. A generator expression produces one value at a time rather than a full list.

In [None]:
## list comprehension []
# dna = "aattagatagatgatgcgctcggcgcctcgaga"
# kmers = [dna[a:a+4] for a in range(len(dna) - 4 + 1)]  # + 1 so the last 4-mer is included
# kmers

## generator expression ()
dna = "aattagatagatgatgcgctcggcgcctcgaga"
kmers = (dna[a:a+4] for a in range(len(dna) - 4 + 1))  # + 1 so the last 4-mer is included
kmers

<generator object <genexpr> at 0x7d3b205d2500>
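A generator expression can only be walked through once. The sketch below illustrates this with the same DNA string; the variable `k` and the names `first_pass` and `second_pass` are introduced here for illustration, and the range uses `len(dna) - k + 1` so that the final window is included.

```python
dna = "aattagatagatgatgcgctcggcgcctcgaga"
k = 4  # window length, i.e. 4-mers

kmers = (dna[i:i + k] for i in range(len(dna) - k + 1))

first_pass = list(kmers)   # consumes every value the generator can produce
second_pass = list(kmers)  # the generator is now exhausted

print(first_pass[:3])
print(len(first_pass))
print(second_pass)
```

If the values are needed more than once, either rebuild the generator expression or keep the list.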

In [None]:
## list comprehension
doubles = [2 * n for n in range(50)]
# print(type(doubles))
print(*doubles)  # the asterisk unpacks the iterable into positional arguments

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70 72 74 76 78 80 82 84 86 88 90 92 94 96 98


In [None]:
## generator expression
doubles = (2 * n for n in range(50)) # not a tuple
print(doubles)

## To access the values, you must iterate:
# for num in doubles:
    # print(num)

<generator object <genexpr> at 0x7d3b204e6e90>
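Printing a generator only shows the object, so the values have to be pulled out by iterating. As a small sketch (the names `first_five` and `rest_total` are illustrative), `next()` takes values one by one, and functions such as `sum()` consume whatever is left:

```python
doubles = (2 * n for n in range(50))

# Pull the first five values one at a time
first_five = [next(doubles) for _ in range(5)]
print(first_five)

# sum() iterates over the values that have not been consumed yet
rest_total = sum(doubles)
print(rest_total)
```

The second result covers only the values from 10 to 98, because the first five were already taken.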


### `yield`
To make a function a generator, use `yield`. Each time execution reaches `yield`, the function returns a value and pauses; on the next request it resumes where it left off. This is efficient for large sequences where you do not need to generate the entire list at once.

In [None]:
def a(n):
  for i in range(n):
    yield i/2

In [None]:
## Create a generator object with a(9) and collect its values in a list
list(a(9))

[0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
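To see the pausing and resuming, the sketch below records events in a list as they happen. The function name `halves` and the `log` list are hypothetical, added only for this illustration.

```python
log = []

def halves(n):
    for i in range(n):
        log.append(f"before yield {i}")
        yield i / 2
        log.append(f"after yield {i}")

gen = halves(2)
log.append("created")      # no function body code has run yet

first = next(gen)          # runs up to the first yield, then pauses
log.append("got first")

second = next(gen)         # resumes after the first yield, pauses at the second

for entry in log:
    print(entry)
```

The log shows that calling `halves(2)` runs nothing, and that the code after each `yield` only executes when the next value is requested.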

In [None]:
## Another example: a regular function that builds and returns a list
def sq_num(nums):
  res = []
  for i in nums:
    res.append(i*i)
  return res

my_nums = sq_num([1, 2, 3])
my_nums

[1, 4, 9]

This generator function uses `yield` to return the square of each number in `nums`, one at a time. Unlike `return`, which ends the function, `yield` pauses the function and remembers where it left off.

The first call to `next()` yields 1; the generator then pauses and waits for the next call.


In [None]:
## Turn it into a generator function
# Do not hold the entire result in memory; yield one result at a time
def sq_nums(nums):
  for i in nums:
    yield i*i

my_nums = sq_nums([1, 2, 3]) # Creates a generator object
# print(my_nums)

## The generator yields the first element and pauses, waiting for the next call
next(my_nums)
next(my_nums)
next(my_nums)
# What happens when the elements run out? A further next() call raises a StopIteration exception


9
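What happens on a fourth call can be sketched as follows. The variable names `values`, `exhausted`, and `fallback` are illustrative; `next(iterator, default)` is the built-in two-argument form that returns the default instead of raising.

```python
def sq_nums(nums):
    for i in nums:
        yield i * i

my_nums = sq_nums([1, 2, 3])
values = [next(my_nums), next(my_nums), next(my_nums)]
print(values)

# A fourth call has nothing left to yield and raises StopIteration
try:
    next(my_nums)
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)

# Passing a default value avoids the exception
fallback = next(my_nums, None)
print(fallback)
```

A `for` loop handles `StopIteration` automatically, which is why looping over a generator ends cleanly.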

In [None]:
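## A class-based iterator that produces the odd numbers from 3 up to max
class get_odds:
    def __init__(self, max):
        self.n = 3
        self.max = max

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.n <= self.max:
            result = self.n
            self.n += 2
            return result
        else:
            # Signal that there are no values left
            raise StopIteration

numbers = get_odds(10)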

In [None]:
print(numbers)

<__main__.get_odds object at 0x7d3aef663290>


In [None]:
next(numbers)
next(numbers)

5
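For comparison, the same sequence can probably be written more compactly as a generator function, since Python then supplies `__iter__`, `__next__`, and the `StopIteration` handling. This is a sketch; the names `get_odds_gen` and `max_value` are hypothetical and not part of the original notebook.

```python
def get_odds_gen(max_value):
    n = 3  # same starting point as the class-based version
    while n <= max_value:
        yield n
        n += 2

numbers = get_odds_gen(10)
first = next(numbers)
second = next(numbers)
remaining = list(numbers)

print(first, second, remaining)
```

The state that the class kept in `self.n` now lives in the local variable `n`, which is preserved between calls while the function is paused.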

## Let's practice

Create two different number generator functions:
1.   Write a function get_numbers_list(n) that returns a list of numbers from 0 to n-1
2.   Write another function get_numbers_gen(n) that yields numbers from 0 to n-1

In [None]:
def get_numbers_list(n):
    ## Your code here
    pass

In [None]:
def get_numbers_gen(n):
    ## Your code here
    pass

To compare efficiency, we can use the `time` library to measure how long it takes to sum the numbers with each version.

In [None]:
import time

## Time to run list
start = time.time()  # current time in seconds, recorded just before the work starts
sum(get_numbers_list(10_000_000))
end = time.time()
print("List time:", end - start)

## Time to run generator
start = time.time()
sum(get_numbers_gen(10_000_000))
end = time.time()
print("Generator time:", end - start)


List time: 3.4001386165618896
Generator time: 1.1194005012512207
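Timing results vary between machines and runs, so it can also help to look at memory directly. The sketch below uses the standard-library `tracemalloc` module to compare peak memory while summing squares from a list and from a generator expression. The helper `peak_bytes` and the value of `n` are assumptions chosen for this illustration, and the exact byte counts will differ between Python versions.

```python
import tracemalloc

def peak_bytes(make_iterable, n):
    """Return (total, peak traced memory in bytes) while summing make_iterable(n)."""
    tracemalloc.start()
    total = sum(make_iterable(n))
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    tracemalloc.stop()
    return total, peak

n = 200_000

# List version: all n squares are stored before sum() starts
list_total, list_peak = peak_bytes(lambda n: [x * x for x in range(n)], n)

# Generator version: each square is produced, added, and discarded
gen_total, gen_peak = peak_bytes(lambda n: (x * x for x in range(n)), n)

print("list peak bytes:     ", list_peak)
print("generator peak bytes:", gen_peak)
```

Both versions compute the same total; only the amount of memory held at one time differs.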


## Reflect on the following:
1. What is the difference between how `get_numbers_list(n)` and `get_numbers_gen(n)` behave?
2. What happens to memory usage when `n` is very large (e.g. 1,000,000)?
3. Can you print all elements of the generator using a for-loop?
4. What happens if you try to use `next()` on a generator that has no values left?
5. Which version would you use for a data stream with billions of records and why?