# Generators

Consider the following code that computes the sum of squared numbers up to N.

In [11]:
def squared_numbers(n):
    return [x*x for x in range(n)]

def sum_squares(n):
    return sum(squared_numbers(n+1))

sum_squares(20000000)

2666666866666670000000

The code works, but it has one flaw: it builds a list of all the squares from 0 to N in memory before summing them. If N is large, this uses a lot of extra memory, which may cause the system to start swapping or to run out of memory.

In this case it is not necessary to create the entire list in memory. The ``sum`` function iterates over its input and only needs the running total and the next value at any one time.
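To make that concrete, here is a rough sketch of what a summing loop has to keep track of. The name ``running_sum`` is made up for illustration; this is a model of the behaviour, not how the built-in ``sum`` is actually implemented.

```python
def running_sum(iterable):
    """Hypothetical stand-in for sum(): keeps one total and one current value."""
    total = 0
    for value in iterable:   # fetch the next value only when it is needed
        total += value       # earlier values are never kept around
    return total

print(running_sum([0, 1, 4, 9]))        # works on a list
print(running_sum(iter([0, 1, 4, 9])))  # works equally well on a one-shot iterator
```

Nothing in the loop requires the input to be a list; anything that can hand out values one by one will do.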

The Python keyword ``yield`` is used to achieve this. Using ``yield`` anywhere in a function body makes that function a generator function: calling it returns a generator instead of running the body straight away. A generator can be iterated over like a list, but it only produces each value when the next one is requested.
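The on-demand behaviour can be observed by pulling values out one at a time with the built-in ``next``. The function ``countdown`` below is a made-up example and is not used elsewhere in this section.

```python
def countdown(start):
    """Hypothetical generator: yields start, start-1, ..., 1."""
    while start > 0:
        yield start
        start -= 1

gen = countdown(3)     # nothing in the function body has run yet
first = next(gen)      # runs the body up to the first yield
second = next(gen)     # resumes after that yield and runs to the next one
remaining = list(gen)  # drains whatever is left
print(first, second, remaining)
```

Once drained, the generator is exhausted: iterating over ``gen`` again produces no further values.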

In [12]:
def squared_numbers_alternate(n):
    for x in range(n):
        yield x*x
        
def sum_squares_alternate(n):
    return sum(squared_numbers_alternate(n+1))

sum_squares_alternate(20000000)

2666666866666670000000
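One way to see the difference, as a rough sketch, is to compare the size of the two kinds of object with ``sys.getsizeof``. Two caveats: ``getsizeof`` reports only the container itself (for a list, the array of references, not the integers it refers to), and the exact byte counts depend on the Python version and platform, so none are shown here. The functions are repeated under the made-up names ``squares_list`` and ``squares_gen`` to keep the cell self-contained.

```python
import sys

def squares_list(n):
    return [x * x for x in range(n)]

def squares_gen(n):
    for x in range(n):
        yield x * x

n = 100_000
list_size = sys.getsizeof(squares_list(n))  # grows with n
gen_size = sys.getsizeof(squares_gen(n))    # does not depend on n
print(list_size, gen_size)

# The same lazy behaviour is available inline, without a separate function,
# by writing a comprehension in parentheses instead of square brackets.
print(sum(x * x for x in range(n)) == sum(squares_list(n)))
```

The inline form is convenient when the values are consumed in a single place, as they are by ``sum`` here.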

At this point you may wonder: doesn't ``range()`` return a list? In Python 3 the short answer is no, but the details are complicated.
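A small sketch of what this looks like in Python 3 (in Python 2, ``range`` did build a list, and ``xrange`` was the lazy variant). A range object stores only its start, stop and step, and computes elements on request.

```python
import sys

r = range(10**9)             # created instantly; no elements are stored
print(type(r).__name__)      # the type is 'range', not 'list'
print(len(r), r[10], r[-1])  # length and indexing work without building a list
print(999 in r)              # membership is computed arithmetically for ints

# The object's size does not grow with the number of elements it represents
print(sys.getsizeof(range(10)) == sys.getsizeof(range(10**9)))
```

Unlike a generator, a range object can be iterated over any number of times and supports indexing, which is part of why the full story is more involved.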

### Synthesis

Python is often used to process text files, some of which may be quite large. A single line of a text file is typically small, however. The following pattern makes it possible to cleanly process, one line at a time, a file much larger than would fit in memory.

In [14]:
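def read_file_by_lines(filename):
    # Yield the file one line at a time; the file stays open while the
    # generator is being consumed and is closed once it is exhausted.
    with open(filename, "r") as fileobj:
        for line in fileobj:
            yield line

def process_file(input_, output_):
    with open(output_, "w") as outfile:
        for line in read_file_by_lines(input_):
            # line[::-1] reverses the string; note that the trailing
            # newline ends up at the start of the reversed line
            outfile.write(line[::-1])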
            
process_file("notes.txt", "seton.txt")
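Since ``notes.txt`` may not exist on your machine, the following sketch exercises the same reading pattern on a temporary file. The file contents and the ``count_nonblank_lines`` helper are made up for illustration.

```python
import os
import tempfile

def read_file_by_lines(filename):
    with open(filename, "r") as fileobj:
        for line in fileobj:
            yield line

def count_nonblank_lines(filename):
    """Hypothetical consumer: holds only one line in memory at a time."""
    return sum(1 for line in read_file_by_lines(filename) if line.strip())

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")
    with open(path, "w") as f:
        f.write("first\n\nsecond\nthird\n")
    nonblank = count_nonblank_lines(path)
    print(nonblank)
```

The consumer never sees the whole file at once, so the same code should work unchanged on inputs far larger than this four-line sample.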