# Iterators and Generators

In this notebook we will learn about iterators and generator functions!

You have already used iterators in the prerequisite course, but we have not formally introduced them yet.  We will introduce them, and then learn how to write generator functions which allow us to define custom iterators.

In [None]:
import os
from IPython.core.debugger import set_trace

## Introducing Iterators: another look at range()

If you remember from the previous course, we introduced the `range()` function when discussing for-loops. Strictly speaking, `range()` returns an iterable `range` object. A for-loop asks that object for an iterator, which then steps through the values that we specified when we called `range()`.

The iterator only calculates and yields one value at a time. Below, there are two examples using `range()`. In example 1, `range()` iterates from 5 through 7. It does not calculate or store those numbers in memory up front; it calculates them only when they are needed, one at a time.

In [None]:
print('range example 1:')
for x in range(5, 8):
    print(x)
    
print('range example 2:')
my_range = range(11, 23)
for x in my_range:
    print(x)
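
One way to see this laziness is to compare the memory footprint of a `range` object with that of the equivalent list. The following is a minimal sketch; the names `big_range` and `big_list` are illustrative, and the exact byte counts depend on your Python build, so none are shown.

```python
import sys

big_range = range(1_000_000)
big_list = list(big_range)

# The range object only needs to remember its start, stop and step,
# so it stays small no matter how many values it represents.
print(sys.getsizeof(big_range))

# The list holds a reference to every one of its elements,
# so it takes far more memory.
print(sys.getsizeof(big_list))
```

The first number printed should be much smaller than the second, because the list has to hold all one million entries at once.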

## File Objects Are Also Iterators
You may remember from the prerequisite course how we opened files and iterated through the lines one at a time. I will walk through the example below in the lecture. The main takeaway is that each line is read in (iterated through) one at a time. The entire file is not read up front; each line is read into this notebook only as it is needed.

In [None]:
filepath = os.path.join(os.getcwd(), 'AAA_Fuel_Prices.csv')
count = 0
with open(filepath, 'r') as my_file:
    for line in my_file:
        line = line.strip()
        print(line)
        count += 1
        if count > 10:
            break
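
The counter-and-`break` pattern above can also be written with `itertools.islice()`, which takes the first few items from any iterator. This is only a sketch of an alternative. To keep it self-contained it writes a small temporary file rather than using `AAA_Fuel_Prices.csv`, and the names `sample_path` and `first_lines` are made up for illustration.

```python
import itertools
import os
import tempfile

# Create a small throwaway file so the example runs on its own.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    for i in range(100):
        tmp.write(f"row {i}\n")
    sample_path = tmp.name

# islice() stops asking the file for lines after the first three,
# so the loop never iterates over the remaining lines.
with open(sample_path, 'r') as my_file:
    first_lines = [line.strip() for line in itertools.islice(my_file, 3)]

os.remove(sample_path)  # clean up the temporary file
print(first_lines)  # ['row 0', 'row 1', 'row 2']
```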

## Iterators Can Also Be Made From Lists (as well as other data types)

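Iterators can also be made out of lists. This is what happens behind the scenes when we use a list in a for loop: Python calls `iter()` on the list and then repeatedly asks the resulting iterator for its next value.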

In [None]:
my_list = [1, 2, 3, 4, 5]
for x in my_list:
    print(x)

In [None]:
my_list_iterator = iter(my_list)

print(type(my_list_iterator))
print(my_list_iterator.__next__())
print(my_list_iterator.__next__())
print(my_list_iterator.__next__())
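
The built-in `next()` function is the usual way to call an iterator's `__next__()` method. When an iterator runs out of values it raises `StopIteration`, which is the signal a for loop uses to know when to stop. Here is a short sketch of both ideas; `short_list` and `exhausted` are illustrative names.

```python
short_list = [10, 20]
short_list_iterator = iter(short_list)

# next(it) is equivalent to it.__next__()
print(next(short_list_iterator))  # 10
print(next(short_list_iterator))  # 20

# A third call has nothing left to return, so StopIteration is raised.
try:
    next(short_list_iterator)
    exhausted = False
except StopIteration:
    exhausted = True
    print("The iterator has no more values")

# next() also accepts a default to return instead of raising.
print(next(short_list_iterator, "no more values"))  # no more values
```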

## Introducing Generators

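You may wonder whether we can write our own functions that, like `range()`, return something we can iterate through. We can! To do so, we use the keyword `yield` instead of `return`. A function that contains `yield` is called a generator function, and calling it returns a generator, which is a kind of iterator.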

### Our Own Version Of Range

Let's write our own version of the `range()` function. We need to write a function that will yield numbers between a beginning and ending number. Note that when the function reaches the `yield` keyword, it hands back that value (in this case, the value of `i`) and pauses execution until it is asked for the next value. Let's talk through the example below.

In [None]:
def my_range(beg, end):
    """Generate numbers from beg up to, but not including, end"""
    i = beg
    while i < end:
        yield i
        i += 1
    

In [None]:
# Let's call the function to return a generator that we can iterate through. 
range_of_nums = my_range(0, 10)

# Now, let's call the __next__() method to get each value as it is
# "yielded" by the generator

# This executes the code in the generator until it hits the yield
# statement for the first time.  It then stops until __next__() is called again.
print(range_of_nums.__next__())

# This now resumes executing the code in the generator until it hits the yield
# statement a second time. It then stops until __next__() is called again.
print(range_of_nums.__next__())

# This now resumes executing the code in the generator until it hits the yield
# statement a third time. It then stops until __next__() is called again.
print(range_of_nums.__next__())
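
To make the pausing and resuming visible, we can put `print()` calls inside a generator. The sketch below uses a hypothetical `traced_range()` function, which is the same idea as `my_range()` with messages added.

```python
def traced_range(beg, end):
    """Like my_range, but prints a message around every yield."""
    i = beg
    while i < end:
        print(f"  generator: about to yield {i}")
        yield i
        # Execution picks up here the next time a value is requested.
        print(f"  generator: resumed after yielding {i}")
        i += 1
    print("  generator: loop finished")


numbers = traced_range(0, 2)
print("Created the generator; none of its code has run yet")

first = next(numbers)
print("Got", first)

second = next(numbers)
print("Got", second)
```

Notice that calling `traced_range(0, 2)` prints nothing from inside the function. The body only starts running on the first `next()` call, and each later call resumes on the line right after `yield`.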

We do not usually call the `__next__()` method directly. We usually loop over the iterator or pass it to another iterative process (in these cases, `__next__()` is still used "under the hood", but we are not calling it directly as programmers). Below, we simply use `range_of_nums` in a loop. We also use the `my_range()` function directly in a for loop, just like you would use `range()`.

In [None]:
print("Example 1:")
range_of_nums = my_range(0, 3)
print(type(range_of_nums))
for num in range_of_nums:
    print(num)

print("Example 2:")
for num in my_range(30, 33):
    print(num)

### A Fibonacci Series Generator
A Fibonacci Series is a series of numbers in which the next number is the sum of the two preceding numbers. If we start with 0 and 1, then the series is 0, 1, 1, 2, 3, 5, 8, 13, and so on. This is a fun series that is often used in computer science lessons. Let's code a function that will return a generator that iterates through the first `N` numbers of the Fibonacci Series (starting with 0 and 1).

In [None]:
def fibonacci_series(N):
    """Generate the first N Fibonacci numbers (N >= 1), starting at 0 and 1"""
    # We start by seeding 0 and 1 as the first two numbers
    i_prev = 0
    i = 1
    yield i_prev  # we yield 0 first
    # now in the following loop, we yield "i" and then calculate i_next by
    # summing the two previous
    for _ in range(N-1):
        yield i 
        i_next = i + i_prev
        i_prev = i
        i = i_next

In [None]:
f_s = fibonacci_series(10)
for x in f_s:
    print(x)

### Iterators Can Only Be Iterated Over Once

Once we create an iterator, it can only be iterated over once.  For example, in the above cell we looped over the entirety of `f_s`.  Below, we try to loop over it again, but nothing prints. This is because we have already looped over the iterator to its end.  If we need to iterate again, we will have to create a new iterator.

In [None]:
for x in f_s:
    print(x)
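
There are two common ways to get the values again: call the generator function a second time to get a fresh generator, or store the values in a list if you know you will need them more than once. The sketch below uses a small hypothetical `count_up()` generator so that it runs on its own.

```python
def count_up(limit):
    """Hypothetical helper: yield 0, 1, ..., limit - 1."""
    for i in range(limit):
        yield i


gen = count_up(3)
first_pass = list(gen)    # consumes the generator
second_pass = list(gen)   # nothing is left to consume
print(first_pass)   # [0, 1, 2]
print(second_pass)  # []

# Option 1: call the generator function again for a fresh generator.
fresh_pass = list(count_up(3))
print(fresh_pass)   # [0, 1, 2]

# Option 2: keep the values in a list, which can be reused freely.
saved = list(count_up(3))
print(sum(saved), max(saved))  # 3 2
```

Storing the values in a list gives up the memory savings of the generator, so it is best kept for cases where the data is small enough to hold in memory.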

### Another Fibonacci Series Generator, Using Tuple Unpacking
The above function we coded is great, but we can define the function using fewer lines if we take advantage of tuple unpacking. This allows us to define and update multiple variables in one line. I will talk us through the example below.

In [None]:
def fibonacci_series_2(N):
    """Generate the first N Fibonacci numbers (N >= 1), starting at 0 and 1"""
    # We start by seeding 0 and 1 as the first two numbers
    i_prev, i = 0, 1
    yield i_prev  # we yield 0 first
    # now in the following loop, we yield "i" and then update both variables
    # so that "i" becomes the sum of the two previous numbers
    for _ in range(N-1):
        yield i
        # we use tuple unpacking to update the variables in one line. This way
        # we do not need to define an i_next variable.
        i, i_prev = i + i_prev, i

In [None]:
for x in fibonacci_series_2(10):
    print(x)
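
The one-line update works because Python evaluates the entire right-hand side before assigning anything on the left. The short sketch below, with throwaway variable names, contrasts that with updating the variables one after another.

```python
# Tuple unpacking: both right-hand values are computed from the OLD a and b.
a, b = 3, 5
a, b = a + b, a
print(a, b)  # 8 3

# Sequential assignment: the second line sees the NEW value of c,
# so the old value of c is lost.
c, d = 3, 5
c = c + d
d = c
print(c, d)  # 8 8
```

This is why the first version of the function needed the temporary variable `i_next`, and the tuple-unpacking version does not.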

## File Word Counts

Another example, which will become more meaningful if you take the course Python Data Structures, Data Mining and Big Data, is producing word counts from a file, line by line.

Let's write a function that will generate word counts, from a file, line by line.

In [None]:
def file_word_count(filepath):
    """Generate word counts from a file, line by line"""
    # First, open the file
    with open(filepath, 'r') as my_file:
        # Loop through the lines
        for line in my_file:
            line = line.strip()  # strip whitespace from the line
            words = line.split()  # split the line into words
            # create a dictionary that we will store the word counts in
            word_count_dict = {}
            # loop through the words in the line and tally them in the
            # dictionary
            for word in words:
                if word in word_count_dict:
                    word_count_dict[word] += 1
                else:
                    word_count_dict[word] = 1
            # now loop through the dictionary and yield up the word counts 
            for word in word_count_dict:
                yield word, word_count_dict[word]

In [None]:
aesopa10_path = os.path.join(os.getcwd(), 'aesopa10.txt')
counter = 0
for word, count in file_word_count(aesopa10_path):
    print(word, count)
    counter += 1
    if counter > 2000:
        break
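
Because the dictionary is reset for every line, a word that appears on several lines is yielded several times, once per line. The sketch below illustrates this on a small in-memory sample instead of `aesopa10.txt`. It is a variation, not the function above: the hypothetical `line_word_counts()` takes any iterable of lines and uses `collections.Counter` from the standard library to do the tallying.

```python
import io
from collections import Counter


def line_word_counts(lines):
    """Yield (word, count) pairs for each line of an iterable of lines."""
    for line in lines:
        # Counter tallies the words of this one line.
        counts = Counter(line.split())
        for word, count in counts.items():
            yield word, count


# io.StringIO behaves like an open text file, so we can iterate over its lines.
sample_text = io.StringIO("the fox and the crow\nthe crow and the cheese\n")
pairs = list(line_word_counts(sample_text))
print(pairs)

# Summing the per-line counts gives totals for the whole text.
totals = Counter()
for word, count in pairs:
    totals[word] += count
print(totals["the"])  # 4
```

Here `"the"` is yielded twice, with a count of 2 each time, and the totals are only combined in the final loop.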