# Iterators

**An iterator is an object representing a stream of data**

This object returns the data one element at a time.

---

https://docs.python.org/dev/library/stdtypes.html#iterator-types

https://docs.python.org/dev/howto/functional.html#iterators

---

A Python iterator must support a method called `__next__()` that takes no arguments and always returns the next element of the stream.

If there are no more elements in the stream, `__next__()` must raise the `StopIteration` exception.

*(Anything that supports this "protocol" is an Iterator)*

### Iterables

The built-in `iter()` function takes an arbitrary object and tries to return an iterator that will return the object’s contents or elements, raising `TypeError` if the object doesn’t support iteration.

An object is called `iterable` if you can get an iterator for it.
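A short sketch of the difference between an iterable and an iterator, using only built-ins (`numbers` and `it` are illustrative names). A list is iterable; `iter()` turns it into an iterator, and an iterator is itself iterable because `iter()` on it returns the same object:

```python
# A list is iterable: iter() returns an iterator over its elements
numbers = [1, 2, 3]
it = iter(numbers)

print(next(it))          # 1
print(next(it))          # 2

# An iterator is iterable too: iter() on it returns the very same object
print(iter(it) is it)    # True

# An int does not support iteration, so iter() raises TypeError
try:
    iter(42)
except TypeError as exc:
    print("TypeError:", exc)
```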

---


```python
# most container objects can be iterated over using a `for` loop:

my_str = "qwerty"

for i in my_str:
    print(i)
```

Output:

```
q
w
e
r
t
y
```


---

Behind the scenes, the `for` statement calls `iter()` on the container object. 

The function returns an iterator object that defines the method `__next__()` which accesses elements in the container one at a time. When there are no more elements, `__next__()` raises a `StopIteration` exception which tells the `for` loop to terminate.
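A simplified sketch of what the `for` loop above does behind the scenes. The real loop is implemented inside the interpreter, so treat this `while` loop as an equivalent model rather than the literal code that runs; `chars` is an illustrative name used only to keep a copy of the visited elements:

```python
my_str = "qwerty"
chars = []

# Roughly equivalent to: for ch in my_str: print(ch)
it = iter(my_str)                # 1. get an iterator from the iterable
while True:
    try:
        ch = next(it)            # 2. ask the iterator for the next element
    except StopIteration:        # 3. no more elements: leave the loop
        break
    print(ch)                    # 4. run the loop body
    chars.append(ch)             # keep a copy so the result can be checked
```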

https://docs.python.org/3/tutorial/classes.html#iterators

You can call the `__next__()` method using the `next()` built-in function:

```python
it1 = iter(my_str)

next(it1)   # returns 'q'
```

### Implementing iterators

To add iterator behavior to your classes, define an `__iter__()` method which returns an object with a `__next__()` method.

If the class defines `__next__()`, then `__iter__()` can just return self.

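```python
class CountFrom:
    """Infinite iterator: start, start + step, start + 2*step, ..."""

    def __init__(self, start, step=1):
        self.current = start
        self.step = step

    def __next__(self):
        # return the current value, then advance by `step`
        value = self.current
        self.current += self.step
        return value

    def __iter__(self):
        # the class defines __next__(), so it is its own iterator
        return self
```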


```python
cnt = CountFrom(-10)

cnt
```

```python
dir(cnt)
```

```python
next(cnt)   # returns -10; later calls return -9, -8, ...
```
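`CountFrom` never stops. A finite iterator has to signal its end by raising `StopIteration`, as the protocol requires. A minimal sketch with a hypothetical `Countdown` class; it also shows that `next()` accepts a default value to return instead of raising:

```python
class Countdown:
    """Finite iterator: n, n - 1, ..., 1, then StopIteration."""

    def __init__(self, n):
        self.current = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            # end of the stream: tell the caller (e.g. a for loop) to stop
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


print(list(Countdown(3)))    # [3, 2, 1]

cd = Countdown(1)
print(next(cd))              # 1
print(next(cd, "done"))      # done  (the default is returned instead of raising)
```

Note that an exhausted `Countdown` stays exhausted: iterating over the same object a second time yields nothing.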

---

# Generators

(Functions that behave like iterators)

---
**Generator = A function which returns an Iterator**

It looks like a normal function except that it contains `yield` expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the `next()` function.

* *Any function that contains `yield` becomes a generator!*
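A small sketch of how literally this holds: the `yield` below can never execute, yet its presence alone makes `never_yields` (a hypothetical name) a generator function. Calling it runs no code; it only creates a generator object, which finishes without producing any value:

```python
import inspect

def never_yields():
    # this `yield` is never reached, but it still turns the whole
    # function into a generator function
    if False:
        yield "unreachable"

print(inspect.isgeneratorfunction(never_yields))   # True
gen = never_yields()       # creates a generator object; the body has not started
print(list(gen))           # []
```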

https://docs.python.org/3.6/glossary.html#term-generator

https://docs.python.org/3.6/howto/functional.html#generators

---

**Lazy execution**

Values are produced on demand, only when the consumer asks for the next one: this is *lazy execution*.

Because nothing is computed in advance, a generator or iterator can be infinite.
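A sketch of working safely with an infinite generator: `itertools.islice()` takes only the first few values, so the rest are never computed. `naturals` is a hypothetical helper; calling `list(naturals())` directly would never finish:

```python
from itertools import islice

def naturals():
    """Infinite generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

# only the first 5 values are ever produced
print(list(islice(naturals(), 5)))        # [0, 1, 2, 3, 4]

# a lazy pipeline over an infinite stream: squares of the even numbers
evens_squared = (n * n for n in naturals() if n % 2 == 0)
print(list(islice(evens_squared, 4)))     # [0, 4, 16, 36]
```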

---

```python
def count_from(start):

    i = start

    # infinite loop
    while True:

        # hand the next value to the caller and pause here
        yield i

        i += 1
```

```python
count_from
```

```python
# calling a generator function does not run its body; it returns a generator object
res = count_from(0)
res
```

```python
help(res)
```

```python
next(res)
```

```python
for i in range(5):
    print(next(res))
```

## How do generators work?

Let's take a look:

```python
def count_from(start):

    print("Let's start")

    i = start

    while True:

        print(" ... before yield")
        yield i

        print(" ... after yield")
        i += 1

    # never reached: the `while True` loop only pauses at `yield`, it never ends
    print("Exiting")
```

```python
res = count_from(42)
```

```python
# see how it works
next(res)
```

```python
next(res)
```
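A sketch of a *finite* generator, to see what happens when the function body actually ends (unlike `count_from`, whose `"Exiting"` line is never reached). `inspect.getgeneratorstate()` reports where the generator is; `two_values` is a hypothetical example:

```python
import inspect

def two_values():
    print("start")
    yield 1
    print("between")
    yield 2
    print("end")        # runs on the third next(); then the function returns

gen = two_values()
print(inspect.getgeneratorstate(gen))   # GEN_CREATED: body not started yet
print(next(gen))                        # prints "start", then 1
print(inspect.getgeneratorstate(gen))   # GEN_SUSPENDED: paused at the first yield
print(next(gen))                        # prints "between", then 2
try:
    next(gen)                           # prints "end"; returning raises StopIteration
except StopIteration:
    print("StopIteration")
print(inspect.getgeneratorstate(gen))   # GEN_CLOSED
```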

### Data processing example

```python
data = "Some134 content __here, @1441   needs cleanup  "
```

```python
def get_tokens(data_in):

    for item in data_in.split():
        yield item
```

```python
tokens = get_tokens(data)

for i in tokens:
    print(i)
```
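The string above still "needs cleanup". A sketch of how several small generators can be chained so each token flows through every stage one at a time. The cleanup rules (keep only ASCII letters, drop empty tokens, lowercase) and the stage names are assumptions chosen for illustration:

```python
import re

data = "Some134 content __here, @1441   needs cleanup  "

def get_tokens(text):
    # stage 1: split on whitespace
    yield from text.split()

def strip_non_letters(tokens):
    # stage 2: remove every character that is not an ASCII letter
    for tok in tokens:
        yield re.sub(r"[^A-Za-z]", "", tok)

def drop_empty(tokens):
    # stage 3: skip tokens that became empty (e.g. "@1441")
    for tok in tokens:
        if tok:
            yield tok

def lowercase(tokens):
    # stage 4: normalize case
    for tok in tokens:
        yield tok.lower()

pipeline = lowercase(drop_empty(strip_non_letters(get_tokens(data))))
print(list(pipeline))   # ['some', 'content', 'here', 'needs', 'cleanup']
```

No stage builds an intermediate list, so the same chain would work unchanged on a very large input.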

## Generators -> Data Pipelines

---

### Generator Tricks for Systems Programmers

**by David Beazley**

http://www.dabeaz.com/generators-uk/
  * see the link for *source data* and *code examples*

---

**presentation slides:**
http://www.dabeaz.com/generators-uk/GeneratorsUK.pdf

* ... background (generator functions, generator expressions) - from slide 24
* Part 2 (Processing data files) - **from slide 35**


### Processing [huge] data files

**... using Generator expressions**

*see: Processing data files - from slide 35*

without generators:

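```python
# Python 3 version; "access-log" is the sample log file from the talk's materials
total = 0
with open("access-log") as wwwlog:
    for line in wwwlog:
        # the last column holds the number of bytes sent ("-" if none)
        bytestr = line.rsplit(None, 1)[1]
        if bytestr != '-':
            total += int(bytestr)
print("Total", total)
```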

with generators:

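```python
with open("access-log") as wwwlog:
    bytecolumn = (line.rsplit(None, 1)[1] for line in wwwlog)
    byte_counts = (int(x) for x in bytecolumn if x != '-')

    # sum() pulls the values through the pipeline while the file is still open
    print("Total", sum(byte_counts))
```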
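The generator version can be tried without the real log file. This sketch swaps the file for an in-memory `io.StringIO` holding three made-up log lines (the addresses, paths, and byte counts are invented); any iterable of lines works, because the pipeline only ever asks for the next line:

```python
import io

# made-up sample lines; the last column is the number of bytes sent
sample_log = io.StringIO(
    '10.0.0.1 - - [01/Jan/2020:00:00:01 +0000] "GET /index.html HTTP/1.1" 200 7587\n'
    '10.0.0.2 - - [01/Jan/2020:00:00:02 +0000] "GET /missing HTTP/1.1" 404 133\n'
    '10.0.0.3 - - [01/Jan/2020:00:00:03 +0000] "GET /logo.gif HTTP/1.1" 304 -\n'
)

bytecolumn = (line.rsplit(None, 1)[1] for line in sample_log)
byte_counts = (int(x) for x in bytecolumn if x != '-')

total = sum(byte_counts)
print("Total", total)   # Total 7720
```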

### Fun with files and directories (part 3)

*see: Part 3 (Fun with files and directories)*

generator to list files matching a pattern:

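```python
import os
import fnmatch

def gen_find(filepat, top):
    # walk the directory tree rooted at `top`
    for path, dirlist, filelist in os.walk(top):
        # keep only the file names matching the shell-style pattern
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)
```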

using this generator:

`logs = gen_find("access-log*","/usr/www/")`

**see slide 59 -> how a chain of generators is used to process these files**
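A minimal sketch of such a chain, under simplifying assumptions: the helpers `gen_open`, `gen_cat`, and `gen_grep` here are hypothetical and handle plain-text files only, the filter is a substring match, and two tiny made-up log files are created in a temporary directory so the example runs anywhere:

```python
import fnmatch
import os
import tempfile

def gen_find(filepat, top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)

def gen_open(filenames):
    # open files one at a time; each is closed when the consumer moves on
    for name in filenames:
        with open(name) as f:
            yield f

def gen_cat(sources):
    # concatenate the lines of several files into one stream
    for src in sources:
        yield from src

def gen_grep(pattern, lines):
    # keep only the lines that contain `pattern`
    for line in lines:
        if pattern in line:
            yield line

with tempfile.TemporaryDirectory() as top:
    # two small, made-up log files
    for name, text in [("access-log", "GET /a 200\nGET /b 404\n"),
                       ("access-log.1", "GET /c 404\n")]:
        with open(os.path.join(top, name), "w") as f:
            f.write(text)

    lognames = gen_find("access-log*", top)
    logfiles = gen_open(lognames)
    loglines = gen_cat(logfiles)
    errors = [line.rstrip("\n") for line in gen_grep(" 404", loglines)]

print(sorted(errors))   # ['GET /b 404', 'GET /c 404']
```

`os.walk()` does not guarantee the order of files within a directory, which is why the result is sorted before printing.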

## Advanced Generator topics

---

### Generators: The Final Frontier

**by David Beazley**

http://www.dabeaz.com/finalgenerator/
  * see the link for *source data* and *code examples*

This tutorial discusses advanced uses of generators to alter program control flow, *explode brains*, and exponentially increase your job security.

Topics include context managers, inlined futures, concurrency, asyncio, actors, compilers, and more. 

---

* slides: http://www.dabeaz.com/finalgenerator/FinalGenerator.pdf
* video: http://pyvideo.org/video/2575/generators-the-final-frontier
* screencast: http://www.youtube.com/watch?v=5-qadlG7tWo

---

Example: "Let's write a compiler!"
* see video at https://youtu.be/5-qadlG7tWo?t=6884
* using generators to solve recursion problems
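A sketch of the general idea, written for this page rather than taken from the talk: each `visit` generator *yields* the child node it needs evaluated instead of calling itself recursively, and a driver loop keeps an explicit stack of generators, sending each child's value back with `send()`. The nested-tuple expression format and the function names are assumptions:

```python
def visit(node):
    """Generator for one binary node: (op, left, right), op is "+" or "*"."""
    op, left, right = node
    lhs = yield left           # ask the driver to evaluate the left child
    rhs = yield right          # ... then the right child
    return lhs + rhs if op == "+" else lhs * rhs


def evaluate(root):
    """Evaluate a nested-tuple expression with an explicit stack of generators."""
    if isinstance(root, int):
        return root
    stack = [visit(root)]
    result = None              # value to send into the generator on top of the stack
    while stack:
        try:
            child = stack[-1].send(result)
        except StopIteration as exc:
            # the top generator finished: its return value goes to its parent
            stack.pop()
            result = exc.value
            continue
        if isinstance(child, int):
            result = child                 # leaves are evaluated immediately
        else:
            stack.append(visit(child))     # start a generator for the sub-expression
            result = None                  # a new generator must be started with send(None)
    return result


print(evaluate(("+", 2, ("*", 3, 4))))     # 14

# nested 10,000 levels deep: a plain recursive evaluator would exceed
# Python's default recursion limit (1000)
deep = 0
for _ in range(10_000):
    deep = ("+", deep, 1)
print(evaluate(deep))                      # 10000
```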

---

https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/

---

## Extra content


### Generator expressions

Similar to list comprehensions:

`["*"*i for i in range(20)]`

... but they are "lazy": they do not build the full list, and return a *generator object* instead:

`("*"*i for i in range(20))`

*Note: you can only consume (iterate over) generator objects once*
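A quick sketch of that one-pass behavior: a list can be traversed as often as needed, while a generator object is exhausted after the first full pass and afterwards behaves like an empty sequence:

```python
squares_list = [i * i for i in range(5)]
squares_gen = (i * i for i in range(5))

print(sum(squares_list), sum(squares_list))   # 30 30
print(sum(squares_gen), sum(squares_gen))     # 30 0  (nothing left on the second pass)
```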

```python
lst = ["*"*i for i in range(20)]   # avoid shadowing the built-in `list`
gen = ("*"*i for i in range(20))

print(repr(lst))
print(repr(gen))
```

```python
next(gen)   # returns '' ("*" * 0)
```

```python
next(gen)   # returns '*'
```

### Iterators *versus* Generators

The same functionality can be implemented using either iterators or generators:

see https://wiki.python.org/moin/Generators

---

* standalone version (builds list in memory, takes up space)
* iterator version
* generator version
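A sketch of the three versions side by side, in the spirit of the wiki page's "first n numbers" example (the names `first_n_list`, `FirstN`, and `first_n_gen` are this sketch's own). All three produce 0, 1, ..., n-1; only the list version stores every value at once:

```python
def first_n_list(n):
    """Standalone version: builds the whole list in memory."""
    nums = []
    num = 0
    while num < n:
        nums.append(num)
        num += 1
    return nums


class FirstN:
    """Iterator version: keeps its state in attributes between __next__() calls."""

    def __init__(self, n):
        self.n = n
        self.num = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.num >= self.n:
            raise StopIteration
        value = self.num
        self.num += 1
        return value


def first_n_gen(n):
    """Generator version: the state lives in the paused function."""
    num = 0
    while num < n:
        yield num
        num += 1


print(sum(first_n_list(10)), sum(FirstN(10)), sum(first_n_gen(10)))   # 45 45 45
```

The generator version is as memory-friendly as the iterator class but needs none of the `__iter__`/`__next__` bookkeeping.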

---

## Exercises

```python
# write a generator function my_seq(a) that returns
#   a sequence of numbers: a, a+2, a+4, ...

def my_seq(a):

    # for now, this fn does nothing
    # edit it (adding yield, etc.) to return the sequence described above
    pass
```

```python
res_seq = my_seq(100)

for i in range(5):
    print(next(res_seq))
```

```python
# write a modified function my_seq2(a) that returns
#   a sequence of numbers: a, (a+2), (a+2)-3, (a+2-3)+2, ...

def my_seq2(a):

    # for now, this fn does nothing
    # edit it (adding yield, etc.) to return the sequence described above
    pass
```

```python
res_seq2 = my_seq2(100)

for i in range(5):
    print(next(res_seq2))
```