# Iterators

**An iterator is an object representing a stream of data**

This object returns the data one element at a time.

---

https://docs.python.org/dev/library/stdtypes.html#iterator-types

https://docs.python.org/dev/howto/functional.html#iterators

---

A Python iterator must support a method called `__next__()` that takes no arguments and always returns the next element of the stream.

If there are no more elements in the stream, `__next__()` must raise the `StopIteration` exception.

*(Anything that supports this "protocol" is an Iterator)*
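
A minimal sketch of the protocol in action, assuming a two-element list as the stream. The built-in `next()` simply calls the object's `__next__()` method; the names `stream` and `exhausted` are illustrative only.

```python
stream = iter([10, 20])

first = next(stream)     # calls stream.__next__()
second = next(stream)

try:
    next(stream)         # no elements left
    exhausted = False
except StopIteration:
    # the protocol's signal that the stream has ended
    exhausted = True

print(first, second, exhausted)
```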

### Iterables

The built-in `iter()` function takes an arbitrary object and tries to return an iterator that will return the object’s contents or elements, raising `TypeError` if the object doesn’t support iteration.

An object is called `iterable` if you can get an iterator for it.
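
As a small illustration (the variable names are made up for this sketch), a list is iterable while a plain integer is not:

```python
numbers = [1, 2, 3]
numbers_it = iter(numbers)      # lists are iterable, so this succeeds
first = next(numbers_it)

try:
    iter(42)                    # an int has no elements to iterate over
except TypeError as exc:
    error_message = str(exc)

print(first)
print(error_message)
```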

---


In [1]:
# most container objects can be iterated over using a `for` loop:

my_str = "qwerty"

for i in my_str:
    print(i)

q
w
e
r
t
y


---

Behind the scenes, the `for` statement calls `iter()` on the container object. 

The function returns an iterator object that defines the method `__next__()` which accesses elements in the container one at a time. When there are no more elements, `__next__()` raises a `StopIteration` exception which tells the `for` loop to terminate.

https://docs.python.org/3/tutorial/classes.html#iterators

You can call the `__next__()` method using the `next()` built-in function:

In [2]:
it1 = iter(my_str)

next(it1)

'q'

In [3]:
type(it1)

str_iterator

In [4]:
next(it1)

'w'
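
Putting these pieces together, a `for` loop over `my_str` can be written out by hand. This is only a rough sketch of what the interpreter does, and the names `it` and `seen` are illustrative:

```python
my_str = "qwerty"

# roughly equivalent to:
#     for ch in my_str:
#         seen.append(ch)
seen = []
it = iter(my_str)            # step 1: ask the container for an iterator
while True:
    try:
        ch = next(it)        # step 2: fetch the next element
    except StopIteration:
        break                # step 3: the iterator is exhausted, leave the loop
    seen.append(ch)

print(seen)
```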

### Implementing iterators

To add iterator behavior to your classes, define an `__iter__()` method that returns an object with a `__next__()` method.

If the class itself defines `__next__()`, then `__iter__()` can just return `self`.
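
Before the `return self` shortcut, here is a sketch of the more general arrangement, where the container and its iterator are separate objects. The classes `Countdown` and `CountdownIterator` are hypothetical names used only for this illustration. Because every `iter()` call creates a fresh iterator, the container can be looped over more than once.

```python
class Countdown:
    """Iterable: hands out a fresh iterator on every iter() call."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: remembers the position of a single pass."""

    def __init__(self, current):
        self.current = current

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    def __iter__(self):
        # iterators are expected to be iterable themselves
        return self


three = Countdown(3)
first_pass = list(three)
second_pass = list(three)    # works again: a new iterator is created
print(first_pass, second_pass)
```

A class that returns `self` from `__iter__()` is simpler, but it can only be consumed once.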

In [31]:
class CountFrom:
    
    def __init__(self, start, step = 1, end = None):
        self.start = start
        self.step = step
        self.end = end
        
    def __next__(self):
        self.start += self.step
        if self.end is not None and self.start > self.end:
            raise StopIteration
        return self.start - self.step
    
    def __iter__(self):
        return self


In [12]:
cnt = CountFrom(-10)

cnt

<__main__.CountFrom at 0x1fcd4626760>

In [7]:
dir(cnt)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'end',
 'start',
 'step']

In [13]:
next(cnt)

-10

In [8]:
cnt_it = iter(cnt)
type(cnt_it)

__main__.CountFrom

In [14]:
next(cnt_it)

-9

In [17]:
next(cnt_it)

-8

In [32]:
jmp_cnt = CountFrom(50, 5, 80)
type(jmp_cnt)

__main__.CountFrom

In [33]:
next(jmp_cnt)

50

In [34]:
for item in jmp_cnt:
    if item > 100:
        break
    print(item)

55
60
65
70
75


---

# Generators

(Functions that behave like iterators)

---
**Generator = A function which returns an Iterator**

It looks like a normal function except that it contains `yield` expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the `next()` function.

* *Any function that contains `yield` becomes a generator!*

https://docs.python.org/3.6/glossary.html#term-generator

https://docs.python.org/3.6/howto/functional.html#generators

---

**Lazy execution**

Values are generated on-demand, as necessary => lazy execution

A generator (or any iterator) can be infinite.
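
A short sketch of both points, using `itertools.count` from the standard library as a ready-made infinite iterator and `itertools.islice` to take a finite slice of it (the variable names are illustrative):

```python
from itertools import count, islice

counter = count(10, 5)                  # 10, 15, 20, ... with no upper bound
first_four = list(islice(counter, 4))   # only four values are ever computed
next_value = next(counter)              # the stream simply continues

print(first_four, next_value)
```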

---

In [39]:
def count_from(start=1):
    
    i = start
    
    # infinite loop
    while True:
        
        # return next value
        yield i
        
        i += 1

In [36]:
count_from

<function __main__.count_from(start=1)>

In [37]:
type(count_from)

function

In [40]:
res = count_from()
res

<generator object count_from at 0x000001FCD577A350>

In [41]:
help(res)

Help on generator object:

count_from = class generator(object)
 |  Methods defined here:
 |  
 |  __del__(...)
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __repr__(self, /)
 |      Return repr(self).
 |  
 |  close(...)
 |      close() -> raise GeneratorExit inside generator.
 |  
 |  send(...)
 |      send(arg) -> send 'arg' into generator,
 |      return next yielded value or raise StopIteration.
 |  
 |  throw(...)
 |      throw(typ[,val[,tb]]) -> raise exception in generator,
 |      return next yielded value or raise StopIteration.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  gi_code
 |  
 |  gi_frame
 |  
 |  gi_running
 |  
 |  gi_yieldfrom
 |      object being iterated by yield from, or None



In [42]:
next(res)

1

In [43]:
for i in range(5):
    print(next(res))
    

2
3
4
5
6


In [44]:
for i in res:
    if i > 20:
        break
    print(i)

7
8
9
10
11
12
13
14
15
16
17
18
19
20


## How do generators work?

Let's take a look:

In [45]:
def count_from(start):

    print("Let's start") 
    
    i = start
    
    while True:
        
        print(" ... before yield")
        yield i
        
        print(" ... after yield")
        i += 1
        
    print("Exiting")  # never reached: the loop above has no exit

In [52]:
res = count_from(42)
type(res)

generator

In [47]:
# see how it works
next(res)

Let's start
 ... before yield


42

In [49]:
next(res)

 ... after yield
 ... before yield


43

In [50]:
def cnt_from(start, step, end=None):
    i = start
    while True:
        print("Before yield..",i)
        yield i
        print("After yield..",i)
        i += step
        if end is not None and i > end:
            break

In [51]:
my_cnt = cnt_from(10,2,20)
type(my_cnt)

generator

In [53]:
for el in my_cnt:
    print(el)

Before yield.. 10
10
After yield.. 10
Before yield.. 12
12
After yield.. 12
Before yield.. 14
14
After yield.. 14
Before yield.. 16
16
After yield.. 16
Before yield.. 18
18
After yield.. 18
Before yield.. 20
20
After yield.. 20


In [54]:
### Data processing example

data = "Some134 content __here, @1441   needs cleanup  "

In [55]:
def get_tokens(data_in):
    for item in data_in.split():
        yield item

In [58]:
type(data.split())

list

In [56]:
tokens = get_tokens(data)
print(type(tokens))
for i in tokens:
    print(i)

<class 'generator'>
Some134
content
__here,
@1441
needs
cleanup


## Generators -> Data Pipelines

---

### Generator Tricks for Systems Programmers

**by David Beazley**

http://www.dabeaz.com/generators-uk/
  * see the link for *source data* and *code examples*

---

**presentation slides:**
http://www.dabeaz.com/generators-uk/GeneratorsUK.pdf

* ... background (generator functions, generator expressions) - from slide 24
* Part 2 (Processing data files) - **from slide 35**


### Processing [huge] data files

**... using Generator expressions**

*see: Processing data files - from slide 35*

without generators:

```
wwwlog = open("access-log")
total = 0
for line in wwwlog:
    bytestr = line.rsplit(None,1)[1]
    if bytestr != '-':
        total += int(bytestr)
print("Total", total)
```

with generators:

```
wwwlog     = open("access-log")
bytecolumn = (line.rsplit(None,1)[1] for line in wwwlog)
bytes_sent = (int(x) for x in bytecolumn if x != '-')

print("Total", sum(bytes_sent))
```
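
To try the pipeline without a real log file, here is a self-contained sketch. The three log lines are made-up sample data standing in for `access-log`; the last column is the byte count, or `-` when nothing was sent.

```python
# hypothetical sample lines standing in for the "access-log" file
sample_log = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a.gif HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /b.gif HTTP/1.0" 304 -',
    '127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /c.gif HTTP/1.0" 200 1000',
]

# each stage is lazy: nothing is computed until sum() pulls values through
bytecolumn = (line.rsplit(None, 1)[1] for line in sample_log)
bytes_sent = (int(x) for x in bytecolumn if x != "-")

total = sum(bytes_sent)
print("Total", total)
```

Only one line is in flight at a time, which is why the same structure works for files too large to fit in memory.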

### Fun with files and directories (part 3)

*see: Part 3 = from slide 35*

generator to list files matching a pattern:

```
import os
import fnmatch
def gen_find(filepat,top):
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist,filepat):
            yield os.path.join(path,name)  
```

using this generator:

`logs = gen_find("access-log*","/usr/www/")`

**see slide 59 -> how a chain of generators is used to process these files**
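
A sketch of such a chain, written in the spirit of the slides rather than copied from them. `gen_lines` and `gen_grep` are illustrative helpers and may differ from the functions on the slide; the temporary directory and its two tiny log files are made up so that the example runs anywhere.

```python
import fnmatch
import os
import re
import tempfile


def gen_find(filepat, top):
    """Yield paths under `top` whose file name matches `filepat`."""
    for path, dirlist, filelist in os.walk(top):
        for name in fnmatch.filter(filelist, filepat):
            yield os.path.join(path, name)


def gen_lines(paths):
    """Yield every line of every file, opening one file at a time."""
    for path in paths:
        with open(path) as f:
            yield from f


def gen_grep(pattern, lines):
    """Return a generator of the lines that match the regular expression."""
    regex = re.compile(pattern)
    return (line for line in lines if regex.search(line))


# throwaway directory with two tiny, made-up log files
with tempfile.TemporaryDirectory() as top:
    for name, text in [("access-log-1", "GET /a 200 10\nGET /b 404 -\n"),
                       ("access-log-2", "GET /c 200 32\n")]:
        with open(os.path.join(top, name), "w") as f:
            f.write(text)

    # build the pipeline; no file is opened until sum() starts pulling
    logs = gen_find("access-log*", top)
    lines = gen_lines(logs)
    ok_lines = gen_grep(r" 200 ", lines)
    bytecolumn = (line.rsplit(None, 1)[1] for line in ok_lines)
    total = sum(int(x) for x in bytecolumn if x != "-")

print("Total", total)
```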

## Advanced Generator topics

---

### Generators: The Final Frontier

**by David Beazley**

http://www.dabeaz.com/finalgenerator/
  * see the link for *source data* and *code examples*

This tutorial discusses advanced uses of generators to alter program control flow, *explode brains*, and exponentially increase your job security. 

Topics include context managers, inlined futures, concurrency, asyncio, actors, compilers, and more. 

---

* slide: http://www.dabeaz.com/finalgenerator/FinalGenerator.pdf
* video: http://pyvideo.org/video/2575/generators-the-final-frontier
* screencast: http://www.youtube.com/watch?v=5-qadlG7tWo

---

Example: "Let's write a compiler!"
* see video at https://youtu.be/5-qadlG7tWo?t=6884
* using generators to solve recursion problems

---

https://jeffknupp.com/blog/2013/04/07/improve-your-python-yield-and-generators-explained/

---

## Extra content


### Generator expressions

Similar to list comprehensions:

`["*"*i for i in range(20)]`

... but they are "lazy", do not build the full list and return a *generator object* instead:

`("*"*i for i in range(20))`

*Note: you can only consume (iterate over) generator objects once*
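
A quick sketch of the single-use behavior (the names are illustrative):

```python
squares = (n * n for n in range(5))

first_pass = list(squares)    # consumes every value
second_pass = list(squares)   # nothing left: the generator is exhausted

print(first_pass)
print(second_pass)
```

If the values are needed more than once, either build a list or create a new generator expression.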

In [None]:
# avoid naming a variable `list`: it would shadow the built-in type
stars_list = ["*"*i for i in range(20)]
gen = ("*"*i for i in range(20))

print(repr(stars_list))
print(repr(gen))

In [None]:
next(gen)

In [None]:
next(gen)

### Iterators *versus* Generators

The same functionality can be implemented both using Iterators and Generators:

see https://wiki.python.org/moin/Generators

---

* standalone version (builds list in memory, takes up space)
* iterator version
* generator version
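
A sketch of the three variants for one small task, producing the first `n` non-negative integers. It is modeled loosely on the wiki page linked above; the names `first_n_list`, `FirstN`, and `first_n_gen` are chosen for this illustration.

```python
def first_n_list(n):
    """Standalone version: builds the whole list in memory."""
    nums = []
    num = 0
    while num < n:
        nums.append(num)
        num += 1
    return nums


class FirstN:
    """Iterator version: keeps only the current position."""

    def __init__(self, n):
        self.n = n
        self.num = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.num >= self.n:
            raise StopIteration
        value = self.num
        self.num += 1
        return value


def first_n_gen(n):
    """Generator version: same behavior with far less boilerplate."""
    num = 0
    while num < n:
        yield num
        num += 1


totals = (sum(first_n_list(10)), sum(FirstN(10)), sum(first_n_gen(10)))
print(totals)
```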

---

## Exercises

In [None]:
# write a generator function my_seq(a) that returns 
#   a sequence of numbers: a, a+2, a+4, ...

def my_seq(a):
    
    # for now, this fn does nothing
    # edit it (adding yield, etc.) to return the sequence described above
    pass

In [None]:
res_seq = my_seq(100)

for i in range(5):
    print(next(res_seq))

In [None]:
# write a modified function my_seq2(a) that returns
#   a sequence of numbers: a, (a+2), (a+2)-3, (a+2-3)+2, ...

def my_seq2(a):
    
    # for now, this fn does nothing
    # edit it (adding yield, etc.) to return the sequence described above
    pass

In [None]:
res_seq2 = my_seq2(100)

for i in range(5):
    print(next(res_seq2))

In [62]:
from itertools import islice

In [69]:
# write a generator function that yields the infinite Fibonacci sequence:
# 1, 1, 2, 3, 5, 8, 13, ...
def fib():
    a, b = 1, 1
    while True:
        yield a
        # advance the pair one step along the sequence
        a, b = b, a + b

In [70]:
my_fib = fib()

In [71]:
for i in islice(my_fib, 20):
    print(i)

1
1
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765


In [78]:
def crazy_gen():
    yield 1
    yield 5
    raise ValueError("Value fail")
    yield 42
    yield 9000
    

In [79]:
my_g = crazy_gen()

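In [83]:
# after earlier next() calls returned 1 and 5 and then hit the `raise`,
# the generator is finished: any further next() raises StopIteration
next(my_g)

StopIteration: 

In [77]:
# a fresh generator yields 1 and 5, then the ValueError propagates;
# the yields after `raise` are never reached
for i in crazy_gen():
    print(i)

1
5
ValueError: Value fail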


In [None]:
# remember that generators (unlike lists) get used up once they are consumed
