## Iterators

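What is an Iterator? It is any object that implements the right interface to allow Python to step through a series of values one at a time. Things you can loop over, such as `range()`, lists, tuples, dictionaries, and files, are strictly speaking *iterables*: each time you loop over one, Python asks it for an Iterator and pulls values from that.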


In [5]:
for i in range(10):
    print(i)
print()

for s in ['apple', 'banana', 'cherry', 'date']:
    print(s)
print()

for k in { 'First':1, 'Second':2, 'Third':3}:
    print(k)

0
1
2
3
4
5
6
7
8
9

apple
banana
cherry
date

First
Second
Third


In Python an interface isn't explicitly declared; it is just the set of methods needed to implement a behavior. So to make an Iterator in Python you don't declare a class with a keyword like Java's "interface", and you don't have to inherit from a master "Interface" base class.

You just define the methods:  
`__iter__()` Return the iterator object itself  
`__next__()` Return the next value from the iterator, or raise a `StopIteration` exception if there are no more values to return
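
A `for` loop calls these two methods for you. The sketch below spells out roughly what Python does behind the scenes, using the built-in `iter()` and `next()` functions on a plain list. The variable names are only for illustration.

```python
# Roughly what `for fruit in fruits:` does behind the scenes.
fruits = ['apple', 'banana', 'cherry']

iterator = iter(fruits)          # calls fruits.__iter__()
collected = []
while True:
    try:
        fruit = next(iterator)   # calls iterator.__next__()
    except StopIteration:        # raised when no values are left
        break
    collected.append(fruit)

print(collected)
```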


In [13]:
class MyRange:
    def __init__(self, start, end):
        self.value = start
        self.end = end

    def __iter__(self):
        return self

    def __next__(self):
        if self.value < self.end:
            result = self.value
            self.value += 1
            return result
        else:
            raise StopIteration
            

In [14]:
for v in MyRange(1,10):
    print(v)
    

1
2
3
4
5
6
7
8
9
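
Nothing in a class like this declares that it is an Iterator; Python only cares that the two methods exist. As a rough illustration, the standard library's `collections.abc.Iterator` can be used with `isinstance()` to check for that pair of methods. `CountUpTo` is a made-up class for this sketch.

```python
from collections.abc import Iterator

class CountUpTo:
    """Hypothetical iterator; note that it inherits from nothing special."""
    def __init__(self, limit):
        self.current = 0
        self.limit = limit

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        self.current += 1
        return self.current

counter = CountUpTo(3)
is_iterator = isinstance(counter, Iterator)          # both methods are present
list_is_iterator = isinstance([1, 2, 3], Iterator)   # a list has no __next__
print(is_iterator, list_is_iterator, list(counter))
```

The list fails the check because it is an iterable rather than an Iterator: it can hand out an Iterator through `iter()`, but it has no `__next__()` of its own.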


You can use this pattern for your own custom, arbitrarily complex data structures to make them easy to loop over.  
Lots of built-in functions accept anything you can iterate over, e.g. `set`, `sum`, `sorted`, `enumerate`, etc.

In [35]:
print( set( [1,2,3,4,5]))
print( set( {'First':1, 'Second':2, 'Third':3}))
print()

print( sum( [1,2,3,4,5]))
print()

print( sorted( [5,4,3,2,1]))
print( sorted( {'First':1, 'Second':2, 'Third':3}))
print()

print( enumerate( ['apple', 'banana', 'cherry', 'date']))

{1, 2, 3, 4, 5}
{'Third', 'Second', 'First'}

15

[1, 2, 3, 4, 5]
['First', 'Second', 'Third']

<enumerate object at 0x1220cf470>


Notice that `enumerate` doesn't return a list of results; it returns another Iterator object that produces the results as you loop over it. This is "lazy evaluation", which is more efficient: it doesn't build a large list of results ahead of time, but returns each result only when you ask for it, saving time (if you don't need all of the values) and space.

In [1]:
for i,v in enumerate( ['apple', 'banana', 'cherry', 'date']):
    print(i,v)

0 apple
1 banana
2 cherry
3 date
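
One way to see the laziness directly is to wrap an Iterator that keeps a log of when each value is computed. `RecordingSquares` below is a hypothetical class written only for this sketch; the point is that creating the `enumerate` object computes nothing, and each `next()` computes exactly one value.

```python
class RecordingSquares:
    """Hypothetical iterator that logs each value at the moment it is computed."""
    def __init__(self, limit):
        self.n = 0
        self.limit = limit
        self.computed = []    # filled in only as values are requested

    def __iter__(self):
        return self

    def __next__(self):
        if self.n >= self.limit:
            raise StopIteration
        result = self.n ** 2
        self.computed.append(result)
        self.n += 1
        return result

squares = RecordingSquares(5)
pairs = enumerate(squares)
computed_before = list(squares.computed)   # snapshot: nothing computed yet
first_pair = next(pairs)                   # asks for exactly one value
computed_after = list(squares.computed)
print(computed_before, first_pair, computed_after)
```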


In [38]:
print( enumerate( MyRange(1,10)))

<enumerate object at 0x1220df4c0>


In [39]:
print( list(enumerate( MyRange(1,10))))

[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]


Once an iterator reaches its end, it is used up. You have to create a new iterator to start over:

In [41]:
mr = MyRange(1,10)
print( list( mr))
print( list( mr))
print()
m2 = MyRange(1,10)
print( list(m2))

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[]

[1, 2, 3, 4, 5, 6, 7, 8, 9]
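
A common way around this one-shot behavior is to separate the container from its Iterator, so that `__iter__()` hands out a fresh Iterator on every call. The sketch below assumes two hypothetical classes, `RangeIterator` and `ReusableRange`; it is one possible design, not the only one.

```python
class RangeIterator:
    """One-shot iterator that counts from start up to (not including) end."""
    def __init__(self, start, end):
        self.value = start
        self.end = end

    def __iter__(self):
        return self

    def __next__(self):
        if self.value >= self.end:
            raise StopIteration
        result = self.value
        self.value += 1
        return result

class ReusableRange:
    """Hypothetical iterable: stores only the bounds, never the position."""
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __iter__(self):
        # A brand-new iterator each time, so every loop starts from the beginning.
        return RangeIterator(self.start, self.end)

rr = ReusableRange(1, 5)
first_pass = list(rr)
second_pass = list(rr)
print(first_pass, second_pass)
```

Built-in containers such as lists appear to work this way too, which is why you can loop over the same list more than once.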


You can't ask this Iterator for its `len()` or jump straight to the value at an index such as `mr[3]`, because the class doesn't provide methods to do so, only `__iter__()` and `__next__()`. There are different interfaces for things you can take the length of and things you can index:

In [43]:
len( mr)

TypeError: object of type 'MyRange' has no len()

In [44]:
mr[3]

TypeError: 'MyRange' object is not subscriptable
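
The standard library gives names to these separate interfaces in `collections.abc`. As a small sketch (the variable names are illustrative), `isinstance()` shows that a list supports length and indexing while the Iterator obtained from it does not:

```python
from collections.abc import Iterator, Sequence, Sized

numbers = [10, 20, 30]
numbers_iter = iter(numbers)

# A list has __len__ and __getitem__, so it is both Sized and a Sequence.
list_checks = (isinstance(numbers, Sized), isinstance(numbers, Sequence))

# The list's iterator only knows how to produce the next value.
iter_checks = (isinstance(numbers_iter, Sized), isinstance(numbers_iter, Iterator))

print(list_checks, iter_checks)
```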

In [9]:
# Define an Iterator that can also be passed to len() and indexed:
class MyRange2:
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.value = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.value < self.end:
            result = self.value
            self.value += 1
            return result
        else:
            raise StopIteration

    def __len__(self):
        return self.end - self.start

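    def __getitem__(self, i):
        # Index 0 is the first value, so the i-th value is start + i.
        if not 0 <= i < len(self):
            raise IndexError('MyRange2 index out of range')
        return self.start + i

In [54]:
len( MyRange2(1,10))

9

In [56]:
MyRange2(1,10)[3]

4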

An Iterator can return values efficiently using "lazy evaluation", meaning it does not calculate a result until `__next__()` asks for it. An Iterator can even produce an infinite sequence of results, which could never fit in a list.

In [3]:
class MyInfiniteRange:
    def __init__(self, start):
        self.value = start

    def __iter__(self):
        return self

    def __next__(self):
        result = self.value
        self.value += 1
        return result

In [5]:
my_infinite_range = MyInfiniteRange(99)
for i,v in enumerate(my_infinite_range):
    if i==10:
        break
    print( i,v)

0 99
1 100
2 101
3 102
4 103
5 104
6 105
7 106
8 107
9 108
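
The standard library's `itertools` module covers this case as well. As a sketch, `itertools.count()` is a ready-made infinite counter and `itertools.islice()` takes a fixed number of values from any Iterator, so the `break` is not needed:

```python
from itertools import count, islice

# count(99) yields 99, 100, 101, ... forever; islice stops after 10 values.
first_ten = list(islice(count(99), 10))
print(first_ten)
```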


In [57]:
enumerate(MyRange2(1,10))

<enumerate at 0x122177d80>

In [59]:
list(enumerate(MyRange2(1,10)))

[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]

In [12]:
enumerate(MyRange2(1,10)).__iter__()

<enumerate at 0x1068679c0>

In [62]:
enumerate(MyRange2(1,10)).__next__()

(0, 1)

In [64]:
dir( enumerate(MyRange2(1,10)))

['__class__',
 '__class_getitem__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']

There is a simpler syntax for defining Iterators, called Generators. Instead of defining `__iter__()` and `__next__()`, we can write a function that uses the `yield` keyword to return a result and then resume where it left off on the next call:

In [68]:
def countdown(n):
    while n > 0:
        yield n
        n -= 1

In [69]:
countdown(10)

<generator object countdown at 0x12207dcc0>

In [70]:
list(countdown(10))

[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

In [71]:
for i in countdown(10):
    print(i)

10
9
8
7
6
5
4
3
2
1
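
To make the "resume where we left off" behavior concrete, here is a sketch of a generator that produces the same values as a hand-written range Iterator, stepped manually with `next()`. The name `my_range` is made up for this example.

```python
def my_range(start, end):
    """Generator sketch: same values as a range-style iterator class."""
    value = start
    while value < end:
        yield value      # pause here and hand back one value
        value += 1       # resume here on the next request

gen = my_range(1, 4)
first = next(gen)
second = next(gen)
rest = list(gen)         # whatever is left after the first two values

# A finished generator raises StopIteration, just like a class-based iterator.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, rest, exhausted)
```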


In [72]:
dir(countdown(10))

['__class__',
 '__del__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__name__',
 '__ne__',
 '__new__',
 '__next__',
 '__qualname__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'close',
 'gi_code',
 'gi_frame',
 'gi_running',
 'gi_suspended',
 'gi_yieldfrom',
 'send',
 'throw']

In [None]:
def read_large_file_inefficiently(filename):
    # Reads every line into memory at once before returning anything.
    with open(filename) as f:
        lines = f.readlines()
    return lines

def read_large_file_efficiently(filename):
    with open(filename) as f:
        for line in f:
            yield line
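
A line-by-line generator like this can be consumed without ever holding the whole file in memory. The sketch below is self-contained, so it defines its own copy of the generator under the hypothetical name `read_lines_lazily` and uses a small temporary file to stand in for a large one.

```python
import os
import tempfile

def read_lines_lazily(filename):
    # Yields one line at a time; the file closes when the generator finishes.
    with open(filename) as f:
        for line in f:
            yield line

# Write a small throwaway file to stand in for a large one.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('alpha\nbeta\ngamma\n')
    path = tmp.name

try:
    # Each call creates a fresh generator that reads the file from the start.
    line_count = sum(1 for _ in read_lines_lazily(path))
    longest = max(read_lines_lazily(path), key=len).strip()
finally:
    os.remove(path)

print(line_count, longest)
```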
            

In [75]:
# list comprehension
[x**2 for x in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [76]:
# dict comprehension
{ x: x**2 for x in range(10) }

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}

In [77]:
# New: generator expression (comprehension syntax, but in parentheses)
(x**2 for x in range(10))

<generator object <genexpr> at 0x12200dff0>

In [78]:
list( (x**2 for x in range(10)))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
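
A generator expression is lazy, so it can be passed straight to a function like `sum()` without building a list first. As a rough sketch, `sys.getsizeof()` gives a feel for the difference in memory. The exact byte counts vary between Python versions, so only the comparison is shown.

```python
import sys

n = 100_000
squares_list = [x**2 for x in range(n)]   # all values stored up front
squares_gen = (x**2 for x in range(n))    # values produced on demand

list_bytes = sys.getsizeof(squares_list)  # size of the list object itself
gen_bytes = sys.getsizeof(squares_gen)    # small, and independent of n

# When a generator expression is the only argument, the extra parentheses can be dropped.
total = sum(x**2 for x in range(10))

print(list_bytes > gen_bytes, total)
```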

Advanced: look up these:  
`close()` inside a generator  
`yield from generator2()`  
coroutines (async programming)
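
As a starting point for the first two topics, here is a minimal sketch. It uses a variant of `countdown` with a `try`/`finally` block so that the cleanup is visible; the `events` list and `launch_sequence` are made up for this example.

```python
events = []   # records when each generator's cleanup code runs

def countdown(n):
    try:
        while n > 0:
            yield n
            n -= 1
    finally:
        # Runs when the generator finishes or when close() is called on it.
        events.append(f'cleanup at n={n}')

def launch_sequence():
    yield from countdown(3)   # re-yield every value from the inner generator
    yield 'liftoff'

sequence = list(launch_sequence())

gen = countdown(5)
first = next(gen)   # run up to the first yield
gen.close()         # raises GeneratorExit at the paused yield, so finally runs

print(sequence, first, events)
```

The first cleanup entry comes from the inner generator finishing normally, the second from the early `close()`.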
