# Iterators

As we already learned, iterators are objects that let you traverse a collection one value at a time. An iterator implements two methods:

- \__iter__(), which returns the iterator itself.
- \__next__(), which returns the next value, or raises StopIteration when there are none left.

An object is "iterable" if calling iter() on it returns an iterator. Python 3's built-in objects, like lists, tuples and dictionaries, are iterable. You can also create an iterator manually, or define \__iter__() in your own class to make it iterable.

Here's an example that creates and uses an iterator for a list:

```python
iterable = [1,2,3,4]
iterator = iterable.__iter__()    # or iterator = iter(iterable)
print(type(iterator))

value = iterator.__next__()   # or value = next(iterator)
print(value)
next(iterator)
for i in iterator:
    print(i)
```

```
<class 'list_iterator'>
1
3
4
```


There are a few things to notice in this code. First, we use the iterator, not the list itself, to go through the values. Second, the for loop doesn't reprint 1 and it skips 2. This is because the iterator was advanced twice before the loop:

- once by iterator.\__next__(), which returned 1 into value;
- once by the bare next(iterator) call, which returned 2 and discarded it.

So the for loop started at the third number.
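An iterator is also a one-shot object. The small sketch below illustrates this; the variable names are only illustrative. Once an iterator has produced all its values it stays exhausted, while the original list can hand out a fresh iterator at any time.

```python
iterable = [1, 2, 3, 4]
iterator = iter(iterable)

first_pass = list(iterator)        # consumes every value: [1, 2, 3, 4]
second_pass = list(iterator)       # the same iterator is now exhausted: []
fresh_pass = list(iter(iterable))  # a new iterator starts over: [1, 2, 3, 4]

print(first_pass, second_pass, fresh_pass)
print(iter(iterator) is iterator)  # True: an iterator's __iter__ returns itself
```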

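When you saw for loops you learned about iteration, but you didn't need to create an iterator yourself. This is because a for loop works on any iterable object. It calls iter() on the object for you, and then calls next() on the resulting iterator until StopIteration is raised.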

```python
for x in [1, 2, 3, 4, 5]:
    print(x ** 3, end=' ')
```

```
1 8 27 64 125
```
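Roughly, the for loop above does the following behind the scenes. This is a simplified sketch: the real loop is implemented inside the interpreter, and `manual_for` is a hypothetical helper written only for this illustration.

```python
def manual_for(iterable, action):
    """Hypothetical helper: roughly what `for x in iterable: action(x)` does."""
    iterator = iter(iterable)       # 1. ask the iterable for an iterator
    while True:
        try:
            x = next(iterator)      # 2. get the next value...
        except StopIteration:       # 3. ...until the iterator is exhausted
            break
        action(x)                   # 4. run the loop body on that value

manual_for([1, 2, 3, 4, 5], lambda x: print(x ** 3, end=' '))
```

The StopIteration exception never reaches the caller. The loop uses it only as the signal to stop.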

We have seen file iteration as well. It works because file objects also implement \__iter__() and \__next__(), so each call to \__next__() returns the next line of the file, including its trailing newline. Let's look at this in more detail using the remember.txt file from unit 1:

```python
f = open('remember.txt')
print(f.__next__())
print(f.__next__())
print(f.__next__())
print(f.__next__())
for i in f:                  # the for loop reads one line into i (never printed)...
    print(f.__next__())      # ...and this call reads and prints the line after it
print(f.__next__())          # the file is exhausted: raises StopIteration
```

```
Remember me

Though I have to say goodbye

Remember me

Don't let it make you cry

I hold you in my heart

Each night we are apart

Though I have to travel far

Each time you hear a sad guitar

The only way that I can be

Remember me

StopIteration
```

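Notice how the last statement raises a StopIteration exception. This lets us know that there are no more lines to read. Two other details explain the output:

- Each line is followed by a blank line. The line read from the file keeps its newline character, and print() adds another.
- Inside the for loop, only every other remaining line is printed. Each pass consumes two lines: one assigned to i, which is never printed, and one returned by f.\__next__().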
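A more typical way to read a file line by line is sketched below. Because remember.txt may not be available, the sketch writes a hypothetical sample.txt into a temporary directory first. It uses `with` so the file is closed automatically, prints each line without adding an extra newline, and uses next() with a default so an exhausted file returns None instead of raising StopIteration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for remember.txt so the sketch runs anywhere
    path = os.path.join(tmp, 'sample.txt')
    with open(path, 'w') as out:
        out.write('first line\nsecond line\nthird line\n')

    lines = []
    with open(path) as f:          # `with` closes the file automatically
        lines.append(next(f))      # next() works on files, like f.__next__()
        for line in f:             # the loop continues from the current position
            lines.append(line)
        leftover = next(f, None)   # None instead of StopIteration: file exhausted

for line in lines:
    print(line, end='')            # end='' because each line keeps its '\n'
print(leftover)
```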

## iter and next

Another way to get the next item is the built-in function next(). It is generally preferred over calling .\__next__() directly because it is shorter and easier to read, but the two do the same thing: next(it) calls it.\__next__(). Likewise, iter(obj) calls obj.\__iter__().
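The built-in next() also accepts an optional second argument: a default value that is returned instead of raising StopIteration when the iterator is exhausted. A short sketch:

```python
numbers = [10, 20]
it = iter(numbers)           # same as numbers.__iter__()

first = next(it)             # 10, same as it.__next__()
second = it.__next__()       # 20
last = next(it, 'done')      # 'done': the default is returned instead of raising StopIteration

print(first, second, last)
```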

Now let's look at dictionaries. In this case the iterator returns keys instead of values.

```python
D = {'a':97, 'b':98, 'c':99}
iterObj = iter(D)
print(next(iterObj))
print(next(iterObj))
print(next(iterObj))
```

```
a
b
c
```
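To iterate over a dictionary's values, or over its key/value pairs, you can ask for the corresponding views, as sketched below. Since Python 3.7, dictionaries keep their keys in insertion order, which is why the keys come out as a, b, c.

```python
D = {'a': 97, 'b': 98, 'c': 99}

keys = list(D)                 # iterating a dict yields its keys: ['a', 'b', 'c']
values = list(D.values())      # [97, 98, 99]
pairs = list(D.items())        # [('a', 97), ('b', 98), ('c', 99)]

for key, value in D.items():   # unpack each (key, value) pair in the loop header
    print(key, value)
```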


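What about range()? A range object is not a list. It is also not a generator (generators are coming up next lesson), and not an iterator either. It is a lazy sequence that computes its values on demand instead of storing them all in memory. To see all of its values at once, we can build a list from it.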

```python
L = range(5)
print(L)
print(list(L))
```

```
range(0, 5)
[0, 1, 2, 3, 4]
```
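The sketch below shows that a range behaves like a sequence rather than an iterator. It supports len(), indexing and `in`, and it can be iterated any number of times. Calling next() on it directly fails with a TypeError; you must first get an iterator from iter().

```python
r = range(5)

print(len(r), r[2], 3 in r)    # 5 2 True: len, indexing, and `in` work without a list
print(list(r), list(r))        # a range can be iterated again and again

try:
    next(r)                    # a range is iterable but is not an iterator
except TypeError as err:
    print('TypeError:', err)

it = iter(r)                   # iter() returns a separate iterator over the range
print(next(it), next(it))      # 0 1
```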

## Iterable classes

Ok, now for the fun part. You already know how iteration works. Since Python 3 handles iteration so well for its built-in types, you rarely need to call iter() and next() yourself. This knowledge really pays off when you make your own classes iterable.

Using our previous knowledge about operator overloading, we are going to build an iterable class. First we'll declare the whole class and then we'll go into detail for every method.

```python
class Countdown:
    def __init__(self,start):
        self.start = start      # self.start never changes; see self.n in __iter__
    
    # __iter__ must return an object on which __next__ can be called; it returns
    # self, which is an object of the Countdown class, which defines __next__.
    # Later we will see a problem with returning self (when the same Countdown
    # object is iterated over in a nested structure), and how to solve that
    # problem. 

    def __iter__(self):
        self.n = self.start     # n attribute is added to the namespace here
        return self             # (not in __init__) and processed in __next__
    
    def __next__(self):
        if self.n < 0:
            raise StopIteration # can del self.n here, after exhausting iterator
        else:
            answer = self.n     # or, without the temporary, but more confusing:
            self.n -= 1         #   self.n -= 1
            return answer       #   return self.n+1
```

In this class, when \__iter__ is called it (re)sets self.n (the value \__next__ 
will return first) to self.start (which is set in \__init__ and never changes for
a constructed object). The \__iter__ method has a requirement that it must
return an object that defines a \__next__ method. Here it returns self, which,
as an object constructed from Countdown, defines \__next__ (right below \__iter__).

When \__next__ is called, it checks whether self.n has been decremented below 0.
If it has, \__next__ raises StopIteration. Otherwise it returns the current value
of self.n, but before doing so it decrements self.n by 1. It does this by saving
the value in a local variable, decrementing self.n, and then returning the saved value.

As a variant in \__next__, we could put del self.n directly before the raise
statement, to remove this attribute from the namespace once the iterator is
exhausted. If we did this, calling \__next__ again would raise an AttributeError
when accessing self.n. The code above, without del, just raises StopIteration
again. That is the better behavior: Python's iterator protocol expects an
exhausted iterator to keep raising StopIteration on every later call.
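To see these rules in action, the sketch below drives a Countdown by hand with iter() and next(). The class is repeated with its comments removed so the block runs on its own. After the three values 2, 1, 0, every further call raises StopIteration again, and Countdown(-1) produces no values at all.

```python
class Countdown:
    # Same logic as the Countdown class above (comments removed)
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        self.n = self.start
        return self

    def __next__(self):
        if self.n < 0:
            raise StopIteration
        answer = self.n
        self.n -= 1
        return answer


it = iter(Countdown(2))        # calls __iter__, which sets n to 2
values = [next(it), next(it), next(it)]
print(values)                  # [2, 1, 0]

for attempt in range(2):       # once exhausted, every further call raises again
    try:
        next(it)
    except StopIteration:
        print('StopIteration')

print(list(Countdown(-1)))     # []: the very first __next__ call raises StopIteration
```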

Note that if we substituted Countdown(-1) in the loop below, its body would be
executed 0 times and nothing would be printed before "blastoff".


```python
cd = Countdown(10)
for i in cd:
    print(str(i)+', ',end='')
print('blastoff')

for i in cd:                 # works again: __iter__ resets self.n to self.start
    print(str(i)+', ',end='')
print('blastoff')
```

```
10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, blastoff
10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0, blastoff
```
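Iterating the same object twice in a row works, but the comment in Countdown warns about nesting. The sketch below (Countdown repeated so the block runs on its own) shows the problem. Because \__iter__ returns self, the inner loop resets and then exhausts the very object the outer loop is using. Only 3 of the 9 expected pairs are produced.

```python
class Countdown:
    # Same logic as the Countdown class above: __iter__ resets n and returns self
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        self.n = self.start
        return self

    def __next__(self):
        if self.n < 0:
            raise StopIteration
        answer = self.n
        self.n -= 1
        return answer


cd = Countdown(2)
pairs = []
for a in cd:            # outer loop: iter(cd) sets cd.n to 2 and returns cd itself
    for b in cd:        # inner loop: iter(cd) resets cd.n on the SAME object
        pairs.append((a, b))
print(pairs)            # [(2, 2), (2, 1), (2, 0)]

expected = [(a, b) for a in [2, 1, 0] for b in [2, 1, 0]]
print(len(expected))    # 9 pairs, as nested loops over a list produce
```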


Let's look at a more elaborate example. The following class stores and processes
histograms. For simplicity we will assume it processes percentages (ints from
0 to 100) and places them in 10 bins: 0-9, 10-19, 20-29, ... 80-89, 90-100;
note that the last bin really represents 11 values, while all the others
represent 10 values. Of course we will focus on how to accomplish iteration
for objects of this class (iterating over the counts in their bins) but there
are other interesting aspects about this class that we will discuss first (and
we could always generalize or add methods to make this class even more
powerful).
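The bin layout described above can be checked with a couple of small functions. `bin_index` and `bin_label` are hypothetical helpers written for this sketch; the class below uses the same `p//10 if p<100 else 9` expression inside its \_tally method. The last line confirms that the last bin covers 11 percentages and every other bin covers 10.

```python
def bin_index(p):
    """Hypothetical helper: the bin (0-9) for a percentage p with 0 <= p <= 100."""
    return p // 10 if p < 100 else 9   # 100 // 10 is 10, but 100 goes in bin 9

def bin_label(i):
    """Hypothetical helper: the percentages covered by bin i, as text."""
    low = 10 * i
    high = 100 if i == 9 else low + 9
    return str(low) + '-' + str(high)

for p in [0, 9, 10, 55, 99, 100]:
    print(p, '->', bin_label(bin_index(p)))

# How many of the 101 legal percentages land in each bin
sizes = [sum(1 for p in range(101) if bin_index(p) == b) for b in range(10)]
print(sizes)   # [10, 10, 10, 10, 10, 10, 10, 10, 10, 11]
```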

```python
class Percent_Histogram:
    def __init__(self,init_percents=()):   # tuple default avoids a shared mutable []
        self._histogram = 10*[0]    # [0,0,0,...,0,0] length 10, all 0s
        for p in init_percents:
            self.tally(p)
         
    # Called only when 0<=p<=100: 100//10 is 10 but 100 belongs in index 9
    def _tally(self,p):
        self._histogram[p//10 if p<100 else 9] += 1
    
    def clear(self):
        for i in range(10):         # could write: self._histogram = 10*[0]
            self._histogram[i] = 0

    # tally allows any number of arguments, collected into a tuple by *args
    def tally(self,*args):
        if len(args) == 0:
            raise IndexError('Percent_Histogram.tally: no value(s) to tally')
        for p in args:
            if 0 <= p <= 100:
                self._tally(p)
            else:
                raise IndexError('Percent_Histogram.tally: '+str(p)+' outside [0,100]')
                # Another approach would be to store/remember all tally failures

    # allow indexing for bins [0-9]
    # but can mutate these values only through __init__, clear, and tally
    # no __setitem__ defined
    def __getitem__(self,bin_num):
        if 0 <= bin_num <= 9:
            return self._histogram[bin_num]
        else:
            raise IndexError('Percent_Histogram.__getitem__: '+str(bin_num)+' outside [0,9]')

    # standard __iter__: defines a class with __init__/__next__ and returns
    #   an object from that class
    def __iter__(self):

        class PH_iter:
            def __init__(self,histogram):
                self._histogram = histogram          # sharing; sees mutation
                # self._histogram = list(histogram)  # copying; doesn't see it
                self._next = 0

            def __next__(self):
                if self._next == 10:
                    raise StopIteration
                answer = self._histogram[self._next]
                self._next += 1
                return answer

            def __iter__(self):
                return self

        return PH_iter(self._histogram)
            
    # To reconstruct a call to __init__ that reproduces the correct counts in
    #   the histogram, we supply the correct number of values, but all at the
    #   start of the bin: e.g., if bin 5 has 3 items, the repr has three 50s
    def __repr__(self):
        param = []
        for i in range(10):
            param += self[i]*[i*10]
        return 'Percent_Histogram('+str(param)+')'
    
    # a 2-dimensional display; do you understand the use of .format here?
    def __str__(self):
        return '\n'.join(['[{l: >2}-{h: >3}] | {s}'.format(l=10*i,
                                                           h=10*i+9 if i != 9 else 100,
                                                           s=self[i]*'*')
                          for i in range(10)])
```

0) The \__init__ method uses the idiom 10*[0], which creates a list of ten 0s. It then calls the tally method for each of the initial percentages.

1) The \_tally method is supposed to be called only by other methods defined in
this class. It does the actual work, putting a number from the range [0,100]
into the correct bin. It treats 100 specially: 100 belongs in bin 9, but p//10
would put it in bin 10, which doesn't exist. The last bin, 90-100, contains
11 values, while all the other bins (e.g., 30-39) contain 10. To work correctly,
this method assumes p is legal: 0 <= p <= 100.

2) The clear method sets each bin in the existing list to 0. We could have
allocated a new list instead, as shown in the comment, but zeroing out the
existing list avoids creating a new object. It also means that any PH_iter
currently sharing that list (see 5) sees the cleared values.

3) By using *args, the tally method can have any number (0 or more) of
positional arguments. All arguments are collected into a tuple that is iterated 
over to process each value individually. If there is not at least one value, or
any value is out of range, this method raises an exception. 

4) The \__getitem__ method allows us to index all the bins, 0-9 inclusive, of a
Percent_Histogram object. Note that we can set values in these bins (i.e., mutate
the list) only via \__init__, clear, and tally. So we call this information
read-only: we can read it but not write/change it (this class defines no
\__setitem__). Of course, Python actually allows us to access o._histogram, but
the leading underscore indicates that only methods in the class should refer to
the _histogram attribute.

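5) We use what is now our standard way to implement \__iter__. We define a local
class that defines \__next__ (and an \__iter__ that returns self), and then return
a new object from that class. Every call to \__iter__ creates a fresh PH_iter with
its own _next index, so several iterations over the same histogram, even nested
ones, do not interfere with each other. Countdown is different: its \__iter__
resets and returns the single shared object.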

6) The \__repr__ method doesn't know what numbers went into the bins! But we can
use the lowest number in each bin, repeated by the count in that bin, to specify
a list needed to construct an equivalent object (with the equivalent number of
values in each bin) with the constructor.

7) The \__str__ method returns a two-dimensional text plot of the histogram. Each row shows one bin's range followed by one * for each value counted in that bin.
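To see why returning a new iterator object from \__iter__ matters (point 5), the sketch below uses a hypothetical, stripped-down `Bins` class with the same local-class pattern as Percent_Histogram. Nested loops over one object now produce all 9 pairs, unlike the nested Countdown loops.

```python
class Bins:
    """Hypothetical, stripped-down class using the same __iter__ pattern as Percent_Histogram."""
    def __init__(self, counts):
        self._counts = list(counts)

    def __iter__(self):
        class Bins_iter:
            def __init__(self, counts):
                self._counts = counts
                self._next = 0            # each iterator keeps its own position

            def __next__(self):
                if self._next == len(self._counts):
                    raise StopIteration
                answer = self._counts[self._next]
                self._next += 1
                return answer

            def __iter__(self):
                return self

        return Bins_iter(self._counts)    # a NEW iterator object on every call


b = Bins([2, 1, 0])
pairs = [(x, y) for x in b for y in b]    # nested iteration over the same object
print(len(pairs))                         # 9: the inner loop no longer disturbs the outer
```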


```python
quiz1 = Percent_Histogram([50, 55, 70, 75, 85, 100])
quiz1.tally(20,30,95)
print(repr(quiz1))
for count in quiz1:
    print(count,end=' ')
print('\n',quiz1,sep='')
```

```
Percent_Histogram([20, 30, 50, 50, 70, 70, 80, 90, 90])
0 0 1 1 0 2 0 2 1 2 
[ 0-  9] | 
[10- 19] | 
[20- 29] | *
[30- 39] | *
[40- 49] | 
[50- 59] | **
[60- 69] | 
[70- 79] | **
[80- 89] | *
[90-100] | **
```


```python
for count in quiz1:
    print(count,end=' ')
    quiz1.tally(100)    # bin 9 grows during the loop; PH_iter shares _histogram
```

```
0 0 1 1 0 2 0 2 1 11
```
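The last count is 11 because PH_iter shares the histogram's list. Bin 9 started at 2, and by the time the iterator reads it, the loop has already tallied nine more 100s (2 + 9 = 11). The sketch below contrasts this with the copying alternative left commented out in PH_iter. `BinIter` and `run` are hypothetical names, and `copy=True` snapshots the list when the iterator is created, so later mutations are not seen.

```python
class BinIter:
    """Hypothetical iterator over a list of counts; copy=True snapshots the list."""
    def __init__(self, counts, copy=False):
        self._counts = list(counts) if copy else counts   # copying vs sharing
        self._next = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._next == len(self._counts):
            raise StopIteration
        answer = self._counts[self._next]
        self._next += 1
        return answer


def run(copy):
    counts = [0, 0, 1]
    seen = []
    for c in BinIter(counts, copy):
        seen.append(c)
        counts[-1] += 1        # mutate the last bin while iterating
    return seen

print(run(copy=False))         # [0, 0, 3]: the sharing iterator sees the updates
print(run(copy=True))          # [0, 0, 1]: the copying iterator does not
```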