### The iterator protocol
- A protocol is simply a fancy way of saying that our class is going to implement certain functionality that Python can count on.
- To let Python know that our class can be iterated over by repeatedly calling **\__next__**, we implement the iterator protocol.
- The iterator protocol is quite simple - the class needs to implement two methods:
    * **\__iter\__** - this method should just return the object (class instance) itself.
    * **\__next\__** - this method is responsible for handing back the next element from the collection and raising the **StopIteration** exception when all elements have been handed out. 
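
These two methods are what a `for` loop relies on behind the scenes. A minimal sketch of the equivalent manual loop, using a plain list as a stand-in (the names `numbers` and `collected` are only for illustration):

```python
numbers = [10, 20, 30]

iterator = iter(numbers)       # calls numbers.__iter__()
collected = []
while True:
    try:
        item = next(iterator)  # calls iterator.__next__()
    except StopIteration:
        break                  # a for loop handles this exception silently
    collected.append(item)

print(collected)  # [10, 20, 30]
```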

### Iterable
- An iterable is a Python object that implements the iterable protocol.
- The iterable protocol requires that the object implement a single method - **\__iter__**
- **\__iter__** returns a new instance of the iterator object - used to iterate over the iterable

In [2]:
class Squares:

    def __init__(self, length):
        self.length = length
        self.i = 0
    
    def __next__(self):
        print('__next__ called')
        if self.i >= self.length:
            raise StopIteration
        else:
            result = self.i**2
            self.i +=1
            return result
    
    def __iter__(self):
        print('__iter__ called')
        return self

In [3]:
sq = Squares(3)
for item in sq:
    print(item)

__iter__ called
__next__ called
0
__next__ called
1
__next__ called
4
__next__ called


**The drawback to iterators is that they get exhausted, which means that they become useless for iterating again.**
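
A short sketch of what exhaustion looks like in practice. `Countdown` is a hypothetical iterator written only for this illustration; the second pass over the same object yields nothing because its internal state is never reset.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown = Countdown(3)
first_pass = list(countdown)
second_pass = list(countdown)  # same object, already exhausted

print(first_pass)   # [3, 2, 1]
print(second_pass)  # []
```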

**Solution --> Separate the collection from the iterator**
- The collection is iterable
- The iterator is responsible for iterating over the collection
<br><br>


### Iterator
- An iterator is an object that implements **\__iter__** (which returns the object itself) and **\__next__**.


In [9]:
class Cities:
    def __init__(self):
        self._cities = ['Paris', 'Berlin', 'Rome', 'Madrid', 'London']
    
    def __len__(self):
        return len(self._cities)


class CityIterator:
    def __init__(self, city_obj):
        self._city_obj = city_obj
        self._index = 0
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._index >= len(self._city_obj):  # use len(self._city_obj._cities) if Cities does not implement __len__
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item


In [8]:
cities = Cities()
city_list = [city for city in CityIterator(cities)]
print(city_list)

['Paris', 'Berlin', 'Rome', 'Madrid', 'London']


**This works fine but we would like to be able to iterate over the cities directly**

We do this by adding a **\__iter__** method to the Cities class that returns a new CityIterator instance each time it is called.

In [13]:
class CityIterator:
    def __init__(self, city_obj):
        print('CityIterator new object created')
        self._city_obj = city_obj
        self._index = 0
    
    def __iter__(self):
        print('CityIterator __iter__ called')
        return self
    
    def __next__(self):
        print('CityIterator __next__ called')
        if self._index >= len(self._city_obj):
            raise StopIteration
        else:
            item = self._city_obj._cities[self._index]
            self._index += 1
            return item


class Cities:
    def __init__(self):
        self._cities = ['Paris', 'Berlin', 'Rome', 'Madrid', 'London']
    
    def __len__(self):
        return len(self._cities)
    
    # implementing the iterable protocol
    def __iter__(self):
        print('Cities __iter__ called')
        return CityIterator(self)

In [14]:
cities = Cities()
for city in cities:
    print(city)

Cities __iter__ called
CityIterator new object created
CityIterator __next__ called
Paris
CityIterator __next__ called
Berlin
CityIterator __next__ called
Rome
CityIterator __next__ called
Madrid
CityIterator __next__ called
London
CityIterator __next__ called


In [11]:
cities = Cities()
city_list = [city for city in cities]
print(city_list)

Cities __iter__ called
CityIterator new object created
CityIterator __next__ called
CityIterator __next__ called
CityIterator __next__ called
CityIterator __next__ called
CityIterator __next__ called
CityIterator __next__ called
['Paris', 'Berlin', 'Rome', 'Madrid', 'London']


**We can also put CityIterator inside the Cities class**

In [15]:
class Cities:
    def __init__(self):
        self._cities = ['Paris', 'Berlin', 'Rome', 'Madrid', 'London']
    
    def __len__(self):
        return len(self._cities)
    
    def __iter__(self):
        print('Cities __iter__ called')
        return self.CityIterator(self)
    
    class CityIterator:
        def __init__(self, city_obj):
            print('CityIterator new object created')
            self._city_obj = city_obj
            self._index = 0
        
        def __iter__(self):
            print('CityIterator __iter__ called')
            return self
        
        def __next__(self):
            print('CityIterator __next__ called')
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item

In [16]:
cities = Cities()
for city in cities:
    print(city)

Cities __iter__ called
CityIterator new object created
CityIterator __next__ called
Paris
CityIterator __next__ called
Berlin
CityIterator __next__ called
Rome
CityIterator __next__ called
Madrid
CityIterator __next__ called
London
CityIterator __next__ called


**Implementing the sequence protocol**

In [None]:
class Cities:
    def __init__(self):
        self._cities = ['Paris', 'Berlin', 'Rome', 'Madrid', 'London']
    
    def __len__(self):
        return len(self._cities)
    
    # implementing the iterable protocol
    def __iter__(self):
        print('Cities __iter__ called')
        return self.CityIterator(self)
    
    # implementing the sequence protocol
    def __getitem__(self, s):
        print('Cities __getitem__ called')
        return self._cities[s]

    
    class CityIterator:
        def __init__(self, city_obj):
            print('CityIterator new object created')
            self._city_obj = city_obj
            self._index = 0
        
        def __iter__(self):
            print('CityIterator __iter__ called')
            return self
        
        def __next__(self):
            print('CityIterator __next__ called')
            if self._index >= len(self._city_obj):
                raise StopIteration
            else:
                item = self._city_obj._cities[self._index]
                self._index += 1
                return item

Now we have two ways of looping over the items: the **iterable protocol** (**\__iter__**) and the **sequence protocol** (**\__getitem__**). Python first checks whether the object implements **\__iter__**, and if it does, it uses that. Only if **\__iter__** is not implemented does it fall back to the sequence protocol.

In [19]:
# Python prefers the __iter__ method over the __getitem__ method but will use __getitem__ if __iter__ is not available
cities = Cities()
for city in cities:
    print(city)

Cities __iter__ called
CityIterator new object created
CityIterator __next__ called
Paris
CityIterator __next__ called
Berlin
CityIterator __next__ called
Rome
CityIterator __next__ called
Madrid
CityIterator __next__ called
London
CityIterator __next__ called


## Consuming iterators manually

In [6]:
import pandas as pd
cars = pd.read_csv('../../../../01-datasets/python-deep-dive/cars.csv', sep=';')
cars.head()

Unnamed: 0,Car,MPG,Cylinders,Displacement,Horsepower,Weight,Acceleration,Model,Origin
0,STRING,DOUBLE,INT,DOUBLE,DOUBLE,DOUBLE,DOUBLE,INT,CAT
1,Chevrolet Chevelle Malibu,18.0,8,307.0,130.0,3504.,12.0,70,US
2,Buick Skylark 320,15.0,8,350.0,165.0,3693.,11.5,70,US
3,Plymouth Satellite,18.0,8,318.0,150.0,3436.,11.0,70,US
4,AMC Rebel SST,16.0,8,304.0,150.0,3433.,12.0,70,US


In [26]:
from collections import namedtuple

# Goal: store each line of the text file as a namedtuple, with every field cast to the correct data type

def cast(data_type, value):
    if data_type == 'DOUBLE':
        return float(value)
    elif data_type == 'INT':
        return int(value)
    else:
        return str(value)


def cast_row(data_types, data_row):
    return [cast(data_type, value)
            for data_type, value
            in zip(data_types, data_row)]


cars = []
file = '../../../datasets/cars.csv'
with open(file) as f:
    # create an iterator
    file_iter = iter(f)
    # we iterate through the file manually by using the next function
    headers = next(file_iter).strip('\n').split(';')  
    data_types = next(file_iter).strip('\n').split(';')
    Car = namedtuple('Car', headers)
    
    for line in file_iter:
        # remove the line endings and the separators
        data = line.strip('\n').split(';')
        data = cast_row(data_types, data)
        car = Car(*data)
        cars.append(car)


# The above can be written as a comprehension
with open(file) as f:
    file_iter = iter(f)
    # we iterate through the file manually by using the next function
    headers = next(file_iter).strip('\n').split(';')  
    data_types = next(file_iter).strip('\n').split(';')
    Car = namedtuple('Car', headers)

    # cast_row returns the value as the correct data type (int, float, str)
    cars_data = [cast_row(data_types, line.strip('\n').split(';')) for line in file_iter]
    # populates the namedtuple with the correct value in the correct format
    cars = [Car(*car) for car in cars_data]

In [27]:
cars[:3]

[Car(Car='Chevrolet Chevelle Malibu', MPG=18.0, Cylinders=8, Displacement=307.0, Horsepower=130.0, Weight=3504.0, Acceleration=12.0, Model=70, Origin='US'),
 Car(Car='Buick Skylark 320', MPG=15.0, Cylinders=8, Displacement=350.0, Horsepower=165.0, Weight=3693.0, Acceleration=11.5, Model=70, Origin='US'),
 Car(Car='Plymouth Satellite', MPG=18.0, Cylinders=8, Displacement=318.0, Horsepower=150.0, Weight=3436.0, Acceleration=11.0, Model=70, Origin='US')]
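
The key point in the cell above is that `next()` and the `for` loop share the same iterator, so the loop picks up where the manual calls left off. A minimal sketch of just that behaviour, using an in-memory `io.StringIO` with made-up contents as a stand-in for the file:

```python
import io

# Hypothetical in-memory stand-in for a small semicolon-separated file
sample = io.StringIO('Car;MPG\nSTRING;DOUBLE\nBuick;15.0\nAMC;16.0\n')

file_iter = iter(sample)
# consume the two header lines manually
headers = next(file_iter).strip('\n').split(';')
data_types = next(file_iter).strip('\n').split(';')
# the comprehension continues from the third line
rows = [line.strip('\n').split(';') for line in file_iter]

print(headers)     # ['Car', 'MPG']
print(data_types)  # ['STRING', 'DOUBLE']
print(rows)        # [['Buick', '15.0'], ['AMC', '16.0']]
```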

## Cyclic iterators

In [34]:
# Infinite cyclic iterator for sequence types only
class CyclicIterator:
    def __init__(self, lst):
        self.lst = lst
        self.i = 0
    
    def __iter__(self):
        return self
    
    def __next__(self):
        result = self.lst[self.i % len(self.lst)] # 0%4=0, 1%4=1, 2%4=2, 3%4=3, 4%4=0, 5%4=1 and so on
        self.i += 1
        return result
    
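    # Alternative, but less elegant. It is left commented out because a second
    # __next__ definition in the class body would override the one above.
    # def __next__(self):
    #     result = self.lst[self.i]
    #     self.i = 0 if self.i == len(self.lst) - 1 else (self.i + 1)
    #     return result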

In [36]:
numbers = range(1, 11)
iter_cycle = CyclicIterator('NSWE')
zipped = zip(list(numbers), iter_cycle)
[(str(number) + direction) for number, direction in zipped]

['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']

In [37]:
# Alternatively
n = 10
iter_cycl = CyclicIterator('NSWE')
[f'{i}{next(iter_cycl)}' for i in range(1, n+1)]

# we can do the same using itertools
import itertools
n = 10
iter_cycle = itertools.cycle('NSWE')
[f'{i}{next(iter_cycle)}' for i in range(1, n+1)]

['1N', '2S', '3W', '4E', '5N', '6S', '7W', '8E', '9N', '10S']

In [None]:
# A cyclic iterator for any iterable, not just a sequence type
class CyclicIterator:
    def __init__(self, iterable):
        self.iterable = iterable
        self.iterator = iter(self.iterable)
    
    def __iter__(self):
        return self
    
    def __next__(self):
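        try:
            item = next(self.iterator)
        except StopIteration:
            # exhausted: request a fresh iterator and start over
            # (assumes iter() returns a new iterator on each call, i.e. the
            # iterable is not itself a one-shot iterator)
            self.iterator = iter(self.iterable)
            item = next(self.iterator)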
        return item
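
One caveat of the class above: it restarts by calling `iter()` on the stored iterable, which only helps if that call produces a fresh iterator. If the argument is itself an iterator, there is nothing to restart and the cycle simply ends. `itertools.cycle` avoids this by saving the items during the first pass. A short sketch with a one-shot iterator:

```python
import itertools

# A one-shot iterator: calling iter() on it returns the same object
one_shot = iter('NSWE')

# itertools.cycle keeps a copy of each item it sees on the first pass
cycler = itertools.cycle(one_shot)
first_six = [next(cycler) for _ in range(6)]

print(first_six)  # ['N', 'S', 'W', 'E', 'N', 'S']
```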

## Lazy iterators

In [3]:
import math

class Circle:
    def __init__(self, r):
        self.radius = r
        self._area = None
    
    @property
    def radius(self):
        return self._radius
    
    @radius.setter
    def radius(self, r):
        self._radius = r
        # set the area to None to force area to be
        # calculated next time the area is requested
        self._area = None
    
    @property
    def area(self): 
        # if area is None it has either never been requested,
        # or the radius has changed since we last requested it
        if self._area is None:
            self._area = math.pi * (self.radius ** 2)
        return self._area

In [41]:
import math
# infinite iterator of factorials
class Factorials:
    
    def __iter__(self):
        return self.FactIter()
    
    class FactIter:
        def __init__(self):
            self.i = 0
        
        def __iter__(self):
            return self
        
        def __next__(self):
            result = math.factorial(self.i)
            self.i += 1
            return result

In [44]:
facts = Factorials()
fact_iter = iter(facts)
print(next(fact_iter))
print(next(fact_iter))
print(next(fact_iter))
print(next(fact_iter))

1
1
2
6
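
Because the iterator is infinite, calling `list()` on it would never finish. One way to take a bounded slice is `itertools.islice`. The sketch below builds an equivalent lazy stream from `map` and `itertools.count` rather than the class above, so that it runs on its own:

```python
import itertools
import math

# Lazy, infinite stream of factorials: nothing is computed until requested
factorials = map(math.factorial, itertools.count())

# Take only the first six values
first_six = list(itertools.islice(factorials, 6))

print(first_six)  # [1, 1, 2, 6, 24, 120]
```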


## Built-in iterables and iterators
- range() --> iterable
- zip() --> iterator
- enumerate() --> iterator
- open() --> iterator
- dictionary keys() --> iterable
- dictionary values() --> iterable
- dictionary items() --> iterable
<br><br>

- We can check if an object is an iterable or an iterator by checking if it has **\_\_iter__** or **\_\_next__** methods. 
- If it only has **\_\_iter__** then it is an iterable, and if it has both it is an iterator.
- We can consume an iterable more than once, while we can only consume an iterator once. 

Another test for whether an object is an iterator is to ask Python if **iter(object) is object**. If the expression evaluates to True, the object is an iterator.
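
These checks can be sketched with the abstract base classes in `collections.abc`, which test for the same methods. The helper `describe` is a hypothetical name used only here. Note that `Iterable` looks for **\__iter__** only, so it does not recognise objects that are iterable purely through **\__getitem__**.

```python
from collections.abc import Iterable, Iterator


def describe(obj):
    """Hypothetical helper: classify obj as 'iterator', 'iterable' or 'neither'."""
    if isinstance(obj, Iterator):   # has both __iter__ and __next__
        return 'iterator'
    if isinstance(obj, Iterable):   # has __iter__ only
        return 'iterable'
    return 'neither'


print(describe(range(3)))         # iterable
print(describe(zip('ab', 'cd')))  # iterator
print(describe({'a': 1}.keys()))  # iterable
print(describe(42))               # neither

# The identity test: an iterator returns itself from iter()
numbers = [1, 2, 3]
numbers_iter = iter(numbers)
print(iter(numbers) is numbers)            # False
print(iter(numbers_iter) is numbers_iter)  # True
```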

## Sorting iterables

In [20]:
import random

class RandomInts:
    def __init__(self, length, *, seed=0, lower=0, upper=10):
        self.length = length
        self.seed = seed
        self.lower = lower
        self.upper = upper
    
    def __len__(self):
        return self.length
    
    def __iter__(self):
        return self.RandomIterator(self.length,
                                   seed=self.seed,
                                   lower=self.lower, 
                                   upper=self.upper)
    
    class RandomIterator:
        def __init__(self, length, *, seed, lower, upper):
            self.length = length
            self.lower = lower
            self.upper = upper
            self.i = 0
            random.seed(seed)
        
        def __iter__(self):
            return self
        
        def __next__(self):
            if self.i >= self.length:
                raise StopIteration
            else:
                result = random.randint(self.lower, self.upper)
                self.i += 1
                return result

randoms = RandomInts(10)
print('Random list: ', list(randoms))
print('Sorted random list: ', sorted(randoms))

Random list:  [6, 6, 0, 4, 8, 7, 6, 4, 7, 5]
Sorted random list:  [0, 4, 4, 5, 6, 6, 6, 7, 7, 8]


## Iter-function
- **\_\_getitem__** supports **Sequences**
- **\_\_iter__** supports **iterables**
- **\_\_next__** supports **iterators**

<br>

When **iter()** is called on an object, Python first looks for an **\_\_iter__** method. If it finds one, it uses it. If it does not find one, it looks for a **\__getitem__** method. If it finds one, it creates an iterator object that calls **\__getitem__** with the indices 0, 1, 2, ... until an **IndexError** is raised, and returns that iterator. If it finds neither method, it raises a **TypeError** exception.
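
A small sketch of the last two cases. Both classes are hypothetical and exist only to show the fallback and the failure:

```python
class OnlyGetItem:
    """Hypothetical class implementing just the sequence-style __getitem__."""

    def __getitem__(self, index):
        return 'abc'[index]  # raises IndexError once index reaches 3


class NoProtocol:
    """Hypothetical class with neither __iter__ nor __getitem__."""


# No __iter__, so iter() falls back to __getitem__
fallback_iter = iter(OnlyGetItem())
letters = list(fallback_iter)
print(letters)  # ['a', 'b', 'c']

# Neither method is available, so iter() raises TypeError
try:
    iter(NoProtocol())
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```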


In [50]:
# A very simple example of the functionality of the iter function
class SequenceIterator:
    def __init__(self, sequence):
        self._sequence = sequence
        self._i = 0
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self._i >= len(self._sequence):
            raise StopIteration
        else:
            result = self._sequence[self._i]
            self._i += 1
            return result
        
my_list = [1, 2, 3, 4]
seq_iterator = SequenceIterator(my_list)
for i in seq_iterator:
    print(i)

1
2
3
4


## Iterating callables

In [53]:
def counter():
    i = 0
    
    def inc():
        nonlocal i
        i += 1
        return i
    return inc


class CallableIterator:
    def __init__(self, callable_, sentinel):
        self.callable_ = callable_
        self.sentinel = sentinel
        self.is_consumed = False
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.is_consumed:
            raise StopIteration
        else:
            result = self.callable_()
            if result == self.sentinel:
                self.is_consumed = True
                raise StopIteration
            else:
                return result
            
cnt = counter()
cnt_iter = CallableIterator(cnt, 5)
for c in cnt_iter:
    print(c)

1
2
3
4


In [58]:
# we can use the builtin iter function to do the same thing
cnt = counter()
cnt_iter = iter(cnt, 3)
print(next(cnt_iter))
print(next(cnt_iter))

1
2


In [62]:
# an example with a lambda func that returns random values
import random
min_val, max_val, sentinel = 0, 10, 8

random_iter = iter(lambda: random.randint(min_val, max_val), sentinel)
random.seed(0)
for num in random_iter:
    print(num)

6
6
0
4
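
A common practical use of the two-argument form of `iter()` is reading a stream in fixed-size chunks until an empty read signals the end. The sketch below uses an in-memory `io.BytesIO` object as a hypothetical stand-in for a real file; the chunk size of 4 is arbitrary.

```python
import io
from functools import partial

# Hypothetical in-memory stream standing in for a binary file
stream = io.BytesIO(b'abcdefghij')

# Call stream.read(4) repeatedly; stop when it returns b'' (the sentinel)
chunks = list(iter(partial(stream.read, 4), b''))

print(chunks)  # [b'abcd', b'efgh', b'ij']
```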


## Reverse iteration

In [65]:
from collections import namedtuple

_SUITS = ('Spades', 'Hearts', 'Diamonds', 'Clubs')
_RANKS = tuple(range(2, 11)) + tuple('JQKA')

Card = namedtuple('Card', 'rank suit')

class CardDeck:
    
    def __init__(self):
        self.length = len(_SUITS) * len(_RANKS)
    
    def __len__(self):
        return self.length
    
    def __iter__(self):
        return self.CardDeckIterator(self.length)
    
    def __reversed__(self):
        # defining __reversed__ lets us call reversed() on the instance.
        # The iterator itself must know how to walk the deck backwards,
        # which is what the reverse flag is for.
        return self.CardDeckIterator(self.length, reverse=True)
    
    class CardDeckIterator:
        
        def __init__(self, length, reverse=False):
            self.length = length
            self.i = 0
            self.reverse = reverse
        
        def __iter__(self):
            return self
        
        def __next__(self):
            if self.i >= self.length:
                raise StopIteration
            else:
                if self.reverse:
                    index = self.length - 1 - self.i
                else:
                    index = self.i                
                suit = _SUITS[index // len(_RANKS)]
                rank = _RANKS[index % len(_RANKS)]
                self.i += 1
                return Card(rank, suit)

            
deck = reversed(CardDeck())
cards = [card for card in deck]
cards[:5]

[Card(rank='A', suit='Clubs'),
 Card(rank='K', suit='Clubs'),
 Card(rank='Q', suit='Clubs'),
 Card(rank='J', suit='Clubs'),
 Card(rank=10, suit='Clubs')]

## An example with a sequence
The **reversed()** function works on sequence types by default, as long as both the **\__getitem__** and **\__len__** methods are implemented.

In [66]:
class Squares:
    def __init__(self, length):
        self.squares = [i **2 for i in range(length)]
    
    # __len__ (together with __getitem__) enables us to use the reversed function
    def __len__(self):
        return len(self.squares)
    
    def __getitem__(self, s):
        return self.squares[s]
    
for num in reversed(Squares(5)):
    print(num)

16
9
4
1
0
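
To see why **\__len__** matters, here is a sketch with a hypothetical variant of the class that omits it. Forward iteration still works through the **\__getitem__** fallback, but `reversed()` has no way of knowing where the sequence ends.

```python
class SquaresNoLen:
    """Hypothetical variant of Squares without __len__."""

    def __init__(self, length):
        self.squares = [i ** 2 for i in range(length)]

    def __getitem__(self, s):
        return self.squares[s]


# Forward iteration stops when __getitem__ raises IndexError
forward = list(SquaresNoLen(3))
print(forward)  # [0, 1, 4]

# reversed() needs the length to find the last index, so it fails here
try:
    reversed(SquaresNoLen(3))
    reversible = True
except TypeError:
    reversible = False
print(reversible)  # False
```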
