# Lecture 13

### Monday, October 23rd 2017

## Last time:
* Data structures motivation
* Abstract data types
* Sequences
* Linked lists

## This time:
* Iterators and Iterables
* Trees, B-trees, and BSTs

# From pointers to iterators

One can simply follow the `next` pointers to the next **position** in a linked list. 

This suggests an abstraction of the **position** to an **iterator**.

Such an abstraction allows us to treat arrays and linked lists with an identical interface.

The salient points of this abstraction are:
- The notion of a `next`, which abstracts away the actual gymnastics of where to go next in a storage system.
- The notion of a `first` and a `last` position: `next` takes us on a journey from the `first` to the `last`.
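A minimal sketch of this abstraction follows. The `Node`, `LinkedList`, `LinkedListIterator` and `total` names are hypothetical, not from the lecture. The iterator starts at the `first` node, follows `next` pointers, and stops once it has moved past the `last` node. A function written only against iteration then works unchanged on a Python list and on the linked list.

```python
class Node:
    """Hypothetical linked-list node: a value plus a `next` pointer."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedListIterator:
    """Walks the list from the first node by following `next` pointers."""
    def __init__(self, head):
        self.node = head             # current position, starting at 'first'

    def __next__(self):
        if self.node is None:        # we have moved past 'last'
            raise StopIteration()
        value = self.node.value
        self.node = self.node.next   # the pointer gymnastics are hidden here
        return value

    def __iter__(self):
        return self


class LinkedList:
    """Hypothetical linked list that exposes its positions only through iteration."""
    def __init__(self, values):
        self.head = None
        for v in reversed(values):   # build back to front so head holds values[0]
            self.head = Node(v, self.head)

    def __iter__(self):
        return LinkedListIterator(self.head)


def total(items):
    """Uses only iteration, so it never needs indexing or a length."""
    result = 0
    for x in items:
        result += x
    return result


print(total([1, 2, 3]))              # array-backed Python list -> 6
print(total(LinkedList([1, 2, 3])))  # linked list, same client code -> 6
```

The caller never sees `Node` or `head`. Swapping the storage changes only which iterator `__iter__` returns.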

We already implemented the sequence protocol.

Now we suggest an additional abstraction that is more fundamental than the notion of a sequence: the **iterable**.

# Iterators and Iterables in `Python`

Just as a sequence is something implementing `__getitem__` and `__len__`, an **iterable** is something implementing `__iter__`. 

`__len__` is not needed and indeed may not make sense.
```python
len(open('fname.txt'))  # TypeError: a file object is iterable but has no length
```
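A small sketch of the same point that runs without a file on disk. It uses `io.StringIO` as a stand-in for an open text file (an assumption for illustration). Asking for the length fails, but iterating line by line works.

```python
import io

# io.StringIO stands in for an open text file, so no file on disk is needed.
f = io.StringIO("first line\nsecond line\n")

try:
    len(f)                      # file-like objects do not implement __len__
except TypeError as err:
    print("len() failed:", err)

# Iteration still works: the object yields one line at a time.
for line in f:
    print(line.rstrip("\n"))
```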

Example `14-1` in `Fluent Python` defines a `Sentence` sequence and shows how it can be iterated over.

```python
import reprlib

class Sentence:
    def __init__(self, text):
        self.text = text
        self.words = text.split()

    def __getitem__(self, index):
        return self.words[index]

    def __len__(self):
        # Completes the sequence protocol, but is not needed for an iterable
        return len(self.words)

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
```

```python
# Sequence
s = Sentence("Dogs will save the world.")
print(len(s), "   ", s[3], "   ", s)
```

```
5     the     Sentence('Dogs will save the world.')
```


```python
min(s), max(s)
```

```
('Dogs', 'world.')
```

```python
list(s)
```

```
['Dogs', 'will', 'save', 'the', 'world.']
```

To iterate over an object `x`, `Python` automatically calls `iter(x)`, which in turn looks for `x.__iter__`.

An **iterable** is something which, when `iter` is called on it, returns an **iterator**. `iter(x)` proceeds as follows:

(1) If `__iter__` is defined, it is called to obtain an iterator.

(2) If not, but `__getitem__` is defined, `Python` builds an iterator that calls `__getitem__` with indices starting from `0` until an `IndexError` is raised.

(3) If neither `__iter__` nor `__getitem__` is defined, `iter` raises a `TypeError`.

Any `Python` sequence is iterable because sequences implement `__getitem__`. The standard sequences also implement `__iter__`; for future-proofing, yours should too, because (2) might be deprecated in a future version of `Python`.
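The three rules can be checked directly. The sketch below uses two hypothetical classes, `OnlyGetitem` (rule 2) and `Neither` (rule 3). It also shows one consequence worth knowing: `collections.abc.Iterable` only looks for `__iter__`, so it reports `False` for a class that is iterable purely through `__getitem__`. Calling `iter(x)` is the more reliable test.

```python
from collections.abc import Iterable


class OnlyGetitem:
    """Hypothetical class with __getitem__ but no __iter__."""
    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        return self.data[index]  # IndexError past the end stops the iteration


class Neither:
    """Hypothetical class with neither __iter__ nor __getitem__."""


print(list(OnlyGetitem("abc")))                  # rule (2): ['a', 'b', 'c']
print(isinstance(OnlyGetitem("abc"), Iterable))  # False: the ABC only checks __iter__

try:
    iter(Neither())                              # rule (3)
except TypeError as err:
    print("TypeError:", err)
```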

We know that `for` operates on iterables:

```python
for i in s:
    print(i)
```

```
Dogs
will
save
the
world.
```


What's actually going on here?

```python
it = iter(s)  # Build an iterator from an iterable
while True:
    try:
        nextval = next(it)  # Get the next item in the iterator
        print(nextval)
    except StopIteration:
        del it  # Iterator is exhausted. Release reference and discard.
        break
```

```
Dogs
will
save
the
world.
```


We can completely abstract away a sequence in favor of an iterable (i.e. we don't need to support indexing anymore).

Example `14-4` in `Fluent Python`:

```python
class SentenceIterator:  # has __next__ and __iter__
    def __init__(self, words):
        self.words = words
        self.index = 0

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word

    def __iter__(self):
        return self

class Sentence:  # an iterable because it has __iter__
    def __init__(self, text):
        self.text = text
        self.words = text.split()

    def __iter__(self):
        return SentenceIterator(self.words)  # returns a new iterator instance

    def __repr__(self):
        return 'Sentence(%s)' % reprlib.repr(self.text)
```

```python
s2 = Sentence("What is data science?")
```

```python
for i in s2:
    print(i)
```

```
What
is
data
science?
```


```python
s2it = iter(s2)          # Get an iterator from the iterable
print(next(s2it))        # Get the next entry
s2it2 = iter(s2)         # A second, independent iterator that starts from the beginning
next(s2it), next(s2it2)  # Advance each iterator once
```

```
What
('is', 'What')
```

While we could have implemented `__next__` in `Sentence` itself, making it an iterator, we would then run into the problem of "exhausting an iterator".

The iterator above keeps state in `self.index`, so to re-iterate we must be able to start anew by creating a new instance. Thus the iterable's `__iter__` simply returns a new `SentenceIterator` each time it is called.
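To see the problem concretely, here is a hypothetical variant, `SelfIteratingSentence` (not from the lecture or the book), that is its own iterator. Because `__iter__` returns `self`, the position stored in `self.index` is shared by every loop. A second pass therefore finds nothing left.

```python
class SelfIteratingSentence:
    """Hypothetical anti-pattern: the iterable is also its own iterator."""
    def __init__(self, text):
        self.words = text.split()
        self.index = 0           # iteration state lives on the object itself

    def __iter__(self):
        return self              # no fresh iterator is ever created

    def __next__(self):
        try:
            word = self.words[self.index]
        except IndexError:
            raise StopIteration()
        self.index += 1
        return word


bad = SelfIteratingSentence("What is data science?")
print(list(bad))  # ['What', 'is', 'data', 'science?']
print(list(bad))  # [] because the shared index is already past the end
```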

From `Fluent Python` ("Sentence Take #2:  A Classic Iterator"):
> A common cause of errors in building iterables and iterators is to confuse the two. To be clear: iterables have an `__iter__` method that instantiates a new iterator every time. Iterators implement a `__next__` method that returns individual items, and an `__iter__` method that returns self.
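The distinction in the quote can be checked on a built-in type. This sketch uses a plain `list` (an iterable) and the iterator it hands out, together with the standard `collections.abc` classes.

```python
from collections.abc import Iterable, Iterator

words = ["What", "is", "data", "science?"]  # a list is an iterable, not an iterator
it = iter(words)                           # a list_iterator

print(isinstance(words, Iterable), isinstance(words, Iterator))  # True False
print(isinstance(it, Iterable), isinstance(it, Iterator))        # True True

# An iterable instantiates a new iterator on every call...
print(iter(words) is iter(words))  # False
# ...while an iterator's __iter__ returns itself.
print(iter(it) is it)              # True
```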

`min()` and `max()` also work even though we no longer satisfy the sequence protocol.

`min` and `max` are pairwise comparisons and can be handled via iteration.
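A minimal sketch of how such a minimum can be computed with nothing but `iter` and `next`. The function `iter_min` is hypothetical; it mirrors the behavior of the built-in `min` for a single iterable argument but is not its actual implementation. It seeds the result with the first item, then does one pairwise comparison per remaining item.

```python
def iter_min(iterable):
    """Sketch of min() using only iteration: no len(), no indexing."""
    it = iter(iterable)
    try:
        smallest = next(it)      # the first item seeds the comparison
    except StopIteration:
        raise ValueError("iter_min() arg is an empty iterable")
    for item in it:              # continue with the remaining items
        if item < smallest:      # one pairwise comparison per item
            smallest = item
    return smallest


print(iter_min(["What", "is", "data", "science?"]))  # What
```

Uppercase letters sort before lowercase ones in string comparison, which is why `'What'` is the minimum here, matching the `min(s2)` output of the `Sentence` example.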

The take-home message is that when programming with iterators, many algorithms need neither the length nor indexing: we have abstracted these away.

```python
min(s2), max(s2)
```

```
('What', 'science?')
```

# Trees

A tree is:

- a hierarchical data structure that has a bunch of items,
- each of which may have a value
- some of which may point to other such items, and some of which don't (leaf nodes)
- each item is pointed to by exactly one other item, with the sole exception of the root.

Trees arise everywhere:

- in parsing of code
- evolutionary trees in biology
- language origin trees
- unix file system
- html tags on this page

Just like with lists, one can consider looking at a tree in two ways: a collection of nodes, or a tree with a root and multiple sub-trees.

Once again, one can represent trees using the recursive data structures we used to represent linked lists (from cs61a):

![](http://wla.berkeley.edu/~cs61a/fa11/lectures/img/tree.png)
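A sketch of the "root with sub-trees" view in code. It assumes a nested-tuple representation in which every internal node is a tuple of sub-trees and every leaf is a plain value; that convention, and the names `is_leaf` and `count_leaves`, are illustrative assumptions. Counting leaves then recurses exactly the way a recursive linked-list function would.

```python
def is_leaf(tree):
    """In this representation, anything that is not a tuple is a leaf."""
    return not isinstance(tree, tuple)


def count_leaves(tree):
    """A tree's leaf count is the sum of the leaf counts of its sub-trees."""
    if is_leaf(tree):
        return 1
    return sum(count_leaves(branch) for branch in tree)


t = ((1, 2), (3, 4), 5)   # a root with three sub-trees; the third is a single leaf
print(count_leaves(t))    # 5
```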



You could also use a tree in which every node itself holds data. This is often used to represent a binary tree.

```python
class Tree:  # from cs61a
    def __init__(self, data, left=None, right=None):
        self.entry = data
        self.left = left
        self.right = right

    def __repr__(self):
        args = repr(self.entry)
        if self.left or self.right:
            args += ', {0}, {1}'.format(repr(self.left), repr(self.right))
        return 'Tree({0})'.format(args)

Tree(1, Tree(2), Tree(3, Tree(4)))
```

```
Tree(1, Tree(2), Tree(3, Tree(4), None))
```
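The recursive structure of this `Tree` carries over directly to recursive functions. The sketch below repeats the constructor of the `Tree` class above so it runs on its own (the `__repr__` is omitted). It adds two hypothetical helpers: `size`, which counts nodes, and `height`, which counts edges on the longest root-to-leaf path. Treating an empty tree (`None`) as having height `-1` is one common convention, assumed here.

```python
class Tree:
    """Same fields as the cs61a-style Tree above (entry, left, right)."""
    def __init__(self, data, left=None, right=None):
        self.entry = data
        self.left = left
        self.right = right


def size(tree):
    """Number of nodes: the root plus the sizes of both sub-trees (None is empty)."""
    if tree is None:
        return 0
    return 1 + size(tree.left) + size(tree.right)


def height(tree):
    """Edges on the longest root-to-leaf path; -1 for the empty tree by convention."""
    if tree is None:
        return -1
    return 1 + max(height(tree.left), height(tree.right))


t = Tree(1, Tree(2), Tree(3, Tree(4)))
print(size(t), height(t))  # 4 2
```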

Once we do iteration in more detail, we'll talk about traversal mechanisms!