# Iterators vs. Lists

In [1]:
# Create a list and an iterator for illustration purposes

# Don't imitate this code -- I don't know of any reason 
# other than illustration to cast a range to `list` or to `iterator`
my_list = list(range(5))
my_iterator = iter(range(5))

Python 3 changed a lot of built-in Python functions and methods so that they return lazy objects rather than lists: `zip` and `map` return iterators, while `dict.keys` and `dict.items` return view objects that do not copy the underlying data. There are good performance reasons to use iterators, but they are a little tricky to work with at first. Let's see how they differ from lists.
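
As a quick check of which built-ins hand back true iterators, here is a minimal sketch. It relies on the fact that calling `iter()` on an iterator returns the same object, whereas calling it on any other iterable returns a new iterator. The variable names are just for illustration.

```python
# Build a few of the objects that Python 3 returns instead of lists.
pairs = zip([1, 2], ["a", "b"])
squares = map(lambda x: x * x, [1, 2, 3])
keys = {"a": 1, "b": 2}.keys()

# An iterator returns itself from iter(); other iterables return a new iterator.
zip_is_iterator = iter(pairs) is pairs
map_is_iterator = iter(squares) is squares
keys_is_iterator = iter(keys) is keys

print(zip_is_iterator, map_is_iterator, keys_is_iterator)  # True True False
print(type(keys).__name__)  # dict_keys
```

So `zip` and `map` objects are consumed as you iterate, while a `dict_keys` view can be looped over repeatedly.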

## Iteration

We can iterate through a list as many times as we want. The list abides.

In [2]:
print('Pass 1')
for item in my_list:
    print(item)

print('\nPass 2')
for item in my_list:
    print(item)

print('\nPass 3')
for item in my_list:
    print(item)

Pass 1
0
1
2
3
4

Pass 2
0
1
2
3
4

Pass 3
0
1
2
3
4


We can only iterate through an iterator once -- it gets "consumed."

In [3]:
print('Pass 1')
for item in my_iterator:
    print(item)

print('\nPass 2')
for item in my_iterator:
    print(item)

print('\nPass 3')
for item in my_iterator:
    print(item)

Pass 1
0
1
2
3
4

Pass 2

Pass 3


In [4]:
# Recreate the iterator
my_iterator = iter(range(5))
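
To see what "consumed" means under the hood, the sketch below pulls values out by hand with the built-in `next`. It uses a separate, illustrative `demo_iterator` so that `my_iterator` stays fresh for the next section.

```python
demo_iterator = iter(range(3))  # separate from `my_iterator`

# Each call to next() hands back one value and advances the iterator.
first = next(demo_iterator)
second = next(demo_iterator)
third = next(demo_iterator)

# Once the values run out, next() raises StopIteration.
try:
    next(demo_iterator)
    exhausted = False
except StopIteration:
    exhausted = True

# Nothing is left for any later pass.
leftover = list(demo_iterator)
print(first, second, third, exhausted, leftover)  # 0 1 2 True []
```

A `for` loop does the same thing: it calls `next` until it sees `StopIteration`, which is why the second and third passes above print nothing.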

## Slicing

We can index or slice into a list.

In [5]:
my_list[3]

3

We cannot index or slice into an iterator, at least not in the same way.

In [6]:
my_iterator[3]

TypeError: 'range_iterator' object is not subscriptable

## Why Use Iterators?

The good thing about an iterator is that it generates each value on the fly, just when it is needed. As a result, it typically needs only a small, fixed amount of memory, no matter how many values it will produce.
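
A rough way to see the difference without risking your kernel is `sys.getsizeof`, as in this sketch. It assumes CPython, and `getsizeof` counts only the container itself (for the list, the array of references, not the integer objects), so it understates the list's real footprint. The size of one million elements is an arbitrary choice for illustration.

```python
import sys

n = 1_000_000  # small enough to be safe, large enough to show the gap

list_bytes = sys.getsizeof(list(range(n)))
iterator_bytes = sys.getsizeof(iter(range(n)))

print(f"list:     {list_bytes:,} bytes")
print(f"iterator: {iterator_bytes:,} bytes")
```

The list's size grows with `n`, while the iterator's size stays the same however large `n` gets.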

### Example

Memory use before:

![](../assets/images/memory_before.png)

In [7]:
# Don't run this -- it will probably crash your Jupyter kernel
# big_list = list(range(500000000))

Memory use after creating that massive list:

![](../assets/images/memory_after_list.png)

In [1]:
# This is safe to run
big_list = None
big_iterator = iter(range(500000000))

Memory use after creating an iterator over the same elements:

![](../assets/images/memory_after_iter.png)

It's not that unusual for a data scientist to work with data sets that have millions of rows, so knowing how to use iterators effectively is a good skill to have.

Iterators can also be used to consume ongoing streams of data, which cannot be loaded into a list up front because not all of the data exists yet.
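
As a sketch of the streaming case, the generator below stands in for an open-ended data source. `sensor_readings` and its values are hypothetical, made up purely for illustration; a real stream might come from a file, socket, or queue.

```python
from itertools import count

def sensor_readings():
    """Hypothetical stand-in for a data source with no fixed end."""
    for tick in count():      # count() never stops on its own
        yield tick * 0.5      # made-up reading for illustration

first_above_two = None
for reading in sensor_readings():
    if reading > 2:
        first_above_two = reading
        break  # stop pulling values; later ones are never generated

print(first_above_two)  # 2.5
```

Calling `list(sensor_readings())` here would never finish, which is exactly the situation where an iterator is the only workable option.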

## Summary

Advantages of iterators:

- They use very little memory, because values are produced one at a time.
- They can be used to process ongoing streams of data.

Disadvantages of iterators:

- They are "consumed" as you iterate over them.
- They cannot be sliced, at least in the usual way.

# How to Work with Iterators

In this lesson we will focus on "survival skills" for dealing with the iterators that Python gives you, rather than going out of your way to use iterators for the sake of performance.

## Option 1: Just iterate over the iterator

Very often you only need to iterate over a collection once. In that case, you don't need to do anything special; an iterator is perfect.

In [9]:
# Create a map object, which is a type of iterator
my_map = map(int, [5., 6.7, 3.2, 4.3, 8.9])
print(my_map)

<map object at 0x10f7a8cf8>


In [10]:
for item in my_map:
    print(item)

5
6
3
4
8
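
The same idea applies when you hand an iterator to a function that makes a single pass, such as the built-ins `sum` and `max`. A minimal sketch, with `values` as an illustrative name:

```python
values = [5., 6.7, 3.2, 4.3, 8.9]

# Each call to map() creates a fresh iterator, so each is consumed only once.
total = sum(map(int, values))
largest = max(map(int, values))

print(total, largest)  # 26 8
```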


## Option 2: Cast to list

If you want the convenience of a list and can afford the memory burden, just cast to list.

In [11]:
# Recreate a map object, which is a type of iterator
my_map = map(int, [5., 6.7, 3.2, 4.3, 8.9])
print(my_map)

<map object at 0x10f7d0e48>


In [12]:
# Cast to list
list(my_map)

[5, 6, 3, 4, 8]
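
One thing to watch for: casting consumes the iterator, so keep the resulting list rather than casting a second time. A short sketch, using a separate `fresh_map` for illustration:

```python
fresh_map = map(int, [5., 6.7, 3.2, 4.3, 8.9])

as_list = list(fresh_map)      # this pass consumes the map object
second_cast = list(fresh_map)  # nothing is left for a second pass

print(as_list)       # [5, 6, 3, 4, 8]
print(second_cast)   # []

# The saved list supports indexing and slicing as usual.
print(as_list[3], as_list[1:3])  # 4 [6, 3]
```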

## For slicing: Use `islice`

You can use `islice` from the `itertools` module to slice iterators -- just be aware that you are consuming them. In the example below, the second call picks up where the first one left off.

In [13]:
from itertools import islice

my_map = map(int, [5., 6.7, 3.2, 4.3, 8.9])
print(list(islice(my_map, 0, 2)))
print(list(islice(my_map, 0, 2)))

[5, 6]
[3, 4]
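
`islice` can also stand in for single-item indexing, which is what `my_iterator[3]` tried to do earlier. This sketch is one way to do it; the variable names are illustrative.

```python
from itertools import islice

numbers = iter(range(5))

# Skip the first three items, then take the next one (position 3).
item_at_3 = next(islice(numbers, 3, None))

# Everything up to and including that item has been consumed.
remaining = list(numbers)

print(item_at_3, remaining)  # 3 [4]
```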


## Summary

Working with iterators:

- If you can do everything you need to do in one pass through a collection, then an iterator is perfect.
- If you want the convenience of a list and can afford the memory burden, cast to list.
- Use `islice` for slicing, but be aware that you are consuming the iterator as you do so.

# Special Case: `range` objects

`range` objects provide the best of both worlds: you can slice them and iterate over them multiple times as if they were lists, but their elements are generated on the fly rather than being loaded into memory up front.

In [14]:
my_range = range(5)

print('Pass 1')
for item in my_range:
    print(item)

print('\nPass 2')
for item in my_range:
    print(item)

print('\nPass 3')
for item in my_range:
    print(item)

Pass 1
0
1
2
3
4

Pass 2
0
1
2
3
4

Pass 3
0
1
2
3
4


In [15]:
range(5)[4]

4

This hybrid behavior is possible because a `range` object is not an iterator: it is a sequence whose elements follow a predictable pattern, so any element can be computed from its position. This is not possible for arbitrary data streams.
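
A few more sequence operations work on a `range` without building any list, as this sketch shows. It also confirms that a `range` is not itself an iterator. `big_range` is an illustrative name.

```python
big_range = range(500000000)

print(len(big_range))            # 500000000
print(big_range[10:20])          # range(10, 20)
print(499999999 in big_range)    # True

# next() only works on iterators, so it rejects a range object.
try:
    next(big_range)
    range_is_iterator = True
except TypeError:
    range_is_iterator = False

print(range_is_iterator)  # False
```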