# [`itertools`](https://pymotw.com/3/itertools/) - Iterator Functions

**Purpose**: The `itertools` module includes a set of functions for working with sequence data sets.

- iterator-based code offers better memory consumption characteristics than code that uses lists
    - data is not produced from the iterator until it is needed
    - as a result, data does not need to be stored in memory at the same time
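
The lazy behavior described above can be observed directly. The following is a minimal sketch, assuming a hypothetical `squares()` generator and a `produced` list that exist only to record which values are actually computed:

```python
from itertools import islice

produced = []  # records every input the generator actually processes


def squares():
    """Hypothetical unbounded generator: yields 0, 1, 4, 9, ..."""
    n = 0
    while True:
        produced.append(n)
        yield n * n
        n += 1


# Only the first three values are computed, even though the
# generator itself never ends.
first_three = list(islice(squares(), 3))
print(first_three)
print(produced)
```

Building the same data as a list would not be possible here, because the sequence has no end.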

### _Merging and Splitting Iterators_

The `chain()` function takes several iterables as arguments and returns a single iterator that produces the contents of each of them in turn.

In [3]:
from itertools import *

In [4]:
for i in chain([1, 2, 3], ['a', 'b', 'c']):
    print(i, end=' ')
print()

1 2 3 a b c 


`chain()` makes it easy to process several sequences without constructing one large list.

If iterables to be combined are not all known in advance, `chain.from_iterable()` can be used to construct the chain instead.

In [5]:
def make_iterables_to_chain():
    yield [1, 2, 3]
    yield ['a', 'b', 'c']
    
for i in chain.from_iterable(make_iterables_to_chain()):
    print(i, end=' ')
print()

1 2 3 a b c 


The built-in function `zip()` returns an iterator that combines the elements of several iterators into tuples.

In [6]:
for i in zip([1, 2, 3], ['a', 'b', 'c']):
    print(i)

(1, 'a')
(2, 'b')
(3, 'c')


As with the functions in `itertools`, the return value of `zip()` is an iterator that produces values one at a time.

`zip()` stops when the first input iterator is exhausted. To process all of the inputs, even if the iterators produce different numbers of values, use `zip_longest()`.

In [8]:
from itertools import *

r1 = range(3)
r2 = range(2)

print('Zip stops early:')
print(list(zip(r1, r2)))

r1 = range(3)
r2 = range(2)

print('\nzip_longest processes all of the values:')
print(list(zip_longest(r1, r2)))

Zip stops early:
[(0, 0), (1, 1)]

zip_longest processes all of the values:
[(0, 0), (1, 1), (2, None)]


By default, `zip_longest()` substitutes `None` for any missing values. Use the `fillvalue` argument to specify a different substitute value.

In [9]:
from itertools import *

r1 = range(3)
r2 = range(2)

print('Zip stops early:')
print(list(zip(r1, r2)))

r1 = range(3)
r2 = range(2)

print('\nzip_longest processes all of the values:')
print(list(zip_longest(r1, r2, fillvalue='-')))

Zip stops early:
[(0, 0), (1, 1)]

zip_longest processes all of the values:
[(0, 0), (1, 1), (2, '-')]


The `islice()` function returns an iterator that produces selected items from the input iterator, by index.

In [11]:
from itertools import *

print('Stop at 5:')
for i in islice(range(100), 5):
    print(i, end=' ')
print('\n')

print('Start at 5, Stop at 10:')
for i in islice(range(100), 5, 10):
    print(i, end=' ')
print('\n')

print('By tens to 100:')
for i in islice(range(100), 0, 100, 10):
    print(i, end=' ')
print('\n')

Stop at 5:
0 1 2 3 4 

Start at 5, Stop at 10:
5 6 7 8 9 

By tens to 100:
0 10 20 30 40 50 60 70 80 90 



`islice()` takes the same arguments as the slice operator for lists: `start`, `stop`, and `step`. The start and step arguments are optional.
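
One difference from the slice operator is worth a short illustration: `islice()` works on iterators that cannot be indexed at all, but it rejects negative values. This is a small sketch, assuming a hypothetical `letters()` generator used only for demonstration:

```python
from itertools import islice


def letters():
    """Hypothetical generator used only for illustration."""
    yield from 'abcdefgh'


# A generator does not support indexing, so letters()[2:6:2] would raise
# TypeError; islice() provides the equivalent for any iterator.
picked = list(islice(letters(), 2, 6, 2))
print(picked)

# Unlike list slicing, negative values are rejected.
try:
    islice(letters(), -2, None)
except ValueError as err:
    negative_rejected = True
    print('ValueError:', err)
else:
    negative_rejected = False
```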

The `tee()` function returns several independent iterators (default = 2) based on a single original input.

In [13]:
from itertools import *

r = islice(count(), 5)
i1, i2 = tee(r)

print('i1:', list(i1))
print('i2:', list(i2))

i1: [0, 1, 2, 3, 4]
i2: [0, 1, 2, 3, 4]


Iterators returned by `tee()` can be used to feed the same set of data into multiple algorithms to be processed in parallel.
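
As one possible illustration of feeding the same data to more than one consumer, the sketch below pairs each value with its successor by advancing one of the two copies. The `adjacent_pairs()` helper and the `readings` data are hypothetical names chosen for this example:

```python
from itertools import tee


def adjacent_pairs(iterable):
    """Return an iterator of (previous, current) pairs (hypothetical helper)."""
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one position
    return zip(a, b)


readings = iter([3, 5, 4, 8])
deltas = [cur - prev for prev, cur in adjacent_pairs(readings)]
print(deltas)
```

Because `tee()` buffers values that one copy has consumed and the other has not, it works best when the copies advance at roughly the same pace, as they do here.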

Also, new iterators created by `tee()` share their input, so the original iterator should not be used after the new ones are created.

In [14]:
from itertools import *

r = islice(count(), 5)
i1, i2 = tee(r)

print('r:', end=' ')
for i in r:
    print(i, end=' ')
    if i > 1:
        break
print()

print('i1:', list(i1))
print('i2:', list(i2))

r: 0 1 2 
i1: [3, 4]
i2: [3, 4]


### _Converting Inputs_

The built-in `map()` function returns an iterator that calls a function on the values in the input iterators and returns the results. It stops when any input iterator is exhausted.

In [15]:
def times_two(x):
    return 2 * x

def multiply(x, y):
    return (x, y, x*y)

print('Doubles:')
for i in map(times_two, range(5)):
    print(i)
    
print('\nMultiples:')
r1 = range(5)
r2 = range(5, 10)
for i in map(multiply, r1, r2):
    print('{:d} * {:d} = {:d}'.format(*i))
    
print('\nStopping:')
r1 = range(5)
r2 = range(2)
for i in map(multiply, r1, r2):
    print(i)

Doubles:
0
2
4
6
8

Multiples:
0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36

Stopping:
(0, 0, 0)
(1, 1, 1)


- 1st Example: the `times_two()` function multiplies the input values by 2
- 2nd Example: the `multiply()` function multiplies two arguments, taken from separate iterators, and returns a tuple with the original arguments and the computed value
- 3rd Example: stops after producing two tuples because the second range is exhausted

The `starmap()` function is similar to `map()`, but instead of constructing a tuple from multiple iterators, it splits up the items in a single iterator as arguments to the mapping function using the `*` syntax.

In [17]:
from itertools import *

values = [(0, 5), (1, 6), (2, 7), (3, 8), (4, 9)]

for i in starmap(lambda x, y: (x, y, x * y), values):
    print('{} * {} = {}'.format(*i))

0 * 5 = 0
1 * 6 = 6
2 * 7 = 14
3 * 8 = 24
4 * 9 = 36
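
To make the relationship between the two functions concrete, here is a small sketch that computes the same results both ways. The `pairs` data is made up for the example, and the built-in `pow()` stands in for the mapping function:

```python
from itertools import starmap

pairs = [(2, 5), (3, 2), (10, 3)]

# starmap() unpacks each tuple into positional arguments: pow(2, 5), ...
via_starmap = list(starmap(pow, pairs))

# With map(), the pairs must first be transposed into one iterable
# per argument position.
bases, exponents = zip(*pairs)
via_map = list(map(pow, bases, exponents))

print(via_starmap)
print(via_map)
```

`starmap()` is the natural choice when the arguments already arrive grouped together, for example as rows from `zip()`.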


### _Producing New Values_

The `count()` function returns an iterator that produces consecutive integers indefinitely. The first number can be passed as an argument (default: 0). There is no upper bound argument.

In [18]:
from itertools import *

for i in zip(count(1), ['a', 'b', 'c']):
    print(i)

(1, 'a')
(2, 'b')
(3, 'c')


The start and step arguments to `count()` can be any numerical values that can be added together.

In [20]:
import fractions
from itertools import *

# start and step are Fraction objects from the fractions module
start = fractions.Fraction(1, 3)
step = fractions.Fraction(1, 3)

for i in zip(count(start, step), ['a', 'b', 'c']):
    print('{}: {}'.format(*i))

1/3: a
2/3: b
1: c


The `cycle()` function returns an iterator that repeats the contents of the iterable it is given indefinitely. Since it has to remember the entire contents of the input iterator, it may consume quite a bit of memory if the iterator is long.

In [21]:
from itertools import *

for i in zip(range(7), cycle(['a', 'b', 'c'])):
    print(i)

(0, 'a')
(1, 'b')
(2, 'c')
(3, 'a')
(4, 'b')
(5, 'c')
(6, 'a')


The `repeat()` function returns an iterator that produces the same value each time it is accessed.

In [22]:
from itertools import *

for i in repeat('over-and-over', 5):
    print(i)

over-and-over
over-and-over
over-and-over
over-and-over
over-and-over


`repeat()` keeps returning data forever, unless the optional `times` argument is provided to limit it.
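
The two ways of bounding `repeat()` can be compared side by side. This is a minimal sketch; the variable names are chosen only for illustration:

```python
from itertools import islice, repeat

# Without `times`, repeat() never stops, so the consumer must bound it.
unbounded = repeat('x')
first_four = list(islice(unbounded, 4))

# With `times`, the iterator ends on its own.
bounded = list(repeat('x', 2))

print(first_four)
print(bounded)
```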

It is useful to combine `repeat()` with `zip()` or `map()` when invariant values need to be included with the values from the other iterators.

In [23]:
from itertools import *

for i, s in zip(count(), repeat('over-and-over', 5)):
    print(i, s)

0 over-and-over
1 over-and-over
2 over-and-over
3 over-and-over
4 over-and-over


In [24]:
# this example uses map() to multiply the numbers 0 through 4 by 2
from itertools import *

for i in map(lambda x, y: (x, y, x * y), repeat(2), range(5)):
    print('{:d} * {:d} = {:d}'.format(*i))

2 * 0 = 0
2 * 1 = 2
2 * 2 = 4
2 * 3 = 6
2 * 4 = 8


### _Filtering_

The `dropwhile()` function returns an iterator that produces elements of the input iterator after a condition becomes false for the first time.

In [25]:
from itertools import *

def should_drop(x):
    print('Testing:', x)
    return x < 1

for i in dropwhile(should_drop, [-1, 0, 1, 2, -2]):
    print('Yielding:', i)

Testing: -1
Testing: 0
Testing: 1
Yielding: 1
Yielding: 2
Yielding: -2


`dropwhile()` does not filter every item of the input; after the condition is false the first time, all of the remaining items in the input are returned.
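
The difference from a true filter is easiest to see by running both on the same input. The sketch below uses `filterfalse()`, another `itertools` function that is not otherwise covered in this section, together with a hypothetical `is_small()` predicate:

```python
from itertools import dropwhile, filterfalse

data = [-1, 0, 1, 2, -2]


def is_small(x):
    """Hypothetical predicate, equivalent to should_drop() without the print."""
    return x < 1


# dropwhile() stops testing after the first false result, so the
# trailing -2 is passed through untested.
after_drop = list(dropwhile(is_small, data))

# filterfalse() tests every item and discards each one for which the
# predicate is true, so -2 is removed as well.
after_filter = list(filterfalse(is_small, data))

print(after_drop)
print(after_filter)
```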

The opposite of `dropwhile()` is `takewhile()`. It returns an iterator that produces items from the input iterator as long as the test function returns true.

In [26]:
from itertools import *

def should_take(x):
    print('Testing:', x)
    return x < 2

for i in takewhile(should_take, [-1, 0, 1, 2, -2]):
    print('Yielding:', i)

Testing: -1
Yielding: -1
Testing: 0
Yielding: 0
Testing: 1
Yielding: 1
Testing: 2
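
As the output shows, the item that fails the test (2) is consumed but never yielded. The following sketch, which assumes a made-up stream of square numbers built on `count()`, shows both a typical use of `takewhile()` to bound an endless iterator and the effect of that consumed item on the underlying iterator:

```python
from itertools import count, takewhile

# count(1) is unbounded; takewhile() ends the stream at the first
# square that is not below 50.
squares = (n * n for n in count(1))
small_squares = list(takewhile(lambda s: s < 50, squares))
print(small_squares)

# The value that failed the test (64) was already consumed, so the
# underlying iterator resumes with the value after it.
following = next(squares)
print(following)
```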
