# Itertools 

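Functions creating iterators for efficient looping. The examples use Python 2 syntax (`print x`, `imap`, `izip`, `ifilter`); in Python 3 the last three are replaced by the built-in `map`, `zip`, and `filter`.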

### Chain

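Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted. `chain.from_iterable` is an alternate constructor that takes a single iterable of iterables instead of separate arguments.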

In [1]:
from itertools import chain

In [2]:
for x in chain('ABC', 'DEF'):
    print x

A
B
C
D
E
F


In [3]:
for x in chain.from_iterable(['ASD', 'QWE']):
    print x

A
S
D
Q
W
E
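
As a small sketch of how the two constructors relate: `chain.from_iterable(iterables)` yields the same elements as `chain(*iterables)`, but it reads the outer iterable lazily instead of unpacking it up front. The variable names below are illustrative only.

```python
from itertools import chain

nested = ['ASD', 'QWE']  # illustrative input: one iterable holding other iterables

# chain(*nested) unpacks the outer list into separate arguments first.
unpacked = list(chain(*nested))

# chain.from_iterable(nested) walks the outer list lazily, one inner iterable at a time.
lazy = list(chain.from_iterable(nested))

print(unpacked == lazy)  # True
```

The lazy form is the one to reach for when the outer iterable is itself a generator or is very large.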


### Combinations

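Return *r*-length subsequences of elements from the input iterable. `combinations_with_replacement` works the same way but allows individual elements to be repeated within a tuple.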

Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.

In [4]:
from itertools import combinations

In [5]:
for x in combinations('ABCD', 2):
    print x

('A', 'B')
('A', 'C')
('A', 'D')
('B', 'C')
('B', 'D')
('C', 'D')


In [6]:
for x in combinations('ABCD', 3):
    print x

('A', 'B', 'C')
('A', 'B', 'D')
('A', 'C', 'D')
('B', 'C', 'D')


In [7]:
from itertools import combinations_with_replacement

In [9]:
for x in combinations_with_replacement('ABCD', 2):
    print x

('A', 'A')
('A', 'B')
('A', 'C')
('A', 'D')
('B', 'B')
('B', 'C')
('B', 'D')
('C', 'C')
('C', 'D')
('D', 'D')


### Compress / Count / Cycle / Dropwhile

**Compress**: Make an iterator that filters elements from data, returning only those that have a corresponding element in selectors that evaluates to True. Stops when either the data or the selectors iterable has been exhausted.

**Count**: Make an iterator that returns evenly spaced values starting with *start* (0 by default). Often used as an argument to imap() to generate consecutive data points. Also used with izip() to add sequence numbers.

**Cycle**: Make an iterator returning elements from the iterable and saving a copy of each. When the iterable is exhausted, return elements from the saved copy. Repeats indefinitely.

**Dropwhile**: Make an iterator that drops elements from the iterable as long as the predicate is true; afterwards, returns every element. Note that the iterator does not produce any output until the predicate first becomes false, so it may have a lengthy start-up time.




In [12]:
from itertools import compress

for x in compress('ABCDEF', [1,0,1,0,1,0]):
    print x

A
C
E
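
A short sketch of the stopping rule for `compress`, using made-up inputs: whichever of data or selectors runs out first ends the iteration.

```python
from itertools import compress

# Selectors shorter than data: iteration stops after two elements.
short_selectors = list(compress('ABCDEF', [1, 1]))

# Data shorter than selectors: the extra selectors are never consulted.
short_data = list(compress('AB', [0, 1, 1, 1]))

print(short_selectors)  # ['A', 'B']
print(short_data)       # ['B']
```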


In [24]:
from itertools import count

stop = 10
for x in count(0):
    print x
    if x==stop:
        break


0
1
2
3
4
5
6
7
8
9
10
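
The two uses of `count` mentioned in the description can be sketched as follows. The `try`/`except` import is an addition so the example also runs on Python 3, where `izip` no longer exists and the built-in `zip` is already lazy. The sample data is illustrative.

```python
from itertools import count, islice

try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3: built-in zip returns an iterator

# count(start, step) is infinite, so islice is used to take only five values.
evens = list(islice(count(10, 2), 5))

# Pair each item with a sequence number starting at 1; izip stops with the list.
numbered = list(izip(count(1), ['a', 'b', 'c']))

print(evens)     # [10, 12, 14, 16, 18]
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```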


In [28]:
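from itertools import cycle
loops = 3  # the loop breaks when pass number `loops` begins, so loops - 1 full passes are printed
string = 'ABCDEF'

for x in cycle(string):
    if x == string[0]:  # the first character marks the start of a new pass
        loops -= 1
    if loops == 0:
        break
    print x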


A
B
C
D
E
F
A
B
C
D
E
F
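
Because `cycle` never stops on its own, another way to bound it is to slice the iterator rather than count passes by hand. A minimal sketch using `islice`:

```python
from itertools import cycle, islice

# islice caps the otherwise endless iterator at seven elements.
first_seven = list(islice(cycle('ABC'), 7))

print(first_seven)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']
```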


In [32]:
from itertools import dropwhile

for x in dropwhile(lambda x: x<10, [1,2,3,6,11,2,20]):
    print x

11
2
20


### GroupBy

Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function, because a new group starts every time the key value changes. That is why 'A' and 'B' each appear twice in the output below.

In [36]:
from itertools import groupby

print [x for x, g in groupby('AAAABBBCCDAABBB')]

['A', 'B', 'C', 'D', 'A', 'B']


In [43]:
for x,g in groupby('AAAABBBCCDAABBB'):
    print x, '-->', [y for y in g]

A --> ['A', 'A', 'A', 'A']
B --> ['B', 'B', 'B']
C --> ['C', 'C']
D --> ['D']
A --> ['A', 'A']
B --> ['B', 'B', 'B']
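
To illustrate why the input usually needs sorting, here is a sketch that groups words by first letter with and without sorting first. The word list and the `first_letter` helper are made up for the example.

```python
from itertools import groupby

words = ['apple', 'bob', 'avocado', 'banana', 'cherry']  # illustrative data
first_letter = lambda w: w[0]                            # hypothetical key function

# Unsorted input: a new group starts whenever the key changes, so keys repeat.
unsorted_keys = [k for k, g in groupby(words, key=first_letter)]

# Sorting on the same key first gives exactly one group per key.
by_letter = sorted(words, key=first_letter)
grouped = {k: list(g) for k, g in groupby(by_letter, key=first_letter)}

print(unsorted_keys)  # ['a', 'b', 'a', 'b', 'c']
print(grouped['a'])   # ['apple', 'avocado']
```

Each group is itself an iterator that shares the underlying data with `groupby`, so it is copied into a list here before moving on to the next key.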


### iFilter / iFilterFalse

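Make an iterator that filters elements from iterable, returning only those for which the predicate is True. If predicate is None, return the items that are true. `ifilterfalse` is the complement: it returns only the elements for which the predicate is False. The built-in `filter()` shown first applies the same test but, in Python 2, builds the whole list at once.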

In [49]:
filter(lambda x: x>10, [1,2,20,5,22])

[20, 22]

In [50]:
from itertools import ifilter
for x in ifilter(lambda x: x>10, [1,2,20,5,22]):
    print x

20
22


In [51]:
from itertools import ifilterfalse
for x in ifilterfalse(lambda x: x>10, [1,2,20,5,22]):
    print x

1
2
5
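
The `None` predicate case is not shown above, so here is a small sketch of it. The `try`/`except` import is an addition so the example also runs on Python 3, where `ifilter` is replaced by the built-in `filter` and `ifilterfalse` is named `filterfalse`. The sample values are illustrative.

```python
try:
    from itertools import ifilter, ifilterfalse  # Python 2
except ImportError:
    from itertools import filterfalse as ifilterfalse  # Python 3 name
    ifilter = filter  # Python 3: built-in filter is already lazy

values = [0, 1, '', 'a', None, [], [0]]

# With None as the predicate, each element's own truth value is the test.
truthy = list(ifilter(None, values))
falsy = list(ifilterfalse(None, values))

print(truthy)  # [1, 'a', [0]]
print(falsy)   # [0, '', None, []]
```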


### iMap

Make an iterator that computes the function using arguments from each of the iterables. If function is set to None, then imap() returns the arguments as a tuple. Like map() but stops when the shortest iterable is exhausted instead of filling in None for shorter iterables. The reason for the difference is that infinite iterator arguments are typically an error for map() (because the output is fully evaluated) but represent a common and useful way of supplying arguments to imap().

In [58]:
from itertools import imap

def myMapper(value):
    return value**2

for x in imap(myMapper, [1,2,3]):
    print x
print '\n--\nPOW:' # pow pairs the two tuples element-wise: 1**10, 2**5, 10**2
for x in imap(pow, (1,2,10), (10,5,2)):
    print x

1
4
9

--
POW:
1
32
100
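
The point about infinite iterators can be sketched by feeding `count()` to `imap`: iteration ends when the shortest input does. The `try`/`except` import is an addition so the example also runs on Python 3, where the built-in `map` behaves like `imap`. The `label` function is a made-up helper.

```python
from itertools import count

try:
    from itertools import imap  # Python 2
except ImportError:
    imap = map  # Python 3: built-in map is lazy and stops at the shortest input


def label(n, char):
    """Hypothetical helper: join a sequence number and a character."""
    return '%d:%s' % (n, char)


# count(1) never ends, but 'abc' does, so only three results are produced.
labels = list(imap(label, count(1), 'abc'))

print(labels)  # ['1:a', '2:b', '3:c']
```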


### islice

Make an iterator that returns selected elements from the iterable. If start is non-zero, then elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless step is set higher than one which results in items being skipped. If stop is None, then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position. Unlike regular slicing, islice() does not support negative values for start, stop, or step. Can be used to extract related fields from data where the internal structure has been flattened (for example, a multi-line report may list a name field on every third line).

In [67]:
from itertools import islice

for x in islice('ABCDEFGHJKLMNOP', 5,7): # start and stop
    print x
print '\n--\n'
for x in islice('ABCDEFGHJKLMNOP', 5): # stop only
    print x

F
G

--

A
B
C
D
E
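
The step argument and the "every third line" use case from the description are not covered by the cell above. A sketch with an invented flattened report:

```python
from itertools import islice

# Illustrative flattened report: each record spans three consecutive lines.
report = ['name: Ann', 'age: 31', 'city: Oslo',
          'name: Bob', 'age: 45', 'city: Rome']

# start=0, stop=None (run to the end), step=3 picks every third line.
names = list(islice(report, 0, None, 3))

# Starting at index 2 selects the city line of every record instead.
cities = list(islice(report, 2, None, 3))

print(names)   # ['name: Ann', 'name: Bob']
print(cities)  # ['city: Oslo', 'city: Rome']
```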


### izip 

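Make an iterator that aggregates elements from each of the iterables. Like zip() except that it returns an iterator instead of a list. The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using `izip(*[iter(s)]*n)`.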

izip() should only be used with unequal length inputs when you don’t care about trailing, unmatched values from the longer iterables. If those values are important, use **izip_longest()** instead.

In [76]:
from itertools import izip, izip_longest

for pair in izip('ABCDEF', 'xyzw'):
    print pair
print '\n--\n'
for pair in izip_longest('ABCDEF', 'xyzw', fillvalue='*'):
    print pair

('A', 'x')
('B', 'y')
('C', 'z')
('D', 'w')

--

('A', 'x')
('B', 'y')
('C', 'z')
('D', 'w')
('E', '*')
('F', '*')
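
The clustering idiom mentioned above works because the list holds `n` references to the same iterator, so each output tuple pulls `n` consecutive items from it. A minimal sketch follows; the `try`/`except` import is an addition so it also runs on Python 3, where the built-in `zip` replaces `izip`.

```python
try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3: built-in zip returns an iterator

s = 'ABCDEFGH'
n = 3

# [iter(s)] * n repeats ONE iterator n times, so each tuple consumes n items.
groups = list(izip(*[iter(s)] * n))

print(groups)  # [('A', 'B', 'C'), ('D', 'E', 'F')]
```

The trailing 'G' and 'H' are dropped because they do not fill a complete group, which matches the behavior with unequal inputs described above.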


### Permutations

Return successive *r* length permutations of elements in the iterable.

If r is not specified or is None, then r defaults to the length of the iterable and all possible full-length permutations are generated.

Permutations are emitted in lexicographic sort order. So, if the input *iterable* is sorted, the permutation tuples will be produced in sorted order.

Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each permutation.



In [88]:
from itertools import permutations

for pair in permutations('ABCD', 2):
    print pair


('A', 'B')
('A', 'C')
('A', 'D')
('B', 'A')
('B', 'C')
('B', 'D')
('C', 'A')
('C', 'B')
('C', 'D')
('D', 'A')
('D', 'B')
('D', 'C')
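
Two points from the description are not exercised by the cell above: the default for *r*, and uniqueness by position. A sketch with made-up inputs:

```python
from itertools import permutations

# r omitted: full-length permutations, 3! = 6 of them.
full = list(permutations('ABC'))

# Elements are unique by position, so the two 'A's yield repeated tuples.
with_repeats = list(permutations('AAB', 2))

print(len(full))               # 6
print(len(with_repeats))       # 6
print(len(set(with_repeats)))  # 3 distinct tuples
```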


### Product

Cartesian product of input iterables.

Roughly equivalent to nested for-loops in a generator expression. For example, product(A, B) returns the same as ((x,y) for x in A for y in B).

The nested loops cycle like an odometer with the rightmost element advancing on every iteration. This pattern creates a lexicographic ordering so that if the input’s iterables are sorted, the product tuples are emitted in sorted order.

To compute the product of an iterable with itself, specify the number of repetitions with the optional repeat keyword argument. For example, product(A, repeat=4) means the same as product(A, A, A, A).

In [89]:
from itertools import product

for pair in product('ABCD', 'xy'):
    print pair

('A', 'x')
('A', 'y')
('B', 'x')
('B', 'y')
('C', 'x')
('C', 'y')
('D', 'x')
('D', 'y')
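
Both the nested-loop equivalence and the `repeat` keyword can be checked with a short sketch. The inputs are illustrative.

```python
from itertools import product

A = 'AB'
B = 'xy'

# The nested loops that product(A, B) is roughly equivalent to.
nested = [(x, y) for x in A for y in B]
print(list(product(A, B)) == nested)  # True

# repeat=3 is shorthand for passing the same iterable three times.
bits = list(product('01', repeat=3))
print(bits == list(product('01', '01', '01')))  # True
print(len(bits))                                # 8
```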


### repeat / starmap / takewhile / tee

**repeat**: Make an iterator that returns object over and over again. Runs indefinitely unless the times argument is specified. Used as argument to imap() for invariant function parameters. Also used with izip() to create constant fields in a tuple record.

**starmap**: Make an iterator that computes the function using arguments obtained from the iterable. Used instead of imap() when argument parameters are already grouped in tuples from a single iterable (the data has been “pre-zipped”). The difference between imap() and starmap() parallels the distinction between function(a,b) and function(*c).

**takewhile**: Make an iterator that returns elements from the iterable as long as the predicate is true. 

**tee**: Return n independent iterators from a single iterable. 


In [90]:
from itertools import repeat

for x in repeat(10,3):
    print x

10
10
10
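
The "invariant function parameter" use of `repeat` can be sketched like this: the exponent stays fixed while the base varies. The `try`/`except` import is an addition so the example also runs on Python 3, where the built-in `map` replaces `imap`.

```python
from itertools import repeat

try:
    from itertools import imap  # Python 2
except ImportError:
    imap = map  # Python 3: built-in map is lazy and stops at the shortest input

# repeat(2) supplies the same exponent forever; range(5) ends the iteration.
squares = list(imap(pow, range(5), repeat(2)))

print(squares)  # [0, 1, 4, 9, 16]
```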


In [92]:
from itertools import starmap
# 2**5, 3**2, 10**3
for x in starmap(pow, [(2,5), (3,2), (10,3)]):
    print x

32
9
1000
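
The `function(a,b)` versus `function(*c)` distinction can be sketched by computing the same results both ways. The `try`/`except` import is an addition so the example also runs on Python 3, where the built-in `map` replaces `imap`.

```python
from itertools import starmap

try:
    from itertools import imap  # Python 2
except ImportError:
    imap = map  # Python 3

pairs = [(2, 5), (3, 2), (10, 3)]  # arguments already grouped ("pre-zipped")

# starmap unpacks each tuple: pow(*pair)
via_starmap = list(starmap(pow, pairs))

# imap wants one iterable per parameter, so the pairs are un-zipped first.
bases, exponents = zip(*pairs)
via_imap = list(imap(pow, bases, exponents))

print(via_starmap)              # [32, 9, 1000]
print(via_starmap == via_imap)  # True
```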


In [93]:
from itertools import takewhile

for x in takewhile(lambda x: x<10, [1,2,5,6,7,99]):
    print x

1
2
5
6
7
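
`takewhile` and `dropwhile` are complementary: with the same predicate they split a sequence at the first element that fails the test. A small sketch with made-up data:

```python
from itertools import dropwhile, takewhile

data = [1, 2, 5, 6, 7, 99, 3]
below_ten = lambda x: x < 10  # illustrative predicate

head = list(takewhile(below_ten, data))  # stops at 99
tail = list(dropwhile(below_ten, data))  # starts at 99 and keeps everything after

print(head)  # [1, 2, 5, 6, 7]
print(tail)  # [99, 3]
```

The trailing 3 passes the predicate but still lands in `tail`, since neither function re-tests elements after the first failure.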


In [97]:
from itertools import tee

for x_iter in tee([1,2,3,4,5,6,7],3):
    for x in x_iter:
        print x
    print '----'

1
2
3
4
5
6
7
----
1
2
3
4
5
6
7
----
1
2
3
4
5
6
7
----
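
The iterators returned by `tee` advance independently of each other, which the sketch below tries to show by consuming them at different rates. After the split, the original iterator should not be used directly, since advancing it would make the copies miss values.

```python
from itertools import tee

source = iter([1, 2, 3, 4, 5])
a, b = tee(source, 2)

first_two = [next(a), next(a)]  # advance only `a`
all_of_b = list(b)              # `b` still starts from the beginning
rest_of_a = list(a)             # `a` resumes where it left off

print(first_two)  # [1, 2]
print(all_of_b)   # [1, 2, 3, 4, 5]
print(rest_of_a)  # [3, 4, 5]
```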
