Itertools contains a number of commonly used iterators, along with functions for combining several iterators

In [3]:
import itertools

- Count function

In [2]:
counter = itertools.count() # This returns an iterator that counts to infinity with a step of 1

In [3]:
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter)) # This will continue forever

0
1
2
3


In [6]:
# Let's say we want to graph the daily data below, that is, each value together with the day it was recorded

data = [100, 200, 300, 400]

daily_data = list(zip(itertools.count(), data))
print(daily_data)

[(0, 100), (1, 200), (2, 300), (3, 400)]


In [7]:
# We can also pass keyword args to the count() function, e.g. start and step

counter = itertools.count(start = 5)
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter)) # This starts from 5 and continues forward with a step of 1

5
6
7
8


In [8]:
# If we want an interval of 5, we can also pass a step keyword arg of 5

counter = itertools.count(start = 5, step = 5)
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter)) # This starts from 5 and steps forward with an interval of 5

5
10
15
20


In [9]:
counter = itertools.count(start = 5, step = -2.5)
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter)) # We can also move backward using a negative step, and the step can be a decimal

5
2.5
0.0
-2.5
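
For the zip example earlier in this section, here is a small sketch comparing count with the built-in enumerate. Both are started at 1 so the days are numbered from 1 instead of 0. The variable names with_count and with_enumerate are only for illustration.

```python
import itertools

data = [100, 200, 300, 400]

# count(start=1) is infinite, but zip stops when data runs out
with_count = list(zip(itertools.count(start=1), data))

# enumerate gives the same pairs for a single iterable
with_enumerate = list(enumerate(data, start=1))

print(with_count)
print(with_count == with_enumerate)
```

count is still handy when the index has to be zipped with several iterables at once or needs a step other than 1, which enumerate does not offer.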


- Zip_longest : This returns a zipped iterator like zip does, except that it doesn't end until the longest iterable is exhausted

In [13]:
data = [100, 200, 300, 400]

daily_data = list(itertools.zip_longest(range(10), data))
print(daily_data)  # This returns a list of zipped data until the longest iterable is exhausted. It fills in the missing
                   # values of the shorter iterable with None values as seen below
         

[(0, 100), (1, 200), (2, 300), (3, 400), (4, None), (5, None), (6, None), (7, None), (8, None), (9, None)]
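
If None is not a convenient placeholder, zip_longest also accepts a fillvalue keyword. A minimal sketch, assuming 0 is an acceptable stand-in for a missing reading (the name padded is only for illustration):

```python
import itertools

data = [100, 200, 300, 400]

# fillvalue replaces the default None used to pad the shorter iterable
padded = list(itertools.zip_longest(range(6), data, fillvalue=0))
print(padded)
```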


- Cycle : this continues to infinity as well. It loops over a fixed set of values again and again

In [15]:
counter = itertools.cycle([1, 2, 3])
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter)) 

1
2
3
1
2
3


In [16]:
# One use case is performing a certain operation each time a certain value comes around. It could also be useful for
# turning a switch ON and OFF

counter = itertools.cycle(['On', 'Off'])
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter))

On
Off
On
Off
On
Off
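
One way the On/Off cycle might be used in practice is to pair it with a finite iterable through zip. This is only a sketch: the readings list is made-up sample data.

```python
import itertools

readings = [10, 20, 30, 40, 50]  # hypothetical sample data

# zip stops at the shorter input, so the infinite cycle is cut off safely
labelled = list(zip(readings, itertools.cycle(['On', 'Off'])))
print(labelled)
```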


- repeat : it returns the same value over and over. It is used for passing a stream of constant values to functions like map/zip that work on iterables

In [17]:
counter = itertools.repeat(2)
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter)) # This iterator will keep looping over ad infinitum

2
2
2
2
2
2
2


In [18]:
# The repeat can be limited to a certain number of times so that it does not go on endlessly

In [19]:
counter = itertools.repeat(2, times=3)
print(next(counter))
print(next(counter))
print(next(counter))
print(next(counter)) # A StopIteration is raised on the fourth call. If a for loop had been used, it would have handled
                     # the StopIteration error

2
2
2


StopIteration: 
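
As noted in the comment above, a for loop takes care of StopIteration on its own. A short sketch of that, plus the default argument of next(), which is another way to avoid the exception (the names collected, first and second are only for illustration):

```python
import itertools

collected = []
for value in itertools.repeat(2, times=3):
    collected.append(value)  # the loop ends quietly once the iterator is exhausted

print(collected)

# next() also accepts a default that is returned instead of raising StopIteration
counter = itertools.repeat(2, times=1)
first = next(counter, None)
second = next(counter, None)
print(first, second)
```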

In [21]:
# A typical use case is computing the squares of 0-9

square = map(pow, range(10), itertools.repeat(2)) # this returns an iterator, which is converted to a list below
print(list(square)) # this returns the squares of the values 0-9

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


- starmap : this is a variation of the map function. It is similar to map in functionality, but instead of taking args from separate iterables like we did above, it takes values that are already paired together as tuples

In [22]:
square = itertools.starmap(pow, [(0, 2), (1, 2), (2, 2)])
print(list(square))

[0, 1, 4]


In [23]:
square = itertools.starmap(pow, zip(range(10), itertools.repeat(2)))
print(list(square))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


#### count, cycle and repeat above can go on forever. The iterators below terminate on their own

- Typical examples of terminating functions are combinations and permutations. These take an iterable and return its combinations and permutations. __Combinations__ are the different ways of grouping a certain number of items where the order does not matter. __Permutations__ are the different ways of grouping a certain number of items where the order does matter

In [24]:
letters = ['a', 'b', 'c', 'd']
numbers = [0, 1, 2, 3]
names = ['Corey', 'Nicole']

In [27]:
# To group the letters variable into all possible distinct groups of 2, combinations is used

groups = itertools.combinations(letters, 2)

for group in groups:
    print(group)  # This outputs the different combinations of letters with length 2 where order does not matter

('a', 'b')
('a', 'c')
('a', 'd')
('b', 'c')
('b', 'd')
('c', 'd')


In [28]:
groups = itertools.permutations(letters, 2)

for group in groups:
    print(group)       # Different possible ways of grouping 2 items in letters where order does matter

('a', 'b')
('a', 'c')
('a', 'd')
('b', 'a')
('b', 'c')
('b', 'd')
('c', 'a')
('c', 'b')
('c', 'd')
('d', 'a')
('d', 'b')
('d', 'c')


__The above did not repeat values. That is, permutations and combinations give us the different possible ways of arranging our letters, but they do not give__ (a,a), (b,b), (c,c), (d,d). __This is because each element is picked at most once per group (selection without replacement), and we only have__ one a, b, c, d __in letters__

- How do we allow replacement, so that the same value can appear more than once in a group?

-- By using itertools.product and itertools.combinations_with_replacement

In [37]:
# Trying to create all possible arrangements of numbers variable with replacements using itertools.product

results = itertools.product(numbers, repeat=4)

for index, result in enumerate(results):
    print(f'{index} --> {result}')       # This is a cartesian product of (numbers by numbers by numbers by numbers)

0 --> (0, 0, 0, 0)
1 --> (0, 0, 0, 1)
2 --> (0, 0, 0, 2)
3 --> (0, 0, 0, 3)
4 --> (0, 0, 1, 0)
5 --> (0, 0, 1, 1)
6 --> (0, 0, 1, 2)
7 --> (0, 0, 1, 3)
8 --> (0, 0, 2, 0)
9 --> (0, 0, 2, 1)
10 --> (0, 0, 2, 2)
11 --> (0, 0, 2, 3)
12 --> (0, 0, 3, 0)
13 --> (0, 0, 3, 1)
14 --> (0, 0, 3, 2)
15 --> (0, 0, 3, 3)
16 --> (0, 1, 0, 0)
17 --> (0, 1, 0, 1)
18 --> (0, 1, 0, 2)
19 --> (0, 1, 0, 3)
20 --> (0, 1, 1, 0)
21 --> (0, 1, 1, 1)
22 --> (0, 1, 1, 2)
23 --> (0, 1, 1, 3)
24 --> (0, 1, 2, 0)
25 --> (0, 1, 2, 1)
26 --> (0, 1, 2, 2)
27 --> (0, 1, 2, 3)
28 --> (0, 1, 3, 0)
29 --> (0, 1, 3, 1)
30 --> (0, 1, 3, 2)
31 --> (0, 1, 3, 3)
32 --> (0, 2, 0, 0)
33 --> (0, 2, 0, 1)
34 --> (0, 2, 0, 2)
35 --> (0, 2, 0, 3)
36 --> (0, 2, 1, 0)
37 --> (0, 2, 1, 1)
38 --> (0, 2, 1, 2)
39 --> (0, 2, 1, 3)
40 --> (0, 2, 2, 0)
41 --> (0, 2, 2, 1)
42 --> (0, 2, 2, 2)
43 --> (0, 2, 2, 3)
44 --> (0, 2, 3, 0)
45 --> (0, 2, 3, 1)
46 --> (0, 2, 3, 2)
47 --> (0, 2, 3, 3)
48 --> (0, 3, 0, 0)
49 --> (0, 3, 0, 1)
50 --> (0,

In [39]:
# Another approach is to use combinations_with_replacement


results = itertools.combinations_with_replacement(numbers, 4)

for index, result in enumerate(results):
    print(f'{index} --> {result}')  # This shows the distinct groups of 4 elements where values may repeat
                                    # and order does not matter

0 --> (0, 0, 0, 0)
1 --> (0, 0, 0, 1)
2 --> (0, 0, 0, 2)
3 --> (0, 0, 0, 3)
4 --> (0, 0, 1, 1)
5 --> (0, 0, 1, 2)
6 --> (0, 0, 1, 3)
7 --> (0, 0, 2, 2)
8 --> (0, 0, 2, 3)
9 --> (0, 0, 3, 3)
10 --> (0, 1, 1, 1)
11 --> (0, 1, 1, 2)
12 --> (0, 1, 1, 3)
13 --> (0, 1, 2, 2)
14 --> (0, 1, 2, 3)
15 --> (0, 1, 3, 3)
16 --> (0, 2, 2, 2)
17 --> (0, 2, 2, 3)
18 --> (0, 2, 3, 3)
19 --> (0, 3, 3, 3)
20 --> (1, 1, 1, 1)
21 --> (1, 1, 1, 2)
22 --> (1, 1, 1, 3)
23 --> (1, 1, 2, 2)
24 --> (1, 1, 2, 3)
25 --> (1, 1, 3, 3)
26 --> (1, 2, 2, 2)
27 --> (1, 2, 2, 3)
28 --> (1, 2, 3, 3)
29 --> (1, 3, 3, 3)
30 --> (2, 2, 2, 2)
31 --> (2, 2, 2, 3)
32 --> (2, 2, 3, 3)
33 --> (2, 3, 3, 3)
34 --> (3, 3, 3, 3)
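
To check how many results each of the four functions should give, the counts can be compared against the usual counting formulas. This is only a sketch and assumes Python 3.8 or later, where math.comb and math.perm are available. The variable names are only for illustration.

```python
import itertools
import math

numbers = [0, 1, 2, 3]
n = len(numbers)

n_combinations = len(list(itertools.combinations(numbers, 2)))
n_permutations = len(list(itertools.permutations(numbers, 2)))
n_product = len(list(itertools.product(numbers, repeat=4)))
n_with_replacement = len(list(itertools.combinations_with_replacement(numbers, 4)))

print(n_combinations, math.comb(n, 2))                # order ignored, no repeats
print(n_permutations, math.perm(n, 2))                # order matters, no repeats
print(n_product, n ** 4)                              # order matters, repeats allowed
print(n_with_replacement, math.comb(n + 4 - 1, 4))    # order ignored, repeats allowed
```

Each line prints the measured count next to the formula, and the two numbers should agree. The last count of 35 matches the indices 0 to 34 in the combinations_with_replacement output above.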


- Chain : This allows us to chain iterables together so that it goes through the items in the first iterable, then the next iterable, and so on. It is an efficient way of iterating over several iterables because it does not build a combined list first.

In [40]:
combined = itertools.chain(letters, numbers, names)

for item in combined:
    print(item)

a
b
c
d
0
1
2
3
Corey
Nicole
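
When the iterables are already collected in one container, chain.from_iterable can be used instead of unpacking them as separate arguments. A minimal sketch with a made-up nested list:

```python
import itertools

nested = [['a', 'b'], [0, 1], ['Corey', 'Nicole']]  # hypothetical nested data

# from_iterable takes a single iterable of iterables and flattens it by one level
flat = list(itertools.chain.from_iterable(nested))
print(flat)
```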


-islice : this allows us to get a slice of an iterator. It works much like list slicing, with the added benefit that it yields items one at a time instead of building a new list in memory. Nice for memory management

In [43]:
# Let's slice the first 5 elements of the iterable range(9)

results = itertools.islice(range(9), 5)

for result in results:
    print(result)          # the default starting point is 0 and a step of 1

0
1
2
3
4


In [48]:
# To start at index 2, stop before index 9 and step by 2

results = itertools.islice(range(9), 2, 9, 2)

for result in results:
    print(result)       # the step must be a positive integer, otherwise it raises a ValueError

2
4
6
8
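
Because islice never needs the length of its input, it can also cut a finite piece out of an infinite iterator, which plain list slicing cannot do. A small sketch combining it with count from earlier (the name first_five is only for illustration):

```python
import itertools

# count() never ends, but islice stops after 5 items
first_five = list(itertools.islice(itertools.count(start=10, step=10), 5))
print(first_five)
```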


In [42]:
# File objects are iterators: whenever you call next on one, it returns the next line in the file

In [52]:
# To read the first 3 lines in a text file to extract info about the author

with open('test.log.txt', 'r') as f:
    author_info = itertools.islice(f, 3)
    for line in author_info:
        print(line, end='')  # Each line already ends with a newline, so end='' avoids a blank line between results

Date: 2018-11-08
Author: Corey
Description: This is a sample log file
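
The opposite task, skipping the header and keeping the rest, can be sketched with a start index and a stop of None. The snippet uses io.StringIO as a stand-in for a real file so that it runs on its own. Only the three header lines come from the output above, and the two entry lines are invented for illustration.

```python
import io
import itertools

# Hypothetical in-memory stand-in for the log file
fake_file = io.StringIO(
    "Date: 2018-11-08\n"
    "Author: Corey\n"
    "Description: This is a sample log file\n"
    "entry one\n"
    "entry two\n"
)

# Start at index 3 and read to the end (stop=None), skipping the 3 header lines
body = list(itertools.islice(fake_file, 3, None))
print(body)
```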


- Compress : This is often used in data science, where there is data and a matching sequence of selectors that is used to filter the data down

In [54]:
# Let's assume we have a list of True/False values that correspond to the letters list above. The compress function
# returns the items whose selector is True

In [76]:
letters = ['a', 'b', 'c', 'd']
numbers = [0, 1, 2, 3]
names = ['Corey', 'Nicole']

In [77]:
selectors = [True, True, False, True]

In [58]:
results = itertools.compress(letters, selectors)

for result in results:
    print(result)  # These are the values that correspond to True in the selectors

a
b
d
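
The selectors do not have to be written by hand. A sketch, assuming the letters and numbers lists line up position by position, that builds the selectors from a condition on numbers and uses them to pick from letters (the name picked is only for illustration):

```python
import itertools

letters = ['a', 'b', 'c', 'd']
numbers = [0, 1, 2, 3]

# One True/False per position, computed from the numbers list
selectors = [n >= 2 for n in numbers]

picked = list(itertools.compress(letters, selectors))
print(picked)
```

This is the main difference from filter: compress decides from a separate sequence of selectors, while filter decides by calling a function on each item itself.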


In [None]:
# Another built-in Python function that can be used for this is the filter function, except that it requires a function
# as an argument

# Let's say we want to output the values that are less than 2 in the numbers variable. filter returns an iterator
# over the items of the iterable for which the function returns True


In [5]:
def less_than_2(n):
    return n < 2   # the comparison already evaluates to True or False

In [86]:
filter_results = filter(less_than_2, numbers)

for item in filter_results:
    print(item)

0
1


In [87]:
results = itertools.filterfalse(less_than_2, numbers)

for result in results:
    print(result)      # This prints out the false values

2
3


In [79]:
itertools.filterfalse?

In [82]:
numbers

[0, 1, 2, 3]

- Dropwhile: drops values from the iterable while the function returns __True__. Once it hits a value for which the function returns __False__, it returns that value and everything after it

In [1]:
numbers = [0, 1, 2, 3, 2, 1, 0]

In [6]:
results = itertools.filterfalse(less_than_2, numbers)

for result in results:
    print(result)    # This outputs all values that are not less than 2, i.e. greater than or equal to 2

2
3
2


In [7]:
# Using the dropwhile function

results = itertools.dropwhile(less_than_2, numbers)

for result in results:
    print(result)       # This drops values that are less than 2 until it hits a number >= 2, then returns everything after without filtering

2
3
2
1
0


In [8]:
results = itertools.takewhile(less_than_2, numbers)

for result in results:
    print(result)      # This outputs values while the function returns True and stops at the first False

0
1
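
Putting the three functions side by side on the same list may make the difference easier to see. This is just a recap sketch of the cells above. The kept_by_* names are only for illustration.

```python
import itertools

def less_than_2(n):
    return n < 2

numbers = [0, 1, 2, 3, 2, 1, 0]

# filterfalse tests every item, dropwhile and takewhile only care about the leading run
kept_by_filterfalse = list(itertools.filterfalse(less_than_2, numbers))
kept_by_dropwhile = list(itertools.dropwhile(less_than_2, numbers))
kept_by_takewhile = list(itertools.takewhile(less_than_2, numbers))

print(kept_by_filterfalse)
print(kept_by_dropwhile)
print(kept_by_takewhile)

# takewhile and dropwhile split the list at the same point
print(kept_by_takewhile + kept_by_dropwhile == numbers)
```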


-- accumulate : this takes an iterable and returns the accumulated sum of the items it has seen so far. It uses addition by default. However, other operations such as __multiplication, division and subtraction__ can be used by passing a function as the second argument

In [9]:
numbers = [0, 1, 2, 3, 2, 1, 0]

In [12]:
results = itertools.accumulate(numbers)

for result in results:
    print(result)      # This returns a running total

0
1
3
6
8
9
9


In [15]:
numbers = [1, 2, 3, 2, 1, 0]

- To use multiplication

In [16]:
import operator

In [17]:
results = itertools.accumulate(numbers, operator.mul)

for result in results:
    print(result)      # This returns a running product

1
2
6
12
12
0
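
Any function of two arguments can be passed in the same way. A small sketch with the built-in max and with operator.sub, using the same numbers list as above (the names running_max and running_diff are only for illustration):

```python
import itertools
import operator

numbers = [1, 2, 3, 2, 1, 0]

# Largest value seen so far at each position
running_max = list(itertools.accumulate(numbers, max))

# Each step subtracts the next item from the previous result
running_diff = list(itertools.accumulate(numbers, operator.sub))

print(running_max)
print(running_diff)
```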


-- __Groupby__: This goes through an iterable and groups values based on a certain key, returning a stream of tuples, each consisting of a key-iterator pair. The key is the value the items were grouped on. The iterator yields the items that share that key

In [18]:
itertools.groupby?

In [19]:
people = [
    {
        'name': 'John Doe',
        'city': 'Gotham',
        'state': 'NY'
    },
    {
        'name': 'Jane Doe',
        'city': 'Kings Landing',
        'state': 'NY'
    },
    {
        'name': 'Corey Schafer',
        'city': 'Boulder',
        'state': 'CO'
    },
    {
        'name': 'Al Einstein',
        'city': 'Denver',
        'state': 'CO'
    },
    {
        'name': 'John Henry',
        'city': 'Hinton',
        'state': 'WV'
    },
    {
        'name': 'Randy Moss',
        'city': 'Rand',
        'state': 'WV'
    },
    {
        'name': 'Nicole K',
        'city': 'Asheville',
        'state': 'NC'
    },
    {
        'name': 'Jim Doe',
        'city': 'Charlotte',
        'state': 'NC'
    },
    {
        'name': 'Jane Taylor',
        'city': 'Faketown',
        'state': 'NC'
    }
]


In [41]:
def get_state(person):
    return person['state']

In [54]:
results = itertools.groupby(people, get_state)

In [46]:
for key, group in results:   # tuples of key-iterator pair generated
    print(key, group)        

NY <itertools._grouper object at 0x0000018527433408>
CO <itertools._grouper object at 0x00000185273F1348>
WV <itertools._grouper object at 0x0000018527433408>
NC <itertools._grouper object at 0x00000185273F1348>


In [52]:
# Re-create the groupby object since the for loop above has already exhausted it

results = itertools.groupby(people, get_state)

for key, group in results:
    print(key)
    for person in group:         # Looping over group since it is an iterator
        print(person)

NY
{'name': 'John Doe', 'city': 'Gotham', 'state': 'NY'}
{'name': 'Jane Doe', 'city': 'Kings Landing', 'state': 'NY'}
CO
{'name': 'Corey Schafer', 'city': 'Boulder', 'state': 'CO'}
{'name': 'Al Einstein', 'city': 'Denver', 'state': 'CO'}
WV
{'name': 'John Henry', 'city': 'Hinton', 'state': 'WV'}
{'name': 'Randy Moss', 'city': 'Rand', 'state': 'WV'}
NC
{'name': 'Nicole K', 'city': 'Asheville', 'state': 'NC'}
{'name': 'Jim Doe', 'city': 'Charlotte', 'state': 'NC'}
{'name': 'Jane Taylor', 'city': 'Faketown', 'state': 'NC'}


In [55]:
# Re-create the groupby object to get the number of items in each grouped state. group is an iterator, hence it is
# converted to a list before running the len function

results = itertools.groupby(people, get_state)

for key, group in results:
    print(key, len(list(group)))

NY 2
CO 2
WV 2
NC 3


__N.B__: itertools.groupby only groups consecutive items that share a key, so the values need to be sorted by that key beforehand. If the second person in __NY__ were at the bottom of the list, __NY__ would show up as two separate groups, one at the top and one at the bottom
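
A small sketch of what happens with unsorted input, and of sorting by the same key first. The three-person list here is a cut-down, deliberately unsorted version of the people data and is only for illustration.

```python
import itertools

# Hypothetical, deliberately unsorted sample
people = [
    {'name': 'John Doe', 'state': 'NY'},
    {'name': 'Corey Schafer', 'state': 'CO'},
    {'name': 'Jane Doe', 'state': 'NY'},
]

def get_state(person):
    return person['state']

# Unsorted: NY appears twice because its two items are not next to each other
unsorted_keys = [key for key, _ in itertools.groupby(people, get_state)]
print(unsorted_keys)

# Sorting by the same key function first gives one group per state
sorted_people = sorted(people, key=get_state)
counts = {key: len(list(group))
          for key, group in itertools.groupby(sorted_people, get_state)}
print(counts)
```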

__tee function__: this replicates an iterator into several independent iterators

In [57]:
# Let's assume we want to replicate the groupby iterator above into 2 different iterators

results = itertools.groupby(people, get_state) # This returns a groupby iterator that can be looped over

# To get copies of the groupby iterator, the tee function is used

copy1, copy2 = itertools.tee(results)  # The default number of copies is 2. If more are needed, the number can be passed as the second argument

-- Once the groupby iterator is passed into the tee function, it is recommended not to use the original again, because advancing it can have unintended consequences: the copies would miss the items consumed that way. Instead, the copies should be worked on, looped over or iterated over.
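
A minimal sketch of tee on a simpler iterator, to show that each copy can be consumed on its own. The list of numbers is made up for illustration.

```python
import itertools

source = iter([1, 2, 3, 4])  # a one-shot iterator

copy1, copy2 = itertools.tee(source)

# Each copy sees every item, even though source itself can only be read once
total = sum(copy1)
largest = max(copy2)
print(total, largest)

# Passing a second argument gives more copies
a, b, c = itertools.tee(iter('xy'), 3)
print(list(a), list(b), list(c))
```

tee has to buffer the items that one copy has read and another has not. When one copy is consumed completely before the others start, as in this sketch, converting the iterator to a list is usually the simpler choice.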