# Itertools

A collection of tools for working with iterators in a fast, memory-efficient way.

- [Count](#count): Returns an iterator that starts at 0 and increments by 1 on each step.

- [Zip Longest](#Zip_Longest): Much like the zip function, except that zip stops when the shortest iterable ends, whereas zip_longest continues until the longest iterable ends, filling in the missing values for the shorter iterables, with None by default.

- [Cycle](#Cycle): Takes in an iterable and cycles through it, starting from the beginning again once exhausted.

- [Repeat](#Repeat): Repeats a given value on each iteration. You can set how many times you want it to repeat.

- [StarMap](#StarMap): Like the map function, but instead of taking a separate iterable for each argument, it takes a single iterable of argument tuples and unpacks each tuple into the function call.

- [Combinations and permutations](#Combinations-and-Permutations)
    - Combinations
        - All possible selections of a given length, where order doesn't matter.

    - Permutations
        - All possible arrangements of a given length, where order does matter.

- [Product](#Product): The Cartesian product of the input iterables. With `repeat`, it behaves like permutations in which values can repeat.

- [Combinations with replacement](#combinations_with_replacement): Similar to combinations, but it allows for repetition.

- [Chain](#Chain): Lets us combine iterators and/or iterables efficiently, without first copying them into a new list.

- [islice](#islice): Allows us to perform slicing on iterators and/or iterables.

- [compress](#compress): Much like the filter function, which takes a function and an iterable and returns the values for which the function returns true. With compress, however, the selection is given as an iterable of truth values instead of a function.

- [filterfalse](#filterfalse): Much like filter, but it returns the values for which the function returns false.

- [DropWhile](#DropWhile): Takes the same parameters as filter, but it drops values for as long as the function returns true. Once it finds a value that returns false, it returns that value along with all the values after it.

- [TakeWhile](#TakeWhile): Much like DropWhile, but in reverse: it returns values for as long as the function returns true and stops at the first value that returns false.

- [Accumulate](#Accumulate): Accumulates the values as it sees them through iteration. By default it adds them up, but we can specify any other two-argument function, such as multiplication.

- [Groupby](#Groupby): Much like GROUP BY in SQL, it groups the values by a given key. It only groups consecutive values, so the data needs to be sorted by that key first.

- [tee](#tee): Allows us to replicate an iterator into several independent iterators. Once the copies are made, don't advance the original iterator, or the copies will miss the values it consumed.

In [1]:
# Available in the python standard library
import itertools

---

## Count

In [2]:
# Returns an iterator that starts at 0 and increments by 1 on each call to next()
counter = itertools.count()

# next() calls the iterator's __next__ dunder method
print(next(counter))
print(next(counter))
print(next(counter))

0
1
2


If we were to iterate through this iterator in a for loop, though, we would get an infinite loop, since the count function has no ending point.
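
To keep a loop over `count()` finite, we have to stop it ourselves. The following is a minimal sketch of two common options, using `itertools.islice` (covered later in this notebook) and a plain `break`; the variable names are only illustrative.

```python
import itertools

# Option 1: cap the number of values taken from the infinite counter
first_five = list(itertools.islice(itertools.count(), 5))
print(first_five)

# Option 2: break out of the loop once a condition is met
seen = []
for n in itertools.count():
    if n >= 3:
        break
    seen.append(n)
print(seen)
```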

### Application

This count function would be useful if we wanted to assign an index value to a list, for example, but we don't know how long the list would be.

In [3]:
data = ["day 0", "day 1", "day 2", "day 3", "day 4"]

# Recall the zip function?
daily_data = zip(itertools.count(), data)

print(list(daily_data))

[(0, 'day 0'), (1, 'day 1'), (2, 'day 2'), (3, 'day 3'), (4, 'day 4')]


In [4]:
# We can pass arguments to the count function
counter = itertools.count(start=5, step=5)

print(next(counter))
print(next(counter))
print(next(counter))

5
10
15


In [5]:
# Count can also count backwards and by fractional steps
counter = itertools.count(start=5, step=-2.5)

print(next(counter))
print(next(counter))
print(next(counter))

5
2.5
0.0


---

## Zip_longest

Much like the zip function, except that zip stops when the shortest iterable ends, whereas zip_longest continues until the longest iterable ends, filling in the missing values for the shorter iterables, with `None` by default.

In [6]:
data = ["day 0", "day 1", "day 2", "day 3", "day 4"]


zip_data = zip(range(10), data)
zip_longest_data = itertools.zip_longest(range(10), data)

print(f"zip: {list(zip_data)}")
print(f"zip Longest: {list(zip_longest_data)}")

zip: [(0, 'day 0'), (1, 'day 1'), (2, 'day 2'), (3, 'day 3'), (4, 'day 4')]
zip Longest: [(0, 'day 0'), (1, 'day 1'), (2, 'day 2'), (3, 'day 3'), (4, 'day 4'), (5, None), (6, None), (7, None), (8, None), (9, None)]
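
The fill value does not have to be `None`. As a short sketch, `zip_longest` accepts a `fillvalue` keyword argument; the shortened `data` list and the `"no data"` placeholder below are made up for illustration.

```python
import itertools

data = ["day 0", "day 1", "day 2"]

# fillvalue replaces the default None for the missing positions
padded = list(itertools.zip_longest(range(5), data, fillvalue="no data"))
print(padded)
```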


---

## Cycle

Takes in an iterable and cycles through it, starting from the beginning again once exhausted.

In [7]:
cycle = itertools.cycle(range(3))

print(next(cycle))
print(next(cycle))
print(next(cycle))
print(next(cycle))
print(next(cycle))
print(next(cycle))

0
1
2
0
1
2


In [8]:
light = itertools.cycle(("on", "Off"))

print(next(light))
print(next(light))
print(next(light))
print(next(light))
print(next(light))
print(next(light))

on
Off
on
Off
on
Off
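
One possible use of `cycle` is round-robin assignment. The sketch below pairs a finite list of tasks with an endlessly repeating list of workers; the task and worker names are hypothetical.

```python
import itertools

tasks = ["task A", "task B", "task C", "task D", "task E"]
workers = ["w1", "w2"]  # hypothetical worker names

# zip stops when tasks runs out, so the infinite cycle is safe here
assignments = list(zip(tasks, itertools.cycle(workers)))
print(assignments)
```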


---

## Repeat

Repeats a given value on each iteration. You can set how many times you want it to repeat.

In [9]:
# Repeat indefinitely
repeat = itertools.repeat(7)

print(next(repeat))
print(next(repeat))
print(next(repeat))

# Repeat a tuple indefinitely
repeat = itertools.repeat(("on", "Off"))

print(next(repeat))
print(next(repeat))
print(next(repeat))

7
7
7
('on', 'Off')
('on', 'Off')
('on', 'Off')


In [10]:
# Repeat for a given number of times
repeat = itertools.repeat(7, times=2)

print(next(repeat))
print(next(repeat))

7
7


In [11]:
print(next(repeat))

StopIteration: 

This function is commonly used with `map()` or `zip()` when you want to supply the same value on every iteration.

In [12]:
# Using the pow function, which takes in two arguments: a number and a power,
# we can raise every value in range(10) to the power of 2
power = map(pow, range(10), itertools.repeat(2))

print(list(power))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


---

## StarMap

Like the map function, but instead of taking a separate iterable for each argument, it takes a single iterable of argument tuples and unpacks each tuple into the function call.

In [13]:
power = itertools.starmap(pow, [(0, 2), (1, 2), (2, 2)])

print(list(power))

[0, 1, 4]
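
To make the unpacking explicit, here is a small sketch that compares `starmap` with a list comprehension using `*args`. The tuples are not limited to pairs: the three-argument form of `pow` (modular exponentiation) is used as an example.

```python
import itertools

pairs = [(0, 2), (1, 2), (2, 2)]

# starmap unpacks each tuple into positional arguments: pow(0, 2), pow(1, 2), ...
with_starmap = list(itertools.starmap(pow, pairs))

# The same result with a comprehension and explicit unpacking
with_unpacking = [pow(*args) for args in pairs]

# Tuples can hold as many arguments as the function accepts;
# pow(base, exp, mod) computes (base ** exp) % mod
modular = list(itertools.starmap(pow, [(2, 5, 3), (3, 2, 5)]))

print(with_starmap, with_unpacking, modular)
```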


---

## Combinations and Permutations

### Combinations

All possible selections of a given length, where order doesn't matter.

### Permutations
All possible arrangements of a given length, where order does matter.

In [14]:
letters = ['a', 'b', 'c', 'd']

# All possible combinations of length 2
for pair in itertools.combinations(letters, 2):
    print(pair)

('a', 'b')
('a', 'c')
('a', 'd')
('b', 'c')
('b', 'd')
('c', 'd')


In [15]:
# All possible permutations of length 2
for pair in itertools.permutations(letters, 2):
    print(pair)

('a', 'b')
('a', 'c')
('a', 'd')
('b', 'a')
('b', 'c')
('b', 'd')
('c', 'a')
('c', 'b')
('c', 'd')
('d', 'a')
('d', 'b')
('d', 'c')


---

## Product

The Cartesian product of the input iterables. With `repeat`, it behaves like permutations in which values can repeat.

In [16]:
# All possible pairs where order matters and repetition is allowed
for pair in itertools.product(letters, repeat=2):
    print(pair)

('a', 'a')
('a', 'b')
('a', 'c')
('a', 'd')
('b', 'a')
('b', 'b')
('b', 'c')
('b', 'd')
('c', 'a')
('c', 'b')
('c', 'c')
('c', 'd')
('d', 'a')
('d', 'b')
('d', 'c')
('d', 'd')


This can be used to generate every possible code of a given length from a set of digits.

In [17]:
numbers = [0, 1, 2, 3]

for code in itertools.product(numbers, repeat=4):
    print(code)

(0, 0, 0, 0)
(0, 0, 0, 1)
(0, 0, 0, 2)
(0, 0, 0, 3)
(0, 0, 1, 0)
(0, 0, 1, 1)
(0, 0, 1, 2)
(0, 0, 1, 3)
(0, 0, 2, 0)
(0, 0, 2, 1)
(0, 0, 2, 2)
(0, 0, 2, 3)
(0, 0, 3, 0)
(0, 0, 3, 1)
(0, 0, 3, 2)
(0, 0, 3, 3)
(0, 1, 0, 0)
(0, 1, 0, 1)
(0, 1, 0, 2)
(0, 1, 0, 3)
(0, 1, 1, 0)
(0, 1, 1, 1)
(0, 1, 1, 2)
(0, 1, 1, 3)
(0, 1, 2, 0)
(0, 1, 2, 1)
(0, 1, 2, 2)
(0, 1, 2, 3)
(0, 1, 3, 0)
(0, 1, 3, 1)
(0, 1, 3, 2)
(0, 1, 3, 3)
(0, 2, 0, 0)
(0, 2, 0, 1)
(0, 2, 0, 2)
(0, 2, 0, 3)
(0, 2, 1, 0)
(0, 2, 1, 1)
(0, 2, 1, 2)
(0, 2, 1, 3)
(0, 2, 2, 0)
(0, 2, 2, 1)
(0, 2, 2, 2)
(0, 2, 2, 3)
(0, 2, 3, 0)
(0, 2, 3, 1)
(0, 2, 3, 2)
(0, 2, 3, 3)
(0, 3, 0, 0)
(0, 3, 0, 1)
(0, 3, 0, 2)
(0, 3, 0, 3)
(0, 3, 1, 0)
(0, 3, 1, 1)
(0, 3, 1, 2)
(0, 3, 1, 3)
(0, 3, 2, 0)
(0, 3, 2, 1)
(0, 3, 2, 2)
(0, 3, 2, 3)
(0, 3, 3, 0)
(0, 3, 3, 1)
(0, 3, 3, 2)
(0, 3, 3, 3)
(1, 0, 0, 0)
(1, 0, 0, 1)
(1, 0, 0, 2)
(1, 0, 0, 3)
(1, 0, 1, 0)
(1, 0, 1, 1)
(1, 0, 1, 2)
(1, 0, 1, 3)
(1, 0, 2, 0)
(1, 0, 2, 1)
(1, 0, 2, 2)
(1, 0, 2, 3)
(1, 0, 3, 0)
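
`product` is not limited to repeating one list. As a sketch, it can also combine several different iterables, which is equivalent to nested for loops. The `sizes` and `colors` lists are made-up example data.

```python
import itertools

sizes = ["S", "M"]        # hypothetical example data
colors = ["red", "blue"]

# Every (size, color) pairing, with the rightmost iterable varying fastest
variants = list(itertools.product(sizes, colors))

# The equivalent nested loops
nested = [(s, c) for s in sizes for c in colors]

# With repeat=4 over four digits there are 4 ** 4 tuples in total
total_codes = len(list(itertools.product(range(4), repeat=4)))

print(variants)
print(total_codes)
```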

---

## Combinations_with_replacement

Similar to combinations, but it allows for repetition.

In [18]:
for pair in itertools.combinations_with_replacement(letters, 2):
    print(pair)

('a', 'a')
('a', 'b')
('a', 'c')
('a', 'd')
('b', 'b')
('b', 'c')
('b', 'd')
('c', 'c')
('c', 'd')
('d', 'd')


---

## Chain

Lets us combine iterators and/or iterables efficiently, without first copying them into a new list.

In [19]:
letters = ['a', 'b', 'c', 'd']
numbers = [0, 1, 2, 3]
names = ['Yahya', 'Saleh']

loop = itertools.chain(letters, numbers, names)

for i in loop:
    print(i)

a
b
c
d
0
1
2
3
Yahya
Saleh


> Note how it looped over them in order
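
Two related uses, sketched under the assumption that the inputs look like the ones below: `chain.from_iterable` flattens one level of nesting, and `chain` accepts iterators such as generators as well as lists.

```python
import itertools

nested = [['a', 'b'], ['c'], ['d', 'e']]

# from_iterable takes a single iterable of iterables and flattens one level
flat = list(itertools.chain.from_iterable(nested))

# chain also accepts iterators, such as generators, without building a list first
squares = (n * n for n in range(3))
combined = list(itertools.chain(squares, ['x', 'y']))

print(flat)
print(combined)
```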

---

## islice

Allows us to perform slicing on iterators and/or iterables.

In [20]:
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# Slice from the beginning till index 5 exclusive
sliced = itertools.islice(letters, 5)

for item in sliced:
    print(item)

a
b
c
d
e


In [21]:
# Slice from index 2 up to, but not including, index 6
sliced = itertools.islice(letters, 2, 6)

for item in sliced:
    print(item)

c
d
e
f


In [22]:
# Slice from index 2 up to, but not including, index 6 with step 2
sliced = itertools.islice(letters, 2, 6, 2)

for item in sliced:
    print(item)

c
e


### Application

This and other itertools are useful when handling large files. Take a file like sample.log, for example. If we only want the first three lines, which hold the header, we can read them without loading the entire file.

As it turns out files are iterators themselves, and whenever you call next on them you get the next line. So, let's put this knowledge to action.

In [23]:
with open('sample.log') as f:
    header = itertools.islice(f, 3)

    # We perform the loop in the scope where the file is open
    for line in header:
        # Since the lines in the file already end with '\n', we don't want print to add a second one
        print(line, end='')

Date: 2020-8-23
Author: Yahya the amazing
Description: This is a sample log file
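
To try this without a `sample.log` on disk, the sketch below uses `io.StringIO` as a stand-in for an open file (its contents are invented). It also shows that `islice` consumes only the lines it returns, leaving the rest in the stream.

```python
import io
import itertools

# Stand-in for a real file: io.StringIO yields lines the same way an open file does
fake_log = io.StringIO("line 1\nline 2\nline 3\nline 4\nline 5\n")

header = list(itertools.islice(fake_log, 3))

# Only three lines were consumed; the rest are still waiting in the stream
remaining = fake_log.read()

print(header)
print(remaining, end='')
```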


---

## Compress

Much like the filter function, which takes a function and an iterable and returns the values for which the function returns true. With compress, however, the selection is given as an iterable of truth values instead of a function.

The truth values can be the result of filtering some other data, for example keeping only the names where age > 18.

### Filter

In [26]:
# Predicate: returns True for values less than 2, so filter keeps only those
def lt_2(n):
    return n < 2

numbers = [0, 1, 2, 3]
result = filter(lt_2, numbers)

for item in result:
    print(item)

0
1


### Compress

In [28]:
selectors = [True, True, False, False]

result = itertools.compress(numbers, selectors)

for item in result:
    print(item)

0
1
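
The age example mentioned above can be sketched as follows. The `names` and `ages` lists are hypothetical; the selectors are computed from one list and then applied to the other.

```python
import itertools

names = ["Ann", "Bob", "Cid", "Dee"]  # hypothetical data
ages = [17, 25, 18, 40]

# Build the truth values from the ages, then use them to select names
selectors = [age > 18 for age in ages]
adults = list(itertools.compress(names, selectors))

print(adults)
```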


---

## FilterFalse

Much like filter, but it returns the values for which the function returns false.

In [29]:
result = itertools.filterfalse(lt_2, numbers)

for item in result:
    print(item)

2
3


---

## DropWhile

Takes the same parameters as filter, but it drops values for as long as the function returns true. Once it finds a value that returns false, it returns that value along with all the values after it.

In [31]:
# Expand numbers to show the difference
numbers = [0, 1, 2, 3, 2, 1, 0]

result = itertools.dropwhile(lt_2, numbers)

for item in result:
    print(item)

2
3
2
1
0


Note how it printed the last 1 and 0, something that filterfalse wouldn't do.

## TakeWhile

Much like DropWhile, but in reverse: it returns values for as long as the function returns true and stops at the first value that returns false.

In [32]:
result = itertools.takewhile(lt_2, numbers)

for item in result:
    print(item)

0
1
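
Because both functions only look at the leading values, they fit well with ordered or streamed input. In this rough sketch, `takewhile` bounds an infinite iterator and `dropwhile` skips leading comment lines; the `lines` list is invented for illustration.

```python
import itertools

# takewhile stops at the first failing value, so it can bound an infinite iterator
all_squares = (n * n for n in itertools.count())
small_squares = list(itertools.takewhile(lambda sq: sq < 50, all_squares))

# dropwhile only skips the leading matches; later matches are kept
lines = ["# comment", "# another", "data 1", "# inline", "data 2"]
body = list(itertools.dropwhile(lambda line: line.startswith("#"), lines))

print(small_squares)
print(body)
```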


---

## Accumulate

Accumulates the values as it sees them through iteration. By default it adds them up, but we can specify any other two-argument function, such as multiplication.

In [33]:
numbers = [1, 2, 3, 2, 1, 0]

result = itertools.accumulate(numbers)

for item in result:
    print(item)

1
3
6
8
9
9


Likewise, we can accumulate the product.

In [34]:
import operator

# mul stands for multiplication
result = itertools.accumulate(numbers, operator.mul)

for item in result:
    print(item)

1
2
6
12
12
0
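
The function does not have to come from `operator`. As a sketch, the built-in `max` gives a running maximum, and the `initial` keyword (assuming Python 3.8 or newer) seeds the accumulation with a starting value. The balance figures are invented.

```python
import itertools

numbers = [1, 2, 3, 2, 1, 0]

# Any two-argument function works; max yields the largest value seen so far
running_max = list(itertools.accumulate(numbers, max))

# initial seeds the accumulation and adds one extra leading element to the output
balance = list(itertools.accumulate([10, -3, 5], initial=100))

print(running_max)
print(balance)
```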


---

## Groupby

Much like GROUP BY in SQL, it groups the values by a given key. It only groups consecutive values, so the data needs to be sorted by that key first.

In [35]:
people = [
    {
        'name': 'John Doe',
        'city': 'Gotham',
        'state': 'NY'
    },
    {
        'name': 'Jane Doe',
        'city': 'Kings Landing',
        'state': 'NY'
    },
    {
        'name': 'Corey Schafer',
        'city': 'Boulder',
        'state': 'CO'
    },
    {
        'name': 'Al Einstein',
        'city': 'Denver',
        'state': 'CO'
    },
    {
        'name': 'John Henry',
        'city': 'Hinton',
        'state': 'WV'
    },
    {
        'name': 'Randy Moss',
        'city': 'Rand',
        'state': 'WV'
    },
    {
        'name': 'Nicole K',
        'city': 'Asheville',
        'state': 'NC'
    },
    {
        'name': 'Jim Doe',
        'city': 'Charlotte',
        'state': 'NC'
    },
    {
        'name': 'Jane Taylor',
        'city': 'Faketown',
        'state': 'NC'
    }
]

Assume that we have this list of dictionaries which holds an individual's information, and we want to group people by their state.

The first step is to define a function that takes in a person's dictionary and returns the **key**, i.e. the state of the individual.

In [72]:
def get_state(person):
    return person["state"]

groups = itertools.groupby(people, get_state)

In [67]:
for key, group in groups:
    print(key, group)

NY <itertools._grouper object at 0x00000182983688E0>
CO <itertools._grouper object at 0x00000182981E34C0>
WV <itertools._grouper object at 0x000001829807AD00>
NC <itertools._grouper object at 0x0000018297E13C10>


In [60]:
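# Let's improve the output
# groupby returns a one-shot iterator, so we build it again before looping a second time
for key, group in itertools.groupby(people, get_state):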
    print(key)

    for person in group:
        print(person)

    print()

NY
{'name': 'John Doe', 'city': 'Gotham', 'state': 'NY'}
{'name': 'Jane Doe', 'city': 'Kings Landing', 'state': 'NY'}

CO
{'name': 'Corey Schafer', 'city': 'Boulder', 'state': 'CO'}
{'name': 'Al Einstein', 'city': 'Denver', 'state': 'CO'}

WV
{'name': 'John Henry', 'city': 'Hinton', 'state': 'WV'}
{'name': 'Randy Moss', 'city': 'Rand', 'state': 'WV'}

NC
{'name': 'Nicole K', 'city': 'Asheville', 'state': 'NC'}
{'name': 'Jim Doe', 'city': 'Charlotte', 'state': 'NC'}
{'name': 'Jane Taylor', 'city': 'Faketown', 'state': 'NC'}



In [73]:
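# We can print the number of people in each group
# (again on a fresh groupby iterator, since an earlier loop would have exhausted it)
for key, group in itertools.groupby(people, get_state):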
    print(f"{key}: {len(list(group))} people")

NY: 2 people
CO: 2 people
WV: 2 people
NC: 3 people
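
The `people` list above happens to be ordered by state already. The sketch below, on a small made-up list of `(state, name)` tuples, shows what happens when the data is not sorted and how sorting by the same key fixes it.

```python
import itertools

records = [("NY", "John"), ("CO", "Corey"), ("NY", "Jane")]  # hypothetical, unsorted

def get_state(record):
    return record[0]

# Without sorting, NY shows up as two separate groups because its records are not adjacent
unsorted_keys = [key for key, _ in itertools.groupby(records, get_state)]

# Sorting by the same key first brings equal keys together
sorted_records = sorted(records, key=get_state)
grouped = {
    key: [name for _, name in group]
    for key, group in itertools.groupby(sorted_records, get_state)
}

print(unsorted_keys)
print(grouped)
```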


## Tee

Allows us to replicate an iterator into several independent iterators. Once the copies are made, don't advance the original iterator, or the copies will miss the values it consumed.

In [75]:
copy1, copy2 = itertools.tee(groups)
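
To see the behaviour in isolation, here is a minimal sketch on a plain list iterator rather than the `groups` object. It shows that each copy yields every value, and what goes wrong when the original is advanced after the call to `tee`.

```python
import itertools

source = iter([1, 2, 3])
copy1, copy2 = itertools.tee(source)

# Each copy can be consumed independently and sees every value
first = list(copy1)
second = list(copy2)

# Advancing the original after tee takes values away from the copies
source2 = iter([1, 2, 3])
a, b = itertools.tee(source2)
next(source2)          # consumes 1 from the shared source
leftover = list(a)     # the copy never sees the 1

print(first, second, leftover)
```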