# Iteration Tools

## Aggregators

Aggregators are functions that iterate through an iterable and return a single value that usually takes every element of the iterable into account.

Python has several built-in aggregators:
- `min`, the minimum of the iterable.
- `max`, the maximum of the iterable.
- `sum`, the sum of all elements in the iterable.

In [5]:
def squares(n):
    for i in range(n):
        yield i**2

# Remember that these functions consume the iterator they are given, so a generator that is
# instantiated only once is exhausted after the first call. Here we create a new one for each call.

print(f'we have a generator function squares: {squares}')
print(f"Min of squares(5): {min(squares(5))}")
print(f"Max of squares(5): {max(squares(5))}")
print(f"Sum of squares(5): {sum(squares(5))}")

we have a generator function squares: <function squares at 0x10e961870>
Min of squares(5): 0
Max of squares(5): 16
Sum of squares(5): 30
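
To make the exhaustion point concrete, here is a minimal sketch that reuses a single generator object for several aggregations. The variable names are only illustrative; the `default` argument of `max` is part of the built-in.

```python
def squares(n):
    """Yield the squares of the first n non-negative integers."""
    for i in range(n):
        yield i ** 2

sq = squares(5)
first_total = sum(sq)     # consumes every value: 0 + 1 + 4 + 9 + 16
second_total = sum(sq)    # sq is exhausted, so sum() sees nothing and returns 0

# min() and max() raise ValueError on an empty iterable unless a default is given.
try:
    max(sq)
    max_failed = False
except ValueError:
    max_failed = True

safe_max = max(sq, default=None)

print(first_total, second_total, max_failed, safe_max)  # 30 0 True None
```

If the values are needed more than once, either call the generator function again or store the values in a list first.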


In [8]:
# We can also review the bool function and the __bool__ method for objects.
print("---- BOOL ----")
print(f"bool(0) - > {bool(0)}")
print(f"bool(10) - > {bool(10)}")
print(f"bool(0.0) - > {bool(0.0)}")
print(f"bool(10.0) - > {bool(10.0)}")
print(f"bool('') - > {bool('')}")
print(f"bool('holi') - > {bool('holi')}")
print(f"bool([]) - > {bool([])}")
print(f"bool([False]) - > {bool([False])}")

# Let's now verify what happens to a generator.
def squares(n):
    for i in range(n):
        yield i**2

sq = squares(3)
print(f'squares generator sq -> {sq}')
print(f'min value for sq, {min(sq)}')
print(f"bool(sq) it is exhausted now, {bool(sq)}") # still True: generators define neither __bool__ nor __len__

---- BOOL ----
bool(0) - > False
bool(10) - > True
bool(0.0) - > False
bool(10.0) - > True
bool('') - > False
bool('holi') - > True
bool([]) - > False
bool([False]) - > True
squares generator sq -> <generator object squares at 0x10e9c0900>
min value for sq, 0
bool(sq) it is exhausted now, True


To evaluate `bool(obj)`, Python checks the following, in order:
- if `__bool__` is defined, its return value is used.
- otherwise, if `__len__` is defined, the result is `True` when the length is greater than 0.
- if neither is defined, the result is `True`.

In [11]:
class PersonAlone:
    pass

class PersonBool:
    def __bool__(self):
        return False

class PersonLen:
    def __len__(self):
        return 0

class PersonBoth:
    def __bool__(self):
        return True

    def __len__(self):
        return 0

print(f"bool(PersonAlone) -> {bool(PersonAlone())}")
print(f"bool(PersonBool) -> {bool(PersonBool())}")
print(f"bool(PersonLen) -> {bool(PersonLen())}")
print(f"bool(PersonBoth) -> {bool(PersonBoth())}")

# This is why it is important to define the __len__ method for our custom sequences and collections, since it
# determines their truth value

bool(PersonAlone) -> True
bool(PersonBool) -> False
bool(PersonLen) -> False
bool(PersonBoth) -> True
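
As a small sketch of how this plays out in practice, the hypothetical `Playlist` class below (not part of the original notes) relies on `__len__` for its truth value, while a generator is always truthy, so emptiness has to be tested by asking for the next item.

```python
class Playlist:
    """Hypothetical container used only for illustration."""

    def __init__(self, *songs):
        self._songs = list(songs)

    def __len__(self):
        return len(self._songs)

    def __iter__(self):
        return iter(self._songs)


empty_playlist = Playlist()
full_playlist = Playlist("intro", "outro")

# bool() falls back to __len__ because Playlist defines no __bool__.
playlist_truth = (bool(empty_playlist), bool(full_playlist))

# A generator defines neither __bool__ nor __len__, so it is always truthy.
empty_gen = (x for x in [])
gen_truth = bool(empty_gen)

# One way to test an iterator for emptiness: ask for the next item with a sentinel.
_MISSING = object()
is_empty = next(empty_gen, _MISSING) is _MISSING

print(playlist_truth, gen_truth, is_empty)  # (False, True) True True
```

Keep in mind that the `next()` check consumes one item when the iterator is not empty.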


The functions `any` and `all` help us check the truthiness of the values in an iterable.
- `any` returns `True` as soon as it finds a truthy value in the iterable (similar to `or`).
- `all` returns `True` only if every value in the iterable is truthy (similar to `and`).

In [47]:
l1 = [1, 0, (1)]
# generator
gen1 = (i for i in range(5))
# Exhaust generator
max(gen1)


print("---- ANY ----")
print(f'any for l1: {l1} -> {any(l1)}')
print(f'any for gen1: {gen1} -> {any(gen1)}')

print("---- ALL ----")
print(f'all for l1: {l1} -> {all(l1)}')
print(f'all for gen1: {gen1} -> {all(gen1)}')

print("---- USAGE ----")
# We can use all and any to verify a condition that is expected in 
# all values for our iterable.

# we can use what are called predicates (functions that receive a value and evaluate to a bool)
from numbers import Number

l1 = [1, 3, 7.5, 8.9]
l2 = [1, 3, '7.4', 8]
print(f"is every element of l1: {l1} a number -> {all(map(lambda x: isinstance(x, Number), l1))}")
print(f"is every element of l2: {l2} a number -> {all(map(lambda x: isinstance(x, Number), l2))}")

print("---- USAGE WITH FILE ----")

# validate every manufacturer is at least 3 characters long
# (each line read from the file keeps its trailing newline, hence the comparison against 4)
with open('car-brands.txt') as file:
    print(f"all car manufacturers are at least 3 char long -> {all(map(lambda x: len(x) >= 4, file))}")

# len(row) counts the trailing newline here as well
with open('car-brands.txt') as file:
    print(f"any car manufacturers longer than 10 chars -> {any(len(row) > 10 for row in file)}")


---- ANY ----
any for l1: [1, 0, 1] -> True
any for gen1: <generator object <genexpr> at 0x10eab1e70> -> False
---- ALL ----
all for l1: [1, 0, 1] -> False
all for gen1: <generator object <genexpr> at 0x10eab1e70> -> True
---- USAGE ----
is every element of l1: [1, 3, 7.5, 8.9] a number -> True
is every element of l2: [1, 3, '7.4', 8] a number -> False
---- USAGE WITH FILE ----
all car manufacturers are at least 3 char long -> True
any car manufacturers longer than 10 chars -> True
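
The results for the exhausted generator above (`any` is `False`, `all` is `True`) follow from how both functions treat an empty iterable. The sketch below also shows that both short-circuit; `noisy` is a hypothetical helper that records which values were actually consumed.

```python
def noisy(values, seen):
    """Yield each value and record it in `seen` so consumption can be inspected."""
    for value in values:
        seen.append(value)
        yield value

seen_by_any = []
any_result = any(noisy([0, 0, 3, 0, 5], seen_by_any))  # stops at the first truthy value, 3

seen_by_all = []
all_result = all(noisy([1, 2, 0, 4], seen_by_all))     # stops at the first falsy value, 0

# On an empty iterable, any() is False and all() is True (vacuous truth).
empty_results = (any([]), all([]))

print(any_result, seen_by_any)   # True [0, 0, 3]
print(all_result, seen_by_all)   # False [1, 2, 0]
print(empty_results)             # (False, True)
```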


## Slicing Iterables

We can also slice any iterable (not just sequences) by using `itertools.islice`.
- It takes the following parameters: `islice(iterable, start, stop, step)`; when called with a single number, as in `islice(iterable, stop)`, that number is the stop.
- Just like a regular slice, the stop parameter is non-inclusive.
- It returns a lazy iterator.
- When slicing iterators, remember that the underlying object is consumed with every call to `next()`, which
can lead to subtle issues if not handled correctly.

In [15]:
import itertools
import math

def factorials(n):
    for i in range(n):
        yield math.factorial(i)

def slice_(iterable, start, stop):
    for _ in range(0, start):
        next(iterable) # we remove the non wanted values from the iter.
    for _ in range(start, stop):
        yield next(iterable)

print("---- MY SLICE FUN ----")

print(f"factorial generator expression: {factorials(10)} sliced: factorials[0:10] -> {list(slice_(factorials(10), 0, 10))}")
print(f"factorial generator expression: {factorials(10)} sliced: factorials[6:8] -> {list(slice_(factorials(10), 6, 8))}")
print(f"slice_(factorial(10), 0, 10) is of type -> {slice_(factorials(10), 0, 10)}")

print("---- ITERTOOLS ----")

print(f"factorial generator expression: {factorials(10)} sliced: factorials[0:10] -> {list(itertools.islice(factorials(10), 0, 10))}")
print(f"factorial generator expression: {factorials(10)} sliced: factorials[6:8] -> {list(itertools.islice(factorials(10), 6, 8))}")
print(f"factorial generator expression: {factorials(10)} sliced: factorials[0:10:2] -> {list(itertools.islice(factorials(10), 0, 10, 2))}")
print(f"itertools.islice(factorial(10), 0, 10) is of type -> {itertools.islice(factorials(10), 0, 10)}")

# Notice that to print the values we need to iterate through the slice (by calling list), which means it is a lazy iterator.

print("---- CAVEATS ----")

# A generator that yields 20 values
f1 = factorials(20)
# a slice over the first 20 values of the generator
f1_slice = itertools.islice(f1, 0, 20)
# Consume the first 10 values directly from the generator
for _ in range(10):
    next(f1)

print(f"f1_slice(f1, 0, 20) after exhausting some values from generator -> {list(f1_slice)}")
# We consumed 10 values from f1 (the generator) directly, so the slice only yields the 10 values that were left in it.

---- MY SLICE FUN ----
factorial generator expression: <generator object factorials at 0x11a348510> sliced: factorials[0:10] -> [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
factorial generator expression: <generator object factorials at 0x11a348580> sliced: factorials[6:8] -> [720, 5040]
slice_(factorial(10), 0, 10) is of type -> <generator object slice_ at 0x11a348580>
---- ITERTOOLS ----
factorial generator expression: <generator object factorials at 0x11a348580> sliced: factorials[0:10] -> [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
factorial generator expression: <generator object factorials at 0x11a348580> sliced: factorials[6:8] -> [720, 5040]
factorial generator expression: <generator object factorials at 0x11a348580> sliced: factorials[0:10:2] -> [1, 2, 24, 720, 40320]
itertools.islice(factorial(10), 0, 10) is of type -> <itertools.islice object at 0x1135529d0>
---- CAVEATS ----
f1_slice(f1, 0, 20) after exhausting some values from generator -> [3628800, 39916800, 479001
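
The caveat can also be seen from the other direction: an `islice` consumes the underlying iterator, so whatever we read afterwards continues where the slice stopped. A minimal sketch, using a plain `range` iterator as the assumed data source:

```python
from itertools import islice

numbers = iter(range(10))

head = list(islice(numbers, 3))        # consumes 0, 1, 2 from `numbers`
next_value = next(numbers)             # continues at 3
middle = list(islice(numbers, 2, 4))   # skips 4 and 5, then takes 6 and 7
rest = list(numbers)                   # whatever is left: 8, 9

# Negative indices are not supported because the length is unknown in advance.
try:
    islice(iter(range(10)), -3, None)
    negative_ok = True
except ValueError:
    negative_ok = False

print(head, next_value, middle, rest, negative_ok)
# [0, 1, 2] 3 [6, 7] [8, 9] False
```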

## Selecting and Filtering

We have different functions to select or filter the elements of an iterable.
- `filter(predicate, iterable)`, returns all elements of the iterable for which the predicate evaluates to True.
    - if predicate is `None`, it simply keeps the truthy values.
- `itertools.filterfalse(predicate, iterable)`, returns all elements of the iterable for which the predicate evaluates to False.
    - if predicate is `None`, it simply keeps the falsy values.
- `itertools.compress(data, selectors)`, returns the elements of data whose selector evaluates to True.
    - each element is kept or dropped according to the truthiness of the selector in the same position.
    - it stops as soon as either `data` or `selectors` is exhausted, so elements beyond the last selector are never returned.
- `itertools.takewhile(predicate, iterable)`, returns elements of the iterable until the predicate first evaluates to False.
- `itertools.dropwhile(predicate, iterable)`, skips elements while the predicate evaluates to True, then returns the first failing element and everything after it.

---

**NOTE**: all functions return a lazy iterator.

In [32]:
from itertools import filterfalse, dropwhile, takewhile, compress
from math import sin, pi

# Generators
def gen_cubes(n):
    for i in range(n):
        print(f"yielding {i}")
        yield i**3

def sine_wave(n):
    start = 0
    max_ = 2 * pi
    step = (max_ - start) / (n - 1)
    for _ in range(n):
        yield round(sin(start), 2)
        start += step

# Predicates
def is_odd(n):
    return n % 2 == 1

print(f"generator expression gen_cubes: {gen_cubes(10)}")

print("---- FILTER ----")

print(f"filtering gen_cubes with predicate is_odd (filter func) -> {list(filter(is_odd, gen_cubes(4)))}") 

print("---- FILTERFALSE ----")

print(f"filtering gen_cubes with predicate is_odd (filterfalse func) -> {list(filterfalse(is_odd, gen_cubes(4)))}") 
# We did not need to write a new predicate to get even numbers.

print("---- TAKEWHILE ----")
print(f"filtering sine_wave with predicate lambda (takewhile func) -> {list(takewhile(lambda x: 0 <= x <= 0.9 , sine_wave(15)))}") 
# takes values only until the predicate first returns False; everything after that is dropped.

print("---- DROPWHILE ----")
print(f"filtering sine_wave with predicate lambda (dropwhile func) -> {list(dropwhile(lambda x: 0 <= x <= 0.9 , sine_wave(15)))}")
# drops values until the predicate first returns False and keeps everything after that; notice how the length of this output plus the length of the takewhile output is 15 (the generator length)

print("---- COMPRESS ----")
data = ['a', 'b', 'c', 'd', 'e']
selectors = [True, False, 1, 0] # shorter than data: compress stops when the selectors run out
print(f"filtering data: {data} with selectors: {selectors} -> {list(compress(data, selectors))}")
# Notice 'e' never got picked up since there is no selector for it.
print(f"filtering data: {data} with selectors: {selectors} -> {[i for i, val in zip(data, selectors) if val]}")

generator expression gen_cubes: <generator object gen_cubes at 0x11a182650>
---- FILTER ----
yielding 0
yielding 1
yielding 2
yielding 3
filtering gen_cubes with predicate is_odd (filter func) -> [1, 27]
---- FILTERFALSE ----
yielding 0
yielding 1
yielding 2
yielding 3
filtering gen_cubes with predicate is_odd (filterfalse func) -> [0, 8]
---- TAKEWHILE ----
filtering sine_wave with predicate lambda (takewhile func) -> [0.0, 0.43, 0.78]
---- DROPWHILE ----
filtering sine_wave with predicate lambda (dropwhile func) -> [0.97, 0.97, 0.78, 0.43, 0.0, -0.43, -0.78, -0.97, -0.97, -0.78, -0.43, -0.0]
---- COMPRESS ----
filtering data: ['a', 'b', 'c', 'd', 'e'] with selectors: [True, False, 1, 0] -> ['a', 'c']
filtering data: ['a', 'b', 'c', 'd', 'e'] with selectors: [True, False, 1, 0] -> ['a', 'c']
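
Two details are easy to miss here, so a short sketch may help: `compress` stops at the shorter of its two inputs, and `takewhile` has to read the first failing element in order to stop, which means that element is lost when the source is a one-shot iterator. The sample values are arbitrary.

```python
from itertools import compress, dropwhile, takewhile

data = ['a', 'b', 'c', 'd', 'e']

# compress() stops as soon as either input runs out, so 'd' and 'e' are never examined.
short_selectors = list(compress(data, [1, 0, 1]))

# Extra selectors beyond the end of the data are ignored in the same way.
long_selectors = list(compress(data, [0, 0, 0, 0, 1, 1, 1]))

def is_odd(n):
    return n % 2 == 1

values = [1, 3, 5, 6, 7, 9]
taken = list(takewhile(is_odd, values))     # stops at 6, the first even value
dropped = list(dropwhile(is_odd, values))   # starts at 6 and keeps everything after it

# On a one-shot iterator, takewhile() consumes the first failing element (6 here).
it = iter(values)
list(takewhile(is_odd, it))
leftover = list(it)

print(short_selectors, long_selectors)  # ['a', 'c'] ['e']
print(taken, dropped, leftover)         # [1, 3, 5] [6, 7, 9] [7, 9]
```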


## Infinite Iterators

`itertools` provides some functions that return an infinite iterator.
- `itertools.count(start, step)`, yields values beginning at `start` and
advancing by `step` (defaults to 1).
    - Takes any numeric type, not just ints like `range`.
- `itertools.cycle(iterable)`, returns an iterator that repeats the elements of the iterable indefinitely.
- `itertools.repeat(value, n)`, yields the value indefinitely, or `n` times if `n` is given.
    - Yields the same object in memory every time, which can be a caveat for mutable values.

---

**Note** -> All return a lazy iterator.

In [1]:
from itertools import count, cycle, repeat, islice
from decimal import Decimal

print("---- COUNT ----")
count_gen_int = count(10)
count_gen_float = count(10.0, 0.1)
count_gen_ima = count(1+1j, 1+2j)
count_gen_dec = count(Decimal('0'), Decimal('0.1'))
print(f"The first 5 elements of count object: {count_gen_int} are -> {list(islice(count_gen_int, 5))}")
print(f"The first 5 elements of count object: {count_gen_float} are -> {list(islice(count_gen_float, 5))}")
print(f"The first 5 elements of count object: {count_gen_ima} are -> {list(islice(count_gen_ima, 5))}")
print(f"The first 5 elements of count object: {count_gen_dec} are -> {list(islice(count_gen_dec, 5))}")

print("---- CYCLE ----")

def colors():
    yield 'red'
    yield 'green'
    yield 'blue'

t1 = ('red', 'green', 'blue')

cycle_tup = cycle(t1)
cols = colors()
cycle_gen = cycle(cols)
print(f"The first 5 elements of cycle object: {cycle_tup} are -> {list(islice(cycle_tup, 5))}")
print(f"The first 5 elements of cycle object: {cycle_gen} are -> {list(islice(cycle_gen, 5))}, the cols gen is exhausted -> {list(cols)}")
# As we can see, cycle keeps repeating the values of the colors generator even after the generator itself is exhausted.

print("---- REPEAT ----")

s1 = 'Python'

repeat_str = repeat(s1)
repeat_finite_str = repeat(s1, 3)
print(f"The first 5 elements of repeat object: {repeat_str} are -> {list(islice(repeat_str, 5))}")
print(f"The first elements of repeat finite object: {repeat_finite_str} are -> {list(repeat_finite_str)}")

---- COUNT ----
The first 5 elements of count object: count(10) are -> [10, 11, 12, 13, 14]
The first 5 elements of count object: count(10.0, 0.1) are -> [10.0, 10.1, 10.2, 10.299999999999999, 10.399999999999999]
The first 5 elements of count object: count((1+1j), (1+2j)) are -> [(1+1j), (2+3j), (3+5j), (4+7j), (5+9j)]
The first 5 elements of count object: count(Decimal('0'), Decimal('0.1')) are -> [Decimal('0'), Decimal('0.1'), Decimal('0.2'), Decimal('0.3'), Decimal('0.4')]
---- CYCLE ----
The first 5 elements of cycle object: <itertools.cycle object at 0x1063a45c0> are -> ['red', 'green', 'blue', 'red', 'green']
The first 5 elements of cycle object: <itertools.cycle object at 0x1063a4a40> are -> ['red', 'green', 'blue', 'red', 'green'], the cols gen is exhausted -> []
---- REPEAT ----
The first 5 elements of repeat object: repeat('Python') are -> ['Python', 'Python', 'Python', 'Python', 'Python']
The first elements of repeat finite object: repeat('Python', 3) are -> ['Python', 'Pyth
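
The "same object in memory" caveat of `repeat` only bites with mutable values. The sketch below contrasts it with a list comprehension and also shows a common use of `repeat` as a constant argument for `map`; the variable names are illustrative.

```python
from itertools import repeat

# repeat() hands back the very same object each time.
rows_shared = list(repeat([], 3))
rows_shared[0].append('x')          # mutates the single shared list

# A comprehension builds a fresh list per row instead.
rows_independent = [[] for _ in range(3)]
rows_independent[0].append('x')

# repeat() is handy as a constant argument stream for map(); map stops at the shortest input.
squares = list(map(pow, range(5), repeat(2)))

print(rows_shared)       # [['x'], ['x'], ['x']]
print(rows_independent)  # [['x'], [], []]
print(squares)           # [0, 1, 4, 9, 16]
```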

In [46]:
# Example
import itertools
from collections import namedtuple


Card = namedtuple('Card', 'rank suit')

def card_deck():
    ranks = tuple((str(num) for num in range(2, 11))) + tuple('JQKA')
    suits = ('Spades', 'Hearts', 'Diamonds', 'Clubs')
    for suit in suits:
        for rank in ranks:
            yield Card(rank, suit)

# hands I want to deal out
hands = [list() for _ in range(4)]

print("---- DEALING ----")
hands_cycle = itertools.cycle(hands)
for card in card_deck():
    next(hands_cycle).append(card)

print(hands)

---- DEALING ----
[[Card(rank='2', suit='Spades'), Card(rank='6', suit='Spades'), Card(rank='10', suit='Spades'), Card(rank='A', suit='Spades'), Card(rank='5', suit='Hearts'), Card(rank='9', suit='Hearts'), Card(rank='K', suit='Hearts'), Card(rank='4', suit='Diamonds'), Card(rank='8', suit='Diamonds'), Card(rank='Q', suit='Diamonds'), Card(rank='3', suit='Clubs'), Card(rank='7', suit='Clubs'), Card(rank='J', suit='Clubs')], [Card(rank='3', suit='Spades'), Card(rank='7', suit='Spades'), Card(rank='J', suit='Spades'), Card(rank='2', suit='Hearts'), Card(rank='6', suit='Hearts'), Card(rank='10', suit='Hearts'), Card(rank='A', suit='Hearts'), Card(rank='5', suit='Diamonds'), Card(rank='9', suit='Diamonds'), Card(rank='K', suit='Diamonds'), Card(rank='4', suit='Clubs'), Card(rank='8', suit='Clubs'), Card(rank='Q', suit='Clubs')], [Card(rank='4', suit='Spades'), Card(rank='8', suit='Spades'), Card(rank='Q', suit='Spades'), Card(rank='3', suit='Hearts'), Card(rank='7', suit='Hearts'), Card(ra

## Chaining and Teeing

`itertools` also provides functions to chain and tee iterables. Again, keep in mind that they return lazy iterators.
- `itertools.chain(*args)`, this is analogous to sequence concatenation.
    - every `arg` in `args` should be an iterable.
    - we could unpack them from an iterable; however, unpacking is **eager**, so the outer iterable is fully consumed at once.
- `itertools.chain.from_iterable(it)`, receives an iterable of iterables and chains them.
    - It consumes the outer iterable lazily, so each inner iterable is only requested when it is needed.
- `itertools.tee(iterable, n)`, receives an iterable and returns a tuple of `n` independent iterators over it.
    - all are different objects in memory.
    - they are iterators, so each one gets exhausted.

In [10]:
from itertools import chain, tee

l1 = (i**2 for i in range(4))
l2 = (i**2 for i in range(4, 8))
l3 = (i**2 for i in range(8, 12))

def squares():
    yield (i**2 for i in range(4))
    yield (i**2 for i in range(4, 8))
    yield (i**2 for i in range(8, 12))

print("---- CHAIN ----")

def chain_iterables(*iterables):
    for iterable in iterables:
        yield from iterable

print(f"generator expression l1, l2 and l3 chained with my function -> {list(chain_iterables(l1, l2, l3))}")
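# l1, l2 and l3 were already exhausted by the previous call, so chaining them again yields nothing.
print(f"generator expression l1, l2 and l3 chained itertools.chain -> {list(chain(l1, l2, l3))}")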
list1 = [l1, l2, l3]
print(f"a list of my generator expression list1: {list1} chained together will return the generator expression-> {list(chain(list1))}.")
print(f"to get the values from my iterables inside an iterable we need to unpack what we get from chained -> {[el for el in chain(*squares())]}")

print("---- CHAIN.FROM_ITERABLE ----")

def chain_from_iterable(iterable):
    for item in iterable:
        yield from item

print(f"generator function squares: {squares()} chained with chain_from_iterable -> {list(chain_from_iterable(squares()))}")
print(f"generator function squares: {squares()} chained itertools.chain.chain_iterables -> {list(chain.from_iterable(squares()))}")

print("---- TEE ----")

def squares(n):
    for i in range(n):
        yield i ** 2

print(f"Teeing generator function squares: {squares(10)} results in a tuple with three different iterators {tee(squares(10), 3)}")
# they are 3 independent copies of the iterator.
# they are lazy iterators and will get exhausted.

---- CHAIN ----
generator expression l1, l2 and l3 chained with my function -> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
generator expression l1, l2 and l3 chained itertools.chain -> []
a list of my generator expression list1: [<generator object <genexpr> at 0x10d28f840>, <generator object <genexpr> at 0x10d28f530>, <generator object <genexpr> at 0x10d28fdf0>] chained together will return the generator expression-> [<generator object <genexpr> at 0x10d28f840>, <generator object <genexpr> at 0x10d28f530>, <generator object <genexpr> at 0x10d28fdf0>].
to get the values from my iterables inside an iterable we need to unpack what we get from chained -> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
---- CHAIN.FROM_ITERABLE ----
generator function squares: <generator object squares at 0x10d28fa00> chained with chain_from_iterable -> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
generator function squares: <generator object squares at 0x10d28fa00> chained itertools.chain.chain_iterable
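
A minimal sketch of the two points above: the iterators returned by `tee` are independent of each other, and `chain.from_iterable` requests inner iterables lazily while unpacking into `chain` requests them all up front. The `batches` generator and the `produced` list are hypothetical helpers used only to observe what gets requested.

```python
from itertools import chain, islice, tee

def squares(n):
    for i in range(n):
        yield i ** 2

first, second = tee(squares(5), 2)
first_values = list(first)      # exhausting one copy...
second_values = list(second)    # ...does not affect the other

produced = []

def batches():
    """Yield inner iterables one at a time, recording each request in `produced`."""
    for start in (0, 3, 6):
        produced.append(start)
        yield range(start, start + 3)

lazy = chain.from_iterable(batches())
first_four = list(islice(lazy, 4))   # needs only the first two batches
produced_lazy = list(produced)

produced.clear()
eager = chain(*batches())            # unpacking pulls all three batches immediately
produced_eager = list(produced)

print(first_values, second_values)
print(first_four, produced_lazy, produced_eager)
# [0, 1, 2, 3] [0, 3] [0, 3, 6]
```

Once an iterator has been passed to `tee`, it is safer not to use the original anymore, since advancing it would make the copies miss values.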

## Mapping and Reducing

`map` and `reduce` work together by first mapping a function over each element of an iterable and then reducing the result to a single value by accumulating it with a given function (e.g. sum, max, min).

- `map(fn, iterable)`, applies a function to every item in an iterable.
    - it yields each value, so it evaluates lazily.
    - it just loops through the iterable, so for an iterable of nested iterables each inner iterable is passed to `fn` whole.
- `functools.reduce(fn, iterable, start)`, applies a function of two arguments cumulatively to the elements, accumulating the result.
    - returns just the final value.

From `itertools` we have some functions that extend the functionality of `map` and `reduce`. Again, keep in mind that they return lazy iterators.
- `itertools.starmap(fn, iter)`, works with an iterable of iterables, unpacking each inner iterable into the arguments of `fn`.
- `itertools.accumulate(it, fn)`, receives an iterable and a reducing function, like `reduce`, but yields every intermediate step instead of only the final value.

In [29]:
print("---- MAP ----")

maps = map(lambda x: x**2, range(5))
print(f"map returns an iterator, {iter(maps) is maps}")
print(f"maps iterator returns values -> {list(maps)}")

# map has a limitation when it comes to an iterable of iterables: it passes each inner iterable to the function as a single argument
def add(x, y):
    return x + y

print(f"a map can work with a function that requires unpacking but it is cumbersome -> {list(map(lambda t: add(*t), [(0, 1), range(2, 4)]))}")

print("---- STARMAP ----")

from itertools import starmap
print(f"startmap works with nested iterables making it easy to apply a function to each -> {list(starmap(add, [(0,1), range(2, 4)]))}")


print("---- REDUCE ----")
from functools import reduce
print(f"multiplying a list values by using reduce -> {reduce(lambda x, y: x*y, [1, 2, 3, 4, 5])}")
print(f"multiplying but this time we start with 10 -> {reduce(lambda x, y: x*y, [1, 2, 3, 4, 5], 10)}")

# with the reduce function we don't get the intermediate results
def sum_(iterable):
    it = iter(iterable)
    acc = next(it)
    for element in it:
        acc += element
        yield acc

def running_reduce(fn, iterable, start=None):
    it = iter(iterable)
    if start is None:
        acc = next(it)
    else:
        acc = start
    yield acc
    for item in it:
        acc = fn(acc, item)
        yield acc

print(f"creating our own reduce/sum_ that yields intermediate results -> {list(sum_([1,2,3,4,5]))}")
print(f"Creating our own running_reduce function -> {list(running_reduce(lambda x, y: x*y, [1, 2, 3, 4, 5]))}")
print(f"Creating our own running_reduce function with a start -> {list(running_reduce(lambda x, y: x*y, [1, 2, 3, 4, 5], 10))}")

# there are already built-ins that do this.
import operator
print(f"operator.mul to get the same result -> {list(running_reduce(operator.mul,[1, 2, 3, 4, 5]))}")

print("---- ACCUMULATE ----")
from itertools import accumulate, chain

print(f"accumulate function words as our running_reduce with the default to sum -> {list(accumulate([1, 2, 3, 4, 5]))}")
print(f"accumulate with the multiplication -> {list(accumulate([1, 2, 3, 4, 5], operator.mul))}")
print(f"to get the same behaviour as with the parameter start -> {list(accumulate(chain((10,), [1, 2, 3, 4, 5]), operator.mul))}")

---- MAP ----
map returns an iterator, True
maps iterator returns values -> [0, 1, 4, 9, 16]
a map can work with a function that requires unpacking but it is cumbersome -> [1, 5]
---- STARMAP ----
startmap works with nested iterables making it easy to apply a function to each -> [1, 5]
---- REDUCE ----
multiplying a list values by using reduce -> 120
multiplying but this time we start with 10 -> 1200
creating our own reduce/sum_ that yields intermediate results -> [3, 6, 10, 15]
Creating our own running_reduce function -> [1, 2, 6, 24, 120]
Creating our own running_reduce function with a start -> [10, 10, 20, 60, 240, 1200]
operator.mul to get the same result -> [1, 2, 6, 24, 120]
---- ACCUMULATE ----
accumulate function words as our running_reduce with the default to sum -> [1, 3, 6, 10, 15]
accumulate with the multiplication -> [1, 2, 6, 24, 120]
to get the same behaviour as with the parameter start -> [10, 10, 20, 60, 240, 1200]
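
As a complement to the `chain((10,), ...)` trick above, `accumulate` also accepts an `initial` keyword argument (available from Python 3.8, so this sketch assumes at least that version). It also shows that any two-argument function works, and how `starmap` unpacks its arguments.

```python
import operator
from itertools import accumulate, starmap

values = [1, 2, 3, 4, 5]

# `initial` is a keyword-only argument (Python 3.8+).
running_product = list(accumulate(values, operator.mul, initial=10))

# accumulate() accepts any two-argument function, for example a running maximum.
running_max = list(accumulate([3, 1, 4, 1, 5, 9, 2], max))

# starmap() unpacks each inner iterable into positional arguments: pow(2, 5), pow(3, 2), ...
powers = list(starmap(pow, [(2, 5), (3, 2), (10, 3)]))

print(running_product)  # [10, 10, 20, 60, 240, 1200]
print(running_max)      # [3, 3, 4, 4, 5, 9, 9]
print(powers)           # [32, 9, 1000]
```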


## Zipping

Zipping in Python is done with the built-in `zip` function; to extend its functionality we also have `itertools.zip_longest`.
- `zip(*args)`, receives an arbitrary number of iterables and returns an iterator of tuples, where the i-th tuple holds the i-th element of each iterable.
    - its length is determined by the shortest iterable.
    - We can use an infinite iterator as long as we also provide a finite iterable.
- `itertools.zip_longest(*args)`, like `zip`, but instead of stopping at the shortest iterable it stops at the longest, filling the missing values with `None` (or with `fillvalue` if given).
    - Be mindful of infinite iterators, since they will always be the longest.

In [4]:
l1 = [1, 2, 3, 4, 5]
l2 = [1, 2, 3, 4]
l3 = [1, 2, 3]

print("---- ZIP ----")

result = zip(l1, l2, l3)
print(f"we can see that zip function call: {result} is an iterable -> {iter(result) is result and '__next__' in dir(result)}")


def integer(n):
    for i in range(n):
        yield i

def squares(n):
    for i in range(n):
        yield i ** 2

def cubes(n):
    for i in range(n):
        yield i ** 3

print(f"we can also zip lazy iterators together -> {list(zip(integer(6), squares(5), cubes(4)))}")

print("---- ZIP_LONGEST ----")
from itertools import zip_longest

result = zip_longest(l1, l2, l3)

print(f"we can see that zip_longest function call: {result} is an iterable -> {iter(result) is result and '__next__' in dir(result)}")
print(f"result zip_longest returns all lists zipped with the length of the longest l1: {l1} -> {list(result)}")

print(f"we can also zip_longest lazy iterators together -> {list(zip_longest(integer(6), squares(5), cubes(4)))}")


---- ZIP ----
we can see that zip function call: <zip object at 0x10799b200> is an iterable -> True
we can also zip lazy iterators together -> [(0, 0, 0), (1, 1, 1), (2, 4, 8), (3, 9, 27)]
---- ZIP_LONGEST ----
we can see that zip_longest function call: <itertools.zip_longest object at 0x1075f6fc0> is an iterable -> True
result zip_longest returns all lists zipped with the length of the longest l1: [1, 2, 3, 4, 5] -> [(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, None), (5, None, None)]
we can also zip_longest lazy iterators together -> [(0, 0, 0), (1, 1, 1), (2, 4, 8), (3, 9, 27), (4, 16, None), (5, None, None)]
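
A short sketch of the points above, using made-up sample data: `fillvalue` replaces the default `None` padding, `zip` can safely be combined with an infinite counter, and `zip(*pairs)` transposes the pairs back.

```python
from itertools import count, zip_longest

# Hypothetical sample data, used only for illustration.
names = ['ana', 'ben', 'cy']
scores = [90, 75]

padded = list(zip_longest(names, scores, fillvalue=0))

# zip() stops at the shortest input, so pairing with an infinite counter is safe.
numbered = list(zip(count(1), names))

# zip(*pairs) transposes the pairs back into separate tuples.
positions, labels = zip(*numbered)

print(padded)             # [('ana', 90), ('ben', 75), ('cy', 0)]
print(numbered)           # [(1, 'ana'), (2, 'ben'), (3, 'cy')]
print(positions, labels)  # (1, 2, 3) ('ana', 'ben', 'cy')
```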


## Grouping

`itertools.groupby(data, [keyfunc])`, groups consecutive elements of data (an iterable) that share the same key, as computed by the keyfunc provided as a parameter.
- It does not sort the values by key, it only groups consecutive ones, so make sure the data is sorted by the same key first.
- each group is itself an iterator that works by calling `next()` on the underlying iterator shared with the `groupby` object.
- calling `next()` on the `groupby` object consumes whatever items are left in the current group, so use each group before moving on to the next one.

In [23]:
from itertools import groupby
l1 = [1,2,3,3,3]
gb1 = groupby(l1)
l2 = [1,2,3,3,2,1]
gb2 = groupby(l2)
print(f"we get {len(list(gb1))} iterator from the group by on l1: {l1} if sorted")
print(f"we get {len(list(gb2))} iterator from the group by on l2: {l2} if unsorted")

print("---- PRINTING A GROUPBY ----")
gb1 = groupby(l1)
for group_key, group in gb1:
    print(group_key, list(group))

print("---- GROUPING WITH A KEY ----")

def gen_groups():
    for key in range(1, 4):
        for i in range(3):
            yield (key, i)

g = gen_groups()
gb3 = groupby(g, key=lambda x: x[0])

for group_key, group in gb3:
    print(group_key, list(group))

# the original g iterator is exhausted and the groups of the group by are now iterators.

we get 3 iterator from the group by on l1: [1, 2, 3, 3, 3] if sorted
we get 5 iterator from the group by on l2: [1, 2, 3, 3, 2, 1] if unsorted
---- PRINTING A GROUPBY ----
1 [1]
2 [2]
3 [3, 3, 3]
---- GROUPING WITH A KEY ----
1 [(1, 0), (1, 1), (1, 2)]
2 [(2, 0), (2, 1), (2, 2)]
3 [(3, 0), (3, 1), (3, 2)]
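
The importance of sorting is easier to see with a key function. In this sketch the word list and the `first_letter` helper are made up for illustration; the data is sorted with the same key that is later given to `groupby`.

```python
from itertools import groupby

# Hypothetical sample data, used only for illustration.
words = ['apple', 'bob', 'avocado', 'banana', 'cherry', 'blueberry']

def first_letter(word):
    return word[0]

# Without sorting, every change of key starts a new group.
unsorted_keys = [key for key, _ in groupby(words, key=first_letter)]

# Sorting by the same key first gives one group per distinct key.
by_letter = sorted(words, key=first_letter)
grouped = {key: list(group) for key, group in groupby(by_letter, key=first_letter)}

# Counting the members of each group without building intermediate lists.
counts = [(key, sum(1 for _ in group)) for key, group in groupby(by_letter, key=first_letter)]

print(unsorted_keys)  # ['a', 'b', 'a', 'b', 'c', 'b']
print(grouped)
print(counts)         # [('a', 2), ('b', 3), ('c', 1)]
```

Each group is turned into a list (or counted) inside the loop, before the `groupby` object advances to the next key.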


In [35]:
import itertools
from itertools import groupby
from collections import defaultdict

print("---- CHECKING DATA ----")
with open('cars_2014.csv') as f:
    print(f"opened file f: {f} is an iterator -> {iter(f) is f and '__next__' in dir(f)}")
    for row in itertools.islice(f, 0, 8): # slice iterator.
        print(row, end='')


print("---- DEFAULTDICT ----")
print("we can use the defaultdict in python to count all occurences of each model on the file")
makes = defaultdict(int)
with open('cars_2014.csv') as f:
    # skip the header
    next(f)
    for row in f:
        make, _ = row.strip('\n').split(',')
        makes[make] += 1

print("---- GROUP BY ----")

with open('cars_2014.csv') as f:
    next(f)
    # f is an iterator and groupby is lazy, so the groups must be consumed inside the context manager; outside it we would be reading from a closed file.
    make_groups = itertools.groupby(f, lambda x: x.split(',')[0])
    make_counts = ((key, sum(1 for i in models)) for key, models in make_groups)
    print(list(make_counts))



---- CHECKING DATA ----
opened file f: <_io.TextIOWrapper name='cars_2014.csv' mode='r' encoding='UTF-8'> is an iterator -> True
make,model
ACURA,ILX
ACURA,MDX
ACURA,RDX
ACURA,RLX
ACURA,TL
ACURA,TSX
ALFA ROMEO,4C
---- DEFAULTDICT ----
we can use the defaultdict in python to count all occurences of each model on the file
---- GROUP BY ----
[('ACURA', 6), ('ALFA ROMEO', 2), ('APRILIA', 4), ('ARCTIC CAT', 96), ('ARGO', 4), ('ASTON MARTIN', 5), ('AUDI', 27), ('BENTLEY', 2), ('BLUE BIRD', 1), ('BMW', 86), ('BUGATTI', 1), ('BUICK', 5), ('CADILLAC', 7), ('CAN-AM', 61), ('CHEVROLET', 33), ('CHRYSLER', 2), ('DODGE', 7), ('DUCATI', 4), ('FERRARI', 6), ('FIAT', 2), ('FORD', 34), ('FREIGHTLINER', 7), ('GMC', 12), ('HARLEY DAVIDSON', 29), ('HINO', 7), ('HONDA', 91), ('HUSABERG', 4), ('HUSQVARNA', 9), ('HYUNDAI', 13), ('INDIAN', 3), ('INFINITI', 8), ('JAGUAR', 9), ('JEEP', 5), ('JOHN DEERE', 19), ('KAWASAKI', 59), ('KENWORTH', 11), ('KIA', 10), ('KTM', 13), ('KUBOTA', 4), ('KYMCO', 28), ('LAMBORGHIN

## Combinatorics

`itertools.product(*args)`, the Cartesian product of the iterables passed as arguments.


In [13]:
import itertools

print("---- CARTESSIAN PRODUCT - NESTED LOOP ----")

def matrix(n):
    for i in range(1, n+1):
        for j in range(1, n+1):
            yield f"{i} x {j} = {i*j}"

print(f"Cartessian product result for {matrix(10)} sliced to get from 10 to 20 -> {list(itertools.islice(matrix(10), 10, 20))}")

print("---- CARTESSIAN PRODUCT - ITERTOOLS ----")

l1 = ('x1', 'x2', 'x3', 'x4')
l2 = ('y1', 'y2', 'y3')
cp1 = itertools.product(l1, l2)

print(f"Cartessian product of {l1} and {l2} -> {list(cp1)}")

def matrix(n):
    for pr in itertools.product(*itertools.tee(range(1, n+1), 2)): # tee gives two iterators over the same range; unpacking them with * is eager
        yield  f"{pr[0]} x {pr[1]} = {pr[0] * pr[1]}"

print(f"the same cartessian product for the matrix n using itertools -> {list(itertools.islice(matrix(10), 10, 20))}")

---- CARTESSIAN PRODUCT - NESTED LOOP ----
Cartessian product result for <generator object matrix at 0x10d5cc4a0> sliced to get from 10 to 20 -> ['2 x 1 = 2', '2 x 2 = 4', '2 x 3 = 6', '2 x 4 = 8', '2 x 5 = 10', '2 x 6 = 12', '2 x 7 = 14', '2 x 8 = 16', '2 x 9 = 18', '2 x 10 = 20']
---- CARTESSIAN PRODUCT - ITERTOOLS ----
Cartessian product of ('x1', 'x2', 'x3', 'x4') and ('y1', 'y2', 'y3') -> [('x1', 'y1'), ('x1', 'y2'), ('x1', 'y3'), ('x2', 'y1'), ('x2', 'y2'), ('x2', 'y3'), ('x3', 'y1'), ('x3', 'y2'), ('x3', 'y3'), ('x4', 'y1'), ('x4', 'y2'), ('x4', 'y3')]
the same cartessian product for the matrix n using itertools -> ['2 x 1 = 2', '2 x 2 = 4', '2 x 3 = 6', '2 x 4 = 8', '2 x 5 = 10', '2 x 6 = 12', '2 x 7 = 14', '2 x 8 = 16', '2 x 9 = 18', '2 x 10 = 20']
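
When the same iterable is multiplied by itself, `product` also accepts a `repeat` keyword, which avoids the `tee` and unpacking step. A minimal sketch on a smaller 3 x 3 table, also comparing `product` with the equivalent nested loop:

```python
from itertools import product

# repeat=2 is equivalent to product(range(1, 4), range(1, 4)).
table = [f"{i} x {j} = {i * j}" for i, j in product(range(1, 4), repeat=2)]

# The rightmost iterable advances fastest, just like the innermost nested loop.
nested = [(i, j) for i in 'ab' for j in (1, 2)]
via_product = list(product('ab', (1, 2)))

print(table[0], '|', table[-1])  # 1 x 1 = 1 | 3 x 3 = 9
print(nested == via_product)     # True
```

Note that `product` reads its input iterables completely before yielding the first tuple, so it cannot be used with infinite iterators.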


In [24]:
from fractions import Fraction

print("---- ALL TOGETHER ----")

print("---- GRID GENERATOR ----")
def grid(min_val, max_val, step, *, num_dimensions=2):
    axis = itertools.takewhile(lambda x: x <= max_val,
                               itertools.count(min_val, step))
    axes = itertools.tee(axis, num_dimensions)
    return itertools.product(*axes)

print(f"grid from -1 to 1 in steps of 0.5, in 3D -> {list(grid(-1, 1, 0.5, num_dimensions=3))}")

print("---- ROLLING A DICE BRUTEFORCE ----")

sample_space = list(itertools.product(range(1, 7), range(1, 7)))
outcomes = list(filter(lambda x: sum(x) == 8, sample_space))

print(Fraction(len(outcomes), len(sample_space)))



---- ALL TOGETHER ----
---- GRID GENERATOR ----
grid from -1 to 1 in steps of 0.5, in 3D -> [(-1, -1, -1), (-1, -1, -0.5), (-1, -1, 0.0), (-1, -1, 0.5), (-1, -1, 1.0), (-1, -0.5, -1), (-1, -0.5, -0.5), (-1, -0.5, 0.0), (-1, -0.5, 0.5), (-1, -0.5, 1.0), (-1, 0.0, -1), (-1, 0.0, -0.5), (-1, 0.0, 0.0), (-1, 0.0, 0.5), (-1, 0.0, 1.0), (-1, 0.5, -1), (-1, 0.5, -0.5), (-1, 0.5, 0.0), (-1, 0.5, 0.5), (-1, 0.5, 1.0), (-1, 1.0, -1), (-1, 1.0, -0.5), (-1, 1.0, 0.0), (-1, 1.0, 0.5), (-1, 1.0, 1.0), (-0.5, -1, -1), (-0.5, -1, -0.5), (-0.5, -1, 0.0), (-0.5, -1, 0.5), (-0.5, -1, 1.0), (-0.5, -0.5, -1), (-0.5, -0.5, -0.5), (-0.5, -0.5, 0.0), (-0.5, -0.5, 0.5), (-0.5, -0.5, 1.0), (-0.5, 0.0, -1), (-0.5, 0.0, -0.5), (-0.5, 0.0, 0.0), (-0.5, 0.0, 0.5), (-0.5, 0.0, 1.0), (-0.5, 0.5, -1), (-0.5, 0.5, -0.5), (-0.5, 0.5, 0.0), (-0.5, 0.5, 0.5), (-0.5, 0.5, 1.0), (-0.5, 1.0, -1), (-0.5, 1.0, -0.5), (-0.5, 1.0, 0.0), (-0.5, 1.0, 0.5), (-0.5, 1.0, 1.0), (0.0, -1, -1), (0.0, -1, -0.5), (0.0, -1, 0.0), (0.0, -1,

`itertools.permutations(iterable, r)` yields every ordered selection of `r` elements from the iterable; if `r` is omitted, it defaults to the length of the iterable.
- In permutations, order matters.
- Elements are treated as unique based on their position, not their value, so if a value is repeated in the input, two outputs may look identical even though they come from different positions.

`itertools.combinations(iterable, r)` yields every selection of `r` elements from the iterable, emitted in the order the elements appear in the input.
- In combinations, order does not matter.
- `itertools.combinations_with_replacement(iterable, r)` also allows the same element to be chosen more than once; order still does not matter.
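
The number of results produced by each of these functions follows the standard counting formulas. The sketch below checks this for a small input, assuming Python 3.8+ for `math.perm` and `math.comb`; the variable names are illustrative only.

```python
import itertools
import math

items = 'abcd'
n, r = len(items), 2

# Ordered selections without repetition: n! / (n - r)!
n_perm = len(list(itertools.permutations(items, r)))
assert n_perm == math.perm(n, r)

# Unordered selections without repetition: n! / (r! * (n - r)!)
n_comb = len(list(itertools.combinations(items, r)))
assert n_comb == math.comb(n, r)

# Unordered selections with repetition: C(n + r - 1, r)
n_comb_rep = len(list(itertools.combinations_with_replacement(items, r)))
assert n_comb_rep == math.comb(n + r - 1, r)

print(n_perm, n_comb, n_comb_rep)  # 12 6 10
```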

In [40]:
l1 = 'abc'
l2 = 'abca'
l3 = '1234'

print("---- PERMUTATIONS ----")
print(f"The permutations for list l1: {l1} size 3 are -> {list(itertools.permutations(l1))}")
print(f"The permutations for list l2: {l2} size 4 are -> {list(itertools.permutations(l2))}")

print("---- COMBINATIONS ----")
print(f"The combinations for list l1: {l1} size 3 are -> {list(itertools.combinations(l1, r=3))}")
print(f"The combinations for list l2: {l2} size 4 are -> {list(itertools.combinations(l2, r=4))}")
print(f"The combinations with replacement for list l3: {l3} size 4 are -> {list(itertools.combinations_with_replacement(l3, r=4))}")

---- PERMUTATIONS ----
The permutations for list l1: abc size 3 are -> [('a', 'b', 'c'), ('a', 'c', 'b'), ('b', 'a', 'c'), ('b', 'c', 'a'), ('c', 'a', 'b'), ('c', 'b', 'a')]
The permutations for list l2: abca size 4 are -> [('a', 'b', 'c', 'a'), ('a', 'b', 'a', 'c'), ('a', 'c', 'b', 'a'), ('a', 'c', 'a', 'b'), ('a', 'a', 'b', 'c'), ('a', 'a', 'c', 'b'), ('b', 'a', 'c', 'a'), ('b', 'a', 'a', 'c'), ('b', 'c', 'a', 'a'), ('b', 'c', 'a', 'a'), ('b', 'a', 'a', 'c'), ('b', 'a', 'c', 'a'), ('c', 'a', 'b', 'a'), ('c', 'a', 'a', 'b'), ('c', 'b', 'a', 'a'), ('c', 'b', 'a', 'a'), ('c', 'a', 'a', 'b'), ('c', 'a', 'b', 'a'), ('a', 'a', 'b', 'c'), ('a', 'a', 'c', 'b'), ('a', 'b', 'a', 'c'), ('a', 'b', 'c', 'a'), ('a', 'c', 'a', 'b'), ('a', 'c', 'b', 'a')]
---- COMBINATIONS ----
The combinations for list l1: abc size 3 are -> [('a', 'b', 'c')]
The combinations for list l2: abca size 4 are -> [('a', 'b', 'c', 'a')]
The combinations with replacement for list l3: 1234 size 4 are -> [('1', '1', '1', '1')
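
The permutations of `'abca'` above contain repeated tuples because `itertools.permutations` distinguishes elements by position rather than by value. A minimal sketch of how one might count the distinct arrangements, using a `set` to discard the look-alike tuples (the variable names are illustrative):

```python
import itertools

word = 'abca'  # the letter 'a' appears at positions 0 and 3

all_perms = list(itertools.permutations(word))
distinct_perms = set(all_perms)

print(len(all_perms))       # 24, i.e. 4! arrangements of four positions
print(len(distinct_perms))  # 12, i.e. 4! / 2! once the two 'a's are interchangeable
```

Note that the `set` approach materializes every permutation first, so it is only practical for small inputs.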

In [56]:
import itertools
from collections import namedtuple
from fractions import Fraction

print("---- ALL TOGETHER ----")
SUITS = 'SHDC'
RANKS = tuple(map(str, range(2, 11))) + tuple('JQKA')
Card = namedtuple("Card", "rank suit")


deck = (Card(rank, suit) for suit, rank in itertools.product(SUITS, RANKS))
sample_space = itertools.combinations(deck, 4)

total = 0
acceptable = 0

for outcome in sample_space:
    total += 1
    if all(card.rank == 'A' for card in outcome):
        acceptable += 1

print(f"total={total}, acceptable={acceptable}")
print(f'odds = {Fraction(acceptable, total)}')
print(f"odds = {acceptable/total:.10f}")

---- ALL TOGETHER ----
total=270725, acceptable=1
odds = 1/270725
odds = 0.0000036938
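
The brute-force count above can be cross-checked analytically: there are C(52, 4) possible 4-card hands and exactly one of them holds all four aces. A short sketch, assuming Python 3.8+ for `math.comb`:

```python
import math
from fractions import Fraction

# Number of 4-card hands from a 52-card deck, ignoring order.
total_hands = math.comb(52, 4)

# Exactly one of those hands consists of the four aces.
odds = Fraction(1, total_hands)

print(total_hands)  # 270725
print(odds)         # 1/270725
```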


## Exercise

### Goal 1

Create lazy iterators for each of the four files.
- Return named tuples.
- Data types are appropriate to the column.
- The 4 iterators are independent of each other.

In [167]:
import csv
import datetime
from collections import namedtuple

def parse_int(value, *, default=None):
    try:
        return int(value)
    except ValueError:
        return default

def parse_date(value, *, default=None):
    try:
        return datetime.datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").date()
    except ValueError:
        return default

def parse_string(value, *, default=None):
    # Strip surrounding whitespace; treat empty or non-string values as missing.
    try:
        cleaned = value.strip()
    except AttributeError:
        return default
    return cleaned or default

def parse_row(Row, row, column_parsers, *, default=None):
    parsed_row = [func(field) for func, field in zip(column_parsers, row)]
    if all(item is not None for item in parsed_row):
        return Row(*parsed_row)
    else:
        return default

def row_reader(file_name):
    with open(file_name) as f:
        reader = csv.reader(f, delimiter=",", quotechar='"')
        yield from reader


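def file_reader(file_name, column_parser):
    file = row_reader(file_name)
    # The first row holds the column names; the file name (without extension)
    # becomes the named tuple's type name, e.g. "vehicles.csv" -> Vehicles.
    file_object = namedtuple(file_name.split('.')[0].capitalize(), next(file))
    for row in file:
        yield parse_row(file_object, row, column_parser)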

In [168]:
car_column_parser = (parse_string, parse_string, parse_string, parse_int)
emp_column_parser = (parse_string, parse_string, parse_string, parse_string,)
pi_column_parser = (parse_string, parse_string, parse_string, parse_string, parse_string,)
us_column_parser = (parse_string, parse_date, parse_date,)

column_row_parsers = (car_column_parser, emp_column_parser, pi_column_parser, us_column_parser)
file_names = ("vehicles.csv", "employment.csv", "personal_info.csv", "update_status.csv",)

# Verify
for file, parser in zip(file_names, column_row_parsers):
    print(list(itertools.islice(file_reader(file, parser), 2)))

[Vehicles(ssn='100-53-9824', vehicle_make='Oldsmobile', vehicle_model='Bravada', model_year=1993), Vehicles(ssn='101-71-4702', vehicle_make='Ford', vehicle_model='Mustang', model_year=1997)]
[Employment(employer='Stiedemann-Bailey', department='Research and Development', employee_id='29-0890771', ssn='100-53-9824'), Employment(employer='Nicolas and Sons', department='Sales', employee_id='41-6841359', ssn='101-71-4702')]
[Personal_info(ssn='100-53-9824', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic'), Personal_info(ssn='101-71-4702', first_name='Cayla', last_name='MacDonagh', gender='Female', language='Lao')]
[Update_status(ssn='100-53-9824', last_updated=datetime.date(2017, 10, 7), created=datetime.date(2016, 1, 24)), Update_status(ssn='101-71-4702', last_updated=datetime.date(2017, 1, 23), created=datetime.date(2016, 1, 27))]
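
To see the laziness without needing the CSV files on disk, the same idea can be sketched against an in-memory file. Everything here is a simplified stand-in: `SAMPLE`, `typed_rows` and the plain `str`/`int` converters are hypothetical and are not the functions defined above.

```python
import csv
import io
from collections import namedtuple

# Hypothetical in-memory stand-in for a file such as vehicles.csv.
SAMPLE = io.StringIO(
    "ssn,vehicle_make,model_year\n"
    "100-53-9824,Oldsmobile,1993\n"
    "101-71-4702,Ford,1997\n"
)

def typed_rows(lines, converters, type_name="Record"):
    """Lazily yield named tuples, converting each column with its converter."""
    reader = csv.reader(lines)
    Record = namedtuple(type_name, next(reader))  # header row -> field names
    for row in reader:
        yield Record(*(convert(field) for convert, field in zip(converters, row)))

rows = typed_rows(SAMPLE, (str, str, int))
first = next(rows)   # only the header and one data row have been consumed so far
print(first)         # Record(ssn='100-53-9824', vehicle_make='Oldsmobile', model_year=1993)
second = next(rows)
```

Because `typed_rows` is a generator function, nothing is read until the first `next` call, and each call creates its own independent iterator.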


### Goal 2

Create a single iterable that combines all the data from all four files.
- Re-use iterators from goal 1
- Return one row per SSN containing data from all four files in a single named tuple.

In [190]:
def get_headers():
    # Collect the header row of every file, keeping each column name once
    # (dict.fromkeys preserves first-seen order and drops the repeated ssn).
    headers = (next(row_reader(file)) for file in file_names)
    return list(dict.fromkeys(itertools.chain.from_iterable(headers)))

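def whole_file():
    readers = [file_reader(file, parser) for file, parser in zip(file_names, column_row_parsers)]
    Row = namedtuple("Row", get_headers())
    # Assumes the four files list the same SSNs in the same order.
    for records in zip(*readers):
        merged = {}
        for record in records:
            # Merge by field name so that only the repeated ssn column collapses;
            # de-duplicating by value would also drop legitimately equal values
            # (for example, identical last_updated and created dates).
            merged.update(record._asdict())
        yield Row(**merged)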

list(itertools.islice(whole_file(), 1))


[Row(ssn='100-53-9824', vehicle_make='Oldsmobile', vehicle_model='Bravada', model_year=1993, employer='Stiedemann-Bailey', department='Research and Development', employee_id='29-0890771', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic', last_updated=datetime.date(2017, 10, 7), created=datetime.date(2016, 1, 24))]
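
The solution above pairs rows by position with `zip`, which relies on all four files listing the same SSNs in the same order. If that assumption did not hold, one alternative would be to index one side by `ssn` and look records up by key. The sketch below uses two small hypothetical record types and made-up ordering to illustrate the idea; `join_on_ssn` and `Combined` are not part of the exercise code.

```python
from collections import namedtuple

Vehicle = namedtuple("Vehicle", "ssn vehicle_make")
Person = namedtuple("Person", "ssn first_name")

# Hypothetical records, deliberately listed in different orders.
vehicles = [Vehicle("101-71-4702", "Ford"), Vehicle("100-53-9824", "Oldsmobile")]
people = [Person("100-53-9824", "Sebastiano"), Person("101-71-4702", "Cayla")]

Combined = namedtuple("Combined", "ssn vehicle_make first_name")

def join_on_ssn(left, right):
    """Yield one Combined row per ssn found in both inputs."""
    # Index the right-hand side by ssn; this part is not lazy.
    by_ssn = {record.ssn: record for record in right}
    for record in left:
        match = by_ssn.get(record.ssn)
        if match is not None:
            # Merge by field name so the shared ssn column appears once.
            yield Combined(**{**record._asdict(), **match._asdict()})

combined = list(join_on_ssn(vehicles, people))
print(combined[0])  # Combined(ssn='101-71-4702', vehicle_make='Ford', first_name='Cayla')
```

The trade-off is memory: the indexed side has to be loaded in full, whereas the positional `zip` stays lazy on every input.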

### Goal 3

Create a function that filters the iterator from goal 2 and returns an iterator without stale records.
- A record is stale if its last update date is earlier than 3/1/2017 (March 1, 2017).

In [201]:
def filter_stale():
    cutoff = datetime.date(2017, 3, 1)
    # Keep only records updated on or after the cutoff date.
    yield from (row for row in whole_file() if row.last_updated >= cutoff)

list(itertools.islice(filter_stale(), 1))

[Row(ssn='100-53-9824', vehicle_make='Oldsmobile', vehicle_model='Bravada', model_year=1993, employer='Stiedemann-Bailey', department='Research and Development', employee_id='29-0890771', first_name='Sebastiano', last_name='Tester', gender='Male', language='Icelandic', last_updated=datetime.date(2017, 10, 7), created=datetime.date(2016, 1, 24))]
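
The boundary condition is easy to get wrong, so here is a small sketch of the same filter with the cutoff passed as a parameter. The `Status` type, the `drop_stale` name and the third record are hypothetical; a record dated exactly on the cutoff is kept because only earlier dates count as stale.

```python
import datetime
from collections import namedtuple

Status = namedtuple("Status", "ssn last_updated")

# Hypothetical records; the first two dates mirror the sample output of goal 1.
records = [
    Status("100-53-9824", datetime.date(2017, 10, 7)),
    Status("101-71-4702", datetime.date(2017, 1, 23)),
    Status("000-00-0000", datetime.date(2017, 3, 1)),  # made-up boundary case
]

def drop_stale(rows, cutoff=datetime.date(2017, 3, 1)):
    """Lazily yield rows whose last_updated is on or after the cutoff."""
    return (row for row in rows if row.last_updated >= cutoff)

fresh = list(drop_stale(records))
print([row.ssn for row in fresh])  # ['100-53-9824', '000-00-0000']
```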

### Goal 4

For all non-stale records from goal 3, count the number of people per car make, grouped by gender.
- To verify, the largest groups are:
    - Female -> Ford and Chevrolet (42 people in each group)
    - Male -> Ford (40 people in the group)
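
One possible way to sketch the counting step is with `collections.Counter`, which tallies values in a single pass and needs no prior sorting. The records below are hypothetical stand-ins for the non-stale rows, and `makes_by_gender` is an illustrative name rather than part of the exercise.

```python
from collections import Counter, namedtuple

Person = namedtuple("Person", "gender vehicle_make")

# Hypothetical records standing in for the non-stale rows from goal 3.
rows = [
    Person("Female", "Ford"), Person("Female", "Chevrolet"),
    Person("Female", "Ford"), Person("Male", "Ford"),
    Person("Male", "GMC"),
]

def makes_by_gender(records, gender):
    """Return (make, count) pairs for one gender, most common first."""
    counts = Counter(r.vehicle_make for r in records if r.gender == gender)
    return counts.most_common()

print(makes_by_gender(rows, "Female"))  # [('Ford', 2), ('Chevrolet', 1)]
print(makes_by_gender(rows, "Male"))    # [('Ford', 1), ('GMC', 1)]
```

Makes with equal counts are returned in the order they were first encountered.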

In [232]:
def group_data(gender):
    data = filter_stale()
    data_g = (row for row in data if row.gender == gender)
    sorted_g = sorted(data_g, key=lambda x: x.vehicle_make)
    group_g = itertools.groupby(sorted_g, key=lambda x: x.vehicle_make)
    group_g_counts = ((g, len(list(v))) for g, v in group_g)
    return sorted(group_g_counts, key=lambda x: x[1], reverse=True)

print("---- FEMALE ----")
for group, value in group_data('Female'):
    print(group, value)
print("---- MALE ----")
for group, value in group_data('Male'):
    print(group, value)

---- FEMALE ----
Chevrolet 42
Ford 42
GMC 22
Mitsubishi 22
Toyota 20
Dodge 17
Mercedes-Benz 17
Lexus 15
Pontiac 14
Audi 13
Mazda 13
Volvo 13
BMW 12
Nissan 12
Suzuki 12
Buick 11
Volkswagen 10
Acura 9
Infiniti 9
Kia 9
Honda 8
Land Rover 8
Oldsmobile 8
Cadillac 6
Chrysler 6
Subaru 6
Jeep 5
Lotus 5
Mercury 5
Bentley 4
Hyundai 4
Lincoln 4
Isuzu 3
Jaguar 3
Plymouth 3
Porsche 3
Saab 3
Saturn 3
Aston Martin 2
Lamborghini 2
Scion 2
Austin 1
Bugatti 1
Eagle 1
Geo 1
Morgan 1
Panoz 1
Rolls-Royce 1
---- MALE ----
Ford 40
Chevrolet 30
GMC 28
Mitsubishi 28
Dodge 22
Toyota 21
Mercedes-Benz 19
Volkswagen 16
Audi 14
Buick 13
Mazda 13
BMW 12
Mercury 11
Pontiac 11
Volvo 10
Cadillac 9
Honda 9
Hyundai 8
Saab 8
Subaru 8
Acura 7
Infiniti 7
Jeep 7
Lexus 6
Nissan 6
Kia 5
Lincoln 5
Lotus 5
Oldsmobile 5
Jaguar 4
Lamborghini 4
Plymouth 4
Porsche 4
Aston Martin 3
Bentley 3
Chrysler 3
Isuzu 3
Land Rover 3
Maserati 3
Saturn 3
Geo 2
Maybach 2
Panoz 2
Suzuki 2
Aptera 1
Austin 1
Corbin 1
Daewoo 1
Eagle 1
Jensen 1
Rolls-