## Aggregators
- Functions that iterate through an iterable and return a single value that (usually) takes into account every element of the iterable.
- Examples: **`min(iterable)`**, **`max(iterable)`**
<br><br>

### Associated truth values:
- Every object in Python has an associated truth value - bool(obj) --> True/False
- Every object is  <span style='color:yellow; font-weight:bold '> truthy</span>, except:
    * None
    * False
    * 0 (in any numeric type)
    * empty sequences (list, tuple, string, ...)
    * empty mappings and sets (dictionary, set, ...)
    * instances of custom classes that implement a **\_\_bool\_\_** or **\_\_len\_\_** method that returns **False** or **0**
    * Python first looks for the   <span style='color:yellow; font-weight:bold; margin: 0, 5px'> \_\_bool\_\_</span> method, but if it does not exist, then Python uses the <span style='color:yellow; font-weight:bold; margin: 0, 5px'> \_\_len\_\_</span> method to determine the <span style='color:yellow; font-weight:bold; margin: 0, 5px'> truth value </span>
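
The lookup order described above can be checked with a small sketch. The classes `Basket` and `Switch` are hypothetical names invented for this illustration: the first defines only `__len__`, the second defines both methods so that `__bool__` takes precedence.

```python
class Basket:
    """Hypothetical container: truthiness falls back to __len__."""
    def __init__(self, *items):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


class Switch:
    """Hypothetical class defining both methods: __bool__ is used."""
    def __init__(self, on):
        self.on = on

    def __bool__(self):
        return self.on

    def __len__(self):
        return 0  # ignored by bool() because __bool__ exists


empty_basket_truth = bool(Basket())         # False: __len__ returns 0
full_basket_truth = bool(Basket('apple'))   # True: __len__ returns 1
switch_truth = bool(Switch(True))           # True: __bool__ is consulted first
print(empty_basket_truth, full_basket_truth, switch_truth)
```

A class that defines neither method is always truthy.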

<br>

### Predicate
 - A function that takes a single argument and returns **True** or **False** is called a  <span style='color:yellow; font-weight:bold'>predicate</span>

<br>

### Any
- any(iterable) --> returns   <span style='color:yellow; font-weight:bold; margin: 0, 5px'> True </span> if <span style='color:yellow; font-weight:bold; margin: 0, 5px'> one (or more) </span> element in iterable is  <span style='color:yellow; font-weight:bold; margin: 0, 5px'> truthy </span>

### All
- all(iterable) --> returns **`True`** if all elements in iterable are truthy
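
Two details of `any` and `all` are easy to miss: their results on an empty iterable, and the fact that both stop consuming the iterable as soon as the answer is known. The sketch below uses a hypothetical helper generator, `noisy`, that records which elements were actually consumed.

```python
def noisy(values, seen):
    """Yield values one by one, recording each in `seen` as it is consumed."""
    for value in values:
        seen.append(value)
        yield value

# Edge cases on an empty iterable
any_empty = any([])   # False: no truthy element was found
all_empty = all([])   # True: no falsy element was found

# any() stops at the first truthy element
seen_by_any = []
any_result = any(noisy([0, '', 3, 4, 5], seen_by_any))

# all() stops at the first falsy element
seen_by_all = []
all_result = all(noisy([1, 2, 0, 4, 5], seen_by_all))

print(any_empty, all_empty)
print(any_result, seen_by_any)
print(all_result, seen_by_all)
```

In both calls the elements 4 and 5 are never requested from the generator.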

### Map
- map(fn, iterable) --> applies fn to every element of the iterable
- alternatively we can use a comprehension (fn(item) for item in iterable)

### Filter
- filter(predicate, iterable) --> returns all elements of the iterable where predicate(element) is **truthy**
- predicate can be **`None`** - in which case we filter based on the **`truthiness`** of each element.
- returns a lazy iterator
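
As a quick illustration of the `None` predicate and of the laziness mentioned above, here is a minimal sketch (the sample list `values` is made up for the example):

```python
values = [0, 1, '', 'a', None, [], [0], False, 2.5]

# predicate=None keeps only the truthy elements
truthy_lazy = filter(None, values)

# Nothing has been evaluated yet; materialise the iterator to see the result
truthy_values = list(truthy_lazy)
print(truthy_values)

# The iterator is now exhausted, so a second pass yields nothing
second_pass = list(truthy_lazy)
print(second_pass)

# Equivalent list comprehension
same_result = [item for item in values if item]
```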


In [10]:
from numbers import Number
seq = [50, 65, 80.5, '90']

# Predicate
is_number = lambda item: isinstance(item, Number)

# Test whether each item in the sequence is a number
print(list(map(is_number, seq)))

# Test whether any of the items in the sequence is a number
print(any(map(is_number, seq)))

# Test whether all of the items in the sequence are numbers
print(all(map(is_number, seq)))

[True, True, True, False]
True
False


### Doing the same thing with the  <span style='color:yellow; font-weight:bold; margin: 0, 5px'>reduce</span> function

In [8]:
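from functools import reduce
from numbers import Number

seq = [50, 65, 80.5, '90']

# Replicate the any function: carry a boolean accumulator, starting from False
predicate = lambda acc, item: acc or isinstance(item, Number)
is_any = reduce(predicate, seq, False)

# Replicate the all function: carry a boolean accumulator, starting from True
# (testing isinstance on the accumulator itself would be wrong, since bool is a Number)
predicate = lambda acc, item: acc and isinstance(item, Number)
is_all = reduce(predicate, seq, True)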

print('is any: ', is_any)
print('is all: ', is_all)

is any:  True
is all:  False


In [15]:
# Check whether all car brands are at least 3 characters
with open('../../../datasets/python-deep-dive/car-brands-1.txt') as f: # f is an iterator
    predicate = lambda line: len(line.strip()) >= 3
    result = all(map(predicate, f))

print('All car brands are at least three characters? ', result)

# Alternatively we could write
with open('../../../datasets/python-deep-dive/car-brands-1.txt') as f: # f is an iterator
    result = all(len(row.strip()) >= 3 for row in f)

print('All car brands are at least three characters? ', result)

All car brands are at least three characters?  True
All car brands are at least three characters?  True


### Slicing generators using  <span style='color:yellow; font-weight:bold; margin: 0, 5px'> islice </span>
- We cannot slice a generator the same way we slice a sequence, because generators do not support indexing
- Instead we use islice from itertools
- islice returns a  <span style='color:yellow; font-weight:bold; margin: 0, 5px'>lazy iterator</span>

In [20]:
import math
from itertools import islice

def factorials(n):
    for i in range(n):
        yield math.factorial(i)

num_factorials = 100
start = 3
stop = 10
        
print('islice: ', list(islice(factorials(num_factorials), start, stop)))

islice:  [6, 24, 120, 720, 5040, 40320, 362880]


In [17]:
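# The islice function works a bit like this
def slice_(iterable, start, stop):
    iterator = iter(iterable)  # works for any iterable, not only iterators
    for _ in range(0, start):
        next(iterator)  # discard the first `start` elements
    for _ in range(start, stop):
        yield next(iterator)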
        
print('slice_: ', list(slice_(factorials(100), 3, 10)))

slice_:  [6, 24, 120, 720, 5040, 40320, 362880]
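
One consequence of slicing an iterator rather than a sequence is that the elements islice passes over are consumed from the underlying iterator. The following sketch, with made-up variable names, shows this and also the optional step argument:

```python
from itertools import islice

numbers = iter(range(10))

# Take the first three elements; they are now gone from `numbers`
first_three = list(islice(numbers, 3))

# A second slice continues where the first one stopped.
# start=0, stop=None, step=2 takes every other remaining element.
every_other = list(islice(numbers, 0, None, 2))

# Nothing is left afterwards
leftover = list(numbers)

print(first_three, every_other, leftover)
```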


### Selecting and filtering

- <span style='color:yellow; font-weight:bold; margin: 0, 5px'>Filter</span> returns all elements of an iterable where  <span style='color:yellow; font-weight:bold; margin: 0, 5px'> predicate(element) </span> is  <span style='color:yellow; font-weight:bold; margin: 0, 5px'>truthy</span>

- Filter returns a   <span style='color:yellow; font-weight:bold; margin: 0, 5px'> lazy iterator </span>

In [42]:
seq = [50, 65, 80.5, 90]

# Test whether any of the numbers are below 60
# Method 1 - using the map function
test = list(map(lambda num: num < 60, seq))
print('Method 1:', test, 'Numbers below 60 =', [x for x, y in zip(seq, test) if y is True])

# Method 2 - list comprehension (Check the truthiness of y)
test = [(lambda num: num < 60)(num) for num in seq]
print('Method 2:', test, 'Numbers below 60 =', [x for x, y in zip(seq, test) if y])


# Method 3 - using the filter function
test = list(filter(lambda num: num < 60, seq))
print('Method 3: Numbers below 60 =', test)


# Method 4 - using filterfalse function (keeps elements where the predicate is falsy)
from itertools import filterfalse
test = list(filterfalse(lambda num: num >= 60, seq))
print('Method 4: Numbers below 60 =', test)

Method 1: [True, False, False, False] Numbers below 60 = [50]
Method 2: [True, False, False, False] Numbers below 60 = [50]
Method 3: Numbers below 60 = [50]
Method 4: Numbers below 60 = [50]


### Compress
- Filter one iterable by using the **truthiness** of the items in another iterable
- Returns a **lazy iterator**

In [45]:
from itertools import compress

data = ['a', 'b', 'c', 'd']
selectors = [True, [], False, 1, 0]

# find the items in data where the corresponding index values in selectors are truthy
print(list(compress(data, selectors)), 'are truthy')

# using list comprehension to do the same thing
print([item for item, truth_value in zip(data, selectors) if truth_value], 'are truthy')

['a', 'd'] are truthy
['a', 'd'] are truthy


### Takewhile
- returns an iterator that will yield items while pred(item) is truthy - and then stops as soon as pred(item) returns a falsy value
- returns a lazy iterator

In [57]:
from itertools import takewhile
seq = [1, 3, 5, 2, 6]

# yield elements from the list while they are less
# than 5, and stop the iteration as soon as an 
# element does not satisfy the requirement
list(takewhile(lambda x: x < 5, seq))

[1, 3]

### Dropwhile
- returns a lazy iterator that skips items while pred(item) is truthy, and yields all remaining elements once pred(item) first becomes falsy

In [58]:
from itertools import dropwhile
seq = [1, 3, 5, 2, 6]

# skip elements while they are less than 5, then
# yield the first element that does not satisfy the
# requirement and every element after it
list(dropwhile(lambda x: x < 5, seq))

[5, 2, 6]

### Infinite iterators

In [None]:
from itertools import count, cycle, repeat, islice

In [49]:
# count
g = count(10)
h = count(10, 0.01)

# g and h are infinite iterators so we have to slice them using islice
print('g sliced: ', list(islice(g, 5)))
print('h sliced ', list(islice(h, 5)))

g sliced:  [10, 11, 12, 13, 14]
h sliced  [10, 10.01, 10.02, 10.03, 10.04]


In [51]:
# cycle
colors = ['red', 'green', 'blue']
g = cycle(colors)
print('list cycled: ', list(islice(g, 9)))

list cycled:  ['red', 'green', 'blue', 'red', 'green', 'blue', 'red', 'green', 'blue']


In [None]:
# Hand out cards to four players
from collections import namedtuple
Card = namedtuple('Card', 'rank suit')

def card_deck():
    ranks = tuple(str(num) for num in range(2, 11)) + tuple('JQKA')
    suits = ('Spades', 'Hearts', 'Diamonds', 'Clubs')
    for suit in suits:
        for rank in ranks:
            yield Card(rank, suit)
            
hands = [list() for _ in range(4)]

index = 0
for card in card_deck():
    index = index % 4
    hands[index].append(card)
    index += 1


In [None]:
# we can solve this problem using the cycle function
from collections import namedtuple
Card = namedtuple('Card', 'rank suit')

def card_deck():
    ranks = tuple(str(num) for num in range(2, 11)) + tuple('JQKA')
    suits = ('Spades', 'Hearts', 'Diamonds', 'Clubs')
    for suit in suits:
        for rank in ranks:
            yield Card(rank, suit)
            
hands = [list() for _ in range(4)]
index_cycle = cycle([0, 1, 2, 3])
for card in card_deck():
    hands[next(index_cycle)].append(card)


In [None]:
# we can simplify the above
from collections import namedtuple
Card = namedtuple('Card', 'rank suit')

def card_deck():
    ranks = tuple(str(num) for num in range(2, 11)) + tuple('JQKA')
    suits = ('Spades', 'Hearts', 'Diamonds', 'Clubs')
    for suit in suits:
        for rank in ranks:
            yield Card(rank, suit)
            
hands = [list() for _ in range(4)]
hands_cycle = cycle(hands)
for card in card_deck():
    next(hands_cycle).append(card)

In [56]:
# repeat
g = repeat('python', 5)
list(g)

['python', 'python', 'python', 'python', 'python']
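
Without a count, repeat is infinite like count and cycle. A small sketch of two common uses, under the assumption that we only want a few values: slicing it with islice, and feeding it to map as a constant second argument.

```python
from itertools import islice, repeat

# Without a count, repeat never stops, so it must be sliced
zeros = list(islice(repeat(0), 4))

# map stops at the shortest iterable, so an infinite repeat is safe here:
# this computes pow(0, 2), pow(1, 2), ..., pow(4, 2)
squares = list(map(pow, range(5), repeat(2)))

print(zeros)
print(squares)
```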

### Chaining and Teeing

- Chaining is analogous to sequence concatenation, but not the same:
    * it deals with iterables (including iterators), not just sequences
    * chain itself returns a lazy iterator
    
- Teeing is kind of like copying: tee(iterable, n) returns a tuple of n independent iterators over the same elements, each one a distinct object with its own id
    * The elements of the returned tuple are **lazy iterators**

In [65]:
from itertools import chain
l1, l2, l3 = 'abc', 'def', 'ghi'
letters = l1, l2, l3
chained_iterable = chain.from_iterable(letters)
list(chained_iterable)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

In [68]:
l1 = (i**2 for i in range(4))
l2 = (i**2 for i in range(4, 8))
l3 = (i**2 for i in range(8, 12))

def chain_iterables(*iterables):
    for iterable in iterables:
        yield from iterable
        
[i for i in chain_iterables(l1, l2, l3)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

In [69]:
# doing the same thing with itertools.chain function
l1 = (i**2 for i in range(4))
l2 = (i**2 for i in range(4, 8))
l3 = (i**2 for i in range(8, 12))

[i for i in chain(l1, l2, l3)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

In [71]:
def squares():
    yield (i**2 for i in range(4))
    yield (i**2 for i in range(4, 8))
    yield (i**2 for i in range(8, 12))

# this will work. The issue is that unpacking squares() is eager and not lazy:
# squares() is run to completion and every inner generator is created upfront
[i for i in chain(*squares())]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

In [72]:
# so instead we can write:
[i for i in chain.from_iterable(squares())]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]

In [73]:
# teeing ("copying" iterables)
from itertools import tee

def squares(n):
    for i in range(n):
        yield i**2
        
gen = squares(10)
iters = tee(gen, 3)
# we now have three independent iterators over the generator's values
iters

(<itertools._tee at 0x16fc7406300>,
 <itertools._tee at 0x16fc7406a40>,
 <itertools._tee at 0x16fc7406b40>)
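
To see that the iterators returned by tee are independent of each other, the sketch below advances one of them and then reads the other from the start. It redefines the same `squares` generator so that it runs on its own; the other variable names are chosen for this example.

```python
from itertools import tee

def squares(n):
    for i in range(n):
        yield i**2

first, second = tee(squares(5), 2)

first_two = [next(first), next(first)]   # advance only `first`
all_from_second = list(second)           # `second` still starts from the beginning
rest_of_first = list(first)              # `first` continues where it left off

is_same_object = first is second         # distinct objects

print(first_two, all_from_second, rest_of_first, is_same_object)
```

Once an iterator has been teed, the original should not be advanced directly, otherwise the tee iterators will miss those elements.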

### Mapping and Reducing

Mapping: map(fn, iterable)
- Applying a callable to each element of an iterable
- fn must be a callable that accepts a single argument (map also accepts several iterables, in which case fn takes one argument per iterable)
- Returns a lazy iterator
- An equivalent to map is a generator expression: 
    * maps = (fn(item) for item in iterable)

Accumulation/Reduce:
- Reducing an iterable down to a single value
- reduce(fn, iterable, start_value), where fn is a callable of two arguments (the accumulated value and the next element) and start_value is optional
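
A short sketch of the two signatures, using made-up sample lists: map can take more than one iterable, and reduce can take an optional start value, which is also what it returns for an empty iterable.

```python
from functools import reduce
import operator

# map with two iterables: fn receives one argument per iterable
sums = list(map(operator.add, [1, 2, 3], [10, 20, 30]))

# reduce with a start value: accumulation begins at 100
total = reduce(operator.add, [1, 2, 3, 4], 100)

# with a start value, an empty iterable simply returns that value
empty_total = reduce(operator.add, [], 0)

# without a start value, an empty iterable raises TypeError
try:
    reduce(operator.add, [])
    raised = False
except TypeError:
    raised = True

print(sums, total, empty_total, raised)
```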

In [92]:
my_list = [1, 2, 3, 4]
# find the square of each element in the list
result = map(lambda x: x**2, my_list)
list(result)

[1, 4, 9, 16]

In [78]:
# Starmap
from itertools import starmap
my_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# add the elements in each of the lists
list(starmap(lambda x, y, z: x + y + z, my_list))

[6, 15, 24]

In [81]:
# Reduce
from functools import reduce
import operator
my_list = [1, 2, 3, 4]
result = reduce(operator.add, my_list)
result

10

In [89]:
# Accumulate
from itertools import accumulate
my_list = [1, 2, 3, 4]
result = accumulate(my_list, operator.add)
# accumulate returns a lazy iterator
result

<itertools.accumulate at 0x16fc53ab6c0>

In [90]:
print(next(result))
print(next(result))
print(next(result))
print(next(result))

1
3
6
10
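
accumulate accepts any two-argument callable, not only operator.add. The sketch below, with sample data invented for the example, computes a running maximum and a running product, and uses the `initial` keyword (available from Python 3.8) to seed the accumulation.

```python
from itertools import accumulate
import operator

readings = [3, 1, 4, 1, 5, 9, 2, 6]

# Running maximum: each output is the largest value seen so far
running_max = list(accumulate(readings, max))

# Running product
running_product = list(accumulate([1, 2, 3, 4], operator.mul))

# With an initial value the output has one extra element (Python 3.8+)
with_initial = list(accumulate([1, 2, 3, 4], operator.add, initial=100))

print(running_max)
print(running_product)
print(with_initial)
```

The last element of each accumulated list is what reduce would return for the same function and data.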


### Grouping

In [109]:
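import itertools

data = (
    (1, 'abc'),
    (1, 'bcd'),
    (2, 'pyt'),
    (2, 'yth'),
    (2, 'tho'),
    (3, 'hon')
)

# groupby only groups consecutive items with equal keys,
# so the data must already be ordered by the key
groups = itertools.groupby(data, key=lambda x: x[0])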
for group_key, sub_iter in groups:
    print(group_key, list(sub_iter))

1 [(1, 'abc'), (1, 'bcd')]
2 [(2, 'pyt'), (2, 'yth'), (2, 'tho')]
3 [(3, 'hon')]
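
The example above works because `data` is already ordered by its first field. groupby starts a new group every time the key changes, so unsorted input produces repeated keys. A minimal sketch with a made-up word list:

```python
from itertools import groupby

words = ['apple', 'bob', 'avocado', 'cherry', 'banana']
first_letter = lambda word: word[0]

# Unsorted input: 'a' and 'b' each appear as two separate groups
unsorted_groups = [
    (key, list(group)) for key, group in groupby(words, key=first_letter)
]

# Sorting by the same key first gives one group per key
sorted_groups = [
    (key, list(group))
    for key, group in groupby(sorted(words, key=first_letter), key=first_letter)
]

print(unsorted_groups)
print(sorted_groups)
```

Each group is consumed with `list(group)` before moving on, since the sub-iterators share the underlying iterator with groupby.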


In [116]:
import itertools
from collections import defaultdict

# Count the number of each car manufacturer listed inside the file cars_

# generic approach
makes = defaultdict(int)
with open('../../../datasets/python-deep-dive/cars_2014.csv') as f:
    # skip header row
    next(f)
    for row in f:
        make, _ = row.strip('\n').split(',')
        makes[make] += 1
        
{
    key: val for key, val in 
    sorted(makes.items(), key=lambda item: item[1], reverse=True) 
}

{'YAMAHA': 110,
 'POLARIS': 101,
 'ARCTIC CAT': 96,
 'HONDA': 91,
 'BMW': 86,
 'SKI-DOO': 67,
 'CAN-AM': 61,
 'MERCEDES-BENZ': 60,
 'KAWASAKI': 59,
 'SUZUKI': 48,
 'FORD': 34,
 'CHEVROLET': 33,
 'HARLEY DAVIDSON': 29,
 'KYMCO': 28,
 'AUDI': 27,
 'NISSAN': 24,
 'JOHN DEERE': 19,
 'TOYOTA': 19,
 'VOLKSWAGEN': 16,
 'LEXUS': 14,
 'VICTORY': 14,
 'HYUNDAI': 13,
 'KTM': 13,
 'GMC': 12,
 'KENWORTH': 11,
 'KIA': 10,
 'SUBARU': 10,
 'TRIUMPH': 10,
 'HUSQVARNA': 9,
 'JAGUAR': 9,
 'MACK': 9,
 'INFINITI': 8,
 'MITSUBISHI': 8,
 'VOLVO': 8,
 'CADILLAC': 7,
 'DODGE': 7,
 'FREIGHTLINER': 7,
 'HINO': 7,
 'ACURA': 6,
 'FERRARI': 6,
 'LAND ROVER': 6,
 'LINCOLN': 6,
 'RAM': 6,
 'ASTON MARTIN': 5,
 'BUICK': 5,
 'JEEP': 5,
 'MAZDA': 5,
 'SCION': 5,
 'APRILIA': 4,
 'ARGO': 4,
 'DUCATI': 4,
 'HUSABERG': 4,
 'KUBOTA': 4,
 'PORSCHE': 4,
 'RENAULT': 4,
 'VESPA': 4,
 'INDIAN': 3,
 'MASERATI': 3,
 'MINI': 3,
 'PEUGEOT': 3,
 'ROLLS ROYCE': 3,
 'SEAT': 3,
 'ALFA ROMEO': 2,
 'BENTLEY': 2,
 'CHRYSLER': 2,
 'FIAT': 2,


In [111]:
with open('../../../datasets/python-deep-dive/cars_2014.csv') as f:
    next(f)
    makes_groups = itertools.groupby(f, lambda x: x.split(',')[0])
    print(list(itertools.islice(makes_groups, 5)))

[('ACURA', <itertools._grouper object at 0x0000016FC7143040>), ('ALFA ROMEO', <itertools._grouper object at 0x0000016FC7143160>), ('APRILIA', <itertools._grouper object at 0x0000016FC72C4A90>), ('ARCTIC CAT', <itertools._grouper object at 0x0000016FC72C4460>), ('ARGO', <itertools._grouper object at 0x0000016FC73253D0>)]


In [127]:
with open('../../../datasets/python-deep-dive/cars_2014.csv') as f:
    next(f)
    makes_groups = itertools.groupby(f, lambda x: x.split(',')[0])
    make_counts = ((key, sum(1 for model in models ))
                   for key, models in makes_groups)

    print(list(make_counts))

[('ACURA', 6), ('ALFA ROMEO', 2), ('APRILIA', 4), ('ARCTIC CAT', 96), ('ARGO', 4), ('ASTON MARTIN', 5), ('AUDI', 27), ('BENTLEY', 2), ('BLUE BIRD', 1), ('BMW', 86), ('BUGATTI', 1), ('BUICK', 5), ('CADILLAC', 7), ('CAN-AM', 61), ('CHEVROLET', 33), ('CHRYSLER', 2), ('DODGE', 7), ('DUCATI', 4), ('FERRARI', 6), ('FIAT', 2), ('FORD', 34), ('FREIGHTLINER', 7), ('GMC', 12), ('HARLEY DAVIDSON', 29), ('HINO', 7), ('HONDA', 91), ('HUSABERG', 4), ('HUSQVARNA', 9), ('HYUNDAI', 13), ('INDIAN', 3), ('INFINITI', 8), ('JAGUAR', 9), ('JEEP', 5), ('JOHN DEERE', 19), ('KAWASAKI', 59), ('KENWORTH', 11), ('KIA', 10), ('KTM', 13), ('KUBOTA', 4), ('KYMCO', 28), ('LAMBORGHINI', 2), ('LAND ROVER', 6), ('LEXUS', 14), ('LINCOLN', 6), ('LOTUS', 1), ('MACK', 9), ('MASERATI', 3), ('MAZDA', 5), ('MCLAREN', 2), ('MERCEDES-BENZ', 60), ('MINI', 3), ('MITSUBISHI', 8), ('NISSAN', 24), ('PEUGEOT', 3), ('POLARIS', 101), ('PORSCHE', 4), ('RAM', 6), ('RENAULT', 4), ('ROLLS ROYCE', 3), ('SCION', 5), ('SEAT', 3), ('SKI-DOO', 6