**Table of contents**<a id='toc0_'></a>    
- [What is an `iterable` in python?](#toc1_)    
- [`itertools` Module](#toc2_)    
- [Function breakdown of `itertools`](#toc3_)    
  - [Infinite iterators](#toc3_1_)    
  - [Iterators terminating on the shortest input sequence](#toc3_2_)    
  - [Combinatoric iterators](#toc3_3_)    
- [`itertools` recipes](#toc4_)    
  - [`more_itertools`!](#toc4_1_)    
- [Filtering Data](#toc5_)    
  - [`itertools.compress()`](#toc5_1_)    
  - [`more_itertools.filter_except`](#toc5_2_)    
- [Sorting Data](#toc6_)    
  - [`more_itertools.sort_together()`](#toc6_1_)    
- [Summarizing data](#toc7_)    
  - [`more_itertools.map_reduce()`](#toc7_1_)    
    - [The strangeness of `pi`](#toc7_1_1_)    
- [Splitting Data](#toc8_)    
  - [`itertools.takewhile()`](#toc8_1_)    
  - [`itertools.dropwhile()`](#toc8_2_)    
  - [`more_itertools.map_except()`](#toc8_3_)    
  - [`more_itertools.split_at()`](#toc8_4_)    
- [Combinations](#toc9_)    
  - [`itertools.product()` | `more_itertools.gray_product()`](#toc9_1_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

In [186]:
import itertools as it
import more_itertools as more_it
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline
%load_ext memory_profiler



# <a id='toc1_'></a>[What is an `iterable` in python?](#toc0_)

Any Python object that defines an `.__iter__()` method, or a `.__getitem__()` method that accepts integer indexes starting at 0, is iterable.
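
To make the two protocols concrete, here is a minimal sketch. The class names `Countdown` and `Squares` are hypothetical and exist only for illustration; the first is iterable through `__iter__()`, the second through `__getitem__()`.

```python
class Countdown:
    """Iterable via __iter__: each call returns a fresh generator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1


class Squares:
    """Iterable via __getitem__: Python asks for index 0, 1, 2, ... until IndexError."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, index):
        if index >= self.size:
            raise IndexError(index)
        return index ** 2


countdown_values = list(Countdown(3))
square_values = list(Squares(4))
print(countdown_values)  # [3, 2, 1]
print(square_values)     # [0, 1, 4, 9]
```

Both objects work with `for`, `list()`, `map()` and every `itertools` function, because all of them only rely on `iter()` succeeding.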

**E.g. #1:** `map()` is a built-in function that applies a function (here `len`) to each element of an iterable.

```python

    >>> list(map(len, ['cat','dogs','wombats']))
    [3, 4, 7]

```
**E.g. #2:** Iterators are themselves iterable, so they can be composed into an _iterator algebra_.

```python
    >>> import math
    >>> list(map(math.prod, zip([2.0,3.1,4], [4, 5, 6])))
    [8.0, 15.5, 24]

```
**E.g. #3:** `map()` can be used with custom functions as well.

```python

    number_list = [x for x in range(0,10)]

    def function(number):
        print(f"Currently performing transformation on number {number}")
        return number**2

    print('Original list: ', number_list)

    mapped_list = list(map(function, number_list))

    print(f'Post-Transformed list: {mapped_list}')

    Original list:  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    Currently performing transformation on number 0
    Currently performing transformation on number 1
    Currently performing transformation on number 2
    Currently performing transformation on number 3
    Currently performing transformation on number 4
    Currently performing transformation on number 5
    Currently performing transformation on number 6
    Currently performing transformation on number 7
    Currently performing transformation on number 8
    Currently performing transformation on number 9
    Post-Transformed list: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

```

# <a id='toc2_'></a>[`itertools` Module](#toc0_)

- Python's approach to `iterator algebra`
- Fast, memory-efficient, concise code
    - _Lazy evaluation_ (call-by-need) delays the evaluation of an expression until its value is needed.
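
A small sketch of what lazy evaluation means in practice. The helper `square` and the `calls` list are illustrative names, not part of `itertools`; they only record when work actually happens.

```python
import itertools as it

calls = []  # records every argument that square() is actually called with


def square(n):
    calls.append(n)
    return n * n


# An infinite pipeline: building it computes nothing.
lazy_squares = map(square, it.count(1))
calls_before = len(calls)

# Values are produced only when something asks for them.
first_three = list(it.islice(lazy_squares, 3))

print(calls_before)  # 0
print(first_three)   # [1, 4, 9]
print(calls)         # [1, 2, 3]
```

Because nothing is computed ahead of time, the infinite `count(1)` never has to be held in memory.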

```python

    def itertools_repeat():
        for _ in it.repeat(None, 1_000_000):
            pass

    >>> %timeit itertools_repeat()
        9.4 ms ± 602 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    def standard_loop():
        for _ in range(1_000_000):
            pass

    >>> %timeit standard_loop()
        20 ms ± 976 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

<!-- # Remarks

Refer to the Real Python example on memory efficiency -->

# <a id='toc3_'></a>[Function breakdown of `itertools`](#toc0_)


## <a id='toc3_1_'></a>[Infinite iterators](#toc0_)

| Iterator | Arguments     | Results                                        | Example                               |
|----------|---------------|------------------------------------------------|---------------------------------------|
| count()  | start, [step] | start, start+step, start+2*step, …             | <mark>`count(10) --> 10 11 12 13 14 ...` </mark>     |
| cycle()  | p             | p0, p1, … plast, p0, p1, …                     | <mark> `cycle('ABCD') --> A B C D A B C D ...` </mark> |
| repeat() | elem [,n]     | elem, elem, elem, … endlessly or up to n times | <mark> `repeat(10, 3) --> 10 10 10` </mark>            |
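
Since `count()` and `cycle()` never terminate on their own, they are usually bounded by `islice()` or by pairing them with a finite iterable. A short sketch using the examples from the table:

```python
import itertools as it

# islice() takes a finite window out of an infinite iterator.
evens = list(it.islice(it.count(10, 2), 4))
cycled = list(it.islice(it.cycle('ABCD'), 6))

# repeat() is finite only when the optional count is given.
repeated = list(it.repeat(10, 3))

# zip() stops at the shortest input, so count() works as a running index.
labelled = list(zip(it.count(1), 'xyz'))

print(evens)     # [10, 12, 14, 16]
print(cycled)    # ['A', 'B', 'C', 'D', 'A', 'B']
print(repeated)  # [10, 10, 10]
print(labelled)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```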

## <a id='toc3_2_'></a>[Iterators terminating on the shortest input sequence](#toc0_)

| Iterator              | Arguments                   | Results                                    | Example                                                  |
|-----------------------|-----------------------------|--------------------------------------------|----------------------------------------------------------|
| accumulate()          | p [,func]                   | p0, p0+p1, p0+p1+p2, …                     | <mark>`accumulate([1,2,3,4,5]) --> 1 3 6 10 15 `</mark>                 |
| chain()               | p, q, …                     | p0, p1, … plast, q0, q1, …                 | <mark>`chain('ABC', 'DEF') --> A B C D E F`</mark>                      |
| chain.from_iterable() | iterable                    | p0, p1, … plast, q0, q1, …                 | <mark>`chain.from_iterable(['ABC', 'DEF']) --> A B C D E F`</mark>      |
| compress()            | data, selectors             | (d[0] if s[0]), (d[1] if s[1]), …          | <mark>`compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F`</mark>            |
| dropwhile()           | pred, seq                   | seq[n], seq[n+1], starting when pred fails | <mark>`dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1` </mark>         |
| filterfalse()         | pred, seq                   | elements of seq where pred(elem) is false  | <mark>`filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8`</mark>      |
| groupby()             | iterable[, key]             | sub-iterators grouped by value of key(v)   |                                                          |
| islice()              | seq, [start,] stop [, step] | elements from seq[start:stop:step]         | <mark>`islice('ABCDEFG', 2, None) --> C D E F G`</mark>                 |
| pairwise()            | iterable                    | (p[0], p[1]), (p[1], p[2]), …              | <mark>`pairwise('ABCDEFG') --> AB BC CD DE EF FG`</mark>                |
| starmap()             | func, seq                   | func(*seq[0]), func(*seq[1]), …            | <mark>`starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000`</mark>       |
| takewhile()           | pred, seq                   | seq[0], seq[1], until pred fails           | <mark>`takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4 `</mark>           |
| tee()                 | it, n                       | it1, it2, … itn splits one iterator into n |                                                          |
| zip_longest()         | p, q, …                     | (p[0], q[0]), (p[1], q[1]), …              | <mark>`zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-`</mark> |
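
The table has no example for `groupby()` or `tee()`, so here is a minimal sketch of both. The word list is made-up sample data.

```python
import itertools as it

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# groupby() only groups *consecutive* items with the same key,
# so the input should already be sorted by that key.
grouped = {
    first_letter: list(group)
    for first_letter, group in it.groupby(words, key=lambda w: w[0])
}

# tee() splits one iterator into n independent iterators.
first, second = it.tee(iter([1, 2, 3]), 2)
total = sum(first)
largest = max(second)

print(grouped)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
print(total, largest)  # 6 3
```

After calling `tee()`, the original iterator should not be used directly, since advancing it would make the copies skip values.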


## <a id='toc3_3_'></a>[Combinatoric iterators](#toc0_)

| Iterator                        | Arguments          | Results                                                       |
|---------------------------------|--------------------|---------------------------------------------------------------|
| product()                       | p, q, … [repeat=1] | cartesian product, equivalent to a nested for-loop            |
| permutations()                  | p[, r]             | r-length tuples, all possible orderings, no repeated elements |
| combinations()                  | p, r               | r-length tuples, in sorted order, no repeated elements        |
| combinations_with_replacement() | p, r               | r-length tuples, in sorted order, with repeated elements      |

Some examples:

| Examples                                 | Results                                         | Description                                                   |
|------------------------------------------|-------------------------------------------------|---------------------------------------------------------------|
| <mark>`product('ABCD', repeat=2)`</mark>               | <mark>`AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD`</mark> | cartesian product, equivalent to a nested for-loop            |
| <mark>`permutations('ABCD', 2)`</mark>                  | <mark>`AB AC AD BA BC BD CA CB CD DA DB DC`</mark>             | r-length tuples, all possible orderings, no repeated elements |
| <mark>`combinations('ABCD', 2)`</mark>                  | <mark>`AB AC AD BC BD CD`</mark>                               | r-length tuples, in sorted order, no repeated elements        |
| <mark>`combinations_with_replacement('ABCD', 2)`</mark> | <mark>`AA AB AC AD BB BC BD CC CD DD`</mark>                   | r-length tuples, in sorted order, with repeated elements      |

# <a id='toc4_'></a>[`itertools` recipes](#toc0_)

Itertools Recipes [URL](https://docs.python.org/3.6/library/itertools.html#itertools-recipes)

## <a id='toc4_1_'></a>[`more_itertools`!](#toc0_)

Honestly, too much to cover even in a table because it's **that** good! [URL](https://more-itertools.readthedocs.io/en/stable/index.html)

<!-- - `itertools.zip_longest`
- `itertools.combinations`
- `itertolls.combinations_with_replacement`
- `itertools.permutations`
- `itertools.count`
- `itertools.repeat`
- `itertools.cycle`
- `itertools.accumulate`
- `itertools.product`
- `itertools.tee`
- `itertools.islice`
- `itertools.chain`
- `itertools.filterfalse`
- `itertools.takewhile`
- `itertools.dropwhile` -->

# <a id='toc5_'></a>[Filtering Data](#toc0_)

## <a id='toc5_1_'></a>[`itertools.compress()`](#toc0_)

`compress(data, selectors)` pairs each element of `data` with the corresponding element of `selectors` and yields only the elements whose selector is truthy.

_Scenario_: Using the `mpg` dataset, list the model year and name of every car that exceeds an efficiency threshold of 40 mpg.

In [208]:
df_mpg = sns.load_dataset('mpg')

df_mpg.head()


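|   | mpg  | cylinders | displacement | horsepower | weight | acceleration | model_year | origin | name                      |
|---|------|-----------|--------------|------------|--------|--------------|------------|--------|---------------------------|
| 0 | 18.0 | 8         | 307.0        | 130.0      | 3504   | 12.0         | 70         | usa    | chevrolet chevelle malibu |
| 1 | 15.0 | 8         | 350.0        | 165.0      | 3693   | 11.5         | 70         | usa    | buick skylark 320         |
| 2 | 18.0 | 8         | 318.0        | 150.0      | 3436   | 11.0         | 70         | usa    | plymouth satellite        |
| 3 | 16.0 | 8         | 304.0        | 150.0      | 3433   | 12.0         | 70         | usa    | amc rebel sst             |
| 4 | 17.0 | 8         | 302.0        | 140.0      | 3449   | 10.5         | 70         | usa    | ford torino               |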


In [209]:
# Implement the threshold criterion
efficiency_threshold = 40
matching_cars = [n > efficiency_threshold for n in df_mpg['mpg']]

# Build the compressed iterator. For demonstration, we zip two columns into (model_year, name) pairs
matching_cars_iter = it.compress(zip(df_mpg['model_year'], df_mpg['name']), matching_cars)

# Consume the iterator
print(list(matching_cars_iter))

[(78, 'volkswagen rabbit custom diesel'), (80, 'vw rabbit'), (80, 'mazda glc'), (80, 'datsun 210'), (80, 'vw rabbit c (diesel)'), (80, 'vw dasher (diesel)'), (80, 'honda civic 1500 gl'), (80, 'renault lecar deluxe'), (82, 'vw pickup')]


## <a id='toc5_2_'></a>[`more_itertools.filter_except()`](#toc0_)

Yield the items from `iterable` for which the `validator` function does not raise one of the specified exceptions.

`validator` is called for each item in `iterable`. It should be a function that accepts one argument and raises an exception if that item is not valid.

_Scenario_: You need to extract only the numerical values from a dataset that mixes numbers and text, all stored as strings.

In [260]:
data = ['1.5', '6', 'not-important', '11', '1.23E-7', 'remove-me', '25', 'trash']
list(map(float, more_it.filter_except(float, data, TypeError, ValueError)))

[1.5, 6.0, 11.0, 1.23e-07, 25.0]

# <a id='toc6_'></a>[Sorting Data](#toc0_)

## <a id='toc6_1_'></a>[`more_itertools.sort_together()`](#toc0_)
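
`sort_together()` sorts several parallel iterables by the values of one of them (the first by default), so rows stay aligned. The function below is a standard-library sketch of that idea, not the library's implementation; `sort_together_sketch` and the sample data are hypothetical.

```python
from operator import itemgetter


def sort_together_sketch(iterables, key_index=0):
    """Sort parallel iterables by the iterable at position key_index."""
    rows = sorted(zip(*iterables), key=itemgetter(key_index))  # sort row-wise
    return list(zip(*rows))                                    # back to columns


mpg = [18.0, 15.0, 36.0]
names = ['chevelle', 'skylark', 'civic']

sorted_mpg, sorted_names = sort_together_sketch([mpg, names])
print(sorted_mpg)    # (15.0, 18.0, 36.0)
print(sorted_names)  # ('skylark', 'chevelle', 'civic')
```

Note that values read with `csv.DictReader`, as in the cell that follows, are strings, so they sort lexicographically (`'10.0'` comes before `'9.0'`). Converting the key column to `float` first would give a numeric ordering.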

In [279]:
# SO URL: https://stackoverflow.com/questions/16503560/read-specific-columns-from-a-csv-file-with-csv-module

import csv
from collections import defaultdict

columns = defaultdict(list) # each value in each column is appended to a list

with open('mpgData.csv') as f:
    reader = csv.DictReader(f) # read rows into a dictionary format
    for row in reader: # read a row as {column1: value1, column2: value2,...}
        for (k,v) in row.items(): # go over each column name and value 
            columns[k].append(v) # append the value into the appropriate list
                                 # based on column name k

columns.keys()

dict_keys(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model_year', 'origin', 'name'])

In [256]:
more_it.sort_together(columns.values())

[('10.0',
  '10.0',
  '11.0',
  '11.0',
  '11.0',
  '11.0',
  '12.0',
  '12.0',
  '12.0',
  '12.0',
  '12.0',
  '12.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '13.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.0',
  '14.5',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.0',
  '15.5',
  '15.5',
  '15.5',
  '15.5',
  '15.5',
  '16.0',
  '16.0',
  '16.0',
  '16.0',
  '16.0',
  '16.0',
  '16.0',
  '16.0',
  '16.0',
  '16.0',
  '16.0',
  '16.0',
  '16.0',
  '16.2',
  '16.5',
  '16.5',
  '16.5',
  '16.9',
  '17.0',
  '17.0',
  '17.0',
  '17.0',
  '17.0',
  '17.0',
  '17.0',
  '17.5',
  '17.5',


# <a id='toc7_'></a>[Summarizing data](#toc0_)

## <a id='toc7_1_'></a>[`more_itertools.map_reduce()`](#toc0_)


Return a dictionary that maps the items in `iterable` to categories defined by `keyfunc`, transforms them with `valuefunc`, and then summarizes them by category with `reducefunc`.
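
The three steps (categorize, transform, summarize) can be sketched with the standard library alone. `map_reduce_sketch` is a hypothetical helper that mirrors the description above; it is not the library's code.

```python
from collections import Counter, defaultdict


def map_reduce_sketch(iterable, keyfunc, valuefunc=None, reducefunc=None):
    """Group items by keyfunc, transform with valuefunc, summarize with reducefunc."""
    groups = defaultdict(list)
    for item in iterable:
        value = valuefunc(item) if valuefunc is not None else item
        groups[keyfunc(item)].append(value)
    if reducefunc is None:
        return dict(groups)
    return {key: reducefunc(values) for key, values in groups.items()}


# Count how often each digit occurs in a short string of digits.
digit_counts = map_reduce_sketch(
    '3141592653',
    keyfunc=lambda d: d,    # the category is the digit itself
    valuefunc=lambda d: 1,  # every occurrence counts once
    reducefunc=sum,         # add up the ones per category
)
print(digit_counts)
# {'3': 2, '1': 2, '4': 1, '5': 2, '9': 1, '2': 1, '6': 1}
```

For plain counting, `collections.Counter('3141592653')` gives the same totals; the map/reduce form pays off when the value or the summary is something other than a count.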

### <a id='toc7_1_1_'></a>[The strangeness of `pi`](#toc0_)

How are the digits of pi distributed in the first 1,000,000 decimal places?

Original Sources:
- Stack Overflow [URL](https://stackoverflow.com/questions/9004789/1000-digits-of-pi-in-python)
- Spigot algorithm [URL](https://www.cs.ox.ac.uk/people/jeremy.gibbons/publications/spigot.pdf)


```python
    DIGITS = 1000

    def pi_digits(x):
        """Generate the first x digits of pi with a spigot algorithm."""
        k, a, b, a1, b1 = 2, 4, 1, 12, 4
        while x > 0:
            p, q, k = k * k, 2 * k + 1, k + 1
            a, b, a1, b1 = a1, b1, p * a + q * a1, p * b + q * b1
            # Integer division: `/` returns floats in Python 3 and loses precision
            d, d1 = a // b, a1 // b1
            while d == d1 and x > 0:
                yield d
                x -= 1
                a, a1 = 10 * (a % b), 10 * (a1 % b1)
                d, d1 = a // b, a1 // b1

    digits = [str(n) for n in pi_digits(DIGITS)]
    print(f"{digits[0]}.{''.join(digits[1:])}")

```

In [207]:
with open("pi_100k.txt","r") as f:
    digits = f.read()


keyfunc = lambda x: x
valuefunc = lambda x: 1
reducefunc = sum
result = more_it.map_reduce(digits, keyfunc, valuefunc, reducefunc)

result

defaultdict(None,
            {'1': 99757,
             '4': 100230,
             '5': 100359,
             '9': 100106,
             '2': 100026,
             '6': 99548,
             '3': 100229,
             '8': 99985,
             '7': 99800,
             '0': 99959})

# <a id='toc8_'></a>[Splitting Data](#toc0_)

## <a id='toc8_1_'></a>[`itertools.takewhile()`](#toc0_)

_Scenario_: You have a datastream from a sensor that's streaming numbers. The end of transmission is marked by a letter `e`. 

In [266]:
list(it.takewhile(lambda x: x.isdigit(),'123456e'))

['1', '2', '3', '4', '5', '6']

## <a id='toc8_2_'></a>[`itertools.dropwhile()`](#toc0_)

_Scenario_: You have a datastream from a sensor that's streaming numbers. The start of transmission is marked by a letter `s`. 

In [267]:
list(it.dropwhile(lambda x: not x.isdigit(),'s123456'))

['1', '2', '3', '4', '5', '6']

## <a id='toc8_3_'></a>[`more_itertools.map_except()`](#toc0_)

_Scenario_: You have an instrument that tags a letter `s` to indicate a start, and a letter `e` to indicate the end of a transmission.

We can _combine_ multiple libraries!

In [265]:
transmission_string = 's3213127846921834289123e'

list(it.compress(transmission_string, more_it.map_except(lambda x: x.isdigit(),transmission_string)))

['3',
 '2',
 '1',
 '3',
 '1',
 '2',
 '7',
 '8',
 '4',
 '6',
 '9',
 '2',
 '1',
 '8',
 '3',
 '4',
 '2',
 '8',
 '9',
 '1',
 '2',
 '3']
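
In the cell above, `map_except()` is called without any exception types, so it behaves like a plain `map()`. Its more typical use is to transform items and silently skip those that raise one of the listed exceptions. The generator below is a standard-library sketch of that behaviour; `map_except_sketch` and `readings` are hypothetical names.

```python
def map_except_sketch(function, iterable, *exceptions):
    """Yield function(item) for each item, skipping items that raise *exceptions."""
    for item in iterable:
        try:
            yield function(item)
        except exceptions:
            pass  # the item could not be converted, so drop it


readings = ['s', '3', '2', 'x', '7', 'e']
parsed_digits = list(map_except_sketch(int, readings, ValueError))
print(parsed_digits)  # [3, 2, 7]
```

Unlike the `compress()` combination, this returns the converted integers directly and does not need a separate boolean mask.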

## <a id='toc8_4_'></a>[`more_itertools.split_at()`](#toc0_)

_Scenario_: You have an instrument that is generating multiple cycles' worth of information. The data is written to a text file. 
You need to separate out the cycles for calculations. Each cycle is separated by the character `x`.



In [268]:
lines = '213x3132x312312'

list(more_it.split_at(lines,lambda x: 'x' in x))

[['2', '1', '3'], ['3', '1', '3', '2'], ['3', '1', '2', '3', '1', '2']]

# <a id='toc9_'></a>[Combinations](#toc0_)

## <a id='toc9_1_'></a>[`itertools.product()` | `more_itertools.gray_product()`](#toc0_)

Cartesian product of input iterables.

Roughly equivalent to nested for-loops in a generator expression. For example, `product(A, B)` returns the same as `((x,y) for x in A for y in B)`.

`more_itertools.gray_product()` is like `itertools.product()`, but returns tuples in an order such that only one element of the tuple changes from one iteration to the next.

In [277]:
list(it.product('AB','CD','EF'))

[('A', 'C', 'E'),
 ('A', 'C', 'F'),
 ('A', 'D', 'E'),
 ('A', 'D', 'F'),
 ('B', 'C', 'E'),
 ('B', 'C', 'F'),
 ('B', 'D', 'E'),
 ('B', 'D', 'F')]

In [276]:
list(more_it.gray_product('AB','CD','EF'))

[('A', 'C', 'E'),
 ('B', 'C', 'E'),
 ('B', 'D', 'E'),
 ('A', 'D', 'E'),
 ('A', 'D', 'F'),
 ('B', 'D', 'F'),
 ('B', 'C', 'F'),
 ('A', 'C', 'F')]
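
The difference between the two orderings can be checked with the standard library. In this sketch, `changed_positions` is a hypothetical helper, and `gray_output` is copied by hand from the `gray_product()` output above rather than recomputed.

```python
import itertools as it


def changed_positions(tuples):
    """For each consecutive pair of tuples, count how many positions differ."""
    return [
        sum(a != b for a, b in zip(previous, current))
        for previous, current in zip(tuples, tuples[1:])
    ]


plain = list(it.product('AB', 'CD', 'EF'))

# product() matches a nested for-loop, with the rightmost input varying fastest.
nested = [(x, y, z) for x in 'AB' for y in 'CD' for z in 'EF']

# Copied from the gray_product() cell output shown above.
gray_output = [tuple(s) for s in ['ACE', 'BCE', 'BDE', 'ADE', 'ADF', 'BDF', 'BCF', 'ACF']]

plain_changes = changed_positions(plain)
gray_changes = changed_positions(gray_output)

print(plain == nested)  # True
print(plain_changes)    # [1, 2, 1, 3, 1, 2, 1]
print(gray_changes)     # [1, 1, 1, 1, 1, 1, 1]
```

Both orderings contain the same eight tuples; only the sequence differs, which matters when switching between neighbouring configurations is expensive.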
