# Introduction to Programming - 05 May 2020
### Agenda for today:

+ higher order functions (map, filter, reduce)
+ a little bit on generators
+ anonymous (lambda) functions and examples
+ use all of the above for some data analysis


### Recap on functions
- So far, we have learned to write functions as a way to **generalize some operation** so we don't have to write the same code repeatedly. For example, sum(x, y) function performs addition over **any** input numbers. 
- Functions can take in objects as arguments, which so far have always been e.g. `lists, ints or strings`.


```python
def greeting(names=['John', 'Doe']):
    greet = 'Hello ' + ' '.join(names)
    print(greet)
```
### Higher order functions
- Functions themselves however are also objects, so they can also be passed as an argument to a function.
- Functions that expect another function as one of their arguments are called **higher order functions**. You already know some higher order functions which take a function as an argument to decide how to order values in a sequence:
    - `max`
    - `sorted`
    
    
```python
def digit_sum(someint):
    """
    Returns the digit sum of an integer:
    digitsum(158) -> 1 + 5 + 8 -> 14
    """
    digits = [int(digit) for digit in str(someint)]
    return sum(digits)

numbers = [11, 9, 23000]
print(max(numbers, key=digit_sum))
print(sorted(numbers, key=digit_sum))

def gc_fraction(dna):
    """
    Returns the gc percentage of a string:
    gc_percentage(ACgt) = 0.5    
    """
    dna = dna.lower()
    gc_content = dna.count('g') + dna.count('c')
    gc_frac = gc_content / len(dna)
    return gc_frac

primers = ['AGACGTC', 'ACGTTT', 'AATACTATGATACTATG']
print(max(primers, key=gc_fraction))
print(sorted(primers, key=gc_fraction))
```

In [10]:
def digit_sum(someint):
    """
    Returns the digit sum of an integer:
    digitsum(158) -> 1 + 5 + 8
    """
    digits = [int(digit) for digit in str(someint)]
    return sum(digits)

numbers = [11, 9, 23000, 152]
print(f'Out of {numbers}, the number with the largest digit sum is {max(numbers, key=digit_sum)}\n')
print(f'{numbers} sorted by digit sum is {sorted(numbers, key=digit_sum)}')

Out of [11, 9, 23000, 152], the number with the largest digit sum is 9

[11, 9, 23000, 152] sorted by digit sum is [11, 23000, 152, 9]


In [6]:
def gc_fraction(dna):
    """
    Returns the gc percentage of a string:
    gc_percentage(ACgt) = 0.5    
    """
    dna = dna.lower()
    gc_content = dna.count('g') + dna.count('c')
    gc_frac = gc_content / len(dna)
    return gc_frac

primers = ['AGACGTC', 'ACGTTT', 'AATACTACGGGTGATACTATG']
print(max(primers, key=gc_fraction))
print(sorted(primers, key=gc_fraction))

AGACGTC
['ACGTTT', 'AATACTACGGGTGATACTATG', 'AGACGTC']


- Higher-order functions can be used to build powerful abstractions that **generalize over processes**:
    - Ex. formatting of time / plot style/ printing style; user authentication in web applications, connecting to a database etc.
    - helps achieve:
        - **modularity** - break functionality into re-usable pieces; easier to read, test, debug
        - **composability** - you’ll write a number of functions with varying inputs and outputs. Some of these functions will be unavoidably specialized to a particular application, but others will be useful in a wide variety of programs. For example:
            - a function that takes a directory path and returns all the image files in the directory
            - more general, a function that takes a directory and a function and applies that function to all files in that directory

Example:
```python
def print_capitalized_greeting(name):
    print(f"Hello {name.upper()}!")
def print_lower_case_greeting(name):
    print(f"Hello {name.lower()}!")
def print_title_case_greeting(name):
    print(f"Hello {name.title()}!")
my_name = "Methos"

print_capitalized_greeting(my_name)
print_lower_case_greeting(my_name)
print_title_case_greeting(my_name)
```
More generalized:
```python
def print_greeting(name, method):
    print(f"Hello {method(name)}!")
def capitalize(name):
    return name.upper()
def lower_case(name):
    return name.lower()
def title_case(name):
    return name.title()

my_name = "MeThOS"
print_greeting(my_name, capitalize)
print_greeting(my_name, lower_case)
print_greeting(my_name, title_case)
```

In [16]:
def print_capitalized_greeting(name):
    print(f"Hello {name.upper()}!")
def print_lower_case_greeting(name):
    print(f"Hello {name.lower()}!")
def print_title_case_greeting(name):
    print(f"Hello {name.title()}!")
my_name = "Methos"

print_capitalized_greeting(my_name)
print_lower_case_greeting(my_name)
print_title_case_greeting(my_name)

Hello METHOS!
Hello methos!
Hello Methos!


In [17]:
def print_greeting(name, method):
    print(f"Hello {method(name)}!")
def capitalize(name):
    return name.upper()
def lower_case(name):
    return name.lower()
def title_case(name):
    return name.title()

my_name = "MeThOS"
print_greeting(my_name, capitalize)
print_greeting(my_name, lower_case)
print_greeting(my_name, title_case)


Hello METHOS!
Hello methos!
Hello Methos!


### **<font color='blue'>Anonymous (lambda) functions</font>**
- useful shortcut for small functions to be created at run-time like the string formatting functions in the greetings example
- single expression, the value of which is returned 
- for more complex logic, use regular functions defined using the **def** keyword
- **Syntax**:

```python
lambda x, y, ..., *args, **kwargs: <expression>    # return value: value of <expression>
```

```python
def print_greeting(name, method):
    print(f"Hello {method(name)}!")

# capitalize = lambda name: name.upper()
# lower_case = lambda name: name.lower()
# title_case = lambda name: name.title()
    
my_name = "MeThOS"
print_greeting(my_name, lambda name: name.upper())
print_greeting(my_name, lambda name: name.lower())
print_greeting(my_name, lambda name: name.title())
```

In [18]:
def print_greeting(name, method):
    print(f"Hello {method(name)}!")

my_name = "MeThOS"
print_greeting(my_name, lambda name: name.upper())
print_greeting(my_name, lambda name: name.lower())
print_greeting(my_name, lambda name: name.title())

Hello METHOS!
Hello methos!
Hello Methos!


### More examples of lambdas

- **Sorting complex objects**:

```python
# Example: list of genes and transcript counts:
x = [('gene_1', 2), ('gene_2', 4), ('gene_3', 1), ('gene_4', 3)]
x.sort()
print(x)
x.sort(key=lambda z: z[1])      # Sort list objects (tuples) using 2nd item as the key
print(x)
```

```python
# Ex
import pprint
x = [{'firstname': 'Frodo', 'lastname': 'Took'}, 
     {'firstname': 'Samwise', 'lastname': 'Brandybuck'}, 
     {'firstname': 'Pippin', 'lastname': 'Gamgee'}, 
     {'firstname': 'Merry', 'lastname': 'Baggins'}]
x.sort(key=lambda name: name['firstname'])  # Sort list objects (dicts) using  value for
                                            # 'firstname' as the sorting key
pprint.pprint(x)
x.sort(key=lambda name: name['lastname'])   # Sort list objects by value for 'lastname'
pprint.pprint(x)
```  
<br/> 

In [2]:
# Example: list of genes and transcript counts:
x = [('gene_1', 2), ('gene_2', 4), ('gene_3', 1), ('gene_4', 3)]
x.sort()
print(x)
x.sort(key=lambda z: z[1])      # Sort list objects (tuples) using 2nd item as the key
print(x)

[('gene_1', 2), ('gene_2', 4), ('gene_3', 1), ('gene_4', 3)]
[('gene_3', 1), ('gene_1', 2), ('gene_4', 3), ('gene_2', 4)]


In [1]:
# Ex
import pprint
x = [{'firstname': 'Frodo', 'lastname': 'Took'}, 
     {'firstname': 'Samwise', 'lastname': 'Brandybuck'}, 
     {'firstname': 'Pippin', 'lastname': 'Gamgee'}, 
     {'firstname': 'Merry', 'lastname': 'Baggins'}]
x.sort(key=lambda name: name['firstname'])  # Sort list objects (dicts) using  value for
                                            # 'firstname' as the sorting key
pprint.pprint(x)
x.sort(key=lambda name: name['lastname'])   # Sort list objects by value for 'lastname'
pprint.pprint(x)

[{'firstname': 'Frodo', 'lastname': 'Took'},
 {'firstname': 'Merry', 'lastname': 'Baggins'},
 {'firstname': 'Pippin', 'lastname': 'Gamgee'},
 {'firstname': 'Samwise', 'lastname': 'Brandybuck'}]
[{'firstname': 'Merry', 'lastname': 'Baggins'},
 {'firstname': 'Samwise', 'lastname': 'Brandybuck'},
 {'firstname': 'Pippin', 'lastname': 'Gamgee'},
 {'firstname': 'Frodo', 'lastname': 'Took'}]


- **Grouping complex objects**:

```python
import itertools
city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
             ('Anchorage', 'AK'), ('Nome', 'AK'),
             ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')]

def get_state(city_state):
    return city_state[1]

groups = itertools.groupby(city_list, get_state)
## or use lambdas
# groups = itertools.groupby(city_list, lambda city_state: city_state[1])
for group, group_iter in groups:
    print(group, list(group_iter))
```

#### Note: you may still like to use regular functions here, if the the logic is complex

In [110]:
import itertools
city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'),
             ('Anchorage', 'AK'), ('Nome', 'AK'),
             ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')]

def get_state(city_state):
    return city_state[1]

groups = itertools.groupby(city_list, get_state)
## or use lambdas
# groups = itertools.groupby(city_list, lambda city_state: city_state[1])
for group, group_iter in groups:
    print(group, list(group_iter))

AL [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL')]
AK [('Anchorage', 'AK'), ('Nome', 'AK')]
AZ [('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ')]


## Some built-in higher-order functions often used with iterables

### <font color='blue'>map</font>:
- Apply a function - any function- to each element of an iterable (a very common operation on an iterable's output)
    - For example, given a list of strings, you might want to strip off leading/trailing whitespaces or log transform each value in a list of numbers
- **map** generalizes the process of **transforming** a whole collection (both the data/collection as well as the transformation function are provided as arguments).
- **Syntax**:

```python
map(f, iterA, iterB, ...)
    # iterA, iterB, etc. are iterables, that provide a sequence of values
    # returns a generator (lazy obj) over the sequence
    # f(iterA[0],iterB[0],...), f(iterA[1],iterB[1],...), f(iterA[2],iterB[2],...), ...
help(map)
```
- **map** returns a `generator`, a type of lazy object which `yield`s values one at a time as we ask for them.
    - generators are iterables, so we can give **map** another **map** object as input
    - we can ask the generator to yield the next item using the `next` function --> many higher-order functions will do this for us automatically
    - a generator has no length, i.e. it does not know how many items it has
    - we cannot index a generator - but the itertools library gives us a function called `islice` which produces a  generator with the slice we are asking for.
    - we can only iterate over a generator once! This is unlike range, which is also lazy but can both be indexed and iterated over repeatedly.
    - asking a generator for items until it can yield no more items is called `consuming` a generator. We will often use the `list()` function for this. 
    
    
```python
# Ex. 1
from math import sqrt
x = [1,2,3,4]
result = map(sqrt, x)
print(type(result), result)      # lazy map object; great for working w/ large iterables
print(list(result))              # forcefully expand lazy map results object
                                 # or use in for loop context
```

```python
# Ex. 2
def upper(s):
    return s.upper()
print(list(map(upper, ['sentence', 'fragment'])))
```

Let's take a look at an example from a previous lecture that we can rewrite with map:
```python
# dot product of 2 vectors
vector_1 = [1, 2, 3, 4]
vector_2 = [5, 6, 7, 8]
dot = 0.0
for idx in range(len(vector_1)):
    dot += vector_1[idx]*vector_2[idx]
print("dot product of {} and {} is: {}".format(vector_1, vector_2, dot))
```
```python
# dot product of 2 vectors
vector_1 = [1, 2, 3, 4]
vector_2 = [5, 6, 7, 8]
dot = sum( map(lambda el1, el2: el1 * el2, vector_1, vector_2) ) #in python 3.8 we can use math.prod instead of this lambda function
print("dot product of {} and {} is: {}".format(vector_1, vector_2, dot))
```

In [20]:
# import random
# vector_1_test = random.choices(range(0,10), k=10000000)
# vector_2_test = random.choices(range(0,10), k=10000000)

In [21]:
# # %%timeit
# # This is benchmarking code
# dot = 0
# for idx in range(len(vector_1_test)):
#     dot += vector_1_test[idx]*vector_2_test[idx]

1.24 s ± 54.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [22]:
# # %%timeit
# # This is benchmarking code
# dot = sum(
#     map( lambda el1, el2: el1 * el2, #in python 3.8 we can use math.prod instead of this lambda function
#         vector_1_test, vector_2_test
#     )
# )

1.04 s ± 48.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [13]:
# dot product of 2 vectors
vector_1 = [1, 2, 3, 4]
vector_2 = [5, 6, 7, 8]
dot = sum(
    map( lambda el1, el2: el1 * el2, #in python 3.8 we can use math.prod instead of this lambda function
        vector_1, vector_2
    )
)
print("dot product of {} and {} is: {}".format(vector_1, vector_2, dot))

dot product of [1, 2, 3, 4] and [5, 6, 7, 8] is: 70


We can also use map in conjunction with the functions `any()` and `all()` to check a boolean condition on all of our data.<br>
This comes up quite often when we need to quickly sanity-check our data. In the following example, we have only 3 items in a list but most of the time we'll have hundreds or thousands, so being able to check if any or all of them fulfill some condition is faster than looking at the whole list.

Additionally, this is also computationally faster - any and all will give an answer as soon as they can, i.e.
+ when `any` encounters the first `True`, it returns `True`
+ when `all` encounters the first `False`, it returns `False`

Meaning in our case that any won't consume all of our map generator object - only as much as it needs for an answer!

Let's assume a program made DNA primers for a massive PCR experiment you're designing. <br>
You want to quickly check if the GC percentage of any or all of them is below 0.5:
```python
def gc_fraction(dna):
    """
    Returns the gc percentage of a string:
    gc_percentage(ACgt) = 0.5    
    """
    dna = dna.lower()
    gc_content = dna.count('g') + dna.count('c')
    gc_frac = gc_content / len(dna)
    return gc_frac

primers = ['AGACGTC', 'ACGTTT', 'AATACTACGGGTGATACTATG']
any(map(lambda prims: gc_fraction(prims) < 0.5,  primers))
all(map(lambda prims: gc_fraction(prims) < 0.5,  primers))
```

Note that map can take as input any iterables, even another map:
```python
any(map(lambda frac: frac < 0.5,  map(gc_fraction, primers)))
any(map(lambda frac: frac < 0.5,  map(gc_fraction, primers)))
```

In [23]:
def gc_fraction(dna):
    """
    Returns the gc percentage of a string:
    gc_percentage(ACgt) = 0.5    
    """
    dna = dna.lower()
    gc_content = dna.count('g') + dna.count('c')
    gc_frac = gc_content / len(dna)
    return gc_frac

primers = ['AGACGTC', 'ACGTTT', 'AATACTACGGGTGATACTATG']

In [35]:
any(map(lambda prims: gc_fraction(prims) < 0.5,  primers))

True

In [36]:
all(map(lambda prims: gc_fraction(prims) < 0.5,  primers))

False

### <font color='blue'>filter</font>
- Another common higher-order operation on elements of an iterator: select a subset of elements that meet some condition
- **Syntax**: 
```python
filter(predicate, iter)
# predicate is a function that returns the truth value of some condition (w.r.t. input); must take a single value as input
# returns an iterator over all the sequence elements that meet the condition defined by the predicate
```

```python
def is_even(x):
    return (x % 2) == 0
print(list(filter(is_even, range(10))))

# or use lambdas
#list(filter(lambda x: (x % 2) == 0, range(10)))
```

### <font color='blue'>reduce</font>
+ Another common higher-order operation: **cumulatively** performs an operation on all the iterable’s elements
+ ***reduce / aggregate*** an iterable to a single object
+ **Syntax**:
```python
import functools
functools.reduce(func, iter, [initial_value])
help(reduce)
```
    - func must be a function that takes two elements and returns a single value
    - it takes the first two elements A and B returned by iter and calculates func(A, B). It then requests the third element, C, calculates func(func(A, B), C), combines this result with the fourth element returned, and continues until the iterable is exhausted. 


+ Ex:

```python
from functools import reduce
def sum_(a, b):
    print('args received: ', a, b)
    return a+b
    
x = [1,2,3,4]
result = reduce(sum_, x)
# result = reduce(lambda a, b: a+b, x)
print(result)

#  [1, 2, 3, 4]
#    \/
#    [3, 3, 4]
#      \/
#     [6, 4]
#       \/
#      [10]
```

In [4]:
from functools import reduce
def sum_(a, b):
    print('args received: ', a, b)
    return a+b

x = [1,2,3,4]
result = reduce(sum_, x)
# result = reduce(lambda a, b: a+b, x)
print(result)

args received:  1 2
args received:  3 3
args received:  6 4
10


+ ***reduce*** can also optionally take an initial value. If the initial value is supplied, it’s used as a starting point and func(initial_value, A) is the first calculation.

```python
x = [1,2,3,4]
print(reduce(lambda a, b: a+b, x, 100))

#  100  [1, 2, 3, 4]
#     \ /
#     [101, 2, 3, 4]
#        \ /
#        [103, 3, 4]
#           \ /
#           [106, 4]
#              \ /
#             [110]
```

In [5]:
x = [1,2,3,4]
print(reduce(lambda a, b: a+b, x, 100))

110


#### Note: 
- Check out <font color='blue'>***functools***</font> and <font color='blue'>***itertools***</font> modules for more on these types of generic higer-order functions and processes.
- Resources:
    https://docs.python.org/3.6/howto/functional.html

## Putting it all together

Combined, `map`, `filter` and other generator functions allow us to do relatively complex tasks in few lines of code.

+ Building (or transforming) lists with for loops and if statements
    - same as ***map*** and ***filter*** discussed above (try out a few examples)

Example: 

*Give the sum of the square of all numbers that are below 1000 and are divisible by both 5 and 3:*

```python
# Solution with For-loops
sum_ = 0
for number in range(1,1001):
    if number % 5 == 0:
        if number % 3 == 0:
            sum_ += number ** 2
sum_
```
Nested for-loops can be challenging to read, and difficult to break up into several loops without losing a lot of performance.
For this reason some programmers prefer to make use of map and filter instead where possible without too much of a headache:

```python
sum(
    map(lambda x: x**2, 
        filter(lambda x: x%3==0, 
               filter(lambda x: x%5==0, range(1,1001)
                     )
              )
       )
)
```


### Let's analyse some more real data
I've downloaded `Film_Permits.csv` from [NYC Open Data](https://data.cityofnewyork.us/City-Government/Film-Permits/tg4x-b46p). 
We will answer the following questions:
+ What type of events are in the data set?
+ What is the average/ median length of a shooting permit by burrough? What's the standard deviation?

First, let's read in the data with the csv module:

In [49]:
import csv

with open('Film_Permits.csv') as f:
    film_data = list(csv.DictReader(f))

film_data[0]

OrderedDict([('EventID', '446040'),
             ('EventType', 'Shooting Permit'),
             ('StartDateTime', '10/19/2018 02:00:00 PM'),
             ('EndDateTime', '10/20/2018 04:00:00 AM'),
             ('EnteredOn', '10/16/2018 11:57:27 AM'),
             ('EventAgency', "Mayor's Office of Film, Theatre & Broadcasting"),
             ('ParkingHeld',
              'THOMPSON STREET between PRINCE STREET and SPRING STREET,  SPRING STREET between WOOSTER STREET and 6TH AVENUE,  SPRING STREET between THOMPSON STREET and 6TH AVENUE,  6TH AVENUE between VANDAM STREET and BROOME STREET,  SULLIVAN STREET between WEST HOUSTON STREET and PRINCE STREET,  PRINCE STREET between SULLIVAN STREET and 6 AVENUE'),
             ('Borough', 'Manhattan'),
             ('CommunityBoard(s)', '2'),
             ('PolicePrecinct(s)', '1'),
             ('Category', 'Television'),
             ('SubCategoryName', 'Cable-episodic'),
             ('Country', 'United States of America'),
             ('ZipC

### What type of events are in the data set?

In [48]:
set(map(lambda event: event['EventType'], film_data))

{'DCAS Prep/Shoot/Wrap Permit',
 'Rigging Permit',
 'Shooting Permit',
 'Theater Load in and Load Outs'}

### What is the average/ median length of a shooting permit by borough? What's the standard deviation?
Thinking process:
+ filter data to only include shooting permits
+ group data by borough
+ parse start and end time of permits and calculate time difference in days
+ calculate aggregate statistics for each borough

To test an algorithm I want to apply to all my data, I usually pick one of the observations in my data to test on. <br>
I tend to name it dummy because I will make tons of mistakes and that's ok:

In [79]:
dummy = film_data[0]
dummy

OrderedDict([('EventID', '446040'),
             ('EventType', 'Shooting Permit'),
             ('StartDateTime', '10/19/2018 02:00:00 PM'),
             ('EndDateTime', '10/20/2018 04:00:00 AM'),
             ('EnteredOn', '10/16/2018 11:57:27 AM'),
             ('EventAgency', "Mayor's Office of Film, Theatre & Broadcasting"),
             ('ParkingHeld',
              'THOMPSON STREET between PRINCE STREET and SPRING STREET,  SPRING STREET between WOOSTER STREET and 6TH AVENUE,  SPRING STREET between THOMPSON STREET and 6TH AVENUE,  6TH AVENUE between VANDAM STREET and BROOME STREET,  SULLIVAN STREET between WEST HOUSTON STREET and PRINCE STREET,  PRINCE STREET between SULLIVAN STREET and 6 AVENUE'),
             ('Borough', 'Manhattan'),
             ('CommunityBoard(s)', '2'),
             ('PolicePrecinct(s)', '1'),
             ('Category', 'Television'),
             ('SubCategoryName', 'Cable-episodic'),
             ('Country', 'United States of America'),
             ('ZipC

In [52]:
from statistics import mean, median, stdev
from itertools import groupby

Through the imports we already covered the aggregate statistics and the groupby, filter is a builtin.

But part of this question requires us to handle dates and times so we'll use the `datetime` module for that. We won't get super in-depth on datetimes here, but I wanted to show my process. 

I used this handy [guide on custom datetime string formats](https://www.journaldev.com/23365/python-string-to-datetime-strptime#python-strptime-format-directives) to come up with the `film_datetime_frmt` string to decode the datetimes in this dataset

In [77]:
from datetime import datetime

In [154]:
#'10/19/2018 02:00:00 PM'
film_datetime_frmt = '%m/%d/%Y %H:%M:%S %p' 

dummy_timediff = datetime.strptime(dummy['EndDateTime'], film_datetime_frmt) - \
                    datetime.strptime(dummy['StartDateTime'], film_datetime_frmt)

seconds_in_an_hour = 60 * 60

print(f'The shooting permit lasted {dummy_timediff.total_seconds() / seconds_in_an_hour:.3} hours')

The shooting permit lasted 26.0 hours


Ok, let's turn this into a function that takes in a row and returns the time difference in days:

In [155]:
from datetime import datetime
def timediff(event, datetime_frmt='%m/%d/%Y %H:%M:%S %p', seconds_divider=( 60 * 60) ):
    """
    Takes an event orderedDict and returns the time difference in seconds divided by seconds_divider.
    By default this returns time in hours.
    """
    start_datetime = datetime.strptime(event['StartDateTime'], datetime_frmt)
    end_datetime = datetime.strptime(event['EndDateTime'], datetime_frmt)
    time_diff = end_datetime - start_datetime
    return (time_diff.total_seconds() / seconds_divider)
timediff(dummy)

26.0

From here, we can put it all together:

In [157]:
filtered_film_data = filter(lambda event: event['EventType'] == 'Shooting Permit', film_data)

grouped_film_data = groupby(
    sorted(filtered_film_data, key=lambda event: event['Borough']), 
    key=lambda event: event['Borough'])

permit_time_by_borough = dict(
    map(lambda grouped: (grouped[0], list(map(timediff, grouped[1])) ),
        grouped_film_data)
)

In [158]:
{borough:{
    'mean': mean(timedeltas),
    'median': median(timedeltas),
    'stdev': stdev(timedeltas)
         } 
 for borough, timedeltas in permit_time_by_borough.items()}
    

{'Bronx': {'mean': 7.728938452851496,
  'median': 4.0,
  'stdev': 9.923332421251596},
 'Brooklyn': {'mean': 6.89942192431819,
  'median': 3.0,
  'stdev': 14.021060218510298},
 'Manhattan': {'mean': 10.676383811173112,
  'median': 3.0,
  'stdev': 84.52045519386384},
 'Queens': {'mean': 7.646902021975164,
  'median': 4.0,
  'stdev': 17.542569475933117},
 'Staten Island': {'mean': 8.553664036076663,
  'median': 4.0,
  'stdev': 9.757654576797563}}

## Conclusions

Today we learned:
+ about some higher-order functions, especially `map` and `filter`
+ how we can use these higher-order functions in conjunction with the `itertools` module to answer complex questions about our data with relatively little code
+ a little bit about generators, there is a section with a little more on generators for the extra curious at the end.

In our data analysis example, you may have noticed that we had to jump through some hoops to analyze the tabular data from our csv file. Next lecture, we will learn about two packages that are core to data analysis in python and abstract away lots of the hoops we have to jump through in regular python, while also being ~100x faster. Those two libraries are `numpy` and `pandas` - definitely **don't miss the next lecture!**

Finally, here are some of my favorite python talks, again for the extra curious:
#### Useful talks
- [Transforming Code into Beautiful, Idiomatic Python - Raymond Hettinger](https://www.youtube.com/watch?v=OSGv2VnC0go)
- [Losing your loops with numpy - Jake Vanderplas](https://www.youtube.com/watch?v=EEUXKG97YRw)

#### Interesting talks

- [Modern Dictionaries - Raymond Hettinger](https://www.youtube.com/watch?v=p33CVV29OG8)
- [Does Code Quality Matter - James Powell](https://www.youtube.com/watch?v=QuTmLeWL3C0)
- [Principles of Reporting Systems | It's About Time we Talked About Bitemporality - James Powell](https://www.youtube.com/watch?v=6Wf221t8WO8)
- [The Web is Terrifying! Using the PyData stack to spy on the spies - Sarah Bird](https://pyvideo.org/europython-2018/the-web-is-terrifying-using-the-pydata-stack-to-spy-on-the-spies.html)

### #ExtraCredit Generators
The following section explains more about generators - for those interested, feel free to read on but this is not required knowledge:
- simple and powerful tool for creating iterators, including infinitely large iterators
- very useful for efficient processing of 'big' data - lazy by nature
- written like regular functions but use a **"yield"** statement in stead of "return" - this returns a datum and suspends the generator
- calling `next(<gen>)` on the generator object resumes the generator where it left off (it remembers all the data values and which statement was last executed)
- If you want to learn more about generators, [watch this excellent talk by James Powell @dontusethiscode](https://youtu.be/_facl0cNX6g) for an introduction on the fundamentals of generators (follow the accompanying notebook [here](https://github.com/dutc/generators/blob/master/Generator%20Showcase.ipynb)) and his [more advanced talk on generators](https://youtu.be/RdhoN4VVqq8). Don't worry if you don't understand everything!

```python
# Ex. 1: simple counter (infinite sequence generator) --> already implemented in itertools.count()
def counter(start=0):
    current = start
    while True:
        yield current
        current += 1

# Finite sequence    -> raises StopIteration when finished
def counter(size=5):
    current = 0
    while current < size:
        yield current
        current += 1

# limited fibonacci number generator
def fib_finite(n=10):
    a,b = 0,1
    while a < n:
        yield a
        a,b = b, a+b

# Infinite fibonacci number generator
def fib():
    a,b = 0,1
    while True:
        yield a
        a,b = b, a+b

```

Let's the first 10 Fibonacci numbers that are divisible by 3:

In [159]:
from itertools import islice
def fib():
    a,b = 0,1
    while True:
        yield a
        a,b = b, a+b

In [44]:
n = 10
list(
    islice(
        filter(lambda i: i%3==0, fib()),
        n)
)

[0, 3, 21, 144, 987, 6765, 46368, 317811, 2178309, 14930352]

`itertools.islice` let's us slice any iterables, including generators, the way we're used to with lists:
```python
some_list[start:stop:step]
```
turns into:
```python
itertools.islice(some_generator, start, stop, step)
```
Combined with some other tools, this can give us this monster of a one-liner called `nwise` that turns any iterable efficiently into a generator of overlapping windows of size n - [taken from @dontusethiscode and his excellent talk](https://www.youtube.com/watch?v=RdhoN4VVqq8).

Window functions are useful for calculating windowed averages and other windowed statistics. We use these windowed statistics to get a sense of overall trends over time in very noisy data.

In [164]:
from itertools import tee, islice, zip_longest
nwise = lambda g, n=2: zip_longest(*(islice(g, i, None) for i,g in enumerate(tee(g,n))))

print(f'The whole iterable is {list(range(1,5))}')
for index, window in enumerate(nwise(range(1,5), n=2)):
    print(f'The window at index {index} is: {window}')

The whole iterable is [1, 2, 3, 4]
The window at index 0 is: (1, 2)
The window at index 1 is: (2, 3)
The window at index 2 is: (3, 4)
The window at index 3 is: (4, None)


So to disassemble this into parts:
```python
nwise = lambda g, n=2: zip_longest(
    *(islice(g, i, None) for i,g in 
                                     enumerate(tee(g,n))))
```

`itertools.tee` produces n generators that are copies of g. It does so in a very memory and computationally-efficient way, reusing values.
`enumerate` enumerates these generators.
```python
enumerate(tee(g,n))
```

This next bit generates an islice that starts at index `i`, where `i` is the index returned by enumerate. 
Together with the previous bit, this gives us a collection of n generator-like objects g. Let's take a look at what that would look like with n=3:
+ g0: $it_{0}, it_{1}, ... it_{n}$
+ g1: $it_{1}, it_{2}, ... it_{n}$
+ g2: $it_{2}, it_{3}, ... it_{n}$

```python
    islice(g, i, None) for i,g in 

```
Finally `zip_longest` creates a generator-like object that takes the next item from each of those three generators (g0, g1, g2) and returns a tuple of the three values. Unlike `zip`, this will keep returning tuples of length three until the longest iterable - in this case g0 - is exhausted, not the shortest. Missing values will be padded with `None`s:
+  1st item: $it_{0}, it_{1}, it_{2}$
+  2nd item: $it_{1}, it_{2}, it_{3}$
+  3rd item: $it_{2}, it_{3}, it_{4}$
+ ...
+  3rd to last item: $it_{n-2}, it_{n-1}, it_{n}$
+  2nd to last item: $it_{n-1}, it_{n}, None$
+  last item: $it_{n}, None, None$



```python
zip_longest(*
```
The * just unpacks the tuple of generators returned by:
```python
(islice(g, i, None) for i,g in 
                                     enumerate(tee(g,n)))
```
So it turns `zip_longest((g0, g1, g2))` into `zip_longest(g0, g1, g2)`