# Lab 2 - Higher-Order Functions

In this lab, we will practice using Python's higher-order functions, in particular `map()`, `filter()`, and `reduce()`. Please complete all the tasks below.

## Task 0

Let's start with some `lambda` exercises.


### Sub-Task 0.1

Please complete the **lambda f1** definition below by filling in the _________ part. **f1** is expected to take a single string argument and return whether the string can be converted to a natural number or zero. If it can, **f1** returns **'Number'**; otherwise it returns **'Not a Number'**. For example, *'0123'* is a valid number ('Number'), whereas *'0xff'* is not.

In [None]:
# isdecimal() accepts only characters int() can parse (isnumeric() also accepts e.g. '½')
f1 = lambda x: 'Number' if x.isdecimal() else 'Not a Number'
print(f1('1A'))
print(f1('12'))
print(f1('b1'))

Not a Number
Number
Not a Number
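One subtlety worth checking: `str.isnumeric()` also returns `True` for characters such as superscript digits and vulgar fractions, which `int()` rejects with a `ValueError`. The sketch below compares it with the stricter `str.isdecimal()`; the names `f1_loose` and `f1_strict` are illustrative only.

```python
# Compare str.isnumeric() (loose) with str.isdecimal() (strict).
f1_loose = lambda x: 'Number' if x.isnumeric() else 'Not a Number'
f1_strict = lambda x: 'Number' if x.isdecimal() else 'Not a Number'

# '\u00b2' is '²' (superscript two), '\u00bd' is '½' (one half).
for s in ['0123', '0xff', '\u00b2', '\u00bd']:
    print(repr(s), f1_loose(s), f1_strict(s))
```

Both lambdas agree on ordinary inputs like `'0123'` and `'0xff'`; they differ only on characters like `'²'` and `'½'`, where `isdecimal()` matches the "can be converted" wording of the task more closely.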



### Sub-Task 0.2

Please complete the **lambda f2** definition below by filling in the _________ part. **f2** takes a single iterable (e.g., a list or a string) and returns the number of unique elements. Please see the sample output below.

In [None]:
f2 = lambda x: len(set(x))
print(f2([1,2,3,4,1,5,2]))
print(f2('hello world'))

5
8
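For extra `reduce()` practice, the same count can be expressed by folding each element into a growing set. This is only a sketch (`f2_reduce` is a hypothetical name): `len(set(x))` is simpler and faster, since `seen | {e}` builds a new set at every step. Like `set()`, it requires hashable elements.

```python
from functools import reduce

# Fold every element into an accumulating set, then count the set's size.
f2_reduce = lambda x: len(reduce(lambda seen, e: seen | {e}, x, set()))

print(f2_reduce([1, 2, 3, 4, 1, 5, 2]))  # 5
print(f2_reduce('hello world'))          # 8
```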



### Sub-Task 0.3

Please complete the **lambda f3** definition below by filling in the _________ part. **f3** takes two strings *x* and *y* and returns all the words in *x* that do not appear in *y*. Please note that the word comparison is case insensitive. Samples are also provided below.

In [None]:
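# Lowercase both strings so the comparison is case insensitive; sort the
# resulting set so the output order is deterministic.
f3 = lambda x, y: sorted(set(x.lower().split()) - set(y.lower().split()))
print(f3('big data management and analysis', 'big data computing'))
print(f3('this is a phrase', 'this is another phrase'))

['analysis', 'and', 'management']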
['a']
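A set difference discards the original word order (and any repeated words). If order matters, a `filter()`-based variant can keep the words of *x* as they appear while still comparing case-insensitively. A sketch, with `f3_ordered` as a hypothetical name:

```python
def f3_ordered(x, y):
    # Build the lowercase lookup set once, then keep the words of x
    # (original order and case) whose lowercase form is not in y.
    excluded = set(y.lower().split())
    return list(filter(lambda w: w.lower() not in excluded, x.split()))

print(f3_ordered('big data management and analysis', 'big data computing'))
# ['management', 'and', 'analysis']
print(f3_ordered('Big DATA management', 'big data computing'))
# ['management']
```

Unlike the set-based version, a word repeated in *x* is returned once per occurrence.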


## Task 1

You are provided with a list of service status updates scraped from an MTA information website. Each update may indicate <i>Good Service</i>, <i>Planned Work</i>, or <i>Delays</i> for one or more subway lines. Our first objective is to list all the lines that are running with <i>Delays</i>. To guide you through the process, the problem is split into smaller sub-tasks.

In [None]:
from functools import reduce

# This is your input data, a list of subway line status updates.
# Each entry is a string of the form '<lines> : <status>'.

status = [
    '1,2,3 : Good Service',
    '4,5,6 : Delays',
    '7 : Good Service',
    'A,C : Good Service',
    'E : Planned Work',
    'G : Delays',
    'B,D,F,M : Good Service',
    'J,Z : Delays',
    'L : Good Service',
    'N,Q,R : Planned Work',
    'S : Good Service',
]

### Sub-Task 1.1

Please complete the lambda expression to filter only the status updates for the lines that run with <i>Delays</i>.

In [None]:
delayUpdates = list(filter(lambda x: 'Delays' in x, status))
print(delayUpdates)

# After this, your delayUpdates should be
# ['4,5,6 : Delays', 'G : Delays', 'J,Z : Delays']

['4,5,6 : Delays', 'G : Delays', 'J,Z : Delays']
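`'Delays' in x` is a substring test on the whole update, so it would also match any status text that merely contains the word. A stricter sketch compares the status field exactly. The `'Delays Cleared'` entry below is a hypothetical status used only to show the difference; it does not appear in the lab data.

```python
# Match on the status field itself rather than on a substring of the update.
status_sample = ['4,5,6 : Delays', 'G : Delays Cleared', 'L : Good Service']
is_delayed = lambda update: update.split(' : ')[1] == 'Delays'

print(list(filter(is_delayed, status_sample)))  # ['4,5,6 : Delays']
```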


### Sub-Task 1.2

Please complete the lambda expression below to convert each status line into a list of subway lines, i.e. <b><i>'4,5,6 : Delays'</i></b> would become <b><i>['4','5','6']</i></b>

In [None]:
delayLineList = list(map(lambda x: x.split(' : ')[0].split(','), delayUpdates))

print(delayLineList)

# After this, your delayLineList should be
# [['4', '5', '6'], ['G'], ['J', 'Z']]

[['4', '5', '6'], ['G'], ['J', 'Z']]


### Sub-Task 1.3

Please complete the reduce command below to flatten the list of subway-line lists given in <i>delayLineList</i> into a single list of subway lines running with delays.

In [None]:
delayLines = reduce(lambda x,y: x+y, delayLineList, [])

print(delayLines)

# After this, your delayLines should be
# ['4', '5', '6', 'G', 'J', 'Z']

['4', '5', '6', 'G', 'J', 'Z']
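`lambda x, y: x + y` builds a new list at every step. A sketch of an alternative uses `operator.iconcat`, which performs `a += b` and returns `a`, so the accumulator is extended in place. The Sub-Task 1.2 result is hard-coded here so the block runs on its own.

```python
from functools import reduce
import operator

delayLineList = [['4', '5', '6'], ['G'], ['J', 'Z']]

# iconcat(acc, sub) does acc += sub; the fresh [] initializer is the only
# list that gets mutated, so the sublists in delayLineList stay unchanged.
delayLines = reduce(operator.iconcat, delayLineList, [])
print(delayLines)  # ['4', '5', '6', 'G', 'J', 'Z']
```

Keep the `[]` initializer here: without it, `reduce()` would use the first sublist as the accumulator and extend it in place.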


### Sub-Task 1.4

Please complete the reduce command below to count the number of lines in <b>delayLines</b>.

In [None]:
delayLineCount = reduce(lambda x,_: x+1, delayLines, 0)

print(delayLineCount)

# After this, your delayLineCount should be
# 6

6
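If only the count is needed, the nested `delayLineList` can be counted directly, skipping the flattening step. A sketch, with the Sub-Task 1.2 result hard-coded so it runs on its own:

```python
from functools import reduce

delayLineList = [['4', '5', '6'], ['G'], ['J', 'Z']]

# Add each sublist's length to a running total; no flattening needed.
delayLineCount = reduce(lambda acc, lines: acc + len(lines), delayLineList, 0)
print(delayLineCount)  # 6

# The same count with map() plus the built-in sum().
print(sum(map(len, delayLineList)))  # 6
```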


## Task 2

In this exercise, we would like to expand the combined service updates into separate updates for each subway line. For example, instead of having a single line <b>'1,2,3 : Good Service'</b> to indicate that lines 1, 2, and 3 are in good service, we would like to convert that into 3 separate updates: <b>'1 : Good Service'</b>, <b>'2 : Good Service'</b>, and <b>'3 : Good Service'</b>.

You are tasked with writing a chain of map(), filter(), and/or reduce() calls to convert the <b>status</b> variable into the list below. Please note that you may only use higher-order functions, without access to global variables. Your expression should contain only map(), filter(), and/or reduce() and your custom function definitions.

In [None]:
x = '1,2,3 : Good Service'

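def explore(x):
    """Expand 'A,B : Status' into ['A : Status', 'B : Status'] (one update per line)."""
    lines, state = x.split(" : ")  # 'state' avoids shadowing the global 'status' list
    return list(map(lambda l: l + " : " + state, lines.split(',')))

explore(x)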

['1 : Good Service', '2 : Good Service', '3 : Good Service']

In [None]:
list(map(explore, status))

[['1 : Good Service', '2 : Good Service', '3 : Good Service'],
 ['4 : Delays', '5 : Delays', '6 : Delays'],
 ['7 : Good Service'],
 ['A : Good Service', 'C : Good Service'],
 ['E : Planned Work'],
 ['G : Delays'],
 ['B : Good Service',
  'D : Good Service',
  'F : Good Service',
  'M : Good Service'],
 ['J : Delays', 'Z : Delays'],
 ['L : Good Service'],
 ['N : Planned Work', 'Q : Planned Work', 'R : Planned Work'],
 ['S : Good Service']]

In [None]:

# The [] initializer keeps reduce() from raising TypeError if status is empty.
updates = reduce(lambda x, y: x + y, map(explore, status), [])

updates

['1 : Good Service',
 '2 : Good Service',
 '3 : Good Service',
 '4 : Delays',
 '5 : Delays',
 '6 : Delays',
 '7 : Good Service',
 'A : Good Service',
 'C : Good Service',
 'E : Planned Work',
 'G : Delays',
 'B : Good Service',
 'D : Good Service',
 'F : Good Service',
 'M : Good Service',
 'J : Delays',
 'Z : Delays',
 'L : Good Service',
 'N : Planned Work',
 'Q : Planned Work',
 'R : Planned Work',
 'S : Good Service']
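Once every entry names a single line, the Task 1 question (which lines are delayed?) needs only one `filter()` and one `map()` over `updates`, with no flattening step. The sketch below uses a shortened copy of `status` so it runs on its own.

```python
from functools import reduce

# A shortened copy of the lab's status list.
status = [
    '1,2,3 : Good Service',
    '4,5,6 : Delays',
    'G : Delays',
    'J,Z : Delays',
    'L : Good Service',
]

def explore(update):
    """Expand 'A,B : Status' into ['A : Status', 'B : Status']."""
    lines, state = update.split(' : ')
    return list(map(lambda l: l + ' : ' + state, lines.split(',')))

updates = reduce(lambda x, y: x + y, map(explore, status), [])

# Keep per-line updates whose status is exactly 'Delays', then take the line name.
delayLines = list(map(lambda u: u.split(' : ')[0],
                      filter(lambda u: u.split(' : ')[1] == 'Delays', updates)))
print(delayLines)  # ['4', '5', '6', 'G', 'J', 'Z']
```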

## Task 3

In this exercise, you are tasked with performing a task similar to Task 3 of Lab 1: extracting the birth year of the first 'Subscriber' ride of each day from *citibike.csv*. However, instead of iterating through the stream using generators, you are asked to complete the task using the higher-order functions map(), filter(), and/or reduce(). You are free to define additional functions to be used in your higher-order functions; however, these functions may not use global variables unless they are passed in as arguments.

In [None]:
!gdown --id 1I8eqA1Zy3vFq4mN8z0ZRl7ABXrdzCRYI -O citibike.csv

Downloading...
From: https://drive.google.com/uc?id=1I8eqA1Zy3vFq4mN8z0ZRl7ABXrdzCRYI
To: /content/citibike.csv
100% 8.16M/8.16M [00:00<00:00, 103MB/s]


In [None]:
import pandas as pd

pd.read_csv('citibike.csv').head()

| Unnamed: 0 | cartodb_id | the_geom | tripduration | starttime | stoptime | start_station_id | start_station_name | start_station_latitude | start_station_longitude | end_station_id | end_station_name | end_station_latitude | end_station_longitude | bikeid | usertype | birth_year | gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 |  | 801 | 2015-02-01 00:00:00+00 | 2015-02-01 00:14:00+00 | 521 | 8 Ave & W 31 St | 40.75045 | -73.994811 | 423 | W 54 St & 9 Ave | 40.765849 | -73.986905 | 17131 | Subscriber | 1978.0 | 2 |
| 1 | 2 |  | 379 | 2015-02-01 00:00:00+00 | 2015-02-01 00:07:00+00 | 497 | E 17 St & Broadway | 40.73705 | -73.990093 | 504 | 1 Ave & E 15 St | 40.732219 | -73.981656 | 21289 | Subscriber | 1993.0 | 1 |
| 2 | 3 |  | 2474 | 2015-02-01 00:01:00+00 | 2015-02-01 00:42:00+00 | 281 | Grand Army Plaza & Central Park S | 40.764397 | -73.973715 | 127 | Barrow St & Hudson St | 40.731724 | -74.006744 | 18903 | Subscriber | 1969.0 | 2 |
| 3 | 4 |  | 818 | 2015-02-01 00:01:00+00 | 2015-02-01 00:15:00+00 | 2004 | 6 Ave & Broome St | 40.724399 | -74.004704 | 505 | 6 Ave & W 33 St | 40.749013 | -73.988484 | 21044 | Subscriber | 1985.0 | 2 |
| 4 | 5 |  | 544 | 2015-02-01 00:01:00+00 | 2015-02-01 00:10:00+00 | 323 | Lawrence St & Willoughby St | 40.692362 | -73.986317 | 83 | Atlantic Ave & Fort Greene Pl | 40.683826 | -73.976323 | 19868 | Subscriber | 1957.0 | 1 |


In [None]:
import csv

def extract(result, record):
    # Keep (day, birth_year) from the first record seen on each new day.
    day = record['starttime'].split(' ')[0]
    if len(result) == 0 or result[-1][0] != day:
        result.append((day, record['birth_year']))
    return result

with open('citibike.csv', 'r') as fi:
    reader = csv.DictReader(fi)
    first_birth_years = map(lambda x: int(x[1]), reduce(extract, reader, []))

list(first_birth_years)

# After this, your first_birth_years should be

[1978, 1992, 1982, 1969, 1971, 1989, 1963]
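The task asks for the first *Subscriber* ride of each day, but `extract` above keeps the first ride of each day whatever its `usertype`. If a day can start with a non-subscriber ride, a `filter()` stage before `reduce()` is needed. The sketch below shows that pipeline on a few hypothetical in-memory rows (read through `io.StringIO`), not on the real file; in this toy input the Customer row has an empty `birth_year`, which `int()` would reject if it were not filtered out.

```python
import csv
import io
from functools import reduce

def extract(result, record):
    """Append (day, birth_year) for the first record seen on each new day."""
    day = record['starttime'].split(' ')[0]
    if len(result) == 0 or result[-1][0] != day:
        result.append((day, record['birth_year']))
    return result

# Hypothetical toy rows (not from citibike.csv): 2015-02-02 starts with a Customer ride.
sample = io.StringIO(
    'starttime,usertype,birth_year\n'
    '2015-02-01 00:00:00+00,Subscriber,1978\n'
    '2015-02-01 00:05:00+00,Subscriber,1993\n'
    '2015-02-02 00:01:00+00,Customer,\n'
    '2015-02-02 00:03:00+00,Subscriber,1985\n'
)

reader = csv.DictReader(sample)
subscribers = filter(lambda r: r['usertype'] == 'Subscriber', reader)
first_birth_years = list(map(lambda x: int(x[1]), reduce(extract, subscribers, [])))
print(first_birth_years)  # [1978, 1985]
```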