![Ironhack logo](https://i.imgur.com/1QgrNNw.png)

# Lab | Map, Reduce, Filter


## Introduction

In this lab, we will apply what we have learned about functional programming using the `map`, `reduce`, and `filter` functions. Each of them takes a function and an iterable: `map` transforms every element, `filter` keeps the elements that pass a test, and `reduce` combines the elements into a single value.
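As a quick orientation, here is a minimal sketch of the three functions on a made-up list of numbers. The list and the lambdas are illustrative only and are not part of the lab data.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]  # made-up sample data

# map: apply a function to every element
squares = list(map(lambda n: n * n, numbers))

# filter: keep only the elements for which the predicate returns True
evens = list(filter(lambda n: n % 2 == 0, numbers))

# reduce: fold the sequence into a single value, two elements at a time
total = reduce(lambda acc, n: acc + n, numbers)

print(squares)  # [1, 4, 9, 16, 25]
print(evens)    # [2, 4]
print(total)    # 15
```

In Python 3, `map` and `filter` return lazy iterators, which is why the results are wrapped in `list()`.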

## Getting Started

Follow the instructions and add your code and explanations as necessary. By the end of this lab, you will have learned about mapping, reducing, and filtering as well as applying functions in Pandas.


## Resources

[The Official Python Documentation on Mapping, Reducing, and Filtering](https://docs.python.org/3/howto/functional.html#built-in-functions)

[The `apply` Function in Pandas](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.apply.html)


# Before you start:
- Comment as much as you can
- Happy learning!

In [93]:
# Import reduce from functools, numpy and pandas
from functools import reduce
import pandas as pd
import numpy as np
import re

# Challenge 1 - Mapping

#### We will use the map function to clean up words in a book.

In the following cell, we will read a text file containing the book The Prophet by Khalil Gibran.

In [87]:
# Run this code:

location = '../data/58585-0.txt'
with open(location, 'r', encoding="utf8") as f:
    prophet = f.read().split(' ')

#### Let's remove the first 568 words since they contain information about the book but are not part of the book itself. 

Do this by removing from `prophet` elements 0 through 567 of the list (you can also do this by keeping elements 568 through the last element).

In [88]:
# your code here
prophet = prophet[568:]

If you look through the words, you will find that many words have a reference attached to them. For example, let's look at words 1 through 10.

Expected output:

````python
            ['PROPHET\n\n|Almustafa,',
             'the{7}',
             'chosen',
             'and',
             'the\nbeloved,',
             'who',
             'was',
             'a',
             'dawn',
             'unto']

````

In [89]:
# your code here
prophet

['PROPHET\n\n|Almustafa,',
 'the{7}',
 'chosen',
 'and',
 'the\nbeloved,',
 'who',
 'was',
 'a',
 'dawn',
 'unto',
 'his',
 'own\nday,',
 'had',
 'waited',
 'twelve',
 'years',
 'in',
 'the',
 'city\nof',
 'Orphalese',
 'for',
 'his',
 'ship',
 'that',
 'was',
 'to\nreturn',
 'and',
 'bear',
 'him',
 'back',
 'to',
 'the',
 'isle',
 'of\nhis',
 'birth.\n\nAnd',
 'in',
 'the',
 'twelfth',
 'year,',
 'on',
 'the',
 'seventh\nday',
 'of',
 'Ielool,',
 'the',
 'month',
 'of',
 'reaping,',
 'he\nclimbed',
 'the',
 'hill',
 'without',
 'the',
 'city',
 'walls\nand',
 'looked',
 'seaward;',
 'and',
 'he',
 'beheld',
 'his\nship',
 'coming',
 'with',
 'the',
 'mist.\n\nThen',
 'the',
 'gates',
 'of',
 'his',
 'heart',
 'were',
 'flung\nopen,',
 'and',
 'his',
 'joy',
 'flew',
 'far',
 'over',
 'the',
 'sea.\nAnd',
 'he',
 'closed',
 'his',
 'eyes',
 'and',
 'prayed',
 'in',
 'the\nsilences',
 'of',
 'his',
 'soul.\n\n*****\n\nBut',
 'as',
 'he',
 'descended',
 'the',
 'hill,',
 'a',
 'sadness\

#### The next step is to create a function that will remove references. 

We will do this by splitting the string on the `{` character and keeping only the part before this character. Write your function below.

In [95]:
def reference(x):
    '''
    Input: A string
    Output: The string with references removed
    
    Example:
    Input: 'the{7}'
    Output: 'the'
    '''
    
    # your code here
    # Split on '{' and keep only the part before it;
    # strings without '{' are returned unchanged
    return x.split('{')[0]

Now that we have our function, use the `map()` function to apply this function to our book, The Prophet. Return the resulting list to a new list called `prophet_reference`.
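The sketch below illustrates the pattern on a tiny made-up list. `strip_reference` and `sample_words` are hypothetical names used only for this illustration. It shows that `map()` returns a lazy iterator, so nothing is computed until the result is consumed, for example with `list()`.

```python
def strip_reference(word):
    # Hypothetical helper: keep only the text before the first '{'
    return word.split('{')[0]

sample_words = ['the{7}', 'chosen', 'and']  # made-up sample

lazy = map(strip_reference, sample_words)  # a map object, nothing computed yet
cleaned = list(lazy)                       # consuming the iterator runs the function

print(cleaned)     # ['the', 'chosen', 'and']
print(list(lazy))  # [] because the iterator is already exhausted
```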

In [96]:
# your code here
prophet_reference = list(map(reference, prophet))
prophet_reference

['PROPHET\n\n|Almustafa,',
 'the',
 'chosen',
 'and',
 'the\nbeloved,',
 'who',
 'was',
 'a',
 'dawn',
 'unto',
 'his',
 'own\nday,',
 'had',
 'waited',
 'twelve',
 'years',
 'in',
 'the',
 'city\nof',
 'Orphalese',
 'for',
 'his',
 'ship',
 'that',
 'was',
 'to\nreturn',
 'and',
 'bear',
 'him',
 'back',
 'to',
 'the',
 'isle',
 'of\nhis',
 'birth.\n\nAnd',
 'in',
 'the',
 'twelfth',
 'year,',
 'on',
 'the',
 'seventh\nday',
 'of',
 'Ielool,',
 'the',
 'month',
 'of',
 'reaping,',
 'he\nclimbed',
 'the',
 'hill',
 'without',
 'the',
 'city',
 'walls\nand',
 'looked',
 'seaward;',
 'and',
 'he',
 'beheld',
 'his\nship',
 'coming',
 'with',
 'the',
 'mist.\n\nThen',
 'the',
 'gates',
 'of',
 'his',
 'heart',
 'were',
 'flung\nopen,',
 'and',
 'his',
 'joy',
 'flew',
 'far',
 'over',
 'the',
 'sea.\nAnd',
 'he',
 'closed',
 'his',
 'eyes',
 'and',
 'prayed',
 'in',
 'the\nsilences',
 'of',
 'his',
 'soul.\n\n*****\n\nBut',
 'as',
 'he',
 'descended',
 'the',
 'hill,',
 'a',
 'sadness\nca

Another thing you may have noticed is that some words contain a line break. Let's write a function to split those words. Our function will return the string split on the character `\n`. Write your function in the cell below.

In [97]:
def line_break(x):
    '''
    Input: A string
    Output: A list of strings split on the line break (\n) character
        
    Example:
    Input: 'the\nbeloved'
    Output: ['the', 'beloved']
    '''
    
    # your code here
    if '\n' in x:
        return x.split('\n')
    else:
        return x

In [98]:
line_break('PROPHET\n\n|Almustafa,')

['PROPHET', '', '|Almustafa,']
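The output above shows that `split` keeps an empty string between two consecutive line breaks. It is also worth knowing that `str.split` always returns a list, even when the separator does not occur, so a version of the function without the `if`/`else` would return a list in every case, as the docstring describes. A small sketch with made-up inputs:

```python
with_break = 'the\nbeloved,'.split('\n')
without_break = 'chosen'.split('\n')
double_break = 'birth.\n\nAnd'.split('\n')

print(with_break)     # ['the', 'beloved,']
print(without_break)  # ['chosen']
print(double_break)   # ['birth.', '', 'And']
```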

Apply the `line_break` function to the `prophet_reference` list. Name the new list `prophet_line`.

In [101]:
# your code here
prophet_line = list(map(line_break, prophet_reference))
prophet_line

[['PROPHET', '', '|Almustafa,'],
 'the',
 'chosen',
 'and',
 ['the', 'beloved,'],
 'who',
 'was',
 'a',
 'dawn',
 'unto',
 'his',
 ['own', 'day,'],
 'had',
 'waited',
 'twelve',
 'years',
 'in',
 'the',
 ['city', 'of'],
 'Orphalese',
 'for',
 'his',
 'ship',
 'that',
 'was',
 ['to', 'return'],
 'and',
 'bear',
 'him',
 'back',
 'to',
 'the',
 'isle',
 ['of', 'his'],
 ['birth.', '', 'And'],
 'in',
 'the',
 'twelfth',
 'year,',
 'on',
 'the',
 ['seventh', 'day'],
 'of',
 'Ielool,',
 'the',
 'month',
 'of',
 'reaping,',
 ['he', 'climbed'],
 'the',
 'hill',
 'without',
 'the',
 'city',
 ['walls', 'and'],
 'looked',
 'seaward;',
 'and',
 'he',
 'beheld',
 ['his', 'ship'],
 'coming',
 'with',
 'the',
 ['mist.', '', 'Then'],
 'the',
 'gates',
 'of',
 'his',
 'heart',
 'were',
 ['flung', 'open,'],
 'and',
 'his',
 'joy',
 'flew',
 'far',
 'over',
 'the',
 ['sea.', 'And'],
 'he',
 'closed',
 'his',
 'eyes',
 'and',
 'prayed',
 'in',
 ['the', 'silences'],
 'of',
 'his',
 ['soul.', '', '*****', '

If you look at the elements of `prophet_line`, you will see that the function returned lists and not strings. Our list is now a list of lists. Flatten the list using list comprehension. Assign this new list to `prophet_flat`.
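One point to watch: `line_break` as written returns a plain string for words without a line break, so `prophet_line` mixes strings and lists. A comprehension that only iterates over the list items would silently drop the plain strings. The following is a minimal sketch, on a made-up sample, of one way to flatten such a mixed list while keeping every word and skipping empty strings. The names `mixed` and `flat` are illustrative.

```python
# Made-up sample with the same shape as prophet_line
mixed = [['PROPHET', '', '|Almustafa,'], 'the', 'chosen', ['the', 'beloved,']]

flat = [
    word
    for item in mixed
    # wrap bare strings in a list so every item can be iterated the same way
    for word in (item if isinstance(item, list) else [item])
    if word != ''
]

print(flat)
# ['PROPHET', '|Almustafa,', 'the', 'chosen', 'the', 'beloved,']
```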

In [102]:
# your code here
prophet_flat = [i for a in prophet_line if type(a) == list for i in a if i != '']
prophet_flat

['PROPHET',
 '|Almustafa,',
 'the',
 'beloved,',
 'own',
 'day,',
 'city',
 'of',
 'to',
 'return',
 'of',
 'his',
 'birth.',
 'And',
 'seventh',
 'day',
 'he',
 'climbed',
 'walls',
 'and',
 'his',
 'ship',
 'mist.',
 'Then',
 'flung',
 'open,',
 'sea.',
 'And',
 'the',
 'silences',
 'soul.',
 '*****',
 'But',
 'sadness',
 'came',
 'his',
 'heart:',
 'How',
 'without',
 'sorrow?',
 'the',
 'spirit',
 'Long',
 'were',
 'spent',
 'within',
 'the',
 'nights',
 'depart',
 'from',
 'without',
 'regret?',
 'Too',
 'I',
 'scattered',
 'many',
 'are',
 'walk',
 'naked',
 'cannot',
 'withdraw',
 'and',
 'an',
 'ache.',
 'It',
 'this',
 'day,',
 'own',
 'hands.',
 'Nor',
 'me,',
 'but',
 'and',
 'with',
 'thirst.',
 '*****',
 'Yet',
 'longer.',
 'The',
 'her',
 'calls',
 'embark.',
 'For',
 'in',
 'the',
 'crystallize',
 'and',
 'mould.',
 'Fain',
 'is',
 'here.',
 'I?',
 'A',
 'and',
 'the',
 'Alone',
 'must',
 'ether.',
 'And',
 'the',
 'eagle',
 'sun.',
 '*****',
 'Now',
 'the',
 'hill,',
 '

# Challenge 2 - Filtering

When printing out a few words from the book, we see that there are words that we may not want to keep if we choose to analyze the corpus of text. Below is a list of words that we would like to get rid of. Create a function that returns `False` if the input word is in the specified list and `True` otherwise.
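The return convention matters because `filter()` keeps the elements for which the function returns `True`. A minimal sketch of that behaviour, using a hypothetical predicate `keep_word` and a made-up word list:

```python
stop_words = {'and', 'the'}  # hypothetical stop-word set

def keep_word(word):
    # True means "keep this word", which is what filter() expects
    return word not in stop_words

sample = ['John', 'and', 'the', 'sea']  # made-up sample
kept = list(filter(keep_word, sample))

print(kept)  # ['John', 'sea']
```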

In [103]:
def word_filter(x):
    '''
    Input: A string
    Output: True if the word is not in the specified list 
    and False if the word is in the list.
        
    Example:
    word list = ['and', 'the']
    Input: 'and'
    Output: False
    
    Input: 'John'
    Output: True
    '''
    
    word_list = ['and', 'the', 'a', 'an']
    
    # your code here
    # Keep the word (True) only if it is not in the list
    return x not in word_list

Use the `filter()` function to filter out the words specified in the `word_filter()` function. Store the filtered list in the variable `prophet_filter`.

In [104]:
# filter() keeps the elements for which word_filter returns True
prophet_filter = list(filter(word_filter, prophet_flat))
prophet_filter

['PROPHET',
 '|Almustafa,',
 'beloved,',
 'own',
 'day,',
 'city',
 'of',
 'to',
 'return',
 'of',
 'his',
 'birth.',
 'And',
 'seventh',
 'day',
 'he',
 'climbed',
 'walls',
 'his',
 'ship',
 'mist.',
 'Then',
 'flung',
 'open,',
 'sea.',
 'And',
 'silences',
 'soul.',
 '*****',
 'But',
 'sadness',
 'came',
 'his',
 'heart:',
 'How',
 'without',
 'sorrow?',
 'spirit',
 'Long',
 'were',
 'spent',
 'within',
 'nights',
 'depart',
 'from',
 'without',
 'regret?',
 'Too',
 'I',
 'scattered',
 'many',
 'are',
 'walk',
 'naked',
 'cannot',
 'withdraw',
 'ache.',
 'It',
 'this',
 'day,',
 'own',
 'hands.',
 'Nor',
 'me,',
 'but',
 'with',
 'thirst.',
 '*****',
 'Yet',
 'longer.',
 'The',
 'her',
 'calls',
 'embark.',
 'For',
 'in',
 'crystallize',
 'mould.',
 'Fain',
 'is',
 'here.',
 'I?',
 'A',
 'Alone',
 'must',
 'ether.',
 'And',
 'eagle',
 'sun.',
 '*****',
 'Now',
 'hill,',
 'sea,',
 'harbour,',
 'mariners,',
 'land.',
 'And',
 'he',
 'said:',
 'Sons',
 'of',
 'tides,',
 'How',
 'dream

# Bonus Challenge

Rewrite the `word_filter` function above to not be case sensitive.

In [105]:
def word_filter_case(x):
   
    word_list = ['and', 'the', 'a', 'an']
    
    # your code here
    # Compare in lower case so that 'And' and 'The' are filtered out too
    return x.lower() not in word_list

In [106]:
# filter() keeps the elements for which word_filter_case returns True
prophet_filter = list(filter(word_filter_case, prophet_flat))
prophet_filter

['PROPHET',
 '|Almustafa,',
 'beloved,',
 'own',
 'day,',
 'city',
 'of',
 'to',
 'return',
 'of',
 'his',
 'birth.',
 'seventh',
 'day',
 'he',
 'climbed',
 'walls',
 'his',
 'ship',
 'mist.',
 'Then',
 'flung',
 'open,',
 'sea.',
 'silences',
 'soul.',
 '*****',
 'But',
 'sadness',
 'came',
 'his',
 'heart:',
 'How',
 'without',
 'sorrow?',
 'spirit',
 'Long',
 'were',
 'spent',
 'within',
 'nights',
 'depart',
 'from',
 'without',
 'regret?',
 'Too',
 'I',
 'scattered',
 'many',
 'are',
 'walk',
 'naked',
 'cannot',
 'withdraw',
 'ache.',
 'It',
 'this',
 'day,',
 'own',
 'hands.',
 'Nor',
 'me,',
 'but',
 'with',
 'thirst.',
 '*****',
 'Yet',
 'longer.',
 'her',
 'calls',
 'embark.',
 'For',
 'in',
 'crystallize',
 'mould.',
 'Fain',
 'is',
 'here.',
 'I?',
 'Alone',
 'must',
 'ether.',
 'eagle',
 'sun.',
 '*****',
 'Now',
 'hill,',
 'sea,',
 'harbour,',
 'mariners,',
 'land.',
 'he',
 'said:',
 'Sons',
 'of',
 'tides,',
 'How',
 'dreams.',
 'which',
 'is',
 'dream.',
 'Ready',
 'w

# Challenge 3 - Reducing

#### Now that we have significantly cleaned up our text corpus, let's use the `reduce()` function to put the words back together into one long string separated by spaces. 

We will start by writing a function that takes two strings and concatenates them together with a space between the two strings.

In [107]:
def concat_space(a,b):
    '''
    Input: Two strings
    Output: A single string separated by a space
        
    Example:
    Input: 'John', 'Smith'
    Output: 'John Smith'
    '''
    
    # your code here
    single_string = ' '.join([a,b])
    return single_string

In [108]:
concat_space('John', 'Smith')

'John Smith'

Use the function above to reduce the text corpus in the list `prophet_filter` into a single string. Assign this new string to the variable `prophet_string`.
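To make the folding order concrete, here is a small sketch on a made-up word list. `join_with_space` is a hypothetical stand-in for `concat_space`. It also compares the result with `' '.join`, which produces the same string in a single call.

```python
from functools import reduce

def join_with_space(a, b):
    # Hypothetical stand-in for concat_space
    return a + ' ' + b

words = ['his', 'joy', 'flew', 'far']  # made-up sample

# reduce evaluates ((('his' + ' joy') + ' flew') + ' far')
folded = reduce(join_with_space, words)
joined = ' '.join(words)

print(folded)            # his joy flew far
print(folded == joined)  # True
```

Note that `reduce` raises a `TypeError` on an empty sequence unless an initial value is passed as a third argument.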

In [109]:
# your code here
prophet_string = reduce(concat_space,prophet_filter)
prophet_string

'PROPHET |Almustafa, beloved, own day, city of to return of his birth. seventh day he climbed walls his ship mist. Then flung open, sea. silences soul. ***** But sadness came his heart: How without sorrow? spirit Long were spent within nights depart from without regret? Too I scattered many are walk naked cannot withdraw ache. It this day, own hands. Nor me, but with thirst. ***** Yet longer. her calls embark. For in crystallize mould. Fain is here. I? Alone must ether. eagle sun. ***** Now hill, sea, harbour, mariners, land. he said: Sons of tides, How dreams. which is dream. Ready with sails wind. Only in this look cast backward, seafarer you, vast mother, Who river stream, Only stream make, glade, boundless ocean. ***** men their vineyards city gates. his name, field telling his ship. himself: Shall of gathering? in truth dawn? has left to him his winepress? tree may gather them? fountain cups? Am mighty may breath may me? what treasure I may confidence? If what fields in what seaso

# Challenge 4 - Applying Functions to DataFrames

#### Our next step is to use the `apply` function on a DataFrame to transform all cells.

To do this, we will connect to Ironhack's database and retrieve the data from the *pollution* database. Select the *beijing_pollution* table and retrieve its data. The data is also available at https://archive.ics.uci.edu/ml/datasets/Beijing+PM2.5+Data#

In [78]:
# your code here
pollution = pd.read_csv('../data/beijing_pollution.csv')

Let's look at the data using the `head()` function.

Expected output:

|    |   No |   year |   month |   day |   hour |   pm2.5 |   DEWP |   TEMP |   PRES | cbwd   |   Iws |   Is |   Ir |
|---:|-----:|-------:|--------:|------:|-------:|--------:|-------:|-------:|-------:|:-------|------:|-----:|-----:|
|  0 |    1 |   2010 |       1 |     1 |      0 |     nan |    -21 |    -11 |   1021 | NW     |  1.79 |    0 |    0 |
|  1 |    2 |   2010 |       1 |     1 |      1 |     nan |    -21 |    -12 |   1020 | NW     |  4.92 |    0 |    0 |
|  2 |    3 |   2010 |       1 |     1 |      2 |     nan |    -21 |    -11 |   1019 | NW     |  6.71 |    0 |    0 |
|  3 |    4 |   2010 |       1 |     1 |      3 |     nan |    -21 |    -14 |   1019 | NW     |  9.84 |    0 |    0 |
|  4 |    5 |   2010 |       1 |     1 |      4 |     nan |    -20 |    -12 |   1018 | NW     | 12.97 |    0 |    0 |

In [79]:
# your code here
pollution.head()

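|    |   No |   year |   month |   day |   hour |   pm2.5 |   DEWP |   TEMP |   PRES | cbwd   |   Iws |   Is |   Ir |
|---:|-----:|-------:|--------:|------:|-------:|--------:|-------:|-------:|-------:|:-------|------:|-----:|-----:|
|  0 |    1 |   2010 |       1 |     1 |      0 |     NaN |    -21 |  -11.0 | 1021.0 | NW     |  1.79 |    0 |    0 |
|  1 |    2 |   2010 |       1 |     1 |      1 |     NaN |    -21 |  -12.0 | 1020.0 | NW     |  4.92 |    0 |    0 |
|  2 |    3 |   2010 |       1 |     1 |      2 |     NaN |    -21 |  -11.0 | 1019.0 | NW     |  6.71 |    0 |    0 |
|  3 |    4 |   2010 |       1 |     1 |      3 |     NaN |    -21 |  -14.0 | 1019.0 | NW     |  9.84 |    0 |    0 |
|  4 |    5 |   2010 |       1 |     1 |      4 |     NaN |    -20 |  -12.0 | 1018.0 | NW     | 12.97 |    0 |    0 |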


The next step is to create a function that divides a cell by 24 to produce an hourly figure. Write the function below.

In [80]:
def hourly(x):
    '''
    Input: A numerical value
    Output: The value divided by 24
        
    Example:
    Input: 48
    Output: 2.0
    '''
    
    # your code here
    return x/24

Apply this function to the columns `Iws`, `Is`, and `Ir`. Store this new dataframe in the variable `pm25_hourly`.
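A minimal sketch of the pattern on a made-up DataFrame follows. The values and the helper `per_hour` are illustrative assumptions, not the lab data. With the default `axis=0`, `DataFrame.apply` passes each selected column to the function as a Series, and the division is carried out element-wise.

```python
import pandas as pd

# Made-up values, only the column names mirror the lab data
df = pd.DataFrame({'Iws': [24.0, 48.0], 'Is': [0, 12], 'cbwd': ['NW', 'SE']})

def per_hour(x):
    # x is a whole column (a Series); dividing it is element-wise
    return x / 24

# Select the numeric columns first, then apply the function to each of them
scaled = df[['Iws', 'Is']].apply(per_hour)
print(scaled)
```

Selecting the columns before calling `apply` keeps the text column `cbwd` out of the division.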

In [84]:
# your code here
pm25_hourly = pollution.loc[:,['Iws', 'Is', 'Ir']].apply(hourly)
pm25_hourly.head()

|    |      Iws |   Is |   Ir |
|---:|---------:|-----:|-----:|
|  0 | 0.074583 |  0.0 |  0.0 |
|  1 | 0.205    |  0.0 |  0.0 |
|  2 | 0.279583 |  0.0 |  0.0 |
|  3 | 0.41     |  0.0 |  0.0 |
|  4 | 0.540417 |  0.0 |  0.0 |


#### Our last challenge will be to create an aggregate function and apply it to a select group of columns in our dataframe.

Write a function that returns the standard deviation of a column divided by the length of a column minus 1. Since we are using pandas, do not use the `len()` function. One alternative is to use `count()`. Also, use the numpy version of standard deviation.
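The arithmetic can be checked by hand on a small made-up Series. The sketch below assumes NumPy's default `ddof=0` (population standard deviation) and shows how `count()` differs from `len()` when values are missing. All variable names are illustrative.

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 2, 3, 4])  # made-up sample

population_sd = np.std(values.to_numpy())  # NumPy default: ddof=0
n = values.count()                         # number of non-missing entries

result = population_sd / (n - 1)
print(round(result, 10))  # 0.3726779962

# count() skips missing values, len() does not
with_gap = pd.Series([1.0, np.nan, 3.0])
print(len(with_gap), with_gap.count())  # 3 2
```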

In [86]:
def sample_sd(x):
    '''
    Input: A Pandas series of values
    Output: The standard deviation divided by the number of elements in the series minus 1
        
    Example:
    Input: pd.Series([1,2,3,4])
    Output: 0.3726779962
    '''
    
    # your code here
    # NumPy standard deviation of the series divided by (non-missing count - 1)
    return np.std(x) / (x.count() - 1)