# Before you start:
- Read the README.md file
- Comment as much as you can and use the resources in the README.md file
- Happy learning!

In [1]:
import numpy as np
import pandas as pd
from functools import reduce

# Challenge 1 - Mapping

#### We will use the map function to clean up words in a book.

In the following cell, we will read a text file containing the book The Prophet by Khalil Gibran.

In [2]:
location = '../data/58585-0.txt'
with open(location, 'r', encoding="utf8") as f:
    prophet = f.read().split(' ')

#### Let's remove the first 568 words since they contain information about the book but are not part of the book itself. 

Do this by removing from `prophet` elements 0 through 567 of the list (you can also do this by keeping elements 568 through the last element).

In [3]:
bookitself = prophet[568:]
bookitself

['PROPHET\n\n|Almustafa,',
 'the{7}',
 'chosen',
 'and',
 'the\nbeloved,',
 'who',
 'was',
 'a',
 'dawn',
 'unto',
 'his',
 'own\nday,',
 'had',
 'waited',
 'twelve',
 'years',
 'in',
 'the',
 'city\nof',
 'Orphalese',
 'for',
 'his',
 'ship',
 'that',
 'was',
 'to\nreturn',
 'and',
 'bear',
 'him',
 'back',
 'to',
 'the',
 'isle',
 'of\nhis',
 'birth.\n\nAnd',
 'in',
 'the',
 'twelfth',
 'year,',
 'on',
 'the',
 'seventh\nday',
 'of',
 'Ielool,',
 'the',
 'month',
 'of',
 'reaping,',
 'he\nclimbed',
 'the',
 'hill',
 'without',
 'the',
 'city',
 'walls\nand',
 'looked',
 'seaward;',
 'and',
 'he',
 'beheld',
 'his\nship',
 'coming',
 'with',
 'the',
 'mist.\n\nThen',
 'the',
 'gates',
 'of',
 'his',
 'heart',
 'were',
 'flung\nopen,',
 'and',
 'his',
 'joy',
 'flew',
 'far',
 'over',
 'the',
 'sea.\nAnd',
 'he',
 'closed',
 'his',
 'eyes',
 'and',
 'prayed',
 'in',
 'the\nsilences',
 'of',
 'his',
 'soul.\n\n*****\n\nBut',
 'as',
 'he',
 'descended',
 'the',
 'hill,',
 'a',
 'sadness\

If you look through the words, you will find that many words have a reference attached to them. For example, let's look at words 1 through 10.

In [72]:
bookitself[1:10]

['the{7}', 'chosen', 'and', 'the\nbeloved,', 'who', 'was', 'a', 'dawn', 'unto']

#### The next step is to create a function that will remove references. 

We will do this by splitting the string on the `{` character and keeping only the part before this character. Write your function below.

In [66]:
def reference(x):
    '''
    Input: A string
    Output: The string with references removed
    
    Example:
    Input: 'the{7}'
    Output: 'the'
    '''

    # Split on '{' and keep only the text before it;
    # words without a reference are returned unchanged.
    return x.split('{')[0]

In [67]:
reference('the{7}')

'the'

Now that we have our function, use the `map()` function to apply it to our book, The Prophet. Store the result in a new list called `prophet_reference`.

In [65]:
# Wrap map() in list() so the result can be reused in later cells
prophet_reference = list(map(reference, bookitself))
prophet_reference[:10]

['PROPHET\n\n|Almustafa,',
 'the',
 'chosen',
 'and',
 'the\nbeloved,',
 'who',
 'was',
 'a',
 'dawn',
 'unto']
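In Python 3, `map()` returns a lazy iterator rather than a list, so it can be consumed only once. The sketch below illustrates this on a three-word sample; `strip_reference` and `words` are illustrative names, not part of the exercise.

```python
words = ['the{7}', 'chosen', 'and']

def strip_reference(word):
    # Keep only the text before the first '{'
    return word.split('{')[0]

lazy = map(strip_reference, words)
first_pass = list(lazy)    # consumes the iterator
second_pass = list(lazy)   # nothing left to yield

print(first_pass)   # ['the', 'chosen', 'and']
print(second_pass)  # []

# Materializing the result once keeps it reusable in later cells
cleaned = list(map(strip_reference, words))
print(cleaned)      # ['the', 'chosen', 'and']
```

A function passed to `map()` also needs to `return` its result; a function that only prints returns `None`, and the mapped list would then contain only `None` values.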

Another thing you may have noticed is that some words contain a line break. Let's write a function to split those words. Our function will return the string split on the character `\n`. Write your function in the cell below.

In [76]:
def line_break(x):
    '''
    Input: A string
    Output: A list of strings split on the line break (\n) character
        
    Example:
    Input: 'the\nbeloved'
    Output: ['the', 'beloved']
    '''
    
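    # split() with no argument splits on any run of whitespace,
    # which covers '\n' and avoids empty strings for '\n\n'
    return x.split()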

In [77]:
line_break('the\nbeloved')

['the', 'beloved']
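Splitting on `'\n'` explicitly and splitting with no argument give different results when a word contains consecutive line breaks, as several words in the book do. A small comparison, using one such word from the output above:

```python
word = 'birth.\n\nAnd'

on_newline = word.split('\n')   # splits at every single '\n'
on_whitespace = word.split()    # splits on runs of any whitespace

print(on_newline)     # ['birth.', '', 'And']
print(on_whitespace)  # ['birth.', 'And']
```

The explicit form leaves an empty string between the two line breaks, which would need to be filtered out later.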

Apply the `line_break` function to the `prophet_reference` list. Name the new list `prophet_line`.

In [79]:
prophet_line = list(map(line_break, prophet_reference))

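If you look at the elements of `prophet_line`, you will see that the function returned lists and not strings. Our list is now a list of lists. Flatten the list using a list comprehension. Assign this new list to `prophet_flat`.

In [None]:
# The outer loop walks the sublists, the inner loop walks the words in each one
prophet_flat = [word for sublist in prophet_line for word in sublist]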
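As a minimal sketch of flattening, the example below applies a nested list comprehension to a small made-up list of lists and compares it with `itertools.chain.from_iterable`, a standard-library alternative. The name `nested` is illustrative only.

```python
from itertools import chain

nested = [['the'], ['the', 'beloved,'], ['who']]

# Read the comprehension left to right: for each sublist, for each word
flat = [word for sublist in nested for word in sublist]

# Standard-library alternative that yields the same elements
flat_chain = list(chain.from_iterable(nested))

print(flat)        # ['the', 'the', 'beloved,', 'who']
print(flat_chain)  # ['the', 'the', 'beloved,', 'who']
```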

# Challenge 2 - Filtering

When printing out a few words from the book, we see that there are words we may not want to keep if we choose to analyze the corpus of text. Below is a list of words that we would like to get rid of. Create a function that returns `False` if the word it receives is in the specified list and `True` otherwise.

In [80]:
def word_filter(x):
    '''
    Input: A string
    Output: True if the word is not in the specified list 
    and False if the word is in the list.
        
    Example:
    word list = ['and', 'the']
    Input: 'and'
    Output: False
    
    Input: 'John'
    Output: True
    '''
    
    word_list = ['and', 'the', 'a', 'an']
    
    return x not in word_list

Use the `filter()` function to filter out the words specified in the `word_filter()` function. Store the filtered list in the variable `prophet_filter`.

In [81]:
prophet_filter = list(filter(word_filter, prophet_flat))
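A small sketch of how `filter()` behaves with a predicate of this kind, on a made-up sample. `keep_word`, `STOP_WORDS`, and `sample` are illustrative names.

```python
STOP_WORDS = ['and', 'the', 'a', 'an']

def keep_word(word):
    # True means "keep this element"
    return word not in STOP_WORDS

sample = ['the', 'chosen', 'and', 'The', 'beloved,']

# filter() is lazy like map(), so wrap it in list() to materialize it
kept = list(filter(keep_word, sample))
print(kept)  # ['chosen', 'The', 'beloved,']
```

Note that `'The'` survives because the comparison is case sensitive.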

# Bonus Challenge

Rewrite the `word_filter` function above so that it is not case sensitive.

In [82]:
def word_filter_case(x):
    
    word_list = ['and', 'the', 'a', 'an']
    
    # Call lower() (with parentheses) so the lowercased word is compared
    return x.lower() not in word_list

In [85]:
prophet_filter = list(filter(word_filter_case, prophet_flat))
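One pitfall worth illustrating when lowercasing: referring to `str.lower` without parentheses yields the method object itself, which never equals a string, so a membership test on it is always `False`. A minimal sketch with illustrative names:

```python
stop_words = ['and', 'the', 'a', 'an']
word = 'The'

without_call = word.lower in stop_words   # compares a method object to strings
with_call = word.lower() in stop_words    # compares 'the' to strings

print(without_call)  # False
print(with_call)     # True
```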

# Challenge 3 - Reducing

#### Now that we have significantly cleaned up our text corpus, let's use the `reduce()` function to put the words back together into one long string separated by spaces. 

We will start by writing a function that takes two strings and concatenates them together with a space between the two strings.

In [86]:
def concat_space(a, b):
    '''
    Input: Two strings
    Output: A single string separated by a space
        
    Example:
    Input: 'John', 'Smith'
    Output: 'John Smith'
    '''
    
    return a + ' ' + b

Use the function above to reduce the text corpus in the list `prophet_filter` into a single string. Assign this new string to the variable `prophet_string`.

In [87]:
prophet_string = reduce(concat_space, prophet_filter)
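To illustrate what `reduce()` does here, the sketch below folds a short made-up word list and compares the result with `str.join`, which produces the same string. The list `words` is illustrative only.

```python
from functools import reduce

def concat_space(a, b):
    return a + ' ' + b

words = ['chosen', 'beloved,', 'who']

# reduce computes concat_space(concat_space('chosen', 'beloved,'), 'who')
reduced = reduce(concat_space, words)

# str.join builds the same string in a single pass
joined = ' '.join(words)

print(reduced)  # chosen beloved, who
print(joined)   # chosen beloved, who
```

Without an initial value, `reduce()` raises a `TypeError` on an empty list, whereas `' '.join([])` returns an empty string.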

# Challenge 4 - Applying Functions to DataFrames

#### Our next step is to apply a function to a DataFrame and transform all cells.

To do this, we will connect to Ironhack's database and retrieve the data from the *pollution* database. Select the *beijing_pollution* table and retrieve its data.

In [None]:
# your code here
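The connection details for Ironhack's database are not given here, so the following is only a sketch of the retrieval step. It uses an in-memory SQLite database as a stand-in, and its rows are made up for illustration. Only the table name `beijing_pollution` and the column names `Iws`, `Is`, and `Ir` come from this notebook.

```python
import sqlite3
import pandas as pd

# Hypothetical stand-in for the real database: in-memory SQLite with made-up rows
conn = sqlite3.connect(':memory:')
# Column names are quoted because IS is a reserved word in SQL
conn.execute('CREATE TABLE beijing_pollution ("Iws" REAL, "Is" INTEGER, "Ir" INTEGER)')
conn.executemany(
    'INSERT INTO beijing_pollution VALUES (?, ?, ?)',
    [(1.79, 0, 0), (4.92, 0, 0), (6.71, 1, 0)],
)

# Read the whole table into a DataFrame
pm25 = pd.read_sql_query('SELECT * FROM beijing_pollution', conn)
conn.close()

print(pm25.head())
```

With the real database, only the connection object would change; the `pd.read_sql_query` call would stay the same.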

Let's look at the data using the `head()` function.

In [None]:
# your code here

The next step is to create a function that divides a cell by 24 to produce an hourly figure. Write the function below.

In [None]:
def hourly(x):
    '''
    Input: A numerical value
    Output: The value divided by 24
        
    Example:
    Input: 48
    Output: 2.0
    '''
    
    return x/24

Apply this function to the columns `Iws`, `Is`, and `Ir`. Store this new dataframe in the variable `pm25_hourly`.

In [None]:
# your code here
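A minimal sketch of applying `hourly` to the three columns, assuming a DataFrame with the same column names. `pm25_demo` and its values are made up for illustration.

```python
import pandas as pd

def hourly(x):
    return x / 24

# Made-up values; only the column names come from the exercise
pm25_demo = pd.DataFrame({'Iws': [24.0, 48.0], 'Is': [0, 12], 'Ir': [6, 0]})

# apply() passes each selected column to hourly as a Series,
# and the division is carried out element by element
hourly_demo = pm25_demo[['Iws', 'Is', 'Ir']].apply(hourly)
print(hourly_demo)
```

Because `hourly` only uses division, it works equally on a scalar or a whole column, so no element-wise helper is needed.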

#### Our last challenge will be to create an aggregate function and apply it to a select group of columns in our dataframe.

Write a function that returns the standard deviation of a column divided by the number of values in the column minus 1. Since we are using pandas, do not use the `len()` function. One alternative is to use `count()`. Also, use the numpy version of standard deviation.

In [88]:
def sample_sd(x):
    '''
    Input: A Pandas series of values
    Output: the standard deviation divided by the number of elements in the series minus 1
        
    Example:
    Input: pd.Series([1,2,3,4])
    Output: 0.3726779962
    '''
    
    return np.std(x)/(x.count()-1)

In [None]:
pm25_hourly.apply(sample_sd)

In [89]:
pm25[['Iws', 'Is', 'Ir']].apply(sample_sd)

NameError: name 'pm25' is not defined
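The docstring example for `sample_sd` can be checked on its own, without the database. The sketch below reproduces it and also shows why the exercise asks for the numpy version: `np.std` defaults to `ddof=0`, while the pandas `Series.std` method defaults to `ddof=1`.

```python
import numpy as np
import pandas as pd

def sample_sd(x):
    return np.std(x) / (x.count() - 1)

s = pd.Series([1, 2, 3, 4])

result = sample_sd(s)        # sqrt(1.25) / 3
pandas_default = s.std()     # sqrt(5/3), because ddof=1

print(round(result, 10))          # 0.3726779962
print(round(pandas_default, 10))  # 1.2909944487
```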