# Before you start:
- Read the README.md file
- Comment as much as you can and use the resources in the README.md file
- Happy learning!

In [1]:
#Import reduce from functools, numpy and pandas
from functools import reduce 
import numpy as np
import pandas as pd

# Challenge 1 - Mapping

#### We will use the `map()` function to clean up the words in a book.

In the following cell, we will read a text file containing the book The Prophet by Khalil Gibran.

In [2]:
location = '../58585-0.txt'
with open(location, 'r', encoding="utf8") as f:
    prophet = f.read().split(' ')

#### Let's remove the first 568 words since they contain information about the book but are not part of the book itself. 

Do this by removing from `prophet` elements 0 through 567 of the list (you can also do this by keeping elements 568 through the last element).

In [3]:
del prophet[0:567]  # The slice end is exclusive: this drops indices 0-566 (567 words); use 0:568 to drop all 568

If you look through the words, you will find that many words have a reference attached to them. For example, let's look at words 1 through 10.

In [4]:
prophet[0:10]

['Farewell................92\n\n\n\n\nTHE',
 'PROPHET\n\n|Almustafa,',
 'the{7}',
 'chosen',
 'and',
 'the\nbeloved,',
 'who',
 'was',
 'a',
 'dawn']

#### The next step is to create a function that will remove references. 

We will do this by splitting the string on the `{` character and keeping only the part before this character. Write your function below.

In [5]:
def removeReferences(string):
    return string.split("{")[0] #Split on "{" and keep only the part before the first one
    
#Test
removeReferences("the{7}")

'the'

Now that we have our function, use the `map()` function to apply it to our book, The Prophet. Assign the resulting list to a new variable called `prophet_reference`.
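For comparison with the list comprehension used in the next cell, here is a minimal sketch of the `map()` version on a small made-up sample. The `sample` list and the `remove_references` name are illustrative and not taken from the book file. In Python 3, `map()` returns a lazy iterator, so it is wrapped in `list()` here.

```python
def remove_references(string):
    # Hypothetical stand-in for removeReferences: keep the text before the first "{"
    return string.split("{")[0]

# Illustrative sample, not read from the book file
sample = ["the{7}", "chosen", "and", "ship{12}"]

# map() is lazy in Python 3, so wrap it in list() to materialize the result
sample_reference = list(map(remove_references, sample))
print(sample_reference)  # ['the', 'chosen', 'and', 'ship']
```

Both forms produce the same list; the comprehension is often considered more readable, while `map()` is convenient when the function already exists.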

In [6]:
#Without map() function (Guido likes that)
prophet_reference = [string.split("{")[0] for string in prophet]
    
#Test
prophet_reference

['Farewell................92\n\n\n\n\nTHE',
 'PROPHET\n\n|Almustafa,',
 'the',
 'chosen',
 'and',
 'the\nbeloved,',
 'who',
 'was',
 'a',
 'dawn',
 'unto',
 'his',
 'own\nday,',
 'had',
 'waited',
 'twelve',
 'years',
 'in',
 'the',
 'city\nof',
 'Orphalese',
 'for',
 'his',
 'ship',
 'that',
 'was',
 'to\nreturn',
 'and',
 'bear',
 'him',
 'back',
 'to',
 'the',
 'isle',
 'of\nhis',
 'birth.\n\nAnd',
 'in',
 'the',
 'twelfth',
 'year,',
 'on',
 'the',
 'seventh\nday',
 'of',
 'Ielool,',
 'the',
 'month',
 'of',
 'reaping,',
 'he\nclimbed',
 'the',
 'hill',
 'without',
 'the',
 'city',
 'walls\nand',
 'looked',
 'seaward;',
 'and',
 'he',
 'beheld',
 'his\nship',
 'coming',
 'with',
 'the',
 'mist.\n\nThen',
 'the',
 'gates',
 'of',
 'his',
 'heart',
 'were',
 'flung\nopen,',
 'and',
 'his',
 'joy',
 'flew',
 'far',
 'over',
 'the',
 'sea.\nAnd',
 'he',
 'closed',
 'his',
 'eyes',
 'and',
 'prayed',
 'in',
 'the\nsilences',
 'of',
 'his',
 'soul.\n\n*****\n\nBut',
 'as',
 'he',
 'desce

Another thing you may have noticed is that some words contain a line break. Let's write a function to split those words. Our function will return the string split on the character `\n`. Write your function in the cell below.

In [8]:
def line_break(string):
    return string.split("\n") #Split the string at every '\n' and return the pieces as a list
    
#Test
line_break("own\nday")    

['own', 'day']
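Note that splitting on `\n` yields an empty string wherever two line breaks are adjacent. A minimal sketch using a string that appears in the output above; dropping the empty pieces is an optional extra step and not part of the exercise.

```python
def line_break(string):
    # Split on every newline character; adjacent newlines produce empty strings
    return string.split("\n")

pieces = line_break("birth.\n\nAnd")
print(pieces)  # ['birth.', '', 'And']

# Optional: discard the empty strings left by consecutive line breaks
non_empty = [piece for piece in pieces if piece]
print(non_empty)  # ['birth.', 'And']
```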

Apply the `line_break` function to the `prophet_reference` list. Name the new list `prophet_line`.

In [9]:
#Without map() function
prophet_line = [line_break(string) for string in prophet_reference]

#Test
prophet_line

[['Farewell................92', '', '', '', '', 'THE'],
 ['PROPHET', '', '|Almustafa,'],
 ['the'],
 ['chosen'],
 ['and'],
 ['the', 'beloved,'],
 ['who'],
 ['was'],
 ['a'],
 ['dawn'],
 ['unto'],
 ['his'],
 ['own', 'day,'],
 ['had'],
 ['waited'],
 ['twelve'],
 ['years'],
 ['in'],
 ['the'],
 ['city', 'of'],
 ['Orphalese'],
 ['for'],
 ['his'],
 ['ship'],
 ['that'],
 ['was'],
 ['to', 'return'],
 ['and'],
 ['bear'],
 ['him'],
 ['back'],
 ['to'],
 ['the'],
 ['isle'],
 ['of', 'his'],
 ['birth.', '', 'And'],
 ['in'],
 ['the'],
 ['twelfth'],
 ['year,'],
 ['on'],
 ['the'],
 ['seventh', 'day'],
 ['of'],
 ['Ielool,'],
 ['the'],
 ['month'],
 ['of'],
 ['reaping,'],
 ['he', 'climbed'],
 ['the'],
 ['hill'],
 ['without'],
 ['the'],
 ['city'],
 ['walls', 'and'],
 ['looked'],
 ['seaward;'],
 ['and'],
 ['he'],
 ['beheld'],
 ['his', 'ship'],
 ['coming'],
 ['with'],
 ['the'],
 ['mist.', '', 'Then'],
 ['the'],
 ['gates'],
 ['of'],
 ['his'],
 ['heart'],
 ['were'],
 ['flung', 'open,'],
 ['and'],
 ['his'],
 ['jo

If you look at the elements of `prophet_line`, you will see that the function returned lists and not strings. Our list is now a list of lists. Flatten the list using a list comprehension. Assign this new list to `prophet_flat`.
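The order of the two `for` clauses in a flattening comprehension is easy to get backwards, so here is a minimal sketch on a small illustrative list shaped like `prophet_line`. The names `nested`, `flat` and `flat_chain` are made up for this example, and `itertools.chain.from_iterable` is shown only as a standard-library alternative.

```python
from itertools import chain

# Illustrative nested list shaped like prophet_line
nested = [["the"], ["own", "day,"], ["birth.", "", "And"]]

# Read the comprehension left to right like nested for-loops:
# for each inner list, then for each word inside it
flat = [word for inner in nested for word in inner]
print(flat)  # ['the', 'own', 'day,', 'birth.', '', 'And']

# Standard-library alternative producing the same result
flat_chain = list(chain.from_iterable(nested))
```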

In [30]:
#Go through each sub-item and add it to the new list
prophet_flat = [subitem for item in prophet_line for subitem in item]
    
#Test
print(prophet_flat)

['Farewell................92', '', '', '', '', 'THE', 'PROPHET', '', '|Almustafa,', 'the', 'chosen', 'and', 'the', 'beloved,', 'who', 'was', 'a', 'dawn', 'unto', 'his', 'own', 'day,', 'had', 'waited', 'twelve', 'years', 'in', 'the', 'city', 'of', 'Orphalese', 'for', 'his', 'ship', 'that', 'was', 'to', 'return', 'and', 'bear', 'him', 'back', 'to', 'the', 'isle', 'of', 'his', 'birth.', '', 'And', 'in', 'the', 'twelfth', 'year,', 'on', 'the', 'seventh', 'day', 'of', 'Ielool,', 'the', 'month', 'of', 'reaping,', 'he', 'climbed', 'the', 'hill', 'without', 'the', 'city', 'walls', 'and', 'looked', 'seaward;', 'and', 'he', 'beheld', 'his', 'ship', 'coming', 'with', 'the', 'mist.', '', 'Then', 'the', 'gates', 'of', 'his', 'heart', 'were', 'flung', 'open,', 'and', 'his', 'joy', 'flew', 'far', 'over', 'the', 'sea.', 'And', 'he', 'closed', 'his', 'eyes', 'and', 'prayed', 'in', 'the', 'silences', 'of', 'his', 'soul.', '', '*****', '', 'But', 'as', 'he', 'descended', 'the', 'hill,', 'a', 'sadness', '

# Challenge 2 - Filtering

When printing out a few words from the book, we see that there are words we may not want to keep if we choose to analyze the corpus of text. Below is a list of words that we would like to get rid of. Create a function that returns `False` if the word passed to it is in the specified list and `True` otherwise.

In [37]:
def word_filter(word):
    '''
    Input: A string
    Output: True if the word is not in the specified list and False if the word is in the list
        
    Example:
    word list = ['and', 'the']
    Input: 'and'
    Output: False
    
    Input: 'John'
    Output: True
    '''
    
    word_list = ['and', 'the', 'a', 'an']
    
    # True when the word should be kept, False when it is in word_list
    return word not in word_list

Use the `filter()` function to filter out the words specified in the `word_filter()` function. Store the filtered list in the variable `prophet_filter`.
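In Python 3, `filter()` returns a lazy, single-use iterator rather than a list. A minimal sketch on a made-up word list (the `keep_word` helper and `words` are illustrative) shows that a second pass over the same filter object yields nothing, which is why wrapping it in `list()` helps when the result is needed more than once.

```python
def keep_word(word):
    # Hypothetical stand-in for word_filter: False for listed words, True otherwise
    return word not in ["and", "the", "a", "an"]

words = ["the", "chosen", "and", "the", "beloved,"]

lazy = filter(keep_word, words)
first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # nothing left to yield

print(first_pass)   # ['chosen', 'beloved,']
print(second_pass)  # []

# Materialize once to reuse the result safely
kept = list(filter(keep_word, words))
```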

In [49]:
prophet_filter = filter(word_filter, prophet_flat)

#Test
for word in prophet_filter:
    print(word)

Farewell................92




THE
PROPHET

|Almustafa,
chosen
beloved,
who
was
dawn
unto
his
own
day,
had
waited
twelve
years
in
city
of
Orphalese
for
his
ship
that
was
to
return
bear
him
back
to
isle
of
his
birth.

And
in
twelfth
year,
on
seventh
day
of
Ielool,
month
of
reaping,
he
climbed
hill
without
city
walls
looked
seaward;
he
beheld
his
ship
coming
with
mist.

Then
gates
of
his
heart
were
flung
open,
his
joy
flew
far
over
sea.
And
he
closed
his
eyes
prayed
in
silences
of
his
soul.

*****

But
as
he
descended
hill,
sadness
came
upon
him,
he
thought
in
his
heart:

How
shall
I
go
in
peace
without
sorrow?
Nay,
not
without
wound
in
spirit
shall
I
leave
this
city.

days
of
pain
I
have
spent
within
its
walls,
long
were
nights
of
aloneness;
who
can
depart
from
his
pain
his
aloneness
without
regret?

Too
many
fragments
of
spirit
have
I
scattered
in
these
streets,
too
many
are
children
of
my
longing
that
walk
naked
among
these
hills,
I
cannot
withdraw
from
them
without
burden
ache.

It
i

ears
to
voices
of
day
voices
of
night.

*****
*****


woman
said,
Speak
to
us
of
_Joy
Sorrow_.

And
he
answered:

Your
joy
is
your
sorrow
unmasked.

And
selfsame
well
from
which
your
laughter
rises
was
oftentimes
filled
with
your
tears.

And
how
else
can
it
be?

The
deeper
that
sorrow
carves
into
your
being,
more
joy
you
can
contain.

Is
not
cup
that
holds
your
wine
very
cup
that
was
burned
in
potter’s
oven?

And
is
not
lute
that
soothes
your
spirit,
very
wood
that
was
hollowed
with
knives?

When
you
are
joyous,
look
deep
into
your
heart
you
shall
find
it
is
only
that
which
has
given
you
sorrow
that
is
giving
you
joy.

When
you
are
sorrowful
look
again
in

heart,
you
shall
see
that
in
truth
you
are
weeping
for
that
which
has
been
your
delight.

*****

Some
of
you
say,
“Joy
is
greater
than
sorrow,”
others
say,
“Nay,
sorrow
is
greater.”

But
I
say
unto
you,
they
are
inseparable.

Together
they
come,
when
one
sits
alone
with
you
at
your
board,
remember
that
other
is
asleep
upon
your
bed.


meet
your
friend
on
roadside
or
in
market
place,
let
spirit
in
you
move
your
lips
direct
your
tongue.

Let
voice
within
your
voice
speak
to
ear
of
his
ear;

For
his
soul
will
keep
truth
of
your
heart
as
taste
of
wine
is
remembered

When
colour
is
forgotten
vessel
is
no
more.

*****
*****


astronomer
said,
Master,
what
of
_Time_?

And
he
answered:

You
would
measure
time
measureless
immeasurable.

You
would
adjust
your
conduct
even
direct
course
of
your
spirit
according
to
hours
seasons.

Of
time
you
would
make
stream
upon
whose
bank
you
would
sit
watch
its
flowing.

Yet
timeless
in
you
is
aware
of
life’s
timelessness,

And
knows
that
yesterday
is
but
today’s
memory
tomorrow
is
today’s
dream.

And
that
that
which
sings
contemplates
in
you
is
still
dwelling
within
bounds
of
that
first
moment
which
scattered
stars
into
space.

among
you
does
not
feel
that
his
power
to
love
is
boundless?

And
yet
who
does
not
feel
that
very
love,
though
boundless,
encompassed
within
centre
of
his
being,
m

holds
council
with
trees
of
forest,
but
not
with
men.

He
sits
alone
on
hill-tops
looks
down
upon
our
city.”

True
it
is
that
I
have
climbed
hills
walked
in
remote
places.

How
could
I
have
seen
you
save
from
great
height
or
great
distance?

How
can
one
be
indeed
near
unless
he
be
tar?

And
others
among
you
called
unto
me,
not
in
words,
they
said,

“Stranger,
stranger,
lover
of
unreachable
heights,
why
dwell
you
among
summits
where
eagles
build
their
nests?

seek
you
unattainable?

What
storms
would
you
trap
in
your
net,

And
what
vaporous
birds
do
you
hunt
in
sky?

Come
be
one
of
us.

Descend
appease
your
hunger
with
our
bread
quench
your
thirst
with
our
wine.”

In
solitude
of
their
souls
they
said
these
things;

But
were
their
solitude
deeper
they
would
have
known
that
I
sought
but
secret
of
your
joy
your
pain,

And
I
hunted
only
your
larger
selves
that
walk
sky.

*****

But
hunter
was
also
hunted;

For
many
of
my
arrows
left
my
bow
only
to
seek
my
own
breast.

And
flier
was
also
cre

Project
Gutenberg-tm
electronic
works,
harmless
from
all
liability,
costs
expenses,
including
legal
fees,
that
arise
directly
or
indirectly
from
any
of
following
which
you
do
or
cause
to
occur:
(a)
distribution
of
this
or
any
Project
Gutenberg-tm
work,
(b)
alteration,
modification,
or
additions
or
deletions
to
any
Project
Gutenberg-tm
work,
(c)
any
Defect
you
cause.

Section
2.
Information
about
Mission
of
Project
Gutenberg-tm

Project
Gutenberg-tm
is
synonymous
with
free
distribution
of
electronic
works
in
formats
readable
by
widest
variety
of
computers
including
obsolete,
old,
middle-aged
new
computers.
It
exists
because
of
efforts
of
hundreds
of
volunteers
donations
from
people
in
all
walks
of
life.

Volunteers
financial
support
to
provide
volunteers
with
assistance
they
need
are
critical
to
reaching
Project
Gutenberg-tm's
goals
ensuring
that
Project
Gutenberg-tm
collection
will
remain
freely
available
for
generations
to
come.
In
2001,
Project
Gutenberg
Literary
Archive
Foundation
w

# Bonus Challenge - Part 1

Rewrite the `word_filter` function above to not be case sensitive.

In [52]:
def word_filter_case(word):
   
    word_list = ['and', 'the', 'a', 'an']
    
    # Lowercase the word so that 'The' and 'AND' are filtered as well
    if word.lower() in word_list:
        return False
    else:
        return True

In [58]:
#Rewrite prophet_filter, now being case insensitive
prophet_filter = filter(word_filter_case, prophet_flat)

# Challenge 3 - Reducing

#### Now that we have significantly cleaned up our text corpus, let's use the `reduce()` function to put the words back together into one long string separated by spaces. 

We will start by writing a function that takes two strings and concatenates them together with a space between the two strings.

In [42]:
def concat_space(a, b):
    '''
    Input: Two strings
    Output: A single string separated by a space
        
    Example:
    Input: 'John', 'Smith'
    Output: 'John Smith'
    '''
    
    sentence = a + " " + b
    return sentence

Use the function above to reduce the text corpus in the list `prophet_filter` into a single string. Assign this new string to the variable `prophet_string`.
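To make the folding order concrete, here is a minimal sketch of `reduce()` with a two-argument concatenation function on a short illustrative list (the `words` sample is made up). It also compares the result with `" ".join`, which is the more common idiom for this particular task and works on an empty list as well.

```python
from functools import reduce

def concat_space(a, b):
    # Join two strings with a single space between them
    return a + " " + b

words = ["chosen", "beloved,", "who", "was", "dawn"]  # illustrative sample

# reduce() folds the list left to right:
# (((chosen + beloved,) + who) + was) + dawn
sentence = reduce(concat_space, words)
print(sentence)  # chosen beloved, who was dawn

# str.join gives the same string and also handles an empty list
assert sentence == " ".join(words)
```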

In [47]:
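# filter() returns a one-shot iterator; once it has been consumed, reduce() raises
# "TypeError: reduce() of empty sequence with no initial value", so rebuild it as a list.
prophet_filter = list(filter(word_filter, prophet_flat))
prophet_string = reduce(concat_space, prophet_filter)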

# Challenge 4 - Applying Functions to DataFrames

#### Our next step is to use the `apply` function on a dataframe to transform all cells.

To do this, we will load a dataset below and then write a function that will perform the transformation.

In [17]:
# Run this code:

# The dataset below contains information about pollution from PM2.5 particles in Beijing 

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00381/PRSA_data_2010.1.1-2014.12.31.csv"
pm25 = pd.read_csv(url)

Let's look at the data using the `head()` function.

In [18]:
# Your code here:



The next step is to create a function that divides a cell by 24 to produce an hourly figure. Write the function below.

In [19]:
def hourly(x):
    '''
    Input: A numerical value
    Output: The value divided by 24
        
    Example:
    Input: 48
    Output: 2.0
    '''
    
    # Your code here:
    

Apply this function to the columns `Iws`, `Is`, and `Ir`. Store this new dataframe in the variable `pm25_hourly`.
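One possible approach, sketched on a tiny hand-made dataframe rather than the real dataset: only the column names `Iws`, `Is` and `Ir` come from the exercise, while the values and the `sample` and `sample_hourly` names are hypothetical. Because division is vectorized, `DataFrame.apply` can pass each column to `hourly` as a whole and every cell ends up divided by 24.

```python
import pandas as pd

def hourly(x):
    # Divide a value (or a whole column) by 24
    return x / 24

# Hypothetical values; only the column names come from the dataset
sample = pd.DataFrame({"Iws": [48.0, 24.0], "Is": [0, 12], "Ir": [24, 0]})

# DataFrame.apply calls hourly once per column (axis=0 by default)
sample_hourly = sample[["Iws", "Is", "Ir"]].apply(hourly)
print(sample_hourly)
```

Selecting the three columns first leaves the original dataframe untouched and returns a new one, which matches the request to store the result in a separate variable.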

In [20]:
# Your code here:



#### Our last challenge will be to create an aggregate function and apply it to a select group of columns in our dataframe.

Write a function that returns the standard deviation of a column divided by the length of a column minus 1. Since we are using pandas, do not use the `len()` function. One alternative is to use `count()`. Also, use the numpy version of standard deviation.
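A minimal sketch of one way to read this description, checked against the docstring example in the next cell. It assumes the "numpy version" means `np.std` with its default `ddof=0` (population standard deviation) and converts the series to a NumPy array first so that NumPy's default is what gets used.

```python
import numpy as np
import pandas as pd

def sample_sd(x):
    # NumPy's population standard deviation (ddof=0) divided by (count - 1)
    return np.std(x.to_numpy()) / (x.count() - 1)

result = sample_sd(pd.Series([1, 2, 3, 4]))
print(round(result, 10))  # 0.3726779962
```

For `[1, 2, 3, 4]` the population standard deviation is about 1.1180 and the count minus 1 is 3, which gives the 0.3726779962 quoted in the docstring.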

In [21]:
def sample_sd(x):
    '''
    Input: A Pandas series of values
    Output: the standard deviation divided by the number of elements in the series minus 1
        
    Example:
    Input: pd.Series([1,2,3,4])
    Output: 0.3726779962
    '''
    
    # Your code here:
    
    