# Functions and Regular Expressions
## Functions
As we write more code in Python, we may find that we perform many of the same operations more than once. Rather than write the same loops multiple times, we can use __functions__ to perform those operations in a condensed fashion, provided the parameters are similar. We might think of __functions__ as being akin to the saying "You can give a person fish and feed them for the day or teach them how to fish and feed them for a lifetime"; while fishing will be different depending on where and how one fishes, there are some universal parameters to fishing: a body of water, a means of extracting a fish from the water, and the presence of fish. If we know the parameters in Python, we can create functions that will let us repeat operations.

Today, we're using four poems that will all need to be preprocessed identically. Creating functions to perform this preprocessing will guarantee uniform results.

In [None]:
###Spring Poems###

WSM_Late_Spring = "Coming into the high room again after years/after oceans and shadows of hills and the/sounds of lies/after losses and feet on stairs/after looking and mistakes and forgetting/turning there thinking to find/no one except those I knew/finally I saw you/sitting in white/already waiting/you of whom I had heard/with my own ears since the beginning/for whom more than once/I had opened the door/believing you were not far"
ED_LXXXV = "A light exists in Spring/Not present on the year/At any other period/When March is scarely here/A color stands abroad/On solitary hills/That silence cannot overtake,/But human nature feels./It waits upon the lawn;/It shows the furthest tree/Upon the furthest slope we know;It almost speaks to me./Then,as horizons step,/Or noons report away./Without the formula of sound,/It passes, and we stay:/A quality of loss/Affecting our content,/As trade has suddenly encroached/Upon a sacrament"

###Fall Poems###
CS_Written_in_October = "The Blasts of Autumn as they scatter round/The faded foilage of another year,/And muttering many a sad and solemn sound,/Drive the pale fragments over the stubble sere,/Are well attuned to my dejected mood;/(Ah! better far than airs that breathe of Spring!)/While the high rooks, that horsely clamouring/Seek in the black phalanx the half-leafless wood,/I rather hear, than that enraptured lay/Harmonious, and of Lover and Pleasure born,/ Which from the golden furze, or flowering thorn/Awakes the Shepherd in the ides of May;/Nature delights me most when she mourns,/For never more to me the Spring of Hope returns!"

JK_To_Autumn = "Season of mists and mellow fruitfulness,/Close bosom-friend of the maturing sun;/Conspiring with him how to load and bless/With fruit the vines that round the thatch-eves run;/To bend with apples the moss'd cottage-trees,/And fill all fruit with ripeness to the core;/To swell the gourd, and plump the hazel shells;With a sweet kernel; or set budding more,/And still more, later flowers for the bees,/Until they think warm days will never cease,/For summer has o'erbrimm'd their clammy cells./Who hath not seen thee oft amid thy store?/Sometimes whoever seeks abroad may find/Thee sitting careless on a granary floor,/Thy hair soft-lifted by the winnowing wind;/Or on a half-reap'd furrow sound asleep,/Drows'd with the fume of poppies, while thy hook/Spares the next swath and all its twined flowers:/And sometimes like a gleaner thou dost keep/Steady they laden head across a brook;/Or by a cyder-press, with patient look,/Thou watchest the last oozings hours by hours./Where are the songs of spring? Ay, Where are they?/ Think not of them, thou has thy mustic too,/While barred clouds bloom the soft-dying day,/And touch the stubble-plains with rosy hue;/ Then in a wailful choir the small gnats mourn/Among the river sallows, bourn aloft/ Or sinking as the light wind lives or dies;/And full-grown lambs loud bleat from hilly bourn;/Hedge-crickets sing; and now with treble soft/The red-breast whistles from a garden-croft;/And gathering swallows twitter in the skies."

As you can see, each line of a poem is delimited by a slash. Each poem is currently a single string, and we'll want to divide it first into lines and then into individual words.

For the first step, we would typically use Python's .split() string method. Since all of these poems are delimited by the same character, however, we're going to wrap .split() in a more useful, purpose-driven function.

In [None]:
def poem_splitter(poem):
    # Split the poem (one long string) into a list of lines at each '/'
    return poem.split('/')

To define a function in Python, we first type def, followed by a name for the function. After the name, we list in parentheses the parameters the function will accept, and we end the line with a colon. For example, a function named wash_clothing may have the single parameter (quarters), or a more elaborate version may have several parameters, such as (quarters, detergent, dirty_laundry, existing_supply_of_clean_clothing). Here, we accept (poem), which is really the same as accepting a single string. Choosing appropriately named parameters, however, will help others read and understand the code.
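As a minimal sketch of a function with more than one parameter, the example below lets the caller choose the delimiter. The name split_text and its delimiter parameter are illustrative assumptions and are not part of this lesson's code.

```python
def split_text(text, delimiter="/"):
    # Hypothetical helper: split `text` wherever `delimiter` appears.
    # The default value "/" is used when the caller gives no delimiter.
    return text.split(delimiter)

slash_lines = split_text("finally I saw you/sitting in white")
comma_parts = split_text("mists,mellow,fruitfulness", ",")
print(slash_lines)
print(comma_parts)
```

Giving delimiter a default value means the function can still be called with a single argument, just like a one-parameter function.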

After we've defined our parameters, we need to decide what we want to do with that information. Since I want to split the poem (a string) into separate lines (a list), I'm having Python return a list where the poem is split at the '/' character. Let's see what this will give us.

In [None]:
wsm_split = poem_splitter(WSM_Late_Spring)
for line in wsm_split:
    print(line)

Looks like a poem alright! Use the poem_splitter function here to turn the other three poems into split versions.

In [None]:
ed_split =
cs_split =
jk_split =

While these lines are excellent, today we're interested in the individual words that appear within the poem. As such, we need to take one more step in preprocessing our poems: tokenizing. 

We've tokenized informally in the past by using the split() method, namely by splitting at whitespace characters. Today, however, we're going to gently introduce you to __regular expressions__. Regular expressions are a tool that allows you to find and match character strings using far more sophisticated techniques than checking whether one string equals another. Notably, __regular expressions__ allow you to include wild card characters and to specify where a match must fall relative to other characters. You might find it helpful to open regex101.com in another window. Let's try this out.

In [None]:
import re
secret_message = "t$%11x349t;]g*r21o?>u<-p"
wp = re.compile("[a-z]")
print(''.join(wp.findall(secret_message)))


The code above uses a regular expression to find every character between lowercase a and z in the string, then joins them to reveal the secret message. This is only the tip of the iceberg when it comes to regular expressions, but today we'll be using them to delimit word tokens.
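To make wild cards and position a little more concrete, here is a small sketch using a line from Keats. The variable names are illustrative only.

```python
import re

line = "Season of mists and mellow fruitfulness"

# "." is a wild card that matches any single character,
# so "m.s" matches "m", then any one character, then "s"
wild_card = re.findall(r"m.s", line)

# "\b" marks a word boundary, so this finds whole words that start with "m"
word_start = re.findall(r"\bm\w*", line)

# "^" anchors the match to the very start of the string
first_word = re.findall(r"^\w+", line)

print(wild_card)
print(word_start)
print(first_word)
```

The first pattern matches part of a word, while the second and third use positional markers to control where a match may begin.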

In [None]:
def tokenizer(poem_split):
    # Raw string (r"...") so Python passes the backslashes to the regex engine unchanged
    word_pattern = re.compile(r"\w[\w\-']*\w|\w")
    tagged_poem = []
    for line_number, line in enumerate(poem_split):
        words = word_pattern.findall(line.rstrip())
        # Pair each line's tokens with its line number (counting from zero)
        tagged_poem.append([words, line_number])
    return tagged_poem

Let's break down the code above. First, the tokenizer function takes the split poem as a parameter. It then compiles a word pattern that treats each run of letters, digits, or underscores (including runs with internal hyphens or apostrophes) as a single word token. Next, we create an empty list to hold our tagged poem.
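To see what the word pattern matches on its own, here is a short sketch that applies it to a single line from "To Autumn". The variable names sample_line and tokens are illustrative.

```python
import re

word_pattern = re.compile(r"\w[\w\-']*\w|\w")

sample_line = "To bend with apples the moss'd cottage-trees,"
tokens = word_pattern.findall(sample_line)
print(tokens)
```

The apostrophe in "moss'd" and the hyphen in "cottage-trees" stay inside their tokens, while the trailing comma is left out because a token must end in a word character.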

A for-loop follows. For every line in the split poem, we strip trailing whitespace and find word tokens based on our regular expression pattern. Those tokens are then stored in a list alongside the line number (counting from zero), helping us retain important metadata. The tagged_poem is returned as a result. Let's see how this looks on our split poems.

In [None]:
wsm_tagged = tokenizer(wsm_split)
# Uncomment these lines once ed_split, cs_split, and jk_split exist
#ed_tagged = tokenizer(ed_split)
#cs_tagged = tokenizer(cs_split)
#jk_tagged = tokenizer(jk_split)

print(wsm_tagged)

Our poetry data is so tidy now! While this format may seem less readable to human eyes, it allows us to start doing some pretty interesting things regarding both textual analysis and literary criticism.
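As a rough illustration of how this format can be read back, the sketch below loops over a small hand-made stand-in for the tokenizer's output (it is not the real poem data) and unpacks each entry into its tokens and line number.

```python
# Hand-made stand-in shaped like the tokenizer's output: [tokens, line_number]
tagged_sample = [
    [["finally", "I", "saw", "you"], 7],
    [["sitting", "in", "white"], 8],
]

words_per_line = {}
for tokens, line_number in tagged_sample:
    # Record how many word tokens each line contains
    words_per_line[line_number] = len(tokens)
    print(line_number, tokens)

print(words_per_line)
```

Because every entry keeps its line number, any word we find later can be traced back to its place in the poem.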

One of the hallmarks of narratology, a midcentury branch of English criticism, is finding out who the speaker of a poem is. Rather than attributing the voice to the poem's author, critics typically treat the speaker as a figure embedded within the world of the poem itself.

By identifying how and where pronouns appear in a poem, we can begin to make claims about how the poet's speaker articulates themself. Below is a function designed to help us find where pronouns appear in each poem.

In [None]:
def pronoun_finder(tagged_poem):
    # Note: "You" and "Your" sit in both second-person lists, so they are reported twice
    sing_first_person = ["I", "Mine", "My", "Me"]
    sing_second_person = ["You", "Your", "Thee", "Thy", "Thou"]
    sing_third_person = ["He", "She", "It", "His", "Her", "Its", "Who", "Whom"]
    plural_first_person = ["We", "Our"]
    plural_second_person = ["You", "Your"]
    plural_third_person = ["They", "Their", "Those"]
    pronoun_terms = [sing_first_person, sing_second_person, sing_third_person,
                     plural_first_person, plural_second_person, plural_third_person]
    for category in pronoun_terms:
        for word in category:
            word = word.lower()
            # Each cell of the tagged poem is [tokens, line_number]
            for tokens, line_number in tagged_poem:
                for token in tokens:
                    token = token.lower()
                    if token == word:
                        print(line_number, token, tokens)

In [None]:
pronoun_finder(wsm_tagged)

While the function takes lists of pronouns and searches for them within each line of the poem, we may have some issues with the results. For one, "You" is ambiguous: we can't tell whether the poem is addressing a single person or a group. Second, and this may be a feature, the output is grouped by the term it's looking for rather than by line, so it doesn't show us which terms each line features. Still, let's see if this provides any insight into the poems.

In [None]:
pronoun_finder(wsm_tagged)
pronoun_finder(ed_tagged)
pronoun_finder(cs_tagged)
pronoun_finder(jk_tagged)
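One possible way to address the second issue noted earlier is to group matches by line instead of by search term. The function pronouns_by_line below is a hypothetical sketch, not part of the lesson's code, and it runs on a small hand-made sample shaped like the tokenizer's output.

```python
def pronouns_by_line(tagged_poem, pronouns):
    # Hypothetical helper: list the pronouns found on each line, in line order
    wanted = {pronoun.lower() for pronoun in pronouns}
    results = []
    for tokens, line_number in tagged_poem:
        found = [token for token in tokens if token.lower() in wanted]
        if found:
            results.append((line_number, found))
    return results

sample = [
    [["finally", "I", "saw", "you"], 0],
    [["sitting", "in", "white"], 1],
    [["for", "whom", "more", "than", "once"], 2],
]
by_line = pronouns_by_line(sample, ["I", "you", "whom"])
print(by_line)
```

Storing the search terms in a set also means a pronoun listed more than once is only reported once per occurrence.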

Working together, let's create two similar functions that will provide us information about where month names appear and also how many prepositions appear in each poem.

In [None]:
def month_finder(tagged_poem):
    Month_Names = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]

In [None]:
def prep_finder(tagged_poem):
    preps = ["about", "below", "excepting", "off", "toward", "above", "beneath", "for", "on", "under", "across", "beside", "besides", "from", "onto", "underneath", "after", "between", "in", "out", "until", "against", "beyond", "in front of", "outside", "up", "along", "but", "inside", "over", "upon", "among", "by", "in spite of", "past", "up to", "around", "concerning", "instead of", "regarding", "with", "at", "despite", "into", "since", "within", "because", "down", "like", "through", "without", "before", "during", "near", "throughout", "behind", "except", "of", "to"]
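As one possible starting point for the counting step, the sketch below tallies how often each term from a list appears in a tagged poem. The helper count_terms and the short term list are assumptions made for illustration; the sample is the first two lines of the Merwin poem, tokenized by hand.

```python
from collections import Counter

def count_terms(tagged_poem, terms):
    # Hypothetical helper: tally how often each single-word term appears
    wanted = {term.lower() for term in terms}
    counts = Counter()
    for tokens, line_number in tagged_poem:
        for token in tokens:
            if token.lower() in wanted:
                counts[token.lower()] += 1
    return counts

sample = [
    [["Coming", "into", "the", "high", "room", "again", "after", "years"], 0],
    [["after", "oceans", "and", "shadows", "of", "hills", "and", "the"], 1],
]
prep_counts = count_terms(sample, ["into", "after", "of", "in front of"])
total_preps = sum(prep_counts.values())
print(prep_counts)
print(total_preps)
```

Multi-word entries such as "in front of" can never equal a single token, so an approach like this one only counts one-word prepositions.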

## Challenge ##
Using regular expressions, develop a function that will search for verbs. One approach is to search for words after nouns that feature "ing", "ed", "s" or other typical verb endings; using the preposition finder should help you find some nouns.

*Note* This challenge is extremely difficult and it's unlikely you'll have complete accuracy finding verbs unless you import an NLP package. Still, it's worth testing which approaches garner the best results.