# Homework 1: Dealing with text
The purpose of this homework is to calculate the grade level of some text.

One fun way to exercise your familiarity with Python is to process text. Let's do this by assessing the readability of some text. Readability is often expressed in terms of grade level, e.g., "This sentence is at a seventh-grade reading level." For this assignment, you will write Python functions that calculate the readability grade level of a document using two different measures.



*   The Flesch-Kincaid Grade Level Formula estimates grade level using the average number of words per sentence and the average number of syllables per word:
$\hbox{F-K grade level} = 0.39 \times \hbox{avgWordsPerSentence} + 11.8 \times \hbox{avgSyllablesPerWord} - 15.59$
*   The SMOG (Simple Measure of Gobbledygook) Formula estimates grade level using the average number of complex words (i.e., words with three or more syllables) per sentence:
$\hbox{SMOG grade level} = 1.043 \times \sqrt{30 \times \hbox{avgComplexWordsPerSentence}} + 3.1291$
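To make the arithmetic concrete, here is a minimal sketch that evaluates both formulas directly. The function names `fk_grade` and `smog_grade` and the input averages are hypothetical, chosen only so the numbers are easy to follow; they are not part of the assignment.

```python
import math

def fk_grade(avg_words_per_sentence, avg_syllables_per_word):
    # Flesch-Kincaid grade level from the two averages
    return 0.39 * avg_words_per_sentence + 11.8 * avg_syllables_per_word - 15.59

def smog_grade(avg_complex_words_per_sentence):
    # SMOG grade level from the average number of complex words per sentence
    return 1.043 * math.sqrt(30 * avg_complex_words_per_sentence) + 3.1291

# Hypothetical averages (assumed values, not measured from any real text)
fk = fk_grade(15, 1.5)    # 0.39*15 + 11.8*1.5 - 15.59
smog = smog_grade(1.2)    # 1.043*sqrt(36) + 3.1291

print(round(fk, 2))    # 7.96
print(round(smog, 4))  # 9.3871
```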







## Part 1:

Due to the complexity of the English language, identifying the ends of sentences and the number of syllables in a word can be tricky. To make these tasks manageable, we will make the following simplifications:

* We will assume that any word that ends in a period, exclamation point, or question mark (ignoring trailing quotation marks) is the end of a sentence. For example, the following paragraph contains three sentences:

    
        What? He told me to "Go away."
        So, I left as soon as I could.

* In general, we will assume that any sequence of consecutive vowels (including 'y') corresponds to a syllable. Thus, "h**ea**v**y**" has two syllables and "**I**t**a**l**ia**n" has three syllables. However, words whose last letter is an "e" are a special case. If the "e" is preceded by a vowel (e.g., "tr**ee**") or the letter "l" (e.g., "wh**i**stl**e**"), or if the "e" is the only vowel in the word (e.g., "th**e**"), then it counts as a syllable. Otherwise, the trailing "e" does not count as a syllable (e.g., "sp**i**te").
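The first half of the syllable rule, grouping consecutive vowels, can be sketched with a regular expression. This is only an illustration of the grouping step: `vowel_groups` is a hypothetical helper name, and the trailing-"e" special case is deliberately left out.

```python
import re

def vowel_groups(word):
    # Hypothetical helper: find each run of consecutive vowels (including 'y').
    # Lowercasing first means a capital letter such as the "I" in "Italian" still matches.
    return re.findall(r"[aeiouy]+", word.lower())

for w in ["heavy", "Italian", "tree", "whistle", "the", "spite"]:
    print(w, vowel_groups(w))
```

Each group is one candidate syllable. For "spite" the groups are `['i', 'e']`, and the trailing-"e" rule then discards the final group, leaving one syllable; for "whistle" the final "e" follows an "l", so both groups count.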

> Define a function named `isEndOfSentence` that has a single word as input. The function should return `True` if the word ends in a period, exclamation point, or question mark (ignoring trailing quotation marks). For example, `isEndOfSentence("What?")` should return `True`, while `isEndOfSentence("So,")` should return `False`. *Hint:* to ignore trailing quotation marks, use the string `rstrip` method. For example, the following assignment will strip trailing quotation marks off of a word and save the resulting string in `stripped`:




            stripped = word.rstrip("\"\'")
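A quick way to see what `rstrip` does with a set of characters is to try it on a few words. The sample words below are made up for illustration. Note that `rstrip` removes any run of the listed characters from the right end only, so a leading quotation mark is left alone.

```python
# Sample words (illustrative only)
words = ['away."', "could.'", 'What?', '"Go', 'So,"']

# Strip trailing double and single quotation marks from each word
results = [word.rstrip("\"'") for word in words]

for word, stripped in zip(words, results):
    print(repr(word), "->", repr(stripped))
```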
> Define a function named `countSyllables` that has a single word as input. The function should return the number of syllables in that word (using the above rules for estimating syllables). For example, `countSyllables("people")` should return 2, while `countSyllables("mezzanine")` should return 3.



In [None]:
def isEndOfSentence(word):
    """
    Report whether a word is the last one in its sentence.
    Attributes:
        word (string): the word to test
    Output:
        returns True if the word ends in '.', '?', or '!' (ignoring
        trailing quotation marks), and False otherwise
    """
    end_punct = ('.', '?', '!')
    stripped = word.rstrip("\"\'")
    # endswith is safe on an empty string, unlike indexing with stripped[-1]
    return stripped.endswith(end_punct)
isEndOfSentence("What!")

In [None]:
print('The sentence "What?" should return True by the function isEndOfSentence')
print('Testing "What?"', isEndOfSentence("What?"))
print()
print('The sentence "So," should return False by the function isEndOfSentence')
print('Testing "So,"', isEndOfSentence("So,"))

In [None]:
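def countSyllables(word):
    """
    Return a heuristic estimate of the number of syllables in a word.
    Attributes:
        word (string): the word to analyze; case and non-letter
            characters (e.g. attached punctuation) are ignored
    Output:
        This function returns an int which is the number of syllables
        (0 if the word contains no letters)
    Question:
        What if the word is invalid? What should you do?
    """
    vowels = 'aeiouy'
    # Normalize so that "Happiness." is treated like "happiness"
    letters = [c for c in word.lower() if c.isalpha()]
    nOs = 0
    prev_is_vowel = False
    for c in letters:
        is_vowel = c in vowels
        # Count only the first vowel of each run of consecutive vowels
        if is_vowel and not prev_is_vowel:
            nOs += 1
        prev_is_vowel = is_vowel
    # A trailing "e" does not count unless it follows a vowel or "l",
    # or it is the only vowel group in the word
    if (len(letters) >= 2 and letters[-1] == 'e'
            and letters[-2] not in vowels and letters[-2] != 'l' and nOs > 1):
        nOs -= 1
    return nOs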

In [None]:
print('"people" should return two when you run countSyllables("people")')
print('Number of syllables is ', countSyllables("people"))
print()
print('"mezzanine" should return three when you run countSyllables("mezzanine")')
print('Number of syllables is', countSyllables("mezzanine"))

Now define your own function which counts the number of sentences in the document.

In [None]:
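def numberOfSentences(document):
  nOS = 0
  end_punct = ['.', '?', '!']
  for character in range(len(document)):
    # Count only the last mark in a run of end punctuation, so "?!" counts once.
    # The bounds check avoids an IndexError when the document ends with a mark.
    is_last = character + 1 == len(document)
    if document[character] in end_punct and (is_last or document[character + 1] not in end_punct):
      nOS += 1
  return nOS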
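
Note that `numberOfSentences` scans characters, so it also counts a mark in the middle of a token such as `Happiness.--That`. As an alternative, here is a minimal sketch that applies the word-based rule from Part 1 instead. The names `ends_sentence` and `count_sentences_by_word` are hypothetical, and which behavior is preferable depends on the text being scored.

```python
def ends_sentence(word):
    # Hypothetical helper mirroring the Part 1 rule: ignore trailing
    # quotation marks, then check for '.', '!' or '?'
    return word.rstrip("\"'").endswith((".", "!", "?"))

def count_sentences_by_word(text):
    # Split on whitespace and count the words that end a sentence
    return sum(1 for word in text.split() if ends_sentence(word))

sample = 'What? He told me to "Go away."\nSo, I left as soon as I could.'
print(count_sentences_by_word(sample))                 # 3
print(count_sentences_by_word("Happiness.--That to"))  # 0
```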

Now define a function which calculates the number of words in a sentence.

In [None]:
def numberOfWords(sentence):
  nOW = len(sentence.split())
  return nOW

Now define a function which calculates the number of syllables in a sentence.

In [None]:
def syllablesInSentence(sentence):
  sIS = 0
  for word in sentence.split():
    sIS += countSyllables(word)
  return sIS

Finally, let's put it all together ...

In [None]:
document = """
We hold these truths to be self-evident, that all men are created equal, that 
they are endowed by their Creator with certain unalienable Rights, that among 
these are Life, Liberty and the pursuit of Happiness.--That to secure these 
rights, Governments are instituted among Men, deriving their just powers from 
the consent of the governed, --That whenever any Form of Government becomes 
destructive of these ends, it is the Right of the People to alter or to abolish 
it, and to institute new Government, laying its foundation on such principles 
and organizing its powers in such form, as to them shall seem most likely to 
effect their Safety and Happiness. Prudence, indeed, will dictate that 
Governments long established should not be changed for light and transient 
causes; and accordingly all experience hath shewn, that mankind are more 
disposed to suffer, while evils are sufferable, than to right themselves by 
abolishing the forms to which they are accustomed.
"""

Calculate the answer yourself by hand. Now check your code.

In [None]:
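def fk_level(document):
  """
  Return the Flesch-Kincaid grade level of a document.
  Assumes the document contains at least one word and one sentence.
  """
  nWords = numberOfWords(document)
  avgWordsPerSentence = nWords / numberOfSentences(document)
  avgSyllablesPerWord = syllablesInSentence(document) / nWords

  # The result gets its own name so it does not shadow the function
  grade = 0.39 * avgWordsPerSentence + 11.8 * avgSyllablesPerWord - 15.59
  return grade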

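def smog_level(document):
  """
  Return the SMOG grade level of a document.
  Assumes the document contains at least one sentence.
  """
  # Count complex words, i.e. words with three or more syllables
  nOCW = 0
  words = document.split()
  for word in words:
    if countSyllables(word) >= 3:
      nOCW += 1
  avgComplexWordsPerSentence = nOCW / numberOfSentences(document)

  # The result gets its own name so it does not shadow the function
  grade = 1.043 * (30 * avgComplexWordsPerSentence)**0.5 + 3.1291
  return grade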