The purpose of this project is to use Python 3 to perform a text analysis comparing Robert Frost's poem "Home Burial" with Edgar Allan Poe's "Annabel Lee." We will extract the most frequently used words in each poem and then compare the results. Both poems center on death and love. In Frost's "Home Burial," the narrative focuses on the collapse of a marriage after the death of the couple's young child. In Poe's "Annabel Lee," the narrative focuses on the love the narrator shared with a woman named Annabel, who passed away. Although the themes overlap, we expect Frost's "Home Burial" to have more negative words than Poe's "Annabel Lee" among the 10 most frequently used words, because "Home Burial" focuses on marital conflict, while "Annabel Lee" focuses on the narrator's fond memories.

#### Python Process:

In [94]:
import operator
import time
import string
import re
from collections import defaultdict
from nltk.corpus import stopwords

In [95]:
try:
    stops = stopwords.words('english')
except LookupError:
    import nltk
    nltk.download('stopwords')
    stops = stopwords.words('english')

> The try/except block tries to load the English stopwords. If they are not installed, it downloads them and then loads them.

In [96]:
stops = set(stops) 

> The code above converts the stopword list to a set, because checking whether a word is in a set is faster than checking whether it is in a list.
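
> As a minimal sketch of this point, the snippet below filters a few words against both a list and a set. The short `stop_list` is a hypothetical stand-in, not the real NLTK stopword list. Both containers give the same result; the set simply answers each membership test by hashing instead of scanning every element.

```python
# Hypothetical stand-in for the NLTK stopword list (not the real list).
stop_list = ['the', 'and', 'of', 'to', 'a']
stop_set = set(stop_list)

words = ['the', 'sea', 'and', 'love']

# Membership tests return the same answers for both containers;
# only the lookup cost differs (list: linear scan, set: hash lookup).
kept_from_list = [w for w in words if w not in stop_list]
kept_from_set = [w for w in words if w not in stop_set]

print(kept_from_list == kept_from_set)
print(kept_from_set)
```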

In [97]:
f = open('homeburial_clean.txt', 'rb')

> We use the built-in open function to open the Home Burial file. In the mode string 'rb', the 'r' opens the file for reading and the 'b' tells Python to read it as bytes. Note: the poem was downloaded from gutenberg.org, and the text file was cleaned with UNIX tools to remove any erroneous text.

In [98]:
start = time.time()
frost = defaultdict(int)
punc = string.punctuation
for line in f:
    #remove punctuation
    cln_line = re.sub('[' + punc + ']', '', line.decode('utf-8'))
    cln_line = cln_line.lower()  # convert to lowercase
    spl_line = cln_line.split()  # splits a string, by default on spaces
    for word in spl_line:
        #if the word is a stopword then the loop continues without going further
        if word in stops:
            continue
        frost.setdefault(word, 0) # set word to 0 if it is not in frost
        frost[word] += 1 #increase the value by 1

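> The loop removes punctuation from each line, converts it to lowercase, splits it into words, skips any stopwords, and counts each remaining word. If a word has already been counted, its value increases by 1. Because apostrophes are removed along with the other punctuation, "don't" is counted as "dont."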
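
> The same counting steps can also be sketched with `collections.Counter`. This is an alternative under stated assumptions, not the code used to produce the results in this notebook: the function name `count_words`, the sample lines, and the small stopword set are all hypothetical, and `re.escape` is added so that characters such as `]` and `\` are treated literally inside the character class.

```python
import re
import string
from collections import Counter

def count_words(lines, stops):
    """Count non-stopword tokens across an iterable of text lines."""
    # Escape the punctuation so every character is matched literally.
    punc_pattern = re.compile('[' + re.escape(string.punctuation) + ']')
    counts = Counter()
    for line in lines:
        cleaned = punc_pattern.sub('', line).lower()
        # Keep only the words that are not stopwords.
        counts.update(w for w in cleaned.split() if w not in stops)
    return counts

# Hypothetical sample input, not text from either poem file.
sample_lines = ["Don't go. Don't!", "Oh, where do you go?"]
sample_stops = {'do', 'you', 'where'}

counts = count_words(sample_lines, sample_stops)
print(counts)
```

> Wrapping the steps in one function would also allow the same code to be reused for both poems instead of repeating the loop.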

In [99]:
sorted_frost = sorted(frost.items(), key=operator.itemgetter(1), reverse=True)
elapsed = time.time()-start
print('Run took', elapsed, ' seconds.')
print('Number of distinct words: ', len(sorted_frost))

Run took 0.11928701400756836  seconds.
Number of distinct words:  299


> The code above sorts the words by frequency in descending order, then prints the elapsed time and the number of distinct words.
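
> To illustrate the sort on a small scale, the sketch below applies the same `sorted` call to a three-word dictionary. The variable names `small_counts`, `by_freq`, and `top_two` are hypothetical; the counts are three of the "Home Burial" values reported in this notebook. Slicing with `[:n]` is one way to take the top entries without an index error when fewer than `n` words exist.

```python
import operator

# Three (word, count) pairs taken from the "Home Burial" results.
small_counts = {'see': 8, 'dont': 16, 'know': 7}

# itemgetter(1) sorts on the count; reverse=True puts the largest first.
by_freq = sorted(small_counts.items(), key=operator.itemgetter(1), reverse=True)

# Slicing is safe even if the list has fewer entries than requested.
top_two = by_freq[:2]
print(top_two)  # [('dont', 16), ('see', 8)]
```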

In [100]:
top_n = 10
y=[]
for pair in range(top_n):
    y.append([sorted_frost[pair][1]])
    print(sorted_frost[pair])

('dont', 16)
('see', 8)
('know', 7)
('man', 7)
('go', 7)
('must', 6)
('cant', 6)
('time', 5)
('oh', 5)
('little', 5)


> The code above outputs the results for the top 10 words in Frost's poem "Home Burial."

In [102]:
f = open('annabellee_clean.txt', 'rb')

> We use the built-in open function to open the Annabel Lee file. In the mode string 'rb', the 'r' opens the file for reading and the 'b' tells Python to read it as bytes. Note: the poem was downloaded from gutenberg.org, and the text file was cleaned with UNIX tools to remove any erroneous text.

In [103]:
start = time.time()
poe = defaultdict(int)
punc = string.punctuation
for line in f:
    cln_line = re.sub('[' + punc + ']', '', line.decode('utf-8')) #remove punctuation
    cln_line = cln_line.lower()  # convert to lowercase
    spl_line = cln_line.split()  # splits a string, by default on spaces
    for word in spl_line:
        #if the word is a stopword then the loop continues without going further
        if word in stops: 
            continue
        poe.setdefault(word, 0) # set word to 0 if it is not in poe
        poe[word] += 1 #increase the value by 1

In [104]:
sorted_poe = sorted(poe.items(), key=operator.itemgetter(1), reverse=True)
elapsed = time.time()-start
print('Run took', elapsed, ' seconds.')
print('Number of distinct words: ', len(sorted_poe))

Run took 0.13471293449401855  seconds.
Number of distinct words:  79


> The code above sorts the words by frequency in descending order, then prints the elapsed time and the number of distinct words.

In [105]:
top_n = 10
y=[]
for pair in range(top_n):
    y.append([sorted_poe[pair][1]])
    print(sorted_poe[pair])

('sea', 8)
('annabel', 7)
('lee', 7)
('love', 6)
('kingdom', 5)
('beautiful', 4)
('many', 3)
('heaven', 3)
('ago', 2)
('maiden', 2)


> The code above outputs the results for the top 10 words in Poe's "Annabel Lee."

#### Conclusion:

Using Python, this project performed a text analysis of Robert Frost's poem "Home Burial" and compared it to Edgar Allan Poe's "Annabel Lee." We predicted that "Home Burial" would have more negative words among its most frequently used words than "Annabel Lee." The results show that the top 10 words in "Home Burial," in descending order, were: dont, see, know, man, go, must, cant, time, oh, little. The top 10 words in Poe's "Annabel Lee" were: sea, annabel, lee, love, kingdom, beautiful, many, heaven, ago, maiden. From these results, we can infer that "Annabel Lee" more frequently used positive words such as "love," "beautiful," and "heaven," while "Home Burial" more frequently used negative words such as "dont" and "cant" ("don't" and "can't" with the apostrophes removed).
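
One way to make this comparison explicit is to tally the labelled words in each top-10 list. The sketch below is only an illustration: the word counts are copied from the results above, but the `negative` and `positive` sets are hand-picked from the examples named in this conclusion rather than taken from a validated sentiment lexicon, and the helper name `tally` is hypothetical.

```python
# Top-10 (word, count) pairs copied from the results reported above.
frost_top = [('dont', 16), ('see', 8), ('know', 7), ('man', 7), ('go', 7),
             ('must', 6), ('cant', 6), ('time', 5), ('oh', 5), ('little', 5)]
poe_top = [('sea', 8), ('annabel', 7), ('lee', 7), ('love', 6), ('kingdom', 5),
           ('beautiful', 4), ('many', 3), ('heaven', 3), ('ago', 2), ('maiden', 2)]

# Assumption: hand-picked labels based only on the examples in the
# conclusion; this is not a validated sentiment lexicon.
negative = {'dont', 'cant'}
positive = {'love', 'beautiful', 'heaven'}

def tally(top_words, labelled):
    """Return (number of labelled words, their summed occurrences)."""
    hits = [(w, c) for w, c in top_words if w in labelled]
    return len(hits), sum(c for _, c in hits)

print('Frost negative:', tally(frost_top, negative))
print('Frost positive:', tally(frost_top, positive))
print('Poe negative:', tally(poe_top, negative))
print('Poe positive:', tally(poe_top, positive))
```

Under these hand-picked labels, the tally depends entirely on which words are placed in each set, so a different labelling could change the counts.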