### Discover insights into classic texts
<div data-testid="markdown" class="spacing-tight__2Gp7GTqG0TykPQ18OnUOVt markdown__1eeYJ4WPKUcvX_LDDGJR12"><p class="p__1qg33Igem5pAgn4kPMirjw">Novels and other texts offer insight into ideologies and places that are often unfamiliar to the reader. Reading a written work reveals the author’s opinions on their chosen topic, helping you understand both the topic and how the author thinks.</p>
<p class="p__1qg33Igem5pAgn4kPMirjw">In this project you will perform a natural language parsing analysis to gain deeper insight into one of two famous and often-discussed works in the public domain: <a href="http://www.gutenberg.org/ebooks/174" target="_blank" rel="noopener" class="gamut-15hd59n-Anchor e14vpv2g0">Oscar Wilde’s <em>The Picture of Dorian Gray</em></a> or <a href="http://www.gutenberg.org/ebooks/6130" target="_blank" rel="noopener" class="gamut-15hd59n-Anchor e14vpv2g0">Homer’s <em>The Iliad</em></a>! Fear not if you haven’t heard of or read either work: one of the beauties of natural language parsing with regular expressions is that it lets you gain insight into lengthy texts without a formal read!</p>
<p class="p__1qg33Igem5pAgn4kPMirjw">By the end of this project, you will have identified the main topics of discussion in the work of your choosing and can begin to discern some of the author’s thoughts and beliefs!</p>
</div>

```python
import import_ipynb  # lets the helper notebooks below be imported as modules

from nltk import pos_tag, RegexpParser
from tokenize_words import word_sentence_tokenize
from chunk_counters import np_chunk_counter, vp_chunk_counter

# import text of choice here (lowercased so counts ignore capitalization)
with open('dorian_gray.txt', encoding='utf-8') as f:
    dorian_gray_text = f.read().lower()
with open('the_iliad.txt', encoding='utf-8') as f:
    iliad_text = f.read().lower()

# sentence and word tokenize text here (this run analyzes The Iliad)
word_tokenized_text = word_sentence_tokenize(iliad_text)

# store and print any word-tokenized sentence here
single_word_tokenized_sentence = word_tokenized_text[0]
print(single_word_tokenized_sentence)

# part-of-speech tag each word-tokenized sentence
pos_tagged_text = []
for sentence in word_tokenized_text:
    pos_tagged_text.append(pos_tag(sentence))

# store and print any part-of-speech tagged sentence here
single_pos_sentence = pos_tagged_text[0]
print(single_pos_sentence)

# noun phrase grammar: optional determiner, any number of adjectives, then a singular noun
np_chunk_grammar = 'NP: {<DT>?<JJ>*<NN>}'
np_chunk_parser = RegexpParser(np_chunk_grammar)

# verb phrase grammar: a noun phrase, then any verb tag, then an optional adverb
vp_chunk_grammar = 'VP: {<DT>?<JJ>*<NN><VB.*><RB.?>?}'
vp_chunk_parser = RegexpParser(vp_chunk_grammar)

# chunk each pos-tagged sentence with both parsers
np_chunked_text = []
vp_chunked_text = []
for pos_tagged_sentence in pos_tagged_text:
    np_chunked_text.append(np_chunk_parser.parse(pos_tagged_sentence))
    vp_chunked_text.append(vp_chunk_parser.parse(pos_tagged_sentence))

# store and print the most common NP-chunks here
most_common_np_chunks = np_chunk_counter(np_chunked_text)
print('\n\n', most_common_np_chunks)
```

```text
['the', 'iliad', 'of', 'homer', 'translated', 'by', 'alexander', 'pope', ',', 'with', 'notes', 'by', 'the', 'rev', '.', 'theodore', 'alois', 'buckley', ',', 'm.a.', ',', 'f.s.a', '.', 'and', 'flaxman', "'s", 'designs', '.']
[('the', 'DT'), ('iliad', 'NN'), ('of', 'IN'), ('homer', 'NN'), ('translated', 'VBN'), ('by', 'IN'), ('alexander', 'NN'), ('pope', 'NN'), (',', ','), ('with', 'IN'), ('notes', 'NNS'), ('by', 'IN'), ('the', 'DT'), ('rev', 'NN'), ('.', '.'), ('theodore', 'NN'), ('alois', 'NN'), ('buckley', 'NN'), (',', ','), ('m.a.', 'NN'), (',', ','), ('f.s.a', 'NN'), ('.', '.'), ('and', 'CC'), ('flaxman', 'NN'), ("'s", 'POS'), ('designs', 'NNS'), ('.', '.')]


 [((('hector', 'NN'),), 322), ((('i', 'NN'),), 277), ((('jove', 'NN'),), 257), ((('troy', 'NN'),), 208), ((('vain', 'NN'),), 195), ((('war', 'NN'),), 193), ((('son', 'NN'),), 170), ((('thou', 'NN'),), 158), ((('the', 'DT'), ('plain', 'NN')), 157), ((('the', 'DT'), ('field', 'NN')), 154), ((('the', 'DT'), ('ground', 'NN')), 138
```
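The chunk grammars above are regular expressions over part-of-speech tags rather than over the words themselves. The sketch below illustrates that idea without NLTK. `tag_pattern_to_regex` and `chunk` are hypothetical helpers written for illustration, not `RegexpParser`'s actual implementation, and they only handle simple patterns built from `<TAG>` units, `.` wildcards, and `?`/`*`/`+` quantifiers. The demo input is the start of the tagged sentence printed above.

```python
import re


def tag_pattern_to_regex(tag_pattern):
    """Translate a simple tag pattern like '<DT>?<JJ>*<NN>' into a regex
    over a tag string like '<DT><JJ><NN>' (hypothetical helper)."""
    def unit(match):
        # '.' inside a <...> unit may match any character except '<' or '>'
        return '(?:<' + match.group(1).replace('.', '[^<>]') + '>)'
    return re.sub(r'<([^<>]+)>', unit, tag_pattern)


def chunk(tagged_sentence, tag_pattern):
    """Return the word tuples whose tag sequence matches tag_pattern."""
    regex = re.compile(tag_pattern_to_regex(tag_pattern))
    tag_string = ''
    offsets = []  # where each token's <TAG> starts in tag_string
    for _, tag in tagged_sentence:
        offsets.append(len(tag_string))
        tag_string += '<' + tag + '>'
    chunks = []
    for match in regex.finditer(tag_string):
        if match.end() == match.start():
            continue  # skip empty matches from patterns that allow them
        chunks.append(tuple(word for (word, _), offset
                            in zip(tagged_sentence, offsets)
                            if match.start() <= offset < match.end()))
    return chunks


# Start of the tagged Iliad sentence printed above
tagged = [('the', 'DT'), ('iliad', 'NN'), ('of', 'IN'), ('homer', 'NN'),
          ('translated', 'VBN'), ('by', 'IN'), ('alexander', 'NN'),
          ('pope', 'NN'), (',', ','), ('with', 'IN'), ('notes', 'NNS'),
          ('by', 'IN'), ('the', 'DT'), ('rev', 'NN')]

print(chunk(tagged, '<DT>?<JJ>*<NN>'))
# [('the', 'iliad'), ('homer',), ('alexander',), ('pope',), ('the', 'rev')]
print(chunk([('the', 'DT'), ('hero', 'NN'), ('said', 'VBD')],
            '<DT>?<JJ>*<NN><VB.*><RB.?>?'))
# [('the', 'hero', 'said')]
```

With this NP grammar, plural nouns tagged NNS (such as notes and designs in the output above) are never chunked, because `<NN>` matches only the exact tag NN; a pattern such as `<NN.*>` would also match NNS, NNP, and NNPS.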

##### Analysis for The Picture of Dorian Gray

Looking at most_common_np_chunks, you can identify important characters in the text, such as Henry, Harry, Dorian Gray, and Basil, based on how frequently they appear. The noun phrase "the picture" also appears to be highly relevant.

##### Analysis for The Iliad

Looking at most_common_np_chunks, you can identify important characters in the text, such as Hector and Jove, based on how frequently they appear. A location of importance, Troy, is also mentioned often, and the high count for "war" suggests that war is a central theme.
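Both analyses rank chunks by how often they occur. The chunk_counters module is not shown in this notebook, so the following is only a hypothetical sketch of that kind of counter, built on `collections.Counter`. In NLTK, `RegexpParser.parse` returns `nltk.Tree` objects, so a real counter would walk the chunk subtrees; to stay dependency-free, this sketch assumes each chunk is represented as a `(label, [(word, tag), ...])` pair, and the cutoff `n=10` is arbitrary.

```python
from collections import Counter


def chunk_counter(chunked_sentences, label, n=10):
    """Return the n most common chunks carrying `label` (n is an arbitrary cutoff).

    Assumed (hypothetical) format: each chunked sentence is a list whose
    items are either plain (word, tag) pairs or (label, [(word, tag), ...])
    pairs standing in for chunks.
    """
    counts = Counter()
    for sentence in chunked_sentences:
        for first, second in sentence:
            if first == label and isinstance(second, list):
                counts[tuple(second)] += 1  # tuples are hashable; lists are not
    return counts.most_common(n)


# Toy input (not taken from either text)
chunked = [
    [('NP', [('hector', 'NN')]), ('fought', 'VBD')],
    [('NP', [('the', 'DT'), ('plain', 'NN')]), ('and', 'CC'),
     ('NP', [('hector', 'NN')])],
]
print(chunk_counter(chunked, 'NP'))
# [((('hector', 'NN'),), 2), ((('the', 'DT'), ('plain', 'NN')), 1)]
```

Returning `(chunk, count)` pairs from `most_common` mirrors the shape of the NP output printed above.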

```python
# store and print the most common VP-chunks here
most_common_vp_chunks = vp_chunk_counter(vp_chunked_text)
print(most_common_vp_chunks)
```

```text
[((("'t", 'NN'), ('is', 'VBZ')), 19), ((('i', 'NN'), ('am', 'VBP')), 11), ((("'t", 'NN'), ('was', 'VBD')), 11), ((('the', 'DT'), ('hero', 'NN'), ('said', 'VBD')), 9), ((('i', 'NN'), ('know', 'VBP')), 8), ((('i', 'NN'), ('saw', 'VBD')), 8), ((('the', 'DT'), ('scene', 'NN'), ('lies', 'VBZ')), 7), ((('i', 'NN'), ('was', 'VBD')), 6), ((('confess', 'NN'), ("'d", 'VBD')), 6), ((('the', 'DT'), ('scene', 'NN'), ('is', 'VBZ')), 6), ((('view', 'NN'), ("'d", 'VBD')), 5), ((('i', 'NN'), ('felt', 'VBD')), 5), ((('i', 'NN'), ('bear', 'VBP')), 5), ((('hector', 'NN'), ('is', 'VBZ')), 5), ((('vain', 'NN'), ('was', 'VBD')), 5), ((('homer', 'NN'), ('was', 'VBD')), 4), ((('i', 'NN'), ('have', 'VBP')), 4), ((('hunger', 'NN'), ('was', 'VBD')), 4), ((('glory', 'NN'), ('lost', 'VBN')), 4), ((('i', 'NN'), ('see', 'VBP')), 4), ((('war', 'NN'), ('be', 'VB')), 4), ((('the', 'DT'), ('weapon', 'NN'), ('stood', 'VBD')), 4), ((('i', 'NN'), ('go', 'VBP')), 4), ((('the', 'DT'), ('silence', 'NN'), ('broke', 'VBD')), 4),
```

##### Analysis for The Picture of Dorian Gray

Looking at most_common_vp_chunks, some interesting findings appear: the verb phrases "i want", "i know", and "i have" occur frequently, suggesting a theme of desire and need.
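To look more closely at first-person verb phrases like these, you could filter the VP counts for chunks that begin with "i". The helper below, `first_person_verbs`, is a hypothetical sketch that assumes input in the `(chunk, count)` format printed above and merges counts for the same verb. Because the Dorian Gray VP output is not shown in this notebook, the demo uses a few entries copied from the Iliad VP output instead.

```python
from collections import Counter


def first_person_verbs(vp_counts, pronoun='i'):
    """Sum counts of VP-chunks that start with `pronoun`, keyed by verb."""
    verbs = Counter()
    for chunk, count in vp_counts:
        # expect (pronoun, tag) followed directly by a verb-tagged token
        if len(chunk) >= 2 and chunk[0][0] == pronoun and chunk[1][1].startswith('VB'):
            verbs[chunk[1][0]] += count
    return verbs.most_common()


# A few (chunk, count) entries copied from the Iliad VP output above
iliad_vp_counts = [
    ((("'t", 'NN'), ('is', 'VBZ')), 19),
    ((('i', 'NN'), ('am', 'VBP')), 11),
    ((('the', 'DT'), ('hero', 'NN'), ('said', 'VBD')), 9),
    ((('i', 'NN'), ('know', 'VBP')), 8),
    ((('i', 'NN'), ('saw', 'VBD')), 8),
    ((('i', 'NN'), ('was', 'VBD')), 6),
    ((('i', 'NN'), ('have', 'VBP')), 4),
]
print(first_person_verbs(iliad_vp_counts))
# [('am', 11), ('know', 8), ('saw', 8), ('was', 6), ('have', 4)]
```

Summing by verb means a chunk with a trailing adverb (for example "i know not") is counted together with the plain "i know".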

##### Analysis for The Iliad

Looking at most_common_vp_chunks, you can see that verb phrases matching your chunk grammar appear far less often in The Iliad than noun phrases do: the most common VP-chunk occurs 19 times, while the most common NP-chunk occurs 322 times. This may reflect the text's poetic form; this edition is Alexander Pope's verse translation, and its word order often departs from conventional prose grammar. Even when chunks are not found, their absence can give you insight!
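One way to make this comparison concrete is to total the NP and VP matches across tagged sentences. The sketch below hand-translates the notebook's two grammars into Python regular expressions over a tag string such as `'<DT><NN><VBD>'`; this translation and the names `NP_REGEX`, `VP_REGEX`, and `chunk_totals` are illustrative assumptions, not NLTK code. The demo uses the tagged opening sentence of The Iliad printed earlier.

```python
import re

# Hand-translated versions of the notebook's chunk grammars, applied to a
# tag string such as '<DT><NN><VBD>' (an illustrative simplification).
NP_REGEX = re.compile(r'(?:<DT>)?(?:<JJ>)*<NN>')
VP_REGEX = re.compile(r'(?:<DT>)?(?:<JJ>)*<NN><VB[^<>]*>(?:<RB[^<>]?>)?')


def chunk_totals(tagged_sentences):
    """Return (np_total, vp_total) summed over all tagged sentences."""
    np_total = vp_total = 0
    for sentence in tagged_sentences:
        tag_string = ''.join('<' + tag + '>' for _, tag in sentence)
        # findall returns non-overlapping matches, scanning left to right
        np_total += len(NP_REGEX.findall(tag_string))
        vp_total += len(VP_REGEX.findall(tag_string))
    return np_total, vp_total


# Tagged opening sentence of The Iliad, as printed earlier in the notebook
iliad_opening = [
    ('the', 'DT'), ('iliad', 'NN'), ('of', 'IN'), ('homer', 'NN'),
    ('translated', 'VBN'), ('by', 'IN'), ('alexander', 'NN'), ('pope', 'NN'),
    (',', ','), ('with', 'IN'), ('notes', 'NNS'), ('by', 'IN'), ('the', 'DT'),
    ('rev', 'NN'), ('.', '.'), ('theodore', 'NN'), ('alois', 'NN'),
    ('buckley', 'NN'), (',', ','), ('m.a.', 'NN'), (',', ','), ('f.s.a', 'NN'),
    ('.', '.'), ('and', 'CC'), ('flaxman', 'NN'), ("'s", 'POS'),
    ('designs', 'NNS'), ('.', '.'),
]
print(chunk_totals([iliad_opening]))
# (11, 1)
```

The two totals overlap rather than partition the text: every VP match begins with a noun phrase, and the two grammars are applied to each sentence independently.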