# Prologue

- In this project I perform a natural language parsing analysis to gain deeper insight into a famous and often-discussed work in the public domain: Homer's <i>The Iliad</i>, here in Alexander Pope's translation.

# Importing Modules 

In [3]:
import re
import nltk

# Importing Text

In [1]:
with open("the_illiad.txt", encoding="utf-8") as f:
    text = f.read().lower()

# Sentence Tokenization

In [6]:
from nltk.tokenize import sent_tokenize

sentences = sent_tokenize(text)

# Word Tokenization

In [8]:
from nltk.tokenize import word_tokenize

tokens = [word_tokenize(sentence) for sentence in sentences]

# P-O-S Tagging

In [10]:
from nltk import pos_tag

# Each element of tokens is the word list for one sentence.
tagged = [pos_tag(sentence_tokens) for sentence_tokens in tokens]

In [14]:
print(tagged[0])

[('the', 'DT'), ('iliad', 'NN'), ('of', 'IN'), ('homer', 'NN'), ('translated', 'VBN'), ('by', 'IN'), ('alexander', 'NN'), ('pope', 'NN'), (',', ','), ('with', 'IN'), ('notes', 'NNS'), ('by', 'IN'), ('the', 'DT'), ('rev', 'NN'), ('.', '.')]


# Parsing

## NP-Chunking

In [15]:
NP_ChunkGram = "NP: {<DT>?<JJ>*<NN>}"

In [19]:
from nltk import RegexpParser

NP_ChunkParser = RegexpParser(NP_ChunkGram)

In [22]:
NP_ChunkedText = [NP_ChunkParser.parse(sentence) for sentence in tagged]
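
To make the NP grammar concrete, here is a minimal sketch that applies it to a single sentence. The tagged tokens are copied by hand from the `tagged[0]` output shown earlier, so the sketch needs no tagger model; the names `tagged_sentence`, `np_parser`, and `np_chunks` are illustrative and not part of the notebook.

```python
from nltk import RegexpParser

# Hand-copied from the tagged[0] output above, so no tagger model
# or corpus download is needed to run this sketch.
tagged_sentence = [
    ("the", "DT"), ("iliad", "NN"), ("of", "IN"), ("homer", "NN"),
    ("translated", "VBN"), ("by", "IN"), ("alexander", "NN"), ("pope", "NN"),
    (",", ","), ("with", "IN"), ("notes", "NNS"), ("by", "IN"),
    ("the", "DT"), ("rev", "NN"), (".", "."),
]

# Same grammar as NP_ChunkGram: optional determiner, any adjectives, one NN.
np_parser = RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = np_parser.parse(tagged_sentence)

# Collect the words of every subtree labelled NP.
np_chunks = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(np_chunks)
# ['the iliad', 'homer', 'alexander', 'pope', 'the rev']
```

Two things stand out: `<NN>` matches only the tag `NN`, so the plural `notes` (`NNS`) is not chunked, and because the pattern allows a single noun, `alexander` and `pope` become two separate chunks.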

## VP-Chunking

In [20]:
VP_ChunkGram = "VP: {<DT>?<JJ>*<NN><VB.*><RB.?>?}"

In [21]:
VP_ChunkParser = RegexpParser(VP_ChunkGram)

In [23]:
VP_ChunkedText = [VP_ChunkParser.parse(sentence) for sentence in tagged]
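
The VP grammar is fairly narrow, which matters when reading the counts later on. The sketch below, using hand-tagged example sentences that are illustrative and not taken from the text, suggests what the pattern does and does not match. The helper `vp_chunks` is a hypothetical name introduced here.

```python
from nltk import RegexpParser

# Same grammar as VP_ChunkGram: a noun phrase ending in NN, then a verb,
# then an optional adverb.
vp_parser = RegexpParser("VP: {<DT>?<JJ>*<NN><VB.*><RB.?>?}")

def vp_chunks(tagged_sentence):
    """Return each VP chunk as a tuple of (word, tag) pairs."""
    tree = vp_parser.parse(tagged_sentence)
    return [
        tuple(subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == "VP")
    ]

# Illustrative hand-tagged inputs (assumptions, not drawn from the corpus).
noun_subject = [("the", "DT"), ("hero", "NN"), ("said", "VBD"), ("softly", "RB")]
pronoun_subject = [("he", "PRP"), ("said", "VBD"), ("softly", "RB")]
plural_subject = [("the", "DT"), ("heroes", "NNS"), ("said", "VBD")]

print(vp_chunks(noun_subject))     # one chunk covering all four tokens
print(vp_chunks(pronoun_subject))  # [] because PRP is not NN
print(vp_chunks(plural_subject))   # [] because NNS is not NN
```

Only a singular common noun (`NN`) directly followed by a verb is chunked. In this notebook the text is lowercased before tagging, and the printed results show `i` tagged as `NN`, which is presumably why chunks such as `i am` are matched at all.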

# Discovering Insights

In [27]:
from Chunk_Counter import np_chunk_counter, vp_chunk_counter

np_chunk_counter(NP_ChunkedText)

[((('hector', 'NN'),), 322),
 ((('i', 'NN'),), 274),
 ((('jove', 'NN'),), 257),
 ((('troy', 'NN'),), 208),
 ((('vain', 'NN'),), 195),
 ((('war', 'NN'),), 193),
 ((('son', 'NN'),), 170),
 ((('thou', 'NN'),), 158),
 ((('the', 'DT'), ('plain', 'NN')), 157),
 ((('the', 'DT'), ('field', 'NN')), 154),
 ((('the', 'DT'), ('ground', 'NN')), 138),
 ((('death', 'NN'),), 134),
 ((('hand', 'NN'),), 134),
 ((('greece', 'NN'),), 128),
 ((('heaven', 'NN'),), 127),
 ((('fate', 'NN'),), 127),
 ((('thee', 'NN'),), 122),
 ((('breast', 'NN'),), 121),
 ((('the', 'DT'), ('trojan', 'NN')), 120),
 ((('the', 'DT'), ('god', 'NN')), 119),
 ((('the', 'DT'), ('war', 'NN')), 117),
 ((('the', 'DT'), ('greeks', 'NN')), 116),
 ((('blood', 'NN'),), 115),
 ((('homer', 'NN'),), 112),
 ((('the', 'DT'), ('king', 'NN')), 105),
 ((('rage', 'NN'),), 103),
 ((('force', 'NN'),), 103),
 ((('care', 'NN'),), 99),
 ((('head', 'NN'),), 98),
 ((('man', 'NN'),), 97)]

In [28]:
vp_chunk_counter(VP_ChunkedText)

[((("'t", 'NN'), ('is', 'VBZ')), 19),
 ((('i', 'NN'), ('am', 'VBP')), 11),
 ((("'t", 'NN'), ('was', 'VBD')), 11),
 ((('the', 'DT'), ('hero', 'NN'), ('said', 'VBD')), 9),
 ((('i', 'NN'), ('know', 'VBP')), 8),
 ((('i', 'NN'), ('saw', 'VBD')), 8),
 ((('the', 'DT'), ('scene', 'NN'), ('lies', 'VBZ')), 7),
 ((('i', 'NN'), ('was', 'VBD')), 6),
 ((('confess', 'NN'), ("'d", 'VBD')), 6),
 ((('the', 'DT'), ('scene', 'NN'), ('is', 'VBZ')), 6),
 ((('view', 'NN'), ("'d", 'VBD')), 5),
 ((('i', 'NN'), ('felt', 'VBD')), 5),
 ((('i', 'NN'), ('bear', 'VBP')), 5),
 ((('hector', 'NN'), ('is', 'VBZ')), 5),
 ((('vain', 'NN'), ('was', 'VBD')), 5),
 ((('homer', 'NN'), ('was', 'VBD')), 4),
 ((('i', 'NN'), ('have', 'VBP')), 4),
 ((('hunger', 'NN'), ('was', 'VBD')), 4),
 ((('glory', 'NN'), ('lost', 'VBN')), 4),
 ((('i', 'NN'), ('see', 'VBP')), 4),
 ((('war', 'NN'), ('be', 'VB')), 4),
 ((('the', 'DT'), ('weapon', 'NN'), ('stood', 'VBD')), 4),
 ((('i', 'NN'), ('go', 'VBP')), 4),
 ((('the', 'DT'), ('silence', 'NN'),
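
`Chunk_Counter` is a local helper module whose source is not shown in this notebook. As a rough sketch of what such a counter might look like, the hypothetical function below tallies chunks with `collections.Counter`. The name `count_chunks` and the default `top_n=30` (chosen to match the 30 rows printed above) are assumptions, not the actual implementation.

```python
from collections import Counter

def count_chunks(chunked_sentences, label, top_n=30):
    """Count chunks with the given label across chunked sentences.

    Assumes each item behaves like the nltk.Tree returned by
    RegexpParser.parse: it offers subtrees(filter=...) and label(),
    and iterating a chunk yields its (word, tag) leaves.
    """
    counts = Counter()
    for sentence in chunked_sentences:
        for subtree in sentence.subtrees(filter=lambda t: t.label() == label):
            # A tuple of (word, tag) pairs is hashable, so it can be a Counter key.
            counts[tuple(subtree)] += 1
    # most_common returns (chunk, count) pairs, highest count first.
    return counts.most_common(top_n)
```

With the variables defined earlier, `count_chunks(NP_ChunkedText, "NP")` and `count_chunks(VP_ChunkedText, "VP")` would return lists in the same shape as the outputs printed above.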

- Looking at the most common NP chunks, you can identify important characters such as Hector (322 occurrences) and Jove (257) from their frequency.
- An important location, Troy, is also mentioned often (208 occurrences).
- The high frequency of "war" (193 occurrences on its own and 117 as "the war") suggests war as a central theme.
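
Because the same noun can appear both with and without a determiner, it can help to combine counts by head noun before comparing themes. The sketch below does this for a handful of rows copied from the NP output above (not the full list). The function name `counts_by_head_noun` is hypothetical, and treating the last word as the head relies on the NP grammar always ending in `<NN>`.

```python
from collections import Counter

# A few rows copied from the np_chunk_counter output above (not the full list).
np_counts = [
    ((("hector", "NN"),), 322),
    ((("jove", "NN"),), 257),
    ((("troy", "NN"),), 208),
    ((("war", "NN"),), 193),
    ((("the", "DT"), ("god", "NN")), 119),
    ((("the", "DT"), ("war", "NN")), 117),
    ((("the", "DT"), ("king", "NN")), 105),
]

def counts_by_head_noun(chunk_counts):
    """Sum chunk counts by the last word of each chunk (its head noun)."""
    totals = Counter()
    for chunk, count in chunk_counts:
        head_word, _tag = chunk[-1]
        totals[head_word] += count
    return totals

head_totals = counts_by_head_noun(np_counts)
print(head_totals.most_common(3))
# [('hector', 322), ('war', 310), ('jove', 257)]
```

On these rows, "war" and "the war" together total 310, which places war just behind Hector.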

- Looking at the most common VP chunks, verb phrases of the form defined in the chunk grammar appear far less often in The Iliad than noun phrases: the top VP chunk occurs 19 times, against 322 for the top NP chunk.
- This may reflect a style of writing that does not follow conventional prose word order, as in poetry, since the grammar only matches a singular noun immediately followed by a verb.
- Even when chunks are not found, their absence can give you insight!