This is web-parsing code designed specifically for Hamlet. It is not very efficient, because getFreq() iterates through the whole word list every time it is called, but it is fast enough to handle parts of the book.
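The slowdown comes from running a full scan of the word list once per lookup, so counting every word this way grows roughly with the square of the number of words. The sketch below contrasts the two approaches. It assumes getFreq() takes a target word and a List<String> of all words; the original signature is not shown, so this getFreq is a hypothetical reconstruction, and countAll is a suggested single-pass HashMap alternative, not the project's current code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FreqComparison {

    // Hypothetical getFreq()-style lookup: walks the whole word list on
    // every call, so each lookup costs O(n).
    static int getFreq(String target, List<String> words) {
        int count = 0;
        for (String w : words) {
            if (w.equals(target)) {
                count++;
            }
        }
        return count;
    }

    // Single pass over the words: one hash-map update per word, O(n) in total.
    static Map<String, Integer> countAll(List<String> words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            freq.merge(w, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "not", "to", "be");
        System.out.println(getFreq("to", words));      // prints 2
        System.out.println(countAll(words).get("be")); // prints 2
    }
}
```

With the map, every word of the book is visited once, so the whole text can be counted instead of only part of it.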
- Reads the text of the play Hamlet by W. Shakespeare from the URL http://shakespeare.mit.edu/hamlet/full.html
- Parses the text into separate words
- Calculates the frequency of each word, i.e. counts how many times each word occurs in the text
- Prints a table of the 20 most common words and their number of occurrences
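The parse, count, and print steps above can be sketched as follows. This is a minimal, self-contained illustration, not the project's actual code: the class and method names (WordTable, tokenize, countWords, topN, printTable) are hypothetical, the tokenization rule (lowercase, split on any run of non-letters) is an assumption, and the download step is replaced by a hard-coded string because the HtmlUnit fetching code is not shown here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordTable {

    // Assumed tokenization rule: lowercase the text and split on any run of
    // non-letter characters (so "o'er" becomes "o" and "er").
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty()) { // split() can yield a leading empty string
                words.add(token);
            }
        }
        return words;
    }

    // Count every word in one pass.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> freq = new HashMap<>();
        for (String w : words) {
            freq.merge(w, 1, Integer::sum);
        }
        return freq;
    }

    // Highest counts first; ties broken alphabetically so the order is stable.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> freq, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(n, entries.size()));
    }

    // Print a simple two-column table of words and their counts.
    static void printTable(List<Map.Entry<String, Integer>> rows) {
        System.out.printf("%-15s %s%n", "Word", "Count");
        for (Map.Entry<String, Integer> row : rows) {
            System.out.printf("%-15s %d%n", row.getKey(), row.getValue());
        }
    }

    public static void main(String[] args) {
        // Stand-in for the downloaded page text.
        String text = "To be, or not to be: that is the question.";
        printTable(topN(countWords(tokenize(text)), 20));
    }
}
```

In the real program, the string passed to tokenize would be the page text retrieved from the Hamlet URL.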
The workspace contains two folders by default, where:
- src: the folder to maintain sources
- lib: the folder to maintain dependencies (MISSING)
I do not own the library files. You need the HtmlUnit library to run this code; please obtain it from the library's owners.