sinujohn/ngram

Developer:
            Sinu John
            sinuvian at gmail dot com
            www.sinujohn.wordpress.com
            
1. USAGE
	Usage:
		java -jar ngram.jar sourcefile [--debug]

	To run ngram.jar:
		java -jar ngram.jar sourcefile
				Here 'sourcefile' is the text file from which the trigrams are collected.

	To run in DEBUG mode (be careful: if the source file yields a large number of trigrams, this can print a very large amount of output):
		java -jar ngram.jar sourcefile --debug
		
2. OBJECTIVE
	Write a program which first collects all word-level trigrams in a novel-length story. Then, choosing any arbitrary trigram as the starting point, generate a short story of between 200 and 300 words. The first two words of each trigram in the chain must always be the same as the last two words of the previous trigram.
	What is a trigram, you ask? The sentence "The quick red fox jumps over the lazy brown dog" has the following trigrams: "The quick red", "quick red fox", "red fox jumps", "fox jumps over", "jumps over the", "over the lazy", "the lazy brown", "lazy brown dog".
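	The definition above can be sketched in a few lines of Java. This is only an illustration, not the code shipped in ngram.jar: the class name TrigramExtractor and the method extract are hypothetical, and splitting on whitespace is an assumption about how words are separated.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from ngram.jar.
class TrigramExtractor {
    // Returns every run of three consecutive words in the text.
    // Assumption: words are separated by whitespace; punctuation stays attached.
    static List<String[]> extract(String text) {
        List<String[]> trigrams = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return trigrams;
        }
        String[] words = trimmed.split("\\s+");
        for (int i = 0; i + 2 < words.length; i++) {
            trigrams.add(new String[] { words[i], words[i + 1], words[i + 2] });
        }
        return trigrams;
    }

    public static void main(String[] args) {
        String sentence = "The quick red fox jumps over the lazy brown dog";
        for (String[] t : extract(sentence)) {
            System.out.println(String.join(" ", t));
        }
    }
}
```

	Run on the example sentence, this prints the eight trigrams listed above, one per line.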
Hint: Project Gutenberg is a good place to find a lot of plain text books
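	The chaining rule from the objective can be sketched as follows. This is a minimal illustration under stated assumptions, not the approach used in ngram.jar: the names TrigramChain, candidates and generate are hypothetical, the next trigram is picked at random from all matching ones, and generation simply stops when no trigram can follow (so a story may end up shorter than 200 words). A linear scan is used here only for clarity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the chaining rule, not taken from ngram.jar.
class TrigramChain {
    // Every trigram whose first two words equal the last two words of 'previous'.
    static List<String[]> candidates(List<String[]> all, String[] previous) {
        List<String[]> matches = new ArrayList<>();
        for (String[] t : all) {
            if (t[0].equals(previous[1]) && t[1].equals(previous[2])) {
                matches.add(t);
            }
        }
        return matches;
    }

    // Chains trigrams from 'start' until maxWords is reached or a dead end is hit.
    // The result always contains at least the three words of 'start'.
    static List<String> generate(List<String[]> all, String[] start, int maxWords, Random rng) {
        List<String> story = new ArrayList<>(Arrays.asList(start));
        String[] current = start;
        while (story.size() < maxWords) {
            List<String[]> next = candidates(all, current);
            if (next.isEmpty()) {
                break; // no trigram starts with the current last two words
            }
            current = next.get(rng.nextInt(next.size()));
            story.add(current[2]); // only the third word is new
        }
        return story;
    }

    public static void main(String[] args) {
        String[] words = "the cat sat on the cat sat by the door".split(" ");
        List<String[]> all = new ArrayList<>();
        for (int i = 0; i + 2 < words.length; i++) {
            all.add(new String[] { words[i], words[i + 1], words[i + 2] });
        }
        List<String> story = generate(all, all.get(0), 300, new Random());
        System.out.println(String.join(" ", story));
    }
}
```

	Because consecutive trigrams overlap in two words, each step adds only one new word to the story.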
		
3. DESIGN
	The trigrams are stored using HashMaps and ArrayLists. The storage can be viewed as three levels: the first two levels are implemented using HashMaps, and the third level is implemented using ArrayLists.
	Level 1 holds each word once. Each of these words acts as a key whose value is another HashMap. That HashMap, which forms Level 2, contains all the words that immediately follow the Level 1 key word.
	In Level 2, the keys are all the words that are successors of the Level 1 key word. Each Level 2 key has a corresponding value, which is an ArrayList; this is Level 3.
	Level 3 contains all the words that are successors of the Level 2 key word, which is in turn a successor of the Level 1 key word.

	To illustrate a small portion of this trigram storage, consider the following trigrams (the list is of course incomplete):
	Trigrams: (a b c), (a c b), (a c e), (b a b)

        Level 1        Level 2         Level 3
           a--------------b---------------c
           |--------------c---------------b
                          |---------------e

           b--------------a---------------b

	For another example, see "block.jpg".
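	The three-level storage can be sketched in Java as a HashMap of HashMaps of ArrayLists. This is an illustration of the design described in this section, not the actual source of ngram.jar: the class TrigramStore and its methods add and successors are hypothetical names, and keeping duplicate successors in the ArrayList is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the three-level storage, not taken from ngram.jar.
class TrigramStore {
    // Level 1 key (first word) -> Level 2 key (second word) -> Level 3 list (third words).
    private final Map<String, Map<String, List<String>>> levels = new HashMap<>();

    // Stores one trigram, creating the Level 2 map and Level 3 list on demand.
    void add(String first, String second, String third) {
        levels.computeIfAbsent(first, k -> new HashMap<>())
              .computeIfAbsent(second, k -> new ArrayList<>())
              .add(third);
    }

    // Returns the Level 3 words that can follow the pair (first, second),
    // or an empty list if the pair was never seen.
    List<String> successors(String first, String second) {
        Map<String, List<String>> level2 = levels.get(first);
        if (level2 == null) {
            return Collections.emptyList();
        }
        return level2.getOrDefault(second, Collections.emptyList());
    }

    public static void main(String[] args) {
        TrigramStore store = new TrigramStore();
        store.add("a", "b", "c");
        store.add("a", "c", "b");
        store.add("a", "c", "e");
        store.add("b", "a", "b");
        System.out.println(store.successors("a", "c")); // prints [b, e]
    }
}
```

	With this layout, finding the words that may follow the last two words of the current trigram takes two hash lookups, regardless of how many trigrams are stored.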


About

Creates a random story out of the input file containing a novel-sized story
