- Generate 5-grams from the corpus
- Generate a log-log plot of frequency vs. rank order
- Show whether 5-grams follow Zipf's law and approximate the value of alpha
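
Zipf's law says frequency falls off as a power of rank, f(r) ∝ r^(-alpha), so on a log-log plot the points lie close to a straight line with slope -alpha. The sketch below estimates alpha with an ordinary least-squares fit of log(frequency) against log(rank). This README does not say how alpha is approximated, so the fitting method and the function name `estimate_alpha` are assumptions made for illustration. A least-squares fit is only a rough estimate, since the many low-frequency 5-grams in the tail can pull the slope.

```python
import math


def estimate_alpha(frequencies):
    """Estimate the Zipf exponent from a collection of 5-gram frequencies.

    Hypothetical helper: fits log(freq) = c - alpha * log(rank) by ordinary
    least squares and returns alpha (the negated slope).
    """
    # Rank 1 is the most frequent item; zero counts cannot be log-transformed.
    ranked = sorted((f for f in frequencies if f > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two non-zero frequencies")

    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Slope of the least-squares line through the (log rank, log freq) points.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

An estimate close to 1 would match the classic form of Zipf's law; a clearly curved log-log plot suggests a single power law is a poor fit.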
Working of parse.py
- Uses parallel processing to digest the corpus and generate the 5-gram files
- Creates a master_dict file that contains an inverted frequency tuple list
- Plots frequency vs. rank order on a log-log scale using the matplotlib library
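
The steps above can be sketched as follows. This is not the code in parse.py: the function names, the whitespace tokenization, and the reading of "inverted frequency tuple list" as a list of `(frequency, 5-gram)` tuples sorted from most to least frequent are all assumptions. The sketch also counts 5-grams within each chunk only, so 5-grams that span a chunk boundary are missed.

```python
from collections import Counter
from multiprocessing import Pool


def five_grams(tokens, n=5):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_chunk(text):
    """Map step: count the 5-grams in one chunk (assumed whitespace tokens)."""
    return Counter(five_grams(text.lower().split()))


def build_master(counters):
    """Reduce step: merge per-chunk counts into (frequency, 5-gram) tuples,
    sorted with the most frequent 5-gram first."""
    total = Counter()
    for counter in counters:
        total.update(counter)
    return sorted(((freq, gram) for gram, freq in total.items()), reverse=True)


def digest(chunks, workers=4):
    """Count the chunks in parallel worker processes, then merge serially."""
    with Pool(workers) as pool:
        return build_master(pool.map(count_chunk, chunks))


def plot_rank_frequency(master, path="zipf.png"):
    """Plot frequency against rank on log-log axes and save the figure."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    freqs = [freq for freq, _ in master]
    ranks = range(1, len(freqs) + 1)
    plt.loglog(ranks, freqs, marker=".", linestyle="none")
    plt.xlabel("rank")
    plt.ylabel("frequency")
    plt.savefig(path)
```

A call such as `plot_rank_frequency(digest(chunks))` ties the pieces together. On platforms that start worker processes by spawning, it should sit under an `if __name__ == "__main__":` guard.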
To Do
- Use the multiprocessing functionality effectively for the reduce part, which builds the master_dict file
- Try using a cluster of computers, probably in the cloud, to digest the data
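
One possible shape for the first item is a pairwise (tree) reduction, where each round merges counters two at a time and the merges in a round are independent of each other. The sketch below is an assumption about how that could look, not a description of parse.py; `merge_pair` and `tree_reduce` are hypothetical names. Whether it beats a serial merge depends on how much time goes into passing the counters between processes.

```python
from collections import Counter


def merge_pair(pair):
    """Add the two Counters of a pair into one."""
    left, right = pair
    return left + right


def tree_reduce(counters, map_fn=map):
    """Merge Counters pairwise, one round at a time, until one remains.

    map_fn defaults to the builtin map; passing the map method of a
    multiprocessing Pool runs the merges of each round in parallel.
    """
    level = list(counters)
    if not level:
        return Counter()
    while len(level) > 1:
        if len(level) % 2:
            level.append(Counter())  # pad so every counter has a partner
        pairs = [(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        level = list(map_fn(merge_pair, pairs))
    return level[0]
```

With a pool in hand, `tree_reduce(per_chunk_counts, map_fn=pool.map)` would run each round across the worker processes.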
This repository has been archived by the owner on Dec 10, 2019. It is now read-only.
exploring n gram
fxdpntthm/info-ret