forked from turian/pytextpreprocess
Munduruca/pytextpreprocess
pytextpreprocess
================

written by Joseph Turian
released under a BSD license

Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence
splitting, etc.)

REQUIREMENTS:
* My Python common library: http://github.com/turian/common
  and sub-requirements thereof.
* NLTK, for word tokenization
  e.g. apt-get install python-nltk
* Splitta if you want to sentence tokenize

The English stoplist is from:
http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop
However, I added words at the top (above "a").
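
EXAMPLE (illustrative sketch only):

The steps named above (sentence splitting, tokenizing, lowercasing, stoplist
filtering) can be sketched with the standard library alone. This is not the
pytextpreprocess API: the function names `split_sentences`, `tokenize`, and
`preprocess` are hypothetical, the regular expressions are crude stand-ins for
Splitta and NLTK, and the tiny `STOPWORDS` set is an assumed placeholder for
the much longer SMART English stoplist. Stemming is left out to keep the
sketch dependency-free.

```python
import re

# Hypothetical mini stoplist for illustration only; the real project uses
# the SMART English stoplist with extra words added at the top.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Naive word tokenizer: runs of letters, digits, and apostrophes."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


def preprocess(text):
    """Return one list of lowercased, stoplist-filtered tokens per sentence."""
    result = []
    for sentence in split_sentences(text):
        tokens = [t.lower() for t in tokenize(sentence)]
        result.append([t for t in tokens if t not in STOPWORDS])
    return result


if __name__ == "__main__":
    print(preprocess("The cat sat in a hat. Is the dog happy?"))
```

A regex splitter like this will mis-handle abbreviations such as "Dr." or
"e.g.", which is one reason to use a trained sentence tokenizer such as
Splitta for real text.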