bennylin/indostemmer
# indostemmer

Indonesian Language Word Stemmer

PROJECT TITLE: Indo Porter Stemmer
PURPOSE OF PROJECT: Indonesian Words Stemmer
VERSION or DATE: April 2009
HOW TO START THIS PROJECT:
AUTHORS: Benny Lin (@bennylin)
USER INSTRUCTIONS:

This project was made way back in 2009 for a single purpose. Now, after it has served that purpose well enough for 6 years, I'm releasing the source code for others to use.

At that time there was no other Indonesian stemmer good enough for our needs, so I built this one on the Porter Stemmer (English) and optimized it for a single corpus of text. As far as I remember, it worked with Lucene and Solr, and you might need to download other repos to compile it properly. My main contribution can be found in IndoStemmer.java.

I have not touched it since 2009. I'm sure it could be improved somewhat, and if you're planning to do that, please do let me know.

My final comment is: Indonesian affixes are very complex, as is typical for Austronesian languages, but still nowhere near as complex as Javanese. My hope is that someday I can build a Javanese stemmer.

It's been fun doing this project. I hope this code can be useful for you too.

Solo, April 2015
Benny Lin
About
Indonesian Language Dictionaryless-Word-Stemmer using Porter Stemmer