bennylin/indostemmer

# indostemmer
Indonesian Language Word Stemmer

PROJECT TITLE: Indo Porter Stemmer
PURPOSE OF PROJECT: Indonesian Word Stemmer
VERSION or DATE: April 2009
HOW TO START THIS PROJECT:
AUTHORS: Benny Lin (@bennylin)
USER INSTRUCTIONS:

This project was made back in 2009 for a single purpose.
Now, after it has served that purpose well enough for six years,
I'm releasing the source code for others to use.

At that time there was no other Indonesian stemmer good enough
for our needs, so I built this one on top of the Porter Stemmer
(English) and optimized it for a single corpus of text.

As far as I remember, it worked with Lucene and Solr, and you
might need to download other repos to compile it properly.
My main contribution can be found in IndoStemmer.java.

I have not touched it since 2009. I'm sure it could be
improved, and if you're planning to do that, please do
let me know.

My final comment is: Indonesian affixes are very complex, as is
typical for Austronesian languages, but still nowhere near as
complex as Javanese. My hope is that someday I can build a
Javanese stemmer.
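
To give a rough idea of what dictionary-less affix stripping looks like,
here is a minimal sketch. It is NOT the code from IndoStemmer.java. The
class name, the affix lists, and the minimum stem length are all
assumptions made up for this illustration, and the rules are far simpler
than what real Indonesian morphology needs.

```java
import java.util.Locale;

// Hypothetical illustration only; not the implementation in IndoStemmer.java.
final class AffixStripSketch {
    // Assumed guard: never strip an affix if fewer than 4 letters would remain.
    private static final int MIN_STEM_LENGTH = 4;

    // Small, incomplete affix lists chosen for the example.
    private static final String[] PARTICLES = {"lah", "kah", "pun"};
    private static final String[] POSSESSIVES = {"nya", "ku", "mu"};
    private static final String[] SUFFIXES = {"kan", "an", "i"};
    // Longer prefixes come first so "mem" is tried before "me".
    private static final String[] PREFIXES = {
        "meng", "meny", "mem", "men", "me", "ber", "per", "ter", "di", "ke", "pe"
    };

    // Removes the first matching suffix, if enough of the word is left over.
    static String stripSuffix(String word, String[] suffixes) {
        for (String s : suffixes) {
            if (word.endsWith(s) && word.length() - s.length() >= MIN_STEM_LENGTH) {
                return word.substring(0, word.length() - s.length());
            }
        }
        return word;
    }

    // Removes the first matching prefix, if enough of the word is left over.
    static String stripPrefix(String word, String[] prefixes) {
        for (String p : prefixes) {
            if (word.startsWith(p) && word.length() - p.length() >= MIN_STEM_LENGTH) {
                return word.substring(p.length());
            }
        }
        return word;
    }

    // Applies the four stripping steps in a fixed order, one affix per step.
    static String stem(String input) {
        String word = input.toLowerCase(Locale.ROOT);
        word = stripSuffix(word, PARTICLES);    // -lah, -kah, -pun
        word = stripSuffix(word, POSSESSIVES);  // -nya, -ku, -mu
        word = stripSuffix(word, SUFFIXES);     // -kan, -an, -i
        word = stripPrefix(word, PREFIXES);     // me-, ber-, di-, ...
        return word;
    }

    public static void main(String[] args) {
        String[] samples = {"bukunya", "membacakan", "berlarian", "minumanlah"};
        for (String w : samples) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

Without a dictionary, rules like these cannot tell a real suffix from a
root that merely ends the same way: a word such as "dimakan" (root
"makan") is mis-stemmed because the root's own ending looks like the
suffix -kan. The sketch also ignores the sound changes after meN-
(for example "menulis" from "tulis").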

It's been fun doing this project. Hope this code can be useful
for you too.


Solo, April 2015
Benny Lin

About

Indonesian-language dictionary-less word stemmer based on the Porter Stemmer
