Added Stemmer for Spanish #77

Merged
merged 5 commits into from Nov 20, 2012

3 participants

Contributor
dav009 commented Nov 19, 2012

I re-implemented a stemmer for Spanish, based on http://stemmer-es.sourceforge.net/.
I also used the test corpora provided at http://stemmer-es.sourceforge.net/ to assess the quality of the stemmer.

chrisumbel merged commit c82c096 into NaturalNode:master Nov 20, 2012
Owner

Outstanding, thanks!

Contributor
dav009 commented Nov 20, 2012

Should I add an example to the readme.md?
I followed the patterns used in the framework, so it is pretty much a matter of changing 'Ru' to 'Es' in the Russian example ;).

Owner

Please do.

I'm hoping to get a release out this weekend.

Hi, is this the expected behavior?

var natural = require('natural');

console.log(natural.PorterStemmerEs.tokenizeAndStem('cebolla morada'));
console.log(natural.PorterStemmerEs.tokenizeAndStem('cebollas morada'));
console.log(natural.PorterStemmerEs.tokenizeAndStem('cebollas moradas'));

// output
[ 'ceboll', 'mor' ]
[ 'ceboll', 'mor' ]
[ 'ceboll', 'morad' ]

The results don't seem consistent: 'morada' stems to 'mor', but 'moradas' stems to 'morad'.
Also, by any chance, do you know of any Spanish inflector? I just need to singularize words and get valid words back.
Thanks.
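
For anyone who wants to reproduce this kind of report on more words, here is a minimal sketch of a consistency check. It takes any stemming function plus groups of word forms that should share a stem, and returns the groups where the stems differ. The names `toyStem` and `findInconsistentGroups` are hypothetical, and `toyStem` is only a stand-in so the snippet runs without the library installed; it is not the algorithm from this pull request.

```javascript
// Toy stand-in stemmer (hypothetical): strips a few common Spanish endings.
// It is NOT the stemmer from this pull request. To check the real one,
// pass something like: function (w) { return natural.PorterStemmerEs.stem(w); }
// (assuming the library exposes stem() on that object).
function toyStem(word) {
  return word.toLowerCase().replace(/(as|os|es|a|o|s)$/, '');
}

// Returns the groups whose word forms do not all map to a single stem.
function findInconsistentGroups(stem, groups) {
  return groups
    .map(function (variants) {
      return {
        variants: variants,
        stems: variants.map(function (w) { return stem(w); })
      };
    })
    .filter(function (result) {
      // More than one distinct stem means the group is inconsistent.
      return new Set(result.stems).size > 1;
    });
}

// Word forms taken from the comment above.
var groups = [
  ['cebolla', 'cebollas'],
  ['morada', 'moradas']
];

console.log(findInconsistentGroups(toyStem, groups));
```

Swapping the toy function for the real stemmer should flag the 'morada' / 'moradas' pair if the output reported above holds for single words as well.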
