Stemmer for Marathi #697
Sure, go for it, but could you explain a little more about how you will do this?
@kylepjohnson, if we compare English and Marathi:
Here the verb is 'दवड' while the suffixes are 'तो' and 'ला'.
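To make the idea concrete, here is a minimal sketch of longest-match suffix stripping in Python. It is only an illustration, not the code from any PR: the function name `stem`, the `min_stem_len` parameter, and the suffix list (just the two suffixes mentioned above) are assumptions, and the test words are formed by joining the verb and suffixes from the example.

```python
# Hypothetical suffix list: only the two suffixes named in this comment.
# A real stemmer would need a much fuller, linguistically vetted inventory.
SUFFIXES = ("तो", "ला")


def stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix from word.

    Lengths are counted in Unicode code points, not grapheme clusters,
    so min_stem_len is only a rough guard against over-stripping.
    """
    # Try longer suffixes first so the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    # No rule applied: return the word unchanged.
    return word


print(stem("दवडतो"))  # दवड
print(stem("दवडला"))  # दवड
```

A word that consists only of a suffix, such as 'ला' on its own, is returned unchanged because stripping it would leave nothing behind.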
Hi, has any work been done on this yet? Rule-based stemming is challenging, but it is how this has been done in the past. For inspiration, look at the Stemmer() for Latin, which @lukehollis wrote some years back. You should also look at stemmers for other, similar Indian languages. For example, I read that Lucene uses the algorithm described in this paper: http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf There has been some work on modern Marathi, though the burden would be on you to show us that these approaches either (a) work on older Marathi or (b) could be adapted by you for pre-modern Marathi.
@kylepjohnson, yes, a PR is pending review. Please see PR #719.
I would like to add a stemmer for the Marathi language, along with a test file.