Stemmer for Marathi #697

Closed
the-ethan-hunt opened this issue Feb 18, 2018 · 4 comments

@the-ethan-hunt
Contributor

I would like to add a stemmer for the Marathi language, along with a test file.

@kylepjohnson
Member

Sure, go for it, but could you explain a little more about how you will do this?

@the-ethan-hunt
Contributor Author

the-ethan-hunt commented Feb 20, 2018

@kylepjohnson, if we compare English and Marathi:

I run: मी दवडतो
I ran: मी दवडला

Here the verb is 'दवड' while the suffixes are 'तो' and 'ला'.
A suffix-stripping algorithm is ineffective for English forms like the run/ran pair above, where the tense change is not marked by a suffix. It is, however, very effective in Marathi, where suffixes are added to put a verb into the past, present, or future tense. Once the suffixes are stripped, we are left with the verb stem.
Would love to hear about this approach from you. 😄
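
As a rough illustration of the suffix-stripping idea above, here is a minimal Python sketch. Only the suffixes 'तो' and 'ला' come from the example; the names `SUFFIXES`, `stem`, `stem_sentence`, and the `min_stem_len` guard are assumptions made for illustration, not the CLTK API or the code in any PR. A real stemmer would need a much larger, linguistically vetted suffix list.

```python
# Hypothetical suffix list: only these two suffixes appear in the example above.
SUFFIXES = ("तो", "ला")


def stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix from a single word.

    min_stem_len is an assumed guard (counted in Unicode code points)
    that stops very short words from being reduced to nothing.
    """
    # Try longer suffixes first so the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    # No suffix matched: return the word unchanged.
    return word


def stem_sentence(sentence):
    """Stem every whitespace-separated token in a sentence."""
    return [stem(token) for token in sentence.split()]


if __name__ == "__main__":
    print(stem_sentence("मी दवडतो"))  # ['मी', 'दवड']
    print(stem_sentence("मी दवडला"))  # ['मी', 'दवड']
```

Both example sentences reduce to the same stem 'दवड', while 'मी' is left alone because it ends in neither suffix.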

This was referenced Feb 27, 2018
@kylepjohnson
Member

Hi, has any work been done on this yet? Rule-based stemming is challenging, but it is how stemming has been done in the past.

For inspiration, look at the Stemmer() for Latin, which @lukehollis wrote some years back.

And you should look at stemmers for other, similar Indian languages. For example, I read that Lucene uses the algorithm described in this paper: http://computing.open.ac.uk/Sites/EACLSouthAsia/Papers/p6-Ramanathan.pdf

There has been some work on modern Marathi, though the burden would be on you to show us that these approaches either (a) work on older Marathi or (b) could be adapted by you for pre-modern Marathi.

@the-ethan-hunt
Contributor Author

@kylepjohnson, yes, a PR is pending review. Please see PR #719.
