Added Stemmer for Marathi #719
@kylepjohnson, Travis CI reports the following issue: Is there something I am missing? 😅
@the-ethan-hunt IMO, you haven't synced your fork's master after this merge.
cltk/tests/test_stem.py
@@ -16,6 +16,8 @@
from cltk.stem.akkadian.stem import Stemmer as AkkadianStemmer
from cltk.stem.akkadian.syllabifier import Syllabifier as AkkadianSyllabifier
from cltk.stem.french.stem import stem
from cltk.stem.marathi.stem import stem
Remove this blank line.
cltk/tests/test_stem.py
target="मी वाच आहे"
self.assertEqual(stemmed_text,target)
The same for this and the following.
cltk/stem/marathi/stem.py
return word
The same for this and the following.
cltk/corpus/swadesh.py
swadesh_syc=['ܐܢܐ','ܐܢܬ, ܐܢܬܝ', 'ܗܘ', 'ܚܢܢ,, ܐܢܚܢܢ', 'ܐܢܬܘܢ , ܐܢܬܝܢ ', 'ܗܢܘܢ , ܗܢܝܢ', 'ܗܢܐ, ܗܕܐ', 'ܗܘ, ܗܝ', 'ܗܪܟܐ', 'ܬܡܢ', 'ܡܢ', 'ܡܐ, ܡܢ, ܡܢܐ, ܡܘܢ', 'ܐܝܟܐ', 'ܐܡܬܝ', 'ܐܝܟܢ,, ܐܝܟܢܐ', 'ܠܐ', 'ܟܠ', 'ܣܓܝ ', 'ܟܡܐ ', 'ܒܨܝܪܐ', 'ܐܚܪܢܐ, ܐܚܪܬܐ', 'ܚܕ , ܚܕܐ', 'ܬܪܝܢ, ܬܪܬܝܢ', 'ܬܠܬܐ, ܬܠܬ', 'ܐܪܒܥܐ, ܐܪܒܥ', 'ܚܡܫܐ, ܚܡܫ', 'ܪܒܐ, ܟܒܝܪܐ ', 'ܐܪܝܟܐ', 'ܪܘܝܚܐ, ܦܬܝܐ', 'ܥܒܝܛܐ', 'ܢܛܝܠܐ, ܝܩܘܪܐ ', 'ܙܥܘܪܐ', 'ܟܪܝܐ', 'ܥܝܩܐ', 'ܪܩܝܩܐ, ܛܠܝܚܐ', 'ܐܢܬܬܐ', 'ܓܒܪܐ', 'ܐܢܫܐ', 'ܝܠܘܕܐ', 'ܐܢܬܬܐ', 'ܒܥܠܐ', 'ܐܡܐ', 'ܐܒܐ', 'ܚܝܘܬܐ', 'ܢܘܢܐ', 'ܛܝܪܐ, ܨܦܪܐ', 'ܟܠܒܐ', 'ܩܠܡܐ', 'ܚܘܝܐ', 'ܬܘܠܥܐ', 'ܐܝܠܢܐ', 'ܥܒܐ', 'ܩܝܣܐ', 'ܦܐܪܐ', 'ܙܪܥܐ', 'ܛܪܦܐ', 'ܫܪܫܐ ', 'ܩܠܦܬܐ', 'ܗܒܒܐ', 'ܓܠܐ', 'ܚܒܠܐ', 'ܓܠܕܐ ', 'ܒܣܪܐ', 'ܕܡܐ', 'ܓܪܡܐ', 'ܕܗܢܐ, ܫܘܡܢܐ', 'ܒܝܥܬܐ', 'ܩܪܢܐ', 'ܕܘܢܒܐ', 'ܐܒܪܐ', 'ܣܥܪܐ', 'ܪܝܫܐ', 'ܐܕܢܐ', 'ܥܝܢܐ', 'ܢܚܝܪܐ ', 'ܦܘܡܐ', 'ܫܢܐ, ܟܟܐ', 'ܠܫܢܐ', 'ܛܦܪܐ ', 'ܥܩܠܐ', 'ܪܓܠܐ', 'ܒܘܪܟܐ', 'ܐܝܕܐ', 'ܟܢܦܐ ', 'ܒܛܢܐ, ܟܪܣܐ ', 'ܡܥܝܐ, ܓܘܐ', 'ܨܘܪܐ, ܩܕܠܐ', 'ܚܨܐ, ܒܣܬܪܐ', 'ܚܕܝܐ', 'ܠܒܐ', 'ܟܒܕܐ', 'ܫܬܐ', 'ܐܟܠ', 'ܢܟܬ', 'ܡܨ ', 'ܪܩ', 'ܓܥܛ', 'ܢܦܚ', 'ܢܦܫ, ܢܫܡ', 'ܓܚܟ ', 'ܚܙܐ', 'ܫܡܥ', 'ܝܕܥ', 'ܚܫܒ', 'ܡܚ, ܣܩ', 'ܕܚܠ, ܟܘܪ', 'ܕܡܟ', 'ܚܝܐ ', 'ܡܝܬ', 'ܩܛܠ', 'ܟܬܫ', 'ܨܝܕ ', 'ܡܚܐ, ܢܩܫ', 'ܓܕܡ, ܩܛܥ', 'ܫܪܩ, ܦܕܥ, ܦܪܬ', 'ܕܓܫ', 'ܚܟ, ܣܪܛ', 'ܚܦܪ', 'ܣܚܐ', 'ܦܪܚ ', 'ܗܠܟ ', 'ܐܬܐ ', 'ܫܟܒ, ܡܟ', 'ܝܬܒ', 'ܬܪܨ', 'ܦܢܐ, ܥܛܦ ', 'ܢܦܠ', 'ܝܗܒ, ܢܬܠ', 'ܐܚܕ', 'ܩܡܛ, ܥܨܪ', 'ܫܦ, ܚܟ', 'ܚܠܠ, ܦܝܥ', 'ܟܦܪ', 'ܓܪܫ', 'ܙܥܦ ', 'ܪܡܐ', 'ܐܣܪ, ܩܛܪ', 'ܚܝܛ', 'ܡܢܐ', 'ܐܡܪ', 'ܙܡܪ', 'ܫܥܐ', 'ܛܦ', 'ܪܣܡ, ܫܚܠ', 'ܓܠܕ, ܩܪܫ', 'ܙܘܐ, ܥܒܐ', 'ܫܡܫܐ', 'ܣܗܪܐ', 'ܟܘܟܒܐ', 'ܡܝܐ ', 'ܡܛܪܐ', 'ܢܗܪܐ', 'ܝܡܬܐ', 'ܝܡܐ', 'ܡܠܚܐ ', 'ܟܐܦܐ, ܐܒܢܐ, ܫܘܥܐ', 'ܚܠܐ', 'ܐܒܩܐ, ܕܩܬܐ', 'ܐܪܥܐ', 'ܥܢܢܐ, ܥܝܡܐ, ܥܝܒܐ', 'ܥܪܦܠܐ ', 'ܫܡܝܐ', 'ܪܘܚܐ ', 'ܬܠܓܐ', 'ܓܠܝܕܐ', 'ܬܢܢܐ ', 'ܢܘܪܐ, ܐܫܬܐ', 'ܩܛܡܐ ', 'ܝܩܕ', 'ܐܘܪܚܐ', 'ܛܘܪܐ', 'ܣܘܡܩܐ', 'ܝܘܪܩܐ', 'ܫܥܘܬܐ', 'ܚܘܪܐ', 'ܐܘܟܡܐ ', 'ܠܠܝܐ ', 'ܝܘܡܐ ', 'ܫܢܬܐ', 'ܫܚܝܢܐ', 'ܩܪܝܪܐ', 'ܡܠܝܐ', 'ܚܕܬܐ', 'ܥܬܝܩܐ', 'ܛܒܐ', 'ܒܝܫܐ', 'ܒܩܝܩܐ ܚܪܝܒܐ', 'ܫܘܚܬܢܐ', 'ܬܪܝܨܐ ', 'ܚܘܕܪܢܝܐ', 'ܚܪܝܦܐ', 'ܩܗܝܐ', 'ܦܫܝܩܐ', 'ܪܛܝܒܐ, ܬܠܝܠܐ', 'ܝܒܝܫܐ', 'ܬܪܝܨܐ ', 'ܩܪܝܒܐ', 'ܪܚܝܩܐ', 'ܝܡܝܢܐ', 'ܣܡܠܐ', 'ܒ-, ܠܘܬ', 'ܥܡ', 'ܐܢ', '-ܡܛܠ ܕ, ܒܥܠܬ', 'ܫܡܐ'] | ||
the same
@LBenzahia does this look good now? 😄
What about this Travis error?
@LBenzahia, I have corrected the error. It was caused by a package being renamed in a PR that was pulled after this one. 😅
@kylepjohnson, could you please have a look at this, #687 and #706? 😅
Hi @the-ethan-hunt, are you still looking at this? What if we also added a lemmatizer after this? Let me know your thoughts.
@maheshbhosale, the PR is yet to be reviewed by the maintainers. 😅 We can work on the lemmatizer after that.
Cool, I will watch for when it gets merged.
@the-ethan-hunt I know this PR is very old. It never got merged, if I recall correctly, because of several merge conflicts. For the stemmer, we need to know more about how it works and to have at least some idea of its accuracy. For example, is it based on a known algorithm for other Indian languages? If it is simply stripping off suffixes ( Also, the OF swadesh shouldn't be included here. This would need to be reviewed in a separate PR.
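One way to get a rough idea of a stemmer's accuracy is to compare its output against a small hand-checked list of (word, expected stem) pairs. The sketch below is only an illustration: `stemming_accuracy`, `strip_final_t`, and the four toy pairs are hypothetical and are not taken from this PR or from CLTK.

```python
def stemming_accuracy(stem_fn, gold_pairs):
    """Return the fraction of words whose stem matches the expected stem."""
    if not gold_pairs:
        return 0.0
    correct = sum(1 for word, expected in gold_pairs if stem_fn(word) == expected)
    return correct / len(gold_pairs)


def strip_final_t(word):
    """Hypothetical stand-in stemmer: drop a trailing 'त' from longer words."""
    if word.endswith("त") and len(word) > 2:
        return word[:-1]
    return word


# Toy gold pairs written by hand for illustration only (not an evaluation set).
gold = [("वाचत", "वाच"), ("आहे", "आहे"), ("मी", "मी"), ("घरात", "घर")]

# The last pair fails: stripping only 'त' leaves 'घरा', not 'घर'.
print(stemming_accuracy(strip_final_t, gold))  # 0.75
```

A real evaluation would need a much larger list checked by a Marathi speaker; the point here is only the shape of the measurement.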
With reference to #697, a suffix-stripping algorithm has been used for the stemmer.
stem/marathi/stem.py
marathi.rst
The previous PR was closed because of a mix-up with branches. 😅
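For readers unfamiliar with suffix stripping, the general idea can be sketched as follows. This is a minimal illustration and not the code in `stem/marathi/stem.py`: the suffix list, the `MIN_STEM_LENGTH` guard, and the function names `strip_suffix` and `stem_sentence` are all assumptions made for the example.

```python
# Illustrative suffix list (an assumption, not the list used in this PR).
SUFFIXES = ("ला", "ने", "चा", "ची", "चे", "त")

# Hypothetical guard so that very short words are not stripped to nothing.
MIN_STEM_LENGTH = 2


def strip_suffix(word):
    """Remove the longest matching suffix, if enough of the word remains."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        remaining = len(word) - len(suffix)
        if word.endswith(suffix) and remaining >= MIN_STEM_LENGTH:
            return word[:-len(suffix)]
    return word


def stem_sentence(sentence):
    """Stem each whitespace-separated token and rejoin with single spaces."""
    return " ".join(strip_suffix(token) for token in sentence.split())


# Input chosen for illustration; only 'वाचत' ends in a listed suffix.
print(stem_sentence("मी वाचत आहे"))  # मी वाच आहे
```

Trying longer suffixes first avoids removing a short suffix when a longer one also matches. Lengths are counted in Unicode code points, so a consonant plus a vowel sign counts as two.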