POS tags are not correct #2

Open
maryam-foradi opened this issue Feb 8, 2016 · 8 comments

Comments

@maryam-foradi

For some words the service returns no analysis at all:
http://services.perseids.org/pysvc/morphologyservice/analysis/word?word=%D9%84%D8%B7%D9%81&lang=per&engine=hazm
For others it gives the wrong POS:
http://services.perseids.org/pysvc/morphologyservice/analysis/word?word=%D8%A8%DA%AF%D9%88&lang=per&engine=hazm
The result is tagged as a noun, although بگو is a verb.
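
For anyone reproducing these requests, here is a minimal sketch that rebuilds the query URL from its parts. The endpoint and the `word`, `lang`, and `engine` parameters are taken from the links above; the helper name `analysis_url` is hypothetical, and the sketch only constructs the URL without contacting the service.

```python
from urllib.parse import urlencode

# Endpoint copied from the links in this issue.
BASE_URL = "http://services.perseids.org/pysvc/morphologyservice/analysis/word"


def analysis_url(word: str, lang: str = "per", engine: str = "hazm") -> str:
    """Build the query URL for one word (hypothetical helper, no network call)."""
    # urlencode percent-encodes the UTF-8 bytes of the Persian word.
    return BASE_URL + "?" + urlencode({"word": word, "lang": lang, "engine": engine})


url = analysis_url("بگو")
print(url)
```

Passing the printed URL to a browser or an HTTP client should issue the same request as the second link above.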

@balmas
Contributor

balmas commented Feb 8, 2016

I'm not sure if this is a problem with the way we are using hazm or something else.

The service calls tagger.tag(بگو) and gets back:
[('ب', 'N'), ('گ', 'N'), ('و', 'CONJ')]

@elijahjcooke any thoughts?

@elijahjcooke
Member

So the problem is that, for some reason, Hazm is not tokenizing the text correctly. Hazm should break the text into sentences and then into words, but instead it is splitting the words into individual characters.
Maryam, does this happen when you send texts with multiple sentences, or only when you send single sentences or single words? Knowing that will help us track down the tokenizing bug.
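
One possible explanation for the per-character output, not confirmed anywhere in this thread, is that the word reaches the tagger as a bare string rather than as a list of tokens. The sketch below illustrates that effect with a hypothetical stand-in tagger (`toy_tag` and `TOY_LEXICON` are invented for illustration and are not hazm code); the tags in the toy lexicon, including the `V` label, are assumptions chosen to mirror the output quoted above.

```python
from typing import Iterable, List, Tuple

# Hypothetical toy lexicon; NOT hazm's model or tag set.
TOY_LEXICON = {"بگو": "V", "ب": "N", "گ": "N", "و": "CONJ"}


def toy_tag(tokens: Iterable[str]) -> List[Tuple[str, str]]:
    """Tag each element of `tokens`, defaulting to 'N' for unknown items."""
    return [(tok, TOY_LEXICON.get(tok, "N")) for tok in tokens]


word = "بگو"

# A Python str is iterable, so the tagger sees three one-character "tokens".
per_character = toy_tag(word)

# Wrapping the word in a list makes it a single token.
per_word = toy_tag([word])

print(per_character)
print(per_word)
```

If this assumption holds for the real service, wrapping the single word in a list (or running it through a word tokenizer first) before tagging would be the kind of fix to try.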

@balmas
Contributor

balmas commented Feb 10, 2016

Arethusa currently only sends single words to the parser, not entire sentences.

@maryam-foradi
Author

I haven't tried it with multiple sentences, as it makes the treebanking complicated, if not impossible.


@elijahjcooke
Member

OK, then I might know a fix for the problem. @balmas, will Arethusa be updated automatically if I change the code on GitHub?

@elijahjcooke
Member

@balmas

@balmas
Contributor

balmas commented Feb 25, 2016

Thanks! I'll deploy tomorrow!

@balmas
Contributor

balmas commented Feb 26, 2016

Ah, sorry, I misunderstood the question here. The morphology service API will not be updated automatically, but I'm happy to deploy for testing when you're ready.
