Kazakh language wrong result #124

Closed
pas-valkov opened this issue Apr 2, 2021 · 3 comments

Comments

@pas-valkov

I ran the README example for the Kazakh language. It completes without errors, but the result is wrong. English works fine.

Expected Result

I expected a list of words with their correct normalized forms (lemmas), but for every word the normalized form consists of only one letter (the lowercased first letter of the word).

Actual Result

[[1 Алтай а NOUN adj Case=Gen 2 nmod:poss _ _,
2 жерінің ж NOUN n _ 3 obl _ _,
3 асты а VERB adj _ 4 nsubj _ _,
4 қандай қ PRON adv _ 5 nsubj _ _,
5 қазыналы қ VERB n _ 0 root _ _,
6 болса б VERB v Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin 5 cop _ SpaceAfter=No,
7 . . PUNCT sent _ 5 punct _ _],
[1 Ағаш а VERB v Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin 0 root _ SpaceAfter=No,
2 . . PUNCT sent _ 1 punct _ SpaceAfter=No]]
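To make the symptom easy to check, here is a minimal sketch (not part of NLP-Cube) that scans CoNLL-U-style token lines and flags words whose lemma, the third column, is a single character even though the word itself is longer. It assumes one token per line with whitespace-separated fields, as in the output above; `suspicious_lemmas` and `SAMPLE` are hypothetical names chosen for illustration.

```python
# Hypothetical helper, not an NLP-Cube API: flag tokens whose lemma
# (CoNLL-U column 3) is one character while the word form (column 2) is longer.
# Assumes whitespace-separated fields, one token per line.
SAMPLE = """\
1 Алтай а NOUN adj Case=Gen 2 nmod:poss _ _
2 жерінің ж NOUN n _ 3 obl _ _
3 асты а VERB adj _ 4 nsubj _ _
4 қандай қ PRON adv _ 5 nsubj _ _
5 қазыналы қ VERB n _ 0 root _ _
6 болса б VERB v Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin 5 cop _ SpaceAfter=No
7 . . PUNCT sent _ 5 punct _ _
"""


def suspicious_lemmas(conllu_text):
    """Return (form, lemma) pairs where a multi-character word got a 1-character lemma."""
    flagged = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and CoNLL-U comment lines
        fields = line.split()
        if len(fields) < 3:
            continue  # too few columns to be a token line
        form, lemma = fields[1], fields[2]
        if len(form) > 1 and len(lemma) == 1:
            flagged.append((form, lemma))
    return flagged


if __name__ == "__main__":
    for form, lemma in suspicious_lemmas(SAMPLE):
        print(f"{form!r} -> lemma {lemma!r}")
```

Run on the first sentence of the output, it flags six of the seven tokens; only the final period passes.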

Reproduction Steps

from cube.api import Cube  # import the Cube object
cube = Cube(verbose=True)  # initialize it
cube.load("kk")  # select the desired language ("kk" = Kazakh); the model is auto-downloaded on first run
text = "Алтай жерінің асты қандай қазыналы болса. Ағаш."
sentences = cube(text)  # call with your own text (string) to obtain the annotations
print(sentences)  # in an interactive session, evaluating `sentences` alone also displays it

System Information

  • Python version 3.6.12
  • Operating system Ubuntu 20.04
@tiberiu44
Contributor

Hi @pas-valkov, I'm really sorry for the late response. I'm updating the models for version 3.0 right now, and hopefully that will fix the issue. Sorry again, I don't know how I missed this issue. I'll let you know as soon as it's fixed.

@tiberiu44
Contributor

I've just uploaded the updated model. Keep in mind that the Kazakh treebank in Universal Dependencies (UD) is very small, so the system will not have high accuracy.
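If you want to quantify that accuracy on your own annotated data, one simple measure is token-level lemma accuracy: the fraction of tokens whose predicted lemma exactly equals a gold-standard lemma. The sketch below assumes the predicted and gold lemmas are already aligned token by token; `lemma_accuracy` and the toy lists are hypothetical, not part of NLP-Cube.

```python
# Hypothetical sketch: token-level lemma accuracy, assuming predicted and gold
# lemmas are aligned one-to-one (same tokenization, same order).
def lemma_accuracy(predicted, gold):
    """Fraction of tokens whose predicted lemma exactly matches the gold lemma."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold lemmas must be aligned token by token")
    if not gold:
        raise ValueError("need at least one token to score")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy, made-up example: the second token received a truncated lemma.
print(lemma_accuracy(["run", "r", "dog"], ["run", "run", "dog"]))  # prints 0.6666666666666666
```

Exact string match is a strict criterion; it counts a lemma that differs only in case as wrong, which is usually what you want when checking a lemmatizer.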

@pas-valkov
Author

Thanks for your reply! Better late than never :) The issue was fixed by your new model, thanks!
