поши, пошиең #33

Open
mansayk opened this issue Jan 30, 2019 · 14 comments
Labels: bug (Something isn't working), lexc, twol

@mansayk
Member

mansayk commented Jan 30, 2019

Hi!

The correct 3rd person possessive form of "поши" is "пошие" (пошиең, пошиемны...). But currently it is processed as "пошисы", which I think is incorrect, @IlnarSelimcan, right?

It also applies to other words ending with "и": тарихи (тарихиена, тарихисы?), бриджи (бриджиен, бриджисы?), гади (гадиен), песи (песиең), ралли (раллиена, раллисы?), verb ки (киеп)...

The form -сы is also used in some cases: абый абые абыйсы, сеңел сеңеле сеңлесе, зур зурысы...

mansayk added the bug and twol labels on Jan 30, 2019
@jonorthwash
Member

Some of these will require changes to the lexc also. Could you create a yaml file for these in tests/morphophonology/ (maybe something like final-i.yaml)?

@mansayk
Member Author

mansayk commented Jan 30, 2019

I have already modified the lexc. What else can I do there? Could you please check the twol rules?
OK, I will try to create the YAML file later.

@mansayk
Member Author

mansayk commented Jan 30, 2019

I created that file with some rules:
c1f843a

@jonorthwash
Member

Could you add the абый, сеңел, and зур examples too? For analyses that have multiple "correct" forms, you can put two lines.

@mansayk
Member Author

mansayk commented Jan 30, 2019

Done

@mansayk
Member Author

mansayk commented Feb 1, 2019

@jonorthwash, could you please fix the twol rules according to this table?

| Base form | Current | Correct |
|---|---|---|
| поши | пошисы | пошие |
| тарихи | тарихисы | тарихие |
| гади | гадисе | гадие |
| песи | песисе | песие |
| ки | - | киеп |
| музей | музейе | музее |
| ансамбль | - | ансамбле |
| гаеп | гаепы | гаебе |
| гает | гаеты | гаете |
| каникул | каникулләр | каникуллар |
| суд | судга | судка |
| ю | - | юа |
| җәмгыять | җәмгыятьны | җәмгыятьне |
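
The four noun rows ending in "и" share one pattern in the table: the 3rd person possessive adds -е to the stem rather than -сы/-се. Below is a minimal Python sketch of that single pattern. It makes no claim about how the real twol rules are written; the helper name `px3_final_i` is hypothetical, and rows such as музей or гаеп need different handling that this sketch does not attempt.

```python
# Expected 3rd-person possessive forms, copied from the "Correct" column above.
EXPECTED = {
    "поши": "пошие",
    "тарихи": "тарихие",
    "гади": "гадие",
    "песи": "песие",
}


def px3_final_i(stem: str) -> str:
    """Toy rule: stems ending in "и" take "-е" in the 3rd-person possessive.

    Hypothetical helper for illustration only. It is not the twol rule and
    deliberately refuses any stem it does not cover.
    """
    if not stem.endswith("и"):
        raise ValueError(f"stem {stem!r} is outside this sketch")
    return stem + "е"


def mismatches() -> dict:
    """Return the stems whose generated form differs from the table."""
    return {
        stem: px3_final_i(stem)
        for stem, want in EXPECTED.items()
        if px3_final_i(stem) != want
    }
```

An empty result from `mismatches()` means the toy rule reproduces the "Correct" column for these four stems.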

@IlnarSelimcan
Member

Hi!

The correct 3rd person possessive form of "поши" is "пошие" (пошиең, пошиемны...). But currently it is processed as "пошисы", which I think is incorrect, @IlnarSelimcan, right?

Right, "пошие" is the correct form.

It also applies to other words ending with "и": тарихи (тарихиена, тарихисы?), бриджи (бриджиен, бриджисы?), гади (гадиен), песи (песиең), ралли (раллиена, раллисы?), verb ки (киеп)...

The form -сы is also used in some cases: абый абые абыйсы, сеңел сеңеле сеңлесе, зур зурысы...

IlnarSelimcan added a commit that referenced this issue Feb 15, 2019
@jonorthwash
Member

The form -сы is also used in some cases: абый абые абыйсы, сеңел сеңеле сеңлесе, зур зурысы...

You're saying that there are multiple correct realisations for these possessive forms, right? If so, which should be the default (at least for абый and сеңел)?

@mansayk
Member Author

mansayk commented Jun 1, 2019

According to the Tatar orthographic dictionary (2017): абые.
According to the Tatar explanatory dictionary (2013): both абые and абыйсы.

I don't know how to choose the default one :) Personally I use абыйсы, сеңлесе. @IlnarSelimcan what do you think?

@mansayk
Member Author

mansayk commented Jun 1, 2019

Maybe the default should be the one formed according to the rules, абые in this case?

@jonorthwash
Member

jonorthwash commented Jun 1, 2019

Maybe the default should be the one formed according to the rules

I think it should be the most common one, whichever that is. It'll be just as hard to implement either way—it's just a matter of which entry isn't included in the generation transducer.
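
As a rough illustration of that point, here is a Python sketch in which plain dictionaries stand in for the transducers. The `Entry` type, the `analysis_only` flag, and the tag string are all hypothetical, and абыйсы is marked as the generated default only for the sake of the example, since which form should be the default is still undecided.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    analysis: str                # lemma plus tags (tag names are made up here)
    surface: str                 # surface form
    analysis_only: bool = False  # True: recognised, but never generated


ENTRIES = [
    Entry("абый<n><px3sp>", "абыйсы"),
    Entry("абый<n><px3sp>", "абые", analysis_only=True),
]


def build_analyser(entries):
    """Analyser: every surface form maps back to its analysis."""
    return {e.surface: e.analysis for e in entries}


def build_generator(entries):
    """Generator: skip analysis-only entries so each analysis has one output."""
    return {e.analysis: e.surface for e in entries if not e.analysis_only}


analyser = build_analyser(ENTRIES)
generator = build_generator(ENTRIES)
```

Switching the default would only mean moving the flag to the other entry; the analyser accepts both forms either way.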

@mansayk
Member Author

mansayk commented Jun 2, 2019

To be frank, I cannot decide which one is the most common: абые or абыйсы. I suppose @IlnarSelimcan thinks the same way. The only option is to consult a corpus. According to the Corpus of Written Tatar, which consists of ~350 million word occurrences:
абые* - 12,998 occurrences
абыйсы* - 20,067 occurrences
Although the second form does not follow the grammatical rules, it is used significantly more often. So should we choose it as the default?
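
To put those two counts side by side, a small Python sketch of the arithmetic follows. It uses only the figures quoted above; the function names are illustrative, and the corpus size is the approximate value given in the comment.

```python
# Counts quoted above from the Corpus of Written Tatar.
COUNTS = {"абые*": 12_998, "абыйсы*": 20_067}
CORPUS_SIZE = 350_000_000  # approximate size given above


def shares(counts):
    """Share of each variant among all occurrences of either variant."""
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


def per_million(counts, corpus_size):
    """Occurrences per million corpus tokens."""
    return {form: n * 1_000_000 / corpus_size for form, n in counts.items()}


# How many times more frequent the second query is than the first.
ratio = COUNTS["абыйсы*"] / COUNTS["абые*"]
```

On these numbers абыйсы* is about 1.5 times as frequent as абые* and accounts for roughly 61% of the two combined. If the asterisk is a wildcard, each query also matches other word forms that begin with the same string, so the comparison is only approximate.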

@jonorthwash
Copy link
Member

Maybe a better criterion than "which is used more often?" would be "which do native speakers find least unexpected?"

@mansayk, you use абыйсы and it's more common in the corpus, which would argue in favour of that form. You say you don't consider абые wrong, but does it feel in any way less common? Like, when you encounter it in text, does it feel jarring, old-fashioned, or somehow unusual? That would be the best argument for going with абыйсы as the default.

@IlnarSelimcan, we're still waiting for your input on this.

@mansayk
Member Author

mansayk commented Jun 3, 2019

you use абыйсы and it's more common in the corpus—which would argue in favour of that form.

OK, I agree, and I will take it into account next time. But the second form doesn't feel less common to me in any way: it is not old-fashioned, it is definitely not unusual, it doesn't feel jarring, and, as I pointed out before, it is the only form given in the Tatar orthographic dictionary (2017)... Personally, I consider it simply an alternative form, a kind of absolute synonym. @IlnarSelimcan, we really need your help here.

3 participants