Kashida/Tatweel check still too aggressive #8228

Open
2 tasks done
bgo-eiu opened this issue Oct 16, 2022 · 4 comments
Labels
backlog: This is not on the Weblate roadmap for now. Can be prioritized by sponsorship.
enhancement: Adding or requesting a new feature.
good first issue: Opportunity for newcoming contributors.
hacktoberfest: This is suitable for Hacktoberfest. Don’t try to spam.
help wanted: Extra attention is needed.

Comments

@bgo-eiu
Contributor

bgo-eiu commented Oct 16, 2022

Describe the issue

This follows up on #6877, in which exceptions were added for certain Arabic prepositions. While it was pointed out that there is a limited number of these in Arabic, most languages that use the Arabic script are not Arabic.

Use cases of tatweel/kashida which should be permitted:

  • Arabic-based scripts have a number of combining characters and diacritics for which the tatweel is used as a "holder" in examples and illustrations, as in ــ٘ـ to highlight the form of the ghunna marker (used in Pakistani languages) without attaching it to surrounding characters. This may come up in translating documentation or applications which have language-specific considerations. Tatweel/kashida + any combining mark should be permitted, as well as tatweel/kashida + combining mark with tatweel/kashida on either side (easier to read with some buffer space around it).

  • Any sequence ending in a single tatweel/kashida followed by a spacing character, punctuation, or a combining mark should be permitted. An example I came across is translating an app which has a "Mo Tu We Th Fr Sa Su" header. Weekday abbreviations are not typically used in Punjabi, but there is no room in this context for full words. A workaround is to abbreviate the weekdays as, for example, مـ or اتـ, which is more legible than just using the isolated form of each letter. Any number of characters should be permitted before the kashida/tatweel for this, since in many alphabets a combination of multiple characters is required to represent a single sound (for example دھ or ای).
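The two use cases above could be expressed as an allowlist test on each run of tatweels. This is only a rough sketch, not Weblate code: the function name `tatweel_is_permitted` and its helpers are hypothetical, combining marks are approximated by Unicode categories starting with `M`, and treating the end of the string like a spacing character is my own assumption.

```python
import unicodedata

TATWEEL = "\u0640"


def is_mark(ch):
    # Combining marks have a Unicode category starting with "M" (Mn, Mc, Me).
    return ch != "" and unicodedata.category(ch).startswith("M")


def is_boundary(ch):
    # Assumption: start/end of string counts like whitespace or punctuation.
    return ch == "" or ch.isspace() or unicodedata.category(ch).startswith("P")


def tatweel_is_permitted(text, i):
    """Return True if the tatweel at index i fits one of the two use cases."""
    if text[i] != TATWEEL:
        raise ValueError("text[i] is not a tatweel")
    # Expand to the whole run of consecutive tatweels around index i.
    start, end = i, i + 1
    while start > 0 and text[start - 1] == TATWEEL:
        start -= 1
    while end < len(text) and text[end] == TATWEEL:
        end += 1
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    # Use case 1: the run is a "holder" next to a combining mark, with
    # nothing but a boundary or another mark on its other side.
    holder = (is_mark(after) and (is_boundary(before) or is_mark(before))) or (
        is_mark(before) and (is_boundary(after) or is_mark(after))
    )
    # Use case 2: a single tatweel closing a sequence, followed by a
    # spacing character, punctuation, or a combining mark.
    trailing = (
        end - start == 1
        and not is_boundary(before)
        and (is_boundary(after) or is_mark(after))
    )
    return holder or trailing
```

With this sketch, both tatweel runs in the ghunna holder example and the trailing tatweel in an abbreviation pass, while a tatweel stretched between two letters does not.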

Really what the original kashida/tatweel check was likely trying to prevent is strings like this:

صـــفــــحـــــے

This is fair enough, but the check should be limited to [actual letter] + tatweel/kashida(s) + [actual letter] so that errors are not thrown for other contexts which have more valid use cases.
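As a minimal sketch of that narrower rule (again hypothetical, not the actual Weblate check), one could flag a run of tatweels only when the characters immediately before and after it are both letters. The name `find_stretching` is made up for illustration, and "actual letter" is approximated here as any character in a Unicode letter category other than the tatweel itself, which is categorized as a modifier letter.

```python
import re
import unicodedata

TATWEEL = "\u0640"


def is_actual_letter(ch):
    # Tatweel has category Lm, so it must be excluded explicitly.
    return ch != TATWEEL and unicodedata.category(ch).startswith("L")


def find_stretching(text):
    """Return (start, end) spans of tatweel runs enclosed by two letters."""
    hits = []
    for match in re.finditer(TATWEEL + "+", text):
        start, end = match.span()
        if start == 0 or end == len(text):
            # A run at the very start or end cannot sit between two letters.
            continue
        if is_actual_letter(text[start - 1]) and is_actual_letter(text[end]):
            hits.append((start, end))
    return hits
```

Under these assumptions a stretched word is still reported, while a trailing tatweel in an abbreviation or a tatweel holding a combining mark is left alone.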

I already tried

  • I've read and searched the documentation.
  • I've searched for similar issues in this repository.

Steps to reproduce the behavior

No response

Expected behavior

No response

Screenshots

No response

Exception traceback

No response

How do you run Weblate?

weblate.org service

Weblate versions

No response

Weblate deploy checks

No response

Additional context

No response

nijel added the enhancement, help wanted, and backlog labels on Oct 17, 2022
@github-actions

This issue has been added to the backlog. It is not scheduled on the Weblate roadmap, but it eventually might be implemented.

In case you need this feature soon, please consider helping or push it by funding the development.

@triallax
Contributor

triallax commented Mar 5, 2023

@bgo-eiu this is a fair point, I failed to consider in my original issue that Arabic is not the only language using the Arabic script. Sorry for that. :)

@nijel
Member

nijel commented Mar 6, 2023

This should be easy to fix; it's just a matter of adjusting the regular expression:

kashida_regex = (
    # Allow kashida after certain letters
    "(?<![\u0628\u0643\u0644])"
    # List of kashida letters to check
    "[\u0640\uFCF2\uFCF3\uFCF4\uFE71\uFE77\uFE79\uFE7B\uFE7D\uFE7F]"
)
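For anyone picking this up, the quoted pattern can be exercised on its own to see which strings it matches. The sketch below only illustrates the pattern's behavior: wrapping it in `re.compile`, the helper name `is_flagged`, and the sample strings are my additions, and how Weblate actually applies the pattern is not shown here.

```python
import re

# The pattern quoted above, compiled for experimentation.
kashida_regex = re.compile(
    # Allow kashida after certain letters
    "(?<![\u0628\u0643\u0644])"
    # List of kashida letters to check
    "[\u0640\uFCF2\uFCF3\uFCF4\uFE71\uFE77\uFE79\uFE7B\uFE7D\uFE7F]"
)


def is_flagged(text):
    """Return True if the quoted pattern matches anywhere in text."""
    return kashida_regex.search(text) is not None


# Tatweel right after beh (U+0628) is exempted by the lookbehind.
beh_with_tatweel = "\u0628\u0640"
# Meem (U+0645) is not in the lookbehind set, so this abbreviation matches.
meem_with_tatweel = "\u0645\u0640"
# Tatweel holder around the ghunna mark (U+0658) matches as well.
ghunna_holder = "\u0640\u0640\u0658\u0640"

print(is_flagged(beh_with_tatweel))
print(is_flagged(meem_with_tatweel))
print(is_flagged(ghunna_holder))
```

The lookbehind only inspects the single character before each kashida, so any kashida not directly preceded by one of the three exempted letters matches, including the abbreviation and holder cases described in the issue.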

nijel added the good first issue and hacktoberfest labels on Mar 6, 2023
@github-actions

github-actions bot commented Mar 6, 2023

This issue seems to be a good fit for newbie contributors. You are welcome to contribute to Weblate! Don't hesitate to ask any questions you would have while implementing this.

You can learn about how to get started in our contributors documentation.


3 participants