LOCALIZATION REQUEST: #3266

Closed
alvynabranches opened this issue Sep 20, 2021 · 55 comments

@alvynabranches
Contributor

Use this template to request a new localizable language that is currently not available on Pontoon.

Note: This issue only applies to the web interface language - in order to activate language contributions on Common Voice you will also need to ensure 5,000+ sentences are available to be read in that language. Please refer to the full Language documentation for more details.

Language name
What language would you like to add?
Konkani

Language code
ISO 639 2/B

Language size
Number of active speakers of this language in the world - 3 million to 3.5 million

Plural forms
How would you translate the following in this language?

0 rocks - fatôr na
1 rock - ek fatôr
2 rocks - dah fatôr
3 rocks - tin fatôr
4 rocks - char fatôr
5 rocks - panch fatôr
10 rocks - dah fatôr
20 rocks - vis fatôr
100 rocks - sambar fatôr
1000 rocks - hôzar fatôr
I see 0 rocks on the ground - Maka zomnir fatôr diso na
I see 1 rock on the ground - Maka zomnir ek fatôr dista
I see 10 rocks on the ground - Maka zomnir dah fatôr dista
I see rocks on the ground - Maka zomnir khup fatôr dista
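
For anyone weighing these samples against the plural question, a minimal Python sketch can check whether the noun changes form across counts. The `phrases` mapping is copied from the list above; the helper name `noun_forms` is hypothetical, and the check is only a heuristic on this small sample, not a statement of the CLDR plural rule for Konkani.

```python
import unicodedata

# Counted phrases copied verbatim from the request above (count -> phrase).
phrases = {
    0: "fatôr na",
    1: "ek fatôr",
    2: "dah fatôr",
    3: "tin fatôr",
    4: "char fatôr",
    5: "panch fatôr",
    10: "dah fatôr",
    20: "vis fatôr",
    100: "sambar fatôr",
    1000: "hôzar fatôr",
}


def noun_forms(samples, stem="fat"):
    """Collect every distinct word in the samples that starts with the stem."""
    stem = unicodedata.normalize("NFC", stem)
    forms = set()
    for phrase in samples.values():
        for word in unicodedata.normalize("NFC", phrase).split():
            if word.startswith(stem):
                forms.add(word)
    return forms


forms = noun_forms(phrases)
# A single form suggests the noun itself is not inflected for number
# in these samples; more than one form would point to separate plural forms.
print(len(forms), sorted(forms))
```

If the set holds a single form, the samples alone give no evidence that the noun needs separate plural strings.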

Pontoon manager
https://pontoon.mozilla.org/contributors/eTnOEuxbFg7f5DhwUCTUBLtcqKg/

Language Script
What is the name of the language scripts used to write your language?
Latin, Roman, Devanagari, Kannada, Malayalam, Arabic. (Latin, Roman preferred, also Devanagari)

@alvynabranches alvynabranches added the Localisation New language requests and or issues regarding localisation (l10n) label Sep 20, 2021
@Heyhillary
Contributor

Hey @alvynabranches, thanks so much for the request.

Could you explain the multiple language scripts, to help me understand your request?

@MichaelKohler
Member

I will add this to the sentence collector once it is added to Pontoon, as the ISO code is not yet clear to me.

@ftyers
Collaborator

ftyers commented Sep 20, 2021

As far as I understand, Devanagari is the official script, but in Goa, Latin can be used too. AFAIK the ISO code is kok.

@alvynabranches
Contributor Author

alvynabranches commented Sep 21, 2021 via email

@alvynabranches
Contributor Author

alvynabranches commented Sep 21, 2021 via email

@ftyers
Collaborator

ftyers commented Sep 21, 2021

The Konkani Wikipedia seems to be in both scripts (with the language code gom).

In any case it does not matter exactly which script is chosen first with a multi-script language, but it should be possible to find at least 5000 public domain sentences in that script. For Serbian for example we went with Cyrillic over Latin because it was easy to find public domain texts and that was what the requester asked for. But the final choice is with the community and contributors.

@alvynabranches
Contributor Author

alvynabranches commented Sep 21, 2021 via email

@ftyers
Collaborator

ftyers commented Sep 21, 2021

Whichever one is more convenient. I know some people working on NLP for Konkani and they have ~10k sentences in Konkani in Devanagari script that could be potentially used to bootstrap the sentence collector.

Is it possible to convert sentences to and from Latin and Devanagari automatically?

@alvynabranches
Contributor Author

alvynabranches commented Sep 21, 2021 via email

@alvynabranches
Contributor Author

alvynabranches commented Sep 21, 2021 via email

@ftyers
Collaborator

ftyers commented Sep 21, 2021

The sentence collector is not available for Konkani yet. First we need to work out Pontoon and then enable it in the sentence collector. So a decision on script for localisation needs to be made first. We are currently working on this process, you can check out this link for an opportunity to participate in the New Language Workflow development.

@anniedhempe, do you have any thoughts on this? In terms of language code and script.

@alvynabranches
Contributor Author

alvynabranches commented Sep 21, 2021 via email

@Heyhillary
Contributor

Hey @alvynabranches,

Thanks so much for the clarification. I'm currently making Konkani available on Pontoon, but I am struggling to find the CLDR plurals for Konkani. By any chance, do you know them? In addition, can you confirm the plural rule is n=2?

Many thanks,

Hillary

@alvynabranches
Contributor Author

alvynabranches commented Sep 21, 2021 via email

@anniedhempe

I can contribute sentences in Devanagari script for Konkani.

@alvynabranches
Contributor Author

alvynabranches commented Sep 24, 2021 via email

@anniedhempe

anniedhempe commented Sep 24, 2021 via email

@alvynabranches
Contributor Author

alvynabranches commented Oct 1, 2021 via email

@Heyhillary
Contributor

Hey @alvynabranches and @anniedhempe

Thanks for your patience. I would like to ask if you could share your thoughts on the following.

Currently, the Common Voice platform only supports one script per language. However, we have already identified that this is a problem for many communities, and we are excited to let you know that addressing it is on the roadmap for around Q2 2022!

Currently, localisations are tied to the dataset, so if we make different localisations as per the request, we would be splitting the Konkani language into two distinct datasets. Given that in the next 6 months we already intend to support multiple scripts for a language, you may wish to wait and make use of that option.

To recap, there are three paths:

A. We enable both scripts but as completely different languages that will not be part of the same dataset, e.g. you would have Konkani (Latin) as one dataset and Konkani (Devanagari) as another. This would be available now, but would fragment the datasets and make them more difficult to use together.
B. Wait to progress Konkani altogether until the feature for multiple scripts is launched in the new year (circa May 2022).
C. Continue to localise Konkani and launch the dataset in one script, and then next year we will add the other. You would need to agree on which script is better to start with as a community and let us know.

Please let me know what you think.

@alvynabranches
Contributor Author

alvynabranches commented Oct 4, 2021 via email

@ftyers
Collaborator

ftyers commented Oct 4, 2021

Ideally there would be agreement from users of both scripts as to the best way forward. Also, "this is fine for me" does not really work here because there are issues of who gets which language code etc. It will make a lot of work for the team if the language codes have to be switched later, so it's better to come up with a consensus now. If you'd like to make a suggestion on how to deal with both scripts with respect to language code and dataset, we'd love to hear it.

@alvynabranches
Contributor Author

alvynabranches commented Oct 4, 2021 via email

@alvynabranches
Contributor Author

alvynabranches commented Oct 4, 2021 via email

@ftyers
Collaborator

ftyers commented Oct 4, 2021

I completely agree that if both scripts are available it will be good for the communities. The question remains about how to assign language codes. One possible option would be to use

  • knn for Konkani (in Devanagari)
  • gom for Goan Konkani (in Latin)

These could then be merged into kok at a possible future date. However, this is unsatisfactory because Goan Konkani may also be written in Devanagari.

Another option would be to do what Wikipedia does, and to include the sentences in both scripts:

[image]

When making the decision we need to take into account the whole language community, not just one organisation. If you would like an organisation-specific Common Voice, then you can set up your own instance. We are looking for a holistic solution, not a solution that only takes one subset of the speaker population into account.

@alvynabranches
Contributor Author

alvynabranches commented Oct 4, 2021 via email

@ftyers
Collaborator

ftyers commented Oct 4, 2021

Ok, let's get some input from some other people and make sure that we are getting an inclusive consensus. Do you know anyone at Goa University you could contact?

@alvynabranches
Contributor Author

alvynabranches commented Oct 4, 2021 via email

@alvynabranches
Contributor Author

alvynabranches commented Oct 5, 2021 via email

@Heyhillary
Contributor

Heyhillary commented Oct 8, 2021

Hey everyone, thanks for sharing your thoughts regarding the localisation of Konkani.

To confirm, I will be enabling Pontoon with the following ISO codes (once there is community consensus):

  • knn for Konkani (in Devanagari)
  • gom for Goan Konkani (in Latin)

Alvyn, could you share the contact details for Jyoti?

One way we can communicate how the script decision was made is through the language description on Pontoon and the creation of an L10N style guide, which lists styles for languages on Pontoon.

@MichaelKohler, on the sentence collector, is there any way we could signify which script is being used? As a suggestion, could it be under the name of the language?

@ftyers
Collaborator

ftyers commented Oct 8, 2021

Probably for sentence collector the easiest way is in the native name, e.g.

  • knn Konkani (कोंकणी) and
  • gom Konkani (Konknni)

Is the eventual plan to merge these two?
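
The labelling idea above could be sketched roughly as follows. This is only an illustration in Python: the `LOCALES` table and the `display_label` helper are hypothetical names, not the sentence collector's actual data structures; the codes and native names are the ones suggested in this thread.

```python
# Hypothetical lookup table: language code -> English name and native name.
# The native name doubles as a hint for which script the locale uses.
LOCALES = {
    "knn": {"name": "Konkani", "native": "कोंकणी"},
    "gom": {"name": "Konkani", "native": "Konknni"},
}


def display_label(code):
    """Build a picker label such as 'Konkani (Konknni)' for a language code."""
    entry = LOCALES[code]
    return f'{entry["name"]} ({entry["native"]})'


for code in LOCALES:
    print(code, display_label(code))
```

Showing the native name next to the shared English name lets contributors tell the two entries apart without needing to know the language codes.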

@MichaelKohler
Member

Yes, what @ftyers said is spot on. Do you think this would be clear enough for contributors?

@alvynabranches
Contributor Author

alvynabranches commented Oct 21, 2021 via email

@MichaelKohler
Member

We will add it to the Sentence Collector once it's added to Pontoon.

@Heyhillary
Contributor

Hey @anniedhempe,

Are you happy with the proposed solution(s)?

Please let us know your thoughts so we can proceed.

@alvynabranches
Contributor Author

alvynabranches commented Nov 7, 2021 via email

@Heyhillary
Contributor

Hey @alvynabranches,

As per my previous message, I want to ensure there is some community consensus on this before starting to localise. I have reached out to @anniedhempe, who was involved in the discussion via GitHub, and I am waiting for her to weigh in as well. You mentioned that you had a contact, Jyoti, who might also be interested in discussing the script choice.

By any chance, have you reached out to Jyoti?

@anniedhempe

anniedhempe commented Nov 29, 2021 via email

@Heyhillary
Contributor

Sir, I was using the Devanagari script of Konkani for my work. Even Madam Jyoti is working with the Devanagari script of the language. I would like to know what I am supposed to do. Annie.

Hey Annie, thanks for your response. On October 4th, I shared three options for handling the different scripts. Could you please share your thoughts on them?

Reference: #3266 (comment)

@anniedhempe

anniedhempe commented Nov 29, 2021 via email

@Heyhillary
Contributor

Hey Annie,

Thanks for clarifying your position. Overall, I understand your preference for the script, but not which of the three proposed options you prefer.

The three options are:
A. We enable both scripts but as completely different languages that will not be part of the same dataset, e.g. you would have Konkani (Latin) as one dataset and Konkani (Devanagari) as another. This would be available now, but would fragment the datasets and make them more difficult to use together.
B. Wait to progress Konkani altogether until the feature for multiple scripts is launched in the new year (circa May 2022).
C. Continue to localise Konkani and launch the dataset in one script, and then next year we will add the other. You would need to agree on which script is better to start with as a community and let us know.

Also, could you please refer to me as "Ms" as my pronouns are she/her.

Thank you

@anniedhempe

anniedhempe commented Nov 29, 2021 via email

@Heyhillary
Contributor

Heyhillary commented Jan 11, 2022

Hey @alvynabranches and @anniedhempe,

Sorry for the delay in response.

Thanks for your contributions to this discussion. Your responses imply that you prefer option A, which I can implement.

A. We enable both scripts but as completely different languages that will not be part of the same dataset, e.g. you would have Konkani (Latin) as one dataset and Konkani (Devanagari) as another. This would be available now, but would fragment the datasets and make them more difficult to use together.

The codes will be knn for Konkani (in Devanagari) and gom for Goan Konkani (in Latin).

Earlier in this conversation, we highlighted the inclusion of variants for languages and the use of BCP-47, which provides flexibility for orthography. I would like to encourage you to check out our blog, which explains the variants in more detail: https://foundation.mozilla.org/en/blog/how-we-are-making-common-voice-even-more-linguistically-inclusive/
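
To illustrate how BCP-47 can carry the script alongside the language, here is a minimal Python sketch. It handles only the "language" plus optional "script" subset of BCP-47, the helper names `make_tag` and `split_tag` are hypothetical, and tags such as `kok-Deva` and `kok-Latn` are illustrative assumptions built from the `kok` code mentioned earlier and the ISO 15924 script codes `Deva` and `Latn`; they are not confirmed as the tags Common Voice will use.

```python
import re

# Simplified pattern: a 2-3 letter language subtag, optionally followed by
# a 4-letter script subtag (ISO 15924). Full BCP-47 allows more subtags.
TAG_RE = re.compile(r"^(?P<language>[a-z]{2,3})(?:-(?P<script>[A-Z][a-z]{3}))?$")


def make_tag(language, script=None):
    """Compose a tag like 'kok-Deva' from a language code and a script code."""
    tag = language.lower()
    if script:
        tag += "-" + script.title()  # script subtags are conventionally title case
    if not TAG_RE.match(tag):
        raise ValueError(f"not a simple language[-script] tag: {tag!r}")
    return tag


def split_tag(tag):
    """Split a simple tag into (language, script); script is None if absent."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a simple language[-script] tag: {tag!r}")
    return match.group("language"), match.group("script")


print(make_tag("kok", "deva"))
print(make_tag("kok", "latn"))
print(split_tag("gom"))
```

With tags of this shape, both scripts could share one language subtag while remaining distinguishable, which is the flexibility for orthography referred to above.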

Could we host a virtual meeting to discuss the possible variants with you?

@anniedhempe, could you please provide a Pontoon username, so I can set you up for Konkani (in Devanagari)?

@anniedhempe

anniedhempe commented Jan 11, 2022 via email

@Heyhillary
Contributor

Thanks @anniedhempe, I will send you a few dates once @alvynabranches responds. Also, you can create a profile on Pontoon using this link: https://pontoon.mozilla.org/accounts/fxa/login/

@Heyhillary
Contributor

Hey @alvynabranches,

Thanks for your patience.

Goan Konkani is now syncing with Common Voice and will be available for localisation shortly.

I have also added you as a team manager. This means all your work will go to staging (and later to production) without peer review. I hope that, through your work, others will join you to collaborate on the different phases of this project.

Here are a few docs that will help you with using Pontoon, working with other contributors, and localizing the file written in .ftl:

- Pontoon user guide
- Roles and responsibilities
- Localizing in Fluent
- Common Voice web part project info

Thank you and welcome to the Mozilla l10n and Common Voice community!

If you have any questions, join us on the community chat. Also, if you would like any community support, I'm happy to share ideas and support you.

@alvynabranches
Contributor Author

alvynabranches commented Jan 11, 2022 via email

@alvynabranches
Contributor Author

alvynabranches commented Jan 11, 2022 via email

@anniedhempe

anniedhempe commented Jan 11, 2022 via email

@Heyhillary
Contributor

All you need to do next is share your profile user link with me. For example, mine is https://pontoon.mozilla.org/contributors/UJeECCxearEuO2zZfY6iyzbuPn8/

You can access this via the menu, when clicking on your username.

@MichaelKohler
Member

I have checked the Common Voice sentence collector now. It shows only "gom"; some people might not know that this is Konkani in Latin script. It would be better if it were specified in more detail. Even for the Devanagari script, it just shows "knn".

That will be updated on the next deployment after the native language name is added in the Pontoon strings.

@anniedhempe

anniedhempe commented Jan 11, 2022 via email

@Heyhillary
Contributor

Heyhillary commented Jan 12, 2022 via email

@anniedhempe

anniedhempe commented Jan 15, 2022 via email

@Heyhillary
Contributor

https://pontoon.mozilla.org/contributors/Daa6FAmK9J0eZ9O9ekevzG9b0aY/ Is this OK?

Thanks Annie, I have now made you the manager for Konkani (Devanagari). This means all your work will go to staging (and later to production) without peer review. I hope that, through your work, others will join you to collaborate on the different phases of this project.

Here are a few docs that will help you with using Pontoon, working with other contributors, and localizing the file written in .ftl:

- Pontoon user guide
- Roles and responsibilities
- Localizing in Fluent
- Common Voice web part project info

Thank you and welcome to the Mozilla l10n and Common Voice community!

If you have any questions, join us on the community chat. Also, if you would like any community support, I'm happy to share ideas and support you.


5 participants