Specification and workflow needed: How to translate/localize links within the Qubes (doc) website? #3547
tokideveloper
Feb 6, 2018
Concerning automated link translation, see this idea. Any comments?
andrewdavidwong added the task and localization labels on Feb 7, 2018
andrewdavidwong added this to the Documentation/website milestone on Feb 7, 2018
andrewdavidwong
Feb 7, 2018
Member
Use relative links instead of absolute ones?
I recently updated the documentation guidelines on this point:
https://www.qubes-os.org/doc/doc-guidelines/#markdown-conventions
(In short: Yes, please use relative instead of absolute paths.)
tokideveloper
Feb 7, 2018
Use relative links instead of absolute ones?
I recently updated the documentation guidelines on this point:
https://www.qubes-os.org/doc/doc-guidelines/#markdown-conventions
(In short: Yes, please use relative instead of absolute paths.)
I'm not sure we're talking about the same thing. Maybe I used the term "relative link" ambiguously.
By "relative link" I mean a "relative path" (not a URL), in the sense that the path does not begin with a slash /, like local paths on a Linux machine. URLs, however, are always absolute in my understanding.
For example, while https://www.qubes-os.org/doc/doc-guidelines/ and /doc/doc-guidelines/ are absolute paths by my definition, the paths ../, ../../intro/ and intro/ are relative ones. (If these relative links appear on the page /doc/doc-guidelines/, they lead to /doc/, /intro/ and /doc/doc-guidelines/intro/ respectively. See my prototype.)
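The resolution of the three relative paths can be checked with a small sketch. It assumes that Python's standard `urllib.parse.urljoin` resolves links the way a browser does; the helper name `resolve` is only illustrative and not part of the prototype mentioned above.

```python
from urllib.parse import urljoin

# Page on which the example links appear (taken from the comment above).
BASE = "https://www.qubes-os.org/doc/doc-guidelines/"


def resolve(link: str, base: str = BASE) -> str:
    """Resolve a link relative to the page `base`, as a browser would."""
    return urljoin(base, link)


# Print where each kind of path ends up when used on BASE.
for link in ("../", "../../intro/", "intro/", "/doc/doc-guidelines/"):
    print(f"{link!r:25} -> {resolve(link)}")
```

Paths that start with a slash ignore the current page entirely, while the others depend on where the linking page lives.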
andrewdavidwong
Feb 8, 2018
Member
Oh, I see. Yes, I think we're talking about two different things. My main concern is to avoid https://www.qubes-os.org/doc/ in favor of /doc/, since the former prevents easy navigation on a locally-served copy of the website.
tokideveloper
Feb 8, 2018
Oh, I see. Yes, I think we're talking about two different things. My main concern is to avoid https://www.qubes-os.org/doc/ in favor of /doc/, since the former prevents easy navigation on a locally-served copy of the website.
I see. Thank you.
So, now I want to discuss the use of
- relative paths (../doc-guidelines),
- absolute paths (/doc/doc-guidelines) and
- "prefixed paths" ({{ page.langprefix }}/doc/doc-guidelines).
Relative Paths
Advantages
- No absolute prefix needed. Thus, no prefix to adapt. Thus, no explicit localization needed (besides fragments?).
Disadvantages
- All paths in all the canonical files have to be converted first.
- It is harder to see where a relative path points to. Thus, rather error-prone.
- When copying parts of an existing page to another page, all the relative paths have to be checked.
Absolute Paths
Advantages
- Easy to see where an absolute path points to.
- Robust when moving/copying (parts of) pages.
- No conversion of the existing paths needed.
Disadvantages
- They have to be localized manually. Automated localization could be hard, too.
"Prefixed Paths"
Advantages
- Easy to see where a "prefixed path" points to.
- Robust when moving/copying (parts of) pages.
- When converting existing paths, only the language-dependent ones have to be prefixed.
- Localization can be automated quite easily since only the YAML front matters need to be localized. Thus, much less error-prone and more generic.
Disadvantages
- Prefixing of existing paths needed, plus extending the YAML front matter (*).
(*) I tried to set a variable langprefix within the Liquid code of my langswitch prototype, hoping that the variable would exist when printing the {{ content }}, but it does not seem to work.
Hint: When I tried out "prefixed paths", some strange behaviour appeared (paths with a literal leading slash in the source MD file became relative ones in the produced HTML files). So, one should test "prefixed paths" with all ways of creating links in advance.
andrewdavidwong
Feb 9, 2018
Member
Why would absolute paths have to be localized manually when the others don't?
tokideveloper
Feb 9, 2018
Why would absolute paths have to be localized manually when the others don't?
Let's say that, for example, the page /doc/doc-guidelines/ shall link to /doc/.
If this is done by the absolute path /doc/ then translators have to translate it to /de-DE/doc/.
In contrast, a relative path like ../, pointing to /doc/, stays ../ in the translated version. Thus, no "translation" is needed.
Also, the "prefixed path" {{ page.langprefix }}/doc/ does not need to be "translated" (it's still {{ page.langprefix }}/doc/ in the translated version). However, the prefix {{ page.langprefix }} must already exist in the canonical version (and therefore has to be inserted, but only once for all translations). In addition, the value for page.langprefix must be set in the YAML front matter (in this example to the value /de-DE), but this can easily be done by an awk script or something.
Thus, both relative and "prefixed" paths don't need an explicit translation. They are already translated implicitly.
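Setting `page.langprefix` in the YAML front matter could look roughly like the following sketch, written in Python rather than awk. It assumes that each file starts with a front matter block delimited by two `---` lines; the function name `add_langprefix` is hypothetical.

```python
def add_langprefix(text: str, prefix: str) -> str:
    """Insert `langprefix: <prefix>` into a Jekyll-style YAML front matter.

    Assumes the file starts with a '---' line and that a second '---' line
    closes the front matter. Files without front matter, or with an
    existing langprefix key, are returned unchanged.
    """
    lines = text.split("\n")
    if not lines or lines[0].strip() != "---":
        return text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            closing = i
            break
    else:
        return text  # no closing delimiter found
    if any(line.startswith("langprefix:") for line in lines[1:closing]):
        return text
    # Add the key as the last entry before the closing delimiter.
    lines.insert(closing, f"langprefix: {prefix}")
    return "\n".join(lines)
```

Run once per translated file with the value for that language (for example `/de-DE`), this leaves every `{{ page.langprefix }}/doc/` link in the body untouched.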
andrewdavidwong
Feb 10, 2018
Member
Any reason we can't just do a recursive find-and-replace? Something like:
$ find . -type f -print0 | xargs -0 sed -i 's#/doc/#/de-DE/doc/#g'
tokideveloper
Feb 14, 2018
Any reason we can't just do a recursive find-and-replace? Something like:
$ find . -type f -print0 | xargs -0 sed -i 's#/doc/#/de-DE/doc/#g'
I think it's hard to decide whether a string is a link or not without an MD/HTML/YAML parser.
But even with an appropriate parser, there could be corner cases where it's still hard to decide.
Let's say there are these lines:
<a href="/">To the root directory of the canonical/English/official version.</a>
...
<a href="/">To the root directory of the localized version in your language.</a>
...
<img src="/to/the/language-independent/logo.png">
...
Use `[here I am][/somewhere/in/the/repo]` to create a labeled link.
...
<a href="http://example.org/doc/">To the doc's root directory on another planet.</a>
The slashes must be interpreted differently, depending on the context, and thus, they could need different translations.
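To illustrate the concern, here is a small Python sketch that compares a plain substitution (the same effect as the sed expression) with one that only touches `/doc/` at the start of an `href` value. The sample lines and both function names are made up for illustration.

```python
import re

SAMPLE = '''<a href="/doc/">Local docs</a>
<img src="/to/the/language-independent/logo.png">
<a href="http://example.org/doc/">Docs on another site</a>'''


def naive_replace(text: str, lang: str) -> str:
    """Same effect as the sed expression s#/doc/#/<lang>/doc/#g."""
    return text.replace("/doc/", f"/{lang}/doc/")


# Only match /doc/ when it is the very start of an href attribute value.
HREF_DOC = re.compile(r'(href=")(/doc/)')


def href_only_replace(text: str, lang: str) -> str:
    """Prefix /doc/ only where it begins an href attribute value."""
    return HREF_DOC.sub(lambda m: f"{m.group(1)}/{lang}{m.group(2)}", text)
```

The plain substitution also rewrites the link to example.org, while the context-aware one leaves it alone. Neither can tell which of two identical `href="/"` links is meant to stay on the canonical version, so some manual review remains.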
andrewdavidwong
Feb 15, 2018
Member
Why not simply run different commands on .md and .html files, or do the recursive find-and-replace only on the .md files (which are the vast majority), then manually edit the .html files?
tokideveloper
Feb 15, 2018
Why not simply run different commands on .md and .html files, or do the recursive find-and-replace only on the .md files (which are the vast majority), then manually edit the .html files?
Okay, I see that the vast majority should be handled automatically while some corner cases should be inspected manually. So, what about this compromise:
- Get a list of all existing permalinks.
- Copy all files in the repo. Do the next two steps only on the copies.
- Automatically prefix all (permalink) paths in all files with a unique placeholder.
- Manually check the placeholders to be in the correct place and nowhere else.
- Upload the temporary copy to Transifex.
- Automatically replace the placeholders on a temporary copy with the language-dependent path prefix /de-DE etc.
- On future changes, do the above steps only on the differences.
In more detail:
(1) First, we get a list of all existing permalinks (like /, /doc/ etc.):
cd REPO
grep -re 'permalink: ' . | grep --invert-match -e './_config.yml' | cut -f 2 -d' ' | grep -e '^/' | sort
(2) Then we copy the files of the canonical version to a dedicated directory, let's call it new_lang_prefixed_DATETIME where DATETIME is the current date and time.
(3) There, into all files, we automatically insert a (hopefully) unique prefix like %LangPrefix% in front of all permalink strings that look like translatable paths, depending on the language HTML/MD/YAML etc., for example:
- [/doc/] to [%LangPrefix%/doc/] in MD files,
- (/) to (%LangPrefix%/) in MD files,
- permalink: /doc/anti-evil-maid/ to permalink: %LangPrefix%/doc/anti-evil-maid/ in the YAML front matters,
- href="/doc/" to href="%LangPrefix%/doc/" in HTML files and
- src="/" to src="%LangPrefix%/" in HTML files.
This way, at least all paths should be covered. Hopefully, we won't miss any path.
(4) In the next step, we manually check every occurrence of %LangPrefix% to confirm that it should become /de-DE etc. in the final files. Where a check fails, we replace the prefix %LangPrefix% with %NoLangPrefix%.
(5) Upload the files to Transifex and tell the translators not to translate these special prefixes.
(6) Then we automatically go through all translation languages and all translated files and modify them by replacing all occurrences of %LangPrefix% with /de-DE etc. and %NoLangPrefix% with the empty string.
(7) In the future, when some of the canonical files change, we copy only the modified files to a new new_lang_prefixed_DATETIME directory and repeat the steps described above only on the differences from the most recent new_lang_prefixed_DATETIME directory (via an appropriate use of the diff tool, for example). This way, we reduce the effort and focus only on the changes.
Of course, obsolete new_lang_prefixed_DATETIME directories may be removed. The directories might be useful if a new translation language appears since the newest version of a path-prefixed and manually inspected file should be uploaded. So, the directories would work as a cache.
EDIT: I swapped steps 5 and 6 to be able to upload only language-independent versions.
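Steps (3) and (6) can be sketched in Python as follows. This is only an illustration under assumptions: the regular expression covers the contexts listed in step (3) in a simplified form (`](`, `][`, `href="`, `src="` and `permalink: `), and the function names are hypothetical.

```python
import re

LANG = "%LangPrefix%"
NO_LANG = "%NoLangPrefix%"

# Text that may precede a translatable path, followed by a single leading
# slash. The lookahead skips protocol-relative URLs such as //example.org/.
PREFIXABLE = re.compile(
    r'(\]\(|\]\[|href="|src="|^permalink: )(?=/(?!/))', re.MULTILINE
)


def insert_placeholders(text: str) -> str:
    """Step (3): put %LangPrefix% in front of every candidate path."""
    return PREFIXABLE.sub(lambda m: m.group(1) + LANG, text)


def resolve_placeholders(text: str, lang_code: str) -> str:
    """Step (6): turn reviewed placeholders into the final paths."""
    return text.replace(LANG, "/" + lang_code).replace(NO_LANG, "")
```

Between the two calls, the manual check of step (4) changes `%LangPrefix%` to `%NoLangPrefix%` wherever a path is language-independent, for example an image path or a link shown only as a syntax example.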
tokideveloper
Feb 15, 2018
Of course, obsolete new_lang_prefixed_DATETIME directories may be removed. The directories might be useful if a new translation language appears since the newest version of a path-prefixed and manually inspected file should be uploaded. So, the directories would work as a cache.
Another method could be to overwrite the files in new_lang_prefixed_DATETIME with newer versions, rather than storing new versions in their own directories. Thus, only one new_lang_prefixed_DATETIME directory is needed, making the suffix _DATETIME superfluous.
tokideveloper
Feb 16, 2018
In the algorithm above, I forgot the redirect-from links. So, whenever it's about permalinks then all redirect-from links must be considered, too.
andrewdavidwong
Feb 17, 2018
Member
Okay, I see that the vast majority should be handled automatically while some corner cases should be inspected manually. So, what about this compromise: [...]
It sounds like this procedure would be something the localization team (including you) performs. If it doesn't entail any changes to the canonical English documentation, the details of the procedure for accomplishing the agreed-upon end result are up to you.
tokideveloper
Feb 17, 2018
If it doesn't entail any changes to the canonical English documentation, the details of the procedure for accomplishing the agreed-upon end result are up to you.
Okay, thank you! Of course, we'll try to minimize possible impacts on the canonical English documentation. But some things are not yet clear to me:
- Where (directory and/or repo) can we put all our folders (languages, doc etc.) and files (content, layout etc.) concerning translations?
- When it's about going live, the canonical English documentation should insert a language switch which contains links that are labeled with translated words and pointing to unofficial (i.e. translated) pages. Thus, (a) some minor adjustments on the canonical documentation and (b) some trust in translators etc. seem to be necessary. How to handle this?
andrewdavidwong
Feb 18, 2018
Member
Where (directory and/or repo) can we put all our folders (languages, doc etc.) and files (content, layout etc.) concerning translations?
@marmarek is going to make (a) separate submodule(s) for the actual translated content (#2925).
The "unverified translation" warning layouts are trickier. For example, we can't allow unverified translations of the warning itself, since a malicious translator could alter the warning such that it's no longer about the translation being unverified. So, those will probably have to stay in the main repo.
When it's about going live, the canonical English documentation should insert a language switch which contains links that are labeled with translated words and pointing to unofficial (i.e. translated) pages. Thus, (a) some minor adjustments on the canonical documentation and (b) some trust in translators etc. seem to be necessary. How to handle this?
I think this is what #2930 is about.
tokideveloper referenced this issue on Feb 20, 2018: Create separate untrusted submodule for translated files #2925 (Open)
tokideveloper
Feb 20, 2018
The "unverified translation" warning layouts are trickier. For example, we can't allow unverified translations of the warning itself, since a malicious translator could alter the warning such that it's no longer about the translation being unverified. So, those will probably have to stay in the main repo.
I see and agree. But how can we verify that a translation of the warning is correct? Spontaneously, I got this idea: we feed the translated warning into several machine translation services, let each one translate the string into all languages we know well enough, and then check the translations for plausibility.
andrewdavidwong
Feb 21, 2018
Member
Sounds good to me. Similarly, given how short the warning is, we could try to have multiple (hopefully) independent human translators translate (or verify) it for each language.
andrewdavidwong assigned tokideveloper on Mar 18, 2018
tokideveloper
Mar 19, 2018
How to deal with fragments (*) in links?
Let me explain why it is problematic. The main concerns are the headings which get IDs created by the Markdown processor.
Let's say a translator wants to translate a link with fragment /file/#good-morning pointing to the heading Good Morning! in the document /file/.
To know how to translate it correctly, for example into German, the translator has to do several steps:
(1) Find the file the link/URL of the fragment is pointing to. (That is, in the list of files in Transifex, find the MD file with the given permalink /file/ in the YAML header.)
(2) In that file, look for the correct heading of the target of the fragment: Good Morning!.
(3) Look for the translation of Good Morning!, which is Guten Morgen!. If it's not yet translated then translate it first.
(4) Transform a copy of that translation (Guten Morgen! to guten-morgen) to match the ID the heading Guten Morgen! will have after processing the MD file to an HTML file.
(5) Enter that transformed result (guten-morgen) as the translated fragment. The resulting link is /de-DE/file/#guten-morgen (note that inserting /de-DE is another problem not discussed in this post).
(Note that step 2 and subsequent ones are different if there is no heading but any HTML element with that ID.)
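The transformation in steps 4 and 5 can be approximated with a short Python sketch. The ID rule used here is an assumption (keep word characters, spaces and hyphens, lowercase, then turn spaces into hyphens); the Markdown processor actually used by the site may differ, for example for duplicate headings or non-ASCII text. Both function names are hypothetical.

```python
import re


def heading_id(heading: str) -> str:
    """Approximate the ID a Markdown processor derives from a heading.

    Assumption: everything except word characters, spaces and hyphens is
    dropped, the rest is lowercased, and spaces become hyphens.
    """
    kept = re.sub(r"[^\w\- ]", "", heading)
    return kept.strip().lower().replace(" ", "-")


def translated_link(path: str, translated_heading: str, lang_prefix: str) -> str:
    """Build the localized link that the manual steps above arrive at."""
    return f"{lang_prefix}{path}#{heading_id(translated_heading)}"
```

Even with such a helper, the translator still has to find the target file and the translated heading by hand, which is the cumbersome part.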
These steps are cumbersome, error-prone and inconvenient. Also, if someone changes a heading later, all related links/URLs have to be found and adapted again.
To deal with it in a better way, I suggest the following solution. The translator does NOT translate any fragments. Instead, a machine inserts additional empty anchors into the headings in the resulting HTML files. The IDs of these new anchors match the IDs of the appropriate headings in the canonical version.
Following the example:
(1) Let the heading in the (MD-processed) canonical HTML file be <h3 id="good-morning">Good Morning!</h3>.
(2) Let the heading in the (MD-processed) translated HTML file be <h3 id="guten-morgen">Guten Morgen!</h3>.
(3) Add the ID good-morning from step 1 to a new anchor within the heading in step 2: <h3 id="guten-morgen"><a id="good-morning"></a>Guten Morgen!</h3>.
(Note: Skip step 3 if both IDs in the result would be equal.)
This way, the fragments given in the canonical files will also work with(in) the translated files. Thus, /de-DE/file/#good-morning (and /de-DE/file/#guten-morgen) will work.
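A minimal Python sketch of this post-processing follows. It assumes that headings appear in the same order in the canonical and the translated HTML, and that each heading tag has the form `<hN id="...">`; the function names are hypothetical.

```python
import re
from typing import Iterable, List

# Opening tag of a heading that carries an id attribute.
HEADING = re.compile(r'<(h[1-6]) id="([^"]*)">')


def canonical_heading_ids(canonical_html: str) -> List[str]:
    """Collect the heading IDs of the canonical page, in document order."""
    return [m.group(2) for m in HEADING.finditer(canonical_html)]


def add_canonical_anchors(translated_html: str, canonical_ids: Iterable[str]) -> str:
    """Insert an empty <a id="..."> carrying the canonical heading ID.

    Assumption: the n-th translated heading corresponds to the n-th
    canonical ID. The anchor is skipped when both IDs are equal.
    """
    ids = iter(canonical_ids)

    def patch(match):
        canonical = next(ids, None)
        if canonical is None or canonical == match.group(2):
            return match.group(0)  # nothing to add
        return f'{match.group(0)}<a id="{canonical}"></a>'

    return HEADING.sub(patch, translated_html)
```

If a translation adds, drops or reorders headings, the order assumption breaks, so a real implementation would need a more robust way to pair the headings.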
tokideveloper referenced this issue on Mar 21, 2018: Specifying the translation/localization workflow: How to use Transifex best? #3548 (Open)
marmarek
Mar 22, 2018
Member
Hmm, this looks like applying such fixups in md file wouldn't work. Which means translated offline documentation will be slightly limited. IMO it would be desirable to come back to the idea of having all changes applied in md files (maybe some layouts changes for that?). But we can go back to this later.
tokideveloper
Mar 24, 2018
Just before I forget it: Another reason for fixups after Jekyll-execution is the translation of redirecting pages.
But this could also be done by a specific execution of Jekyll while there is a dedicated (i.e. language-dependent) customized redirect template /_layouts/redirect.html.
tokideveloper
Mar 24, 2018
How to translate links (without a fragment) in general?
It's quite late for this important question. So, here we go:
The URL path of a translated page shall get a language-(region?-)dependent top-level directory, and the rest of the URL shall remain the same as in the canonical version.
Example: The German version of https://www.qubes-os.org/doc/contributing/ shall be https://www.qubes-os.org/de/doc/contributing/ or https://www.qubes-os.org/de-DE/doc/contributing/, depending on the language code we want to use.
Also see this post.
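The rule can be written down as a small Python sketch. The function name `localized_url` is hypothetical, and the language code is passed in because the choice between `de` and `de-DE` is still open at this point.

```python
from urllib.parse import urlsplit, urlunsplit


def localized_url(url: str, lang_code: str) -> str:
    """Put `lang_code` in front of the path as a new top-level directory.

    Scheme, host, query and fragment are left untouched.
    """
    parts = urlsplit(url)
    path = parts.path if parts.path.startswith("/") else "/" + parts.path
    return urlunsplit(parts._replace(path=f"/{lang_code}{path}"))


print(localized_url("https://www.qubes-os.org/doc/contributing/", "de-DE"))
```

The same function also works for site-internal paths such as `/doc/contributing/`, since only the path component is changed.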
tokideveloper
Mar 31, 2018
Which language code ("English", "en", "en-US", "eng" etc.) to use to distinguish the languages?
Currently, en is used as redirections to the canonical version. It's a language code without a specified region.
Instead, I would prefer the format LANGUAGE-REGION as listed in this ISO table (beside region-less codes). Pros are:
- It's clear which variety to use (e.g. either British or American English (en-GB or en-US)). Note that currently, e.g. "color" and "colour" coexist in the documentation.
- It's very unlikely that a top directory will be created in the canonical version that collides with the code. E.g. bg, meaning "background" or such, could also be the name of a top directory in the canonical version, colliding with bg for "Bulgarian". Contrarily, bg-BG is probably not "background-BackGround" or such.
- The set of these languages/varieties is larger than without a region code.
- It's future-proof in case that people would beg for their region-specified language down the road.
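The collision argument can be made concrete with a short Python sketch. The pattern for LANGUAGE-REGION is an assumption (two or three lowercase letters, a hyphen, two uppercase letters), and the directory list in the usage example is invented, not taken from the actual website.

```python
import re

# Assumed shape of a LANGUAGE-REGION code, e.g. de-DE, en-US, bg-BG.
LANG_REGION = re.compile(r"[a-z]{2,3}-[A-Z]{2}")


def is_language_region(code: str) -> bool:
    """Return True if `code` has the assumed LANGUAGE-REGION shape."""
    return LANG_REGION.fullmatch(code) is not None


def collisions(codes, top_level_dirs):
    """Return the codes that clash with an existing top-level directory."""
    existing = set(top_level_dirs)
    return sorted(code for code in codes if code in existing)


# Hypothetical top-level directories, including a "bg" for backgrounds.
print(collisions(["bg", "bg-BG"], ["doc", "bg", "news"]))
```

With a region part, a code can only clash with a directory that was deliberately named like a language code.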
One thing on the downside is that we would have to add redirections from (or permalinks to?) the en-US versions (The canonical version is written in American English, isn't it?) in the YAML front matters. Also note that Wikipedia seems to be fine with region-less language codes for their sub-domains.
How to deal with the permalink URLs of the canonical version? I see two main ways:
- We don't touch them (i.e. don't add an en-US top directory),
- we add an en-US top directory.
While the first one will
- keep things simple for the canonical version and
- mark the canonical version as the canonical version better,
the latter has the advantage that all paths would start with a language code, making them consistent. I'm open for both options.
What do you think?
andrewdavidwong
Mar 31, 2018
Member
Definitely this one:
We don't touch them (i.e. don't add an en-US top directory),
No language or region code for the canonical URLs. (And this is not elitism about English, BTW. I would say the same thing if the documentation were in any other language.)
There are good reasons that no major website has language or region codes in any of their canonical URLs. However, if anyone can provide a counterexample (of a major website that does this), I'd be interested to see it.
Other than that, sounds good to me.
tokideveloper
Mar 31, 2018
Definitely this one:
We don't touch them (i.e. don't add an en-US top directory),
No language or region code for the canonical URLs. (And this is not elitism about English, BTW. I would say the same thing if the documentation were in any other language.)
There are good reasons that no major website has language or region codes in any of their canonical URLs. However, if anyone can provide a counterexample (of a major website that does this), I'd be interested to see it.
Entering the URL to the website of Mozilla https://www.mozilla.org/ redirects to https://www.mozilla.org/de/ for me.
Entering https://www.mozilla.org/en/ redirects to https://www.mozilla.org/en-US/ in my case.
There is also a language switch on the bottom offering other languages.
It seems that they use both LANGUAGE-REGION and LANGUAGE. The only rule I can see there is: if there are at least two translations into the same language but for different regions, use LANGUAGE-REGION. (Otherwise, use either LANGUAGE-REGION or LANGUAGE.)
There are also codes which aren't in the mentioned list, e.g. Frysk (fy-NL). Don't know where it's from.
EDIT: Interestingly, when I visit https://www.mozilla.org/de/ using the text web browser elinks then I can see a list of links on top of the page. These links point to the available languages. The two top-most links are:
- named "canonical" leading to https://www.mozilla.org/de/ (sic!) and
- named "alternate" leading to https://www.mozilla.org/en-US/ .
A "canonical" link on Wikipedia also points to the German version in my case. So, maybe we don't really understand "canonical"? END OF EDIT.
andrewdavidwong
Mar 31, 2018
Member
Interesting. I agree that this is a good counterexample, and I agree that what you describe in your edit is puzzling. I think both approaches are reasonable. In our case, it might still make sense to leave the canonical English version without a language code, since there's no way our localization will be as thorough as Mozilla's anytime soon.
marmarek
Mar 31, 2018
Member
As for language codes with or without region - indeed adding a region code seems reasonable.
tokideveloper
Apr 1, 2018
Thank you both Andrew and Marek!
Let's summarize it:
- For the official version (I'll call it "official" now rather than "canonical"): No language or region code (thus: no new top directory in the URL).
- For translated versions: Always a top directory in the URL containing the language code together with a region code formatted as LANGUAGE-REGION.
However, for internal processing purposes only, I suggest to use en for the official version. Reasons:
- en is currently used in the redirection paths. So, I'll just use an existing name and won't create an additional one.
- en is neither en-US nor en-GB and thus fits our current "needs" of using an "almost-English" language due to the lack of native speakers.
- en doesn't steal either en-US or en-GB and thus could be adapted in the future in case we get ample man power of native speakers.
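The summarized URL scheme could be sketched roughly as follows. This is only an illustration: the function name localized_path, the constant, and the LANGUAGE-REGION pattern check are assumptions made for the example, not existing Qubes website code.

```python
import re
from typing import Optional

# Internal-only name for the official version, as suggested above.
OFFICIAL_INTERNAL_CODE = "en"

# Assumed shape of LANGUAGE-REGION, e.g. de-DE or en-GB.
LOCALE_PATTERN = re.compile(r"[a-z]{2,3}-[A-Z]{2}")


def localized_path(path: str, locale: Optional[str] = None) -> str:
    """Return the site path of `path` for the given locale.

    The official version (locale None or the internal code "en") gets no
    top directory; translated versions get a /LANGUAGE-REGION prefix.
    """
    if not path.startswith("/"):
        raise ValueError("expected a site path starting with '/'")
    if locale is None or locale == OFFICIAL_INTERNAL_CODE:
        return path
    if not LOCALE_PATTERN.fullmatch(locale):
        raise ValueError("translated versions need LANGUAGE-REGION, e.g. de-DE")
    return "/" + locale + path
```

For example, localized_path("/doc/doc-guidelines/", "de-DE") gives /de-DE/doc/doc-guidelines/, while the same call with "en" or without a locale leaves the path unchanged.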
andrewdavidwong
Apr 1, 2018
Member
However, for internal processing purposes only, I suggest to use en for the official version.
I guess it depends on what practical effects this will have on our workflow. If it only happens inside of scripts (i.e., documentation contributors and maintainers don't have to change anything), then I'm on board.
tokideveloper
Jun 2, 2018
However, for internal processing purposes only, I suggest to use en for the official version.
I guess it depends on what practical effects this will have on our workflow. If it only happens inside of scripts (i.e., documentation contributors and maintainers don't have to change anything), then I'm on board.
@andrewdavidwong I see. I'm not sure yet but we'll see.
tokideveloper
Jun 2, 2018
I reviewed my algorithm shown in a previous post. Here are my outcomes:
- The handling of copies (steps 1, 5 and 6 (in a sense)) should be discussed in another thread.
- The handling of differences (step 7) is not really explained. Now, I thought about it and the result is that I have to adapt the algorithm to get it to work right. So, here is the new version of how to treat a Markdown file (without mentioning the copy thing):
1. Get a list of all existing permalink and redirect_from links as listed in the YAML front matters of all files.
2. Automatically, in the file, prefix all paths that are in that list with the placeholder %UndecidedLangPrefix%. The resulting state of the file may be called "UndecidedVersion".
3. If available, apply the patch Decision.patch generated during step 7 of the last run. Rejected hunks may be ignored or even deleted.
4. If there is still an %UndecidedLangPrefix% placeholder within the file then notify a person responsible to do this:
   - Replace all occurrences of %UndecidedLangPrefix% with %LangPrefix% if the concerned links have to be translated (most frequent case).
   - Replace all occurrences of %UndecidedLangPrefix% with %NoLangPrefix% if the concerned links must not be translated (probably seldom).
5. Check that there is no %UndecidedLangPrefix% in the file. If there is one then go back to step 4.
6. The current state of the file may be called "DecidedVersion".
7. Save the difference from "UndecidedVersion" to "DecidedVersion" as a patch called Decision.patch.
8. Upload the file to Transifex and tell the translators not to touch the placeholders.
9. Download a translated version of that file from Transifex. Let's say it's in German.
10. Replace all occurrences of %LangPrefix% EDIT and %ExtraLangPrefix% END EDIT with /de-DE.
11. Replace all occurrences of %NoLangPrefix% with the empty string.
By using the patch Decision.patch, we'll save time in the next runs since only these spots of %UndecidedLangPrefix% must be adapted where the patch couldn't be applied.
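Saving the difference as Decision.patch (step 7) could look roughly like the sketch below, which uses difflib from the Python standard library to produce a unified diff. The function name make_decision_patch is made up for this example. Applying the patch in a later run (step 3) is not covered here; the standard library has no patch applier, so that would presumably be left to an external tool such as patch.

```python
import difflib


def make_decision_patch(undecided: str, decided: str) -> str:
    """Return a unified diff from "UndecidedVersion" to "DecidedVersion".

    Both arguments are the full text of the Markdown file in the
    respective state. An empty string means there is no difference.
    """
    diff = difflib.unified_diff(
        undecided.splitlines(keepends=True),
        decided.splitlines(keepends=True),
        fromfile="UndecidedVersion",
        tofile="DecidedVersion",
    )
    return "".join(diff)
```

The returned text could then be written to Decision.patch next to the Markdown file.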
EDIT As an additional step between 5 and 6 or between 7 and 8: Where necessary, add %ExtraLangPrefix% labels in front of all paths to translate that erroneously have not been detected. Save it as a patch and apply that patch in an earlier step in future runs. END EDIT
Of course, already existing sub-strings in the original files that are equal to the placeholders have to be escaped/treated specially.
If a demo example is needed then I'll write and post one.
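Until such a demo exists, here is a minimal sketch of steps 2, 10 and 11 under some simplifying assumptions: only inline Markdown links of the form [text](target) are handled, the list of known paths from step 1 is passed in as a ready-made set, and escaping of pre-existing placeholder strings is not dealt with. The function names mark_undecided and resolve_placeholders are hypothetical.

```python
import re
from typing import Set

UNDECIDED = "%UndecidedLangPrefix%"
LANG = "%LangPrefix%"
EXTRA = "%ExtraLangPrefix%"
NO_LANG = "%NoLangPrefix%"

# Target of an inline Markdown link: the part in parentheses after "]".
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")


def mark_undecided(markdown: str, known_paths: Set[str]) -> str:
    """Step 2: prefix link targets whose path is in known_paths."""

    def replace(match):
        target = match.group(1)
        path = target.split("#", 1)[0]  # ignore the fragment for the lookup
        if path in known_paths:
            return "](" + UNDECIDED + target + ")"
        return match.group(0)

    return LINK_TARGET.sub(replace, markdown)


def resolve_placeholders(markdown: str, locale: str) -> str:
    """Steps 10 and 11: turn decided placeholders into real prefixes."""
    if UNDECIDED in markdown:
        # Corresponds to the check in step 5.
        raise ValueError("undecided placeholders remain in the file")
    prefix = "/" + locale
    return (
        markdown.replace(LANG, prefix)
        .replace(EXTRA, prefix)
        .replace(NO_LANG, "")
    )
```

In between, the manual decision of step 4 replaces each %UndecidedLangPrefix% with %LangPrefix% or %NoLangPrefix%; in the simplest case that is a plain text replacement over the whole file.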
tokideveloper commented Feb 6, 2018
In order to specify a translation workflow/guidelines, we need to specify how to translate/localize links within the Qubes OS (doc) website. In this specific issue, I would like to discuss ways to do so.
Here are some key questions (checked if solved):
(*) A fragment is the part after a hash sign ("#"), here: leading to a specific header on the linked page.
(**) "en" seems to be the currently used one. See the redirect_from lists in the YAML front matters in the Markdown files.
Related issues:
#2824
#1452
#1333