This document is a brief of changes we're going to bring to the translation pipeline for Learn GDScript while also accounting for our translation needs beyond this application (interactive tours etc.).
Context about the current translation problems we face
Right now the app translates lesson content at runtime. This has some issues. First, a tiny edit to the English text, like fixing a typo, invalidates the translations of that string in every language. A bigger rewrite can erase a perfectly good translation entirely (say we make a v2 of a lesson in English: v1 of the lesson in Spanish gets lost). That's because when we call the built-in translation function at runtime and a translation is missing, Godot silently falls back to English. Second, we get no information that something is untranslated, so we cannot track what is complete or warn students.
Instead of translating at runtime, we will build translated copies of each lesson at build time. The pipeline should provide PO files to translators, take them back, and output fully translated BBCode files like lesson.fr.bbcode or lesson.de.bbcode. A translated lesson should only be published if it is 100% complete. This way English content can change freely without breaking existing translations. If a French lesson was fully translated for version 1 and we tweak the English text in version 2, the French version 1 should stay valid until translators update it (unless it is manually invalidated, for example, because the associated practices or code files changed drastically).
Some strings, such as UI labels, will still be translated at runtime, using the same approach as today.
In general, the changes we need specifically for this app are:
- Add translation support for things that are missing it (most notably practice checks)
- Change lesson translation to use dedicated content files
- Add translation completion metadata per language to reflect it in the user interface
But this project goes beyond just this one application.
Building one translation pipeline for all platforms
We want this build pipeline for translations to work for the Learn GDScript app and for all our websites and web platforms. So we will write it in TypeScript, the primary language we use for our web platform and associated build pipelines. The pipeline will handle three things: syncing translations with gettext tools, measuring translation completion rates, and outputting metadata about what is translated.
For gettext file operations, we can keep using the command line tools we already rely on, like msgmerge, to merge and synchronize PO files. The TypeScript modules that count translation status and output metadata should be program agnostic. They should be self-contained libraries that work with any set of PO or POT files. Then we write thin scripts specific to Learn GDScript that use those libraries, and later we can also write thin scripts for other programs and content.
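As a rough illustration of the thin, app-specific layer, here is a minimal sketch that plans one msgmerge call per language. The directory layout and the names SyncJob, planSync, and msgmergeArgs are assumptions made for this example, not existing code. The arrays it returns would be handed to whatever process runner the build uses.

```typescript
// Hypothetical layout: one PO file per language, all merged against one POT template.
interface SyncJob {
  language: string;
  poPath: string;
  potPath: string;
}

// Lists the PO files to synchronize against a single template.
function planSync(languages: string[], poDir: string, potPath: string): SyncJob[] {
  return languages.map((language) => ({
    language,
    poPath: `${poDir}/${language}.po`,
    potPath,
  }));
}

// Builds the argument list for one msgmerge call.
// --update rewrites the PO file in place; --backup=none skips the backup copy.
function msgmergeArgs(job: SyncJob): string[] {
  return ["--update", "--backup=none", job.poPath, job.potPath];
}
```

Keeping the planning step as plain data makes it easy to test without gettext installed, and leaves the choice of process runner to each app-specific script.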
Extracting translation strings in Learn GDScript
For the Godot app, we should use the engine's built-in extractor for all the standard parts: UI scenes, GDScript files, resources. This avoids creating extra maintenance work. Our lesson content is now stored in custom BBCode markup files though. The built-in extractor likely cannot handle those. So we need a custom extraction step for the BBCode side specifically. We already have a BBCode parser that turns lesson files into an AST. We can walk that AST and collect all translatable text pieces, then write them to a POT file.
The TypeScript pipeline can invoke Godot in headless mode to trigger the built-in extractor, then run the custom BBCode extractor as a second step. The goal is that a developer runs one command and gets a complete, up-to-date POT file covering everything.
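The custom BBCode step could look roughly like the sketch below: walk the tree, collect each unique piece of text, and write POT entries. This brief places the real extractor in GDScript, so the TypeScript here only illustrates the logic. The BbcodeNode shape and the function names are hypothetical and do not reflect our actual parser.

```typescript
// Hypothetical AST shape: text nodes carry `text`, container nodes carry `children`.
interface BbcodeNode {
  text?: string;
  children?: BbcodeNode[];
}

// Depth-first walk that collects each non-empty text once, in document order.
function collectStrings(node: BbcodeNode, out: string[] = []): string[] {
  const text = node.text ? node.text.trim() : "";
  if (text !== "" && out.indexOf(text) === -1) out.push(text);
  for (const child of node.children || []) collectStrings(child, out);
  return out;
}

// Escapes backslashes, double quotes, and newlines for a PO string literal.
function escapePo(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Writes a minimal POT file: a header entry followed by one entry per string.
function toPot(sourceFile: string, strings: string[]): string {
  const header = 'msgid ""\nmsgstr ""\n"Content-Type: text/plain; charset=UTF-8\\n"\n';
  const entries = strings.map(
    (s) => `#: ${sourceFile}\nmsgid "${escapePo(s)}"\nmsgstr ""\n`
  );
  return [header, ...entries].join("\n");
}
```

Deduplicating at collection time means a sentence repeated in a lesson is translated once, which matches how gettext treats identical msgid values.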
Tracking translation completion
The build system should produce metadata alongside translated lessons. We probably want a small data file that can be read quickly without loading all the translated lessons. This file should contain the overall completion percentage for each language, and a list of which lessons are fully translated and which are not. For the Learn GDScript app, it can be any format supported by Godot. JSON is the simplest to output from a TypeScript-based build system. But we can also have a small module that writes a generated GDScript file, for example.
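To make the shape of that data file concrete, here is one possible sketch. The field names (completion, completeLessons, incompleteLessons) and the LessonStatus input are assumptions for illustration, and the final format is still open.

```typescript
// Hypothetical per-lesson input: how many strings exist and how many are translated.
interface LessonStatus {
  lesson: string;
  total: number;
  translated: number;
}

interface LanguageMetadata {
  completion: number; // whole percent, rounded down so 99.9% never shows as 100
  completeLessons: string[];
  incompleteLessons: string[];
}

// Aggregates per-lesson counts into one small record per language.
function buildMetadata(
  byLanguage: Record<string, LessonStatus[]>
): Record<string, LanguageMetadata> {
  const result: Record<string, LanguageMetadata> = {};
  for (const language of Object.keys(byLanguage)) {
    let total = 0;
    let translated = 0;
    const completeLessons: string[] = [];
    const incompleteLessons: string[] = [];
    for (const status of byLanguage[language]) {
      total += status.total;
      translated += status.translated;
      if (status.total > 0 && status.translated === status.total) {
        completeLessons.push(status.lesson);
      } else {
        incompleteLessons.push(status.lesson);
      }
    }
    const completion = total === 0 ? 0 : Math.floor((translated / total) * 100);
    result[language] = { completion, completeLessons, incompleteLessons };
  }
  return result;
}
```

Passing the result through JSON.stringify gives a file the app can load at startup to decide which lessons to show normally for a given language.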
The pipeline also needs to check each individual translation file and report how many entries are untranslated, how many are marked as fuzzy and need review, and the completion rate per file.
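A minimal sketch of that per-file report, assuming a simple PO file: entries separated by blank lines, no plural forms, and no msgctxt. The function and type names are made up for this example. A vendored parser would replace the hand-rolled line handling.

```typescript
interface PoStats {
  total: number;
  translated: number;
  fuzzy: number;
  untranslated: number;
  completion: number; // whole percent, rounded down
}

// Strips the surrounding double quotes from one PO string line.
function unquote(raw: string): string {
  return raw.trim().replace(/^"|"$/g, "");
}

function poStats(po: string): PoStats {
  const stats: PoStats = { total: 0, translated: 0, fuzzy: 0, untranslated: 0, completion: 0 };
  // Entries are separated by blank lines.
  for (const block of po.split(/\n\s*\n/)) {
    let msgid = "";
    let msgstr = "";
    let hasMsgid = false;
    let isFuzzy = false;
    let target = ""; // the field that continuation lines belong to
    for (const line of block.split("\n")) {
      if (line.startsWith("#,")) {
        if (line.includes("fuzzy")) isFuzzy = true;
      } else if (line.startsWith("msgid ")) {
        hasMsgid = true;
        target = "msgid";
        msgid += unquote(line.slice(6));
      } else if (line.startsWith("msgstr ")) {
        target = "msgstr";
        msgstr += unquote(line.slice(7));
      } else if (line.startsWith('"')) {
        if (target === "msgid") msgid += unquote(line);
        else if (target === "msgstr") msgstr += unquote(line);
      }
    }
    // Skip the header entry (empty msgid) and comment-only or obsolete blocks.
    if (!hasMsgid || msgid === "") continue;
    stats.total += 1;
    if (isFuzzy) stats.fuzzy += 1;
    else if (msgstr === "") stats.untranslated += 1;
    else stats.translated += 1;
  }
  if (stats.total > 0) {
    stats.completion = Math.floor((stats.translated / stats.total) * 100);
  }
  return stats;
}
```

Fuzzy entries are counted separately and not as translated, so a file only reaches 100% once every entry has been reviewed.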
In the user interface, this metadata lets us do two things. When a student selects a language, fully translated lessons appear normally. Incomplete or untranslated lessons would appear grayed out. We can show a little note like: "This lesson is not fully translated to French yet. If you would like to help complete the translations, please check out this page."
We will also need to write/refresh contribution documentation for translations and put it on the GDLibrary website.
Translating practice hints and checks
Currently our practice system discovers tests by looking at the PracticeTester script and finding all functions that start with test_.
The function names become the labels shown to students in the practice panel. Those labels come from code reflection and we could probably extract them for translation but I'd like to change how this works.
In Godot 4 though we can directly reference functions, and instead of using reflection to find tests and their names, I'd like to define checks explicitly.
Each practice would have a data structure listing its checks, where each check has an explicit label string, a reference to the test function, and optionally a tooltip. The label and tooltip would use the tr() function, which means Godot's built-in extractor can pick them up automatically.
This is similar to how GDPractice handles checks; you can port this part of the logic over from there.
Right to left language support
We need to open a separate ticket for this. Supporting right-to-left languages like Arabic requires two things. First, we need Arabic fonts. Second, the entire user interface needs to be able to flip its layout from left-to-right to right-to-left. This is a big UI task on its own.
What stays in Godot, what moves to TypeScript
For the Learn GDScript app, string extraction from Godot scenes and scripts stays in Godot, using the built-in tool.
BBCode extraction will need custom GDScript code that walks our parser AST.
For general gettext processing, synchronization, completion counting, and metadata output, we will write our own API/module in TypeScript.
For rebuilding translated BBCode files in this app specifically, we can do that in GDScript since the BBCode format is unique to this app.
Existing scripts that will be replaced by this system
Right now we have three scripts that we use as part of a build for manual synchronization that's really specific to Learn GDScript from Zero:
Part of the work here is to rewrite these scripts and consolidate them into a more general-purpose library. The code will go to our toolbox repository where we'll put shared libraries + a CLI for this.
The ideal scenario for me is that by the end of this broad task we have a little bit of code in GDScript, or some tool in Godot, to extract the translation strings specifically for Learn GDScript From Zero, and one TypeScript program to synchronize the translations and integrate them back into the application.
At the same time, the TypeScript code should be a program-agnostic API we can reuse for all translation work in any program. Don't hesitate to ping me with any questions or doubts about the API for this module or about developer experience.
We want to avoid external dependencies as much as possible and get out of, or stay away from, the npm ecosystem. Using the official gettext programs like msgmerge is fine. You may also consider vendoring a good gettext parser and handling the different features (merging, syncing, creating a translation store) directly in code. We've used the gettext programs for correctness, but even the most widely used and mature translation tools, like po4a, which is a Perl program, can require custom Perl scripts to work for our needs. So, weirdly enough, it can be simpler to just pick a parser and code our own fast and specific thing.
Requirements, tasks, and points to consider
Below I've made a more detailed checklist for all the requirements I could think of for this project. Some things can then be split out to separate issues if needed. This is mostly to help track all the needs and requirements.
Try to take on as many of these as possible and let me know when you need me to do one of them, for example, updating the Weblate project. I'll also be there to help with any questions or, for example, discussion about developer experience or what the official gettext programs can do. We can also involve Yuri, notably for insights on the built-in translation system and how much we can hook into it or leverage it.
Godot app (Learn GDScript From Zero)
TypeScript translation library (toolbox repository)
Translation file structure
We will consolidate translations into fewer PO files per language: one file for all lesson BBCode content, one for all Godot UI strings (scenes, scripts, resources...), and maybe one for other strings (error database etc.):
Metadata and UI integration