A utility that generates an XLIFF file and a skeleton file from an MDX file, and combines them back into MDX.
When translating an MDX document with an automatic tool such as Google Translate or DeepL, there is a significant possibility that it will break some of the syntax. You have likely seen links that look like this after translation, with a space inserted in the middle:
[Link text] (example.com)
Or, arguably worse because it breaks MDX compilation, altered HTML tags:
<Tabs>
<TabItem>
Somehow after translation both TabItem tags are opening!
<TabItem>
</Tabs>
The solution this package proposes is to separate text from the markup and translate only the text.
This is done using two file formats: XLIFF and SKL. The former is just an XML file with all the text content, and the latter is essentially an MDX file with all the text replaced by placeholders.
We translate only the xliff and then combine the result of the translation with the existing skeleton.
For example, a file like this:
# My file
With a paragraph, that contains a [link](https://example.com/)
Will be split into an XLIFF file:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xliff xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:oasis:names:tc:xliff:document:1.2" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd" version="1.2">
<file original="namespace" datatype="plaintext" source-language="ru" target-language="en-US">
<body>
<trans-unit id="0">
<source>My file</source>
<target></target>
</trans-unit>
<trans-unit id="1">
<source>With a paragraph, that contains a</source>
<target></target>
</trans-unit>
<trans-unit id="2">
<source>link</source>
<target></target>
</trans-unit>
</body>
</file>
</xliff>
And a SKL file:
# %%%0%%%
%%%1%%%[%%%2%%%](https://example.com/)
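The recombination step can be pictured as plain placeholder substitution. The sketch below is not the package's implementation; it is a minimal illustration that assumes placeholders of the form %%%N%%% as in the example above. The names fillSkeleton and translations are hypothetical, and the trailing space in the second translation is an assumption of this sketch.

```javascript
// Hypothetical helper, not part of mdx2xliff: replaces each %%%N%%% placeholder
// with the translation stored at index N.
function fillSkeleton(skeleton, translations) {
  return skeleton.replace(/%%%(\d+)%%%/g, (match, id) => {
    const text = translations[Number(id)]
    if (text === undefined) {
      throw new Error(`No translation for placeholder ${match}`)
    }
    return text
  })
}

const skeleton = '# %%%0%%%\n%%%1%%%[%%%2%%%](https://example.com/)'
const translations = ['Mon fichier', 'Avec un paragraphe qui contient un ', 'lien']

// The markup (heading marker, link brackets, URL) never passes through translation.
const result = fillSkeleton(skeleton, translations)
console.log(result)
```

Because the link syntax and the URL live only in the skeleton, a translation tool has no opportunity to insert a space between the brackets and the parentheses.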
This package provides two named exports: extract and reconstruct.
extract generates a skeleton file and an XLIFF file from a given MDX file. It accepts the following options:
{
fileContents: string
beforeDefaultRemarkPlugins?: Plugin[]
skipNodes?: string[]
sourceLanguage?: string
targetLanguage?: string
xliffVersion?: "1.2" | "2.0"
}
Default values:
{
beforeDefaultRemarkPlugins: []
skipNodes: ["code", "inlineCode", "mdxjsEsm", "mdxFlowExpression", "mdxTextExpression"]
sourceLanguage: "ru"
targetLanguage: "en"
xliffVersion: "2.0"
}
Returns:
Promise<{
skeleton: string
xliff: string
}>
reconstruct takes two files, the skeleton (SKL) and the XLIFF, and replaces the placeholders in the skeleton with the translations from the XLIFF.
If a translation is missing, it throws an error by default. This can be changed by setting ignoreUntranslated to true; any missing translation is then replaced with the source string. It accepts the following options:
{
skeleton: string
xliff: string
ignoreUntranslated?: boolean
xliffVersion?: "1.2" | "2.0"
}
Default values:
{
ignoreUntranslated: false
xliffVersion: "2.0"
}
Returns:
string
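The effect of ignoreUntranslated can be sketched in a few lines. This is only an illustration of the documented behaviour, not the package's code: pickTranslation and the shape of the unit objects are hypothetical, and treating an empty target as "missing" is an assumption.

```javascript
// Hypothetical helper illustrating the documented fallback; not part of mdx2xliff.
function pickTranslation(unit, ignoreUntranslated = false) {
  // Assumption: an absent or whitespace-only <target> counts as untranslated.
  const hasTarget = typeof unit.target === 'string' && unit.target.trim() !== ''
  if (hasTarget) return unit.target
  if (ignoreUntranslated) return unit.source
  throw new Error(`Missing translation for trans-unit ${unit.id}`)
}

const units = [
  { id: '0', source: 'My file', target: 'Mon fichier' },
  { id: '1', source: 'link', target: '' },
]

// With the flag set, the untranslated unit falls back to its source string.
const lenient = units.map((unit) => pickTranslation(unit, true))
console.log(lenient)
```

With the default of false, the same input would throw on the second unit, which makes incomplete translations visible early instead of silently shipping mixed-language output.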
// Example: extract a skeleton and an XLIFF file from test.mdx
import { readFileSync, writeFileSync } from 'fs'
import { extract } from 'mdx2xliff'
import headingToHtml from 'mdx2xliff/remarkPlugins/headingToHtml'

;(async () => {
  const fileContents = readFileSync('test.mdx', 'utf8')

  // Convert headings to HTML first so that custom heading IDs survive translation
  const { skeleton, xliff } = await extract({
    fileContents,
    sourceLanguage: 'en',
    targetLanguage: 'fr',
    beforeDefaultRemarkPlugins: [headingToHtml]
  })

  // Only test.xliff needs to be sent for translation
  writeFileSync('test.skl', skeleton)
  writeFileSync('test.xliff', xliff)
})()
Whatever app is responsible for translation will have to deal with very short chunks of text. In many cases they will be one or two words, which leads to suboptimal machine translation quality.
Headings like this:
## Some heading {#some-id}
are not part of any Markdown spec, and their MDX AST representation is the same as for a normal Markdown heading. As a result, machine translation can change the ID or malform the curly-brace part so that the MDX no longer compiles.
This can be worked around by using the built-in remark plugin mdx2xliff/remarkPlugins/headingToHtml.
It replaces all Markdown headings with HTML headings, preserving the IDs.
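To make the transformation concrete, here is a rough line-based sketch of the idea. It is not how the plugin is implemented (a remark plugin works on the AST, not on raw lines), and the exact HTML the plugin emits is an assumption; headingToHtmlSketch is a hypothetical name.

```javascript
// Rough, regex-based illustration only; the real plugin is a remark plugin.
function headingToHtmlSketch(line) {
  // Captures: heading level, heading text, optional {#custom-id}
  const match = /^(#{1,6})\s+(.*?)(?:\s*\{#([^}\s]+)\})?\s*$/.exec(line)
  if (!match) return line
  const [, hashes, text, id] = match
  const level = hashes.length
  const idAttribute = id ? ` id="${id}"` : ''
  return `<h${level}${idAttribute}>${text}</h${level}>`
}

console.log(headingToHtmlSketch('## Some heading {#some-id}'))
console.log(headingToHtmlSketch('# My file'))
console.log(headingToHtmlSketch('Just a paragraph'))
```

Once the ID lives in an HTML attribute, it is part of the markup rather than the translatable text, so only the heading text itself is exposed to the translation tool.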
Similar to the previous issue, frontmatter is easily malformed by machine translation.
mdx2xliff does not yet provide a way of dealing with this.
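Until then, one possible workaround is to cut the frontmatter off before extraction and put it back after reconstruction. The sketch below is an assumption about how that could be done by hand, not a feature of mdx2xliff; splitFrontmatter is a hypothetical helper and only handles a YAML block delimited by --- lines at the very start of the file.

```javascript
// Hypothetical workaround, not provided by mdx2xliff.
function splitFrontmatter(fileContents) {
  // Matches a leading block of the form ---\n...\n---\n
  const match = /^---\r?\n[\s\S]*?\r?\n---(?:\r?\n|$)/.exec(fileContents)
  if (!match) return { frontmatter: '', body: fileContents }
  return {
    frontmatter: match[0],
    body: fileContents.slice(match[0].length),
  }
}

const source = '---\ntitle: My file\n---\n# My file\n'
const { frontmatter, body } = splitFrontmatter(source)

// `body` is what would be passed on for extraction; `frontmatter` is kept aside
// and prepended to the reconstructed document afterwards.
console.log(JSON.stringify(frontmatter))
console.log(JSON.stringify(body))
```

The trade-off is that frontmatter values such as titles stay untranslated and have to be handled separately.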
Pretty old, last commit was in 2022. Uses unified version 6. Focuses on plain Markdown.
Actively maintained and developed. Focuses on YFM. No way to add support for MDX. Despite being new, uses xliff version 1.2.