hxltmcli --objectivum-XLIFF: HXL Trānslātiōnem Memoriam -> XLIFF Version 2.1 #1

Closed
fititnt opened this issue Jun 27, 2021 · 2 comments


fititnt referenced this issue in HXL-CPLP/Auxilium-Humanitarium-API Jun 27, 2021
…ormatting to first export a simpler CSV file

fititnt commented Jun 27, 2021

I think that, in addition to the final XLIFF format, we should also draft an "intermediary format" that sits halfway between the HXL TM file convention and the XLIFF format.

This intermediate format mostly parses the input tm.hxl.csv (or anything that HXL tools are able to parse, like Google Spreadsheets, Excel, etc.), renames the columns it knows matter for the XLIFF format, and prefixes them with #x_xliff.

For some corner cases, such as XLIFF's lack of support for source texts that are not even ready for translation (something that may actually be very common in our use cases), we may prefix the columns with #meta+xliff instead.
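The column renaming described above can be sketched in Python roughly as follows. This is a hypothetical illustration, not the code of hxltm2xliff.py: the helper names are invented, the caller says which hashtags hold the unit id, source and target (`#item+conceptum+codicem` is an assumed id column), and the sketch keeps the ISO 639-3 (`i_xxx`) and ISO 15924 (`is_xxxx`) attributes while dropping the two-letter `i_xx`, to match the `#x_xliff+source+i_lat+is_latn` header in the examples below.

```python
def parse_hashtag(hashtag):
    """Split an HXL hashtag such as '#item+i_la+i_lat+is_latn' into (tag, attributes)."""
    parts = hashtag.strip().lstrip("#").split("+")
    return parts[0].lower(), [a.lower() for a in parts[1:]]


def language_attributes(hashtag):
    """Return the ISO 639-3 ('i_lat') and ISO 15924 ('is_latn') attributes."""
    _, attrs = parse_hashtag(hashtag)
    iso639_3 = next((a for a in attrs if a.startswith("i_") and len(a) == 5), None)
    iso15924 = next((a for a in attrs if a.startswith("is_")), None)
    if iso639_3 is None or iso15924 is None:
        raise ValueError(f"no i_xxx/is_xxxx attributes in {hashtag!r}")
    return iso639_3, iso15924


def to_xliff_intermediate(headers, id_hashtag, source_hashtag, target_hashtag):
    """Hypothetical helper: rename only the columns that matter for XLIFF."""
    renamed = []
    for hashtag in headers:
        if hashtag == id_hashtag:
            renamed.append("#x_xliff+unit+id")
        elif hashtag in (source_hashtag, target_hashtag):
            role = "source" if hashtag == source_hashtag else "target"
            iso639_3, iso15924 = language_attributes(hashtag)
            renamed.append(f"#x_xliff+{role}+{iso639_3}+{iso15924}")
        else:
            renamed.append(hashtag)  # every other column passes through unchanged
    return renamed


headers = ["#item+conceptum+codicem", "#meta+url",
           "#item+i_la+i_lat+is_latn", "#item+i_ar+i_arb+is_arab"]
print(to_xliff_intermediate(headers, headers[0], headers[2], headers[3]))
# ['#x_xliff+unit+id', '#meta+url', '#x_xliff+source+i_lat+is_latn', '#x_xliff+target+i_arb+is_arab']
```

Columns the sketch does not recognize are left untouched, so the intermediate file stays a valid HXL table that the usual HXL tools can still read.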

Current examples

cat _hxltm/schemam-un-htcds-5items.tm.hxl.tsv

#x_xliff+unit+id	#meta+url	#item+wikidata+code	#meta+item+url+list	#meta+lat_sortem	#status	#item+type+lat_dominium+list	#item+type+lat_regnum	#item+type+lat_divisionem	#item+type+lat_classem	#item+type+lat_ordinem	#item+type+lat_familiam	#item+type+lat_genus	#item+type+lat_speciem	#item+type+lat_segmentum						#x_xliff+source+i_lat+is_latn	#item+i_la+i_lat+is_latn+alt+list	#meta+item+i_la+i_lat+is_latn	#item+i_pt+i_por+is_latn	#item+i_pt+i_por+is_latn+alt+list	#meta+item+i_pt+i_por+is_latn	#item+i_en+i_eng+is_latn	#item+i_en+i_eng+is_latn+alt+list	#meta+item+i_en+i_eng+is_latn	#item+i_es+i_spa+is_latn	#item+i_es+i_spa+is_latn+alt+list	#meta+item+i_es+i_spa+is_latn	#x_xliff+target+i_arb+is_arab	#item+i_es+i_arb+is_arab+alt+list	#meta+item+i_es+i_arb+is_arab	#item+i_hi+i_hin+is_deva	#item+i_hi+i_hin+is_deva+alt+list	#meta+item+i_hi+i_hin+is_deva	#item+i_sl+i_slv+is_latn	#item+i_sl+i_slv+is_latn+alt+list	#meta+item+i_sl+i_slv+is_latn
L10N_ego_summarius	[(ℹ️)]	Q1	https://github.com/HXL-CPLP/forum/issues/58|https://example.org	1	2	L10N	L10N	ego					summarius		Lingua Latina (Abecedarium Latinum)			Língua portuguesa (alfabeto latino)			English language (Latin script)			Idioma español (Alfabeto latino)	∅∅	اللغة العربية		يتطلب مراجعة بشرية.	हिन्दी भाषा (देवनागरी लिपि)			Slovenščina (Latinska abeceda)		
L10N_ego_codicem				2	2	L10N	L10N	ego					codicem										lat-Latn			por-Latn			eng-Latn			spa-Latn			arb-Arab			hin-Deva			slv-Latn	∅∅
L10N_ego_linguam_nomen				3	2	L10N	L10N	ego	linguam				nomen										Lingua Latina			Língua portuguesa			English language			Idioma español			اللغة العربية		يتطلب مراجعة بشرية.	हिन्दी भाषा		https://www.wikidata.org/wiki/Q1568	Slovenščina		
L10N_ego_scriptum_nomen	[(ℹ️)]	Q19845720	https://www.unicode.org/iso15924/	4	2	L10N	L10N	ego	scriptum				nomen				Abecedarium Latinum			Alfabeto latino			Latin script			Alfabeto latino						देवनागरी लिपि		https://www.wikidata.org/wiki/Q38592	Latinska abeceda		
L10N_ego_patriam_UN_M49_numerum	[(ℹ️)]	Q7865431	https://en.wikipedia.org/wiki/UN_M49	5	2	L10N	L10N	ego	patriam	UN	M49		numerum				001			001			001			001			001			001			001		

./_systema/programma/hxltm2xliff.py _hxltm/schemam-un-htcds-5items.tm.hxl.csv --archivum-extensionem=.csv

#x_xliff+unit+id,#meta+url,#item+wikidata+code,#meta+item+url+list,#meta+lat_sortem,#status,#item+type+lat_dominium+list,#item+type+lat_regnum,#item+type+lat_divisionem,#item+type+lat_classem,#item+type+lat_ordinem,#item+type+lat_familiam,#item+type+lat_genus,#item+type+lat_speciem,#item+type+lat_segmentum,,,,,,,,,,,,#x_xliff+source+i_lat+is_latn,#item+i_la+i_lat+is_latn+alt+list,#meta+item+i_la+i_lat+is_latn,#item+i_pt+i_por+is_latn,#item+i_pt+i_por+is_latn+alt+list,#meta+item+i_pt+i_por+is_latn,#item+i_en+i_eng+is_latn,#item+i_en+i_eng+is_latn+alt+list,#meta+item+i_en+i_eng+is_latn,#item+i_es+i_spa+is_latn,#item+i_es+i_spa+is_latn+alt+list,#meta+item+i_es+i_spa+is_latn,#x_xliff+target+i_arb+is_arab,#item+i_es+i_arb+is_arab+alt+list,#meta+item+i_es+i_arb+is_arab,#item+i_hi+i_hin+is_deva,#item+i_hi+i_hin+is_deva+alt+list,#meta+item+i_hi+i_hin+is_deva,#item+i_sl+i_slv+is_latn,#item+i_sl+i_slv+is_latn+alt+list,#meta+item+i_sl+i_slv+is_latn
L10N_ego_summarius,[(ℹ️)],Q1,https://github.com/HXL-CPLP/forum/issues/58|https://example.org,1,2,L10N,L10N,ego,,,,,summarius,,,,,,,,,,,,,Lingua Latina (Abecedarium Latinum),∅,∅,Língua portuguesa (alfabeto latino),∅,∅,English language (Latin script),∅,∅,Idioma español (Alfabeto latino),∅,∅,اللغة العربية,∅,يتطلب مراجعة بشرية.,हिन्दी भाषा (देवनागरी लिपि),∅,∅,Slovenščina (Latinska abeceda),∅,∅
L10N_ego_codicem,,,,2,2,L10N,L10N,ego,,,,,codicem,,,,,,,,,,,,,lat-Latn,∅,∅,por-Latn,∅,∅,eng-Latn,∅,∅,spa-Latn,∅,∅,arb-Arab,∅,∅,hin-Deva,∅,∅,slv-Latn,∅,∅
L10N_ego_linguam_nomen,,,,3,2,L10N,L10N,ego,linguam,,,,nomen,,,,,,,,,,,,,Lingua Latina,∅,∅,Língua portuguesa,∅,∅,English language,∅,∅,Idioma español,∅,∅,اللغة العربية,∅,يتطلب مراجعة بشرية.,हिन्दी भाषा,∅,https://www.wikidata.org/wiki/Q1568,Slovenščina,∅,∅
L10N_ego_scriptum_nomen,[(ℹ️)],Q19845720,https://www.unicode.org/iso15924/,4,2,L10N,L10N,ego,scriptum,,,,nomen,,,,,,,,,,,,,Abecedarium Latinum,∅,∅,Alfabeto latino,∅,∅,Latin script,∅,∅,Alfabeto latino,∅,∅,,∅,∅,देवनागरी लिपि,∅,https://www.wikidata.org/wiki/Q38592,Latinska abeceda,∅,∅
L10N_ego_patriam_UN_M49_numerum,[(ℹ️)],Q7865431,https://en.wikipedia.org/wiki/UN_M49,5,2,L10N,L10N,ego,patriam,UN,M49,,numerum,,,,,,,,,,,,,001,∅,∅,001,∅,∅,001,∅,∅,001,∅,∅,001,∅,∅,001,∅,∅,001,∅,∅
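Reading the intermediate CSV back is then only a matter of locating the three `#x_xliff` columns. A minimal sketch, assuming that an empty cell and a lone `∅` both mean "no text"; `xliff_triples()` is a hypothetical helper, and the sample is a cut-down version of two rows above.

```python
import csv
import io

# Assumption: an empty cell or a lone '∅' both mean "no text" in the intermediate CSV.
EMPTY_MARKERS = {"", "∅"}


def column_index(header, prefix):
    """Index of the first column whose HXL hashtag starts with `prefix`."""
    for index, hashtag in enumerate(header):
        if hashtag.startswith(prefix):
            return index
    raise KeyError(f"no column starting with {prefix!r}")


def xliff_triples(csv_text):
    """Hypothetical reader: (unit id, source or None, target or None) per data row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    id_i = column_index(header, "#x_xliff+unit+id")
    source_i = column_index(header, "#x_xliff+source")
    target_i = column_index(header, "#x_xliff+target")
    triples = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        source = row[source_i].strip()
        target = row[target_i].strip()
        triples.append((
            row[id_i],
            None if source in EMPTY_MARKERS else source,
            None if target in EMPTY_MARKERS else target,
        ))
    return triples


sample = (
    "#x_xliff+unit+id,#meta+lat_sortem,#x_xliff+source+i_lat+is_latn,#x_xliff+target+i_arb+is_arab\n"
    "L10N_ego_codicem,2,lat-Latn,arb-Arab\n"
    "L10N_ego_scriptum_nomen,4,Abecedarium Latinum,\n"
)
print(xliff_triples(sample))
# [('L10N_ego_codicem', 'lat-Latn', 'arb-Arab'), ('L10N_ego_scriptum_nomen', 'Abecedarium Latinum', None)]
```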

fititnt referenced this issue in HXL-CPLP/Auxilium-Humanitarium-API Jun 27, 2021
fititnt referenced this issue in HXL-CPLP/Auxilium-Humanitarium-API Jun 28, 2021
fititnt referenced this issue in HXL-CPLP/Auxilium-Humanitarium-API Jun 29, 2021
…iso6393_from_hxlattrs(), HXLTMUtil.iso115924_from_hxlattrs()
fititnt referenced this issue in HXL-CPLP/Auxilium-Humanitarium-API Jun 29, 2021
…have to decide which strategy to use: extend libhxl or create something manually
fititnt referenced this issue in HXL-CPLP/Auxilium-Humanitarium-API Jun 29, 2021
…and use a more minimalist approach that just works in the short term; if necessary, it can be improved later
fititnt referenced this issue in HXL-CPLP/Auxilium-Humanitarium-API Jun 29, 2021
…ormat (TMX) exporter (refs EticaAI/HXL-Data-Science-file-formats#19)
fititnt referenced this issue in EticaAI/HXL-Data-Science-file-formats Jun 30, 2021
fititnt referenced this issue in HXL-CPLP/Auxilium-Humanitarium-API Jun 30, 2021
…com/EticaAI/HXL-Data-Science-file-formats

refs
- HXL-CPLP/forum#58
- EticaAI/HXL-Data-Science-file-formats#20
- EticaAI/HXL-Data-Science-file-formats#19
fititnt referenced this issue in HXL-CPLP/Auxilium-Humanitarium-API Jul 1, 2021
- HXL-CPLP/forum#58
- EticaAI/HXL-Data-Science-file-formats#19
- EticaAI/HXL-Data-Science-file-formats#20
fititnt referenced this issue in EticaAI/HXL-Data-Science-file-formats Jul 1, 2021
fititnt referenced this issue in EticaAI/HXL-Data-Science-file-formats Jul 2, 2021
fititnt referenced this issue in EticaAI/HXL-Data-Science-file-formats Jul 3, 2021
…(used by libhxl) may have trouble decoding files that are already UTF-8 and contain multibyte characters in very rare corner cases; forcing a BOM at the start of CSV files makes it work as a hotfix
fititnt referenced this issue in EticaAI/HXL-Data-Science-file-formats Jul 3, 2021
…ound with BOM (it was an outdated version of libhxl, only on the local machine)
fititnt referenced this issue in EticaAI/HXL-Data-Science-file-formats Jul 3, 2021
…0.xsd imported; test output for XLIFF and TMX
fititnt referenced this issue in EticaAI/HXL-Data-Science-file-formats Jul 3, 2021
fititnt referenced this issue in EticaAI/HXL-Data-Science-file-formats Jul 4, 2021
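The BOM hotfix mentioned in one of the commits above can be illustrated in Python with the standard `utf-8-sig` codec. This is a sketch of the idea only, not the code from that commit; `write_csv_with_bom()` is a hypothetical name.

```python
import csv
import os
import tempfile


def write_csv_with_bom(path, rows):
    """Write rows as UTF-8 CSV prefixed with a byte order mark (EF BB BF)."""
    # The 'utf-8-sig' codec emits the BOM once, at the very start of the file.
    with open(path, "w", encoding="utf-8-sig", newline="") as handle:
        csv.writer(handle).writerows(rows)


rows = [["#x_xliff+unit+id", "#x_xliff+target+i_arb+is_arab"],
        ["L10N_ego_codicem", "arb-Arab"]]
path = os.path.join(tempfile.mkdtemp(), "exemplum.tm.hxl.csv")
write_csv_with_bom(path, rows)
with open(path, "rb") as handle:
    print(handle.read(3))  # b'\xef\xbb\xbf'
```

Reading the file back with `encoding="utf-8-sig"` strips the BOM again, so Python consumers see the same first header as before.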
fititnt changed the title from "hxltm2xliff: HXL Trānslātiōnem Memoriam -> XLIFF Version 2.1" to "hxltmcli --objectivum-XLIFF: HXL Trānslātiōnem Memoriam -> XLIFF Version 2.1" on Oct 15, 2021

fititnt commented Oct 15, 2021

What started some months ago as hxltm2xliff.py is now a user-configurable generator in the hxltmcli program (https://hdp.etica.ai/hxltm), with options like hxltmcli --objectivum-XLIFF; see https://hdp.etica.ai/hxltm/archivum/.

How was it done?

The HXLTM ASA (EticaAI/HXL-Data-Science-file-formats#22) abstracts how to work with HXL, plus some conventional extra tags, in such a way that it is possible both to import from XLIFF and to export from HXL to XLIFF just by configuring a custom plugin. So XLIFF, like TBX, TMX, XML, etc., uses a user-friendly syntax: Liquid (https://shopify.github.io/liquid/) for templating, plus extra attributes.

hxltmcli v0.8.7 (which can be used standalone or with the Python package hdp-toolchain, https://pypi.org/project/hdp-toolchain/) and hxltmdexml (which converts back from any XML file used for export) both rely on cor.hxltm.yml; this is its XLIFF entry:

  #### XLIFF-obsoletum: XML Localization Interchange File Format (XLIFF) v2.1 __
  # tag::normam_XLIFF[]

  # @TODO: JLIFF (XLIFF on JSON) <https://github.com/oasis-tcs/xliff-omos-jliff>
  XLIFF:
    __meta:
      archivum_extensionem: .xlf
      situs_interretialis:
        referens_officinale:
          - <https://www.oasis-open.org/committees/xliff/>
        vicipaedia:
          - <https://en.wikipedia.org/wiki/XLIFF>
      exemplum:
        - <https://github.com/oasis-tcs/xliff-xliff-22>
        - <https://github.com/oasis-tcs/xliff-xliff-22/blob/master/xliff-21/test-suite/core/valid/allExtensions.xlf>
        - <https://github.com/oasis-tcs/xliff-xliff-22/blob/master/xliff-21/test-suite/core/valid/everything-core.xlf>
      normam:
        - <https://docs.oasis-open.org/xliff/xliff-core/v2.1/xliff-core-v2.1.html>
        # - <https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/>
        # @see <https://github.com/redhat-developer/vscode-xml/wiki/XMLValidation#XML-catalog-with-XSD>
        # @see <https://github.com/redhat-developer/vscode-xml/issues/315>
        - <https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/catalog.xml>
      nomen:
        eng-Latn: 'XML Localization Interchange File Format (XLIFF) v2.1'

    asa:
      modus_operandi:
        # - multiplum_linguam
        - bilingue

    de_xml:
      # This is a working draft
      # @see https://terminator.readthedocs.io/en/latest/tbx_conformance.html
      # ontologia libellam: I glossarium > II conceptum > III linguam > IV terminum
      glossarium_radicem:
        signum: xliff
        # Exemplum I: <xliff version="1.2">
        # Exemplum II: <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
      glossarium_titulum: False

      # II conceptum
      conceptum_codicem:
        signum: unit
        de_attributum: id
        trivium:
          # de <xliff> ad <trans-unit>
          - file

      # III linguam
      linguam_codicem: False # XLIFF-obsoletum est bilingue

      linguam_fontem_codicem:
        # Exemplum: 'pt' ad '<source xml:lang="pt">por-Latn</source>''
        signum: source
        de_attributum: lang
        trivium: []

      linguam_objectivum_codicem:
        # Exemplum: 'es' ad '<target xml:lang="es">spa-Latn</target>''
        signum: target
        de_attributum: lang
        trivium: []

      # IV terminum

      terminum_accuratum: False # XLIFF terminum habendum accuratum? Falsum
      terminum_multum: False # XLIFF-obsoletum est bilingue
      terminum_habendum_fontem: True
      terminum_habendum_objectivum: True

      terminum_fontem_valorem:
        # Exemplum: 'por-Latn ad <source xml:lang="pt">por-Latn</source>
        signum: source
        # de_attributum: False
        trivium: []

      terminum_objectivum_valorem:
        # Exemplum: 'spa-Latn' ad <target xml:lang="es">spa-Latn</target>
        signum: target
        # de_attributum: False
        trivium: []

    formatum:

      # @see https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/catalog.xml
      # @see https://docs.oasis-open.org/xliff/xliff-core/v2.1/os/schemas/xliff_core_2.0.xsd
      initiale: |2
        <?xml version="1.0"?>
        <xliff version="2.0"
          xmlns="urn:oasis:names:tc:xliff:document:2.0"
          xmlns:fs="urn:oasis:names:tc:xliff:fs:2.0"
          xmlns:val="urn:oasis:names:tc:xliff:validation:2.0"
          srcLang="{{ globum.fontem_linguam.bcp47 | default: 'la' }}"
          trgLang="{{ globum.objectivum_linguam.bcp47 | default: 'ar' }}">
          <file id="f1">

      corporeum: |2
            {% if rem.de_fontem_linguam -%}
            <unit id="{{ conceptum.codicem | default: rem.de_nomen_breve.conceptum_codicem | default: 'errorem' | replace: '*', '' | replace: '+', '' | replace: '/', '' }}">
              {% if rem.de_auxilium_linguam or rem.de_nomen_breve.referens_situs_interretialis.size > 0 -%}
              <notes>
                {%- for item in rem.de_auxilium_linguam -%}
                <note appliesTo="source" priority="3"
                  category="de_auxilium_linguam">
                  _[{{- item.linguam -}}]
                  {{- item.rem -}}
                  [{{- item.linguam -}}]_
                </note>
                {%- endfor %}
                {% for item in rem.de_nomen_breve.referens_situs_interretialis -%}
                <note appliesTo="source" priority="1"
                  category="referens_situs_interretialis">
                  {{ item }}
                </note>
                {% endfor -%}
              </notes>
              {% else -%}
              <!--
                non rem.de_auxilium_linguam aut rem.de_nomen_breve.referens_situs_interretialis
              -->
              {% endif -%}
              <segment state="{{ rem.de_objectivum_linguam.codicem_XLIFF | default: 'initial' }}">
                <source>{{ rem.de_fontem_linguam.rem }}</source>
                {%- if rem.de_objectivum_linguam and rem.de_objectivum_linguam.rem != '' %}
                <target>{{ rem.de_objectivum_linguam.rem }}</target>
                {%- else %}
                <!-- non rem.de_objectivum_linguam -->
                {%- endif  %}
              </segment>
            </unit>
            {%- else -%}
              <!-- non rem.de_fontem_linguam -->
            {%- endif  %}

      # <!-- {{ rem }} -->
      finale: |2
          </file>
        </xliff>
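To show how the `initiale` / `corporeum` / `finale` parts fit together, here is a plain-Python approximation of the template above. It is not hxltmcli's renderer (which uses Liquid); it swaps Liquid for f-strings, always uses `state="initial"`, and omits the `<notes>` block. `render_xliff()` and `unit_id()` are hypothetical names.

```python
from xml.sax.saxutils import escape, quoteattr


def unit_id(codicem):
    """Mirror the template: default to 'errorem', then drop '*', '+' and '/'."""
    codicem = codicem or "errorem"
    for character in "*+/":
        codicem = codicem.replace(character, "")
    return codicem


def render_xliff(units, src_lang="la", trg_lang="ar"):
    """units: iterable of (codicem, fontem, objectivum); objectivum may be None."""
    initiale = (
        '<?xml version="1.0"?>\n'
        '<xliff version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0"'
        f" srcLang={quoteattr(src_lang)} trgLang={quoteattr(trg_lang)}>\n"
        '  <file id="f1">\n'
    )
    corporeum = []  # rendered once per concept, like the 'corporeum' template
    for codicem, fontem, objectivum in units:
        if not fontem:
            corporeum.append("    <!-- non rem.de_fontem_linguam -->\n")
            continue
        target = (
            f"        <target>{escape(objectivum)}</target>\n" if objectivum
            else "        <!-- non rem.de_objectivum_linguam -->\n"
        )
        corporeum.append(
            f"    <unit id={quoteattr(unit_id(codicem))}>\n"
            '      <segment state="initial">\n'
            f"        <source>{escape(fontem)}</source>\n"
            f"{target}"
            "      </segment>\n"
            "    </unit>\n"
        )
    finale = "  </file>\n</xliff>\n"
    return initiale + "".join(corporeum) + finale


print(render_xliff([("L10N_ego_codicem", "lat-Latn", "arb-Arab")]))
```

As in the template, a concept without source text produces only a comment, and the `<target>` element is emitted only when a translation exists.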

The instructions above are for XLIFF 2; XLIFF 1 is another option. While how to create other exporters/importers is not documented, using the closest existing example to what is desired as a starting point works best. One of the biggest differences is whether a format is bilingual (like XLIFF and some common localization files) or multilingual (like TBX and TMX).
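The bilingual side of that difference is visible when reading XLIFF 2 back. A minimal sketch with a hypothetical `de_xliff()` (not hxltmdexml's code): unlike the `de_xml` entry above, which reads a `lang` attribute on `<source>`/`<target>` in XLIFF 1.2 style, it assumes XLIFF 2, where both languages are declared once via `srcLang`/`trgLang` on the root, so every unit yields exactly two language columns.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "{urn:oasis:names:tc:xliff:document:2.0}"


def de_xliff(xml_text):
    """Hypothetical reader: one dict per <unit>, keyed by the two language tags."""
    root = ET.fromstring(xml_text)
    # XLIFF 2 declares its two languages once, on the root element.
    source_lang = root.get("srcLang")
    target_lang = root.get("trgLang")
    rows = []
    for unit in root.iter(f"{XLIFF_NS}unit"):
        row = {"id": unit.get("id"), source_lang: "", target_lang: ""}
        # Assumption: multi-segment units are simply concatenated.
        for segment in unit.iter(f"{XLIFF_NS}segment"):
            source = segment.find(f"{XLIFF_NS}source")
            target = segment.find(f"{XLIFF_NS}target")
            row[source_lang] += "" if source is None else (source.text or "")
            row[target_lang] += "" if target is None else (target.text or "")
        rows.append(row)
    return rows


sample = """<xliff version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0" srcLang="la" trgLang="ar">
  <file id="f1">
    <unit id="L10N_ego_codicem">
      <segment><source>lat-Latn</source><target>arb-Arab</target></segment>
    </unit>
    <unit id="L10N_ego_scriptum_nomen">
      <segment><source>Abecedarium Latinum</source></segment>
    </unit>
  </file>
</xliff>"""
print(de_xliff(sample))
# [{'id': 'L10N_ego_codicem', 'la': 'lat-Latn', 'ar': 'arb-Arab'}, {'id': 'L10N_ego_scriptum_nomen', 'la': 'Abecedarium Latinum', 'ar': ''}]
```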

In future versions the syntax may change a bit, but HXL is already the best strategy to store multilingual content for those who work with XLIFF. Most tools do not even allow managing more than one source language, so HXLTM (as a specialized tagging of HXL) now at least allows operating with translations from/to an arbitrary number of source/target languages.
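As a rough illustration of why that matters: one multilingual HXLTM row with n filled languages can feed n(n-1) bilingual source/target combinations, each of which could become one XLIFF export. A sketch with a hypothetical `bilingual_pairs()` helper, using texts from the example table above:

```python
from itertools import permutations


def bilingual_pairs(multilingual_row):
    """Every (source lang, target lang, source text, target text) combination.

    `multilingual_row` maps a language code such as 'lat-Latn' to its text;
    empty texts are skipped because they cannot be a source or a target.
    """
    filled = {lang: text for lang, text in multilingual_row.items() if text}
    return [
        (source, target, filled[source], filled[target])
        for source, target in permutations(filled, 2)
    ]


row = {"lat-Latn": "Lingua Latina", "por-Latn": "Língua portuguesa",
       "arb-Arab": "اللغة العربية", "hin-Deva": ""}
print(len(bilingual_pairs(row)))  # 6
```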
