[MVP] Export from the "HXL TM" format (HXL translation memory) to one or more formats already used by localization software #16

Open
fititnt opened this issue Jun 26, 2021 · 1 comment


fititnt commented Jun 26, 2021

Related:


This issue tracks a minimum viable product (MVP): at least one strategy for exporting (not necessarily importing) text that is not yet translated, or that was translated but not reviewed, to one or more traditional software localization formats.
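A minimal sketch of the selection step described above, written in Python with only the standard library. The sheet layout (a header row, then an HXL hashtag row, then data), the hashtags `#item+id`, `#item+i_eng`, `#item+i_por` and `#status`, the status value `reviewed`, and the function name `pending_rows` are all assumptions made for illustration; the real HXL TM columns may differ.

```python
import csv
import io

# Hypothetical HXL-tagged sheet: header row, HXL hashtag row, then data rows.
SAMPLE = """\
id,English,Portuguese,status
#item+id,#item+i_eng,#item+i_por,#status
sex,Sex,Sexo,reviewed
age,Age,,
citizenship,Citizenship,Cidadania,draft
"""


def pending_rows(text, source_tag="#item+i_eng", target_tag="#item+i_por",
                 id_tag="#item+id", status_tag="#status"):
    """Return rows that still need work: no target text, or not yet reviewed."""
    rows = list(csv.reader(io.StringIO(text)))
    hashtags = rows[1]  # assumed position of the HXL hashtag row
    ident = hashtags.index(id_tag)
    src = hashtags.index(source_tag)
    tgt = hashtags.index(target_tag)
    status = hashtags.index(status_tag)

    pending = []
    for row in rows[2:]:
        untranslated = not row[tgt].strip()
        unreviewed = row[status].strip() != "reviewed"
        if untranslated or unreviewed:
            pending.append(
                {"id": row[ident], "source": row[src], "target": row[tgt]}
            )
    return pending


if __name__ == "__main__":
    for item in pending_rows(SAMPLE):
        print(item)
```

Rows selected this way could then be handed to whichever exporter is chosen, while reviewed rows stay untouched in the master spreadsheet.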

Motivation

We will try to use the Human Trafficking Case Data Standard (HTCDS) (see https://github.com/UNMigration/HTCDS) as a project with a "medium amount of text" (more than the data schemas of ISO standards, but less than a complete ebook project) to test how to actually enable an initial translation.

Although it is possible to receive translations (or small corrections) directly through a Google Sheets spreadsheet, my initial impression is that documenting how to grant access (and how new translations can be made directly in the spreadsheets) may be more complicated than exporting to a format that a project such as MateCAT could use.

Focus on export; importing the result can still be semi-manual

Hapi is currently an alpha-stage project. The idea of this MVP is to offer an alternative: anyone who wants to translate a larger amount of content can do so with any other existing software (MateCAT could be a very interesting option).

As for import, automation is relevant, but when a very large amount of data is imported, a human with access to the master spreadsheet would probably have to review it anyway. So, in our context, it is "acceptable" for import to involve a bit of copy and paste.


Edits:


fititnt commented Jun 27, 2021

The file _systema/infrastructuram/okapi-install-locally.sh has basic documentation on how to install the Okapi Framework (https://okapiframework.org/), plus comments on how we are drafting the basic scripts.

Okapi even has a graphical interface and is extremely powerful for converting files, but I think we will generally use the command line.

fititnt@bravo:~/Downloads/okapi$ sh /opt/okapi/tikal.sh -listconf
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.1.41.0
-------------------------------------------------------------------------------
List of all filter configurations available:
 - okf_odf = XML OpenDocument files (e.g. use inside OpenOffice.org documents).
 - okf_mosestext = Default Moses Text configuration.
 - okf_tradosrtf = Configuration for Trados-tagged RTF files - READING ONLY.
 - okf_rainbowkit = Configuration for Rainbow translation kit.
 - okf_rainbowkit-package = Configuration for Rainbow translation kit package.
 - okf_rainbowkit-noprompt = Configuration for Rainbow translation kit (without prompt).
 - okf_mif = Adobe FrameMaker MIF documents
 - okf_archive = Configuration for archive files
 - okf_transifex = Transifex project with prompt when starting
 - okf_transifex-noPrompt = Transifex project without prompt when starting
 - okf_xini = Configuration for XINI documents from ONTRAM
 - okf_xini-noOutputSegmentation = Configuration for XINI documents from ONTRAM (fields in the output are not segmented)
 - okf_itshtml5 = Configuration for standard HTML5 documents.
 - okf_txml = Wordfast Pro TXML documents
 - okf_txml-fillEmptyTargets = Wordfast Pro TXML documents with empty targets filled on output.
 - okf_wiki = Text with wiki-style markup
 - okf_doxygen = Doxygen-commented Text Documents
 - okf_transtable = Default TransTable configuration.
 - okf_simplification = Configuration for extracting resources from an XML file. Resources and then codes are simplified.
 - okf_simplification-xmlResources = Configuration for extracting resources from an XML file. Resources are simplified.
 - okf_simplification-xmlCodes = Configuration for extracting resources from an XML file. Codes are simplified.
 - okf_xliff2 = Configuration for XLIFF-2 documents.
 - okf_icml = Adobe InDesign ICML documents
 - okf_markdown = Markdown files
 - okf_pdf = Configuration for PDF documents
 - okf_sdlpackage = SDL Trados 2017 SDLPPX and SDLRPX files
 - okf_tex = Tex files
 - okf_autoxliff = Calls the appropriate filter for any version of XLIFF
 - okf_multiparsers = Configuration for CSV files with plain-text on all columns
 - okf_table = Table-like files such as tab-delimited, CSV, fixed-width columns, etc.
 - okf_table_csv = Comma-separated values, optional header with field names.
 - okf_table_catkeys = Haiku CatKeys resource files
 - okf_table_src-tab-trg = 2-column (source + target), tab separated files.
 - okf_table_fwc = Fixed-width columns table padded with white-spaces.
 - okf_table_tsv = Columns, separated by one or more tabs.
 - okf_plaintext = Plain text files.
 - okf_plaintext_trim_trail = Text files; trailing spaces and tabs removed from extracted lines.
 - okf_plaintext_trim_all = Text files; leading and trailing spaces and tabs removed from extracted lines.
 - okf_plaintext_paragraphs = Text files extracted by paragraphs (separated by 1 or more empty lines).
 - okf_plaintext_spliced_backslash = Spliced lines filter with the backslash character (\) used as the splicer.
 - okf_plaintext_spliced_underscore = Spliced lines filter with the underscore character (_) used as the splicer.
 - okf_plaintext_spliced_custom = Spliced lines filter with a user-defined splicer.
 - okf_plaintext_regex_lines = Plain Text Filter using regex-based linebreak search. Extracts by lines.
 - okf_plaintext_regex_paragraphs = Plain Text Filter using regex-based linebreak search. Extracts by paragraphs.
 - okf_xml = Configuration for generic XML documents (default ITS rules).
 - okf_xml-resx = Configuration for Microsoft RESX documents (without binary data).
 - okf_xml-MozillaRDF = Configuration for Mozilla RDF documents.
 - okf_xml-JavaProperties = Configuration for Java Properties files in XML.
 - okf_xml-AndroidStrings = Configuration for Android Strings XML documents.
 - okf_xml-WixLocalization = Configuration for WiX (Windows Installer XML) Localization files.
 - okf_xml-AppleStringsdict = Configuration for Apple Stringsdict files
 - okf_html = HTML or XHTML documents
 - okf_html-wellFormed = XHTML and well-formed HTML documents
 - okf_tmx = Configuration for Translation Memory eXchange (TMX) documents.
 - okf_dtd = Configuration for XML DTD documents (entities content)
 - okf_json = Configuration for JSON files
 - okf_idml = Adobe InDesign IDML documents
 - okf_ttx = Configuration for Trados TTX documents.
 - okf_properties = Java properties files (Output used \uHHHH escapes)
 - okf_properties-outputNotEscaped = Java properties files (Characters in the output encoding are not escaped)
 - okf_properties-skypeLang = Skype language properties files (including support for HTML codes)
 - okf_properties-html-subfilter = Java Property content processed by an HTML subfilter
 - okf_phpcontent = Default PHP Content configuration.
 - okf_openoffice = OpenOffice.org ODT, ODS, ODP, ODG, OTT, OTS, OTP, OTG documents
 - okf_vignette = Default Vignette Export/Import Content configuration.
 - okf_vignette-nocdata = Vignette files without CDATA sections.
 - okf_openxml = Microsoft Office documents (DOCX, DOCM, DOTX, DOTM, PPTX, PPTM, PPSX, PPSM, POTX, POTM, XLSX, XLSM, XLTX, XLTM, VSDX, VSDM).
 - okf_pensieve = Configuration for Pensieve translation memories.
 - okf_xliff = Configuration for XML Localisation Interchange File Format (XLIFF) documents.
 - okf_xliff-sdl = Configuration for SDL XLIFF documents. Supports SDL specific metadata
 - okf_xliff-iws = Configuration for IWS XLIFF documents. Supports IWS specific metadata
 - okf_ts = Configuration for Qt TS files.
 - okf_regex = Default Regex configuration.
 - okf_regex-srt = Configuration for SRT (Sub-Rip Text) sub-titles files.
 - okf_regex-textLine = Configuration for text files where each line is a text unit
 - okf_regex-textBlock = Configuration for text files where text units are separated by 2 or more line-breaks.
 - okf_regex-macStrings = Configuration for Macintosh .strings files.
 - okf_po = Standard bilingual PO files
 - okf_po-monolingual = Monolingual PO files (msgid is a real ID, not the source text).
 - okf_yaml = YAML files
 - okf_xmlstream = Large XML Documents
 - okf_xmlstream-dita = DITA XML
 - okf_xmlstream-JavaPropertiesHTML = Java Properties XML with Embedded HTML

Something I am drafting is a set of examples using HXL JSON recipes (see https://github.com/HXLStandard/hxl-proxy/wiki/JSON-recipes) showing how to pick only "the parts that matter" in order to generate a format such as XLIFF.
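As a rough illustration of the last step only (turning already-selected rows into XLIFF), here is a minimal Python sketch using the standard library. It does not use HXL JSON recipes or Okapi; it assumes the selection step has already produced a list of dictionaries with `id`, `source` and `target` keys. The function name `to_xliff`, the file name `hxltm.csv`, the language codes and the choice of the XLIFF 1.2 state `needs-review-translation` for existing targets are assumptions, not something decided in this issue.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def to_xliff(units, source_lang="en", target_lang="pt", original="hxltm.csv"):
    """Build a minimal XLIFF 1.2 document from id/source/target dictionaries."""
    xliff = ET.Element("xliff", {"version": "1.2", "xmlns": XLIFF_NS})
    file_el = ET.SubElement(xliff, "file", {
        "original": original,  # hypothetical name of the exported sheet
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, "body")

    for unit in units:
        tu = ET.SubElement(body, "trans-unit", {"id": unit["id"]})
        ET.SubElement(tu, "source").text = unit["source"]
        # Only emit <target> when an unreviewed translation already exists.
        if unit.get("target"):
            target = ET.SubElement(
                tu, "target", {"state": "needs-review-translation"}
            )
            target.text = unit["target"]

    return ET.tostring(xliff, encoding="unicode")


if __name__ == "__main__":
    sample_units = [
        {"id": "age", "source": "Age", "target": ""},
        {"id": "citizenship", "source": "Citizenship", "target": "Cidadania"},
    ]
    print(to_xliff(sample_units))
```

A file produced this way should, in principle, be readable by tools that accept XLIFF 1.2, but that would need to be checked against the specific tool (for example MateCAT or Okapi's okf_xliff filter) before relying on it.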

fititnt added a commit that referenced this issue Jun 27, 2021
fititnt added a commit that referenced this issue Jun 27, 2021
fititnt added a commit that referenced this issue Jun 27, 2021
…ormatação para exportar primeiro um arquivo CSV mais simplificado
fititnt added a commit that referenced this issue Jun 27, 2021
fititnt added a commit that referenced this issue Jun 28, 2021
fititnt added a commit that referenced this issue Jun 28, 2021
fititnt added a commit that referenced this issue Jun 28, 2021
fititnt added a commit that referenced this issue Jun 29, 2021
…iso6393_from_hxlattrs(), HXLTMUtil.iso115924_from_hxlattrs()
fititnt added a commit that referenced this issue Jun 29, 2021
…nho que decidir qual estrategia usar, se estender libhxl ou criar algo manual
fititnt added a commit that referenced this issue Jun 29, 2021
…sistir e usar uma aproximação mais minimaista, que apenas funcione no curto prazo; se necessário pode ser melhorado depois
fititnt added a commit that referenced this issue Jun 29, 2021
…ormat (TMX) exporter (refs EticaAI/HXL-Data-Science-file-formats#19)
fititnt added a commit to EticaAI/HXL-Data-Science-file-formats that referenced this issue Jun 29, 2021
fititnt added a commit to EticaAI/HXL-Data-Science-file-formats that referenced this issue Jun 30, 2021
fititnt added a commit that referenced this issue Jun 30, 2021
…com/EticaAI/HXL-Data-Science-file-formats

refs
- HXL-CPLP/forum#58
- EticaAI/HXL-Data-Science-file-formats#20
- EticaAI/HXL-Data-Science-file-formats#19