Introduction

This repository contains Neural Machine Translation tools and models built at Softcatalà using OpenNMT-tf 2 and TensorFlow 2.

Description of the directories

  • data-processing-tools: set of data processing tools that convert from different formats to the OpenNMT plain text input format
  • serving: contains a microservice that provides a basic translation API calling TensorFlow Serving.
  • use-models-tools: contains tools to use the models to translate text files or PO files
  • evaluate: set of tools and corpora to evaluate different translation systems
  • training: scripts and configurations to train the models

Models

Softcatalà built models

We have created the following models:

Structure of the models

Description of the directories contained in the models zip file:

  • tensorflow: model exported in TensorFlow format
  • ctranslate2: model exported in CTranslate2 format (used for inference)
  • metadata: description of the model
  • tokenizer: SentencePiece models for both languages
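As a rough illustration of how the ctranslate2 and tokenizer directories fit together, the Python sketch below translates a list of sentences. It assumes the ctranslate2 (2.x or later) and sentencepiece packages are installed and that plain SentencePiece tokenization is applied on both the source and target side; the function name and the paths you pass in are hypothetical. For the exact pre- and post-processing used in production, see use-models-tools.

```python
def translate(sentences, model_dir, src_sp_model, tgt_sp_model, device="cpu"):
    """Translate a list of sentences with an exported CTranslate2 model.

    Hypothetical helper: model_dir is the unzipped ctranslate2 directory,
    src_sp_model / tgt_sp_model are the SentencePiece .model files from
    the tokenizer directory (actual file names depend on the model).
    """
    # Imported here so the module loads even where the packages are missing.
    import ctranslate2
    import sentencepiece as spm

    translator = ctranslate2.Translator(model_dir, device=device)
    src_sp = spm.SentencePieceProcessor(model_file=src_sp_model)
    tgt_sp = spm.SentencePieceProcessor(model_file=tgt_sp_model)

    # Split each sentence into subword pieces, the input the model expects.
    batch = [src_sp.encode(s, out_type=str) for s in sentences]
    results = translator.translate_batch(batch)

    # Keep the best hypothesis per sentence and join its pieces back into text.
    return [tgt_sp.decode(r.hypotheses[0]) for r in results]
```

Call it with the path of the ctranslate2 directory and the two SentencePiece files from tokenizer after unzipping a model.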

Serving

Serving the models in production

You can download the Docker image that we use in production.

Apertium API

One of the use cases for Machine Translation is to use it to speed up the work of translators.

To integrate easily with existing translation tools, we support the Apertium Web API. This means that you can use any tool that supports Apertium.

We have confirmed that the following tools work through their Apertium plugins:

  • Okapi Framework
  • OmegaT translation plugin
  • Gedit's Apertium plugin

Supported methods

  • /translate: GET or POST
  • /listLanguageNames: GET
  • /listPairs: GET
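The endpoints above follow the Apertium Web API conventions. The Python sketch below builds a /translate request and reads the JSON replies. It assumes the APy response layout (responseData, responseStatus, responseDetails) and three-letter language codes written as eng|cat; base_url is a placeholder for wherever the service runs, and all function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def translate_url(base_url, text, source="eng", target="cat"):
    # /translate takes the text in `q` and the pair in `langpair` (source|target).
    query = urlencode({"langpair": f"{source}|{target}", "q": text})
    return f"{base_url.rstrip('/')}/translate?{query}"


def parse_translation(body):
    # Assumed APy layout: the result sits in responseData.translatedText.
    data = json.loads(body)
    if data.get("responseStatus") != 200:
        raise RuntimeError(data.get("responseDetails"))
    return data["responseData"]["translatedText"]


def parse_pairs(body):
    # Assumed /listPairs layout: a list of {sourceLanguage, targetLanguage} objects.
    data = json.loads(body)
    return [(p["sourceLanguage"], p["targetLanguage"]) for p in data["responseData"]]


def translate(base_url, text, source="eng", target="cat"):
    # Plain GET request; the same parameters can also be sent with POST.
    with urlopen(translate_url(base_url, text, source, target)) as response:
        return parse_translation(response.read().decode("utf-8"))
```

Keeping URL building and response parsing separate from the network call makes it easy to check the request format without a running server.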

Using the models in your machine

This is useful, for example, if you want to translate large volumes of text with our prebuilt English-Catalan models, using exactly the same version that we have in production.

  • You need Docker installed on your system

  • Type docker pull jordimash/use-models-tools

To quickly test that everything works:

  • echo "Hello World" > input.txt
  • docker run -it -v "$(pwd)":/srv/files/ --env COMMAND_LINE="-f input.txt -t output.txt" --rm jordimash/use-models-tools
  • more output.txt

To translate PO files:

  • Place the file ca.po in your current directory
  • docker run -it -v "$(pwd)":/srv/files/ --env COMMAND_LINE="-f ca.po" --env FILE_TYPE='po' --rm jordimash/use-models-tools

The translated file will be written to ca.po-ca.po.

To translate a text file from Catalan to English:

  • echo "Hola món" > input.txt
  • docker run -it -v "$(pwd)":/srv/files/ --env COMMAND_LINE="-f input.txt -t output.txt -m cat-eng" --rm jordimash/use-models-tools
  • more output.txt
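The invocations above differ only in the COMMAND_LINE and FILE_TYPE variables, so a batch job can generate them. This Python sketch builds an equivalent docker run command as an argument list and runs it with subprocess. docker_command and run are hypothetical helpers, and the interactive -it flags are left out because a script usually has no terminal attached.

```python
import os
import subprocess

IMAGE = "jordimash/use-models-tools"


def docker_command(input_file, output_file=None, model=None, file_type=None, workdir=None):
    # COMMAND_LINE carries the tool's arguments: -f input, -t output, -m model.
    args = ["-f", input_file]
    if output_file:
        args += ["-t", output_file]
    if model:
        args += ["-m", model]
    # Mount the working directory at /srv/files/, as in the examples above.
    cmd = ["docker", "run", "--rm",
           "-v", f"{workdir or os.getcwd()}:/srv/files/",
           "--env", "COMMAND_LINE=" + " ".join(args)]
    if file_type:
        cmd += ["--env", f"FILE_TYPE={file_type}"]
    return cmd + [IMAGE]


def run(*args, **kwargs):
    # Raises subprocess.CalledProcessError if the container exits with an error.
    subprocess.run(docker_command(*args, **kwargs), check=True)
```

For example, run("input.txt", "output.txt", model="cat-eng") mirrors the Catalan to English example, and run("ca.po", file_type="po") mirrors the PO example.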

License

See license

How to help?

See here (in Catalan).

Contact

Email address: Jordi Mas: jmas@softcatala.org
