A CLI tool using Apertium to translate Elasticsearch fields.
ES Translator


A lazy yet bulletproof machine translation tool for Elasticsearch.

$ python --help
Usage: [OPTIONS]

  --url TEXT                    Elasticsearch URL  [required]
  --index TEXT                  Elasticsearch Index  [required]
  --source-language TEXT        Source language to translate from  [required]
  --target-language TEXT        Target language to translate to  [required]
  --intermediary-language TEXT  An intermediary language to use when no
                                translation is available between the source
                                and the target. If none is provided this will
                                be calculated automatically.
  --source-field TEXT           Document field to translate
  --target-field TEXT           Document field where the translations are stored
  --query-string TEXT           Search query string to filter result
  --data-dir PATH               Path to the directory where the language model
                                will be downloaded
  --scan-scroll TEXT            Scroll duration (set to higher value if you're
                                processing a lot of documents)
  --dry-run                     Don't save anything in Elasticsearch
  --pool-size INTEGER           Number of parallel processes to start
  --pool-timeout INTEGER        Timeout to add a translation
  --syslog-address TEXT         Syslog address
  --syslog-port INTEGER         Syslog port
  --syslog-facility TEXT        Syslog facility
  --help                        Show this message and exit.
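
The help text for --intermediary-language says an intermediary is calculated automatically when none is given. One way to picture that is a shortest-path search over the available translation pairs. The sketch below is only an illustration under stated assumptions: the function name find_language_path and the list of pairs are hypothetical, and this is not necessarily how the tool itself picks the intermediary.

```python
from collections import deque


def find_language_path(pairs, source, target):
    """Return the shortest chain of languages from source to target, or None.

    `pairs` is an iterable of directional (from, to) language codes.
    """
    # Build an adjacency map: language -> languages it can be translated into
    graph = {}
    for src, dst in pairs:
        graph.setdefault(src, set()).add(dst)

    # Breadth-first search, so the first path reaching the target is the shortest
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical set of installed pairs, for illustration only
pairs = [("pt", "es"), ("es", "en"), ("fr", "es")]
print(find_language_path(pairs, "pt", "en"))  # ['pt', 'es', 'en']
```

A path of length two means a direct pair exists. A longer path names the intermediary language(s) to go through.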

Installation (Ubuntu)

Install Apertium:

wget -O - | sudo bash
sudo apt-get update
sudo apt-get install apertium-all-dev

Create a virtualenv and install the pip packages:

sudo apt-get install python3-virtualenv
make install

Installation (Docker)

Nothing to do as long as you have Docker on your system:

docker run -it icij/es-translator python --help


Examples

Translate documents from French to Spanish on a local Elasticsearch instance. The translated field is content (the default):

python --url "http://localhost:9200" --index my-index --source-language fr --target-language es

To translate the title field instead:

python --url "http://localhost:9200" --index my-index --source-language fr --target-language es --source-field title
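
To make the roles of --source-field and --target-field concrete, here is a minimal sketch of reading one field from a document and preparing a partial update that stores the translation in another. The function build_update, the translate callable and the field name title_es are hypothetical and chosen for illustration. Only the {"doc": ...} shape comes from the Elasticsearch partial-update API.

```python
def build_update(document, translate, source_field="content",
                 target_field="content_translated"):
    """Return a partial-update body holding the translated source field.

    `translate` is any callable taking a string and returning a string.
    The default target field name is a placeholder, not the tool's default.
    """
    text = document.get(source_field, "")
    return {"doc": {target_field: translate(text)}}


# str.upper stands in for a real translation function
body = build_update({"title": "bonjour"}, str.upper,
                    source_field="title", target_field="title_es")
print(body)  # {'doc': {'title_es': 'BONJOUR'}}
```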

Translate documents from English to Spanish on a local Elasticsearch instance using 4 parallel processes:

python --url "http://localhost:9200" --index my-index --source-language en --target-language es --pool-size 4

Translate documents from Portuguese to English, using Spanish as an intermediary language (Apertium doesn't offer this translation pair):

python --url "http://localhost:9200" --index my-index --source-language pt --intermediary-language es --target-language en
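
Conceptually, going through an intermediary language means chaining two translations: Portuguese to Spanish, then Spanish to English. The sketch below shows that chaining. The names translate_via, translate_pair and fake_pair are hypothetical, and a real translate_pair would call Apertium for each step.

```python
def translate_via(text, languages, translate_pair):
    """Translate text through each consecutive pair in `languages`.

    `translate_pair(text, src, dst)` is a caller-supplied function that
    performs a single direct translation.
    """
    for src, dst in zip(languages, languages[1:]):
        text = translate_pair(text, src, dst)
    return text


def fake_pair(text, src, dst):
    # Stand-in for a real translation: it only records the step taken
    return f"{text} [{src}>{dst}]"


print(translate_via("olá", ["pt", "es", "en"], fake_pair))
# olá [pt>es] [es>en]
```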
