A CLI tool using Apertium to translate Elasticsearch fields.

ES Translator

A lazy yet bulletproof machine translation tool for Elasticsearch.

$ python es_translator.py --help
Usage: es_translator.py [OPTIONS]

Options:
  --url TEXT                    Elasticsearch URL  [required]
  --index TEXT                  Elasticsearch Index  [required]
  --source-language TEXT        Source language to translate from  [required]
  --target-language TEXT        Target language to translate to  [required]
  --intermediary-language TEXT  An intermediary language to use when no
                                translation is available between the source
                                and the target. If none is provided this will
                                be calculated automatically.
  --source-field TEXT           Document field to translate
  --target-field TEXT           Document field where the translations are stored
  --query-string TEXT           Search query string to filter results
  --data-dir PATH               Path to the directory where the language model
                                will be downloaded
  --scan-scroll TEXT            Scroll duration (set to higher value if you're
                                processing a lot of documents)
  --dry-run                     Don't save anything in Elasticsearch
  --pool-size INTEGER           Number of parallel processes to start
  --pool-timeout INTEGER        Timeout to add a translation
  --syslog-address TEXT         Syslog address
  --syslog-port INTEGER         Syslog port
  --syslog-facility TEXT        Syslog facility
  --help                        Show this message and exit.
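The help text says the intermediary language is calculated automatically when none is provided. The sketch below shows one way such a lookup could work. It assumes the installed pairs are known as (source, target) tuples. The function name and the pair list are hypothetical, and this is not es_translator's actual implementation.

```python
from typing import Iterable, Optional, Tuple

Pair = Tuple[str, str]  # (source, target), e.g. ("pt", "es")


def find_intermediary(source: str, target: str, pairs: Iterable[Pair]) -> Optional[str]:
    """Return a language X such that source->X and X->target both exist.

    Returns None when a direct pair exists (no intermediary needed) and
    raises LookupError when no single-hop route can be found.
    """
    available = set(pairs)
    if (source, target) in available:
        return None  # direct translation is possible
    # Try every language reachable from the source, sorted for determinism.
    for middle in sorted(t for s, t in available if s == source):
        if (middle, target) in available:
            return middle
    raise LookupError(f"no single-hop route from {source} to {target}")


installed = [("pt", "es"), ("es", "en"), ("fr", "es"), ("en", "es")]
print(find_intermediary("pt", "en", installed))  # es
print(find_intermediary("fr", "es", installed))  # None
```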

Installation (Ubuntu)

Install Apertium:

wget https://apertium.projectjj.com/apt/install-release.sh -O - | sudo bash
sudo apt-get update
sudo apt-get install apertium-all-dev
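After installation, you can check that a language pair works by piping text through the apertium command, for example apertium fr-es. The Python sketch below wraps that call with subprocess. It assumes a French-Spanish pair is installed. The function name and its run hook, which lets the function be tested without Apertium, are hypothetical.

```python
import subprocess
from typing import Callable


def apertium_translate(text: str, pair: str,
                       run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Translate text by piping it through the `apertium` command.

    `pair` is an Apertium mode such as "fr-es". `run` defaults to
    subprocess.run and can be replaced by a stub in tests.
    """
    result = run(
        ["apertium", pair],
        input=text,           # text is sent on stdin
        capture_output=True,  # translation is read from stdout
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()
```

Usage, assuming the pair is installed: apertium_translate("Bonjour le monde", "fr-es").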

Create a virtualenv and install the pip packages:

sudo apt-get install python3-virtualenv
make install

Installation (Docker)

No installation is needed as long as Docker is available on your system:

docker run -it icij/es-translator python es_translator.py --help

Examples

Translate documents from French to Spanish on a local Elasticsearch instance. The translated field is content (the default).

python es_translator.py --url "http://localhost:9200" --index my-index --source-language fr --target-language es
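If you run these commands from a script, it can help to assemble the argument list in one place. The sketch below builds an invocation from the flags documented in the help output above. The build_command helper is hypothetical and is not part of es_translator.

```python
import sys
from typing import List, Optional


def build_command(url: str, index: str, source: str, target: str,
                  source_field: Optional[str] = None,
                  intermediary: Optional[str] = None,
                  pool_size: Optional[int] = None,
                  dry_run: bool = False) -> List[str]:
    """Assemble an es_translator.py invocation from documented flags."""
    cmd = [sys.executable, "es_translator.py",
           "--url", url, "--index", index,
           "--source-language", source, "--target-language", target]
    # Optional flags are only added when a value is given.
    if source_field is not None:
        cmd += ["--source-field", source_field]
    if intermediary is not None:
        cmd += ["--intermediary-language", intermediary]
    if pool_size is not None:
        cmd += ["--pool-size", str(pool_size)]
    if dry_run:
        cmd.append("--dry-run")  # don't save anything in Elasticsearch
    return cmd


cmd = build_command("http://localhost:9200", "my-index", "fr", "es", dry_run=True)
```

You can pass the resulting list to subprocess.run(cmd, check=True) from the repository root. Adding --dry-run, as here, is a cheap way to check a configuration before writing to the index.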

To translate the title field instead:

python es_translator.py --url "http://localhost:9200" --index my-index --source-language fr --target-language es --source-field title

Translate documents from English to Spanish on a local Elasticsearch instance using 4 parallel processes:

python es_translator.py --url "http://localhost:9200" --index my-index --source-language en --target-language es --pool-size 4
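Conceptually, --pool-size sets the number of workers and --pool-timeout bounds how long each translation may take. The sketch below illustrates that idea and is not the tool's code. It uses multiprocessing.pool.ThreadPool to stay simple and portable. multiprocessing.Pool offers the same apply_async/get API if you want separate processes. The timeout is assumed to be in seconds.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, List


def translate_all(texts: List[str], translate: Callable[[str], str],
                  pool_size: int = 4, timeout: float = 60) -> List[str]:
    """Translate texts concurrently, submitting one task per text.

    pool_size mirrors --pool-size; timeout mirrors --pool-timeout
    (assumed to be seconds per translation).
    """
    with ThreadPool(processes=pool_size) as pool:
        pending = [pool.apply_async(translate, (text,)) for text in texts]
        # get() raises multiprocessing.TimeoutError if a result is late.
        return [task.get(timeout=timeout) for task in pending]


print(translate_all(["hola", "adios"], str.upper, pool_size=4))  # ['HOLA', 'ADIOS']
```

Results come back in submission order, even though the tasks may finish in any order.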

Translate documents from Portuguese to English through an intermediary language, since Apertium doesn't offer this translation pair directly:

python es_translator.py --url "http://localhost:9200" --index my-index --source-language pt --intermediary-language es --target-language en
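An intermediary language turns one translation into two consecutive ones: here Portuguese to Spanish, then Spanish to English. The sketch below shows that chaining. It assumes each pair is available as a Python callable, such as a function that shells out to apertium pt-es. The translate_via function and the toy translators are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Translator = Callable[[str], str]


def translate_via(text: str, source: str, target: str,
                  translators: Dict[Tuple[str, str], Translator],
                  intermediary: Optional[str] = None) -> str:
    """Translate text, optionally hopping through an intermediary language."""
    if intermediary is None:
        return translators[(source, target)](text)
    # Two hops: source -> intermediary, then intermediary -> target.
    step = translators[(source, intermediary)](text)
    return translators[(intermediary, target)](step)


# Toy translators that tag the text so the route taken is visible.
toy = {
    ("pt", "es"): lambda t: t + " [pt>es]",
    ("es", "en"): lambda t: t + " [es>en]",
}
print(translate_via("obrigado", "pt", "en", toy, intermediary="es"))
# obrigado [pt>es] [es>en]
```

Each hop can introduce its own errors, so a direct pair is preferable when one exists.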