MutNMT aims to provide a web application to train neural machine translation with educational purposes. This web application lets the user train, inspect, evaluate and translate using neural engines. To know more about what you will find at MutNMT, please read the Basic and advanced features page.
It has been developed by Prompsit in collaboration with the partners of the "MultiTraiNMT - Machine Translation training for multilingual citizens" European project (2019-1-ES01-KA203-064245, 01/09/2019–31/08/2022).
This application uses JoeyNMT in its core.
MutNMT provides the following features:
- Upload and manage corpora
- Upload corpora in text, TMX or TSV format
- Tag corpora depending on domain
- Share corpora with other users
- Train and manage engines
- Select corpora or a subset of those corpora and train a Transformer model
- Track progress of the training process with data tables and charts
- Stop, resume or restart training at anytime
- Translate text and documents
- Select an already trained engine to translate text or documents (HTML, TMX, PDF and Office formats supported)
- Inspect an engine
- Explore details on tokenization, candidate selection and pre-processed output
- Evaluate translations
- Upload parallel translation files to evaluate them using BLEU, chrF3, TER and TTR metrics
MutNMT is provided as a Docker container. This container is based on NVIDIA Container Toolkit.
In order to run MutNMT, you need access to an NVIDIA GPU. You must install the necessary drivers on the host machine. Note that you do not need to install the CUDA Toolkit on the host system, but it should be compatible with CUDA 11.
Building and launching MutNMT consists on:
- Set up preloaded engines
- Set up user authentication
- Set up user lists: admins and whitelist
- Set up proxy fix
- Build the Docker image
- Decide on data persistance
- Launch the container
The image for the MutNMT container must be built taking into account the following steps.
You can build MutNMT with preloaded engines so that users have something to translate and inspect with. Before building the Docker image, include the engines you want to preload in the app/preloaded
folder.
Create the app/preloaded
folder even if you don't want to include any preloaded engines. This folder is ignored by Docker in order to make build process faster and the image smaller, so it is mounted by default as a volume.
Each engine must be stored in its own folder, and must have been trained with JoeyNMT.
MutNMT will use the model/train.log
to retrieve information about the engine, so make sure that file is available.
This is an example of an app/preloaded
tree with one preloaded engine:
+ app/
| + preloaded/
| | + transformer-en-es/
| | | - best.ckpt
| | | - config.yaml
| | | - train.model
| | | - train.vocab
| | | - validations.txt
| | | + model/
| | | | - train.log
| | | | + tensorboard/
MutNMT provides authentication based on the Google identity server through the OAUTH2 protocol. The procedure of setting such a server in the Google side is a bit complex and Google changes it from time to time, but it can be found here. Although not official, a useful resource is this video.
From the process above, you will get at the end two strings, "client ID" and "client secret". You can edit the config.py file in the following way (alternatively, you can create a instance/config.py file with the following content):
SECRET_KEY = 'put a random string here'
DEBUG = False
USER_LOGIN_ENABLED = True
USER_WHITELIST_ENABLED = False
OAUTHLIB_INSECURE_TRANSPORT = True # True also behind firewall, False -> require HTTPS
GOOGLE_OAUTH_CLIENT_ID = 'xxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com'
GOOGLE_OAUTH_CLIENT_SECRET = 'xxxxxxxxxxxxxxx'
USE_PROXY_FIX = False
To specify admin accounts, please create a file in app/lists
called admin.list
, containing one administrator email per line. The admin accounts will allow you to use admin features. You can set as many as you want.
When user login is not enabled, a whitelist can be established to let the users in that list log in, but only them. This whitelist is only applied when USER_LOGIN_ENABLED
is set to False
. To specify a whitelist, create a file in app/lists
called white.list
, containing one user email per line. Then, enable the whitelist by setting USER_WHITELIST_ENABLED
to True
.
Google Authentication may fail to work under some scenarios, for example behind an HTTP proxy. Set USE_PROXY_FIX
to True
in order to enable Proxy Fix and make authentication work behind a proxy.
Once you are ready, build MutNMT:
docker build -t mutnmt .
Logs, database and user data like corpora or engines are stored inside the container in /opt/mutnmt/data
. This folder is mounted in ./data
by default, so that it persists in case of removing the container. Make sure to create the ./data
folder in the project's directory if it does not exist.
The nvidia-docker image this container is based on is not compatible with docker-compose. A script to run MutNMT is provided to make launching the container easier:
./run.sh cuda 5000 mutnmt:latest
This will setup MutNMT to run on port 5000
.
If it is the first time you run MutNMT, make sure to update your database:
docker exec mutnmt bash -c "cd /opt/mutnmt/app/ && source ../venv/bin/activate && FLASK_APP=../app flask db upgrade"