cartesinus/on-quality-estimation

Overview

This repository allows building models for machine translation (MT) quality estimation (QE). It is clearly a QuEst++ rip-off that I made in order to experiment with pre-BERT QE.

The data:

  • English-German WMT18 sentences from the IT domain, translated by an in-house encoder-decoder attention-based NMT system (13,442 training and 1,000 development sentences)
  • After running ./scripts/download-data.sh, the data will be downloaded to data/sentence-level/features/en_de.
  • The usual 17 features used in WMT12-17 are used for the baseline system
  • The WMT18 QE baseline model was SVM regression with an RBF kernel, with a grid search for the optimisation of relevant parameters. I tried to reproduce this in config/svc.cfg; a sketch of such a setup is shown below.
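For orientation, here is a minimal sketch of that kind of baseline in plain scikit-learn: an RBF-kernel SVR tuned with GridSearchCV over a handful of parameters. The file layout and the parameter grid below are assumptions for illustration, not what config/svc.cfg actually encodes.

# baseline_sketch.py -- illustrative only, not part of this repository
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# X: (n_sentences, 17) baseline feature matrix, y: quality scores (e.g. HTER)
X = np.loadtxt("data/sentence-level/features/en_de/train.features")  # assumed layout
y = np.loadtxt("data/sentence-level/features/en_de/train.hter")      # assumed layout

# RBF-kernel SVR with a grid search over C, gamma and epsilon, as in the WMT baseline
grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1, 10], "gamma": ["scale", 0.01], "epsilon": [0.1, 0.2]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)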

Train model

The program takes as input a method, a config file, and additional parameters.

For example, to train a model:

./quality_estimation.py --train --config config/svc.yaml
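Roughly speaking, the learning section of the config names a scikit-learn estimator and its parameters. A hypothetical sketch of that mapping (the registry, config keys and pickle path below are assumptions, not the script's actual internals):

# train_sketch.py -- hypothetical illustration of mapping the config's
# "learning" section onto a scikit-learn estimator
import pickle
import yaml
from sklearn import ensemble, linear_model, svm

ESTIMATORS = {  # assumed registry; the set of methods actually supported may differ
    "SVR": svm.SVR,
    "Ridge": linear_model.Ridge,
    "RandomForestRegressor": ensemble.RandomForestRegressor,
}

def train_from_config(config_path, X, y, model_path="model.pkl"):
    """Instantiate the configured estimator, fit it on features X / scores y, pickle it."""
    with open(config_path) as fh:
        cfg = yaml.safe_load(fh)
    learn = cfg["learning"]  # assumed keys: method (str) and parameters (dict)
    model = ESTIMATORS[learn["method"]](**learn.get("parameters", {}))
    model.fit(X, y)
    with open(model_path, "wb") as fh:
        pickle.dump(model, fh)
    return model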

Preparing training corpora

To extract features from a TSV file (required columns: src and trg):

./quality_estimation.py --extract_features \
                       --src_lm_path data/lm.tok.en \
                       --trg_lm_path data/lm.tok.de \
                       --trg_ncount_path data/ngram-count.de \
                       -i input.tsv -o output.tsv

Also remember to provide the SRILM path, either via export SRILM_PATH or with --srilm_path.
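For intuition, several of the classic 17 baseline features are simple surface statistics that can be computed directly from the src and trg columns. A hedged sketch of such a subset (the exact feature definitions used by the script are not reproduced here; the language-model features additionally require SRILM):

# features_sketch.py -- illustrative subset of the surface features; column names
# and the exact feature definitions are assumptions, not this repository's code
import csv

def surface_features(src, trg):
    src_tokens, trg_tokens = src.split(), trg.split()
    return {
        "src_num_tokens": len(src_tokens),
        "trg_num_tokens": len(trg_tokens),
        "src_avg_token_len": sum(len(t) for t in src_tokens) / max(len(src_tokens), 1),
        "len_ratio": len(src_tokens) / max(len(trg_tokens), 1),
        "src_punct": sum(ch in ",.;:!?" for ch in src),
        "trg_punct": sum(ch in ",.;:!?" for ch in trg),
    }

with open("input.tsv", newline="", encoding="utf-8") as fh:
    for row in csv.DictReader(fh, delimiter="\t"):  # expects src and trg columns
        print(surface_features(row["src"], row["trg"]))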

Available learning methods

All of the available methods are taken from scikit-learn, so it is fairly easy to add others as well, but currently these are "supported":

Feature selection

To set up a feature selection algorithm, add a "feature_selection" section to the configuration file. This section is independent of the "learning" section:

feature_selection:
    method: LinearSVC
    parameters:
        cv: 10

learning:
    ...

Currently, the following feature selection algorithms are available:

  • Linear Support Vector Classification (LinearSVC). The exposed parameters are:
    • penalty (default='l2')
    • loss (default='squared_hinge')
    • dual (default=True)
    • tol (default=1e-4)
    • C (default=1.0)
    • fit_intercept (default=True)
    • intercept_scaling (default=1)
    • max_iter (default=1000)

These parameters and the method are documented at: https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html
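In plain scikit-learn, LinearSVC-driven feature selection usually goes through SelectFromModel with an L1 penalty, which zeroes out the weights of uninformative features. A minimal sketch of that pattern on toy data (how this repository wires LinearSVC and the cv parameter into its pipeline is not reproduced here):

# feature_selection_sketch.py -- minimal sketch of LinearSVC-based feature
# selection via SelectFromModel; toy classification data for illustration
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=17, random_state=0)

# An L1-penalised LinearSVC drives weights of uninformative features to zero;
# SelectFromModel keeps only the features with non-zero coefficients.
selector = SelectFromModel(LinearSVC(penalty="l1", dual=False, C=1.0, max_iter=1000))
X_reduced = selector.fit_transform(X, y)
print(X.shape, "->", X_reduced.shape)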

Inference

To run inference with a trained model on a given input:

./quality_estimation.py --inference --config config/svc.yaml --input test.tsv
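Under the hood this boils down to loading the trained model and predicting on the features extracted from the input. A rough sketch, assuming a pickled model and a pre-extracted feature matrix (both paths are assumptions):

# inference_sketch.py -- rough sketch; model and feature paths are assumptions
import pickle
import numpy as np

with open("model.pkl", "rb") as fh:   # model saved during training
    model = pickle.load(fh)

X_test = np.loadtxt("test.features")  # features extracted from test.tsv
scores = model.predict(X_test)        # one quality score per sentence
for i, s in enumerate(scores):
    print(i, round(float(s), 4))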
