# Moses IS-EN EN-IS phrase-based translation system
See `README.md` for how to run this notebook.

In this notebook the data is preprocessed and the Moses translation system is used to build two translation systems, IS-EN and EN-IS.
All data is assumed to be available under `/work/data`. See the instructions in `README.md` for how to set this up with `docker` or `singularity`.

In short, the notebook consists of the following steps:
1. Parallel and monolingual data are prepared.
1. Language models are built for EN and IS (KenLM).
1. The text is split into three parts, train/val/test; val and test contain 3000 and 2000 sentences respectively.
1. The Moses system is trained on the train part.
1. The Moses system is tuned on the val part.
1. The Moses system is evaluated with BLEU on the test part.
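
The split in step 3 can be sketched as follows. This is a minimal illustration, assuming the parallel corpus has already been shuffled and is held in memory as two equally long lists of sentences. The function name `split_parallel` and its arguments are hypothetical and are not part of `corpus.py`.

```python
from typing import List, Tuple

Pairs = List[Tuple[str, str]]


def split_parallel(en: List[str], is_: List[str],
                   n_val: int = 3000, n_test: int = 2000) -> Tuple[Pairs, Pairs, Pairs]:
    """Split an already shuffled parallel corpus into train/val/test pairs."""
    if len(en) != len(is_):
        raise ValueError("Both sides of a parallel corpus must have the same length")
    if n_val + n_test > len(en):
        raise ValueError("Not enough sentence pairs for the requested split")
    pairs = list(zip(en, is_))
    # Take test and val from the front; everything else is training data.
    test = pairs[:n_test]
    val = pairs[n_test:n_test + n_val]
    train = pairs[n_test + n_val:]
    return train, val, test
```

Keeping the two languages zipped together while splitting guarantees that line `i` of the English file still corresponds to line `i` of the Icelandic file in every part.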

All files and models are placed in the directory "WORKING_DIR" (see `README.md`).

The `corpus.py` library defines functions and data types that are used heavily here.

In [1]:
from collections import defaultdict, Counter, OrderedDict
import os
import pathlib
from pathlib import Path
import re
from pprint import pprint
import importlib
from typing import List

import matplotlib.pyplot as plt
import numpy as np

import corpus.corpus as c

importlib.reload(c)

%matplotlib notebook

working_dir = pathlib.Path('/work')
data_dir = working_dir.joinpath('data')

[nltk_data] Downloading package punkt to
[nltk_data]     /home/staff/haukurpj/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt to
[nltk_data]     /home/staff/haukurpj/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


Let's be sure that Moses is installed and the data is there.

In [2]:
print(os.getenv('MOSESDECODER'))
print(os.getenv('MOSESDECODER_TOOLS'))
print(int(os.getenv('THREADS')))
!ls {data_dir}

/opt/moses
/opt/moses_tools
40
bin  parice  risamalheild


In [3]:
# List of stages in processing
CAT = 'cat'
SHUFFLE = 'shuffle'
REGEXP = 'regexp'
SENT_FIX = 'sent_fix'
LOWER = 'lower'
TOKENIZE = 'tok'
PLACEHOLDERS = 'placeholders'
LENGTH = 'length'
LENGTH_SHORT = 'length-short'
DROP = 'drop'
LM = 'lm-blm'
LM_3 = 'lm-blm-3'
TRAIN = 'train'
TEST = 'test'
VAL = 'val'
FINAL = 'final'
TRANSLATED_EN_IS = 'translated_en_is'
TRANSLATED_IS_EN = 'translated_is_en'

parice_dir = data_dir.joinpath('parice')
rmh_dir = data_dir.joinpath('risamalheild')
train_parice_dir = parice_dir.joinpath('train')
test_parice_dir = parice_dir.joinpath('test')
val_parice_dir = parice_dir.joinpath('val')

!mkdir -p {train_parice_dir}
!mkdir -p {test_parice_dir}
!mkdir -p {val_parice_dir}

pipeline = [
    SHUFFLE,
    LOWER, 
    REGEXP, 
    TOKENIZE,
    PLACEHOLDERS,
    LENGTH,
    LENGTH_SHORT,
    LM,
    FINAL,
    DROP,
    TRANSLATED_EN_IS,
    TRANSLATED_IS_EN
]
rmh_stages = [
    SENT_FIX,
    LOWER,
    REGEXP,
    TOKENIZE,
    PLACEHOLDERS,
    LM,
    FINAL,
    CAT
]
parice_pipeline = [
    CAT,
    SENT_FIX,
    SHUFFLE
]

# If we are not starting from scratch - we try to load all intermediary stages
en_parice = c.pipeline_load(parice_dir, parice_pipeline, c.Lang.EN)
is_parice = c.pipeline_load(parice_dir, parice_pipeline, c.Lang.IS)
en_train = c.pipeline_load(train_parice_dir, pipeline, c.Lang.EN)
is_train = c.pipeline_load(train_parice_dir, pipeline, c.Lang.IS)
en_test = c.pipeline_load(test_parice_dir, pipeline, c.Lang.EN)
is_test = c.pipeline_load(test_parice_dir, pipeline, c.Lang.IS)
en_val = c.pipeline_load(val_parice_dir, pipeline, c.Lang.EN)
is_val = c.pipeline_load(val_parice_dir, pipeline, c.Lang.IS)
rmh = c.pipeline_load(rmh_dir, rmh_stages, c.Lang.IS)
pprint(en_parice)
pprint(is_parice)
pprint(rmh)
pprint(en_train)
pprint(is_train)

{'cat': PosixPath('/work/data/parice/cat.en'),
 'sent_fix': PosixPath('/work/data/parice/sent_fix.en'),
 'shuffle': PosixPath('/work/data/parice/shuffle.en')}
{'cat': PosixPath('/work/data/parice/cat.is'),
 'sent_fix': PosixPath('/work/data/parice/sent_fix.is'),
 'shuffle': PosixPath('/work/data/parice/shuffle.is')}
{'cat': PosixPath('/work/data/risamalheild/cat.is'),
 'final': PosixPath('/work/data/risamalheild/final.is'),
 'lm-blm': PosixPath('/work/data/risamalheild/lm-blm.is'),
 'lower': PosixPath('/work/data/risamalheild/lower.is'),
 'placeholders': None,
 'regexp': PosixPath('/work/data/risamalheild/regexp.is'),
 'sent_fix': PosixPath('/work/data/risamalheild/sent_fix.is'),
 'tok': PosixPath('/work/data/risamalheild/tok.is')}
{'drop': PosixPath('/work/data/parice/train/drop.en'),
 'final': PosixPath('/work/data/parice/train/final.en'),
 'length': PosixPath('/work/data/parice/train/length.en'),
 'length-short': PosixPath('/work/data/parice/train/length-short.en'),
 'lm-blm': Posix
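
The dictionaries printed above map each stage name to the file for that stage, or to `None` when no file exists yet. A loader with that behaviour could look roughly like the sketch below. It is not the implementation in `corpus.py`: the function name `load_stages` and the `<stage>.<lang>` file naming are assumptions inferred from the printed paths.

```python
from pathlib import Path
from typing import Dict, List, Optional


def load_stages(directory: Path, stages: List[str], lang: str) -> Dict[str, Optional[Path]]:
    """Map each stage to '<directory>/<stage>.<lang>' if that file exists, else None."""
    result: Dict[str, Optional[Path]] = {}
    for stage in stages:
        candidate = directory / f"{stage}.{lang}"
        result[stage] = candidate if candidate.exists() else None
    return result
```

Returning `None` for missing stages makes it easy to see which steps still need to be run, as with the `placeholders` entry for `risamalheild` in the output.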

### Shorten training sentences
Moses has difficulty aligning long sentences. We filter the training data so that only sentence pairs with at least one word and at most the maximum length defined below are kept. We have observed that using the maximum length (100) does not give good results.

Because we use a script that ships with Moses and takes two files at once, the naming convention deviates a little here.

In [27]:
def corpus_shorten(path, path_out, lang_id_1, lang_id_2, min_length, max_length):
    !{os.getenv('MOSESDECODER')}/scripts/training/clean-corpus-n.perl {path} {lang_id_1} {lang_id_2} {path_out} {min_length} {max_length}
    return True

path_out = is_train[FINAL].with_name(LENGTH_SHORT)
path = is_train[FINAL].parent.joinpath(FINAL)
corpus_shorten(path, path_out, 'en', 'is', 1, 50)

is_train[LENGTH_SHORT] = is_train[FINAL].with_name(LENGTH_SHORT).with_suffix('.is')
en_train[LENGTH_SHORT] = en_train[FINAL].with_name(LENGTH_SHORT).with_suffix('.en')

	LANGUAGE = "en_US:en",
	LC_ALL = (unset),
	LC_CTYPE = "C.UTF-8",
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
clean-corpus.perl: processing /work/data/parice/train/final.en & .is to /work/data/parice/train/length-short, cutoff 1-50, ratio 9
..........(100000)..........(200000)..........(300000)..........(400000)..........(500000)..........(600000)..........(700000)..........(800000)..........(900000)..........(1000000)..........(1100000)..........(1200000)..........(1300000)..........(1400000)..........(1500000)..........(1600000)..........(1700000)..........(1800000)..........(1900000)..........(2000000)..........(2100000)..........(2200000)..........(2300000)..........(2400000)..........(2500000)..........(2600000)..........(2700000)..........(2800000)..........(2900000)..........(3000000)..........(3100000)..........(3200000)..........(3300000).......
Input sentences: 3378149  Output sentences:  3246565
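
For readers unfamiliar with `clean-corpus-n.perl`, the core of the filter can be approximated in Python as below. This is a simplified sketch and not a port of the script: it only applies the token-count limits and the length-ratio limit (the log above reports `ratio 9`), and the name `keep_pair` is hypothetical.

```python
def keep_pair(src: str, tgt: str,
              min_len: int = 1, max_len: int = 50, max_ratio: float = 9.0) -> bool:
    """Return True if a sentence pair passes the length and ratio limits."""
    n_src = len(src.split())
    n_tgt = len(tgt.split())
    # Both sides must have between min_len and max_len whitespace-separated tokens.
    if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
        return False
    # Neither side may be more than max_ratio times longer than the other.
    return n_src <= max_ratio * n_tgt and n_tgt <= max_ratio * n_src
```

A pair is dropped as a whole, so the two output files stay aligned line by line.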


### Language model
We build a KenLM language model to give us sentence probabilities. To speed up lookups, the language model is converted to binary format at the same time.

In [28]:
def create_lm(path, out_path, order):
    tmp_arpa = c.corpus_create_path(path, 'arpa')
    !{os.getenv('MOSESDECODER')}/bin/lmplz --order {order} --temp_prefix {data_dir}/ --memory 70% < {path} > {tmp_arpa}
    !{os.getenv('MOSESDECODER')}/bin/build_binary -S 70% {tmp_arpa} {out_path}
    return True

In [8]:
is_train[LM] = c.corpus_create_path(is_train[FINAL], LM)
en_train[LM] = c.corpus_create_path(en_train[FINAL], LM)

create_lm(is_train[FINAL], is_train[LM], order=3)
create_lm(en_train[FINAL], en_train[LM], order=3)

=== 1/5 Counting and sorting n-grams ===
Reading /work/data/parice/train/final.is
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 45111359 types 557672
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:6692064 2:32821854208 3:61540978688
Statistics:
1 557672 D1=0.654933 D2=1.04977 D3+=1.39709
2 5194483 D1=0.744091 D2=1.10969 D3+=1.42352
3 13501935 D1=0.704238 D2=1.17861 D3+=1.46824
Memory estimate for binary LM:
type     MB
probing 364 assuming -p 1.5
probing 396 assuming -r models -p 1.5
trie    161 without quantization
trie     94 assuming -q 8 -b 8 quantization 
trie    151 assuming -a 22 array pointer compression
trie     85 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:6692064 2:83111728 3:270038700
----5

True

### Combine RMH and the IS side of ParIce for the language model

In [5]:
rmh[CAT] = c.corpus_create_path(rmh[FINAL], CAT)
c.corpora_combine((is_train[FINAL], rmh[FINAL]), rmh[CAT])

True
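
Combining corpora for the language model amounts to concatenating the files. The sketch below shows one way to do that; it is only an illustration, and the name `combine_files` is hypothetical rather than the actual `corpora_combine` from `corpus.py`.

```python
from pathlib import Path
from typing import Iterable


def combine_files(inputs: Iterable[Path], output: Path) -> int:
    """Concatenate text files line by line and return the number of lines written."""
    written = 0
    with output.open("w", encoding="utf-8") as out:
        for path in inputs:
            with path.open(encoding="utf-8") as handle:
                for line in handle:
                    # Make sure a missing final newline does not glue two sentences together.
                    out.write(line if line.endswith("\n") else line + "\n")
                    written += 1
    return written
```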

Build a language model of order 3.

In [29]:
# Use the LM_3 stage name so that the order-3 model gets its own file name.
rmh[LM_3] = c.corpus_create_path(rmh[CAT], LM_3)

create_lm(rmh[CAT], rmh[LM_3], order=3)

=== 1/5 Counting and sorting n-grams ===
Reading /work/data/risamalheild/cat.is
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 1459492635 types 5833046
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:69996552 2:32799834112 3:61499691008
Statistics:
1 5833046 D1=0.702775 D2=1.03479 D3+=1.32363
2 84061923 D1=0.746733 D2=1.07179 D3+=1.34939
3 332835285 D1=0.69458 D2=1.29804 D3+=1.52235
Memory estimate for binary LM:
type      MB
probing 7782 assuming -p 1.5
probing 8285 assuming -r models -p 1.5
trie    3428 without quantization
trie    2044 assuming -q 8 -b 8 quantization 
trie    3227 assuming -a 22 array pointer compression
trie    1844 assuming -a 22 -q 8 -b 8 array pointer compression and quantization
=== 3/5 Calculating and sorting initial probabilities ===
Chain sizes: 1:69996552 2:1344990768 3:6

True

Test the language model; there should not be any unknown words.

In [10]:
def eval_sentence(lm_model, sentence):
    !echo "{sentence}" | {os.getenv('MOSESDECODER')}/bin/query {lm_model}

eval_sentence(rmh[LM], "þetta er flott íslensk setning , er það ekki ?")
eval_sentence(en_train[LM], "this is a nice english sentence , right ?")

þetta=408 2 -1.7515687	er=108 3 -0.45247617	flott=6918 4 -3.1981769	íslensk=8107 2 -4.3185043	setning=37795 2 -5.0770183	,=25 2 -1.4574796	er=108 3 -2.1485984	það=260 3 -1.7368271	ekki=184 4 -0.7310319	?=97 4 -0.9805836	</s>=2 4 -0.06486414	Total: -21.917128 OOV: 0
Perplexity including OOVs:	98.28022595608707
Perplexity excluding OOVs:	98.28022595608707
OOVs:	0
Tokens:	11
Name:query	VmPeak:21003588 kB	VmRSS:4812 kB	RSSMax:20988044 kB	user:0.004002	sys:1.60908	CPU:1.61308	real:1.67366
this=195 2 -1.8074161	is=188 3 -0.68361896	a=12 3 -1.0045757	nice=1048 3 -2.8550868	english=6319 1 -4.6239047	sentence=2958 1 -5.020405	,=6 2 -1.1387969	right=170 2 -3.7610703	?=94 3 -0.14322345	</s>=2 3 -0.034358077	Total: -21.072456 OOV: 0
Perplexity including OOVs:	128.0105124034037
Perplexity excluding OOVs:	128.0105124034037
OOVs:	0
Tokens:	10
Name:query	VmPeak:310420 kB	VmRSS:4916 kB	RSSMax:294980 kB	user:0	sys:0.155038	CPU:0.155038	real:7.20459
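
The perplexity that `query` reports follows directly from the total log10 probability and the token count, where the count includes the end-of-sentence token `</s>`. The short check below recomputes both figures from the totals in the output above; the function name `perplexity` is only for illustration.

```python
def perplexity(total_log10_prob: float, n_tokens: int) -> float:
    """Perplexity from a summed log10 probability over n_tokens tokens."""
    return 10 ** (-total_log10_prob / n_tokens)


# Totals and token counts copied from the query output above.
ppl_is = perplexity(-21.917128, 11)  # Icelandic sentence
ppl_en = perplexity(-21.072456, 10)  # English sentence
print(round(ppl_is, 2), round(ppl_en, 2))
```

The two values should agree with the reported perplexities of roughly 98.28 and 128.01.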


## Moses training functions
The following functions cover training Moses and the related steps. Training takes about 12 hours.
To follow the progress of training, see the output printed when the functions are called: each one prints a `tail -f` command for its log file. The final step evaluates the Moses translations.

In [5]:
def train_moses(model_dir, corpus, lang_from, lang_to, lang_to_lm, lm_order):
    print(f'tail -f {model_dir}/training.out')
    result = !{os.getenv('MOSESDECODER')}/scripts/training/train-model.perl -root-dir {model_dir} \
        -corpus {corpus} \
        -f {lang_from} -e {lang_to} \
        -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
        -lm 0:{lm_order}:{lang_to_lm}:8 \
        -mgiza -mgiza-cpus {os.getenv('THREADS')} \
        -cores {os.getenv('THREADS')} \
        -external-bin-dir {os.getenv('MOSESDECODER_TOOLS')} &> {model_dir}/training.out
    return model_dir
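
Outside a notebook the same call can be made without the `!` shell magic. The sketch below only assembles the argument list, using exactly the flags that appear in `train_moses` above. The helper name `build_train_command` and the example paths in the usage note are hypothetical, and actually running the command (for instance with `subprocess.run`) requires a Moses installation.

```python
from pathlib import Path
from typing import List


def build_train_command(moses_dir: str, tools_dir: str, model_dir: str, corpus: str,
                        lang_from: str, lang_to: str,
                        lm_path: str, lm_order: int, threads: int) -> List[str]:
    """Assemble the train-model.perl call as a list of arguments."""
    return [
        str(Path(moses_dir) / "scripts" / "training" / "train-model.perl"),
        "-root-dir", str(model_dir),
        "-corpus", str(corpus),
        "-f", lang_from, "-e", lang_to,
        "-alignment", "grow-diag-final-and",
        "-reordering", "msd-bidirectional-fe",
        # Same LM specification as in train_moses: factor 0, order, path, type 8.
        "-lm", f"0:{lm_order}:{lm_path}:8",
        "-mgiza", "-mgiza-cpus", str(threads),
        "-cores", str(threads),
        "-external-bin-dir", str(tools_dir),
    ]
```

For example, `build_train_command('/opt/moses', '/opt/moses_tools', '/work/en-is/base', '/work/data/parice/train/length-short', 'en', 'is', '/work/lm.blm', 3, 40)` yields a list that can be passed to `subprocess.run` with the log file given as `stdout`. Passing a list instead of a shell string avoids quoting problems with paths.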

In [6]:
def tune_moses(model_dir, corpus_val_from, corpus_val_to, base_moses_ini):
    print(f'tail -f {model_dir}/tune.out')
    result = !{os.getenv('MOSESDECODER')}/scripts/training/mert-moses.pl \
        {corpus_val_from} \
        {corpus_val_to} \
        {os.getenv('MOSESDECODER')}/bin/moses {base_moses_ini} \
        --mertdir {os.getenv('MOSESDECODER')}/bin \
        --working-dir {model_dir} \
        --decoder-flags="-threads {os.getenv('THREADS')}" &> {model_dir}/tune.out
    return model_dir

In [7]:
def prepare_binarisation(tuned_moses_ini,
                         lm_path_in,
                         lm_path_out,
                         binarised_moses_ini,
                         binarised_phrase_table,
                         binarised_reordering_table):
    !cp {tuned_moses_ini} {binarised_moses_ini}
    !cp {lm_path_in} {lm_path_out}
    # Point the language model path in moses.ini at the copied LM file.
    # Slashes are escaped because '/' is the sed delimiter.
    escaped_path_in = str(lm_path_in).replace('/', r'\/')
    escaped_path_out = str(lm_path_out).replace('/', r'\/')
    !sed -i 's/{escaped_path_in}/{escaped_path_out}/' {binarised_moses_ini}
    # Switch to the compact phrase table and point moses.ini at the binarised file.
    escaped_path = str(binarised_phrase_table).replace('/', r'\/')
    !sed -i 's/PhraseDictionaryMemory/PhraseDictionaryCompact/' {binarised_moses_ini}
    !sed -i 's/4 path=.*\.gz input-factor/4 path={escaped_path} input-factor/' {binarised_moses_ini}
    # Point moses.ini at the binarised reordering table.
    escaped_path = str(binarised_reordering_table).replace('/', r'\/')
    !sed -i 's/0 path=.*\.gz$/0 path={escaped_path}/' {binarised_moses_ini}
    
def binarise_phrase_table(base_phrase_table, binarised_phrase_table):
    #Create the table
    !{os.getenv('MOSESDECODER')}/bin/processPhraseTableMin \
        -in {base_phrase_table} \
        -nscores 4 \
        -out {binarised_phrase_table}
    
def binarise_reordering_table(base_reordering_table, binarised_reordering_table):
    #Create the table
    !{os.getenv('MOSESDECODER')}/bin/processLexicalTableMin \
        -in {base_reordering_table} \
        -out {binarised_reordering_table}
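
The `sed` substitutions in `prepare_binarisation` can also be expressed with Python's `re` module, which removes the need to escape slashes in the paths. The sketch below mirrors the two phrase-table substitutions on the text of a `moses.ini` file. The function name `point_to_compact_table` and the sample line in the usage note are made up for illustration.

```python
import re


def point_to_compact_table(ini_text: str, table_path: str) -> str:
    """Switch the phrase table type and path, mirroring the two sed calls."""
    text = ini_text.replace("PhraseDictionaryMemory", "PhraseDictionaryCompact")
    # A function as replacement keeps backslashes in table_path from being interpreted.
    return re.sub(r"4 path=.*\.gz input-factor",
                  lambda match: f"4 path={table_path} input-factor",
                  text)
```

Applied to a line such as `PhraseDictionaryMemory name=TranslationModel0 num-features=4 path=/work/base/model/phrase-table.gz input-factor=0 output-factor=0`, this changes the feature type to `PhraseDictionaryCompact` and replaces the `.gz` path with `table_path`. One difference from the `sed` calls is that `str.replace` and `re.sub` replace every match rather than only the first one per line.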

In [8]:
# It only makes sense to filter the model when you know what text the system needs to translate.
def filter_model(out_dir, moses_ini, corpus):
    !{os.getenv('MOSESDECODER')}/scripts/training/filter-model-given-input.pl {out_dir} {moses_ini} {corpus}


In [9]:
def translate_corpus(moses_ini, corpus, corpus_translated):
    !{os.getenv('MOSESDECODER')}/bin/moses \
        -f {moses_ini} < {corpus} > {corpus_translated}
    
def eval_translation(corpus_gold, corpus_translated):
    result = !{os.getenv('MOSESDECODER')}/scripts/generic/multi-bleu.perl -lc {corpus_gold} < {corpus_translated}
    return result 
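
To make the metric behind `eval_translation` more concrete, here is a simplified corpus-level BLEU in Python. It is a sketch under stated assumptions (one reference per sentence, whitespace tokenisation, lowercasing as with the `-lc` flag) and is not a replacement for `multi-bleu.perl`; the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: List[str], hypotheses: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU (0-100) with a single reference per sentence."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tok = ref.lower().split()
        hyp_tok = hyp.lower().split()
        ref_len += len(ref_tok)
        hyp_len += len(hyp_tok)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(ref_tok, n)
            hyp_counts = ngram_counts(hyp_tok, n)
            totals[n - 1] += sum(hyp_counts.values())
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(count, ref_counts[gram])
                                  for gram, count in hyp_counts.items())
    if min(matches) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalise output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Counts are accumulated over the whole test set before the precisions are computed, which is why BLEU is reported per corpus rather than averaged over sentences.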

### Start training

In [10]:
def train_tune_eval(LM,
                    LM_ORDER,
                    FROM,
                    TO,
                    MODIFIER,
                    TRAIN_IN,
                    VAL_IN,
                    VAL_OUT,
                    TEST_IN,
                    TEST_OUT):
    model_dir = working_dir.joinpath(f'{FROM}-{TO}-{MODIFIER}')
    base_model_dir = model_dir.joinpath('base')
    tuned_model_dir = model_dir.joinpath('tuned')
    binarised_model_dir = model_dir.joinpath('binarised')
    !mkdir -p {base_model_dir}
    !mkdir -p {tuned_model_dir}
    !mkdir -p {binarised_model_dir}

    base_moses_ini = base_model_dir.joinpath('model/moses.ini')
    base_phrase_table = base_model_dir.joinpath('model/phrase-table.gz')
    base_reordering_table = base_model_dir.joinpath('model/reordering-table.wbe-msd-bidirectional-fe.gz')

    tuned_moses_ini = tuned_model_dir.joinpath('moses.ini')

    binarised_moses_ini = binarised_model_dir.joinpath('moses.ini')
    binarised_phrase_table = binarised_model_dir.joinpath('phrase-table')
    binarised_reordering_table = binarised_model_dir.joinpath('reordering-table')

    # train
    train_moses(base_model_dir, TRAIN_IN, FROM, TO, LM, lm_order=LM_ORDER)

    # tune
    tune_moses(tuned_model_dir, VAL_IN, VAL_OUT, base_moses_ini)

    # binarise
    !mkdir -p {binarised_model_dir}

    lm_out = binarised_model_dir.joinpath('lm.blm')

    prepare_binarisation(tuned_moses_ini, 
                         LM,
                         lm_out, 
                         binarised_moses_ini, 
                         binarised_phrase_table, 
                         binarised_reordering_table)
    binarise_phrase_table(base_phrase_table, binarised_phrase_table)
    binarise_reordering_table(base_reordering_table, binarised_reordering_table)

    # translate
    translated = binarised_model_dir.joinpath(f'translated.{FROM}')

    translate_corpus(binarised_moses_ini, TEST_IN, translated)
    
    

With sentences of 50 words or fewer

In [None]:
train_tune_eval(LM = rmh[LM],
                LM_ORDER = 4,
                FROM = 'en',
                TO = 'is',
                MODIFIER = 'rmh-med',
                TRAIN_IN = is_train[FINAL].parent.joinpath(LENGTH_SHORT),
                VAL_IN = en_val[FINAL],
                VAL_OUT = is_val[FINAL],
                TEST_IN = en_test[FINAL],
                TEST_OUT = is_test[FINAL])

tail -f /work/en-is-rmh-med/base/training.out


In [None]:
train_tune_eval(LM = en_train[LM],
                LM_ORDER = 3,
                FROM = 'is',
                TO = 'en',
                MODIFIER = 'rmh-med',
                TRAIN_IN = is_train[FINAL].parent.joinpath(LENGTH_SHORT),
                VAL_IN = is_val[FINAL],
                VAL_OUT = en_val[FINAL],
                TEST_IN = is_test[FINAL],
                TEST_OUT = en_test[FINAL])

The same, but with LM order 3

In [None]:
train_tune_eval(LM = rmh[LM_3],
                LM_ORDER = 3,
                FROM = 'en',
                TO = 'is',
                MODIFIER = 'rmh-3-med',
                TRAIN_IN = is_train[FINAL].parent.joinpath(LENGTH_SHORT),
                VAL_IN = en_val[FINAL],
                VAL_OUT = is_val[FINAL],
                TEST_IN = en_test[FINAL],
                TEST_OUT = is_test[FINAL])

In [25]:
TEST_OUT = en_test[FINAL]
FROM = 'is'
TO = 'en'
MODIFIER = 'short'
model_dir = working_dir.joinpath(f'{FROM}-{TO}')
binarised_model_dir = model_dir.joinpath('binarised')
translated = binarised_model_dir.joinpath(f'translated.{FROM}')
print(eval_translation(TEST_OUT, translated))
print(*c.corpora_peek((TEST_OUT, translated)))

en: • 6 km for category 2 motorcycle ( engine capacity ≥ 150 cc , vmax @lt@ 130 km/h ) ,
 is: • 6 km for category 2 motorcycle ( engine capacity ≥ 150 cc , vmax @lt@ 130 km/h ) , 
 en: e. common learning article 5( 1 ) ( e )
 is: e. common learning article 5( 1 ) ( e ) . 
 en: measurement of exhaust gas opacity with free acceleration ( no load from idling up to cut-off speed ) .
 is: measurement of exhaust gas opacity with free acceleration ( no load from idle up to cut-off speed , in a no-load state ) . 
 en: other trailers and semi-trailers
 is: other trailers and semi-trailers 
 en: i have food poisoning .
 is: i ' ve got food poisoning . 
 en: pearls as big as coconuts .
 is: pearls the size of coconuts . 
 en: this objective shall be measured , in particular , through the increase in the number of member states integrating coherent approaches in the design of their preparedness plans .
 is: this objective shall be measured , in particular , through the increase in the number of me

### Demo
Translate some text.

In [12]:
def translate_en_is(moses_ini, sentence):
    sentence = c.sent_process_v1(sentence, c.Lang.EN)
    !echo "{sentence}" | {os.getenv('MOSESDECODER')}/bin/moses -f {moses_ini}

In [14]:
sentence = "This is a proper English sentence, and we can have learnt a better phrase model"
print(translate_en_is(binarised_model_dir.joinpath('moses.ini'), sentence))

Defined parameters (per moses.ini or switch):
	config: /work/en-is-rmh/binarised/moses.ini 
	distortion-limit: 6 
	feature: UnknownWordPenalty WordPenalty PhrasePenalty PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=/work/en-is-rmh/binarised/phrase-table input-factor=0 output-factor=0 LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/work/en-is-rmh/binarised/reordering-table Distortion KENLM name=LM0 factor=0 path=/work/en-is-rmh/binarised/lm.blm order=4 
	input-factors: 0 
	mapping: 0 T 0 
	threads: 40 
	weight: LexicalReordering0= 0.0384378 0.0859153 0.0405209 0.0884273 0.0272562 0.101296 Distortion0= 0.0313493 LM0= 0.0992259 WordPenalty0= -0.165075 PhrasePenalty0= 0.0804042 TranslationModel0= 0.0395704 0.0341685 0.165878 0.00247474 UnknownWordPenalty0= 1 
line=UnknownWordPenalty
FeatureFunction: UnknownWordPenalty0 start: 0 end: 0
line=WordPenalty
FeatureFunction: WordPenalty0 start: 

In [37]:
def translate_is_en(moses_ini, sentence):
    sentence = c.sent_process_v1(sentence, c.Lang.IS)
    !echo "{sentence}" | {os.getenv('MOSESDECODER')}/bin/moses -f {moses_ini}

In [38]:
sentence = "Ég man ekki eftir neinum góðum myndum nýlega "
print(translate_is_en(working_dir.joinpath('is-en/binarised').joinpath('moses.ini'), sentence))

Defined parameters (per moses.ini or switch):
	config: /work/is-en/binarised/moses.ini 
	distortion-limit: 6 
	feature: UnknownWordPenalty WordPenalty PhrasePenalty PhraseDictionaryCompact name=TranslationModel0 num-features=4 path=/work/is-en/binarised/phrase-table input-factor=0 output-factor=0 LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=/work/is-en/binarised/reordering-table Distortion KENLM name=LM0 factor=0 path=/work/is-en/binarised/lm-en.blm order=3 
	input-factors: 0 
	mapping: 0 T 0 
	threads: 14 
	weight: LexicalReordering0= 0.114192 0.0158818 0.0202684 0.083186 0.0208785 0.197803 Distortion0= 0.0160226 LM0= 0.0632488 WordPenalty0= -0.204654 PhrasePenalty0= -0.0417258 TranslationModel0= 0.0177732 0.00823355 0.188931 0.00720186 UnknownWordPenalty0= 1 
line=UnknownWordPenalty
FeatureFunction: UnknownWordPenalty0 start: 0 end: 0
line=WordPenalty
FeatureFunction: WordPenalty0 start: 1 end: 1
line