Conversation
# Conflicts:
#	onmt/Factory.lua
#	onmt/data/Preprocessor.lua
#	onmt/translate/DecoderAdvancer.lua
#	onmt/translate/Translator.lua
#	preprocess.lua
#	translate.lua
We should also add a documentation page on this.
onmt/Factory.lua
Outdated
opt.pre_word_vecs_enc, opt.fix_word_vecs_enc == 1,
verbose)
else
inputNetwork = nn.Identity()
I think it should be in another function, like buildInputEncoder.
@@ -110,6 +110,23 @@ function PDBiEncoder:maskPadding()
self.layers[1]:maskPadding()
end

-- size of context vector
function PDBiEncoder:contextSize(sourceSize, sourceLength)
Why do we need that? Past the first layer, padding no longer applies.
contextSize is used in the decoder to get the size of the context from outside the encoder, since that size varies for PDBiEncoder.
preprocess.lua
Outdated
if dataType == 'monotext' then
src_file = opt.train
end
data.dicts.src = Vocabulary.init('train',
Can we keep 'source' instead? See #162.
@@ -74,14 +74,24 @@ function Batch:__init(src, srcFeatures, tgt, tgtFeatures)

self.sourceLength, self.sourceSize, self.uneven = getLength(src)

-- if input vectors (speech for instance)
self.inputVectors = src[1]:dim() > 1
This fails with the default constructor, e.g. local batch = Batch(), because src is nil there.
Good catch! It has been fixed.
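For reference, one way to guard that line is to check src before indexing it. This is only a sketch of what such a guard could look like, not the fix that was actually committed: hasInputVectors is a hypothetical helper, and stubTensor is a stand-in for a Torch tensor so the snippet runs in plain Lua.

```lua
-- Hypothetical helper (not part of the PR): returns true when the source
-- entries are vectors (more than one dimension), and false when src is
-- nil or empty, which is the case for the default constructor Batch().
local function hasInputVectors(src)
  if src == nil or src[1] == nil then
    return false
  end
  return src[1]:dim() > 1
end

-- Stand-in for a Torch tensor; only the dim() method is needed here.
local function stubTensor(ndim)
  return { dim = function(self) return ndim end }
end

print(hasInputVectors(nil))                -- default constructor case
print(hasInputVectors({ stubTensor(1) }))  -- sequence of word indices
print(hasInputVectors({ stubTensor(2) }))  -- sequence of input vectors
```

With a guard of this kind, the assignment to self.inputVectors no longer dereferences a nil src.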
Added 2 new features to support input vectors:
- -idx_files: indicates that the source/target files are indexed, meaning that each line has the format "key value" and that entries are not necessarily in the same order in both files
- feattext: uses the Kaldi text ark dump format

Typical preprocessing command:
For training:
For decoding:
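To illustrate the "key value" layout that -idx_files refers to, here is a minimal plain-Lua sketch of pairing source and target entries by key rather than by line position. The function name parseIndexedLines and the sample keys (utt1, utt2) are assumptions made for illustration; this is not the parsing code from the PR.

```lua
-- Hypothetical helper: parse lines of the form "key value..." into a
-- table mapping each key to the rest of its line.
local function parseIndexedLines(lines)
  local entries = {}
  for _, line in ipairs(lines) do
    -- first whitespace-delimited token is the key, the remainder is the value
    local key, value = line:match('^(%S+)%s+(.*)$')
    if key then
      entries[key] = value
    end
  end
  return entries
end

-- Source and target list the same keys in a different order.
local src = parseIndexedLines({ 'utt2 c d', 'utt1 a b' })
local tgt = parseIndexedLines({ 'utt1 x y', 'utt2 z' })

-- Pair entries by key, not by line number.
for key, sourceText in pairs(src) do
  print(key, sourceText, tgt[key])
end
```

Keying on the first token is what allows the two files to be in different orders, since alignment no longer depends on line numbers.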