This repository was archived by the owner on Jul 18, 2024. It is now read-only.

Conversation

@tkornuta-ibm
Contributor

  • Fixes issues with the description of the hidden stream in the decoder with attention (it actually required a different dimension order than the one defined in output_data_definitions)
  • Standardized dimensions of the input/output hidden streams in both RNN models: batch-major!

That, along with fixed sizes of embeddings (padding), enabled us to properly parallelize both the model and the C4 pipeline

…RNN, AttDecGRU): batch first, c4 working in DataParallel
…nerate 19 symbols, which makes it really slow
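The batch-major convention matters because `nn.DataParallel` scatters and gathers tensors along dimension 0, while `nn.GRU` keeps its hidden state layer-major (`[layers, batch, hidden]`). A minimal sketch of the idea (not the PR's actual code; the module name and sizes are illustrative) is to transpose the hidden state at the module boundary so every tensor crossing it is batch-first:

```python
# Hedged sketch: batch-major hidden states for DataParallel compatibility.
# nn.DataParallel splits/joins along dim 0, so inputs AND outputs of the
# wrapped module must be batch-first; nn.GRU's hidden state is
# [layers, batch, hidden], so we transpose it on the way in and out.
import torch
import torch.nn as nn

class BatchFirstGRU(nn.Module):
    def __init__(self, input_size=8, hidden_size=16, num_layers=1):
        super().__init__()
        self.rnn = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, x, h0):
        # x:  [batch, seq, input]    -- already batch-major
        # h0: [batch, layers, hidden] -- batch-major at the boundary;
        # transpose to the [layers, batch, hidden] layout nn.GRU expects.
        out, hn = self.rnn(x, h0.transpose(0, 1).contiguous())
        # Return the hidden state batch-major again so DataParallel can
        # gather it along dim 0.
        return out, hn.transpose(0, 1)

model = BatchFirstGRU()
x = torch.randn(4, 5, 8)       # batch=4, seq=5, features=8
h0 = torch.zeros(4, 1, 16)     # batch-major initial hidden state
out, hn = model(x, h0)
print(out.shape, hn.shape)     # [4, 5, 16] and [4, 1, 16], both batch-first
```

With this layout, `nn.DataParallel(model)` can split both `x` and `h0` per GPU and concatenate the per-GPU hidden states back without any reshuffling.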
@tkornuta-ibm tkornuta-ibm requested a review from aasseman May 3, 2019 22:34
@tkornuta-ibm
Contributor Author

Early results:

  • 7 GPUs enabled me to use a batch size of 300
  • on that setting, one epoch (i.e. processing 4 folds of the joined training and validation sets, ~3000 samples) takes around 30 seconds

[Screenshot: Screen Shot 2019-05-03 at 4 31 06 PM]

@tkornuta-ibm tkornuta-ibm merged commit 484b2f6 into develop May 4, 2019
@tkornuta-ibm tkornuta-ibm deleted the parallel_att_decoder branch May 4, 2019 22:21