attention is not required when only using teacher forcing in decoder #90

lifengjin · 2017-10-23T17:45:56Z

See #89. This fixes the following bug: when teacher forcing is used without attention in the decoder, an error is thrown saying that attention is not subscriptable, because it is `None`.
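For readers who want to see the failure in isolation, here is a minimal sketch of the error described above. The helper `slice_step_attn` is hypothetical and only mirrors the `attn[:, di, :]` indexing from the diff; it is not the actual `DecoderRNN` code, and setting `attn = None` is an assumption about what the decoder produces when attention is disabled.

```python
def slice_step_attn(attn, di):
    """Hypothetical helper: take the attention slice for decoding step `di`."""
    return attn[:, di, :]


# Assumption: with attention turned off, no attention weights exist at all.
attn = None

try:
    slice_step_attn(attn, 0)
except TypeError as err:
    # Indexing None fails before any decoding logic runs.
    error_message = str(err)
    print(error_message)
```

Any indexing of `None` raises a `TypeError`, so the slice has to be guarded whenever attention is optional.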

kylegao91 · 2017-10-24T08:25:46Z

seq2seq/models/DecoderRNN.py

@@ -166,7 +166,10 @@ def decode(step, step_output, step_attn):

            for di in range(decoder_output.size(1)):
                step_output = decoder_output[:, di, :]
-                step_attn = attn[:, di, :]
+                if attn:


Please use `if attn is not None`. When `attn` is a tensor, `if attn` crashes because the boolean value of a tensor is ambiguous.
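The difference between the two checks can be sketched without PyTorch. `FakeTensor` below is a hypothetical stand-in whose `__bool__` raises, roughly the way a multi-element tensor does (the exact exception text in PyTorch depends on the version). `step_attention` is likewise an illustrative name, not a function from this repository.

```python
class FakeTensor:
    """Hypothetical stand-in for a multi-element tensor (not a torch class)."""

    def __init__(self, rows):
        self.rows = rows  # nested lists indexed as [batch][step][source position]

    def __bool__(self):
        # Mimics the ambiguity error raised by truth-testing a tensor.
        raise RuntimeError("Boolean value of a multi-element tensor is ambiguous")

    def step(self, di):
        """Rough list-based equivalent of attn[:, di, :]."""
        return [batch[di] for batch in self.rows]


def step_attention(attn, di):
    """Return the attention for step `di`, or None when attention is disabled."""
    if attn is not None:  # identity check, never calls __bool__
        return attn.step(di)
    return None


attn = FakeTensor([[[0.9, 0.1], [0.2, 0.8]]])  # batch=1, steps=2, src_len=2

# `if attn:` calls __bool__ and fails for the tensor-like object.
try:
    if attn:
        pass
    truthiness_ok = True
except RuntimeError:
    truthiness_ok = False

with_attn = step_attention(attn, 1)      # attention enabled
without_attn = step_attention(None, 1)   # attention disabled
```

The identity check works in both cases because `is not None` compares object identity and never asks the object for a truth value.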

* Modified parameter order of DecoderRNN.forward (#85)
* Updated TopKDecoder (#86)
* Fixed topk decoder.
* Use torchtext from pipy (#87)
* Use torchtext from pipe.
* Fixed torch text sorting order.
* attention is not required when only using teacher forcing in decoder (#90)
* Updated docs and version.
* Fixed code style.
* bugfix (#92): Fixed field arguments validation.
* Removed `initial_lr` when resuming optimizer with scheduler. (#95)
* shuffle the training data (#97)
* 0.1.5 (#91)
* fix example of inflate function in TopKDecoer.py (#98)
* Fix hidden_layer size for one-directional decoder (#99): Hidden layer size of the decoder was given `hidden_size * 2 if bidirectional else 1`, resulting in a dimensionality error for non-bidirectional decoders. Changed `1` to `hidden_size`.
* Adapt load to allow CPU loading of GPU models (#100): Add storage parameter to torch.load to allow loading models on a CPU that are trained on the GPU, depending on availability of cuda.
* Fix wrong parameter use on DecoderRNN (#103)
* Upgrade to pytorch-0.3.0 (#111)
* Use pytorch 3.0 in travis env.
* Make sure tensor contiguous when attention's not used. (#112)
* Implementing the predict_n method. Using the beam search outputs it returns several seqs for a given seq (#116)
* Adding a predictor method to return n predicted seqs for a src_seq input (intended to be used along to Beam Search using TopKDecoder)
* Checkpoint after batches not epochs (#119)
* Pytorch 0.4 (#134)
* add contiguous call to tensor (#127): when attention is turned off, pytorch (well, 0.4 at least) gets angry about calling view on a non-contiguous tensor
* Fixed shape documentation (#131)
* Update to pytorch-0.4
* Remove pytorch manual install in travis.
* Allow using pre-trained embedding (#135)
* updated docs

attention is not required when only using teacher forcing in decoder

a61f40e

kylegao91 suggested changes Oct 24, 2017

change condition to is None

0cef95d

kylegao91 approved these changes Oct 24, 2017

kylegao91 changed the base branch from master to develop October 24, 2017 13:26

kylegao91 added 2 commits October 24, 2017 09:26

Merge branch 'develop' into master

5723bc3

Merge branch 'develop' into master

39b1f92

kylegao91 merged commit 3f201b8 into IBM:develop Oct 24, 2017
