Conversation
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #976      +/-   ##
==========================================
+ Coverage   81.49%   90.01%   +8.52%
==========================================
  Files          67       67
  Lines        6392     6368      -24
==========================================
+ Hits         5209     5732     +523
+ Misses       1183      636     -547
```
```python
@pytest.mark.parametrize('use_residual', [False, True])
@pytest.mark.parametrize('batch_size', [4])
@pytest.mark.parametrize('src_tgt_seq_len', [(5, 10), (10, 5)])
def test_transformer_encoder_decoder(output_attention, use_residual, batch_size, src_tgt_seq_len):
```
No changes to this test; only refactoring to use `pytest.mark.parametrize`.
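For reference, stacked `pytest.mark.parametrize` decorators like the ones above expand into the cross-product of all listed values. A minimal stand-in (the value lists are copied from the test; the expansion via `itertools.product` mirrors what pytest does internally):

```python
import itertools

# The three parametrize decorators above supply these value lists.
use_residual_vals = [False, True]
batch_size_vals = [4]
src_tgt_seq_len_vals = [(5, 10), (10, 5)]

# pytest generates one test case per element of the cross-product:
# 2 * 1 * 2 = 4 cases in total.
cases = list(itertools.product(use_residual_vals, batch_size_vals,
                               src_tgt_seq_len_vals))
```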
Force-pushed the branch from 2b3c537 to 8a9be69.
This is currently blocked; it will be unblocked once apache/mxnet#16582 is merged.
Force-pushed the branch from 4520484 to d7f8e90.
Job PR-976/7 is complete.
Job PR-976/8 is complete.
Job PR-976/9 is complete.
@szhengac @sxjscience ping for review on the API.
In the current Gluon API, each HybridBlock has to serve one purpose and can only define a single callable interface. The previous Seq2SeqDecoder interface required each Seq2SeqDecoder Block to perform two functions (multi-step-ahead and single-step-ahead decoding), which means neither of the two can in practice be fully hybridized. We therefore use two separate Blocks for the two functions; they may share parameters. The NMTModel API is updated accordingly. TransformerDecoder is further refactored to make it completely hybridizable. TransformerOneStepDecoder still relies on a small hack but can be hybridized completely once we enable numpy shape semantics.
Force-pushed the branch from 87c1566 to 46f5b64.
Job PR-976/10 is complete.
Job PR-976/11 is complete.
```python
mem_value, mem_mask = states
if mem_mask is not None:
    mem_mask = F.expand_dims(mem_mask, axis=1)
    mem_mask = F.broadcast_like(mem_mask, inputs, lhs_axes=(1, ), rhs_axes=(1, ))
```
As in Xingjian's slides, we can extract the mask-generation code into a function.
Let's do it in a follow-up PR, as it will extend the attention_cell API.
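To illustrate the suggested extraction, here is a hedged sketch of such a helper. The function name `expand_mem_mask` is hypothetical, and NumPy's `expand_dims`/`broadcast_to` stand in for the `F.expand_dims`/`F.broadcast_like` calls in the reviewed code:

```python
import numpy as np

def expand_mem_mask(mem_mask, query_len):
    """Hypothetical helper: broadcast a (batch, mem_len) attention mask to
    (batch, query_len, mem_len), so every query position sees the same
    memory mask. NumPy stand-in for F.expand_dims + F.broadcast_like."""
    mem_mask = np.expand_dims(mem_mask, axis=1)  # (batch, 1, mem_len)
    return np.broadcast_to(
        mem_mask, (mem_mask.shape[0], query_len, mem_mask.shape[2]))
```

With the broadcast logic in one place, both the full decoder and the one-step decoder can reuse it instead of duplicating the two-line expand/broadcast pattern.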
I suggest making the change inside a branch for EMNLP. Changing the API just for a tutorial is not the appropriate working style.
After offline discussion with the other members, I have decided to approve it now. However, these are experimental APIs and we should discuss them further.
Description
In the current Gluon API, each HybridBlock has to serve one purpose and can only
define a single callable interface. The previous Seq2SeqDecoder interface required
each Seq2SeqDecoder Block to perform two functions (multi-step-ahead and
single-step-ahead decoding), which means neither of the two can in practice be
fully hybridized. We therefore use two separate Blocks for the two functions;
they may share parameters.
Update the NMTModel API accordingly.
Further refactor TransformerDecoder to make it completely hybridizable.
TransformerOneStepDecoder still relies on a small hack but can be hybridized
completely when we enable numpy shape semantics.
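The split described above can be sketched in plain Python (a stand-in for Gluon's parameter-sharing mechanism; the class names `MultiStepDecoder`/`OneStepDecoder` and the dict-based store are hypothetical, not the PR's actual code):

```python
# Two separate decoder callables that read from one shared parameter store,
# so an update to the store is visible to both, mirroring how the multi-step
# and one-step decoder Blocks in the PR can share parameters.
class SharedParams:
    def __init__(self):
        self.weights = {'proj': 1.0}

class MultiStepDecoder:
    """Decodes a whole target sequence in one call."""
    def __init__(self, params):
        self.params = params
    def __call__(self, seq):
        return [x * self.params.weights['proj'] for x in seq]

class OneStepDecoder:
    """Decodes a single step, e.g. inside a beam-search loop."""
    def __init__(self, params):
        self.params = params
    def __call__(self, x):
        return x * self.params.weights['proj']

shared = SharedParams()
decoder, one_step_decoder = MultiStepDecoder(shared), OneStepDecoder(shared)
shared.weights['proj'] = 2.0  # training updates the shared store once
```

Because each class now has exactly one callable interface, each can be hybridized on its own, which is the constraint the description attributes to HybridBlock.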
Checklist
Essentials
Changes
Comments
cc @dmlc/gluon-nlp-team