Not NAR mode in training #3

Open
youngsheen opened this issue Mar 30, 2023 · 1 comment
Comments


youngsheen commented Mar 30, 2023

The training code still uses the causal attention mask. You need to set `full_context_alignment=True` in `decoder.forward` to turn on the non-causal attention mask. Is this a mistake?
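For reference, here is a minimal sketch of what that flag controls in a fairseq-style decoder (illustrative only, not this repo's code; the helper name and `tgt_len` are just for the example):

```python
import torch

def decoder_self_attn_mask(tgt_len: int, full_context_alignment: bool = False):
    """Sketch of the self-attention mask a fairseq-style decoder builds.

    full_context_alignment=False -> causal (upper-triangular) mask, AR behaviour.
    full_context_alignment=True  -> no mask; every target position attends to
                                    the whole target sequence (NAR behaviour).
    """
    if full_context_alignment:
        return None  # non-causal: full-context attention
    # causal: position i may only attend to positions <= i (additive -inf mask)
    return torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)

print(decoder_self_attn_mask(4))                               # causal mask
print(decoder_self_attn_mask(4, full_context_alignment=True))  # None (NAR)
```

So during training, calling the decoder with `full_context_alignment=True` is what removes the causal constraint; leaving it at the default keeps the causal mask on.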


steventan0110 commented Feb 15, 2024

Causal Mask used in reported results?

Same question here. I'm benchmarking model performance with and without the causal mask and found a small difference of about ~1 BLEU in the final ASR-BLEU evaluation.
Just want to confirm with the author @Rongjiehuang: was such a causal mask used for the results reported in the paper?

P.S. @youngsheen, I saw that you are the first author of DiffS2UT and I'm wondering whether the code for that paper has been released anywhere? I'm very interested in your approach as well.
