
How to set padding_idx in conformer_ctc? Is setting padding_idx = -1 wrong? #5691

Open
theshypig opened this issue Mar 1, 2024 · 5 comments
Labels: Bug, Question

Comments

@theshypig

Hi, ESPnet team!

I have a question regarding the default settings in ESPnet for ASR tasks, specifically related to the Conformer model when the input_layer is set to "embed".

Firstly, I noticed that the default generated token_list does not include a <pad> token. When using the Conformer model with an "embed" input_layer, a padding_idx needs to be passed to torch.nn.Embedding. By default, ESPnet sets padding_idx to -1. My understanding is that this setting is intended to avoid the <blank> token, as token_list[0] is <blank>.
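If I read the recipe behavior correctly, the embed input layer ends up doing something like the simplified sketch below; vocab_size and attention_dim are placeholder numbers, and the call is a paraphrase rather than the exact ESPnet source:

import torch

vocab_size = 5000       # stands in for len(token_list)
attention_dim = 256     # encoder dimension, placeholder value
padding_idx = -1        # the ESPnet default in question

# rough sketch of what input_layer="embed" builds for the encoder input
embed = torch.nn.Embedding(vocab_size, attention_dim, padding_idx=padding_idx)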

However, torch.nn.Embedding restricts padding_idx to be within the range of 0 to num_embeddings - 1. When padding_idx is less than 0, it is added to num_embeddings and the result is re-assigned as padding_idx. This means that even though we intend to set padding_idx as -1, it actually becomes num_embeddings - 1.

I am concerned that this might not be the correct behavior. Could you please clarify this for me? I look forward to your response.

Best regards,
ck

theshypig added the Question label Mar 1, 2024
@sw005320
Contributor

sw005320 commented Mar 1, 2024

Thanks a lot for your report.
If it behaves as you explained, this sounds like a critical issue to me...
Can you confirm that padding_idx actually becomes num_embeddings - 1?
Is it possible to show us with a debugger or by printing it?

@theshypig
Author

Dear @sw005320,

I apologize for the delay in my response. I can confirm that padding_idx does indeed become num_embeddings - 1.

I have checked two versions of PyTorch, 1.13.1 and 2.1.0. In both, the torch.nn.Embedding module contains the following code:

if padding_idx is not None:
    if padding_idx > 0:
        assert padding_idx < self.num_embeddings, 'Padding_idx must be within num_embeddings'
    elif padding_idx < 0:
        assert padding_idx >= -self.num_embeddings, 'Padding_idx must be within num_embeddings'
        padding_idx = self.num_embeddings + padding_idx

This snippet is directly responsible for the behavior I described. I hope this information helps in resolving the issue, and I look forward to your feedback.
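To show it concretely with a print, as you suggested, here is a minimal standalone check (toy sizes, not taken from any ESPnet config):

import torch

emb = torch.nn.Embedding(num_embeddings=5000, embedding_dim=256, padding_idx=-1)
# torch has already re-assigned the negative index at construction time
print(emb.padding_idx)  # prints 4999, i.e. num_embeddings - 1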

Best regards,
ck

sw005320 added the Bug label Mar 2, 2024
@sw005320
Contributor

sw005320 commented Mar 2, 2024

Thanks a lot!
We also confirmed it on our side.

Several discussion items:

  • It should be fine for normal ASR usage with continuous-valued input, since the embed input layer is not used there.
  • It would be an issue for discrete-token ASR/MT.
  • The last token is usually <sos/eos>, so padding_idx = -1 ends up pointing at it; that may be acceptable if <sos/eos> and padding share the same embedding parameter (see the sketch below).
  • If the encoder and decoder share the embedding, and we apply convolutions to the encoder input, this might affect the result.
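A toy sketch of the third point, assuming the token list layout described in this issue (<blank> at index 0, <unk> at index 1, <sos/eos> last):

token_list = ["<blank>", "<unk>", "a", "b", "c", "<sos/eos>"]  # toy vocabulary
padding_idx = len(token_list) - 1  # what torch turns padding_idx = -1 into
print(token_list[padding_idx])     # prints "<sos/eos>", so padding shares its embedding row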

We may need to fix it in the future, but we also expect that it will not cause a big issue in practice.
I added @simpleoier and @pyf98, who pointed out the discussion items above.

@theshypig
Author

Thank you very much for your attention. Indeed, as you discussed, letting padding fall on <sos/eos> instead of a dedicated <pad> generally does not pose a significant problem, and I will keep an eye on your fix.

Currently, I am trying to reproduce a paper. The main idea is to feed the intermediate CTC output back into the encoder: the intermediate text result is passed through an Embedding layer and gate units and then injected into the next encoder layer. In that setting, padding_idx can have a real impact on the results, so I want to eliminate the problems it causes as much as possible. I am trying to adapt the token_list generation stage in asr.sh as follows:

python -m espnet2.bin.tokenize_text \
    --add_symbol "${blank}:0" \
    --add_symbol "${oov}:1" \
    --add_symbol "${sos_eos}:-2" \
    --add_symbol "${pad}:-1"
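If I understand the indexing correctly, the generated token list would then look roughly like the toy example below (with the usual <blank>/<unk> symbols standing in for ${blank}/${oov}), and padding_idx = -1 would resolve to the actual <pad> row:

token_list = ["<blank>", "<unk>", "a", "b", "c", "<sos/eos>", "<pad>"]  # toy vocabulary
# torch maps padding_idx = -1 to len(token_list) - 1, which is now really <pad>
print(token_list[-1])  # prints "<pad>"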

With this token list I keep padding_idx = -1 in the subsequent settings; of course, there may be other places that need to be modified. What do you think about this approach, and what is your planned fix? I can try it myself.

@sw005320
Contributor

sw005320 commented Mar 3, 2024

The main idea is to feed the intermediate CTC output back into the encoder

I see.
This is an interesting problem, and yes, this case would cause an issue.
I think your treatment sounds good enough to provide the model with distinct padding information.
<sos/eos> is sometimes hard-coded in the source code, which makes this tricky, especially for attention-based encoder-decoder models, but as long as you only use CTC it should not be very complicated.
