
How to set padding_idx in conformer_ctc? Is setting padding_idx = -1 wrong? #5691

Open
theshypig opened this issue Mar 1, 2024 · 5 comments
Labels: Bug, Question

Comments

@theshypig

Hi, ESPnet team!

I have a question regarding the default settings in ESPnet for ASR tasks, specifically related to the Conformer model when the input_layer is set to "embed".

Firstly, I noticed that the default generated token_list does not include a <pad> token. When using the Conformer model with an "embed" input_layer, a padding_idx needs to be passed to torch.nn.Embedding. By default, ESPnet sets padding_idx to -1. My understanding is that this setting is intended to avoid the <blank> token, as token_list[0] is <blank>.
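If I read the recipe behavior correctly, the embed input layer ends up doing something like the simplified sketch below; vocab_size and attention_dim are placeholder numbers, and the call is a paraphrase rather than the exact ESPnet source:

import torch

vocab_size = 5000       # stands in for len(token_list)
attention_dim = 256     # encoder dimension, placeholder value
padding_idx = -1        # the ESPnet default in question

# rough sketch of what input_layer="embed" builds for the encoder input
embed = torch.nn.Embedding(vocab_size, attention_dim, padding_idx=padding_idx)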

However, torch.nn.Embedding restricts padding_idx to be within the range of 0 to num_embeddings - 1. When padding_idx is less than 0, it is added to num_embeddings and the result is re-assigned as padding_idx. This means that even though we intend to set padding_idx as -1, it actually becomes num_embeddings - 1.

I am concerned that this might not be the correct behavior. Could you please clarify this for me? I look forward to your response.

Best regards,
ck

theshypig added the Question label Mar 1, 2024
@sw005320
Contributor

sw005320 commented Mar 1, 2024

Thanks a lot for your report.
If it behaves as you explained, this sounds like a critical issue to me...
Can you confirm that padding_idx actually becomes num_embeddings - 1?
Is it possible to show us with a debugger or by printing it?

@theshypig
Author

Dear @sw005320,

I apologize for the delay in my response. I can confirm that padding_idx does indeed become num_embeddings - 1.

I have checked two versions of PyTorch, 1.13.1 and 2.1.0. In both, the torch.nn.Embedding module contains the following code:

if padding_idx is not None:
    if padding_idx > 0:
        assert padding_idx < self.num_embeddings, 'Padding_idx must be within num_embeddings'
    elif padding_idx < 0:
        assert padding_idx >= -self.num_embeddings, 'Padding_idx must be within num_embeddings'
        padding_idx = self.num_embeddings + padding_idx

This snippet is directly responsible for the behavior I described. I hope this information helps in resolving the issue, and I look forward to your feedback.
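To show it concretely with a print, as you suggested, here is a minimal standalone check (toy sizes, not taken from any ESPnet config):

import torch

emb = torch.nn.Embedding(num_embeddings=5000, embedding_dim=256, padding_idx=-1)
# torch has already re-assigned the negative index at construction time
print(emb.padding_idx)  # prints 4999, i.e. num_embeddings - 1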

Best regards,
ck

sw005320 added the Bug label Mar 2, 2024
@sw005320
Contributor

sw005320 commented Mar 2, 2024

Thanks a lot!
We also confirmed it on our side.

Several discussion items:

  • It should be fine for normal ASR usage with continuous-valued input, since the embed input layer is not used there.
  • It would be an issue for discrete-token ASR/MT.
  • The last token is usually <sos/eos>, so padding_idx = -1 ends up pointing at it; that may be acceptable if <sos/eos> and padding share the same embedding parameter (see the sketch below).
  • If the encoder and decoder share the embedding, and we apply convolutions to the encoder input, this might affect the result.
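A toy sketch of the third point, assuming the token list layout described in this issue (<blank> at index 0, <unk> at index 1, <sos/eos> last):

token_list = ["<blank>", "<unk>", "a", "b", "c", "<sos/eos>"]  # toy vocabulary
padding_idx = len(token_list) - 1  # what torch turns padding_idx = -1 into
print(token_list[padding_idx])     # prints "<sos/eos>", so padding shares its embedding row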

We may need to fix it in the future, but we also expect that it will not cause a big issue in practice.
I added @simpleoier and @pyf98, who pointed out the discussion items above.

@theshypig
Author

Thank you very much for your attention. Indeed, as you discussed, letting padding fall on <sos/eos> instead of a dedicated <pad> generally does not pose a significant problem, and I will keep an eye on your fix.

Currently, I am trying to reproduce a paper. The main idea is to feed the intermediate CTC output back into the encoder: the intermediate text result is passed through an Embedding layer and gate units and then injected into the next encoder layer. In that setting, padding_idx can have a real impact on the results, so I want to eliminate the problems it causes as much as possible. I am trying to adapt the token_list generation stage in asr.sh as follows:

python -m espnet2.bin.tokenize_text \
    --add_symbol "${blank}:0" \
    --add_symbol "${oov}:1" \
    --add_symbol "${sos_eos}:-2" \
    --add_symbol "${pad}:-1"
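If I understand the indexing correctly, the generated token list would then look roughly like the toy example below (with the usual <blank>/<unk> symbols standing in for ${blank}/${oov}), and padding_idx = -1 would resolve to the actual <pad> row:

token_list = ["<blank>", "<unk>", "a", "b", "c", "<sos/eos>", "<pad>"]  # toy vocabulary
# torch maps padding_idx = -1 to len(token_list) - 1, which is now really <pad>
print(token_list[-1])  # prints "<pad>"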

With this token list I keep padding_idx = -1 in the subsequent settings; of course, there may be other places that need to be modified. What do you think about this approach, and what is your planned fix? I can try it myself.

@sw005320
Contributor

sw005320 commented Mar 3, 2024

The main idea is to feed the intermediate CTC output back into the encoder

I see.
This is an interesting problem, and yes, this case would cause an issue.
I think your treatment sounds good enough to provide the model with distinct padding information.
<sos/eos> is sometimes hard-coded in the source code, which makes this tricky, especially for attention-based encoder-decoder models, but as long as you only use CTC it should not be very complicated.
