
[TTS] Refactor BOS and EOS handling to fix codec conversion #15054

Closed
rlangman wants to merge 2 commits into NVIDIA-NeMo:magpietts_2508 from rlangman:magpietts_2508_eos_fix


@rlangman
Collaborator

This PR fixes a bug in MagpieTTS training: self._codec_converter.convert_original_to_new() was being called on audio codec tokens after the Lhotse data loader had already inserted BOS and EOS tokens into them.

BOS and EOS tokens are now always added inside the model class, after convert_original_to_new() is called. They are also removed before convert_new_to_original() is called, so that they are not mistakenly passed into the codec model.
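The ordering described above can be illustrated with a minimal sketch. It is not the code from this PR: the helper names add_bos_eos and remove_bos_eos, the (batch, codebooks, time) tensor layout, and the token IDs are all assumptions made for illustration. In the intended flow, a conversion such as convert_original_to_new() would run before add_bos_eos, and remove_bos_eos would run before convert_new_to_original().

```python
import torch
import torch.nn.functional as F


def add_bos_eos(codes, codes_len, bos_id, eos_id):
    """Hypothetical helper: wrap padded codec tokens in BOS and EOS.

    codes:     (B, C, T) integer tensor, zero-padded along T.
    codes_len: (B,) number of valid frames per batch item.
    """
    # Prepend one BOS frame to every codebook.
    codes = F.pad(codes, (1, 0), value=bos_id)
    # Append one padding frame so the longest item has room for EOS.
    codes = F.pad(codes, (0, 1), value=0)
    # EOS sits right after the last valid frame (shifted by one for BOS).
    batch_idx = torch.arange(codes.size(0), device=codes.device)
    codes[batch_idx, :, codes_len + 1] = eos_id
    return codes, codes_len + 2


def remove_bos_eos(codes, codes_len):
    """Hypothetical helper: undo add_bos_eos before codec conversion or decoding."""
    # Drop the BOS frame and the final frame (EOS of the longest item).
    codes = codes[:, :, 1:-1]
    codes_len = codes_len - 2
    # Zero everything past the valid length, which clears EOS on shorter items.
    steps = torch.arange(codes.size(2), device=codes.device)
    mask = steps[None, :] < codes_len[:, None]
    return codes * mask[:, None, :], codes_len


# Toy batch: 2 items, 2 codebooks, lengths 3 and 2 (IDs are illustrative).
codes = torch.tensor([[[1, 2, 3], [4, 5, 6]], [[7, 8, 0], [9, 1, 0]]])
codes_len = torch.tensor([3, 2])
wrapped, wrapped_len = add_bos_eos(codes, codes_len, bos_id=10, eos_id=11)
restored, restored_len = remove_bos_eos(wrapped, wrapped_len)
```

Keeping both helpers next to the conversion calls means the special tokens exist only between the two conversions, so neither the converter nor the codec model ever sees them.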

There are also a few other minor changes, including:

  • Fix data loading field mismatch when doing on-the-fly codec extraction (we now use target_audio instead of recording and context_audio instead of context_recording)
  • Fix inconsistent padding in the spectral codec encoder
  • Replace some manual torch.cat padding with simpler calls to torch.nn.functional.pad
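As a rough illustration of the last item, the sketch below compares manual padding via torch.cat with a single torch.nn.functional.pad call. The function names and tensor shapes are hypothetical and are not taken from the PR's diff.

```python
import torch
import torch.nn.functional as F


def pad_with_cat(x, pad_len, value=0):
    # Manual approach: build a filler tensor, then concatenate along the last dim.
    filler = torch.full((*x.shape[:-1], pad_len), value, dtype=x.dtype, device=x.device)
    return torch.cat([x, filler], dim=-1)


def pad_with_functional(x, pad_len, value=0):
    # Equivalent one-liner: (left, right) padding amounts for the last dim.
    return F.pad(x, (0, pad_len), value=value)


x = torch.arange(12).reshape(2, 2, 3)
manual = pad_with_cat(x, pad_len=2)
functional = pad_with_functional(x, pad_len=2)
```

Both calls yield the same tensor; the F.pad version avoids constructing the filler tensor by hand and matching its dtype and device.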

@github-actions github-actions bot added the TTS label Nov 10, 2025
@rlangman rlangman force-pushed the magpietts_2508_eos_fix branch from c87e39e to 620903a on November 10, 2025 21:49
Signed-off-by: Ryan <rlangman@nvidia.com>
@rlangman rlangman force-pushed the magpietts_2508_eos_fix branch from 620903a to 5fb564c on December 5, 2025 23:05
@rlangman rlangman force-pushed the magpietts_2508_eos_fix branch from d6f3a89 to cd463a8 on December 5, 2025 23:12
@blisc
Collaborator

blisc commented Dec 10, 2025

Closing, please make a new PR into main

@blisc blisc closed this Dec 10, 2025