MagpieTTS decoder model on top of NeMo main branch by paarthneekhara · Pull Request #15277 · NVIDIA-NeMo/NeMo

paarthneekhara · 2026-01-08T21:56:34Z

No description provided.

nemo/collections/tts/modules/nemotron_h_decoder.py

tests/collections/tts/test_nemotron_h_decoder.py

+from nemo.collections.tts.modules.nemotron_h_decoder import (
+    HybridMambaAttentionDynamicCache,
+    NemotronHConfig,
+    NemotronHForCausalLM,
+    NemotronHMLP,
+    NemotronHModel,
+    NemotronHMOE,
+    NemotronHTopkRouter,
+)


nemo/collections/tts/modules/magpietts_inference/evaluate_generated_audio.py

examples/tts/conf/magpietts/easy_magpietts.yaml

nemo/collections/tts/modules/nemotron_h_decoder.py

blisc

Some more comments from WIP review

examples/tts/magpietts_inference.py

examples/tts/magpietts_streaming_inference.py

nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py

scripts/tts_dataset_files/bpe_ipa_tokenizer_2048_en_de_es_fr_hi_it_vi_zh.json

tests/collections/tts/test_infer_vs_process_batch.py

nemo/collections/tts/parts/utils/helpers.py

nemo/collections/tts/modules/__init__.py

nemo/collections/tts/modules/magpietts_inference/inference.py

nemo/collections/tts/data/text_to_speech_dataset_lhotse.py

nemo/collections/tts/models/easy_magpietts.py

nemo/collections/tts/models/easy_magpietts_preference_optimization.py

scripts/magpietts/create_crosslingual_context_dataset.py

nemo/collections/tts/models/easy_magpietts.py

nemo/collections/tts/modules/magpietts_inference/inference.py

examples/tts/conf/magpietts/easy_magpietts.yaml

examples/tts/conf/magpietts/easy_magpietts_lhotse.yaml

examples/tts/magpietts_inference.py

nemo/collections/tts/models/easy_magpietts_preference_optimization.py

blisc · 2026-03-15T15:38:15Z

nemo/collections/tts/models/easy_magpietts.py

+
+        return loss
+
+    def validation_step(self, batch, batch_idx):


@XuesongYang Can you take a look at the dataset setup and validation logging especially since you recently made changes to these inside of the magpie class

I checked the model class and the Lhotse YAML config, and I can confirm that this model doesn’t currently support multiple validation dataloaders.

Since this PR is already quite large and aggregates a significant number of changes, let’s avoid adding multiple dataloader support here. Instead, we can plan to implement that in a separate PR (similar to #15348).

nemo/collections/tts/models/easy_magpietts_inference.py

nemo/collections/tts/models/easy_magpietts_preference_optimization.py

nemo/collections/tts/parts/utils/helpers.py

Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>

subhankar-ghosh · 2026-03-24T15:31:20Z

nemo/collections/tts/models/easy_magpietts.py

+        self.use_multilingual_asr = cfg.get('use_multilingual_asr', False)
+        if self.run_val_inference:
+            logging.info("Loading eval models for validation inference (ASR and speaker verification)...")
+            if self.use_multilingual_asr:


Why do we have these imports in the constructor? These imports might cause issues during model loading while doing inference. IMO we should not include evaluation metrics in the class. Keep it separate like we do in MagpieTTS.

Moved them above.

subhankar-ghosh · 2026-03-24T15:33:37Z

nemo/collections/tts/models/easy_magpietts.py

+            '_utmos_calculator',
+        ]
+
+    def compute_loss(self, logits, audio_codes, audio_codes_lens):


Is it possible to reuse the compute_loss method from MagpieTTS? I think this is calculating the same loss if we pass frame_stacking_factor=1.

I am not wanting to reuse it because it uses several self variables. @rlangman suggested to dedupe code only if the function is reusable across other models as well. I dont think it is very reusable and it's just additive cross-entropy loss. Also the function is fairly small (~20 lines) so I am not too inclined towards a shared implementation.

subhankar-ghosh · 2026-03-24T15:38:32Z

nemo/collections/tts/models/easy_magpietts.py

+        total_phoneme_loss = total_phoneme_loss / self.phoneme_stacking_factor
+        return total_phoneme_loss, loss_mask
+
+    def log_val_audio_example(


This change can be part of a later PR. Can we move log_audio_* and log_plot_* to helper files and reuse them across models? I think we can reuse a lot of these plot and log methods from MagpieTTS.

That makes sense to me. I'll make another PR once this is merged to unify the audio/image logging.

nemo/collections/tts/models/easy_magpietts.py

subhankar-ghosh · 2026-03-24T15:53:27Z

nemo/collections/tts/models/easy_magpietts.py

+            if predicted_audio_paths and context_audio_paths:
+                with torch.no_grad():
+                    # ASR transcription for CER/WER
+                    if self.use_multilingual_asr:


do we really need cer/wer and ssim during validation step? It might make sense for experimentation but for open sourcing it does not make sense especially after we have a working recipe. Makes val step heavier.

I find it quite helpful to track CER/SSIM during training to know if the model is working and also comparing different experiments. Val loss comparison often doesnt translate to which experiment is doing well. The overhead of this step is not much since it is spread across 4 nodes, so less than 10 iterations of the infer_batch cover the entire val set.

…etts_decoderonly_2601

Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>

XuesongYang · 2026-03-31T17:57:19Z

nemo/collections/tts/models/easy_magpietts.py

+    selected_training_mode: Optional[str]
+
+
+class EasyMagpieTTSModel(EasyMagpieTTSInferenceModel):


any reason why subclassing inference class for pretraining? aren't we expecting inference subclassing pretraining class?

XuesongYang

Generally work through the general design, the new class hierarchy (ModelPT -> EasyMagpieTTSInferenceModel -> EasyMagpieTTSModel -> EasyMagpieTTSModelOnlinePO) operates independently from the existing MagpieTTS class.

Because it doesn’t subclass MagpieTTS, any features added to MagpieTTS won’t be inherited without careful cherry-picking, and I already noticed a few manually copy-pasted code blocks. If this implementation is meant to be a backbone for future models rather than a one-off recipe, it is worth rethinking the architecture now to ensure better long-term maintainability.

XuesongYang · 2026-03-31T18:31:48Z

examples/tts/conf/magpietts/easy_magpietts_lhotse.yaml

+      batch_duration: ???   # recommend to use smaller batch_duration for validation dataset than training dataset.
+      quadratic_duration: ${quadratic_duration}
+      use_bucketing: false
+      force_finite: true


pls add the two latest changes:

[MagpieTTS][bugfix] defaults to force_map_dataset=True for validation datast to avoid duplicates by a factor of num_workers. #15387

[MagpieTTS] set seed and shard_seed=randomized for val dataloader to ensure deterministic multi-GPU eval #15427

@paarthneekhara Let's make these changes.

Thanks for pointing these. I have made these changes.

XuesongYang · 2026-03-31T18:38:54Z

nemo/collections/tts/models/easy_magpietts.py

+
+        return loss
+
+    def validation_step(self, batch, batch_idx):


I checked the model class and the Lhotse YAML config, and I can confirm that this model doesn’t currently support multiple validation dataloaders.

Since this PR is already quite large and aggregates a significant number of changes, let’s avoid adding multiple dataloader support here. Instead, we can plan to implement that in a separate PR (similar to #15348).

Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>

…etts_decoderonly_2601

github-actions bot added TTS common labels Jan 8, 2026

zhehuaichen requested a review from Edresson January 28, 2026 21:20

github-advanced-security bot found potential problems Feb 4, 2026

View reviewed changes

github-advanced-security bot found potential problems Feb 7, 2026

View reviewed changes

nemo/collections/tts/modules/magpietts_inference/evaluate_generated_audio.py Fixed Show fixed Hide fixed

blisc reviewed Feb 10, 2026

View reviewed changes

examples/tts/conf/magpietts/easy_magpietts.yaml Show resolved Hide resolved

nemo/collections/tts/modules/nemotron_h_decoder.py Show resolved Hide resolved

blisc reviewed Feb 10, 2026

View reviewed changes

github-advanced-security bot found potential problems Feb 11, 2026

View reviewed changes

tests/collections/tts/test_infer_vs_process_batch.py Fixed Show fixed Hide fixed

shehzeen force-pushed the magpietts_decoderonly_2601 branch from 54d6283 to 06c516f Compare February 12, 2026 00:12

github-actions bot added the core Changes to NeMo Core label Feb 17, 2026

blisc requested changes Feb 20, 2026

View reviewed changes

paarthneekhara force-pushed the magpietts_decoderonly_2601 branch from f684fc3 to eeac2ce Compare March 9, 2026 16:32

github-advanced-security bot found potential problems Mar 9, 2026

View reviewed changes

shehzeen force-pushed the magpietts_decoderonly_2601 branch from 81af95a to c8ad57a Compare March 10, 2026 23:04

github-advanced-security bot found potential problems Mar 10, 2026

View reviewed changes

nemo/collections/tts/models/easy_magpietts.py Fixed Show fixed Hide fixed

nemo/collections/tts/models/easy_magpietts.py Fixed Show fixed Hide fixed

nemo/collections/tts/models/easy_magpietts.py Fixed Show fixed Hide fixed

github-actions bot removed the core Changes to NeMo Core label Mar 11, 2026

paarthneekhara marked this pull request as ready for review March 14, 2026 19:21

github-advanced-security bot found potential problems Mar 14, 2026

View reviewed changes

nemo/collections/tts/modules/magpietts_inference/inference.py Fixed Show fixed Hide fixed

nemo/collections/tts/modules/magpietts_inference/inference.py Fixed Show fixed Hide fixed

nemo/collections/tts/modules/magpietts_inference/inference.py Fixed Show fixed Hide fixed

blisc reviewed Mar 15, 2026

View reviewed changes

github-advanced-security bot found potential problems Mar 17, 2026

View reviewed changes

nemo/collections/tts/models/easy_magpietts_preference_optimization.py Dismissed Show dismissed Hide dismissed

github-advanced-security bot found potential problems Mar 17, 2026

View reviewed changes

nemo/collections/tts/parts/utils/helpers.py Fixed Show fixed Hide fixed

github-advanced-security bot found potential problems Mar 17, 2026

View reviewed changes

nemo/collections/tts/parts/utils/helpers.py Fixed Show fixed Hide fixed

nemo/collections/tts/parts/utils/helpers.py Fixed Show fixed Hide fixed

github-advanced-security bot found potential problems Mar 18, 2026

View reviewed changes

nemo/collections/tts/parts/utils/helpers.py Fixed Show fixed Hide fixed

Easy MagpieTTS squashed commit

c538c7f

Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>

paarthneekhara force-pushed the magpietts_decoderonly_2601 branch from 9353e11 to c538c7f Compare March 22, 2026 23:36

subhankar-ghosh reviewed Mar 24, 2026

View reviewed changes

nemo/collections/tts/models/easy_magpietts.py Outdated Show resolved Hide resolved

subhankar-ghosh reviewed Mar 24, 2026

View reviewed changes

Merge branch 'main' of https://github.com/NVIDIA-NeMo/NeMo into magpi…

4607eef

…etts_decoderonly_2601

chtruong814 added Run CICD and removed Run CICD labels Mar 31, 2026

chtruong814 temporarily deployed to test March 31, 2026 06:55 — with GitHub Actions Inactive

Merge branch 'main' of https://github.com/NVIDIA-NeMo/NeMo into magpi…

8c3288b

…etts_decoderonly_2601

chtruong814 added Run CICD and removed Run CICD labels Mar 31, 2026

chtruong814 temporarily deployed to test March 31, 2026 16:07 — with GitHub Actions Inactive

add doc for training mode

4b506ca

Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>

chtruong814 added Run CICD and removed Run CICD labels Mar 31, 2026

chtruong814 temporarily deployed to test March 31, 2026 16:25 — with GitHub Actions Inactive

XuesongYang reviewed Mar 31, 2026

View reviewed changes

config update

73a3601

Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>

chtruong814 added Run CICD and removed Run CICD labels Mar 31, 2026

chtruong814 had a problem deploying to test March 31, 2026 21:33 — with GitHub Actions Error

increase timeout for PO

1549a47

Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>

chtruong814 added Run CICD and removed Run CICD labels Mar 31, 2026

chtruong814 had a problem deploying to test March 31, 2026 21:54 — with GitHub Actions Error

remove unnessary line

30d79b9

Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>

chtruong814 added Run CICD and removed Run CICD labels Mar 31, 2026

chtruong814 temporarily deployed to test March 31, 2026 23:16 — with GitHub Actions Inactive

Merge branch 'main' of https://github.com/NVIDIA-NeMo/NeMo into magpi…

4c385f8

…etts_decoderonly_2601

chtruong814 added Run CICD and removed Run CICD labels Mar 31, 2026

blisc approved these changes Apr 1, 2026

View reviewed changes

		selected_training_mode: Optional[str]


		class EasyMagpieTTSModel(EasyMagpieTTSInferenceModel):

Conversation

paarthneekhara commented Jan 8, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Check notice

Uh oh!

Uh oh!

Uh oh!

blisc left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

XuesongYang left a comment

Choose a reason for hiding this comment