MagpieTTS decoder model on top of NeMo main branch #15277
paarthneekhara merged 30 commits into NVIDIA-NeMo:main
Conversation
nemo/collections/tts/modules/magpietts_inference/evaluate_generated_audio.py (fixed)
blisc left a comment:
Some more comments from the WIP review.
nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py (outdated, resolved)
scripts/tts_dataset_files/bpe_ipa_tokenizer_2048_en_de_es_fr_hi_it_vi_zh.json (outdated, resolved)
nemo/collections/tts/models/easy_magpietts_preference_optimization.py (outdated, resolved)
nemo/collections/tts/models/easy_magpietts_preference_optimization.py (dismissed)
```python
        return loss

    def validation_step(self, batch, batch_idx):
```
@XuesongYang Can you take a look at the dataset setup and validation logging, especially since you recently made changes to these inside the Magpie class?
I checked the model class and the Lhotse YAML config, and I can confirm that this model doesn't currently support multiple validation dataloaders.
Since this PR is already quite large and aggregates a significant number of changes, let's avoid adding multiple-dataloader support here. Instead, we can plan to implement that in a separate PR (similar to #15348).
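For context on what multiple-dataloader support would entail, here is a minimal, hypothetical sketch (not code from this PR) of the Lightning-style pattern: when a trainer is given a list of validation dataloaders, `validation_step` receives a `dataloader_idx` that can be used to log metrics per validation set. Class and set names below are illustrative only.

```python
class MultiValModel:
    """Minimal stand-in for a LightningModule with several validation sets."""

    def __init__(self, val_set_names):
        # e.g. ["dev_clean", "dev_other"] -- names are illustrative only
        self.val_set_names = val_set_names
        self.metrics = {name: [] for name in val_set_names}

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        # Lightning passes `dataloader_idx` when val_dataloaders is a list;
        # keying metrics by set name keeps each set's numbers separate.
        name = self.val_set_names[dataloader_idx]
        loss = sum(batch) / len(batch)  # placeholder "loss"
        self.metrics[name].append(loss)
        return loss


model = MultiValModel(["dev_clean", "dev_other"])
# Simulate two dataloaders, iterated the way a trainer would.
for idx, loader in enumerate([[[1.0, 3.0]], [[2.0, 4.0]]]):
    for batch_idx, batch in enumerate(loader):
        model.validation_step(batch, batch_idx, dataloader_idx=idx)
```

The key design point is that per-set logging falls out of the index-to-name mapping, so adding a new validation set is a config change rather than a code change.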
Signed-off-by: Paarth Neekhara <paarth.n@gmail.com>
```python
self.use_multilingual_asr = cfg.get('use_multilingual_asr', False)
if self.run_val_inference:
    logging.info("Loading eval models for validation inference (ASR and speaker verification)...")
    if self.use_multilingual_asr:
```
Why do we have these imports in the constructor? They might cause issues during model loading at inference time. IMO we should not include evaluation metrics in the class; keep them separate, as we do in MagpieTTS.
Moved them above.
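A common alternative, shown here as a hedged sketch (all names are illustrative, not this PR's code), is to defer heavy evaluation dependencies with a lazy-loading accessor so that constructing the model for pure inference never triggers them:

```python
class TTSModelSketch:
    """Illustrative model that loads eval dependencies only on first use."""

    def __init__(self, run_val_inference=False):
        self.run_val_inference = run_val_inference
        self._asr_model = None  # loaded lazily, only if ever needed

    def _get_asr_model(self):
        # Importing inside the method means checkpoint loading and pure
        # inference pay no cost for the evaluation-only dependency.
        if self._asr_model is None:
            import json  # stand-in for a heavy package such as an ASR toolkit
            self._asr_model = json  # placeholder for the loaded eval model
        return self._asr_model
```

Either approach (module-level imports kept out of `__init__`, or lazy accessors like this) avoids the failure mode the reviewer describes, where loading a checkpoint pulls in evaluation-only packages.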
```python
    '_utmos_calculator',
]

def compute_loss(self, logits, audio_codes, audio_codes_lens):
```
Is it possible to reuse the compute_loss method from MagpieTTS? I think it calculates the same loss if we pass frame_stacking_factor=1.
I'd rather not reuse it because it relies on several self attributes. @rlangman suggested deduplicating code only when the function is reusable across other models as well. I don't think this one is very reusable; it's just additive cross-entropy loss, and the function is fairly small (~20 lines), so I'm not inclined toward a shared implementation.
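For readers unfamiliar with the loss being discussed, here is a hedged sketch of an additive (summed-over-codebooks) masked cross-entropy, assuming logits of shape `(B, T, C, V)` for `C` codebooks with vocab size `V`; the function name and shapes are assumptions for illustration, not the PR's actual implementation:

```python
import torch
import torch.nn.functional as F

def compute_codebook_ce_loss(logits, audio_codes, audio_codes_lens):
    """Additive cross-entropy over codebooks, masked by sequence length.

    logits:           (B, T, C, V) per-codebook vocabulary logits
    audio_codes:      (B, T, C)    target code indices
    audio_codes_lens: (B,)         valid frame count per example
    """
    B, T, C, V = logits.shape
    # (B, T) mask of valid frames
    mask = torch.arange(T, device=logits.device)[None, :] < audio_codes_lens[:, None]
    total = logits.new_zeros(())
    for c in range(C):  # "additive": sum the CE of each codebook
        ce = F.cross_entropy(
            logits[:, :, c, :].reshape(-1, V),
            audio_codes[:, :, c].reshape(-1),
            reduction='none',
        ).reshape(B, T)
        total = total + (ce * mask).sum() / mask.sum().clamp(min=1)
    return total
```

With frame stacking, each time step would carry several stacked frames' codes, which is why a `frame_stacking_factor=1` special case of the MagpieTTS loss could in principle coincide with this.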
```python
    total_phoneme_loss = total_phoneme_loss / self.phoneme_stacking_factor
    return total_phoneme_loss, loss_mask

def log_val_audio_example(
```
This change can be part of a later PR: can we move the log_audio_* and log_plot_* helpers to shared files and reuse them across models? I think we can reuse a lot of these plot and log methods from MagpieTTS.
That makes sense to me. I'll open another PR once this is merged to unify the audio/image logging.
```python
if predicted_audio_paths and context_audio_paths:
    with torch.no_grad():
        # ASR transcription for CER/WER
        if self.use_multilingual_asr:
```
Do we really need CER/WER and SSIM during the validation step? It might make sense for experimentation, but for open-sourcing it does not, especially once we have a working recipe. It makes the val step heavier.
I find it quite helpful to track CER/SSIM during training to know whether the model is working and to compare experiments. Val-loss comparisons often don't indicate which experiment is doing well. The overhead of this step is small since it is spread across 4 nodes, so fewer than 10 iterations of infer_batch cover the entire val set.
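For reference, the character error rate (CER) being tracked here is simply the edit distance between the reference transcript and the ASR hypothesis, normalized by reference length. A minimal, self-contained sketch (the function name is illustrative; production code would typically use a library implementation):

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = Levenshtein distance / reference length."""
    r, h = list(reference), list(hypothesis)
    # Classic dynamic-programming edit distance (insert/delete/substitute).
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, start=1):
        curr = [i] + [0] * len(h)
        for j, hc in enumerate(h, start=1):
            curr[j] = min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (rc != hc),  # substitution
            )
        prev = curr
    return prev[-1] / max(len(r), 1)
```

The appeal of CER over validation loss is exactly what the reply argues: it measures end-to-end intelligibility of the generated audio rather than token-level likelihood.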
```python
selected_training_mode: Optional[str]


class EasyMagpieTTSModel(EasyMagpieTTSInferenceModel):
```
Any reason for subclassing the inference class for pretraining? Wouldn't we expect the inference class to subclass the pretraining class?
XuesongYang left a comment:
Having worked through the general design: the new class hierarchy (ModelPT -> EasyMagpieTTSInferenceModel -> EasyMagpieTTSModel -> EasyMagpieTTSModelOnlinePO) operates independently of the existing MagpieTTS class.
Because it doesn't subclass MagpieTTS, any features added to MagpieTTS won't be inherited without careful cherry-picking, and I already noticed a few manually copy-pasted code blocks. If this implementation is meant to be a backbone for future models rather than a one-off recipe, it is worth rethinking the architecture now to ensure better long-term maintainability.
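The alternative the reviewers allude to can be shown as a tiny hedged sketch (all class names here are illustrative, not NeMo's actual classes): put training at the base and make inference a thin subclass, so any improvement to the shared forward pass is inherited rather than cherry-picked.

```python
class MagpieTTSBase:
    """Training-time model: owns the architecture and the loss."""

    def forward(self, batch):
        return f"logits({batch})"

    def training_step(self, batch):
        return f"loss({self.forward(batch)})"


class MagpieTTSInference(MagpieTTSBase):
    """Inference-only subclass: reuses forward, adds decoding."""

    def infer(self, batch):
        # Any change to MagpieTTSBase.forward is picked up here for free.
        return f"decode({self.forward(batch)})"
```

Under this ordering, the inheritance direction matches the data flow: inference needs everything training defines about the network, but training needs nothing from inference.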
```yaml
batch_duration: ???  # recommend a smaller batch_duration for the validation dataset than for training
quadratic_duration: ${quadratic_duration}
use_bucketing: false
force_finite: true
```
Thanks for pointing these out. I have made these changes.
No description provided.