Conversation
> @@ -16,24 +16,36 @@
> """

Can you do a find and replace inside of this file of all print statements and switch them to logging?
nemo/collections/tts/modules/magpietts_inference/evaluate_generated_audio.py
> Returns:
>     Dictionary mapping metric names to formatted "mean ± CI" strings.
Should now say the map contains tuples
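For context, a minimal sketch of what the suggested return shape might look like, where the dictionary maps each metric to a (mean, confidence-interval half-width) tuple instead of a formatted string. The function name matches the one discussed in this PR, but the signature and internals here are assumptions, not the actual NeMo code:

```python
import math
from typing import Dict, List, Tuple

def compute_mean_with_confidence_interval(
    metrics: Dict[str, List[float]], z: float = 1.96
) -> Dict[str, Tuple[float, float]]:
    """Return {metric_name: (mean, ci_half_width)} rather than "mean ± CI" strings.

    Callers that still want the formatted string can do f"{m:.3f} ± {ci:.3f}".
    """
    results = {}
    for name, values in metrics.items():
        n = len(values)
        mean = sum(values) / n
        # Sample variance (ddof=1); CI half-width via the normal approximation.
        var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
        ci = z * math.sqrt(var / n) if n > 1 else 0.0
        results[name] = (mean, ci)
    return results
```

Returning the raw tuple keeps formatting decisions out of the metrics layer, which is presumably why the reviewer asked for the docstring update.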
> self.model_type == 'decoder_ce'
> and hasattr(self, 'baked_context_embedding')
> and self.baked_context_embedding is not None
> and self.baked_context_embedding.numel() > 0
The last condition should probably throw an error, as it implies someone tried to load an embedding but did it incorrectly.
The has_baked_context_embedding property is used as a check in many places. In the cases where the condition must absolutely hold, an error is thrown. Hence I think this property should just return whether baked_context_embedding is present and properly loaded.
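Put together, the property under discussion might look roughly like this. This is a sketch assuming the buffer name quoted in the thread; it is a stand-in class, not the actual NeMo model:

```python
from typing import Optional

import torch

class ContextModelSketch(torch.nn.Module):
    """Minimal stand-in illustrating the has_baked_context_embedding check."""

    def __init__(self, model_type: str, embedding: Optional[torch.Tensor] = None):
        super().__init__()
        self.model_type = model_type
        # Buffers travel with the checkpoint; None is allowed when no voice is baked in.
        self.register_buffer('baked_context_embedding', embedding)

    @property
    def has_baked_context_embedding(self) -> bool:
        # True only when an embedding was actually loaded with real data,
        # mirroring the three-part condition quoted above.
        return (
            self.model_type == 'decoder_ce'
            and getattr(self, 'baked_context_embedding', None) is not None
            and self.baked_context_embedding.numel() > 0
        )
```

As a pure presence check it stays side-effect free; call sites that require the embedding can raise on a False result, as the author describes.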
> True if baked_context_embedding buffer is set, not None, and has elements.
> """
> return (
>     self.model_type == 'decoder_ce'
Why do we want to enforce that the baked embedding can only be used with decoder_ce? It's not a big deal, since I imagine we will be removing most of the other model_type code paths later. Just at first glance, it looks like the implementation would be easier/simpler if it were not gated by model type.
> # Index into embeddings: (N, T, D) -> (B, T, D)
> embeddings = self.baked_context_embedding[indices]  # (B, T, D)
> lengths = self.baked_context_embedding_len[indices]  # (B,)
> return embeddings, lengths
A benefit of using an embedding layer is you could validate the shape of the embedding during model initialization. The value of N, num_baked_speakers, would need to be provided as a buffer from the checkpoint, replacing the current definition of it as an @property
> context_input_embedded, context_mask, cond=None, cond_mask=None
> )['output']
> # Check for baked context embedding first
> if self.has_baked_context_embedding:
I would have this branch decided by whether speaker_indices is None or not. If speaker indices is provided, then use this path, else use the context encoder. An error should be thrown if someone provides neither or both context_audio and speaker_indices.
> The value of N, num_baked_speakers, would need to be provided as a buffer from the checkpoint

Why do we need to save the number of baked speakers as a buffer? We need to save T and D, sure, but I am not sure why we need to save N. We can get the number of embeddings from embedding_layer.weight.shape[0].
Nonetheless, we are changing the implementation to torch.nn.Embedding.
> I would have this branch decided by whether speaker_indices is None or not. If speaker_indices is provided, then use this path, else use the context encoder. An error should be thrown if someone provides neither or both of context_audio and speaker_indices.

The idea is that if baked embeddings are present, the user should only use those and not the context encoder. If speaker_indices is not provided and baked embeddings are present, the default behavior is to use the first speaker embedding. I thought this is what we had decided: with baked embeddings, we do not let users use any other voice.
> The value of N, num_baked_speakers, would need to be provided as a buffer from the checkpoint
>
> Why do we need to save the number of baked speakers as a buffer? We need to save T and D, sure, but I am not sure why we need to save N. We can get the number of embeddings from embedding_layer.weight.shape[0].

The request from Jason was to make the parameters an nn.Embedding layer. Creating this layer requires knowing the value of N ahead of time, like:

    self.register_buffer('num_baked_embeddings', torch.tensor(0))
    self.embedding_layer = torch.nn.Embedding(
        num_embeddings=int(self.num_baked_embeddings),
        embedding_dim=max_context_len * context_dim,
        _freeze=True,
    )

This is a clean approach that guarantees the embedding has the correct dimensions when the model is first loaded. Otherwise, if the embedding_layer happens to be wrong in the model checkpoint, you would only find out by hitting errors at inference time. Granted, this is a pretty minor concern.
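The approach this thread converges on, pre-computed context stored in a frozen nn.Embedding whose shape is fixed at construction time, could be sketched as follows. The class, attribute names, and constructor arguments here are illustrative assumptions, not the NeMo implementation:

```python
import torch

class BakedContextSketch(torch.nn.Module):
    """Stores pre-computed context embeddings flattened into a frozen
    nn.Embedding, so load_state_dict fails fast on a shape mismatch instead
    of surfacing a bad embedding only at inference time."""

    def __init__(self, num_baked_embeddings: int, max_context_len: int, context_dim: int):
        super().__init__()
        self.max_context_len = max_context_len
        self.context_dim = context_dim
        # Each row holds one speaker's (T, D) context flattened to T*D,
        # mirroring the "flattened embeddings" change in the commit history.
        self.embedding_layer = torch.nn.Embedding(
            num_embeddings=num_baked_embeddings,
            embedding_dim=max_context_len * context_dim,
            _freeze=True,  # baked voices are constants, not trainable weights
        )

    def forward(self, indices: torch.Tensor) -> torch.Tensor:
        flat = self.embedding_layer(indices)  # (B, T*D)
        batch = indices.shape[0]
        return flat.view(batch, self.max_context_len, self.context_dim)  # (B, T, D)
```

N is still recoverable at runtime as `embedding_layer.weight.shape[0]`, matching the author's point; the buffer question only affects how N is known before the layer is constructed.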
> I would have this branch decided by whether speaker_indices is None or not. If speaker_indices is provided, then use this path, else use the context encoder. An error should be thrown if someone provides neither or both of context_audio and speaker_indices.
>
> The idea is that if baked embeddings are present, the user should only use those and not the context encoder. If speaker_indices is not provided and baked embeddings are present, the default behavior is to use the first speaker embedding. I thought this is what we had decided: with baked embeddings, we do not let users use any other voice.

I think we assume, and we should add a check to ensure this assumption is correct, that a model can have either baked context embeddings OR a context encoder, not both. So the path where speaker_indices is None is OK with me.
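The input validation the reviewers agree on could be sketched like this. The function and argument names are assumptions made for illustration, not the actual API:

```python
def select_context_path(context_audio, speaker_indices, has_baked_context_embedding):
    """Decide between baked embeddings and the context encoder.

    Enforces the thread's invariant: a caller supplies exactly one of
    context_audio or speaker_indices, and speaker_indices is only valid
    when the model actually carries baked embeddings.
    """
    if context_audio is not None and speaker_indices is not None:
        raise ValueError("Provide either context_audio or speaker_indices, not both.")
    if context_audio is None and speaker_indices is None:
        raise ValueError("One of context_audio or speaker_indices is required.")
    if speaker_indices is not None:
        if not has_baked_context_embedding:
            raise ValueError("speaker_indices given but model has no baked embeddings.")
        return "baked"
    return "context_encoder"
```

Failing loudly on the neither/both cases keeps a silently wrong voice from being synthesized, which is the concern behind the reviewer's comment.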
Unrelated to this release, there should be some inference path that allows a user to pre-compute and reuse context embeddings. So that when hosting the model, it is not necessary to recompute the same context for every request. It would be nice if that use case and the baked context embeddings could be consolidated, but we can revisit that in the later PR which defines the actual inference functions/APIs.
Squashed commit messages:

* Modularize magpie inference code, move inference code from scripts to example
* Modify magpie CI with inference changes
* Renaming magpietts inference scripts from magpie to magpietts
* infer_batch returns dataclass object
* Fixed context embedding without context encoder
* Apply isort and black reformatting (applied after several of the commits below)
* Remove unnecessary configurations: removed multiple long manifest configurations from evalset_config.py
* Removing unused imports
* Copilot suggested changes
* Move inference helper modules from examples to tts collection
* Review changes
* Changes suggested in compute_mean_with_confidence_interval
* Linting issue
* support multiple voices - baked context embeddings
* move evalset_config to json
* simplifying get_baked_context_embeddings_batch
* Fix logging
* Remove unused imports
* logging changes
* Changed baked_context_embeddings from tensor to flattened embeddings, print->logging
* Removed comments and print statements

Signed-off-by: subhankar-ghosh <subhankar2321@gmail.com>
Signed-off-by: Subhankar Ghosh <subhankarg@nvidia.com>
Signed-off-by: Akhil Varanasi <akhilvaranasi23@gmail.com>
Co-authored-by: subhankar-ghosh <subhankar-ghosh@users.noreply.github.com>
Important
The "Update branch" button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Copy of [TTS] MagpieTTS Inference Changes #15158
After rebasing with main, git showed the PR had 231 files changed, which was wrong, so I created a fresh PR. This one shows the correct number of files changed.
Collection: [Note which collection this PR will affect]
Changelog
1. Refactored inference script:
   a) Modularized code
   b) Separated inference and evaluation logic
   c) Separated different configs
   d) Lays the groundwork for longform inference
2. infer_batch now returns a dataclass object
3. Moved inference code from scripts to examples
4. Added fixed (baked) context embeddings as an alternative to computing them with the context encoder, for checkpoints that do not have a context encoder
5. Removed evalset_config.py and replaced it with evalset.json
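To illustrate the "infer_batch returns a dataclass" item in the changelog, the returned object might look something like the sketch below. The class and field names are assumptions for illustration, not the actual NeMo definition:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class InferBatchOutput:
    """Bundles what infer_batch used to return as a loose tuple, so new
    fields can be added later without breaking existing call sites."""

    audio: List[object]                    # generated waveforms, one per batch item
    audio_lens: List[int]                  # valid sample count of each waveform
    texts: List[str] = field(default_factory=list)  # input texts, if echoed back
    rtf: Optional[float] = None            # real-time factor, if measured
```

Named fields make call sites self-documenting (`out.audio_lens` rather than `out[1]`), which is the usual motivation for this kind of refactor.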
Usage
# Add a code snippet demonstrating how to use this

GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs to various areas.
Additional Information