Spoken Language Identification #4846

Merged (11 commits) on Aug 31, 2022

Collaborator

@fayejf fayejf commented Aug 31, 2022

What does this PR do ?

Add training script and config (titanet) for spoken language identification.

Collection: ASR

Changelog

  • Calculate class weights so that weighted cross entropy can be used on an imbalanced training set.
  • Add val_acc_macro and val_auroc for model selection. Support test_acc_macro and test_auroc as well.
  • Extend speech_to_label.py to support language identification.
  • Add model config file which achieves SOTA performance on voxlingua107.
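
To make the first changelog item concrete, here is a minimal sketch of one common way to derive class weights for weighted cross entropy: inverse-frequency weighting, where each class gets weight N / (K * count) for N samples and K classes. The exact formula used in this PR is not shown in this thread, so treat this as an assumption; the function name inverse_frequency_weights is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {label: weight} with weight = N / (K * count).

    N is the number of samples and K the number of distinct classes.
    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    num_classes = len(counts)
    return {
        label: total / (num_classes * count)
        for label, count in counts.items()
    }


# Toy imbalanced set: three "en" utterances and one "de" utterance.
weights = inverse_frequency_weights(["en", "en", "en", "de"])
print(weights["de"])  # 2.0
```

With this scheme a perfectly balanced set yields a weight of 1.0 for every class, so the weighted loss reduces to the unweighted one.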

Usage

python speech_to_label.py  --config-path="../conf/lang_id" --config-name="titanet_large" \
model.train_ds.manifest_filepath=<train_manifest> \
model.validation_ds.manifest_filepath=<dev_manifest> \
model.train_ds.augmentor.noise.manifest_path=<noise_manifest>  \
model.train_ds.augmentor.impulse.manifest_path=<impulse_manifest>  \
exp_manager.wandb_logger_kwargs.name="titanet_large" \
exp_manager.wandb_logger_kwargs.project="lang_id" \
+exp_manager.checkpoint_callback_params.monitor="val_acc_macro" \
+exp_manager.checkpoint_callback_params.mode="max" 
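
The command above selects checkpoints on val_acc_macro with mode "max". As a rough illustration of why a macro-averaged metric suits an imbalanced set, the sketch below compares plain (micro) accuracy with macro accuracy, taken here as the unweighted mean of per-class accuracy. That definition is an assumption for illustration, not the NeMo implementation, and both helper names are hypothetical.

```python
from collections import defaultdict


def micro_accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-class accuracy (every class counts equally)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


# A degenerate model that always predicts the majority language.
y_true = ["en"] * 8 + ["de"] * 2
y_pred = ["en"] * 10

print(micro_accuracy(y_true, y_pred))  # 0.8
print(macro_accuracy(y_true, y_pred))  # 0.5
```

The majority-only predictor looks acceptable under micro accuracy but is penalized under macro accuracy, which is the behavior one wants when monitoring a model trained on skewed language counts.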

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
    [doc will be updated in next PR]
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: fayejf <fayejf07@gmail.com>

lgtm-com bot commented Aug 31, 2022

This pull request introduces 2 alerts when merging e0299de into d19146c - view on LGTM.com

new alerts:

  • 2 for Variable defined multiple times

@fayejf fayejf requested a review from nithinraok August 31, 2022 01:07
Collaborator

@nithinraok nithinraok left a comment

Jenkins is failing, you would need to fix it.

Did you get approval to release the model? If so, you may also need to add it to the pretrained models list.

min_lr: 0.0001

trainer:
  devices: 2 # number of gpus (original titanet-large was trained on 4 nodes with 8 gpus each)
Collaborator

Should this comment be updated, or is it still accurate?

Collaborator Author

it's the same.

logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}')

trainer = pl.Trainer(**cfg.trainer)
exp_manager(trainer, cfg.get("exp_manager", None))
asr_model = EncDecClassificationModel(cfg=cfg.model, trainer=trainer)

if cfg.name == 'TitaNet':
Collaborator

Lowercase the name and check whether 'titanet' is in cfg.name.lower(), so that users can have their own config name with titanet appended or prepended to it.

Collaborator Author

yeah good point. Updated it.

Collaborator Author

This is a temporary workaround. I didn't add task conditions here because we will refactor the model later.
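
A minimal sketch of the check suggested in this thread: match the config name case-insensitively by substring rather than by exact equality. The helper name pick_model_class_name is hypothetical, and it returns class names as strings so that the example runs without NeMo installed; the two class names are the ones that appear in the diff.

```python
def pick_model_class_name(config_name: str) -> str:
    """Return the name of the model class to build for a given config name."""
    # Case-insensitive substring match, so "TitaNet", "titanet_large" and
    # "my_TitaNet_langid" all select the speaker-label model.
    if "titanet" in config_name.lower():
        return "EncDecSpeakerLabelModel"
    # Everything else falls back to the generic classification model.
    return "EncDecClassificationModel"


print(pick_model_class_name("my_TitaNet_langid"))  # EncDecSpeakerLabelModel
print(pick_model_class_name("MatchboxNet"))        # EncDecClassificationModel
```

An exact comparison such as cfg.name == 'TitaNet' would silently fall through to the classification model for any renamed config, which is the failure mode the reviewer is pointing at.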

asr_model = EncDecClassificationModel(cfg=cfg.model, trainer=trainer)

if cfg.name == 'TitaNet':
    the_model = EncDecSpeakerLabelModel(cfg=cfg.model, trainer=trainer)
Collaborator

Minor: maybe just model is enough?

Collaborator Author

yep


labels.append(item['label'])
Collaborator

you meant to write labels.append(label) here I guess?

Collaborator Author

good catch!!!!
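
The surrounding code is not visible in this thread, so the following is only a sketch of the kind of loop the review comment refers to. It assumes a JSON-lines manifest with a "label" field per entry and a local variable label derived from it; the normalization step and the function name read_labels are purely illustrative.

```python
import json


def read_labels(manifest_lines):
    """Collect one label per manifest entry (one JSON object per line)."""
    labels = []
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        item = json.loads(line)
        # Illustrative normalization only (an assumption, not from the PR).
        label = item["label"].strip().lower()
        # Append the processed local variable, not the raw item["label"].
        labels.append(label)
    return labels


manifest = [
    '{"audio_filepath": "a.wav", "duration": 3.2, "label": " EN "}',
    '{"audio_filepath": "b.wav", "duration": 2.7, "label": "de"}',
]
print(read_labels(manifest))  # ['en', 'de']
```

Appending item["label"] instead of label would discard whatever processing was applied to the local variable, which is why the two spellings are not interchangeable.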

fayejf and others added 3 commits August 31, 2022 00:18
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
Collaborator Author

fayejf commented Aug 31, 2022

Did you get approval to release the model? If so, you may also need to add it to the pretrained models list.

Yes, I will submit an nvbug to publish the model and add the link in the bug-fix branch. I may also need to update the suggested optim in the YAML file then.

fayejf and others added 3 commits August 31, 2022 02:05
Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: fayejf <fayejf07@gmail.com>
@fayejf fayejf requested a review from nithinraok August 31, 2022 16:10
Collaborator

@nithinraok nithinraok left a comment

LGTM

@fayejf fayejf merged commit e8ba60b into main Aug 31, 2022
@fayejf fayejf deleted the lang_id_pr branch August 31, 2022 17:44
Jorjeous pushed a commit that referenced this pull request Aug 31, 2022
* data and cal weight

Signed-off-by: fayejf <fayejf07@gmail.com>

* add config yaml file

Signed-off-by: fayejf <fayejf07@gmail.com>

* remove impulse for simplicity

Signed-off-by: fayejf <fayejf07@gmail.com>

* add langid to speech class train script

Signed-off-by: fayejf <fayejf07@gmail.com>

* style fix

Signed-off-by: fayejf <fayejf07@gmail.com>

* auroc and marco acc for val and test

Signed-off-by: fayejf <fayejf07@gmail.com>

* reflect nithin's comment and fix test/ci

Signed-off-by: fayejf <fayejf07@gmail.com>

* bring back impulse

Signed-off-by: fayejf <fayejf07@gmail.com>

* fix test

Signed-off-by: fayejf <fayejf07@gmail.com>

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>
Jorjeous pushed a commit that referenced this pull request Sep 8, 2022
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

guillermo-gabrielli-fer commented Sep 14, 2022

Are the model weights publicly available?

Collaborator Author

fayejf commented Sep 14, 2022

@guillermo-gabrielli-fer The model will likely be publicly available next week.

Collaborator Author

fayejf commented Sep 27, 2022

@guillermo-gabrielli-fer We now have a better version of the lang id model and are in the process of evaluating and publishing it. It will be published ASAP, and I will let you know once it's done. Thanks for your patience.

jubick1337 pushed a commit to jubick1337/NeMo that referenced this pull request Oct 3, 2022
Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
Collaborator Author

fayejf commented Oct 5, 2022

@guillermo-gabrielli-fer The model is published (see #5080). Thanks for your patience.


Oscaarjs commented Oct 7, 2022

@fayejf Thanks for releasing the model. Is there currently any function for pure inference on an audio file, similar to e.g. "verify_speakers", which can be used with Titanet?

@nithinraok
Collaborator

You may use this function:

def get_embedding(self, path2audio_file):

with a slight modification so that it returns the logits as well. Once you have the logits, taking the argmax over the class scores gives the predicted label index, which can be mapped to the corresponding label in model.labels.
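
The last step described here, turning logits into a label, can be sketched without NeMo. The example assumes the logits for one audio file are a flat sequence with one score per class, in the same order as the label list; predict_label is a hypothetical helper, not an existing NeMo function.

```python
def predict_label(logits, labels):
    """Map one utterance's class logits to its most likely label."""
    if len(logits) != len(labels):
        raise ValueError("expected exactly one logit per label")
    # Argmax over the class scores: index of the largest logit.
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best_index]


# Hypothetical label list and logits for a single utterance.
labels = ["de", "en", "fr"]
logits = [0.1, 2.3, -1.0]
print(predict_label(logits, labels))  # en
```

Because softmax is monotonic, taking the argmax of the raw logits gives the same label as taking it over the softmax probabilities.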

Collaborator

titu1994 commented Oct 7, 2022

@nithinraok Might as well make this a function publicly available in the Titanet model itself.

@nithinraok
Collaborator

Yes will create one.

Collaborator Author

fayejf commented Oct 8, 2022

Good point, guys. I will add a general infer/predict function to the classification models and label models.

hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
Signed-off-by: Hainan Xu <hainanx@nvidia.com>