Spoken Language Identification #4846

fayejf · 2022-08-31T00:52:17Z

What does this PR do?

Add training script and config (titanet) for spoken language identification.

Collection: ASR

Changelog

Calculate class weights so that weighted cross entropy can be used with an imbalanced training set.
Add val_acc_macro and val_auroc for model selection. Support test_acc_macro and test_auroc as well.
Extend speech_to_label.py to support language identification.
Add a model config file which achieves SOTA performance on VoxLingua107.
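To illustrate the first changelog item, here is a minimal sketch of one common way to derive per-class weights for weighted cross entropy. It uses the "balanced" heuristic weight = N / (K * count), which is an assumption for illustration and not necessarily the exact formula this PR implements. The function name balanced_class_weights and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} using weight = N / (K * count[label]).

    N is the number of samples and K the number of distinct classes,
    so rare classes receive proportionally larger weights.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * count) for label, count in counts.items()}


# Hypothetical, tiny imbalanced training set of language labels.
train_labels = ["en"] * 6 + ["de"] * 3 + ["fi"] * 1
weights = balanced_class_weights(train_labels)
for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.3f}")
```

Weights like these, ordered by class index, could then be passed as the weight argument of torch.nn.CrossEntropyLoss so that errors on rare languages cost more than errors on frequent ones.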

Usage

python speech_to_label.py --config-path="../conf/lang_id" --config-name="titanet_large" \
model.train_ds.manifest_filepath=<train_manifest> \
model.validation_ds.manifest_filepath=<dev_manifest> \
model.train_ds.augmentor.noise.manifest_path=<noise_manifest> \
model.train_ds.augmentor.impulse.manifest_path=<impulse_manifest> \
exp_manager.wandb_logger_kwargs.name="titanet_large" \
exp_manager.wandb_logger_kwargs.project="lang_id" \
+exp_manager.checkpoint_callback_params.monitor="val_acc_macro" \
+exp_manager.checkpoint_callback_params.mode="max"
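The command above selects checkpoints on val_acc_macro rather than plain accuracy. The following sketch shows why that matters on an imbalanced set: macro accuracy averages per-class accuracy, so a model that ignores a rare language is penalized. The helper names and toy data are hypothetical, and this is not the metric implementation used in the PR.

```python
from collections import defaultdict


def macro_accuracy(y_true, y_pred):
    """Average of per-class accuracies (each class counts equally)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == pred)
    per_class = [correct[c] / total[c] for c in total]
    return sum(per_class) / len(per_class)


def micro_accuracy(y_true, y_pred):
    """Fraction of all samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical validation set: 8 English clips, 2 Finnish clips,
# and a degenerate model that always predicts English.
y_true = ["en"] * 8 + ["fi"] * 2
y_pred = ["en"] * 10

print(micro_accuracy(y_true, y_pred))  # 0.8
print(macro_accuracy(y_true, y_pred))  # 0.5
```

With mode="max", the checkpoint callback keeps the checkpoint whose monitored value is highest, so monitoring the macro metric favors models that do well across all languages.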

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
[doc will be updated in next PR]
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

Related to # (issue)

Signed-off-by: fayejf <fayejf07@gmail.com>

lgtm-com · 2022-08-31T01:03:03Z

This pull request introduces 2 alerts when merging e0299de into d19146c - view on LGTM.com

new alerts:

2 for Variable defined multiple times

nithinraok

Jenkins is failing; you will need to fix it.

Did you get approval to release the model? If yes, you might need to add it to the pretrained models list as well.

nithinraok · 2022-08-31T04:04:00Z

examples/asr/conf/lang_id/titanet_large.yaml

+      min_lr: 0.0001
+
+trainer:
+  devices: 2 # number of gpus (original titanet-large was trained on 4 nodes with 8 gpus each)


Should this comment be updated, or is it still the same?

It's the same.

nithinraok · 2022-08-31T04:05:49Z

examples/asr/speech_classification/speech_to_label.py

    logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}')

    trainer = pl.Trainer(**cfg.trainer)
    exp_manager(trainer, cfg.get("exp_manager", None))
-    asr_model = EncDecClassificationModel(cfg=cfg.model, trainer=trainer)
+
+    if cfg.name == 'TitaNet':


Add lower() and check whether 'titanet' is in cfg.name.lower(), so that users can have their own config names with titanet appended or prepended.

Yeah, good point. Updated it.

This is a temporary trick. I didn't add task conditions here because we will refactor the model later.
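The suggestion above can be sketched as a small helper. The function select_model_class_name is hypothetical and returns class names as strings so that the example runs without NeMo installed; the two class names are the ones that appear in the diff, and "MatchboxNet" is only an example of a non-TitaNet config name.

```python
def select_model_class_name(cfg_name: str) -> str:
    """Pick the model class based on a case-insensitive substring match.

    Matching on 'titanet' anywhere in the name lets users keep custom
    config names such as 'my_titanet_large' or 'TitaNet-LangID'.
    """
    if "titanet" in cfg_name.lower():
        return "EncDecSpeakerLabelModel"
    return "EncDecClassificationModel"


for name in ["TitaNet", "my_titanet_large", "MatchboxNet"]:
    print(name, "->", select_model_class_name(name))
```

An exact comparison such as cfg.name == 'TitaNet' would send the second name to the classification model, which is the case the reviewer wanted to avoid.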

nithinraok · 2022-08-31T04:06:24Z

examples/asr/speech_classification/speech_to_label.py

-    asr_model = EncDecClassificationModel(cfg=cfg.model, trainer=trainer)
+
+    if cfg.name == 'TitaNet':
+        the_model = EncDecSpeakerLabelModel(cfg=cfg.model, trainer=trainer)


Minor: maybe just model is enough?

nithinraok · 2022-08-31T04:16:00Z

nemo/collections/common/parts/preprocessing/collections.py


+            labels.append(item['label'])


You meant to write labels.append(label) here, I guess?

good catch!!!!

Signed-off-by: fayejf <fayejf07@gmail.com>

fayejf · 2022-08-31T08:59:01Z

Did you get approval to release the model? If yes, you might need to add it to the pretrained models list as well.

Yes, I will submit an nvbug to publish the model and add the link in the bug fix branch. I will possibly also need to update the suggested optim in the yaml file then.

Signed-off-by: fayejf <fayejf07@gmail.com>

nithinraok

LGTM

* data and cal weight
* add config yaml file
* remove impulse for simplicity
* add langid to speech class train script
* style fix
* auroc and marco acc for val and test
* reflect nithin's comment and fix test/ci
* bring back impulse
* fix test

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

guillermo-gabrielli-fer · 2022-09-14T12:42:06Z

Are the model weights publicly available?

fayejf · 2022-09-14T17:12:42Z

@guillermo-gabrielli-fer The model will possibly be publicly available next week.

fayejf · 2022-09-27T00:37:31Z

@guillermo-gabrielli-fer We now have a better version of the lang id model and are in the process of evaluating and publishing it. It will be published ASAP and I will let you know once it's done. Thanks for your patience.

* data and cal weight
* add config yaml file
* remove impulse for simplicity
* add langid to speech class train script
* style fix
* auroc and marco acc for val and test
* reflect nithin's comment and fix test/ci
* bring back impulse
* fix test

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>

fayejf · 2022-10-05T17:11:59Z

@guillermo-gabrielli-fer The model is published. Thanks for your patience. #5080

Oscaarjs · 2022-10-07T09:11:50Z

@fayejf Thanks for releasing the model. Is there currently any function for pure inference on an audio file, similar to e.g. "verify_speakers", which can be used with TitaNet?

nithinraok · 2022-10-07T09:20:41Z

You may use this function:

NeMo/nemo/collections/asr/models/label_models.py

Line 413 in 7546e4e

def get_embedding(self, path2audio_file):

with a slight modification at this line to return the logits as well. Once you have the logits, taking the argmax over the class axis gives the output label index, which can be mapped to the corresponding label in model.labels.
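The final mapping step described above can be sketched as follows. This assumes the logits for one utterance have already been obtained (for example from a modified get_embedding, which is not shown) and form a flat sequence with one score per entry of model.labels. The helper logits_to_label and the toy values are hypothetical.

```python
def logits_to_label(logits, labels):
    """Return the label whose logit is largest.

    `logits` is a sequence with one score per class, in the same
    order as `labels` (assumed here to mirror model.labels).
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best_index]


# Hypothetical label set and logits for a single audio file.
labels = ["de", "en", "fi"]
logits = [0.1, 2.3, -1.0]
predicted = logits_to_label(logits, labels)
print(predicted)  # en
```

With a batch of logits as a tensor, the equivalent operation is an argmax along the class dimension followed by the same index lookup.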

titu1994 · 2022-10-07T16:55:51Z

@nithinraok might as well make this a function publicly available in the TitaNet model itself

nithinraok · 2022-10-07T18:37:37Z

Yes, will create one.

fayejf · 2022-10-08T01:33:01Z

Good point, guys. Will add a general infer/predict function to the classification models and label models.

* data and cal weight
* add config yaml file
* remove impulse for simplicity
* add langid to speech class train script
* style fix
* auroc and marco acc for val and test
* reflect nithin's comment and fix test/ci
* bring back impulse
* fix test

Signed-off-by: fayejf <fayejf07@gmail.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>

fayejf added 5 commits August 30, 2022 16:06

data and cal weight

b5895f9

Signed-off-by: fayejf <fayejf07@gmail.com>

add config yaml file

99d34e5

Signed-off-by: fayejf <fayejf07@gmail.com>

remove impulse for simplicity

e156fad

Signed-off-by: fayejf <fayejf07@gmail.com>

add langid to speech class train script

106b4cb

Signed-off-by: fayejf <fayejf07@gmail.com>

style fix

e0299de

Signed-off-by: fayejf <fayejf07@gmail.com>

fayejf requested a review from nithinraok August 31, 2022 01:07

nithinraok requested changes Aug 31, 2022

fayejf and others added 3 commits August 31, 2022 00:18

auroc and marco acc for val and test

6bc83b0

Signed-off-by: fayejf <fayejf07@gmail.com>

reflect nithin's comment and fix test/ci

927a1f9

Signed-off-by: fayejf <fayejf07@gmail.com>

Merge branch 'main' into lang_id_pr

e4bab5a

fayejf and others added 3 commits August 31, 2022 02:05

bring back impulse

ff4a678

Signed-off-by: fayejf <fayejf07@gmail.com>

fix test

3fd4e41

Signed-off-by: fayejf <fayejf07@gmail.com>

Merge branch 'main' into lang_id_pr

8982c2c

fayejf requested a review from nithinraok August 31, 2022 16:10

nithinraok approved these changes Aug 31, 2022

fayejf merged commit e8ba60b into main Aug 31, 2022

fayejf deleted the lang_id_pr branch August 31, 2022 17:44
