update doc in terms of get_label for lang id model #5366

Merged (4 commits, Nov 16, 2022)
11 changes: 9 additions & 2 deletions docs/source/asr/speaker_recognition/results.rst
@@ -61,7 +61,7 @@ For extracting embeddings from a single file:
.. code-block:: python

speaker_model = EncDecSpeakerLabelModel.from_pretrained(model_name="<pretrained_model_name or path/to/nemo/file>")
-embs = speaker_model.get_embedding('audio_path')
+embs = speaker_model.get_embedding('<audio_path>')

For extracting embeddings from a bunch of files:

@@ -78,7 +78,14 @@ This Python call will download the best pretrained model from NGC and write embeddings
.. code-block:: bash

python examples/speaker_tasks/recognition/extract_speaker_embeddings.py --manifest=manifest.json


or you can run `batch_inference()` to perform inference on the manifest with a selected batch size to get embeddings:

.. code-block:: python

speaker_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(model_name="<pretrained_model_name or path/to/nemo/file>")
embs, logits, gt_labels, mapped_labels = speaker_model.batch_inference(manifest, batch_size=32)
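
Both the script and `batch_inference()` read a NeMo-style JSON-lines manifest. A minimal sketch of writing one follows; the `audio_filepath`, `offset`, `duration`, and `label` field names follow common NeMo manifest conventions and should be adjusted to your setup:

.. code-block:: python

    import json

    def write_manifest(audio_paths, manifest_path="manifest.json"):
        """Write one JSON object per line, as NeMo manifest readers expect."""
        with open(manifest_path, "w") as f:
            for path in audio_paths:
                entry = {
                    "audio_filepath": path,
                    "offset": 0,
                    "duration": None,  # None lets the loader use the full file
                    "label": "infer",  # placeholder label for inference
                }
                f.write(json.dumps(entry) + "\n")

    write_manifest(["audio1.wav", "audio2.wav"])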

Speaker Verification Inference
------------------------------
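
Verification typically scores a pair of embeddings with cosine similarity and compares the score to a decision threshold. A minimal pure-Python sketch follows; the function names and the 0.7 threshold are illustrative assumptions, not the NeMo implementation or its tuned value:

.. code-block:: python

    import math

    def cosine_similarity(a, b):
        """Cosine similarity between two embedding vectors."""
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm

    def same_speaker(emb1, emb2, threshold=0.7):
        """Decide whether two embeddings belong to the same speaker."""
        return cosine_similarity(emb1, emb2) >= threshold

    print(same_speaker([1.0, 0.0], [1.0, 0.0]))  # identical embeddings -> True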

4 changes: 2 additions & 2 deletions docs/source/asr/speech_classification/models.rst
@@ -68,7 +68,7 @@ MarbleNet models can be instantiated using the :class:`~nemo.collections.asr.mod
AmberNet (Lang ID)
------------------

-AmberNet is an end-to-end neural network for language identification moden based on `TitanNet <../speaker_recognition/models.html#titanet>`__.
+AmberNet is an end-to-end neural network for language identification based on `TitaNet <../speaker_recognition/models.html#titanet>`__.

It can reach state-of-the art performance on the `Voxlingua107 dataset <http://bark.phon.ioc.ee/voxlingua107/>`_ while having significantly fewer parameters than similar models.
AmberNet models can be instantiated using the :class:`~nemo.collections.asr.models.EncDecSpeakerLabelModel` class.
@@ -81,4 +81,4 @@ References
.. bibliography:: ../asr_all.bib
:style: plain
:labelprefix: SC-MODELS
-:keyprefix: sc-models-
+:keyprefix: sc-models-
21 changes: 19 additions & 2 deletions docs/source/asr/speech_classification/results.rst
@@ -33,7 +33,7 @@ Transcribing/Inference

The audio files should be 16 kHz mono-channel WAV files.

-**Transcribe speech command segment:**
+`Transcribe speech command segment:`

You may perform inference and transcribe a sample of speech after loading the model by using its ``transcribe()`` method:

@@ -47,7 +47,7 @@ Setting argument ``logprobs`` to True would return the log probabilities instead
Learn how to fine tune on your own data or on subset classes in ``<NeMo_git_root>/tutorials/asr/Speech_Commands.ipynb``


-**Run VAD inference:**
+`Run VAD inference:`

.. code-block:: bash

@@ -72,6 +72,23 @@ Filtering:
- ``filter_speech_first`` to control whether to perform short speech segment deletion first.
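
The post-processing parameters above operate on (start, end) speech segments. For illustration, deleting short speech segments can be sketched as follows; the function name and segment representation are assumptions for this sketch, not the NeMo implementation:

.. code-block:: python

    def filter_short_speech(segments, min_duration=0.2):
        """Drop speech segments shorter than min_duration seconds.

        segments: list of (start, end) tuples in seconds.
        """
        return [(s, e) for s, e in segments if e - s >= min_duration]

    segs = [(0.0, 0.1), (0.5, 1.5), (2.0, 2.05)]
    print(filter_short_speech(segs))  # [(0.5, 1.5)]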


`Identify language of an utterance:`

You may load the model and identify the language of an audio file by using the `get_label()` method:
(Collaborator review comment: ``You may`` -> ``One can``)

.. code-block:: python

langid_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(model_name="<MODEL_NAME>")
lang = langid_model.get_label('<audio_path>')

or you can run `batch_inference()` to perform inference on a manifest with a selected batch size to get ``mapped_labels``:

.. code-block:: python

langid_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(model_name="<MODEL_NAME>")
lang_embs, logits, gt_labels, mapped_labels = langid_model.batch_inference(manifest_filepath, batch_size=32)
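
The ``mapped_labels`` returned above plausibly correspond to taking an argmax over each row of ``logits`` and mapping the winning index through the model's label list; an illustrative pure-Python sketch with toy values (not the NeMo internals):

.. code-block:: python

    def predict_labels(logits, labels):
        """Pick the highest-scoring class index per utterance, map it to a label."""
        preds = []
        for row in logits:
            idx = max(range(len(row)), key=row.__getitem__)
            preds.append(labels[idx])
        return preds

    labels = ["en", "de", "fr"]
    toy_logits = [[2.1, 0.3, -1.0], [0.2, 0.1, 3.3]]
    print(predict_labels(toy_logits, labels))  # ['en', 'fr']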


NGC Pretrained Checkpoints
--------------------------
