
Add SpeechLLM docs #9780

Merged: 10 commits merged into main on Jul 23, 2024
Conversation

stevehuang52 (Collaborator)

What does this PR do?

Add docs to SpeechLLM

Collection: [multimodal]

Signed-off-by: stevehuang52 <heh@nvidia.com>
zhehuaichen previously approved these changes Jul 18, 2024

@zhehuaichen (Collaborator) left a comment:

Great work! Thank you so much!

docs/source/multimodal/speech_llm/configs.rst
stevehuang52 and others added 2 commits July 18, 2024 13:16
Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
zhehuaichen previously approved these changes Jul 23, 2024

@titu1994 (Collaborator) left a comment:

Requires images to be moved to the GH release; the rest are all minor comments.

docs/source/multimodal/speech_llm/configs.rst
"answer": "the transcription of the audio", # optional for inference, default to "na" in dataloader
}
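For context, a manifest is a JSON-lines file with one entry per line. A minimal sketch of writing such an entry, assuming the usual NeMo-style `audio_filepath`/`duration` fields alongside the `context` and `answer` fields discussed here; the path and values are hypothetical:

```python
import json

# Hypothetical manifest entry; "context" is optional (see the context_file
# discussion below) and "answer" is optional at inference, where the
# dataloader defaults it to "na".
entry = {
    "audio_filepath": "/data/audio/sample_0001.wav",  # hypothetical path
    "duration": 4.7,                                   # seconds
    "context": "what does the audio mean?",            # prompt for this sample
    "answer": "the transcription of the audio",
}

# Manifests are JSON-lines files: one JSON object per line.
with open("train_manifest.json", "a", encoding="utf-8") as f:
    f.write(json.dumps(entry) + "\n")
```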


Collaborator left a comment:

We support more variations of "what does the audio mean" now, right?

The `context` field in the manifest is optional. Instead, you can put a list of contexts in a context file (one context per line) and set `++model.data.train_ds.context_file=<path to context file>` so that the dataloader randomly picks a context from the file for each audio sample. This is useful for training with multiple prompts for the same task. If neither the `context` field nor `context_file` is provided, the dataloader uses the default context `what does the audio mean?` for all audio samples. During inference, it is recommended to include the `context` field in the manifest.
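For illustration, a minimal sketch of creating such a context file; the filename and prompts are hypothetical, and the override shown in the comment mirrors the `context_file` option described above:

```python
# Write one candidate context per line; the dataloader picks one at random
# per audio sample when the manifest entry has no "context" field.
prompts = [
    "what does the audio mean?",
    "transcribe the audio",
    "write down what is said in the recording",
]
with open("asr_contexts.txt", "w", encoding="utf-8") as f:  # hypothetical filename
    f.write("\n".join(prompts) + "\n")

# Then pass it on the command line, e.g.:
#   ++model.data.train_ds.context_file=asr_contexts.txt
```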

Customizing the fields to use
Collaborator left a comment:

Note that the use of `prompt_template` here conflicts with the Canary model's (and SpeechLM's) PromptFormatter class, which also uses a `model.cfg.prompt_format` called Canary. Just a note.

docs/source/multimodal/speech_llm/datasets.rst
------------------------------


In order to use a context file, you can set `++model.data.train_ds.context_file=<path to context file>` on the command line, or use multiple context files with `++model.data.train_ds.context_file=[<path to context file1>,<path to context file2>,...]`. If the number of context files equals the number of provided datasets, the dataloader assigns each context file to a dataset; otherwise, it randomly picks a context file from all provided context files for each audio sample. Using multiple context files is useful for training on multiple tasks, where each task has its own set of prompts. Meanwhile, you can control the weights of different tasks/datasets by using concatenated tarred datasets, where you can assign weights to datasets by:
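A hedged sketch of what such a weighted multi-dataset setup might look like, expressed with OmegaConf; the key names (notably `concat_sampling_probabilities`) and paths are illustrative assumptions, not confirmed by this PR:

```python
from omegaconf import OmegaConf

# Hypothetical train_ds fragment: two tarred datasets with per-dataset weights
# and one context file per dataset (so each task samples its own prompts).
train_ds = OmegaConf.create({
    "manifest_filepath": [
        "/data/asr_tarred/tarred_audio_manifest.json",  # hypothetical path
        "/data/ast_tarred/tarred_audio_manifest.json",  # hypothetical path
    ],
    "concat_sampling_probabilities": [0.7, 0.3],  # assumed key for dataset weights
    "context_file": ["asr_contexts.txt", "ast_contexts.txt"],
})
print(OmegaConf.to_yaml(train_ds))
```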
Collaborator left a comment:

What if the task and the context are wildly different during sampling? I.e., for ASR and AST?

Collaborator (Author) replied:

Each dataset can have its own list of context files, so that ASR and AST can sample from their own pools separately.
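For example, each task's prompt pool can live in its own file; a minimal sketch where the file names and prompts are hypothetical:

```python
# Hypothetical per-task prompt pools: ASR prompts and AST (translation) prompts
# live in separate files, matched positionally to the ASR and AST datasets.
pools = {
    "asr_contexts.txt": ["transcribe the audio",
                         "what does the audio say?"],
    "ast_contexts.txt": ["translate the audio into German",
                         "what is the German translation of this speech?"],
}
for filename, prompts in pools.items():
    with open(filename, "w", encoding="utf-8") as f:
        f.write("\n".join(prompts) + "\n")

# ++model.data.train_ds.context_file=[asr_contexts.txt,ast_contexts.txt]
```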

Collaborator left a comment:

Cool, is this mentioned somewhere else?

Collaborator left a comment:

Don't add images to git. Upload the files to the latest release and put the URL in the rst.

Collaborator left a comment:

Same here.

Signed-off-by: stevehuang52 <heh@nvidia.com>

@titu1994 merged commit 9c06389 into main on Jul 23, 2024
12 checks passed
@titu1994 deleted the add_speechlm_docs branch on July 23, 2024 19:11
tonyjie pushed a commit to tonyjie/NeMo that referenced this pull request Jul 24, 2024
* add docs

Signed-off-by: stevehuang52 <heh@nvidia.com>

* add lhotse specific info

Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>

* move images to github release 1.23

Signed-off-by: stevehuang52 <heh@nvidia.com>

* clean up

Signed-off-by: stevehuang52 <heh@nvidia.com>

---------

Signed-off-by: stevehuang52 <heh@nvidia.com>
Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>
Co-authored-by: zhehuaichen <dian.chenzhehuai@gmail.com>
akoumpa pushed a commit that referenced this pull request Jul 25, 2024
BoxiangW pushed a commit to BoxiangW/NeMo that referenced this pull request Jul 30, 2024
xuanzic pushed a commit to xuanzic/NeMo that referenced this pull request Aug 1, 2024
kchike pushed a commit to kchike/NeMo that referenced this pull request Aug 8, 2024
monica-sekoyan pushed a commit that referenced this pull request Oct 14, 2024