[do-not-merge] SpeechLLM dev branch #9474
Conversation
Commits:
- predict (Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>)
- …omized_round_robin (Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>)
- …own batch settings that can be merged with the zip sampler to get the maximum batch size for both modalities in a single training step. Each modality runs fwd+bwd in turn to save GPU memory, instead of running the forwards separately and the backwards together; see the sketch after this list. (Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>)
- Remaining commits: Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>, Piotr Żelasko <petezor@gmail.com>, pzelasko <pzelasko@users.noreply.github.com>
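The batch-settings commit above describes the alternating schedule in prose. Here is a minimal sketch of that idea, assuming a plain PyTorch loop in which `model` returns a scalar loss and the zipped batch maps modality names to sub-batches; all names are hypothetical stand-ins, not the branch's actual code:

```python
import torch

def train_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer, zipped_batch: dict) -> float:
    """One optimizer step over a zipped multi-modal batch.

    Each modality runs forward + backward in turn, so only one modality's
    activations are alive at any moment; gradients accumulate across the
    per-modality backward calls and are applied in a single optimizer step.
    """
    optimizer.zero_grad()
    total_loss = 0.0
    # e.g. zipped_batch = {"audio": audio_subbatch, "text": text_subbatch}
    for modality, sub_batch in zipped_batch.items():
        loss = model(sub_batch)  # hypothetical: model returns a scalar loss
        loss.backward()          # frees this modality's activations right away
        total_loss += loss.item()
    optimizer.step()
    return total_loss
```

Running backward immediately after each modality's forward keeps peak activation memory at the level of the larger single-modality batch, rather than the sum of both.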
# elif cur_idx + tokenized_len < tgt_len:
#     # Check whether the mask is applied to the correct position; the first token is the turn-start token
#     if not torch.equal(target[cur_idx + 1 : cur_idx + tokenized_len], s_id[1:]):
#         logging.warning("a sentence mismatches the corresponding piece in the conversation")
Check notice (Code scanning / CodeQL): Commented-out code
audio_batch = {k: v for k, v in batch.items() if not k.startswith("text_")}
text_batch = {k: v for k, v in batch.items() if k.startswith("text_")}

output, loss_mask = None, None

Check warning (Code scanning / CodeQL): Variable defined multiple times (redefined)
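The warning reports that `output` and `loss_mask` have more than one definition site. A hedged sketch of one conventional fix, keeping the batch split from the diff but giving the pair a single initialization that later branches only reassign; `forward_audio` and `forward_text` are hypothetical callables, not helpers from this PR:

```python
from typing import Callable, Dict, Optional, Tuple
import torch

TensorPair = Tuple[torch.Tensor, torch.Tensor]

def forward_any_modality(
    batch: Dict[str, torch.Tensor],
    forward_audio: Callable[[Dict[str, torch.Tensor]], TensorPair],
    forward_text: Callable[[Dict[str, torch.Tensor]], TensorPair],
) -> Tuple[Optional[torch.Tensor], Optional[torch.Tensor]]:
    # Split the mixed batch by the "text_" key-prefix convention from the diff.
    audio_batch = {k: v for k, v in batch.items() if not k.startswith("text_")}
    text_batch = {k: v for k, v in batch.items() if k.startswith("text_")}
    output, loss_mask = None, None  # single definition site; branches reassign
    if audio_batch:
        output, loss_mask = forward_audio(audio_batch)
    if text_batch:
        output, loss_mask = forward_text(text_batch)
    return output, loss_mask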
@@ -15,28 +15,30 @@
 import warnings
 from dataclasses import dataclass
 from functools import partial
-from typing import Any, Optional, TypeVar, Union
+from typing import Any, List, Optional, TypeVar, Union

Check notice (Code scanning / CodeQL): Unused import
 from lhotse.lazy import LazyFlattener
 from lhotse.utils import fastcopy, fix_random_seed
-from omegaconf import DictConfig, OmegaConf
+from omegaconf import DictConfig, ListConfig, OmegaConf

Check notice (Code scanning / CodeQL): Unused import
@@ -1,7 +1,11 @@
 from typing import Optional

Check notice (Code scanning / CodeQL): Unused import
from lhotse.utils import Pathlike

from nemo.collections.common.data.lhotse.nemo_adapters import expand_sharded_filepaths
from nemo.collections.common.tokenizers.aggregate_tokenizer import AggregateTokenizer, TokenizerWrapper
from nemo.collections.common.tokenizers.tokenizer_spec import TokenizerSpec
from nemo.utils import logging

Check notice (Code scanning / CodeQL): Unused import
def forward(
    self,
    batch,
    checkpoint_activations_all_layers,
):

Check warning (Code scanning / CodeQL): Signature mismatch in overriding method (overridden method)
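CodeQL flags that this `forward` no longer matches the signature of the method it overrides. A generic sketch of the compatible-override pattern; `Base` and `Derived` are illustrative stand-ins, not NeMo classes:

```python
class Base:
    def forward(self, batch, checkpoint_activations_all_layers=None, **kwargs):
        raise NotImplementedError

class Derived(Base):
    # Keep the parameter list of Base.forward so code holding a Base
    # reference can call forward(...) on a Derived instance unchanged;
    # new options enter as keyword arguments with defaults.
    def forward(self, batch, checkpoint_activations_all_layers=None, **kwargs):
        return batch  # placeholder body for the sketch
```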
 # if torch.distributed.is_initialized():
 #     global_max_len = torch.tensor([seq_length], dtype=torch.float32, device=device)

-# Update across all ranks in the distributed system
-torch.distributed.all_reduce(global_max_len, op=torch.distributed.ReduceOp.MAX)
+# # Update across all ranks in the distributed system
+# torch.distributed.all_reduce(global_max_len, op=torch.distributed.ReduceOp.MAX)

-seq_length = global_max_len.int().item()
+# seq_length = global_max_len.int().item()

Check notice (Code scanning / CodeQL): Commented-out code
# if log_token_counts:
#     self.log('seq_length_padded', seq_length, prog_bar=True, batch_size=1)
#     self.log('tokens_avg', token_count_avg, prog_bar=True, sync_dist=True, batch_size=1)

Check notice (Code scanning / CodeQL): Commented-out code
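If these metrics were re-enabled, a minimal sketch of the same PyTorch Lightning logging pattern, guarded by the `log_token_counts` flag, might look like this; the metric names mirror the snippet, while the module and batch layout are hypothetical:

```python
import pytorch_lightning as pl
import torch

class SketchModule(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical loss helper
        if getattr(self, "log_token_counts", False):
            seq_length = batch["tokens"].shape[1]  # padded length of this batch
            token_count_avg = batch["loss_mask"].sum() / batch["tokens"].shape[0]
            self.log('seq_length_padded', seq_length, prog_bar=True, batch_size=1)
            self.log('tokens_avg', token_count_avg, prog_bar=True, sync_dist=True, batch_size=1)
        return loss

    def compute_loss(self, batch):
        return torch.tensor(0.0, requires_grad=True)  # placeholder for the sketch
```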
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, leave a comment, or push an update, or this PR will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
What does this PR do?
This PR tracks the changes on the speech-llm main development branch relative to main.
Collection: multimodal
Changelog
Usage
# Add a code snippet demonstrating how to use this
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information