[do-not-merge] SpeechLLM dev branch #9474
Conversation
Commits:
- predict (Signed-off-by: zhehuaichen <dian.chenzhehuai@gmail.com>)
- …omized_round_robin (Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>)
- …own batch settings that can be merged with the zip sampler to get the maximum batch size for both modalities in a single training step. Each modality runs fwd+bwd in turn to save GPU memory, instead of running the forwards separately and the backwards together; see the sketch after this list. (Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>)
- Remaining commits: Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>, Piotr Żelasko <petezor@gmail.com>, pzelasko <pzelasko@users.noreply.github.com>
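The batch-settings commit above describes the alternating schedule in prose. Here is a minimal sketch of that idea, assuming a plain PyTorch loop in which `model` returns a scalar loss and the zipped batch maps modality names to sub-batches; all names are hypothetical stand-ins, not the branch's actual code:

```python
import torch

def train_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer, zipped_batch: dict) -> float:
    """One optimizer step over a zipped multi-modal batch.

    Each modality runs forward + backward in turn, so only one modality's
    activations are alive at any moment; gradients accumulate across the
    per-modality backward calls and are applied in a single optimizer step.
    """
    optimizer.zero_grad()
    total_loss = 0.0
    # e.g. zipped_batch = {"audio": audio_subbatch, "text": text_subbatch}
    for modality, sub_batch in zipped_batch.items():
        loss = model(sub_batch)  # hypothetical: model returns a scalar loss
        loss.backward()          # frees this modality's activations right away
        total_loss += loss.item()
    optimizer.step()
    return total_loss
```

Running backward immediately after each modality's forward keeps peak activation memory at the level of the larger single-modality batch, rather than the sum of both.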
# elif cur_idx + tokenized_len < tgt_len:
#     # Check whether the mask is applied to the correct position; the first token is the turn-start token
#     if not torch.equal(target[cur_idx + 1 : cur_idx + tokenized_len], s_id[1:]):
#         logging.warning("a sentence mismatches the corresponding piece in the conversation")
Check notice (Code scanning / CodeQL): Commented-out code
audio_batch = {k: v for k, v in batch.items() if not k.startswith("text_")}
text_batch = {k: v for k, v in batch.items() if k.startswith("text_")}

output, loss_mask = None, None

Check warning (Code scanning / CodeQL): Variable defined multiple times (redefined)
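The warning reports that `output` and `loss_mask` have more than one definition site. A hedged sketch of one conventional fix, keeping the batch split from the diff but giving the pair a single initialization that later branches only reassign; `forward_audio` and `forward_text` are hypothetical callables, not helpers from this PR:

```python
from typing import Callable, Dict, Optional, Tuple
import torch

TensorPair = Tuple[torch.Tensor, torch.Tensor]

def forward_any_modality(
    batch: Dict[str, torch.Tensor],
    forward_audio: Callable[[Dict[str, torch.Tensor]], TensorPair],
    forward_text: Callable[[Dict[str, torch.Tensor]], TensorPair],
) -> Tuple[Optional[torch.Tensor], Optional[torch.Tensor]]:
    # Split the mixed batch by the "text_" key-prefix convention from the diff.
    audio_batch = {k: v for k, v in batch.items() if not k.startswith("text_")}
    text_batch = {k: v for k, v in batch.items() if k.startswith("text_")}
    output, loss_mask = None, None  # single definition site; branches reassign
    if audio_batch:
        output, loss_mask = forward_audio(audio_batch)
    if text_batch:
        output, loss_mask = forward_text(text_batch)
    return output, loss_mask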
@@ -15,28 +15,30 @@
 import warnings
 from dataclasses import dataclass
 from functools import partial
-from typing import Any, Optional, TypeVar, Union
+from typing import Any, List, Optional, TypeVar, Union

Check notice (Code scanning / CodeQL): Unused import
 from lhotse.lazy import LazyFlattener
 from lhotse.utils import fastcopy, fix_random_seed
-from omegaconf import DictConfig, OmegaConf
+from omegaconf import DictConfig, ListConfig, OmegaConf

Check notice (Code scanning / CodeQL): Unused import
@@ -1,7 +1,11 @@
 from typing import Optional

Check notice (Code scanning / CodeQL): Unused import
from lhotse.utils import Pathlike

from nemo.collections.common.data.lhotse.nemo_adapters import expand_sharded_filepaths
from nemo.collections.common.tokenizers.aggregate_tokenizer import AggregateTokenizer, TokenizerWrapper
from nemo.collections.common.tokenizers.tokenizer_spec import TokenizerSpec
from nemo.utils import logging

Check notice (Code scanning / CodeQL): Unused import
def forward(
    self,
    batch,
    checkpoint_activations_all_layers,
):

Check warning (Code scanning / CodeQL): Signature mismatch in overriding method (overridden method)
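CodeQL flags that this `forward` no longer matches the signature of the method it overrides. A generic sketch of the compatible-override pattern; `Base` and `Derived` are illustrative stand-ins, not NeMo classes:

```python
class Base:
    def forward(self, batch, checkpoint_activations_all_layers=None, **kwargs):
        raise NotImplementedError

class Derived(Base):
    # Keep the parameter list of Base.forward so code holding a Base
    # reference can call forward(...) on a Derived instance unchanged;
    # new options enter as keyword arguments with defaults.
    def forward(self, batch, checkpoint_activations_all_layers=None, **kwargs):
        return batch  # placeholder body for the sketch
```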
 # if torch.distributed.is_initialized():
 #     global_max_len = torch.tensor([seq_length], dtype=torch.float32, device=device)

-# Update across all ranks in the distributed system
-torch.distributed.all_reduce(global_max_len, op=torch.distributed.ReduceOp.MAX)
+# # Update across all ranks in the distributed system
+# torch.distributed.all_reduce(global_max_len, op=torch.distributed.ReduceOp.MAX)

-seq_length = global_max_len.int().item()
+# seq_length = global_max_len.int().item()

Check notice (Code scanning / CodeQL): Commented-out code
# if log_token_counts:
#     self.log('seq_length_padded', seq_length, prog_bar=True, batch_size=1)
#     self.log('tokens_avg', token_count_avg, prog_bar=True, sync_dist=True, batch_size=1)

Check notice (Code scanning / CodeQL): Commented-out code
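If these metrics were re-enabled, a minimal sketch of the same PyTorch Lightning logging pattern, guarded by the `log_token_counts` flag, might look like this; the metric names mirror the snippet, while the module and batch layout are hypothetical:

```python
import pytorch_lightning as pl
import torch

class SketchModule(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical loss helper
        if getattr(self, "log_token_counts", False):
            seq_length = batch["tokens"].shape[1]  # padded length of this batch
            token_count_avg = batch["loss_mask"].sum() / batch["tokens"].shape[0]
            self.log('seq_length_padded', seq_length, prog_bar=True, batch_size=1)
            self.log('tokens_avg', token_count_avg, prog_bar=True, sync_dist=True, batch_size=1)
        return loss

    def compute_loss(self, batch):
        return torch.tensor(0.0, requires_grad=True)  # placeholder for the sketch
```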
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, leave a comment, or push an update, or this PR will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
What does this PR do?
This PR tracks the changes on the speech-llm main development branch relative to main.
Collection: multimodal
Changelog
Usage
# Add a code snippet demonstrating how to use this
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information