
Support Canary parallel inference #9517

Closed · wants to merge 9 commits

Conversation

@karpnv (Collaborator) commented Jun 21, 2024

What does this PR do ?

Support Canary at transcribe_speech_parallel.py script

Collection: ASR

```shell
python3 ./examples/asr/transcribe_speech_parallel.py model=./canary-1b.nemo predict_ds.manifest_filepath=./manifest.json output_path=/tmp trainer.devices=-1
```

PR Type:

- [x] New Feature
- [ ] Bugfix
- [ ] Documentation

Who can review?

@pzelasko

@github-actions github-actions bot added the ASR label Jun 21, 2024
@karpnv karpnv requested a review from pzelasko June 21, 2024 13:28
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
```diff
@@ -109,7 +112,9 @@ class ParallelTranscriptionConfig:
     # att_context_size can be set for cache-aware streaming models with multiple look-aheads
     att_context_size: Optional[list] = None

-    trainer: TrainerConfig = TrainerConfig(devices=-1, accelerator="gpu", strategy="ddp")
+    trainer: TrainerConfig = TrainerConfig(
+        devices=-1, accelerator="gpu", strategy="ddp", use_distributed_sampler=False
+    )
```
Review comment (Collaborator):
be careful with distributed sampler setting here: non-lhotse datasets still likely require True. it might be better to just override this for EncDec model?
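That model-conditional override could be sketched as follows (a minimal illustration, assuming a hypothetical `uses_lhotse_dataloader` flag; this is not the actual NeMo change):

```python
def pick_use_distributed_sampler(uses_lhotse_dataloader: bool) -> bool:
    """Lhotse samplers shard data across ranks themselves, so Lightning's
    automatic DistributedSampler injection must be disabled for them;
    map-style (non-lhotse) datasets still need it under DDP."""
    return not uses_lhotse_dataloader


# Example: assemble Trainer kwargs based on the dataset type.
trainer_kwargs = dict(
    devices=-1,
    accelerator="gpu",
    strategy="ddp",
    use_distributed_sampler=pick_use_distributed_sampler(uses_lhotse_dataloader=True),
)
```

Keeping the decision in one place avoids silently breaking non-lhotse datasets that still rely on Lightning's sampler injection.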

karpnv (Collaborator, Author) replied:
ok, rm use_distributed_sampler=False

```diff
@@ -72,7 +72,7 @@ def __getitem__(self, cuts: CutSet) -> tuple[torch.Tensor, torch.Tensor, torch.T
         prompts = None
         prompts_lens = None

-        return audio, audio_lens, prompts_with_answers, prompts_with_answers_lens, prompts, prompts_lens
+        return audio, audio_lens, prompts_with_answers, prompts_with_answers_lens, prompts, prompts_lens, cuts
```
Review comment (Collaborator):
this is not ideal as returning cuts here will transfer the data held in-memory across dataloading worker subprocesses to the main training/inference loop process. we should return cuts.drop_recordings() instead.
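The memory concern can be illustrated with a toy stand-in for lhotse cuts (`MiniCut` and the free-standing `drop_recordings` below are illustrative only; the real API suggested in the comment is lhotse's `CutSet.drop_recordings()`):

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class MiniCut:
    """Toy stand-in for a lhotse Cut: light metadata plus a heavy audio payload."""
    id: str
    duration: float
    audio: Optional[bytes]  # in-memory recording data


def drop_recordings(cuts: List[MiniCut]) -> List[MiniCut]:
    """Keep the metadata but drop the audio payloads, so only lightweight
    objects are pickled across the dataloader worker boundary."""
    return [replace(c, audio=None) for c in cuts]
```

Returning the stripped cuts keeps ids and durations available downstream without shipping raw audio between worker subprocesses and the main inference process.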

karpnv (Collaborator, Author) replied:
done

@pzelasko (Collaborator) commented Jun 24, 2024

Also, like we talked offline, we have to work around global_rank being set incorrectly before trainer.predict() is called: the dataloader has to be initialized with the correct global_rank (and world_size) for lhotse's distributed sampler to work correctly.

@titu1994 (Collaborator) commented:

trainer.global_rank is set a priori by PTL in a SLURM environment. It does not require the model to be built or its functions to be called.

Is this a case where PTL cannot detect the global rank a priori?
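One way to sketch the workaround discussed above, assuming rank and world size are exposed through standard torchrun or SLURM environment variables (an illustrative helper, not the code that landed in this PR):

```python
import os
from typing import Tuple


def resolve_global_rank_and_world_size() -> Tuple[int, int]:
    """Resolve (global_rank, world_size) before trainer.predict() runs,
    so the lhotse dataloader can be built with correct sharding.

    Checks torchrun-style variables first, then SLURM, and falls back
    to single-process defaults."""
    if "RANK" in os.environ and "WORLD_SIZE" in os.environ:
        return int(os.environ["RANK"]), int(os.environ["WORLD_SIZE"])
    if "SLURM_PROCID" in os.environ and "SLURM_NTASKS" in os.environ:
        return int(os.environ["SLURM_PROCID"]), int(os.environ["SLURM_NTASKS"])
    return 0, 1
```

Resolving the rank from the launcher's environment sidesteps relying on trainer.global_rank before PTL has finalized it.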

karpnv and others added 6 commits July 1, 2024 09:51
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
@karpnv karpnv requested a review from pzelasko July 1, 2024 17:07
github-actions bot commented:
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

@github-actions github-actions bot added the stale label Jul 16, 2024
github-actions bot commented:
This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this Jul 24, 2024
3 participants