
Support Canary parallel inference #9517

Closed · wants to merge 9 commits

Conversation

@karpnv (Collaborator) commented Jun 21, 2024

What does this PR do ?

Support Canary at transcribe_speech_parallel.py script

Collection: ASR

```shell
python3 ./examples/asr/transcribe_speech_parallel.py model=./canary-1b.nemo predict_ds.manifest_filepath=./manifest.json output_path=/tmp trainer.devices=-1
```

PR Type:

- [x] New Feature
- [ ] Bugfix
- [ ] Documentation

Who can review?

@pzelasko

@github-actions github-actions bot added the ASR label Jun 21, 2024
@karpnv karpnv requested a review from pzelasko June 21, 2024 13:28
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
```diff
@@ -109,7 +112,9 @@ class ParallelTranscriptionConfig:
     # att_context_size can be set for cache-aware streaming models with multiple look-aheads
     att_context_size: Optional[list] = None

-    trainer: TrainerConfig = TrainerConfig(devices=-1, accelerator="gpu", strategy="ddp")
+    trainer: TrainerConfig = TrainerConfig(
+        devices=-1, accelerator="gpu", strategy="ddp", use_distributed_sampler=False
+    )
```
Review comment (Collaborator):
be careful with distributed sampler setting here: non-lhotse datasets still likely require True. it might be better to just override this for EncDec model?
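That model-conditional override could be sketched as follows (a minimal illustration, assuming a hypothetical `uses_lhotse_dataloader` flag; this is not the actual NeMo change):

```python
def pick_use_distributed_sampler(uses_lhotse_dataloader: bool) -> bool:
    """Lhotse samplers shard data across ranks themselves, so Lightning's
    automatic DistributedSampler injection must be disabled for them;
    map-style (non-lhotse) datasets still need it under DDP."""
    return not uses_lhotse_dataloader


# Example: assemble Trainer kwargs based on the dataset type.
trainer_kwargs = dict(
    devices=-1,
    accelerator="gpu",
    strategy="ddp",
    use_distributed_sampler=pick_use_distributed_sampler(uses_lhotse_dataloader=True),
)
```

Keeping the decision in one place avoids silently breaking non-lhotse datasets that still rely on Lightning's sampler injection.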

karpnv (Collaborator, Author) replied:
ok, rm use_distributed_sampler=False

```diff
@@ -72,7 +72,7 @@ def __getitem__(self, cuts: CutSet) -> tuple[torch.Tensor, torch.Tensor, torch.T
         prompts = None
         prompts_lens = None

-        return audio, audio_lens, prompts_with_answers, prompts_with_answers_lens, prompts, prompts_lens
+        return audio, audio_lens, prompts_with_answers, prompts_with_answers_lens, prompts, prompts_lens, cuts
```
Review comment (Collaborator):
this is not ideal as returning cuts here will transfer the data held in-memory across dataloading worker subprocesses to the main training/inference loop process. we should return cuts.drop_recordings() instead.
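The memory concern can be illustrated with a toy stand-in for lhotse cuts (`MiniCut` and the free-standing `drop_recordings` below are illustrative only; the real API suggested in the comment is lhotse's `CutSet.drop_recordings()`):

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class MiniCut:
    """Toy stand-in for a lhotse Cut: light metadata plus a heavy audio payload."""
    id: str
    duration: float
    audio: Optional[bytes]  # in-memory recording data


def drop_recordings(cuts: List[MiniCut]) -> List[MiniCut]:
    """Keep the metadata but drop the audio payloads, so only lightweight
    objects are pickled across the dataloader worker boundary."""
    return [replace(c, audio=None) for c in cuts]
```

Returning the stripped cuts keeps ids and durations available downstream without shipping raw audio between worker subprocesses and the main inference process.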

karpnv (Collaborator, Author) replied:
done

@pzelasko (Collaborator) commented Jun 24, 2024

Also, like we talked offline, we have to work around global_rank being set incorrectly before trainer.predict() is called: the dataloader has to be initialized with the correct global_rank (and world_size) for lhotse's distributed sampler to work correctly.

@titu1994 (Collaborator) commented:

trainer.global_rank is set a priori by PTL in a SLURM environment. It does not require the model to be built or its functions to be called.

Is this a case where PTL cannot detect the global rank a priori?
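One way to sketch the workaround discussed above, assuming rank and world size are exposed through standard torchrun or SLURM environment variables (an illustrative helper, not the code that landed in this PR):

```python
import os
from typing import Tuple


def resolve_global_rank_and_world_size() -> Tuple[int, int]:
    """Resolve (global_rank, world_size) before trainer.predict() runs,
    so the lhotse dataloader can be built with correct sharding.

    Checks torchrun-style variables first, then SLURM, and falls back
    to single-process defaults."""
    if "RANK" in os.environ and "WORLD_SIZE" in os.environ:
        return int(os.environ["RANK"]), int(os.environ["WORLD_SIZE"])
    if "SLURM_PROCID" in os.environ and "SLURM_NTASKS" in os.environ:
        return int(os.environ["SLURM_PROCID"]), int(os.environ["SLURM_NTASKS"])
    return 0, 1
```

Resolving the rank from the launcher's environment sidesteps relying on trainer.global_rank before PTL has finalized it.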

karpnv and others added 6 commits July 1, 2024 09:51
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
@karpnv karpnv requested a review from pzelasko July 1, 2024 17:07
github-actions bot commented:
This PR is stale because it has been open for 14 days with no activity. Remove the stale label, comment, or update the PR, or it will be closed in 7 days.

@github-actions github-actions bot added the stale label Jul 16, 2024
github-actions bot commented:
This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this Jul 24, 2024
3 participants