Add AIS batch loading support to LhotseSpeechToTextBpeDataset#15538

Merged
pzelasko merged 1 commit into
NVIDIA-NeMo:mainfrom
gaikwadabhishek:feat/ais-batch-loader-lhotse-bpe-dataset
Mar 23, 2026

Conversation

@gaikwadabhishek
Contributor

  • Check the USE_AIS_GET_BATCH environment variable in LhotseSpeechToTextBpeDataset.
  • Pass use_batch_loader to AudioSamples when the variable is enabled.
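The two points above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `FakeAudioSamples` is a hypothetical stand-in for `AudioSamples`, `build_audio_loader` and `ais_batch_loading_enabled` are made-up helper names, and the set of accepted truthy spellings for `USE_AIS_GET_BATCH` is an assumption.

```python
import os


def ais_batch_loading_enabled(environ=None):
    """Return True when USE_AIS_GET_BATCH holds a truthy value.

    Assumption: the accepted spellings ("1", "true", "yes") are
    illustrative; the real parsing rule may differ.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("USE_AIS_GET_BATCH", "")
    return value.strip().lower() in {"1", "true", "yes"}


class FakeAudioSamples:
    """Hypothetical stand-in for AudioSamples that records the flag."""

    def __init__(self, use_batch_loader=False):
        self.use_batch_loader = use_batch_loader


def build_audio_loader(environ=None, loader_cls=FakeAudioSamples):
    """Forward use_batch_loader only when the env var enables it."""
    kwargs = {}
    if ais_batch_loading_enabled(environ):
        kwargs["use_batch_loader"] = True
    return loader_cls(**kwargs)
```

Passing the flag only when the variable is set keeps the default construction path unchanged for users who never set `USE_AIS_GET_BATCH`.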


Signed-off-by: Abhishek Gaikwad <gaikwadabhishek1997@gmail.com>
@github-actions github-actions Bot added the ASR label Mar 23, 2026
with patch.object(AudioSamples, "__init__", return_value=None) as mock_init:
    mock_init.side_effect = lambda *args, **kwargs: None
    try:
        dataset = LhotseSpeechToTextBpeDataset(tokenizer=tokenizer)

Check notice

Code scanning / CodeQL

Unused local variable Note test

Variable dataset is not used.
mock_init.side_effect = lambda *args, **kwargs: None
try:
    dataset = LhotseSpeechToTextBpeDataset(tokenizer=tokenizer)
except Exception:

Check notice

Code scanning / CodeQL

Empty except Note test

'except' clause does nothing but pass and there is no explanatory comment.
with patch.object(AudioSamples, "__init__", return_value=None) as mock_init:
    mock_init.side_effect = lambda *args, **kwargs: None
    try:
        dataset = LhotseSpeechToTextBpeDataset(tokenizer=tokenizer)

Check notice

Code scanning / CodeQL

Unused local variable Note test

Variable dataset is not used.
mock_init.side_effect = lambda *args, **kwargs: None
try:
    dataset = LhotseSpeechToTextBpeDataset(tokenizer=tokenizer)
except Exception:

Check notice

Code scanning / CodeQL

Empty except Note test

'except' clause does nothing but pass and there is no explanatory comment.
    return original_init(self, *args, **kwargs)

with patch.object(AudioSamples, "__init__", mock_init):
    dataset = LhotseSpeechToTextBpeDataset(tokenizer=tokenizer)

Check notice

Code scanning / CodeQL

Unused local variable Note test

Variable dataset is not used.
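For reference, one way to write such a test without triggering the "Unused local variable" or "Empty except" notices is to assert on both the captured constructor arguments and the constructed object. The sketch below is not the test from this PR: `AudioSamples` and `Dataset` are hypothetical stand-ins defined inline so the example runs on its own, and the `== "true"` parsing of the environment variable is an assumption.

```python
import os
from unittest.mock import patch


class AudioSamples:
    """Hypothetical stand-in for the real AudioSamples class."""

    def __init__(self, use_batch_loader=False):
        self.use_batch_loader = use_batch_loader


class Dataset:
    """Hypothetical stand-in for LhotseSpeechToTextBpeDataset."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        # Assumption: the variable is treated as enabled when it equals "true".
        enabled = os.environ.get("USE_AIS_GET_BATCH", "").lower() == "true"
        self.load_audio = AudioSamples(use_batch_loader=True) if enabled else AudioSamples()


def check_batch_loader_flag_is_forwarded():
    captured = []
    original_init = AudioSamples.__init__

    def mock_init(self, *args, **kwargs):
        # Record the keyword arguments, then defer to the real constructor.
        captured.append(kwargs)
        return original_init(self, *args, **kwargs)

    with patch.dict(os.environ, {"USE_AIS_GET_BATCH": "true"}):
        with patch.object(AudioSamples, "__init__", mock_init):
            dataset = Dataset(tokenizer=object())

    # Using `dataset` in an assertion avoids the unused-variable notice,
    # and no try/except is needed because the real constructor still runs.
    assert captured == [{"use_batch_loader": True}]
    assert dataset.load_audio.use_batch_loader is True
    return captured
```

Wrapping the original `__init__` instead of replacing it with a no-op keeps the object fully constructed, so the test does not have to swallow exceptions from a half-initialized instance.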
@pzelasko pzelasko merged commit 2a9b43a into NVIDIA-NeMo:main Mar 23, 2026
128 checks passed
3 participants