
[MagpieTTS][bugfix] defaults to force_map_dataset=True for validation dataset to avoid duplicates by a factor of num_workers. #15387

Merged: pzelasko merged 1 commit into NVIDIA-NeMo:main from XuesongYang:xueayng/pr-bugfix-val-dataloader on Feb 12, 2026

Conversation

@XuesongYang (Collaborator)

Summary

Fix validation dataloader data duplication for Lhotse Shar datasets by adding force_map_dataset: true to the MagpieTTS validation config.

Problem

When using lhotse_shar data with force_map_dataset=False (the default), the validation dataloader uses an iterable dataset path that causes two compounding issues:

  1. No DDP data partitioning -- The Lhotse sampler is created with rank=0, world_size=1 (hardcoded for iterable datasets), so every GPU independently iterates through the entire validation dataset instead of its 1/world_size share.

  2. Worker-level data duplication -- Each DataLoader worker gets a full copy of the IterableDatasetWrapper and independently iterates all shards. With num_workers=N, data is duplicated N× per GPU.

Combined, each GPU processes num_workers × total_dataset_batches instead of the correct total_dataset_batches / world_size.
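The combined effect can be checked with a toy calculation (pure Python; the function name and structure are illustrative, not NeMo code):

```python
def batches_seen_per_gpu(total_batches: int, num_workers: int,
                         world_size: int, map_dataset: bool) -> int:
    """Sketch of how many batches one GPU iterates per validation pass."""
    if map_dataset:
        # Map-dataset path: the sampler partitions by rank, and the
        # DataLoader dispatches indices to workers without duplication.
        return total_batches // world_size
    # Iterable path: sampler built with rank=0/world_size=1, so there is
    # no DDP split, and every worker iterates the full shard list.
    return total_batches * num_workers

# Numbers from the LibriTTS dev-clean experiment below (176 batches):
print(batches_seen_per_gpu(176, num_workers=2, world_size=8, map_dataset=False))  # 352
print(batches_seen_per_gpu(176, num_workers=2, world_size=8, map_dataset=True))   # 22
```

This reproduces the 352-vs-22 gap measured empirically below.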

This design is intentional for training (infinite datasets with force_finite=False), where unique per-worker seeds and infinite repetition avoid explicit data splitting. But for finite validation (force_finite=True), it results in massive redundant computation and metrics computed on duplicated data.

Empirical Validation

Tested on LibriTTS dev-clean (5,620 records = 176 batches at batch_size=32, num_workers=2, quadratic_duration=null):

| force_map_dataset | 8 GPUs | 16 GPUs | Expected |
|---|---|---|---|
| False (before) | 352 | 352 | num_workers × total = 2 × 176 = 352 (GPU-count independent) |
| True (after) | 22 | 11 | total / world_size = 176/8 = 22, 176/16 = 11 (proper DDP scaling) |

With force_map_dataset=True:

  • Validation iterations scale inversely with GPU count (correct DDP behavior)
  • No worker duplication (map dataset dispatches work to workers without duplication)
  • Reduction factor = num_workers × world_size (e.g., 2×8 = 16× fewer iterations on 8 GPUs)

Fix

Setting force_map_dataset: true in the validation config switches from iterable dataset to map dataset, where:

  • The sampler uses the actual global_rank and world_size to partition data across GPUs
  • The DataLoader manages worker dispatch without duplication
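In config terms the fix is a one-line change in the validation dataset section. A hedged sketch (the surrounding keys are illustrative, not the exact MagpieTTS config layout):

```yaml
validation_ds:              # illustrative section name
  force_map_dataset: true   # the fix: use a map dataset instead of an iterable one
  force_finite: true        # validation is already finite, per the discussion above
  num_workers: 2
  batch_size: 32
```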

This follows the same pattern used in speechlm2/data/datamodule.py for validation/test dataloaders. The unit test at tests/collections/common/test_lhotse_dataloading.py::test_force_map_dataset validates the effectiveness.


Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
@pzelasko pzelasko merged commit 438ac8a into NVIDIA-NeMo:main Feb 12, 2026
58 checks passed
@XuesongYang XuesongYang deleted the xueayng/pr-bugfix-val-dataloader branch February 12, 2026 16:51
nemoramo pushed a commit to nemoramo/MoNeMo that referenced this pull request Feb 13, 2026

nune-tadevosyan pushed a commit to nune-tadevosyan/NeMo that referenced this pull request Mar 13, 2026