
DataLoader with num_workers > 1, and Rand[Zoom/Rotate/Flip]d transforms #398

Closed
hjmjohnson opened this issue May 18, 2020 · 9 comments · Fixed by #423

Comments

@hjmjohnson
Contributor

Describe the bug
When using a DataLoader with num_workers > 1 and a Rand[Zoom/Rotate/Flip]d transform, all of the workers share the same random state.

To Reproduce

With train_ds being a dataset that applies randomly parameterized transforms:

    train_loader: DataLoader = DataLoader(
        train_ds,  # <-- This is a dataset of both the input raw data filenames + definition of transforms
        batch_size=1,
        shuffle=True,
        num_workers=88,
        collate_fn=list_data_collate,
    )

This is particularly disturbing when running on a machine with 40+ CPUs, because huge numbers of images end up with identical augmentation parameters.
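
For illustration, here is a minimal sketch (hypothetical, not MONAI code) of why forked workers draw identical values: each worker process inherits a copy of the parent's NumPy random state.

    import numpy as np
    from torch.utils.data import DataLoader, Dataset

    class RandomAngleDataset(Dataset):
        # Hypothetical dataset that draws a random angle per item, like RandRotated.
        def __len__(self):
            return 8

        def __getitem__(self, idx):
            # np.random's state is copied into every forked worker (the default on Linux),
            # so different workers produce the same sequence of angles.
            return np.random.uniform(0, 20)

    loader = DataLoader(RandomAngleDataset(), batch_size=1, num_workers=4)
    for angle in loader:
        print(f"Rotating by {angle.item()}")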

Expected behavior
Each transform should have its own random parameters chosen, regardless of the number of workers.

Screenshots
NOTE: The number of replicated rotation values is always equal to the num_workers specified.

Rotating by 19.367042973517755
Rotating by 19.367042973517755
Rotating by 19.367042973517755
Rotating by 19.367042973517755
Rotating by 4.039486469720721
Rotating by 4.039486469720721
Rotating by 4.039486469720721
Rotating by 4.039486469720721
Rotating by 13.13047017599905
Rotating by 13.13047017599905
Rotating by 13.13047017599905
Rotating by 13.13047017599905
@Nic-Ma
Contributor

Nic-Ma commented May 18, 2020

Hi @hjmjohnson ,

Thanks for your bug report.
This is a known issue with NumPy + PyTorch multiprocessing.
You can easily fix it by adding the following logic to your DataLoader initialization:

import torch

def worker_init_fn(worker_id):
    # Each worker receives a distinct seed from PyTorch; use it to re-seed the
    # dataset's transform chain so numpy-based randomness differs per worker.
    worker_info = torch.utils.data.get_worker_info()
    worker_info.dataset.transform.set_random_state(worker_info.seed % (2 ** 32 - 1))

dataloader = torch.utils.data.DataLoader(..., worker_init_fn=worker_init_fn)
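
For reference, a sketch of how this fits together with the loader from the report above (assuming the same train_ds, and that list_data_collate comes from monai.data):

import torch
from torch.utils.data import DataLoader
from monai.data import list_data_collate

def worker_init_fn(worker_id):
    worker_info = torch.utils.data.get_worker_info()
    # re-seed the dataset's transform chain with the per-worker seed
    worker_info.dataset.transform.set_random_state(worker_info.seed % (2 ** 32 - 1))

train_loader = DataLoader(
    train_ds,  # dataset with randomly parameterized transforms, as above
    batch_size=1,
    shuffle=True,
    num_workers=88,
    collate_fn=list_data_collate,
    worker_init_fn=worker_init_fn,  # gives each worker its own random state
)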

Thanks.

@hjmjohnson
Contributor Author

FYI: I also came across documentation indicating that the real problem is outside the scope of MONAI.

https://pytorch.org/docs/stable/data.html

@hjmjohnson
Contributor Author

@Nic-Ma THANK YOU! Sorry for the invalid MONAI bug report. Your solution worked wonderfully!

@Nic-Ma
Contributor

Nic-Ma commented May 19, 2020

You are welcome.
Thanks.

@tvercaut
Member

tvercaut commented May 19, 2020

Should a note about this be put in our wiki or somewhere similar, to start collating an FAQ?
Linking to other resources such as https://pytorch.org/docs/stable/notes/faq.html#my-data-loader-workers-return-identical-random-numbers would of course make sense in such an FAQ.
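
For such an FAQ entry, the generic (non-MONAI) fix suggested by that PyTorch note is to re-seed NumPy inside each worker; a minimal sketch:

import numpy as np
import torch

def worker_init_fn(worker_id):
    # torch.initial_seed() already differs per worker; fold it into NumPy's
    # 32-bit seed range so numpy-based augmentations diverge across workers.
    np.random.seed(torch.initial_seed() % (2 ** 32))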

@Nic-Ma
Contributor

Nic-Ma commented May 19, 2020

Hi @tvercaut ,

Good idea, maybe @atbenmurray can help add this to our wiki page?
Ben spent a lot of time setting up our detailed wiki pages previously.
Thanks.

@hjmjohnson
Contributor Author

@Nic-Ma It would be nice if the FAQ were indexed in a way that the Sphinx documentation could reference the FAQ content.

@Nic-Ma
Contributor

Nic-Ma commented May 19, 2020

I will raise the topic of setting up an FAQ with the core team.
Maybe we can discuss it this Friday.
Thanks.

@atbenmurray
Contributor

@Nic-Ma @tvercaut I can certainly help with the wiki stuff

@wyli wyli mentioned this issue May 25, 2020