
DataLoader is very slow when using SubjectsDataset #941

Open
1 task done
ivezakis opened this issue Jul 28, 2022 · 7 comments
Comments


ivezakis commented Jul 28, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Problem summary

When using SubjectsDataset with PyTorch dataloader, iterating over the dataloader is incredibly slow. Naturally, this slows training down as well.
When iterating over the SubjectsDataset however, it is significantly faster.

In my experience, starting to iterate over Subjects dataset takes a few seconds (<10), while for dataloader to begin, it takes more than a minute.

Code for reproduction

import json

import torch
import torchio as tio

# Build the dataset from a JSON manifest listing image/label paths
with open("some_data.json") as f:
    jdata = json.load(f)

subjects = tio.SubjectsDataset(
    [
        tio.Subject(
            img=tio.ScalarImage(subject["image"]),
            labels=tio.LabelMap(subject["label"]),
        )
        for subject in jdata
    ]
)

# Iterating over the dataset directly: starts within a few seconds
for sample in subjects:
    print(sample)
    break

# Iterating over the DataLoader: takes over a minute to start
loader = torch.utils.data.DataLoader(subjects, batch_size=8, shuffle=True)
for sample in loader:
    print(sample)
    break

Actual outcome

Iterating over loader is much slower than iterating over subjects.

Error messages

No response

Expected outcome

Performance should be similar.

System info

pytorch 1.12.0
torchio 0.18.83

On a machine with Ubuntu 20.04, NVMe SSD

fepegar commented Jul 28, 2022

Hi, @ivezakis. You are using one process to load 8 images, so it will be roughly 8 times slower; this is expected. To make it faster, pass a num_workers value larger than 1 to the DataLoader.
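The suggestion above can be sketched as follows. This is only a minimal illustration, with a hypothetical in-memory DummyDataset standing in for a SubjectsDataset (which would need image files on disk):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class DummyDataset(Dataset):
    """Hypothetical stand-in for a SubjectsDataset: random 3D volumes."""
    def __init__(self, n=32):
        self.data = [torch.randn(1, 8, 8, 8) for _ in range(n)]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]

dataset = DummyDataset()
# num_workers > 0 spawns worker processes that load samples in parallel,
# instead of one process loading all 8 images of a batch sequentially
loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=2)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([8, 1, 8, 8, 8])
```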

@fepegar fepegar closed this as completed Jul 28, 2022

ivezakis commented Jul 28, 2022

Hi @fepegar, in fact I am using the maximum number of workers for my machine in the dataloader, num_workers=12. Sorry, that wasn't reflected in the code I provided.

Please consider reopening this. The difference is rather large in my experience: for a batch size of 8, the DataLoader is over 40 times slower. Picture attached.

[screenshot of timing comparison attached]

Edit: I also tried it with a batch size of 1; it's 6.8 seconds vs 3.6 seconds.

@fepegar fepegar reopened this Jul 29, 2022
@QingYunA

Yes, I have met the same problem. It is very slow (at least 30 times slower than the actual model training time), but I don't have a good way to resolve it. Have you found any good method?


romainVala commented Feb 28, 2023

Hi,
could it be that you are asking for too many workers? Setting num_workers equal to the number of cores may be too much (overload can really decrease performance).
Can you try different num_workers values (1/2, 1/4 of your total core count) and report whether you get the same difference?
(Do not forget, as fepegar said, both are equivalent if time_dataloader = batch_size * time_dataset, because iterating over the dataset yields a batch size of 1.)
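A rough way to run this comparison is to time the same DataLoader with different num_workers values. This is only a sketch with a synthetic SlowDataset that simulates per-sample I/O cost with a short sleep, not the original data:

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset

class SlowDataset(Dataset):
    """Simulates per-sample loading cost with a small sleep."""
    def __len__(self):
        return 16

    def __getitem__(self, idx):
        time.sleep(0.01)  # stand-in for disk I/O + preprocessing
        return torch.randn(1, 4, 4, 4)

results = {}
for nw in (0, 2, 4):
    loader = DataLoader(SlowDataset(), batch_size=8, num_workers=nw)
    start = time.perf_counter()
    for _ in loader:
        pass
    results[nw] = time.perf_counter() - start

for nw, t in results.items():
    print(f"num_workers={nw}: {t:.3f}s")
```

On real data the sweet spot depends on per-sample cost and core count, so the best value has to be found empirically.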


fepegar commented Feb 28, 2023

@ivezakis, @QingYunA

Can you please provide a minimal, reproducible example?


fepegar commented Feb 28, 2023

@romainVala I've also noticed that behavior. For example, on a DGX with 40 cores, my code was fastest using only 12.


QingYunA commented Mar 2, 2023

> can you try different num_workers values (1/2, 1/4 of your total core count) and report whether you get the same difference?

Yes, after I increased the num_workers of the Queue to 16, preparing the dataloader got much faster. By the way, I found the transforms I use influence the speed: when I removed RandomAffine(degrees=20), load time was cut in half.
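The observation that a transform changes loading speed follows from transforms running inside the dataset's __getitem__, so their cost is paid per sample at load time. A small sketch in pure PyTorch, with a hypothetical fake_affine standing in for an expensive transform like RandomAffine, illustrates this:

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset

def fake_affine(x):
    # hypothetical stand-in for an expensive spatial transform
    time.sleep(0.005)
    return x

class TransformedDataset(Dataset):
    """Applies an optional transform inside __getitem__, like SubjectsDataset."""
    def __init__(self, transform=None):
        self.transform = transform

    def __len__(self):
        return 16

    def __getitem__(self, idx):
        x = torch.randn(1, 4, 4, 4)
        return self.transform(x) if self.transform else x

def time_loader(transform):
    loader = DataLoader(TransformedDataset(transform), batch_size=8)
    start = time.perf_counter()
    for _ in loader:
        pass
    return time.perf_counter() - start

t_plain = time_loader(None)
t_affine = time_loader(fake_affine)
print(f"no transform: {t_plain:.3f}s, with transform: {t_affine:.3f}s")
```

This is why moving random transforms out of the hot path, or parallelizing them across workers, shows up directly in loader throughput.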
