@Nic-Ma I had a brief look but can't see anything obvious on their GitHub.
@wyli if both `torch.set_num_threads` and `OMP_NUM_THREADS` work, I would have thought that the former would be preferable: it would be easier to revert than messing with environment variables.
We could add `threads_per_worker` to our `DataLoader` constructor? That way, we could set the number of threads as we iterate through the dataset, and then revert once we're finished.
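To make the idea concrete, here is a minimal sketch of what such a wrapper could look like. Note that `threads_per_worker` and `ThreadLimitedDataLoader` are hypothetical names for illustration, not an existing MONAI or PyTorch API:

```python
# Hypothetical sketch: "threads_per_worker" is NOT a real DataLoader argument.
# It illustrates capping PyTorch's intra-op threads while iterating, then
# restoring the previous value afterwards.
import torch
from torch.utils.data import DataLoader, Dataset


class RandomDataset(Dataset):
    """Toy dataset standing in for a transform-heavy MONAI dataset."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        return torch.rand(3, 8, 8)


class ThreadLimitedDataLoader(DataLoader):
    def __init__(self, *args, threads_per_worker=1, **kwargs):
        super().__init__(*args, **kwargs)
        self.threads_per_worker = threads_per_worker

    def __iter__(self):
        previous = torch.get_num_threads()
        torch.set_num_threads(self.threads_per_worker)
        try:
            yield from super().__iter__()
        finally:
            # revert so code outside the loader keeps its thread budget
            torch.set_num_threads(previous)


loader = ThreadLimitedDataLoader(RandomDataset(), batch_size=2, threads_per_worker=1)
for batch in loader:
    pass  # transforms inside the loop run with the capped thread count
```

The `try`/`finally` ensures the previous thread count is restored even if iteration is interrupted partway through.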
I think maybe it's not necessary to control `threads_per_worker` in `DataLoader` — for example, `CacheDataset` can execute transforms before the `DataLoader` is involved at all. If you want to control this global setting, maybe it's clearer to just call `torch.set_num_threads(N)` in the application program directly (just like your demo program)?
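A minimal sketch of that suggestion — capping the thread count around the transform-heavy section at application level, then restoring it:

```python
# Sketch of calling torch.set_num_threads directly in the application,
# as suggested above, with save/restore around the heavy section.
import torch

previous = torch.get_num_threads()
torch.set_num_threads(1)  # limit intra-op threading (interpolate etc.)
try:
    # ... run CacheDataset caching / transforms here ...
    pass
finally:
    torch.set_num_threads(previous)  # revert for the rest of the program
```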
Of course, tuning this setting is a good way to improve training performance.
To play devil's advocate: the arguments of `CacheDataset` already overlap with those of `DataLoader`. Both `CacheDataset` and `DataLoader` have a `num_workers` argument, so we could arguably have `threads_per_worker` for both of them, too.
But perhaps you're right that it's best to leave it to the user to call `torch.set_num_threads` themselves.
Some of our transforms use `torch.nn.functional.interpolate` under the hood, which uses intra-op multithreading by default. The following code snippet uses 800% of my CPU. However, uncommenting the `torch.set_num_threads(1)` line limits CPU usage to 100%.

I think this is something we need to think about. Otherwise, if we combine this with a `DataLoader` using multiple workers, we would be doubling down on concurrency: multithreading from `interpolate` on top of the dataloader's worker processes.
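The original snippet is elided above; a sketch of the kind of code being described — a repeated `interpolate` call whose intra-op threading saturates multiple cores unless the `set_num_threads` line is uncommented — might look like this:

```python
# Illustrative guess at the elided snippet: interpolate on a volume,
# which by default parallelizes across all available cores.
import torch
import torch.nn.functional as F

# torch.set_num_threads(1)  # uncommenting this caps usage at ~100% (one core)

x = torch.rand(1, 1, 64, 64, 64)
for _ in range(3):
    y = F.interpolate(x, scale_factor=2, mode="trilinear", align_corners=False)
print(y.shape)  # torch.Size([1, 1, 128, 128, 128])
```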