I have been testing different ways of creating data loaders for Polaris.
One of the strategies I'm looking into is inspired by Graphium:
- Save each data point in an individual pickle file.
- Group the files into directories of at most 1000 data points each.
I noticed that you're using `torch.save` and `torch.load` to save and load the files. I was expecting this to be faster, because you would not have to convert to and from a `torch.Tensor`, but it might not be... Let's look at a simple example.
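We start by defining some utility code. The original helpers aren't shown here, so the sketch below is only an assumption about what `get_pickle_archive` and `PickleDataset` might look like: one file per sample, grouped in directories of at most 1000 samples, with a `use_torch` flag that switches between `torch.save`/`torch.load` and plain `pickle`.

```python
import os
import pickle

import numpy as np
import torch


def get_pickle_archive(X, y, root, use_torch=False):
    # Save each (x, y) pair to its own file, grouped in directories of <= 1000 samples.
    for i, (x, label) in enumerate(zip(X, y)):
        subdir = os.path.join(root, f"{i // 1000:04d}")
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, f"{i:06d}.pkl")
        if use_torch:
            torch.save((torch.from_numpy(x), torch.tensor(label)), path)
        else:
            with open(path, "wb") as f:
                pickle.dump((x, label), f)
    return root


class PickleDataset(torch.utils.data.Dataset):
    # Load one sample per file; only the pickle path needs a numpy -> tensor conversion.
    def __init__(self, root, use_torch=False, length=0):
        self.root = root
        self.use_torch = use_torch
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        path = os.path.join(self.root, f"{idx // 1000:04d}", f"{idx:06d}.pkl")
        if self.use_torch:
            return torch.load(path)
        with open(path, "rb") as f:
            x, label = pickle.load(f)
        return torch.from_numpy(x), torch.tensor(label)
```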
And now we can do some benchmarking.

```python
# Create a random, toy dataset of 10k samples
X = np.random.random((10000, 64, 64, 3))
y = np.random.random(10000)
```
### Save and load using Torch
```python
# Create a dataset that saves using torch
archive = get_pickle_archive(X, y, tmpdir, use_torch=True)
dataset = PickleDataset(archive, use_torch=True, length=10000)
dataloader = torch.utils.data.DataLoader(dataset)
```
This gives: 4.13 s ± 35.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
### Save and load using Pickle
```python
# This time, we just use pickle directly.
archive = get_pickle_archive(X, y, tmpdir, use_torch=False)
dataset = PickleDataset(archive, use_torch=False, length=10000)
dataloader = torch.utils.data.DataLoader(dataset)
```
This gives: 1.72 s ± 8.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
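Both numbers look like `%timeit` output over one full pass of the loader; the exact harness isn't shown in the original, but the measurement amounts to something like this sketch:

```python
import time

start = time.perf_counter()
for _ in dataloader:  # one full pass over all 10k samples
    pass
print(f"{time.perf_counter() - start:.2f} s per epoch")
```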
### Conclusion
This could be an easy change that might get you a ~58% reduction in loading time (about 2.4× faster)!
`torch.save` adds relevant functionality (see e.g. here), but I'm not familiar enough with the internals of either Torch or Graphium to know whether that functionality is needed for your use case of saving featurized graphs (right?) to disk. Might be worth a try!