Adding a new parameter in the class AugmentedDataLoader to send the transformation to the device #406
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #406      +/-   ##
==========================================
+ Coverage   82.81%   82.83%   +0.01%
==========================================
  Files          54       54
  Lines        3789     3792       +3
==========================================
+ Hits         3138     3141       +3
  Misses        651      651
```
I agree with you @robintibor! I incorporated this suggestion into the code and improved the documentation a bit. Let me know if it's good to merge.
@bruAristimunha, from your experience, how much speed-up do you observe with this? Can you give us some details on the practical impact? 🙏
Thanks for the PR @bruAristimunha !! :) Also, when adding new functionality like this, you should try to add a corresponding test somewhere to avoid degrading the test coverage ☝️
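For illustration, a minimal sketch of what such a test might look like, assuming the `device` argument added by this PR and using `SignFlip` as an arbitrary transform (the test name, assertions, and expected batch placement are hypothetical, not the test actually added):

```python
# Hypothetical test sketch, not the actual test from the PR.
import pytest
import torch
from torch.utils.data import TensorDataset

from braindecode.augmentation import AugmentedDataLoader, SignFlip


@pytest.mark.parametrize("device", ["cpu", "cuda"])
def test_augmented_dataloader_device(device):
    if device == "cuda" and not torch.cuda.is_available():
        pytest.skip("CUDA not available")
    X = torch.randn(8, 2, 100)  # (n_trials, n_channels, n_times)
    y = torch.zeros(8, dtype=torch.long)
    transform = SignFlip(probability=0.5)
    # `device` is the new parameter introduced by this PR; the behaviour
    # assumed here is that batches are moved to the device before the
    # transform is applied, so they come out of the loader on that device.
    loader = AugmentedDataLoader(TensorDataset(X, y), transforms=[transform],
                                 device=device, batch_size=4)
    for batch_X, _ in loader:
        assert batch_X.device.type == device
```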
Also tested this on GPU on my side (since GHA does not support that). Works fine.
One additional thing I observed for the GaussianNoise augmentation: there is no speed-up on the GPU if we keep using the numpy rng (from sklearn's `rng = check_random_state(random_state)`):

```python
# Current approach: the noise is sampled on the CPU with numpy and then
# copied to the device, which negates the benefit of running on the GPU.
if isinstance(std, torch.Tensor):
    std = std.to(X.device)
noise = torch.from_numpy(
    rng.normal(
        loc=np.zeros(X.shape),
        scale=1,
    ),
).float().to(X.device) * std
transformed_X = X + noise
return transformed_X, y
```

Using something like

```python
noise = torch.normal(
    mean=torch.zeros_like(X, device=X.device),
    std=torch.as_tensor(std, device=X.device),
    generator=rng,
)
```

instead would result in a substantial speed-up, since the noise is sampled directly on the device. The `rng` would then have to be a `torch.Generator`.
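To make the expected gain concrete, here is a minimal, self-contained timing sketch of the two sampling strategies (this is not braindecode's code; it uses `torch.randn` instead of `torch.normal` for the device-side sampling, and the shapes, seeds, and iteration counts are illustrative assumptions):

```python
# Stand-alone comparison of CPU-side numpy sampling vs. device-side torch
# sampling; shapes, seeds and iteration counts are illustrative only.
import time

import numpy as np
import torch
from sklearn.utils import check_random_state

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
X = torch.randn(64, 22, 1000, device=device)  # (batch, channels, times)
std = 0.1


def numpy_noise(X, rng):
    # Sampled on the CPU, then copied to the device (the slow path above).
    return torch.from_numpy(
        rng.normal(loc=np.zeros(X.shape), scale=1)
    ).float().to(X.device) * std


def torch_noise(X, gen):
    # Sampled directly on the device; no host-to-device copy.
    return torch.randn(X.shape, generator=gen, device=X.device) * std


np_rng = check_random_state(42)
torch_gen = torch.Generator(device=device).manual_seed(42)

for name, fn, rng in [("numpy", numpy_noise, np_rng),
                      ("torch", torch_noise, torch_gen)]:
    if device.type == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(100):
        _ = X + fn(X, rng)
    if device.type == "cuda":
        torch.cuda.synchronize()
    print(f"{name}: {time.perf_counter() - start:.3f} s")
```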
Good point @martinwimpff! But we are potentially touching a point that involves many more modifications to the code, because this remark applies to all augmentations :) However, before making that change, it is still probably worth checking whether there are indeed any speed-ups compared to the native CPU multi-threading happening in DataLoaders.
Hello @agramfort, @martinwimpff and @cedricrommel, I conducted a small runtime analysis for each augmentation method listed below, with five repetitions, passing either the GPU or the CPU as the device to the AugmentedDataLoader class.
The code for the analysis is at the end. I used the entire BCI Competition 4a dataset to stress memory consumption and see how the model behaves in a scenario closer to a real one. I ran it on my research computer with an i7-9700K CPU @ 3.60GHz and a GeForce RTX 2080 Ti 11GB GPU. The expectation is that the runtime decreases when the augmentation is sent to the GPU. The improvement ratio can be seen below: in general there seems to be a good gain, except for Sensor*Rotation. I think this is related to Martin's discussion with Cedric; some methods need to be GPU-optimized, possibly in another PR. Code:
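The original script was attached to the comment and is not reproduced here; the following is only a hypothetical reconstruction of the kind of timing loop described (the stand-in dataset, transform list, and batch size are assumptions, not the author's exact setup):

```python
# Hypothetical reconstruction of the runtime analysis, NOT the original
# script; dataset, transforms and parameters are placeholders.
import time

import torch
from torch.utils.data import TensorDataset

from braindecode.augmentation import (
    AugmentedDataLoader, GaussianNoise, SignFlip, TimeReverse)

# Stand-in for the BCI competition data used in the original analysis.
dataset = TensorDataset(torch.randn(576, 22, 1000),
                        torch.zeros(576, dtype=torch.long))
transforms = [GaussianNoise(probability=0.5, std=0.1),
              SignFlip(probability=0.5),
              TimeReverse(probability=0.5)]
n_repetitions = 5

for device in ["cpu", "cuda"]:
    if device == "cuda" and not torch.cuda.is_available():
        continue
    for transform in transforms:
        loader = AugmentedDataLoader(dataset, transforms=[transform],
                                     device=device, batch_size=64)
        start = time.perf_counter()
        for _ in range(n_repetitions):
            for _batch in loader:
                pass
        elapsed = (time.perf_counter() - start) / n_repetitions
        print(f"{type(transform).__name__} on {device}: {elapsed:.3f} s")
```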
MixUp still needs to be tested separately: as it has slightly different logic, I could not run it with this code.
thx @bruAristimunha. @cedricrommel, I'll let you merge if you're happy.
I'm happy :) Thanks @bruAristimunha, LGTM, merging! Let's leave the rng stuff for another PR as proposed. It's not obvious either whether that proposed change is worthwhile in terms of computation.
As suggested by @martinwimpff, this PR aims to close issue #404.
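For context, a minimal usage sketch of the new argument (the dataset and transform below are placeholders; the behaviour shown, moving each batch to the device before applying the transform, is what this PR adds):

```python
# Usage sketch of the new `device` argument; dataset and transform
# are placeholders, not part of the PR itself.
import torch
from torch.utils.data import TensorDataset

from braindecode.augmentation import AugmentedDataLoader, GaussianNoise

X = torch.randn(32, 22, 1000)  # (n_trials, n_channels, n_times)
y = torch.zeros(32, dtype=torch.long)

# With `device` set, each batch is sent to the device before the
# transformation is applied, so the augmentation itself runs on the GPU.
loader = AugmentedDataLoader(
    TensorDataset(X, y),
    transforms=[GaussianNoise(probability=0.5, std=0.1)],
    device="cuda" if torch.cuda.is_available() else "cpu",
    batch_size=16,
)

for batch_X, batch_y in loader:
    pass  # training step would go here
```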