
Bugfix - pretraining oom #348

Merged
merged 4 commits into develop from bugfix/pretraining-oom on Dec 27, 2021

Conversation

@eduardocarvp (Collaborator) commented Dec 18, 2021

What kind of change does this PR introduce?

This fixes the out-of-memory (OOM) error that occurs when using a GPU with large validation sets during pretraining.
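This is not the exact code of the PR, just a minimal sketch of the underlying idea, assuming a pretraining network whose forward pass returns the reconstruction, the embedded input, and the obfuscation mask (model, loss_fn, and the loader contents are illustrative names): compute the validation loss batch by batch so that only one batch lives on the GPU at a time.

import torch

def batched_validation_loss(model, loss_fn, valid_loader, device="cuda"):
    # Accumulate the unsupervised loss over mini-batches instead of pushing
    # the whole validation set through the GPU in a single forward pass.
    model.eval()
    total_loss, n_samples = 0.0, 0
    with torch.no_grad():
        for X in valid_loader:  # loader yields CPU tensors
            X = X.to(device)
            output, embedded_x, obf_vars = model(X)
            loss = loss_fn(output, embedded_x, obf_vars)
            total_loss += loss.item() * X.size(0)
            n_samples += X.size(0)
    return total_loss / n_samples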

Does this PR introduce a breaking change?

No.

What needs to be documented once your changes are merged?

Nothing.

Closing issues

Closes #341

PS:
I spent a lot of time trying to find out why the two losses were not giving the same answer. It turns out that, by default, numpy.std does not apply Bessel's correction, which consists of dividing by n-1 instead of n (i.e. multiplying the biased variance by n/(n-1)) to make the estimator unbiased. For that, we need to add ddof=1 on the line

batch_stds = np.std(embedded_x, axis=0, ddof=1) ** 2 + eps

and only then does the test pass.
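As a standalone illustration of the discrepancy (not the project's unit test): torch.std applies Bessel's correction by default while numpy.std does not, so the two variance estimates only match once ddof=1 is passed on the numpy side.

import numpy as np
import torch

eps = 1e-9
embedded_x = np.random.rand(128, 8).astype(np.float32)

torch_vars = torch.std(torch.from_numpy(embedded_x), dim=0) ** 2 + eps  # unbiased by default
np_vars_biased = np.std(embedded_x, axis=0) ** 2 + eps                  # ddof=0, smaller by (n-1)/n
np_vars_unbiased = np.std(embedded_x, axis=0, ddof=1) ** 2 + eps        # matches torch

assert np.allclose(torch_vars.numpy(), np_vars_unbiased, atol=1e-5)
assert not np.allclose(torch_vars.numpy(), np_vars_biased, atol=1e-5)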

Speaking of tests, the test was written, but since we do not currently have unit tests and pytest is not installed, adding it would change the poetry lock and upgrade quite a lot of packages. We should probably do this in another PR and add the unit tests to the gitlab-ci on that occasion.

@Optimox (Collaborator) commented Dec 20, 2021

Thanks @eduardocarvp,

It seems to be exactly what we needed.

I was thinking about the cases where the std is 0; this happens every time a feature has a single unique value.

Since all of this is just about normalization, we could maybe fill the zero values of the std with the absolute value of the unique value (or 1 when the unique value is 0), as in the sketch below.
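A rough sketch of that fallback (not merged code; safe_batch_stds is a hypothetical helper and embedded_x is assumed to be the dense embedded batch):

import numpy as np

def safe_batch_stds(embedded_x, eps=1e-9):
    # Replace zero stds (constant columns) with the absolute value of the
    # constant, and with 1 when that constant is itself 0.
    stds = np.std(embedded_x, axis=0, ddof=1)
    constant_cols = stds == 0
    fallback = np.abs(embedded_x[0, constant_cols])  # the unique value of each constant column
    fallback[fallback == 0] = 1.0
    stds[constant_cols] = fallback
    return stds ** 2 + eps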

This might be done in another PR, but I'd like to have it in the next release, as the current behaviour causes a lot of confusion during pretraining.

@eduardocarvp (Collaborator, Author) commented:

What about this, @Optimox?
The tests are working, and in the second test case we have a constant matrix. Do you think it's a good idea to update the package versions before the release? I'd like to add pytest to the dev dependencies; other than that, simply upgrading the versions might be a good idea if it does not break anything.

@Optimox (Collaborator) commented Dec 21, 2021

@eduardocarvp I think we should avoid requiring unnecessarily new versions, to keep compatibility with other packages. For instance, we do not need the latest PyTorch version; it would be a shame to force torch=1.10 while things work fine with version 1.5.
Otherwise, there is absolutely no problem with adding dev packages to the repo. But you should probably avoid a poetry update, which would bump all the dependencies to their latest versions.

@Optimox merged commit 95e0e9a into develop on Dec 27, 2021
@Optimox deleted the bugfix/pretraining-oom branch on December 27, 2021 at 18:05
Labels: bug