Use a Generator object to specify the seed for the random embeddings. #525
Conversation
Code Review
This pull request effectively addresses a potential race condition in multi-device inference by replacing the global PyTorch random seed with a thread-safe torch.Generator object. The changes are well-implemented and the accompanying tests for multi-device consistency are a great addition. I've found one minor issue: a leftover debugging statement that should be removed.
Force-pushed from 0084066 to 8978833.
Currently we set the global PyTorch random seed. When multi-device inference is enabled, this can result in a race condition because the global seed is shared between threads. Use a Generator instead to avoid this.

Exporting the generator is not supported by ONNX, so don't use it during tracing. This means that the random embeddings will be different from those used during training, which may affect the performance of the model. However, the previous method for fixing the seed was silently ignored by ONNX export, so it doesn't make things worse.

Testing:
- Save the sampled embeddings from the old and the new versions and check they are equal.
- Existing consistency checks, plus the additional two I have added to verify multi-device inference.
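As a rough illustration of the change described above (not the project's actual code; the function name, shapes, and parameters are made up), sampling the random embeddings from a locally constructed torch.Generator instead of the globally seeded RNG might look like this:

```python
import torch

def sample_random_embeddings(num_features: int, emb_dim: int,
                             seed: int, device: torch.device) -> torch.Tensor:
    """Illustrative sketch: fixed random embeddings without touching the global RNG."""
    # The Generator is local to this call, so concurrent inference threads on
    # different devices no longer race on torch.manual_seed().
    generator = torch.Generator(device=device)
    generator.manual_seed(seed)
    return torch.randn(num_features, emb_dim, generator=generator, device=device)
```

Because each call builds its own Generator on the target device, two threads running inference on different devices draw their embeddings independently of whatever the global seed happens to be at that moment.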
hey @LeoGrin, I updated the implementation a bit to disable the seed during onnx export. This fixes the tests, and adds a warning that the seed won't be set correctly. What do you think?
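A hedged sketch of what such a tracing guard could look like (the helper name and warning text are assumptions, not the actual implementation): a torch.Generator cannot be captured by ONNX export, so the generator is skipped while tracing and a warning is emitted instead.

```python
import warnings
import torch

def maybe_embedding_generator(seed: int, device: torch.device):
    """Hypothetical helper: return a seeded Generator, or None during tracing/ONNX export."""
    if torch.jit.is_tracing():
        warnings.warn(
            "ONNX export/tracing detected: the random-embedding seed cannot be fixed, "
            "so embeddings may differ from those used during training."
        )
        return None  # torch.randn(..., generator=None) falls back to the default RNG
    generator = torch.Generator(device=device)
    generator.manual_seed(seed)
    return generator
```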
LeoGrin left a comment
LGTM! Just one question to check I'm understanding things correctly: this should give us the exact same random output as before the change, right? (I think yes, because the consistency tests are passing?)
yes, and I saved + compared the actual embedding tensors before and after the change as well. But I'll do that one more time before merging just in case haha
I did the following on macOS MPS, CPU, and CUDA:
It all passed, so seems safe to merge.
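The script itself isn't reproduced above. Purely as an illustration (not the author's actual test), a per-device check that seeding a local Generator reproduces the previously globally seeded output could look roughly like this; it assumes the two RNG streams coincide when given the same seed, and that the installed PyTorch supports Generators on the devices being tested:

```python
import torch

def rng_streams_match(seed: int = 42, device: str = "cpu") -> bool:
    # Old behaviour: seed the global RNG, then sample.
    torch.manual_seed(seed)
    old = torch.randn(1000, device=device)

    # New behaviour: seed a local Generator with the same value, then sample.
    gen = torch.Generator(device=device)
    gen.manual_seed(seed)
    new = torch.randn(1000, device=device, generator=gen)

    return torch.equal(old, new)

devices = ["cpu"]
if torch.cuda.is_available():
    devices.append("cuda")
if torch.backends.mps.is_available():
    devices.append("mps")  # Generator support on MPS requires a recent PyTorch

for dev in devices:
    print(dev, rng_streams_match(device=dev))
```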
… random embeddings. (#175)

* Record copied public PR 525

* Use a Generator object to specify the seed for the random embeddings. (#525)

Currently we set the global pytorch random seed. When multi-device inference is enabled, this can result in a race condition because the global seed is shared between threads. Use a Generator instead to avoid this.

Testing:
- Existing consistency checks, plus the additional two I have added to verify multi-device inference.
- Save the sampled embeddings from the old and the new versions and check they are equal.
- Script to compare global/generator rng output: see comment on PR

(cherry picked from commit b39d4b6)

* Port change to v2 and v2.1 single-file models.

---------

Co-authored-by: mirror-bot <mirror-bot@users.noreply.github.com>
Co-authored-by: Oscar Key <oscar@priorlabs.ai>