fix: add support for float16 metrics serialization #2915
Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have the users @BlueskyFR on file. In order for us to review and merge your code, please start the CLA process at https://determined.ai/cla. After we approve your CLA, we will update the contributors list (private) and comment |
✔️ Deploy Preview for determined-ui canceled. 🔨 Explore the source changes: f9a7e13 🔍 Inspect the deploy log: https://app.netlify.com/sites/determined-ui/deploys/613781fe03054100075a51a9 |
changes look good, thank you for the fix!
I was not yet able to test it locally though, so I am wondering if anything else will need to be changed too |
Ok. If you are not able to test it manually by the time we work out the CLA details, I'll write a test before we merge this. |
Are there build instructions somewhere so that I can test the fixed agent image locally? |
The cla-bot has been summoned, and re-checked this pull request! |
Ah, I guess not. Actually our build system is a bit of a pain. You are welcome to clone the repo and build it. On the other hand, for simple fixes like this you can hotpatch the python harness in your live cluster in a
file="$(python -c 'import determined.util as x; print(x.__file__)')"
sed -i -e '/if isinstance(obj, np.float32):/a \ return float(obj)\n if isinstance(obj, np.float16):' "$file"
That will literally just modify the source of the installed python library in the container just before running your code.
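For anyone reading along, here is a rough Python sketch of what that sed append does to the file. The `hotpatch` helper and the `sample` text are hypothetical stand-ins rather than the real contents of determined/util.py, and the indentation is an assumption since whitespace was lost in the command above. The trick is that the inserted lines close the float32 branch and open a float16 branch that reuses the existing `return float(obj)` line.

```python
def hotpatch(source: str) -> str:
    """Approximate the sed 'a' (append) command on a source string."""
    marker = "if isinstance(obj, np.float32):"
    out = []
    for line in source.splitlines():
        out.append(line)
        if line.strip() == marker:
            # Reuse the indentation of the matched line (assumed 4-space blocks).
            indent = line[: len(line) - len(line.lstrip())]
            # Close the float32 branch, then open a float16 branch that
            # falls through to the original `return float(obj)` line.
            out.append(indent + "    return float(obj)")
            out.append(indent + "if isinstance(obj, np.float16):")
    return "\n".join(out) + "\n"


# Hypothetical stand-in for the relevant lines of the installed module.
sample = (
    "def default(obj):\n"
    "    if isinstance(obj, np.float32):\n"
    "        return float(obj)\n"
    "    return obj\n"
)
patched = hotpatch(sample)
print(patched)
```

Unlike sed, this sketch only matches lines that consist of exactly the marker text, which is enough to show the idea.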
We're still talking to legal about what exactly is required in your case. At any rate, the next release is next week so there's not a huge rush here. |
I'll run some tests tomorrow then :) I also tried to guess the build commands with the Makefiles but |
@cla-bot[bot] check |
@BlueskyFR we talked to legal and you are good to go! |
@BlueskyFR have you had a chance to test this yet? In either case, please mark the PR as ready for review (so either you or I can test it and I can merge it) |
Not sure where you went but I'll just land this. |
Hi @rb-determined-ai, sorry for this: I returned to school so I didn't see the Github notifications :( |
Description
Adds support for float16 metrics serialization
Test Plan
Commentary (optional)
When running an experiment using keras with mixed precision enabled using
tf.keras.mixed_precision.set_global_policy("mixed_float16")
(TF 2.6), the metrics are serialized before being sent by socket (to the master I guess?). However, while float64 and float32 are supported, float16 is not: this patch hopefully fixes it.
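A minimal sketch of the failure mode, assuming the metrics go through a JSON-style encoder with a `default` hook (the wire format is not stated in this PR, so that part is a guess). The `to_jsonable` helper and the `metrics` dict are hypothetical and are not the code in determined.util. Note that this sketch checks against `np.floating`, which covers every NumPy float width at once, whereas the patch here adds a separate `np.float16` check.

```python
import json

import numpy as np


def to_jsonable(obj):
    """Fallback for json.dumps: convert NumPy floats to plain Python floats."""
    # np.floating is the common base class of float16, float32 and float64.
    if isinstance(obj, np.floating):
        return float(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical metric, as a mixed_float16 policy might produce.
metrics = {"loss": np.float16(0.5)}

# Without a fallback, the standard encoder rejects np.float16.
try:
    json.dumps(metrics)
    failed_without_hook = False
except TypeError:
    failed_without_hook = True

# With the fallback, the value is encoded as an ordinary JSON number.
encoded = json.dumps(metrics, default=to_jsonable)
print(failed_without_hook, encoded)
```

Checking the base class means a future width would not need another branch, at the cost of being less explicit than one check per type.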
Checklist
docs/release-notes/. See Release Note for details.