Selective mode saving #596
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Updated from 6f61222 to 929d07a
Updated from 929d07a to d4883ed
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Coverage diff, main vs. #596:
- Coverage: 74.45% → 74.46%
- Files: 182 → 182
- Lines: 18250 → 18255 (+5)
- Hits: 13588 → 13593 (+5)
- Misses: 4662 → 4662
Approving the PR to unblock merging for the release.
Looking back, the design of attaching the teacher model was not a good choice; it is creating a lot of headaches. We should handle distillation through the loss function or the trainer instead.
We don't necessarily need to remove mtd.convert, but we should stop maintaining it and use trainer- or loss-function-based distillation support instead.
QAT + distillation is a critical piece going forward, so it would be great to simplify things. If a design choice turns out to be problematic, we don't have to keep patching it; we can redirect our energy toward a better design.
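The reviewer's suggestion to handle distillation in the loss function, rather than by attaching the teacher to the model, can be illustrated with a generic sketch. This is the common temperature-softened KD loss (KL divergence scaled by T²) mixed with a hard-label cross-entropy term; it is not ModelOpt's API, and `softmax`, `kd_loss`, `combined_loss`, and the `alpha`/`temperature` defaults are hypothetical. The trainer would run the teacher forward pass itself and pass its logits in, so no teacher ends up in the model's saved state.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions, scaled by T**2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0  # zero-probability teacher classes contribute nothing
    )
    return kl * temperature**2


def combined_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=2.0):
    """alpha * KD term + (1 - alpha) * hard-label cross-entropy (at T = 1)."""
    ce = -math.log(softmax(student_logits)[label])
    return alpha * kd_loss(student_logits, teacher_logits, temperature) + (1 - alpha) * ce
```

In a trainer, `teacher_logits` would come from a separate, gradient-free teacher forward pass on the same batch; only the student is converted, quantized, and checkpointed.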
What does this PR do?
Type of change: Bug fix
Overview: Filter out the knowledge distillation (KD) state from the ModelOpt state list when saving. This allows the KD mode to be applied after restoring a ModelOpt checkpoint, without ModelOpt reporting that the mode was already applied.
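A minimal sketch of the filtering described in the overview, under stated assumptions: the saved ModelOpt state is modeled as an ordered list of `(mode_name, mode_state)` pairs, and the KD mode is keyed `"kd_loss"`. Both that layout and the helpers `filter_kd_modes` and `reapply_after_restore` are hypothetical illustrations, not the code this PR changes.

```python
from __future__ import annotations

from typing import Any

# Hypothetical name of the knowledge-distillation mode; the key ModelOpt
# actually uses is not stated in this PR and may differ.
KD_MODE_NAMES = frozenset({"kd_loss"})


def filter_kd_modes(
    state_list: list[tuple[str, dict[str, Any]]],
    excluded: frozenset[str] = KD_MODE_NAMES,
) -> list[tuple[str, dict[str, Any]]]:
    """Return a new mode list with every excluded (KD) entry dropped, order preserved."""
    return [(name, state) for name, state in state_list if name not in excluded]


def reapply_after_restore(saved: list[tuple[str, dict[str, Any]]], mode: str) -> list[str]:
    """Toy restore: rebuild the applied-mode history, then apply `mode` once more.

    Raises RuntimeError if `mode` is already in the restored history, mimicking
    the "already applied" complaint that filtering avoids.
    """
    applied = [name for name, _ in saved]
    if mode in applied:
        raise RuntimeError(f"mode {mode!r} was already applied")
    return applied + [mode]


if __name__ == "__main__":
    # Hypothetical state: one quantization entry followed by the KD entry.
    full_state = [("quantize", {"config": {}}), ("kd_loss", {"config": {}})]
    saved = filter_kd_modes(full_state)
    print(saved)                                    # [('quantize', {'config': {}})]
    print(reapply_after_restore(saved, "kd_loss"))  # ['quantize', 'kd_loss']
```

Because the KD entry never reaches the checkpoint, a restore rebuilds only the non-KD history, so re-applying KD does not trip the already-applied check.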
Usage
Testing
Before your PR is "Ready for review"
Additional Information