-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: revert multirm refactors #8962
Conversation
✅ Deploy Preview for determined-ui canceled.
|
5cfdbd5
to
36f23b3
Compare
36f23b3
to
4ce070a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #8962 +/- ##
==========================================
- Coverage 47.55% 47.48% -0.07%
==========================================
Files 1168 1168
Lines 176706 176286 -420
Branches 2356 2356
==========================================
- Hits 84026 83714 -312
+ Misses 92522 92414 -108
Partials 158 158
Flags with carried forward coverage won't be shown. Click here to find out more.
|
@@ -52,13 +52,6 @@ | |||
"maximum": 99, | |||
"default": null | |||
}, | |||
"resource_manager": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd have expected that this results in an invalid experiment config that had a section
resources:
resource_manager: 'agent'
but when I try running an experiment with a section like this it seems to launch just fine. Is this patch backwards compatible in a way I don't understand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh I'll double-check -- I ran 'git revert' on the three commits I wanted to revert, so this is unexpected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change wasn't actually publicly released. does this cause any issues..
are there places with serialize then deserialize an experiment config that are going to pick this up and then explode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@carolinaecalderon to test this you may want to run an experiment before this revert, pause it, shutdown your cluster, add this revert, recompile, restart the cluster and see if resuming the experiment works.
if that works i think this is safe even though it isn't actually backward compatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wes-turner it's possible that when you tested this I hadn't re-generated the schemas, because when I run a test following @stoksc 's proposed flow, I get this error:
ERRO[2024-03-06T15:15:46-05:00] failed to restore experiment: 1 error="cannot restore experiment 1 with unparsable config"
tldr; the change is not backwards compatible since the experiment can't resume. However, since this change was not announced to the customer and is being reverted one release later, would it be fine to remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait no, that is exactly the error I was concerned about. You launched an experiment that did not use the multirm features and upgraded and it broke.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change wasn't actually publicly released. does this cause any issues..
Nice, nice, nice. No issues, no deserialization that I know about.
Thanks. The public deprecation was exactly my concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that doing this does break experiment restore, so @carolinaecalderon is going to put up a fix. Really good you asked this @wes-turner, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just BTW, I don't think we need backwards compatibility since this wasn't publicly released.
I think we do, because the original PR was included in 0.29.0 release. To fix this, I added a new db migration to drop the 'resource_manager' column from any experiments that might have it |
7e3d222
to
7038fde
Compare
7c7aa34
to
12819be
Compare
@@ -0,0 +1,2 @@ | |||
UPDATE experiments | |||
SET config = config #- '{resource_manager}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this down migration doesn't make sense to me, can you explain it? i don't think the down migrations matter, but i'm curious.
This reverts commit c3012ff, 6857ecf, 7c6bec9.
Description
Since we're now requiring globally unique resource pool names, we can remove the refactors that added an extra Resource Manager field to structs.
Test Plan
Pass existing tests.
Commentary (optional)
Checklist
docs/release-notes/
.See Release Note for details.
Ticket
RM-73