-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Skip resource checking for unmanaged exp #9372
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9372 +/- ##
=======================================
Coverage 46.07% 46.07%
=======================================
Files 1228 1228
Lines 155900 155902 +2
Branches 2439 2439
=======================================
+ Hits 71837 71838 +1
- Misses 83872 83873 +1
Partials 191 191
Flags with carried forward coverage won't be shown. Click here to find out more.
|
✅ Deploy Preview for determined-ui ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
master/internal/core_experiment.go
Outdated
slotsPerTrial = 0 | ||
} | ||
|
||
poolName, _, err := m.ResolveResources(resources.ResourcePool(), slotsPerTrial, workspaceID, isSingleNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any reason to call resolve resources at all if the experiment is unmanaged? e.g., we have a very similar bug if someone passes a non-existent resource pool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess what makes this one so painful is that the default is 1 slot which doesn't work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, haran hit this recently as well: we don't need to validate resource pool names for unmanaged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems to be a fine fix for this, but i'd like to split the managed/unmanaged code paths even more eventually.
master/internal/core_experiment.go
Outdated
taskContainerDefaults, err := m.rm.TaskContainerDefaults( | ||
poolName, | ||
m.config.TaskContainerDefaults, | ||
) | ||
if err != nil { | ||
return nil, nil, config, nil, nil, errors.Wrapf(err, "error getting TaskContainerDefaults") | ||
} | ||
taskSpec.TaskContainerDefaults = taskContainerDefaults | ||
taskSpec.TaskContainerDefaults.MergeIntoExpConfig(&config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure we need to put this under unmanaged check. I think it's supposed to work for unmanaged experiments, and having properly merged spec might become helpful if/when we want the master to have access to checkpoints produced by the unmanaged experiment, e.g. for downloads or GC. LGTM otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I've moved taskContainerDefaults
out of the condition
85b1281
to
a61f606
Compare
Ticket
MD-385
Description
For unmanaged experiments, the slots per trial should be 0 since we are not spending any resource for it.
Test Plan
Create a master without any agent connection, run unmanaged experiments and the experiments complete successfully even though there is no computing resource available
Checklist
docs/release-notes/
.See Release Note for details.