Return plan offers wrt fleets #3300
Merged
Closes #3294
The PR changes run plan logic to return offers with respect to an optimal fleet. Effectively, run plan offers are now the same as the offers used for provisioning, plus non-available (e.g. busy, no quota) offers. If there is no suitable fleet for the run, the run plan includes offers across all project backends (as before, to show offers for the autocreated fleets flow). If `FeatureFlags.DSTACK_FF_AUTOCREATED_FLEETS_DISABLED` is set, then no suitable fleet for the run means no offers (the future behavior).

There is an extensive refactoring so that the same `find_optimal_fleet_with_offers()` function is used for getting the optimal fleet and its offers both for the run plan and during provisioning. Also split `services/runs.py` into modules since it became too big.

The `dstack offer` command works the same as before, returning all offers irrespective of fleets. This case is handled separately when generating run plan offers.
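Conceptually, sharing one selection function between the two call sites might look like the minimal sketch below. The `Offer`/`Fleet` dataclasses and the cheapest-available-offer rule are assumptions for illustration only, not dstack's actual models or selection criteria:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for the real models (assumed for this sketch).
@dataclass
class Offer:
    backend: str
    price: float
    available: bool

@dataclass
class Fleet:
    name: str
    offers: list[Offer]

def find_optimal_fleet_with_offers(
    fleets: list[Fleet],
) -> tuple[Optional[Fleet], list[Offer]]:
    """Pick the fleet with the cheapest available offer and return it
    together with all of its offers (available or not)."""
    best: Optional[Fleet] = None
    best_price = float("inf")
    for fleet in fleets:
        available = [o for o in fleet.offers if o.available]
        if not available:
            continue
        cheapest = min(o.price for o in available)
        if cheapest < best_price:
            best, best_price = fleet, cheapest
    if best is None:
        # No suitable fleet: the caller falls back to all-backend offers,
        # or to no offers under the feature flag described above.
        return None, []
    return best, sorted(best.offers, key=lambda o: o.price)

# Both the run plan and provisioning would call the same function,
# so the offers shown in the plan match the fleet actually used:
fleets = [
    Fleet("a", [Offer("aws", 1.2, True), Offer("aws", 0.9, False)]),
    Fleet("b", [Offer("gcp", 1.0, True)]),
]
fleet, offers = find_optimal_fleet_with_offers(fleets)
```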
Performance

Getting a run plan now involves getting offers for every fleet (to select an optimal fleet), so run plan time increases linearly with the number of suitable fleets when offers are in cache. For example, with aws+gcp across all regions and 10 fleets, getting offers goes from 1.5-2s to 15-20s.
Peak memory also increases linearly with the number of suitable fleets. With aws+gcp across all regions and 10 fleets, it's ~1.3GB vs ~0.5GB with one fleet (and before this PR).
A possible time/memory optimization is to limit the number of offers processed by a backend so that backends do not need to modify thousands of offers for every fleet to set disk, availability zones, etc. A caveat is that there is post-filtering that can filter out all or most of the returned offers, and the offers left out by the limit may then be needed. A sketch of this idea follows below.
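A minimal sketch of what such a cap could look like; the `prepare_offers` helper, the `OFFER_LIMIT` value, and the `enrich`/`post_filter` callables are all hypothetical. The caveat above is visible here: the cap is applied before post-filtering, so the filter may discard most of the capped offers while offers beyond the cap might have passed it.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical cap; a real value would need tuning per backend.
OFFER_LIMIT = 1000

def prepare_offers(
    raw_offers: list[T],
    enrich: Callable[[T], T],          # sets disk, availability zones, etc.
    post_filter: Callable[[T], bool],  # runs after enrichment
    limit: int = OFFER_LIMIT,
) -> list[T]:
    """Only enrich the first `limit` offers per fleet to bound time/memory.

    Caveat: post_filter may drop most of the capped offers, while offers
    beyond the cap might have passed it and would then be missing.
    """
    capped = raw_offers[:limit]
    return [o for o in map(enrich, capped) if post_filter(o)]
```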