Optimize getting offers for every fleet

After #3022 and #3300, getting offers for provisioning and for the run plan involves getting offers for every fleet (to select an optimal fleet), so the time to get offers increases linearly with the number of suitable fleets when the offers are cached. For example, with aws+gcp across all regions and 10 fleets, getting offers takes 15-20s, up from 1.5-2s.

Peak memory also increases linearly with the number of suitable fleets. With aws+gcp across all regions and 10 fleets, it's ~1.3GB vs 0.5GB with one fleet (or before these changes).

A possible time/memory optimization is to limit the number of offers processed by a backend, so that backends do not need to modify thousands of offers for every fleet to set disk, availability zones, etc. A caveat is that the post-filtering step can filter out all or most of the returned offers, in which case the offers cut off by the limit may be needed after all. Apparently, we need to replace [generic post-filtering](https://github.com/dstackai/dstack/blob/a172672c5a811e6d0879f8e4b8f9d3a09c710912/src/dstack/_internal/server/services/offers.py#L107) with backend-level filters exclusively, so that backends go over cached offers until max_offers is reached.
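A minimal sketch of the idea, not the actual dstack code: `Offer`, `iter_fleet_offers`, and `get_fleet_offers` are hypothetical names, and the offer data is made up for illustration. It assumes the backend applies the fleet's filter while iterating over cached offers and copies (to set disk, etc.) only the offers that pass, stopping once `max_offers` is reached.

```python
from dataclasses import dataclass, replace
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class Offer:
    # Hypothetical, simplified stand-in for a cached backend offer.
    backend: str
    region: str
    instance_type: str
    price: float
    disk_gb: Optional[int] = None


OfferFilter = Callable[[Offer], bool]


def iter_fleet_offers(
    cached_offers: Iterable[Offer],
    offer_filter: OfferFilter,
    disk_gb: int,
) -> Iterator[Offer]:
    """Lazily yield matching offers, customized for a single fleet."""
    for offer in cached_offers:
        # Filter first, so rejected offers are never copied or modified.
        if not offer_filter(offer):
            continue
        # Copy instead of mutating, so the shared cache stays untouched.
        yield replace(offer, disk_gb=disk_gb)


def get_fleet_offers(
    cached_offers: Iterable[Offer],
    offer_filter: OfferFilter,
    disk_gb: int,
    max_offers: int,
) -> List[Offer]:
    """Stop iterating the cache as soon as max_offers matches are found."""
    matching = iter_fleet_offers(cached_offers, offer_filter, disk_gb)
    return list(islice(matching, max_offers))


# Illustrative cache; assumed to be already sorted by price within a backend.
cache = [
    Offer("aws", "us-east-1", "m5.large", 0.096),
    Offer("gcp", "us-central1", "e2-standard-2", 0.067),
    Offer("aws", "eu-west-1", "m5.large", 0.107),
    Offer("aws", "us-east-1", "m5.xlarge", 0.192),
    Offer("aws", "us-east-1", "m5.2xlarge", 0.384),
]

fleet_offers = get_fleet_offers(
    cache,
    offer_filter=lambda o: o.backend == "aws" and o.region == "us-east-1",
    disk_gb=100,
    max_offers=2,
)
```

Because the filter runs before the limit is applied, the limit cannot hide offers that a later post-filter would have needed, and per-fleet work is bounded by the number of offers scanned until `max_offers` matches are found rather than by the size of the cache.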




Optimize getting offers for every fleet #3303
