@r4victor (Collaborator) commented Sep 2, 2025

Part of #2973
Closes #2921

This PR:

  • Implements fleet state-spec consolidation logic in the process_fleets background task. It ensures that a fleet has at least nodes.min instances and creates new instances when there are too few. Repeated consolidation attempts use exponential backoff so that instances are not retried too often (e.g. when provisioning keeps failing with no capacity because there are no offers). The consolidation logic will later be extended to implement in-place updates of cloud fleets.
  • Introduces the nodes.target property: the number of instances to provision on fleet apply. Since nodes.min is now always maintained, users need a way to provision a different number of instances initially.
  • Drops the old instance retry logic (retrying provisioning on no_capacity if profile.retry is specified).
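A minimal sketch of the consolidation idea described above, for illustration only. It is not the code in this PR: `FleetState`, `backoff_delay`, `consolidate`, and the base/cap delay values are all hypothetical names and numbers chosen for the example, and the real process_fleets task may track attempts and delays differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FleetState:
    """Hypothetical in-memory view of a fleet; not dstack's actual model."""

    nodes_min: int
    active_instances: int
    # Consecutive consolidation passes that found the fleet below nodes.min.
    consolidation_attempt: int = 0


def backoff_delay(attempt: int, base_seconds: float = 10.0, cap_seconds: float = 600.0) -> float:
    """Exponential backoff: base * 2**attempt, capped (values are assumptions)."""
    return min(cap_seconds, base_seconds * (2 ** attempt))


def consolidate(fleet: FleetState) -> tuple[int, float]:
    """Return (instances to create now, seconds to wait before the next pass)."""
    deficit = max(0, fleet.nodes_min - fleet.active_instances)
    if deficit == 0:
        # State matches the spec again, so the backoff counter is reset.
        fleet.consolidation_attempt = 0
        return 0, 0.0
    delay = backoff_delay(fleet.consolidation_attempt)
    fleet.consolidation_attempt += 1
    return deficit, delay
```

With these assumed values, a fleet that stays below nodes.min is retried after 10 s, then 20 s, then 40 s, and so on up to the cap, and the counter resets once the fleet is back at nodes.min.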

Major change: dstack now indefinitely maintains nodes.min instances in the fleet. Users who do not want fleet instances to be maintained must set nodes.min to 0 and specify the number of instances to provision initially via nodes.target.

Example

Apply configuration:

type: fleet
name: default-fleet
nodes:
  min: 1
  target: 2
  max: 3

dstack will provision two instances. After one instance is deleted, one instance is left. Deleting that last instance drops the fleet below nodes.min, so dstack re-creates it.
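The walk-through above can be traced with a small simulation. This is a sketch under simplifying assumptions: provisioning always succeeds immediately, and `simulate` is a hypothetical helper, not part of dstack.

```python
def simulate(nodes_min: int, nodes_target: int, deletions: int) -> list[int]:
    """Return the instance count after apply and after each deletion.

    Assumes every provisioning attempt succeeds, so the fleet is always
    brought back up to nodes_min before the next deletion.
    """
    count = nodes_target  # fleet apply provisions nodes.target instances
    history = [count]
    for _ in range(deletions):
        count -= 1                     # user deletes one instance
        count = max(count, nodes_min)  # consolidation restores nodes.min
        history.append(count)
    return history


# min: 1, target: 2 from the configuration above, followed by two deletions.
print(simulate(nodes_min=1, nodes_target=2, deletions=2))  # [2, 1, 1]
```

With nodes.min set to 0 instead, the same two deletions would leave the fleet empty, which matches the opt-out described under "Major change".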

@r4victor marked this pull request as ready for review September 2, 2025 06:44
@r4victor merged commit a4ae127 into master Sep 2, 2025
28 checks passed
@r4victor deleted the issue_2921_fleet_retry branch September 2, 2025 09:24
Development

Successfully merging this pull request may close these issues.

[Feature]: Fleet Retry