
[Bug]: The first fleet instance offer is retried indefinitely in case of errors #2442

@jvstme

Steps to reproduce

Simulate an unexpected exception in the create_instance method of any backend, then try creating a fleet with an instance in that backend.
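One way to simulate such an exception is to patch the method with unittest.mock. The sketch below is only an illustration: FakeCompute is a hypothetical stand-in, not dstack's actual backend class, and in a real reproduction the patch target would be the create_instance method of the backend being tested.

```python
from unittest import mock


class FakeCompute:
    """Hypothetical stand-in for a backend compute class (not dstack's real API)."""

    def create_instance(self, offer: str) -> str:
        return f"instance-for-{offer}"


def simulate_failure() -> str:
    """Patch create_instance so every call raises an unexpected exception."""
    compute = FakeCompute()
    with mock.patch.object(
        FakeCompute,
        "create_instance",
        side_effect=RuntimeError("simulated unexpected error"),
    ):
        try:
            compute.create_instance("vc2-4c-8gb")
        except RuntimeError as exc:
            # The patched method raised, as a misbehaving backend would.
            return str(exc)
    return "no error"
```

Outside the with block the original method is restored, so the patch does not leak into other code.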

Actual behaviour

dstack retries provisioning the first offer indefinitely. The instance never leaves the pending status.

Expected behaviour

If provisioning an offer fails, dstack tries the next offer. If all offers fail, the instance is terminated.
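The expected behaviour can be sketched as a loop that catches errors per offer instead of letting them abort the whole job. This is a minimal illustration under stated assumptions, not dstack's implementation: Offer, ProvisionResult, and provision_first_working_offer are hypothetical names, and the status strings are placeholders.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)


@dataclass
class Offer:
    """Hypothetical offer record; fields mirror what the logs print."""
    name: str
    region: str
    price: float


@dataclass
class ProvisionResult:
    status: str  # "provisioning" or "terminated" (placeholder values)
    offer: Optional[Offer] = None
    attempts: int = 0


def provision_first_working_offer(
    offers: Iterable[Offer],
    create_instance: Callable[[Offer], None],
) -> ProvisionResult:
    """Try offers in order; move on to the next one on any exception."""
    attempts = 0
    for offer in offers:
        attempts += 1
        try:
            create_instance(offer)
        except Exception:
            # Log and continue, so one bad offer cannot block the rest.
            logger.exception("Failed to provision %s in %s", offer.name, offer.region)
            continue
        return ProvisionResult("provisioning", offer, attempts)
    # Every offer failed (or there were none): give up on the instance.
    return ProvisionResult("terminated", None, attempts)
```

Catching the exception inside the loop is the key design choice here: in the logs below, the exception propagates out of the process_instances job, so the next scheduled run starts again from the same first offer.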

dstack version

0.19.0

Server logs

[19:06:57] INFO     dstack._internal.server.services.backends:341 Requesting instance offers from backends: ['vultr']
           DEBUG    dstack._internal.server.background.tasks.process_instances:558 Trying vc2-4c-8gb in vultr/ewr for $0.0550 per hour
           ERROR    apscheduler.executors.default:195 Job "process_instances (trigger: interval[0:00:04], next run at: 2025-03-21 19:07:02 CET)" raised an exception
                    ...
[19:07:02] INFO     dstack._internal.server.services.backends:341 Requesting instance offers from backends: ['vultr']
[19:07:05] DEBUG    dstack._internal.server.background.tasks.process_instances:558 Trying vc2-4c-8gb in vultr/ewr for $0.0550 per hour
           ERROR    apscheduler.executors.default:195 Job "process_instances (trigger: interval[0:00:04], next run at: 2025-03-21 19:07:08 CET)" raised an exception
                    ...
[19:07:08] INFO     dstack._internal.server.services.backends:341 Requesting instance offers from backends: ['vultr']
           DEBUG    dstack._internal.server.background.tasks.process_instances:558 Trying vc2-4c-8gb in vultr/ewr for $0.0550 per hour
           ERROR    apscheduler.executors.default:195 Job "process_instances (trigger: interval[0:00:04], next run at: 2025-03-21 19:07:12 CET)" raised an exception
                    ...
[19:07:12] INFO     dstack._internal.server.services.backends:341 Requesting instance offers from backends: ['vultr']
[19:07:13] DEBUG    dstack._internal.server.background.tasks.process_instances:558 Trying vc2-4c-8gb in vultr/ewr for $0.0550 per hour
           ERROR    apscheduler.executors.default:195 Job "process_instances (trigger: interval[0:00:04], next run at: 2025-03-21 19:07:17 CET)" raised an exception
                    ...
[19:07:17] INFO     dstack._internal.server.services.backends:341 Requesting instance offers from backends: ['vultr']
           DEBUG    dstack._internal.server.background.tasks.process_instances:558 Trying vc2-4c-8gb in vultr/ewr for $0.0550 per hour
           ERROR    apscheduler.executors.default:195 Job "process_instances (trigger: interval[0:00:04], next run at: 2025-03-21 19:07:21 CET)" raised an exception
                    ...
[19:07:21] INFO     dstack._internal.server.services.backends:341 Requesting instance offers from backends: ['vultr']
           DEBUG    dstack._internal.server.background.tasks.process_instances:558 Trying vc2-4c-8gb in vultr/ewr for $0.0550 per hour
           ERROR    apscheduler.executors.default:195 Job "process_instances (trigger: interval[0:00:04], next run at: 2025-03-21 19:07:26 CET)" raised an exception
                    ...
[19:07:27] INFO     dstack._internal.server.services.backends:341 Requesting instance offers from backends: ['vultr']
           DEBUG    dstack._internal.server.background.tasks.process_instances:558 Trying vc2-4c-8gb in vultr/ewr for $0.0550 per hour
           ERROR    apscheduler.executors.default:195 Job "process_instances (trigger: interval[0:00:04], next run at: 2025-03-21 19:07:32 CET)" raised an exception
                    ...
[19:07:32] INFO     dstack._internal.server.services.backends:341 Requesting instance offers from backends: ['vultr']
           DEBUG    dstack._internal.server.background.tasks.process_instances:558 Trying vc2-4c-8gb in vultr/ewr for $0.0550 per hour
           ERROR    apscheduler.executors.default:195 Job "process_instances (trigger: interval[0:00:04], next run at: 2025-03-21 19:07:36 CET)" raised an exception
                    ...

Additional information

No response

Labels

bug (Something isn't working)
