[Bug]: Short utilization_policy.time_window ignored #2384

@jvstme

Steps to reproduce

Start a run with a short utilization_policy.time_window.

> cat .dstack.yml 
type: dev-environment
ide: vscode
utilization_policy:
  min_gpu_utilization: 50
  time_window: 10s
resources:
  gpu: 1..

> dstack apply -y

Wait until time_window elapses.

Actual behaviour

The run keeps running.

Expected behaviour

The run is terminated, OR dstack rejects a time_window that is too short or too long to work properly.
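The second option (rejecting unworkable values) could look roughly like the sketch below. Everything here is hypothetical: the function name `validate_time_window` and the bounds `min_window_s` and `max_window_s` are placeholders chosen for illustration, not dstack's actual API or limits. Real bounds would need to be derived from the metrics collection interval and from how long metrics are retained.

```python
def validate_time_window(
    window_s: int,
    min_window_s: int = 60,    # hypothetical lower bound, in seconds
    max_window_s: int = 3600,  # hypothetical upper bound, in seconds
) -> int:
    """Return window_s unchanged if it is within bounds, else raise ValueError."""
    if window_s < min_window_s:
        raise ValueError(
            f"time_window must be at least {min_window_s}s, got {window_s}s"
        )
    if window_s > max_window_s:
        raise ValueError(
            f"time_window must be at most {max_window_s}s, got {window_s}s"
        )
    return window_s
```

With such a check at configuration parsing time, `time_window: 10s` would fail at `dstack apply` rather than being silently ignored.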

dstack version

master

Server logs

[23:27:01] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
           INFO     dstack._internal.server.background.tasks.process_runs:336 run(10058a)warm-badger-2: run status has changed PROVISIONING -> RUNNING
[23:27:04] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:27:09] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:27:14] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:27:19] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:27:24] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:27:28] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:27:33] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:27:38] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:27:44] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:27:49] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:27:55] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:28:01] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples
[23:28:06] DEBUG    dstack._internal.server.background.tasks.process_running_jobs:694 job(0c5465)warm-badger-2-0-0: GPU utilization check: not enough samples

Additional information

If time_window is comparable to (or shorter than) the metrics collection interval, there will never be 2 metric points within the window, so the check always reports "not enough samples".
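A minimal simulation of this effect, under assumptions that are not taken from the dstack source: metric points are collected at a fixed interval (10 seconds here, a made-up value), the check runs every 5 seconds (roughly matching the log timestamps above), and a point counts as inside the window when its timestamp falls in the half-open range (now - window, now]. All function names are hypothetical.

```python
from typing import List


def sample_times(interval: float, until: float) -> List[float]:
    """Timestamps (seconds) of metric points collected every `interval` seconds."""
    times = []
    t = 0.0
    while t <= until:
        times.append(t)
        t += interval
    return times


def points_in_window(samples: List[float], now: float, window: float) -> int:
    """Count points whose timestamp lies in (now - window, now]."""
    return sum(1 for t in samples if now - window < t <= now)


def max_points_seen(
    interval: float,
    window: float,
    check_every: float = 5.0,
    duration: float = 120.0,
) -> int:
    """Largest number of in-window points observed by any check during the run."""
    best = 0
    now = 0.0
    while now <= duration:
        visible = sample_times(interval, now)  # points collected so far
        best = max(best, points_in_window(visible, now, window))
        now += check_every
    return best


# Window equal to the collection interval: no check ever sees 2 points.
print(max_points_seen(interval=10.0, window=10.0))
# Window of three intervals: checks do see at least 2 points.
print(max_points_seen(interval=10.0, window=30.0))
```

Under these assumptions, the 10-second window never holds more than one point, while the 30-second window does, which matches the "not enough samples" message repeating indefinitely in the logs.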

A short time_window does not make sense for real workloads, but it is likely to be used when testing utilization_policy.
