7 changes: 4 additions & 3 deletions docs/docs/concepts/fleets.md
Original file line number Diff line number Diff line change
@@ -40,7 +40,7 @@ The filename must end with `.dstack.yml` (e.g. `.dstack.yml` or `fleet.dstack.ym
placement: cluster

# Terminate if idle for 3 days
termination_idle_time: 3d
idle_duration: 3d

resources:
gpu:
@@ -199,9 +199,10 @@ $ dstack fleet

Once the status of instances changes to `idle`, they can be used by dev environments, tasks, and services.

!!! info "Termination policy"
!!! info "Idle duration"
If you want a fleet to be automatically deleted after a certain idle time,
you can set the [`termination_idle_time`](../reference/dstack.yml/fleet.md#termination_idle_time) property.
you can set the [`idle_duration`](../reference/dstack.yml/fleet.md#idle_duration) property.
By default, it's set to `3d`.

[//]: # (Add Idle time example to the reference page)

9 changes: 5 additions & 4 deletions docs/docs/dev-environments.md
Original file line number Diff line number Diff line change
@@ -136,12 +136,13 @@ $ dstack apply -R -f examples/.dstack.yml

Alternatively, set [`creation_policy`](reference/dstack.yml/dev-environment.md#creation_policy) to `reuse` in the run configuration.

### Termination policy
### Idle duration

If a fleet is created automatically, it remains `idle` for 5 minutes and can be reused within that time.
If a fleet is created automatically, it stays `idle` for 5 minutes by default and can be reused within that time.
If the fleet is not reused within this period, it is automatically terminated.
To change the default idle duration, set
[`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time) in the run configuration (e.g., to 0 or a
longer duration).
[`idle_duration`](reference/dstack.yml/fleet.md#idle_duration) in the run configuration (e.g., `0s`, `1m`, or `off` for
unlimited).

!!! info "Fleets"
For greater control over fleet provisioning, configuration, and lifecycle management, it is recommended to use
9 changes: 5 additions & 4 deletions docs/docs/guides/protips.md
Original file line number Diff line number Diff line change
@@ -22,12 +22,13 @@ $ dstack apply -R -f examples/.dstack.yml

</div>

### Termination policy
### Idle duration

If a fleet is created automatically, it remains `idle` for 5 minutes and can be reused within that time.
If a fleet is created automatically, it stays `idle` for 5 minutes by default and can be reused within that time.
If the fleet is not reused within this period, it is automatically terminated.
To change the default idle duration, set
[`termination_idle_time`](../reference/dstack.yml/fleet.md#termination_idle_time) in the run configuration (e.g., to 0 or a
longer duration).
[`idle_duration`](../reference/dstack.yml/fleet.md#idle_duration) in the run configuration (e.g., `0s`, `1m`, or `off` for
unlimited).

!!! info "Fleets"
For greater control over fleet provisioning, configuration, and lifecycle management, it is recommended to use
9 changes: 5 additions & 4 deletions docs/docs/services.md
Original file line number Diff line number Diff line change
@@ -176,12 +176,13 @@ $ dstack apply -R -f examples/.dstack.yml

Alternatively, set [`creation_policy`](reference/dstack.yml/dev-environment.md#creation_policy) to `reuse` in the run configuration.

### Termination policy
### Idle duration

If a fleet is created automatically, it remains `idle` for 5 minutes and can be reused within that time.
If a fleet is created automatically, it stays `idle` for 5 minutes by default and can be reused within that time.
If the fleet is not reused within this period, it is automatically terminated.
To change the default idle duration, set
[`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time) in the run configuration (e.g., to 0 or a
longer duration).
[`idle_duration`](reference/dstack.yml/fleet.md#idle_duration) in the run configuration (e.g., `0s`, `1m`, or `off` for
unlimited).

!!! info "Fleets"
For greater control over fleet provisioning, configuration, and lifecycle management, it is recommended to use
9 changes: 5 additions & 4 deletions docs/docs/tasks.md
Original file line number Diff line number Diff line change
@@ -141,12 +141,13 @@ $ dstack apply -R -f examples/.dstack.yml

Alternatively, set [`creation_policy`](reference/dstack.yml/dev-environment.md#creation_policy) to `reuse` in the run configuration.

### Termination policy
### Idle duration

If a fleet is created automatically, it remains `idle` for 5 minutes and can be reused within that time.
If a fleet is created automatically, it stays `idle` for 5 minutes by default and can be reused within that time.
If the fleet is not reused within this period, it is automatically terminated.
To change the default idle duration, set
[`termination_idle_time`](reference/dstack.yml/fleet.md#termination_idle_time) in the run configuration (e.g., to 0 or a
longer duration).
[`idle_duration`](reference/dstack.yml/fleet.md#idle_duration) in the run configuration (e.g., `0s`, `1m`, or `off` for
unlimited).

!!! info "Fleets"
For greater control over fleet provisioning, configuration, and lifecycle management, it is recommended to use
6 changes: 2 additions & 4 deletions frontend/src/types/fleet.d.ts
Original file line number Diff line number Diff line change
@@ -86,8 +86,7 @@ declare interface IFleetConfigurationRequest {
duration?: number | string;
} | boolean;
max_price?: number;
termination_policy?: "dont-destroy" | "destroy-after-idle";
termination_idle_time?: number | string;
idle_duration?: number | string;
}

declare interface IProfileRequest {
@@ -105,8 +104,7 @@ declare interface IProfileRequest {
pool_name?: string;
instance_name?: string;
creation_policy?: "reuse" | "reuse-or-create";
termination_policy?: "dont-destroy" | "destroy-after-idle";
termination_idle_time?: number | string;
idle_duration?: number | string;
name: string;
default?: boolean;
}
1 change: 1 addition & 0 deletions src/dstack/_internal/cli/services/configurators/fleet.py
Original file line number Diff line number Diff line change
@@ -303,6 +303,7 @@ def th(s: str) -> str:
configuration_table.add_row(th("Spot policy"), spot_policy)
if reservation is not None:
configuration_table.add_row(th("Reservation"), reservation)
# TODO: [Andrey] Display "Idle duration"

offers_table = Table(box=None)
offers_table.add_column("#")
25 changes: 16 additions & 9 deletions src/dstack/_internal/cli/utils/run.py
Original file line number Diff line number Diff line change
@@ -5,11 +5,15 @@

from dstack._internal.cli.utils.common import add_row_from_dict, console
from dstack._internal.core.models.instances import InstanceAvailability
from dstack._internal.core.models.profiles import TerminationPolicy
from dstack._internal.core.models.profiles import (
DEFAULT_RUN_TERMINATION_IDLE_TIME,
TerminationPolicy,
)
from dstack._internal.core.models.runs import (
Job,
RunPlan,
)
from dstack._internal.core.services.profiles import get_termination
from dstack._internal.utils.common import DateFormatter, format_pretty_duration, pretty_date
from dstack.api import Run

@@ -25,20 +29,24 @@ def print_run_plan(run_plan: RunPlan, offers_limit: int = 3):
pretty_req = req.pretty_format(resources_only=True)
max_price = f"${req.max_price:g}" if req.max_price else "-"
max_duration = (
f"{job_plan.job_spec.max_duration / 3600:g}h" if job_plan.job_spec.max_duration else "-"
format_pretty_duration(job_plan.job_spec.max_duration)
if job_plan.job_spec.max_duration
else "-"
)
if job_plan.job_spec.retry is None:
retry = "no"
retry = "-"
else:
retry = escape(job_plan.job_spec.retry.pretty_format())

profile = run_plan.run_spec.merged_profile
creation_policy = profile.creation_policy
termination_policy = profile.termination_policy
termination_policy, termination_idle_time = get_termination(
profile, DEFAULT_RUN_TERMINATION_IDLE_TIME
)
if termination_policy == TerminationPolicy.DONT_DESTROY:
termination_idle_time = "-"
idle_duration = "-"
else:
termination_idle_time = format_pretty_duration(profile.termination_idle_time)
idle_duration = format_pretty_duration(termination_idle_time)

if req.spot is None:
spot_policy = "auto"
@@ -60,9 +68,8 @@ def th(s: str) -> str:
props.add_row(th("Spot policy"), spot_policy)
props.add_row(th("Retry policy"), retry)
props.add_row(th("Creation policy"), creation_policy)
props.add_row(th("Termination policy"), termination_policy)
props.add_row(th("Termination idle time"), termination_idle_time)
props.add_row(th("Reservation"), run_plan.run_spec.configuration.reservation)
props.add_row(th("Idle duration"), idle_duration)
props.add_row(th("Reservation"), run_plan.run_spec.configuration.reservation or "-")

offers = Table(box=None)
offers.add_column("#")
25 changes: 18 additions & 7 deletions src/dstack/_internal/core/models/fleets.py
Original file line number Diff line number Diff line change
@@ -13,13 +13,13 @@
from dstack._internal.core.models.instances import InstanceOfferWithAvailability, SSHKey
from dstack._internal.core.models.pools import Instance
from dstack._internal.core.models.profiles import (
DEFAULT_POOL_TERMINATION_IDLE_TIME,
Profile,
ProfileParams,
ProfileRetry,
SpotPolicy,
TerminationPolicy,
parse_duration,
parse_idle_duration,
)
from dstack._internal.core.models.resources import Range, ResourcesSpec

@@ -172,18 +172,33 @@ class InstanceGroupParams(CoreModel):
Optional[float],
Field(description="The maximum instance price per hour, in dollars", gt=0.0),
] = None

idle_duration: Annotated[
Optional[Union[Literal["off"], str, int]],
Field(
description="Time to wait before terminating idle instances. Defaults to `5m` for runs and `3d` for fleets. Use `off` for unlimited duration"
Collaborator

Not sure if `off` is better than `-1`. To me, `off` is ambiguous: does it mean “no duration” (= unlimited, = `-1`, = don't destroy) or “no idle” (= `0`, = destroy immediately after use)? I personally would prefer `-1` for “never destroy”, `0` for “destroy immediately”, and `> 0` for “destroy if idle for N days (minutes, hours, …)”.

Contributor Author

In general, I agree that `-1` is potentially better than `off`. On the other hand:
a) We already support `off` with `max_duration`
b) `-1` is a bit too technical
Perhaps we could later support `-1` too, as a synonym for `off`. Okay?

Collaborator

a) Ah, I see
b) Agree, but as far as I understand, `-1` already works the same way as `off`; it's just not documented. We can leave it as is for now.

),
]
# Deprecated:
termination_policy: Annotated[
Optional[TerminationPolicy],
Field(description="The policy for instance termination. Defaults to `destroy-after-idle`"),
Field(
description="Deprecated in favor of `idle_duration`",
),
] = None
termination_idle_time: Annotated[
Optional[Union[str, int]],
Field(description="Time to wait before destroying idle instances. Defaults to `3d`"),
Field(
description="Deprecated in favor of `idle_duration`",
),
] = None

_validate_termination_idle_time = validator(
"termination_idle_time", pre=True, allow_reuse=True
)(parse_duration)
_validate_idle_duration = validator("idle_duration", pre=True, allow_reuse=True)(
parse_idle_duration
)


class FleetProps(CoreModel):
@@ -224,10 +239,6 @@ def _merged_profile(cls, values) -> Dict:
merged_profile.spot_policy = SpotPolicy.ONDEMAND
if merged_profile.retry is None:
merged_profile.retry = False
if merged_profile.termination_policy is None:
merged_profile.termination_policy = TerminationPolicy.DESTROY_AFTER_IDLE
if merged_profile.termination_idle_time is None:
merged_profile.termination_idle_time = DEFAULT_POOL_TERMINATION_IDLE_TIME
values["merged_profile"] = merged_profile
return values

24 changes: 21 additions & 3 deletions src/dstack/_internal/core/models/profiles.py
Original file line number Diff line number Diff line change
@@ -39,11 +39,18 @@ def parse_duration(v: Optional[Union[int, str]]) -> Optional[int]:


def parse_max_duration(v: Optional[Union[int, str]]) -> Optional[Union[str, int]]:
# TODO: [Andrey] Not sure this works (see `parse_idle_duration`)
if v == "off":
return v
return parse_duration(v)


def parse_idle_duration(v: Optional[Union[int, str]]) -> Optional[Union[str, int]]:
if v is False:
return -1
return parse_duration(v)
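
A minimal sketch of the `off` handling above, for illustration only. It assumes the configuration is loaded by a YAML 1.1 style loader (PyYAML behaves this way), which turns an unquoted `off` into the boolean `False`; that would explain the `v is False` check. `parse_duration_sketch` and its unit table are hypothetical stand-ins, since the body of `parse_duration` is not shown in this diff.

```python
import re
from typing import Optional, Union

# Hypothetical unit table; the real `parse_duration` is not shown in this diff.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_duration_sketch(v: Optional[Union[int, str]]) -> Optional[int]:
    """Stand-in for `parse_duration`: returns seconds, or None if unset."""
    if v is None:
        return None
    if isinstance(v, int):
        return v
    match = re.fullmatch(r"(-?\d+)\s*([smhdw]?)", v.strip())
    if match is None:
        raise ValueError(f"Cannot parse duration: {v!r}")
    amount, unit = match.groups()
    # A bare number without a unit is treated as seconds (an assumption).
    return int(amount) * _UNIT_SECONDS.get(unit, 1)


def parse_idle_duration_sketch(
    v: Optional[Union[bool, int, str]],
) -> Optional[int]:
    """Mirrors the diff: boolean False (unquoted YAML `off`) becomes -1."""
    if v is False:
        return -1
    return parse_duration_sketch(v)
```

In this sketch, `False` and `"-1"` both map to `-1`, which matches the reviewer's remark that `-1` already behaves like `off`.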


class ProfileRetryPolicy(CoreModel):
retry: Annotated[bool, Field(description="Whether to retry the run on failure or not")] = False
duration: Annotated[
@@ -144,17 +151,25 @@ class ProfileParams(CoreModel):
description="The policy for using instances from the pool. Defaults to `reuse-or-create`"
),
]
idle_duration: Annotated[
Optional[Union[Literal["off"], str, int]],
Field(
description="Time to wait before terminating idle instances. Defaults to `5m` for runs and `3d` for fleets. Use `off` for unlimited duration"
),
]
# Deprecated:
termination_policy: Annotated[
Optional[TerminationPolicy],
Field(description="The policy for instance termination. Defaults to `destroy-after-idle`"),
Field(
description="Deprecated in favor of `idle_duration`",
),
]
termination_idle_time: Annotated[
Optional[Union[str, int]],
Field(
description="Time to wait before destroying the idle instance. Defaults to `5m` for `dstack run` and to `3d` for `dstack pool add`"
description="Deprecated in favor of `idle_duration`",
),
]
# Deprecated:
# The name of the pool. If not set, dstack will use the default name
pool_name: Optional[str]
# The name of the instance
@@ -168,6 +183,9 @@ class ProfileParams(CoreModel):
_validate_termination_idle_time = validator(
"termination_idle_time", pre=True, allow_reuse=True
)(parse_duration)
_validate_idle_duration = validator("idle_duration", pre=True, allow_reuse=True)(
parse_idle_duration
)


class ProfileProps(CoreModel):
6 changes: 0 additions & 6 deletions src/dstack/_internal/core/models/runs.py
Original file line number Diff line number Diff line change
@@ -17,14 +17,12 @@
SSHConnectionParams,
)
from dstack._internal.core.models.profiles import (
DEFAULT_RUN_TERMINATION_IDLE_TIME,
CreationPolicy,
Profile,
ProfileParams,
ProfileRetryPolicy,
RetryEvent,
SpotPolicy,
TerminationPolicy,
)
from dstack._internal.core.models.repos import AnyRunRepoData
from dstack._internal.core.models.resources import ResourcesSpec
@@ -338,10 +336,6 @@ def _merged_profile(cls, values) -> Dict:
setattr(merged_profile, key, conf_val)
if merged_profile.creation_policy is None:
merged_profile.creation_policy = CreationPolicy.REUSE_OR_CREATE
if merged_profile.termination_policy is None:
merged_profile.termination_policy = TerminationPolicy.DESTROY_AFTER_IDLE
if merged_profile.termination_idle_time is None:
merged_profile.termination_idle_time = DEFAULT_RUN_TERMINATION_IDLE_TIME
values["merged_profile"] = merged_profile
return values

27 changes: 25 additions & 2 deletions src/dstack/_internal/core/services/profiles.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,11 @@
from typing import Optional
from typing import Optional, Tuple

from dstack._internal.core.models.profiles import DEFAULT_RETRY_DURATION, Profile, RetryEvent
from dstack._internal.core.models.profiles import (
DEFAULT_RETRY_DURATION,
Profile,
RetryEvent,
TerminationPolicy,
)
from dstack._internal.core.models.runs import Retry


@@ -30,3 +35,21 @@ def get_retry(profile: Profile) -> Optional[Retry]:
if profile_retry.duration is None:
profile_retry.duration = DEFAULT_RETRY_DURATION
return Retry.parse_obj(profile_retry)


def get_termination(
profile: Profile, default_termination_idle_time: int
) -> Tuple[TerminationPolicy, int]:
termination_policy = TerminationPolicy.DESTROY_AFTER_IDLE
termination_idle_time = default_termination_idle_time
if profile.termination_policy is not None:
termination_policy = profile.termination_policy
if profile.termination_idle_time is not None:
termination_idle_time = profile.termination_idle_time
if profile.idle_duration is not None and int(profile.idle_duration) < 0:
termination_policy = TerminationPolicy.DONT_DESTROY
elif profile.idle_duration is not None:
termination_idle_time = profile.idle_duration
if termination_policy == TerminationPolicy.DONT_DESTROY:
termination_idle_time = -1
return termination_policy, int(termination_idle_time)
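
To see how `get_termination` reconciles the new `idle_duration` with the deprecated fields, the sketch below replays its logic against a stand-in profile. `ProfileSketch` is a hypothetical substitute for `Profile` that carries only the three relevant fields, and the 5-minute run default is taken from the docs text in this PR rather than from the actual value of `DEFAULT_RUN_TERMINATION_IDLE_TIME`, which the diff does not show.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class TerminationPolicy(str, Enum):
    DONT_DESTROY = "dont-destroy"
    DESTROY_AFTER_IDLE = "destroy-after-idle"


@dataclass
class ProfileSketch:
    """Hypothetical stand-in for `Profile` with only the relevant fields."""

    idle_duration: Optional[int] = None
    termination_policy: Optional[TerminationPolicy] = None
    termination_idle_time: Optional[int] = None


def get_termination(
    profile: ProfileSketch, default_termination_idle_time: int
) -> Tuple[TerminationPolicy, int]:
    # Start from the defaults, then apply the deprecated fields if set.
    termination_policy = TerminationPolicy.DESTROY_AFTER_IDLE
    termination_idle_time = default_termination_idle_time
    if profile.termination_policy is not None:
        termination_policy = profile.termination_policy
    if profile.termination_idle_time is not None:
        termination_idle_time = profile.termination_idle_time
    # `idle_duration` is applied last: negative means "never destroy".
    if profile.idle_duration is not None and int(profile.idle_duration) < 0:
        termination_policy = TerminationPolicy.DONT_DESTROY
    elif profile.idle_duration is not None:
        termination_idle_time = profile.idle_duration
    if termination_policy == TerminationPolicy.DONT_DESTROY:
        termination_idle_time = -1
    return termination_policy, int(termination_idle_time)


# Assumed from the docs ("5 minutes by default"), not from the constant itself.
RUN_DEFAULT_SECONDS = 5 * 60

unset = get_termination(ProfileSketch(), RUN_DEFAULT_SECONDS)
never = get_termination(ProfileSketch(idle_duration=-1), RUN_DEFAULT_SECONDS)
both = get_termination(
    ProfileSketch(idle_duration=60, termination_idle_time=600),
    RUN_DEFAULT_SECONDS,
)
legacy_keep = get_termination(
    ProfileSketch(
        idle_duration=60, termination_policy=TerminationPolicy.DONT_DESTROY
    ),
    RUN_DEFAULT_SECONDS,
)
```

With this logic, `idle_duration` overrides a deprecated `termination_idle_time` when both are set, while a deprecated `termination_policy` of `dont-destroy` still wins over a non-negative `idle_duration`.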
Original file line number Diff line number Diff line change
@@ -19,6 +19,7 @@
)
from dstack._internal.core.models.profiles import (
DEFAULT_POOL_NAME,
DEFAULT_RUN_TERMINATION_IDLE_TIME,
CreationPolicy,
TerminationPolicy,
)
@@ -31,6 +32,7 @@
RunSpec,
)
from dstack._internal.core.models.volumes import Volume
from dstack._internal.core.services.profiles import get_termination
from dstack._internal.server.db import get_db, get_session_ctx
from dstack._internal.server.models import (
FleetModel,
@@ -499,12 +501,14 @@ def _create_instance_model_for_job(
instance_num: int,
) -> InstanceModel:
profile = run_spec.merged_profile
termination_policy = profile.termination_policy
termination_idle_time = profile.termination_idle_time
if not job_provisioning_data.dockerized:
# terminate vastai/k8s instances immediately
termination_policy = TerminationPolicy.DESTROY_AFTER_IDLE
termination_idle_time = 0
else:
termination_policy, termination_idle_time = get_termination(
profile, DEFAULT_RUN_TERMINATION_IDLE_TIME
)
instance = InstanceModel(
id=uuid.uuid4(),
name=f"{fleet_model.name}-{instance_num}",