Added support for lifecycle.started option for apps #4672

Merged
andrewnester merged 52 commits into main from feat/lifecycle-started
Apr 9, 2026

Conversation

Contributor

@andrewnester andrewnester commented Mar 6, 2026

Changes

Added support for lifecycle.started option

Why

This new option allows resources such as apps, clusters, and SQL warehouses to be deployed in a started/active state.
For apps: when this option is enabled, each bundle deploy automatically triggers a new app deployment.

Example configuration

resources:
  apps:
    myapp:
      name: my_app
      description: my_app_description
      source_code_path: ./app
      lifecycle:
        started: true

Tests

Added an acceptance test

Collaborator

eng-dev-ecosystem-bot commented Mar 6, 2026

Commit: ef5f09a

Run: 23750330860

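| | Env | 🟨 KNOWN | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
|---|---|---|---|---|---|---|---|
| 🟨 | aws linux | 7 | | 10 | 270 | 817 | 6:36 |
| 🟨 | aws windows | 7 | | 10 | 272 | 815 | 6:34 |
| 💚 | aws-ucws linux | | 7 | 10 | 366 | 733 | 7:50 |
| 💚 | aws-ucws windows | | 7 | 10 | 368 | 731 | 5:46 |
| 💚 | azure linux | | 1 | 12 | 273 | 815 | 6:23 |
| 💚 | azure windows | | 1 | 12 | 275 | 813 | 4:46 |
| 💚 | azure-ucws linux | | 1 | 12 | 371 | 729 | 7:45 |
| 💚 | azure-ucws windows | | 1 | 12 | 373 | 727 | 5:03 |
| 💚 | gcp linux | | 1 | 12 | 269 | 818 | 6:01 |
| 💚 | gcp windows | | 1 | 12 | 271 | 816 | 6:35 |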
17 interesting tests: 10 SKIP, 7 KNOWN
| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
|---|---|---|---|---|---|---|---|---|---|---|
| 🟨 TestAccept | 🟨K | 🟨K | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R |
| 🙈 TestAccept/bundle/resources/permissions | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 🟨K | 🟨K | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 🟨K | 🟨K | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 🟨K | 🟨K | 💚R | 💚R | | | | | | |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/recreate | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/ssh/connection | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
Top 20 slowest tests (at least 2 minutes):
duration env testname
4:19 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:45 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:39 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:14 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:13 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:12 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:11 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:10 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:08 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:07 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:51 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:49 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:48 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:47 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:46 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:45 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:39 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:19 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:18 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:11 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

Member

@simonfaltum simonfaltum left a comment

Review (automated, 2 agents)

Verdict: Not ready yet | 3 Critical | 3 Major | 2 Gap(Major) | 3 Nit | 1 Suggestion

[Critical] DoCreate never deploys app code when lifecycle.started=true

bundle/direct/dresources/app.go (DoCreate)

DoCreate only flips NoCompute and creates the app shell, but never calls Apps.Deploy. On first bundle deploy with started=true, the app gets compute but no actual deployment.

Suggestion: After create + wait, build deployment and call appdeploy.Deploy when started=true.

[Critical] All local-only fields Skipped, preventing DoUpdate from running

bundle/direct/dresources/app.go (OverrideChangeDesc + DoUpdate)

OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If no other app fields change, the planner never calls DoUpdate, so lifecycle.started=true has no effect. The acceptance test masks this by always changing description alongside started.

Suggestion: Model app deployment as its own actionable step, or ensure started changes produce a non-skip action.
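
To illustrate why an all-Skip change set never reaches DoUpdate, here is a simplified model of the planner decision. `ChangeAction`, `Skip`, `Update`, and `needsUpdate` are assumptions made for this sketch and do not mirror the real planner types.

```go
package main

import "fmt"

// ChangeAction is a hypothetical stand-in for the planner's per-field action.
type ChangeAction int

const (
	Skip ChangeAction = iota
	Update
)

// needsUpdate reports whether any changed field carries a non-skip action.
// If every changed field is Skip, no update step is planned at all.
func needsUpdate(changes map[string]ChangeAction) bool {
	for _, action := range changes {
		if action != Skip {
			return true
		}
	}
	return false
}

func main() {
	onlyStarted := map[string]ChangeAction{"lifecycle.started": Skip}
	withDescription := map[string]ChangeAction{
		"lifecycle.started": Skip,
		"description":       Update,
	}
	fmt.Println(needsUpdate(onlyStarted))     // false
	fmt.Println(needsUpdate(withDescription)) // true
}
```

The second case is the one the acceptance test exercises, which is why the started-only case goes unnoticed.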

[Critical] Clusters and SQL warehouses: started=true on stopped resources is a no-op

bundle/direct/dresources/cluster.go, bundle/direct/dresources/sql_warehouse.go

started is also Skipped for clusters/warehouses. Even if another field triggers DoUpdate, Clusters.Edit on a terminated cluster doesn't start it. The bundle never converges to the requested active state.

Suggestion: Plan an explicit Start step when started=true and resource is stopped.

[Major] LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle

bundle/config/resources/lifecycle.go:18-32

If Lifecycle gains new fields, LifecycleWithStarted won't inherit them. Suggestion: Embed Lifecycle in LifecycleWithStarted.

[Major] plan_test.go lost coverage breadth

bundle/phases/plan_test.go

Old test iterated ALL resource types for checkForPreventDestroy. New tests only cover 2 specific types. Suggestion: Keep a parametric test over all resource types.

[Major] No validation for lifecycle.started on unsupported resource types

bundle/config/mutator/validate_lifecycle_started.go:30-46

Setting lifecycle.started on a job in direct mode only produces a schema warning, not an error. Suggestion: Error explicitly for unsupported types.

[Gap (Major)] Acceptance test never tests started-only toggle

The test always changes description alongside started. No test for: first deploy issuing /deployments, source-only redeploys, or toggling started without other changes.

[Gap (Major)] No acceptance coverage for cluster or SQL warehouse lifecycle.started

Only the app path is tested.

[Nit] Validation error doesn't identify which resource

validate_lifecycle_started.go:40-46 - Include resource key in error message.
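
A sketch of what an error that names the offending resource could look like. The `resourceRef` type, the `supportsStarted` set, and the message wording are hypothetical; the real mutator works on the bundle configuration and returns diagnostics rather than plain errors.

```go
package main

import "fmt"

// supportsStarted lists the resource types assumed, for this sketch only,
// to accept lifecycle.started.
var supportsStarted = map[string]bool{"apps": true}

// resourceRef is a hypothetical flattened view of one configured resource.
type resourceRef struct {
	Type    string // e.g. "jobs"
	Key     string // e.g. "my_job"
	Started *bool  // nil when lifecycle.started is omitted
}

// validateStarted returns one error per resource that sets lifecycle.started
// on an unsupported type, including the resource path in the message.
func validateStarted(refs []resourceRef) []error {
	var errs []error
	for _, r := range refs {
		if r.Started == nil || supportsStarted[r.Type] {
			continue
		}
		errs = append(errs, fmt.Errorf(
			"resources.%s.%s: lifecycle.started is not supported for this resource type",
			r.Type, r.Key))
	}
	return errs
}

func main() {
	yes := true
	refs := []resourceRef{
		{Type: "apps", Key: "myapp", Started: &yes},
		{Type: "jobs", Key: "my_job", Started: &yes},
		{Type: "jobs", Key: "other_job"},
	}
	for _, err := range validateStarted(refs) {
		fmt.Println(err)
	}
}
```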

[Nit] Duplicate lifecycle entries in schema output

out.fields.txt - Both Lifecycle and LifecycleWithStarted show for apps/clusters/warehouses.

[Nit] Redundant zero-value assignments in RemapState

app.go:93-100 - Explicit zero values are unnecessary in Go struct init.
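
For reference, a small illustration of the point: fields left out of a Go struct literal already hold their zero values. `appState` is a hypothetical struct, not the one in `app.go`.

```go
package main

import "fmt"

// appState is a hypothetical struct used only for this illustration.
type appState struct {
	Name           string
	SourceCodePath string
	Started        bool
}

func main() {
	// Explicit zero values and omitted fields produce identical structs.
	explicit := appState{Name: "my_app", SourceCodePath: "", Started: false}
	implicit := appState{Name: "my_app"}
	fmt.Println(explicit == implicit) // true
}
```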

Contributor

@shreyas-goenka shreyas-goenka left a comment

Note: This review was posted by Claude (AI assistant).

Priority: HIGH — Several critical correctness issues

MAJOR: Clusters and SQL Warehouses started=true has no effect on subsequent deploys

For clusters, OverrideChangeDesc marks started as Skip, but DoUpdate (which calls Clusters.Edit) does NOT start a terminated cluster. There is no code path that calls Clusters.Start when started=true and the cluster is terminated. Same issue for SQL warehouses. This means lifecycle.started: true only has effect during initial creation — on subsequent deploys, a stopped resource stays stopped.

MAJOR: If only started changes on an app, DoUpdate is never called

OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If toggling only started from false→true with no other field changes, all fields get skipped and DoUpdate never fires. The acceptance test masks this by always changing description alongside started.

MAJOR: LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle

type LifecycleWithStarted struct {
    PreventDestroy bool  `json:"prevent_destroy,omitempty"`
    Started        *bool `json:"started,omitempty"`
}

Should embed Lifecycle instead:

type LifecycleWithStarted struct {
    Lifecycle
    Started *bool `json:"started,omitempty"`
}

Without this, any future fields added to Lifecycle will be silently missing from LifecycleWithStarted.
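
A small runnable check of the embedding approach, using `encoding/json` purely as an illustration. This assumes the bundle's own config loader treats embedded structs the same way, which is not verified here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Lifecycle mirrors the base lifecycle settings discussed in this review.
type Lifecycle struct {
	PreventDestroy bool `json:"prevent_destroy,omitempty"`
}

// LifecycleWithStarted embeds Lifecycle, so any field added to Lifecycle
// later is promoted into this type automatically.
type LifecycleWithStarted struct {
	Lifecycle
	Started *bool `json:"started,omitempty"`
}

// encode marshals the lifecycle block to a JSON string.
func encode(l LifecycleWithStarted) string {
	b, err := json.Marshal(l)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	started := true
	l := LifecycleWithStarted{
		Lifecycle: Lifecycle{PreventDestroy: true},
		Started:   &started,
	}
	// Embedded fields are flattened into the same JSON object.
	fmt.Println(encode(l)) // {"prevent_destroy":true,"started":true}
	// The promoted field is also accessible directly.
	fmt.Println(l.PreventDestroy) // true
}
```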

MAJOR: Field shadowing creates duplicate lifecycle schema entries

Apps, clusters, and SQL warehouses now have TWO lifecycle fields (one from BaseResource, one from the override). The schema output shows duplicate entries which is confusing. Visible in out.fields.txt:

resources.apps.*.lifecycle  resources.Lifecycle           INPUT
resources.apps.*.lifecycle  resources.LifecycleWithStarted  INPUT

MEDIUM: ILifecycle naming not idiomatic Go

The I prefix for interfaces is a Java/C# convention. Consider LifecycleConfig or similar.

MEDIUM: Lost parametric test coverage

The old TestCheckPreventDestroyForAllResources iterated over ALL resource types. The new tests only cover Job and App — significant regression in test breadth.

MEDIUM: No unit tests for ValidateLifecycleStarted

The new mutator has no corresponding test file. The error diagnostic also doesn't identify WHICH resource has the issue.

What looks good

  • appdeploy package extraction is a clean DRY improvement
  • Test server additions are thorough with proper state management
  • Schema and annotation descriptions are clear
  • The overall feature design is well thought out

Focus areas for review

  1. Cluster/warehouse update path — started=true ineffective after creation
  2. App started-only toggle — silent no-op
  3. Field embedding — LifecycleWithStarted should embed Lifecycle
  4. Test coverage restoration

@andrewnester
Contributor Author

[Critical] DoCreate never deploys app code when lifecycle.started=true
MAJOR: If only started changes on an app, DoUpdate is never called
MEDIUM: No unit tests for ValidateLifecycleStarted
[Major] No validation for lifecycle.started on unsupported resource types

All of these are expected

Contributor

@denik denik left a comment

Summary of offline discussion:

  • We should have a test where we change only the config entry from started=false to started=true and vice versa. This should trigger only a Start/Stop call, not an update call (we should record requests to confirm).
  • started=false should not be the same as started omitted. It should mean stopped, while omitted should mean "don't care about start/stop status", which is backward compatible with current behaviour.
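
The three-state semantics above could be sketched with a `*bool`, matching the `Started *bool` field shown earlier in this review. `Action` and `planStartStop` are hypothetical names and this is not the merged implementation.

```go
package main

import "fmt"

// Action is a hypothetical name for what a deploy should do about run state.
type Action string

const (
	ActionNone  Action = "none"
	ActionStart Action = "start"
	ActionStop  Action = "stop"
)

// planStartStop picks a start/stop action from the configured value and the
// current remote state. A nil started means the option was omitted, in which
// case the run state is left alone.
func planStartStop(started *bool, running bool) Action {
	switch {
	case started == nil:
		return ActionNone
	case *started && !running:
		return ActionStart
	case !*started && running:
		return ActionStop
	default:
		return ActionNone
	}
}

func main() {
	yes, no := true, false
	fmt.Println(planStartStop(nil, true))   // none
	fmt.Println(planStartStop(&yes, false)) // start
	fmt.Println(planStartStop(&no, true))   // stop
	fmt.Println(planStartStop(&yes, true))  // none
}
```

Keeping the decision separate from the update path is what lets a started-only toggle result in just a Start or Stop request.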

@@ -0,0 +1,10 @@

>>> update_file.py databricks.yml my_app_description MY_APP_DESCRIPTION
Contributor

The test is a bit difficult to read because the update operations are separated from the actual applies / assertions in out.deploy.direct.txt. Can we inline these update operations there as well? No need for an output.txt here.

deployment.Command = config.Command
}

if len(config.Env) > 0 {
Contributor

nit: is this if necessary? The for-loop already handles the empty case.
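
For context, ranging over a nil or empty map (or slice) in Go runs zero iterations, so a length guard around a plain loop changes nothing. The sketch assumes the environment is a `map[string]string`, which may not match the actual type of `config.Env`; `copyEnv` is a made-up helper.

```go
package main

import "fmt"

// copyEnv copies src into a new map. No length check is needed:
// ranging over a nil or empty map simply does nothing.
func copyEnv(src map[string]string) map[string]string {
	dst := make(map[string]string, len(src))
	for k, v := range src {
		dst[k] = v
	}
	return dst
}

func main() {
	fmt.Println(len(copyEnv(nil)))                         // 0
	fmt.Println(len(copyEnv(map[string]string{"A": "1"}))) // 1
}
```

The guard would still matter if the guarded block did more than loop, for example allocating a non-nil collection that is then serialized differently from nil.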

if app.ActiveDeployment != nil {
// The source code path in active deployment is snapshotted version of the source code path in the app.
// We need to use the default source code path to get the correct source code path for drift detection.
remote.SourceCodePath = app.DefaultSourceCodePath
Contributor

Question: why not always set SourceCodePath to app.DefaultSourceCodePath (even if app.ActiveDeployment is nil)?

Contributor Author

Because if there's no active deployment (the app wasn't started), we should not set SourceCodePath for remote, since it does not exist there; otherwise it causes drift.
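
A simplified sketch of that rule. The `app` and `deployment` types and the `remoteSourceCodePath` helper are hypothetical stand-ins for the SDK types and the remapping code, and the paths are made up.

```go
package main

import "fmt"

// deployment and app are simplified, hypothetical stand-ins for the
// remote app object and its active deployment.
type deployment struct{ SourceCodePath string }

type app struct {
	DefaultSourceCodePath string
	ActiveDeployment      *deployment
}

// remoteSourceCodePath returns the value compared against the local config
// during drift detection. Without an active deployment it stays empty; with
// one, the default path is used rather than the deployment's snapshotted path.
func remoteSourceCodePath(a app) string {
	if a.ActiveDeployment == nil {
		return ""
	}
	return a.DefaultSourceCodePath
}

func main() {
	neverDeployed := app{DefaultSourceCodePath: "/apps/my_app"}
	deployed := app{
		DefaultSourceCodePath: "/apps/my_app",
		ActiveDeployment:      &deployment{SourceCodePath: "/snapshots/abc"},
	}
	fmt.Printf("%q\n", remoteSourceCodePath(neverDeployed)) // ""
	fmt.Printf("%q\n", remoteSourceCodePath(deployed))      // "/apps/my_app"
}
```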

@andrewnester andrewnester requested a review from denik April 7, 2026 12:55
@andrewnester andrewnester requested a review from denik April 8, 2026 09:26
@simonfaltum simonfaltum removed their request for review April 8, 2026 12:29
Contributor

@denik denik left a comment

Approved assuming remaining comments are addressed.

@andrewnester andrewnester changed the title from "Added support for lifecycle.started option" to "Added support for lifecycle.started option for apps" Apr 9, 2026

github-actions bot commented Apr 9, 2026

Approved by @shreyas-goenka

See OWNERS for ownership rules.

@andrewnester andrewnester enabled auto-merge April 9, 2026 11:54
@andrewnester andrewnester added this pull request to the merge queue Apr 9, 2026
Merged via the queue into main with commit 49bb30b Apr 9, 2026
20 checks passed

5 participants