Added support for lifecycle.started option for apps#4672
Conversation
|
Commit: ef5f09a
17 interesting tests: 10 SKIP, 7 KNOWN
Top 20 slowest tests (at least 2 minutes):
|
simonfaltum
left a comment
There was a problem hiding this comment.
Review (automated, 2 agents)
Verdict: Not ready yet | 3 Critical | 3 Major | 2 Gap(Major) | 3 Nit | 1 Suggestion
[Critical] DoCreate never deploys app code when lifecycle.started=true
bundle/direct/dresources/app.go (DoCreate)
DoCreate only flips NoCompute and creates the app shell, but never calls Apps.Deploy. On first bundle deploy with started=true, the app gets compute but no actual deployment.
Suggestion: After create + wait, build deployment and call appdeploy.Deploy when started=true.
[Critical] All local-only fields Skipped, preventing DoUpdate from running
bundle/direct/dresources/app.go (OverrideChangeDesc + DoUpdate)
OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If no other app fields change, the planner never calls DoUpdate, so lifecycle.started=true has no effect. The acceptance test masks this by always changing description alongside started.
Suggestion: Model app deployment as its own actionable step, or ensure started changes produce a non-skip action.
[Critical] Clusters and SQL warehouses: started=true on stopped resources is a no-op
bundle/direct/dresources/cluster.go, bundle/direct/dresources/sql_warehouse.go
started is also Skipped for clusters/warehouses. Even if another field triggers DoUpdate, Clusters.Edit on a terminated cluster doesn't start it. The bundle never converges to the requested active state.
Suggestion: Plan an explicit Start step when started=true and resource is stopped.
[Major] LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle
bundle/config/resources/lifecycle.go:18-32
If Lifecycle gains new fields, LifecycleWithStarted won't inherit them. Suggestion: Embed Lifecycle in LifecycleWithStarted.
[Major] plan_test.go lost coverage breadth
bundle/phases/plan_test.go
Old test iterated ALL resource types for checkForPreventDestroy. New tests only cover 2 specific types. Suggestion: Keep a parametric test over all resource types.
[Major] No validation for lifecycle.started on unsupported resource types
bundle/config/mutator/validate_lifecycle_started.go:30-46
Setting lifecycle.started on a job in direct mode only produces a schema warning, not an error. Suggestion: Error explicitly for unsupported types.
[Gap (Major)] Acceptance test never tests started-only toggle
The test always changes description alongside started. No test for: first deploy issuing /deployments, source-only redeploys, or toggling started without other changes.
[Gap (Major)] No acceptance coverage for cluster or SQL warehouse lifecycle.started
Only the app path is tested.
[Nit] Validation error doesn't identify which resource
validate_lifecycle_started.go:40-46 - Include resource key in error message.
[Nit] Duplicate lifecycle entries in schema output
out.fields.txt - Both Lifecycle and LifecycleWithStarted show for apps/clusters/warehouses.
[Nit] Redundant zero-value assignments in RemapState
app.go:93-100 - Explicit zero values are unnecessary in Go struct init.
There was a problem hiding this comment.
Note: This review was posted by Claude (AI assistant).
Priority: HIGH — Several critical correctness issues
MAJOR: Clusters and SQL Warehouses started=true has no effect on subsequent deploys
For clusters, OverrideChangeDesc marks started as Skip, but DoUpdate (which calls Clusters.Edit) does NOT start a terminated cluster. There is no code path that calls Clusters.Start when started=true and the cluster is terminated. Same issue for SQL warehouses. This means lifecycle.started: true only has effect during initial creation — on subsequent deploys, a stopped resource stays stopped.
MAJOR: If only started changes on an app, DoUpdate is never called
OverrideChangeDesc marks started, source_code_path, config, and git_source as Skip. If toggling only started from false→true with no other field changes, all fields get skipped and DoUpdate never fires. The acceptance test masks this by always changing description alongside started.
MAJOR: LifecycleWithStarted duplicates PreventDestroy instead of embedding Lifecycle
type LifecycleWithStarted struct {
PreventDestroy bool `json:"prevent_destroy,omitempty"`
Started *bool `json:"started,omitempty"`
}Should embed Lifecycle instead:
type LifecycleWithStarted struct {
Lifecycle
Started *bool `json:"started,omitempty"`
}Without this, any future fields added to Lifecycle will be silently missing from LifecycleWithStarted.
MAJOR: Field shadowing creates duplicate lifecycle schema entries
Apps, clusters, and SQL warehouses now have TWO lifecycle fields (one from BaseResource, one from the override). The schema output shows duplicate entries which is confusing. Visible in out.fields.txt:
resources.apps.*.lifecycle resources.Lifecycle INPUT
resources.apps.*.lifecycle resources.LifecycleWithStarted INPUT
MEDIUM: ILifecycle naming not idiomatic Go
The I prefix for interfaces is a Java/C# convention. Consider LifecycleConfig or similar.
MEDIUM: Lost parametric test coverage
The old TestCheckPreventDestroyForAllResources iterated over ALL resource types. The new tests only cover Job and App — significant regression in test breadth.
MEDIUM: No unit tests for ValidateLifecycleStarted
The new mutator has no corresponding test file. The error diagnostic also doesn't identify WHICH resource has the issue.
What looks good
appdeploypackage extraction is clean DRY improvement- Test server additions are thorough with proper state management
- Schema and annotation descriptions are clear
- The overall feature design is well thought out
Focus areas for review
- Cluster/warehouse update path —
started=trueineffective after creation - App started-only toggle — silent no-op
- Field embedding —
LifecycleWithStartedshould embedLifecycle - Test coverage restoration
All of these are expected |
denik
left a comment
There was a problem hiding this comment.
Summary of offline discussion:
- we should have a test where we only change config entry from started=false to started=true and vice versa. This should only trigger Start/Stop call but not update call (we should record requests to confirm)
- started=false should not be the same as started omitted. It should mean stopped and omitted should mean "dont care about start/stop status" which is backward compatible with current behaviour.
acceptance/bundle/resources/apps/lifecycle-started/out.deploy.direct.txt
Outdated
Show resolved
Hide resolved
| @@ -0,0 +1,10 @@ | |||
|
|
|||
| >>> update_file.py databricks.yml my_app_description MY_APP_DESCRIPTION | |||
There was a problem hiding this comment.
The test is a bit difficult to read because the update operations are separated from the actual applies / assertions in out.deploy.direct.txt. Can we inline these update operations there as well? No need for an output.txt here.
acceptance/bundle/invariant/configs/app.yml.tmpl-post-deploy.sh
Outdated
Show resolved
Hide resolved
| deployment.Command = config.Command | ||
| } | ||
|
|
||
| if len(config.Env) > 0 { |
There was a problem hiding this comment.
nit: if unnecessary? for-loop will take care of it.
| if app.ActiveDeployment != nil { | ||
| // The source code path in active deployment is snapshotted version of the source code path in the app. | ||
| // We need to use the default source code path to get the correct source code path for drift detection. | ||
| remote.SourceCodePath = app.DefaultSourceCodePath |
There was a problem hiding this comment.
Question, why not always set SourceCodePath to app.DefaultSourceCodePath? (even if app.ActiveDeployment="")
There was a problem hiding this comment.
Because if there's no active deployment (the job wasn't started) we should not set SourceCodePath for remote because it does not exist, otherwise it causes the drift
denik
left a comment
There was a problem hiding this comment.
Approved assuming remaining comments are addressed.
Approved by @shreyas-goenkaSee OWNERS for ownership rules. |
Changes
Added support for lifecycle.started option
Why
This new option allows to start resources such as apps, clusters and sql warehouses in started/active state.
For apps: when this option enabled, on each bundle deploy we automatically will trigger a new app deploy
Example configuration
Tests
Added an acceptance test