Added validate mutator to surface additional bundle warnings #1352

Merged
7 commits merged into main from feature/validate-mutators on Apr 18, 2024

@andrewnester commented Apr 9, 2024

Changes

All of these validators return warnings as part of a bundle validate run.

Added 2 mutators:

  1. Check that any job_cluster_key used by a task is actually defined
  2. Check that there are files to sync as part of the deployment
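
To illustrate the first check, here is a minimal, self-contained sketch. The `Job`, `JobCluster`, and `Task` types and the `undefinedJobClusterKeys` helper are hypothetical stand-ins invented for this example; they are not the types or functions used in the PR.

```go
package main

import "fmt"

// JobCluster, Task and Job are simplified, hypothetical stand-ins for the
// job configuration; they are not the Databricks SDK types.
type JobCluster struct {
	JobClusterKey string
}

type Task struct {
	TaskKey       string
	JobClusterKey string
}

type Job struct {
	JobClusters []JobCluster
	Tasks       []Task
}

// undefinedJobClusterKeys returns one warning per task whose
// job_cluster_key does not match any entry in job_clusters.
func undefinedJobClusterKeys(name string, job Job) []string {
	defined := make(map[string]bool, len(job.JobClusters))
	for _, c := range job.JobClusters {
		defined[c.JobClusterKey] = true
	}

	var warnings []string
	for i, t := range job.Tasks {
		// Tasks without a job_cluster_key use other compute; skip them.
		if t.JobClusterKey == "" || defined[t.JobClusterKey] {
			continue
		}
		warnings = append(warnings, fmt.Sprintf(
			"job_cluster_key %s is not defined at resources.jobs.%s.tasks[%d].job_cluster_key",
			t.JobClusterKey, name, i))
	}
	return warnings
}

func main() {
	job := Job{
		JobClusters: []JobCluster{{JobClusterKey: "default_cluster"}},
		Tasks: []Task{
			{TaskKey: "etl", JobClusterKey: "high_cpu_workload_job_cluster"},
			{TaskKey: "report", JobClusterKey: "default_cluster"},
		},
	}
	for _, w := range undefinedJobClusterKeys("my_job", job) {
		fmt.Println("Warning:", w)
	}
}
```

The check only reads the configuration and returns warnings, which is what makes it a candidate for running alongside other checks.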

Also added bundle.Parallel to run them in parallel.

To make sure mutators run under bundle.Parallel do not mutate the config, introduced the new ReadOnlyMutator, ReadOnlyBundle, and ReadOnlyConfig.

Example

databricks bundle validate -p deco-staging
Warning: unknown field: new_cluster
  at resources.jobs.my_job
  in bundle.yml:24:7

Warning: job_cluster_key high_cpu_workload_job_cluster is not defined
  at resources.jobs.my_job.tasks[0].job_cluster_key
  in bundle.yml:35:28

Warning: There are no files to sync, please check your .gitignore and sync.exclude configuration
  at sync.exclude
  in bundle.yml:18:5

Name: test
Target: default
Workspace:
  Host: https://acme.databricks.com
  User: andrew.nester@databricks.com
  Path: /Users/andrew.nester@databricks.com/.bundle/test/default

Found 3 warnings

Tests

Added unit tests

func JobClusterKeyDefined() bundle.Mutator {
	return &jobClusterKeyDefined{}
}
Contributor

I like this one! Question, no immediate action needed.

We have a few more cases of references throughout our APIs. I suspect the ones customers hit most often are job_cluster_key, then task_key, and then a small long tail that includes the new environment_key. That makes me wonder how far we should go with these checks, and whether it's worthwhile making this pattern more generic.

Contributor Author

If we can define, in some general way, which fields attributes like task_key and job_cluster_key refer to (for example, a key-value map of config paths), we can make these checks generic. But I like how explicit this is, and I would rather add a separate, explicit mutator for each type of check.
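
For comparison, the generic variant floated in this thread could look roughly like the sketch below. Everything here is hypothetical (`Reference`, `checkReferences`, and the shape of the `defined` map are invented for illustration); the PR itself went with explicit, per-field mutators.

```go
package main

import "fmt"

// Reference is one use of a key somewhere in the config, for example a
// task's job_cluster_key. Hypothetical type for illustration only.
type Reference struct {
	Field string // e.g. "job_cluster_key"
	Path  string // config path where the reference appears
	Value string // the key being referenced
}

// checkReferences warns about every reference whose value is not among the
// keys defined for its field. The defined map goes from field name to the
// list of keys declared for that field.
func checkReferences(defined map[string][]string, refs []Reference) []string {
	// Index the defined keys per field for constant-time lookups.
	index := make(map[string]map[string]bool, len(defined))
	for field, keys := range defined {
		index[field] = make(map[string]bool, len(keys))
		for _, k := range keys {
			index[field][k] = true
		}
	}

	var warnings []string
	for _, r := range refs {
		// Reading from a missing (nil) inner map is safe and yields false.
		if r.Value == "" || index[r.Field][r.Value] {
			continue
		}
		warnings = append(warnings,
			fmt.Sprintf("%s %s is not defined at %s", r.Field, r.Value, r.Path))
	}
	return warnings
}

func main() {
	defined := map[string][]string{
		"job_cluster_key": {"default_cluster"},
		"task_key":        {"etl", "report"},
	}
	refs := []Reference{
		{Field: "job_cluster_key", Path: "resources.jobs.my_job.tasks[0].job_cluster_key", Value: "high_cpu_workload_job_cluster"},
		{Field: "task_key", Path: "resources.jobs.my_job.tasks[1].depends_on[0].task_key", Value: "etl"},
	}
	for _, w := range checkReferences(defined, refs) {
		fmt.Println("Warning:", w)
	}
}
```

The trade-off matches the discussion: one generic function covers every key type, but the mapping from referencing fields to defining fields has to be maintained somewhere, which is less explicit than one mutator per check.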

@@ -140,6 +141,7 @@ func newValidateCommand() *cobra.Command {
}

diags = diags.Extend(bundle.Apply(ctx, b, phases.Initialize()))
diags = diags.Extend(bundle.Apply(ctx, b, validate.Validate()))
Contributor

Shouldn't these also be applied during deploy?

Contributor

We shouldn't for the set of validation checks that perform I/O.

We can choose to run some of them at deploy time later if 1) they don't add latency, 2) the deploy would fail anyway if the validation fails.

Contributor

Point taken about performance! Alternatively, could we group all validations that affect performance together? Rather than have one folder simply called /validate/, we could have a separate folder like /slow_validations/ or something. Then other validations like JobClusterKeyDefined, and most future validations, would still be applied at deploy time.

It's important that we include these warnings in the deploy path too, since that is the common CUJ (critical user journey) for CLI users right now.

Contributor

Yeah, indeed. That would fit well with a change to what we display on deploy.
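
The split proposed in this thread, where validate runs everything and deploy runs only the checks that do not perform I/O, could be sketched like this. The `Validation` type and `forCommand` helper are hypothetical, and treating the files-to-sync check as the I/O-bound one is an assumption made for the example.

```go
package main

import "fmt"

// Validation is a hypothetical descriptor for one check.
type Validation struct {
	Name       string
	PerformsIO bool // true for checks that would add latency to a deploy
	Run        func() []string
}

// forCommand picks the checks to run: "validate" runs everything, while any
// other command (such as "deploy") skips checks that perform I/O.
func forCommand(command string, all []Validation) []Validation {
	if command == "validate" {
		return all
	}
	var fast []Validation
	for _, v := range all {
		if !v.PerformsIO {
			fast = append(fast, v)
		}
	}
	return fast
}

func main() {
	all := []Validation{
		{Name: "job_cluster_key_defined", Run: func() []string {
			return []string{"job_cluster_key high_cpu_workload_job_cluster is not defined"}
		}},
		// Assumption: this check has to touch the file system.
		{Name: "files_to_sync", PerformsIO: true, Run: func() []string {
			return []string{"There are no files to sync"}
		}},
	}
	for _, cmd := range []string{"validate", "deploy"} {
		for _, v := range forCommand(cmd, all) {
			for _, w := range v.Run() {
				fmt.Printf("[%s] %s: %s\n", cmd, v.Name, w)
			}
		}
	}
}
```

Tagging each check with a flag is one alternative to separate folders such as /slow_validations/; either way, the deploy path would keep the cheap warnings without paying for the slow ones.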

@shreyas-goenka left a comment

Looks good. Some minor comments regarding tests.

{JobClusterKey: "do-not-exist"},
},
Tasks: []jobs.Task{
{JobClusterKey: "do-not-exist"},
Contributor

It does exist here :D

@andrewnester added this pull request to the merge queue Apr 18, 2024
Merged via the queue into main with commit 27f51c7 Apr 18, 2024
5 checks passed
@andrewnester deleted the feature/validate-mutators branch April 18, 2024 15:20
pietern added a commit that referenced this pull request Apr 23, 2024
This release marks the general availability of Databricks Asset Bundles.

CLI:
 * Publish Docker images ([#1353](#1353)).
 * Add support for multi-arch Docker images ([#1362](#1362)).
 * Do not prefill https:// in prompt for Databricks Host ([#1364](#1364)).
 * Add better documentation for the `auth login` command ([#1366](#1366)).
 * Add URLs for authentication documentation to the auth command help ([#1365](#1365)).

Bundles:
 * Fix compute override for foreach tasks ([#1357](#1357)).
 * Transform artifact files source patterns in build not upload stage ([#1359](#1359)).
 * Convert between integer and float in normalization ([#1371](#1371)).
 * Disable locking for development mode ([#1302](#1302)).
 * Resolve variable references inside variable lookup fields ([#1368](#1368)).
 * Added validate mutator to surface additional bundle warnings ([#1352](#1352)).
 * Upgrade terraform-provider-databricks to 1.40.0 ([#1376](#1376)).
 * Print host in `bundle validate` when passed via profile or environment variables ([#1378](#1378)).
 * Cleanup remote file path on bundle destroy ([#1374](#1374)).
 * Add docs URL for `run_as` in error message ([#1381](#1381)).
 * Enable job queueing by default ([#1385](#1385)).
 * Added support for job environments ([#1379](#1379)).
 * Processing and completion of positional args to bundle run ([#1120](#1120)).
 * Add legacy option for `run_as` ([#1384](#1384)).

API Changes:
 * Changed `databricks lakehouse-monitors cancel-refresh` command with new required argument order.
 * Changed `databricks lakehouse-monitors create` command with new required argument order.
 * Changed `databricks lakehouse-monitors delete` command with new required argument order.
 * Changed `databricks lakehouse-monitors get` command with new required argument order.
 * Changed `databricks lakehouse-monitors get-refresh` command with new required argument order.
 * Changed `databricks lakehouse-monitors list-refreshes` command with new required argument order.
 * Changed `databricks lakehouse-monitors run-refresh` command with new required argument order.
 * Changed `databricks lakehouse-monitors update` command with new required argument order.
 * Changed `databricks account workspace-assignment update` command to return response.

OpenAPI commit 94684175b8bd65f8701f89729351f8069e8309c9 (2024-04-11)

Dependency updates:
 * Bump github.com/databricks/databricks-sdk-go from 0.37.0 to 0.38.0 ([#1361](#1361)).
 * Bump golang.org/x/net from 0.22.0 to 0.23.0 ([#1380](#1380)).
@pietern mentioned this pull request Apr 23, 2024
github-merge-queue bot pushed a commit that referenced this pull request Apr 23, 2024