Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dynamic configuration model in bundles #1098

Merged
merged 142 commits into from
Feb 16, 2024
Merged

Use dynamic configuration model in bundles #1098

merged 142 commits into from
Feb 16, 2024

Conversation

pietern
Copy link
Contributor

@pietern pietern commented Jan 4, 2024

Changes

This is a fundamental change to how we load and process bundle configuration. We now depend on the configuration being represented as a dyn.Value. This representation is functionally equivalent to Go's any (it is variadic) and allows us to capture metadata associated with a value, such as where it was defined (e.g. file, line, and column). It also allows us to represent Go's zero values properly (e.g. empty string, integer equal to 0, or boolean false).

Using this representation allows us to let the configuration model deviate from the typed structure we have been relying on so far (config.Root). We need to deviate from these types when using variables for fields that are not a string themselves. For example, using ${var.num_workers} for an integer workers field was impossible until now (though not implemented in this change).

The loader for a dyn.Value includes functionality to capture any and all type mismatches between the user-defined configuration and the expected types. These mismatches can be surfaced as validation errors in future PRs.

Given that many mutators expect the typed struct to be the source of truth, this change converts between the dynamic representation and the typed representation on mutator entry and exit. Existing mutators can continue to modify the typed representation and these modifications are reflected in the dynamic representation (see MarkMutatorEntry and MarkMutatorExit in bundle/config/root.go).

Required changes included in this change:

  • The existing interpolation package is removed in favor of libs/dyn/dynvar.
  • Functionality to merge job clusters, job tasks, and pipeline clusters are now all broken out into their own mutators.

To be implemented later:

  • Allow variable references for non-string types.
  • Surface diagnostics about the configuration provided by the user in the validation output.
  • Some mutators use a resource's configuration file path to resolve related relative paths. These depend on bundle/config/paths.Path being set and populated through ConfigureConfigFilePath. Instead, they should interact with the dynamically typed configuration directly. Doing this also unlocks being able to differentiate different base paths used within a job (e.g. a task override with a relative path defined in a directory other than the base job).

Tests

  • Existing unit tests pass (some have been modified to accommodate)
  • Integration tests pass

This change adds:
* A `config.Walk` function to walk a configuration tree
* A `config.Path` type to represent a value's path inside a tree
* Functions to create a `config.Path` from a string, or convert one to a string
This feature supports variable lookups in a `dyn.Value` that are present in the
type but haven't been initialized with a value.

For example: `${bundle.git.origin_url}` is present in the `dyn.Value` only if
it was assigned a value. If it wasn't assigned a value it should resolve to the
empty string. This normalization option, when set, ensures that all fields that
are represented in the specified type are present in the return value.
github-merge-queue bot pushed a commit that referenced this pull request Feb 15, 2024
…#1211)

## Changes

This feature supports variable lookups in a `dyn.Value` that are present
in the type but haven't been initialized with a value.

For example: `${bundle.git.origin_url}` is present in the `dyn.Value`
only if it was assigned a value. If it wasn't assigned a value it should
resolve to the empty string. This normalization option, when set,
ensures that all fields that are represented in the specified type are
present in the return value.

This change is in support of #1098.

## Tests

Added unit test.
github-merge-queue bot pushed a commit that referenced this pull request Feb 16, 2024
#1217)

## Changes

When resolving a value returned by the lookup function, the code would
call into `resolveRef` with the key that `resolveKey` was called with.
In doing so, it would cache the _new_ ref under that key.

We fix this by caching ref resolution only at the top level and relying
on lookup caching to avoid duplicate work.

This came up while testing #1098.

## Tests

Unit test.
@pietern pietern added this pull request to the merge queue Feb 16, 2024
Merged via the queue into main with commit 87dd46a Feb 16, 2024
4 checks passed
@pietern pietern deleted the bundle-use-dyn branch February 16, 2024 19:49
github-merge-queue bot pushed a commit that referenced this pull request Feb 16, 2024
## Changes

This builds on #1098 and uses the `dyn.Value` representation of the
bundle configuration to generate the Terraform JSON definition of
resources in the bundle.

The existing code (in `BundleToTerraform`) was not great and in an
effort to slightly improve this, I added a package `tfdyn` that includes
dedicated files for each resource type. Every resource type has its own
conversion type that takes the `dyn.Value` of the bundle-side resource
and converts it into Terraform resources (e.g. a job and optionally its
permissions).

Because we now use a `dyn.Value` as input, we can represent and emit
zero-values that have so far been omitted. For example, setting
`num_workers: 0` in your bundle configuration now propagates all the way
to the Terraform JSON definition.

## Tests

* Unit tests for every converter. I reused the test inputs from
`convert_test.go`.
* Equivalence tests in every existing test case checks that the
resulting JSON is identical.
* I manually compared the TF JSON file generated by the CLI from the
main branch and from this PR on all of our bundles and bundle examples
(internal and external) and found the output doesn't change (with the
exception of the odd zero-value being included by the version in this
PR).
andrewnester added a commit that referenced this pull request Feb 20, 2024
CLI:
 * Add support for UC Volumes to the `databricks fs` commands ([#1209](#1209)).

Bundles:
 * Use dynamic configuration model in bundles ([#1098](#1098)).
 * Allow use of variables references in primitive non-string fields ([#1219](#1219)).
 * Add an experimental default-sql template ([#1051](#1051)).
 * Add an experimental dbt-sql template ([#1059](#1059)).

Internal:
 * Add fork-user to winget release workflow ([#1214](#1214)).
 * Use `any` as type for data sources and resources in `tf/schema` ([#1216](#1216)).
 * Avoid infinite recursion when normalizing a recursive type ([#1213](#1213)).
 * Fix issue where interpolating a new ref would rewrite unrelated fields ([#1217](#1217)).
 * Use `dyn.Value` as input to generating Terraform JSON ([#1218](#1218)).

API Changes:
 * Changed `databricks lakehouse-monitors update` command with new required argument order.
 * Added `databricks online-tables` command group.

OpenAPI commit cdd76a98a4fca7008572b3a94427566dd286c63b (2024-02-19)
Dependency updates:
 * Bump Terraform provider to v1.36.2 ([#1215](#1215)).
 * Bump github.com/databricks/databricks-sdk-go from 0.32.0 to 0.33.0 ([#1222](#1222)).
@andrewnester andrewnester mentioned this pull request Feb 20, 2024
github-merge-queue bot pushed a commit that referenced this pull request Feb 20, 2024
CLI:
* Add support for UC Volumes to the `databricks fs` commands
([#1209](#1209)).

Bundles:
* Use dynamic configuration model in bundles
([#1098](#1098)).
* Allow use of variables references in primitive non-string fields
([#1219](#1219)).
* Add an experimental default-sql template
([#1051](#1051)).
* Add an experimental dbt-sql template
([#1059](#1059)).

Internal:
* Add fork-user to winget release workflow
([#1214](#1214)).
* Use `any` as type for data sources and resources in `tf/schema`
([#1216](#1216)).
* Avoid infinite recursion when normalizing a recursive type
([#1213](#1213)).
* Fix issue where interpolating a new ref would rewrite unrelated fields
([#1217](#1217)).
* Use `dyn.Value` as input to generating Terraform JSON
([#1218](#1218)).

API Changes:
* Changed `databricks lakehouse-monitors update` command with new
required argument order.
 * Added `databricks online-tables` command group.

OpenAPI commit cdd76a98a4fca7008572b3a94427566dd286c63b (2024-02-19)
Dependency updates:
* Bump Terraform provider to v1.36.2
([#1215](#1215)).
* Bump github.com/databricks/databricks-sdk-go from 0.32.0 to 0.33.0
([#1222](#1222)).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants