Persist deployment metadata in WSFS #845

Merged
merged 36 commits into from
Oct 27, 2023

Conversation

@shreyas-goenka shreyas-goenka commented Oct 6, 2023

Changes

This PR introduces a metadata struct that stores the subset of bundle configuration we want to expose to other Databricks services that integrate with bundles.

The metadata is uploaded as the file ${bundle.workspace.state_path}/metadata.json in the WSFS destination of the bundle deployment.

Documentation for emitted metadata fields:

  • version: Version of the metadata file schema.
  • config.bundle.git.branch: Name of the git branch the bundle was deployed from.
  • config.bundle.git.origin_url: URL of the git remote "origin".
  • config.bundle.git.bundle_root_path: Relative path of the bundle root from the root of the git repository. Set to "." if they are the same.
  • config.bundle.git.commit: SHA-1 hash of the commit this bundle was deployed from. Note that the deployment might not exactly match this commit if there were uncommitted changes at deploy time.
  • config.workspace.file_path: Path in the workspace that bundle files are synced to.
  • config.resources.jobs.[job-ref].id: ID of the job.
  • config.resources.jobs.[job-ref].relative_path: Relative path, from the bundle root, of the YAML config file where this job was defined.
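
As a rough illustration of the schema described by the field list above, the following Go sketch models the documented keys with JSON struct tags and serializes a sample value. The type, field, and function names (Metadata, Git, Job, sample, encode) and the sample values are assumptions made for this example and are not taken from the PR's code; only the JSON keys come from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Git holds the documented config.bundle.git.* fields.
// Hypothetical type name; only the JSON keys come from the PR description.
type Git struct {
	Branch         string `json:"branch"`
	OriginURL      string `json:"origin_url"`
	Commit         string `json:"commit"`
	BundleRootPath string `json:"bundle_root_path"`
}

// Job holds the documented per-job fields.
type Job struct {
	ID           string `json:"id"`
	RelativePath string `json:"relative_path"`
}

// Metadata is an illustrative model of metadata.json.
type Metadata struct {
	Version int `json:"version"`
	Config  struct {
		Bundle struct {
			Git Git `json:"git"`
		} `json:"bundle"`
		Workspace struct {
			FilePath string `json:"file_path"`
		} `json:"workspace"`
		Resources struct {
			Jobs map[string]Job `json:"jobs"`
		} `json:"resources"`
	} `json:"config"`
}

// sample builds a value with placeholder data for the demonstration.
func sample() Metadata {
	var m Metadata
	m.Version = 1
	m.Config.Bundle.Git = Git{
		Branch:         "master",
		OriginURL:      "www.host.com",
		Commit:         "7af8e5d3f5dceffff9295d42d21606ccf056dce0",
		BundleRootPath: ".",
	}
	m.Config.Workspace.FilePath = "/Users/someone@example.com/.bundle/demo/default/files"
	m.Config.Resources.Jobs = map[string]Job{
		"bar": {ID: "245921165354846", RelativePath: "databricks.yml"},
	}
	return m
}

// encode renders the metadata as indented JSON.
func encode(m Metadata) (string, error) {
	b, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encode(sample())
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```

Keeping job IDs as strings in this sketch follows the examples in this description, where the IDs appear as quoted values.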

Example metadata object when bundle root and git root are the same:

{
  "version": 1,
  "config": {
    "bundle": {
      "lock": {},
      "git": {
        "branch": "master",
        "origin_url": "www.host.com",
        "commit": "7af8e5d3f5dceffff9295d42d21606ccf056dce0",
        "bundle_root_path": "."
      }
    },
    "workspace": {
      "file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
    },
    "resources": {
      "jobs": {
        "bar": {
          "id": "245921165354846",
          "relative_path": "databricks.yml"
        }
      }
    },
    "sync": {}
  }
}

Example metadata object when the git root is one level above the bundle root:

{
  "version": 1,
  "config": {
    "bundle": {
      "lock": {},
      "git": {
        "branch": "dev-branch",
        "origin_url": "www.my-repo.com",
        "commit": "3db46ef750998952b00a2b3e7991e31787e4b98b",
        "bundle_root_path": "pipeline-progress"
      }
    },
    "workspace": {
      "file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
    },
    "resources": {
      "jobs": {
        "bar": {
          "id": "245921165354846",
          "relative_path": "databricks.yml"
        }
      }
    },
    "sync": {}
  }
}
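
A service consuming this file could decode only the keys it needs. The sketch below is hypothetical: the names metadataView, deployedJob, parseMetadata, and supportedVersion are invented for illustration, and rejecting any version other than 1 is an assumed policy rather than something this PR specifies. Go's encoding/json ignores unknown keys by default, so objects such as "lock" and "sync" are skipped.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// supportedVersion is an assumption: this reader only understands version 1.
const supportedVersion = 1

// deployedJob is a hypothetical name for the per-job entry.
type deployedJob struct {
	ID           string `json:"id"`
	RelativePath string `json:"relative_path"`
}

// metadataView declares only the keys this reader cares about.
type metadataView struct {
	Version int `json:"version"`
	Config  struct {
		Bundle struct {
			Git struct {
				Branch         string `json:"branch"`
				Commit         string `json:"commit"`
				BundleRootPath string `json:"bundle_root_path"`
			} `json:"git"`
		} `json:"bundle"`
		Resources struct {
			Jobs map[string]deployedJob `json:"jobs"`
		} `json:"resources"`
	} `json:"config"`
}

// parseMetadata decodes the raw file contents and checks the schema version.
func parseMetadata(raw []byte) (*metadataView, error) {
	var m metadataView
	if err := json.Unmarshal(raw, &m); err != nil {
		return nil, fmt.Errorf("decode metadata: %w", err)
	}
	if m.Version != supportedVersion {
		return nil, fmt.Errorf("unsupported metadata version %d", m.Version)
	}
	return &m, nil
}

func main() {
	raw := []byte(`{
	  "version": 1,
	  "config": {
	    "bundle": {"lock": {}, "git": {"branch": "dev-branch", "commit": "3db46ef750998952b00a2b3e7991e31787e4b98b", "bundle_root_path": "pipeline-progress"}},
	    "resources": {"jobs": {"bar": {"id": "245921165354846", "relative_path": "databricks.yml"}}},
	    "sync": {}
	  }
	}`)
	m, err := parseMetadata(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for name, job := range m.Config.Resources.Jobs {
		fmt.Println(name, job.ID, m.Config.Bundle.Git.Branch)
	}
}
```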

This unblocks integration with the jobs break-glass UI for bundles.

Tests

Unit tests and integration tests.

@shreyas-goenka shreyas-goenka changed the title Emit metadata v2 Persist deployment metadata in WSFS Oct 9, 2023
@shreyas-goenka shreyas-goenka marked this pull request as ready for review October 9, 2023 12:20
@pietern pietern left a comment

Approach LGTM, most comments are cosmetic.

@sungchiu-db sungchiu-db left a comment

Fields look good!
@jwm0 we should be able to use

  1. config.bundle.git.bundle_root_path + resources.jobs.[job-ref].relative_path to derive the relative path of the job config file in a Repo.
  2. config.bundle.git.bundle_root_path + (notebook_path - file_path) to derive the relative path of a notebook in a Repo.
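
The two derivations above can be sketched in Go as follows. The function names repoJobConfigPath and repoNotebookPath are hypothetical, and the sketch assumes that "notebook_path - file_path" means stripping the workspace file_path prefix from an absolute notebook path. The sample paths are placeholders.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// repoJobConfigPath joins bundle_root_path and a job's relative_path to get
// the job config file's path relative to the git repository root.
func repoJobConfigPath(bundleRootPath, relativePath string) string {
	// path.Join cleans the result, so a bundle root of "." disappears.
	return path.Join(bundleRootPath, relativePath)
}

// repoNotebookPath strips the workspace file_path prefix from notebookPath
// and joins the remainder onto bundle_root_path.
func repoNotebookPath(bundleRootPath, filePath, notebookPath string) (string, error) {
	prefix := strings.TrimSuffix(filePath, "/") + "/"
	if !strings.HasPrefix(notebookPath, prefix) {
		return "", fmt.Errorf("notebook %q is not under file_path %q", notebookPath, filePath)
	}
	rel := strings.TrimPrefix(notebookPath, prefix)
	return path.Join(bundleRootPath, rel), nil
}

func main() {
	filePath := "/Users/someone@example.com/.bundle/pipeline-progress/default/files"

	fmt.Println(repoJobConfigPath("pipeline-progress", "databricks.yml"))

	nb, err := repoNotebookPath("pipeline-progress", filePath, filePath+"/src/notebook")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(nb)
}
```

Using the path package rather than path/filepath keeps forward slashes on every platform, which matches how workspace and repository paths are written in the metadata examples.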

@jwm0 jwm0 left a comment

LGTM for metadata content

@shreyas-goenka

Tested out the new implementation with a deploy. Here's what the metadata looks like:

{
  "version": 1,
  "config": {
    "bundle": {
      "git": {
        "branch": "my-branch-lol",
        "origin_url": "www.abc.com",
        "commit": "ed3b97a45ed7822f9963442bfb938ac36c65579b",
        "bundle_root_path": "."
      }
    },
    "workspace": {
      "file_path": "/Users/shreyas.goenka@databricks.com/.bundle/pipeline-progress/default/files"
    },
    "resources": {
      "jobs": {
        "bar": {
          "id": "1115233850898651",
          "relative_path": "databricks.yml"
        }
      }
    }
  }
}

@pietern pietern left a comment

Great work, especially on the integration test.

Remaining comments are tiny/style.

@shreyas-goenka shreyas-goenka added this pull request to the merge queue Oct 27, 2023
Merged via the queue into main with commit 5a8cd0c Oct 27, 2023
4 checks passed
@shreyas-goenka shreyas-goenka deleted the emit-metadata-v2 branch October 27, 2023 13:02
@shreyas-goenka shreyas-goenka mentioned this pull request Nov 2, 2023
shreyas-goenka added a commit that referenced this pull request Nov 2, 2023
CLI:
 * Fix URL for bundle template documentation ([#903](#903)).
 * Library to convert config.Value to Go struct ([#904](#904)).
 * Loading an empty file yields a nil ([#906](#906)).
 * Fix pattern validation for input properties ([#912](#912)).
 * Simplified code generation logic for handling path and request body parameters and JSON input ([#905](#905)).
 * Add support for multiline descriptions when using template enums ([#916](#916)).
 * Move bundle configuration filename code ([#917](#917)).
 * Add configuration normalization code ([#915](#915)).
 * Add welcome message to bundle templates ([#907](#907)).
 * Consolidate bundle configuration loader function ([#918](#918)).
 * Upload terraform state even if apply fails ([#923](#923)).
 * Use UserName instead of Id to check if identity used is a service principal ([#924](#924)).
 * `make snapshot` to build file in `.databricks/databricks` ([#927](#927)).
 * Persist deployment metadata in WSFS ([#845](#845)).
 * Run make fmt from fmt job ([#929](#929)).
 * Add override to support YAML inputs for apps ([#921](#921)).
 * Add GitHub issue templates ([#925](#925)).
 * Remove resolution of repo names against the Databricks GitHub account ([#940](#940)).
 * Fix metadata computation for empty bundle ([#939](#939)).

Bundles:
 * **FILL THIS IN MANUALLY BY MOVING RELEVANT ITEMS FROM ABOVE LIST**

Internal:
 * **FILL THIS IN MANUALLY BY MOVING RELEVANT ITEMS FROM ABOVE LIST**

API Changes:
 * Added `databricks apps` command group.
 * Added `databricks account network-policy` command group.

OpenAPI commit 5903bb39137fd76ac384b2044e425f9c56840e00 (2023-10-23)
Dependency updates:
 * Bump google.golang.org/grpc from 1.58.2 to 1.58.3 ([#920](#920)).
 * Bump the Go SDK in the CLI ([#919](#919)).
 * Bump Terraform provider to v1.29.0 ([#926](#926)).
 * Bump github.com/google/uuid from 1.3.1 to 1.4.0 ([#932](#932)).
github-merge-queue bot pushed a commit that referenced this pull request Nov 2, 2023
CLI:
* Added GitHub issue templates for CLI and DABs issues
([#925](#925)).
* Added override to support YAML inputs for apps
([#921](#921)).
* Simplified code generation logic for handling path and request body
parameters and JSON input
([#905](#905)).


Bundles:
* Fixed URL for bundle template documentation in init command help docs
([#903](#903)).
* Fixed pattern validation for input parameters in a bundle template
([#912](#912)).
* Fixed multiline description rendering for enum input parameters in
bundle templates ([#916](#916)).
* Changed production mode check for whether identity used is a service
principal to use UserName
([#924](#924)).
* Changed bundle deploy to upload partial terraform state even if
deployment fails ([#923](#923)).
* Added support for welcome messages to bundle templates
([#907](#907)).
* Added support for uploading bundle deployment metadata to WSFS
([#845](#845)).


Internal:
* Loading an empty yaml file yields a nil
([#906](#906)).
* Library to convert config.Value to Go struct
([#904](#904)).
* Remove default resolution of repo names against the Databricks GitHub
account ([#940](#940)).
* Run make fmt from fmt job
([#929](#929)).
* `make snapshot` to build file in `.databricks/databricks`
([#927](#927)).
* Add configuration normalization code
([#915](#915)).

API Changes:
 * Added `databricks apps` command group.
 * Added `databricks account network-policy` command group.

Dependency updates:
* Bump Terraform provider from v1.28.0 to v1.29.0
([#926](#926)).
* Bump the Go SDK in the CLI from v0.23 to v0.24
([#919](#919)).
* Bump google.golang.org/grpc from 1.58.2 to 1.58.3
([#920](#920)).
* Bump github.com/google/uuid from 1.3.1 to 1.4.0
([#932](#932)).

OpenAPI commit 5903bb39137fd76ac384b2044e425f9c56840e00 (2023-10-23)