[Spark] Fix databricks environment #2537

Merged
pawel-big-lebowski merged 1 commit into main from spark/fix-databricks-test on Apr 2, 2024

@pawel-big-lebowski pawel-big-lebowski commented Mar 27, 2024

Problem

The Databricks integration test is triggered manually and has not been run for some time. Meanwhile:

  • The test no longer works because the openlineage-spark jar name has changed.
  • DatabricksEnvironmentFacetBuilder depends on compileOnly classes that are no longer present in the Databricks environment, which causes a ClassNotFoundError.

Solution

  • Fix the name of the package that is uploaded to the Databricks runtime for the integration test.
  • Use reflection to access Databricks runtime properties, as they are now held in classes that are not publicly available (for example com.databricks.backend.daemon.dbutils.MountInfo).
  • Verify the Databricks property builder within the integration test.
  • Separate the environment per Databricks runtime version. If two integration tests run at the same time, they no longer share the same tables. A suffix is added to the names of all resources (such as tables and files) that were shared before.
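
The reflection-based approach in the second bullet can be illustrated with a minimal sketch. This is not the code from this pull request: the class ReflectiveAccessSketch, its helper methods, and the FakeMountInfo stand-in (with its mountPoint and source methods) are hypothetical names chosen for illustration. The idea is only that a class missing from the compile classpath is looked up by name at runtime, and its properties are read through no-argument methods, with failures turned into empty results instead of class-loading errors.

```java
import java.lang.reflect.Method;
import java.util.Optional;

/** Hypothetical sketch: read properties from classes that may be absent at runtime. */
final class ReflectiveAccessSketch {

  /** Stand-in for a runtime-only class. This is NOT the real Databricks MountInfo. */
  static final class FakeMountInfo {
    public String mountPoint() { return "/mnt/data"; }
    public String source() { return "s3://bucket/data"; }
  }

  private ReflectiveAccessSketch() {}

  /** Returns true if the named class can be found, without initializing it. */
  static boolean isClassPresent(String className) {
    try {
      Class.forName(className, false, ReflectiveAccessSketch.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /**
   * Invokes a public no-argument method by name. Returns an empty Optional if the
   * target is null, the method does not exist, access is denied, or the call fails.
   */
  static Optional<Object> invokeNoArg(Object target, String methodName) {
    if (target == null) {
      return Optional.empty();
    }
    try {
      Method method = target.getClass().getMethod(methodName);
      // Needed when the declaring class itself is not public.
      method.setAccessible(true);
      return Optional.ofNullable(method.invoke(target));
    } catch (ReflectiveOperationException | RuntimeException e) {
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    Object mount = new FakeMountInfo();
    System.out.println(invokeNoArg(mount, "mountPoint").orElse("unknown"));
    System.out.println(invokeNoArg(mount, "noSuchMethod").isPresent());
    // The result of this check depends on whether the code runs on a Databricks runtime.
    System.out.println(isClassPresent("com.databricks.backend.daemon.dbutils.MountInfo"));
  }
}
```

Because the helper never names the runtime-only type in a signature or import, the calling class can load in environments where that type is missing, which is the failure mode described in the Problem section.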

Follow up issues:


One-line summary:

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • Your comment includes a one-liner for the changelog about the specific purpose of the change (if necessary)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project

@pawel-big-lebowski pawel-big-lebowski changed the title [Spark] fix databricks environment [Spark] Fix databricks environment Mar 27, 2024
@pawel-big-lebowski pawel-big-lebowski force-pushed the spark/fix-databricks-test branch 4 times, most recently from 5837791 to 350dde6 Compare March 28, 2024 11:11
@mobuchowski mobuchowski left a comment

One nit; otherwise looks good to me.

@boring-cyborg boring-cyborg bot added the area:documentation Improvements or additions to documentation label Apr 2, 2024
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
@pawel-big-lebowski pawel-big-lebowski merged commit a391c53 into main Apr 2, 2024
68 checks passed
@pawel-big-lebowski pawel-big-lebowski deleted the spark/fix-databricks-test branch April 2, 2024 16:17
blacklight pushed a commit to blacklight/OpenLineage that referenced this pull request Apr 4, 2024
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Signed-off-by: Fabio Manganiello <fabio@manganiello.tech>
Labels
area:documentation Improvements or additions to documentation area:integration/spark