[SPARK] JobTypeJobFacet added and used in Spark #2410

mattiabertorello

Problem

Currently, a lineage backend has difficulty telling which integration produced a job (Spark, dbt, or Airflow) and has to run a regex over the _producer field to find out.
Adding the JobType facet makes this information explicit.

https://openlineage.io/docs/spec/facets/job-facets/job-type

Solution

  • Introduce the new JobTypeJobFacet.
  • Emit the new facet from the Spark integration.
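
To illustrate the difference for a lineage backend, here is a minimal sketch, not code from this PR. It assumes the facet is stored under the `jobType` key of the job facets with the `processingType`, `integration`, and `jobType` fields described in the linked spec page. The producer URL, the regex, the sample values, and both helper function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern: real producer URIs differ between integrations and versions,
# which is exactly why this approach is brittle.
_PRODUCER_RE = re.compile(r"/integration/(?P<name>[a-z]+)/?$")


def integration_from_producer(producer: str) -> Optional[str]:
    """Old approach: guess the integration by pattern-matching the producer URI."""
    match = _PRODUCER_RE.search(producer)
    return match.group("name").upper() if match else None


def integration_from_facet(event: dict) -> Optional[str]:
    """New approach: read the integration directly from the job type facet."""
    facet = event.get("job", {}).get("facets", {}).get("jobType")
    return facet.get("integration") if facet else None


# Illustrative event fragment; all values are made up for this example.
event = {
    "job": {
        "namespace": "example_namespace",
        "name": "example_job",
        "facets": {
            "jobType": {
                "_producer": "https://github.com/OpenLineage/OpenLineage/tree/1.8.0/integration/spark",
                "processingType": "BATCH",
                "integration": "SPARK",
                "jobType": "JOB",
            }
        },
    }
}

producer = event["job"]["facets"]["jobType"]["_producer"]
print(integration_from_producer(producer))  # depends on the URI layout
print(integration_from_facet(event))        # read straight from the facet
```

With the facet in place, the backend no longer depends on the layout of the producer URI, and events without the facet simply yield `None`.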

Note: All schema changes require discussion. Please link the issue for context.

  • Your change modifies the core OpenLineage model
  • Your change modifies one or more OpenLineage facets

If you're contributing a new integration, please specify the scope of the integration and how/where it has been tested (e.g., Apache Spark integration supports S3 and GCS filesystem operations, tested with AWS EMR).

One-line summary:

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • Your comment includes a one-liner for the changelog about the specific purpose of the change (if necessary)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project

Signed-off-by: Mattia Bertorello <mattia.bertorello@booking.com>
…Facet

Signed-off-by: Mattia Bertorello <mattia.bertorello@booking.com>

@pawel-big-lebowski pawel-big-lebowski left a comment

Looks good to me. I would love to see a changelog entry for this.

@mattiabertorello mattiabertorello changed the title from "JobTypeJobFacet added and used in Spark" to "[SPARK] JobTypeJobFacet added and used in Spark" Feb 7, 2024
Signed-off-by: Mattia Bertorello <mattia.bertorello@booking.com>
@boring-cyborg boring-cyborg bot added the documentation label Feb 7, 2024
@pawel-big-lebowski pawel-big-lebowski merged commit 7f1dd20 into OpenLineage:main Feb 7, 2024
23 checks passed
@mattiabertorello mattiabertorello deleted the spark/processing-type-job-facet branch February 7, 2024 10:16
Labels
documentation, integration/spark
2 participants