
[AIRFLOW] Remove redundant operator information from facets #2524

Merged


kacpermuda
Contributor

Problem

We encountered a case where a customer received an OpenLineage START event larger than 2 MB. The oversized payload was traced back to an operator with unusually long arguments and attributes, which were being included in the event's facets.

Solution

I propose changing how operator attributes are selected for inclusion in facets: switch from a denylist to an allowlist. Instead of excluding attributes known to be unimportant or large, include only attributes known to be important and compact. This way, custom operator attributes that carry a lot of data no longer inflate the event size.
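A minimal sketch of the difference between the two strategies follows. It is not the integration's actual code: `ALLOWED_OPERATOR_ATTRS`, `DENIED_OPERATOR_ATTRS`, `denylist_attrs`, `allowlist_attrs`, and `CustomOperator` are hypothetical names, and the attribute sets are illustrative assumptions. It shows why a denylist lets an unknown, large custom attribute through while an allowlist does not.

```python
import json
from typing import Any, Dict, Iterable

# Hypothetical allowlist: attributes assumed to be both useful and small.
# The real set used by the OpenLineage Airflow integration may differ.
ALLOWED_OPERATOR_ATTRS = frozenset({"task_id", "owner", "retries", "queue", "pool"})

# Hypothetical denylist illustrating the previous approach: only attributes
# someone already knew to be large or unimportant are excluded.
DENIED_OPERATOR_ATTRS = frozenset({"sql", "python_callable"})


def denylist_attrs(operator: Any, denied: Iterable[str] = DENIED_OPERATOR_ATTRS) -> Dict[str, Any]:
    """Keep every public instance attribute except the explicitly denied ones."""
    denied = set(denied)
    return {
        name: value
        for name, value in vars(operator).items()
        if not name.startswith("_") and name not in denied
    }


def allowlist_attrs(operator: Any, allowed: Iterable[str] = ALLOWED_OPERATOR_ATTRS) -> Dict[str, Any]:
    """Keep only explicitly allowed attributes that the operator actually has."""
    return {name: getattr(operator, name) for name in sorted(allowed) if hasattr(operator, name)}


class CustomOperator:
    """Hypothetical stand-in for a user-defined operator with a large attribute."""

    def __init__(self) -> None:
        self.task_id = "load_data"
        self.retries = 3
        # Custom attribute holding roughly 3 MB of data; no denylist anticipates it.
        self.huge_payload = "x" * 3_000_000


if __name__ == "__main__":
    op = CustomOperator()
    # The denylist result still contains huge_payload, so it serializes to over 3 MB.
    print("denylist bytes:", len(json.dumps(denylist_attrs(op))))
    # The allowlist result contains only task_id and retries.
    print("allowlist bytes:", len(json.dumps(allowlist_attrs(op))))
```

The trade-off is that a new, genuinely useful attribute is omitted until someone adds it to the allowlist, but the event size stays bounded no matter what custom operators store on themselves.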

One-line summary:

Remove redundant operator information from facets

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • Your comment includes a one-liner for the changelog about the specific purpose of the change (if necessary)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project

kacpermuda force-pushed the airflow/remove-redundant-task-information-from-facets branch 2 times, most recently from 510f510 to 0c55c92, March 15, 2024 10:41
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
kacpermuda force-pushed the airflow/remove-redundant-task-information-from-facets branch from 0c55c92 to a187a40, March 15, 2024 12:32
mobuchowski merged commit 6ad158f into main on Mar 15, 2024
29 checks passed
mobuchowski deleted the airflow/remove-redundant-task-information-from-facets branch on March 15, 2024 13:39
mobuchowski pushed a commit that referenced this pull request Mar 29, 2024
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
blacklight pushed a commit to blacklight/OpenLineage that referenced this pull request Apr 4, 2024
…ge#2524)

Signed-off-by: Kacper Muda <mudakacper@gmail.com>
Signed-off-by: Fabio Manganiello <fabio@manganiello.tech>
Labels
area:integration/airflow openlineage-airflow

2 participants