spark: Change SparkPropertyFacetBuilder to support recording spark run time … #2523
…config. Signed-off-by: Ruihua Wang <ruihuawang@microsoft.com>
@pawel-big-lebowski I moved the old PR here!
The old PR has already been reviewed; I copied it here to keep things clean. The old PR is listed below.
@pawel-big-lebowski I found that after these lines of code were added, not every reported OpenLineage event has a SparkPropertyFacet.
First, I think CI is failing for a different reason:
It looks to me as if there was a problem with the SparkSession at the end event, which is possible if the whole Spark context has already shut down. My understanding of the code is:
My previous comment on adding the condition was based on your comment:
I thought you wanted to add the facet for
…ception Signed-off-by: ruihuawang <ruihuawang@microsoft.com>
fc5727a to ec58be3
Signed-off-by: ruihuawang <ruihuawang@microsoft.com>
Signed-off-by: ruihuawang <ruihuawang@microsoft.com>
I apologize for my misleading comments. What we really want is for every OpenLineage event with START or COMPLETE status to carry this SparkPropertyFacet. I previously believed that SparkListenerJobStart corresponds to the OpenLineage START status and SparkListenerJobEnd to the COMPLETE status; I recently found that this was wrong, so I removed the checking logic entirely. I also changed the error handling to fix the CI: the catch block now catches the superclass RuntimeException, which handles all the cases that occur in CI.
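A minimal sketch of the error-handling idea described above, under stated assumptions: the runtime-config lookup is modeled as a hypothetical `Supplier` (standing in for "find the active SparkSession and read its runtime config"), and `RuntimeConfigReader` and `collect` are illustrative names, not the code in this PR. It only shows why catching the `RuntimeException` superclass lets the facet fall back to the static properties no matter which unchecked exception the lookup throws, for example when the session is already gone at an end event.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch: the Supplier stands in for looking up the active
// SparkSession and reading its runtime config. At an end event the session
// may already be shut down, so the lookup can throw.
class RuntimeConfigReader {

    static Map<String, String> collect(
            Map<String, String> staticConf,
            Supplier<Map<String, String>> runtimeConfLookup) {
        // Always start from the static (SparkContext-level) properties.
        Map<String, String> result = new HashMap<>(staticConf);
        try {
            result.putAll(runtimeConfLookup.get());
        } catch (RuntimeException e) {
            // Catching the RuntimeException superclass, rather than a single
            // subclass such as IllegalStateException, covers any unchecked
            // failure of the lookup; the static properties are still reported.
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> staticConf = Map.of("spark.app.name", "demo");

        Map<String, String> withSession =
                collect(staticConf, () -> Map.of("spark.sql.shuffle.partitions", "8"));
        Map<String, String> withoutSession = collect(staticConf, () -> {
            throw new IllegalStateException("no active SparkSession");
        });

        System.out.println(withSession.size());    // prints 2
        System.out.println(withoutSession.size()); // prints 1
    }
}
```

The exception is swallowed silently only to keep the sketch short; a real builder would probably log it.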
@pawel-big-lebowski
Great job! Congrats on your first merged pull request in OpenLineage! |
…n time … (#2523)

* Change SparkPropertyFacetBuilder to support recording spark run time config.
  Signed-off-by: Ruihua Wang <ruihuawang@microsoft.com>
* Remove checking logic for SparkListener instances and catch RuntimeException
  Signed-off-by: ruihuawang <ruihuawang@microsoft.com>
* Fix format
  Signed-off-by: ruihuawang <ruihuawang@microsoft.com>
* Fix format again
  Signed-off-by: ruihuawang <ruihuawang@microsoft.com>

---------

Signed-off-by: Ruihua Wang <ruihuawang@microsoft.com>
Signed-off-by: ruihuawang <ruihuawang@microsoft.com>
Co-authored-by: pawel.leszczynski <leszczynski.pawel@gmail.com>
…n time … (OpenLineage#2523)

* Change SparkPropertyFacetBuilder to support recording spark run time config.
  Signed-off-by: Ruihua Wang <ruihuawang@microsoft.com>
* Remove checking logic for SparkListener instances and catch RuntimeException
  Signed-off-by: ruihuawang <ruihuawang@microsoft.com>
* Fix format
  Signed-off-by: ruihuawang <ruihuawang@microsoft.com>
* Fix format again
  Signed-off-by: ruihuawang <ruihuawang@microsoft.com>

---------

Signed-off-by: Ruihua Wang <ruihuawang@microsoft.com>
Signed-off-by: ruihuawang <ruihuawang@microsoft.com>
Co-authored-by: pawel.leszczynski <leszczynski.pawel@gmail.com>
Signed-off-by: Fabio Manganiello <fabio@manganiello.tech>
Change SparkPropertyFacetBuilder to support recording spark run time config
Problem
Modify SparkPropertyFacetBuilder to capture the RuntimeConfig of the SparkSession, because the existing SparkPropertyFacet only captures the static configuration of the SparkContext.
This facet will be added to both RDD-related and SQL-related runs.
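To illustrate the difference between the two configuration sources, here is a rough sketch that models the static SparkContext configuration and the SparkSession runtime configuration as plain maps, so it runs without Spark. `PropertyMergeSketch`, `buildProperties`, and the allowlist parameter are hypothetical, and letting runtime values override static ones is an assumption about the intended behavior, not something this PR states.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: static (SparkContext) and runtime (SparkSession)
// configuration are plain maps here so the example runs without Spark.
class PropertyMergeSketch {

    // Copies only allowlisted keys from source into target, overwriting existing values.
    private static void putAllowed(
            Map<String, String> target, Map<String, String> source, Set<String> allowedKeys) {
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (allowedKeys.contains(e.getKey())) {
                target.put(e.getKey(), e.getValue());
            }
        }
    }

    // Static config first, then runtime config on top, so a value changed
    // at runtime replaces the value the context was started with (assumption).
    static Map<String, String> buildProperties(
            Map<String, String> staticConf,
            Map<String, String> runtimeConf,
            Set<String> allowedKeys) {
        Map<String, String> result = new HashMap<>();
        putAllowed(result, staticConf, allowedKeys);
        putAllowed(result, runtimeConf, allowedKeys);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> staticConf = Map.of(
                "spark.master", "local[*]",
                "spark.sql.shuffle.partitions", "200");
        Map<String, String> runtimeConf = Map.of("spark.sql.shuffle.partitions", "8");
        Set<String> allowed = Set.of("spark.master", "spark.sql.shuffle.partitions");

        Map<String, String> props = buildProperties(staticConf, runtimeConf, allowed);
        System.out.println(props.get("spark.sql.shuffle.partitions")); // prints 8
    }
}
```

With this ordering, a value changed after startup (for example through `spark.conf.set`) is what ends up in the facet, while keys outside the allowlist are never reported.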
One-line summary:
Modify SparkPropertyFacetBuilder to support recording spark run time config
Checklist
SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project