[SPARK][FLINK][JAVA] Support yaml config files together with SparkConf/FlinkConf #2583
Conversation
@Getter @Setter private @Nullable Map<String, String> urlParams;
@Getter @Setter private @Nullable Map<String, String> headers;

@JsonProperty(access = JsonProperty.Access.WRITE_ONLY)
Added to prevent serialising the field within the debug facet.
👍
/**
 * Contains methods which allow overwriting values of a config object with values from another
 * object.
 */
public interface OverwriteConfig<T> {
I see you've added this mechanism, but what's its actual purpose? Can't we just create a new config instance, rather than making the config mutable, if we need to merge config entries from multiple sources?
I think we can go with the approach you describe: change the interface to return a new value instead. Will prepare a change for this.
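An immutable merge along those lines could look like the sketch below. The `TransportConfig` fields and the `mergeWith` method name are illustrative assumptions for this discussion, not the actual OpenLineage API:

```java
// Hypothetical sketch: merging returns a new instance instead of mutating `this`.
public class MergeSketch {
  static final class TransportConfig {
    final String type;
    final String url;

    TransportConfig(String type, String url) {
      this.type = type;
      this.url = url;
    }

    // Non-null values of `other` take precedence; neither input is modified.
    TransportConfig mergeWith(TransportConfig other) {
      return new TransportConfig(
          other.type != null ? other.type : this.type,
          other.url != null ? other.url : this.url);
    }
  }

  public static void main(String[] args) {
    TransportConfig fromYaml = new TransportConfig("http", "http://localhost:5000");
    TransportConfig fromSparkConf = new TransportConfig(null, "http://prod:5000");
    TransportConfig merged = fromYaml.mergeWith(fromSparkConf);
    // prints: http http://prod:5000
    System.out.println(merged.type + " " + merged.url);
  }
}
```

Because both inputs stay untouched, the merge order alone decides precedence, which is easier to document than in-place overwrites.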
@@ -31,33 +32,27 @@ public class EventEmitter {
  @Getter private String applicationJobName;
  @Getter private Optional<List<String>> customEnvironmentVariables;

- public EventEmitter(ArgumentParser argument, String applicationJobName)
+ public EventEmitter(SparkOpenLineageConfig config, String applicationJobName)
👍
I'm on a phone right now, so it's a bit difficult to see the changes; forgive me if I ask something obvious.
When converting from flat properties to a config object, how is this done?
Do we try to infer a hierarchy from dot-separated properties? For example, if we set spark.openlineage.transport.foo.bar, do we assume that the object has the hierarchy:
transport:
  foo:
    bar: $value
If so, this could pose a problem in my use case, as I am letting the custom transport act as a pass-through for the values the Spark integration provides. For example, I have the following property:
spark.openlineage.transport.properties.cluster.federation
Here cluster.federation is a property that the library my transport wraps expects. So if this code tries to infer a hierarchy, it would break my integration.
@d-m-h Actually this is already done within this piece of code -> https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/src/main/java/io/openlineage/spark/agent/ArgumentParser.java#L88 I don't think this PR changes this behaviour anyway.
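The inference the question describes can be sketched as follows. This is an illustrative reimplementation of the idea, not the actual `ArgumentParser` code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: turn dot-separated property keys into nested maps.
public class HierarchySketch {
  @SuppressWarnings("unchecked")
  static void put(Map<String, Object> root, String key, String value) {
    String[] parts = key.split("\\.");
    Map<String, Object> node = root;
    // Every segment except the last becomes a nested map.
    for (int i = 0; i < parts.length - 1; i++) {
      node = (Map<String, Object>)
          node.computeIfAbsent(parts[i], k -> new LinkedHashMap<String, Object>());
    }
    node.put(parts[parts.length - 1], value);
  }

  public static void main(String[] args) {
    Map<String, Object> root = new LinkedHashMap<>();
    // A flat pass-through key like "properties.cluster.federation"
    // would be split into nested maps by this scheme.
    put(root, "transport.properties.cluster.federation", "true");
    // prints: {transport={properties={cluster={federation=true}}}}
    System.out.println(root);
  }
}
```

This shows why a consumer expecting the literal key cluster.federation would be affected: the scheme has no way to tell a flat key containing dots from a genuine hierarchy.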
Amazing job @pawel-big-lebowski. That change was very much needed. I'm approving, but please wait with merging until we hear whether @d-m-h's concerns are answered.
One thing you could improve in the meantime is the comments on the MergeConfig class - I believe we can explain the semantics better and in simpler language.
@mobuchowski fixed docs for MergeConfig
I see that the changelog entry was added, but under 1.12.0 instead of Unreleased. Fix: #2636
@dolfinus thank you, sorry!
Problem
The client-java documentation says it can read configs directly from an openlineage.yaml file, while the Spark integration reads config entries from SparkConf only, and the Flink integration reads them exclusively from a file or the Flink conf. This is confusing for the user.
Solution
* Rename OpenlineageYaml -> OpenLineageConfig
* Introduce SparkOpenLineageConfig and FlinkOpenLineageConfig, which extend OpenLineageConfig to keep entries specific to each integration
* Let the Spark and Flink integrations read configuration from both yaml files and SparkConf/FlinkConf, merging the entries into the OpenLineageConfig classes.
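As a result, the same settings should be expressible either in a config file or through SparkConf. A sketch of an openlineage.yaml (the transport values below are illustrative, not project defaults):

```yaml
# openlineage.yaml - illustrative values only
transport:
  type: http
  url: http://localhost:5050
```

The equivalent flat SparkConf entries would be spark.openlineage.transport.type and spark.openlineage.transport.url; when both sources are present, the entries are merged into the integration's config class.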
SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project