[FLINK] improve Cassandra lineage metadata #2479

Merged

HuangZhenQiu
Contributor

One-line summary:

Following the namespace definition, the dataset namespace should use the format cassandra://host:port, so that users can tell which Cassandra cluster is used. The Cassandra keyspace and table name can then be combined to form the dataset name.
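
To illustrate the naming described above, here is a minimal sketch. The class `CassandraDatasetNaming` and its methods are hypothetical and not part of this PR, and the `.` separator between keyspace and table is an assumption about how the two are combined.

```java
import java.util.Objects;

// Hypothetical helper for illustration only; not taken from the OpenLineage codebase.
final class CassandraDatasetNaming {
  private CassandraDatasetNaming() {}

  // The namespace identifies the cluster: cassandra://host:port
  static String namespace(String host, int port) {
    Objects.requireNonNull(host, "host");
    return "cassandra://" + host + ":" + port;
  }

  // The name combines keyspace and table; the "." separator is an assumption.
  static String name(String keyspace, String table) {
    Objects.requireNonNull(keyspace, "keyspace");
    Objects.requireNonNull(table, "table");
    return keyspace + "." + table;
  }

  public static void main(String[] args) {
    // Example values are made up for the demonstration.
    System.out.println(namespace("cassandra-1.example.com", 9042));
    System.out.println(name("sales", "orders"));
  }
}
```

With this split, two tables in the same cluster share a namespace and differ only in name.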


SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project

@HuangZhenQiu HuangZhenQiu force-pushed the improve-cassandra-lineage branch 2 times, most recently from 4a0c705 to cd699b5 Compare February 29, 2024 07:56
@boring-cyborg boring-cyborg bot added the area:documentation Improvements or additions to documentation label Feb 29, 2024
Contributor

@pawel-big-lebowski pawel-big-lebowski left a comment

Great improvement. I added a couple of minor comments.

@@ -1,6 +1,8 @@
# Changelog

## [Unreleased](https://github.com/OpenLineage/OpenLineage/compare/1.9.1...HEAD)
* **Flink: improve Cassandra lineage metadata** (https://github.com/OpenLineage/OpenLineage/pull/2479) [@HuangZhenQiu](https://github.com/HuangZhenQiu)
Contributor

Would it be worth adding the Cassandra naming convention to:

Sorry for having the same definition in both places.


private static Optional<String> convertToNamespace(Optional<List<Object>> endpointsOpt) {
  if (endpointsOpt.isPresent() && !endpointsOpt.isEmpty()) {
    return Optional.of("cassandra://" + endpointsOpt.get().get(0).toString().split("/")[1]);
Contributor

This can still fail if endpointsOpt.get().get(0) is null, or if endpointsOpt.get().get(0).toString().split("/") returns a single element.

Contributor Author

Good catch. Added stricter pattern verification.
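
For readers following the thread, a minimal sketch of what stricter verification could look like. This is not the code that was merged. The class name `CassandraNamespaceSketch` and the `ENDPOINT` pattern are assumptions, and the pattern presumes the endpoint's `toString()` has the form `<anything>/<host>:<port>`. It does not handle IPv6 literals, which contain extra colons.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the implementation merged in this PR.
final class CassandraNamespaceSketch {
  // Assumed endpoint text: an optional prefix, a single "/", then host:port.
  private static final Pattern ENDPOINT = Pattern.compile("^[^/]*/([^/:]+:\\d+)$");

  private CassandraNamespaceSketch() {}

  static Optional<String> convertToNamespace(Optional<List<Object>> endpointsOpt) {
    return endpointsOpt
        .filter(endpoints -> !endpoints.isEmpty()) // guards against an empty list
        .map(endpoints -> endpoints.get(0))        // a null first element yields Optional.empty()
        .map(Object::toString)
        .map(ENDPOINT::matcher)
        .filter(Matcher::matches)                  // rejects text without the expected "/host:port" part
        .map(matcher -> "cassandra://" + matcher.group(1));
  }

  public static void main(String[] args) {
    // Example endpoint text is made up for the demonstration.
    List<Object> endpoints = Arrays.<Object>asList("localhost/127.0.0.1:9042");
    System.out.println(convertToNamespace(Optional.of(endpoints)));
  }
}
```

Chaining on `Optional` avoids both failure modes raised in the review: a null first element and a string with no `/` both yield an empty result instead of an exception.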

@boring-cyborg boring-cyborg bot added the area:spec Specifications and standards for the project label Feb 29, 2024
@HuangZhenQiu HuangZhenQiu force-pushed the improve-cassandra-lineage branch 2 times, most recently from 858deb0 to 34d01c6 Compare February 29, 2024 20:53
Signed-off-by: Zhenqiu Huang <huangzhenqiu0825@gmail.com>
@pawel-big-lebowski pawel-big-lebowski merged commit 4008755 into OpenLineage:main Mar 1, 2024
18 checks passed
Ruihua98 pushed a commit to Ruihua98/OpenLineage that referenced this pull request Mar 15, 2024
Signed-off-by: Zhenqiu Huang <huangzhenqiu0825@gmail.com>
Signed-off-by: Ruihua Wang <ruihuawang@microsoft.com>
Labels
area:documentation Improvements or additions to documentation
area:integration/flink
area:spec Specifications and standards for the project
2 participants