@HuangZhenQiu
Contributor

What is the purpose of the change

Add helper classes to the streaming Java API for lineage integration in connectors.

Brief change log

  • Add the TypeDatasetFacet interface and a provider interface to make it easier to extract type information from connectors.
  • Add LineageUtil to make it easy to construct lineage-related POJOs.
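A minimal sketch of what the first bullet describes, under stated assumptions: a type facet is an ordinary lineage facet that additionally exposes the record type. To keep the code compilable without Flink, `LineageDatasetFacet` is reduced here to a `name()` method and `Class<?>` stands in for Flink's `TypeInformation`; `SimpleTypeDatasetFacet` and the facet name `"type"` are hypothetical and not part of this PR.

```java
import java.util.Objects;

// Simplified stand-in for Flink's LineageDatasetFacet: every facet exposes a name.
interface LineageDatasetFacet {
    String name();
}

// Stand-in for TypeDatasetFacet; Class<?> replaces Flink's TypeInformation here.
interface TypeDatasetFacet extends LineageDatasetFacet {
    Class<?> getTypeInformation();
}

// Hypothetical facet a connector could attach to describe its record type.
final class SimpleTypeDatasetFacet implements TypeDatasetFacet {
    private final Class<?> type;

    SimpleTypeDatasetFacet(Class<?> type) {
        // Reject null so consumers never see a facet without a type.
        this.type = Objects.requireNonNull(type, "type");
    }

    @Override
    public String name() {
        return "type"; // hypothetical facet key
    }

    @Override
    public Class<?> getTypeInformation() {
        return type;
    }
}
```

Rejecting `null` in the constructor is a design choice of this sketch: "type unknown" is better expressed by not attaching the facet at all.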

Verifying this change

This change added tests and can be verified as follows:

  • Added LineageUtilsTest to cover public functions

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@HuangZhenQiu force-pushed the FLINK-36625-lineage-helper branch from 8171f17 to 87538a2 on November 30, 2024 06:30
@flinkbot
Collaborator

flinkbot commented Nov 30, 2024

CI report:

Bot commands: The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

```java
@PublicEvolving
public interface TypeDatasetFacet extends LineageDatasetFacet {

    TypeInformation getTypeInformation();
```
Contributor

@davidradl Dec 2, 2024

NIT: can we add @Nonnull and @PublicEvolving to this method? I see examples in the Flink codebase where @PublicEvolving is on the methods as well as the class.

If it can be null, it should be annotated with @Nullable.

Contributor

Just to note, IMO @Nonnull usage adds a lot of bloat to the code (and Java is bloated enough on its own already) compared to its benefit. In my experience, using @Nullable when something can be null is a lot more practical and reasonable, but in this repo neither of them is (or should be) enforced.

```java
/**
 * Returns a type dataset facet or `Optional.empty` in case an implementing class is not able to
 * resolve type.
 */
Optional<TypeDatasetFacet> getTypeDatasetFacet();
```
Contributor

I see this is a new interface, but it is not referenced in the fix. What is the thinking here?

Contributor Author

This interface is mainly for connectors in which the return type is provided by an internal class rather than by the source/sink directly.
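The author's explanation can be illustrated with a small sketch, under stated assumptions: the internal component that knows the produced type implements the provider interface, and the connector only checks whether its component opts in, returning `Optional.empty()` otherwise, which mirrors the Javadoc quoted earlier in this thread. All types are simplified stand-ins (`Class<?>` replaces `TypeInformation`); `TypeDatasetFacetProvider` is used as a placeholder name for the provider interface, and `InternalSchema`, `ExampleSource` and `resolveTypeFacet` are hypothetical.

```java
import java.util.Optional;

// Simplified stand-ins for the facet interfaces.
interface LineageDatasetFacet {
    String name();
}

interface TypeDatasetFacet extends LineageDatasetFacet {
    Class<?> getTypeInformation(); // stand-in for TypeInformation
}

// Placeholder for the provider interface added in this PR.
interface TypeDatasetFacetProvider {
    // Empty when the implementing class cannot resolve the type.
    Optional<TypeDatasetFacet> getTypeDatasetFacet();
}

// Hypothetical internal component (e.g. a deserializer) that knows the produced type.
final class InternalSchema implements TypeDatasetFacetProvider {
    private final Class<?> producedType; // may be null if the type is unknown

    InternalSchema(Class<?> producedType) {
        this.producedType = producedType;
    }

    @Override
    public Optional<TypeDatasetFacet> getTypeDatasetFacet() {
        if (producedType == null) {
            return Optional.empty();
        }
        return Optional.of(new TypeDatasetFacet() {
            @Override
            public String name() {
                return "type"; // hypothetical facet key
            }

            @Override
            public Class<?> getTypeInformation() {
                return producedType;
            }
        });
    }
}

// Hypothetical connector that does not know its record type directly.
final class ExampleSource {
    private final Object schema; // any user-supplied component

    ExampleSource(Object schema) {
        this.schema = schema;
    }

    // Ask the internal component for the facet only if it implements the provider.
    Optional<TypeDatasetFacet> resolveTypeFacet() {
        if (schema instanceof TypeDatasetFacetProvider) {
            return ((TypeDatasetFacetProvider) schema).getTypeDatasetFacet();
        }
        return Optional.empty();
    }
}
```

Using `instanceof` keeps the provider optional: components that do not implement it simply yield no type facet, and the connector itself needs no type-specific code.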

Contributor

Thanks for the explanation. It would be good to understand when a connector author should use this interface. Could we add documentation on when this interface and the utility classes could/should be used?

```java
    return datasetOf(name, namespace, Collections.singletonList(typeDatasetFacet));
}

public static LineageDataset datasetOf(
```
Contributor

I wonder if these datasetOf constructor methods could be put in DefaultLineageDataset or in a class that creates DefaultLineageDataset instances, maybe a DefaultLineageDatasetProvider.

Contributor Author

It is mainly to make LineageDataset creation easier and to simplify the code in each connector.
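The overload quoted in the review hunk earlier forwards a single type facet to a list-based `datasetOf`. A minimal sketch of that convenience pattern, assuming simplified stand-ins for `LineageDataset` and `LineageDatasetFacet` and assuming facets are keyed by their `name()`; `LineageUtilSketch` and `SimpleDataset` are hypothetical names, not Flink's `LineageUtil` or `DefaultLineageDataset`.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-in for Flink's LineageDatasetFacet.
interface LineageDatasetFacet {
    String name();
}

// Stand-in for Flink's LineageDataset: name, namespace and facets keyed by facet name.
interface LineageDataset {
    String name();

    String namespace();

    Map<String, LineageDatasetFacet> facets();
}

// Hypothetical immutable dataset implementation.
final class SimpleDataset implements LineageDataset {
    private final String name;
    private final String namespace;
    private final Map<String, LineageDatasetFacet> facets;

    SimpleDataset(String name, String namespace, Map<String, LineageDatasetFacet> facets) {
        this.name = name;
        this.namespace = namespace;
        // Defensive, insertion-ordered, read-only copy.
        this.facets = Collections.unmodifiableMap(new LinkedHashMap<>(facets));
    }

    public String name() { return name; }

    public String namespace() { return namespace; }

    public Map<String, LineageDatasetFacet> facets() { return facets; }
}

// Hypothetical helper mirroring the datasetOf overloads discussed above.
final class LineageUtilSketch {
    private LineageUtilSketch() {}

    // Convenience overload for the common single-facet case.
    static LineageDataset datasetOf(String name, String namespace, LineageDatasetFacet facet) {
        return datasetOf(name, namespace, Collections.singletonList(facet));
    }

    // General overload: index facets by name; a later facet with the same name wins.
    static LineageDataset datasetOf(String name, String namespace, List<LineageDatasetFacet> facets) {
        Map<String, LineageDatasetFacet> byName = new LinkedHashMap<>();
        for (LineageDatasetFacet facet : facets) {
            byName.put(facet.name(), facet);
        }
        return new SimpleDataset(name, namespace, byName);
    }
}
```

Keying facets by `name()` means that two facets with the same name collapse into one (the later one wins in this sketch); whether the helper merged in this PR behaves the same way is not shown in this thread.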

@HuangZhenQiu force-pushed the FLINK-36625-lineage-helper branch from edbdb1d to 7771506 on December 4, 2024 06:03
@davidradl
Contributor

Reviewed by Chi on 05/12/24. Asked the submitter questions. Requested that documentation be added on how connector authors should/could use the helper classes.

@HuangZhenQiu
Contributor Author

@davidradl
Given that there are existing interfaces that haven't been documented either, I plan to add an end-to-end native lineage page for this JIRA (https://issues.apache.org/jira/browse/FLINK-35745). May I do it in a follow-up PR?

@HuangZhenQiu
Contributor Author

After this PR is merged, I will update PR #25762.

@github-actions

github-actions bot commented Apr 9, 2025

This PR is being marked as stale since it has not had any activity in the last 90 days.
If you would like to keep this PR alive, please leave a comment asking for a review.
If the PR has merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out to the
community, contact details can be found here: https://flink.apache.org/what-is-flink/community/

If this PR is no longer valid or desired, please feel free to close it.
If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions bot added the stale label Apr 9, 2025
@github-actions

github-actions bot commented May 9, 2025

This PR has been closed since it has not had any activity in 120 days.
If you feel like this was a mistake, or you would like to continue working on it,
please feel free to re-open the PR and ask for a review.

@ferenc-csaky
Contributor

@flinkbot run azure

Contributor

@ferenc-csaky left a comment

LGTM, I'll merge this by tomorrow EOD if CI is green.

@github-actions bot added the community-reviewed label Nov 18, 2025
@HuangZhenQiu
Contributor Author

@ferenc-csaky
Do you want me to rebase on main? BTW, we do need some help reviewing the lineage-related PRs. It would be great if you could help review them.

@ferenc-csaky
Contributor


It's fine; these classes have not been touched since then, so there is no reason to rebase.

@ferenc-csaky merged commit 1c2d953 into apache:master on Nov 18, 2025

Labels

  • community-reviewed (PR has been reviewed by the community.)
  • component=API/DataStream

4 participants