Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-31127] Add public API classes for FLIP-289 #214

Merged
merged 1 commit into from Feb 28, 2023

Conversation

lindong28
Copy link
Member

@lindong28 lindong28 commented Feb 19, 2023

What is the purpose of the change

Add public API classes for FLIP-289 with the exception specified below.

Brief change log

Added most public API classes specified in FLIP-289.

The following class and methods are not added:

  • GraphModel#supportServable
  • PipelineModel#supportServable
  • PipelineModelServable

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? JavaDocs

@lindong28 lindong28 force-pushed the FLINK-31127 branch 5 times, most recently from 0e031e6 to 013b406 Compare February 21, 2023 09:20
@lindong28
Copy link
Member Author

@jiangxin369 Can you help review this PR?

Copy link
Contributor

@jiangxin369 jiangxin369 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lindong28 Thanks for the PR, left some comments.

public interface ModelServable<T extends ModelServable<T>> extends TransformerServable<T> {

/** Sets model data using the serialized model data from the given input stream. */
default T setModelData(InputStream modelData) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Since We need to support setModelData on GraphModel, shall we modify the parameter to InputStream... modelDatas to support reading multiple model data from different Tables?
  2. Since we usually need to read and deserialize the model data, we'd better add throws IOException in the method signature.

Copy link
Member Author

@lindong28 lindong28 Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I agree we need to make setModelData() take multiple InputSreams. And we need to let these methods throw IOException.

I have updated the method to the this:
default T setModelData(InputStream... modelDataInputs) throws IOException

I will send an email update to the voting thread when this PR is mostly ready for merge.

@lindong28
Copy link
Member Author

@jiangxin369 Thanks for the review. Can you take another look?

Copy link
Contributor

@jiangxin369 jiangxin369 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lindong28 Thanks for the updates. LGTM overall only left 2 minor comments.

@lindong28 lindong28 force-pushed the FLINK-31127 branch 2 times, most recently from 7026130 to 657ed66 Compare February 22, 2023 07:04
@lindong28
Copy link
Member Author

lindong28 commented Feb 22, 2023

@zhipeng93 Can you help review this PR? If this PR looks good to you, I will need to send an email to the FLIP-289 voting thread to discuss the following API change before this PR is merged.

Update ModelServable#setModelData to use the signature default T setModelData(InputStream... modelDataInputs) throws IOException.

Copy link
Contributor

@zhipeng93 zhipeng93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Left some comments below.

@lindong28
Copy link
Member Author

@zhipeng93 Thanks for the review. Can you help take another look?

@zhipeng93
Copy link
Contributor

Thanks for the update. LGTM.

@Fanoid
Copy link
Contributor

Fanoid commented Feb 27, 2023

Update ModelServable#setModelData to use the signature default T setModelData(InputStream... modelDataInputs) throws IOException.

Hi, I just found we have made this change. I have a question: how do we obtain the corresponding input stream for each GraphNode in the GraphModel? modelDataInputs are linearly organized, but GraphNode are in a form of DAG.

It seems additional parameters are required in the signature of setModelData to help map InputStreams to GraphNodes.

@jiangxin369
Copy link
Contributor

jiangxin369 commented Feb 28, 2023

It seems additional parameters are required in the signature of setModelData to help map InputStreams to GraphNodes.

IMO, indeed we need to map InputStreams to GraphNodes, but it is implementation, not required in API. The existing GraphModel class has shown a way to implement it.

@Fanoid
Copy link
Contributor

Fanoid commented Feb 28, 2023

It seems additional parameters are required in the signature of setModelData to help map InputStreams to GraphNodes.

IMO, indeed we need to map InputStreams to GraphNodes, but it is implementation, not required in API. The existing GraphModel class has shown a way to implement it.

If not specifying the mapping in this method, we may need another public API to obtain the orders of GraphNodes. Otherwise, end-users can't figure out how to put the model data of all GraphNodes into an array/list.

@lindong28
Copy link
Member Author

lindong28 commented Feb 28, 2023

Hi, I just found we have made this change. I have a question: how do we obtain the corresponding input stream for each GraphNode in the GraphModel? modelDataInputs are linearly organized, but GraphNode are in a form of DAG.

It seems additional parameters are required in the signature of setModelData to help map InputStreams to GraphNodes.

@Fanoid we will need to add extra APIs in GraphBuilder and extra class GraphModelServable in order to compose multiple Estimator/Transformer into one Servable. We can obtain the input stream for ModelServable#setModelData similar to how the existing GraphBuilder obtain input streams for Model#setModelData. The high level idea is to use placeholder to provide this information to GraphBuilder. Maybe checkout testGraphWithSetGetModelData() [1] for example.

[1] https://github.com/apache/flink-ml/blob/master/flink-ml-core/src/test/java/org/apache/flink/ml/api/GraphTest.java#L213

@Fanoid
Copy link
Contributor

Fanoid commented Feb 28, 2023

@lindong28 Thanks for your explanation. I missed the point that we already have the orders in the GraphBuilder#buildEstimator. Then, I'm OK with this change.

@lindong28
Copy link
Member Author

Thanks @zhipeng93 @Fanoid @jiangxin369 for the review.

@lindong28 lindong28 merged commit f0c1ce7 into apache:master Feb 28, 2023
Fanoid pushed a commit to Fanoid/flink-ml that referenced this pull request Apr 6, 2023
@lindong28 lindong28 deleted the FLINK-31127 branch April 12, 2023 07:01
zhipeng93 pushed a commit to zhipeng93/flink-ml that referenced this pull request Apr 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants