
[FLINK-26205] Support Online Model Save in FlinkML #60

Closed
wants to merge 1 commit

Conversation

weibozhao
Contributor

What is the purpose of the change

Support Online Model Save in FlinkML.

Brief change log

Add code of online model save.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
Does this pull request introduce a new feature? (no)
If yes, how is the feature documented? (Java doc)

@weibozhao weibozhao force-pushed the FLINK-26205 branch 6 times, most recently from a9d1831 to fe7d882 Compare February 18, 2022 11:33
@yunfengzhou-hub
Contributor

Thanks for the PR! I think it's a crucial feature for Flink ML.

I suppose this PR also aims to solve the problem raised in this email, where an exception was thrown when an unbounded stream was fed to an Estimator. However, the test cases introduced by this PR only cover bounded situations. It would be better to add test cases that correspond to the conditions described in the email.


@zhipeng93 zhipeng93 left a comment


Thanks for the PR. Left some comments below.

public static class ModelVersionAssigner<T> extends BasePathBucketAssigner<T> {
    @Override
    public String getBucketId(T element, Context context) {
        return String.valueOf(System.nanoTime());
Contributor


Can we make ModelVersionAssigner independent of execution time?
If the model data contains multiple streams and we use the current version assigner (with a timestamp as the version), we may not be able to associate the model data from different streams.
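One time-independent alternative is a monotonically increasing counter. The sketch below is hypothetical (the class name, padding width, and `v` prefix are assumptions, not part of this PR); it only illustrates how a deterministic version id could replace `System.nanoTime()` so that model data from different streams can share the same version id.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of a time-independent version source. Each call to
 * nextVersion() returns a monotonically increasing, zero-padded id, so the
 * same counter value can be used to associate model data emitted by
 * different streams for the same model update.
 */
public class ModelVersionCounter {
    private final AtomicLong counter = new AtomicLong();

    public String nextVersion() {
        // Zero-padding keeps lexicographic order identical to numeric order,
        // which matters when versions become directory names.
        return String.format("v%010d", counter.getAndIncrement());
    }
}
```

In a bucket assigner like the one above, `getBucketId` would then return `nextVersion()` instead of `String.valueOf(System.nanoTime())`.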

modelPath = new org.apache.flink.core.fs.Path(path + "/" + fileName);
}
}
Source<T, ?, ?> source = FileSource.forRecordStreamFormat(modelDecoder, modelPath).build();
Contributor


Should we still use ".../data/" as the default model data path?

If there is a directory that is not ".../data/", can the test case still work?

Contributor Author


OK, I will refine it.

}

/**
* Loads the model data from the given path which has more than one model.
Contributor


Can you update the Java doc and explain why we need this function here?

Contributor Author


OK, I will update the doc. This function retrieves the model data with a specific model version from a path that contains more than one model version.
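For illustration, here is a minimal, hypothetical helper (the class and method names are assumptions, not from this PR) that enumerates the model versions stored under one base path, skipping the metadata entry, so a caller can then load the data for one specific version:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists version subdirectories under a model path. */
public class ModelVersionLister {
    public static List<String> listVersions(Path modelPath) throws IOException {
        List<String> versions = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(modelPath)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                // Skip the metadata entry; every other directory is assumed
                // to hold the model data for one version.
                if (Files.isDirectory(entry) && !"metadata".equals(name)) {
                    versions.add(name);
                }
            }
        }
        Collections.sort(versions);
        return versions;
    }
}
```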


@Test
public void saveAndLoadOnlineModel() throws Exception {
Configuration config = new Configuration();
Contributor


It would be better to extract the common logic here into a @Before method, as we did in other test cases.

Contributor Author


OK


/* Loads every LogisticRegression model in model path and validates it. */
String modelVersion;
while ((modelVersion = bufferedReader.readLine()) != null) {
Contributor


Can we also have a test case that loads all of the model data in a single data stream?

Contributor Author


OK

tmpPath,
new LogisticRegressionModelData.ModelDataDecoder(),
modelVersion))
.as("label, vec");
Contributor


Why do you convert the model data as label, vec?

Contributor Author


The validation data has a feature (vec) and a label. I will check that the prediction result agrees with the given label.

String modelVersion;
while ((modelVersion = bufferedReader.readLine()) != null) {
if (!"metadata".equals(modelVersion)) {
LogisticRegressionModel lrModel =
Contributor


nit: LogisticRegressionModel lrModel = ... could be moved outside of the loop.
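The nit can be sketched on plain Java (hypothetical types and data, not the PR's actual `LogisticRegressionModel`): the variable is declared once before the loop and only reassigned inside it.

```java
import java.util.ArrayList;
import java.util.List;

public class HoistDeclaration {
    /** Collects every non-metadata entry, with the loop variable hoisted. */
    public static List<String> nonMetadataVersions(List<String> lines) {
        List<String> result = new ArrayList<>();
        String modelVersion; // declared once, outside the loop
        for (String line : lines) {
            modelVersion = line;
            if (!"metadata".equals(modelVersion)) {
                result.add(modelVersion);
            }
        }
        return result;
    }
}
```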

Contributor Author


OK

@zhipeng93 zhipeng93 self-requested a review February 21, 2022 05:53
@weibozhao weibozhao force-pushed the FLINK-26205 branch 2 times, most recently from 4bfa4d3 to ca209d3 Compare February 23, 2022 00:48
@weibozhao weibozhao force-pushed the FLINK-26205 branch 5 times, most recently from 293dce8 to 1b8a183 Compare March 9, 2022 03:11
@zhipeng93
Contributor

Closing for now. We can open it later if needed.

@zhipeng93 zhipeng93 closed this Mar 22, 2022