-
Notifications
You must be signed in to change notification settings - Fork 90
[FLINK-24933][python] Support ML Python API to implement FLIP-173 and FLP-174 #36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lindong28
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Looks pretty good! Left some minor comments below.
Could you update the PR description to explain the scope of this PR? It may be useful to explain that the goal is to implement FLIP-173 and FLIP-174 in python. And we don't support interaction between Java and Python yet.
| @@ -0,0 +1,17 @@ | |||
| Flink ML is a library which provides machine learning (ML) APIs and libraries that simplify the building of machine learning pipelines. It provides a set of standard ML APIs for MLlib developers to implement ML algorithms, as well as libraries of ML algorithms that can be used to build ML pipelines for both training and inference jobs. | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to explain flink-ml-python specific instructions in this READMe, similar to flink-python/README.md?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I will adjust these descriptions.
| @@ -0,0 +1,17 @@ | |||
| ################################################################################ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this file be under path flink-ml-python/apache_flink_ml/ml/lib to be consistent with e.g. flink-ml-python/apache_flink_ml/ml/api?
flink-ml-python/setup.py
Outdated
| 'apache_flink_ml.ml.param', | ||
| 'apache_flink_ml.ml.util', | ||
| 'apache_flink_ml.examples', | ||
| 'apache_flink_ml.mllib'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that apache_flink_ml.ml.lib would be more consistent with other package names?
| model_stages = [] | ||
| last_inputs = inputs | ||
| for i, stage in enumerate(self._stages): | ||
| if not isinstance(stage, AlgoOperator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are no other non-AlgoOperator inherited from Stage class, we can remove this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. I think we allow Estimator as a member stage of pipeline. And Estimator does not inherit AlgoOperator. Thus this check needs to be removed?
| Interface for classes that take parameters. It provides APIs to set and get parameters. | ||
| """ | ||
|
|
||
| def set(self, param: 'Param[V]', value: V) -> 'T': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic of set(...) in Java was recently updated such that it throws exception if the param is not already defined on this class. Could we do the same here?
| return self._param_map | ||
|
|
||
| def _init_param(self): | ||
| self._param_map[BOOLEAN_PARAM] = BOOLEAN_PARAM.default_value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to initialize the param_map automatically without requiring users to manually setup this map? This can be done in Java using reflection (see ParamUtils.initializeMapWithDefaultValues(...)).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, this way can be achieved. But the reason I did not achieve this is that this usage does not seem to be consistent with the idiom of Python users. From my perspective, Python users generally do not use static members(class fields in Python) like this type to define member variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I don't know as much about Python idiom. Please feel free to keep this pattern if it fits Python idiom better.
|
@lindong28 Thanks a lot for the review. I have addressed the comments at the latest commit. |
lindong28
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HuangXingBo Thanks for the PR ! LGTM overall. Can you update the commit message as appropriate?
| The version will be consistent with the flink ml version and follow the PEP440. | ||
| .. seealso:: https://www.python.org/dev/peps/pep-0440 | ||
| """ | ||
| __version__ = "0.1.dev0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we rename the directory name from apache_flink_ml to pyflink? This could make the flink-ml directory structure more consistent with the core Flink repo. And we can let the python packages of flink-ml and the core flink could be installed in the same directory under site-packages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense.
b0cd9c6 to
85f3ef2
Compare
|
|
||
| The sdist package of apache-flink-ml will be found under ./flink-ml-python/dist/. It could be used for installation, such as: | ||
| ```bash | ||
| python -m pip install dist/*.tar.gz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update README to specify how to run test cases (similar to the existing README in flink/flink-python/README.md)?
And we probably also need to compile the python package and run all python unit tests in the Github CI. This could be specified in .github/workflows/java8-build.yml. Could you help do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
README.md is for users, not for developers, so I don't think it is appropriate to add the content of how to test into README.md. Regarding the part of adding python CI, I intend to do it in a separate PR, thanks for the reminder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
06c83fc to
15cf2c9
Compare
15cf2c9 to
238a667
Compare
… FLP-174 This closes apache#36.
What is the purpose of the change
This PR adds the ML python API to implement FLIP-173 and FLIP-174.
Brief change log
This PR adds classes in the package of api/core and param.
Verifying this change
test_pipeline.pyandtest_stage.py.Does this pull request potentially affect one of the following parts:
Dependencies (does it add or upgrade a dependency): (yes)