Skip to content

Conversation

@HuangXingBo
Copy link
Contributor

@HuangXingBo HuangXingBo commented Nov 17, 2021

What is the purpose of the change

This PR adds the ML python API to implement FLIP-173 and FLIP-174.

Brief change log

This PR adds classes in the package of api/core and param.

Verifying this change

test_pipeline.py and test_stage.py.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (yes)

Copy link
Member

@lindong28 lindong28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Looks pretty good! Left some minor comments below.

Could you update the PR description to explain the scope of this PR? It may be useful to explain that the goal is to implement FLIP-173 and FLIP-174 in python. And we don't support interaction between Java and Python yet.

@@ -0,0 +1,17 @@
Flink ML is a library which provides machine learning (ML) APIs and libraries that simplify the building of machine learning pipelines. It provides a set of standard ML APIs for MLlib developers to implement ML algorithms, as well as libraries of ML algorithms that can be used to build ML pipelines for both training and inference jobs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to explain flink-ml-python specific instructions in this READMe, similar to flink-python/README.md?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I will adjust these descriptions.

@@ -0,0 +1,17 @@
################################################################################
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this file be under path flink-ml-python/apache_flink_ml/ml/lib to be consistent with e.g. flink-ml-python/apache_flink_ml/ml/api?

'apache_flink_ml.ml.param',
'apache_flink_ml.ml.util',
'apache_flink_ml.examples',
'apache_flink_ml.mllib']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that apache_flink_ml.ml.lib would be more consistent with other package names?

model_stages = []
last_inputs = inputs
for i, stage in enumerate(self._stages):
if not isinstance(stage, AlgoOperator):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no other non-AlgoOperator inherited from Stage class, we can remove this check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. I think we allow Estimator as a member stage of pipeline. And Estimator does not inherit AlgoOperator. Thus this check needs to be removed?

Interface for classes that take parameters. It provides APIs to set and get parameters.
"""

def set(self, param: 'Param[V]', value: V) -> 'T':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic of set(...) in Java was recently updated such that it throws exception if the param is not already defined on this class. Could we do the same here?

return self._param_map

def _init_param(self):
self._param_map[BOOLEAN_PARAM] = BOOLEAN_PARAM.default_value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to initialize the param_map automatically without requiring users to manually setup this map? This can be done in Java using reflection (see ParamUtils.initializeMapWithDefaultValues(...)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, this way can be achieved. But the reason I did not achieve this is that this usage does not seem to be consistent with the idiom of Python users. From my perspective, Python users generally do not use static members(class fields in Python) like this type to define member variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I don't know as much about Python idiom. Please feel free to keep this pattern if it fits Python idiom better.

@HuangXingBo HuangXingBo changed the title [FLINK-24933][ML] Support ML Python API [FLINK-24933][ML] Support ML Python API to implement FLIP-173 and FLP-174 Nov 19, 2021
@HuangXingBo
Copy link
Contributor Author

@lindong28 Thanks a lot for the review. I have addressed the comments at the latest commit.

@HuangXingBo HuangXingBo changed the title [FLINK-24933][ML] Support ML Python API to implement FLIP-173 and FLP-174 [FLINK-24933][python] Support ML Python API to implement FLIP-173 and FLP-174 Nov 24, 2021
Copy link
Member

@lindong28 lindong28 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HuangXingBo Thanks for the PR ! LGTM overall. Can you update the commit message as appropriate?

The version will be consistent with the flink ml version and follow the PEP440.
.. seealso:: https://www.python.org/dev/peps/pep-0440
"""
__version__ = "0.1.dev0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rename the directory name from apache_flink_ml to pyflink? This could make the flink-ml directory structure more consistent with the core Flink repo. And we can let the python packages of flink-ml and the core flink could be installed in the same directory under site-packages.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.


The sdist package of apache-flink-ml will be found under ./flink-ml-python/dist/. It could be used for installation, such as:
```bash
python -m pip install dist/*.tar.gz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update README to specify how to run test cases (similar to the existing README in flink/flink-python/README.md)?

And we probably also need to compile the python package and run all python unit tests in the Github CI. This could be specified in .github/workflows/java8-build.yml. Could you help do this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

README.md is for users, not for developers, so I don't think it is appropriate to add the content of how to test into README.md. Regarding the part of adding python CI, I intend to do it in a separate PR, thanks for the reminder.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

@HuangXingBo HuangXingBo force-pushed the FLINK-24933 branch 2 times, most recently from 06c83fc to 15cf2c9 Compare November 24, 2021 09:55
weibozhao pushed a commit to weibozhao/flink-ml that referenced this pull request Dec 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants