New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt #10995
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 478491b (Fri Feb 28 21:49:20 UTC 2020) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
flink-ml-parent/flink-ml-lib/pom.xml
Outdated
<configuration> | ||
<artifactSet> | ||
<includes combine.children="append"> | ||
<include>com.github.fommil.netlib:core</include> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
licensing needs to be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I'm not sure if I get your point. The license of com.github.fommil.netlib:core
has already been included?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zentol cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the license already included if the dependency wasn't bundled so far?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zentol The license has already been added by this commit https://github.com/apache/flink/pull/8631/files
Or do you mean we should remove the licensing for release-1.10? I think we can create a fix in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the dependency is not bundled in 1.10 then yes, the licensing part of the commit should be reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm. in fact the flink-ml-* packages are never release/bundled anyway. we should revert the LICENSE notice (in fact it shouldn't contain any LICENSE since only source codes were released: https://repo1.maven.org/maven2/org/apache/flink/flink-ml-lib_2.12/1.10.0/.
How should we proceed? should we have a revert in release-1.10 branch only and keep this PR intact?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@walterddr Hi, flink-ml-* are released by jar artifacts and the flink-ml-lib jar should not contain the license and notice file as it actually does not bundle the related dependencies. I have fired another issue(FLINK-16241) and will open a corresponding PR to address the problem on release-1.10. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hequn8128 Just left one comment. Besides, should we set the scope of all the dependencies of Flink packages to "provided"? It was not introduced in this PR. However, I think that's related to packaging and so commented here.
@@ -168,6 +168,14 @@ | |||
<destName>flink-python_${scala.binary.version}-${project.version}.jar</destName> | |||
<fileMode>0644</fileMode> | |||
</file> | |||
|
|||
<!-- ML (API & LIB) --> | |||
<file> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a module "flink-ml-uber" just like "flink-table-uber" to create a fat job for flink ml module? Currently it assumes that the flink-ml-api is packaged in the fat jar of flink-ml-lib. This seems a little wired for me. It would be great if more people could share the thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that there should be an uber module. The current packaging also appears to be incorrect since flink-ml-api is not bundled in flink-ml-lib anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dianfu @zentol Thanks for your advice.
I would fine with adding an uber module. But I think there are some differences between the table module and the ml module. For table module, it needs a uber module to packages all table sub-modules plus other modules, e.g., cep. While for ml, there are no such requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where exactly do they differ? They both have multiple sub-modules that you want to ship as a single jar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the table module, it makes no sense to let any sub-modules to bundle cep, so a uber module is used to create the bundled jar. but for ml module, it makes sense to let flink-ml-lib to bundle the flink-ml-api module, so that we can simply put the flink-ml-lib jar into the opt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the table-uber module existed before it bundled cep though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let's add the uber module.
@hequn8128 Thanks for the update. LGTM. @zentol Do you have any other comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the contribution @hequn8128 . I was not familiar with the LICENSE notice process so I didn't raise the right question in #8631. I do have a question regarding how to progress forward to fix this. could you please kindly take a look?
flink-ml-parent/flink-ml-lib/pom.xml
Outdated
<configuration> | ||
<artifactSet> | ||
<includes combine.children="append"> | ||
<include>com.github.fommil.netlib:core</include> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm. in fact the flink-ml-* packages are never release/bundled anyway. we should revert the LICENSE notice (in fact it shouldn't contain any LICENSE since only source codes were released: https://repo1.maven.org/maven2/org/apache/flink/flink-ml-lib_2.12/1.10.0/.
How should we proceed? should we have a revert in release-1.10 branch only and keep this PR intact?
@walterddr Hi, flink-ml-* are released by jar artifacts and the flink-ml-lib jar should not contain the license and notice file as it actually does not bundle the related dependencies. I have fired another issue(FLINK-16241) and will open a corresponding PR to address the problem on release-1.10. :-) |
@hequn8128 yeah. I thought so too. the revert should only be on 1.10 branch IMO. thanks for the explanation and follow up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hequn8128 Thanks for creating the JIRA to fix the license issue in 1.10. This PR LGTM and will wait one more day to merge and leave some time for @zentol to see if there are any comments.
@becketqin Would you like to take a final look at this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment, otherwise this looks good.
flink-ml-parent/flink-ml-lib/pom.xml
Outdated
<!-- Build flink-ml-lib jar --> | ||
<plugin> | ||
<groupId>org.apache.maven.plugins</groupId> | ||
<artifactId>maven-shade-plugin</artifactId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would move this (and the corresponding NOTICE) into the uber
module.
The basic idea is that the uber
module is responsible for all things related to the distribution. The individual modules usually shouldn't have to shade dependencies; users should set flink-ml dependencies to provided anyways so we don't have to concerned about them accidently bundling this dependency.
@zentol Thanks for the review. The PR has been updated accordingly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, +1
Thanks @zentol @walterddr for the review and thanks @hequn8128 for the PR. Merging... |
What is the purpose of the change
This pull request adds the jar of flink-ml into the opt folder so that users can directly use the jars with the binary release. For example, users can move the jars into the
lib
folder or use -j to upload the jar.Brief change log
maven-shade-plugin
in the pom of flink-ml-lib moduleVerifying this change
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (no)Documentation