Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-15847][ml] Include flink-ml-api and flink-ml-lib in opt #10995

Merged
merged 3 commits into from Feb 27, 2020

Conversation

hequn8128
Copy link
Contributor

What is the purpose of the change

This pull request adds the jar of flink-ml into the opt folder so that users can directly use the jars with the binary release. For example, users can move the jars into the lib folder or use -j to upload the jar.

Brief change log

  • Adds maven-shade-plugin in the pom of flink-ml-lib module
  • Adds a file config similar to sql-client in flink-dist/src/main/assemblies/opt.xml

Verifying this change

  • This change is a trivial rework / code cleanup without any test coverage.
  • Build Flink locally and check jars in opt.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)

@flinkbot
Copy link
Collaborator

flinkbot commented Feb 3, 2020

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 478491b (Fri Feb 28 21:49:20 UTC 2020)

Warnings:

  • 5 pom.xml files were touched: Check for build and licensing issues.
  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Copy link
Collaborator

flinkbot commented Feb 3, 2020

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

<configuration>
<artifactSet>
<includes combine.children="append">
<include>com.github.fommil.netlib:core</include>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

licensing needs to be updated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm not sure if I get your point. The license of com.github.fommil.netlib:core has already been included?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zentol cc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was the license already included if the dependency wasn't bundled so far?

Copy link
Contributor Author

@hequn8128 hequn8128 Feb 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zentol The license has already been added by this commit https://github.com/apache/flink/pull/8631/files

Or do you mean we should remove the licensing for release-1.10? I think we can create a fix in another PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the dependency is not bundled in 1.10 then yes, the licensing part of the commit should be reverted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. in fact the flink-ml-* packages are never release/bundled anyway. we should revert the LICENSE notice (in fact it shouldn't contain any LICENSE since only source codes were released: https://repo1.maven.org/maven2/org/apache/flink/flink-ml-lib_2.12/1.10.0/.

How should we proceed? should we have a revert in release-1.10 branch only and keep this PR intact?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@walterddr Hi, flink-ml-* are released by jar artifacts and the flink-ml-lib jar should not contain the license and notice file as it actually does not bundle the related dependencies. I have fired another issue(FLINK-16241) and will open a corresponding PR to address the problem on release-1.10. :-)

Copy link
Contributor

@dianfu dianfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hequn8128 Just left one comment. Besides, should we set the scope of all the dependencies of Flink packages to "provided"? It was not introduced in this PR. However, I think that's related to packaging and so commented here.

@@ -168,6 +168,14 @@
<destName>flink-python_${scala.binary.version}-${project.version}.jar</destName>
<fileMode>0644</fileMode>
</file>

<!-- ML (API & LIB) -->
<file>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a module "flink-ml-uber" just like "flink-table-uber" to create a fat job for flink ml module? Currently it assumes that the flink-ml-api is packaged in the fat jar of flink-ml-lib. This seems a little wired for me. It would be great if more people could share the thoughts on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that there should be an uber module. The current packaging also appears to be incorrect since flink-ml-api is not bundled in flink-ml-lib anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dianfu @zentol Thanks for your advice.
I would fine with adding an uber module. But I think there are some differences between the table module and the ml module. For table module, it needs a uber module to packages all table sub-modules plus other modules, e.g., cep. While for ml, there are no such requirements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where exactly do they differ? They both have multiple sub-modules that you want to ship as a single jar.

Copy link
Contributor Author

@hequn8128 hequn8128 Feb 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the table module, it makes no sense to let any sub-modules to bundle cep, so a uber module is used to create the bundled jar. but for ml module, it makes sense to let flink-ml-lib to bundle the flink-ml-api module, so that we can simply put the flink-ml-lib jar into the opt.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the table-uber module existed before it bundled cep though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let's add the uber module.

@hequn8128
Copy link
Contributor Author

@dianfu @zentol Thanks a lot for your review. The PR has been updated.

@dianfu
Copy link
Contributor

dianfu commented Feb 19, 2020

@hequn8128 Thanks for the update. LGTM.

@zentol Do you have any other comments?

Copy link
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the contribution @hequn8128 . I was not familiar with the LICENSE notice process so I didn't raise the right question in #8631. I do have a question regarding how to progress forward to fix this. could you please kindly take a look?

<configuration>
<artifactSet>
<includes combine.children="append">
<include>com.github.fommil.netlib:core</include>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm. in fact the flink-ml-* packages are never release/bundled anyway. we should revert the LICENSE notice (in fact it shouldn't contain any LICENSE since only source codes were released: https://repo1.maven.org/maven2/org/apache/flink/flink-ml-lib_2.12/1.10.0/.

How should we proceed? should we have a revert in release-1.10 branch only and keep this PR intact?

@hequn8128
Copy link
Contributor Author

@walterddr Hi, flink-ml-* are released by jar artifacts and the flink-ml-lib jar should not contain the license and notice file as it actually does not bundle the related dependencies. I have fired another issue(FLINK-16241) and will open a corresponding PR to address the problem on release-1.10. :-)

@walterddr
Copy link
Contributor

@hequn8128 yeah. I thought so too. the revert should only be on 1.10 branch IMO. thanks for the explanation and follow up.

Copy link
Contributor

@dianfu dianfu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hequn8128 Thanks for creating the JIRA to fix the license issue in 1.10. This PR LGTM and will wait one more day to merge and leave some time for @zentol to see if there are any comments.

@dianfu
Copy link
Contributor

dianfu commented Feb 26, 2020

@becketqin Would you like to take a final look at this PR?

Copy link
Contributor

@zentol zentol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment, otherwise this looks good.

<!-- Build flink-ml-lib jar -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this (and the corresponding NOTICE) into the uber module.

The basic idea is that the uber module is responsible for all things related to the distribution. The individual modules usually shouldn't have to shade dependencies; users should set flink-ml dependencies to provided anyways so we don't have to concerned about them accidently bundling this dependency.

@zentol zentol self-assigned this Feb 26, 2020
@hequn8128
Copy link
Contributor Author

@zentol Thanks for the review. The PR has been updated accordingly.

Copy link
Contributor

@zentol zentol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, +1

@dianfu
Copy link
Contributor

dianfu commented Feb 27, 2020

Thanks @zentol @walterddr for the review and thanks @hequn8128 for the PR. Merging...

@dianfu dianfu merged commit 6deca76 into apache:master Feb 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants