Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-2311] Fixes 'flink-*' dependency scope in flink-contrib. #880

Closed
wants to merge 1 commit into from

Conversation

aalexandrov
Copy link
Contributor

No description provided.

@StephanEwen
Copy link
Contributor

This is an interesting case.

In theory, dependencies should be provided, such that building a fat-jar does not take them into account. However, when running examples out of IntelliJ, it does not work if the dependency is in "provided", as the dependency is not added to the classpath in that case.

In the examples, we hence add them as "compile" scope dependencies. The storm compatibility layer could follow a rule, though.

@mjsax Can you comment?

@rmetzger
Copy link
Contributor

rmetzger commented Jul 2, 2015

As I've already written in the JIRA, I don't understand the problem you are trying to solve.

@mjsax
Copy link
Member

mjsax commented Jul 2, 2015

From my understanding, "provided" would mean that a project that uses stuff from flink-contrib would not need to include the code from flink-contrib into the own program jar because flink-contrib code is provided automatically at runtime (ie, is included in flink-dist).

However, Flink follows the opposite approach. "The flink-contrib folder is assumed to be provided by the user" means, that flink-contrib stuff is not included in flink-dist, thus, if flink-contrib stuff is use, the program must be contained in a fat-jar that includes flink-contrib classes.

@aalexandrov
Copy link
Contributor Author

@mjsax exactly. Let's at the moment I land in the following situtaion.

  1. I want to create a job which includes the statistics collection facility proposed by @tammymendt.
  2. In order to do that, my project needs to set flink-contrib with scope compile and use it in the final artifact (even if I am building a thin jar which is going to be submitted to a standalone flink cluster).
  3. The jar that turns out to be ~ 64 MB. Out of that, half of the bits are taken by code is guaranteed to be in the classpath at rintime by virtue of being included in the $FLINK_DIR/lib/flink-dist-$VERSION.jar.
  4. Setting all transitive flink-* dependencies via flink-contrib as provided reduces the size to ~ 32 MB by default.

@aalexandrov
Copy link
Contributor Author

@StephanEwen can you tell me what package doesn't work so I can test it out?

@mjsax
Copy link
Member

mjsax commented Jul 2, 2015

I see. But changing the scope to "provided" is the wrong way to go. You need to exclude unnecessary sub-module dependencies in your own project:

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-contrib</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <exclusion>
      <groupId>flink-storm-compatiblitly-parent</groupId>
      <version>${project.version}</version>
    </exclusion>
    <exclusion>
      <groupId>flink-streaming-contrib</groupId>
      <version>${project.version}</version>
    </exclusion>
    <exclusion>
      <groupId>flink-tweet-inputformat</groupId>
      <version>${project.version}</version>
    </exclusion>
  </exclusions>
</dependency>

However, I am wondering if the statistics collection code is placed correctly into flink-contrib. I thought flink-contrib is a "parent" project only containing sub-modules. For this case, statistic collection would be an own sub-module and you could just include this sub-module into your dependencies (this resolves the problem naturally)

@aalexandrov
Copy link
Contributor Author

The PR is still open and can be adapted.

@aalexandrov
Copy link
Contributor Author

OK thanks for the remark.

Although somewhat verbose, this solves my concrete issue. I wonder if the list can be exclusive, e.g.

<!-- use Tweet input format -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-tweet-inputformat</artifactId>
  <version>${project.version}</version>
</dependency>

However, it still leaves the question what policy to use for flink-* based dependencies.

As an example, if I want to include flink-streaming-contrib in a module that is build as a "thin" jar, transitive flink-* dependencies that will be packaged in the flink-dist fat jar do not need to be included.

I therefore suggest to at least make this consistent with the quickstart poms and optionally set the scope of already included dependencies to provided if the build-jar profile is activated.

@rmetzger
Copy link
Contributor

rmetzger commented Jul 2, 2015

I think the statistics collection pull request has been created before we splitted the flink-contrib project into individual submodules.
Now contrib is much more fine grained, with the dependencies.

Afaik it should be impossible to have a "pom"-type maven module which also contains code?!

The quickstart's pom is setting the dependencies for the user correctly. Are the dependencies of "flink-contrib" overwriting the "provided" scope from the quickstart-generated poms?
In my understanding, building a quickstart generated project with -P build-jar profile should create the optimal fat jar, containing exactly the classes needed.

@mjsax
Copy link
Member

mjsax commented Jul 2, 2015

The PR from @tammymendt is outdated. The flink-contrib is packaging to jar (not pom) in this old version.

@rmetzger
Copy link
Contributor

rmetzger commented Jul 2, 2015

I think that setting the dependency to "provided" is something the user should do, because it actually depends on the user environment.

@aalexandrov
Copy link
Contributor Author

OK then we can close the issue as wontfix.

@rmetzger
Copy link
Contributor

rmetzger commented Jul 6, 2015

Okay, I've closed the JIRA. can you close the PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants