Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

java_library() needs provided_deps attribute #1402

Closed
spearce opened this issue Jun 14, 2016 · 28 comments
Closed

java_library() needs provided_deps attribute #1402

spearce opened this issue Jun 14, 2016 · 28 comments
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-Java Issues for Java rules type: feature request

Comments

@spearce
Copy link

spearce commented Jun 14, 2016

As discussed in https://gerrit-review.googlesource.com/77946 the java_library() rule needs to support a "provided_deps" or "compile_deps" attribute that does not include the provided libraries into the transitive dependency closure when the rule is used in another rule. E.g.:

java_binary(name = "app", deps=[":lib"])
java_library(name = "lib", deps="[//third_party:foo"], provided_deps=["//third_party:ssl"])

When building app, it includes //third_party:foo, but not //third_party:ssl. However lib is able to see //third_party:ssl symbols and its sources can compile against those.

@hermione521 hermione521 added type: feature request P3 We're not considering working on this, but happy to review a PR. (No assignee) category: rules > java labels Jun 15, 2016
@ulfjack
Copy link
Contributor

ulfjack commented Jun 16, 2016

One problem with that is that the entire transitive closure of your binary has to do that consistently. If there is any library that has a normal deps on that, it will be linked in after all.

With neverlink, it's easier to have consistency, because the annotation is on the rule, not the edge.

Both proposals (neverlink and compile_deps) break down if you sometimes want to link a library and sometimes you don't want to link it, for example, because you have multiple binaries that you want to deploy to different environments that provide different libraries out of the box. Both proposals encourage duplicating the dependency graph in that case, which is very costly.

Unfortunately, it's not clear from the feature request why you'd want to have such a feature and how you'd use it, making it harder for us to come up with a workable proposal.

Another possible solution would be to use neverlink in combination with select. That'd allow you to say on the build command-line whether you want a certain library linked into a binary or not. However, the downside is that you can't build two libraries differently in the same build (at least, not right now), though we're working on that.

Yet another solution is the deploy_env attribute on java_binary, that we have internally. It's specified at the java_binary rule, and it allows performing classpath subtraction at the binary level. I.e., it takes the usual runtime classpath, and also computes a runtime classpath for the things referenced through the deploy_env attribute, and then builds a binary that only contains those classpath entries on the first classpath that aren't on the second. That avoids most of the problems above - you can have multiple binaries with different environments, in the same build, and it's guaranteed to be consistent for each binary. The downsides are that you have to specify it on each binary, which could be expensive (or easy to forget) if you have lots of binaries that want to use the same deploy_env. It could potentially also slow down builds. Also, you have to avoid any manual munging of classpath jars - if you post-process parts of the classpath with a genrule or by piping it through a java_binary deploy_jar, then the classpath entries don't match and the subtraction doesn't do what you might have expected.

We could easily add the depoly_env attribute to java_binary in Bazel, since all the code already exists. I'm not sure why this specific attribute was left out when we open sourced the Java rules.

@ulfjack ulfjack added this to the 1.0 milestone Jun 16, 2016
@spearce
Copy link
Author

spearce commented Jun 16, 2016

For reference facebook/buck#63 is the Buck feature request.

This is actually a very important feature for some Java "applications". Stepping back from a lot of the discussion around Gerrit, look at a Java servlet "application". When writing a servlet to be deployed in a servlet container my sources must compile against the servlet spec JAR, but must not include the servlet spec jar in its WEB-INF/lib directory.

To compile and deploy a Java servlet, you must use a compile_deps/provided_deps type of strategy in the build system to ensure this particular dependency is visible at compile time for some sources, but is not caught into the transitive dependency graph when assembling all other necessary libraries for the application to run.

Cryptography libraries also can fall into this category. E.g. in Gerrit we "work with" BouncyCastle if the end-user downloads and installs it next to Gerrit. If it is not present, we gracefully degrade and support a reduced set of functionality that is still useful. For simplicity in development, some source classes are compiled against BouncyCastle APIs, but we break the dependency graph to prevent BC from going into the release binary.

I guess both of those cases might be workable by deploy_env telling the java_binary rule that it provides those things, and therefore should not be included in the transitive closure output. A downside of this approach is it forces us to construct a java_binary for something that maybe should be a java_library instead. E.g. when we package Gerrit, we don't shade everything into a single flat JAR; we collect the dependencies into unique JAR names under WEB-INF/lib and let the container deal with a pile of (mostly original) JARs in the classpath, rather than a single flattened JAR.

Its also awkward for Gerrit plugin developers to specify the Gerrit API they are building against twice; once in the java_library as a dep and again in their java_binary as a deploy_env to subtract out the very thing they had to add in as a dependency. Its somewhat cleaner to add the dep just once in the library target as a known part of the runtime environment and not have the build system force you to document it twice.

If you look at the rest of the Java build tools space, I think every other system either has this, or has had to add it after the fact, because there are just too many corner cases where you need to clip the transitive dependency edge.

@ulfjack
Copy link
Contributor

ulfjack commented Jun 17, 2016

java_binary does not generate a single flattened JAR by default, only if you request the _deploy.jar implicit output. However, it doesn't generate a WEB-INF/lib layout, but I have seen Skylark rules that do.

For the gerrit plugin use case, (something like) neverlink seems like a better solution. You're suggesting to write something like this:

java_library(name = "my_plugin", srcs = ["Foo.java"], compile_deps = ["gerrit-api"])

With neverlink, you'd instead write something like this:

java_library(name = "my_plugin", srcs = ["Foo.java"], deps = ["gerrit-api"])

And the gerrit-api target would specify neverlink (though note that neverlink currently only applies to a specific library, not to all libraries reachable through that library).

This seems better because a) you can't accidentally get it wrong, and b) it's easier to automate with tools.

Don't get me wrong, I'm not against adding more mechanisms to the Java rules to support such use cases, I'm just pointing out the advantages and disadvantages of different approaches. I don't think copying what someone else did is necessarily the right approach, but it's also not necessarily wrong. I prefer to make decisions based on technical arguments, not on the say-so from someone else.

@spearce
Copy link
Author

spearce commented Jun 17, 2016

So if we write:

java_library(name="gerrit-api", neverlink=1, deps=["//third-party:servlet-api", ":other-lib"])

and servlet-api and other-lib have neverlink=0, a "my_plugin" does not include gerrit-api, but does include servlet-api and other-lib?

I do see the value in neverlink for this plugin-api case. Its certainly easier for the my_plugin author.

@damienmg damienmg added P2 We'll consider working on this in future. (Assignee optional) and removed P3 We're not considering working on this, but happy to review a PR. (No assignee) labels Jun 22, 2016
@damienmg damienmg modified the milestones: 0.6, 1.0 Jun 22, 2016
@damienmg
Copy link
Contributor

Ok. We discussed offline and that sounds reasonable to cover that use case. Repriotized accordingly. Hopefully we will get manpower to do it earlier.

lucamilanesio pushed a commit to GerritCodeReview/gerrit that referenced this issue Nov 3, 2016
Currently too big files are published, because some unwanted transitive
dependencies are included in the final artifacts. That will be fixed in
follow-up change by using neverlink option in java_library rule or using
provided_deps attribute that will be addded in future releases of Bazel:
[1].

TEST PLAN:

  $ VERBOSE=1 tools/maven/api.sh install bazel
  $ VERBOSE=1 tools/maven/api.sh install buck

* [1] bazelbuild/bazel#1402

Change-Id: Ie73d4ae34d96be7f97f6329c4c30c814f54688d5
openstack-gerrit pushed a commit to openstack-infra/gerrit that referenced this issue Dec 13, 2016
Given that the feature request: [1], still wasn't implemented, expose
neverlink artifact for every artifact per default. This is needed to
support in Gerrit tree build for plugins, because every dependency
can be potentially used from a plugin, in which case it must be used
in neverlink form to avoid shipping it twice, in plugin artifact in
addition to gerrit.war itself.

With this change in place, we can write this build rule:

  gerrit_plugin(
      name = "verify-status",
      [...]
      provided_deps = [
          "@commons_dbcp//jar:neverlink",
      ],
      [...]

that would now work in both build mode: standalone (using bazlets) and
in Gerrit tree mode.

[1] bazelbuild/bazel#1402

Change-Id: I1240d25c576b13bd4d7450a0e5ba143df27a3d3a
@arjantop
Copy link

Is this still going to be implemented?

@hmemcpy
Copy link

hmemcpy commented Jul 19, 2018

I'd like to add another few use-cases where we need to be able to exclude specific targets from the final deployable jar. For example, we have a situation where we need to use our library with different runtimes, e.g. Hadoop. Some of those environments need to have a specific Jackson binary, others don't. This is why the neverlink solution doesn't work for us, as we need to have it excluded sometimes.

Reading this thread, the deploy_env may be a good solution, unfortunately, as @ulfjack mentioned, it's an internal feature with no documentation to speak of.

Our options right now are to either create a genrule that would crack open the produced deployable, and manually remove the packages we'd like to remove (or, use @johnynek's jarjar wrapper and use the zap rule for this purpose). This works, but feels non-bazelish, as we don't deal in the label/target level anymore, and have to manually provide a list of packages to remove.

Another option is to use another top-level rule that would perform this classpath subtraction manually, then delegate it to the SingleJar utility that handles the rest. So instead of creating our own e.g.

bazel_assembly(
    name = "my-output-jar"
    for-target="//foo/bar/mybinary"
    exclude=["//com/quux/dependency"]
)

it would be really useful to have the ability to specify exclude_for_deployment list of targets on the java_binary level.

cc @ittaiz

@ittaiz
Copy link
Member

ittaiz commented Jul 20, 2018 via email

@ulfjack
Copy link
Contributor

ulfjack commented Jul 20, 2018

@ittaiz how's "exclude_for_deployment" different from the existing "deploy_env" attribute? It looks like the attribute isn't defined in Bazel, but the code is otherwise all there. I'm not sure who decided to exclude "deploy_env" from Bazel, but I generally think it's a mistake to introduce divergence between the rules.

@ittaiz
Copy link
Member

ittaiz commented Jul 20, 2018 via email

@ulfjack
Copy link
Contributor

ulfjack commented Jul 20, 2018

It's filtering by jar file, it does not unpack jars and filter their contents.

@ittaiz
Copy link
Member

ittaiz commented Jul 20, 2018 via email

@ulfjack
Copy link
Contributor

ulfjack commented Jul 20, 2018

See here:
ulfjack@146e52c

I haven't tried this (not even to compile), but my earlier code digging says it should just work.

@hmemcpy
Copy link

hmemcpy commented Jul 24, 2018

Hi @ulfjack,

I'm trying to investigate whether deploy_env can suit our needs, but I can't seem to make it work. I've applied the change your commit described, but it seems to be failing to find the classpath providers here:

for (JavaRuntimeClasspathProvider envTarget : ruleContext.getPrerequisites(
"deploy_env", Mode.TARGET, JavaRuntimeClasspathProvider.class)) {

This returns an empty list.

Is there anything I'm missing? How can I configure deploy_env to have JavaRuntimeClasspathProvider on it?

@ulfjack
Copy link
Contributor

ulfjack commented Jul 24, 2018

Are you referencing a java_binary in deploy_env?

@hmemcpy
Copy link

hmemcpy commented Jul 24, 2018

Oh wow, my bad!
Yes, it works with java_binary targets!
I'll report back whether it suits our needs. Thanks!

@hmemcpy
Copy link

hmemcpy commented Jul 25, 2018

Ok, just to make I'm doing this correctly. Suppose I have a binary target like this:

java_binary(
    name = "test",
    srcs = ["Program.java"],
    main_class = "test.Program",
    runtime_deps = ["//test/lib1:lib1", "//test/lib2:lib2"],
    ...

My goal is to produce a deployable without the reference tolib2.
I was able to achieve it by adding another java_binary which just references lib2, and add this new binary in deploy_env:

java_binary(
    name = "test_deploy",
    main_class = "test.prog1.Program",
    runtime_deps = ["//test/lib2:lib2"]
)

and adding deploy_env = [":test_deploy"] to the original java_binary target.

After building test_deploy.jar, I have a jar that only contains the lib1 files!

However, it seems this will subtract all the transitive dependencies of lib2 as well, am I correct to assume that?

@ulfjack
Copy link
Contributor

ulfjack commented Jul 25, 2018

Yes, that's right, unless you pass through a non-Java rule, like so:

java_binary(name="foo", runtime_deps=["baz","original"], deploy_env=["bar"])
java_binary(name="bar", deps=["baz"])
java_import(name="baz", jars=["copy.jar"])
genrule(cmd = "cp original.jar copy.jar") <--- renames the jar file
java_library(name = "original")

In this case, original.jar ends up on the classpath, even though it is in the transitive closure of the deploy_env.

@statik
Copy link

statik commented Jul 30, 2018

I'm excited to find this issue, this is exactly what I need in order to use bazel to build jars that are plugins for a keycloak.org authentication server.

@hmemcpy
Copy link

hmemcpy commented Aug 8, 2018

@ulfjack the deploy_env feature could really help us in migrating difficult maven projects. Would you make this available in bazel?

@rdnetto4
Copy link

Note that another common use case for provided scope is OSGI boundaries - given some library A, a library B loaded within the same classloader should have a regular dependency on it, whereas a library C loaded in another classloader that OSGI-imports classes from A should have a provided-scope dependency on it.

@s50600822
Copy link

One problem with that is that the entire transitive closure of your binary has to do that consistently. If there is any library that has a normal deps on that, it will be linked in after all.

Hi, I'm trying to see how to add this, but an example is if you move Guava into Bazel google/guava#2850

under certain WORKSPACE that contains other packages. Guava itself(guava-*.jar) is an OSGi bundle. Image if there are several other packages in the workspace that are OSGi bundles, and they
depend on Guava traditionally as "provided". Now if we say "neverlink" here (and do the OSGi instruction in the MANIFEST somehow). We won't be able to have any other BAZEL package that acts as application/platform that include Guava as compile dependency and do OSGi export for every other packages in the WORKSPACE. I think this is a very common use case for OSGi projects.

Though I'm not totally sure if this rule should be under java_library or there should be an osgi_bundle rule by itself.

@s50600822
Copy link

s50600822 commented Oct 22, 2018

Hi

I wonder if I misunderstood something regarding the scope semantic of "deps", below I is when I tried to compile and saw somehow "deps" is already acting like "provided". java-maven-lib came out without guava or java-blah-lib inside the package. Similarly java-blah-lib was built without guava or common-io inside it.
screen shot 2018-10-23 at 9 36 40 am

bazel-io pushed a commit that referenced this issue Jan 10, 2019
This adds support for `deploy_env`, an internal feature that supports transitive classpath subtraction to create deployable jars suited for different environments.
This PR cherrypicks @ulfjack's commit, as described in #1402, and will help with scenarios describe there, as well as #5856.

Closes #6013.

PiperOrigin-RevId: 228690507
@thundergolfer
Copy link
Contributor

@s50600822 Did you build with the _deploy.jar suffix?

@s50600822
Copy link

s50600822 commented Apr 15, 2019

@s50600822 Did you build with the _deploy.jar suffix?

I don't think so. The jar name above (in the decompiler) also seems to say that too (cause it's been a while). If you meant https://docs.bazel.build/versions/master/tutorial/java.html#package-a-java-target-for-deployment, I wouldn't think of it since java application deployment is something usually not covered by build tool(since the logic isn't trivial). For example, in Maven the "deploy" semantic means adding the artifact to a remote repository(https://maven.apache.org/plugins/maven-deploy-plugin/). I think Bazel strength is incremental build behavior and was trying to use it to that extend.

@thundergolfer
Copy link
Contributor

Well if you don't create a 'Fat Jar' with the _deploy.jar target suffix, then you won't get your dependencies inside your Jar.

@jin jin added team-Rules-Java Issues for Java rules and removed category: rules > java labels May 2, 2019
qtprojectorg pushed a commit to qtqa/gerrit that referenced this issue Jan 19, 2020
From the beginning gerrit-acceptance-framework artifact shipped the
same content as gerrit-plugin-api. Given that provided_deps attribute
is not available in java_library rules, see issue: [1], there was no
way to exclude libraries already shipped in gerrit-plugin-api.

However with this commit: [2], that is available in recent Bazel
versions, deploy_env attribute is exposed in java_binary rule. With it
it is now possible to achieve transitive classpath subtraction to create
deployable jars suited for different environments.

Use this feature to subtract transitive dependencies already shipped in
gerrit-plugin-api. As the consequence the size of the test framework
artifact is reduced from ca. 70 MB to 6 MB.

While already on it, remove some related TODO comments as well.

[1] bazelbuild/bazel#1402
[2] bazelbuild/bazel@a92347e

Change-Id: I0deada504648d27465f57021787885705635b8b4
qtprojectorg pushed a commit to qtqa/gerrit that referenced this issue Jan 28, 2020
From the beginning gerrit-acceptance-framework artifact shipped the
same content as gerrit-plugin-api. Given that provided_deps attribute
is not available in java_library rules, see issue: [1], there was no
way to exclude libraries already shipped in gerrit-plugin-api.

However with this commit: [2], that is available in recent Bazel
versions, deploy_env attribute is exposed in java_binary rule. With it
it is now possible to achieve transitive classpath subtraction to create
deployable jars suited for different environments.

Use this feature to subtract transitive dependencies already shipped in
gerrit-plugin-api. As the consequence the size of the test framework
artifact is reduced from ca. 70 MB to 6 MB.

While already on it, remove some related TODO comments as well.

[1] bazelbuild/bazel#1402
[2] bazelbuild/bazel@a92347e

Change-Id: I0deada504648d27465f57021787885705635b8b4
@meisterT meisterT removed this from the 0.6 milestone May 12, 2020
@gergelyfabian
Copy link

gergelyfabian commented Jun 2, 2020

Does deploy_env help in a case when you have a diamond dependency problem:

A depends on B and C, B depends on C (possibly in a longer transitive chain), and I'd like to use deploy_env to remove from the *_deploy.jar B and it's transitive dependencies, but not C (because A depends on C directly).

According to my memories regular Maven supported this case, marking B as provided, but using C as a normal dependency. With what I can see with deploy_env (testing it), in this case it's not possible to use C from A.
Is this a bug, that I should report, or it's something expected?

Example:

Building software that would depend on scala.util.parsing (direct dependency) on a Hadoop platform. If I start including elements of the Hadoop platform in the java_binary that I use as deploy_env, then I cannot use scala.util.parsing any more in my software (I can compile, but I cannot run it, because scala.util.parsing gets cut out - as it's frequently a direct or transitive dependency of elements of the Hadoop platform).

My gut feeling would be, that if I have a direct dependency, it should not be cut out with deploy_env.

@comius comius removed the P2 We'll consider working on this in future. (Assignee optional) label Nov 20, 2020
@comius comius added the P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) label Jul 21, 2021
@comius
Copy link
Contributor

comius commented Jul 21, 2021

java_binary deploy_env should be used for this use case

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) team-Rules-Java Issues for Java rules type: feature request
Projects
None yet
Development

No branches or pull requests