Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strict classpath checking / one version enforcement #1071

Open
jart opened this issue Mar 22, 2016 · 22 comments
Open

Strict classpath checking / one version enforcement #1071

jart opened this issue Mar 22, 2016 · 22 comments
Labels
not stale Issues or PRs that are inactive but not considered stale P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-Java Issues for Java rules type: feature request

Comments

@jart
Copy link
Contributor

jart commented Mar 22, 2016

I was having lunch with @kchodorow and we came up with a really cool idea for guaranteeing that two classes with the same package + name will never appear on the same classpath.

The way it works, is you have java_{library,binary,import} export the set of all the class names that exist in the jar it produces. Then when two rules get linked together, have it fail if the intersection of those sets is nonempty.

This solves the Maven multi-version diamond dependency problem brought up in #1065, in a fully generic way. It would be the greatest thing since strict dependency checking. It would completely eliminate an entire category of mysterious Java errors, that everyone who's ever maintained a Java build has surely encountered. Best of all, this will be fast, because Bazel can perform this check in an incremental fashion.

@hhclam
Copy link
Contributor

hhclam commented Mar 22, 2016

I like this a lot. We have this check for our internal build system and would love to see this feature in Bazel. You can combine this with the use of ijar to speed up the generation of symbols.

@damienmg damienmg added type: feature request P2 We'll consider working on this in future. (Assignee optional) Native rules (Java / C++ / Python) labels Mar 23, 2016
@damienmg
Copy link
Contributor

/cc @cushon @eaftan Do you think this check should be in JavaBuilder, we already have all the information (we pass the ijars and listing a content of a jar is easy)?

@cushon
Copy link
Contributor

cushon commented Mar 23, 2016

We've had conversations about doing this, and @eaftan did some work on a prototype.

I think there are two main reasons it doesn't exist yet. The first is that it would require a significant cleanup before it could be enabled internally. It'd be nice to have as an option for Bazel, though.

The second is that it's expensive to enforce. A leaf target can have dozens of direct deps and thousands of transitive deps. With --experimental_java_classpath=javabuilder we don't have to open all of the jar files for a compilation, so forcing JavaBuilder to scan every class file in every jar on the classpath would slow down the build. One alternative is to have a separate artifact containing a compact summary of the transitive dependencies of each target, and for every java_* target have an action that merges the summaries of its deps and the current jar to emit a new summary, and report an error if there's a one version violation.

Another proposal was to only enforce this for binaries. That makes it cheaper (you're only scanning the classpath once), but there's less immediate feedback since it isn't reported for intermediate libraries. It would be cheap to do the enforcement in singlejar, but then the error would only be reported when building deploy jars.

@hhclam - can you provide more detail on how this could be combined with ijar to improve performance? That sounds interesting, but I'm not sure I understand what you mean.

@jart
Copy link
Contributor Author

jart commented Mar 23, 2016

It probably wouldn't be expensive if it's done incrementally.

For example, the way I implemented strict dependency checking for Closure Rules is I have each closure_js_library rule scan all its srcs for provided symbols, and then I write them to the implicit output file: name-provides.txt. Then any closure_js_library rule that depends on that rule, can quickly load that txt file containing the list of symbols for each direct dep, rather than having to pay the price scanning through all transitive dep source files. (See jschecker.py and closure_js_library.bzl)

I imagine something similar could be done for java_library, even though it's not written in Skylark.

@hhclam
Copy link
Contributor

hhclam commented Mar 23, 2016

By default Bazel generates an ijar target. If the ijar process also produces a list of symbols then it can save the javabuilder from doing scanning. Like @jart mentioned if a java target also produces an implicit output that lists symbols from all the transitive dependencies plus its own symbols then the computation can be done incrementally.

@ahumesky
Copy link
Contributor

For java_library and java_binary, there is the java_compilation proto:
https://github.com/bazelbuild/bazel/blob/master/src/main/protobuf/java_compilation.proto
which JavaBuilder writes to a .jar_manifest_proto file next to the jar,
e.g.:
bazel-bin/src/main/java/com/google/devtools/build/lib/libcommon.jar_manifest_proto

That may have the information you want, it would need to be added to
java_import though.

On Wed, Mar 23, 2016 at 1:42 PM Justine Tunney notifications@github.com
wrote:

It wouldn't be expensive if it's done incrementally.

For example, the way I implemented strict dependency checking for Closure
Rules is I have each closure_js_library rule scan all its srcs for
provided symbols, and then I write them to the implicit output file:
name-provides.txt. Then any closure_js_library rule that depends on
that rule, can quickly load that txt file containing the list of symbols
for each direct dep, rather than having to pay the price scanning through
all transitive dep source files. (See jschecker.py
https://github.com/bazelbuild/rules_closure/blob/master/closure/compiler/jschecker/jschecker.py
and closure_js_library.bzl
https://github.com/bazelbuild/rules_closure/blob/master/closure/compiler/closure_js_library.bzl
.

I imagine something similar could be done for java_library, even though
it's not written in Skylark.


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#1071 (comment)

@cushon
Copy link
Contributor

cushon commented Mar 23, 2016

Thanks, when I mentioned summaries I was also thinking about a summary of the transitive closure of each target. Processing one summary per direct dep is definitely better than scanning the transitive closure.

I'm not sure that the compilation proto buys us much, all that's needed is the list of class outputs for each target, so we could scan the sources like @jart describes or just list the entries in the output jar. The source approach is nice because I don't think we care about anonymous classes, and it avoids waiting for the compilation to finish.

I'm still not convinced that enforcement is cheap, since the transitive closure of a target can get pretty big. But as long as it's faster than javac we're fine.

@eaftan
Copy link

eaftan commented Mar 29, 2016

We totally agree this is a huge problem and are working on it internally. See internal bug b/21271423.

One issue with extracting symbols directly from the source is that we need to allow cases where there are multiple identical copies of a class file on the classpath. This happens legitimately when you have a target that globs *.java as well as another target that compiles a subset of the source files in the same directory. I guess we could disallow that as well, but the cleanup would be painful and wouldn't add significant value IMO.

We were planning to enforce only for java_binary and java_test, and also whitelist existing instances to avoid a giant cleanup. Anecdotally, I believe the majority of the java build targets in google3 violate the one version rule.

@ulfjack
Copy link
Contributor

ulfjack commented May 4, 2016

Counterintuitively, it's probably more expensive to do it incrementally - the problem is diamond dependencies (which are very common with Bazel-style build rules): let's say you have three levels with one top-level rule, n intermediate rules, and m low-level dependencies, where the top level rule depends on all the intermediate rules, each of which depends on all m low-level rules.

Then the list of classes (actually, it needs to be a map, so that you know where each one comes from for merging) for each of the n rules is O(m), and you end up with O(n*m) merging at the top-level rules, rather than O(n+m). Only merging at the binary rules (and tests?) is much less expensive.

@ulfjack ulfjack added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed P2 We'll consider working on this in future. (Assignee optional) labels Jun 16, 2016
@ulfjack ulfjack added this to the 1.0 milestone Jun 16, 2016
@ittaiz
Copy link
Member

ittaiz commented Oct 25, 2016

just wanted to add that using the sources won't help other languages on the JVM (like Scala) which allow multiple top level classes in the same source file

@nick-someone
Copy link
Contributor

0eaff44 is the start: we've plumbed the one version enforcement glue into the build graph for java_binary rules.

There are as-of-yet-not-fully-specified plans to open up the binary to actually do enforcement, which might require a bit of retooling for open-source (different data storage format, having prebuilts for the executable). We'll probably want to have a binary_level opt-out of one-version-enforcement for people taking this on incrementally.

@ulfjack
Copy link
Contributor

ulfjack commented Oct 10, 2018

This is working in Blaze and enabled by default for some cases across Google. Can we open source the binary that does the actual enforcement and provide instructions for how to use it?

@cushon
Copy link
Contributor

cushon commented Oct 10, 2018

The implementation is based on singlejar, so this is probably blocked on #2241.

Another question is what the whitelisting mechanism should look like. When we rolled this out at Google there were a lot of existing problems, and instead of trying to clean them all up before enabling enforcement we added them to a whitelist so we could start preventing new problems while gradually fixing existing ones. For Bazel we need to decide how that whitelist should be configured.

@ittaiz
Copy link
Member

ittaiz commented Oct 12, 2018 via email

@ittaiz
Copy link
Member

ittaiz commented Dec 12, 2018

@cushon it sounds like #2241 is fixed on HEAD. Can this progress?

@jin jin added team-Rules-Java Issues for Java rules and removed category: rules > java labels Jan 9, 2019
@cushon cushon changed the title Strict classpath checking Strict classpath checking / one version enforcement Feb 5, 2019
@ittaiz
Copy link
Member

ittaiz commented Jan 11, 2020

@katre this is the issue we were discussing in the java BOF BazelCon about duplicates in classpath of deployables.
@cushon do you think the binary code can be open sourced like Ulf suggested?

@johnynek
Copy link
Member

Note, you can check this very easily with:

https://github.com/classgraph/classgraph

A zero dependency java library that has duplicate class detection as one of the examples.

Seems like you could easily make an aspect that would check all java providers with that.

@or-shachar
Copy link
Contributor

This is my small solution for this: https://github.com/or-shachar/jvm-classpath-validator

@meisterT meisterT removed this from the 1.0 milestone May 12, 2020
@guw
Copy link
Contributor

guw commented Apr 10, 2022

What is the state of the one_version_tool? Will it be open source one day? Is there a spec somewhere of what's expected from the one_version_tool in terms of input processing and output?

@ShreeM01 ShreeM01 added the stale Issues or PRs that are stale (no activity for 30 days) label Feb 16, 2023
@ShreeM01 ShreeM01 closed this as not planned Won't fix, can't repro, duplicate, stale Feb 16, 2023
@cushon cushon reopened this Feb 16, 2023
@ShreeM01 ShreeM01 removed the stale Issues or PRs that are stale (no activity for 30 days) label Feb 16, 2023
@sgowroji sgowroji added the not stale Issues or PRs that are inactive but not considered stale label Feb 17, 2023
@fmeum
Copy link
Collaborator

fmeum commented Mar 11, 2024

@cushon Would you consider open-sourcing the singlejar-based tool? The existing allowlisting mechanism via a toolchain attribute does look reasonable for use in Bazel.

@cushon
Copy link
Contributor

cushon commented Mar 11, 2024

I think it's definitely worth considering. The internal tool is implemented on top of the c++ singlejar, which as mentioned above was eventually open-sourced. The remaining code depends on absl, which I think is OK to use in Bazel tools now. There are some details about the handling of allowlists that might need reworking.

copybara-service bot pushed a commit that referenced this issue Mar 12, 2024
#1071

PiperOrigin-RevId: 615111349
Change-Id: I297c85d38c277dd16d539de65dea8ec1d3311710
copybara-service bot pushed a commit that referenced this issue Mar 13, 2024
Follow-up to 3130202

#1071

PiperOrigin-RevId: 615448774
Change-Id: I8bc0ffcc36eabe010b0e85abc18f80e3637ee62a
copybara-service bot pushed a commit that referenced this issue Mar 14, 2024
Follow-up to 9aef517

Demo:

```
./one_version_main --output out.txt --inputs ~/.m2/repository/com/google/guava/guava/32.1.1-jre/guava-32.1.1-jre.jar,//foo  ~/.m2/repository/com/google/guava/guava/32.0.0-jre/guava-32.0.0-jre.jar,//bar
```

#1071

PiperOrigin-RevId: 615796723
Change-Id: I92a7c5401c72d0b3a583994263dcdf1091aa6a8f
@cushon
Copy link
Contributor

cushon commented Mar 14, 2024

As of 90e1661 I've finished the initial work to open-source the one version tool.

The tool takes a list of jars and their build labels and checks them for conflicts, e.g.:

./one_version_main --output out.txt --inputs ~/.m2/repository/com/google/guava/guava/32.1.1-jre/guava-32.1.1-jre.jar,//foo  ~/.m2/repository/com/google/guava/guava/32.0.0-jre/guava-32.0.0-jre.jar,//bar
...
  com/google/thirdparty/publicsuffix/PublicSuffixPatterns has incompatible definitions in:
    crc32=1633022630
      //foo [new]
      via /usr/local/google/home/cushon/.m2/repository/com/google/guava/guava/32.1.1-jre/guava-32.1.1-jre.jar
    crc32=1936051819
      //bar [new]
      via /usr/local/google/home/cushon/.m2/repository/com/google/guava/guava/32.0.0-jre/guava-32.0.0-jre.jar

I am not sure I'll have time to pursue the next steps here immediately, so I'll leave some notes in case anyone is interest in contributing:

I think all of the rules support is already there, it just isn't doing anything because the tool wasn't available and enabled in the toolchain.

The next step is to start including the tool in java_tools releases (similar to singlejar), and wire it up to the toolchain.

The internal version of the tool has support for an allowlist of existing violations to ignore, because when we rolled it out there were many violations and it wasn't practical to clean them up. I'm not sure what the priority of that is for Bazel: new projects may be able to just avoid introducing one version issues, but some existing projects may want a similar way to roll out enforcement to catch new violations and allowlist existing ones. The internal implementation uses and SSTable library that isn't open-source, for Bazel we could probably start with a text file.

copybara-service bot pushed a commit that referenced this issue Jun 4, 2024
* Add a prebuilt `one_version` tool as well as sources to `java_tools`.
* Avoid passing `--whitelist` to `one_version` if no allowlist is configured in the toolchain as it isn't supported by the Bazel version of `oneversion` yet.
* Document the `one_version` flags.
* Clean up tests not updated after recent rules_java releases.

Work towards #1071

Closes #22246.

PiperOrigin-RevId: 640177996
Change-Id: I0323154274bf2127a023184a67485783aa868461
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
not stale Issues or PRs that are inactive but not considered stale P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-Java Issues for Java rules type: feature request
Projects
None yet
Development

No branches or pull requests