Strict classpath checking / one version enforcement #1071
Comments
I like this a lot. We have this check for our internal build system and would love to see this feature in Bazel. You can combine this with the use of ijar to speed up the generation of symbols. |
We've had conversations about doing this, and @eaftan did some work on a prototype. I think there are two main reasons it doesn't exist yet. The first is that it would require a significant cleanup before it could be enabled internally. It'd be nice to have as an option for Bazel, though. The second is that it's expensive to enforce. A leaf target can have dozens of direct deps and thousands of transitive deps. With Another proposal was to only enforce this for binaries. That makes it cheaper (you're only scanning the classpath once), but there's less immediate feedback since it isn't reported for intermediate libraries. It would be cheap to do the enforcement in singlejar, but then the error would only be reported when building deploy jars. @hhclam - can you provide more detail on how this could be combined with ijar to improve performance? That sounds interesting, but I'm not sure I understand what you mean. |
It probably wouldn't be expensive if it's done incrementally. For example, the way I implemented strict dependency checking for Closure Rules is I have each I imagine something similar could be done for |
By default, Bazel generates an ijar target. If the ijar process also produces a list of symbols, it can save the javabuilder from scanning. As @jart mentioned, if a Java target also produces an implicit output that lists the symbols from all of its transitive dependencies plus its own symbols, then the computation can be done incrementally. |
For java_library and java_binary, there is the java_compilation proto: That may have the information you want, it would need to be added to
|
Thanks. When I mentioned summaries I was also thinking about a summary of the transitive closure of each target. Processing one summary per direct dep is definitely better than scanning the transitive closure. I'm not sure that the compilation proto buys us much: all that's needed is the list of class outputs for each target, so we could scan the sources like @jart describes or just list the entries in the output jar. The source approach is nice because I don't think we care about anonymous classes, and it avoids waiting for the compilation to finish. I'm still not convinced that enforcement is cheap, since the transitive closure of a target can get pretty big. But as long as it's faster than javac we're fine. |
We totally agree this is a huge problem and are working on it internally. See internal bug b/21271423. One issue with extracting symbols directly from the source is that we need to allow cases where there are multiple identical copies of a class file on the classpath. This happens legitimately when you have a target that globs *.java as well as another target that compiles a subset of the source files in the same directory. I guess we could disallow that as well, but the cleanup would be painful and wouldn't add significant value IMO. We were planning to enforce only for java_binary and java_test, and also whitelist existing instances to avoid a giant cleanup. Anecdotally, I believe the majority of the java build targets in google3 violate the one version rule. |
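The "identical copies are fine" rule described above can be sketched by comparing content digests instead of class names alone. This is only an illustration in Python under stated assumptions: the function name `find_violations`, the in-memory classpath shape, and the labels in the usage note are hypothetical, and this is not how the internal tool is implemented.

```python
import hashlib
from collections import defaultdict


def find_violations(classpath):
    """Report classes that appear on the classpath with differing contents.

    classpath: iterable of (label, {class_name: class_bytes}) pairs.
    Returns {class_name: sorted labels} for classes with more than one
    distinct digest. Byte-identical copies are tolerated.
    """
    digests = defaultdict(dict)  # class_name -> {digest: [labels]}
    for label, classes in classpath:
        for name, data in classes.items():
            digest = hashlib.sha256(data).hexdigest()
            digests[name].setdefault(digest, []).append(label)
    return {
        name: sorted(label for labels in by_digest.values() for label in labels)
        for name, by_digest in digests.items()
        if len(by_digest) > 1
    }
```

For example, a target that globs a directory and a second target that compiles a subset of the same sources would share identical class bytes and pass, while two targets providing different bytes for the same class name would be reported along with both labels.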
Counterintuitively, it's probably more expensive to do it incrementally. The problem is diamond dependencies (which are very common with Bazel-style build rules): let's say you have three levels with one top-level rule, n intermediate rules, and m low-level dependencies, where the top-level rule depends on all the intermediate rules, each of which depends on all m low-level rules. Then the list of classes (actually, it needs to be a map, so that you know where each one comes from for merging) for each of the n rules is O(m), and you end up with O(n*m) merging at the top-level rule, rather than O(n+m). Only merging at the binary rules (and tests?) is much less expensive. |
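The diamond argument above can be made concrete with a toy model. The sketch below is in Python and assumes, purely for illustration, that every rule contributes exactly one class and that the cost is the number of map entries visited while merging; the function names are hypothetical.

```python
def merge(summaries, counter):
    """Merge class -> origin maps, counting every entry visited."""
    merged = {}
    for summary in summaries:
        for cls, origin in summary.items():
            counter[0] += 1
            merged.setdefault(cls, origin)
    return merged


def incremental_cost(n, m):
    """Entries visited when every rule merges the summaries of its deps."""
    counter = [0]
    low = [{f"Low{i}.class": f"//low:{i}"} for i in range(m)]
    mids = []
    for j in range(n):
        summary = merge(low, counter)  # each intermediate rule sees all m deps
        summary[f"Mid{j}.class"] = f"//mid:{j}"
        mids.append(summary)
    merge(mids, counter)  # the top-level rule sees n maps of size m + 1
    return counter[0]


def binary_only_cost(n, m):
    """Entries visited when only the top-level rule scans the classpath once."""
    counter = [0]
    own = [{f"Low{i}.class": f"//low:{i}"} for i in range(m)]
    own += [{f"Mid{j}.class": f"//mid:{j}"} for j in range(n)]
    merge(own, counter)
    return counter[0]
```

In this model the incremental approach visits 2nm + n entries, while the binary-only scan visits n + m, which matches the O(n*m) versus O(n+m) comparison in the comment.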
Just wanted to add that using the sources won't help other languages on the JVM (like Scala), which allow multiple top-level classes in the same source file. |
0eaff44 is the start: we've plumbed the one version enforcement glue into the build graph for There are as-of-yet-not-fully-specified plans to open up the binary to actually do enforcement, which might require a bit of retooling for open-source (different data storage format, having prebuilts for the executable). We'll probably want to have a binary_level opt-out of one-version-enforcement for people taking this on incrementally. |
This is working in Blaze and enabled by default for some cases across Google. Can we open source the binary that does the actual enforcement and provide instructions for how to use it? |
The implementation is based on singlejar, so this is probably blocked on #2241. Another question is what the whitelisting mechanism should look like. When we rolled this out at Google there were a lot of existing problems, and instead of trying to clean them all up before enabling enforcement we added them to a whitelist so we could start preventing new problems while gradually fixing existing ones. For Bazel we need to decide how that whitelist should be configured. |
Exciting! Can we start without a whitelist and then add it later if people have the need? I have no data but it sounds possible that there will be at least a few users that might not have this issue.
|
Note, you can check this very easily with https://github.com/classgraph/classgraph, a zero-dependency Java library that has duplicate class detection as one of its examples. Seems like you could easily make an aspect that would check all Java providers with that. |
This is my small solution for this: https://github.com/or-shachar/jvm-classpath-validator |
What is the state of the |
@cushon Would you consider open-sourcing the singlejar-based tool? The existing allowlisting mechanism via a toolchain attribute does look reasonable for use in Bazel. |
I think it's definitely worth considering. The internal tool is implemented on top of the C++ singlejar, which as mentioned above was eventually open-sourced. The remaining code depends on absl, which I think is OK to use in Bazel tools now. There are some details about the handling of allowlists that might need reworking. |
#1071 PiperOrigin-RevId: 615111349 Change-Id: I297c85d38c277dd16d539de65dea8ec1d3311710
Follow-up to 9aef517

Demo:

```
./one_version_main --output out.txt --inputs ~/.m2/repository/com/google/guava/guava/32.1.1-jre/guava-32.1.1-jre.jar,//foo ~/.m2/repository/com/google/guava/guava/32.0.0-jre/guava-32.0.0-jre.jar,//bar
```

#1071

PiperOrigin-RevId: 615796723
Change-Id: I92a7c5401c72d0b3a583994263dcdf1091aa6a8f
As of 90e1661 I've finished the initial work to open-source the one version tool. The tool takes a list of jars and their build labels and checks them for conflicts, e.g.:
I am not sure I'll have time to pursue the next steps here immediately, so I'll leave some notes in case anyone is interested in contributing: I think all of the rules support is already there, it just isn't doing anything because the tool wasn't available and enabled in the toolchain. The next step is to start including the tool in The internal version of the tool has support for an allowlist of existing violations to ignore, because when we rolled it out there were many violations and it wasn't practical to clean them up. I'm not sure what the priority of that is for Bazel: new projects may be able to just avoid introducing one version issues, but some existing projects may want a similar way to roll out enforcement to catch new violations and allowlist existing ones. The internal implementation uses an SSTable library that isn't open-source; for Bazel we could probably start with a text file. |
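As a rough illustration of the "start with a text file" idea, here is a Python sketch of a text allowlist. The file format (one class entry per line, `#` for comments) and the names `parse_allowlist` and `unallowed` are assumptions made for this example; the thread does not specify a format, and the Bazel tool may end up using something different.

```python
def parse_allowlist(text):
    """Parse a hypothetical allowlist: one class entry per line, '#' comments."""
    entries = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            entries.add(line)
    return entries


def unallowed(violations, allowlist):
    """Keep only the violations that are not covered by the allowlist.

    violations: {class_name: [labels providing a conflicting copy]}
    """
    return {
        cls: labels
        for cls, labels in violations.items()
        if cls not in allowlist
    }
```

With this shape, existing violations can be listed in the file and ignored, while any class not listed still fails the check, which is the gradual rollout described above.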
* Add a prebuilt `one_version` tool as well as sources to `java_tools`.
* Avoid passing `--whitelist` to `one_version` if no allowlist is configured in the toolchain as it isn't supported by the Bazel version of `oneversion` yet.
* Document the `one_version` flags.
* Clean up tests not updated after recent rules_java releases.

Work towards #1071

Closes #22246.

PiperOrigin-RevId: 640177996
Change-Id: I0323154274bf2127a023184a67485783aa868461
I was having lunch with @kchodorow and we came up with a really cool idea for guaranteeing that two classes with the same package + name will never appear on the same classpath.
The way it works is you have `java_{library,binary,import}` export the set of all the class names that exist in the jar it produces. Then when two rules get linked together, have it fail if the intersection of those sets is nonempty.

This solves the Maven multi-version diamond dependency problem brought up in #1065, in a fully generic way. It would be the greatest thing since strict dependency checking. It would completely eliminate an entire category of mysterious Java errors that everyone who's ever maintained a Java build has surely encountered. Best of all, this will be fast, because Bazel can perform this check in an incremental fashion.
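A minimal sketch of the proposed check, written in Python rather than as Bazel rules and assuming a jar's class set is simply its `.class` entry names. The helpers `class_names`, `check_link`, and `make_jar` are hypothetical names used only for this illustration.

```python
import io
import zipfile


def class_names(jar_bytes):
    """Return the set of .class entry names found in a jar (a zip archive)."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def check_link(exported_a, exported_b):
    """Fail if two targets export a class with the same package and name."""
    overlap = exported_a & exported_b
    if overlap:
        raise ValueError("duplicate classes: " + ", ".join(sorted(overlap)))
    # The union is what the combined target would export in turn.
    return exported_a | exported_b


def make_jar(entries):
    """Build an in-memory jar with empty entries, for demonstration only."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name in entries:
            jar.writestr(name, b"")
    return buf.getvalue()
```

Linking two jars with disjoint class sets returns their union; linking two jars that both contain, say, `com/foo/A.class` raises an error naming the duplicate.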