
Enforce naming convention on maven_jar #1065

Closed
jart opened this issue Mar 21, 2016 · 17 comments
Labels
P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) type: feature request

Comments

@jart
Contributor

jart commented Mar 21, 2016

As you can see, the Closure Rules repositories.bzl file is getting to be a nontrivial size. I'm imagining how the naming of these dependencies is going to play out in a large ecosystem of external repositories. For example:

maven_jar(
    name = "guice",
    artifact = "com.google.inject:guice:3.0",
    sha1 = "9d84f15fe35e2c716a02979fb62f50a29f38aefa",
)

Now let's imagine rules_closure and rules_scala both depend on Guice, but give it different names. If my project uses both closure and scala, there will be no way for me, in my own WORKSPACE file, to copy in the repositories.bzl from both projects and merge/reconcile them. There will be two Guice jars. This breaks the "one version" philosophy Bazel follows internally at Google.

The solution might be to have maven_jar emit a warning if my maven_jar rule is not named as follows:

maven_jar(
    name = "com_google_inject_guice",
    artifact = "com.google.inject:guice:3.0",
    sha1 = "9d84f15fe35e2c716a02979fb62f50a29f38aefa",
)

As you can see, the enforced name is generated from the artifact argument.
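
For illustration, here's a minimal sketch (plain Python, not an actual Bazel feature) of how that name could be derived from the artifact coordinate:

def maven_jar_name(artifact):
    """'com.google.inject:guice:3.0' -> 'com_google_inject_guice'."""
    group, name, _version = artifact.split(":")
    return (group + "." + name).replace(".", "_").replace("-", "_")

print(maven_jar_name("com.google.inject:guice:3.0"))  # com_google_inject_guice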

Unfortunately, I don't think it would be possible to enforce a naming convention on any other type of external dependency. But it could be very advantageous to Java users if it could be enforced on maven_jar.

@damienmg
Contributor

That sounds reasonable. Assigning to @kchodorow for triage / to see how it can fit the general roadmap of naming conventions.

@damienmg damienmg added type: feature request P3 We're not considering working on this, but happy to review a PR. (No assignee) External repositories labels Mar 22, 2016
@kchodorow
Contributor

Unfortunately, external users actually want to have conflicting versions of some dependencies (e.g., build one part of the project with one version and another part with another version). We don't want spurious warnings, or people will ignore them. You might want to take a look at https://groups.google.com/d/msg/bazel-discuss/Dc1bCD0JpDU/5LZmFHAXKwAJ, a design doc @hhclam has been working on that addresses this (and some other issues).

@jart
Contributor Author

jart commented Mar 22, 2016

The solution to that is very simple. Provide the user with two choices of names: com_google_inject__guice or com_google_inject__guice__3_0.

@hhclam
Contributor

hhclam commented Mar 22, 2016

While appending the version to the package name allows multiple versions to be used in a repo, Bazel doesn't provide a mechanism for resolving diamond dependencies. It might result in a bad usage pattern and lead to problems that users cannot resolve.

@jart
Contributor Author

jart commented Mar 22, 2016

So @hhclam, by "resolving diamond dependencies," do you mean automatically turning this:

[image: diamond-inconsistent]

Into this?

[image: diamond]

I would argue that this probably isn't something Bazel should do automatically. The best strategy would probably be for java_library/java_binary to do strict dependency checking on Maven deps: if two artifacts with the same name but different versions are linked, Bazel should throw an error. Then the user can resolve the conflict himself.

Being strict rather than magical is probably better, in my opinion. Non-strictness has led to problems with other build systems; for example, Maven will simply put both on the classpath in a hard-to-predict order.
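
To sketch the kind of check I mean (ordinary Python, not an actual Bazel feature): fail loudly when one classpath contains the same group:artifact at two different versions.

def check_one_version(artifacts):
    """artifacts: Maven coordinates like 'com.google.inject:guice:3.0'."""
    seen = {}
    for coord in artifacts:
        group, name, version = coord.split(":")
        key = (group, name)
        if key in seen and seen[key] != version:
            raise ValueError("conflicting versions of %s:%s on the classpath: %s vs %s"
                             % (group, name, seen[key], version))
        seen[key] = version

check_one_version([
    "com.google.inject:guice:3.0",
    "com.google.guava:guava:19.0",
])  # passes; adding "com.google.guava:guava:18.0" would raise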

@hhclam
Contributor

hhclam commented Mar 22, 2016

@jart: That's right. I agree that Bazel shouldn't handle this automatically. Instead, I think the Bazel team should provide tools and guidelines so that users don't run into this situation in the first place. That's the motivation for the generate_workspace design doc.

I totally agree that being strict is better in general. It can go together with well-documented tools and best practices.

My understanding from discussions with @kchodorow is that Bazel does not provide an easy way to resolve this. Features of other build systems, such as exclusions (in Maven) and overrides (in Pants), are not available in Bazel.

@jart
Contributor Author

jart commented Mar 22, 2016

Penny for your thoughts on #1071 @hhclam.

I'm not sure if you're a Googler, but Bazel doesn't support tools for exclusion because Google's internal monolithic repository has a policy that only a single version of a third-party dependency may exist at a given time. When a third-party dependency is updated, every project that uses it is updated, if necessary, in a single atomic commit.

That's the way it works in theory. In practice, some third-party dependencies have multiple versions checked in. This is unfortunately necessary in situations where libraries change their APIs so radically that migrating projects would be impossible. In those situations, we create a separate directory for each version. Then we restrict the visibility of the old version to a whitelist rather than public, to minimize the chance that two versions end up on the same classpath.

I'm not sure why exclusions would be necessary, since Bazel doesn't auto-fetch transitive jars. As far as I know, your proposal is to continue having a script that discovers all the transitive dependencies by crawling pom.xml files and then generates the WORKSPACE / BUILD configs (which is the ideal approach, IMHO). In that case, you could probably do exclusions with command-line flags or manual reconciliation.

@jart
Contributor Author

jart commented Mar 22, 2016

Here's how I've been solving the problem so far. I maintain a large Bazel codebase that depends on 83 separate Maven jars. I wrote, by hand, 60 BUILD files under //third_party/java that define each jar with its transitive dependencies as runtime deps. For example:

# third_party/java/apache_sshd/BUILD
package(default_visibility = ["//visibility:public"])

licenses(["notice"])  # Apache License 2.0

java_library(
    name = "apache_sshd",
    exports = ["@sshd_core//jar"],
    runtime_deps = ["//third_party/java/apache_mina:core"],
)
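
(For reference, the @sshd_core repository used above is declared in the WORKSPACE with maven_jar, roughly as below; the version here is an illustrative assumption, not my actual pin.)

# WORKSPACE excerpt (sketch)
maven_jar(
    name = "sshd_core",
    artifact = "org.apache.sshd:sshd-core:1.1.0",
)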

Writing these files was a pain, but it was brainless boilerplate I was able to whip up in a few days. Still, I don't think everyone should be expected to do it.

So I'd like to propose an alternative worth considering that might be more idiomatic to the Google way of doing things, for which Bazel was designed.

What if we started a new open source project called baby3: a gigantic collection of BUILD files for all the world's open source libraries, each tracked at a single version? This could be Bazel's equivalent of the Maven repository, except baby3 would actually be a meta-repository containing all the world's repositories! King of kings.

However, such an effort might not be feasible. The hardest part would be integration testing new versions. To make baby3 successful, we would have to make sure that everyone who uses it is registered somehow, so that every time baby3 changes we can run the integration tests of the affected projects.

That way, if you want to use Apache SSHD, you add baby3 as an external dependency and then add @baby3//third_party/java/apache_sshd to your deps list. You might have to add load("@baby3//third_party/java/apache_sshd:repositories.bzl", "etc") to your WORKSPACE too... which would be a pain, but I'm sure we could add a Bazel feature to make that annoying bit go away.
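
To make that concrete, here's a purely hypothetical sketch; @baby3, the repositories.bzl path, and the apache_sshd_repositories symbol are all invented for illustration:

# WORKSPACE (hypothetical)
load("@baby3//third_party/java/apache_sshd:repositories.bzl", "apache_sshd_repositories")
apache_sshd_repositories()

# BUILD (hypothetical)
java_library(
    name = "uses_sshd",
    srcs = ["UsesSshd.java"],
    deps = ["@baby3//third_party/java/apache_sshd"],
)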

@hhclam
Contributor

hhclam commented Mar 25, 2016

@jart Thanks for the ideas.

We are starting with a large legacy repository that we would like to migrate to Bazel, hence the diamond-dependency situation on Maven jars (internally we have one single repo that contains all our code, similar to Google). You are right that the proposal is to keep using a script / tool to generate the WORKSPACE and BUILD files, and to teach the tool to generate select conditions if multiple versions are used. I think it serves our needs and gives developers some degree of help in resolving this situation.

@jart
Contributor Author

jart commented Mar 25, 2016

Why would you use select() expressions if multiple versions are being used? Why not just evict the old version? I think I commented on this in the design doc. I've never seen this strategy used at Google.

@jart
Contributor Author

jart commented Mar 25, 2016

Is there any chance you could make your particular use case more concrete? I'd like to understand better why it would need the select thing.

@hhclam
Contributor

hhclam commented Mar 25, 2016

What do you mean by evict? Do you mean eliminating the older version completely from all usage?

The concrete case is that one part of the company wants to use a newer version of Guava, say 19, and another part is not yet ready to upgrade or is slower to change its code, say it still depends on version 18. Now we end up with two products (java_binary targets), one that wants Guava 19 and one that wants Guava 18. They both live in the same repo and depend on the same base library. We would like the build system to handle this case.
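
A rough sketch of how the select-based approach mentioned earlier could express this choice; the config_setting, the --define flag, and the repository names are all illustrative assumptions on my part:

# BUILD (sketch)
config_setting(
    name = "guava_19",
    values = {"define": "guava=19"},
)

java_library(
    name = "guava",
    exports = select({
        ":guava_19": ["@com_google_guava_guava_19//jar"],
        "//conditions:default": ["@com_google_guava_guava_18//jar"],
    }),
)

A binary that wants Guava 19 would then be built with something like --define=guava=19.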

@jart
Contributor Author

jart commented Mar 26, 2016

By evict, what I mean is if B depends on D-2.0.1 and C depends on D-2.0.2, then you get rid of D-2.0.1 and make both B and C depend on D-2.0.2.

Bazel already supports the use case you're describing. We already do this at Google. The solution we use is simply to have a different build rule for each version, then rewrite the package names so they don't conflict.

To give you an example, Objectify is a third-party package used by a lot of teams at Google. When version 4.0 came out, it broke certain things, like the way serialized objects are stored, which meant that upgrading would involve datastore migrations for several services on the old version. No one wanted to do that, but we really wanted to use Objectify v4.

So what did we do? We asked the people in charge of //third_party for an exemption from the "one version policy." They approved it. And we put the different versions under two separate rules:

  • //third_party/java/objectify for v3
  • //third_party/java/objectify:objectify-v4 for v4

That way, anyone who wanted to use v4 could just put //third_party/java/objectify:objectify-v4 in their deps list, rather than the other version.

But what happens if a large application ends up linking both of these libraries? They'll conflict on the classpath. Then things break. Data could become corrupted.

So what we had to do when we added v4 was write a genrule that renamed the packages, so that com.googlecode.objectify.annotation.Entity would become com.googlecode.objectify.v4.annotation.Entity. This is easy:

# repackage.rules (the jarjar rules file)
rule **.objectify.** @1.objectify.v4.@2
rule **CustomFieldSerializer @1CustomFieldSerializerV4

# BUILD: run jarjar over the original jar to produce the repackaged one.
genrule(
    name = "repackage",
    srcs = [
        "repackage.rules",
        "objectify-4.1.3.jar",
    ],
    outs = ["objectify-4.1.3-repackaged.jar"],
    cmd = "$(location //third_party/java/jarjar:jarjar_bin) process $(SRCS) $@",
    tools = ["//third_party/java/jarjar:jarjar_bin"],
)
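
Consumers of v4 then depend on a target that wraps the repackaged jar; roughly something like this (a sketch, the exact shape of the real rule is an assumption):

java_import(
    name = "objectify-v4",
    jars = ["objectify-4.1.3-repackaged.jar"],
)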

So the use case you're describing is already supported. But we of course don't recommend doing this. We recommend having a corporate policy in place that states "one version only" for third party packages, unless granted an explicit case-by-case exemption by policymakers.

@jart
Contributor Author

jart commented Mar 26, 2016

Also consider that if we had used the select solution for Objectify, it would have resulted in data corruption. If my understanding of the select() design is correct, it would have caused applications written with Objectify v3 in mind to suddenly start using v4. Then some serialized objects would be written incorrectly while others would remain in the old format, and you end up with a patchwork of mixed data that is broken in either state.

@hhclam
Contributor

hhclam commented Mar 28, 2016

I think this approach works in some cases, but not in cases where a refactor / recompile is needed.

It is great to have a corporate policy of one version for each third-party dependency and a strict process for ingestion. Google can mostly afford to implement this policy at such a large scale. I'm afraid that this doesn't work well in our organization. We are a medium-size company with ~500 engineers and a lot of legacy code, some of which already has classpath conflicts. It will be quite impossible for us to refactor all our usage before we can start using Bazel.

I agree that repackaging is a viable strategy for some cases. When we're starting from legacy code, or developers want to avoid the hassle of refactoring and renaming, select is a strategy that requires less effort on their part.

Even more concretely, the Guava example above is a real one. Eventually we want to move everyone to Guava 19, but we can't impose that burden on our developers just yet. That's why, when there are no conflicts, we would like to allow both versions to coexist, and we would like both to be Google's Guava and not a modified version. That will ease convergence down the road.

@jart
Contributor Author

jart commented Mar 28, 2016

I understand that you're not Google and that you have a large legacy codebase. What I don't understand is the engineering strategy that necessitates your use case. This makes it difficult for me to evaluate whether or not the feature you're proposing to merge into Bazel is going to be helpful to other users. We want to make the best recommendations possible.

For example, will your migrated codebase be able to follow The 1:1:1 Rule? Or are you planning to have a small number of build files that glob() gigantic directory structures? If it's the latter, then you might lose many of the benefits offered by Bazel. It could potentially be inferior to your current build setup. Furthermore, you might find it difficult to escape this suboptimal state, because once you flip over, all the incentives to complete the migration go away.

Would it be possible to grow your Bazel build organically under a hybrid model? By this I mean, you would use both your old build system and Bazel, and migrate each project one at a time. Then you would farm out responsibility for migrating individual projects to project owners. You've got 500 engineers; you might as well use them. Each project owner would be responsible for adding the third party packages he needs, writing a BUILD file for each and every package within his project, and doing any necessary refactoring to remove cycles between packages. This means the migration also trains each team on how to use Bazel.

You would of course need high level approval to send down this mandate, and see it through to completion. Completion entails a concrete objective, which is deleting the legacy build system. That makes the project easier to manage. But here's the best part: you would probably be leading this cross-organizational effort. That's terrific for your career.

@hhclam
Contributor

hhclam commented Mar 29, 2016

I think the mapping of the codebase to build files is quite orthogonal to the discussion of external dependencies.

We do plan to have a hybrid build approach, with minimal effort from the users / developers to switch over. The plan is to migrate the codebases one by one, replacing some external dependencies with maven_jar when it makes sense and resolving conflicts with select. We're hoping the migration does not entail significant code changes or refactoring.

The case you mentioned didn't require any refactoring of existing code; only new code needed to use the com.googlecode.objectify.v4 package. I don't think it's viable for us to refactor a large part of the codebase. As for mobilizing the entire engineering workforce, even an organization as resourceful as Google didn't manage that for the Chrome team (first the migration to GYP, then GN). I doubt such a mandate would actually work.

Beyond our migration strategy, I'm hoping the proposal to use select will enable a different policy for using Bazel: one that allows users to add new external dependencies as long as all applications build and run correctly (with help from a strict check for classpath conflicts). I think this flexibility will be valuable to users, especially when other build tools provide features to handle this use case (e.g. Gradle: https://goo.gl/EnZkzO and Pants: https://goo.gl/olNtEQ). I would love to hear if there are other alternatives.

@kchodorow kchodorow added P4 This is either out of scope or we don't have bandwidth to review a PR. (No assignee) and removed P3 We're not considering working on this, but happy to review a PR. (No assignee) labels Jun 14, 2016
@dslomov dslomov closed this as completed Mar 21, 2019