Consider removing bind() #1952

Open
jart opened this Issue Oct 16, 2016 · 53 comments

Comments

@jart
Member

jart commented Oct 16, 2016

What is the use case for bind()? I can't think of one. Even in situations where there are multiple ABI-compatible implementations of a library (e.g. OpenSSL, BoringSSL, etc.) this problem could still be solved by using vanilla externals.

Most Bazel projects don't seem to use bind(). In the ones that do, it seems to have caused problems.

For example, the protobuf repository, rather than defining a protobuf_repositories() function, simply uses //external:foo for every single target upon which it depends, thereby punting to its users the burden of defining bind() rules not only for every single external, but for every target within those externals.

As a result, the TensorFlow workspace.bzl file has developed a cargo cult pattern where superfluous bindings will be added, because I don't think people really understand what bind() does.

What is especially suboptimal is that the bind() namespace overlaps with the external repository namespace. We can't name an external like "six" as "@six", because the protobuf BUILD file asks us for //external:six. So we don't have a choice: we have to name it @six_archive, which hurts readability. It would have been better if the protobuf BUILD file had just asked for @six//:six.
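To make the two label styles concrete, here is a small sketch of a BUILD file that needs six. The target names and example.py are made up for illustration and are not taken from protobuf's actual BUILD file.

```starlark
# BUILD file of a library that needs six (hypothetical targets).

# With bind(): the dependency is a bound name, so every consuming WORKSPACE
# must contain a bind(name = "six", actual = ...) entry.
py_library(
    name = "example_with_bind",
    srcs = ["example.py"],
    deps = ["//external:six"],
)

# Without bind(): the dependency names the external repository directly, so
# the consuming WORKSPACE only has to define a repository called "six".
py_library(
    name = "example_without_bind",
    srcs = ["example.py"],
    deps = ["@six//:six"],
)
```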

It would be nice if we could retire bind() and help projects like protobuf migrate to the foo_repositories() model that official Bazel projects use. We could recommend as a best practice the technique that is employed by the Closure Rules repositories.bzl file.

def closure_repositories(
    omit_foo=False,
    omit_bar=False):
  if not omit_foo:
    foo()
  if not omit_bar:
    bar()

def foo():
  native.maven_jar(name = "foo", ...)

def bar():
  native.maven_jar(name = "bar", ...)

This gives dependent Bazel projects the power to schlep in transitive Closure Rules dependencies using either a whitelist or blacklist model. For example, one project that uses Closure Rules has the following in its WORKSPACE file:

http_archive(
    name = "io_bazel_rules_closure",
    sha256 = "7d75688c63ac09a55ca092a76c12f8d1e9ee8e7a890f3be6594a4e7d714f0e8a",
    strip_prefix = "rules_closure-b8841276e73ca677c139802f1168aaad9791dec0",
    url = "http://bazel-mirror.storage.googleapis.com/github.com/bazelbuild/rules_closure/archive/b8841276e73ca677c139802f1168aaad9791dec0.tar.gz",  # 2016-10-02
)

load("@io_bazel_rules_closure//closure:defs.bzl", "closure_repositories")

closure_repositories(
    omit_gson = True,
    omit_guava = True,
    omit_icu4j = True,
    omit_json = True,
    omit_jsr305 = True,
    omit_jsr330_inject = True,
)

It does so because it directly depends on those transitive dependencies and wants to specify them on its own.

I think this is a much more desirable and flexible pattern than bind().

@philwo

Member

philwo commented Oct 17, 2016

I personally also find bind(...) confusing and am not sure how to correctly use it.

Pinging @damienmg and @lberki for some input here, I think they know more about how this is supposed to work.

@lberki

Contributor

lberki commented Oct 17, 2016

bind was indeed invented for selecting between e.g. different implementations of SSL or e.g. for different versions of GSON/Guava/... in the Closure case. I'm not sure how much use it sees, so I'd not be that trigger-happy with removing it.

@damienmg

Contributor

damienmg commented Oct 17, 2016

I think all bind() use cases can be replaced with alias(), but it does not strike me as high priority. bind() has the weird property that it creates an //external package that does not really exist, so for that alone I would say +1.
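A minimal sketch of that replacement, assuming a Guava jar fetched into a repository named @guava_archive (a made-up name) and a third_party package in the main workspace (also an assumption):

```starlark
# WORKSPACE: what bind() does today. Consumers refer to //external:guava,
# a label in a package that does not exist on disk.
bind(
    name = "guava",
    actual = "@guava_archive//jar",
)

# third_party/BUILD: an alias() in a real package serving the same purpose.
# Consumers refer to //third_party:guava instead.
alias(
    name = "guava",
    actual = "@guava_archive//jar",
    visibility = ["//visibility:public"],
)
```

Either form lets the workspace owner repoint the name at a different Guava by changing only the actual attribute. The alias version keeps the label in an ordinary package rather than in //external.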

@jart

Member

jart commented Oct 17, 2016

It might be worth putting a quick change in the documentation saying that bind() is or will likely be deprecated and that alias should be used instead, or simply vanilla external repository names. That should hopefully lessen any refactoring we'll have to do in the future, if it is removed.

@lberki

Contributor

lberki commented Oct 18, 2016

Considering that I'm not really sure we want to do that, I'd rather not deprecate bind() just now.

@jart

Member

jart commented Oct 18, 2016

Then would Bazel at the very least be open to a documentation change
dissuading people from using it?

@robertcrowe

robertcrowe Oct 18, 2016

It looks like this could be related to a problem I'm trying to solve. When I try to build my TensorFlow Serving client I'm getting this:

bazel build //FFv1:client
ERROR: /home/rcrowe/.cache/bazel/_bazel_rcrowe/b1867dd6bfd6249a885c91482eccde46/external/org_tensorflow/tensorflow/workspace.bzl:80:3: no such package '@six_archive//': In new_http_archive rule //external:six_archive the 'build_file' attribute does not specify an existing file (/home/rcrowe/serving/six.BUILD does not exist) and referenced by '//external:six'.
ERROR: Analysis of target '//FFv1:client' failed; build aborted.
INFO: Elapsed time: 1.286s

Any idea how I can fix this? I'm on Ubuntu 14.04

@jart

Member

jart commented Oct 18, 2016

@robertcrowe Did you modify the workspace.bzl file? Can you start a separate issue for this and CC me?

@robertcrowe

robertcrowe Oct 18, 2016

@jart - Thanks, I created #1963

@kchodorow

Contributor

kchodorow commented Oct 19, 2016

+1 to removing bind, updating the docs in the meantime seems like a good idea.

@jart

Member

jart commented Oct 19, 2016

I reviewed the code of rules_web (https://github.com/bazelbuild/rules_web) yesterday, which was recently introduced and makes extensive use of bind(). It uses bind() to allow the user to override executable attributes on its Skylark rules (https://github.com/bazelbuild/rules_web/blob/master/web/internal/web_test_config.bzl#L63) without having to repeat them every time the rule is used. I had a discussion with @DrMarcII about this. We both came to the conclusion that it would be better if those attributes were public and the user defined macro wrappers that customize the attributes.

I'm glad we're building consensus around removing bind(). Right now it's the first rule listed in the documentation, but it's actually the last rule we'd want to use. The same is true for git_repository(), which is also listed first, but has the biggest negative impact on performance for the project in question, and all dependent projects. Many users make use of these rules without considering the alternatives, like grabbing the snapshot tarball from GitHub. Whatever we can do to help the user make the correct choices, especially if we make the wrong choices impossible, is going to go a long way to fostering a healthy and lightning fast build ecosystem spanning many GitHub projects that all reference each other.

@damienmg

Contributor

damienmg commented Oct 20, 2016

The order of the rules is a bit random; I don't think it matters.

Anyway, IMO:

  1. We should advertise bind as deprecated and point users to alias, or to
    using the repository name instead.
  2. We should start advertising maven_jar and git_repository as
    deprecated and point users to the http_* rules (for source distributions) and
    to the Skylark implementations (for those for whom Git / Maven support works better).
  3. We should publish a best practice document about the good use of
    external repositories. A blog post maybe?

@steven-johnson

Contributor

steven-johnson commented Oct 25, 2016

On Thu, Oct 20, 2016 at 12:54 AM, Damien Martin-Guillerez wrote:

> 3. We should publish a best practice document about the good use of
> external repositories.

+1

> A blog post maybe?

Sure, but please also mirror it into the official docs somehow; people
arriving later may never see the blog post.

@ittaiz

Member

ittaiz commented Nov 1, 2016

@damienmg why are maven_jar and git_repository being deprecated?
Additionally, the whole "new" prefix is a bit misleading IMHO, since from this doc (https://www.bazel.io/versions/master/docs/external.html#depending-on-non-bazel-projects) I got the impression that local_repository and http_archive have a different use case than new_local_repository and new_http_archive, and that the new_* rules are not simply replacing them.

@damienmg

Contributor

damienmg commented Nov 2, 2016

They are being replaced with Skylark implementations from
@bazel_tools//tools/build_defs/repo:git.bzl and
@bazel_tools//tools/build_defs/repo:maven_rules.bzl.

The new_* rules are indeed not the replacement versions.
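As a rough illustration of the two alternatives raised in this thread, a WORKSPACE could either load the Skylark git_repository from the path given above, or fetch a snapshot tarball with http_archive. The repository names, URLs, and tag below are placeholders and do not refer to a real project. The sketch assumes the Bazel versions under discussion here, where http_archive is a built-in WORKSPACE rule.

```starlark
# WORKSPACE (sketch; repository names and example.com URLs are hypothetical)

# Option 1: the Skylark implementation of git_repository.
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
    name = "com_example_lib_git",
    remote = "https://example.com/example/lib.git",
    tag = "v1.0.0",
)

# Option 2: a snapshot tarball fetched over HTTP, avoiding a git clone.
# A real rule should also pin the archive with a sha256 attribute.
http_archive(
    name = "com_example_lib_tarball",
    url = "https://example.com/example/lib/archive/v1.0.0.tar.gz",
    strip_prefix = "lib-1.0.0",
)
```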

@aj-michael

Contributor

aj-michael commented Dec 13, 2016

I don't see it mentioned in this thread, so I thought I should mention that AndroidSdkRepositoryFunction makes use of bind so that android_binary can depend on the android_sdk generated by AndroidSdkRepositoryFunction without needing to know the name of the android_sdk_repository rule.

@kchodorow

Contributor

kchodorow commented Dec 13, 2016

Which I think is a mistake, as I think I've mentioned before. The android_sdk rule still depends on the external name, which is no better than depending on the repository name.

@aj-michael

Contributor

aj-michael commented Dec 13, 2016

Sorry, what do you mean by "the android_sdk rule still depends on the external name"? Do you mean that the name of the android_sdk rule is the name of the android_sdk_repository rule? Or that the "bound" name refers to the external name?

@kchodorow

Contributor

kchodorow commented Dec 13, 2016

Whoops, I mistyped, I meant the android_binary rule. The android_binary rule depends on something like //external:android_sdk, it would be better for it to depend on @android_sdk//jar or something.

@aj-michael

Contributor

aj-michael commented Dec 13, 2016

Hmmm, why would that be better? That would require every developer to name their android_sdk_repository "android_sdk". The current advantage of using //external:android/sdk is that android_binary still works no matter what you name your android_sdk_repository.

I agree with you that android_binary depending on an external bind is not ideal. Maybe this is not the right place to discuss this, but could we remove the name attribute of android_sdk_repository? If we could hardcode the name of android_sdk_repository to something, then we could stop using bind in Android land.

@kchodorow

Contributor

kchodorow commented Dec 13, 2016

You are already requiring every developer to declare an android_sdk_repository and bind it to a certain name: this eliminates one of those steps.

Also, you could make it a macro and set the name in the macro, e.g.,

def my_android_sdk():
  # The repository rule is wrapped so that its name is fixed by the macro.
  native.android_sdk_repository(
    name = 'android_sdk',
    ...
  )

@aj-michael

Contributor

aj-michael commented Dec 13, 2016

I don't think we are. The developer does not have to use a bind() in their WORKSPACE. We create the bind under the hood.

E.g., the following works:

$ cat WORKSPACE
android_sdk_repository(
    name = "foobar",
    path = "/home/ajmichael/Sdk",
    api_level = 25,
    build_tools_version = "25.0.0",
)
$ cat BUILD
android_binary(
    name = "app",
    manifest = "AndroidManifest.xml",
    custom_package = "com.example",
    srcs = glob(["**/*.java"]),
)
$ bazel build //:app

I guess if we remove the user-visible bind() function but keep the underlying functionality for native repository rules, android would be fine.

I'm going to open another issue to address your comment about the macro, because I like that idea a lot. 😄

@johnynek

Member

johnynek commented Jan 12, 2017

I'm -1 on removing or deprecating bind. Suppose you have two external bazel repository dependencies, A and B, each of which needs dependency foo. One of them expects it at @foo_label_in_a and the other expects it at @foo_label_in_b. Then I have two equivalent dependencies that look different to bazel. There is no way (nor even a standard convention, as far as I can see) to decide what to name dependencies as a function of their semantic identity.

I think the right way to go here is that A expects a binding to, say, a_bind_of_foo and B expects b_bind_of_foo; then, when I depend on both, I set up the bindings appropriately.
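Spelled out as a WORKSPACE fragment, that arrangement might look like the sketch below. All names are hypothetical: @foo stands for whatever repository provides foo, and A and B are assumed to reference //external:a_bind_of_foo and //external:b_bind_of_foo in their BUILD files.

```starlark
# WORKSPACE of a project that depends on both A and B (hypothetical names).
# Both bound names resolve to the same target, so only one copy of foo
# ends up in the build.
bind(
    name = "a_bind_of_foo",
    actual = "@foo//:foo",
)

bind(
    name = "b_bind_of_foo",
    actual = "@foo//:foo",
)
```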

I wish the bazel core team had someone whose role was explicitly to advocate for the many-external-repos use case (hopefully many of them bazel). That is the use case of the vast numbers of people using most existing tools (and migration to a monorepo as a precondition to using bazel is not so workable, especially in open source: what would that even mean?).

@jart

Member

jart commented Jan 12, 2017

All the projects I maintain use the following algorithm for standardizing the naming convention of Maven repositories: https://gist.github.com/jart/41bfd977b913c2301627162f1c038e55

If two projects can't agree on how to name a repository, then wouldn't alias() be a viable workaround? Introducing bind() would mean you now have two problems: two projects can't agree on two parallel names for the same dependency.

@johnynek

Member

johnynek commented Jan 14, 2017

@jart I agree that when you are dealing with your own code, using one format is fairly straightforward. The problem is sharing with folks who don't agree. For instance, bazel just uses guava at a certain path. If I want to use a bazel target, bazel targets are already expecting //third_party:guava as the target. How do I alias //third_party:guava to my //3rdparty/jvm/com/google/guava? Can you show how? It looks like I can point my target to bazel, but since bazel is an external project, I can't repoint the versions it is on, so I either have to accept that version or have two versions on the classpath.

This problem becomes more pronounced when composing larger numbers of repositories.

If, instead, bazel expected //external/jvm/com/google/guava to be a binding to whichever jar is currently in play, and you bound that to your local //thirdparty/guava in your workspace, then in my workspace I could bind whatever version of guava I am using to the same location, and the bazel targets would see and use it.

My actual use case is for scala thrift generation: scrooge (https://github.com/twitter/scrooge). The rules have a run-time dependency on some version, but that dependency is super weak and almost any version will do in practice. The user should be able to configure the version of scrooge they need, the bazel plugin should not set the version. I don't see a clean way currently to do this without bind. Am I missing something?

@jart

This comment has been minimized.

Show comment
Hide comment
@jart

jart Jan 14, 2017

Member

The only way to have dependencies be composable across Bazel repositories that reference each other is to write files that look like Nomulus' repositories.bzl file. Then other projects can depend on Nomulus using the same technique Nomulus' WORKSPACE uses to depend on Closure Rules. You'll notice that Closure Rules' repositories.bzl has an overlapping set of dependencies with Nomulus. Hence all that omit_foo boilerplate.

I've been planning to write a piece for the Bazel blog explaining this best practice, as well as convert more projects to it, but haven't had the time. It's also quite verbose. But I guarantee you that when builds get written that way, they'll be faster and more reliable than anything else. But maybe someday we can add a feature to Bazel that makes this best practice require fewer lines.

As for twitter/scrooge, that project appears to be using Pants. Are you planning to rewrite their build in Bazel?

johnynek Jan 14, 2017

Member

The scala rules (which I help maintain) have scrooge support:
https://github.com/bazelbuild/rules_scala/blob/master/twitter_scrooge/twitter_scrooge.bzl#L9

Notice we have a number of versions set there. That is not where they should be set. Ideally the consumer of the plugin should set the version, not the plugin itself.

I don't think the only way to have composable dependencies is to lock them to one central repositories.bzl. I think you could have something like this:

The scala rules could say: I need //external:io_bazel_rules_scala_scrooge_jar, etc. That is, I list the symbolic dependencies that consumers of the rule need to set up. Then the callers can bind that name to their local choice of jar.

It is very challenging to satisfy all the dependency requirements in a large repo to begin with. By having external bazel plugins force more constraints it could become almost impossible.

As for using external maven deps with the same naming convention, at Stripe we are using a tool I wrote:
https://github.com/johnynek/bazel-deps

It is working great. You declare a list of dependencies and it generates a lock-file of all the shas and the maven coordinates you need. So, all our local repos can work together. The friction point comes when we have external rules and deps we don't control.

To me, bind is perfectly suited to this use case. I don't see why it should be removed since it seems like it will actually work, but nothing else is there to replace it except the suggestion to centralize how dependencies are done, which I don't think is realistic in the OSS world.

jart Jan 14, 2017

Member

That repositories.bzl design is not centralized or locked in. Please take another look at the files I linked in my previous comment.

Assume I want my Bazel project to depend on Nomulus. I can do this with a whitelist or blacklist model.

  • Default: Say nomulus_repositories() and trust the Nomulus team to manage versions for me.
  • Blacklist: Say nomulus_repositories(omit_com_google_guava=True) and then redefine it with whatever version or BUILD file I want.
  • Whitelist: Say com_google_guava() (after loading that function from repositories.bzl) and then don't build any targets in Nomulus that require the other stuff.

If I want to use Guava in my own build rules, I just say deps = ["@com_google_guava"] and it links Guava along with its transitive dependencies. It's a global name. Every java_library in the world that wants to depend on Guava should do so using that exact same name, because the goal is to have a one version policy across repositories.

There's no need for delegate build files like this.

There's no need for bind().

pcj Jan 14, 2017

Member

The system I've been using employs a require function in conjunction with a list of deps encoded as dict objects (https://github.com/pubref/rules_maven/blob/master/maven/internal/require.bzl).

The require function tests for the existence of the rule via native.existing_rule. If it is registered, assert that the requested version matches the pre-existing value.

This way the rule can be called by multiple repositories without having to say omit_*, but the versions must still match. Otherwise, one can omit it completely with an exclude clause, or override specific fields by virtue of the dict + operator (https://bazel.build/versions/master/docs/skylark/lib/dict.html).
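The check-then-register logic described above can be modeled outside Bazel. The following is a rough sketch in plain JavaScript, not the actual rules_maven code: `declared`, `requireDep`, and the dep fields are hypothetical names, and the `Map` merely stands in for the lookup that `native.existing_rule` performs in a real WORKSPACE.

```javascript
// Hypothetical stand-in for the rules already declared in a WORKSPACE.
// In Bazel this lookup would go through native.existing_rule(name).
const declared = new Map();

/**
 * Registers a dependency unless one with the same name already exists.
 * An existing entry is reused only if its version matches.
 */
function requireDep(dep, overrides) {
  // Merge per-call overrides into the dep, similar in spirit to dict + dict.
  const wanted = Object.assign({}, dep, overrides || {});
  const existing = declared.get(wanted.name);
  if (existing === undefined) {
    declared.set(wanted.name, wanted);
    return 'declared';
  }
  if (existing.version !== wanted.version) {
    throw new Error(
        wanted.name + ': already declared at ' + existing.version +
        ', requested ' + wanted.version);
  }
  return 'reused';
}

// Illustrative dep; the name and versions are made up for this example.
const guava = {name: 'com_google_guava', version: '20.0'};

console.log(requireDep(guava));  // first caller declares it
console.log(requireDep(guava));  // second caller, same version, reuses it
try {
  requireDep(guava, {version: '19.0'});  // conflicting version is rejected
} catch (e) {
  console.log(e.message);
}
```

The point of the sketch is that no caller needs an omit_* flag: whoever asks first wins, and later callers either agree on the version or fail loudly.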

Whether this implementation is the optimal one is an open question, but it would be nice to have a similar blessed function within the bazel repo itself for others to use.

I don't have a strong opinion on bind, but I do think @ekuefler's setup of rules_gwt is a good example of bind done well.

The non-standardized approach to workspace names and dependencies needs clarity though, and will slow bazel adoption.

johnynek Jan 16, 2017

Member

@jart Perhaps I am not being precise enough. When I say centralized I mean this:

If you want to use Nomulus, then you accept their naming scheme for maven artifacts. It is not clear exactly what that is. Can you point to any documentation on that? That is the centralization I speak of: if we all centrally agree (and hopefully validate with tooling) on some convention this will work. By contrast, bind allows us to retain a distributed view on the bazel target id -> maven coordinate mapping.

What exactly are the issues?

  1. The main one is that nomulus seems somewhat ad-hoc (as I imagine many systems will have ad-hoc rule violations). For instance, in nomulus the google guava project is "@com_google_guava", yet the maven coordinate is "com.google.guava:guava". Is there some rule about dropping repeated strings on the end? There are several cases where the naming is not some simple find and replace of the maven coordinate (for instance "com.google.api-client:google-api-client"). I am sorry, I didn't take time to catalog the entire list of these, but many of them have some slight deviation from the maven coordinate naming. If you depend on two such repos that behave as nomulus does, what is the best practice? Just wire the jars in multiple times (duplicate external repos, once for each name)? bind allows you to work around this, since each repo does not assume its maven names are valid global identifiers.

  2. There is no standard, safe mapping from maven coordinate to bazel repository name. What people sometimes do is map any special character (like - or :) to _, but this has the potential to create collisions in the namespace. For instance "com.foo.bar:baz" is mapped to the same bazel name as "com.foo:bar-baz". This may rarely be an issue, but it can be. This should not be good enough to meet bazel's high build correctness standards. bind can help you work around this, since you can then drop the assumption of a global identifier.

  3. Repository naming in general, beyond just speaking of maven repos, is a challenge. It has been proposed to name WORKSPACES with the jvm style: com_google_foo_bar etc. Suppose from that same project you wish to publish a jar "com.google:foo-bar". Now we have created a collision between the maven coordinate and the bazel repo. URLs attempt to solve this problem with namespacing and ports. We have not seen anything like this for bazel that I know of. One can imagine some standard prefixes: bazel_ for a bazel repo, maven_jar_ for a jar with a name derived from a maven coordinate, etc... It may be that the restricted set of strings bazel uses for external repo names should be expanded somewhat.
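The collision in point 2 is easy to reproduce. Below is a minimal sketch, assuming the naive scheme is "replace every character that is not a letter, digit, or underscore with an underscore"; `naiveRepoName` and `findCollisions` are hypothetical helper names, not part of Bazel or any tool mentioned in this thread.

```javascript
// Naive mapping: replace every character that is not a letter, digit,
// or underscore with an underscore.
function naiveRepoName(coordinate) {
  return coordinate.replace(/[^0-9A-Za-z_]/g, '_');
}

// Groups coordinates by generated name and returns only the names that
// more than one coordinate maps to.
function findCollisions(coordinates) {
  const byName = new Map();
  for (const coordinate of coordinates) {
    const name = naiveRepoName(coordinate);
    if (!byName.has(name)) {
      byName.set(name, []);
    }
    byName.get(name).push(coordinate);
  }
  return [...byName].filter(([, coords]) => coords.length > 1);
}

const collisions = findCollisions([
  'com.foo.bar:baz',
  'com.foo:bar-baz',
  'com.google.guava:guava',
]);

// Both com.foo coordinates collapse to the name com_foo_bar_baz.
console.log(collisions);
```

A generator could run a check like this over its full dependency list and refuse to emit a config when two coordinates collapse to the same name.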

As @pcj comments, the approach here: https://github.com/bazelbuild/rules_gwt/blob/master/gwt/gwt.bzl#L425 sidesteps these issues.

Lastly, I want to make the analogy to filesystems. Correct me if I am wrong, but bind is basically like a symbolic link on a filesystem. Strictly speaking, of course, symbolic links are not required. You can copy files or everyone can agree on canonical locations. But I would argue that symbolic links give a lot of flexibility to the design space at a fairly minimal cost. In the absence of any evidence that we can get naming globally correct (we certainly have not seen it), I really don't understand the motivation to remove this tool (bind) for flexibility.

Maybe after we have solved the global naming issues, bind will truly be unneeded. For instance, I have never needed anything personally like this in the maven world, but in maven the naming problem is pretty much solved (modulo some corner cases about non-uniqueness of class -> artifact).

ekuefler Jan 16, 2017

Contributor

I'm not particularly happy with the use of bind in the GWT repo. It works but is very all-or-nothing: everything is easy if you're fine with the defaults and can call gwt_repositories(), but if you want to change the version of GWT or any of its dependencies you're out of luck and need to manage every transitive dependency yourself via bind. I believe this also means that, by default using gwt_repositories(), you'll have a duplicate copy of any of those dependencies you happen to use yourself in your own project, which must be downloaded separately and might lead to exciting classpath issues (though maybe Bazel does some caching and de-duping to mitigate this).

I like the end result of what @jart is illustrating with Nomulus. The blacklist/whitelist approach seems like an improvement over the bind strategy in the GWT rules, though there are potentially some naming issues. The central issue seems to be how to reliably map maven coordinates to Bazel repo names, and how to fit in artifacts that aren't in maven. Maybe it's a matter of establishing strong conventions or maybe tooling that takes maven artifacts should generate names somehow to enforce consistency. And the amount of boilerplate required to implement the Nomulus strategy seems prohibitive; it would be cool if Bazel provided some dedicated features for this.

Using bind to manage the dependencies of a shared library-like project is one thing, but how bind is used in internal projects that won't be used by others is another (probably less important) issue. The strategy at my company is to disallow the use of @repo-style deps anywhere in application code. Any external artifacts that are to be used by code must be referenced via the external namespace as defined by a bind rule, so //external:guava is bound either to the guava jar if it has no dependencies or to a dummy java_library that exports guava and declares its dependency as runtime deps.

I actually somewhat dislike this since it means we have a very large WORKSPACE and top-level BUILD file containing all external dependencies and their transitive dep information, which people need to touch whenever they're modifying external dependencies. Were I to do this again I would probably make a third_party directory with a subdirectory per dependency containing only a BUILD file defining that dependency's transitive deps to better modularize things. So the result would look the same to the application except they'd refer to //third_party:guava instead of //external:guava. So overall I think there are better options for all of my current (fairly extensive) usages of bind, and I'd be alright with deprecating it.

johnynek Jan 16, 2017

Member

Yes, I agree that bind is not elegant, or beautiful, but it seems workable.

@ekuefler we also don't use @foo names in BUILD files (outside of the 3rdparty directory, which has targets that set up the correct runtime and compile-time classpaths that people need in order to depend on external targets). So people then depend on //3rdparty/jvm/com/google/guava:guava, for instance. The path is the maven group, and the target in the build is the maven id.

To describe our use case at Stripe: we have many maven dependencies (several hundred now) with complex interdependencies (you can't use hadoop deps without a giant web of interdependencies of apache projects). We also have a DAG of bazel repos that depend on each other, and want to share maven dependency names. Internally, we can use the same names for things with tooling, but we don't use the same as nomulus, for instance (since we just naively compute the name from the maven name to try to minimize collisions).

It really feels to me that we need more work on the external repo story. It is also related to publishing, since publishing targets is kind of the dual problem: creating external dependencies.

jart Jan 16, 2017

Member

@johnynek Nomulus, Closure Rules, etc. have adopted the following naming algorithm for Maven artifacts: https://gist.github.com/jart/41bfd977b913c2301627162f1c038e55

var CLEANSE_CHARS_ = new RegExp('[^_0-9A-Za-z]', 'g');

/**
 * Turns Maven group and artifact into Bazel repository name.
 *
 * <p>This algorithm works by turning illegal characters into underscores and
 * then eliminating redundancy. For example:
 *
 * <ul>
 * <li>com.google.guava:guava becomes com_google_guava
 * <li>commons-logging:commons-logging becomes commons_logging
 * <li>junit:junit becomes junit
 * </ul>
 *
 * @param {string} group Maven group ID.
 * @param {string} artifact Maven artifact ID.
 * @return {string} Recommended name for Bazel external repository.
 */
function getName(group, artifact) {
  var left = group.replace(CLEANSE_CHARS_, '_');
  var right = artifact.replace(CLEANSE_CHARS_, '_');
  var p = -1;
  while (p < right.length) {
    p = right.indexOf('_', p + 1);
    if (p == -1) {
      p = right.length;
    }
    var chunk = right.slice(0, p);
    if (left == chunk) {
      return right;
    }
    chunk = '_' + chunk;
    if (left.slice(-chunk.length) == chunk) {
      left = left.slice(0, -chunk.length);
      break;
    }
  }
  return left + '_' + right;
}

You are correct that it is possible for two different Maven artifacts to end up with the same name when run through this algorithm. You have a keen eye for spotting edge cases. I believe this is an acceptable tradeoff in order to make the names look nice. I feel that the solution is human review. These names would be created by a tool that generates a Bazel config by crawling the Maven repository. Then the developer would tune things accordingly.
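To make the behavior concrete, the snippet below repeats the function from the gist so that it runs on its own, then applies it to the three examples from its doc comment, to the api-client coordinate, and to the two colliding coordinates raised earlier in this thread. The `examples` array and the loop are illustration only.

```javascript
var CLEANSE_CHARS_ = new RegExp('[^_0-9A-Za-z]', 'g');

// Same algorithm as above, repeated here so the example is self-contained.
function getName(group, artifact) {
  var left = group.replace(CLEANSE_CHARS_, '_');
  var right = artifact.replace(CLEANSE_CHARS_, '_');
  var p = -1;
  while (p < right.length) {
    p = right.indexOf('_', p + 1);
    if (p == -1) {
      p = right.length;
    }
    var chunk = right.slice(0, p);
    if (left == chunk) {
      return right;  // group and artifact prefix are identical
    }
    chunk = '_' + chunk;
    if (left.slice(-chunk.length) == chunk) {
      left = left.slice(0, -chunk.length);  // drop the repeated suffix
      break;
    }
  }
  return left + '_' + right;
}

var examples = [
  ['com.google.guava', 'guava'],                   // com_google_guava
  ['commons-logging', 'commons-logging'],          // commons_logging
  ['junit', 'junit'],                              // junit
  ['com.google.api-client', 'google-api-client'],  // com_google_api_client
  ['com.foo.bar', 'baz'],                          // com_foo_bar_baz
  ['com.foo', 'bar-baz'],                          // com_foo_bar_baz (collision)
];

examples.forEach(function(pair) {
  console.log(pair.join(':') + ' -> ' + getName(pair[0], pair[1]));
});
```

The last two lines print the same repository name, which is the case that human review would have to catch.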

I've created a website that will generate these configs automatically. It looks like this: http://i.imgur.com/MuIzgcG.png Right now I'm going through the red tape required to launch it. This tool is going to make life so much better for Bazel Java users. You're going to be so happy.

You'll notice that the generated code uses java_import_external rather than maven_jar. That repository rule is defined here: https://gist.github.com/jart/70bdc88e662a5078a7d8682e5411ae8c This rule will be contributed to the Bazel codebase soon. But you can start using it today if you copy that code into your codebase. The reason why you should consider using this rule is because it allows the WORKSPACE file to define the dependency relationships. That way you don't need delegate BUILD files. Check out how much code Nomulus was able to delete thanks to java_import_external: google/nomulus@734130a

As for multiple Bazel projects agreeing on naming, I suspect this is mostly a diplomacy problem.

My goal right now is to build consensus in the Bazel community. In order to eventually build a consensus, I've been putting so much work into improving Bazel core so we can have an incredible way to define external dependencies. I modified Bazel so @foo//:foo can be written as @foo in 61affe7. I added exponential backoff retries to downloads in 7f8e045. I added a feature for redundant failover URLs in ed7ced0.

The reason why I want to build consensus around not using bind() is because Google designed Blaze to have O(n) dependencies. (Cf. NPM where dependencies are O(n^2).) Google accomplishes this through one version policy and company-wide cooperation in a single monolithic repository. It's challenging for us to all cooperate in this shared environment, where upgrading a third party dependency means potentially breaking the builds of hundreds of teams. But it saves a lot of collective effort at the end of the day. I'm not sure how much of this model will be able to carry over into the Bazel world. But I'm trying the best I can to make that the case, because I feel like this is how Blaze (and therefore Bazel too) was designed to be.

Member

jart commented Jan 16, 2017

@johnynek Nomulus, Closure Rules, etc. have adopted the following naming algorithm for Maven artifacts: https://gist.github.com/jart/41bfd977b913c2301627162f1c038e55

var CLEANSE_CHARS_ = new RegExp('[^_0-9A-Za-z]', 'g');

/**
 * Turns Maven group and artifact into Bazel repository name.
 *
 * <p>This algorithm works by turning illegal characters into underscores and
 * then eliminating redundancy. For example:
 *
 * <ul>
 * <li>com.google.guava:guava becomes com_google_guava
 * <li>commons-logging:commons-logging becomes commons_logging
 * <li>junit:junit becomes junit
 * </ul>
 *
 * @param {string} group Maven group ID.
 * @param {string} artifact Maven artifact ID.
 * @return {string} Recommended name for Bazel external repository.
 */
function getName(group, artifact) {
  var left = group.replace(CLEANSE_CHARS_, '_');
  var right = artifact.replace(CLEANSE_CHARS_, '_');
  var p = -1;
  while (p < right.length) {
    p = right.indexOf('_', p + 1);
    if (p == -1) {
      p = right.length;
    }
    var chunk = right.slice(0, p);
    if (left == chunk) {
      return right;
    }
    chunk = '_' + chunk;
    if (left.slice(-chunk.length) == chunk) {
      left = left.slice(0, -chunk.length);
      break;
    }
  }
  return left + '_' + right;
}

You are correct that it is possible for two different Maven artifacts to end up with the same name when run through this algorithm. You have a keen eye for spotting edge cases. I believe this is an acceptable tradeoff in order to make the names look nice. I feel that the solution is human review. These names would be created by a tool that generates a Bazel config by crawling the Maven repository. Then the developer would tune things accordingly.

I've created a website that will generate these configs automatically. It looks like this: http://i.imgur.com/MuIzgcG.png Right now I'm going through the red tape required to launch it. This tool is going to make life so much better for Bazel Java users. You're going to be so happy.

You'll notice that the generated code uses java_import_external rather than maven_jar. That repository rule is defined here: https://gist.github.com/jart/70bdc88e662a5078a7d8682e5411ae8c This rule will be contributed to the Bazel codebase soon. But you can start using it today if you copy that code into your codebase. The reason why you should consider using this rule is because it allows the WORKSPACE file to define the dependency relationships. That way you don't need delegate BUILD files. Check out how much code Nomulus was able to delete thanks to java_import_external: google/nomulus@734130a

As for multiple Bazel projects agreeing on naming, I suspect this is mostly a diplomacy problem.

My goal right now is to build consensus in the Bazel community. To get there, I've been putting a lot of work into improving Bazel core so we can have an incredible way to define external dependencies. I modified Bazel so @foo//:foo can be written as @foo in 61affe7. I added exponential backoff retries to downloads in 7f8e045. I added a feature for redundant failover URLs in ed7ced0.

The reason why I want to build consensus around not using bind() is because Google designed Blaze to have O(n) dependencies. (Cf. NPM where dependencies are O(n^2).) Google accomplishes this through one version policy and company-wide cooperation in a single monolithic repository. It's challenging for us to all cooperate in this shared environment, where upgrading a third party dependency means potentially breaking the builds of hundreds of teams. But it saves a lot of collective effort at the end of the day. I'm not sure how much of this model will be able to carry over into the Bazel world. But I'm trying the best I can to make that the case, because I feel like this is how Blaze (and therefore Bazel too) was designed to be.

@abergmeier-dsfishlabs

Contributor

abergmeier-dsfishlabs commented Jan 30, 2017

I think all bind use cases can be replaced with alias, but it does not strike me as high priority. bind() has that weird thing that it creates a //external package that does not really exist, so for that alone I would say +1.

Trying to use alias in WORKSPACE file with latest master...

alias cannot be in the WORKSPACE file
@kchodorow

Contributor

kchodorow commented Jan 30, 2017

You can't use it in the WORKSPACE file, but you could create an alias in a BUILD file that referred to an external target.

@alexeagle

alexeagle commented Apr 26, 2017

FYI I just used bind() today because the docs made it look shiny. +1 for updating the docs now to avoid more and more usages.

@johnynek

Member

johnynek commented Apr 27, 2017

Today I had a use case for bind:

In the scala rules the binary version of scalatest is configured with bind (the rules expect the target of the jar for scalatest to be at a particular bind location). I was able to set that to scalatest 3.0.1 and build the code without others having to take that change.

Note, 3.0.1 has different transitive dependencies, so I could bind to a java_library that has exports, and everything worked great.

I still don't see a super clear way to do this otherwise without also having me pass in a bunch of runtime and compile time deps into some function that becomes a repo-rule that each repo using the scala rules sets up.

@jart

Member

jart commented Apr 28, 2017

"I was able to set that to scalatest 3.0.1 and build the code without others having to take that change. Note, 3.0.1 has different transitive dependencies"

@johnynek If rules_scala were using java_import_external and the omit_foo pattern, then could what you described have been accomplished as follows?

http_archive(
    name = "io_bazel_rules_scala",
    urls = [
        "http://bazel-mirror.storage.googleapis.com/github.com/bazelbuild/rules_scala/archive/d916599d38de29085e5ca9eae167716c4f150a02.tar.gz",
        "https://github.com/bazelbuild/rules_scala/archive/d916599d38de29085e5ca9eae167716c4f150a02.tar.gz",
    ],
    sha256 = "391cae2055c9e3bebdb2a6ce06157408e4831b1846043c48c648c79380b4de66",
    strip_prefix = "rules_scala-d916599d38de29085e5ca9eae167716c4f150a02",
)

load("@io_bazel_rules_closure//closure/private:java_import_external.bzl", "java_import_external")
load("@io_bazel_rules_scala//scala:scala.bzl", "scala_repositories")

# upstream rules_scala depends on scalatest v2.2.6
# we want to swap it with scalatest v3.0.1
scala_repositories(
    omit_org_scalatest_2_11 = True,
)

java_import_external(
    name = "org_scalatest_2_11",
    licenses = ["notice"],  # the Apache License, ASL Version 2.0
    jar_sha256 = "3788679b5c8762997b819989e5ec12847df3fa8dcb9d4a787c63188bd953ae2a",
    jar_urls = [
        "http://maven.ibiblio.org/maven2/org/scalatest/scalatest_2.11/3.0.1/scalatest_2.11-3.0.1.jar",
        "http://repo1.maven.org/maven2/org/scalatest/scalatest_2.11/3.0.1/scalatest_2.11-3.0.1.jar",
    ],
    deps = [
        "@org_scala_lang_scala_compiler",
        "@org_scala_lang_scala_library",
        "@org_scalactic_2_11", # Not a dependency of v2.2.6
        "@org_scala_lang_scala_reflect",
        "@org_scala_lang_modules_scala_xml_2_11",
        "@org_scala_lang_modules_scala_parser_combinators_2_11", # Not a dependency of v2.2.6
    ],
)

# upstream rules_scala does not define this transitive dependency
java_import_external(
    name = "org_scalactic_2_11",
    licenses = ["notice"],  # the Apache License, ASL Version 2.0
    jar_sha256 = "d5586d4aa060aebbf0ccb85be62208ca85ccc8c4220a342c22783adb04b1ded1",
    jar_urls = [
        "http://repo1.maven.org/maven2/org/scalactic/scalactic_2.11/3.0.1/scalactic_2.11-3.0.1.jar",
        "http://maven.ibiblio.org/maven2/org/scalactic/scalactic_2.11/3.0.1/scalactic_2.11-3.0.1.jar",
    ],
    deps = [
        "@org_scala_lang_scala_compiler",
        "@org_scala_lang_scala_library",
        "@org_scala_lang_scala_reflect",
    ],
)

# upstream rules_scala does not define this transitive dependency
java_import_external(
    name = "org_scala_lang_modules_scala_parser_combinators_2_11",
    licenses = ["notice"],  # BSD 3-clause
    jar_sha256 = "0dfaafce29a9a245b0a9180ec2c1073d2bd8f0330f03a9f1f6a74d1bc83f62d6",
    jar_urls = [
        "http://repo1.maven.org/maven2/org/scala-lang/modules/scala-parser-combinators_2.11/1.0.4/scala-parser-combinators_2.11-1.0.4.jar",
        "http://maven.ibiblio.org/maven2/org/scala-lang/modules/scala-parser-combinators_2.11/1.0.4/scala-parser-combinators_2.11-1.0.4.jar",
    ],
    deps = ["@org_scala_lang_scala_library"],
)
@johnynek

Member

johnynek commented Apr 28, 2017

It could have been indeed @jart, but currently I solved it this way:

# use the locally set scalatest
bind(name = 'io_bazel_rules_scala/dependency/scalatest/scalatest', actual = '//3rdparty/jvm/org/scalatest')

Since we have huge and complex dependency graphs of external code, we have to have tooling already to handle resolving them.

If you remove bind, we have to significantly retool around this (or stay on old versions of Bazel until we find the cycles to migrate).

@johnynek

Member

johnynek commented Apr 28, 2017

PS: if this abbreviated maven coordinate approach (removing redundancies in group and artifact) is going to be pushed, has anyone pulled a list of artifacts from maven central to see how many collisions there would be?

I would much rather have an encoding from Maven coordinate to Bazel repo name that is lossless, even if that means using some escaping scheme or considering adding characters to the allowed repository names.
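
As one possible shape for such a lossless encoding, here is a minimal sketch. It assumes repository names are limited to letters, digits, and underscores, and that group and artifact IDs are ASCII and never contain ':'. The escaping scheme and the function names encodeCoordinate and decodeCoordinate are hypothetical, not anything Bazel or an existing tool provides.

```javascript
// Hypothetical scheme: letters and digits pass through unchanged, and every
// other character becomes '_' followed by its two-digit lowercase hex code.
// Assumption: group and artifact are ASCII and never contain ':'.
function encodeCoordinate(group, artifact) {
  var input = group + ':' + artifact;
  var out = '';
  for (var i = 0; i < input.length; i++) {
    var ch = input.charAt(i);
    var code = input.charCodeAt(i);
    if (/[0-9A-Za-z]/.test(ch)) {
      out += ch;
    } else if (code < 0x80) {
      out += '_' + (code < 0x10 ? '0' : '') + code.toString(16);
    } else {
      throw new Error('non-ASCII character in coordinate: ' + ch);
    }
  }
  return out;
}

// Inverse of encodeCoordinate(). Every '_' in an encoded name starts an
// escape, so decoding is unambiguous.
function decodeCoordinate(name) {
  var decoded = name.replace(/_([0-9a-f]{2})/g, function(match, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
  var parts = decoded.split(':');
  return {group: parts[0], artifact: parts[1]};
}

// Made-up coordinates that differ only in punctuation.
console.log(encodeCoordinate('org.example', 'foo-bar'));  // org_2eexample_3afoo_2dbar
console.log(encodeCoordinate('org.example', 'foo.bar'));  // org_2eexample_3afoo_2ebar
console.log(decodeCoordinate('org_2eexample_3afoo_2dbar'));
```

The two coordinates stay distinct and can be decoded back, at the cost of names that are much harder to read than the abbreviated form, which is the tradeoff being debated here.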

@jart

Member

jart commented Apr 28, 2017

If //3rdparty/jvm/org/scalatest exists within this repository, and it also depends on rules_scala, then that would mean multiple versions of the same Scala jar exist within that repository.

Once again, I would advise caution. Developers at Google need to be granted an exception to one version policy (described earlier) in order to do that. We're supposed to have a single label for any given library, which must be a single version. Our lawyers keep a close eye on our //third_party folder to make sure we're doing exactly that (among other things.)

I share this information firstly because we want other companies to be successful with Bazel. I believe the best way to do that is by plainly stating what Google did, and didn't do, internally with Bazel. Secondly, it serves as evidence that we successfully built a repository where this use case wasn't encountered.

"PS: if this abbreviated maven coordinate approach (removing redundancies in group and artifact) is going to be pushed, has anyone pulled a list of artifacts from maven central to see how many collisions there would be?"

We're currently doing other meta-analysis of Maven, as part of Operation Rosehub, so that's something we can look into. Thank you for the excellent suggestion.

@johnynek

Member

johnynek commented Apr 28, 2017

We have one version of the jar. What I didn't show was that we could remove some special casing in the tool that does the transitive dependency walk to force it to use the scalatest jar that came from the rules. Removing those exceptions is another nice thing we get but that is a side benefit.

So in addition to the above solution adding 1 line of code, it also deleted code that looked like this:

replacements:
  org.scalatest:
    scalatest:
      lang: java
      target: "@scalatest//jar"

which is a feature of the tool we have discussed earlier in the thread: https://github.com/johnynek/bazel-deps so you can replace a maven dependency with another target (maybe a local bazel build if you have it, or another name).

@ittaiz

Member

ittaiz commented Apr 28, 2017

simonhorlick added a commit to simonhorlick/grpc-java that referenced this issue Apr 28, 2017

Add Bazel java_grpc_library rule
Bazel third party dependencies are specified in repositories.bzl which
gives the consumer the ability to opt-out of any dependencies they use
directly in their own project. Also note that repositories.bzl
uses java_import_external from rules_closure instead of Bazel's built-in
maven_jar. This allows us to specify multiple URLs for each jar,
reducing the likelihood of build failures due to repositories becoming
unavailable and also define dependencies between libraries.

References:

bazelbuild/bazel#1952

simonhorlick added a commit to simonhorlick/grpc-java that referenced this issue May 4, 2017

Add Bazel java_grpc_library rule
Bazel third party dependencies are specified in repositories.bzl which
gives the consumer the ability to opt-out of any dependencies they use
directly in their own project. Also note that repositories.bzl
uses java_import_external from rules_closure instead of Bazel's built-in
maven_jar. This allows us to specify multiple URLs for each jar,
reducing the likelihood of build failures due to repositories becoming
unavailable and also define dependencies between libraries.

References:

bazelbuild/bazel#1952

simonhorlick added a commit to simonhorlick/grpc-java that referenced this issue May 4, 2017

build: Add Bazel java_grpc_library rule
Bazel third party dependencies are specified in repositories.bzl which
gives the consumer the ability to opt-out of any dependencies they use
directly in their own project. Also note that repositories.bzl
uses java_import_external from rules_closure instead of Bazel's built-in
maven_jar. This allows us to specify multiple URLs for each jar,
reducing the likelihood of build failures due to repositories becoming
unavailable and also define dependencies between libraries.

Fixes #2756

References:

bazelbuild/bazel#1952

bazel-io pushed a commit that referenced this issue Aug 25, 2017

Add a warning to the bind() docs
This will encourage new users to avoid it, and would have helped me. At
the start, I cargo-culted other projects that use bind() heavily without
knowing why.

Related: #1952

Closes #3608.

PiperOrigin-RevId: 166464352

johnynek referenced this issue in johnynek/bazel-deps Sep 30, 2017

Open

Consider removing the use of bind() #72

@ronshapiro

Contributor

ronshapiro commented Oct 17, 2017

I'm also +1 on lossless maven -> bazel coordinates. While it would be nice, I think it's hard to establish consistency. I'd rather have things more explicit and then create aliases within the actual project (which is what Dagger does)

@liujisi

liujisi commented Jan 27, 2018

I was pointed here when reviewing the PR google/protobuf#4204

Looks like the repo based dependency will attach the dependency to a specific version and deploy that dependency edge together with the library. How do we solve the diamond dependency problem when two dependency edges on the same library don't agree on the versions?

For instance, if both libfoo and libbar depend on different versions of libprotobuf, then an application that depends on both libfoo and libbar can no longer compile due to symbol conflicts / ODR violations. The bind() approach gives the end user control over which library/version is depended on. Solving the problem by checking whether a library is already loaded in protobuf_dependencies() is, IMO, no better than bind(). The actual version will be determined by the load order. This brings side effects when refactoring BUILD rules, which should be avoided. If the best practice is for users to always load dependency libraries manually first, then it's essentially the same as bind(), with bind() being more explicit to prevent errors.
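
The load-order concern can be illustrated with a toy model. This is a minimal JavaScript sketch, not Bazel's implementation: it only models the "declare a repository unless one with that name already exists" check described above. The macro names and version numbers are hypothetical.

```javascript
// Toy model of a workspace: a repository name maps to the first version
// declared for it, and later declarations of the same name are skipped.
function createWorkspace() {
  var repos = {};
  return {
    declareIfAbsent: function(name, version) {
      if (!Object.prototype.hasOwnProperty.call(repos, name)) {
        repos[name] = version;
      }
    },
    versionOf: function(name) {
      return repos[name];
    }
  };
}

// Hypothetical dependency macros for the libfoo / libbar example.
// The version numbers are made up.
function fooDependencies(ws) {
  ws.declareIfAbsent('protobuf', '3.4.0');
}
function barDependencies(ws) {
  ws.declareIfAbsent('protobuf', '3.5.1');
}

// Runs the macros in the given order and reports which protobuf version won.
function resolve(macros) {
  var ws = createWorkspace();
  macros.forEach(function(macro) {
    macro(ws);
  });
  return ws.versionOf('protobuf');
}

console.log(resolve([fooDependencies, barDependencies]));  // 3.4.0
console.log(resolve([barDependencies, fooDependencies]));  // 3.5.1
```

In this model, swapping the two calls silently changes the selected version. Declaring protobuf explicitly before either macro runs makes the choice deterministic, which is the "load dependency libraries manually first" practice mentioned above.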

I feel like the dependency problem here is much harder than at Google. bind() is actually one way to enforce the one version rule in open source.

Also, there needs to be a migration path. Even if we are moving away from bind, we cannot just cut a release that breaks everyone's build, especially in this case, as it requires an atomic change in the client code to adopt the new library.

@jart

Member

jart commented Jan 27, 2018

Note: I noticed not too long ago a use-case for bind(). gRPC uses it for this "unsecure" target that lets you hook in your own C++ authentication function. That's a good reason to use bind() IMHO although I feel like there's a better way Bazel could solve that particular problem.

Gentle reminder that dependencies are exceedingly difficult, but the work is worth doing. Motivational reading: https://hackernoon.com/im-harvesting-credit-card-numbers-and-passwords-from-your-site-here-s-how-9a8cb347c5b5 Anyone who looks to bind() for easy answers is likely to only dig that hole deeper.

@ittaiz

Member

ittaiz commented Jan 27, 2018
