Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support pinning artifacts with checksums and downloading artifacts with Bazel #169

Merged
merged 70 commits into from
Jul 4, 2019

Conversation

jin
Copy link
Member

@jin jin commented Jun 17, 2019

This PR adds a new maven_install_json attribute that accepts a label to a JSON file, maven_install.json. This file contains the pinned / locked state of a maven_install fetch.

This feature improves artifact resolution and fetch speed, since it uses Bazel's download mechanism which caches files on their sha256 checksums. It also improves resiliency and security by encoding the sha256 checksums and original artifact urls into the JSON file.

Fixes #125
Fixes #146
Fixes #170

To get started with pinning artifacts, run the following command:

$ bazel run @maven//:pin

This generates a maven_install.json (or your_maven_install_name_install.json) in the root of your Bazel workspace.

Then, specify maven_install_json in maven_install and load pinned_maven_install from @maven//:defs.bzl:

maven_install(
    # artifacts, repositories, ...
    maven_install_json = "//:maven_install.json",
)

load("@maven//:defs.bzl", "pinned_maven_install")
pinned_maven_install()

Whenever you make a change to the list of artifacts or repositories and want to update maven_install.json, run this command to re-pin the unpinned @maven repository:

$ bazel run @unpinned_maven//:pin

By specifying maven_install_json, an additional @unpinned_maven (or unpinned_<your_maven_install_name>) repo will be created. For example, if your maven_install is named @foo, @unpinned_foo will be created.

The contents of maven_install.json will look something like this:

$ cat maven_install.json
{
    "dependency_tree": {
        "conflict_resolution": {},
        "dependencies": [
            {
                "coord": "com.google.code.findbugs:jsr305:3.0.2",
                "dependencies": [],
                "file": "v1/https/repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.2/jsr305-3.0.2.jar",
                "sha256": "766ad2a0783f2687962c8ad74ceecc38a28b9f72a2d085ee438b7813e928d0c7",
                "url": "https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.2/jsr305-3.0.2.jar"
            },
            {
                "coord": "com.google.errorprone:error_prone_annotations:2.2.0",
                "dependencies": [],
                "file": "v1/https/repo1.maven.org/maven2/com/google/errorprone/error_prone_annotations/2.2.0/error_prone_annotations-2.2.0.jar",
                "sha256": "6ebd22ca1b9d8ec06d41de8d64e0596981d9607b42035f9ed374f9de271a481a",
                "url": "https://repo1.maven.org/maven2/com/google/errorprone/error_prone_annotations/2.2.0/error_prone_annotations-2.2.0.jar"
            },
            {
                "coord": "com.google.guava:failureaccess:1.0",
                "dependencies": [],
                "file": "v1/https/repo1.maven.org/maven2/com/google/guava/failureaccess/1.0/failureaccess-1.0.jar",
                "sha256": "d084bef9cd07a8537a1753e4850a69b7e8bab1d1e22e9f3a1e4826309a7a2336",
                "url": "https://repo1.maven.org/maven2/com/google/guava/failureaccess/1.0/failureaccess-1.0.jar"
            },
            {
                "coord": "com.google.guava:guava:27.0-android",
                "dependencies": [
                    "com.google.guava:listenablefuture:9999.0-empty-to-avoid-conflict-with-guava",
                    "com.google.code.findbugs:jsr305:3.0.2",
                    "com.google.guava:failureaccess:1.0",
                    "org.checkerframework:checker-compat-qual:2.5.2",
                    "org.codehaus.mojo:animal-sniffer-annotations:1.17",
                    "com.google.j2objc:j2objc-annotations:1.1",
                    "com.google.errorprone:error_prone_annotations:2.2.0"
                ],
                "file": "v1/https/repo1.maven.org/maven2/com/google/guava/guava/27.0-android/guava-27.0-android.jar",
                "sha256": "9fcd95a6d27ce9309b68cfb92369192ae13498d5104a3367e6c64e3879215324",
                "url": "https://repo1.maven.org/maven2/com/google/guava/guava/27.0-android/guava-27.0-android.jar"
            },
            {
                "coord": "com.google.guava:listenablefuture:9999.0-empty-to-avoid-conflict-with-guava",
                "dependencies": [],
                "file": "v1/https/repo1.maven.org/maven2/com/google/guava/listenablefuture/9999.0-empty-to-avoid-conflict-with-guava/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar",
                "sha256": "b372a037d4230aa57fbeffdef30fd6123f9c0c2db85d0aced00c91b974f33f99",
                "url": "https://repo1.maven.org/maven2/com/google/guava/listenablefuture/9999.0-empty-to-avoid-conflict-with-guava/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar"
            },
            {
                "coord": "com.google.j2objc:j2objc-annotations:1.1",
                "dependencies": [],
                "file": "v1/https/repo1.maven.org/maven2/com/google/j2objc/j2objc-annotations/1.1/j2objc-annotations-1.1.jar",
                "sha256": "2994a7eb78f2710bd3d3bfb639b2c94e219cedac0d4d084d516e78c16dddecf6",
                "url": "https://repo1.maven.org/maven2/com/google/j2objc/j2objc-annotations/1.1/j2objc-annotations-1.1.jar"
            },
            {
                "coord": "org.checkerframework:checker-compat-qual:2.5.2",
                "dependencies": [],
                "file": "v1/https/repo1.maven.org/maven2/org/checkerframework/checker-compat-qual/2.5.2/checker-compat-qual-2.5.2.jar",
                "sha256": "d7291cebf5e158d169807ae49d4b16ff672986f0c6d803e5f207c40cb61ef982",
                "url": "https://repo1.maven.org/maven2/org/checkerframework/checker-compat-qual/2.5.2/checker-compat-qual-2.5.2.jar"
            },
            {
                "coord": "org.codehaus.mojo:animal-sniffer-annotations:1.17",
                "dependencies": [],
                "file": "v1/https/repo1.maven.org/maven2/org/codehaus/mojo/animal-sniffer-annotations/1.17/animal-sniffer-annotations-1.17.jar",
                "sha256": "92654f493ecfec52082e76354f0ebf87648dc3d5cec2e3c3cdb947c016747a53",
                "url": "https://repo1.maven.org/maven2/org/codehaus/mojo/animal-sniffer-annotations/1.17/animal-sniffer-annotations-1.17.jar"
            }
        ],
        "version": "0.1.0"
    }
}

Since all artifacts are checksummed with sha256, it means that fully offline builds are possible even with bazel clean --expunge after performing one bazel fetch @maven//....

Whenever a change is made to the list of artifacts or repositories, you will need to rerun

$ bazel run @unpinned_maven//:pin

to update the pinned artifacts.

This change adds a dependency on system Python for two things: platform independent sha256 tool from @bazel_tools and pretty printing of maven_install.json.

@jin jin changed the title Support pinning artifacts with checksum and download artifacts with Bazel Support pinning artifacts with checksums, and download artifacts with Bazel Jun 17, 2019
@Globegitter
Copy link

Awesome, this is a really good step into the right direction imo.

Would it be also possible to expose a new .bzl file with independent external repositories as discussed in #146 rather than calling ctx.download?

Also I might try and look at adding the json file capability to coursier resolve as it would be a nice optimisation.

@jin
Copy link
Member Author

jin commented Jun 17, 2019

Would it be also possible to expose a new .bzl file with independent external repositories as discussed in #146 rather than calling ctx.download?

@Globegitter is this use case covered by https://github.com/bazelbuild/rules_jvm_external#repository-aliases already?

README.md Outdated Show resolved Hide resolved
@Globegitter
Copy link

Globegitter commented Jun 17, 2019

@jin that only changes the way one can import deps but because they still rely on the same single external repository so all deps will be downloaded the first time. The use case discussed in #146, of just downloading the deps actually needed for the current build, is just possible by having independent repository rules that call ctx.download for one external dependency each.

@borkaehw
Copy link
Contributor

I have tried it out and it's heading to the direction we hope for, thanks for taking care of these two issues. Few concerns I have so far:

  1. Coursier seems to resolve colons in URLs to encoding form in dep-tree.json. For instance, http://repo.foobar.com:8000 will become v1/http/repo.foobar.com%3A8000, which causes downloading error when it tries to convert it back to URL.
  2. It seems to take twice as long for fetching and resolving. Are both Coursier and Bazel trying to do the downloading?
  3. pinned_maven_install.json is currently generated in external/maven/. Is there a plan to have it generated in (or copied to) main workspace to be checked in?

Thanks.

@jin
Copy link
Member Author

jin commented Jun 17, 2019

@jin that only changes the way one can import deps but because they still rely on the same single external repository so all deps will be downloaded the first time. The use case discussed in #146, of just downloading the deps actually needed for the current build, is just possible by having independent repository rules that call ctx.download for one external dependency each.

@Globegitter This is true. Let me play around with this idea instead of ctx.download.

Coursier seems to resolve colons in URLs to encoding form in dep-tree.json. For instance, http://repo.foobar.com:8000 will become v1/http/repo.foobar.com%3A8000, which causes downloading error when it tries to convert it back to URL.

I suppose we can do a simple string replacement for colons when converting back into URLs.

It seems to take twice as long for fetching and resolving. Are both Coursier and Bazel trying to do the downloading?

Yes it's doing that, it seems that ctx.download downloads eagerly even if the statement is not in an executed branch. Using the http_file approach will (probably) bypass this issue.

pinned_maven_install.json is currently generated in external/maven/. Is there a plan to have it generated in (or copied to) main workspace to be checked in?

This is what $ bazel run @maven//:pin is for. The original idea is to have it copied into the main workspace automatically, but I cannot figure out a reliable way to get the path to the source workspace without passing arguments into $ bazel run @maven//:pin.

@jin
Copy link
Member Author

jin commented Jun 17, 2019

What are your opinions about the API?

Option 1:

maven_install(
     maven_install_json = "//:maven_install.json",
)

Option 2

maven_install(
    # usual stuff
)

pinned_maven_install(
     name = "pinned_maven",
     maven_install_json = "//:maven_install.json",
)

# or just have the above values be default, so we can specify

pinned_maven_install() 

I like the second one more, because it makes explicit that the workspace depend on @pinned_maven targets instead of regular @maven//'s. Generating //:maven_install.json is just a matter of running bazel run @maven//:pin. It also allows users who don't rely on pinning to use regular maven_install and ignore pinned_maven_install altogether. pinned_maven_install can also set its name to be maven to override the maven_install declaration earlier.

The first option feels like it's cramming too much into maven_install.

@jin
Copy link
Member Author

jin commented Jun 18, 2019

@borkaehw @Globegitter I've switched ctx.download to http_file in the latest commit, and hooked up several examples that use this. PTAL, lmk what you think.

README.md Outdated Show resolved Hide resolved
@jin
Copy link
Member Author

jin commented Jun 19, 2019

@borkaehw I made some changes to improve the first-time experience. You don't need to create an empty maven_install.json file now.

@borkaehw
Copy link
Contributor

It works great, thanks.

@borkaehw
Copy link
Contributor

I am interested in to know how far are we from merging this PR?

@jin
Copy link
Member Author

jin commented Jun 19, 2019

@borkaehw Waiting on a review from either @laurentlb or @aehlig.

# load("@maven//:defs.bzl", "pinned_maven_install")
# pinned_maven_install()
http_files = [
"load(\"@bazel_tools//tools/build_defs/repo:http.bzl\", \"http_file\")",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a comment at the very top of this file to remind users that this is a generated file which shouldn't be modified manually?
The message could be like # Do not edit. rules_jvm_external autogenerates this file from maven_install Please run ... command to update this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's JSON, so no comments :-)

I can turn it into a "private" field in the JSON object, through.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I totally forget json has not comment. Thinking of a better way to remind users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing maven_install.json to something more obvious that implies we shouldn't modify it manually? Like maven_autogenerated.json.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that still doesn't imply that you shouldn't modify it manually. Do you think users will end up modifying this manually often enough that it becomes a user experience problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking of a scenario that someone may think changing the artifacts version is as simple as modifying maven_install.json, but it may break other artifacts which have the dependency on the changed artifact.

Since I am going to introduce rules_jvm_external to the entire organization, there is a chance someone doesn't notice the proper way of using it.

Copy link
Member Author

@jin jin Jun 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can hash the contents of the dependencies list, and add that as a field to the generated object? Then pinned_coursier_maven would compare the hash before generating the http_file targets and see if it was modified manually, and ask the user to not do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like a good idea to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add this in a follow up PR.

Copy link

@aehlig aehlig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, but please fix the reconstruction of the URL to not depend on the absence of magic names in the URL itself.

coursier.bzl Show resolved Hide resolved
coursier.bzl Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants