Fix issue where a jetified maven artifact did not also jetify the deps #386

Merged
merged 12 commits into from
Mar 4, 2020

Conversation

justhecuke-zz
Contributor

There are two main changes in this PR:

  • we now use directDependencies rather than the full transitive closure of maven dependencies for maven targets
  • dependencies for jetified artifacts are redirected towards their jetified versions

This fixes an issue where a jetified maven artifact would still depend on un-jetified maven artifacts. That led to many problems with duplicate classes and resources, since having both a jetified and an un-jetified Android support library on your classpath is a very bad idea.
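As a rough illustration of the second change, the redirection can be sketched as a lookup over each artifact's directDependencies. This is plain Python rather than Starlark so it can run standalone. The map contents and the helper names (JETIFIER_COORD_MAP, jetify_coord, redirect_direct_deps) are assumptions made for the example, not the code in this PR.

```python
# Illustrative mapping from old support-library coordinates (group:artifact)
# to AndroidX coordinates. The entries and versions are assumptions.
JETIFIER_COORD_MAP = {
    "com.android.support:appcompat-v7": "androidx.appcompat:appcompat:1.0.0",
    "com.android.support:support-annotations": "androidx.annotation:annotation:1.0.0",
}


def jetify_coord(coord, coord_map=JETIFIER_COORD_MAP):
    """Return the mapped coordinate for coord, or coord unchanged if unmapped."""
    # Only group and artifact identify the entry; version and packaging vary.
    key = ":".join(coord.split(":")[:2])
    return coord_map.get(key, coord)


def redirect_direct_deps(artifact, coord_map=JETIFIER_COORD_MAP):
    """Return a copy of artifact whose directDependencies point at jetified coords."""
    redirected = dict(artifact)
    redirected["directDependencies"] = [
        jetify_coord(dep, coord_map)
        for dep in artifact.get("directDependencies", [])
    ]
    return redirected


example = {
    "coord": "com.example:lib:1.0",
    "directDependencies": [
        "com.android.support:appcompat-v7:28.0.0",
        "com.google.guava:guava:28.0-jre",
    ],
}
result = redirect_direct_deps(example)
```

Only the mapped support-library dependency is rewritten here; unrelated coordinates pass through untouched, and the input dictionary is not mutated.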

@justhecuke-zz
Contributor Author

This PR still needs some work: maven_install.json itself should be changed to provide the jetified deps, because we need Coursier to figure out the dep graph of the jetified dependencies.

The approach here of just changing the ingestion of maven_install.json isn't sufficient. It requires that you either explicitly add the jetified artifacts to your list of maven artifacts to install, or that someone else explicitly depends on them in a POM, which is unreliable.

@jin
Member

jin commented Feb 24, 2020

Thanks. I've been wanting to generate the BUILD file using directDependencies instead of dependencies, ever since Coursier added it.

We don't need Coursier to figure out the dep graph of the dependencies. Since we now have the map of coordinate transformations, we can modify this code path (

for artifact in dep_tree["dependencies"]:
    # Some artifacts don't contain files; they are just parent artifacts
    # to other artifacts.
    if artifact["file"] == None:
        continue
    # Normalize paths in place here.
    artifact.update({"file": _normalize_to_unix_path(artifact["file"])})
    if repository_ctx.attr.use_unsafe_shared_cache:
        artifact.update({"file": _relativize_and_symlink_file(repository_ctx, artifact["file"])})
    # Coursier saves the artifacts into a subdirectory structure
    # that mirrors the URL where the artifact's fetched from. Using
    # this, we can reconstruct the original URL.
    primary_url_parts = []
    filepath_parts = artifact["file"].split("/")
    protocol = None
    # Only support http/https transports
    for part in filepath_parts:
        if part == "http" or part == "https":
            protocol = part
            break
    if protocol == None:
        fail("Only artifacts downloaded over http(s) are supported: %s" % artifact["coord"])
    primary_url_parts.extend([protocol, "://"])
    for part in filepath_parts[filepath_parts.index(protocol) + 1:]:
        primary_url_parts.extend([part, "/"])
    primary_url_parts.pop()  # pop the final "/"
    # Coursier encodes:
    # - ':' as '%3A'
    # - '@' as '%40'
    #
    # The primary_url is the url from which Coursier downloaded the jar from. It looks like this:
    # https://repo1.maven.org/maven2/org/threeten/threetenbp/1.3.3/threetenbp-1.3.3.jar
    primary_url = "".join(primary_url_parts).replace("%3A", ":").replace("%40", "@")
    artifact.update({"url": primary_url})
    # The repository for the primary_url has to be one of the repositories provided through
    # maven_install. Since Maven artifact URLs are standardized, we can make the `http_file`
    # targets more robust by replicating the primary url for each specified repository url.
    #
    # It does not matter if the artifact is on a repository or not, since http_file takes
    # care of 404s.
    #
    # If the artifact does exist, Bazel's HttpConnectorMultiplexer enforces the SHA-256 checksum
    # is correct. By applying the SHA-256 checksum verification across all the mirrored files,
    # we get increased robustness in the case where our primary artifact has been tampered with,
    # and we somehow ended up using the tampered checksum. Attackers would need to tamper *all*
    # mirrored artifacts.
    #
    # See https://github.com/bazelbuild/bazel/blob/77497817b011f298b7f3a1138b08ba6a962b24b8/src/main/java/com/google/devtools/build/lib/bazel/repository/downloader/HttpConnectorMultiplexer.java#L103
    # for more information on how Bazel's HTTP multiplexing works.
    #
    # TODO(https://github.com/bazelbuild/rules_jvm_external/issues/186): Make this work with
    # basic auth.
    repository_urls = [r["repo_url"].rstrip("/") for r in repositories]
    primary_artifact_path = infer_artifact_path_from_primary_and_repos(primary_url, repository_urls)
    mirror_urls = [url + "/" + primary_artifact_path for url in repository_urls]
    artifact.update({"mirror_urls": mirror_urls})
    files_to_hash.append(repository_ctx.path(artifact["file"]))
) to replace the Coursier-provided maven_install.json in-place, to reflect the Jetifier changes.
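
For readers following the quoted loop, its URL handling can be sketched on its own in plain Python. This is a simplified stand-in, not the rules_jvm_external code: the cache path layout in the example and the prefix-matching logic in mirror_urls are assumptions, and the real infer_artifact_path_from_primary_and_repos may behave differently.

```python
def reconstruct_primary_url(cache_path):
    """Rebuild the download URL from a cache path that mirrors the URL."""
    parts = cache_path.split("/")
    protocol = None
    # Only http/https transports are recognised, as in the quoted loop.
    for part in parts:
        if part in ("http", "https"):
            protocol = part
            break
    if protocol is None:
        raise ValueError("Only artifacts downloaded over http(s) are supported")
    rest = parts[parts.index(protocol) + 1:]
    url = protocol + "://" + "/".join(rest)
    # Undo the percent-encoding of ':' and '@' noted in the quoted comments.
    return url.replace("%3A", ":").replace("%40", "@")


def mirror_urls(primary_url, repository_urls):
    """Replicate the artifact path of primary_url under every repository URL."""
    repos = [r.rstrip("/") for r in repository_urls]
    for repo in repos:
        if primary_url.startswith(repo + "/"):
            artifact_path = primary_url[len(repo) + 1:]
            return [r + "/" + artifact_path for r in repos]
    raise ValueError("primary URL does not belong to any configured repository")


# Hypothetical cache path; the leading "v1" directory is an assumption.
path = "v1/https/repo1.maven.org/maven2/org/threeten/threetenbp/1.3.3/threetenbp-1.3.3.jar"
primary = reconstruct_primary_url(path)
mirrors = mirror_urls(primary, ["https://repo1.maven.org/maven2/", "https://example.com/mirror"])
```

Each mirror URL shares the same artifact path, which is what lets a single SHA-256 checksum be verified against every mirror.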

@justhecuke-zz
Contributor Author

I'm not too sure this works, because we'd also need to add entries for the androidx deps and their transitive deps, and handle version conflict resolution.
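
The concern can be made concrete with a small check: given a resolved dependency tree and a coordinate map, list the jetified coordinates that the tree has no entry for. This is a Python sketch under assumed data shapes (a "dependencies" list of entries with a "coord" key, as in the quoted loop). The function name and the map contents are hypothetical.

```python
def missing_jetified_coords(dep_tree, coord_map):
    """Return mapped coordinates that have no entry in dep_tree."""

    def key_of(coord):
        # group:artifact identifies an entry regardless of version or packaging.
        return ":".join(coord.split(":")[:2])

    present = set(key_of(entry["coord"]) for entry in dep_tree["dependencies"])
    missing = []
    for entry in dep_tree["dependencies"]:
        target = coord_map.get(key_of(entry["coord"]))
        if target is None:
            continue
        if key_of(target) not in present and target not in missing:
            missing.append(target)
    return missing


# Illustrative inputs; the coordinates and versions are assumptions.
coord_map = {
    "com.android.support:appcompat-v7": "androidx.appcompat:appcompat:1.0.0",
    "com.android.support:support-annotations": "androidx.annotation:annotation:1.0.0",
}
dep_tree = {
    "dependencies": [
        {"coord": "com.android.support:appcompat-v7:aar:28.0.0"},
        {"coord": "com.android.support:support-annotations:28.0.0"},
        {"coord": "androidx.annotation:annotation:1.1.0"},
    ],
}
missing = missing_jetified_coords(dep_tree, coord_map)
```

In this example the tree already has an androidx.annotation entry, so only the appcompat replacement is reported. Renaming coordinates in place would still leave that artifact's own transitive dependencies and version choices unresolved.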

@justhecuke-zz
Contributor Author

@jin Would you be OK with a 2-pass coursier fetch where we add an extra fetch to grab the added androidx dependencies and then use this second fetch as the basis for the pinned json?
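
A minimal sketch of what such a 2-pass fetch might look like, in plain Python. The fetch parameter is a hypothetical callable standing in for a Coursier invocation (a list of coordinates in, a dependency tree dict out), and two_pass_resolve is an invented name, so this shows only the control flow and not the PR's implementation.

```python
def two_pass_resolve(artifacts, fetch, coord_map):
    """Resolve artifacts, then re-resolve with their jetified replacements added."""
    first_pass = fetch(list(artifacts))

    # Collect the AndroidX coordinates that the first pass implies.
    extra = []
    for entry in first_pass["dependencies"]:
        key = ":".join(entry["coord"].split(":")[:2])
        target = coord_map.get(key)
        if target is not None and target not in extra:
            extra.append(target)

    # Nothing to jetify: avoid paying for a second fetch.
    if not extra:
        return first_pass

    # The second pass resolves the original and added coordinates together,
    # so the resolver handles transitive deps and version conflicts.
    return fetch(list(artifacts) + extra)


calls = []


def fake_fetch(coords):
    """Test double that records each call and echoes the coordinates back."""
    calls.append(list(coords))
    return {"dependencies": [{"coord": c} for c in coords]}


coord_map = {"com.android.support:appcompat-v7": "androidx.appcompat:appcompat:1.0.0"}
pinned = two_pass_resolve(["com.android.support:appcompat-v7:28.0.0"], fake_fetch, coord_map)
```

The second fetch only happens when the first pass contains something to jetify, which keeps the extra cost limited to builds that actually use the old support libraries.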

@justhecuke-zz
Contributor Author

@jin I think this is ready for review now. Take a look when you can.

@jin jin self-assigned this Feb 29, 2020
Member

@jin jin left a comment

Thanks for this.

This took a while to review because of the large refactoring delta, and because the code diverged from a linear-ish read into functions, which I don't personally prefer. It'd be great if the diff were smaller, without the unrelated changes.

Do you think you can submit this change with a smaller delta for the jetifier specific work instead?

@justhecuke-zz
Contributor Author

I wanted to avoid copy-pasting the big chunk of Coursier fetch code a second time, so I was hoping to keep that one function (make_coursier_dep_tree).

I can certainly inline the other functions I used.

@jin
Member

jin commented Mar 3, 2020

Sounds good. A 2-pass fetch seems fine to me, as long as it's clear to end users that there's a cost to this. Perhaps note it in the documentation or surface it in the repository_ctx.report_progress line?

@justhecuke-zz
Contributor Author

@jin I think I've addressed your comments and you can take another look.

Member

@jin jin left a comment

Thank you! This is much easier to review.

@jin jin merged commit ce92f42 into bazelbuild:master Mar 4, 2020