Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-9897 Change dependency checking mechanism to use gradle checksum verification #108

Closed
wants to merge 2 commits into from

Conversation

gautamworah96
Copy link
Contributor

Remove custom checksum generation logic from our gradle build.
We can now use gradle to both generate checksums and verify our dependencies to
check that they match that checksum.

Additionally, gradle generates checksums for all artifacts (not just JARs).
These checksums are stored in verification-metadata.xml

Description

I've followed the JIRA recommended method of solving the issue by using gradle based metadata verification.

This solution uses SHA256 over the previously used SHA1 used in the repo. We also generate checksums of all transitive artifacts now (pom files etc). It is possible to disable this feature if needed.

Solution

We add a basic file first and then use ./gradlew --write-verification-metadata sha256 help to regenerate the checksum file. I've also updated the instructions with ways to regenerate the checksum file once we add a new dependency.

The build will now automatically fail if you add a dependency (that is imported by gradle) and is not added to the metadata file.
The error log prints a helpful error stating that the verification-metdata.xml file has to be regenerated.

I could not find a way to run this command automatically everytime we run ./gradlew assemble (this would've removed the need for a developer to manually run a command and update checksum metadata).

Tests

Run ./gradlew assemble. Modified the SHA256 checksums and runnning ./gradlew assemble fails.
Verified that using SHA1 produced the same checksums as the one in the previous checksum files (that were in the repo and this PR removes)

However,

./gradlew check fails with an error for the spotlessInternalRegisterDependencies task:

> Task :spotlessInternalRegisterDependencies FAILED
You probably need to add a repository containing the '[com.google.googlejavaformat:google-java-format:1.9]' artifact in the 'build.gradle' of your root project.
E.g.: 'buildscript { repositories { mavenCentral() }}'
org.gradle.api.InvalidUserDataException: Dependency verification failed for configuration ':detachedConfiguration1'
15 artifacts failed verification:
  - animal-sniffer-annotations-1.18.jar (org.codehaus.mojo:animal-sniffer-annotations:1.18) from repository Gradle Central Plugin Repository
  - animal-sniffer-annotations-1.18.pom (org.codehaus.mojo:animal-sniffer-annotations:1.18) from repository Gradle Central Plugin Repository
  - animal-sniffer-parent-1.18.pom (org.codehaus.mojo:animal-sniffer-parent:1.18) from repository Gradle Central Plugin Repository
  - checker-qual-2.8.1.jar (org.checkerframework:checker-qual:2.8.1) from repository Gradle Central Plugin Repository
  - checker-qual-2.8.1.pom (org.checkerframework:checker-qual:2.8.1) from repository Gradle Central Plugin Repository
  - error_prone_annotations-2.3.2.jar (com.google.errorprone:error_prone_annotations:2.3.2) from repository Gradle Central Plugin Repository
  - error_prone_annotations-2.3.2.pom (com.google.errorprone:error_prone_annotations:2.3.2) from repository Gradle Central Plugin Repository
  - error_prone_parent-2.3.2.pom (com.google.errorprone:error_prone_parent:2.3.2) from repository Gradle Central Plugin Repository
  - google-java-format-1.9.jar (com.google.googlejavaformat:google-java-format:1.9) from repository Gradle Central Plugin Repository
  - google-java-format-1.9.pom (com.google.googlejavaformat:google-java-format:1.9) from repository Gradle Central Plugin Repository
  - google-java-format-parent-1.9.pom (com.google.googlejavaformat:google-java-format-parent:1.9) from repository Gradle Central Plugin Repository
  - guava-28.1-jre.jar (com.google.guava:guava:28.1-jre) from repository Gradle Central Plugin Repository
  - guava-28.1-jre.pom (com.google.guava:guava:28.1-jre) from repository Gradle Central Plugin Repository
  - guava-parent-28.1-jre.pom (com.google.guava:guava-parent:28.1-jre) from repository Gradle Central Plugin Repository
  - mojo-parent-50.pom (org.codehaus.mojo:mojo-parent:50) from repository Gradle Central Plugin Repository
If the artifacts are trustworthy, you will need to update the gradle/verification-metadata.xml file by following the instructions at https://docs.gradle.org/6.8.3/userguide/dependency_verification.html#sec:troubleshooting-verification

These dependencies should've ideally been added automatically by gradle.
I was hoping that someone with better understanding of the Lucene gradle system could comment here?

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

Remove custom checksum generation logic from our gradle build.
We can now use gradle to both generate checksums and verify our dependencies to
check that they match that checksum.

Additionally, gradle generates checksums for all artifacts (not just JARs).
These checksums are stored in verification-metadata.xml
@dweiss
Copy link
Contributor

dweiss commented Apr 27, 2021

Thanks, this looks suspiciously simple!... :) I'll be glad to experiment with it a bit.

I'm not a big fan of the monolithic checksum file - the expanded version (per-jar checksum) seems easier.

Checksums should only be generated for a subset of configurations - I don't think it's realistic to assume we can get checksums of everything (detached configurations, etc.).

gradle/validation/jar-checks.gradle Show resolved Hide resolved
def jarInfoTasks = subprojects.collectMany { it.tasks.matching { it.name == "collectJarInfos" } }

// Update dependency checksums.
task updateLicenses() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. I'd leave this task and use:

./gradlew --write-verification-metadata sha256 updateLicenses

I hate to remember these option switches... the task could verify if they're in place in doFirst and maybe with a hint on how to issue the full command properly if they're missing. Or, alternatively, it could be a GradleBuild task that would recursively invoke the same build with the right options...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mistake here was that we were only generating checksums for the dependencies and plugins that help or updateLicenses tasks use. The ./gradlew --write-verification-metadata sha256 check command records all dependencies used by the check task. Fixing this makes ./gradlew check work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. But does it mean that if we run some other task that pulls dependencies outside of check's scope the build will fail then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it will. If we run ./gradlew assemble it checks whether dependencies used in assemble are defined in the metadata. If we run ./gradlew check it checks if dependencies of check are defined in the metadata. As far as I know, our builds mainly use these tasks.

Our build will fail if you run ./gradlew <somenewtask> with the <somenewtask> dependencies not added to the metadata file. I'll try this case out and get back here..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright! I tested a sample detachedConfiguration

 tasks.register("checkDetachedDependencies") {
    doLast {
      def detachedConf = configurations.detachedConfiguration(dependencies.create("org.aspectj:aspectjrt:1.8.6"))
      println(detachedConf.files)
    }
  }

This task fails if you execute it separately with ./gradlew checkDetachedDependencies giving the error that it needs the checksum of aspectj. However, ./gradlew check passes because check does not depend on checkDetachedDependencies

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it's the opposite of what we want: we'd like those tasks that have detached dependencies to pass instead of fail. It's really hard to predict where such configurations may occur and it's just not possible to list all possible tasks to "--write-verification-metadata"...

The only way I see it as a functional substitute is if it can be passed an explicit set of configurations from which to generate the metadata and still work with any other configurations. Otherwise it's going to be very difficult to work with - I don't see how we can activate all possible configurations when writing checksums.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that check was a superset of almost every other configuration. In case, we wanted to add other detached configurations we could disableDependencyVerification() for them.

However, this seems like too much work for something so simple :)
I'll close this PR and wait silently until gradle adds an option for verifying just some select configurations. Or maybe work on adding it :p

Thanks for reviewing anyways!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check is just a convention aggregation task, nothing else. We have tons of other stuff that isn't connected with check in the execution graph - I bet some of these dependencies/ configurations are in in plugins and it'd be difficult to even hook into them to disable automatic dependency verification.

The work on this attempt isn't lost though (thank you!). Let's keep an eye on what happens with gradle's built-in checkums and retry the attempt when it's more flexible.

gradle/verification-metadata.xml Show resolved Hide resolved
@gautamworah96
Copy link
Contributor Author

Thanks, this looks suspiciously simple!... :) I'll be glad to experiment with it a bit.

💯

I'm not a big fan of the monolithic checksum file - the expanded version (per-jar checksum) seems easier.

I actually thought having a single file would be better for editing and understanding dependencies from a single place.
I don't think there is a way to give multiple checksum file inputs to gradle at this moment.

@dweiss
Copy link
Contributor

dweiss commented Apr 27, 2021

It's fine. I kind of prefer filesystem (file name)-based correspondence of checksums to files but I can live with a monolithic file too.

@gautamworah96 gautamworah96 marked this pull request as draft April 29, 2021 02:34
Update the dependencies.txt instructions
@gautamworah96 gautamworah96 marked this pull request as ready for review May 1, 2021 21:22
@gautamworah96 gautamworah96 requested a review from dweiss May 3, 2021 19:48
@gautamworah96
Copy link
Contributor Author

Some closing notes: We found the current gradle checksum verification functionality unsuitable for our use case.
The current functionality records all dependencies (plugins et al) for a given configuration that was used to generate the verification-metadata.xml file. However, it fails builds on other configurations that use other external dependencies.

When could this possibly work? If gradle were to check dependencies only when the task that was used to generate the metadata file is executed, it would allow developers to safely create more configurations (with other dependencies) without the restriction of adding them to the global metadata file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants