New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add RAT check using Gradle #1157
Conversation
Cleaned a bunch of stuff up, included your review feedback. Thanks, @dweiss! One piece that I'm still struggling with is that |
Almost there! Precommit currently fails with some license failures, I'll need to look at that deeper what exclusions we're actually using in the ant build, but I think we're super close now. |
I'll take a look later, Mike. As for applying tasks and anything else -- think of the project structure as a graph. You attach things to this graph in two passes (evaluation, configuration), followed by execution of tasks attached to this graph (in topological order of dependencies). It is conceptually simple. The devil hides in details of how gradle is evaluated, deferred evaluated-collections, etc. This should be helpful: https://docs.gradle.org/current/userguide/build_lifecycle.html I'll review the patch and maybe correct it before committing; when you take a look at the commit vs. your patch you'll see the differences made - I think it'll be easier and faster than explaining (but go ahead and ask if you don't understand something). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good indeed. I didn't have a chance to actually run it - only checked the commit diff. If you can make it pass (and then verify it's failing on invalid sources) that'd certainly help!
gradle/validation/rat-sources.gradle
Outdated
} | ||
|
||
subprojects { | ||
plugins.withId("java", { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These rat checks seem to apply not just to java projects but to other files (non-java files) as well. If so then the task should be applicable to any project, not just those with the java plugin?
I think we'd need to cross-check with what the ant script does (whether it runs those checks on non-java folders).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a quirk of Gradle where the Java plugin is what actually creates sourceSets.main and test, afaict. It also filters out projects where sourceSets.main doesn't exist, which was throwing exceptions for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not a quirk :) What I'm saying is that you explicitly rely on the project to be a java convention project. We don't need to require this but then have to decide what to use as source folders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could explicitly code the src/java
and src/test
paths here, but we are also already doing this in gradle/ant-compat/folder-layout.gradle
and I'd like to stick to doing it in one place. As for projects with files but not the java plugin... do we have those? I'll need to figure out what happens if we have a failed file in lucene/ directory directly...
gradle/validation/rat-sources.gradle
Outdated
def unknownLicenses = 0 | ||
ratXml.resource.each { resource -> | ||
if (resource.'license-approval'.@name[0] == "false") { | ||
println('Unknown license: ' + resource.@name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collect offending resources to an array and throw the exception with all offenders -- see other validation tasks (license checksum validation for example)?
this matters when tasks run concurrently - easier to spot the problem (reported by gradle at the end of the build).
patch.txt Inclusion/ exclusion patterns have to be reviewed and consolidated with ant: some projects override these, gradle should exclude project-local build/**, etc. |
It doesn't hang. It just takes a loooooong time on the highlighter - don't know why. The two failures are related to extra exclusions present in the ant build file of the corresponding project. |
Updated patch. Some files still fail; it'd have to be investigated. |
Once you apply the patch included in my last comment change the hardcoded " here: pattern(substring: "Licensed under the Apache License, Version 2.0 (the "License")") gradle files are not XMLs so this is responsible for at least some failures. |
Additional Authors: Dawid Weiss <dweiss@apache.org>
Updated based on your patch, pushed latest to this PR. We're missing rat check on There's definitely something else going on here as one of the excluded files from ant is either skipped or passes on Gradle, so more digging needs to happen here. One other question of best practices - do we want all the project exclusions in a central rat.gradle file, or spread out and local to each project? |
Tune include/exclude to match ant Rename in/ex patterns to be more clear about what their functions are
One other thought - we probably shouldn't exclude the |
To be honest I would prefer not to add those license headers to build files, unless it's a requirement of apache legal -- this would have to be verified. For one thing, I don't perceive build files as something particularly valuable as an intellectual property (although it surely does require intellectual input to write them). But even if then the distribution bundle comes with a top-level license file that covers them? The downside of requiring this boilerplate is that they bloat each and every script. |
bq. One other question of best practices - do we want all the project exclusions in a central rat.gradle file, or spread out and local to each project? This is a good question and I think the answer is really subjective. Both approaches have advantages and disadvantages. I personally like the "aspect" oriented approach when everything related to a particular build function is gathered in a single file. So yes, for rat it'd be just that single file -- any special handling of that aspect, exclusions, etc. would be collected there. In an extreme case you should be able to enable/ disable rat just by commenting out the apply block in the master build file. As the build evolves over time it may need to be changed. For example when you move parts of the build into buildSrc and wrap it in plugins it will become necessary to move project-specific logic outside. For now I'd rather keep it in this "aspect-oriented" form if you don't mind (but this is a subjective decision, not any better or common practice). |
Add java conditional inside the task so that we can verify source sets exist before using them.
One final round of updates:
I think that's everything in this PR, and we're ready to merge? We can go and switch from ant task to rat classes directly in a later change if we decide that's worthwhile.
I guess we'll hear from LEGAL about this one way or the other. Not going to bother adding them for this PR until somebody does the checks.
That's fine. I just noticed this and was curious since we were moving from the project oriented approach with ant to this approach and wanted to make sure it was a conscious decision. |
Thanks Mike. I'll merge it in today. I actually looked at apache rat sources yesterday -- they are indeed riddled with encoding issues (writer on top of printstream, etc.) and they're not easy to use directly to apply just to a bunch of files instead of directory+inclusions/exclusions. My motivation here was to replace the whole fileset trickery with gradle's file collections. Then it's easier to pick what you need, pass it to task inputs, etc. Also, what rat does is merely simple pattern detection (line matching)... this could also be written in just plain groovy with relative ease - perhaps at some point we can just switch over to having our own license check where we can control things better. For now I think it's fine. |
// "dev-tools/**" | ||
] | ||
excludes += [ | ||
// Unclear if this needs ASF header, depends on how much was copied from ElasticSearch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was but I changed a whole bunch of stuff along the way. There are similarities and the attribution is there but I think we can consider it our own.
if (project.path == ":lucene:analysis:kuromoji") { | ||
rat { | ||
srcExcludes += [ | ||
// whether rat detects this as binary or not is platform dependent?! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of the reasons I'd rather have our own check - we don't have to rely on a black box (or look inside rat's code to figure out what it does).
includes += [ | ||
"forbiddenApis/**", | ||
"prettify/**", | ||
// If/when :lucene:tools becomes a gradle project, then the following line will fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there will be lucene:tools anymore once we depart from ant - I don't think there is any relevant code in there that should stay.
@@ -72,6 +72,7 @@ org.apache.opennlp:opennlp-tools=1.9.1 | |||
org.apache.pdfbox:*=2.0.17 | |||
org.apache.pdfbox:jempbox=1.8.16 | |||
org.apache.poi:*=4.1.1 | |||
org.apache.rat:apache-rat:0.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you run gradlew precommit? :) Because I think the lock update file isn't included in the patch.
Thanks Mike. I'll fix the remaining tiny issues on gradle-master. |
Still a WIP, but most of the way there.
Need to move the task invocation out to the modules where it belongs. I didn't completely understand the logic for what was included and excluded and how we made those decisions initially.
Am not sure if we should be printing all the violations by default or hiding that behind a
--info
or--verbose
flag.