Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add RAT check using Gradle #1157

Merged
merged 5 commits into from Jan 15, 2020
Merged

Add RAT check using Gradle #1157

merged 5 commits into from Jan 15, 2020

Conversation

madrob
Copy link
Contributor

@madrob madrob commented Jan 9, 2020

Still a WIP, but most of the way there.

Need to move the task invocation out to the modules where it belongs. I didn't completely understand the logic for what was included and excluded and how we made those decisions initially.

Am not sure if we should be printing all the violations by default or hiding that behind a --info or --verbose flag.

@madrob madrob requested a review from dweiss January 9, 2020 22:13
gradle/validation/rat-sources.gradle Outdated Show resolved Hide resolved
gradle/validation/rat-sources.gradle Outdated Show resolved Hide resolved
@madrob
Copy link
Contributor Author

madrob commented Jan 10, 2020

Cleaned a bunch of stuff up, included your review feedback. Thanks, @dweiss!

One piece that I'm still struggling with is that ./gradlew rat will execute on the root but doesn't delegate to :lucene:rat and :solr:rat (and in fact those targets don't even exist), and then I'd like those to delegate to rat in all the sub-modules. I tried going through our other validation tasks to look for examples and quickly got lost - can you point me to the best practices here?

@madrob
Copy link
Contributor Author

madrob commented Jan 10, 2020

Almost there! Precommit currently fails with some license failures, I'll need to look at that deeper what exclusions we're actually using in the ant build, but I think we're super close now.

@dweiss
Copy link
Contributor

dweiss commented Jan 11, 2020

I'll take a look later, Mike. As for applying tasks and anything else -- think of the project structure as a graph. You attach things to this graph in two passes (evaluation, configuration), followed by execution of tasks attached to this graph (in topological order of dependencies).

It is conceptually simple. The devil hides in details of how gradle is evaluated, deferred evaluated-collections, etc. This should be helpful:

https://docs.gradle.org/current/userguide/build_lifecycle.html

I'll review the patch and maybe correct it before committing; when you take a look at the commit vs. your patch you'll see the differences made - I think it'll be easier and faster than explaining (but go ahead and ask if you don't understand something).

Copy link
Contributor

@dweiss dweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good indeed. I didn't have a chance to actually run it - only checked the commit diff. If you can make it pass (and then verify it's failing on invalid sources) that'd certainly help!

}

subprojects {
plugins.withId("java", {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These rat checks seem to apply not just to java projects but to other files (non-java files) as well. If so then the task should be applicable to any project, not just those with the java plugin?

I think we'd need to cross-check with what the ant script does (whether it runs those checks on non-java folders).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a quirk of Gradle where the Java plugin is what actually creates sourceSets.main and test, afaict. It also filters out projects where sourceSets.main doesn't exist, which was throwing exceptions for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a quirk :) What I'm saying is that you explicitly rely on the project to be a java convention project. We don't need to require this but then have to decide what to use as source folders.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could explicitly code the src/java and src/test paths here, but we are also already doing this in gradle/ant-compat/folder-layout.gradle and I'd like to stick to doing it in one place. As for projects with files but not the java plugin... do we have those? I'll need to figure out what happens if we have a failed file in lucene/ directory directly...

gradle/validation/rat-sources.gradle Outdated Show resolved Hide resolved
gradle/validation/rat-sources.gradle Outdated Show resolved Hide resolved
def unknownLicenses = 0
ratXml.resource.each { resource ->
if (resource.'license-approval'.@name[0] == "false") {
println('Unknown license: ' + resource.@name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect offending resources to an array and throw the exception with all offenders -- see other validation tasks (license checksum validation for example)?

this matters when tasks run concurrently - easier to spot the problem (reported by gradle at the end of the build).

gradle/validation/rat-sources.gradle Show resolved Hide resolved
@dweiss
Copy link
Contributor

dweiss commented Jan 13, 2020

patch.txt
I reworked a few things, Mike. Now... rat hangs for me (?). I wonder if we could bypass ant and use rat classes directly - the conversion between gradle and ant filesets is a bit clumsy (it's fine as a first draft though!).

Inclusion/ exclusion patterns have to be reviewed and consolidated with ant: some projects override these, gradle should exclude project-local build/**, etc.

@dweiss
Copy link
Contributor

dweiss commented Jan 13, 2020

It doesn't hang. It just takes a loooooong time on the highlighter - don't know why. The two failures are related to extra exclusions present in the ant build file of the corresponding project.

@dweiss
Copy link
Contributor

dweiss commented Jan 13, 2020

patch.txt

Updated patch. Some files still fail; it'd have to be investigated.

@dweiss
Copy link
Contributor

dweiss commented Jan 13, 2020

Once you apply the patch included in my last comment change the hardcoded " here:

pattern(substring: "Licensed under the Apache License, Version 2.0 (the "License")")

gradle files are not XMLs so this is responsible for at least some failures.

Additional Authors: Dawid Weiss <dweiss@apache.org>
@madrob
Copy link
Contributor Author

madrob commented Jan 13, 2020

Updated based on your patch, pushed latest to this PR.

We're missing rat check on buildSrc (no ant equivalent), lucene/tools (does not have build.gradle file), and dev-tools (ant rat misses this too) - not sure which of those need to be addressed here.

There's definitely something else going on here as one of the excluded files from ant is either skipped or passes on Gradle, so more digging needs to happen here.

One other question of best practices - do we want all the project exclusions in a central rat.gradle file, or spread out and local to each project?

Tune include/exclude to match ant
Rename in/ex patterns to be more clear about what their functions are
@madrob
Copy link
Contributor Author

madrob commented Jan 13, 2020

One other thought - we probably shouldn't exclude the *.gradle files - there's no reason for those not to have license headers like our build.xml files currently do. WDYT?

@dweiss
Copy link
Contributor

dweiss commented Jan 14, 2020

To be honest I would prefer not to add those license headers to build files, unless it's a requirement of apache legal -- this would have to be verified. For one thing, I don't perceive build files as something particularly valuable as an intellectual property (although it surely does require intellectual input to write them). But even if then the distribution bundle comes with a top-level license file that covers them?

The downside of requiring this boilerplate is that they bloat each and every script.

@dweiss
Copy link
Contributor

dweiss commented Jan 14, 2020

bq. One other question of best practices - do we want all the project exclusions in a central rat.gradle file, or spread out and local to each project?

This is a good question and I think the answer is really subjective.

Both approaches have advantages and disadvantages. I personally like the "aspect" oriented approach when everything related to a particular build function is gathered in a single file. So yes, for rat it'd be just that single file -- any special handling of that aspect, exclusions, etc. would be collected there. In an extreme case you should be able to enable/ disable rat just by commenting out the apply block in the master build file.

As the build evolves over time it may need to be changed. For example when you move parts of the build into buildSrc and wrap it in plugins it will become necessary to move project-specific logic outside.

For now I'd rather keep it in this "aspect-oriented" form if you don't mind (but this is a subjective decision, not any better or common practice).

madrob and others added 3 commits January 14, 2020 15:05
Add java conditional inside the task so that we can verify source sets
exist before using them.
@madrob
Copy link
Contributor Author

madrob commented Jan 14, 2020

One final round of updates:

  • I moved the java plugin check to inside of the task and verified that the other projects would actually get called/fail with bad headers.
  • Added support for incremental builds so that should decrease the impact time to most developers
  • Added in the missing directories that didn't have their own projects. And a landmine if those do end up becoming projects so that somebody pays attention and fixes it (probably one of us)

I think that's everything in this PR, and we're ready to merge?

We can go and switch from ant task to rat classes directly in a later change if we decide that's worthwhile.

To be honest I would prefer not to add those license headers to build files, unless it's a requirement of apache legal -- this would have to be verified. For one thing, I don't perceive build files as something particularly valuable as an intellectual property (although it surely does require intellectual input to write them). But even if then the distribution bundle comes with a top-level license file that covers them?

I guess we'll hear from LEGAL about this one way or the other. Not going to bother adding them for this PR until somebody does the checks.

For now I'd rather keep it in this "aspect-oriented" form if you don't mind (but this is a subjective decision, not any better or common practice).

That's fine. I just noticed this and was curious since we were moving from the project oriented approach with ant to this approach and wanted to make sure it was a conscious decision.

@dweiss
Copy link
Contributor

dweiss commented Jan 15, 2020

Thanks Mike. I'll merge it in today. I actually looked at apache rat sources yesterday -- they are indeed riddled with encoding issues (writer on top of printstream, etc.) and they're not easy to use directly to apply just to a bunch of files instead of directory+inclusions/exclusions.

My motivation here was to replace the whole fileset trickery with gradle's file collections. Then it's easier to pick what you need, pass it to task inputs, etc. Also, what rat does is merely simple pattern detection (line matching)... this could also be written in just plain groovy with relative ease - perhaps at some point we can just switch over to having our own license check where we can control things better.

For now I think it's fine.

// "dev-tools/**"
]
excludes += [
// Unclear if this needs ASF header, depends on how much was copied from ElasticSearch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was but I changed a whole bunch of stuff along the way. There are similarities and the attribution is there but I think we can consider it our own.

if (project.path == ":lucene:analysis:kuromoji") {
rat {
srcExcludes += [
// whether rat detects this as binary or not is platform dependent?!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the reasons I'd rather have our own check - we don't have to rely on a black box (or look inside rat's code to figure out what it does).

includes += [
"forbiddenApis/**",
"prettify/**",
// If/when :lucene:tools becomes a gradle project, then the following line will fail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there will be lucene:tools anymore once we depart from ant - I don't think there is any relevant code in there that should stay.

@@ -72,6 +72,7 @@ org.apache.opennlp:opennlp-tools=1.9.1
org.apache.pdfbox:*=2.0.17
org.apache.pdfbox:jempbox=1.8.16
org.apache.poi:*=4.1.1
org.apache.rat:apache-rat:0.11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you run gradlew precommit? :) Because I think the lock update file isn't included in the patch.

@dweiss dweiss merged commit c9e7eeb into apache:gradle-master Jan 15, 2020
@dweiss
Copy link
Contributor

dweiss commented Jan 15, 2020

Thanks Mike. I'll fix the remaining tiny issues on gradle-master.

@madrob madrob deleted the gradle-rat branch January 15, 2020 16:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants