Skip to content

Conversation

@rmetzger
Copy link
Contributor

@rmetzger rmetzger commented Dec 3, 2020

What is the purpose of the change

During the release validation, we noticed several jar files containing LICENSE files in their root.

This change adds some basic checks to validate the jar files.

NOTICE: The check currently fails because the examples lack LICENSE files.

Brief change log

  • Split license checker into notice and jar checker
  • add jar checker
  • fix flink-table related issues

@flinkbot
Copy link
Collaborator

flinkbot commented Dec 3, 2020

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 1a10256 (Thu Dec 03 09:42:08 UTC 2020)

Warnings:

  • 1 pom.xml files were touched: Check for build and licensing issues.
  • No documentation files were touched! Remember to keep the Flink docs up to date!
  • This pull request references an unassigned Jira ticket. According to the code contribution guide, tickets need to be assigned before starting with the implementation work.

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

Copy link
Contributor

@zentol zentol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need a whitelist of sorts for jars that do not require a LICENSE file, i.e. all those that we don't deploy (like examples). Or we just bundle the license in all jars 🤷

if (ze.getName().equals("META-INF/NOTICE")) {
metaInfNoticeSeen = true;
}
if (ze.getName().equals("META-INF/LICENSE")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also reject META-INF/LICENSE.txt

try (ZipInputStream zis = new ZipInputStream(new FileInputStream(file.toFile()))) {
ZipEntry ze;
while ((ze = zis.getNextEntry()) != null) {
if (ze.getName().equals("LICENSE")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe instead check that no root file contains the word "license"?

ZipEntry ze;
while ((ze = zis.getNextEntry()) != null) {
if (ze.getName().equals("LICENSE")) {
LOG.error("Jar file {} contains a LICENSE file in the root folder", file);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's explicitly give instructions on what to do here; validate whether the license is required, manually adding it to our licensing setup, and adding an exclusion to the shade plugin.

@flinkbot
Copy link
Collaborator

flinkbot commented Dec 3, 2020

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@rmetzger rmetzger force-pushed the FLINK-20455-license-checker branch from 1a10256 to 0aa70d7 Compare December 3, 2020 14:54
@rmetzger
Copy link
Contributor Author

rmetzger commented Dec 3, 2020

I pushed a new version, with more checks, which addresses all issues found by the extended tool.

Copy link
Contributor

@zentol zentol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a test that ensures all the wrong things are actually picked up. Wouldn't want us to accidentally break these safeguards unknowingly.


package org.apache.flink.tools.ci.licensecheck;

import org.apache.logging.log4j.core.util.IOUtils;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please don't add hard dependencies on log4j

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, damn, that was not intentional ;)

Comment on lines +56 to +122
boolean metaInfNoticeSeen = false;
boolean metaInfLicenseSeen = false;
int classFiles = 0;

List<String> errors = new ArrayList<>();
try (ZipInputStream zis = new ZipInputStream(new FileInputStream(file.toFile()))) {
ZipEntry ze;
while ((ze = zis.getNextEntry()) != null) {
final String name = ze.getName();
/**
* There must be a LICENSE file and NOTICE file in the META-INF directory.
* LICENSE or NOTICE files found outside of the META-INF directory are most likely shading mistakes (we are including the files from other dependencies, thus providing an invalid LICENSE file)
*
* In such a case, we recommend updating the shading exclusions, and adding the license file to META-INF/licenses.
*/
if (!name.contains("META-INF") // license files in META-INF are expected
&& !name.contains(".class") // some class files contain LICENSE in their name
&& !name.contains(".ftl") // a false positive in flink-python
&& !name.contains("web/3rdpartylicenses.txt") // a false positive in flink-runtime-web
&& !name.endsWith("/") // ignore directories, e.g. "license/"
&& name.toUpperCase().contains("LICENSE")) {
errors.add("Jar file " + file + " contains a LICENSE file in an unexpected location: " + name);
}
if (!name.contains("META-INF") && !name.contains(".class") && name.toUpperCase().contains("NOTICE")) {
errors.add("Jar file " + file + " contains a NOTICE file in an unexpected location: " + name);
}

if (name.equals("META-INF/NOTICE")) {
String noticeContents = IOUtils.toString(new InputStreamReader(zis));
if (!noticeContents.toLowerCase().contains("flink") || !noticeContents.contains("The Apache Software Foundation")) {
errors.add("The notice file in " + file + ":META-INF/NOTICE does not contain the expected entries");
}
metaInfNoticeSeen = true;
}
if (name.equals("META-INF/LICENSE")) {
String licenseContents = IOUtils.toString(new InputStreamReader(zis));
if (!licenseContents.contains("Apache License") || !licenseContents.contains("Version 2.0, January 2004")) {
errors.add("The notice file in " + file + ":META-INF/LICENSE does not contain the expected entries");
}
metaInfLicenseSeen = true;
}
if (name.endsWith(".class")) {
classFiles++;
}
zis.closeEntry();
}
}

// Empty test jars (with no class files) are skipped
if (classFiles == 0 && file.toString().endsWith("-tests.jar")) {
return 0;
}
for (String error: errors) {
LOG.warn(error);
}
int severeIssues = errors.size();

if (!metaInfLicenseSeen) {
LOG.warn("Missing META-INF/LICENSE in {}", file);
severeIssues++;
}

if (!metaInfNoticeSeen) {
LOG.warn("Missing META-INF/NOTICE in {}", file);
severeIssues++;
}
return severeIssues;
Copy link
Contributor

@zentol zentol Dec 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
boolean metaInfNoticeSeen = false;
boolean metaInfLicenseSeen = false;
int classFiles = 0;
List<String> errors = new ArrayList<>();
try (ZipInputStream zis = new ZipInputStream(new FileInputStream(file.toFile()))) {
ZipEntry ze;
while ((ze = zis.getNextEntry()) != null) {
final String name = ze.getName();
/**
* There must be a LICENSE file and NOTICE file in the META-INF directory.
* LICENSE or NOTICE files found outside of the META-INF directory are most likely shading mistakes (we are including the files from other dependencies, thus providing an invalid LICENSE file)
*
* In such a case, we recommend updating the shading exclusions, and adding the license file to META-INF/licenses.
*/
if (!name.contains("META-INF") // license files in META-INF are expected
&& !name.contains(".class") // some class files contain LICENSE in their name
&& !name.contains(".ftl") // a false positive in flink-python
&& !name.contains("web/3rdpartylicenses.txt") // a false positive in flink-runtime-web
&& !name.endsWith("/") // ignore directories, e.g. "license/"
&& name.toUpperCase().contains("LICENSE")) {
errors.add("Jar file " + file + " contains a LICENSE file in an unexpected location: " + name);
}
if (!name.contains("META-INF") && !name.contains(".class") && name.toUpperCase().contains("NOTICE")) {
errors.add("Jar file " + file + " contains a NOTICE file in an unexpected location: " + name);
}
if (name.equals("META-INF/NOTICE")) {
String noticeContents = IOUtils.toString(new InputStreamReader(zis));
if (!noticeContents.toLowerCase().contains("flink") || !noticeContents.contains("The Apache Software Foundation")) {
errors.add("The notice file in " + file + ":META-INF/NOTICE does not contain the expected entries");
}
metaInfNoticeSeen = true;
}
if (name.equals("META-INF/LICENSE")) {
String licenseContents = IOUtils.toString(new InputStreamReader(zis));
if (!licenseContents.contains("Apache License") || !licenseContents.contains("Version 2.0, January 2004")) {
errors.add("The notice file in " + file + ":META-INF/LICENSE does not contain the expected entries");
}
metaInfLicenseSeen = true;
}
if (name.endsWith(".class")) {
classFiles++;
}
zis.closeEntry();
}
}
// Empty test jars (with no class files) are skipped
if (classFiles == 0 && file.toString().endsWith("-tests.jar")) {
return 0;
}
for (String error: errors) {
LOG.warn(error);
}
int severeIssues = errors.size();
if (!metaInfLicenseSeen) {
LOG.warn("Missing META-INF/LICENSE in {}", file);
severeIssues++;
}
if (!metaInfNoticeSeen) {
LOG.warn("Missing META-INF/NOTICE in {}", file);
severeIssues++;
}
return severeIssues;
int numSevereIssues = 0;
try (final FileSystem fileSystem = FileSystems.newFileSystem(new URI("jar:file", uri.getHost(), uri.getPath(), uri.getFragment()), Collections.emptyMap())) {
// test jars (with no class files) are ignored
if (file.getFileName().toString().endsWith("-tests.jar")) {
for (Path root : fileSystem.getRootDirectories()) {
try (Stream<Path> files = Files.walk(root)) {
long numClassFiles = files
.filter(path -> path.getFileName().toString().endsWith(".class"))
.count();
if (numClassFiles == 0) {
return 0;
}
}
}
}
// check notice file existence and contents
final Path noticeFile = fileSystem.getPath("META-INF", "NOTICE");
if (Files.exists(noticeFile)) {
final String noticeFileContents = Files.readString(noticeFile);
if (!noticeFileContents.toLowerCase().contains("flink") || !noticeFileContents.contains("The Apache Software Foundation")) {
LOG.error("The notify file in {} does not contain the expected entries.", file);
numSevereIssues++;
}
} else {
LOG.error("Missing META-INF/NOTICE in {}", file);
numSevereIssues++;
}
// check license file existence and contents
final Path licenseFile = fileSystem.getPath("META-INF", "LICENSE");
if (Files.exists(licenseFile)) {
final String licenseFileContents = Files.readString(licenseFile);
if (!licenseFileContents.contains("Apache License") || !licenseFileContents.contains("Version 2.0, January 2004")) {
LOG.error("The license file in {} does not contain the expected entries.", file);
numSevereIssues++;
}
} else {
LOG.error("Missing META-INF/NOTICE in {}", file);
numSevereIssues++;
}
for (Path root : fileSystem.getRootDirectories()) {
try (Stream<Path> files = Files.walk(root)) {
/*
* LICENSE or NOTICE files found outside of the META-INF directory are most likely shading mistakes (we are including the files from other dependencies, thus providing an invalid LICENSE file)
*
* <p>In such a case, we recommend updating the shading exclusions, and adding the license file to META-INF/licenses.
*/
final List<String> filesWithIssues = files
.filter(path -> !path.equals(root))
.filter(path -> path.getFileName().toString().toLowerCase().contains("license"))
.filter(path -> !Files.isDirectory(path)) // ignore directories, e.g. "license/"
.filter(path -> !path.getFileName().toString().endsWith(".class")) // some class files contain LICENSE in their name
.filter(path -> !path.getFileName().toString().endsWith(".ftl")) // a false positive in flink-python
.map(Path::toString)
.filter(path -> !path.contains("META-INF")) // license files in META-INF are expected
.filter(path -> !path.endsWith("web/3rdpartylicenses.txt")) // a false positive in flink-runtime-web
.collect(Collectors.toList());
numSevereIssues += filesWithIssues.size();
for (String fileWithIssue : filesWithIssues) {
LOG.error("Jar file {} contains a LICENSE file in an unexpected location: {}", file, fileWithIssue);
}
}
}
}
return numSevereIssues;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I seem to like these overly complex loops, which you dislike ;) (understandably)
Your proposal seems to require Java 11, and it fails with a NPE. But since you are taking over, I'll leave it to you to adjust it.

@zentol
Copy link
Contributor

zentol commented Dec 3, 2020

How about we merge the licensing issues fixes right away (because they do look correct to me), and I'll take care of writing the test tomorrow?

@zentol
Copy link
Contributor

zentol commented Dec 7, 2020

Manually merged.

@zentol zentol closed this Dec 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants