Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RAT-355 and RAT-366 fixes in one package #233

Merged
merged 69 commits into from
Apr 23, 2024

Conversation

Claudenw
Copy link
Contributor

@Claudenw Claudenw commented Apr 1, 2024

Contains all the changes for RAT-355 and RAT-366 as one package.

The two issues were intertwined to the point where multiple files would be updated by both changes and in conflicting ways. This branch contains the changes from both.

The main changes are:

  • MetaData now tracks all the licenses that are applicable to the document and allows us to print out the discovered licenses and whether or not they are approved.
  • Matches checked in a single call, not line by line. This should noticeably speed up processing for some cases. This also removed all the "state" code that was inserted when the "not" matcher was added.
  • The Configuration file can be dumped as part of the XML report. There are no styles that report on this information yet, but it is available in the XML output.
  • To assist in parsing and building matcher a ConfigComponent annotation was added, as was a Component interface that is now the root for IHeaderMatcher and ILicense, a Description class that describe the Components, and a DescriptionBuilder that builds the Descriptions. This results in much simpler IHeaderMatcher and IHeaderMatcher.Builder code.

I am currently working on cleaning up the javadoc but I do not expect any further executable code changes.


* Add a new {{{./matcher_def.html}Matcher definition}}. Requires Java expertice.

* Add a new definition format. Requires java expertice as well as expertice with the format.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo 2x: expertise;
I'd prefer to uppercase Java.

How to define licenses in Apache Rat

All licenses in Apache Rat are defined in configuration files. There is a default
{{{https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/resources/org/apache/rat/default.xml}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


All licenses in Apache Rat are defined in configuration files. There is a default
{{{https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/resources/org/apache/rat/default.xml}
XML based configuration file}} that can server as a good example of various definitions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: that can serve

{{{https://github.com/apache/creadur-rat/blob/master/apache-rat-core/src/main/resources/org/apache/rat/default.xml}
XML based configuration file}} that can server as a good example of various definitions.

It is possible to create new parsers for different configuration formats. But that task is beyone the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: is beyond the


* Families

Families are groups that define licenses that share similarities. Each family has an id and a name, and example
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two sentences ? Each family has ..... An example .....

included document. It has to exist by the time all the configuration files are read. The "id" element
is optional, if it is not provided the id will be set to the id of the associated family. However, the
"id" property must be unique across all licenses. The "name" property is also optional. If not specified
the name of the associated family will be used. It is recommened that each license have a unique name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2x typo: It is recommended that each license has a .....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"license have" is the future perfect tense (I think that is what it is called. "license has" would be past perfect.

* Listing components

All the components (Licenses and Matchers) defined in the system can be displayed using the Documentation
tool. To run the too execute:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: tool

java -jar apache-rat-${project.version}.jar org.apache.rat.Documentation
+------------------------------------------+

The tool take the same arguments as the standard rat CLI, so additional license files can be defined.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: Rat CLI

tool. To run the too execute:

+------------------------------------------+
java -jar apache-rat-${project.version}.jar org.apache.rat.Documentation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind using the same kind of linking as in
https://creadur.apache.org/rat/apache-rat/index.html
"apache-rat/target/" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should discuss this. The index.html shows how to call the code if you have built it directly. What we don't do is show how to run the tools if you have downloaded them.

I want to expand tools to include simple scripts to run the basic command line tools given a download of a compressed file, or something similar to that. Make it easy to run for people who are just downloading and running.


How to define matchers in Apache Rat

Matchers in Apache Rat are paired with Builders. A Matcher must implement the "IHeaderMatcher" interface and its associated Builder must implement the IHeaderMatcher.Builder interface.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to link to the classes within Gitbox? Maybe this helps in case of refactorings as it would yield a 404 .....

** The Matcher implementation

For our example we will implement a Matcher that implements the phrase "Quality, speed and cost, pick any two” by looking for the occurrence of all three words anywhere in the header.
In most cases is it simplest to extend the AbstractHeaderMatcher class as this class will handle setting of the unique ID for instances that do not otherwise have a unique id.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For matters of consistency: is there a difference between ID and id or Id?
I spotted the same in Javadoc of the changes and asked myself if there is a semantic difference between the two or if we should just decide to use one notation in all places.
Personally I'm fine with "Id" in the code and "id" in written text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No semantic difference. I have tried to clean the documentation to make it all "id"

In most cases is it simplest to extend the AbstractHeaderMatcher class as this class will handle setting of the unique ID for instances that do not otherwise have a unique id.

+------------------------------------------+
public interface IHeaderMatcher extends Component {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there is a way to reference parts of sources, but in case it remains we should stick a big warning in IHeaderMatcher to keep the site up2date.

+------------------------------------------+

+------------------------------------------+
public abstract class AbstractHeaderMatcher implements IHeaderMatcher {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is sufficient to add plain references to both classes as you provide a full example afterwards ?!

import org.apache.rat.config.parameters.ConfigComponent;

@ConfigComponent(type = Component.Type.Matcher, name = "QSC", desc = "Reports if the 'Quality, speed and cost, pick any two' rule is violated")
public class QSCMatcher extends AbstraactHeaderMatcher {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: extends AbstractHeaderMatcher

@Override
public boolean matches(IHeaders headers) {
String text = headers.prune()
return text.contains("quality") && text.contains("speed") && text.contains("cost");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I thought of being able to use one of the Matchers provided within this PR - shouldn't I be able to "concatenate" matchers, such as ..... (pseudocode only):

AndMatcher.match(TextMatcher.contains("quality"), TextMatcher.contains("speed"), TextMatcher.contains("cost"))

But maybe it is just too late now ;) and I misunderstood the example or above matchers in the code.

In the above example we use the ConfigComponent annotation to identify that this is a Matcher component, that it has the name 'QSC' and a description of what it does.
If the "name" was not specified it would have been extracted from the class name by removing the "Matcher" from "QSCMatcher".

The Constructor calls the AbstractHeaderMatcher with a null argument so tha the parent class will assign an id.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: so that the

The builder must implement the IHeaderMatcher.Builder interface.

+------------------------------------------+
@FunctionalInterface
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above I'd prefer to directly link to RAT sources/interfaces/classes via Gitbox instead of copy&pasting them here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all links now point to ASF gitbox.


** Registering the builder for use in XML configuration

In order to use the matcher in a Rat configuration it has to be registered with the system. This can be done by creating file with the builder specified and passing it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be done by creating a file? What kind of file? XML or configuration file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the type just needs to be added for clarification purposes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we only have an XML configuration parser but the system is open to new implementations. I have updated the documentation to reflect XML configuration.

import org.apache.rat.config.parameters.ConfigComponent;

@ConfigComponent(type = Component.Type.Matcher, name = "TPM", desc = "Checks that the three string are found in the header")
public class TPMatcher extends AbstraactHeaderMatcher {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: AbstractHeaderMatcher

+------------------------------------------+

The ComponentConfigurations with the parameter type indicate that the members specify properties of the component. The Matcher must have a "get" method for each parameter and the builder
must have a corresponding "set" method. The names of the methods and the attributes in the XML paarser can be changed by adding a 'name' attribute to the ConfigComponent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: parser


* Embedded matchers.

It is possible to create matchers that embed other matchers. The examples in the codebase are the "All", "Any" and "Not" matchers and their associated builders.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we link to the three matchers in Rat (Gitbox):

  • All
  • Any
  • Not

I'd assume that anyone wanting to add new ones would like to take a closer look at these base matchers.

@ottlinger
Copy link
Contributor

@Claudenw thanks for these big changes with tests and documentation. It's a lot of code that changed, but the examples and tests seem much more simple and straightforward to me.

@Claudenw Claudenw merged commit 66e1346 into apache:master Apr 23, 2024
16 checks passed
@Claudenw Claudenw deleted the RAT-355_RAT-366_merge branch May 22, 2024 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants