exadel-inc/etoolbox-anydiff

EToolbox AnyDiff

EToolbox AnyDiff is a Java library and a command-line utility for visually comparing the content of files and managing the differences. It is mostly aimed at comparing XML and HTML files but can be used with any textual content.

Motivation

Compare web pages as rendered by two different versions of server code or hosted in different environments. Compare Adobe Experience Manager (TM) content packages assembled in different builds (from different code branches, etc.). Compare XML output such as Adobe Granite (TM) markup for AEM dialogs, and more.

This tool was originally created to accompany Exadel Authoring Kit for AEM and perform regression testing. However, it can be used to visualize differences between any two sets of files inside and outside the AEM ecosystem.

Features

The tool comes as a Java library available via Maven and as a command-line application. Both offer the same core set of features.

The features below are demonstrated with the CLI utility.

Compare two files, directories

java -jar anydiff.jar --left file1.html --right file2.html

This prints the disparities between the two files to the console (and also writes them to a log file) as follows:

[Screenshot: console output]

You can specify more than one file for both the --left and --right arguments, space-separated. You can also specify directories or listing files (those with the .lst or .list extension).

Use the [...] syntax to change the column captions for better clarity:

java -jar anydiff.jar --left "[Original]/var/log/myapp/" --right "[After update]/var/log/myapp"

Compare two AEM packages

java -jar anydiff.jar --left ./target/ui.content-1.120.1.zip --right ./target/ui.content-1.120.2.zip

Compare two URLs

java -jar anydiff.jar --left http://localhost:4502/content/we-retail/us/en.html?foo=bar --right https://some.aem.instance:4502/content/we-retail/us/en.html?foo=bar&@User-Agent=PostmanRuntime/7.33.0&@nosslcheck

Mind the @-prefixed query parameters. This is the way to set custom request headers for the HTTP client. These parameters are processed client-side and are not passed to the remote endpoint.

Also mind the @nosslcheck parameter. This is not a custom header but a reserved flag that tells the client to trust all SSL certificates. (This can be useful when working in trusted environments that have issues with SSL certificates. However, be cautious about using this option when requesting an arbitrary Internet host.)
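To make the convention concrete, here is a minimal sketch of how such a URL could be split into the part sent to the server and the client-side options. The splitQuery helper is hypothetical and is not part of AnyDiff; it only illustrates the rule described above, assuming that an @-prefixed parameter with a value is a header and one without a value is a flag.

```javascript
// Illustration only: a hypothetical helper, not part of AnyDiff.
// Separates "@"-prefixed query parameters (client-side options) from the
// parameters that would actually be sent to the remote endpoint.
function splitQuery(url) {
    const queryStart = url.indexOf("?");
    if (queryStart < 0) {
        return { url: url, headers: {}, flags: [] };
    }
    const base = url.substring(0, queryStart);
    const remote = [];   // parameters kept in the outgoing request
    const headers = {};  // "@Name=value" pairs, treated as request headers
    const flags = [];    // "@name" without a value, e.g. "@nosslcheck"
    for (const param of url.substring(queryStart + 1).split("&")) {
        if (!param.startsWith("@")) {
            remote.push(param);
            continue;
        }
        const eq = param.indexOf("=");
        if (eq < 0) {
            flags.push(param.substring(1));
        } else {
            headers[param.substring(1, eq)] = param.substring(eq + 1);
        }
    }
    const query = remote.length > 0 ? "?" + remote.join("&") : "";
    return { url: base + query, headers: headers, flags: flags };
}

const parts = splitQuery(
    "https://some.aem.instance:4502/content/we-retail/us/en.html"
    + "?foo=bar&@User-Agent=PostmanRuntime/7.33.0&@nosslcheck");
console.log(parts.url);
console.log(parts.headers);
console.log(parts.flags);
```

In this sketch only foo=bar remains in the requested URL, while User-Agent ends up among the headers and nosslcheck among the flags.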

Log differences to a file

By default, the same output as seen on the screen is logged to a file under $HOME/.etoolbox-anydiff/logs (in the text file, ~...~ marks a removal and +...+ an insertion).

Pass the --html argument (or -h) to the command line to additionally store an HTML log under $HOME/.etoolbox-anydiff/html. Use --browse (-b) to open the HTML file in the default browser.

[Screenshot: HTML output]

Modifying comparison output

Use --width XX (or -w XX) to modify the width of the column in the console and log file. Default is 60.

Use --arrange (true|false) (or -a (true|false)) to control comparison of markup files. When set to true, attributes of XML and HTML nodes are arranged alphabetically before comparing. Therefore, no disparity is reported when the same attributes appear in a different order. Set it to false if the original order actually matters. Default is true.
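The effect of arranging attributes can be illustrated with a small conceptual sketch. This is not AnyDiff's implementation: the arrangeAttributes function is hypothetical, handles a single opening tag only, and assumes double-quoted attribute values.

```javascript
// Conceptual sketch, not AnyDiff's implementation: sorts the attributes of a
// single opening tag so that attribute order no longer affects comparison.
// Assumes double-quoted attribute values.
function arrangeAttributes(tag) {
    const match = /^<([\w:-]+)((?:\s+[\w:-]+="[^"]*")*)\s*(\/?)>$/.exec(tag);
    if (!match) {
        return tag; // not a simple tag: leave it untouched
    }
    // Collect name="value" pairs and sort them alphabetically
    const attributes = (match[2].match(/[\w:-]+="[^"]*"/g) || []).sort();
    const body = [match[1]].concat(attributes).join(" ");
    return "<" + body + match[3] + ">";
}

const left = arrangeAttributes('<a href="/home" class="link" id="nav">');
const right = arrangeAttributes('<a id="nav" class="link" href="/home">');
console.log(left);
console.log(left === right);
```

Both tags normalize to the same string, so a line-by-line comparison would no longer flag them as different.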

Use --normalize (true|false) (or -n (true|false)) to control whether the program re-formats markup files (XML, HTML) before comparison for more accurate and granular results. Default is true.

Use --handle-errorpages (true|false) (or -e (true|false)) to control whether the program should handle error pages (HTTP status 4xx, 5xx) as "normal" pages with comparable markup. Default is false, which means that the error is reported instead of comparing content.

Use --ignore-spaces (or -i) to make the comparison disregard the number of spaces between words. Default is false. Please note: this setting partially overlaps with --normalize and --arrange, because preparing aligned markup trees already removes many empty lines and indentations. In markup files, ignoring spaces mostly affects text nodes and literals. In non-markup files it applies more broadly.
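As a rough illustration of what ignoring spaces means, the sketch below treats two lines as equal when they differ only in the amount of whitespace between words. The function names are hypothetical and the code is not taken from AnyDiff.

```javascript
// Conceptual sketch, not AnyDiff's implementation: runs of whitespace are
// collapsed to a single space before two lines are compared.
function collapseSpaces(line) {
    return line.replace(/\s+/g, " ");
}

function equalIgnoringSpaces(left, right) {
    return collapseSpaces(left) === collapseSpaces(right);
}

// Differs only in the number of spaces: considered equal
console.log(equalIgnoringSpaces("Lorem   ipsum dolor", "Lorem ipsum  dolor"));
// A missing space joins two words: still a difference
console.log(equalIgnoringSpaces("Lorem ipsum", "Loremipsum"));
```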

Java API

The same features are available via the Java API. The usual entry point is the AnyDiff class, which may be used as follows:

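import java.util.List;
// AnyDiff and Diff come from the etoolbox-anydiff-core dependency
// (their import statements are omitted here)

class Main {
    public static void main(String[] args) {
        // Compare the two files and collect the differences
        List<Diff> differences = new AnyDiff()
            .left("path/to/file.html")
            .right("/path/to/another/file.html")
            .compare();
        if (AnyDiff.isMatch(differences)) {
            // ...
        }
    }
}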

To use the Java API, add the following dependency to your Maven project:

<dependency>
    <groupId>com.exadel.etoolbox</groupId>
    <artifactId>etoolbox-anydiff-core</artifactId>
    <version>1.0.0</version> <!-- always prefer the latest stable version -->
</dependency>

Features that are available only via the Java API

The following features are available only via the Java API:

  • preprocessor - the ability to specify a routine that is applied to the content before comparison. This is useful when you need to remove or replace parts of the content that are not essential, or apply specific formatting (e.g., split into shorter lines);
  • postprocessor - the ability to specify a routine that is applied to the differences after comparison. This is useful when you need to revert the changes introduced by a preprocessor or otherwise reformat the already compared content.

Please see the AnyDiff documentation for more details.

Diff filters

One of the powerful features is the ability to eliminate, or "mute", differences that are not essential or are fully expected. For example, when comparing live web pages you will certainly encounter various timestamps, UUIDs, analytics attributes, etc., which do not actually make the pages different.

These and other differences can be skipped via filters which are applied to the differences before they are reported.

There are two ways to define filters: with Java (for use with Java API) and with JavaScript (for use with the command-line interface).

From the Java API perspective, filters are implementations of the Filter interface. You can override one or more of its methods.

From the CLI perspective, filters are .js files stored in a directory that you specify with the --filters "/path/to/filters" argument. Every .js file contains one or more user-defined functions (see below).

A filter performs one of two actions:

  • skip: the difference is not reported at all;
  • accept: the difference is "acknowledged". It is reported in the output (for reference) but is not counted as a real difference, that is, it does not affect the result of the isMatch() call.
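The distinction between the two actions can be modeled with a short sketch. It is a conceptual model only: the skipped and accepted fields and both functions are hypothetical and do not reflect AnyDiff's internal data structures.

```javascript
// Conceptual model only; the field names "skipped" and "accepted" are
// hypothetical and do not reflect AnyDiff's internal data structures.
function report(differences) {
    // Skipped differences never reach the output
    return differences.filter(function (difference) {
        return !difference.skipped;
    });
}

function isMatch(differences) {
    // Accepted differences are reported but do not count as mismatches
    return report(differences).every(function (difference) {
        return difference.accepted;
    });
}

const differences = [
    { text: "timestamp", skipped: true, accepted: false },
    { text: "tracking id", skipped: false, accepted: true }
];
console.log(report(differences).length);
console.log(isMatch(differences));
```

In this model the skipped difference disappears from the report, the accepted one stays visible, and the overall result is still a match.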

A filter can be applied to any of the following entities:

  • diff: this is the "root" object which usually represents a pair of whole files or web pages. A diff has getLeft() and getRight() methods that return the paths to the files or the URLs. With diff one can exclude a file or page from analysis by its name;
  • block: this is a sequence of lines that encompasses a difference (roughly similar to what we see in a GitHub diff). It contains lines with actual differences and lines that are just context. A block has getLeft() and getRight() methods that return the left and right text respectively;
  • line: this is a single line of text inside a block;
  • fragment pair: represents the particular words or symbols within a line that differ, for an even more granular approach. To expose a fragment pair, a line must have the same number of differences in the left and right parts (e.g., a single difference). Also, the first difference must be at the same offset in both parts;
  • fragment: a single sequence of characters within a line that differs from the opposite part. It may be either a part of a fragment pair or a standalone difference.

The Java API provides a separate method for every combination of action and entity, such as skipBlock or acceptFragment.

The JS API expects you to define your own functions whose name matches an action and whose argument name matches an entity. For example:

function skip(block) {
    return block.getLeft().startsWith("<!--");
}

function accept(fragments) { /* "fragments" == "fragment pair" in meaning */
    return fragments.getLeft().startsWith("lorem") && fragments.getRight().includes("ipsum");
}

There can be more than one function in a file. All of them will be applied to the differences.
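As a further illustration, a filter file combining two functions might look like the sketch below. The file name, the ".map" extension and the UUID pattern are made up for this example. The sketch also assumes that getLeft() and getRight() return string-like values, as the examples above suggest.

```javascript
// filters/example.js (hypothetical file name)

// Skip a whole file or page by name: here, anything whose left path
// ends with ".map" (an arbitrary extension chosen for illustration)
function skip(diff) {
    return String(diff.getLeft()).endsWith(".map");
}

// Pattern for a canonical 8-4-4-4-12 hexadecimal UUID
var UUID = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Acknowledge a fragment pair when both sides look like UUIDs
function accept(fragments) {
    return UUID.test(fragments.getLeft()) && UUID.test(fragments.getRight());
}
```

Outside of the tool, such functions can be tried out by passing plain objects that expose getLeft() and getRight().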

See examples of filters in the test resources folder.

Troubleshooting

My Windows console does not display colored output

This can happen in an older Windows version. See the solution here.

Licensing and credits

This software is licensed under the Apache License, Version 2.0.

The project makes use of the following open-source libraries:
