A Spring Boot library and standalone application for comparing pairs of XML files and reporting differences. Built on XMLUnit 2.
src/main/java/com/cloudwick/labs/xmldiff/
├── DiffTask.java # Represents one comparison job (control vs. test file)
├── DiffBuilderCustomizer.java # Extension point: customize XMLUnit comparison behavior
├── DiffChecker.java # Core engine: runs the comparison, returns differences
├── DiffCheckException.java # Unchecked exception for XML/IO failures
├── ReportWriter.java # Extension point: write results anywhere
├── CsvReportWriter.java # Default implementation: writes a CSV per task
├── XmlDiffAutoConfiguration.java # Spring Boot auto-configuration for library use
└── app/
    ├── XmlDiffApplication.java # Standalone Spring Boot entry point
    ├── BaseFilePoller.java # Abstract orchestrator (template method)
    └── FileSystemFilePoller.java # Default poller: scans a directory on a schedule
Build and run the executable JAR:
mvn clean install
java -jar target/xml-diff-tool-1.0.0-exec.jar \
  --xml.diff.watch-dir=/path/to/input \
  --xml.diff.output-dir=/path/to/reports

Place paired XML files in the watch directory. The poller picks up any *.control.xml file and looks for a matching *.test.xml with the same base name:
/path/to/input/
  job-001.control.xml ← reference / expected
  job-001.test.xml ← actual / generated
  job-002.control.xml
  job-002.test.xml
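The pairing rule above can be sketched with plain string handling. This is a minimal illustration, not the tool's actual code: the class `PairFinder` and its methods are hypothetical names, and skipping a control file that has no matching test file is an assumption, since the behavior for unpaired files is not described here.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper illustrating the naming convention; not part of the library.
public class PairFinder {
    static final String CONTROL_SUFFIX = ".control.xml";
    static final String TEST_SUFFIX = ".test.xml";

    /** Strips the ".control.xml" suffix, e.g. "job-001.control.xml" -> "job-001". */
    public static String baseName(String controlFileName) {
        return controlFileName.substring(0, controlFileName.length() - CONTROL_SUFFIX.length());
    }

    /** Maps each control file name to its test file name, for pairs where both exist. */
    public static Map<String, String> pairControlsWithTests(Collection<String> fileNames) {
        Set<String> names = new HashSet<>(fileNames);
        Map<String, String> pairs = new TreeMap<>();
        for (String name : names) {
            if (!name.endsWith(CONTROL_SUFFIX)) {
                continue; // not a control file
            }
            String testName = baseName(name) + TEST_SUFFIX;
            if (names.contains(testName)) {
                pairs.put(name, testName);
            }
            // Assumption: a control file without a test file is simply skipped.
        }
        return pairs;
    }

    public static void main(String[] args) {
        System.out.println(pairControlsWithTests(Arrays.asList(
            "job-001.control.xml", "job-001.test.xml", "job-002.control.xml")));
    }
}
```

In this sketch, job-002 is left out of the result because its test file is missing from the input list.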
After each poll, reports are written and the source files are moved out of the watch directory:
/path/to/reports/
  job-001-report.csv ← diff report
  archive/
    job-001.control.xml ← moved here on successful comparison
    job-001.test.xml
  error/
    job-002.control.xml ← moved here if comparison fails (e.g. malformed XML)
    job-002.test.xml
Files are removed from the watch directory once processed and will not be re-compared on the next poll.
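The archive/error routing described above could look roughly like the following. It is a sketch under stated assumptions: `ProcessedFileMover` and its methods are hypothetical names, and overwriting an existing file of the same name (`REPLACE_EXISTING`) is an assumption, because the tool's collision behavior is not documented in this README.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper illustrating the archive/error layout; not part of the library.
public class ProcessedFileMover {

    /** Destination under outputDir: archive/ after a successful comparison, error/ otherwise. */
    public static Path destinationFor(Path outputDir, Path source, boolean succeeded) {
        String subdir = succeeded ? "archive" : "error";
        return outputDir.resolve(subdir).resolve(source.getFileName());
    }

    /** Moves one processed file out of the watch directory, creating the target directory if needed. */
    public static Path moveProcessed(Path outputDir, Path source, boolean succeeded) throws IOException {
        Path target = destinationFor(outputDir, source, succeeded);
        Files.createDirectories(target.getParent());
        // Assumption: an older file with the same name is overwritten.
        return Files.move(source, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Moving rather than copying is what keeps a pair from being compared again on the next poll.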
task_id,result,xpath_control,xpath_test,comparison_type
job-001,DIFFERENT,/record[1]/value[1],/record[1]/value[1],TEXT_VALUE
The result column is SIMILAR for insignificant differences (e.g. whitespace) or DIFFERENT for meaningful ones. A report with no rows after the header means the files are identical.
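A downstream consumer could summarize a report with a few lines of Java. The sketch below assumes the column order shown in the sample above and that no field contains a comma or quoting; `ReportSummary` and its methods are hypothetical names, not part of the library.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reader for the CSV report format; assumes unquoted, comma-free fields.
public class ReportSummary {

    /** Counts rows per value of the "result" column (second field), skipping the header line. */
    public static Map<String, Integer> countByResult(List<String> csvLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 1; i < csvLines.size(); i++) { // index 0 is the header
            String line = csvLines.get(i);
            if (line.trim().isEmpty()) {
                continue; // tolerate trailing blank lines
            }
            String[] fields = line.split(",", -1);
            if (fields.length < 2) {
                continue; // malformed row, ignored in this sketch
            }
            counts.merge(fields[1], 1, Integer::sum);
        }
        return counts;
    }

    /** A report with no data rows means the compared files were identical. */
    public static boolean isIdentical(List<String> csvLines) {
        return countByResult(csvLines).isEmpty();
    }
}
```

The lines can be loaded with `Files.readAllLines(Paths.get("job-001-report.csv"))` before calling `countByResult`.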
| Property | Default | Description |
|---|---|---|
| xml.diff.watch-dir | ./input | Directory scanned for XML file pairs |
| xml.diff.output-dir | ./output | Root output directory; archive/ and error/ subdirectories are created inside it |
| xml.diff.poll-interval-ms | 60000 | Polling interval in milliseconds |
All properties can also be passed as command-line arguments (--xml.diff.watch-dir=...).
Add the plain JAR (without the -exec classifier) to your pom.xml:
<dependency>
    <groupId>com.cloudwick.labs</groupId>
    <artifactId>xml-diff-tool</artifactId>
    <version>1.0.0</version>
</dependency>

Spring Boot auto-configuration wires DiffChecker and CsvReportWriter automatically. Inject DiffChecker wherever you need it:
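import java.nio.file.Path;
import java.util.List;
import org.springframework.stereotype.Service;
import org.xmlunit.diff.Difference;
import com.cloudwick.labs.xmldiff.DiffChecker;
import com.cloudwick.labs.xmldiff.DiffTask;

@Service
public class MyService {
    private final DiffChecker diffChecker;

    public MyService(DiffChecker diffChecker) {
        this.diffChecker = diffChecker;
    }

    public void compare(Path control, Path test) {
        DiffTask task = new DiffTask("my-job", control, test);
        List<Difference> differences = diffChecker.findDifferences(task);
        // handle differences
    }
}

Declare a @Bean of type ReportWriter in your application. Because the default CsvReportWriter is registered with Spring's @ConditionalOnMissingBean, it is skipped when your bean is present.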
Example: write to a database
@Configuration
public class MyConfig {
    @Bean
    public ReportWriter reportWriter(DiffRecordRepository repo) {
        return (task, differences) -> differences.forEach(d ->
            repo.save(new DiffRecord(
                task.getId(),
                d.getResult().name(),
                d.getComparison().getType().name()
            ))
        );
    }
}

Example: write to a log
@Bean
public ReportWriter reportWriter() {
    Logger log = LoggerFactory.getLogger("diff-results");
    return (task, differences) ->
        differences.forEach(d -> log.warn("[{}] {}", task.getId(), d));
}

Declare a @Bean of type DiffBuilderCustomizer to replace the default XMLUnit configuration (whitespace normalization + attribute-aware node matching).
Example: ignore XML comments and match elements by name only
@Bean
public DiffBuilderCustomizer diffBuilderCustomizer() {
    return builder -> builder
        .ignoreWhitespace()
        .ignoreComments()
        .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byName));
}

Example: ignore differences that XMLUnit rates as SIMILAR and report only DIFFERENT ones
@Bean
public DiffBuilderCustomizer diffBuilderCustomizer() {
    return builder -> builder
        .ignoreWhitespace()
        .checkForSimilar();
}

Subclass BaseFilePoller to supply tasks from a source other than the filesystem (e.g. a database queue, an S3 bucket listing, a message queue).
@Component
public class DatabaseFilePoller extends BaseFilePoller {
    private final JobRepository jobRepository;

    public DatabaseFilePoller(DiffChecker diffChecker, ReportWriter reportWriter,
                              JobRepository jobRepository) {
        super(diffChecker, reportWriter);
        this.jobRepository = jobRepository;
    }

    @Scheduled(fixedDelayString = "${xml.diff.poll-interval-ms:60000}")
    public void scheduledPoll() {
        poll();
    }

    @Override
    protected List<DiffTask> buildTasks() {
        return jobRepository.findPendingJobs().stream()
            .map(job -> new DiffTask(
                job.getId(),
                Path.of(job.getControlFilePath()),
                Path.of(job.getTestFilePath())))
            .toList();
    }
}

DiffTask exposes a metadata map for attaching arbitrary context that your custom ReportWriter can use:
DiffTask task = new DiffTask("job-001", controlPath, testPath);
task.getMetadata().put("sourceSystem", "pipeline-v2");
task.getMetadata().put("submittedBy", "user@example.com");

# Build both JARs and run tests
mvn clean install
# Skip tests
mvn clean install -DskipTests
# Run tests only
mvn test

Output artifacts:

| File | Purpose |
|---|---|
| target/xml-diff-tool-1.0.0.jar | Plain library JAR; add as a Maven dependency |
| target/xml-diff-tool-1.0.0-exec.jar | Executable fat JAR; run standalone |