cloudwicklabs/xml-diff-tool

xml-diff-tool

A Spring Boot library and standalone application for comparing pairs of XML files and reporting differences. Built on XMLUnit 2.


Project structure

src/main/java/com/cloudwick/labs/xmldiff/
├── DiffTask.java                  # Represents one comparison job (control vs. test file)
├── DiffBuilderCustomizer.java     # Extension point: customize XMLUnit comparison behavior
├── DiffChecker.java               # Core engine: runs the comparison, returns differences
├── DiffCheckException.java        # Unchecked exception for XML/IO failures
├── ReportWriter.java              # Extension point: write results anywhere
├── CsvReportWriter.java           # Default implementation: writes a CSV per task
├── XmlDiffAutoConfiguration.java  # Spring Boot auto-configuration for library use
└── app/
    ├── XmlDiffApplication.java    # Standalone Spring Boot entry point
    ├── BaseFilePoller.java        # Abstract orchestrator (template method)
    └── FileSystemFilePoller.java  # Default poller: scans a directory on a schedule

Usage

Standalone application

Build and run the executable JAR:

mvn clean install
java -jar target/xml-diff-tool-1.0.0-exec.jar \
  --xml.diff.watch-dir=/path/to/input \
  --xml.diff.output-dir=/path/to/reports

Place paired XML files in the watch directory. The poller picks up any *.control.xml file and looks for a matching *.test.xml with the same base name:

/path/to/input/
  job-001.control.xml   ← reference / expected
  job-001.test.xml      ← actual / generated
  job-002.control.xml
  job-002.test.xml
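
The pairing rule above can be sketched in plain Java. This is only an illustration of the naming convention, not the project's poller code: the class name PairMatcher and its method are hypothetical, and skipping a control file that has no matching test file is an assumption, since the README does not say how unpaired files are handled.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of xml-diff-tool: pairs files by the naming convention above.
final class PairMatcher {
    static final String CONTROL_SUFFIX = ".control.xml";
    static final String TEST_SUFFIX = ".test.xml";

    /** Returns base name -> {control, test} for every control file with a matching test file. */
    static Map<String, Path[]> pair(List<Path> files) {
        // First pass: index test files by base name.
        Map<String, Path> tests = new TreeMap<>();
        for (Path f : files) {
            String name = f.getFileName().toString();
            if (name.endsWith(TEST_SUFFIX)) {
                tests.put(name.substring(0, name.length() - TEST_SUFFIX.length()), f);
            }
        }
        // Second pass: match each control file against the index.
        Map<String, Path[]> pairs = new TreeMap<>();
        for (Path f : files) {
            String name = f.getFileName().toString();
            if (!name.endsWith(CONTROL_SUFFIX)) {
                continue;
            }
            String base = name.substring(0, name.length() - CONTROL_SUFFIX.length());
            Path test = tests.get(base);
            if (test != null) { // assumption: unpaired control files are skipped
                pairs.put(base, new Path[] {f, test});
            }
        }
        return pairs;
    }
}
```

Given the four files listed above, this sketch would yield two entries keyed job-001 and job-002.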

After each poll, reports are written and the source files are moved out of the watch directory:

/path/to/reports/
  job-001-report.csv          ← diff report
  archive/
    job-001.control.xml       ← moved here on successful comparison
    job-001.test.xml
  error/
    job-002.control.xml       ← moved here if comparison fails (e.g. malformed XML)
    job-002.test.xml

Files are removed from the watch directory once processed and will not be re-compared on the next poll.
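
As a rough illustration of the archive/error layout, the move step might look like the sketch below. ProcessedFileMover is a hypothetical name, and overwriting an existing file of the same name is an assumption; the tool's actual collision behavior is not documented here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper illustrating the archive/error layout; not the project's actual code.
final class ProcessedFileMover {
    /** Moves a processed file into outputDir/archive (success) or outputDir/error (failure). */
    static Path moveProcessed(Path file, Path outputDir, boolean success) throws IOException {
        Path targetDir = outputDir.resolve(success ? "archive" : "error");
        Files.createDirectories(targetDir); // no-op if the directory already exists
        Path target = targetDir.resolve(file.getFileName());
        // REPLACE_EXISTING is an assumption made for this sketch.
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because the file leaves the watch directory either way, a later poll cannot pick it up again.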

CSV report format

task_id,result,xpath_control,xpath_test,comparison_type
job-001,DIFFERENT,/record[1]/value[1],/record[1]/value[1],TEXT_VALUE

The result column is SIMILAR for differences that XMLUnit rates as insignificant (e.g. a different namespace prefix) and DIFFERENT for meaningful ones. A report that contains only the header row means the files are identical.
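
A downstream consumer could summarize a report along these lines. This is a minimal sketch: ReportSummary is a hypothetical class, and it assumes the five-column layout shown above with no quoted or embedded commas in any field.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical report reader; assumes the five-column layout above and no quoted commas.
final class ReportSummary {
    enum Result { SIMILAR, DIFFERENT }

    /** Counts report rows per result value, skipping the header line. */
    static Map<Result, Integer> countByResult(List<String> lines) {
        Map<Result, Integer> counts = new EnumMap<>(Result.class);
        for (String line : lines.subList(Math.min(1, lines.size()), lines.size())) {
            if (line.isBlank()) {
                continue;
            }
            String[] cols = line.split(",", -1);
            if (cols.length != 5) {
                throw new IllegalArgumentException("Expected 5 columns: " + line);
            }
            counts.merge(Result.valueOf(cols[1]), 1, Integer::sum);
        }
        return counts;
    }

    /** A report with no data rows means the two files were identical. */
    static boolean isIdentical(List<String> lines) {
        return countByResult(lines).isEmpty();
    }
}
```

Reading the file with Files.readAllLines and passing the result to countByResult would be one way to use it.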

Configuration (application.properties)

| Property | Default | Description |
|---|---|---|
| xml.diff.watch-dir | ./input | Directory scanned for XML file pairs |
| xml.diff.output-dir | ./output | Root output directory; archive/ and error/ subdirectories are created inside it |
| xml.diff.poll-interval-ms | 60000 | Polling interval in milliseconds |

All properties can also be passed as command-line arguments (--xml.diff.watch-dir=...).


As a library

Add the plain JAR (without the -exec classifier) to your pom.xml:

<dependency>
    <groupId>com.cloudwick.labs</groupId>
    <artifactId>xml-diff-tool</artifactId>
    <version>1.0.0</version>
</dependency>

Spring Boot auto-configuration wires DiffChecker and CsvReportWriter automatically. Inject DiffChecker wherever you need it:

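import java.nio.file.Path;
import java.util.List;

import org.springframework.stereotype.Service;
import org.xmlunit.diff.Difference;

import com.cloudwick.labs.xmldiff.DiffChecker;
import com.cloudwick.labs.xmldiff.DiffTask;

@Service
public class MyService {
    private final DiffChecker diffChecker;

    // Constructor injection: Spring supplies the auto-configured DiffChecker.
    public MyService(DiffChecker diffChecker) {
        this.diffChecker = diffChecker;
    }

    public void compare(Path control, Path test) {
        DiffTask task = new DiffTask("my-job", control, test);
        List<Difference> differences = diffChecker.findDifferences(task);
        // handle differences
    }
}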

Customization and extension

Override the report writer

Declare a @Bean of type ReportWriter in your application. The default CsvReportWriter is registered with Spring's @ConditionalOnMissingBean, so the auto-configuration backs off as soon as your own bean is present.

Example: write to a database

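import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import com.cloudwick.labs.xmldiff.ReportWriter;

@Configuration
public class MyConfig {

    // DiffRecordRepository and DiffRecord are your own application types.
    @Bean
    public ReportWriter reportWriter(DiffRecordRepository repo) {
        return (task, differences) -> differences.forEach(d ->
            repo.save(new DiffRecord(
                task.getId(),
                d.getResult().name(),
                d.getComparison().getType().name()
            ))
        );
    }
}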

Example: write to a log

@Bean
public ReportWriter reportWriter() {
    Logger log = LoggerFactory.getLogger("diff-results");
    return (task, differences) ->
        differences.forEach(d -> log.warn("[{}] {}", task.getId(), d));
}

Override comparison behavior

Declare a @Bean of type DiffBuilderCustomizer to replace the default XMLUnit configuration (whitespace normalization + attribute-aware node matching).

Example: ignore XML comments and match elements by name only

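import org.springframework.context.annotation.Bean;
import org.xmlunit.diff.DefaultNodeMatcher;
import org.xmlunit.diff.ElementSelectors;

@Bean
public DiffBuilderCustomizer diffBuilderCustomizer() {
    return builder -> builder
            .ignoreWhitespace()
            .ignoreComments()
            .withNodeMatcher(new DefaultNodeMatcher(ElementSelectors.byName));
}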

Example: report only DIFFERENT results (differences that XMLUnit rates as SIMILAR are skipped)

@Bean
public DiffBuilderCustomizer diffBuilderCustomizer() {
    return builder -> builder
            .ignoreWhitespace()
            .checkForSimilar();
}

Extend the poller

Subclass BaseFilePoller to supply tasks from a source other than the filesystem (e.g. a database queue, an S3 bucket listing, a message queue).

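import java.nio.file.Path;
import java.util.List;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import com.cloudwick.labs.xmldiff.DiffChecker;
import com.cloudwick.labs.xmldiff.DiffTask;
import com.cloudwick.labs.xmldiff.ReportWriter;
import com.cloudwick.labs.xmldiff.app.BaseFilePoller;

// JobRepository is your own application type.
@Component
public class DatabaseFilePoller extends BaseFilePoller {

    private final JobRepository jobRepository;

    public DatabaseFilePoller(DiffChecker diffChecker, ReportWriter reportWriter,
                               JobRepository jobRepository) {
        super(diffChecker, reportWriter);
        this.jobRepository = jobRepository;
    }

    @Scheduled(fixedDelayString = "${xml.diff.poll-interval-ms:60000}")
    public void scheduledPoll() {
        poll();
    }

    @Override
    protected List<DiffTask> buildTasks() {
        return jobRepository.findPendingJobs().stream()
                .map(job -> new DiffTask(
                        job.getId(),
                        Path.of(job.getControlFilePath()),
                        Path.of(job.getTestFilePath())))
                .toList(); // Stream.toList() requires Java 16 or later
    }
}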
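
For readers unfamiliar with the template-method pattern that BaseFilePoller is described as using, the general shape is sketched below with stand-in types. Only poll() and buildTasks() appear in this README; the process, onSuccess and onFailure hooks are assumptions made for illustration and may not match the real class.

```java
import java.util.List;

// Stand-in sketch of a template-method poller. The hook names process/onSuccess/onFailure
// are assumptions; only poll() and buildTasks() are named in this README.
abstract class SketchPoller<T> {
    /** Subclasses decide where tasks come from (filesystem, database, queue, ...). */
    protected abstract List<T> buildTasks();

    /** Handles one task; may throw an unchecked exception, e.g. on malformed input. */
    protected abstract void process(T task);

    /** Optional hooks with empty defaults. */
    protected void onSuccess(T task) { }

    protected void onFailure(T task, RuntimeException e) { }

    /** The fixed algorithm: fetch tasks, process each one, and isolate failures per task. */
    public final void poll() {
        for (T task : buildTasks()) {
            try {
                process(task);
                onSuccess(task);
            } catch (RuntimeException e) {
                onFailure(task, e); // one failing task does not stop the remaining ones
            }
        }
    }
}
```

The point of the pattern is that the loop and error handling live in one place, while a subclass such as the DatabaseFilePoller above only supplies the task source.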

Attach metadata to a task

DiffTask exposes a metadata map for attaching arbitrary context that your custom ReportWriter can use:

DiffTask task = new DiffTask("job-001", controlPath, testPath);
task.getMetadata().put("sourceSystem", "pipeline-v2");
task.getMetadata().put("submittedBy", "user@example.com");
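
A custom ReportWriter could then fold that context into its output. The sketch below is a hypothetical formatter that takes the task id and metadata map directly, so it does not depend on the library's classes; the value type of the real metadata map is not stated in this README, which is why the method is generic.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical formatter a custom ReportWriter might call; not part of xml-diff-tool.
final class MetadataFormatter {
    /** Renders "id [k1=v1, k2=v2]" with keys sorted so the output is stable. */
    static <V> String describe(String taskId, Map<String, V> metadata) {
        return new TreeMap<>(metadata).entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(", ", taskId + " [", "]"));
    }
}
```

Inside a ReportWriter lambda this might be called as MetadataFormatter.describe(task.getId(), task.getMetadata()), assuming the getters shown elsewhere in this README.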

Building

# Build both JARs and run tests
mvn clean install

# Skip tests
mvn clean install -DskipTests

# Run tests only
mvn test

Output artifacts:

| File | Purpose |
|---|---|
| target/xml-diff-tool-1.0.0.jar | Plain library JAR; add as a Maven dependency |
| target/xml-diff-tool-1.0.0-exec.jar | Executable fat JAR; run standalone |
