php-htmldiff

php-htmldiff is a library for comparing two HTML files/snippets and highlighting the differences using simple HTML. It includes support for comparing complex lists and tables.

This HTML Diff implementation was forked from rashid2538/php-htmldiff and has been modified with new features, bug fixes, and enhancements to the original code.

For more information on these modifications, read about the differences from rashid2538/php-htmldiff or view the CHANGELOG.

Installation

The recommended way to install php-htmldiff is through Composer. Require the caxy/php-htmldiff package by running the following command:

composer require caxy/php-htmldiff

This will resolve the latest stable version.

Otherwise, install the library and set up the autoloader yourself.

Working with Symfony

If you are using Symfony, you can use the caxy/HtmlDiffBundle to make life easy!

Usage

use Caxy\HtmlDiff\HtmlDiff;

$htmlDiff = new HtmlDiff($oldHtml, $newHtml);
$content = $htmlDiff->build();
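The following toy word-level diff, in self-contained PHP, shows roughly what kind of markup build() returns. It is not the library's algorithm: php-htmldiff also has to handle tags, lists and tables. toyWordDiff() is a hypothetical helper whose only job is to show removed words ending up in <del> and added words in <ins>. The exact attributes the library puts on those tags are not shown here.

```php
<?php

/**
 * Hypothetical helper: a minimal LCS-based word diff, for illustration only.
 * Words only in $old are wrapped in <del>, words only in $new in <ins>.
 */
function toyWordDiff(string $old, string $new): string
{
    $a = preg_split('/\s+/', trim($old), -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('/\s+/', trim($new), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($a);
    $m = count($b);

    // $lcs[$i][$j] = length of the longest common subsequence of $a[$i..] and $b[$j..].
    $lcs = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $lcs[$i][$j] = $a[$i] === $b[$j]
                ? $lcs[$i + 1][$j + 1] + 1
                : max($lcs[$i + 1][$j], $lcs[$i][$j + 1]);
        }
    }

    // Walk the table, emitting unchanged, deleted and inserted words.
    $out = [];
    $i = 0;
    $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $out[] = htmlspecialchars($a[$i]);
            $i++;
            $j++;
        } elseif ($lcs[$i + 1][$j] >= $lcs[$i][$j + 1]) {
            $out[] = '<del>' . htmlspecialchars($a[$i++]) . '</del>';
        } else {
            $out[] = '<ins>' . htmlspecialchars($b[$j++]) . '</ins>';
        }
    }
    while ($i < $n) {
        $out[] = '<del>' . htmlspecialchars($a[$i++]) . '</del>';
    }
    while ($j < $m) {
        $out[] = '<ins>' . htmlspecialchars($b[$j++]) . '</ins>';
    }

    return implode(' ', $out);
}

// Prints: The <del>quick</del> <ins>slow</ins> brown fox
echo toyWordDiff('The quick brown fox', 'The slow brown fox'), PHP_EOL;
```

Real HTML input also has to be tokenized in a tag-aware way, and the structures listed under Configuration have to be handled, before a word diff like this can produce valid output.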

Configuration

The configuration for HtmlDiff is contained in the Caxy\HtmlDiff\HtmlDiffConfig class.

There are two ways to set the configuration:

  1. Configure an Existing HtmlDiff Object
  2. Create and Use a HtmlDiffConfig Object

Configure an Existing HtmlDiff Object

When a new HtmlDiff object is created, it creates an HtmlDiffConfig object with the default configuration. You can change the configuration using the setters on that object, which getConfig() returns:

use Caxy\HtmlDiff\HtmlDiff;

// ...

$htmlDiff = new HtmlDiff($oldHtml, $newHtml);

// Set some of the configuration options.
$htmlDiff->getConfig()
    ->setMatchThreshold(80)
    ->setInsertSpaceInReplace(true)
;

// Calculate the differences using the configuration and get the html diff.
$content = $htmlDiff->build();

// ...
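setMatchThreshold() takes the percentage that list items must reach to be considered a match (80 by default). The sketch below shows one way such a percentage test can work, using PHP's built-in similar_text(). The helper name itemsMatch() is hypothetical, and the library may measure similarity differently.

```php
<?php

/**
 * Hypothetical helper: treat two list items as the same item when their
 * text similarity (0-100, computed by similar_text()) reaches the threshold.
 */
function itemsMatch(string $oldItem, string $newItem, float $threshold): bool
{
    // similar_text() writes the similarity percentage into $percent.
    similar_text(strip_tags($oldItem), strip_tags($newItem), $percent);

    return $percent >= $threshold;
}

$old = '<li>Install with Composer</li>';
$new = '<li>Install with Composer 2</li>';

var_dump(itemsMatch($old, $new, 80)); // bool(true): about 95% similar
var_dump(itemsMatch($old, $new, 99)); // bool(false): below a strict threshold
```

A lower threshold pairs up list items that were edited rather than replaced, so the edit is shown inside the item. A higher threshold treats more items as removed and re-added.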

Create and Use a HtmlDiffConfig Object

You can also set the configuration by creating an instance of Caxy\HtmlDiff\HtmlDiffConfig and passing it to HtmlDiff::create when you create a new HtmlDiff object.

This is useful when creating more than one instance of HtmlDiff:

use Caxy\HtmlDiff\HtmlDiff;
use Caxy\HtmlDiff\HtmlDiffConfig;

// ...

$config = new HtmlDiffConfig();
$config
    ->setMatchThreshold(95)
    ->setInsertSpaceInReplace(true)
;

// Create an HtmlDiff object with the custom configuration.
$firstHtmlDiff = HtmlDiff::create($oldHtml, $newHtml, $config);
$firstContent = $firstHtmlDiff->build();

$secondHtmlDiff = HtmlDiff::create($oldHtml2, $newHtml2, $config);
$secondHtmlDiff->getConfig()->setMatchThreshold(50);

$secondContent = $secondHtmlDiff->build();

// ...
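PHP passes objects by handle, so a config object given to several HtmlDiff instances may be shared between them. It is an assumption, to be verified in the library source, whether HtmlDiff::create() keeps the instance it receives or copies it. If it keeps it, the setMatchThreshold(50) call above also changes the threshold that $firstHtmlDiff would use on a later build(). The classes DemoConfig and DemoDiff below are hypothetical stand-ins, not part of php-htmldiff. They only demonstrate the PHP behaviour and the clone workaround.

```php
<?php

// Hypothetical stand-in for HtmlDiffConfig, with one fluent setter.
class DemoConfig
{
    public $matchThreshold = 80;

    public function setMatchThreshold(int $threshold): self
    {
        $this->matchThreshold = $threshold;

        return $this; // fluent, like the setters shown above
    }
}

// Hypothetical stand-in for HtmlDiff: it stores whatever config it is given.
class DemoDiff
{
    private $config;

    public function __construct(DemoConfig $config)
    {
        $this->config = $config;
    }

    public function getConfig(): DemoConfig
    {
        return $this->config;
    }
}

$config = (new DemoConfig())->setMatchThreshold(95);

$shared = new DemoDiff($config);         // holds the same object as $config
$isolated = new DemoDiff(clone $config); // holds its own copy

$isolated->getConfig()->setMatchThreshold(50);
echo $config->matchThreshold, PHP_EOL;   // 95: the copy does not affect $config

$shared->getConfig()->setMatchThreshold(50);
echo $config->matchThreshold, PHP_EOL;   // 50: changed through the shared handle
```

If per-diff tweaks must not leak into other diffs, passing `clone $config` is the simplest safeguard, whatever create() does internally.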

Full Configuration with Defaults:


$config = new HtmlDiffConfig();
$config
    // Percentage required for list items to be considered a match.
    ->setMatchThreshold(80)
    
    // Set the encoding of the text to be diffed.
    ->setEncoding('UTF-8')
    
    // If true, a space will be added between the <del> and <ins> tags of text that was replaced.
    ->setInsertSpaceInReplace(false)
    
    // Option to disable the new Table Diffing feature and treat tables as regular text.
    ->setUseTableDiffing(true)
    
    // Pass an instance of \Doctrine\Common\Cache\Cache to cache the calculated diffs.
    ->setCacheProvider(null)
    
    // Set the cache directory that HTMLPurifier should use.
    ->setPurifierCacheLocation(null)
    
    // Group consecutive deletions and insertions instead of showing a deletion and insertion for each word individually. 
    ->setGroupDiffs(true)
    
    // List of characters to consider part of a single word when in the middle of text.
    ->setSpecialCaseChars(array('.', ',', '(', ')', '\''))
    
    // List of tags to treat as special case tags.
    ->setSpecialCaseTags(array('strong', 'b', 'i', 'big', 'small', 'u', 'sub', 'sup', 'strike', 's', 'p'))
    
    // List of tags (and their replacement strings) to be diffed in isolation.
    ->setIsolatedDiffTags(array(
        'ol'     => '[[REPLACE_ORDERED_LIST]]',
        'ul'     => '[[REPLACE_UNORDERED_LIST]]',
        'sub'    => '[[REPLACE_SUB_SCRIPT]]',
        'sup'    => '[[REPLACE_SUPER_SCRIPT]]',
        'dl'     => '[[REPLACE_DEFINITION_LIST]]',
        'table'  => '[[REPLACE_TABLE]]',
        'strong' => '[[REPLACE_STRONG]]',
        'b'      => '[[REPLACE_B]]',
        'em'     => '[[REPLACE_EM]]',
        'i'      => '[[REPLACE_I]]',
        'a'      => '[[REPLACE_A]]',
    ))
    
    // Sets whether newline characters are kept or removed when `$htmlDiff->build()` is called.
    // For example, if your content includes <pre> tags, you might want to set this to true.
    ->setKeepNewLines(false)
;
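setIsolatedDiffTags() maps each tag name to a placeholder string. One plausible reading, sketched below under that assumption, is this: each such element is swapped out for its placeholder, the surrounding text is diffed on its own, and the extracted elements are compared separately. isolateTags() is a hypothetical helper. Its regular expression does not handle a tag nested inside the same tag, which a real implementation must.

```php
<?php

/**
 * Hypothetical helper: replace each isolated element with its placeholder and
 * collect the original markup per placeholder, in the order it appears.
 * Limitation: the non-greedy regex does not handle same-tag nesting.
 *
 * @param array<string, string> $isolatedTags tag name => placeholder
 * @return array{0: string, 1: array<string, list<string>>}
 */
function isolateTags(string $html, array $isolatedTags): array
{
    $extracted = [];

    foreach ($isolatedTags as $tag => $placeholder) {
        // \b keeps "b" from also matching <br> or <blockquote>.
        $pattern = sprintf('#<%1$s\b[^>]*>.*?</%1$s>#is', preg_quote($tag, '#'));

        $html = preg_replace_callback(
            $pattern,
            function (array $match) use (&$extracted, $placeholder): string {
                $extracted[$placeholder][] = $match[0];

                return $placeholder;
            },
            $html
        );
    }

    return [$html, $extracted];
}

[$text, $blocks] = isolateTags(
    '<p>Steps: <ol><li>Install</li><li>Configure</li></ol> Done.</p>',
    ['ol' => '[[REPLACE_ORDERED_LIST]]', 'table' => '[[REPLACE_TABLE]]']
);

echo $text, PHP_EOL; // <p>Steps: [[REPLACE_ORDERED_LIST]] Done.</p>
```

After diffing, each placeholder would be swapped back for the separately diffed markup it stood for.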

Contributing

See CONTRIBUTING file.

Contributor Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. See CODE_OF_CONDUCT file.

Credits

Did we miss anyone? If we did, let us know or put in a pull request!

License

php-htmldiff is available under GNU General Public License, version 2. See the LICENSE file for details.

TODO

  • Tests, tests, and more tests! (mostly unit tests) - we need more tests before we can do major refactoring/cleanup for a v1 release
  • Add documentation for setting up a cache provider (Doctrine Cache)
    • Maybe add an abstraction layer for caching, plus an adapter for Doctrine Cache
  • Make HTML Purifier an optional dependency - possibly use an abstraction layer for purifiers so that alternatives (or none at all, for performance) could be used
  • Expose configuration for HTML Purifier (used in table diffing) - currently only cache dir is configurable through HtmlDiffConfig object
  • Add option to enable using HTML Purifier to purify all input
  • Performance improvements (we have only one benchmark test and should add more)
    • Algorithm improvements - trimming identical text at the start and end, and storing nested diff results in memory for reuse (as we do with caching)
    • Benchmark using DOMDocument vs. alternatives vs. string parsing
  • Benchmarking
  • Look into removing the dependency on the php-simple-html-dom-parser library, possibly by finding an alternative or using no library at all. Consider how this affects performance.
  • Refactoring (but... tests first)
    • Overall design/architecture improvements
    • API improvements so a new HtmlDiff isn't required for each new diff (especially so that configuration can be re-used)
  • Split demo application to separate repository
  • Add documentation on alternative htmldiff engines and perhaps some comparisons
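One algorithm idea in the TODO list is trimming identical text at the start and end before diffing. This is a common optimisation: tokens that both versions share at the edges can be emitted unchanged, so only the differing middle goes through the expensive comparison. Below is a minimal sketch on word arrays. trimCommonEnds() is a hypothetical helper and is not part of the library.

```php
<?php

/**
 * Hypothetical helper: split two token lists into the shared prefix,
 * the differing middles, and the shared suffix.
 *
 * @param list<string> $old
 * @param list<string> $new
 * @return array{prefix: list<string>, old: list<string>, new: list<string>, suffix: list<string>}
 */
function trimCommonEnds(array $old, array $new): array
{
    $oldCount = count($old);
    $newCount = count($new);

    // Length of the common prefix.
    $start = 0;
    while ($start < $oldCount && $start < $newCount && $old[$start] === $new[$start]) {
        $start++;
    }

    // Length of the common suffix, never overlapping the prefix.
    $end = 0;
    while (
        $end < $oldCount - $start
        && $end < $newCount - $start
        && $old[$oldCount - 1 - $end] === $new[$newCount - 1 - $end]
    ) {
        $end++;
    }

    return [
        'prefix' => array_slice($old, 0, $start),
        'old'    => array_slice($old, $start, $oldCount - $start - $end),
        'new'    => array_slice($new, $start, $newCount - $start - $end),
        'suffix' => array_slice($old, $oldCount - $end),
    ];
}

$parts = trimCommonEnds(
    explode(' ', 'The quick brown fox jumps'),
    explode(' ', 'The slow brown fox jumps')
);
// Only $parts['old'] (["quick"]) and $parts['new'] (["slow"]) still need diffing.
```

Because the suffix is capped so it never overlaps the prefix, pure insertions and deletions come out correctly. For example, ["a", "c"] against ["a", "b", "c"] leaves an empty old middle and ["b"] as the new middle.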