SiteDiff
SiteDiff makes it easy to see differences between two versions of a website. It takes a set of paths to compare across the two versions, together with optional normalization/sanitization rules. From these paths and the configuration, SiteDiff generates an HTML report summarizing the comparison status of every path, plus a readable, diff-like HTML page for each path showing the differences between the two versions of the site. It is a useful tool for QA-ing re-deployments, site upgrades, etc.
Demo
To quickly see what SiteDiff can do, follow these steps to have it compare two sets of static HTML files served by a simple HTTP server:
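# SimpleHTTPServer is the Python 2 module; on Python 3, use http.server instead in both server commands
cd spec/fixtures/before && python -m SimpleHTTPServer 8801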
# Serving HTTP on 0.0.0.0 port 8801 ...
# in a separate session
cd spec/fixtures/after && python -m SimpleHTTPServer 8802
# Serving HTTP on 0.0.0.0 port 8802 ...
# in a third session
bundle exec bin/sitediff diff --before-url=http://localhost:8801 --after-url=http://localhost:8802 spec/fixtures/config.yaml
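If only Python 3 is available, the two fixture servers can also be started from a single script. The following is a minimal sketch, not part of SiteDiff: the function names `start_static_server` and `serve_fixtures` are made up for illustration, and it assumes Python 3.7 or later (for the `directory` argument and `ThreadingHTTPServer`).

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_static_server(directory, port):
    """Serve `directory` over HTTP on 127.0.0.1:`port` from a background thread."""
    # The `directory` argument of SimpleHTTPRequestHandler needs Python 3.7+.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def serve_fixtures():
    """Start both fixture servers and block until interrupted with Ctrl-C."""
    servers = [
        start_static_server("spec/fixtures/before", 8801),
        start_static_server("spec/fixtures/after", 8802),
    ]
    try:
        threading.Event().wait()  # sleep until Ctrl-C
    except KeyboardInterrupt:
        for server in servers:
            server.shutdown()
```

Calling `serve_fixtures()` from the repository root replaces the first two terminal sessions; the `sitediff diff` command is then run unchanged in another session.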
Or, if you have Docker installed, you can simply run:
make build_sitediff # creates a docker image containing the sitediff executable
make start_fixtures # starts two fixture containers serving the same HTML content as above
make sitediff_fixtures # performs the diff and generates a report
make sitediff_serve # serves the reports and diff files so we can browse them
Here is an example SiteDiff report:
And here is an example SiteDiff diff report of a specific path:
Usage
SiteDiff relies on a YAML configuration file to pick up the paths and required sanitization rules. The following configuration blocks are recognized by SiteDiff:
- sanitization: a sanitization block contains a title, a pattern (a regular expression in string form), and an optional substitute, which defaults to the empty string:

      sanitization:
        - title: 'remove form build id'
          pattern: '<input type="hidden" name="form_build_id" value="form-[a-zA-Z0-9_-]+" *\/?>'
          substitute: '<input type="hidden" name="form_build_id" value="__form_build_id__">'

  Sanitization blocks are typically useful for avoiding false positives that take the form of individual strings (as opposed to hierarchical information; see dom_transform).

- selector: defines the specific HTML elements we wish to compare. For example, if you want to compare only the breadcrumbs between before and after, you might specify:

      selector: '#breadcrumb'

  if that is how your HTML is structured.
- dom_transform: allows you to edit the DOM tree before diffing. Like sanitization, this lets expected differences (here, structural ones) pass through without causing failed comparisons. A dom_transform block requires a type, which specifies what kind of DOM transformation to perform, and a selector, which specifies the element on which the action will be performed. The allowed type values are:

  - remove: removes the entire element specified by the selector from the HTML.
  - unwrap: replaces the element with its constituents. For example:

        dom_transform:
          - type: 'unwrap'
            selector: '#123'

    will transform the following:

        <div id="123"> <p> Hello </p> <p> World </p> </div>

    into:

        <p> Hello </p> <p> World </p>
- before and after: these two special blocks can wrap any of the blocks above to indicate that the normalization rules they contain should apply only to the before or only to the after version of the site. A configuration block found at the top level, and not under before or after, applies to both. For example, if you wanted to keep different date formatting from creating diff failures, you might use the following:

      before:
        sanitization:
          - title: 'remove dates'
            pattern: '[1-2][0-9]{3}/[0-1][0-9]/[0-9]{2}'
            substitute: '__date__'
      after:
        sanitization:
          - title: 'remove dates'
            pattern: '[A-Z][a-z]{2} [0-9]{1,2}(st|nd|rd|th) [1-2][0-9]{3}'
            substitute: '__date__'

  which will replace dates of the form 2004/12/05 in before and dates of the form May 12th 2004 in after with __date__.

- includes: a configuration file can reference other YAML files to pull in the sanitization rules and/or DOM transforms defined by those files:

      includes:
        - config/sanitize_domains.yaml
        - config/strip_css_js.yaml
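
To make the effect of sanitization rules and the before/after scoping concrete, here is a rough Python sketch. It is not SiteDiff's own code: `CONFIG`, `sanitize`, and the sample HTML strings are hypothetical, and the sketch assumes that the patterns behave the same under Python's `re` module, that top-level rules run before side-specific ones, and that the substitute is inserted literally.

```python
import re

# The rules from the examples above, written as plain Python data.
CONFIG = {
    "sanitization": [  # top level: applies to both versions
        {
            "title": "remove form build id",
            "pattern": r'<input type="hidden" name="form_build_id" value="form-[a-zA-Z0-9_-]+" *\/?>',
            "substitute": '<input type="hidden" name="form_build_id" value="__form_build_id__">',
        },
    ],
    "before": {
        "sanitization": [
            {"title": "remove dates",
             "pattern": r"[1-2][0-9]{3}/[0-1][0-9]/[0-9]{2}",
             "substitute": "__date__"},
        ],
    },
    "after": {
        "sanitization": [
            {"title": "remove dates",
             "pattern": r"[A-Z][a-z]{2} [0-9]{1,2}(st|nd|rd|th) [1-2][0-9]{3}",
             "substitute": "__date__"},
        ],
    },
}


def sanitize(html, config, side):
    """Apply top-level rules, then the rules under `side` ('before' or 'after')."""
    rules = config.get("sanitization", []) + config.get(side, {}).get("sanitization", [])
    for rule in rules:
        substitute = rule.get("substitute", "")  # optional, defaults to empty string
        # A function replacement inserts the substitute literally (an assumption).
        html = re.sub(rule["pattern"], lambda match: substitute, html)
    return html


before_html = '<p>Posted 2004/12/05</p><input type="hidden" name="form_build_id" value="form-abc123" />'
after_html = '<p>Posted Dec 5th 2004</p><input type="hidden" name="form_build_id" value="form-XyZ_789">'

# Both versions normalize to the same string, so they would no longer differ.
print(sanitize(before_html, CONFIG, "before") == sanitize(after_html, CONFIG, "after"))  # True
```

Without the rules, the two sample strings differ in both the date format and the form build id, which is exactly the kind of false positive sanitization is meant to suppress.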
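
The remove and unwrap transforms can be illustrated in a similar way. The sketch below uses Python's standard `xml.etree.ElementTree` and is only an approximation under stated assumptions: the function name `dom_transform` is hypothetical, only `#id` selectors are understood, the markup must be well-formed XML, and text sitting directly inside an unwrapped element (or trailing a removed one) is dropped.

```python
import xml.etree.ElementTree as ET


def dom_transform(markup, transform_type, selector):
    """Apply 'remove' or 'unwrap' to the elements matching an '#id' selector."""
    if transform_type not in ("remove", "unwrap"):
        raise ValueError("unknown transform type: %r" % transform_type)
    if not selector.startswith("#"):
        raise ValueError("this sketch only understands '#id' selectors")

    # Wrap the fragment in a synthetic root so several top-level elements parse.
    root = ET.fromstring("<root>" + markup + "</root>")
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}

    for element in root.findall(".//*[@id='%s']" % selector[1:]):
        parent = parents[element]
        index = list(parent).index(element)
        parent.remove(element)
        if transform_type == "unwrap":
            # Put the element's children where the element used to be.
            for offset, child in enumerate(list(element)):
                parent.insert(index + offset, child)

    # Serialize the children of the synthetic root, not the root itself.
    return "".join(ET.tostring(child, encoding="unicode") for child in root).strip()


print(dom_transform('<div id="123"> <p> Hello </p> <p> World </p> </div>', "unwrap", "#123"))
# <p> Hello </p> <p> World </p>
```

A real implementation would use a full HTML parser with CSS selector support; the point here is only the difference between dropping an element entirely and keeping its children in place.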