Merge ecdc7d9 into 24d4ba8
regebro committed Sep 12, 2018
2 parents 24d4ba8 + ecdc7d9 commit 868845e
Showing 21 changed files with 473 additions and 81 deletions.
3 changes: 2 additions & 1 deletion .travis.yml
@@ -19,7 +19,8 @@ install:
script:
- make flake
- make coverage
- cd docs; make doctest; make html; cd ..
- make -C docs doctest
- make -C docs html

after_success:
- coveralls
11 changes: 10 additions & 1 deletion CHANGES.rst
@@ -1,12 +1,21 @@
Changes
=======

2.0b3 (unreleased)
2.0b4 (unreleased)
------------------

- Nothing changed yet.


2.0b3 (2018-09-11)
------------------

- Replaced the example RMLFormatter with a more generic HTML formatter,
although it only handles HTML snippets at the moment.

- Added a RenameNodeAction, to get rid of an edge case of a node
tail appearing twice.


2.0b2 (2018-09-06)
------------------
3 changes: 3 additions & 0 deletions MANIFEST.in
@@ -3,10 +3,13 @@ include *.txt
include *.yml
include .coveragerc
include Makefile
include docs/requirements.txt
recursive-include tests *.py
recursive-include tests *.xml
recursive-include tests *.html
recursive-include docs *.bat
recursive-include docs *.py
recursive-include docs *.rst
recursive-include docs *.xslt
recursive-include docs Makefile
recursive-exclude docs/build *
8 changes: 4 additions & 4 deletions README.rst
@@ -5,7 +5,7 @@ xmldiff

.. image:: https://coveralls.io/repos/github/Shoobx/xmldiff/badge.svg

``xmldiff`` is a library and a command line utility for making diffs out of XML.
``xmldiff`` is a library and a command-line utility for making diffs out of XML.
This may seem like something that doesn't need a dedicated utility,
but change detection in hierarchical data is very different from change detection in flat data.
XML type formats are also not only used for computer readable data,
@@ -18,8 +18,8 @@ This library provides tools to make human readable diffs in those situations.
Quick usage
-----------

``xmldiff`` is both a command line tool and a Python library.
To use it from the commandline, just run ``xmldiff`` with two input files::
``xmldiff`` is both a command-line tool and a Python library.
To use it from the command-line, just run ``xmldiff`` with two input files::

$ xmldiff file1.xml file2.xml

@@ -56,7 +56,7 @@ Changes from ``xmldiff`` 1.x

* These formats can show text differences in a semantically meaningful way.

* 2.0 is currently significantly slower than ``xmldiff`` 2.x,
* 2.0 is currently significantly slower than ``xmldiff`` 1.x,
but this may change in the future.
Currently we make no effort to make ``xmldiff`` 2.0 fast,
we concentrate on making it correct and usable.
8 changes: 4 additions & 4 deletions TODO.txt
@@ -2,7 +2,7 @@ TODO
====

First Alpha:
--------------
------------

* 100% coverage, Travis, coveralls, pep8 - Done

@@ -23,17 +23,17 @@ First Beta:
First Final:
------------

* Documentation
* Documentation - Done


Future releases:
----------------

* An xmlpatch2 utility/command that can apply the diffs.
* An xmlpatch2 utility/command that can apply the diffs. - Issue #10

* Support for the xmldiff diff format.

* Maybe a diff format that looks like a text diff, but understands XML
structure and ignores ignorable whitespace? But that is also doable by
just reformatting the XML and pretty printing it and then using a text
diff, so maybe that's pointless.
diff, so maybe that's pointless. - Done
2 changes: 1 addition & 1 deletion docs/Makefile
@@ -1,7 +1,7 @@
# Makefile for Sphinx documentation
#

# You can set these variables from the command line.
# You can set these variables from the command-line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
133 changes: 133 additions & 0 deletions docs/source/advanced.rst
@@ -0,0 +1,133 @@
Advanced Usage
==============

Diffing Formatted Text
----------------------

You can write your own formatter that understands your XML format,
and therefore can apply some intelligence to the format.

One common use case for this is to have more intelligent text handling.
The standard formatters will treat any text as just a value,
and the resulting diff will simply replace one value with another:

.. doctest::
:options: -ELLIPSIS, +NORMALIZE_WHITESPACE

>>> from xmldiff import main, formatting
>>> left = '<body><p>Old Content</p></body>'
>>> right = '<body><p>New Content</p></body>'
>>> main.diff_texts(left, right)
[UpdateTextIn(node='/body/p[1]', text='New Content')]

The ``xml`` formatter will set tags around the text marking it as inserted or deleted:

.. doctest::
:options: -ELLIPSIS, +NORMALIZE_WHITESPACE

>>> formatter=formatting.XMLFormatter()
>>>
>>> left = '<body><p>Old Content</p></body>'
>>> right = '<body><p>New Content</p></body>'
>>> result = main.diff_texts(left, right, formatter=formatter)
>>> print(result)
<body xmlns:diff="http://namespaces.shoobx.com/diff">
<p><diff:delete>Old</diff:delete><diff:insert>New</diff:insert> Content</p>
</body>

But if your XML format contains text with formats,
the output can in some cases be less than useful,
especially in the case where formatting is added:

.. doctest::
:options: -ELLIPSIS, +NORMALIZE_WHITESPACE

>>> left = '<body><p>My Fine Content</p></body>'
>>> right = '<body><p>My <i>Fine</i> Content</p></body>'
>>> result = main.diff_texts(left, right, formatter=formatter)
>>> print(result)
<body xmlns:diff="http://namespaces.shoobx.com/diff">
<p>My <diff:delete>Fine Content</diff:delete><i diff:insert=""><diff:insert>Fine</diff:insert></i><diff:insert> Content</diff:insert></p>
</body>

Notice how the whole text was deleted and then reinserted with formatting.
The XMLFormatter supports a better handling of text with the ``text_tags`` and ``formatting_tags`` parameters. Here is a simple and incomplete example with some common HTML tags:

.. doctest::
:options: -ELLIPSIS, +NORMALIZE_WHITESPACE

>>> formatter=formatting.XMLFormatter(
... text_tags=('p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li'),
... formatting_tags=('b', 'u', 'i', 'strike', 'em', 'super',
... 'sup', 'sub', 'link', 'a', 'span'))
>>> result = main.diff_texts(left, right, formatter=formatter)
>>> print(result)
<body xmlns:diff="http://namespaces.shoobx.com/diff">
<p>My <i diff:insert-formatting="">Fine</i> Content</p>
</body>

This gives a result that flags the ``<i>`` tag as new formatting.
This more compact output is much more useful and easier to transform into a visual output.


Making a Visual Diff
--------------------

XML and HTML views will of course ignore all these ``diff:`` tags and attributes.
What we want with the HTML output above is to transform the ``diff:insert-formatting`` attribute into something that will make the change visible.
We can achieve that by applying XSLT before the ``render()`` method in the formatter.
This requires subclassing the formatter:


.. doctest::
:options: -ELLIPSIS, +NORMALIZE_WHITESPACE

>>> import lxml.etree
>>> XSLT = u'''<?xml version="1.0"?>
... <xsl:stylesheet version="1.0"
... xmlns:diff="http://namespaces.shoobx.com/diff"
... xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
... xmlns="http://www.w3.org/1999/xhtml">
...
... <xsl:template match="@diff:insert-formatting">
... <xsl:attribute name="class">
... <xsl:value-of select="'insert-formatting'"/>
... </xsl:attribute>
... </xsl:template>
...
... <xsl:template match="@* | node()">
... <xsl:copy>
... <xsl:apply-templates select="@* | node()"/>
... </xsl:copy>
... </xsl:template>
... </xsl:stylesheet>'''
>>> XSLT_TEMPLATE = lxml.etree.fromstring(XSLT)
>>> class HTMLFormatter(formatting.XMLFormatter):
...     def render(self, result):
...         transform = lxml.etree.XSLT(XSLT_TEMPLATE)
...         result = transform(result)
...         return super(HTMLFormatter, self).render(result)

The XSLT template above of course only handles one case,
inserted formatting.
A more complete XSLT file is included `here <file:_static/htmlformatter.xslt>`_.
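
As a rough illustration of what that template does,
the same attribute-to-class rewrite can be sketched with only the standard library's ``xml.etree.ElementTree``.
This is not how ``xmldiff`` or the XSLT above is implemented;
the function name ``mark_inserted_formatting`` is made up for this example,
and the input is the diff output from the previous section with its line breaks removed.

```python
import xml.etree.ElementTree as ET

DIFF_NS = "http://namespaces.shoobx.com/diff"
# ElementTree spells a namespaced attribute as "{namespace}localname".
INSERT_FORMATTING = "{%s}insert-formatting" % DIFF_NS


def mark_inserted_formatting(xml_text):
    """Swap diff:insert-formatting attributes for class="insert-formatting".

    Hypothetical helper for illustration; not part of xmldiff.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if INSERT_FORMATTING in element.attrib:
            # Drop the diff attribute and flag the element with a CSS class.
            del element.attrib[INSERT_FORMATTING]
            element.set("class", "insert-formatting")
    return ET.tostring(root, encoding="unicode")


diffed = (
    '<body xmlns:diff="http://namespaces.shoobx.com/diff">'
    '<p>My <i diff:insert-formatting="">Fine</i> Content</p>'
    '</body>'
)
print(mark_inserted_formatting(diffed))
```

Unlike the XSLT approach, this works on the already rendered string rather than inside the formatter,
so it is only meant to make the transformation easy to follow.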

Now use that formatter in the diffing:

.. doctest::
:options: -ELLIPSIS, +NORMALIZE_WHITESPACE

>>> formatter = HTMLFormatter(
... text_tags=('p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li'),
... formatting_tags=('b', 'u', 'i', 'strike', 'em', 'super',
... 'sup', 'sub', 'link', 'a', 'span'))
>>> result = main.diff_texts(left, right, formatter=formatter)
>>> print(result)
<body xmlns:diff="http://namespaces.shoobx.com/diff">
<p>My <i class="insert-formatting">Fine</i> Content</p>
</body>

You can then add into your CSS files classes that make inserted text green,
deleted text red with an overstrike,
and formatting changes could for example be blue.
This makes it easy to see what has been changed in an HTML document.
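
The colour scheme described above could be sketched as follows.
Only the ``insert-formatting`` class comes from the example in this section;
the ``insert`` and ``delete`` class names, and the helpers ``build_stylesheet`` and ``wrap_in_page``,
are assumptions made for this illustration and would have to match whatever your own XSLT emits.

```python
# Only "insert-formatting" comes from the example above; the "insert" and
# "delete" class names are assumptions made for this sketch.
DIFF_STYLES = {
    "insert": "color: green;",
    "delete": "color: red; text-decoration: line-through;",
    "insert-formatting": "color: blue;",
}


def build_stylesheet(styles):
    """Return one CSS rule per class name, sorted for stable output."""
    return "\n".join(
        ".%s { %s }" % (name, rule) for name, rule in sorted(styles.items())
    )


def wrap_in_page(fragment, styles=DIFF_STYLES):
    """Embed a diffed HTML fragment in a page that carries the stylesheet."""
    return "<html>\n<head>\n<style>\n%s\n</style>\n</head>\n%s\n</html>" % (
        build_stylesheet(styles),
        fragment,
    )


page = wrap_in_page(
    '<body><p>My <i class="insert-formatting">Fine</i> Content</p></body>'
)
print(page)
```

Keeping the rules in a dictionary makes it simple to adjust the colours without touching the diffing code.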
3 changes: 2 additions & 1 deletion docs/source/api.rst
@@ -29,6 +29,7 @@ text strings or ``lxml`` trees.

The arguments to these functions are the same:


Parameters
..........

@@ -117,7 +118,7 @@ for example with XSLT replacing the tags with the format you need.
:options: -ELLIPSIS, +NORMALIZE_WHITESPACE

>>> from xmldiff import formatting
>>> formatter = formatting.HTMLFormatter()
>>> formatter = formatting.XMLFormatter(normalize=formatting.WS_BOTH)
>>> print(main.diff_files("../tests/test_data/insert-node.left.html",
... "../tests/test_data/insert-node.right.html",
... formatter=formatter))
12 changes: 6 additions & 6 deletions docs/source/commandline.rst
@@ -1,8 +1,8 @@
Command line usage
Command-line Usage
==================

``xmldiff`` is both a command line tool and a Python library.
To use it from the commandline, just run ``xmldiff`` with two input files:
``xmldiff`` is both a command-line tool and a Python library.
To use it from the command-line, just run ``xmldiff`` with two input files:

.. code-block:: bash
@@ -33,7 +33,7 @@ but may not give you a useful output.
If you are using ``xmldiff`` as a library,
you can create your own formatters that are suited for your particular usage of XML.

Whitespace handling
Whitespace Handling
-------------------

Formatters are also responsible for whitespace handling,
@@ -64,7 +64,7 @@ since the whitespace there occurs inside a tag:
In some XML formats, whitespace inside some tags is also not significant.
The ``html`` formatter is an example of this.
It is aware that ``<p>`` tags contain text where whitespace isn't significant,
It is aware that ``<p>`` tags contain text where whitespace isn't significant,
and will by default normalize whitespace inside these tags before comparing it,
effectively replacing any run of whitespace inside those tags with a single space.
This is so that when diffing two versions of HTML files you will not see changes that would not be visible in the final document.
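
The effect of that normalization can be sketched in a few lines of Python.
This is an illustration of the idea only, not the code ``xmldiff`` uses,
and ``normalize_text_whitespace`` is a name invented for this example.

```python
import re

# One or more whitespace characters: spaces, tabs, newlines and so on.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_text_whitespace(text):
    """Collapse every run of whitespace into a single space.

    Hypothetical helper for illustration; not part of xmldiff.
    """
    return _WHITESPACE_RUN.sub(" ", text)


old_text = "Some   text\n    spread over lines"
new_text = "Some text spread over lines"

# After normalization the two versions compare equal,
# so no change would be reported for this paragraph.
print(normalize_text_whitespace(old_text) == normalize_text_whitespace(new_text))
```

Two paragraphs that differ only in line wrapping or indentation compare equal once normalized,
which is why such changes do not show up in the diff.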
@@ -77,7 +77,7 @@ Both the ``diff`` and ``xml`` formatters don't know of any text formatting,
and will therefore always preserve all whitespace inside tags.


Pretty printing
Pretty Printing
---------------

The term "pretty printing" refers to making an output a bit more human readable by structuring it with whitespace.
4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -66,7 +66,7 @@
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
# Usually you set "language" from the command-line for these cases.
language = None

# There are two options for replacing |today|: either, you set today to some
@@ -140,7 +140,7 @@
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_static_path = ['static']

# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
26 changes: 14 additions & 12 deletions docs/source/contributing.rst
@@ -10,8 +10,8 @@ There are some extremely complex issues deep down in ``xmldiff``, but don't
let that scare you away, there are easy things to do as well.


Setting up a dev environment
----------------------------
Setting Up a Development Environment
------------------------------------

To set up a development environment you need a GitHub account, git, and
of course Python with pip installed. You also should have the Python tools
@@ -52,12 +52,12 @@ There is no support for ``tox`` to run tests under different Python versions.
This is because Travis will run all supported versions on pull requests in any case,
and having yet another list of supported Python versions to maintain seems unnecessary.
You can either create your own tox.ini file,
or you can install ```spiny`` <https://pypi.org/project/spiny/>`_,
or you can install `Spiny <https://pypi.org/project/spiny/>`_,
which doesn't require any extra configuration in the normal case,
and will run the tests on all versions that are defined as supported in ``setup.py``:
and will run the tests on all versions that are defined as supported in ``setup.py``.


Pull requests
Pull Requests
-------------

Even if you have write permissions to the repository we discourage pushing changes to master.
@@ -71,18 +71,18 @@ Your pull requests should:

* Include a description of the change in ``CHANGES.txt``

* Sdd you to the contributors list in ``README.txt`` if you aren't already there.
* Add yourself to the contributors list in ``README.txt`` if you aren't already there.


Code quality and conventions
Code Quality and Conventions
----------------------------

``xmldiff`` aims to have 100% test coverage.
You run a coverage report with ``$ make coverage``.
You run a coverage report with ``make coverage``.
This will generate an HTML coverage report in ``htmlcov/index.html``.

We run flake8 as a part of all Travis test runs,
the correct way to run it is ``$ make flake``,
the correct way to run it is ``make flake``,
as this includes only the files that should be covered.


@@ -97,7 +97,8 @@ This is so that adding one word to a paragraph will not cause several lines of c
as that will make any pull request harder to read.

That means that every sentence and most commas should be followed by a new line,
except in cases where this obviously does not make sense.
except in cases where this obviously does not make sense,
for example when using commas to separate things you list.
As a result of this there are no limits on line length,
but if a line becomes very long you might consider rewriting it to make it more understandable.

@@ -106,10 +107,11 @@ You generate the documentation with a make command::
cd docs
make html

We will be using (but aren't yet) `Read the Docs <https://readthedocs.org/>`_ to host the documentation.
The documentation is hosted on `Read the Docs <https://readthedocs.org/>`_,
the official URL is https://readthedocs.org/projects/xmldiff/.


Implementation details
Implementation Details
----------------------

``xmldiff`` is based on `"Change Detection in Hierarchically Structured Information" <http://ilpubs.stanford.edu/115/1/1995-46.pdf>`_
