Commit

Merge 5ae1f5a into 6e69693
regebro committed Sep 7, 2018
2 parents 6e69693 + 5ae1f5a commit 51bc20e
Showing 24 changed files with 88 additions and 1,438 deletions.
3 changes: 2 additions & 1 deletion CHANGES.rst
@@ -4,7 +4,8 @@ Changes
 2.0b3 (unreleased)
 ------------------
 
-- Nothing changed yet.
+- Replaced the example RMLFormatter with a more generic HTML formatter,
+although it only handles HTML snippets at the moment.
 
 
 2.0b2 (2018-09-06)
2 changes: 1 addition & 1 deletion MANIFEST.in
@@ -5,7 +5,7 @@ include .coveragerc
 include Makefile
 recursive-include tests *.py
 recursive-include tests *.xml
-recursive-include tests *.rml
+recursive-include tests *.html
 recursive-include docs *.bat
 recursive-include docs *.py
 recursive-include docs *.rst
48 changes: 22 additions & 26 deletions docs/source/api.rst
@@ -11,13 +11,11 @@ you just import and call one of the three main API methods.
 :options: -ELLIPSIS, +NORMALIZE_WHITESPACE
 
 >>> from xmldiff import main
->>> main.diff_files("../tests/test_data/insert-node.left.rml",
-... "../tests/test_data/insert-node.right.rml")
-[UpdateTextIn(node='/document/story[1]', text=None),
-InsertNode(target='/document/story[1]', tag='h1', position=0),
-UpdateTextIn(node='/document/story/h1[1]', text='Inserted '),
-InsertNode(target='/document/story/h1[1]', tag='i', position=0),
-UpdateTextIn(node='/document/story/h1/i[1]', text='Node')]
+>>> main.diff_files("../tests/test_data/insert-node.left.html",
+... "../tests/test_data/insert-node.right.html")
+[UpdateTextIn(node='/body/div[1]', text=None),
+InsertNode(target='/body/div[1]', tag='p', position=0),
+UpdateTextIn(node='/body/div/p[1]', text='Simple text')]
 
 Which one you choose depends on if the XML is contained in files,
 text strings or ``lxml`` trees.
@@ -62,7 +60,7 @@ If no formatter is specified the diff functions will return a list of actions.
 Such a list is called an edit script and contains all changes needed to transform the "left" XML into the "right" XML.
 
 If a formatter is specified that formatter determines the result.
-The included formatters, ``diff``, ``xml``, and ``rml`` all return a Unicode string.
+The included formatters, ``diff``, ``xml``, and ``html`` all return a Unicode string.
 
 
 Unique Attributes
@@ -88,7 +86,7 @@ Using Formatters
 By default the diff functions will return an edit script,
 but if you pass in a formatter the result will be whatever that formatter returns.
 
-The three included formatters, ``diff``, ``xml`` and ``rml``,
+The three included formatters, ``diff``, ``xml`` and ``html``,
 all return Unicode strings.
 The ``diff`` formatter will return a string with the edit script printed out,
 one action per line.
@@ -103,14 +101,12 @@ so the output is not compatible.
 
 >>> from xmldiff import formatting
 >>> formatter = formatting.DiffFormatter()
->>> print(main.diff_files("../tests/test_data/insert-node.left.rml",
-... "../tests/test_data/insert-node.right.rml",
+>>> print(main.diff_files("../tests/test_data/insert-node.left.html",
+... "../tests/test_data/insert-node.right.html",
 ... formatter=formatter))
-[update-text, /document/story[1], null]
-[insert, /document/story[1], h1, 0]
-[update-text, /document/story/h1[1], "Inserted "]
-[insert, /document/story/h1[1], i, 0]
-[update-text, /document/story/h1/i[1], "Node"]
+[update-text, /body/div[1], null]
+[insert, /body/div[1], p, 0]
+[update-text, /body/div/p[1], "Simple text"]
 
 
 The other two differs return XML with tags describing the changes.
@@ -121,17 +117,17 @@ for example with XSLT replacing the tags with the format you need.
 :options: -ELLIPSIS, +NORMALIZE_WHITESPACE
 
 >>> from xmldiff import formatting
->>> formatter = formatting.RMLFormatter()
->>> print(main.diff_files("../tests/test_data/insert-node.left.rml",
-... "../tests/test_data/insert-node.right.rml",
+>>> formatter = formatting.HTMLFormatter()
+>>> print(main.diff_files("../tests/test_data/insert-node.left.html",
+... "../tests/test_data/insert-node.right.html",
 ... formatter=formatter))
-<document xmlns:diff="http://namespaces.shoobx.com/diff" title="insert-node">
-<story id="id">
-<h1 diff:insert="">
-<diff:insert>Inserted <i>Node</i></diff:insert>
-</h1>
-</story>
-</document>
+<body xmlns:diff="http://namespaces.shoobx.com/diff">
+<div id="id">
+<p diff:insert="">
+<diff:insert>Simple text</diff:insert>
+</p>
+</div>
+</body>
 
 
 The Edit Script
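
The ``html`` formatter output shown above marks changes with elements and attributes in the ``http://namespaces.shoobx.com/diff`` namespace, so it can be post-processed with ordinary XML tools. A minimal sketch, assuming only Python's standard library (this is not part of xmldiff, and the helper names ``inserted_text`` and ``inserted_elements`` are hypothetical), that lists what the ``insert-node`` example output marks as inserted:

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace URI used by the formatter output in the api.rst example.
DIFF_NS = "http://namespaces.shoobx.com/diff"


def inserted_text(diff_output: str) -> List[str]:
    """Hypothetical helper: text of every <diff:insert> element."""
    root = ET.fromstring(diff_output)
    # ElementTree spells namespaced tags as "{uri}localname".
    return ["".join(el.itertext()) for el in root.iter(f"{{{DIFF_NS}}}insert")]


def inserted_elements(diff_output: str) -> List[str]:
    """Hypothetical helper: tag names of elements with a diff:insert attribute."""
    root = ET.fromstring(diff_output)
    return [el.tag for el in root.iter() if f"{{{DIFF_NS}}}insert" in el.attrib]


sample = """<body xmlns:diff="http://namespaces.shoobx.com/diff">
<div id="id">
<p diff:insert="">
<diff:insert>Simple text</diff:insert>
</p>
</div>
</body>"""

print(inserted_text(sample))      # ['Simple text']
print(inserted_elements(sample))  # ['p']
```

Matching on the ``{uri}localname`` form rather than the ``diff:`` prefix keeps the sketch working even if a document binds that namespace to a different prefix.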
11 changes: 4 additions & 7 deletions docs/source/commandline.rst
@@ -63,14 +63,11 @@ since the whitespace there occurs inside a tag:
 <data count="1"> </data><data count="2"></data>
 In some XML formats, whitespace inside some tags is also not significant.
-The ``rml`` formatter is an example of this.
-It's a format that can be used to generate documents,
-and has a ``<para>`` tag for formatted text,
-similar to HTML's ``<p>`` tag.
-The ``rml`` formatter is aware of this,
+The ``html`` formatter is an example of this.
+It is aware of that ``<p>`` tags contain text where whitespace isn't significant,
 and will by default normalize whitespace inside these tags before comparing it,
 effectively replacing any whitespace inside those tags to a single space.
-This is so that when diffing two versions of RML files you will not see changes that would not be visible in the final document.
+This is so that when diffing two versions of HTML files you will not see changes that would not be visible in the final document.
 
 Both of these types of whitespace can be preserved with the ``--keep-whitespace`` argument.
 The third case of whitespace,
@@ -87,7 +84,7 @@ The term "pretty printing" refers to making an output a bit more human readable
 In the case of XML this means inserting ignorable whitespace into the XML,
 yes, the same in-between whitespace that is ignored by ``xmldiff`` when detecting changes between two files.
 
-``xmldiff``'s ``xml`` and ``rml`` formatters understand the ``--pretty-print`` argument and will insert whitespace to make the output more readable.
+``xmldiff``'s ``xml`` and ``html`` formatters understand the ``--pretty-print`` argument and will insert whitespace to make the output more readable.
 
 For example, an XML output that would normally look like this:
 
5 changes: 5 additions & 0 deletions tests/test_data/complex-text-update.expected.html
@@ -0,0 +1,5 @@
+<body xmlns:diff="http://namespaces.shoobx.com/diff">
+<div id="id">
+<p><diff:insert>Let's see. </diff:insert>This is some simple text demonstrating the features of the <b diff:delete-formatting=""><i diff:insert-formatting="">human text differ</i></b>. This <u diff:delete-formatting="">feature</u> attempts to make changelog readable for humans. The <i diff:insert-formatting="">human text differ</i> uses sentences as its first order matching. <diff:delete>Let's see.</diff:delete><invalid><diff:insert>It should handle unknown tags just fine.</diff:insert></invalid></p>
+</div>
+</body>
5 changes: 0 additions & 5 deletions tests/test_data/complex-text-update.expected.rml

This file was deleted.

Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
-<document title="insert-node">
-<story id="id">
+<body>
+<div id="id">
 
-<para>
+<p>
 This is some simple text demonstrating the features of the <b>human text
 differ</b>. This <u>feature</u> attempts to make changelog readable for
 humans. The human text differ uses sentences as its first order
 matching. Let's see.
-</para>
+</p>
 
-</story>
-</document>
+</div>
+</body>
Original file line number Diff line number Diff line change
@@ -1,12 +1,12 @@
-<document title="insert-node">
-<story id="id">
+<body>
+<div id="id">
 
-<para>
+<p>
 Let's see. This is some simple text demonstrating the features of the
 <i>human text differ</i>. This feature attempts to make changelog
 readable for humans. The <i>human text differ</i> uses sentences as its
-first order matching.
-</para>
+first order matching. <invalid>It should handle unknown tags just fine.</invalid>
+</p>
 
-</story>
-</document>
+</div>
+</body>
7 changes: 7 additions & 0 deletions tests/test_data/insert-node.expected.html
@@ -0,0 +1,7 @@
+<body xmlns:diff="http://namespaces.shoobx.com/diff">
+<div id="id">
+<p diff:insert="">
+<diff:insert>Simple text</diff:insert>
+</p>
+</div>
+</body>
7 changes: 0 additions & 7 deletions tests/test_data/insert-node.expected.rml

This file was deleted.

4 changes: 4 additions & 0 deletions tests/test_data/insert-node.left.html
@@ -0,0 +1,4 @@
+<body>
+<div id="id">
+</div>
+</body>
4 changes: 0 additions & 4 deletions tests/test_data/insert-node.left.rml

This file was deleted.

5 changes: 5 additions & 0 deletions tests/test_data/insert-node.right.html
@@ -0,0 +1,5 @@
+<body>
+<div id="id">
+<p>Simple text</p>
+</div>
+</body>
7 changes: 0 additions & 7 deletions tests/test_data/insert-node.right.rml

This file was deleted.

7 changes: 0 additions & 7 deletions tests/test_data/no-text-substitutions.expected.rml

This file was deleted.

4 changes: 0 additions & 4 deletions tests/test_data/no-text-substitutions.left.rml

This file was deleted.

5 changes: 0 additions & 5 deletions tests/test_data/no-text-substitutions.right.rml

This file was deleted.


0 comments on commit 51bc20e
