Skip to content

Commit

Permalink
Drop defusedxml dependency; add security notes
Browse files Browse the repository at this point in the history
- Drop defusedxml dependency (its support for lxml is deprecated)

- Add equivalent centralized configuration for lxml parser
  (resolve_entities=False)

- Add tests from defusedxml

- Add security documentation and links

Fixes #132
  • Loading branch information
kislyuk committed Nov 30, 2019
1 parent 1fda670 commit 83c05fb
Show file tree
Hide file tree
Showing 14 changed files with 113 additions and 44 deletions.
63 changes: 33 additions & 30 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -8,16 +8,16 @@ payload security in `SAML 2.0 <http://en.wikipedia.org/wiki/SAML_2.0>`_ and
(`Version 1.1 <http://www.w3.org/TR/xmldsig-core1/>`_ and `Version 2.0 <http://www.w3.org/TR/xmldsig-core2>`_).
*SignXML* implements all of the required components of the standard, and most recommended ones. Its features are:

* Use of `defusedxml.lxml <https://bitbucket.org/tiran/defusedxml>`_ to defend against common XML-based attacks when
verifying signatures
* Use of a libxml2-based XML parser configured to defend against
`common XML attacks <https://docs.python.org/3/library/xml.html#xml-vulnerabilities>`_ when verifying signatures
* Extensions to allow signing with and verifying X.509 certificate chains, including hostname/CN validation
* Support for exclusive XML canonicalization with inclusive prefixes (`InclusiveNamespaces PrefixList
<http://www.w3.org/TR/xml-exc-c14n/#def-InclusiveNamespaces-PrefixList>`_, required to verify signatures generated by
some SAML implementations)
* Modern Python compatibility (2.7-3.8+ and PyPy)
* Well-supported, portable, reliable dependencies: `lxml <https://github.com/lxml/lxml>`_, `defusedxml
<https://bitbucket.org/tiran/defusedxml>`_, `cryptography <https://github.com/pyca/cryptography>`_, `eight
<https://github.com/kislyuk/eight>`_, `pyOpenSSL <https://github.com/pyca/pyopenssl>`_
* Well-supported, portable, reliable dependencies: `lxml <https://github.com/lxml/lxml>`_,
`cryptography <https://github.com/pyca/cryptography>`_, `eight <https://github.com/kislyuk/eight>`_,
`pyOpenSSL <https://github.com/pyca/pyopenssl>`_
* Comprehensive testing (including the XMLDSig interoperability suite) and `continuous integration
<https://travis-ci.org/XML-Security/signxml>`_
* Simple interface with useful defaults
Expand All @@ -33,29 +33,26 @@ Note: SignXML depends on `lxml <https://github.com/lxml/lxml>`_ and `cryptograph
<https://github.com/pyca/cryptography>`_, which in turn depend on `OpenSSL <https://www.openssl.org/>`_, `LibXML
<http://xmlsoft.org/>`_, and Python tools to interface with them. You can install those as follows:

+--------------+---------+-------------------------------------------------------------------------------------------------------------+
| OS | Python | Command |
+==============+=========+=============================================================================================================+
| Ubuntu | Python 2| ``apt-get install python-dev python-cffi libxml2-dev libxslt1-dev libssl-dev libffi-dev`` |
| | | ``python-lxml python-cryptography python-openssl python-certifi python-defusedxml build-essential`` |
+--------------+---------+-------------------------------------------------------------------------------------------------------------+
| Ubuntu | Python 3| ``apt-get install python3-dev python3-cffi libxml2-dev libxslt1-dev libssl-dev libffi-dev`` |
| | | ``python3-lxml python3-cryptography python3-openssl python3-certifi python3-defusedxml build-essential`` |
+--------------+---------+-------------------------------------------------------------------------------------------------------------+
| Red Hat | Python 2| ``yum install python-devel python-cffi libxml2-devel libxslt1-devel openssl-devel`` |
+--------------+---------+-------------------------------------------------------------------------------------------------------------+
| Red Hat | Python 3| ``yum install python3-devel python3-cffi libxml2-devel libxslt1-devel openssl-devel`` |
+--------------+---------+-------------------------------------------------------------------------------------------------------------+
| Mac OS | Homebrew| Install `Homebrew <https://brew.sh>`_, then run ``brew install python``. |
| | Python | |
+--------------+---------+-------------------------------------------------------------------------------------------------------------+
+--------------+---------+---------------------------------------------------------------------------------------------+
| OS | Python | Command |
+==============+=========+=============================================================================================+
| Ubuntu | Python 2| ``apt-get install python-dev python-cffi libxml2-dev libxslt1-dev libssl-dev libffi-dev`` |
| | | ``python-lxml python-cryptography python-openssl python-certifi build-essential`` |
+--------------+---------+---------------------------------------------------------------------------------------------+
| Ubuntu | Python 3| ``apt-get install python3-dev python3-cffi libxml2-dev libxslt1-dev libssl-dev libffi-dev`` |
| | | ``python3-lxml python3-cryptography python3-openssl python3-certifi build-essential`` |
+--------------+---------+---------------------------------------------------------------------------------------------+
| Red Hat | Python 2| ``yum install python-devel python-cffi libxml2-devel libxslt1-devel openssl-devel`` |
+--------------+---------+---------------------------------------------------------------------------------------------+
| Red Hat | Python 3| ``yum install python3-devel python3-cffi libxml2-devel libxslt1-devel openssl-devel`` |
+--------------+---------+---------------------------------------------------------------------------------------------+
| Mac OS | Homebrew| Install `Homebrew <https://brew.sh>`_, then run ``brew install python``. |
| | Python | |
+--------------+---------+---------------------------------------------------------------------------------------------+

Synopsis
--------

SignXML uses the `lxml ElementTree API <https://lxml.de/tutorial.html>`_ to work with XML data. See the
*Compatibility with xml.etree.ElementTree* section below if using the
`ElementTree API from the standard library <https://docs.python.org/3/library/xml.etree.elementtree.html>`_.
SignXML uses the `lxml ElementTree API <https://lxml.de/tutorial.html>`_ to work with XML data.

.. code-block:: python
Expand Down Expand Up @@ -142,13 +139,19 @@ For detached signatures, the code above will use the ``Id`` or ``ID`` attribute

See the `API documentation <https://signxml.readthedocs.io/en/latest/#id4>`_ for more.

Compatibility with ``xml.etree.ElementTree``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
XML parsing security and compatibility with ``xml.etree.ElementTree``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SignXML uses the `lxml <https://github.com/lxml/lxml>`_ ElementTree library, not the
`ElementTree from Python's standard library <https://docs.python.org/3.8/library/xml.etree.elementtree.html>`_,
to work with XML. lxml is used due to its XML canonicalization and namespace organization features. Using SignXML
with ``xml.etree.ElementTree`` is possible, but care must be taken to work around some of its namespace manipulation
issues, as covered in `this blog post <https://technotes.shemyak.com/posts/xml-signatures-with-python-elementtree/>`_.
to work with XML. lxml is used due to its superior resistance to XML attacks, as well as XML canonicalization and
namespace organization features. It is recommended that you pass XML string input directly to signxml before further
parsing, and use lxml to work with untrusted XML input in general. If you do pass trusted ``xml.etree.ElementTree``
objects to SignXML, you should be aware of differences in XML namespace handling between the two libraries. See the
following references for more information:

* `How do I use lxml safely as a web-service endpoint? <https://lxml.de/FAQ.html#how-do-i-use-lxml-safely-as-a-web-service-endpoint>`_
* `ElementTree compatibility of lxml.etree <https://lxml.de/compatibility.html>`_
* `XML Signatures with Python ElementTree <https://technotes.shemyak.com/posts/xml-signatures-with-python-elementtree>`_

Authors
-------
Expand Down
1 change: 0 additions & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,6 @@
long_description=open('README.rst').read(),
install_requires=[
'lxml >= 4.2.1, < 5',
'defusedxml >= 0.5.0, < 1',
'eight >= 0.4.2, < 1',
'cryptography >= 2.1.4, < 3',
'asn1crypto >= 0.24.0',
Expand Down
13 changes: 7 additions & 6 deletions signxml/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@
from eight import str, bytes
from lxml import etree
from lxml.etree import Element, SubElement
from defusedxml.lxml import fromstring

from cryptography.hazmat.primitives.asymmetric import rsa, dsa, ec
from cryptography.hazmat.primitives.asymmetric.padding import PKCS1v15
Expand Down Expand Up @@ -213,7 +212,7 @@ def _c14n(self, nodes, algorithm, inclusive_ns_prefixes=None):
c14n = b""
for node in nodes:
c14n += etree.tostring(node, method="c14n", exclusive=exclusive, with_comments=with_comments,
inclusive_ns_prefixes=inclusive_ns_prefixes) # TODO: optimize if needed
inclusive_ns_prefixes=inclusive_ns_prefixes)
if exclusive is False:
# TODO: there must be a nicer way to do this. See also:
# http://www.w3.org/TR/xml-c14n, "namespace axis"
Expand Down Expand Up @@ -661,7 +660,9 @@ def verify(self, data, require_x509=True, x509_cert=None, cert_subject_name=None
:type hmac_key: string
:param validate_schema: Whether to validate **data** against the XML Signature schema.
:type validate_schema: boolean
:param parser: Custom XML parser instance to use when parsing **data**.
:param parser:
Custom XML parser instance to use when parsing **data**. The default parser arguments used by SignXML are:
``resolve_entities=False``. See https://lxml.de/FAQ.html#how-do-i-use-lxml-safely-as-a-web-service-endpoint.
:type parser: :py:class:`lxml.etree.XMLParser` compatible parser
:param uri_resolver: Function to use to resolve reference URIs that don't start with "#".
:type uri_resolver: callable
Expand Down Expand Up @@ -694,7 +695,7 @@ def verify(self, data, require_x509=True, x509_cert=None, cert_subject_name=None
signature_ref = self._get_signature(root)

# HACK: deep copy won't keep root's namespaces
signature = fromstring(etree.tostring(signature_ref), parser=parser)
signature = self.fromstring(self.tostring(signature_ref))

if validate_schema:
self.schema().assertValid(signature)
Expand Down Expand Up @@ -762,7 +763,7 @@ def verify(self, data, require_x509=True, x509_cert=None, cert_subject_name=None

verify_results = []
for reference in self._findall(signed_info, "Reference"):
copied_root = fromstring(etree.tostring(root), parser=parser)
copied_root = self.fromstring(self.tostring(root))
copied_signature_ref = self._get_signature(copied_root)
transforms = self._find(reference, "Transforms", require=False)
digest_algorithm = self._find(reference, "DigestMethod").get("Algorithm")
Expand All @@ -774,7 +775,7 @@ def verify(self, data, require_x509=True, x509_cert=None, cert_subject_name=None

# We return the signed XML (and only that) to ensure no access to unsigned data happens
try:
payload_c14n_xml = fromstring(payload_c14n)
payload_c14n_xml = self.fromstring(payload_c14n)
except etree.XMLSyntaxError:
payload_c14n_xml = None
verify_results.append(VerifyResult(payload_c14n, payload_c14n_xml, signature))
Expand Down
20 changes: 14 additions & 6 deletions signxml/util/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,8 @@

from eight import str, bytes
from lxml import etree
from defusedxml.lxml import fromstring

from ..exceptions import RedundantCert, InvalidCertificate
from ..exceptions import RedundantCert, InvalidCertificate, InvalidInput

USING_PYTHON2 = True if sys.version_info < (3, 0) else False

Expand Down Expand Up @@ -134,21 +133,30 @@ def schema(cls):
def parser(self):
if self._parser is None:
if self._default_parser is None:
self._default_parser = etree.XMLParser()
self._default_parser = etree.XMLParser(resolve_entities=False)
return self._default_parser
return self._parser

def fromstring(self, xml_string, **kwargs):
xml_node = etree.fromstring(xml_string, parser=self.parser, **kwargs)
for entity in xml_node.iter(etree.Entity):
raise InvalidInput("Entities are not supported in XML input")
return xml_node

def tostring(self, xml_node, **kwargs):
return etree.tostring(xml_node, **kwargs)

def get_root(self, data):
if isinstance(data, (str, bytes)):
return fromstring(data, parser=self.parser)
return self.fromstring(data)
elif isinstance(data, stdlibElementTree.Element):
# TODO: add debug level logging statement re: performance impact here
return fromstring(stdlibElementTree.tostring(data, encoding="utf-8"))
return self.fromstring(stdlibElementTree.tostring(data, encoding="utf-8"))
else:
# HACK: deep copy won't keep root's namespaces resulting in an invalid digest
# We use a copy so we can modify the tree
# TODO: turn this off for xmlenc
return fromstring(etree.tostring(data))
return self.fromstring(etree.tostring(data))


def hmac_sha1(key, message):
Expand Down
5 changes: 5 additions & 0 deletions test/defusedxml-test-data/cyclic.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
<!DOCTYPE xmlbomb [
<!ENTITY a "123 &b;" >
<!ENTITY b "&a;">
]>
<bomb>&a;</bomb>
8 changes: 8 additions & 0 deletions test/defusedxml-test-data/dtd.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head/>
<body>text</body>
</html>

4 changes: 4 additions & 0 deletions test/defusedxml-test-data/external.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
<!DOCTYPE external [
<!ENTITY ee SYSTEM "http://www.w3schools.com/xml/note.xml">
]>
<root>&ee;</root>
5 changes: 5 additions & 0 deletions test/defusedxml-test-data/external_file.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
<!DOCTYPE external [
<!ENTITY ee SYSTEM "file:///PATH/TO/xmltestdata/simple.xml">
]>
<root>&ee;</root>

4 changes: 4 additions & 0 deletions test/defusedxml-test-data/quadratic.xml

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions test/defusedxml-test-data/simple-ns.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
<?pi data?>
<!-- comment -->
<root xmlns='namespace'>
<element key='value'>text</element>
<element>text</element>tail
<empty-element/>
</root>
6 changes: 6 additions & 0 deletions test/defusedxml-test-data/simple.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
<!-- comment -->
<root>
<element key='value'>text</element>
<element>text</element>tail
<empty-element/>
</root>
7 changes: 7 additions & 0 deletions test/defusedxml-test-data/xmlbomb.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
<!DOCTYPE xmlbomb [
<!ENTITY a "1234567890" >
<!ENTITY b "&a;&a;&a;&a;&a;&a;&a;&a;">
<!ENTITY c "&b;&b;&b;&b;&b;&b;&b;&b;">
<!ENTITY d "&c;&c;&c;&c;&c;&c;&c;&c;">
]>
<bomb>&c;</bomb>
4 changes: 4 additions & 0 deletions test/defusedxml-test-data/xmlbomb2.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
<!DOCTYPE xmlbomb [
<!ENTITY a "1234567890">
]>
<root>text<bomb>&a;</bomb><tag/></root>
10 changes: 9 additions & 1 deletion test/test.py
Original file line number Diff line number Diff line change
Expand Up @@ -278,7 +278,9 @@ def get_ca_pem_file(signature_file):
self.assertIsInstance(e, InvalidCertificate)
elif signature_file.endswith("signature-retrievalmethod-rawx509crt.xml"):
self.assertIsInstance(e, InvalidInput)
elif any(x in signature_file for x in unsupported_cases) or "EntitiesForbidden" in str(e):
elif signature_file.endswith("merlin-xmldsig-twenty-three/signature.xml"):
self.assertIsInstance(e, InvalidInput)
elif any(x in signature_file for x in unsupported_cases):
print("Unsupported test case:", type(e), e)
elif any(x in signature_file for x in bad_interop_cases) or "Unable to resolve reference" in str(e):
print("Bad interop test case:", type(e), e)
Expand Down Expand Up @@ -389,5 +391,11 @@ def test_psha1(self):
self.assertEqual(p_sha1(a, b), c)
self.assertEqual(p_sha1(b, a), d)

def test_xml_attacks(self):
for filename in glob(os.path.join(os.path.dirname(__file__), "defusedxml-test-data", "*.xml")):
with open(filename, "rb") as fh:
with self.assertRaises((InvalidInput, etree.XMLSyntaxError)):
XMLVerifier().verify(fh.read())

if __name__ == '__main__':
unittest.main()

0 comments on commit 83c05fb

Please sign in to comment.