Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define options to suppress warnings generated by invalid iXML content, and add to the defences of XML entity injection. #376

Merged

Conversation

ben-xo
Copy link
Contributor

@ben-xo ben-xo commented Apr 17, 2022

This PR adds LIBXML_WARNING, LIBXML_NONET and LIBXML_COMPACT (if supported), in addition to LIBXML_NOENT, to simplexml_load_string(), which is currently only used for parsing RIFF iXML (http://www.gallery.co.uk/ixml/)

  • LIBXML_NONET, to deny access to entites included remotely. This is the most interesting one, as there is other code nearby which is supposed to address the same issue.
  • LIBXML_NOWARNING, to suppress recoverable parsing warnings which we can't do anything about (and may even be deliberate)
  • If your version of libxml supports it, LIBXML_COMPACT, which is described at https://www.php.net/manual/en/libxml.constants.php/ as "Activate small nodes allocation optimization. This may speed up your application without needing to change the code." The gotcha is that the DOM is readonly, but we're not using this code to manipulate tags so that should be fine.

I'll explain the rationale for each.

Invalid XML / LIBXML_NOWARNING

I encountered a .wav file with an iXML studio mastering tag in my collection. Here's an excerpt from the file:

<?xml version="1.0" encoding="UTF-8"?>
...
<PROJECT>Optiv & CZA - Invisible Things-13 Mixdown-32 NEW 1--22</PROJECT>
...
<BWF_CODING_HISTORY>A=PCM,F=44100,W=16,T=SOUND FORGE Pro 12.1,
</BWF_CODING_HISTORY>

As you can see, the <PROJECT> tag is technically invalid because it has a nameless entity (&). The exact warning is
PHP Warning: simplexml_load_string(): Entity: line 3: parser error : xmlParseEntityRef: no name.

This is clearly both recoverable and harmless, and may even be intentional given that Sound Forge Pro 12 is not old (2018).

Suppressing parser warnings seems to fit with the other warning suppressions nearby in the code.

XML Inclusion / LIBXML_NONET

This one's more interesting. In commit afbdaa0 (2014) additional code was added by the wordpress team with the commit "improved XXE fix". The change adds option LIBXML_NOENT, and references the article http://websec.io/2012/08/27/Preventing-XEE-in-PHP.html.

The problem is that I think LIBXML_NOENT was a typo and should have been LIBXML_NONET. The article does not mention NOENT at all.

Ironically, the NOENT option does the exact opposite of what it sounds like it does - it enables entity parsing. Refs:

I suspect this was a mistake by the original author. Nevertheless, it's fairly harmless to have entities enabled in the XML as long as it's not possible to do remote inclusions, and that's already disabled with libxml_disable_entity_loader().

Adding LIBXML_NONET (as referenced in the article to improve XXE defences) prevents libxml from using the network.

I did not remove LIBXML_NOENT as theoretically you can define your own entites inline at the top of an XML doc, even though I very much doubt it would work properly if you did, given the first bug addressed in this PR.

Performance / LIBXML_COMPACT

If your version of libxml supports it this flag "Activate small nodes allocation optimization. This may speed up your application without needing to change the code." according to https://www.php.net/manual/en/libxml.constants.php . The gotcha is that the returned DOM object is readonly, but we're not using this code to manipulate a DOM so that should be fine.

…, and add to the defences of XML entity injection.

This PR adds LIBXML_WARNING, LIBXML_NONET and LIBXML_COMPACT (if supported), in addition to LIBXML_NOENT, to simplexml_load_string(), which is currently only used for parsing RIFF iXML (http://www.gallery.co.uk/ixml/)

* LIBXML_NONET, to deny access to entites included remotely. This is the most interesting one, as there is other code nearby which is supposed to address the same issue.
* LIBXML_NOWARNING, to suppress recoverable parsing warnings which we can't do anything about (and may even be deliberate)
* If your version of libxml supports it, LIBXML_COMPACT, which is described at https://www.php.net/manual/en/libxml.constants.php/ as "Activate small nodes allocation optimization. This may speed up your     application without needing to change the code." The gotcha is that the DOM is readonly, but we're not using this code to manipulate tags so that should be fine.

I'll explain the rationale for each.

Invalid XML / LIBXML_NOWARNING
==============================

I encountered a .wav file with an iXML studio mastering tag in my collection. Here's an excerpt from the file:

    <?xml version="1.0" encoding="UTF-8"?>
    ...
    <PROJECT>Optiv & CZA - Invisible Things-13 Mixdown-32 NEW 1--22</PROJECT>
    ...
    <BWF_CODING_HISTORY>A=PCM,F=44100,W=16,T=SOUND FORGE Pro 12.1,
    </BWF_CODING_HISTORY>

As you can see, the `<PROJECT>` tag is technically invalid because it has a nameless entity (`&`). The exact warning is
`PHP Warning:  simplexml_load_string(): Entity: line 3: parser error : xmlParseEntityRef: no name`.

This is clearly both recoverable and harmless, and may even be intentional given that Sound Forge Pro 12 is not old (2018).

Suppressing parser warnings seems to fit with the other warning suppressions nearby in the code.

XML Inclusion / LIBXML_NONET
============================

This one's more interesting. In commit `afbdaa04` (2014) additional code was added by the wordpress team with the commit "improved XXE fix". The change adds option LIBXML_NOENT, and references the article http://websec.io/2012/08/27/Preventing-XEE-in-PHP.html.

The problem is that I think LIBXML_NOENT was a typo and should have been LIBXML_NONET. The article does not mention NOENT at all.

Ironically, the NOENT option does the _exact opposite_ of what it sounds like it does - it *enables* entity parsing. Refs:
* https://www.php.net/manual/en/libxml.constants.php - comment "The name of the constant LIBXML_NOENT is very misleading. Adding this flag actually causes the parser to load and insert the external entities. Omitting it leaves the tags untouched, which is probably what you want."
* https://stackoverflow.com/q/38807506/367456 - "What does LIBXML_NOENT do (and why isn't it called LIBXML_ENT)?"

I suspect this was a mistake by the original author. Nevertheless, it's fairly harmless to have entities enabled in the XML as long as it's not possible to do remote inclusions, and that's already disabled with `libxml_disable_entity_loader()`.

Adding LIBXML_NONET (as referenced in the article to improve XXE defences) prevents libxml from using the network.

I did not remove LIBXML_NOENT as theoretically you can define your own entites inline at the top of an XML doc, even though I very much doubt it would work properly if you did, given the first bug addressed in this PR.

Performance / LIBXML_COMPACT
============================

If your version of libxml supports it this flag "Activate small nodes allocation optimization. This may speed up your application without needing to change the code." according to https://www.php.net/manual/en/libxml.constants.php . The gotcha is that the returned DOM object is readonly, but we're not using this code to manipulate a DOM so that should be fine.
@ben-xo ben-xo changed the title Define options to suppress warnings generated by invalid iXML content… Define options to suppress warnings generated by invalid iXML content, and add to the defences of XML entity injection. Apr 17, 2022
@JamesHeinrich JamesHeinrich merged commit 00bd8bf into JamesHeinrich:master Apr 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants