Define options to suppress warnings generated by invalid iXML content, and add to the defences of XML entity injection. #376
This PR adds `LIBXML_NOWARNING`, `LIBXML_NONET` and `LIBXML_COMPACT` (if supported), in addition to `LIBXML_NOENT`, to the `simplexml_load_string()` call, which is currently only used for parsing RIFF iXML (http://www.gallery.co.uk/ixml/):

- `LIBXML_NONET`, to deny access to entities included remotely. This is the most interesting one, as there is other code nearby which is supposed to address the same issue.
- `LIBXML_NOWARNING`, to suppress recoverable parsing warnings which we can't do anything about (and which may even be deliberate).
- `LIBXML_COMPACT`, which is described at https://www.php.net/manual/en/libxml.constants.php/ as "Activate small nodes allocation optimization. This may speed up your application without needing to change the code." The gotcha is that the DOM is read-only, but we're not using this code to manipulate tags, so that should be fine.

I'll explain the rationale for each.
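As a rough illustration of how the flags listed above combine, here is a minimal sketch. It is not the code from this PR: the helper names `ixml_parse_options()` and `parse_ixml()` and the sample document are made up for the example, and it assumes `LIBXML_COMPACT` is simply left undefined on builds whose libxml does not support it.

```php
<?php
// Hypothetical helper (not from this PR): builds the libxml option bitmask
// described above.
function ixml_parse_options(): int
{
    $options = LIBXML_NOENT      // already present in the existing code
             | LIBXML_NONET      // forbid network access while parsing
             | LIBXML_NOWARNING; // suppress parser warning reports

    // Assumption: the constant is only defined when the underlying libxml
    // supports it, so guard it instead of referencing it unconditionally.
    if (defined('LIBXML_COMPACT')) {
        $options |= LIBXML_COMPACT;
    }

    return $options;
}

// Hypothetical wrapper: parses an iXML chunk with the options above.
// Returns a SimpleXMLElement, or false if the document cannot be parsed.
function parse_ixml(string $xml)
{
    return simplexml_load_string($xml, 'SimpleXMLElement', ixml_parse_options());
}

// Made-up, well-formed sample document.
$doc = parse_ixml('<BWFXML><PROJECT>Demo</PROJECT></BWFXML>');
echo (string) $doc->PROJECT, PHP_EOL; // prints "Demo"
```

The options are plain integer bit flags, so they are combined with `|` and passed as the third argument of `simplexml_load_string()`.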
Invalid XML / `LIBXML_NOWARNING`
I encountered a .wav file with an iXML studio mastering tag in my collection. Here's an excerpt from the file:
As you can see, the `<PROJECT>` tag is technically invalid because it contains a nameless entity (`&`). The exact warning is `PHP Warning: simplexml_load_string(): Entity: line 3: parser error : xmlParseEntityRef: no name`.

This is clearly both recoverable and harmless, and may even be intentional given that Sound Forge Pro 12 is not old (2018).

Suppressing parser warnings seems to fit with the other warning suppressions nearby in the code.
XML Inclusion / `LIBXML_NONET`
This one's more interesting. In commit afbdaa0 (2014), additional code was added by the WordPress team with the commit message "improved XXE fix". The change adds the option `LIBXML_NOENT` and references the article http://websec.io/2012/08/27/Preventing-XEE-in-PHP.html.

The problem is that I think `LIBXML_NOENT` was a typo and should have been `LIBXML_NONET`. The article does not mention NOENT at all.

Ironically, the `NOENT` option does the exact opposite of what it sounds like it does: it enables entity parsing. Refs:

- "`LIBXML_NOENT` is very misleading. Adding this flag actually causes the parser to load and insert the external entities. Omitting it leaves the tags untouched, which is probably what you want."
- "What does `LIBXML_NOENT` do (and why isn't it called `LIBXML_ENT`)?"

I suspect this was a mistake by the original author. Nevertheless, it's fairly harmless to have entities enabled in the XML as long as it's not possible to do remote inclusions, and that's already disabled with `libxml_disable_entity_loader()`.

Adding `LIBXML_NONET` (as referenced in the article to improve XXE defences) prevents libxml from using the network.

I did not remove `LIBXML_NOENT`, as theoretically you can define your own entities inline at the top of an XML doc, even though I very much doubt it would work properly if you did, given the first bug addressed in this PR.

Performance / `LIBXML_COMPACT`
If your version of libxml supports it, this flag will "Activate small nodes allocation optimization. This may speed up your application without needing to change the code.", according to https://www.php.net/manual/en/libxml.constants.php. The gotcha is that the returned DOM object is read-only, but we're not using this code to manipulate a DOM, so that should be fine.
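To make the earlier point about `LIBXML_NOENT` concrete (it turns entity substitution on rather than off), here is a small sketch. It uses `DOMDocument` instead of `simplexml_load_string()`, purely because the DOM API exposes unexpanded entity references as `DOMEntityReference` nodes. The helper name `has_entity_reference()` and the sample document are invented for illustration, and only an inline (internal) entity is involved, so nothing is fetched from disk or the network.

```php
<?php
// Illustrative only: reports whether the root element still has an
// unexpanded entity reference node among its direct children after parsing.
function has_entity_reference(string $xml, int $options): bool
{
    $dom = new DOMDocument();
    $dom->loadXML($xml, $options);

    foreach ($dom->documentElement->childNodes as $child) {
        if ($child instanceof DOMEntityReference) {
            return true;
        }
    }

    return false;
}

// Made-up document that declares an internal entity inline.
$xml = '<!DOCTYPE r [<!ENTITY who "world">]><r>hello &who;</r>';

// Without LIBXML_NOENT the reference is kept as a node of its own.
var_dump(has_entity_reference($xml, 0));

// With LIBXML_NOENT the parser substitutes the entity's text in place;
// LIBXML_NONET is included to mirror the combination discussed in this PR.
var_dump(has_entity_reference($xml, LIBXML_NOENT | LIBXML_NONET));
```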