XMLProcessor: Support namespaces #126
Merged
Add Namespace Support to XMLProcessor
This PR upgrades `XMLProcessor` to fully support XML Namespaces 1.0. Tags and attributes are now consistently interpreted according to their declared namespaces, fixing compatibility with WordPress WXR files and EPUB metadata.

New method signatures:
Usage comparison:
Rationale
The old parser treated tag and attribute names as opaque strings (`wp:postmeta`, `wp:tag`, etc.), ignoring that these were syntactic sugar for `{namespace}local-name`.

This made it impossible to reliably parse WXR files, because the `wp:` prefix may refer to different namespaces in different parts of the XML document.

After this PR, XML namespaces are first-class citizens in all lookup functions, which allows us to correctly identify the content-bearing tags in the relevant, top-level WXR namespace.
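To make the prefix problem concrete, here is a minimal standalone sketch of how a qualified element name can be resolved against a stack of namespace scopes. It is not the code from this PR: `resolve_qualified_name()` and the `http://example.com/other/` namespace are hypothetical names used only for illustration, and the sketch covers element names only (in XML Namespaces 1.0, unprefixed attributes do not inherit the default namespace).

```php
<?php
/**
 * Hypothetical helper, not part of XMLProcessor: resolves a qualified
 * element name such as "wp:post" to array( $namespace, $local_name ).
 *
 * $scopes is a list of prefix => namespace maps, outermost element first.
 * The default namespace, if declared, is stored under the '' prefix.
 */
function resolve_qualified_name( string $qname, array $scopes ): array {
	$parts  = explode( ':', $qname, 2 );
	$prefix = 2 === count( $parts ) ? $parts[0] : '';
	$local  = 2 === count( $parts ) ? $parts[1] : $parts[0];

	// The innermost declaration wins, so walk the stack from the top down.
	for ( $i = count( $scopes ) - 1; $i >= 0; $i-- ) {
		if ( array_key_exists( $prefix, $scopes[ $i ] ) ) {
			return array( $scopes[ $i ][ $prefix ], $local );
		}
	}

	// An unprefixed element without a default namespace has no namespace.
	if ( '' === $prefix ) {
		return array( '', $local );
	}

	throw new InvalidArgumentException( "Undeclared namespace prefix: {$prefix}" );
}

// The same "wp:post" string names two different elements depending on
// which declarations are in scope.
$outer = array( 'wp' => 'http://wp.org/export/1.2/' );
$inner = array( 'wp' => 'http://example.com/other/' ); // hypothetical redeclaration

$top    = resolve_qualified_name( 'wp:post', array( $outer ) );
$nested = resolve_qualified_name( 'wp:post', array( $outer, $inner ) );
$plain  = resolve_qualified_name( 'root', array( $outer ) );
```

Here `$top` is `['http://wp.org/export/1.2/', 'post']` while `$nested` is `['http://example.com/other/', 'post']`, which is why comparing the raw `wp:post` string cannot reliably identify the tag.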
Implementation Details
- `$stack_of_open_elements` tracks the hierarchy of `XMLElement` frames and the namespaces they define and remove.
- `set_attribute($ns, $attr, $value)` and `get_attribute($ns, $attr)` accept the full namespace string as their first argument to force the developer to take it into consideration.
- `next_tag()` and `matches_breadcrumbs()` accept two-tuples `{$namespace, $local_tag_name}` in place of string-based tag names. String tag names are still accepted, and `*` wildcards are supported, too.
- `get_breadcrumbs()` returns an array of two-tuples `{$namespace, $local_tag_name}`, e.g. `[['', 'root'], ['http://wp.org/export/1.2/', 'post']]`.

Testing instructions
Confirm that the CI tests pass (aside from the flaky network-related ones).
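For reviewers who want a feel for the two-tuple breadcrumb matching described under Implementation Details, the following standalone sketch may help. It is an illustration under stated assumptions, not the `matches_breadcrumbs()` implementation from this PR: `breadcrumbs_match()` is a hypothetical function, and it assumes that the pattern is compared against the innermost open elements and that `*` may stand for a whole element or for either half of a tuple.

```php
<?php
/**
 * Hypothetical stand-in for illustration only.
 *
 * $open_elements is a list of array( $namespace, $local_name ) tuples,
 * outermost first, in the shape get_breadcrumbs() is described as returning.
 * $pattern is a list of tuples; the string '*' matches any element.
 */
function breadcrumbs_match( array $open_elements, array $pattern ): bool {
	$pattern = array_values( $pattern );
	if ( count( $pattern ) > count( $open_elements ) ) {
		return false;
	}

	// Assumption: align the pattern with the innermost open elements.
	$offset = count( $open_elements ) - count( $pattern );

	foreach ( $pattern as $i => $crumb ) {
		// A bare '*' is shorthand for "any namespace, any local name".
		list( $ns, $local )           = '*' === $crumb ? array( '*', '*' ) : $crumb;
		list( $open_ns, $open_local ) = $open_elements[ $offset + $i ];

		if ( '*' !== $ns && $ns !== $open_ns ) {
			return false;
		}
		if ( '*' !== $local && $local !== $open_local ) {
			return false;
		}
	}

	return true;
}

$open = array(
	array( '', 'root' ),
	array( 'http://wp.org/export/1.2/', 'post' ),
);

// Exact namespace and local name of the innermost element: true.
$exact = breadcrumbs_match( $open, array( array( 'http://wp.org/export/1.2/', 'post' ) ) );

// Any parent element followed by the namespaced post element: true.
$wildcard = breadcrumbs_match( $open, array( '*', array( 'http://wp.org/export/1.2/', 'post' ) ) );

// Same local name but no namespace: false.
$wrong_ns = breadcrumbs_match( $open, array( array( '', 'post' ) ) );
```

The last case is the one the old string-based matching could not express: a `post` element outside the WXR namespace no longer matches.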