Enhance the sitemap stylesheet to dynamically generate columns in the table without knowing a priori what elements will appear in the sitemap. #153
…olumn headings output by the sitemap stylesheet. Also introduces `esc_xml()` and `esc_xml__()` functions (which are intended to be included in the core merge proposal) that are the equivalent of `esc_html()` and `esc_html__()` but do XML-specific escaping.
 * arrays whose keys are local names and
 * whose values are column headings.
 */
$column_headings = apply_filters( 'core_sitemaps_stylesheet_column_headings', $column_headings );
Here's an example of how to use this new filter with the current state of this plugin (i.e., when extension elements are in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace):
add_filter( 'core_sitemaps_posts_url_list', function( $url_list ) {
	foreach ( $url_list as &$url_item ) {
		$url_item['my-extension-element'] = __( 'some value', 'my-plugin' );
	}
	return $url_list;
} );
add_filter( 'core_sitemaps_stylesheet_column_headings', function( $column_headings ) {
	$column_headings['http://www.sitemaps.org/schemas/sitemap/0.9']['loc'] = __( 'Permalink', 'my-plugin' );
	$column_headings['http://www.sitemaps.org/schemas/sitemap/0.9']['my-extension-element'] = __( 'Cool custom element', 'my-plugin' );
	// A filter callback must return the (modified) value.
	return $column_headings;
} );
And here's how it would be used if/when something like the proposal in #151 (comment) is incorporated in this plugin:
add_filter( 'core_site_maps_namespace_bindings', function( $namespace_bindings ) {
	$namespace_bindings['my-plugin'] = 'urn:my-plugin';
	// A filter callback must return the (modified) value.
	return $namespace_bindings;
} );
add_filter( 'core_sitemaps_posts_url_list', function( $url_list ) {
	foreach ( $url_list as &$url_item ) {
		$url_item['my-plugin:extension-element'] = __( 'some value', 'my-plugin' );
	}
	return $url_list;
} );
add_filter( 'core_sitemaps_stylesheet_column_headings', function( $column_headings ) {
	$column_headings['http://www.sitemaps.org/schemas/sitemap/0.9']['loc'] = __( 'Permalink', 'my-plugin' );
	$column_headings['urn:my-plugin']['extension-element'] = __( 'Cool custom element', 'my-plugin' );
	return $column_headings;
} );
inc/functions.php
Outdated
function esc_xml( $text ) {
	$safe_text = wp_check_invalid_utf8( $text );
	$safe_text = _wp_specialchars( $safe_text, ENT_QUOTES );
	$safe_text = html_entity_decode( $safe_text, ENT_HTML5 );
The reason for these new `esc_xml()` and `esc_xml__()` functions is that far too many developers think that since they can use `&amp;`, `&apos;`, `&quot;`, etc in XML then they can **also** use all the named character references they are used to using in HTML, e.g., `&nbsp;`, `&hellip;`, etc...but they cannot...and doing so will result in a non-well-formed XML instance.
The call to `html_entity_decode( $safe_text, ENT_HTML5 )` will replace all of the named character references defined in the HTML spec with their equivalent Unicode code points (e.g., `&nbsp;` will become `\xA0`, etc).
It would be nice if PHP had a native function that would replace them with numeric character references (e.g., `&nbsp;` would become `&#xA0;`) but unfortunately it doesn't :-(
Note that all uses of `esc_attr()` in this plugin (e.g., in `Core_Sitemaps_Renderer::get_sitemap_xml()`) should be replaced with calls to `esc_xml()`, but I thought it best to keep this PR strictly related to the stylesheet. Will do another PR for that.
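For illustration only, here is a minimal sketch of the "replace named references with numeric character references" idea mentioned above. It is not part of this PR or of WordPress: the function name `my_plugin_named_to_numeric_refs()` is hypothetical, and the sketch assumes PHP 7.4+ with the mbstring extension (for `mb_str_split()` and `mb_ord()`).

```php
<?php
/**
 * Hypothetical helper (an assumption, not code from this PR): rewrites HTML
 * named character references as numeric ones so the text stays well-formed XML.
 */
function my_plugin_named_to_numeric_refs( $text ) {
	// The five entities predefined by XML itself are left untouched.
	$xml_predefined = array( '&amp;', '&lt;', '&gt;', '&quot;', '&apos;' );

	return preg_replace_callback(
		'/&[A-Za-z][A-Za-z0-9]*;/',
		function ( $matches ) use ( $xml_predefined ) {
			$entity = $matches[0];
			if ( in_array( $entity, $xml_predefined, true ) ) {
				return $entity;
			}

			$decoded = html_entity_decode( $entity, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
			if ( $decoded === $entity ) {
				// Not a known HTML name: escape the ampersand instead.
				return '&amp;' . substr( $entity, 1 );
			}

			// Some HTML5 names map to more than one code point, so loop.
			$refs = '';
			foreach ( mb_str_split( $decoded, 1, 'UTF-8' ) as $char ) {
				$refs .= sprintf( '&#x%X;', mb_ord( $char, 'UTF-8' ) );
			}
			return $refs;
		},
		$text
	);
}

echo my_plugin_named_to_numeric_refs( 'a&nbsp;b &amp; c&hellip;' ), "\n";
// Prints: a&#xA0;b &amp; c&#x2026;
```

Unlike decoding to raw code points, this keeps the output pure ASCII wherever the input was, which can make the generated XML easier to inspect by eye.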
You can see what happens when content like `&nbsp;` is included in an XML instance by doing the following (in v0.2.0 of this plugin...i.e., without the changes in this PR):
add_filter( 'core_sitemaps_posts_url_list', function( $url_list ) {
	foreach ( $url_list as &$url_item ) {
		$url_item['foo'] = 'This will be a&nbsp;non-well-formed sitemap, and it will fail to render in the browser';
	}
	return $url_list;
} );
- Chrome will just show a blank screen (and no error message in the console)
- Firefox will show an error screen (without any error message), but will show `XML Parsing Error: undefined entity` in the console
- Edge and IE will show a screen with the text content of each element in the sitemap (which is not conformant with the XML spec, but it's Microsoft, so what do you expect), and will display `Invalid tag start: "<?". Question marks should not start tags.` in the console, which is not actually what the error is (but as we all know, when parse errors occur it can be hard to output the correct error message)
- Not sure what Safari, Opera, etc will do, but I expect one of the above
Would anyone explicitly include `&nbsp;`, `&hellip;`, etc in element content in a sitemap? Maybe not, but they very well could include content stored in post meta, which easily could contain HTML named character references (since such post meta was likely stored so that it could be displayed in HTML).
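The well-formedness problem can also be checked outside the browser. The following is a rough sketch, assuming PHP's DOM extension (libxml) is available; the helper name `my_plugin_is_well_formed_xml()` and the sample documents are made up for illustration and are not part of this plugin.

```php
<?php
/**
 * Hypothetical helper (an assumption, not code from this PR): reports
 * whether a string parses as well-formed XML.
 */
function my_plugin_is_well_formed_xml( $xml ) {
	// Collect parse errors internally instead of emitting PHP warnings.
	$previous = libxml_use_internal_errors( true );

	$document = new DOMDocument();
	$loaded   = $document->loadXML( $xml );

	libxml_clear_errors();
	libxml_use_internal_errors( $previous );

	return (bool) $loaded;
}

$ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// An HTML named character reference that XML does not define.
$with_named = '<urlset xmlns="' . $ns . '"><url><loc>https://example.com/</loc>'
	. '<foo>a&nbsp;b</foo></url></urlset>';

// The same content using a numeric character reference.
$with_numeric = '<urlset xmlns="' . $ns . '"><url><loc>https://example.com/</loc>'
	. '<foo>a&#xA0;b</foo></url></urlset>';

var_dump( my_plugin_is_well_formed_xml( $with_named ) );   // bool(false)
var_dump( my_plugin_is_well_formed_xml( $with_numeric ) ); // bool(true)
```

A check along these lines could be useful in a unit test for any code path that copies post meta into sitemap element content.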
I believe the remaining phpcs errors reported in travis are spurious. They are related to the new |
Closing in favor of #163.
Issue Number
Fixes #152
Description
Well, I went ahead and took some time to work on this even tho no one has replied to the issue (I'm kind of compulsive that way :-).
This PR modifies the XSLT stylesheet for sitemaps to allow it to render columns in the HTML table output for all distinct children of Q{http://www.sitemaps.org/schemas/sitemap/0.9}url in the sitemap (see below for the meaning of the Q{...}xyz notation), without knowing a priori what children there may be.
The columns are ordered as follows:
1. … Q{http://www.sitemaps.org/schemas/sitemap/0.9}url, i.e., URL, Last Modified, Change Frequency and Priority.
2. … http://www.sitemaps.org/schemas/sitemap/0.9 namespace, ordered lexically by their local-name() (see "extension elements in sitemaps" #151 for why that is necessary, even though such sitemaps are invalid according to the XML Schema)
3. … local-name().
The new stylesheet is smart enough to deal with cases where different Q{http://www.sitemaps.org/schemas/sitemap/0.9}url elements have different children, e.g., … will produce a table like: …
That example also illustrates that a class attribute is added for columns resulting from extension elements in case the plugins that add them want to style such columns differently (to make it clear to users that they are extensions). Of course, they won't be able to use the class shorthand notation in the CSS selectors; they'll have to do
th[class~="urn:sparrowhawkcomputing.com"], td[class~="urn:sparrowhawkcomputing.com"] { color: red; }
since the namespace URIs will contain characters not legal in the class shorthand selector syntax.
Because no browsers currently support anything other than XSLT/XPath 1.0, the stylesheet is a little more complicated than it would be if they supported XSLT/XPath 2.0/3.0. In particular, it requires using one of 2 extension functions, exsl:node-set() or msxsl:node-set(). The first is supported by Chrome, Firefox, Safari, Opera and other browsers; the latter is supported by Edge and IE (11). As a fallback, in case neither function is available, the stylesheet will just render a single URL column.
We could (should?) get even more elaborate by providing a hook so that plugins that add extension elements to a sitemap can provide translatable text to use in the column heading for their extension elements (instead of just upper-casing the first letter of the local-name() as done here)...but we can always add that later.
Here are a few miscellaneous notes:
- The Q{...}xyz notation used here (and in comments in the stylesheet) is a way to refer to namespace qualified elements independent of the prefix used for that namespace in any particular XML instance. See the URIQualifiedName production in the XPath 3.0 spec.
Type of change
Please select the relevant options:
Steps to test
I've tested this in Chrome, Firefox, Edge and IE (11) on Windows. Would really appreciate folks testing it on Mac & Linux with as many different browsers as you can!!
To test, you can do something like:
Acceptance criteria