extension elements in sitemaps #151
Comments
For example,
Will produce something like:
which is invalid against the sitemaps XML Schema, since the Similarly,
Will produce something like:
which, again, is not valid against the XML Schema because the |
I had a closer look at the XML Schema while I was working on #153, and I sure hope that none of the major sitemap consumers require that sitemaps be valid against the schema, because the way the schema is written ensures that the average WP plugin author won't be able to produce sitemap instances containing extension elements that validate against the schema! Why? Because the content model for
What that means is that a schema validator needs to be able to find a schema for elements in any extension namespaces. I think the chances are pretty low that the average WP plugin author is going to write a schema document for their extensions and make that schema document available on the web for validators to download. Even if plugin authors did make schema documents available for their extension namespace(s), we'd have to add a filter for them to hook into to specify the URL for the schema document for their extension namespace and then output the appropriate markup in the sitemap instance so that validators knew where to find the schema for the extension namespace, e.g.
What the schema author(s) should have (IMHO) written is:
which would allow validators to validate extension elements if they could find a schema definition for them, or silently ignore them for validation purposes if a schema definition for them couldn't be found. |
That said, I still think we should try to find a way for plugin authors to specify a namespace URI for extensions they add via the various I haven't come up with a great way yet, but I'll throw the following out just for comment:
So, a plugin would do:
Then,
and the new
To illustrate the
and you'll see that what
Comments? |
And while I'm being pedantic (I'm in one of those moods today...I'll blame it on Covid-19 isolation :-), another bad thing about the sitemaps XML Schema is the type definition for the
which means that:
is not a valid sitemap (since the URL is shorter than 12 characters). Note that http://a.tv is a real URL that resolves! OK, end of pedantic ranting (for today :-) |
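The length arithmetic behind that complaint can be checked in a few lines. This is only a sketch: the 12-character minimum is taken from the comment above, and `MIN_LOC_LENGTH` and `loc_meets_min_length()` are hypothetical names used for illustration, not part of any sitemaps API.

```php
<?php
// Assumption: the minimum length of 12 comes from the comment above.
// MIN_LOC_LENGTH is a hypothetical constant name.
const MIN_LOC_LENGTH = 12;

/**
 * Hypothetical helper: reports whether a URL is long enough to satisfy
 * the minimum length described above.
 */
function loc_meets_min_length( string $loc ): bool {
	return strlen( $loc ) >= MIN_LOC_LENGTH;
}

var_dump( strlen( 'http://a.tv' ) );               // int(11)
var_dump( loc_meets_min_length( 'http://a.tv' ) ); // bool(false)
```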
I just realized that I forgot to show the sitemap that would be generated by the suggested mods above:
|
Thanks for clearly explaining the namespacing issue with the XML sitemaps spec and illustrating it with examples. I wonder if, instead of a simple filter approach for adding custom values to the array, we need to provide a registry for URL properties which requires you to add a namespace in order to register the property. Something like this pseudocode: register_sitemap_property( $namespace, $name, $callback ); We can pass some data to any registered callback and if it returns a truthy value, then a property is returned that looks like |
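To make the registry idea a little more concrete, here is a rough, self-contained sketch. Everything in it is an assumption for discussion: the `Sitemap_Property_Registry` class, its method names, the example namespace URI, and the shape of the returned property (an array with `namespace`, `name` and `value` keys) are all hypothetical and do not exist in the Core Sitemaps plugin.

```php
<?php
/**
 * Hypothetical registry sketch: a property can only be registered
 * together with the namespace URI it belongs to.
 */
final class Sitemap_Property_Registry {
	/** @var array<string, array<string, callable>> Callbacks keyed by namespace URI, then name. */
	private static $properties = array();

	/** Registers a property; refuses registrations without a namespace or name. */
	public static function register( string $namespace, string $name, callable $callback ): bool {
		if ( '' === $namespace || '' === $name ) {
			return false;
		}
		self::$properties[ $namespace ][ $name ] = $callback;
		return true;
	}

	/** Runs every callback for one URL entry and keeps only truthy results. */
	public static function collect( array $url_item ): array {
		$collected = array();
		foreach ( self::$properties as $namespace => $names ) {
			foreach ( $names as $name => $callback ) {
				$value = $callback( $url_item );
				if ( $value ) {
					$collected[] = array(
						'namespace' => $namespace,
						'name'      => $name,
						'value'     => $value,
					);
				}
			}
		}
		return $collected;
	}
}

// Usage: the namespace URI below is made up for the example.
Sitemap_Property_Registry::register(
	'https://example.com/ns/sitemap-ext',
	'foo',
	function ( array $url_item ) {
		return 'bar';
	}
);
print_r( Sitemap_Property_Registry::collect( array( 'loc' => 'https://example.com/' ) ) );
```

Because the namespace is a required argument, a renderer built on top of such a registry would always know which namespace each extension element belongs to.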
@felixarntz @adamsilverstein would be interested to hear your thoughts on this |
I'm not familiar enough with how sitemaps are typically extended to answer. I like the registry suggestion from @joemcgill, which might provide a cleaner API than filters. What do current popular sitemap plugins provide for extensibility? |
In #184 I propose a rather simple solution for this. It does not cover some edge cases like the order of elements, which I think could be handled in a future release if necessary. |
This is an interesting issue. I am using the Core Sitemaps module within my Platinum SEO Plugin, and I have addressed this problem with a filter that lets plugin authors render the sitemap as they want, using their own custom stylesheets. This filter will be added to the function. Then the code within this function is changed to this.

/**
 * Renders a sitemap.
 *
 * @since 5.5.0
 *
 * @param array  $url_list       A list of URLs for a sitemap.
 * @param string $object_subtype The object subtype of the sitemap, passed on to the filter.
 */
public function render_sitemap( $url_list, $object_subtype ) {
	header( 'Content-type: application/xml; charset=UTF-8' );
	$this->check_for_simple_xml_availability();

	$sitemap_xml = apply_filters( 'core_sitemaps_get_sitemap_xml', $url_list, $object_subtype );

	// apply_filters() returns $url_list unchanged when nothing is hooked, so fall
	// back to the default renderer unless a callback returned an XML string.
	if ( ! is_string( $sitemap_xml ) || '' === $sitemap_xml ) {
		$sitemap_xml = $this->get_sitemap_xml( $url_list );
	}

	if ( ! empty( $sitemap_xml ) ) {
		$dom                     = new DOMDocument();
		$dom->preserveWhiteSpace = false;
		$dom->formatOutput       = true;
		$dom->loadXML( $sitemap_xml );

		// All output is escaped within get_sitemap_xml().
		// phpcs:ignore WordPress.Security.EscapeOutput.OutputNotEscaped
		echo $dom->saveXML();
	}
}

Note that a filter, core_sitemaps_get_sitemap_xml, has been added to the function; it lets plugin authors create their own sitemap XML and return it. The filter also passes the object_subtype to the callback, which helps plugin authors choose an appropriate custom stylesheet based on the object_subtype. This is how my plugin handles this.

add_filter( 'core_sitemaps_get_sitemap_xml', array( $this, 'psp_get_sitemap_xml' ), 10, 2 );

/**
 * Gets XML for a sitemap.
 *
 * @since 5.5.0
 *
 * @param array  $url_list       A list of URLs for a sitemap.
 * @param string $object_subtype The object subtype of the sitemap.
 * @return string|false A well-formed XML string for a sitemap. False on error.
 */
public function psp_get_sitemap_xml( $url_list, $object_subtype ) {
	$psp_sm_settings              = $this->psp_sitemap_settings;
	$psp_lastmod_sitemaps_enabled = isset( $psp_sm_settings['include_lastmod'] ) ? $psp_sm_settings['include_lastmod'] : '';
	$psp_image_sitemaps_enabled   = isset( $psp_sm_settings['include_images'] ) ? $psp_sm_settings['include_images'] : '';

	if ( 'post' === $object_subtype ) {
		$psp_stylesheet_url = plugins_url( '/sitemap.xsl', __FILE__ );
		$this->stylesheet   = '<?xml-stylesheet type="text/xsl" href="' . esc_url( $psp_stylesheet_url ) . '" ?>';
		if ( $psp_lastmod_sitemaps_enabled ) {
			$psp_stylesheet_url = plugins_url( '/sitemap-post.xsl', __FILE__ );
			$this->stylesheet   = '<?xml-stylesheet type="text/xsl" href="' . esc_url( $psp_stylesheet_url ) . '" ?>';
		}
		if ( $psp_image_sitemaps_enabled ) {
			$psp_stylesheet_url = plugins_url( '/sitemap-image.xsl', __FILE__ );
			$this->stylesheet   = '<?xml-stylesheet type="text/xsl" href="' . esc_url( $psp_stylesheet_url ) . '" ?>';
		}
	} else {
		$psp_stylesheet_url = plugins_url( '/sitemap.xsl', __FILE__ );
		$this->stylesheet   = '<?xml-stylesheet type="text/xsl" href="' . esc_url( $psp_stylesheet_url ) . '" ?>';
	}

	$urlset = new SimpleXMLElement(
		sprintf(
			'%1$s%2$s%3$s',
			'<?xml version="1.0" encoding="UTF-8" ?>',
			$this->stylesheet,
			'<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd http://www.google.com/schemas/sitemap-image/1.1 http://www.google.com/schemas/sitemap-image/1.1/sitemap-image.xsd" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" />'
		)
	);

	foreach ( $url_list as $url_item ) {
		$url = $urlset->addChild( 'url' );
		// Add each attribute as a child node to the URL entry.
		foreach ( $url_item as $attr => $value ) {
			if ( 'url' === $attr ) {
				$url->addChild( $attr, esc_url( $value ) );
			} elseif ( 'image' === $attr ) {
				// Image entries go into the Google image sitemap namespace.
				foreach ( $value as $imageattr ) {
					$image = $url->addChild( 'image:image', null, 'http://www.google.com/schemas/sitemap-image/1.1' );
					if ( array_key_exists( 'loc', $imageattr ) ) $image->addChild( 'image:loc', esc_url( $imageattr['loc'] ), 'http://www.google.com/schemas/sitemap-image/1.1' );
					if ( array_key_exists( 'title', $imageattr ) ) $image->addChild( 'image:title', esc_attr( $imageattr['title'] ), 'http://www.google.com/schemas/sitemap-image/1.1' );
					if ( array_key_exists( 'caption', $imageattr ) ) $image->addChild( 'image:caption', esc_attr( $imageattr['caption'] ), 'http://www.google.com/schemas/sitemap-image/1.1' );
				}
			} else {
				$url->addChild( $attr, esc_attr( $value ) );
			}
		}
	}

	return $urlset->asXML();
}

Please let me know your feedback on this. @pbiron @swissspidy |
The URL list, with all the custom post attributes added by plugins, is rendered by the plugin code as follows. This implementation may vary from one plugin to another.

if ( 'post' === $object_subtype ) {
	$psp_stylesheet_url = plugins_url( '/sitemap.xsl', __FILE__ );
	$this->stylesheet   = '<?xml-stylesheet type="text/xsl" href="' . esc_url( $psp_stylesheet_url ) . '" ?>';
	if ( $psp_lastmod_sitemaps_enabled ) {
		$psp_stylesheet_url = plugins_url( '/sitemap-post.xsl', __FILE__ );
		$this->stylesheet   = '<?xml-stylesheet type="text/xsl" href="' . esc_url( $psp_stylesheet_url ) . '" ?>';
	}
	if ( $psp_image_sitemaps_enabled ) {
		$psp_stylesheet_url = plugins_url( '/sitemap-image.xsl', __FILE__ );
		$this->stylesheet   = '<?xml-stylesheet type="text/xsl" href="' . esc_url( $psp_stylesheet_url ) . '" ?>';
	}
} else {
	$psp_stylesheet_url = plugins_url( '/sitemap.xsl', __FILE__ );
	$this->stylesheet   = '<?xml-stylesheet type="text/xsl" href="' . esc_url( $psp_stylesheet_url ) . '" ?>';
}

In the above piece of plugin code, the plugin checks whether the lastmod and images attributes are enabled for the post object subtype and sets an appropriate custom stylesheet for the sitemap XML (based on the object subtype and the enabled attributes). For the rest of the object subtypes, it just sets a default custom stylesheet, as shown below.

} else {
	$psp_stylesheet_url = plugins_url( '/sitemap.xsl', __FILE__ );
	$this->stylesheet   = '<?xml-stylesheet type="text/xsl" href="' . esc_url( $psp_stylesheet_url ) . '" ?>';
}

The for loop then adds the attributes present in the URL list array to a SimpleXMLElement object and returns a sitemap XML string, as expected by the proposed filter.

foreach ( $url_list as $url_item ) {
	$url = $urlset->addChild( 'url' );
	// Add each attribute as a child node to the URL entry.
	foreach ( $url_item as $attr => $value ) {
		if ( 'url' === $attr ) {
			$url->addChild( $attr, esc_url( $value ) );
		} elseif ( 'image' === $attr ) {
			foreach ( $value as $imageattr ) {
				$image = $url->addChild( 'image:image', null, 'http://www.google.com/schemas/sitemap-image/1.1' );
				if ( array_key_exists( 'loc', $imageattr ) ) $image->addChild( 'image:loc', esc_url( $imageattr['loc'] ), 'http://www.google.com/schemas/sitemap-image/1.1' );
				if ( array_key_exists( 'title', $imageattr ) ) $image->addChild( 'image:title', esc_attr( $imageattr['title'] ), 'http://www.google.com/schemas/sitemap-image/1.1' );
				if ( array_key_exists( 'caption', $imageattr ) ) $image->addChild( 'image:caption', esc_attr( $imageattr['caption'] ), 'http://www.google.com/schemas/sitemap-image/1.1' );
			}
		} else {
			$url->addChild( $attr, esc_attr( $value ) );
		}
	}
}
return $urlset->asXML(); |
Order can be handled in the WP_Query_Args or WP_Term_Args filter. It can be set to either ASC or DESC in the Query Args. |
I was referring to order of attributes within a single sitemap entry, not WP_Query order |
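For what it's worth, ordering within a single entry could be handled before rendering. The sketch below is only an illustration under stated assumptions: `order_sitemap_entry()` is a hypothetical helper, each entry is assumed to be an associative array keyed by element name, and the order (loc, lastmod, changefreq, priority, then everything else) is the one described in the issue.

```php
<?php
/**
 * Hypothetical helper: returns one sitemap entry with the core elements
 * in schema order and all remaining (extension) keys after them,
 * keeping the extension keys in their original relative order.
 */
function order_sitemap_entry( array $url_item ): array {
	$schema_order = array( 'loc', 'lastmod', 'changefreq', 'priority' );
	$ordered      = array();

	foreach ( $schema_order as $key ) {
		if ( array_key_exists( $key, $url_item ) ) {
			$ordered[ $key ] = $url_item[ $key ];
			unset( $url_item[ $key ] );
		}
	}

	// Array union appends the leftover keys after the ordered ones.
	return $ordered + $url_item;
}

$entry = array(
	'priority' => 0.9,
	'foo'      => 'bar',
	'loc'      => 'https://example.com/',
);
print_r( array_keys( order_sitemap_entry( $entry ) ) );
```

Here the keys come back as loc, priority, foo, regardless of the order in which a plugin added them.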
@swissspidy Oh, OK! My bad! The filter proposed above, which lets third-party plugin developers build their own sitemap XML, might take care of that too. So it will address
|
Describe the bug
#88 added filters that allow extension elements to be added to sitemaps (e.g., core_sitemaps_posts_url_list). However, there are a couple of problems with the solution:

1. Core_Sitemaps_Renderer::get_sitemap_xml() outputs all elements in the sitemaps namespace (i.e., http://www.sitemaps.org/schemas/sitemap/0.9), and the sitemaps XML Schema specifies that any element children of sitemap:url other than sitemap:loc, sitemap:lastmod, sitemap:changefreq and sitemap:priority must be in another namespace (where sitemap:xyz is a QName for an element whose local-name() is xyz and whose namespace-uri() is the sitemaps namespace URI). Therefore, if a plugin hooks into core_sitemaps_posts_url_list and adds a foo property to each URL in the list, the generated sitemap XML will be invalid against the XML Schema. There is currently no way for a plugin to tell the renderer what namespace to use for these extension elements.
2. The element children of sitemap:url must be in the order above, with all elements in a foreign namespace coming at the end. So, if a plugin hooks into core_sitemaps_posts_url_list and does something like foreach ( $url_list as &$url ) { $url = array_merge( array( 'priority' => 0.9 ), $url ); } then Core_Sitemaps_Renderer::get_sitemap_xml() will output a sitemap that is invalid against the XML Schema.

I do not know whether sitemap consumers (e.g., Google, Bing, Yandex, etc.) would fail to process a sitemap that was invalid against the XML Schema, but do we want to try our best to ensure that generated sitemaps are valid?
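The first problem can be reproduced outside WordPress with plain SimpleXML. This is a minimal sketch, not the renderer's actual code: the extension namespace URI and the ext prefix are made up for the example, and the DOM extension is assumed to be available for inspecting the result.

```php
<?php
// Assumption: https://example.com/ns/sitemap-ext is a made-up extension namespace URI.
$sitemap_ns   = 'http://www.sitemaps.org/schemas/sitemap/0.9';
$extension_ns = 'https://example.com/ns/sitemap-ext';

$urlset = new SimpleXMLElement( '<urlset xmlns="' . $sitemap_ns . '" />' );
$url    = $urlset->addChild( 'url' );
$url->addChild( 'loc', 'https://example.com/' );

// Without a namespace argument, the new child inherits its parent's
// namespace, so "foo" ends up in the sitemaps namespace.
$plain = $url->addChild( 'foo', 'bar' );

// With an explicit namespace URI, the child lands in the extension namespace.
$namespaced = $url->addChild( 'ext:foo', 'bar', $extension_ns );

// Inspect the namespace each element actually belongs to.
$plain_ns      = dom_import_simplexml( $plain )->namespaceURI;
$namespaced_ns = dom_import_simplexml( $namespaced )->namespaceURI;

var_dump( $plain_ns === $sitemap_ns );        // bool(true)
var_dump( $namespaced_ns === $extension_ns ); // bool(true)
```

The third argument to addChild() is what the renderer would need from a plugin in order to place extension elements in a foreign namespace.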