I tried your WWW::Sitemap::XML module, and it breaks when loading or reading XML sitemaps that contain a comment at the top:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"> <!-- created with Free Online Sitemap Generator www.xml-sitemaps.com -->
This happens with a lot of sitemaps, since they are usually generated by a service or software package that inserts a comment with its own URL as free advertising.
Maybe you can add the following fix to file lib/WWW/Sitemap/XML.pm:
my $class = $self->_entry_class;
my $xmlNoComments = $xml->getDocumentElement->toStringC14N();
$xml = XML::LibXML->load_xml( string => $xmlNoComments );
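This removes the comments before the entries are processed: toStringC14N() leaves comments out by default, and the result is parsed again. I tested it on several publicly available sitemaps, and it seems to work fine.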
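For comparison, here is a minimal sketch of another way to get the same effect without serializing and re-parsing: detach the comment nodes from the already parsed document. It has not been tested against WWW::Sitemap::XML itself. The helper name strip_comments and the sample sitemap are assumptions made up for illustration, and only XML::LibXML is used.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input: a sitemap with a generator comment inside <urlset>.
my $sitemap = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- created with Free Online Sitemap Generator www.xml-sitemaps.com -->
<url><loc>http://example.com/</loc></url>
</urlset>
XML

# strip_comments is a hypothetical helper, not part of WWW::Sitemap::XML.
# It detaches every comment node from the parsed document in place.
sub strip_comments {
    my ($doc) = @_;
    $_->unbindNode for $doc->findnodes('//comment()');
    return $doc;
}

my $doc = strip_comments( XML::LibXML->load_xml( string => $sitemap ) );

# Count the comment nodes still attached to the document.
my @comments = $doc->findnodes('//comment()');
printf "comments left: %d\n", scalar @comments;
```

Whether this would belong in read() before the entries are iterated is a guess; the toStringC14N() approach above has the advantage of having been tried on real sitemaps.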
I'm not sure if this is the correct fix, but I monkey-patched WWW::Sitemap::XML::read to do the following: