
Sitemap with comments not working #3

Closed
astuanax opened this issue Mar 8, 2013 · 3 comments

astuanax commented Mar 8, 2013

Hi,

I have tried your WWW::Sitemap::XML module, and it breaks when loading/reading XML sitemaps that contain a comment at the top:

    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
    <!--
     created with Free Online Sitemap Generator www.xml-sitemaps.com
    -->

This affects many sitemaps, because they are usually generated by a service or software package that inserts a comment containing its URL as free advertising.

Perhaps you could add the following fix to lib/WWW/Sitemap/XML.pm:

    my $class = $self->_entry_class;
    my $xmlNoComments = $xml->getDocumentElement->toStringC14N();
    $xml = XML::LibXML->load_xml( string => $xmlNoComments );

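This strips the comments before the entries are read: toStringC14N() leaves comments out unless asked to keep them, so re-parsing its output yields a comment-free document. I tested it on several existing sitemaps, and it seems to work fine.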

Thx/Len.
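
A minimal, self-contained sketch of the C14N round trip outside the module, assuming only XML::LibXML is installed. The sample sitemap and the `strip_comments` helper are hypothetical, made up for illustration; the script counts comment nodes before and after re-parsing.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sitemap shaped like the one in the report: a generator
# comment sits directly inside <urlset>, next to the <url> entries.
my $sitemap = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!--
 created with Free Online Sitemap Generator www.xml-sitemaps.com
-->
<url><loc>http://www.example.com/</loc></url>
</urlset>
XML

# Hypothetical helper: serialize the root element in canonical form and
# parse it again. toStringC14N() omits comments unless its first
# argument ($with_comments) is true.
sub strip_comments {
    my ($doc) = @_;
    my $c14n = $doc->getDocumentElement->toStringC14N();
    return XML::LibXML->load_xml( string => $c14n );
}

my $doc   = XML::LibXML->load_xml( string => $sitemap );
my $clean = strip_comments($doc);

# Count comment nodes in the original and the cleaned document.
my $before = () = $doc->findnodes('//comment()');
my $after  = () = $clean->findnodes('//comment()');
print "comments before: $before, after: $after\n";
```

Run as-is, it should print `comments before: 1, after: 0`. The trade-off is an extra serialize-and-parse pass over the whole document; filtering comment nodes while walking the `<url>` children avoids that cost.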

rkleemann commented Jul 10, 2013

I'm not sure if this is the correct fix, but I monkey-patched WWW::Sitemap::XML::read to do the following:

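    # Test the node itself: nodeName returns a string such as "#comment",
    # so calling ->isa on that string never matches.
    my @entry = grep { ! $_->isa('XML::LibXML::Comment') }
        $url->nonBlankChildNodes;
    push @entries,
        $class->new( map { $_->nodeName => $_->textContent } @entry )
            if @entry;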
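
A standalone sketch of the node-filtering approach, assuming only XML::LibXML. It collects each `<url>` as a plain hash instead of calling the module's entry class, and the sample document (with a comment inside a `<url>` as well as at the top) is hypothetical. The filter has to test the node object: `nodeName` returns a plain string (`#comment` for comment nodes), and calling `->isa` on that string never matches.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sitemap with a top-level comment and a per-entry comment.
my $sitemap = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- generator comment -->
<url>
  <!-- per-entry comment -->
  <loc>http://www.example.com/</loc>
  <priority>0.8</priority>
</url>
</urlset>
XML

my $doc = XML::LibXML->load_xml( string => $sitemap );

# Bind a prefix to the sitemap namespace so XPath can match <url>.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( sm => 'http://www.sitemaps.org/schemas/sitemap/0.9' );

my @entries;
for my $url ( $xpc->findnodes('/sm:urlset/sm:url') ) {
    # nonBlankChildNodes already skips whitespace-only text nodes;
    # the grep removes comment nodes by checking their class.
    my @fields = grep { !$_->isa('XML::LibXML::Comment') }
        $url->nonBlankChildNodes;
    next unless @fields;
    my %entry = map { $_->nodeName => $_->textContent } @fields;
    push @entries, \%entry;
}

for my $e (@entries) {
    print join( ', ', map { $_ . '=' . $e->{$_} } sort keys %$e ), "\n";
}
```

This should print `loc=http://www.example.com/, priority=0.8`, with no `#comment` key in the entry.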

ajgb (Owner) commented Nov 23, 2014

Hi,

Thanks for notifying me about this problem - and I'm sorry for not looking into it earlier.

The current version on GitHub should address this issue, and a proper CPAN release will follow shortly.

Thank you,
Alex

ajgb (Owner) commented Nov 30, 2014

I've just pushed version 2.00 to CPAN, which resolves the issues you described.

ajgb closed this Nov 30, 2014
3 participants