Cannot extract nested sitemap from sitemap.xml #738

Closed
peter-chan-hkmci opened this issue Mar 5, 2021 · 4 comments

peter-chan-hkmci commented Mar 5, 2021

I have a sitemap.xml which contains the following information:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>(url to sub-sitemap-1.xml)</loc>
    <lastmod>2021-02-02T18:16:47+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>(url to sub-sitemap-2.xml)</loc>
    <lastmod>2021-02-02T18:16:47+00:00</lastmod>
  </sitemap>
</sitemapindex>

I added this sitemap.xml to the startURLs tag, like so:

<startURLs stayOnDomain="true" stayOnPort="true" stayOnProtocol="true">
  <sitemap>(url to sitemap.xml)</sitemap>
</startURLs>

However, the crawler logged the following error:

ERROR [StandardSitemapResolver] Cannot fetch sitemap: <url of the sitemap.xml> (java.lang.NullPointerException)

Do you have any idea what is causing this?
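For readers unfamiliar with nested sitemaps: a `sitemapindex` document lists other sitemap files rather than pages, so a resolver has to follow each `<loc>` until it reaches a `urlset`. The following is only a minimal sketch of that idea, not how `StandardSitemapResolver` is implemented. The names `classify_sitemap`, `resolve_sitemap`, and the `fetch` callable are hypothetical and exist purely for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol (same one as in the issue's XML).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def classify_sitemap(xml_text):
    """Return ("index", child sitemap URLs) or ("urlset", page URLs)."""
    root = ET.fromstring(xml_text)
    # ElementTree reports tags as "{namespace}name"; keep only the local name.
    tag = root.tag.rsplit("}", 1)[-1]
    if tag == "sitemapindex":
        kind, locs = "index", root.findall("sm:sitemap/sm:loc", NS)
    elif tag == "urlset":
        kind, locs = "urlset", root.findall("sm:url/sm:loc", NS)
    else:
        raise ValueError(f"Not a sitemap document: <{tag}>")
    return kind, [loc.text.strip() for loc in locs if loc.text]


def resolve_sitemap(url, fetch, seen=None):
    """Collect page URLs, following nested sitemap indexes recursively.

    `fetch` is a hypothetical callable mapping a URL to its XML text.
    `seen` guards against sitemaps that reference each other in a loop.
    """
    seen = set() if seen is None else seen
    if url in seen:
        return []
    seen.add(url)
    kind, locs = classify_sitemap(fetch(url))
    if kind == "urlset":
        return locs
    pages = []
    for child_url in locs:
        pages.extend(resolve_sitemap(child_url, fetch, seen))
    return pages
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP library; in practice it would download the document, and an in-memory dictionary lookup is enough for experimenting.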

@essiembre
Contributor

There was a similar issue fixed a couple of months ago. Assuming you are using 2.9.0, please try the latest snapshot version (2.9.1-SNAPSHOT).

jetnet commented Mar 25, 2021

Unfortunately, the latest snapshot still has this issue.

@essiembre
Contributor

I was finally able to reproduce and provided a fix. Please try the latest HTTP Collector snapshot and confirm.

jetnet commented Apr 1, 2021

Thank you! Nested sitemaps work now. However, the PREMATURE feature stopped working; see #741.
