Improve sitemap parsing #205

Merged

Commits on Apr 25, 2018

  1. Improve sitemap parsing

    - ignore query part of URL to determine sitemap location prefix
      for URL validation, fixes crawler-commons#202
    - resolve relative links in RSS feeds, fixes crawler-commons#203
    - allow non-continuous content (containing XML entities or CDATA)
      when parsing links in RSS feeds, fixes crawler-commons#204
    - extract links from <guid> elements in RSS feeds, fixes crawler-commons#201
    sebastian-nagel committed Apr 25, 2018
    0ef7cf8
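
The four changes in the commit message can be illustrated with a minimal sketch. It is not the crawler-commons code: the class `SitemapSketch` and the methods `locationPrefix` and `extractLinks` are hypothetical names, only JDK classes are used, and skipping `<guid isPermaLink="false">` is an assumption of this sketch rather than something the commit message states.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SitemapSketch {

    // Hypothetical helper: directory prefix of a sitemap URL, taken from the
    // path only, so a '/' inside the query string cannot shift the prefix.
    static String locationPrefix(String sitemapUrl) {
        URI u = URI.create(sitemapUrl);
        String path = u.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        String dir = path.substring(0, path.lastIndexOf('/') + 1);
        return u.getScheme() + "://" + u.getRawAuthority() + dir;
    }

    // Hypothetical helper: collect <link> and <guid> values from an RSS feed
    // and resolve them against the feed URL.
    static List<String> extractLinks(String feedUrl, String rssXml) {
        final URI base = URI.create(feedUrl);
        final List<String> links = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder buf; // non-null only inside <link> or <guid>

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                boolean isLink = qName.equals("link");
                // Assumption of this sketch: a guid marked isPermaLink="false" is not a URL.
                boolean isUrlGuid = qName.equals("guid")
                        && !"false".equals(atts.getValue("isPermaLink"));
                if (isLink || isUrlGuid) {
                    buf = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks
                // (around entities or CDATA), so append instead of overwriting.
                if (buf != null) {
                    buf.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (buf != null && (qName.equals("link") || qName.equals("guid"))) {
                    // Relative links are resolved against the feed URL.
                    links.add(base.resolve(buf.toString().trim()).toString());
                    buf = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(rssXml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(locationPrefix("https://example.com/news/sitemap.xml?page=2/x"));
        String rss = "<rss><channel><item>"
                + "<link>/a?x=1&amp;y=2</link>"
                + "<guid><![CDATA[posts/1.html]]></guid>"
                + "</item></channel></rss>";
        System.out.println(extractLinks("https://example.com/blog/feed.xml", rss));
    }
}
```

In this sketch, a URL is considered valid for a sitemap when it starts with the value returned by `locationPrefix`. Taking the last `/` of the whole URL string instead of the path would place the prefix inside the query for the first example, which is the kind of problem the first change addresses.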