Sitemaps limit on "bad url" log messages, fixes #145 #206

Merged

Conversation

sebastian-nagel (Contributor) commented:

  • degrade log level from warn to debug for lines that are not valid URLs (same level as used for all XML-based sitemap formats)
  • only log the first 1024 characters of a line (sketched below)

Improvements:

  • inline addUrlIntoSitemap(...)
  • trim lines / URLs
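
A rough sketch of the intended line handling for plain-text sitemaps, assuming the crawlercommons.sitemaps API (SiteMap, SiteMapURL); the method name processText, the constants, and the log message are illustrative, not taken from the actual patch:

import java.io.BufferedReader;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import crawlercommons.sitemaps.SiteMap;
import crawlercommons.sitemaps.SiteMapURL;

class TextSitemapSketch {
    private static final Logger LOG = LoggerFactory.getLogger(TextSitemapSketch.class);
    private static final int MAX_URLS = 50000;        // sitemap spec limit on URLs per sitemap
    private static final int MAX_LOGGED_CHARS = 1024; // cap on the logged portion of a bad line

    SiteMap processText(URL sitemapUrl, BufferedReader reader) throws IOException {
        SiteMap sitemap = new SiteMap(sitemapUrl);
        String line;
        int i = 0;
        while ((line = reader.readLine()) != null && ++i <= MAX_URLS) {
            line = line.trim();          // trim lines / URLs
            if (line.isEmpty()) {
                continue;                // skip blank lines
            }
            try {
                // inlined addUrlIntoSitemap(...): add the URL directly to the sitemap
                sitemap.addSiteMapUrl(new SiteMapURL(new URL(line), true));
            } catch (MalformedURLException e) {
                // log at debug (was warn), and only the first 1024 characters of the line
                LOG.debug("Bad url: [{}]",
                        line.substring(0, Math.min(MAX_LOGGED_CHARS, line.length())));
            }
        }
        sitemap.setProcessed(true);
        return sitemap;
    }
}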

while ((line = reader.readLine()) != null && ++i <= MAX_URLS) {
    line = line.trim();
    if (line.isEmpty())
        continue;
kkrugler (Contributor) commented on the diff:

Minor nit - I think we still want to use { } for single line elements.
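
i.e. presumably turning the single-statement branch into something like:

    if (line.isEmpty()) {
        continue;
    }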

sebastian-nagel (Contributor, Author) replied:

Ok, I'll update these lines and commit. Thanks, @kkrugler!

kkrugler (Contributor) left a review:

lgtm, one minor nit.

sebastian-nagel merged commit 8a34e25 into crawler-commons:master on April 16, 2018.
sebastian-nagel deleted the cc-145-bad-url-warnings branch on April 16, 2018 at 11:38.