[Sitemaps] limit on "bad url" log messages #145

Closed
sebastian-nagel opened this issue Feb 6, 2017 · 0 comments
If a sitemap is erroneously detected as a plain-text sitemap (cf. #144), SiteMapParser may report all or most of the file content as "bad url". This may result

  • either in many log messages:
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [<?xml version="1.0" encoding="UTF-8"?>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [                <url>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [        <loc>http://www.azonline.de/Sport/Fussball/1.-Bundesliga/2640159-Hamburger-SV-Bruchhagen-Lasse-mich-bei-der-Sportchef-Suche-nicht-hetzen</loc>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [        <news:news>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [            <news:publication>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [                <news:name>Allgemeine Zeitung</news:name>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [                <news:language>ger</news:language>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [            </news:publication>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [            <news:publication_date>2016-12-22T14:52:00Z</news:publication_date>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [            <news:title>Hamburger SV : Bruchhagen: Lasse mich bei der Sportchef-Suche nicht hetzen</news:title>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [        </news:news>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [    </url>]
2016-12-22 13:55:26.628 c.s.SiteMapParser [WARN] Bad url: [                        <url>]
... (5000 lines following)
  • or even in one very long message (more than 180 kB in a single line):
2016-12-22 14:43:22.173 c.s.SiteMapParser [WARN] Bad url: [<?xml version="1.0" encoding="UTF-8"?> <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:n="http://www.google.com/schemas/sitemap-news/0.9"> <url> <loc>http://www.hotnews.ro/stiri-politic-21489493-liviu-dragnea-cer-public-serviciilor-secrete-spuna-daca-exista-vreo-problema-securitate-legata-sotul-premierului-propus-sevil-shhaideh.htm</loc> <n:news> <n:publication> <n:name>HotNews.ro</n:name> <n:language>ro</n:language> </n:publication> <n:genres>PressRelease</n:genres> <publication_date>2016-12-22T16:16:35</publication_date> <n:title><![CDATA[Liviu Dragnea: Cer public serviciilor secrete sa spuna daca exista vreo problema de securitate legata de sotul premierului propus Sevil Shhaideh]]></n:title> <n:keywords><![CDATA[]]></n:keywords> </n:news> </url> ...

There should be a limit on both the maximum number of lines logged as errors and the length of each logged line. This would avoid consequential errors, e.g.:

2016-12-22 14:43:22,176 ERROR Unable to write to stream UDP:localhost:514 for appender syslog
2016-12-22 14:43:22,176 ERROR An exception occurred processing Appender syslog org.apache.logging.log4j.core.appender.AppenderLoggingException: Error flushing stream UDP:localhost:514
...
Caused by: java.io.IOException: Message too long (sendto failed)
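
The proposed limit could look roughly like the following. This is only a minimal sketch and not SiteMapParser's actual code: the class `BadUrlLogLimiter`, its parameters `maxMessages` and `maxLength`, and the `Consumer<String>` sink standing in for the logger are all hypothetical names chosen for illustration.

```java
import java.util.function.Consumer;

/**
 * Hypothetical helper (not part of crawler-commons): caps the number of
 * "bad url" messages per sitemap and truncates overlong lines.
 */
final class BadUrlLogLimiter {
    private final int maxMessages;       // assumed limit on messages per sitemap
    private final int maxLength;         // assumed limit on characters per logged line
    private final Consumer<String> sink; // stands in for the real logger, e.g. LOG::warn
    private int seen = 0;

    BadUrlLogLimiter(int maxMessages, int maxLength, Consumer<String> sink) {
        this.maxMessages = maxMessages;
        this.maxLength = maxLength;
        this.sink = sink;
    }

    /** Cuts the line to at most maxLength characters and marks the cut. */
    static String truncate(String line, int maxLength) {
        if (line.length() <= maxLength) {
            return line;
        }
        return line.substring(0, maxLength) + "...";
    }

    /** Reports one bad line; stays silent once the limit has been exceeded. */
    void badUrl(String line) {
        seen++;
        if (seen <= maxMessages) {
            sink.accept("Bad url: [" + truncate(line, maxLength) + "]");
        } else if (seen == maxMessages + 1) {
            // Emit a single notice instead of thousands of further lines.
            sink.accept("Too many bad urls, suppressing further messages");
        }
    }
}
```

With `maxMessages` set to 2 and `maxLength` set to 10, the first two bad lines would be logged (the second one cut to 10 characters plus "..."), the third would trigger the single suppression notice, and any later lines would produce no output. Keeping each message short would also keep it well below the size at which the UDP syslog appender fails with "Message too long".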