[Sitemaps] Trim Unicode whitespace around URLs #224
Hi @moviewang, can you share a minimal sitemap that reproduces your problem? It's OK if the URLs are masked, but the whitespace should be kept as it is. Thanks!
Yes, the sitemap's loc elements contain whitespace, but I have no rights to modify the sitemap. Is there any other way to solve the problem? Thanks!
Hi @moviewang - can you provide the URL of the sitemap?
Hi @moviewang, is there any invisible Unicode whitespace (outside the ASCII range)? I've tried to reproduce it with a similar file:
Hi @moviewang - I tried modifying one of the existing sitemap index parsing tests to show this problem, but it seems to parse the URL without any issues (assuming the URLs look like what you showed above). I did see some other issues, like the individual sitemaps not having their "processed" flag set, no last modified date, etc., but that's a different can of worms.
@moviewang - the dates in your example aren't valid for sitemaps; they need to follow one of these formats. So if the dates are in UTC, they should be something like
There are two other issues, which I'll file as bugs separately, but only one of them might affect you: date strings aren't getting trimmed, so whitespace around the date string could cause problems.
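To illustrate why untrimmed dates are a problem, here is a minimal Java sketch. The class `LastmodParsing` and its `parseLastmod` helper are hypothetical, not crawler-commons API. The helper trims the raw `<lastmod>` text first and then tries two of the W3C Datetime forms used by the sitemap protocol: a full date-time with a time zone designator (seconds optional), and a plain date. Mapping a date-only value to midnight UTC is an assumption made for this example.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;
import java.util.Optional;

class LastmodParsing {

    // Hypothetical helper: trim the raw <lastmod> text, then try
    // "YYYY-MM-DDThh:mm[:ss[.s]]TZD" and "YYYY-MM-DD".
    // Date-only values are mapped to midnight UTC (an assumption).
    static Optional<OffsetDateTime> parseLastmod(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        // String.trim() strips characters up to U+0020 (spaces, tabs, newlines)
        String s = raw.trim();
        try {
            // ISO_OFFSET_DATE_TIME: seconds/fraction optional, offset "Z" or "+hh:mm"
            return Optional.of(OffsetDateTime.parse(s));
        } catch (DateTimeParseException e) {
            // not a full date-time, fall through to the date-only form
        }
        try {
            return Optional.of(LocalDate.parse(s).atStartOfDay().atOffset(ZoneOffset.UTC));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String padded = "\n    2019-03-14T10:00:00Z\n  ";
        // Without trimming, the surrounding whitespace makes the parse fail
        try {
            OffsetDateTime.parse(padded);
        } catch (DateTimeParseException e) {
            System.out.println("untrimmed: parse failed");
        }
        System.out.println("trimmed: " + parseLastmod(padded).orElse(null));
        System.out.println("date only: " + parseLastmod(" 2019-03-14 ").orElse(null));
        System.out.println("invalid: " + parseLastmod("14/03/2019").orElse(null));
    }
}
```

Keeping the trim inside the helper means callers can pass the element text straight through; a complete parser would presumably also accept the `YYYY` and `YYYY-MM` forms of W3C Datetime.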
Hi @sebastian-nagel @kkrugler |
Hi @moviewang, I would suggest that the parser also trim Unicode whitespace. I'll open a PR to fix this. My question was meant to confirm whether this is the cause of your problem. Thanks!
…code-whitespace [Sitemaps] Trim Unicode whitespace around URLs, fixes #224
loc.toString().trim() didn't trim all whitespace
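Java's `String.trim()` only removes characters up to U+0020, so a `<loc>` value padded with, say, a no-break space (U+00A0) or an em space (U+2003) survives it. Java 11's `String.strip()` is not a full answer either, because it relies on `Character.isWhitespace`, which deliberately excludes the non-breaking spaces U+00A0, U+2007 and U+202F. The sketch below is a hypothetical helper (`UnicodeTrim.unicodeTrim`, not the actual code of the fix) that also treats `Character.isSpaceChar` characters as trimmable; the example URL is made up.

```java
class UnicodeTrim {

    // Hypothetical rule: anything String.trim() removes (<= U+0020), anything
    // Character.isWhitespace accepts, plus Unicode space separators such as
    // U+00A0, U+2007 and U+202F, which isWhitespace excludes.
    static boolean isTrimmable(char c) {
        return c <= ' ' || Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    // Like String.trim(), but using the broader isTrimmable rule at both ends.
    static String unicodeTrim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isTrimmable(s.charAt(start))) {
            start++;
        }
        while (end > start && isTrimmable(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String loc = "\u00A0 https://www.example.com/page.html\u2003\n";
        // trim() stops at U+00A0 on the left and at U+2003 on the right
        System.out.println("[" + loc.trim() + "]");
        // both Unicode spaces are removed here
        System.out.println("[" + unicodeTrim(loc) + "]");
    }
}
```

Note that zero-width characters such as U+200B or U+FEFF are classified by Unicode as format characters rather than spaces, so this rule would leave them in place.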