fixes https://github.com/dfabulich/sitemapgen4j/issues/25 #26
I don't think this is worth taking a new dependency for (especially since our commons-lang could conflict with our users' commons-lang). XML escaping is like a one-liner, isn't it?
1) No, this is not a "one liner" - there are lots of cases and exceptions. You don't want to patch your code every time someone uses a new improper character in a URL, do you?
In this context, there's no such thing as a "new" improper character. The only improper characters are (Arguably the control characters could be escaped, too, but those characters aren't valid in URLs, so we don't have to make this library escape them.) Yes, libraries can help avoid reinventing the wheel, but when using wheels, it's important for each axle to use the same size of wheel. This library can't know what version of commons-lang the user will use, so we can't pick any particular version of commons-lang without conflicting with some users or other.
So how do hundreds of other libraries pick the right version and use commons-lang?
According to sitemaps.org docs the following chars must be escaped:
Also, keep in mind that URLs must be UTF-8 encoded.
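For reference, the sitemaps.org protocol page lists five characters that need entity escapes in a sitemap: ampersand, single quote, double quote, greater-than, and less-than. A minimal sketch of a dependency-free escape follows. The class name `XmlEscapeSketch` and the method `escape` are hypothetical names chosen for illustration, not code from this library or from this pull request. It uses `String.replace`, which treats its arguments literally, rather than the regex-based `replaceAll`.

```java
// Hypothetical helper for illustration; not part of sitemapgen4j.
final class XmlEscapeSketch {
    private XmlEscapeSketch() {}

    /** Replaces the five characters sitemaps.org requires to be entity-escaped. */
    static String escape(String url) {
        // The ampersand must be handled first; otherwise the '&' that begins
        // each entity added by the later calls would itself be escaped again.
        return url.replace("&", "&amp;")
                  .replace("'", "&apos;")
                  .replace("\"", "&quot;")
                  .replace(">", "&gt;")
                  .replace("<", "&lt;");
    }
}
```

The order of the calls is the only subtle part: moving the ampersand replacement anywhere but first would turn `&gt;` into `&amp;gt;`.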
@marcospereira do you recommend "replaceAll" or something else?
Yes, I was imagining a naive
Many of those libraries just don't care about conflicting with their users' It's actually not clear to me where in my argument you disagree. I claim: A) This problem can be solved with a It seems like you might disagree with either A or B, or you might actually disagree with both. If you disagree with A, I think I'd need to hear more about why. You gestured toward "new" escapable characters, but it's not clear to me that there are any. If you disagree with B, that we should take a dependency on |
To be clear, I don't think that this library should enforce that URLs be URL-encoded (with percent signs). The sitemaps.org docs say, "all URLs (including the URL of your Sitemap) must be URL-escaped and encoded for readability by the web server on which they are located" which is to say that users need to provide us with URLs that their webservers can actually read, but if your webserver can handle
how about:
Damn, I feel so bad writing such ugly code...
.replaceAll("'", "&apos;")
.replaceAll("\"", "&quot;")
.replaceAll(">", "&gt;")
.replaceAll(">", "&gt;")
looks like this line was duplicated
you're right. removed.
It's not that bad. ;-) If you want to improve it, you could use just one |
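One way to avoid a chain of replacement calls is a single pass over the string that appends either the character or its entity. This is only a sketch under that assumption, and not necessarily what the reviewer had in mind; the class name `SinglePassEscapeSketch` is hypothetical and does not appear in the pull request.

```java
// Hypothetical helper for illustration; not part of sitemapgen4j.
final class SinglePassEscapeSketch {
    private SinglePassEscapeSketch() {}

    /** Escapes the five XML-special characters in one pass over the input. */
    static String escape(String url) {
        StringBuilder out = new StringBuilder(url.length() + 16);
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break;
                case '"':  out.append("&quot;"); break;
                case '>':  out.append("&gt;");   break;
                case '<':  out.append("&lt;");   break;
                default:   out.append(c);        // everything else is copied unchanged
            }
        }
        return out.toString();
    }
}
```

Because each input character is examined exactly once, the order of the cases does not matter and an already-emitted entity can never be escaped a second time.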
Great!
No description provided.