JAXB generates invalid XML (includes characters illegal in XML 1.0) #960
This is linked to #614.
As per the XML spec [1], the following characters are legal in XML 1.0:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
However, JAXB allows other, illegal characters in input strings (e.g. bell character 0x0007, or vertical tab 0x000B), and marshals them into output XML without any errors or warnings.
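To make the distinction concrete, the Char production from [1] can be written as a small predicate. This is only a sketch in plain Java, not anything JAXB ships: the class name `XmlChars` and both method names are made up for illustration. A marshaller-side check could use something like `indexOfIllegal` to raise an error instead of silently writing the character.

```java
// Hypothetical helper, not part of JAXB: encodes the XML 1.0 Char production.
final class XmlChars {
    private XmlChars() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Index (in UTF-16 units) of the first illegal character, or -1 if none. */
    static int indexOfIllegal(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // an unpaired surrogate is returned as-is
            if (!isLegal(cp)) {
                return i;
            }
            i += Character.charCount(cp);   // skip both halves of a surrogate pair
        }
        return -1;
    }
}
```

With this, the bell character (0x0007) and vertical tab (0x000B) mentioned above are reported as illegal, while tab, line feed and carriage return pass.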
I know the solution is not to escape them, since they are illegal regardless of whether they are escaped or not (see #226), but the fact that JAXB generates invalid (and unparseable) XML without any sort of error or warning seems wrong to me.
There are a number of workarounds out in the wild [2, 3] that rely on replacing the illegal characters with legal characters (e.g. space 0x0020, or replacement character 0xFFFD). Another option would be to eat the illegal characters and just not write them to the output.
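The two approaches just described (replace, or drop) could be sketched roughly as follows. This is not the code from [2] or [3], and it does not touch any JAXB API; `XmlSanitizer`, `replaceIllegal` and `stripIllegal` are hypothetical names, and the caller is assumed to apply the method to string values before handing them to the marshaller.

```java
// Hypothetical pre-marshalling sanitizer; not a JAXB API.
final class XmlSanitizer {
    private XmlSanitizer() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every illegal character with the given (legal) code point. */
    static String replaceIllegal(String s, int replacement) {
        if (!isLegal(replacement)) {
            throw new IllegalArgumentException("replacement must itself be legal");
        }
        return sanitize(s, replacement);
    }

    /** Drops every illegal character from the string. */
    static String stripIllegal(String s) {
        return sanitize(s, -1);
    }

    // A negative replacement means "drop the character".
    private static String sanitize(String s, int replacement) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isLegal(cp)) {
                out.appendCodePoint(cp);
            } else if (replacement >= 0) {
                out.appendCodePoint(replacement);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

Calling `replaceIllegal(value, 0xFFFD)` corresponds to the replacement-character workaround, `replaceIllegal(value, ' ')` to the space variant, and `stripIllegal(value)` to silently eating the characters. Either way the original data is altered, which is why an explicit error may still be preferable as a default.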
Regardless of the approach, I think it would be a good idea to at least provide an out-of-the-box way for users to ensure the correctness of JAXB-generated XML. Some options:
[1] http://www.w3.org/TR/REC-xml/#NT-Char
[2] http://blog.lesc.se/2009/03/escape-illegal-characters-with-jaxb-xml.html
[3] http://camel.apache.org/jaxb.html#JAXB-IgnoringtheNonXMLCharacter
Affected Versions
[2.2.6]