JAXB generates invalid XML (includes characters illegal in XML 1.0) #960

Open
Tomas-Kraus opened this issue May 13, 2013 · 7 comments

@Tomas-Kraus
Member

As per the XML spec [1], the following characters are legal in XML 1.0:

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
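For reference, the production above translates directly into a code point check. This is a minimal sketch, not anything that ships with JAXB; the class name `Xml10Chars` and both method names are made up for illustration.

```java
// Hypothetical helper (not part of JAXB): checks code points against the
// XML 1.0 Char production quoted above.
final class Xml10Chars {
    private Xml10Chars() {}

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isLegal(int codePoint) {
        return codePoint == 0x9
            || codePoint == 0xA
            || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /** Returns the char index of the first illegal code point, or -1 if none. */
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            // An unpaired surrogate is returned as-is by codePointAt and
            // falls outside every legal range, so it is reported as illegal.
            int cp = s.codePointAt(i);
            if (!isLegal(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Something like `firstIllegalIndex` could also back a fail-fast check before marshalling, for callers who would rather get an error than silently altered output.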

However, JAXB allows other, illegal characters in input strings (e.g. bell character 0x0007, or vertical tab 0x000B), and marshals them into output XML without any errors or warnings.

I know the solution is not to escape them, since they are illegal regardless of whether they are escaped or not (see #226), but the fact that JAXB generates invalid (and unparseable) XML without any sort of error or warning seems wrong to me.

There are a number of workarounds out in the wild [2, 3] that rely on replacing the illegal characters with legal characters (e.g. space 0x0020, or replacement character 0xFFFD). Another option would be to eat the illegal characters and just not write them to the output.
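Both workarounds (replace or drop) come down to the same loop over code points. A rough sketch of what such a user-side helper might look like, assuming the strings are cleaned before they reach JAXB (for example in a setter or an XmlAdapter); `XmlSanitizer` and its methods are hypothetical names, not an existing API:

```java
// Hypothetical user-side helper (not part of JAXB): cleans a string of
// characters that are illegal in XML 1.0 before it is marshalled.
final class XmlSanitizer {
    private XmlSanitizer() {}

    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    // | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    private static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every illegal code point with the given replacement text. */
    static String replaceIllegal(String s, String replacement) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isLegal(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append(replacement);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    /** Drops every illegal code point ("eats" it). */
    static String omitIllegal(String s) {
        return replaceIllegal(s, "");
    }
}
```

For example, `XmlSanitizer.replaceIllegal(value, "\uFFFD")` gives the replacement-character behaviour and `XmlSanitizer.omitIllegal(value)` gives the "eat" behaviour. Unpaired surrogates are treated as illegal here, which is a choice of this sketch.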

Regardless of the approach, I think it would be a good idea to at least provide an out-of-the-box way for users to ensure the correctness of JAXB-generated XML. Some options:

  • Add another property that can be used via Marshaller.setProperty(String, Object) to replace invalid characters with another character ("com.sun.xml.bind.illegalCharacterReplacement"?)
  • Add another property that can be used via Marshaller.setProperty(String, Object) to eat invalid characters ("com.sun.xml.bind.omitIllegalCharacters"?)
  • Enhance the out-of-the-box CharacterEscapeHandler classes to allow for this sort of replacement / omission.
  • Something else?
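Until something like this exists out of the box, one possible stopgap that needs no JAXB-specific API is to wrap the `Writer` handed to `Marshaller.marshal(Object, Writer)`. The sketch below is only an illustration under some assumptions: the class name `IllegalCharStrippingWriter` is invented, it assumes the marshaller writes the offending characters raw (not as numeric character references), and it lets surrogate chars through without checking that they are correctly paired.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stopgap (not part of JAXB): silently drops chars that are
// illegal in XML 1.0 from everything written through it.
final class IllegalCharStrippingWriter extends FilterWriter {

    IllegalCharStrippingWriter(Writer out) {
        super(out);
    }

    private static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || Character.isSurrogate(c) // assumption: pairing is not verified
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    @Override
    public void write(int c) throws IOException {
        if (isAllowed((char) c)) {
            out.write(c);
        }
    }

    @Override
    public void write(char[] buf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(buf[i]);
        }
    }

    @Override
    public void write(String s, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) {
            write(s.charAt(i));
        }
    }

    /** Convenience for trying the filter on a plain string. */
    static String strip(String s) {
        StringWriter sink = new StringWriter();
        try (Writer w = new IllegalCharStrippingWriter(sink)) {
            w.write(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

Usage would be along the lines of `marshaller.marshal(obj, new IllegalCharStrippingWriter(writer))`. This covers only the "omit" variant; a replacement variant would write a substitute character in the `else` branch of `write(int)`.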

[1] http://www.w3.org/TR/REC-xml/#NT-Char
[2] http://blog.lesc.se/2009/03/escape-illegal-characters-with-jaxb-xml.html
[3] http://camel.apache.org/jaxb.html#JAXB-IgnoringtheNonXMLCharacter

Affected Versions

[2.2.6]

@Tomas-Kraus
Member Author

@glassfishrobot Commented
Reported by gredler

@Tomas-Kraus
Member Author

@glassfishrobot Commented
Was assigned to yaroska

@Tomas-Kraus
Member Author

@glassfishrobot Commented
snajper said:
Yardo, correct me if I'm wrong, but we use JAXP for validating what we read/write. If that is right, I think the issue should be filed against JAXP instead?

@Tomas-Kraus
Member Author

@glassfishrobot Commented
trejkaz said:
The offending class is com.sun.xml.internal.bind.marshaller.NioEscapeHandler. The package name makes it sound like JAXB has implemented it directly. Perhaps this is the problem, and the marshalling should have been done using an existing library known to produce valid output, instead of reinventing the wheel poorly?

@Tomas-Kraus
Member Author

@glassfishrobot Commented
This issue was imported from java.net JIRA JAXB-960

@phax

phax commented Jul 1, 2019

This is linked to #614
