Description
Carriage Return characters (\r) found within parsed XML are incorrectly converted to newline characters (\n).
I did a quick scan of the source code and found a likely culprit:
const parseXml = function(xmlData) {
  xmlData = xmlData.replace(/\r\n?/g, "\n"); //TODO: remove this line
That replaces every lone \r character, and every \r\n pair, with \n.
The online tool does not exhibit this issue because the browser's node access APIs return a typed Carriage Return "string" (the character \ followed by r) as the two characters \\r rather than as an actual Carriage Return, so the regular expression no longer matches. See:
> "1) \r 2) \\r 3) \n 4) \\n 5) \r\n 6) \\r\\n".replace(/\r\n?/g, "\n")
'1) \n 2) \\r 3) \n 4) \\n 5) \n 6) \\r\\n'
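The same distinction can be checked in isolation. The following is a minimal sketch, assuming nothing beyond the pattern quoted above; `normalizeLineEndings` is a hypothetical helper name used only for illustration and is not part of fast-xml-parser.

```typescript
// The normalization pattern quoted above from the parser source.
const LINE_ENDING_PATTERN = /\r\n?/g;

// Hypothetical helper (not part of fast-xml-parser): applies the same replacement.
function normalizeLineEndings(text: string): string {
  return text.replace(LINE_ENDING_PATTERN, "\n");
}

// A real carriage return: one character, char code 13.
const realCarriageReturn = "\r";
// What a text box holds after typing a backslash and then "r": two characters.
const typedEscape = "\\r";

const normalizedReal = normalizeLineEndings(realCarriageReturn);
const normalizedTyped = normalizeLineEndings(typedEscape);

console.log(realCarriageReturn.length, normalizedReal.charCodeAt(0)); // 1 10
console.log(typedEscape.length, normalizedTyped === typedEscape); // 2 true
```

Only the single-character form is rewritten, which is consistent with the issue showing up for XML read from a file but not for text typed into the online tool.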
Input
The XML that exhibits this issue is of the form:
<properties object="" engine="">
  <property type="string" name="x" state="changed">
    <![CDATA[This is a carriage return \r...]]>
  </property>
  <property type="string" name="y" state="changed">
    <![CDATA[\r]]>
  </property>
</properties>
Code
The code is pretty straightforward.
import * as fastXMLParser from "fast-xml-parser";

const XML_OPTIONS_NO_TAG_PARSE: fastXMLParser.X2jOptionsOptional = {
  attributeNamePrefix: "@",
  ignoreAttributes: false,
  parseAttributeValue: false,
  parseTagValue: false,
  textNodeName: "#value",
};
const XML_PARSER_NO_TAG_PARSE = new fastXMLParser.XMLParser(XML_OPTIONS_NO_TAG_PARSE);
// ...
// xmlData holds the XML shown under Input, with real Carriage Return characters.
const parsed = XML_PARSER_NO_TAG_PARSE.parse(xmlData);
After that code runs, the parsed text node content has \n instead of the expected \r.
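Until the normalization is removed or made optional, one possible workaround is to hide Carriage Returns inside CDATA sections before parsing and restore them afterwards. This is only a sketch under stated assumptions: `CR_PLACEHOLDER`, `protectCarriageReturns`, `restoreCarriageReturns`, and `fakeParse` are all hypothetical names, the placeholder character is assumed never to occur in real input, and `fakeParse` is a stand-in that merely mimics the replacement quoted earlier so the example runs without the library.

```typescript
// Assumption: this private-use character never occurs in the real input.
const CR_PLACEHOLDER = "\uE000";

// Hypothetical helper: hide carriage returns inside CDATA sections only,
// so line endings between tags are left for the parser to handle as usual.
function protectCarriageReturns(xml: string): string {
  return xml.replace(/<!\[CDATA\[[\s\S]*?\]\]>/g, (section) =>
    section.replace(/\r/g, CR_PLACEHOLDER),
  );
}

// Hypothetical helper: walk a parsed result and put the carriage returns back.
function restoreCarriageReturns(value: unknown): unknown {
  if (typeof value === "string") {
    return value.split(CR_PLACEHOLDER).join("\r");
  }
  if (Array.isArray(value)) {
    return value.map(restoreCarriageReturns);
  }
  if (value !== null && typeof value === "object") {
    const restored: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value)) {
      restored[key] = restoreCarriageReturns(child);
    }
    return restored;
  }
  return value;
}

// Stand-in for the parser: it only mimics the normalization step quoted earlier.
function fakeParse(xml: string): { "#value": string } {
  return { "#value": xml.replace(/\r\n?/g, "\n") };
}

const sample = "<p><![CDATA[a\rb]]></p>\r\n";
const roundTripped = restoreCarriageReturns(
  fakeParse(protectCarriageReturns(sample)),
) as { "#value": string };

console.log(JSON.stringify(roundTripped["#value"])); // "<p><![CDATA[a\rb]]></p>\n"
```

With the real parser, the equivalent call would presumably be restoreCarriageReturns(XML_PARSER_NO_TAG_PARSE.parse(protectCarriageReturns(xmlData))). I have not verified that against the library, so treat it as a starting point rather than a fix.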
Output
Running the above results in the following JSON:
{
  "properties": {
    "property": [
      {
        "#value": "This is a carriage return \n...",
        "@type": "string",
        "@name": "x",
        "@state": "changed"
      },
      {
        "#value": "\n",
        "@type": "string",
        "@name": "y",
        "@state": "changed"
      }
    ],
    "@object": "",
    "@engine": ""
  }
}
Expected Data
I expect the following output:
{
  "properties": {
    "property": [
      {
        "#value": "This is a carriage return \r...",
        "@type": "string",
        "@name": "x",
        "@state": "changed"
      },
      {
        "#value": "\r",
        "@type": "string",
        "@name": "y",
        "@state": "changed"
      }
    ],
    "@object": "",
    "@engine": ""
  }
}
Would you like to work on this issue?