only escape characters xml need to be escaped #81

Merged
merged 3 commits into from Mar 13, 2022

leavesster
Contributor

As I mentioned in #73, escaping \n and \t is not necessary. These characters may appear literally in XML text.

It seems odd that html.EscapeString escapes only the five characters XML requires (<, >, &, ' and "), while xml.EscapeText escapes more characters than the XML specification requires, including tab and newline.

After this PR is merged, \t and \n are written out as their original characters.

For example, when a developer reads an XML document like this:

<?xml version="1.0" encoding="utf-8"?>
	<class_list xml:space="preserve">
		<student>
			<name> Robert </name>
			<grade>A+</grade>

		</student>
	</class_list>

OutputXML returns the document as it was written, and the result complies with the XML specification.

Before this PR, OutputXML produced output that is hard for a human to read, like this:
<?xml version=\"1.0\" encoding=\"utf-8\"?><class_list xml:space=\"preserve\">&#xA;&#x9;&#x9;<student>&#xA;&#x9;&#x9;&#x9;<name> Robert </name>&#xA;&#x9;&#x9;&#x9;<grade>A+</grade>&#xA;&#xA;&#x9;&#x9;</student>&#xA;&#x9;</class_list>
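The intended behavior (escape only what XML requires and leave whitespace alone) could be sketched as follows. This is an illustration under assumptions, not the code merged in this PR; `escapeMinimal` and `minimalEscaper` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// minimalEscaper is a hypothetical replacer covering only the characters
// that need escaping in XML text and attribute values.
// strings.Replacer scans the input once, so the "&" inside an emitted
// "&amp;" is never escaped a second time.
var minimalEscaper = strings.NewReplacer(
	"&", "&amp;",
	"<", "&lt;",
	">", "&gt;",
	"'", "&#39;",
	`"`, "&#34;",
)

// escapeMinimal escapes markup characters and keeps tabs and newlines as-is.
func escapeMinimal(s string) string {
	return minimalEscaper.Replace(s)
}

func main() {
	text := "\n\t\tA+ & <B>"
	fmt.Printf("%q\n", escapeMinimal(text))
}
```

With an escaper like this, indentation inside an `xml:space="preserve"` element survives serialization unchanged.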

Fixes XML whose original plain text could never be recovered

Before this PR, we had to call html.UnescapeString to print the original plain text.
But for some XML documents, html.UnescapeString can never recover the original plain text, such as the document below:

<?xml version="1.0" encoding="utf-8"?>
	<example xml:space="preserve"><word>&amp;#48;		</word></example>

With OutputXML alone, you get <?xml version=\"1.0\" encoding=\"utf-8\"?><example xml:space=\"preserve\"><word>&amp;#48;&#x9;&#x9;</word></example>.

With OutputXML + html.UnescapeString, you get <?xml version=\"1.0\" encoding=\"utf-8\"?><example xml:space=\"preserve\"><word>&#48;\t\t</word></example>. Neither is consistent with the original document.

@leavesster changed the title from "only escape special characters" to "only escape characters xml need to be escaped" on Mar 12, 2022
@zhengchun merged commit e28092b into antchfx:master on Mar 13, 2022