Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML encoding for KML: option to use numeric character entities instead of entity references? #20

Closed
cnsgithub opened this issue Jun 20, 2018 · 4 comments

Comments

@cnsgithub
Copy link

cnsgithub commented Jun 20, 2018

First of all, please forgive me posting it here. This is rather a shortcoming in the KML specification or the Google Earth implementation, respectively. But I am not familiar with Google's issue tracking system and the following could be a useful enhancement for owasp-java-encoder, too.

What's the problem?
I just realized that KML files I am generating are vulnerable to injection despite (in my opinion correct) use of HTML encoding powered by owasp-java-encoder. Owasp's Encode.forHtml/forXml works correctly: the input <strike>oops</strike> gets encoded to &lt;strike&gt;oops&lt;/strike&gt;. I do not want my users to inject HTML.

However, the KML specification is unclear about encoding of angle brackets, please see KML Reference Errata. And the software consuming the KML (Google Earth) will render it as HTML.

How can this problem be solved?
According to the errata page mentioned above, numeric character entities have to be used in favor of entity references, i.e. < and > must be encoded to &#60; and &#62; instead of &lt; and &gt;. From looking at owasp's HTMLEncoder/XMLEncoder classes I cannot find an option to control the encode behavior. Do you think this would be a reasonable enhancement? Alternatively, do you think introducing a special KMLEncoder would be reasonable?

How to reproduce?

  1. Download Google Earth, start it and then load a KML file with this content:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2">
    <Document>
        <name><![CDATA[HTML encoding]]></name>
		<Style id="myStyle">
			<BalloonStyle>
				<text>
					<![CDATA[
					Text should be bold only: <b>$[text]</b>
					]]>
				</text>
			</BalloonStyle>
		</Style>
		<Folder>
			<name>Placemarks</name>
			<Placemark>
				<name><![CDATA[insufficient escaping]]></name>
				<address><![CDATA[1200-C Agora Drive, Bel Air, MD 21014, US]]></address>
				<styleUrl><![CDATA[#myStyle]]></styleUrl>
				<ExtendedData>
					<Data name="text">
						<value><![CDATA[&lt;strike&gt;oops&lt;/strike&gt;]]></value>
					</Data>					
				</ExtendedData>
			</Placemark>
			<Placemark>
				<name><![CDATA[sufficient escaping]]></name>
				<address><![CDATA[Leinstraat 104A, B-9660 Opbrakel, Belgium]]></address>
				<styleUrl><![CDATA[#myStyle]]></styleUrl>
				<ExtendedData>
					<Data name="text">
						<value><![CDATA[&#60;strike&#62;fine&#60;/strike&#62;]]></value>
					</Data>					
				</ExtendedData>
			</Placemark>
		</Folder>
	</Document>
</kml>
  1. Open the placemark called insufficient escaping and see the strikethrough word:

image

  1. Open the placemark called sufficient escaping and see everything's working correctly:

image

Please note that this issue can not be reproduced when uploading the KML file to the web-based Google Earth. Things may have changed in the online version. However, the desktop version is still maintained (last build: February 6, 2018).

@jmanico
Copy link
Member

jmanico commented Jun 20, 2018 via email

@cnsgithub
Copy link
Author

Hello and thanks for your honest answer. I fully understand your point of not adding support for edge cases and keeping it small and simple. I don't know why Google Earth decided to implement it this way...

As I already have a delegator class in place to avoid dependencies to org.owasp.encoder throughout my code I worked around this issue by just adding this one:

	/**
	 * This is a special case of XML encoding that replaces angle brackets with numeric character entities (&amp;#60; and &amp;#62;) instead of entity references (&amp;lt; and &amp;gt;).
	 * This prevents Google Earth from rendering inadvertent HTML.
	 * @see <a href="http://kml4earth.appspot.com/kmlErrata.html?#encoding">KML Reference Errata</a>
	 * @see <a href="https://github.com/OWASP/owasp-java-encoder/issues/20">OWASP Issue</a>
	 */
	public static String kmlEncode(String str) {
		return org.owasp.encoder.Encode.forXml(str).replace("&lt;", "&#60;").replace("&gt;", "&#62;");
	}

I know that strictly replacing all entity references by their numerical representations would be the better but more complex approach. For the moment it solves the problem and I didn't manage to break out and inject HTML. Do you think the workaround sufficiently prevents all cases? Let me see, maybe forking your repo should be preferred in the end...

Bye

cnsgithub added a commit to cnsgithub/owasp-java-encoder that referenced this issue Jun 21, 2018
cnsgithub added a commit to cnsgithub/owasp-java-encoder that referenced this issue Jun 21, 2018
@jmanico
Copy link
Member

jmanico commented Jun 24, 2018 via email

@cnsgithub
Copy link
Author

I also integrated a specialized KMLEncoder in https://github.com/cnsgithub/owasp-java-encoder. We can safely close this issue.

cnsgithub added a commit to cnsgithub/owasp-java-encoder that referenced this issue Oct 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants