CDATA correctness fix #39

Closed
wants to merge 1 commit into
from

Conversation

Projects
None yet
2 participants
@bartschuller
Contributor

bartschuller commented Aug 16, 2011

CDATA nodes are evil. They're a hammer people use when they don't want to think about escaping. Those people will get burned, when someone types the magic sequence ]]> into a form.

This commit tests for the issue (take a look at that first, it should make clear why this is a big deal) and also fixes it.

Parsing the resulting XML will create more nodes than originally serialized. That can't be helped, unless you want to disallow creating these evil CDATA nodes in the first place.

@djspiewak

This comment has been minimized.

Show comment Hide comment
@djspiewak

djspiewak Aug 16, 2011

Owner

CDATA is indeed evil. It's important to support though, if only for the sake of XHTML.

You're quite right that ]]> is a problem with the current implementation. I hadn't thought of that before. However, automatically splitting on toString produces some really weird results. Specifically:

val cd = CDATA(...)
XML.parse(cd.toString) == cd        // maybe not!

I think the better thing to do here would be to follow XML's lead and simply reject CDATA nodes which contain ]]>. This is quite analogous to what happens if you try to create a CDATA node in XML's textual representation which contains the magic closing sequence. It's also uniform with the way that we handle invalid QName(s) when creating Elem.

In other words, we should do a quick contains check for "]]>" in the constructor for CDATA, throwing an IllegalArgumentException if it returns true. Would you like to try this and submit a new pull request? Otherwise, I'll implement as soon as I can (possibly this evening).

Owner

djspiewak commented Aug 16, 2011

CDATA is indeed evil. It's important to support though, if only for the sake of XHTML.

You're quite right that ]]> is a problem with the current implementation. I hadn't thought of that before. However, automatically splitting on toString produces some really weird results. Specifically:

val cd = CDATA(...)
XML.parse(cd.toString) == cd        // maybe not!

I think the better thing to do here would be to follow XML's lead and simply reject CDATA nodes which contain ]]>. This is quite analogous to what happens if you try to create a CDATA node in XML's textual representation which contains the magic closing sequence. It's also uniform with the way that we handle invalid QName(s) when creating Elem.

In other words, we should do a quick contains check for "]]>" in the constructor for CDATA, throwing an IllegalArgumentException if it returns true. Would you like to try this and submit a new pull request? Otherwise, I'll implement as soon as I can (possibly this evening).

@djspiewak djspiewak closed this Aug 16, 2011

@djspiewak

This comment has been minimized.

Show comment Hide comment
@djspiewak

djspiewak Aug 16, 2011

Owner

Opened issue #40 to track this.

Owner

djspiewak commented Aug 16, 2011

Opened issue #40 to track this.

@bartschuller

This comment has been minimized.

Show comment Hide comment
@bartschuller

bartschuller Aug 16, 2011

Contributor

It's already night here, so my next time slot is in about 20 hours. I'd be happy to implement it then if you haven't beaten me to it.

Contributor

bartschuller commented Aug 16, 2011

It's already night here, so my next time slot is in about 20 hours. I'd be happy to implement it then if you haven't beaten me to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment