HTML parser doesn't deal with &amp #14

Closed
pennersr opened this Issue Dec 18, 2012 · 8 comments

2 participants

@pennersr

Suppose you've got the following content:

Testing

http://picasaweb.google.com/lh/sredir?uname=test&target=ALBUM&id=123&authkey=abc

(Note: the link itself is not valid due to mangled IDs (it was a private album))

Rendering this content as follows will not work:

{{post.body|linebreaksbr|oembed_html}}

The reason is that the "&" has been escaped and turned into "&amp". The HTML parser over at https://github.com/coleifer/micawber/blob/master/micawber/parsers.py#L144 does recognize & extract the URL, but it does not unescape &amp. Hence, &amp is fed to embed.ly... resulting in a 404 over there.

@coleifer
Owner

Hmm...

For starters, check the docs on the linebreaks filter (or the source). It appears that it accepts a parameter that controls what is escaped and will autoescape the text for you:

So you can simply not escape the stuff you pass into the filter.

But yeah, it probably needs to be able to unescape things. I will look into a fix.

@coleifer coleifer closed this in 3ff5c18 Dec 19, 2012
@pennersr

Thanks. The example I posted still does not seem to work, as in the end the example URL posted to (in my case) embed.ly is still as follows:

http://api.embed.ly/1/oembed?format=json&key=8af3ff804a6csecret&url=http%3A%2F%2Fpicasaweb.google.com%2Flh%2Fsredir%3Funame%3Dtest%26amp%3Btarget%3DALBUM%26amp%3Bid%3D123%26amp%3Bauthkey%3Dabc

Note that it contains amp. I didn't dive into it yet, though I can confirm that the amp is now correctly unescaped when extracted in parse_html ...

@coleifer coleifer reopened this Dec 20, 2012
@coleifer
Owner

Can you submit a testcase? Also note the way the testcases are stuctured so as not to make any network calls.

@coleifer
Owner

You might also try putting "|oembed|linebreaksbr" instead.

@pennersr

I'll dive into this, hopefully later today ...

(I do understand that I can workaround this by ordering filters differently, or by disabling auto escaping -- but the oembed_html tag is supposed to handle HTML so it should handle this case just fine)

@coleifer
Owner

I hear ya, yeah -- if you can, could you also maybe get me:

  1. the initial text
  2. the text after piping through django
  3. the url that is sent off to embed.ly or processed by micawber

Thanks mane!

@pennersr pennersr added a commit to pennersr/micawber that referenced this issue Dec 20, 2012
@pennersr pennersr Properly handle ampersands (closes #14) 025c77b
@pennersr

Found the problem. As said, your change correctly turns & into &. However, the str(url) at this line ...

https://github.com/coleifer/micawber/blob/master/micawber/parsers.py#L130

... turns the ampersands back into & again.

The fix: change str(url) into url.string and all works fine.

@coleifer
Owner

Cool, found it earlier too, was working on a testcase but got lazy... fix is merged in though.

@coleifer coleifer closed this Dec 20, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment