Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

HTML parser doesn't deal with &amp #14

Closed
pennersr opened this Issue · 8 comments

2 participants

@pennersr

Suppose you've got the following content:

Testing

http://picasaweb.google.com/lh/sredir?uname=test&target=ALBUM&id=123&authkey=abc

(Note: the link itself is not valid due to mangled IDs (it was a private album))

Rendering this content as follows will not work:

{{post.body|linebreaksbr|oembed_html}}

The reason is that the "&" has been escaped and turned into "&amp". The HTML parser over at https://github.com/coleifer/micawber/blob/master/micawber/parsers.py#L144 does recognize & extract the URL, but it does not unescape &amp. Hence, &amp is fed to embed.ly... resulting in a 404 over there.

@coleifer
Owner

Hmm...

For starters, check the docs on the linebreaks filter (or the source). It appears that it accepts a parameter that controls what is escaped and will autoescape the text for you:

So you can simply not escape the stuff you pass into the filter.

But yeah, it probably needs to be able to unescape things. I will look into a fix.

@coleifer coleifer closed this in 3ff5c18
@pennersr

Thanks. The example I posted still does not seem to work, as in the end the example URL posted to (in my case) embed.ly is still as follows:

http://api.embed.ly/1/oembed?format=json&key=8af3ff804a6csecret&url=http%3A%2F%2Fpicasaweb.google.com%2Flh%2Fsredir%3Funame%3Dtest%26amp%3Btarget%3DALBUM%26amp%3Bid%3D123%26amp%3Bauthkey%3Dabc

Note that it contains amp. I didn't dive into it yet, though I can confirm that the amp is now correctly unescaped when extracted in parse_html ...

@coleifer coleifer reopened this
@coleifer
Owner

Can you submit a testcase? Also note the way the testcases are stuctured so as not to make any network calls.

@coleifer
Owner

You might also try putting "|oembed|linebreaksbr" instead.

@pennersr

I'll dive into this, hopefully later today ...

(I do understand that I can workaround this by ordering filters differently, or by disabling auto escaping -- but the oembed_html tag is supposed to handle HTML so it should handle this case just fine)

@coleifer
Owner

I hear ya, yeah -- if you can, could you also maybe get me:

  1. the initial text
  2. the text after piping through django
  3. the url that is sent off to embed.ly or processed by micawber

Thanks mane!

@pennersr pennersr referenced this issue from a commit in pennersr/micawber
@pennersr pennersr Properly handle ampersands (closes #14) 025c77b
@pennersr

Found the problem. As said, your change correctly turns & into &. However, the str(url) at this line ...

https://github.com/coleifer/micawber/blob/master/micawber/parsers.py#L130

... turns the ampersands back into & again.

The fix: change str(url) into url.string and all works fine.

@coleifer
Owner

Cool, found it earlier too, was working on a testcase but got lazy... fix is merged in though.

@coleifer coleifer closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.