HTML parser doesn't deal with &amp #14

pennersr opened this Issue Dec 18, 2012 · 8 comments



Suppose you've got the following content:


(Note: the link itself is not valid due to mangled IDs (it was a private album))

Rendering this content as follows will not work:


The reason is that the "&" has been escaped and turned into "&amp;". The HTML parser does recognize and extract the URL, but it does not unescape the "&amp;". Hence, the URL is sent off with "&amp;" still in it, resulting in a 404 on the remote end.
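To illustrate the unescaping step that is missing, here is a minimal sketch using only the Python standard library. The URL is a made-up example (not the one from the report), and `html.unescape` stands in for whatever unescaping the parser would need to do; it is not micawber's own code.

```python
import html

# Hypothetical URL as it appears inside autoescaped HTML:
# the query-string separator "&" has become "&amp;".
escaped_url = "http://example.com/photos?id=1&amp;key=abc"

# Unescaping restores the URL that should actually be requested.
clean_url = html.unescape(escaped_url)
print(clean_url)
```

Requesting `escaped_url` as-is sends a query string the remote site does not expect, which is consistent with the 404 described above.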



For starters, check the docs on the linebreaks filter (or the source). It appears that it accepts a parameter that controls what is escaped and will autoescape the text for you:

So you can simply not escape the stuff you pass into the filter.

But yeah, it probably needs to be able to unescape things. I will look into a fix.

coleifer closed this in 3ff5c18 Dec 19, 2012

Thanks. The example I posted still does not seem to work, as in the end the example URL posted to (in my case) is still as follows:

Note that it contains amp. I didn't dive into it yet, though I can confirm that the amp is now correctly unescaped when extracted in parse_html ...

coleifer reopened this Dec 20, 2012

Can you submit a testcase? Also note the way the testcases are structured so as not to make any network calls.


You might also try putting "|oembed|linebreaksbr" instead.
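A rough sketch of what a network-free testcase for this could look like. Everything here is hypothetical and for illustration only: `RecordingProvider`, `extract_and_request` and `URL_RE` are made-up names, not micawber's API or its actual test layout. The idea is just that the provider records the URL it is handed instead of fetching it.

```python
import html
import re
import unittest

# Hypothetical, deliberately simple URL pattern for this sketch.
URL_RE = re.compile(r'https?://[^\s<>"]+')


class RecordingProvider:
    """Hypothetical stand-in for a real provider; makes no network calls."""

    def __init__(self):
        self.requested = []

    def request(self, url):
        # Record the URL instead of fetching it.
        self.requested.append(url)
        return {"type": "link", "url": url}


def extract_and_request(text, provider):
    """Find URLs in escaped HTML, unescape them, and pass them to the provider."""
    for match in URL_RE.finditer(text):
        provider.request(html.unescape(match.group(0)))


class AmpersandTest(unittest.TestCase):
    def test_escaped_ampersand_is_unescaped(self):
        provider = RecordingProvider()
        extract_and_request(
            "<p>http://example.com/photo?id=1&amp;key=abc</p>", provider
        )
        self.assertEqual(
            provider.requested, ["http://example.com/photo?id=1&key=abc"]
        )
```

A test along these lines fails whenever any step between extraction and the request leaves "&amp;" in the URL, which is the case reported here.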


I'll dive into this, hopefully later today ...

(I do understand that I can work around this by ordering filters differently, or by disabling auto escaping -- but the oembed_html tag is supposed to handle HTML so it should handle this case just fine)


I hear ya, yeah -- if you can, could you also maybe get me:

  1. the initial text
  2. the text after piping through django
  3. the url that is sent off to or processed by micawber

Thanks mane!

pennersr added a commit to pennersr/micawber that referenced this issue Dec 20, 2012
Properly handle ampersands (closes #14) 025c77b

Found the problem. As said, your change correctly turns &amp; into &. However, the str(url) at this line ...

... turns the ampersands back into &amp; again.

The fix: change str(url) into url.string and all works fine.
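The same distinction (serializing a text node as markup versus reading its text content) can be shown with the standard library. This is only an analogue: `xml.dom.minidom` is used as a stand-in for the HTML parser micawber relies on, and the URL is a made-up example.

```python
from xml.dom import minidom

doc = minidom.parseString("<a>http://example.com/photos?id=1&amp;key=abc</a>")
# Merge adjacent text nodes, in case the parser delivered the text in pieces.
doc.documentElement.normalize()
text_node = doc.documentElement.firstChild

# Serializing the node as markup escapes the ampersand again.
serialized = text_node.toxml()

# Reading the node's text content gives the plain, unescaped URL.
raw = text_node.data

print(serialized)
print(raw)
```

Here `serialized` still contains "&amp;" while `raw` contains a bare "&", which mirrors the difference between str(url) and url.string described above.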


Cool, found it earlier too, was working on a testcase but got lazy... fix is merged in though.

coleifer closed this Dec 20, 2012