Suppose you've got the following content:
(Note: the link itself is not valid because its IDs have been mangled; it was a private album.)
Rendering this content as follows will not work:
The reason is that the "&" has been escaped and turned into "&amp;". The HTML parser at https://github.com/coleifer/micawber/blob/master/micawber/parsers.py#L144 does recognize and extract the URL, but it does not unescape the "&amp;". Hence the URL is sent to embed.ly with "&amp;" still in it, resulting in a 404 on their end.
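To make the failure concrete, here is a minimal sketch using only the Python standard library. It does not use micawber itself, and the URL is a made-up placeholder, not the album link from the report. It shows what autoescaping does to a query string and what unescaping restores:

```python
from html import escape, unescape

# Hypothetical URL with two query parameters (an assumption for illustration).
raw_url = "http://example.com/album?id=123&authkey=abc"

# Roughly what autoescaping does to the text before the parser sees it:
# every "&" becomes the entity "&amp;".
escaped_url = escape(raw_url)

# What has to happen before the URL is handed to an oEmbed provider.
restored_url = unescape(escaped_url)

print(escaped_url)
print(restored_url)
```

If the escaped form is sent as-is, the provider sees a parameter whose name starts with "amp;" instead of the intended one, which would explain the 404.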
For starters, check the docs on the linebreaks filter (or the source). It appears that it accepts a parameter that controls what is escaped and will autoescape the text for you:
So you can simply not escape the stuff you pass into the filter.
But yeah, it probably needs to be able to unescape things. I will look into a fix.
Unescape HTML entities in the parser, fixes #14
Thanks. The example I posted still does not seem to work, as in the end the example URL posted to (in my case) embed.ly is still as follows:
Note that it still contains "&amp;". I haven't dug into it yet, though I can confirm that the "&amp;" is now correctly unescaped when extracted in parse_html ...
Can you submit a testcase? Also note the way the testcases are structured so as not to make any network calls.
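As a rough idea of what such a testcase could look like, here is a self-contained sketch. It does not use micawber's actual test helpers; `FakeProvider` and `extract_and_request` are hypothetical stand-ins for a provider and the parser. The point is only that the provider records the URL it is asked for instead of performing an HTTP request:

```python
import re
import unittest
from html import unescape

# Simplistic URL pattern, good enough for this sketch only.
URL_RE = re.compile(r"https?://\S+")


class FakeProvider:
    """Hypothetical provider that records URLs instead of calling the network."""

    def __init__(self):
        self.requested = []

    def request(self, url):
        self.requested.append(url)
        return {"type": "link", "url": url}


def extract_and_request(text, provider):
    """Hypothetical stand-in for the parser: find URLs, unescape them, request them."""
    return [provider.request(unescape(m.group(0))) for m in URL_RE.finditer(text)]


class AmpersandTestCase(unittest.TestCase):
    def test_escaped_ampersand_is_unescaped(self):
        provider = FakeProvider()
        extract_and_request("see http://example.com/a?x=1&amp;y=2 here", provider)
        # The provider must receive a plain "&", not the "&amp;" entity.
        self.assertEqual(provider.requested, ["http://example.com/a?x=1&y=2"])
```

Saved as a module, this runs with `python -m unittest` and never touches the network.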
You might also try putting "|oembed|linebreaksbr" instead.
I'll dive into this, hopefully later today ...
(I do understand that I can work around this by ordering the filters differently, or by disabling autoescaping -- but the oembed_html tag is supposed to handle HTML, so it should handle this case just fine.)
I hear ya, yeah -- if you can, could you also maybe get me:
Properly handle ampersands (closes #14)
Found the problem. As said, your change correctly turns "&amp;" into "&". However, the str(url) at this line ...
... turns the ampersands back into "&amp;" again.
The fix: change str(url) into url.string and all works fine.
Cool, found it earlier too, was working on a testcase but got lazy... fix is merged in though.