Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Fixing problem with links inside of < > brackets in safe mode. #72

Merged
merged 15 commits into from Jan 19, 2012

Conversation

Projects
None yet
3 participants
Contributor

mdirolf commented Jan 14, 2012

There's a test case that better explains what was going wrong.

Also, a fix for an issue introduced with the logic inversion in commit 35930e0...

mdirolf and others added some commits Jan 13, 2012

New footnotes configuration option: BACKLINK_TEXT (second try).
BACKLINK_TEXT specifies the text that's used in the link at the end of
the footnote to link back up to the reader's place. It still defaults to
"&#8617;".

Okay, so at first I had an uncessarily complicated commit for this and
submitted a pull request. Waylan showed me a better way to do it, here:

startling@ee7d1a2#markdown/extensions/footnotes.py-P19

So I made another commit and added it to the pull request. But then I
accidentally added yet another commit to the pull request, accidentally.
Since then, I've realized it would be best to start with a new branch
and closed that first pull request.

Hopefully this will be the last try.
fixed an error in the BACKLINK_TEXT option in the footnotes extension.
I accidentally changed the wrong line (L294 instead of L293) to
"self.footnotes.getConfig("BACKLINK_TEXT")" before. This fixes that.
Fixed #69. url_sanitize no longer crashes on unparsable urls.
Also optimized the code to bypass parsing when not in safe_mode and return
immediately upon failure rather than continue parsing when in safe_mode.

Note that in Python2.7+ more urls may fail than in older versions because
IPv6 support was added to urlparse and it apparently mistakenly identifies some
urls as IPv6 when they are not. Seeing this only applies to safe_mode now,
I don't really care.
Fixed #61. stdin and stdout should work better in python 3.
Apparently, in Python3 stdin and stdout take str (unicode) not bytes.
This provides a solution that will work in both python 2 & 3.
Fixed #68. Blank line is not required after html comments.
Interestingly, the change to the misc/mismatched-tags test is inline with
PHP Markdown Extra's behavior  but not markdown.pl, which produces invalid html.
When safe mode is 'escape', don't allow bad html to stop further proc…
…essing.

See tests/html4_safe/html_then_blockquote.(txt|html).

It looks like having unclosed block-level html elements was causing
further processing not to happen, even in the case where we're
escaping HTML. Since we're escaping HTML, it seems like it shouldn't
affect processing at all. This changes output results in a couple
of other tests, but the new output seems reasonable to me.
Contributor

mdirolf commented Jan 14, 2012

Just added another commit - a2377e1 - that fixes an issue where unclosed block-level html elements would stop further processing of markdown. This seems like correct behavior when passing-through html, but not when escaping it. The change turns off html processing when safe mode is set to 'escape'. In that case I think we just want to escape the text and be done with it.

Feedback?

Owner

waylan commented Jan 18, 2012

Hey mdrilf,

Thanks for the work on this. However, in the future, could you keep unrelated issues in separate pull requests? Some of the stuff here is a no-brainier, but others I need to look at closer. Everything all grouped together makes that harder to do. I guess what I'm trying to say is that less work for me gets your requests merged faster.

I'll need to play with 425fde1 and compare with other implementations. Babelmark makes this fairly easy.

Regarding 542324b, is that just a documentation edit? Was the actual logic changed previously, Should it not have been?

a2377e1 is an interesting solution I hadn't considered. IIRC, the current behavior is there for historical reasons. We didn't always use ElementTree (which gives us escaping for free). No reason why we can't take advantage of that now. Question though, based on your commit message, is sounds like your patch is just masking a bug in the raw html parser which will still manifest itself when we don't use safe_mode='escape'. If so, we need to fix that. Could you provide more info (preferably in a separate bug report).

I'll probably just merge your commits on the fenced code blocks, but those really should be part of issue #53. At least those bugs will be fixed while waiting for the refactor and the tests will help in doing the refactor as well.

Contributor

mdirolf commented Jan 18, 2012

Hi Waylan,

Thanks for all of your hard work on Python-Markdown. We are using it
at https://fiesta.cc

Replies are inline:

Thanks for the work on this. However, in the future, could you keep unrelated issues in separate pull requests? Some of the stuff here is a no-brainier, but others I need to look at closer. Everything all grouped together makes that harder to do. I guess what I'm trying to say is that less work for me gets your requests merged faster.

Yes I will try to - GH was yelling at me for trying to create a second
pull request but maybe I just didn't search around hard enough.

I'll need to play with 425fde1 and compare with other implementations. Babelmark makes this fairly easy.

I think the test case is definitely demonstrating proper behavior
here, but not sure that my fix is correct. I was fumbling around a bit
trying to figure out the best way to accomplish it.

Regarding 542324b, is that just a documentation edit? Was the actual logic changed previously, Should it not have been?

That is a bug fix: commit 35930e0 changed

  •    if netloc != '' or scheme in locless_schemes:
    
  •    if netloc == '' or scheme not in locless_schemes:
    

In order to properly reverse the logic it needs to be an and I
think. There was actually a failing test on my machine (python 2.6)
when I checked out that change.

a2377e1 is an interesting solution I hadn't considered. IIRC, the current behavior is there for historical reasons. We didn't always use ElementTree (which gives us escaping for free). No reason why we can't take advantage of that now. Question though, based on your commit message, is sounds like your patch is just masking a bug in the raw html parser which will still manifest itself when we don't use safe_mode='escape'. If so, we need to fix that. Could you provide more info (preferably in a separate bug report).

I'm not sure that it's masking a bug. The problem is basically that if
I have the following markdown:

<div>
*bold*

In the non-safe-mode case, I'm trying to inline html - so the bold
shouldn't get processed since it's inside of an unclosed

. In the
safe-mode case, the
is basically meaningless, so bold should
turn into bold. Does that help explain the problem?

I'll probably just merge your commits on the fenced code blocks, but those really should be part of issue #53. At least those bugs will be fixed while waiting for the refactor and the tests will help in doing the refactor as well.

Yeah sorry for the poor protocol - we needed all of those changes fast
so I took some shortcuts in reporting them back to you. Will try to be
better in the future.

Owner

waylan commented Jan 18, 2012

Mike,

My replied to your replies are inline.

On Wed, Jan 18, 2012 at 3:53 PM, Mike Dirolf wrote:

Hi Waylan,

Thanks for all of your hard work on Python-Markdown. We are using it
at https://fiesta.cc

Replies are inline:

Thanks for the work on this. However, in the future, could you keep unrelated issues in separate pull requests? Some of the stuff here is a no-brainier, but others I need to look at closer. Everything all grouped together makes that harder to do. I guess what I'm trying to say is that less work for me gets your requests merged faster.

Yes I will try to - GH was yelling at me for trying to create a second
pull request but maybe I just didn't search around hard enough.

Ah right, that weird GH limit of only one pull request per user per
project. I always forget about it - probably because I don't
understand it.

I'll need to play with 425fde1 and compare with other implementations. Babelmark makes this fairly easy.

I think the test case is definitely demonstrating proper behavior
here, but not sure that my fix is correct. I was fumbling around a bit
trying to figure out the best way to accomplish it.

Thanks, I'll take a closer look later tonight.

Regarding 542324b, is that just a documentation edit? Was the actual logic changed previously, Should it not have been?

That is a bug fix: commit 35930e0 changed

  •        if netloc != '' or scheme in locless_schemes:
  •        if netloc == '' or scheme not in locless_schemes:

In order to properly reverse the logic it needs to be an and I
think. There was actually a failing test on my machine (python 2.6)
when I checked out that change.

No, there were 2 changes to that line. Both sides of the or were
negated. It is still an or. Besides, I see no code change in your
commit so this a documentation edit - an inaccurate one unless I'm
missing something (like your failing test??). I'll need more info
before doing anything with this.

a2377e1 is an interesting solution I hadn't considered. IIRC, the current behavior is there for historical reasons. We didn't always use ElementTree (which gives us escaping for free). No reason why we can't take advantage of that now. Question though, based on your commit message, is sounds like your patch is just masking a bug in the raw html parser which will still manifest itself when we don't use safe_mode='escape'. If so, we need to fix that. Could you provide more info (preferably in a separate bug report).

I'm not sure that it's masking a bug. The problem is basically that if
I have the following markdown:

   


   bold

In the non-safe-mode case, I'm trying to inline html - so the bold
shouldn't get processed since it's inside of an unclosed

. In the
safe-mode case, the
is basically meaningless, so bold should
turn into bold. Does that help explain the problem?

Thanks, got it now. This change will actually speed up the markdown
parser when safe_mode='escape' as parsing the raw html is a
performance suck. Optimizations that avoid running code unnecessarily
are always good.

I'll probably just merge your commits on the fenced code blocks, but those really should be part of issue #53. At least those bugs will be fixed while waiting for the refactor and the tests will help in doing the refactor as well.

Yeah sorry for the poor protocol - we needed all of those changes fast
so I took some shortcuts in reporting them back to you. Will try to be
better in the future.

I understand. Thanks for the feedback.

Contributor

mdirolf commented Jan 18, 2012

Great - sounds like all of the commits are fine except one still needs
discussion, see below:

Regarding 542324b, is that just a documentation edit? Was the actual logic changed previously, Should it not have been?

That is a bug fix: commit 35930e0 changed

  •        if netloc != '' or scheme in locless_schemes:
  •        if netloc == '' or scheme not in locless_schemes:

In order to properly reverse the logic it needs to be an and I
think. There was actually a failing test on my machine (python 2.6)
when I checked out that change.

No, there were 2 changes to that line. Both sides of the or were
negated. It is still an or.  Besides, I see no code change in your
commit so  this a documentation edit - an inaccurate one unless I'm
missing something (like your failing test??). I'll need more info
before doing anything with this.

Yeah but this is boolean logic, right? To negate (A or B) we need (!A
and !B) - your change made it (!A or !B). The failing test case wasn't
a new one I wrote, but one of the existing tests - can you try running
them on Python 2.6?

Owner

waylan commented Jan 19, 2012

On Wed, Jan 18, 2012 at 4:40 PM, Mike Dirolf wrote:

Great - sounds like all of the commits are fine except one still needs
discussion, see below:

Regarding 542324b, is that just a documentation edit? Was the actual logic changed previously, Should it not have been?

That is a bug fix: commit 35930e0 changed

  •        if netloc != '' or scheme in locless_schemes:
  •        if netloc == '' or scheme not in locless_schemes:

In order to properly reverse the logic it needs to be an and I
think. There was actually a failing test on my machine (python 2.6)
when I checked out that change.

No, there were 2 changes to that line. Both sides of the or were
negated. It is still an or.  Besides, I see no code change in your
commit so  this a documentation edit - an inaccurate one unless I'm
missing something (like your failing test??). I'll need more info
before doing anything with this.

Yeah but this is boolean logic, right? To negate (A or B) we need (!A
and !B) - your change made it (!A or !B). The failing test case wasn't
a new one I wrote, but one of the existing tests - can you try running
them on Python 2.6?

Right. I was thinking I changed (!A or B) to (A or !B). Anyway, your
patch fixes it. Not sure how I missed that.

Contributor

mdirolf commented Jan 19, 2012

Okay glad we got that figured out - was worried I had done something
wrong there.

On Wed, Jan 18, 2012 at 20:38, Waylan Limberg
reply@reply.github.com
wrote:

On Wed, Jan 18, 2012 at 4:40 PM, Mike Dirolf  wrote:

Great - sounds like all of the commits are fine except one still needs
discussion, see below:

Regarding 542324b, is that just a documentation edit? Was the actual logic changed previously, Should it not have been?

That is a bug fix: commit 35930e0 changed

  •        if netloc != '' or scheme in locless_schemes:
  •        if netloc == '' or scheme not in locless_schemes:

In order to properly reverse the logic it needs to be an and I
think. There was actually a failing test on my machine (python 2.6)
when I checked out that change.

No, there were 2 changes to that line. Both sides of the or were
negated. It is still an or.  Besides, I see no code change in your
commit so  this a documentation edit - an inaccurate one unless I'm
missing something (like your failing test??). I'll need more info
before doing anything with this.

Yeah but this is boolean logic, right? To negate (A or B) we need (!A
and !B) - your change made it (!A or !B). The failing test case wasn't
a new one I wrote, but one of the existing tests - can you try running
them on Python 2.6?

Right. I was thinking I changed (!A or B) to (A or !B). Anyway, your
patch fixes it. Not sure how I missed that.


Reply to this email directly or view it on GitHub:
waylan#72 (comment)

@waylan waylan merged commit 69b365d into Python-Markdown:master Jan 19, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment