Skip to content

Conversation

@kylef
Copy link

@kylef kylef commented Dec 3, 2013

The markdown parser from John Gruber doesn't do this, and there is no mention of this in the specification.

This is a broken implementation because it doesn't encode everything. Ie, there may already be a %20 in the URL and a space. The original %20 should become %2520 and the space would already be turned into %20. If this should be percent encoding the URL's then it should use urllib.quote so it handles these cases and also encodes every other character.

The markdown parser from John Gruber doesn't do this, and there is no mention
of this in the specification.

This is a broken implementation because it doesn't encode everything. Ie, there
may already be a `%20` in the URL and a space. The original `%20` should become
`%2520` and the space would already be turned into `%20`. If this should be
percent encoding the URL's then it should use `urllib.quote` so it handles
these cases and also encodes every other character.
@waylan
Copy link
Member

waylan commented Dec 4, 2013

John Gruber has stated in the past that just because his implementation behaves a certain way doesn't mean it is the correct behavior. I would say this is such a case. Markdown.pl's behavior is a bug IMO.

Of course, the next question is: what is the correct behavior? I looked to other implementations for that. The current behavior was a fix to issue #152 based on what other implementations do. Follow the link to babelmark from that bug report.

Of course, that doesn't address the edge case you provided. A better fix is certainly welcome. Fell free to try out various edge cases on babelmark and see how other implementations handle them.

One final thought. Your patch did not address the tests. With your patch applied we have failing tests. If you'd like me to accept a pull request it needs to have passing tests. Fell free to add additional edge cases to the tests as well.

@kylef
Copy link
Author

kylef commented Dec 4, 2013

Okay, thanks for the feedback @waylan. I will update the pull-request to correctly encode it if this is expected behaviour when I get a chance.

Btw, you should have a look at hooking up travis-ci to run the tests, then it can mark the pull-request as build failure.

@waylan
Copy link
Member

waylan commented Jan 10, 2014

After giving this more thought, I'm not sure about my previous position on this issue. The fact is, an author might paste in a url that is already url encoded. In that case, we want to leave it as-is (for example %20 should not become %2520). However, if a url is not already url encoded, it should be. Not sure of the best way to do that. Perhaps unencode and then reencode??

Also, only the path section of the url should be percent encoded. The query string part should be "plus sign encoded" (use urllib.quote_plus). Regardless of what way I go, the current implementation is clearly wrong as it percent encodes all spaces -- even in the query string.

@waylan
Copy link
Member

waylan commented Jan 10, 2014

This has been fixed in d0e088d. Silly me, I forgot to mention this issue in the commit message.

@waylan waylan closed this Jan 10, 2014
@kylef
Copy link
Author

kylef commented Jan 10, 2014

@waylan That commit did what I did, instead of quoting the whole thing as your comment mentions? Is this correct?

@kylef kylef deleted the percent-encoding branch January 10, 2014 07:43
@waylan
Copy link
Member

waylan commented Jan 11, 2014

@kylef yes, I just turned off the incorrect encoding (percent encoding query strings) until I can figure out a reliable way to properly encode a url without messing with already encoded urls. Perhaps the only way to do that is to do nothing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants