New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace pagedown by markdown-it #5526

Merged
merged 1 commit into from Jan 10, 2015

Conversation

Projects
None yet
4 participants
@svbergerem
Copy link
Member

svbergerem commented Jan 4, 2015

Fixes #5457

s = String.fromCharCode(v) + s;
c = c / 51 | 0;
}
return "Q" + s + "Q";

This comment has been minimized.

@Flaburgan

Flaburgan Jan 4, 2015

Member

One name variable are not really easy to read

This comment has been minimized.

@svbergerem
@svbergerem

This comment has been minimized.

Copy link
Member

svbergerem commented Jan 4, 2015

Markdown.Converter and Markdown.Sanitizer are the latest versions from https://code.google.com/p/pagedown/source/browse/ (rev. 2a8c75ce3fb5)

The activated extensions for Markdown.Extra are Tables, Fenced Code Blocks, SmartyPants, Newlines and Strikethrough. A lot of new users are confused because of the default newline behavior of Markdown. That's why I activated Newlines. I can also activate other extensions if someone thinks that they would help.

@Flaburgan

This comment has been minimized.

Copy link
Member

Flaburgan commented Jan 4, 2015

It's a good idea to activate Newlines ;) Maybe footnotes looks cool too.

@svbergerem

This comment has been minimized.

Copy link
Member

svbergerem commented Jan 4, 2015

This also fixes (at least) #5216 and #5290

This doesn't fix #5379

@svbergerem

This comment has been minimized.

Copy link
Member

svbergerem commented Jan 4, 2015

Looks like updating pagedown breaks some important stuff. (even if pagedown-extra isn't activated) Hashtags at the beginning of a line are now interpreted as headlines. m(

@svbergerem

This comment has been minimized.

Copy link
Member

svbergerem commented Jan 5, 2015

I looked at different markdown implementations and read that the reference implementation in perl also misinterprets hashtags and a lot of implementations try to be compatible with the perl implementation. (afaik also pagedown)

We could use markdown-it which fixes all mentioned issues and would also fix #5379 and #4201. Hashtags at the beginning of a line aren't misinterpreted anymore. markdown-it has some problems with links which include non-English chars but that is a known problem: markdown-it/markdown-it#2. We would have to wait until they fixed that but the repo is quite active so hopefully it will be fixed soon.

@puzrin

This comment has been minimized.

Copy link

puzrin commented Jan 5, 2015

I looked at different markdown implementations and read that the reference implementation in perl also misinterprets hashtags and a lot of implementations try to be compatible with the perl implementation. (afaik also pagedown)

I'd recommend to forget about "implementstions" and switch to "specification".

http://commonmark.org/

That will solve 99% of your issues at once and forever.

markdown-it has some problems with links which include non-English chars but that is a known problem: markdown-it/markdown-it#2. We would have to wait until they fixed that but the repo is quite active so hopefully it will be fixed soon.

I think ~ 2-3 months, would like to take timeout and see how people accept 3.0.0 release.

Problem is not to fix it fast, problem is to make it work with high speed, and to have nice api for extensions (skype, tel links and so on).

If you can't wait and highest speed is not critical, you can use https://github.com/kevva/url-regex + linkify rule src to quickly create your "right" plugin. That will not be universal & fast as i wish, but can work for you.

PS. When i say fast, i mean > 1mb/sec parse speed. Make sense for server side.

@jhass

This comment has been minimized.

Copy link
Member

jhass commented Jan 5, 2015

Can't we just do the hashtag parsing before the markdown parsing? That should turn the hashtags into links which no longer should be recognized as headings.

@puzrin

This comment has been minimized.

Copy link

puzrin commented Jan 5, 2015

Can't we just do the hashtag parsing before the markdown parsing? That should turn the hashtags into links which no longer should be recognized as headings.

I don't recomment such approach. You need to understand doc structure to not break anything else. That's possible only at parse phase. In other word, you will reinvent your own markdown parser or will do dirty tricks with unsafe output :)

AFAIK, CommonMark spec requires space after ## in headers. You should have no conflicts with hashtag.

@svbergerem

This comment has been minimized.

Copy link
Member

svbergerem commented Jan 5, 2015

@jhass Calling markdownify after hashtagify removes the links.

@svbergerem

This comment has been minimized.

Copy link
Member

svbergerem commented Jan 5, 2015

There are also some other problems with our current approach. By calling hashtagify and mentionify after markdownify we also parse tags and mentions inside of code blocks and I think we shouldn't do that. I guess the cleanest way to do it right would be writing plugins for those two tasks for the parser so they are only called where we really want that.

@svbergerem svbergerem force-pushed the svbergerem:update-pagedown branch from 773a771 to fdafefd Jan 9, 2015

@svbergerem

This comment has been minimized.

Copy link
Member

svbergerem commented Jan 9, 2015

I added a temporary fix for markdown-it linkify which we can remove once that upstream function uses another library to find urls. We might also be able to get rid of punycode someday.

Our specs succeed now. (except for some random failures) I'd say I'll do some cleanup and then this PR should be ready for a review.

};

},{"url-regex":1}]},{},[4])(4)
});

This comment has been minimized.

@jhass

jhass Jan 9, 2015

Member

Should probably go into lib/assets

This comment has been minimized.

@svbergerem

svbergerem Jan 9, 2015

Member

Sure, I'll move that file.

var hashtagPlugin = window.markdownitHashtag;
md.use(hashtagPlugin, {
// TODO is this [:alnum:] in ruby?
hashtagRegExp: '[\\x0080-\\xFFFF\\w\\-]+|<3',

This comment has been minimized.

@svbergerem

svbergerem Jan 9, 2015

Member

Can someone help here? I think it is important that we apply the markup to the same hashtags that ActsAsTaggableOn accepts. We set those accepted chars to [:alnum:] but I wasn't able to find out what exactly the chars in [:alnum:] are so I used the same values we had before.

This comment has been minimized.

@jhass

jhass Jan 9, 2015

Member

They're documented at the Regexp class in the Ruby documentation:

POSIX bracket expressions are also similar to character classes. They provide a portable alternative to the above, with the added benefit that they encompass non-ASCII characters. For instance, /\d/ matches only the ASCII decimal digits (0-9); whereas /[[:digit:]]/ matches any character in the Unicode Nd category.

So on into the standard:

The set of single-character collating elements whose characters belong to the character class, as defined in the LC_CTYPE category in the current locale.

Oh well not too helpful. Source code diving time then! The engine used by Ruby for regular expressions is called oniguruma. Some digging into its code brings up: https://github.com/ruby/ruby/blob/trunk/enc/unicode/name2ctype.src#L3695

Uhm okay, someone had fun figuring that out there. 566, looks like the number of entries. Well, actually half of it, so we're dealing with pairs here. Talking about codepoints, must be ranges.

Onto some Ruby magic! And then let's crash everyones browser!

Now that was fun, wasn't it? Bonus points when you can view the paste and have all fonts to view all of the beautiful alphanumeric characters!

This comment has been minimized.

@svbergerem

svbergerem Jan 10, 2015

Member

Thank you. I searched for "alnum" in the ruby repo but GitHub didn't list that file.

This comment has been minimized.

@jhass

jhass Jan 10, 2015

Member

Yeah, the Github search is still not very trustworthy.

@@ -97,21 +147,21 @@ describe("app.helpers.textFormatter", function(){
"http://xn--4gbrim.xn----ymcbaaajlc6dj7bxne2c.xn--wgbh1c/",
"http:///scholar.google.com/citations?view_op=top_venues",
"http://lyricstranslate.com/en/someone-you-%E0%B4%A8%E0%B4%BF%E0%B4%A8%E0%B5%8D%E0%B4%A8%E0%B5%86-%E0%B4%AA%E0%B5%8B%E0%B4%B2%E0%B5%8A%E0%B4%B0%E0%B4%BE%E0%B4%B3%E0%B5%8D%E2%80%8D.html",
"http://de.wikipedia.org/wiki/Liste_der_Abk%C3%BCrzungen_%28Netzjargon%29",
"http://de.wikipedia.org/wiki/Liste_der_Abk%C3%BCrzungen_(Netzjargon)",

This comment has been minimized.

@jhass

jhass Jan 9, 2015

Member

Why the test case change?

This comment has been minimized.

@svbergerem

svbergerem Jan 9, 2015

Member

We replaced parentheses because otherwise pagedown had some problems.

// markdown doesn't like '(' or ')' anywhere, except where it wants
var workingUrl = unicodeUrl.replace(/\(/, "%28").replace(/\)/, "%29");

It looks like markdown-it doesn't have those problems so there was no need to replace them. I changed the test because the behavior changed.

This comment has been minimized.

@jhass

jhass Jan 9, 2015

Member

k, thanks.

@svbergerem svbergerem force-pushed the svbergerem:update-pagedown branch 2 times, most recently from 34703e8 to 43fa254 Jan 10, 2015

@svbergerem svbergerem changed the title Update pagedown and integrate pagedown-extra using rails-assets-pagedown Replace pagedown by markdown-it Jan 10, 2015

Steffen van Bergerem

@svbergerem svbergerem force-pushed the svbergerem:update-pagedown branch from 43fa254 to 0267731 Jan 10, 2015

@svbergerem

This comment has been minimized.

Copy link
Member

svbergerem commented Jan 10, 2015

The definition of alnum is now in lib/assets/javascripts/posix-bracket-expressions.js. (I hope that is ok)
I did some cleanup so the code should be ready to review/merge.

This PR fixes #3374, #4201, #5216 and #5379.
It closes #5290 because we now have GFM style line breaks. It also closes #5457 because it removes pagedown.

I tested this with the following post:

#5379
*This* **is** a test to show a b*u*g for devel**op**e(r)s
*This* **is** a test to show a b *u*g for devel**op**e(r)s
*This* **is** a test to show a b*u* g for devel**op**e(r)s
#4201
```
http://www.example.com
<3
```
#5216
*http://example.com* **http://example.com** _http://example.com_
#3374
mailto:test@example.com
xmpp:test@example.com

before:
markdown-it-before

after:
markdown-it-after
(There is a link that should be bold. That is a css bug and not a bug in the markdown processing.)

@jhass jhass added this to the next-major milestone Jan 10, 2015

@jhass

This comment has been minimized.

Copy link
Member

jhass commented Jan 10, 2015

<3 Awesome! Thank you!

@jhass jhass merged commit 0267731 into diaspora:develop Jan 10, 2015

1 check passed

continuous-integration/travis-ci The Travis CI build passed
Details

jhass added a commit that referenced this pull request Jan 10, 2015

Merge pull request #5526 from svbergerem/update-pagedown
Replace pagedown by markdown-it

@svbergerem svbergerem deleted the svbergerem:update-pagedown branch Jan 10, 2015

@Flaburgan

This comment has been minimized.

Copy link
Member

Flaburgan commented Jan 10, 2015

@svbergerem this is really awesome. Thank you so much!

@Flaburgan

This comment has been minimized.

Copy link
Member

Flaburgan commented Jan 10, 2015

@svbergerem looks like there is some regressions, users started to use br element to have line-break (probably because they didn't know about double spaces), so now we have posts like that:

https://diaspora-fr.org/posts/935093

(Note that the bold also fails...) Same rendering in the SPV and in the stream btw. d-fr runs the last develop code.

@svbergerem

This comment has been minimized.

Copy link
Member

svbergerem commented Jan 10, 2015

Yeah I know. ( @jhass already told me) markdown-it has an option to allow html input but that would allow too much. I am going to write a plugin similar to Markdown.Sanitizer.js (pagedown).

@puzrin

This comment has been minimized.

Copy link

puzrin commented Jan 10, 2015

Bold fails because
follows as text without space or punctuation.

@svbergerem all those hand-made sanitizers are very unsecure. That's why we didn't tryed to filter html at all - no chances to compete with owasp. Consider alternatives:

  • plugin for <br> only, if you don't need anything else
  • convert posts with br to new format (2 spaces or \ in the end of line)

Idea behind markdown-it is not use html at all. I think, historically people mixed html with markdown because it was almost impossible to extend syntax in other parsers.

@jhass

This comment has been minimized.

@puzrin

This comment has been minimized.

Copy link

puzrin commented Jan 10, 2015

markdown-it dingus

Hm... seems bold is buggy. Will fix soon. Need some time to understand, will it be enougth to add <> as punctuation or not.

<br> in reference dingus is rendered because it has no option to disable html at all. With all question about seciruty they say "use sanitizer, it's not parser task".

@jhass

This comment has been minimized.

Copy link
Member

jhass commented Jan 10, 2015

<br> in reference dingus is rendered because it has no option to disable html at all.

Sure, that wasn't the point, the markdown-it dingus doesn't render it properly in any case, whether HTML is allowed, not allowed or the strict mode is enabled.

@puzrin

This comment has been minimized.

Copy link

puzrin commented Jan 11, 2015

Fixed in 3.0.3. Thanks for report!

@jhass

This comment has been minimized.

Copy link
Member

jhass commented Jan 11, 2015

Thanks for the quick update!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment