Paragraph whitespace #307

bachbui · 2019-11-22T23:22:34Z

This PR updates the CommonMark renderer to support documents containing Unicode em spaces (\u2003). There are two relevant changes:
First, when rendering Paragraphs, we stripped all leading and trailing whitespace characters from their text so that these characters could not be interpreted as Markdown syntax: leading spaces can be read as an indented code block, and trailing spaces as a line break. This stripping was overzealous, because Markdown treats only certain whitespace characters as meaningful in this way. We obtained a narrower set of characters to strip by taking the characters matched by [\s], i.e. [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff], and checking which of these are meaningful to Markdown.
Second, we need to encode Unicode whitespace characters as HTML entities in the rendered output, because otherwise many Markdown parsers (including markdown-it) will ignore them. This PR does that for em space characters only; other Unicode whitespace characters can be added as needed in subsequent PRs.
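
As a rough illustration of the two changes described above, here is a minimal sketch. It is not the code from this PR: the helper names (`stripParagraphEdges`, `encodeEmSpaces`, `renderParagraphText`) are hypothetical, and the set of Markdown-meaningful whitespace is assumed to be ASCII whitespace only, which may differ from the set the renderer actually uses.

```typescript
// Assumption: only ASCII whitespace (space, tab, newline, carriage return,
// form feed, vertical tab) is treated as Markdown-meaningful at paragraph
// edges. The set actually chosen in the renderer may differ.
const LEADING_MD_WHITESPACE = /^[ \t\n\r\f\v]+/;
const TRAILING_MD_WHITESPACE = /[ \t\n\r\f\v]+$/;

// Hypothetical helper: strip only Markdown-meaningful whitespace from the
// start and end of a paragraph, leaving other Unicode whitespace in place.
function stripParagraphEdges(text: string): string {
  return text
    .replace(LEADING_MD_WHITESPACE, "")
    .replace(TRAILING_MD_WHITESPACE, "");
}

// Hypothetical helper: encode em spaces (U+2003) as an HTML entity so that
// a Markdown parser does not discard them.
function encodeEmSpaces(text: string): string {
  return text.replace(/\u2003/g, "&emsp;");
}

// Combine both steps for the text of a single paragraph.
function renderParagraphText(text: string): string {
  return encodeEmSpaces(stripParagraphEdges(text));
}
```

Under these assumptions, `renderParagraphText("  \u2003Indented paragraph\u2003\n")` returns `"&emsp;Indented paragraph&emsp;"`: the ASCII spaces and the newline at the edges are removed, while the em spaces survive as entities.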

bachbui · 2019-11-22T23:26:02Z

@gnorsilva I've made a dev build for you to test out these changes if you'd like. You can bump to these package versions:
@atjson/renderer-commonmark@0.21.14-dev.0
@atjson/source-commonmark@0.21.13-dev.0

gnorsilva

Looks good, checked in copilot-atjson and this fixes the issue 👍

tim-evans

This is lovely! Thanks for handling this so quickly @bachbui

I'm going to create a follow-up issue to track other non-breaking spaces that we should handle :)

bachbui · 2019-11-25T17:01:39Z

@gnorsilva This has been merged and released:
@atjson/renderer-commonmark@0.21.14
@atjson/source-commonmark@0.21.13

bachbui added 2 commits November 22, 2019 13:56

🐞 Only strip MD-meaningful spaces in paragraphs

2752efc

🐞Encode \u2003 as &emsp;

17a71c8

bachbui requested review from gnorsilva, tim-evans and blaine November 22, 2019 23:22

Publish

b026eb9

- @atjson/renderer-commonmark@0.21.14-dev.0
- @atjson/source-commonmark@0.21.13-dev.0

gnorsilva approved these changes Nov 25, 2019

tim-evans approved these changes Nov 25, 2019

tim-evans mentioned this pull request Nov 25, 2019

Unicode whitespace is stripped from leading and trailing positions in markdown paragraphs #310

Open

bachbui merged commit 7aa3034 into latest Nov 25, 2019

bachbui deleted the paragraph-whitespace branch November 25, 2019 16:58
