Render Sanitized (!!!) Markdown via an AST #28155

Hamms · 2019-04-23T18:03:12Z

Specifically, instead of rendering markdown to a string with remark, then injecting that string with dangerouslySetInnerHTML, we now:

1. Parse the markdown to a Markdown Abstract Syntax Tree (MDAST) with remark-parse
2. Convert the MDAST to an HTML Abstract Syntax Tree (HAST) with remark-rehype
3. Parse the "raw" HTML nodes in the HAST with rehype-raw
4. Sanitize the HAST with rehype-sanitize
5. Compile the HAST directly to React with rehype-react
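
To make the sanitize and compile steps concrete, here is a minimal, dependency-free sketch that walks a HAST-shaped tree, drops anything not on an allowlist, and builds elements through an injected factory. It is not how rehype-sanitize or rehype-react are implemented; `ALLOWED_TAGS`, `ALLOWED_PROPS`, `sanitize`, `sanitizeAll`, `compile`, and the factory `h` are hypothetical names, and the allowlist is far smaller than a real schema.

```javascript
// Toy allowlists (assumption); a real sanitization schema is much larger.
const ALLOWED_TAGS = new Set(['p', 'em', 'strong', 'a', 'img']);
const ALLOWED_PROPS = new Set(['href', 'src', 'alt', 'title']);

// Return a sanitized copy of a HAST-shaped node, or null to drop it.
function sanitize(node) {
  if (node.type === 'text') {
    return {type: 'text', value: String(node.value)};
  }
  if (node.type === 'root') {
    return {type: 'root', children: sanitizeAll(node.children)};
  }
  if (node.type !== 'element' || !ALLOWED_TAGS.has(node.tagName)) {
    return null; // drops <script>, comments, and unknown node types
  }
  const properties = {};
  for (const [key, value] of Object.entries(node.properties || {})) {
    if (ALLOWED_PROPS.has(key)) {
      properties[key] = value;
    }
  }
  return {
    type: 'element',
    tagName: node.tagName,
    properties,
    children: sanitizeAll(node.children)
  };
}

function sanitizeAll(children = []) {
  return children.map(sanitize).filter(child => child !== null);
}

// Compile a sanitized tree by calling an element factory with
// (type, props, ...children), the same shape as React.createElement.
function compile(node, h) {
  if (node.type === 'text') {
    return node.value;
  }
  const children = node.children.map(child => compile(child, h));
  if (node.type === 'root') {
    return h('div', {}, ...children);
  }
  return h(node.tagName, node.properties, ...children);
}
```

Because the tree is filtered before any element is created, no HTML string is ever handed to the browser to parse. Note that this toy drops a disallowed element together with its children and does not check URL protocols in `href` or `src`, so it should not be used as an actual sanitizer.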

The advantage is that we no longer use dangerouslySetInnerHTML to render our markdown. In fact, because we now deeply parse and sanitize the generated content, we can be confident that we are protected from XSS injections!

Next steps (in a separate PR): rename the UnsafeRenderedMarkdown component to reflect the fact that it is now safe.

joshlory

LGTM. Which of these components pulls in the extra parse5 dependency?

joshlory · 2019-04-24T00:31:45Z

apps/src/templates/UnsafeRenderedMarkdown.jsx

+  'title',
+  'value',
+  'xml'
+);


Will be cool to be able to hand this off to a dedicated Blockly component in the future, instead of rendering it as a post-React-render pass!

islemaster · 2019-04-24T00:43:23Z

apps/webpack.js

-  path.resolve(__dirname, 'node_modules', '@code-dot-org', 'snack-sdk')
+  path.resolve(__dirname, 'node_modules', '@code-dot-org', 'snack-sdk'),
+  // parse5 ships in ES6: https://github.com/inikulin/parse5/issues/263#issuecomment-410745073
+  path.resolve(__dirname, 'node_modules', 'parse5')


> We encourage users to perform transpilation in their setup if they target ancient platforms that don't support ES6 syntax.

(emphasis mine 😁)
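
For context, a list like the one in this diff is typically used as the `include` of a babel-loader rule, so that the named packages are transpiled even though they live in node_modules. The sketch below assumes webpack's `module.rules` format; `appsDir`, `transpiledDependencies`, and `babelRule` are illustrative names, and the real apps/webpack.js may be structured differently.

```javascript
const path = require('path');

// Assumption: the build runs from the directory that contains node_modules.
const appsDir = process.cwd();

// Dependencies that are run through Babel along with our own source.
const transpiledDependencies = [
  path.resolve(appsDir, 'node_modules', '@code-dot-org', 'snack-sdk'),
  // parse5 ships in ES6, so it needs the same treatment.
  path.resolve(appsDir, 'node_modules', 'parse5')
];

// A webpack rule that transpiles src/ plus the dependencies listed above.
const babelRule = {
  test: /\.jsx?$/,
  include: [path.resolve(appsDir, 'src'), ...transpiledDependencies],
  loader: 'babel-loader'
};
```

Listing packages explicitly keeps the rest of node_modules out of Babel, which matters for build time.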

islemaster

Amazing.

Hamms · 2019-04-24T01:42:16Z

> Which of these components pulls in the extra parse5 dependency?

That would be rehype-raw, specifically its dependency hast-util-raw, which uses parse5 for a whole bunch of stuff that I haven't even begun to grok.

Note that I can see two potential futures where this is particularly relevant:

1. A future in which we refactor out the parse5 dependency would involve figuring out exactly what that library is doing, and refactoring it to work with native browser functionality, rather than the node-compatible functionality that parse5 provides. It's not at all clear to me how much of the parse5 stuff that's being used is its HTML parsing ability and how much is its syntax tree traversal and creation functionality, so that might end up being a lot of work.
2. A future in which we stop supporting raw HTML in markdown entirely would simply involve removing rehype-raw (and possibly rehype-sanitize?) from this process. In that future, we might also be able to replace basically all of this with remark-react.

Hamms · 2019-04-24T18:41:37Z

Oooh, check it out: because we are now rendering our markdown as React, the unit tests are failing because they're getting the error Unknown prop `align` on <img> tag. Align is a deprecated property for images that we are presumably using somewhere in markdown (I assume in instructions), and we're now being notified of our mistake!

I'll open a separate PR with a content fix.

Hamms · 2019-04-26T00:57:53Z

Unfortunately, it looks like this breaks inline Blockly as it's currently implemented. (As an aside, I'm greatly saddened that none of our CI tests caught that).

Specifically, there are two issues. The first is that because the XML is rendered by React, it includes a bunch of comment nodes:

<xml><!-- react-text: 51 -->
  <!-- /react-text --><block type="when_run"><!-- react-text: 53 -->
    <!-- /react-text --><next><!-- react-text: 55 -->
      <!-- /react-text --><block type="controls_repeat_dropdown"><!-- react-text: 57 -->
        <!-- /react-text --><title name="user-content-TIMES"><!-- react-text: 59 -->3<!-- /react-text --></title><!-- react-text: 60 -->
        <!-- /react-text --><statement name="user-content-DO"><!-- react-text: 62 -->
          <!-- /react-text --><block type="maze_moveForward"><!-- react-text: 64 -->
        <!-- /react-text --></block></statement><!-- react-text: 65 -->
        <!-- /react-text --><next><!-- react-text: 67 -->
          <!-- /react-text --><block type="bee_ifNectarAmount"><!-- react-text: 69 -->
            <!-- /react-text --><title name="user-content-ARG1"><!-- react-text: 71 -->nectarRemaining<!-- /react-text --></title><!-- react-text: 72 -->
            <!-- /react-text --><title name="user-content-OP"><!-- react-text: 74 -->==<!-- /react-text --></title><!-- react-text: 75 -->
            <!-- /react-text --><title name="user-content-ARG2"><!-- react-text: 77 -->1<!-- /react-text --></title><!-- react-text: 78 -->
          <!-- /react-text --></block><!-- react-text: 79 -->
        <!-- /react-text --></next><!-- react-text: 80 -->
      <!-- /react-text --></block><!-- react-text: 81 -->
    <!-- /react-text --></next><!-- react-text: 82 -->
  <!-- /react-text --></block><!-- react-text: 83 -->
<!-- /react-text --></xml>

Blockly doesn't know how to deal with comment nodes, and breaks when trying to initialize the blockspace. This isn't awful, though; we can pretty easily use something like TreeWalker to filter out comments from any arbitrary dom structure before passing it off to Blockly.
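
A minimal sketch of that cleanup step follows. It recurses over `childNodes` instead of using TreeWalker so that it runs against any DOM-like object; in a browser, `document.createTreeWalker(root, NodeFilter.SHOW_COMMENT)` would visit the same nodes. The name `removeCommentNodes` matches the helper mentioned in this PR's commits, but the body here is an assumption, not the merged code.

```javascript
const COMMENT_NODE = 8; // same value as Node.COMMENT_NODE in browsers

// Remove every comment node beneath `root`, in place, and return `root`.
function removeCommentNodes(root) {
  // Copy first: in a real DOM, childNodes is live and shifts as we remove.
  for (const child of Array.from(root.childNodes)) {
    if (child.nodeType === COMMENT_NODE) {
      root.removeChild(child);
    } else {
      removeCommentNodes(child);
    }
  }
  return root;
}
```

On a tree with no comment nodes the loop never calls `removeChild`, so the function is a no-op there.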

Second and more annoying: for some reason, the renderer is also mutating the attributes on the XML nodes. <title name="user-content-TIMES">3</title>, for example, is supposed to be just <title name="TIMES">3</title>; I'm not sure where the user-content piece is coming from.

Specifically, two changes are needed: First, now that we are generating XML with React we end up with a bunch of React comment nodes in our XML which break Blockly. Add a step to the "render XML as Blockly" helper to clean out comment nodes. Second, hast-util-sanitize by default clobbers 'name' attributes for some reason. Blockly XML uses that attribute extensively, so let it through.
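
The clobbering behaviour can be modelled with a small, dependency-free sketch. This is a toy, not hast-util-sanitize itself: the field names `clobber` and `clobberPrefix` follow that library's schema as I understand it, the presence of `'id'` in the default list is an assumption, and `applyClobber` and `blocklySchema` are hypothetical names.

```javascript
// Toy model of attribute clobbering. The defaults reproduce what was
// observed above: name="TIMES" came out as name="user-content-TIMES".
const defaultSchema = {
  clobber: ['name', 'id'],
  clobberPrefix: 'user-content-'
};

// Return a copy of `properties` with clobbered attributes prefixed.
function applyClobber(properties, schema = defaultSchema) {
  const result = {};
  for (const [key, value] of Object.entries(properties)) {
    result[key] = schema.clobber.includes(key)
      ? schema.clobberPrefix + value
      : value;
  }
  return result;
}

// Letting `name` through means removing it from the clobber list.
const blocklySchema = {...defaultSchema, clobber: ['id']};
```

With `blocklySchema`, `name` values pass through unchanged, at the cost of losing whatever protection the prefix was providing for that attribute.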

Hamms · 2019-04-26T01:22:20Z

Note that right now, the sanitizer is allowing all our Blockly XML through completely unsanitized, which means we are not quite yet safe from XSS injection.

Of course, we currently aren't sanitizing anything, so this PR still puts us in a strictly better place than we currently are. It just means we'll have to make sure to sanitize our XML before we can confidently rename the UnsafeRenderedMarkdown component.

Hamms · 2019-04-30T00:25:39Z

@islemaster @joshlory could y'all take another look at this, with the Blockly rendering fixes?

islemaster · 2019-04-30T16:46:37Z

👍 On my list, I'll take a look today.

islemaster · 2019-04-30T23:41:25Z

apps/test/unit/templates/convertXmlToBlockly.js

+      '<xml><!-- react-text: 1 --><block type="variables_get"><!-- react-text: 2 --><title name="VAR"><!-- react-text: 3 -->i</title></block></xml>';
+    container.innerHTML = content;
+    convertXmlToBlockly(container);
+    expect(container.innerHTML).to.not.equal(content);


Should there be an expect(container.innerHTML).to.equal(content) check before the convertXmlToBlockly call, just to prove that setting and getting innerHTML doesn't itself cause some transform (like stripping comment nodes)?

islemaster · 2019-04-30T23:42:40Z

apps/test/unit/templates/convertXmlToBlockly.js

+    container.innerHTML = content;
+    convertXmlToBlockly(container);
+    expect(container.innerHTML).to.not.equal(content);
+    expect(container.getElementsByTagName('svg').length).to.equal(2);


Potentially dumb question: Why are there two svg elements?

Hamms · 2019-05-03

Blockly creates a <svg id="blocklyFilters"> element to store blurs and whatnot, and then a separate svg element to render the blocks themselves.

islemaster

👍 Updates look good.

joshlory · 2019-05-16T20:00:52Z

Now that this is merged, should we rename UnsafeRenderedMarkdown -> Markdown?

Hamms added 6 commits April 23, 2019 11:01

render markdown sanitized, with the help of an AST

15037cd

remove stripStyles plugin; it's taken care of by sanitization

67428e6

transpile the parse5 library, since it ships in es6

934ea2b

update tests to use JSX equality, now that we're actually rendering without dangerouslySetInnerHTML

334c16d

configure sanitization to support our custom plugins

7b3aeda

update 'allows JS injection' test to now be 'prevents JS injection'

2216334

Hamms changed the title ~~Rehype react~~ Render Sanitized (!!!) Markdown via an AST Apr 23, 2019

Hamms requested review from joshlory and islemaster April 23, 2019 21:52

Hamms marked this pull request as ready for review April 23, 2019 21:52

update some tests that cared about the (unimportant) newline that used to be added to rendered markdown components

b37d8c2

Hamms mentioned this pull request Apr 23, 2019

[WIP] AST-backed markdown in react #27946

Closed

joshlory approved these changes Apr 24, 2019

islemaster reviewed Apr 24, 2019

islemaster approved these changes Apr 24, 2019

Hamms added 3 commits April 25, 2019 12:49

update integration test to use valid HTML

48a1820

update some more tests that cared about irrelevant newlines in html content

ab0b8af

add support for image formatting in sanitization

2ece40b

Hamms requested review from joshlory and islemaster April 26, 2019 01:20

Hamms added 2 commits April 29, 2019 15:28

Merge branch 'staging' into rehype-react

e2e7c09

make sure removeCommentNodes is a no-op for trees without any comment nodes

4a88dc1

islemaster reviewed Apr 30, 2019

islemaster approved these changes Apr 30, 2019

add checks before converting xml to Blockly to ensure that the tested changes are actually being applied by the specific method we're testing

0ab3e42

Hamms merged commit c9e2606 into staging May 4, 2019

Hamms deleted the rehype-react branch May 4, 2019 00:44

This was referenced May 6, 2019

Revert "Render Sanitized (!!!) Markdown via an AST" #28374

Closed

Use dangerouslySetInnerHTML for SmallFooter language selector #28375

Merged

Allow inline styles in markdown #28378

Merged

Fix createTreeWalker for IE #28387

Merged
