Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Render Sanitized (!!!) Markdown via an AST #28155

Merged
merged 14 commits into from May 4, 2019
Merged

Render Sanitized (!!!) Markdown via an AST #28155

merged 14 commits into from May 4, 2019

Conversation

Hamms
Copy link
Contributor

@Hamms Hamms commented Apr 23, 2019

Specifically, instead of rendering markdown to a string with remark, then injecting that string with dangerouslySetInnerHTML, we now:

  1. Parse the markdown to a Markdown Abstract Syntax Tree (MDAST) with remark-parse
  2. Convert the MDAST to an HTML Abstract Syntax Tree (HAST) with remark-rehype
  3. Parse the "raw" HTML nodes in the HAST with rehype-raw
  4. Sanitize the HAST with rehype-sanitize
  5. Compile the HAST directly to React with rehype-react

The advantage of doing all that is that we are now no longer using dangerouslySetInnerHTML to render our markdown, and in fact because we are deeply parsing the generated content we can be confident that we are now protected from XSS injections!

Next steps (in a separate PR): rename the UnsafeRenderedMarkdown component to reflect the fact that it is now safe

@Hamms Hamms changed the title Rehype react Render Sanitized (!!!) Markdown via an AST Apr 23, 2019
@Hamms Hamms marked this pull request as ready for review April 23, 2019 21:52
…d to be added to rendered markdown components
Copy link
Contributor

@joshlory joshlory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Which of these components pulls in the extra parse5 dependency?

'title',
'value',
'xml'
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be cool to be able to hand this off to a dedicated Blockly component in the future, instead of rendering it as a post-React-render pass!

path.resolve(__dirname, 'node_modules', '@code-dot-org', 'snack-sdk')
path.resolve(__dirname, 'node_modules', '@code-dot-org', 'snack-sdk'),
// parse5 ships in ES6: https://github.com/inikulin/parse5/issues/263#issuecomment-410745073
path.resolve(__dirname, 'node_modules', 'parse5')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We encourage users to perform transpilation in their setup if they target ancient platforms that don't support ES6 syntax.

(emphasis mine 😁)

Copy link
Contributor

@islemaster islemaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing.

@Hamms
Copy link
Contributor Author

Hamms commented Apr 24, 2019

Which of these components pulls in the extra parse5 dependency?

That would be rehype-raw, specifically its dependency hast-util-raw, which uses parse5 for a whole bunch of stuff that I haven't even begun to grok

Note that I can see two potential futures where this is particularly relevant:

  1. A future in which we refactor out the parse5 dependency would involve figuring out exactly what that library is doing, and refactoring it to work with native browser functionality, rather than the node-compatible functionality that parse5 provides. It's not at all clear to me how much of the parse5 stuff that's being used is its HTML parsing ability and how much is its syntax tree traversal and creation functionality, so that might end up being a lot of work.
  2. A future in which we stop supporting raw HTML in markdown entirely would simply involve removing rehype-raw (and possibly rehype-sanitize?) from this process. In that future, we might also be able to replace basically all of this with remark-react

@Hamms
Copy link
Contributor Author

Hamms commented Apr 24, 2019

Oooh, check it out: because we are now rendering our markdown as React, the unit tests are failing because they're getting the error Unknown prop `align` on <img> tag. Align is a deprecated property for images that we are presumably using somewhere in markdown (I assume in instructions), and we're now being notified of our mistake!

I'll open a separate PR with a content fix.

@Hamms
Copy link
Contributor Author

Hamms commented Apr 26, 2019

Unfortunately, it looks like this breaks inline Blockly as it's currently implemented. (As an aside, I'm greatly saddened that none of our CI tests caught that).

Specifically, there are two issues. The first is that because the XML is rendered by react, it includes a bunch of comment nodes:

<xml><!-- react-text: 51 -->
  <!-- /react-text --><block type="when_run"><!-- react-text: 53 -->
    <!-- /react-text --><next><!-- react-text: 55 -->
      <!-- /react-text --><block type="controls_repeat_dropdown"><!-- react-text: 57 -->
        <!-- /react-text --><title name="user-content-TIMES"><!-- react-text: 59 -->3<!-- /react-text --></title><!-- react-text: 60 -->
        <!-- /react-text --><statement name="user-content-DO"><!-- react-text: 62 -->
          <!-- /react-text --><block type="maze_moveForward"><!-- react-text: 64 -->
        <!-- /react-text --></block></statement><!-- react-text: 65 -->
        <!-- /react-text --><next><!-- react-text: 67 -->
          <!-- /react-text --><block type="bee_ifNectarAmount"><!-- react-text: 69 -->
            <!-- /react-text --><title name="user-content-ARG1"><!-- react-text: 71 -->nectarRemaining<!-- /react-text --></title><!-- react-text: 72 -->
            <!-- /react-text --><title name="user-content-OP"><!-- react-text: 74 -->==<!-- /react-text --></title><!-- react-text: 75 -->
            <!-- /react-text --><title name="user-content-ARG2"><!-- react-text: 77 -->1<!-- /react-text --></title><!-- react-text: 78 -->
          <!-- /react-text --></block><!-- react-text: 79 -->
        <!-- /react-text --></next><!-- react-text: 80 -->
      <!-- /react-text --></block><!-- react-text: 81 -->
    <!-- /react-text --></next><!-- react-text: 82 -->
  <!-- /react-text --></block><!-- react-text: 83 -->
<!-- /react-text --></xml>

Blockly doesn't know how to deal with comment nodes, and breaks when trying to initialize the blockspace. This isn't awful, though; we can pretty easily use something like TreeWalker to filter out comments from any arbitrary dom structure before passing it off to Blockly.

Second and more annoying: for some reason, the renderer is also actually mutating the attributes on the XML nodes. <title name="user-content-TIMES">3</title>, for example, is supposed to be just <title name="TIMES">3</title>; I'm not sure where the user-content piece is coming from

Specifically, two changes are needed:

First, now that we are generating XML with React we end up with a bunch
of React comment nodes in our XML which break Blockly. Add a step to the
"render XML as Blockly" helper to clean out comment nodes.

Second, hast-util-sanitize by default clobbers 'name' attributes for
some reason. Blockly XML uses that attribute extensively, so let it
through.
@Hamms
Copy link
Contributor Author

Hamms commented Apr 26, 2019

Note that right now, the sanitizer is allowing all our Blockly XML through completely un sanitized, which means we are not quite yet safe from XSS injection.

Of course, we currently aren't sanitizing anything, so this PR still puts us in a strictly better place that we currently are. It just means we'll have to make sure to sanitize our XML before we can confidently rename the UnsafeRenderedMarkdown component

@Hamms
Copy link
Contributor Author

Hamms commented Apr 30, 2019

@islemaster @joshlory could y'all take another look at this, with the Blockly rendering fixes?

@islemaster
Copy link
Contributor

👍 On my list, I'll take a look today.

'<xml><!-- react-text: 1 --><block type="variables_get"><!-- react-text: 2 --><title name="VAR"><!-- react-text: 3 -->i</title></block></xml>';
container.innerHTML = content;
convertXmlToBlockly(container);
expect(container.innerHTML).to.not.equal(content);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be an expect(container.innerHTML).to.equal(content) check before the convertXmlToBlockly call, just to prove that setting and getting innerHTML doesn't itself cause some transform (like stripping comment nodes)?

container.innerHTML = content;
convertXmlToBlockly(container);
expect(container.innerHTML).to.not.equal(content);
expect(container.getElementsByTagName('svg').length).to.equal(2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially dumb question: Why are there two svg elements?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blockly creates a <svg id="blocklyFilters"> element to store blurs and whatnot, and then a separate svg element to render the blocks themselves

Copy link
Contributor

@islemaster islemaster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Updates look good.

… changes are actually being applied by the specific method we're testing
@joshlory
Copy link
Contributor

Now that this is merged, should we rename UnsafeRenderedMarkdown -> Markdown?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants