Fix block validation: wrap in exception handler for malformed HTML #8304

Merged: 7 commits, merged Aug 21, 2018

johngodley (Contributor) commented Jul 30, 2018

If you paste malformed HTML into a block, the HTML tokenizer can throw an error and break the editor.

The underlying bug is in the simple-html-tokenizer package. Ideally it will be fixed there, but in the meantime (and also to protect against other unknown errors) this PR wraps the tokenizer in an exception handler.

With this fix, the block instead shows an invalid block warning message, allowing it to be cleaned up in a controlled way.
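A minimal sketch of the wrapping described above. It assumes a hypothetical fakeTokenize that stands in for simple-html-tokenizer and throws on a closing tag followed by extra text (like the </pfd fd fd> in the reproduction steps); the real library's failure modes may differ. Returning null on failure follows the later "Use null instead of false for error value" commit in this PR, but this is an illustration, not the merged code.

```javascript
/**
 * Hypothetical stand-in for simple-html-tokenizer's tokenize(). It throws on a
 * closing tag followed by extra text (e.g. `</pfd fd fd>`) to mimic the crash.
 *
 * @param {string} html HTML to tokenize.
 *
 * @return {Array} Rough list of tag and text tokens.
 */
function fakeTokenize( html ) {
	if ( /<\/[a-z]+\s+[^\s>][^>]*>/i.test( html ) ) {
		throw new SyntaxError( 'Malformed closing tag' );
	}
	return html.match( /<[^>]+>|[^<]+/g ) || [];
}

/**
 * Tokenizes HTML, catching any tokenizer error instead of letting it
 * propagate and crash the editor.
 *
 * @param {string} html HTML to tokenize.
 *
 * @return {Array|null} Tokens, or null if tokenizing failed.
 */
function getHTMLTokens( html ) {
	try {
		return fakeTokenize( html );
	} catch ( error ) {
		console.warn( 'Malformed HTML detected: %s', html );
		return null;
	}
}
```

With this sketch, getHTMLTokens( '<p>ok</p>' ) yields the three tokens '<p>', 'ok' and '</p>', while the quote markup from the manual test logs a warning and yields null, which the caller can turn into an invalid block.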

As far as I can tell, this is the only place in Gutenberg where simple-html-tokenizer is used, so I haven't tried to wrap the library more generally.

How has this been tested?

An additional unit test covers the breakage. You can verify the problem manually:

  • Open developer console
  • Add a quote block
  • Edit HTML of the quote block and replace the content with <blockquote class="wp-block-quote">fsdfsdfsd<p>fdsfsdfsdd</pfd fd fd></blockquote>
  • Look in the developer console for the error thrown by the tokenizer
  • Apply this PR and verify that the error is no longer shown and an invalid block warning is shown instead

Types of changes

Non-breaking bug fix

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • My code has proper inline documentation.
johngodley (Contributor) commented Aug 11, 2018

@aduth, I think you worked on this bit of code, so I'm dragging you in for thoughts again!

@aduth

Nice find. In 883687d I pushed a failing test case: when actual and expected are both malformed, the function wrongly returns true, since getHTMLTokens returns an empty array for each and there is nothing to iterate over.
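To see why, here is a simplified, hypothetical comparison loop (the real isEquivalentHTML is more involved). If getHTMLTokens returns [] for malformed input, two malformed strings produce two empty lists, and the loop never gets a chance to find a difference:

```javascript
// Simplified, hypothetical comparison: returns false at the first
// mismatching token, true if no mismatch is found.
function naiveTokensMatch( actualTokens, expectedTokens ) {
	const length = Math.max( actualTokens.length, expectedTokens.length );
	for ( let i = 0; i < length; i++ ) {
		if ( actualTokens[ i ] !== expectedTokens[ i ] ) {
			return false;
		}
	}
	// With two empty arrays the loop body never runs, so this is reached.
	return true;
}

// Both inputs malformed => both token lists empty => wrongly "equivalent".
console.log( naiveTokensMatch( [], [] ) ); // true
```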

johngodley and others added some commits Jul 30, 2018

Block validation: wrap in exception handler for malformed HTML
If you paste some malformed HTML into a block the HTML tokenizer can break. Wrapping the tokenizer in an exception handler means we can control the error
Validation: JSDoc cleanup
Sentences ending in periods. Empty newline between description and parameters. Precise return type.
Block validation: check for double invalid HTML
Add a check if both strings are invalid HTML
Block validation: add test for empty string
Empty strings are considered equivalent
johngodley (Contributor) commented Aug 12, 2018

Good catch. Changed the logic so it will fail if one or both strings are invalid.
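A sketch of that guard, assuming the tokenizing helper returns null on failure (as the later commit in this PR does). The tokenizer here is a hypothetical stand-in and the token comparison is reduced to exact string equality, so only the null check reflects the change described in this comment:

```javascript
// Hypothetical stand-in: null for "malformed" HTML (a closing tag with extra
// text such as `</pfd fd fd>`), otherwise a rough list of tokens.
function getHTMLTokens( html ) {
	if ( /<\/[a-z]+\s+[^\s>][^>]*>/i.test( html ) ) {
		return null;
	}
	return html.match( /<[^>]+>|[^<]+/g ) || [];
}

function isEquivalentHTML( actual, expected ) {
	const [ actualTokens, expectedTokens ] = [ actual, expected ].map( getHTMLTokens );

	// Fail if one or both strings could not be tokenized.
	if ( ! actualTokens || ! expectedTokens ) {
		return false;
	}

	// Simplified comparison: same tokens in the same order.
	return actualTokens.length === expectedTokens.length &&
		actualTokens.every( ( token, i ) => token === expectedTokens[ i ] );
}
```

In this sketch empty strings still compare as equivalent, in line with the "Empty strings are considered equivalent" test added in this PR.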

aduth approved these changes Aug 13, 2018

packages/blocks/src/api/validation.js (outdated)
@@ -390,7 +409,12 @@ export function getNextNonWhitespaceToken( tokens ) {
*/
export function isEquivalentHTML( actual, expected ) {
// Tokenize input content and reserialized save content
- const [ actualTokens, expectedTokens ] = [ actual, expected ].map( tokenize );
+ const [ actualTokens, expectedTokens ] = [ actual, expected ].map( getHTMLTokens );

aduth (Member) commented Aug 13, 2018

I'm inclined to think this change could be simplified to something like:

try {
	const [
		actualTokens,
		expectedTokens,
	] = [ actual, expected ].map( tokenize );
} catch ( error ) {
	return false;
}

Logging the warning about the specific string becomes a bit trickier. We could still have getHTMLTokens catch the specific failure, log it, then rethrow it up to the catch here.

Or we could keep it as-is. My only concern was avoiding the overloaded return value, where false is semantically ambiguous.
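For comparison, a sketch of the log-then-rethrow variant, using a hypothetical getHTMLTokensOrThrow and a stand-in tokenize that throws on malformed input (neither is the real Gutenberg or simple-html-tokenizer code). Assigning one let inside the try and destructuring afterwards keeps actualTokens and expectedTokens as const:

```javascript
// Hypothetical stand-in for the real tokenizer; throws on malformed input.
function tokenize( html ) {
	if ( /<\/[a-z]+\s+[^\s>][^>]*>/i.test( html ) ) {
		throw new SyntaxError( 'Malformed closing tag' );
	}
	return html.match( /<[^>]+>|[^<]+/g ) || [];
}

// Log which string failed, then rethrow so the caller decides what to do.
function getHTMLTokensOrThrow( html ) {
	try {
		return tokenize( html );
	} catch ( error ) {
		console.warn( 'Malformed HTML detected: %s', html );
		throw error;
	}
}

function isEquivalentHTML( actual, expected ) {
	// Bindings declared inside `try { }` are not visible after it, so the
	// intermediate result lives in a `let` declared outside.
	let tokens;
	try {
		tokens = [ actual, expected ].map( ( html ) => getHTMLTokensOrThrow( html ) );
	} catch ( error ) {
		return false;
	}
	const [ actualTokens, expectedTokens ] = tokens;

	// Simplified comparison: same tokens in the same order.
	return actualTokens.length === expectedTokens.length &&
		actualTokens.every( ( token, i ) => token === expectedTokens[ i ] );
}
```

The trade-off is the one raised above: the boolean return stays unambiguous, at the cost of a mutable intermediate and a second try/catch.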

johngodley (Contributor) commented Aug 16, 2018

AFAIK the consts are scoped to the try block, so I don't think this is possible unless they were converted to let and declared outside, and it seemed better to keep them const.

I've cleaned it up a bit already and am happy to make a further change, but otherwise I'll go with it as-is.
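The scoping point can be checked directly: const (like let) is block-scoped, so a binding declared inside try { } no longer exists once that block closes. A minimal illustration with a hypothetical function:

```javascript
function readConstAfterTry() {
	try {
		const tokens = [ '<p>' ];
	} catch ( error ) {
		return null;
	}
	// `tokens` was scoped to the try block, so this throws a ReferenceError.
	return tokens;
}

try {
	readConstAfterTry();
} catch ( error ) {
	console.log( error instanceof ReferenceError ); // true
}
```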

aduth (Member) commented Aug 16, 2018

Good call. I guess the corrected form would look something closer to:

let actualTokens, expectedTokens;
try {
	( [
		actualTokens,
		expectedTokens,
	] = [ actual, expected ].map( tokenize ) );
} catch ( error ) {
	return false;
}

Which starts to be a bit harder to read 😬 Good as it is.

Use null instead of false for error value
A little less ambiguous
aduth approved these changes Aug 16, 2018

Looks good. I think we should wait until after the 3.6 release to merge this one. It could also use an update to the return type.

packages/blocks/src/api/validation.js (outdated)

aduth added this to the 3.7 milestone Aug 16, 2018

aduth (Member) commented Aug 20, 2018

In some final testing, I happened to stumble upon an interesting issue.

  1. Mangle a block so that it becomes invalid HTML which would throw at this step
  2. Observe (as intended with these changes) that the editor doesn't break and instead presents the "modified externally" prompt
  3. Press "Keep as HTML"
  4. See your new HTML block with the markup preserved
  5. Press Save
  6. Reload the editor
  7. Observe: The same "modified externally" warning

I'd have expected that since I already chose to "Keep as HTML", I would no longer be presented with this prompt. I assume what's happening is that the malformed markup is triggering the invalidation on the HTML block.

I'm not entirely sure how we should accommodate this:

  • Present a different initial message, not reusing "modified externally" but being more clear about the HTML being invalid?
    • If so, what options do we present?
  • Ignore validation on specific block types?
    • It doesn't really seem sensible that we care what type of markup the user introduces into their HTML block.

Thoughts?

I'm also open to merging this one as-is, since the current state is an improvement over the previous error.
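A rough sketch of the second option, assuming a hypothetical SKIP_VALIDATION set and isBlockContentValid helper (neither is an existing Gutenberg API). core/html is the name of the Custom HTML block; whether to exempt it, or any other block type, is exactly the open question here:

```javascript
// Hypothetical: block types whose saved markup is accepted as-is.
const SKIP_VALIDATION = new Set( [ 'core/html' ] );

/**
 * Hypothetical validation entry point.
 *
 * @param {string}   blockName    Block type name, e.g. 'core/quote'.
 * @param {string}   savedHTML    Markup found in the post content.
 * @param {string}   expectedHTML Markup the block's save would produce.
 * @param {Function} isEquivalent Comparison such as isEquivalentHTML.
 *
 * @return {boolean} Whether the block should be treated as valid.
 */
function isBlockContentValid( blockName, savedHTML, expectedHTML, isEquivalent ) {
	if ( SKIP_VALIDATION.has( blockName ) ) {
		// Whatever the user put in an exempt block is kept, malformed or not.
		return true;
	}
	return isEquivalent( savedHTML, expectedHTML );
}
```

In this sketch an exempt block can never show the invalid block warning, which suits the "Keep as HTML" flow above but would also hide genuine corruption for that block type.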

johngodley (Contributor) commented Aug 21, 2018

Ah yes, interesting!

I tried it with the previous code and the editor just white-screens. I'll merge this as an improvement over the current situation and file a separate ticket for the invalid-on-load issue.

A different error message sounds like a good idea regardless. I also like the idea of not validating certain blocks: if the user wants the invalid HTML, then that's fine. I've been looking at tweaking the validation for other blocks, so this would be a good use case.

johngodley merged commit 4f8ffab into master Aug 21, 2018

2 checks passed

codecov/project: 50.86% (+0.56%) compared to dab3fc1
continuous-integration/travis-ci/pr: The Travis CI build passed

johngodley deleted the fix/invalid-html-crash branch Aug 21, 2018
