Remove (non-special) comment nodes when pasting content #15557

Merged
merged 4 commits into from May 14, 2019


@tfrommen
Member

commented May 10, 2019

As suggested by @ellatrix, this is a possible replacement for #15372.


Description

When pasting something from a Google Doc into a RichText-based block, I ended up having a stray (leading) line break.

Having chased through quite a few files and functions, I finally found that the culprit is the (internal) cleanNodeList function in @wordpress/blocks (src/api/raw-handling/utils.js).
This function inserts a line break (i.e., ultimately, a <br> tag) after non-phrasing-content elements.

The problem with Google Docs, and potentially with other sources, is that the pasted HTML contains comments, and these, too, trigger the insertion of line breaks.

Steps to Reproduce

  1. Create a Google Doc and input some content, for example:
Some

Content

Here
  2. Select "Content", then copy and paste it into a RichText component.
  3. The console will show something like this:
Received HTML:

 <html><body>
<!--StartFragment--><meta charset="utf-8"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;" id="docs-internal-guid-27e8f390-7fff-ad20-17df-378c134770be">Content</span><!--EndFragment-->
</body>
</html>
Received plain text:

 Content
Processed inline HTML:

 
<br>Content

The actual problem is the <!--StartFragment--> comment node.
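
The effect can be reproduced with a toy model. The sketch below is not the actual cleanNodeList source: nodes are plain objects rather than DOM nodes, `flattenInline`, `isPhrasing` and the reduced `PHRASING_TAGS` list are hypothetical names, and the `<meta>` tag is assumed to have been stripped already. It only illustrates the rule described above: anything that is not phrasing content gets a line break after it when another element follows.

```javascript
// Node type values as defined by the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// Hypothetical, heavily reduced list of phrasing-content tags.
const PHRASING_TAGS = new Set( [ 'span', 'strong', 'em', 'a', 'br' ] );

function isPhrasing( node ) {
    return (
        node.nodeType === TEXT_NODE ||
        ( node.nodeType === ELEMENT_NODE && PHRASING_TAGS.has( node.tag ) )
    );
}

// Flattens a list of sibling nodes for inline use: phrasing nodes keep
// their text, every other node is dropped and replaced by <br> if an
// element follows it.
function flattenInline( nodes ) {
    let html = '';
    nodes.forEach( ( node, index ) => {
        if ( isPhrasing( node ) ) {
            html += node.text;
            return;
        }
        const hasNextElement = nodes
            .slice( index + 1 )
            .some( ( next ) => next.nodeType === ELEMENT_NODE );
        if ( hasNextElement ) {
            html += '<br>';
        }
    } );
    return html;
}

// The pasted fragment from the steps above, reduced to its three nodes.
const pasted = [
    { nodeType: COMMENT_NODE, text: 'StartFragment' },
    { nodeType: ELEMENT_NODE, tag: 'span', text: 'Content' },
    { nodeType: COMMENT_NODE, text: 'EndFragment' },
];

console.log( flattenInline( pasted ) ); // "<br>Content"
console.log(
    flattenInline( pasted.filter( ( node ) => node.nodeType !== COMMENT_NODE ) )
); // "Content"
```

In this model the leading comment is followed by an element and therefore produces a `<br>`, while the trailing comment is not and produces nothing, which matches the processed output shown above.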

How has this been tested?

Copy-and-paste content from a Google Doc into a RichText component. No leading line break. 😉

Screenshots

[Screenshot: gdoc-gutenberg]

Types of changes

Remove comment nodes so that line breaks are no longer inserted after them.
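
A minimal sketch of what such a removal step could look like, assuming a filter that is called with DOM nodes. The names `removeIfComment` and `stripComments` are hypothetical and not taken from the PR. The "non-special" qualifier in the title suggests that some comments are handled separately; this sketch does not model that distinction.

```javascript
// Numeric value of Node.COMMENT_NODE in the DOM standard.
const COMMENT_NODE_TYPE = 8;

// Hypothetical filter: detaches `node` from its parent if it is a comment.
// Returns true when the node was removed.
function removeIfComment( node ) {
    if ( node.nodeType !== COMMENT_NODE_TYPE || ! node.parentNode ) {
        return false;
    }
    node.parentNode.removeChild( node );
    return true;
}

// Recursively drops comment nodes below `parent`.
// Array.from takes a snapshot, so removing children while iterating is safe.
function stripComments( parent ) {
    Array.from( parent.childNodes ).forEach( ( child ) => {
        if ( ! removeIfComment( child ) && child.childNodes ) {
            stripComments( child );
        }
    } );
}
```

In a browser, calling `stripComments( doc.body )` on a parsed paste document before the clean-up step would leave no comment node for a line break to be attached to.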

Checklist:

  • My code is tested.
  • My code follows the WordPress code style.
  • My code follows the accessibility standards.
  • My code has proper inline documentation.
  • I've included developer documentation if appropriate.

tfrommen added some commits May 10, 2019

@youknowriad

Contributor

commented May 10, 2019

Hey @tfrommen, thanks for the PRs. I noticed you have made some meaningful contributions to Gutenberg. Let me know if you want to be added as a collaborator to the project. That way you could avoid working on forks.

@tfrommen

Member Author

commented May 10, 2019

I will follow up with unit tests once this general approach has been OK'd. 🙂

@youknowriad youknowriad requested a review from WordPress/gutenberg-core May 13, 2019

@aduth

Member

commented May 13, 2019

Is there a related issue for this?

> When pasting something from a Google Doc into a RichText-based block, I ended up having a stray (leading) line break.

I'm curious if this is something which has changed on Google Docs as far as what markup they produce, or if there are very specific circumstances which result in the comment node being included.

In any case, we should consider either adding new or updating existing fixtures in this directory to account for variations in the markup we're able to handle:

https://github.com/WordPress/gutenberg/tree/master/test/integration/fixtures

@tfrommen

Member Author

commented May 13, 2019

@aduth

> Is there a related issue for this?

I didn't find any related issue. As I had a fix for this already, I also did not create an issue myself, but instead created the PR and provided as much information as I had.

> I'm curious if this is something which has changed on Google Docs as far as what markup they produce, or if there are very specific circumstances which result in the comment node being included.

I don't know. However, I tested this in multiple browsers, and I always get the same structure:

<html><body>
<!--StartFragment--><meta charset="utf-8">[GDOC CONTENT/MARKUP HERE]<!--EndFragment-->
</body>
</html> 

It doesn't matter if this is a single letter, or line, or even several paragraphs.

@aduth

Member

commented May 13, 2019

Would you be able to share a public Google Docs document, and a specific fragment of text you're selecting to yield this result?

For example, I've been trying with this document:

https://docs.google.com/document/d/19xXX0fr2F0n1JE2DSYJCN8mnoEp5BH65z1iPvpGPrVc/edit

Pasting the first line of the document into this CodePen textarea (to retrieve the clipboard contents as HTML):

https://codepen.io/aduth/pen/VOKJyw

I receive:

<meta charset='utf-8'><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-08bd7213-7fff-d35a-a4a3-ae5184167e96"><a href="https://drive.google.com/drive/u/0/folders/1k4bWkN088Hte1mehmPkKZHbois4Zjsar" style="text-decoration:none;"><span style="font-size:11pt;font-family:Arial;color:#1155cc;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:underline;-webkit-text-decoration-skip:none;text-decoration-skip-ink:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">(View All Agendas)</span></a></b>

(Similar results with the Steps to Reproduce from the original comment)
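
For anyone who wants to compare clipboard contents across browsers, a small paste logger along these lines should do. This is a sketch under assumptions, not the contents of the linked CodePen: `readPastedContent` is a hypothetical helper, while `clipboardData.getData()` is the standard DataTransfer call. The log labels mirror the console output quoted in the description.

```javascript
// Reads both clipboard flavors from a paste event.
// Falls back to empty strings when no clipboard data is attached.
function readPastedContent( event ) {
    const data = event.clipboardData;
    return {
        html: data ? data.getData( 'text/html' ) : '',
        plainText: data ? data.getData( 'text/plain' ) : '',
    };
}

// Browser-only wiring; skipped where no DOM is available.
if ( typeof document !== 'undefined' ) {
    document.addEventListener( 'paste', ( event ) => {
        const { html, plainText } = readPastedContent( event );
        console.log( 'Received HTML:\n\n', html );
        console.log( 'Received plain text:\n\n', plainText );
    } );
}
```

Logging the raw `text/html` string, rather than inspecting the pasted result, shows exactly what the browser and operating system put on the clipboard, including any comments.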

@aduth

Member

commented May 13, 2019

To clarify: I think this is both a reasonable approach, and sensible that we'd omit HTML comments from sourced paste contents. My only concern at this point is being able to track down the circumstances under which the original issue can occur.

@tfrommen

Member Author

commented May 13, 2019

@aduth when I copy the word "Hooks" from your document and paste it, I get this:

<html><body>
<!--StartFragment--><meta charset="utf-8"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;" id="docs-internal-guid-bf3f5a98-7fff-39e1-71d5-0803876c48b6">Hooks</span><!--EndFragment-->
</body>
</html>

As was to be expected.
I get this for both Firefox and Chrome.

However, I now also tested with Microsoft Edge, and all I get there is this:

<meta charset="utf-8"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;" id="docs-internal-guid-e99261f1-7fff-ec19-189c-29466fb16b6a">Hooks</span>

So, maybe it is a browser and OS combination thing, I don't know. I'm using Windows 10 Pro (64-bit).
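
To check quickly whether a given browser and OS combination adds such comments, the raw clipboard HTML can be scanned for them. The helper below is a rough diagnostic sketch: `listComments` is a hypothetical name, and the regular expression is an approximation that is good enough for triage, not a real HTML parser.

```javascript
// Matches any HTML comment, capturing its trimmed text.
const HTML_COMMENT = /<!--\s*([\s\S]*?)\s*-->/g;

// Returns the text of every HTML comment found in a clipboard HTML string.
function listComments( html ) {
    return Array.from( html.matchAll( HTML_COMMENT ), ( match ) => match[ 1 ] );
}

const fromChrome =
    '<html><body>\n<!--StartFragment--><span>Hooks</span><!--EndFragment-->\n</body>\n</html>';
const fromEdge = '<meta charset="utf-8"><span>Hooks</span>';

console.log( listComments( fromChrome ) ); // [ 'StartFragment', 'EndFragment' ]
console.log( listComments( fromEdge ) ); // []
```

The two sample strings are shortened versions of the markup quoted above, with the long style attributes left out.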

@ellatrix
Member

left a comment

This looks good. A small unit test would be great, perhaps also updated fixtures.

@ellatrix

Member

commented May 14, 2019

I don't get the HTML comments with Chrome on Mac.

@ellatrix

Member

commented May 14, 2019

I wonder why removeInvalidHTML doesn't remove the comment node. None of the schemas have comments as part of them.

tfrommen added some commits May 14, 2019

@tfrommen

Member Author

commented May 14, 2019

I added both some unit tests and new fixtures for this.

All is green, so can I go ahead and merge? 😁

@@ -1 +1 @@
<meta charset='utf-8'><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-7102d5c2-7fff-c8d1-1082-5abceee52545"><br /><div dir="ltr" style="margin-left:0pt;"><table style="border:none;border-collapse:collapse;width:451.27559055118115pt"><colgroup><col width="*" /><col width="*" /><col width="*" /></colgroup><tr style="height:3.75pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">One</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Two</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Three</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid 
#000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">1</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">2</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">3</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">I</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid 
#000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">II</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">III</span></p></td></tr></table></div></b>

@ellatrix

ellatrix May 14, 2019

Member

What's the purpose of updating this? I don't see any comments added.

@tfrommen

tfrommen May 14, 2019

Author Member

I saw that the leading meta tag was not right in both the existing fixtures. One file had it twice (once with single and once with double quotes), the other did not have the meta tag at all.

I added comments only to the two new files, google-docs-(table-)with-comments.

@ellatrix

Member

commented May 14, 2019

It would be good to know why removeInvalidHTML doesn't remove the comments... In the meantime this seems like a good thing to merge.

@youknowriad youknowriad merged commit 13a6b26 into WordPress:master May 14, 2019

1 check passed

Travis CI - Pull Request Build Passed