Classic block: Convert to Blocks removes valid inline formatting #6102

Open
ZebulanStanphill opened this Issue Apr 10, 2018 · 5 comments

ZebulanStanphill commented Apr 10, 2018

Issue Overview

The Paragraph block, like other textual blocks, allows <abbr>, <b>, <code>, <i>, <kbd>, <mark>, <span>, <time>, and various other inline formatting and semantic tags to be added using the "Edit as HTML" option. These tags are considered valid and are not removed on save.

However, when converting a Classic block to standard blocks using the "Convert to Blocks" option, the paragraphs, lists, and blockquotes are stripped of some inline formatting tags, and a few are converted to tags that have a different semantic meaning.

I think inline formatting that is considered valid by the standard blocks should not be removed when converting a Classic block to standard blocks. Otherwise, you are stripping out formatting from the original post when you do not need to.

Steps to Reproduce (for bugs)

  1. Create a post using the Classic Editor or insert a Classic block in the Gutenberg editor.
  2. Insert the following:
<abbr>abbr</abbr> <b>b</b> <br>br <bdi>bdi</bdi> <bdo dir="rtl">bdo</bdo> <cite>cite</cite> <code>code</code> <data value="value">data</data> <dfn>dfn</dfn> <em>em</em> <i>i</i> <kbd>kbd</kbd> <mark>mark</mark> <q>q</q> <ruby>ruby <rb>rb</rb> <rp>rp</rp> <rt>rt</rt> <rtc>rtc</rtc> <rp>rp</rp></ruby> <s>s</s> <samp>samp</samp> <small>small</small> <span style="color:red">span</span> <strong>strong</strong> <sub>sub</sub> <sup>sup</sup> <time datetime="2018">time</time> <u>u</u> <var>var</var> <wbr>wbr
  3. Save the post.
  4. Open the post in the Gutenberg editor.
  5. Convert the Classic block to standard blocks using the Convert to Blocks option.
  6. Notice how some of the inline formatting has been removed, and some of it has been converted to other elements with different semantic meaning, e.g. <b> tags are converted to <strong> tags.

Expected Behavior

Converting a Classic block to standard blocks using the "Convert to Blocks" option should preserve all inline formatting tags that are considered valid by the resulting blocks.

Current Behavior

Converting a Classic block to standard blocks using the "Convert to Blocks" option removes several valid HTML5 tags, and converts some tags to other tags with different semantic meanings, e.g. <b> tags are converted to <strong> tags. The sample input above is transformed into the following:

<p><abbr>abbr</abbr> <strong>b</strong> <br/>br bdi bdo cite <code>code</code> data dfn <em>em</em> <em>i</em> kbd mark q ruby rb rp rt rtc rp s samp small span <strong>strong</strong> <sub>sub</sub> <sup>sup</sup> time u var wbr</p>
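
To make the difference easier to see, here is a rough sketch that lists which element names from the sample input no longer appear in the converted output. The helper names `tagNames` and `lostTags` are made up for this illustration and are not part of Gutenberg. The regex is only adequate for flat sample markup like this and is not a general HTML parser.

```javascript
// Collect the distinct element names found in opening tags of an HTML string.
// Closing tags are skipped because "</" does not match the leading letter.
function tagNames( html ) {
	const names = new Set();
	for ( const match of html.matchAll( /<([a-z][a-z0-9]*)\b[^>]*>/gi ) ) {
		names.add( match[ 1 ].toLowerCase() );
	}
	return names;
}

// Element names present before conversion but absent afterwards, sorted.
function lostTags( before, after ) {
	const kept = tagNames( after );
	return [ ...tagNames( before ) ].filter( ( name ) => ! kept.has( name ) ).sort();
}

// Sample input and converted output, copied from this issue.
const before = '<abbr>abbr</abbr> <b>b</b> <br>br <bdi>bdi</bdi> <bdo dir="rtl">bdo</bdo> <cite>cite</cite> <code>code</code> <data value="value">data</data> <dfn>dfn</dfn> <em>em</em> <i>i</i> <kbd>kbd</kbd> <mark>mark</mark> <q>q</q> <ruby>ruby <rb>rb</rb> <rp>rp</rp> <rt>rt</rt> <rtc>rtc</rtc> <rp>rp</rp></ruby> <s>s</s> <samp>samp</samp> <small>small</small> <span style="color:red">span</span> <strong>strong</strong> <sub>sub</sub> <sup>sup</sup> <time datetime="2018">time</time> <u>u</u> <var>var</var> <wbr>wbr';
const after = '<p><abbr>abbr</abbr> <strong>b</strong> <br/>br bdi bdo cite <code>code</code> data dfn <em>em</em> <em>i</em> kbd mark q ruby rb rp rt rtc rp s samp small span <strong>strong</strong> <sub>sub</sub> <sup>sup</sup> time u var wbr</p>';

console.log( lostTags( before, after ).join( ' ' ) );
```

Note that <b> and <i> show up in this list even though their text keeps some formatting, because they were replaced by <strong> and <em> rather than preserved.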

Other Notes

If a Classic block contains obsolete HTML tags like <font>, then converting them to <span> tags upon conversion to standard blocks seems like a good idea.

<b> and <i> tags are converted to <strong> and <em> respectively. However, there are valid cases where <b> or <i> is used semantically as per the HTML5 specification. Additionally, <span> tags containing font-weight:bold or font-style:italic are converted to <strong> or <em> tags respectively. This seems like a bad idea: most of the time, when someone makes text bold or italic using a <span> tag, they are doing it for purely stylistic reasons and not semantic ones.

Related Issues and/or PRs

ZebulanStanphill Jul 27, 2018

Contributor

@danielbachhuber I did a full test with all valid HTML5 inline semantic elements and here are the results.

I put the following into a Classic block:

<abbr>abbr</abbr> <b>b</b> <br>br <bdi>bdi</bdi> <bdo dir="rtl">bdo</bdo> <cite>cite</cite> <code>code</code> <data value="value">data</data> <dfn>dfn</dfn> <em>em</em> <i>i</i> <kbd>kbd</kbd> <mark>mark</mark> <q>q</q> <ruby>ruby <rb>rb</rb> <rp>rp</rp> <rt>rt</rt> <rtc>rtc</rtc> <rp>rp</rp></ruby> <s>s</s> <samp>samp</samp> <small>small</small> <span style="color:red">span</span> <strong>strong</strong> <sub>sub</sub> <sup>sup</sup> <time datetime="2018">time</time> <u>u</u> <var>var</var> <wbr>wbr

I then converted it to a Paragraph block. Everything should have been the same except for the added <p> tags. However, some tags were stripped out, and others were converted to different tags:

<p><abbr>abbr</abbr> <strong>b</strong> <br/>br bdi bdo cite <code>code</code> data dfn <em>em</em> <em>i</em> kbd mark q ruby rb rp rt rtc rp s samp small span <strong>strong</strong> <sub>sub</sub> <sup>sup</sup> time u var wbr</p>

And yes, <b>, <i>, <small>, and <u> are all valid HTML tags with semantic meaning.

The current behavior seems to have improved from when I first made the issue, but a lot of tags are still stripped out. Personally, I think this should be resolved before the Try Callout goes out.

danielbachhuber Jul 27, 2018

Member

#6878 and #7604 are both related.

The first is relevant because there's still some amount of the HTML5 spec we need to accommodate: #6878 (comment)

The second because, in the scenario where we're stripping out HTML, we need to improve the user experience.

I don't think this needs to be resolved prior to the Try Callout though, for two reasons:

  1. The user can undo the conversion process.
  2. This is a much larger architectural conversation that there's not necessarily a "fix" for.

ZebulanStanphill Jul 27, 2018

Contributor

@danielbachhuber Okay, fair enough. 👍

@ZebulanStanphill ZebulanStanphill changed the title from Converting Classic block to standard blocks removes valid inline formatting to Classic block: Convert to Blocks removes valid inline formatting Aug 4, 2018

fumikito Sep 18, 2018

@ZebulanStanphill @danielbachhuber

FYI, I tried a fix to phrasingContentReducer for the ruby tag.

const phrasingContentSchema = {
	strong: {},
	em: {},
	del: {},
	ins: {},
	a: { attributes: [ 'href', 'target', 'rel' ] },
	code: {},
	abbr: { attributes: [ 'title' ] },
	sub: {},
	sup: {},
	br: {},
	ruby: {
		children: {
			rt: {
				children: {
					'#text': {}
				}
			},
			'#text': {}
		}
	},
	'#text': {},
};

Now it works and classic block conversion keeps expected HTML.

Fundamentally, the ruby tag is not supported by current WordPress (it is filtered out by TinyMCE), so I need a plugin to enter ruby. ruby is rarely used outside of Japanese.

In my opinion, a JS filter hook for phrasingContentReducer would help plugins for i18n, as well as people who care about semantic markup such as <q cite="some source">Some quotes</q>, <ins datetime="date-of-insertion">Inserted text</ins>, and so on.
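
As a rough illustration of the idea, the sketch below applies an allowlist schema of the same shape to a tiny hand-built node tree and shows how a plugin callback could extend it. Everything here is hypothetical: `applySchema`, `serialize`, and `withRuby` are names invented for this example, the plain-object node shape stands in for real DOM nodes, and falling back to the parent schema when an entry has no `children` is an assumption, not Gutenberg's actual behavior.

```javascript
// Node shapes used in this sketch: { text: '...' } or { tag: 'name', children: [ ... ] }.
// Hypothetical helper: keep allowed elements, unwrap the rest so their text survives.
function applySchema( nodes, schema ) {
	const out = [];
	for ( const node of nodes ) {
		if ( 'text' in node ) {
			// Text is kept only where the schema lists '#text'.
			if ( '#text' in schema ) {
				out.push( node );
			}
			continue;
		}
		const rule = schema[ node.tag ];
		if ( rule ) {
			// Allowed element: filter its children with its own child schema
			// (assumption: fall back to the current schema when none is given).
			out.push( { tag: node.tag, children: applySchema( node.children, rule.children || schema ) } );
		} else {
			// Disallowed element: drop the tag but keep its permitted contents.
			out.push( ...applySchema( node.children, schema ) );
		}
	}
	return out;
}

// Turn the node tree back into an HTML string (attributes are ignored here).
function serialize( nodes ) {
	return nodes
		.map( ( node ) => ( 'text' in node ? node.text : `<${ node.tag }>${ serialize( node.children ) }</${ node.tag }>` ) )
		.join( '' );
}

const baseSchema = { strong: {}, em: {}, '#text': {} };

// Stand-in for a filter callback: receives the schema, returns an extended copy.
const withRuby = ( schema ) => ( {
	...schema,
	ruby: { children: { rt: { children: { '#text': {} } }, '#text': {} } },
} );

const input = [ { tag: 'ruby', children: [ { text: '漢字' }, { tag: 'rt', children: [ { text: 'kanji' } ] } ] } ];

console.log( serialize( applySchema( input, baseSchema ) ) );
console.log( serialize( applySchema( input, withRuby( baseSchema ) ) ) );
```

With the base schema the ruby markup is unwrapped to plain text, which mirrors what the conversion does today. With the extended schema the <ruby> and <rt> elements are kept.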

michakrapp Oct 8, 2018

I got similar behaviour with the <address> element.
When converting to blocks, the link markup inside it gets completely stripped.

Unfortunately, in this case the link text gets appended to the YouTube link that precedes it, so the links are no longer visible to the user after the conversion.

Test:

  • new post with classic block
  • switch to html mode of classic block
  • add html text shown below
  • switch to visual mode
  • select "convert to blocks"

HTML before: (links exist)

<p>https://www.youtube.com/watch?v=2DkaLmUHYOw&amp;rel=0</p>
<address><a href="http://www.wope.net" target="_blank" mce_href="http://www.wope.net">www.wope.net</a></address>
<address style="text-align: justify;" mce_style="text-align: justify;"><a href="http://www.mediaresourcegroup.de" target="_blank" mce_href="http://www.mediaresourcegroup.de">www.mediaresourcegroup.de</a><p></p>
<p>Foto/Text/Video: Markus Wilmsmann</p>
</address>

HTML after: (links are added to youtube link)

<!-- wp:core-embed/youtube {"url":"https://www.youtube.com/watch?v=2DkaLmUHYOw\u0026rel=0www.wope.netwww.mediaresourcegroup.de"} -->
<figure class="wp-block-embed-youtube wp-block-embed"><div class="wp-block-embed__wrapper">
https://www.youtube.com/watch?v=2DkaLmUHYOw&amp;rel=0www.wope.netwww.mediaresourcegroup.de
</div></figure>
<!-- /wp:core-embed/youtube -->

<!-- wp:paragraph -->
<p>Foto/Text/Video: Markus Wilmsmann</p>
<!-- /wp:paragraph -->
