Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

supports.html = false doesn't escape HTML (regression?) #13218

Closed
Viper007Bond opened this issue Jan 6, 2019 · 25 comments
Closed

supports.html = false doesn't escape HTML (regression?) #13218

Viper007Bond opened this issue Jan 6, 2019 · 25 comments

Comments

@Viper007Bond
Copy link

@Viper007Bond Viper007Bond commented Jan 6, 2019

supports.html = false doesn't seem to work anymore. I swear it used to escape HTML entities properly.

To Reproduce

  1. Add a core/code block.
  2. Type/paste some code containing HTML entities, such as:
<a href="http://wordpress.org/">WordPress</a>  &lt;strong&gt;
  1. Publish post.

Expected behavior

Code content should stay the same. It does initially but if you edit the post again, there's raw < in the source and things like &lt; have been converted to <. The escaping of HTML hasn't been done.

<!-- wp:code -->
<pre class="wp-block-code"><code>&lt;a href="http://wordpress.org/">WordPress&lt;/a>  &lt;strong&gt;</code></pre>
<!-- /wp:code -->

Should be something like this:

<!-- wp:code -->
<pre class="wp-block-code"><code>&lt;a href="http://wordpress.org/"&gt;WordPress&lt;/a&gt;  &amp;lt;strong&amp;gt;</code></pre>
<!-- /wp:code -->

Desktop

  • OS: Windows 10
  • Browser: Firefox Developer Edition
  • Version: 65.0b8

Additional context

  • WordPress Version: 5.0.3-alpha-44409
@youknowriad
Copy link
Contributor

@youknowriad youknowriad commented Jan 7, 2019

AFAIK, This option is not about HTML escaping, it's about enabling/disabling the HTML mode for this block.

@Viper007Bond
Copy link
Author

@Viper007Bond Viper007Bond commented Jan 7, 2019

Oh, well I still believe the behavior has changed. I think all entities used to be escaped when stored in the database.

Even if I'm remembering wrong, the core/code block isn't behaving as expected. Content should stay as entered.

@youknowriad
Copy link
Contributor

@youknowriad youknowriad commented Jan 7, 2019

Yes, probably. I believe I saw similar issues about HTML escaping.

@designsimply
Copy link
Member

@designsimply designsimply commented Jan 7, 2019

Closed #11624 in favor of this issue because it is more concise. Please see the screenshots at #11624 (comment).

@finip
Copy link

@finip finip commented Jan 7, 2019

@designsimply okay, I don't think I'm going to be taken seriously anymore. thank you for remembering me:)
i'll continue to work on this issue, and I'll add an article about it later, because i'm trying to solve these problems these days, maybe i thought you had given up my problem. but now there is a new hope, let me feel good about this problem again, and everything will start over again! uuuuuuuuuuu

@aduth
Copy link
Member

@aduth aduth commented Jan 7, 2019

The issue here may be the fact that the Code Block defines its content attribute as sourced by text, not html, which is the difference of textContent vs. innerHTML.

var el = document.createElement( 'div' );
el.innerHTML = '<code>chicken &gt; ribs</code>';

el.firstChild.textContent;
// "chicken > ribs"

el.firstChild.innerHTML;
// "chicken &gt; ribs"
@finip
Copy link

@finip finip commented Jan 7, 2019

@designsimply okay, I don't think I'm going to be taken seriously anymore. thank you for remembering me:)
i'll continue to work on this issue, and I'll add an article about it later, because i'm trying to solve these problems these days, maybe i thought you had given up my problem. but now there is a new hope, let me feel good about this problem again, and everything will start over again! uuuuuuuuuuu

https://webkv.com/2019/01/02/wordpress-code-highlight-plug-in-crayon-syntax-highlighter-error-repair-in-php7-3-0.shtml

i can determine which script this problem occurs because by comparing classic editor with gutenbeerg's html switch, classic editor is correct at present, only gutenberg is incorrect, but I can't modify the code directly because I am not competent and familiar with the new block editor.

@Viper007Bond
Copy link
Author

@Viper007Bond Viper007Bond commented Jan 7, 2019

The issue here may be the fact that the Code Block defines its content attribute as sourced by text, not html, which is the difference of textContent vs. innerHTML.

I don't believe so as it's being saved unescaped when it should be being escaped. Right now it's a weird mix of escaped and unescaped due to stuff like kses.

@finip
Copy link

@finip finip commented Jan 7, 2019

i went through two kinds of editors: gutenberg and classic editor to compare articles published. obviously this only happens in the background editor. the front-end display is consistent. that is to say, it just happens between html/content switching, but this will cause problems. i think it is the pair of parentheses on the right that have not been escaped. this is usually caused by a js in the block editor when it is converted. it's missing a filter or an end, but tomorrow i'll have to concentrate on gutenberg code to locate it. i almost gave up on this issue, but when i saw the concern of so many officials and developers, i had new hope:)

005
007

@finip
Copy link

@finip finip commented Jan 7, 2019

at present, i can only locate this file gutenberg-master\test\integration\full-content\fixtures\core__code.serialized.html,
`

export default function MyButton() {
	return <Button>Click Me!</Button>;
}
` in the latest single block editor, but in wordpress version, i can't find any information about it anyway. i can only suspect that this file wp-includes\js\dist\blocks.js is between 899-912 lines.

`showdown.helper.unescapeHTMLEntities = function (txt) {
'use strict';

return txt
.replace(/"/g, '"')
.replace(/</g, '<')
.replace(/>/g, '>')
.replace(/&/g, '&');
};`
i need help. it's beyond my skill range.
008

@aduth
Copy link
Member

@aduth aduth commented Jan 8, 2019

I don't believe so as it's being saved unescaped when it should be being escaped. Right now it's a weird mix of escaped and unescaped due to stuff like kses.

It may be a combination. I think another issue is that the Code block's save function should be using dangerouslySetInnerHTML so as not to be subject to the default escaping which takes place in the serialization of the React element.

save( { attributes } ) {
return <pre><code>{ attributes.content }</code></pre>;
},

@finip
Copy link

@finip finip commented Jan 12, 2019

i noticed that only < transformed & lt; & transformed & amp; other transformations will not occur, so the problem is &. so which js is in control
'?

<pre-class="wp-block-code"> < &

<! - / wp: code - >'
it, should we not just focus on js, whether there is a template error here, it should be a template, such as .htm .xml and other frameworks.

004

and maybe this here:

function escape( text ) {
return text
.replace( /\t/g, '\\t' )
.replace( /\r/g, '\\r' )
.replace( /\n/g, '\\n' )
.replace( /\&/g, '&amp;' )
.replace( /</g, '&lt;' );
}

export default {
specialTokens: {
textEscaped: [ 'a b', 'i &lt;3 tags', '1&amp;2&amp;3&amp;4' ],
htmlEscaped: [ 'a&nbsp;&nbsp;&nbsp;b', 'i&nbsp;&lt;3&nbsp;tags', '1&amp;2&amp;3&amp;4' ],
textUnescaped: [ 'a b', 'i <3 tags', '1&2&3&4' ],
htmlUnescaped: [ 'a&nbsp;&nbsp;&nbsp;b', 'i&nbsp;&lt;3&nbsp;tags', '1&amp;2&amp;3&amp;4' ],
},
specialSuggestions: {
textEscaped: [ '&lt;3', 'Stuff &amp; Things', 'Tags &amp; Stuff', 'Tags &amp; Stuff 2' ],

export function escapeAmpersand( value ) {
return value.replace( /&(?!([a-z0-9]+|#[0-9]+|#x[a-f0-9]+);)/gi, '&amp;' );
}

@David-Else
Copy link

@David-Else David-Else commented Mar 10, 2019

Hi, I waited to update to Gutenberg until Wordpress 5.1 hoping it would work, now my entire coding blog is broken and the code is all invalid!

I use SyntaxHighlighter Evolved, the author of the plugin says the fix is needed in core, can anyone tell me when this will work again or how I can fix it... bit of a nightmare. THANKS.

@finip
Copy link

@finip finip commented Mar 10, 2019

Hi,David
5.1 is also the problem, because it has not been fixed, if you do not want to wait for this problem, please change to class editor, I have been looking for these reasons, but no progress.

@keithn
Copy link

@keithn keithn commented Mar 10, 2019

for anyone following up on this issue, The author of the issue, Viper007Bond or alex, has died, you can follow his journey of fighting with cancer at https://alex.blog/

@finip
Copy link

@finip finip commented Mar 11, 2019

i was very depressed to hear that. i suddenly did not know the direction. our great plug-in author worked hard until the last moment of his life. i came too late. i was very sad. i should have solved this problem earlier, but i didn't!

@David-Else
Copy link

@David-Else David-Else commented Mar 11, 2019

I am so sorry to hear about Viper007Bond, amazing that he kept on working on the plugin, a true coder.

Please see Automattic/syntaxhighlighter#98 (comment) for a temporary workaround/solution.

@Tech-Nomad
Copy link

@Tech-Nomad Tech-Nomad commented May 19, 2019

Sorry to hear that about Alex 😔 I am using his "Regenerate Thumbnails" quite often...

Wrapping the code with "code" blocks:

<!-- wp:code -->
<!-- /wp:code -->

...seems to work well now.
But there is still an issue with HTML blocks. Ampersands are still getting replaced by HTML entities. I didn't dive into Gutenberg development yet. Does maybe somebody have a filter function to disable ampersand replacement by Gutenberg in HTML blocks? I like the appearance of HTML blocks in the visual editor much more than the appearance of code blocks. I also get here a warning about invalid code pointing to the ampersands and offering replacement by HTML entities...

@aduth
Copy link
Member

@aduth aduth commented May 21, 2019

The original issue was meant to have been resolved with #13996 (for the code block).

However, in repeating the test case from the original comment (Step 2 of "To Reproduce"), it seems that a block can still become invalidated after reloading the page. I think it's specifically related to cases where the author manually escapes their own entities. Automatic escaping from #13996 seems to work well enough.

cc @davilera

But there is still an issue with HTML blocks. Ampersands are still getting replaced by HTML entities.

This is a separate question to the original issue. I think this escaping will always be expected to avoid dealing with complexities of the ampersand (related resource). In any case, it would be for its own issue if you'd like to create one. The scope of this one is targeting the code block.

As previously mentioned, the supports.html is not documented to make guarantees about escaping, but related to #13927, the Code block in particular should be expected to respect code as written (not subject to other transformations like shortcode, embed processing).

@davilera
Copy link
Contributor

@davilera davilera commented May 22, 2019

I guess the question is: should code blocks show the same characters the user typed (that is, if they wrote &lt;, then &lt; should be shown) or what the user intended to display using a escaped HTML entity (i.e. &lt; should be <).

In personally agree with @aduth. Code blocks should respect code as written, which means that there shouldn't be any transformations like shortcode or embed processing. But I'd go a step further and say that written HTML entities should also be shown as written.

If you want to write HTML entities, then use an HTML block. Otherwise, the chars you type are the chars that should be shown. Does this make sense? What do you think?

@aduth
Copy link
Member

@aduth aduth commented May 22, 2019

But I'd go a step further and say that written HTML entities should also be shown as written.

Yes, I agree with this. The problem is: Currently a block becomes marked as invalid the next time you edit (repeating the steps from the original comment).

@davilera
Copy link
Contributor

@davilera davilera commented May 22, 2019

I see. I think the problem comes down to how the block is parsed for re-editing: it's probably using innerContent, which pulls the properly escaped values and unescapes them. I'll further investigate the issue and see if there's anything I can do about it.

@davilera
Copy link
Contributor

@davilera davilera commented May 22, 2019

@aduth, I just created a PR for this issue. Unfortunately, it introduces a breaking change (a test does not pass, now). And I didn't have time to include any new tests... I will when I have the time, but it'd be great if someone could please review it.

@finip
Copy link

@finip finip commented May 24, 2019

i don't have much ability to test this problem after every new version of wp updates, but it still exists. my intuition tells me that the escape is hidden in a deep js or symbol. the only way to solve it is to compare the escape methods of gutenberg and classic editor. but i can't do this. track to the specific escape function, because i have tried many times to find, but i failed.

@basememara
Copy link

@basememara basememara commented Oct 5, 2019

This is still broken :(

@aduth aduth mentioned this issue Nov 6, 2019
5 of 5 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

10 participants