Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gutenberg for non-admins and HTML entities #50050

Open
TrevorMills opened this issue Apr 24, 2023 · 1 comment
Open

Gutenberg for non-admins and HTML entities #50050

TrevorMills opened this issue Apr 24, 2023 · 1 comment
Labels
Needs Technical Feedback Needs testing from a developer perspective. Needs Testing Needs further testing to be confirmed. [Type] Bug An existing feature does not function as intended

Comments

@TrevorMills
Copy link

TrevorMills commented Apr 24, 2023

Description

As a non-administrator ( someone without unfiltered_html capability ), when I save a prop to a React based component, some characters are getting converted to their encoded versions - & becomes &, > becomes >.

I've verified this on a vanilla WordPress installation - no other plugins and the default twentytwentythree theme.

Consider the following simple component, which eventually becomes a block:

import React from 'react';

export interface IMyComponentProps {
    title: string;
}

/**
 * MyComponent example
 * @block
 */
const MyComponent: React.FC<IMyComponentProps> = ({ title }) => {
    return <div className="my-component">
        <h2>{title}</h2>
    </div>;
};

export default MyComponent;

I work in a system that allows such a component to be turned into block automatically, and the block looks like this:

image

If a user without unfiltered_html capability saves such a post, what gets written to the DB is the following:

<!-- wp:calian-com-components/calian-my-component {"title":"This \u0026amp; That"} -->
<div class="my-component"><h2>This &amp; That</h2></div>
<!-- /wp:calian-com-components/calian-my-component -->

In the attributes for the block, the ampersand ends up getting encoded as \u0026amp;. That's a raw value that gets passed along as attributes, so when the page refreshes in the back end, that ends up showing up as &amp;. The immediate fallout from that is that the "Attempt Block Recovery" shows up because of the discrepancy between the &amp; that's written to the database and the &amp;amp; that the block now wants to render:

image

And once you recover the block, that &amp; shows up in the prop for the block:

image

I consider this a bug because it's changing the prop that gets passed to the React component from a & to &amp;, causing the Attempt Block Recovery, and looking ugly for the non-admin users. React components that accept props and render out markup shouldn't have to worry about HTML decoding the props it receives.

Further, I wonder about the reasoning behind writing HTML entities to the database at all, given that recommended practice is to put all text from the DB through escaping filters before displaying them. This happens for Advanced Custom Fields when I try to save & in text fields. I'm sure that's a can of worms, and I am not asking for any change there.

For myself, I addressed this in my system with this:

<?php

/**
 * Class MyEntityFix
 *
 * This class exists to help out with the &amp; issue
 *
 */
class MyEntityFix {

	public function __construct() {
		if ( function_exists( 'current_user_can' ) && ! current_user_can( 'unfiltered_html' ) ) {
			add_filter( 'pre_kses', array( $this, 'pre_kses_block_attributes' ), 100, 3 );
		}
	}

	/**
	 * Helper function to decode html entities to address the pesky &amp; issue.
	 *
	 * @param mixed $props
	 *
	 * @return mixed
	 */
	public static function html_entity_decode_deep( $props ) {
		if ( is_array( $props ) ) {
			$props = array_map( array( __CLASS__, 'html_entity_decode_deep' ), $props );
		} elseif ( is_object( $props ) ) {
			$props = (object) array_map( array( __CLASS__, 'html_entity_decode_deep' ), (array) $props );
		} elseif ( is_string( $props ) ) {
			// The ENT_NOQUOTES flag is so that we leave &quot; alone, as it might be used to
			// encode quotes in JSON string or attribute.
			$props = html_entity_decode( $props, ENT_NOQUOTES | ENT_SUBSTITUTE | ENT_HTML401 );
		}
		return $props;
	}

	/**
	 * In tracking down the &amp; issue within WordPress, this is where I landed.  This filter runs after all of the
	 * pre_kses filters.  We're keeping those intact because we want to keep the job of stripping out invalid HTML out of
	 * attributes.  Primary case is we don't want editors able to put <script> tags into a dangerouslyAllowHtml block, though
	 * things like <a> and <strong> are still fine.
	 *
	 * The issue really just lives in the block attributes.  By the time we get to here, wp_kses has converted a single ampersand
	 * into \u0026amp;, which decodes to &amp;, which will display exactly like that when passed into a prop.  A couple of other
	 * characters ( namely < and > ) are similarly encoded.  This traces back to the serialize_block_attributes() function, which
	 * special-cases these characters.
	 *
	 * So, how this filter works is we pass block attributes through static::html_entity_decode_deep() before passing then through
	 * serialize_block_attributes() again to encode how WP wants them.  This will convert \u0026amp; to simply \u0026, which is what
	 * we want.
	 *
	 * @param string         $content           Content to filter through KSES.
	 * @param array[]|string $allowed_html      An array of allowed HTML elements and attributes,
	 *                                          or a context name such as 'post'. See wp_kses_allowed_html()
	 *                                          for the list of accepted context names.
	 * @param string[]       $allowed_protocols Array of allowed URL protocols.
	 *
	 * @return string
	 */
	public function pre_kses_block_attributes( $content, $allowed_html, $allowed_protocols ) {
		if ( 'post' === $allowed_html ) {
			$content = preg_replace_callback( '/(<!-- wp:[^ ]+ )(.+?)( -->)/', function( $matches ) {
				$attributes = MyEntityFix::html_entity_decode_deep( json_decode( $matches[2], true ) );
				$encoded = serialize_block_attributes( $attributes );
				return "{$matches[1]}{$encoded}{$matches[3]}";
			}, $content );
		}
		return $content;
	}

}

Step-by-step reproduction instructions

  1. Create a simple block that has a text field property controlling one of its attributes
  2. As a non-admin ( someone without unfiltered_html capability ), enter "This & That" into that text field and save the page.
  3. Reload that page, and observe that attribute has now become This &amp; That

Screenshots, screen recording, code snippet

No response

Environment info

  • WordPress 6.2

Please confirm that you have searched existing issues in the repo.

Yes

Please confirm that you have tested with all plugins deactivated except Gutenberg.

Yes

@Mamaduka Mamaduka added the Needs Technical Feedback Needs testing from a developer perspective. label Apr 25, 2023
@jordesign jordesign added [Type] Bug An existing feature does not function as intended Needs Testing Needs further testing to be confirmed. labels Jul 20, 2023
@ktmn
Copy link

ktmn commented Jan 26, 2024

I was making a new issue but then found this, so I'm dumping some info that may be relevant here:

Possible culprit functions:

$input = '&';
var_dump([
	'input' => $input,
	'normalized' => wp_kses_normalize_entities($input),
	'serialized' => serialize_block_attributes($input),
	'mangled' => serialize_block_attributes(wp_kses_normalize_entities($input)),
]);

Outputs:

array(4) {
  ["input"]=> string(1) "&"
  ["normalized"]=>  string(5) "&amp;"
  ["serialized"]=> string(8) ""\u0026""
  ["mangled"]=> string(12) ""\u0026amp;""
}

Here's my reproduction instructions:

  1. As an admin user (current_user_can('unfiltered_html')) create a post with a block whose attributes contain the character &. A core/paragraph block with the Block name Hello & world does the trick:
<!-- wp:paragraph {"metadata":{"name":"Hello \u0026 world"}} -->
<p>Test</p>
<!-- /wp:paragraph -->

The & character is \u0026 in the code editor.

  1. As an editor user (!current_user_can('unfiltered_html')) open the post, change the paragraph text "Test" to something else, save.
  2. Open the post for the 3rd time with any user, look in the code editor:
<!-- wp:paragraph {"metadata":{"name":"Hello \u0026amp; world"}} -->
<p>Test 2</p>
<!-- /wp:paragraph -->

\u0026 is now \u0026amp;, the Block name input now reads Hello &amp; world instead of Hello & world.

Filter to modify unfiltered_html cap:

add_filter('user_has_cap', static function($allcaps, $caps, $args, $user) {
	if(in_array('editor', $user->roles, true)) {
		$allcaps['unfiltered_html'] = false;
	}
	return $allcaps;
}, 10, 4);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Technical Feedback Needs testing from a developer perspective. Needs Testing Needs further testing to be confirmed. [Type] Bug An existing feature does not function as intended
Projects
None yet
Development

No branches or pull requests

4 participants