Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue 889 - Unicode characters not escaped #890

Merged
merged 8 commits into from
Jul 21, 2022

Conversation

amalajith
Copy link
Contributor

Description of the Change

Add slashes to the post content to fixe the issue with Unicode characters in block attributes not escaped and getting stripped out during syndication

Closes #889

Alternate Designs

Possible Drawbacks

Verification Process

After adding the wp_slash(), the syndicated content has properly escaped slashes in the Unicode characters and the content is now properly rendered.

image

Checklist:

  • I have read the CONTRIBUTING document.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my change.
  • All new and existing tests passed.

Changelog Entry

Credits

Props @

Fixes the issue with Unicode characters in block attributes not escaped and getting stripped out during syndication
@jeffpaul jeffpaul requested review from a team, cadic and dkotter and removed request for a team June 8, 2022 11:34
@jeffpaul jeffpaul added this to the 1.7.0 milestone Jun 8, 2022
@dkotter
Copy link
Collaborator

dkotter commented Jun 9, 2022

@amalajith Thanks for the PR! I made a few formatting changes but otherwise looks good.

But a couple things that may improve this a bit:

  1. Thinking that instead of slashing the post_content in both the push and pull methods, might be better to handle that in the prepare_post utility. This would avoid duplicate code
  2. The fix here only applies to Internal connections. We should probably fix this for External connections as well. Is that something you can add to this PR? Looks like a few places we would have to add that: https://github.com/10up/distributor/blob/1.6.9/includes/classes/ExternalConnections/WordPressExternalConnection.php#L701, https://github.com/10up/distributor/blob/1.6.9/includes/classes/ExternalConnections/WordPressExternalConnection.php#L1053 and https://github.com/10up/distributor/blob/1.6.9/includes/utils.php#L987
  3. Is there any concern with slashing the post_content? I know that solves issues with things like unicode but just wondering if this could introduce bugs in other scenarios?

Copy link
Contributor

@cadic cadic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, should we also wrap post_content with wp_slash for non-gutenberg content?

$new_post_args = array(
'post_title' => html_entity_decode( get_the_title( $post_id ), ENT_QUOTES, get_bloginfo( 'charset' ) ),
'post_name' => $post->post_name,
'post_content' => Utils\get_processed_content( $post->post_content ),
'post_excerpt' => $post->post_excerpt,
'post_type' => $post->post_type,
'post_author' => isset( $post->post_author ) ? $post->post_author : get_current_user_id(),
'post_status' => 'publish',
);

@dkotter
Copy link
Collaborator

dkotter commented Jun 10, 2022

Just wondering, should we also wrap post_content with wp_slash for non-gutenberg content?

Yeah, I think having it for both is probably the way to go, as long as that doesn't have any unintended consequences. And as mentioned in my comment, if we move this code to the prepare_post utility method, that will take care of both Push and Pull and both block and non-block content. The one thing I just noticed here though is that the Internal Push method runs content through get_processed_content, which External connections use as well (and External connections don't use the prepare_post method). So if we add slashing to the get_processed_content method to solve External connections, we will end up slashing content twice for Internal pushes, so we'll need to handle that.

@peterwilsoncc
Copy link
Collaborator

Just wondering, should we also wrap post_content with wp_slash for non-gutenberg content?

Yeah, I think having it for both is probably the way to go, as long as that doesn't have any unintended consequences.

It shouldn't have any consequences, WP expects the following to be slashed and unslashes it within wp_insert_post():

  • post_author
  • post_date
  • post_date_gmt
  • post_content
  • post_content_filtered
  • post_title
  • post_excerpt
  • post_status
  • post_type
  • comment_status
  • ping_status
  • post_password
  • post_name
  • to_ping
  • pinged
  • post_modified
  • post_modified_gmt
  • post_parent
  • menu_order
  • post_mime_type
  • guid

see source code

I wasn't able to reproduce the bug using external connections, so either the data is being slashed there already or this bug doesn't affect both connection types.

I was using this mini-plugin for testing -- it needs to be active on all sites & creates a test post on each site test-block.zip

@peterwilsoncc
Copy link
Collaborator

I've done some additional testing locally.

I think I'll move the slashing to include the entire post object rather than just the content. This will ensure the bug is fixed in it's entirety rather than risk the same bug appearing in other fields, such as title.

@peterwilsoncc peterwilsoncc merged commit 6cab43d into 10up:develop Jul 21, 2022
jmslbam added a commit to burovoordeboeg/distributor-meta-data that referenced this pull request Aug 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unicode characters are not escaped during syndication
5 participants