Skip to content

Fix: Content Resizing: The "shorten" action does not detect the character length of Japanese text#581

Open
hbhalodia wants to merge 6 commits into
WordPress:developfrom
hbhalodia:fix/issue-578
Open

Fix: Content Resizing: The "shorten" action does not detect the character length of Japanese text#581
hbhalodia wants to merge 6 commits into
WordPress:developfrom
hbhalodia:fix/issue-578

Conversation

@hbhalodia
Copy link
Copy Markdown
Contributor

@hbhalodia hbhalodia commented May 19, 2026

What?

Closes #578

Why?

  • PR uses wordCountType for counting words or characters based on the users locale

How?

  • PR uses the same technique that Gutenberg uses for wordCount. To check for user's locale and update the countType based on _x() language pack.

Use of AI Tools

  • None

Testing Instructions

  1. Change the site language to Japanese.
  2. Open a post.
  3. Copy the following text and paste it multiple times inside a paragraph block: あああああああああああ
  4. Execute the Shorten action
  5. Confirm that no error notice is displayed and word shorten is working as expected.

Screenshots or screencast

Screen.Recording.2026-05-19.at.3.52.39.PM.mov

Changelog Entry

Fixed - Add wordCountType to check for user's locale and update to count character or words.

Open WordPress Playground Preview

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: hbhalodia <hbhalodia@git.wordpress.org>
Co-authored-by: dkotter <dkotter@git.wordpress.org>
Co-authored-by: jeffpaul <jeffpaul@git.wordpress.org>
Co-authored-by: t-hamano <wildworks@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Comment on lines 198 to 206
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Team, Do we need to update the copy here to use characters? based on wordCountType and also on similar places?

IMO we should update that, waiting for feedback before applying the change.

Thanks,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a couple thoughts here.

We have a few places that use the @wordpress/wordcount package and now two separate places that use the wordCountType to determine if we should look at words or characters. Wondering if we extract all that out into a shared helper so we're not duplicating code?

In addition, we're inconsistent right now on if we look at words or characters (not just based on locale but just in general). For instance, we default to looking at words here and for Content Classification but I believe Content Summarization defaults to looking at characters.

I'm thinking it might be nice to standardize this across the plugin to avoid these inconsistencies. This may also simplify the code a bit if we decide to standardize on characters instead of words, avoiding the need to change things depending on locale.

Any thoughts on these?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also noting we have #545 open that adds a minimum content length to Editorial Notes and it uses characters.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice and testing, I've seen AI reviews and feedback much more useful when there's a LOT more content available for it to utilize. However, I don't know that setting something like 500 words is realistic for all use cases for the AI plugin so perhaps something much lower like 100 words or 100 characters (noting neither are super specific so much as a round number). To some extent here, I don't care too much about the minimum length so much as we set something and use it across the plugin and allow that to be filterable. So let's just YOLO into something and can iterate if feedback shows we need to raise or lower that minimum threshold.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As one approach, how about considering both words and characters? In other words, the label displays both words and characters.

_n(
	'+%1$d word (+%2$d characters)',
	'+%1$d words (+%2$d characters)',
	magnitude,
	'ai'
),
magnitude,
charMagnitude

Furthermore, the threshold check might be able to consider both of those.

if ( action === 'shorten' ) {
	const wordCount = count( blockContent, 'words', {} );
	const charCount = count( blockContent, 'characters_excluding_spaces', {} );
	if ( wordCount < SHORTEN_MIN_WORDS && charCount < SHORTEN_MIN_CHARS ) {
		noticesDispatch.createErrorNotice();
		return;
	}
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a few places that use the @wordpress/wordcount package and now two separate places that use the wordCountType to determine if we should look at words or characters. Wondering if we extract all that out into a shared helper so we're not duplicating code?

Sure I would make a helper function, that checks for type.

This may also simplify the code a bit if we decide to standardize on characters instead of words, avoiding the need to change things depending on locale.

Not sure on this, but what as @t-hamano said, if looking at English it is mostly used words, while some languages such as Japanese uses characters (in that some languages have 2 parts, to include space or to not include space). So to make it consistent, we should go with characters.

But worth noting the suggestion about checking for both words and characters. But still, what if words are less and characters satisfies teh condition, then above code will still throw the notice.

As per me we should follow the gutenberg approach, define both words and characters min length individually. Based on the locale use either words or characters for counting. And this should be conistent across all the exoeriments, this, Content Classification and Content Summarization?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to any approach here, what I'm wanting is to ensure we approach this in a way that works across all of our features. Right now we are inconsistent with what features require a minimum content length and which ones don't. And the ones that do require it do it differently.

I would like to see a PR that standardizes this across all features and does it in a reusable way to avoid code duplication. At a high-level I think this means a client-side helper that all features can use. A server-side helper (with a filter around the content length) that all features can use. And then integrating that within all features, so a feature remains in a disabled state until the minimum length is hit.

As far as what the minimum length should be and if that should be characters, words or both, I'm open to suggestions there.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @dkotter @t-hamano, I have updated the PR with changes to make it first reusable accross all the features.

I am closing this PR - #577, We can focus on to improve all this feature in this same PR.

Currently checking on if we can optimize to make use of both characters and words and display both as a suggestion shared by @t-hamano.

Thanks,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Team, I need the suggestion here. Based on current state of PR, everything is standarized, duplication removed and used helper functions.

All the features will either use characters if user's locale is the language that uses characters (Eg: Japanese), and will use the words if the users local is the language that use words to count (Eg: English).

Now, we have a question to what to show the user, should we update the text to show characters or words depending on the user's locale, or Should we introduce both minWords and minCharacter count and show user both and disable experiments if both of them does not satisfies the condition?

Currently, I am updating the PR with the first approach, to update text based on user's locale, will update it based on the feedback.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, we have a question to what to show the user, should we update the text to show characters or words depending on the user's locale

I'm saying this as someone that only speaks English so definitely want other opinions here but if the decision is to use either words or characters depending on locale (and not standardize on characters only or standardize on checking both) I think the right approach is to update the message to show either characters or words, not both.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.15%. Comparing base (246680c) to head (3e0143c).

Files with missing lines Patch % Lines
.../Content_Classification/Content_Classification.php 0.00% 4 Missing ⚠️
.../Experiments/Content_Resizing/Content_Resizing.php 0.00% 2 Missing ⚠️
...cludes/Experiments/Summarization/Summarization.php 0.00% 1 Missing ⚠️
includes/helpers.php 0.00% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             develop     #581      +/-   ##
=============================================
- Coverage      73.18%   73.15%   -0.03%     
  Complexity      1731     1731              
=============================================
  Files             85       85              
  Lines           7473     7476       +3     
=============================================
  Hits            5469     5469              
- Misses          2004     2007       +3     
Flag Coverage Δ
unit 73.15% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dkotter dkotter linked an issue May 21, 2026 that may be closed by this pull request
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

4 participants