Fix: Content Resizing: The "shorten" action does not detect the character length of Japanese text by hbhalodia · Pull Request #581 · WordPress/ai

hbhalodia · 2026-05-19T10:24:44Z

What?

Closes #578

Why?

This PR uses wordCountType to count either words or characters, depending on the user's locale.

How?

This PR uses the same technique that Gutenberg uses for its word count: it checks the user's locale and sets the count type from the value the language pack provides through _x().
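
A minimal sketch of the locale check described above, assuming the Gutenberg convention that the language pack translates the string 'words' into one of three count-type identifiers. The `translate` parameter is a stand-in for `_x()` from `@wordpress/i18n` so the sketch runs without WordPress; `resolveCountType` and the fake language packs are hypothetical names, not code from this PR.

```javascript
// Count types accepted by count() in @wordpress/wordcount.
const VALID_COUNT_TYPES = [
	'words',
	'characters_excluding_spaces',
	'characters_including_spaces',
];

/**
 * Resolve the count type for the current locale.
 *
 * Assumption: `translate` behaves like _x(), taking a source string and a
 * context and returning whatever the active language pack provides.
 */
function resolveCountType( translate ) {
	const type = translate( 'words', 'Word count type. Do not translate!' );
	// Fall back to words if the language pack returns something unexpected.
	return VALID_COUNT_TYPES.includes( type ) ? type : 'words';
}

// Hypothetical language packs used only for illustration.
const wordBasedPack = ( text ) => text; // returns the source string unchanged
const characterBasedPack = () => 'characters_including_spaces';
const brokenPack = () => 'not-a-count-type';

console.log( resolveCountType( wordBasedPack ) );
console.log( resolveCountType( characterBasedPack ) );
console.log( resolveCountType( brokenPack ) );
```

The resolved value can then be passed as the second argument to `count()`, so a single call site handles both word-based and character-based locales.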

Use of AI Tools

None

Testing Instructions

Change the site language to Japanese.
Open a post.
Copy the following text and paste it multiple times inside a paragraph block: あああああああああああ
Execute the Shorten action.
Confirm that no error notice is displayed and that the Shorten action works as expected.

Screenshots or screencast

Screen.Recording.2026-05-19.at.3.52.39.PM.mov

Changelog Entry

Fixed - Use wordCountType to check the user's locale and count characters or words accordingly.

github-actions · 2026-05-19T10:24:54Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: hbhalodia <hbhalodia@git.wordpress.org>
Co-authored-by: dkotter <dkotter@git.wordpress.org>
Co-authored-by: jeffpaul <jeffpaul@git.wordpress.org>
Co-authored-by: t-hamano <wildworks@git.wordpress.org>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

hbhalodia · 2026-05-19T10:26:03Z

Hi team, do we need to update the copy here to say "characters" based on wordCountType, and in similar places as well?

IMO we should update that; I'm waiting for feedback before applying the change.

Thanks,

So a couple thoughts here.

We have a few places that use the @wordpress/wordcount package and now two separate places that use the wordCountType to determine if we should look at words or characters. Wondering if we extract all that out into a shared helper so we're not duplicating code?

In addition, we're inconsistent right now on whether we look at words or characters (not just based on locale but in general). For instance, we default to looking at words here and for Content Classification, but I believe Content Summarization defaults to looking at characters.

I'm thinking it might be nice to standardize this across the plugin to avoid these inconsistencies. This may also simplify the code a bit if we decide to standardize on characters instead of words, avoiding the need to change things depending on locale.

Any thoughts on these?

Also noting we have #545 open that adds a minimum content length to Editorial Notes and it uses characters.

In practice and testing, I've seen AI reviews and feedback be much more useful when there's a LOT more content available to work with. However, I don't know that setting something like 500 words is realistic for all use cases for the AI plugin, so perhaps something much lower like 100 words or 100 characters (neither is a carefully chosen value, just a round number). To some extent, I don't care too much about what the minimum length is so long as we set something, use it across the plugin, and allow it to be filterable. So let's just YOLO into something and iterate if feedback shows we need to raise or lower that minimum threshold.

As one approach, how about considering both words and characters? In other words, the label displays both words and characters.

_n( '+%1$d word (+%2$d characters)', '+%1$d words (+%2$d characters)', magnitude, 'ai' ), magnitude, charMagnitude

Furthermore, the threshold check might be able to consider both of those.

if ( action === 'shorten' ) {
	const wordCount = count( blockContent, 'words', {} );
	const charCount = count( blockContent, 'characters_excluding_spaces', {} );
	if ( wordCount < SHORTEN_MIN_WORDS && charCount < SHORTEN_MIN_CHARS ) {
		noticesDispatch.createErrorNotice();
		return;
	}
}
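
To make the behavior of this combined check concrete, here is a small standalone sketch. It assumes simplified stand-ins for `count()` from `@wordpress/wordcount` (whitespace splitting for words, code-point counting for characters), and the threshold values are placeholders rather than the plugin's real constants. Because the two conditions are joined with `&&`, the notice only fires when the text is short by both measures.

```javascript
// Placeholder thresholds; the plugin's real values may differ.
const SHORTEN_MIN_WORDS = 10;
const SHORTEN_MIN_CHARS = 50;

// Simplified stand-ins for count() from @wordpress/wordcount (assumption).
const countWords = ( text ) =>
	text.trim().split( /\s+/ ).filter( Boolean ).length;
const countCharsExcludingSpaces = ( text ) =>
	Array.from( text.replace( /\s/g, '' ) ).length;

// True only when the text is below BOTH minimums.
function isTooShortToShorten( text ) {
	return (
		countWords( text ) < SHORTEN_MIN_WORDS &&
		countCharsExcludingSpaces( text ) < SHORTEN_MIN_CHARS
	);
}

// One unbroken run of 60 characters: a single "word", but plenty of characters.
const japaneseText = 'あ'.repeat( 60 );
// Ten words, fewer than 50 non-space characters.
const englishText = 'one two three four five six seven eight nine ten';
// Short by both measures.
const shortText = 'too short';

console.log( isTooShortToShorten( japaneseText ) ); // false
console.log( isTooShortToShorten( englishText ) ); // false
console.log( isTooShortToShorten( shortText ) ); // true
```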

We have a few places that use the @wordpress/wordcount package and now two separate places that use the wordCountType to determine if we should look at words or characters. Wondering if we extract all that out into a shared helper so we're not duplicating code?

Sure, I will add a helper function that checks the count type.

This may also simplify the code a bit if we decide to standardize on characters instead of words, avoiding the need to change things depending on locale.

I'm not sure about this, but as @t-hamano said, English content is mostly measured in words, while some languages such as Japanese are measured in characters (and for those languages there are two variants: including spaces or excluding spaces). So, to make it consistent, we should go with characters.

The suggestion about checking both words and characters is worth noting. But still, what if the word count is below the minimum and the character count satisfies the condition? Then the above code will still throw the notice.

In my opinion, we should follow the Gutenberg approach: define minimum lengths for words and for characters individually and, based on the locale, use either words or characters for counting. And this should be consistent across all the experiments: this one, Content Classification, and Content Summarization.

I'm open to any approach here, what I'm wanting is to ensure we approach this in a way that works across all of our features. Right now we are inconsistent with what features require a minimum content length and which ones don't. And the ones that do require it do it differently.

I would like to see a PR that standardizes this across all features and does it in a reusable way to avoid code duplication. At a high-level I think this means a client-side helper that all features can use. A server-side helper (with a filter around the content length) that all features can use. And then integrating that within all features, so a feature remains in a disabled state until the minimum length is hit.

As far as what the minimum length should be and if that should be characters, words or both, I'm open to suggestions there.
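
One possible shape for the shared client-side helper described here, sketched under assumptions: `getMinimumLength`, `hasMinimumContent`, the `lengthFilters` registry, the feature names, and the default thresholds are all hypothetical, and the counters are simplified stand-ins for `count()` from `@wordpress/wordcount`. In the plugin the filter would presumably go through the WordPress hooks system; this sketch replaces it with a plain array of callbacks so it runs standalone.

```javascript
// Hypothetical defaults, one per count type.
const DEFAULT_MINIMUMS = {
	words: 100,
	characters_excluding_spaces: 100,
	characters_including_spaces: 100,
};

// Stand-in for a filter hook: each callback receives ( value, feature, countType ).
const lengthFilters = [];

// Run the default through every registered callback, like a filter chain.
function getMinimumLength( feature, countType ) {
	return lengthFilters.reduce(
		( value, callback ) => callback( value, feature, countType ),
		DEFAULT_MINIMUMS[ countType ]
	);
}

// Simplified counters keyed by count type (assumption, not the real package).
const counters = {
	words: ( text ) => text.trim().split( /\s+/ ).filter( Boolean ).length,
	characters_excluding_spaces: ( text ) =>
		Array.from( text.replace( /\s/g, '' ) ).length,
	characters_including_spaces: ( text ) => Array.from( text ).length,
};

// A feature stays disabled until this returns true.
function hasMinimumContent( text, feature, countType ) {
	return counters[ countType ]( text ) >= getMinimumLength( feature, countType );
}

// Example: lower the threshold for a single feature.
lengthFilters.push( ( value, feature ) =>
	feature === 'content-resizing' ? 20 : value
);

console.log( hasMinimumContent( 'あ'.repeat( 30 ), 'content-resizing', 'characters_including_spaces' ) ); // true
console.log( hasMinimumContent( 'あ'.repeat( 30 ), 'summarization', 'characters_including_spaces' ) ); // false
```

Keeping the threshold lookup separate from the counting means a server-side counterpart could mirror `getMinimumLength` while each feature only calls `hasMinimumContent`.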

Hi @dkotter @t-hamano, I have updated the PR with changes that first make the logic reusable across all the features.

I am closing PR #577; we can focus on improving all of these features in this PR.

I am currently checking whether we can make use of both characters and words and display both, as suggested by @t-hamano.

Thanks,

Hi team, I need a suggestion here. In the current state of the PR, everything is standardized, duplication is removed, and helper functions are used.

All the features will count characters if the user's locale is a language that counts by characters (e.g. Japanese), and will count words if the user's locale is a language that counts by words (e.g. English).

Now the question is what to show the user: should we update the text to show characters or words depending on the user's locale, or should we introduce both minWords and minCharacter counts, show the user both, and disable experiments only when neither of them satisfies the condition?

Currently, I am updating the PR with the first approach (updating the text based on the user's locale) and will adjust it based on the feedback.

Now the question is what to show the user: should we update the text to show characters or words depending on the user's locale

I'm saying this as someone who only speaks English, so I definitely want other opinions here, but if the decision is to use either words or characters depending on locale (and not standardize on characters only or standardize on checking both), I think the right approach is to update the message to show either characters or words, not both.
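
A rough sketch of the first approach, where the notice names only the unit that applies to the user's locale. `minimumLengthNotice` and its wording are hypothetical; real plugin code would build the string with the translation functions from `@wordpress/i18n` so it stays translatable, which is omitted here to keep the sketch standalone.

```javascript
/**
 * Build a notice that mentions words or characters, never both.
 *
 * Assumption: any count type other than 'words' is a character-based type.
 */
function minimumLengthNotice( countType, minimum ) {
	const unit = countType === 'words' ? 'word' : 'character';
	const label = minimum === 1 ? unit : `${ unit }s`;
	return `Add at least ${ minimum } ${ label } to use this feature.`;
}

console.log( minimumLengthNotice( 'words', 100 ) );
console.log( minimumLengthNotice( 'characters_excluding_spaces', 100 ) );
```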

codecov · 2026-05-19T10:30:48Z

Codecov Report

❌ Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.15%. Comparing base (246680c) to head (3e0143c).

Files with missing lines	Patch %	Lines
.../Content_Classification/Content_Classification.php	0.00%	4 Missing ⚠️
.../Experiments/Content_Resizing/Content_Resizing.php	0.00%	2 Missing ⚠️
...cludes/Experiments/Summarization/Summarization.php	0.00%	1 Missing ⚠️
includes/helpers.php	0.00%	1 Missing ⚠️

Additional details and impacted files

@@              Coverage Diff              @@
##             develop     #581      +/-   ##
=============================================
- Coverage      73.18%   73.15%   -0.03%     
  Complexity      1731     1731              
=============================================
  Files             85       85              
  Lines           7473     7476       +3     
=============================================
  Hits            5469     5469              
- Misses          2004     2007       +3

Flag	Coverage Δ
unit	`73.15% <0.00%> (-0.03%)`	⬇️

Update content resize to use wordCountType for word count based on characters and words

a69dc87

hbhalodia commented May 19, 2026

dkotter mentioned this pull request May 19, 2026

Release version 1.0.0 #567

Closed

42 tasks

dkotter assigned hbhalodia May 19, 2026

dkotter modified the milestones: 1.0.0, 1.1.0 May 19, 2026

dkotter mentioned this pull request May 20, 2026

Feature: Disable Review notes until minimum content threshold is reached #545

Open

hbhalodia added 2 commits May 21, 2026 14:27

Merge branch 'develop' into fix/issue-578

ec4f0cc

Standardize wordCount across the experiments and added filter to add more counts

923446f

hbhalodia mentioned this pull request May 21, 2026

Fix: Content Classification: Make character count locale-aware #577

Closed

hbhalodia added 3 commits May 21, 2026 16:08

Update text based on users locale and use exact min content length data

53ade0f

Fix phpcs errors

72f53df

Fix e2e tests for content summarization

3e0143c

dkotter linked an issue May 21, 2026 that may be closed by this pull request

Content Classification: Make character count locale-aware #571

Open

6 tasks

hbhalodia wants to merge 6 commits into WordPress:develop from hbhalodia:fix/issue-578

hbhalodia May 19, 2026
dkotter May 19, 2026
dkotter May 19, 2026
jeffpaul May 19, 2026
t-hamano May 20, 2026
hbhalodia May 20, 2026
dkotter May 20, 2026
hbhalodia May 21, 2026
hbhalodia May 21, 2026
dkotter May 21, 2026

4 participants
