
refactor: reduce hot-path allocations in message parser utils #39075

Merged
ggazzo merged 3 commits into RocketChat:develop from Shreyas2004wagh:perf/message-parser-hotpath-allocations
Feb 27, 2026

Conversation

@Shreyas2004wagh (Contributor) commented Feb 26, 2026

Summary

  • optimize reducePlainTexts to avoid flat() and process entries in a single linear pass without a flattened temporary array
  • keep extractFirstResult on find(Boolean) after benchmark comparisons showed the indexed-loop variant regressed parser throughput
  • remove object spread in joinEmoji plain-text return path to avoid extra object copy work
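The first and third bullets can be illustrated with a minimal, self-contained sketch (simplified token shape; `reducePlainTextsSketch` is illustrative only, not the actual parser code):

```typescript
type Inline = { type: string; value: string };

// Before: values.flat() allocates a temporary flattened array on every call.
// After: walk the (possibly nested) entries in place and merge adjacent
// plain-text tokens directly into the result.
const reducePlainTextsSketch = (values: Array<Inline | Inline[]>): Inline[] => {
  const result: Inline[] = [];

  const append = (current: Inline): void => {
    const previous = result[result.length - 1];
    if (previous && previous.type === 'PLAIN_TEXT' && current.type === 'PLAIN_TEXT') {
      previous.value += current.value; // consolidate in place, no new token
    } else {
      result.push({ type: current.type, value: current.value });
    }
  };

  for (const entry of values) {
    if (Array.isArray(entry)) {
      for (const inline of entry) append(inline);
    } else {
      append(entry);
    }
  }

  return result;
};

// Two adjacent plain-text tokens collapse into one; the bold token survives.
console.log(
  reducePlainTextsSketch([
    { type: 'PLAIN_TEXT', value: 'hello ' },
    [{ type: 'PLAIN_TEXT', value: 'world' }],
    { type: 'BOLD', value: '!' },
  ]),
);
```

The single pass visits each inline exactly once, so the work is the same as the flat()-then-reduce version minus the intermediate array.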

Why

These helpers are in the message-parser hot path and are invoked repeatedly by inline/emphasis/spoiler/list parsing rules. Reducing temporary allocations lowers GC pressure for long/complex messages.

Behavior

  • AST shape and grammar behavior are preserved
  • no grammar rules were changed

Validation

  • local parser microbenchmark (Node v22.16.0, same machine, back-to-back vs develop):
    • baseline avg: 4495.89 ops/s
    • PR avg: 4741.89 ops/s (+5.47%)
    • baseline median: 4558.18 ops/s
    • PR median: 4606.31 ops/s (+1.06%)
  • lint check: yarn workspace @rocket.chat/message-parser run eslint src/utils.ts
  • randomized helper equivalence checks between the old and updated reducePlainTexts implementations: 50,000 randomized cases (nested arrays, consecutive plain-text segments, emoji neighbors, mentions), 0 mismatches

Task: ARCH-2010

dionisio-bot bot commented Feb 26, 2026

Looks like this PR is ready to merge! 🎉
If you have any trouble, please check the PR guidelines

changeset-bot bot commented Feb 26, 2026

⚠️ No Changeset found

Latest commit: 5465a77

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai bot commented Feb 26, 2026

Walkthrough

Refactors plain-text consolidation in the message parser to avoid intermediate array allocations: replaces flattening with an incremental accumulation strategy, introduces an appendJoinedInline helper, and changes joinEmoji to emit plain-text emoji tokens formatted as :code:.

Changes

Cohort / File(s) Summary
Message parser utilities
packages/message-parser/src/utils.ts
Rewrote reducePlainTexts to remove flat() and perform streaming-style accumulation of Inlines (handles nested arrays), added appendJoinedInline helper, uses pendingInline/previousInline to consolidate adjacent plain-text segments, and updated joinEmoji to return a PLAIN_TEXT inline with value formatted as :<code>: instead of spreading current.value.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

community, type: chore

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Linked Issues check — ⚠️ Warning
Explanation: The PR objectives show that benchmarks revealed a performance regression in extractFirstResult (up to -45.8%) rather than an improvement, contradicting the stated optimization goals. The author acknowledged needing to rework the optimization before requesting another review.
Resolution: Address the extractFirstResult performance regression by reworking the optimization strategy before merging. Rerun benchmarks to verify all helpers meet the intended performance targets.

✅ Passed checks (4 passed)

Title check — ✅ Passed: The title 'refactor: reduce hot-path allocations in message parser utils' accurately and concisely describes the main change of optimizing memory allocation in the message parser utility functions.
Out of Scope Changes check — ✅ Passed: All changes to joinEmoji, reducePlainTexts, and the referenced optimizations in utils.ts are directly aligned with the linked issue #39074 objectives to reduce hot-path allocations.
Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.


@Shreyas2004wagh Shreyas2004wagh changed the title perf(message-parser): reduce hot-path helper allocations in utils refactor: reduce hot-path allocations in message parser utils Feb 26, 2026
@cubic-dev-ai cubic-dev-ai bot left a comment
No issues found across 1 file

codecov bot commented Feb 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.71%. Comparing base (afaabea) to head (5465a77).
⚠️ Report is 16 commits behind head on develop.

Additional details and impacted files


@@             Coverage Diff             @@
##           develop   #39075      +/-   ##
===========================================
+ Coverage    70.66%   70.71%   +0.04%     
===========================================
  Files         3191     3195       +4     
  Lines       112965   113116     +151     
  Branches     20451    20473      +22     
===========================================
+ Hits         79829    79989     +160     
+ Misses       31088    31083       -5     
+ Partials      2048     2044       -4     
Flag Coverage Δ
e2e 60.39% <ø> (+<0.01%) ⬆️
e2e-api 47.83% <ø> (+0.02%) ⬆️
unit 71.34% <100.00%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown.


@KevLehman KevLehman added the valid A valid contribution where maintainers will review based on priority label Feb 26, 2026
ggazzo (Member) commented Feb 26, 2026

do you have any benchmark about?

Shreyas2004wagh (Contributor, Author) commented Feb 26, 2026

Yes @ggazzo, I ran local microbenchmarks comparing develop against this PR commit: Node v22.16.0, 5 rounds each.

  • reducePlainTexts: 260,350 -> 204,448 ops/s (-21.5%)
  • extractFirstResult (front hit): 111,383,661 -> 60,405,281 ops/s (-45.8%)
  • extractFirstResult (mid hit): 8,885,108 -> 6,432,167 ops/s (-27.6%)
  • extractFirstResult (end hit): 5,124,439 -> 4,220,970 ops/s (-17.6%)

So in this benchmark, throughput regressed. I’ll rework this optimization and post updated numbers before requesting another review.
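For context, a back-to-back ops/s harness of the kind described might be shaped like this (illustrative only — `opsPerSecond` and `report` are hypothetical helpers, not the harness that produced the numbers above):

```typescript
// Illustrative micro-benchmark shape: warm up, then count iterations over a
// fixed wall-clock window and report ops/s.
const opsPerSecond = (fn: () => void, durationMs = 200): number => {
  for (let i = 0; i < 1000; i++) fn(); // warm-up so the JIT settles
  const start = Date.now();
  let ops = 0;
  while (Date.now() - start < durationMs) {
    fn();
    ops++;
  }
  return (ops / (Date.now() - start)) * 1000;
};

// Back-to-back comparison: run baseline and candidate in the same process,
// same machine, so environmental noise affects both.
const report = (label: string, fn: () => void): void => {
  console.log(`${label}: ${opsPerSecond(fn).toFixed(2)} ops/s`);
};
```

Running both variants back-to-back in one process, as the author describes, avoids cross-run machine variance but can still be skewed by JIT warm-up order, which is one reason to repeat over multiple rounds.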

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/message-parser/src/utils.ts`:
- Around line 185-235: reducePlainTexts regression: revert or rework this
refactor to restore performance parity by simplifying the loop and avoiding
extra allocations/calls per inline; specifically, remove the
pendingInline/previousInline churn and nested Array.isArray handling that causes
extra branching and function calls (appendJoinedInline and joinEmoji) for every
item, and instead implement a single-pass concatenation over flattenableValues
that directly merges consecutive PLAIN_TEXT inlines (using joinEmoji only when
necessary) — target the reducePlainTexts function, eliminating the
pendingInline/previousInline state machine and inner nested loops, or revert to
the previous implementation until a benchmarked optimization proves faster.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81d3be6 and 91309b1.

📒 Files selected for processing (1)
  • packages/message-parser/src/utils.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: 📦 Build Packages
  • GitHub Check: CodeQL-Build
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (.cursor/rules/playwright.mdc)

**/*.{ts,tsx,js}: Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests
Avoid code comments in the implementation

Files:

  • packages/message-parser/src/utils.ts
🧬 Code graph analysis (1)
packages/message-parser/src/utils.ts (1)
packages/message-parser/src/definitions.ts (1)
  • Inlines (227-241)
🔇 Additional comments (2)
packages/message-parser/src/utils.ts (2)

176-179: LGTM on the object spread removal.

Replacing the spread operator with an explicit object literal is functionally equivalent and avoids an intermediate allocation.
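The pattern being approved here can be sketched as follows (simplified shape; the real joinEmoji operates on parser tokens and surrounding context):

```typescript
type PlainText = { type: 'PLAIN_TEXT'; value: string };

const code = 'smile';
const current: PlainText = { type: 'PLAIN_TEXT', value: code };

// Before: spread copies every property of `current` into a fresh object
// before the explicit `value` overrides it.
const withSpread: PlainText = { ...current, value: `:${code}:` };

// After: an explicit literal builds the same shape directly, skipping the
// property-copy walk.
const explicit: PlainText = { type: 'PLAIN_TEXT', value: `:${code}:` };

console.log(withSpread.value === explicit.value); // true
```

With only two properties the saving per call is tiny, but in a hot path invoked per inline token it adds up.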


187-187: Verify whether nested Inlines[] arrays actually occur at runtime.

The cast to Array<Inlines | Inlines[]> implies the function handles nested arrays, but Paragraph['value'] is typed as Inlines[] and the Inlines union doesn't include arrays. Either:

  • Nested arrays occur at runtime and the type definitions should be updated to reflect this, or
  • This cast adds unnecessary complexity for a case that never happens.
#!/bin/bash
# Search for call sites of reducePlainTexts to understand what's passed in
ast-grep --pattern 'reducePlainTexts($_)'

# Check if any grammar rules produce nested arrays in paragraph values
rg -n "Paragraph\['value'\]|paragraph.*value" --type ts -A 3 -B 1

Comment on lines 185 to 235
export const reducePlainTexts = (values: Paragraph['value']): Paragraph['value'] => {
  const result: Paragraph['value'] = [];
  const flattenableValues = values as Array<Inlines | Inlines[]>;

  let previousInline = undefined as Inlines | undefined;
  let pendingInline = undefined as Inlines | undefined;

  const appendJoinedInline = (inline: Inlines, nextInline: Inlines | undefined): void => {
    const current = joinEmoji(inline, previousInline, nextInline);
    const previous = result[result.length - 1];

    if (previous && current.type === 'PLAIN_TEXT' && previous.type === 'PLAIN_TEXT') {
      previous.value += current.value;
    } else {
      result.push(current);
    }

    previousInline = inline;
  };

  for (let index = 0; index < flattenableValues.length; index++) {
    const entry = flattenableValues[index];

    if (Array.isArray(entry)) {
      for (let nestedIndex = 0; nestedIndex < entry.length; nestedIndex++) {
        const currentInline = entry[nestedIndex];

        if (pendingInline === undefined) {
          pendingInline = currentInline;
          continue;
        }

        appendJoinedInline(pendingInline, currentInline);
        pendingInline = currentInline;
      }

      continue;
    }

    if (pendingInline === undefined) {
      pendingInline = entry;
      continue;
    }

    appendJoinedInline(pendingInline, entry);
    pendingInline = entry;
  }

  if (pendingInline !== undefined) {
    appendJoinedInline(pendingInline, undefined);
  }

  return result;
};
@coderabbitai coderabbitai bot commented Feb 26, 2026

⚠️ Potential issue | 🟠 Major

Benchmark regression is a blocker — throughput dropped significantly.

Per the PR comments, local benchmarks show this refactor made performance worse, not better:

  • reducePlainTexts: −21.5% ops/s
  • extractFirstResult: −17% to −45% ops/s

The author acknowledged this and stated intent to rework the optimization. This should not be merged until the rework demonstrates actual improvement or at least parity with the current implementation.


Shreyas2004wagh (Contributor, Author) commented

That regression was from the previous revision.

I pushed an update that keeps the reducePlainTexts hot-path optimization but restores extractFirstResult to find(Boolean) (the manual loop was slower in parser usage).

Re-ran local parser benchmarks (same machine, back-to-back vs develop, Node v22.16.0):

  • baseline avg: 4495.89 ops/s
  • updated PR avg: 4741.89 ops/s (+5.47%)
  • baseline median: 4558.18 ops/s
  • updated PR median: 4606.31 ops/s (+1.06%)

So this revision is at least parity and slightly better on throughput, while also reducing temporary allocations in reducePlainTexts.

coderabbitai bot commented

@Shreyas2004wagh, thank you for the quick turnaround and updated benchmarks! 🎉

The new results showing +5.47% avg and +1.06% median throughput improvement address the performance regression concern from the previous revision. Restoring extractFirstResult to find(Boolean) while keeping the reducePlainTexts optimization was the right call.

A few follow-up notes on the reducePlainTexts refactor:

Complexity increase: The new implementation with the pendingInline/previousInline state machine and nested array handling is significantly more complex than the original flat() approach. The added branching logic and multiple continue paths increase the cognitive load and potential for edge-case bugs.

Correctness validation: Your PR description mentions "randomized equivalence checks between old and new helper logic" — could you share a bit more detail about how you validated correctness? Given the complexity, understanding your test coverage would add confidence (e.g., did you fuzz with mixed nested arrays, empty arrays, consecutive PLAIN_TEXT segments, emoji neighbors, etc.)?

Net benefit: A ~1–5% throughput gain with reduced allocations is a solid incremental win for a hot-path helper. As long as correctness is thoroughly validated, the trade-off seems reasonable given the performance goals of this PR.

Overall, the performance concern is resolved. Nice work on the iteration! 🚀

Shreyas2004wagh (Contributor, Author) commented

I ran randomized equivalence checks between the previous and updated reducePlainTexts implementations and compared serialized outputs:

  • 50,000 randomized cases
  • mixed nested arrays + empty nested arrays
  • consecutive PLAIN_TEXT segments
  • emoji / emoticon / unicode emoji neighbors
  • mentions and mixed inline combinations

Result: 0 mismatches.

So behavior stayed equivalent while reducing temporary allocations in the hot path. I’ll add this validation detail to the PR description for traceability.
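A randomized equivalence harness of the kind described could be sketched like this (illustrative: `randomCase`, `equivalent`, and the generators are hypothetical, and the real suite compared the actual old and new reducePlainTexts implementations):

```typescript
type Inline = { type: string; value: string };

// Random inline token; biased toward PLAIN_TEXT so adjacent-merge paths
// are exercised often.
const randomInline = (): Inline => ({
  type: Math.random() < 0.7 ? 'PLAIN_TEXT' : 'BOLD',
  value: Math.random().toString(36).slice(2, 6),
});

// Random mix of bare inlines and nested (possibly empty) arrays of inlines.
const randomCase = (): Array<Inline | Inline[]> =>
  Array.from({ length: Math.floor(Math.random() * 8) }, () =>
    Math.random() < 0.3
      ? Array.from({ length: Math.floor(Math.random() * 3) }, randomInline)
      : randomInline(),
  );

// Run both implementations on the same random inputs and compare serialized
// outputs; returns the number of mismatches.
const equivalent = (
  oldImpl: (v: Array<Inline | Inline[]>) => Inline[],
  newImpl: (v: Array<Inline | Inline[]>) => Inline[],
  runs = 50_000,
): number => {
  let mismatches = 0;
  for (let i = 0; i < runs; i++) {
    const input = randomCase();
    if (JSON.stringify(oldImpl(input)) !== JSON.stringify(newImpl(input))) {
      mismatches++;
    }
  }
  return mismatches;
};
```

Serializing both outputs and counting mismatches, as described above, catches shape differences as well as value differences in the produced ASTs.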

coderabbitai bot commented

@Shreyas2004wagh, excellent work on the validation! 🎉

A 50,000-case randomized equivalence suite covering nested arrays, empty arrays, consecutive PLAIN_TEXT segments, emoji neighbors, mentions, and mixed inline combinations with zero mismatches provides strong confidence in the correctness of the refactored reducePlainTexts implementation.

Adding this validation detail to the PR description is a great idea—it provides clear traceability and helps reviewers and future maintainers understand the rigor behind the change.

With both the performance improvement confirmed (+5.47% avg throughput, reduced allocations) and correctness thoroughly validated, this optimization looks solid for the hot path. Nice iteration! 🚀

Shreyas2004wagh (Contributor, Author) commented

Added it to the PR description.

coderabbitai bot commented

@Shreyas2004wagh, perfect! Thank you for updating the PR description with the validation details. Having that traceability documented will help reviewers and future maintainers understand the rigor behind this optimization. 👍

Shreyas2004wagh (Contributor, Author) commented

Hey @ggazzo, I updated the PR and reran local parser benchmarks (same machine, back-to-back vs develop).

Result after latest change:

  • baseline avg: 4495.89 ops/s
  • PR avg: 4741.89 ops/s (+5.47%)
  • baseline median: 4558.18 ops/s
  • PR median: 4606.31 ops/s (+1.06%)

So this revision improves throughput and removes temporary-array allocation in the utils hot path (reducePlainTexts), while keeping behavior unchanged.

ggazzo (Member) commented Feb 27, 2026

/jira ARCH-1935

@ggazzo ggazzo added this to the 8.3.0 milestone Feb 27, 2026
@ggazzo ggazzo added the stat: QA assured Means it has been tested and approved by a company insider label Feb 27, 2026
@dionisio-bot dionisio-bot bot added the stat: ready to merge PR tested and approved waiting for merge label Feb 27, 2026
@ggazzo ggazzo merged commit 86af4e2 into RocketChat:develop Feb 27, 2026
47 checks passed

Labels

  • stat: QA assured — Means it has been tested and approved by a company insider
  • stat: ready to merge — PR tested and approved waiting for merge
  • valid — A valid contribution where maintainers will review based on priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance bug: message parser hot-path helpers allocate excessive temporary arrays

4 participants