refactor: reduce hot-path allocations in message parser utils #39075
Conversation
Looks like this PR is ready to merge! 🎉
Walkthrough

Refactors plain-text consolidation in the message parser to avoid intermediate array allocations: replaces flattening with an incremental accumulation strategy, introduces an `appendJoinedInline` helper, and changes `joinEmoji` to emit plain-text emoji tokens formatted as `:code:`.
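A minimal sketch of the accumulation strategy the walkthrough describes, using simplified stand-in token types (the real code uses the `Inlines` union and the `joinEmoji`/`appendJoinedInline` helpers in `packages/message-parser/src/utils.ts`; this is an illustration, not the PR's implementation):

```typescript
// Simplified stand-in for the parser's inline tokens (assumption for
// illustration; the real types live in definitions.ts).
type Token = { type: 'PLAIN_TEXT' | 'EMOJI'; value: string };

// Emit emoji as plain text formatted as `:code:` so neighbors can merge.
const emojiAsPlainText = (token: Token): Token =>
  token.type === 'EMOJI' ? { type: 'PLAIN_TEXT', value: `:${token.value}:` } : token;

// Incremental accumulation: merge consecutive PLAIN_TEXT tokens into the
// last result entry instead of building intermediate arrays.
const consolidate = (tokens: Token[]): Token[] => {
  const result: Token[] = [];
  for (const raw of tokens) {
    const current = emojiAsPlainText(raw);
    const previous = result[result.length - 1];
    if (previous && previous.type === 'PLAIN_TEXT' && current.type === 'PLAIN_TEXT') {
      previous.value += current.value;
    } else {
      result.push({ ...current });
    }
  }
  return result;
};
```

The key property is that merging mutates the last pushed token in place, so no temporary arrays are created per merge.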
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

@@             Coverage Diff              @@
##           develop    #39075      +/-  ##
===========================================
+ Coverage    70.66%    70.71%    +0.04%
===========================================
  Files         3191      3195        +4
  Lines       112965    113116      +151
  Branches     20451     20473       +22
===========================================
+ Hits         79829     79989      +160
+ Misses       31088     31083        -5
+ Partials      2048      2044        -4

Flags with carried forward coverage won't be shown.
Do you have any benchmarks for this?
Yes @ggazzo, I ran local microbenchmarks comparing develop against this PR's commit (Node v22.16.0, 5 rounds each).
So in this benchmark, throughput regressed. I'll rework this optimization and post updated numbers before requesting another review.
… extractFirstResult
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/message-parser/src/utils.ts`:
- Around line 185-235: reducePlainTexts regression: revert or rework this
refactor to restore performance parity by simplifying the loop and avoiding
extra allocations/calls per inline; specifically, remove the
pendingInline/previousInline churn and nested Array.isArray handling that causes
extra branching and function calls (appendJoinedInline and joinEmoji) for every
item, and instead implement a single-pass concatenation over flattenableValues
that directly merges consecutive PLAIN_TEXT inlines (using joinEmoji only when
necessary) — target the reducePlainTexts function, eliminating the
pendingInline/previousInline state machine and inner nested loops, or revert to
the previous implementation until a benchmarked optimization proves faster.
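For illustration, the single-pass concatenation the prompt describes might look roughly like this sketch (the `Inline` type and the helper name are simplified stand-ins, and `joinEmoji` handling is elided, so this is an assumption-laden outline, not the repo's code):

```typescript
// Simplified inline type standing in for the parser's Inlines union
// (assumption for illustration only).
type Inline = { type: string; value: string };

// One pass over possibly-nested values, merging consecutive PLAIN_TEXT
// entries directly into the last result slot; no .flat() and no
// pending/previous state machine.
const reducePlainTextsSketch = (values: Array<Inline | Inline[]>): Inline[] => {
  const result: Inline[] = [];
  const push = (inline: Inline): void => {
    const last = result[result.length - 1];
    if (last && last.type === 'PLAIN_TEXT' && inline.type === 'PLAIN_TEXT') {
      last.value += inline.value;
    } else {
      result.push({ ...inline });
    }
  };
  for (const entry of values) {
    if (Array.isArray(entry)) {
      // Handle nesting inline instead of allocating a flattened copy.
      for (const inline of entry) push(inline);
    } else {
      push(entry);
    }
  }
  return result;
};
```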
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
packages/message-parser/src/utils.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: 📦 Build Packages
- GitHub Check: CodeQL-Build
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (.cursor/rules/playwright.mdc)
**/*.{ts,tsx,js}: Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests
Avoid code comments in the implementation
Files:
packages/message-parser/src/utils.ts
🧬 Code graph analysis (1)
packages/message-parser/src/utils.ts (1)
packages/message-parser/src/definitions.ts (1)
Inlines (227-241)
🔇 Additional comments (2)
packages/message-parser/src/utils.ts (2)
176-179: LGTM on the object spread removal. Replacing the spread operator with an explicit object literal is functionally equivalent and avoids an intermediate allocation.
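The spread-removal pattern the comment describes can be illustrated with a small sketch (the field names are stand-ins, not the repo's actual token shape):

```typescript
// Stand-in token shape (assumption for illustration).
type Plain = { type: 'PLAIN_TEXT'; value: string };

// Before: object spread copies every enumerable property generically.
const withSpread = (token: Plain, value: string): Plain => ({ ...token, value });

// After: an explicit literal lists the fields, letting the engine build
// the object with a statically known shape.
const withLiteral = (token: Plain, value: string): Plain => ({
  type: token.type,
  value,
});
```

Both produce the same result; the literal form just gives the JIT a fixed shape to work with.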
187-187: Verify whether nested `Inlines[]` arrays actually occur at runtime. The cast to `Array<Inlines | Inlines[]>` implies the function handles nested arrays, but `Paragraph['value']` is typed as `Inlines[]` and the `Inlines` union doesn't include arrays. Either:
- Nested arrays occur at runtime and the type definitions should be updated to reflect this, or
- This cast adds unnecessary complexity for a case that never happens.

#!/bin/bash
# Search for call sites of reducePlainTexts to understand what's passed in
ast-grep --pattern 'reducePlainTexts($_)'
# Check if any grammar rules produce nested arrays in paragraph values
rg -n "Paragraph\['value'\]|paragraph.*value" --type ts -A 3 -B 1
  export const reducePlainTexts = (values: Paragraph['value']): Paragraph['value'] => {
-   const flattenedValues = values.flat();
    const result: Paragraph['value'] = [];
+   const flattenableValues = values as Array<Inlines | Inlines[]>;

-   for (let index = 0; index < flattenedValues.length; index++) {
-     const current = joinEmoji(flattenedValues[index], flattenedValues[index - 1], flattenedValues[index + 1]);
+   let previousInline = undefined as Inlines | undefined;
+   let pendingInline = undefined as Inlines | undefined;

+   const appendJoinedInline = (inline: Inlines, nextInline: Inlines | undefined): void => {
+     const current = joinEmoji(inline, previousInline, nextInline);
+     const previous = result[result.length - 1];

+     if (previous && current.type === 'PLAIN_TEXT' && previous.type === 'PLAIN_TEXT') {
+       previous.value += current.value;
+     } else {
+       result.push(current);
+     }

+     previousInline = inline;
+   };

+   for (let index = 0; index < flattenableValues.length; index++) {
+     const entry = flattenableValues[index];

+     if (Array.isArray(entry)) {
+       for (let nestedIndex = 0; nestedIndex < entry.length; nestedIndex++) {
+         const currentInline = entry[nestedIndex];

+         if (pendingInline === undefined) {
+           pendingInline = currentInline;
+           continue;
+         }

+         appendJoinedInline(pendingInline, currentInline);
+         pendingInline = currentInline;
+       }

+       continue;
+     }

-     result.push(current);
+     if (pendingInline === undefined) {
+       pendingInline = entry;
+       continue;
+     }

+     appendJoinedInline(pendingInline, entry);
+     pendingInline = entry;
+   }

+   if (pendingInline !== undefined) {
+     appendJoinedInline(pendingInline, undefined);
+   }
Benchmark regression is a blocker — throughput dropped significantly.
Per the PR comments, local benchmarks show this refactor made performance worse, not better:
- reducePlainTexts: −21.5% ops/s
- extractFirstResult: −17% to −45% ops/s
The author acknowledged this and stated intent to rework the optimization. This should not be merged until the rework demonstrates actual improvement or at least parity with the current implementation.
That regression was from the previous revision.
I pushed an update that keeps the reducePlainTexts hot-path optimization but restores extractFirstResult to find(Boolean) (the manual loop was slower in parser usage).
Re-ran local parser benchmarks (same machine, back-to-back vs develop, Node v22.16.0):
- baseline avg: 4495.89 ops/s
- updated PR avg: 4741.89 ops/s (+5.47%)
- baseline median: 4558.18 ops/s
- updated PR median: 4606.31 ops/s (+1.06%)
So this revision is at least parity and slightly better on throughput, while also reducing temporary allocations in reducePlainTexts.
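For context on how back-to-back ops/s numbers like these can be gathered, a minimal harness might look like the sketch below (the round duration, round count, and helper names are illustrative assumptions; the author's actual benchmark scripts are not part of this thread):

```typescript
// Run a workload repeatedly for a fixed window and report ops/s.
const opsPerSecond = (workload: () => void, durationMs = 200): number => {
  const start = Date.now();
  let ops = 0;
  while (Date.now() - start < durationMs) {
    workload();
    ops += 1;
  }
  return (ops / (Date.now() - start)) * 1000;
};

// Median across rounds is less noisy than the average for jittery runs.
const median = (samples: number[]): number => {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
};

// Run several rounds for baseline and candidate, compare medians.
const compare = (baseline: () => void, candidate: () => void, rounds = 5) => {
  const run = (fn: () => void) =>
    median(Array.from({ length: rounds }, () => opsPerSecond(fn)));
  return { baseline: run(baseline), candidate: run(candidate) };
};
```

Reporting both average and median, as done above, helps distinguish a real shift from a few outlier rounds.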
@Shreyas2004wagh, thank you for the quick turnaround and updated benchmarks! 🎉
The new results showing +5.47% avg and +1.06% median throughput improvement address the performance regression concern from the previous revision. Restoring extractFirstResult to find(Boolean) while keeping the reducePlainTexts optimization was the right call.
A few follow-up notes on the reducePlainTexts refactor:
Complexity increase: The new implementation with the pendingInline/previousInline state machine and nested array handling is significantly more complex than the original flat() approach. The added branching logic and multiple continue paths increase the cognitive load and potential for edge-case bugs.
Correctness validation: Your PR description mentions "randomized equivalence checks between old and new helper logic" — could you share a bit more detail about how you validated correctness? Given the complexity, understanding your test coverage would add confidence (e.g., did you fuzz with mixed nested arrays, empty arrays, consecutive PLAIN_TEXT segments, emoji neighbors, etc.)?
Net benefit: A ~1–5% throughput gain with reduced allocations is a solid incremental win for a hot-path helper. As long as correctness is thoroughly validated, the trade-off seems reasonable given the performance goals of this PR.
Overall, the performance concern is resolved. Nice work on the iteration! 🚀
I ran randomized equivalence checks between the previous and updated reducePlainTexts implementations and compared serialized outputs:
- 50,000 randomized cases
- mixed nested arrays + empty nested arrays
- consecutive `PLAIN_TEXT` segments
- emoji / emoticon / unicode emoji neighbors
- mentions and mixed inline combinations
Result: 0 mismatches.
So behavior stayed equivalent while reducing temporary allocations in the hot path. I’ll add this validation detail to the PR description for traceability.
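A randomized equivalence harness along those lines could be sketched as follows (the generator shape and token types are assumptions for illustration; the author's actual 50,000-case suite is not shown in this thread):

```typescript
// Stand-in inline token for fuzzing (assumption for illustration).
type Tok = { type: 'PLAIN_TEXT' | 'EMOJI' | 'MENTION'; value: string };

const randomTok = (): Tok => {
  const types: Tok['type'][] = ['PLAIN_TEXT', 'EMOJI', 'MENTION'];
  return {
    type: types[Math.floor(Math.random() * types.length)],
    value: Math.random().toString(36).slice(2, 6),
  };
};

// Mixed flat and nested entries, including empty nested arrays.
const randomCase = (): Array<Tok | Tok[]> =>
  Array.from({ length: Math.floor(Math.random() * 8) }, () =>
    Math.random() < 0.3
      ? Array.from({ length: Math.floor(Math.random() * 3) }, randomTok)
      : randomTok(),
  );

// Compare serialized outputs of two implementations over many random cases.
const countMismatches = (
  oldImpl: (values: Array<Tok | Tok[]>) => unknown,
  newImpl: (values: Array<Tok | Tok[]>) => unknown,
  rounds: number,
): number => {
  let mismatches = 0;
  for (let i = 0; i < rounds; i += 1) {
    const input = randomCase();
    if (JSON.stringify(oldImpl(input)) !== JSON.stringify(newImpl(input))) {
      mismatches += 1;
    }
  }
  return mismatches;
};
```

One caveat with this pattern: if either implementation mutates its input, the second call sees the mutated array, so each case should be deep-cloned per implementation in a real suite.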
@Shreyas2004wagh, excellent work on the validation! 🎉
A 50,000-case randomized equivalence suite covering nested arrays, empty arrays, consecutive PLAIN_TEXT segments, emoji neighbors, mentions, and mixed inline combinations with zero mismatches provides strong confidence in the correctness of the refactored reducePlainTexts implementation.
Adding this validation detail to the PR description is a great idea—it provides clear traceability and helps reviewers and future maintainers understand the rigor behind the change.
With both the performance improvement confirmed (+5.47% avg throughput, reduced allocations) and correctness thoroughly validated, this optimization looks solid for the hot path. Nice iteration! 🚀
Added it to the PR description.
@Shreyas2004wagh, perfect! Thank you for updating the PR description with the validation details. Having that traceability documented will help reviewers and future maintainers understand the rigor behind this optimization. 👍
Hey @ggazzo, I updated the PR and reran local parser benchmarks (same machine, back-to-back vs develop). Result after latest change:
So this revision improves throughput and removes temporary-array allocation in the utils hot path.
/jira ARCH-1935
Summary
Why
These helpers are in the message-parser hot path and are invoked repeatedly by inline/emphasis/spoiler/list parsing rules. Reducing temporary allocations lowers GC pressure for long/complex messages.
Behavior
Validation
Fix Performance bug: message parser hot-path helpers allocate excessive temporary arrays #39074
Task: ARCH-2010