feat(message-parser): Handwritten Lexer for Message Parser #39458
Se3do wants to merge 63 commits into
Extract scanning logic into dedicated modules: - blocks, code, emoji, formatting, inline, links, math, structural - central dispatch via scanners/index.ts
Looks like this PR is not ready to merge, because of the following issues:
Please fix the issues and try again. If you have any trouble, please check the PR guidelines.
Walkthrough
This PR introduces a handwritten lexer for the message-parser package that tokenizes Rocket.Chat messages into a flat stream of tokens. The implementation uses a three-layer architecture: an orchestrator lexer, a scanner dispatch table keyed by character code, and utility helpers. It is supported by constants (character codes, emoji, regexes) and a test suite.
Sequence Diagram
sequenceDiagram
participant Caller
participant Lexer
participant ScanContext
participant ScannerTable
participant Scanner
participant TokenArray
Caller->>Lexer: tokenize()
Lexer->>ScanContext: create context<br/>(input, len, tokens[])
loop for each character position
Lexer->>Lexer: check CHAR_CLASS<br/>(fast path for ASCII)
alt ASCII special char
Lexer->>ScannerTable: lookup scanner<br/>for charCode
ScannerTable->>Scanner: dispatch
Scanner->>ScanContext: flushText()<br/>emit token
Scanner-->>Lexer: return next pos
else Unicode emoji
Lexer->>Scanner: scanUnicodeEmoji
Scanner->>ScanContext: emit EMOJI token
Scanner-->>Lexer: return next pos
else Plain text
Lexer->>Lexer: accumulate text
end
end
Lexer->>ScanContext: flushText()<br/>(remaining)
Lexer->>ScanContext: emit EOF token
Lexer->>TokenArray: return tokens[]
TokenArray-->>Caller: flat token stream
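The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the token shape, the `SCANNERS` table, and `scanAsterisk` are hypothetical names, and the Unicode emoji branch is left out for brevity.

```typescript
// Hypothetical token shape; the PR's real Token interface may differ.
type Token = { kind: string; raw: string; pos: number };

interface ScanContext {
  input: string;
  tokens: Token[];
  textStart: number; // start of the pending plain-text run, or -1 if none
}

// A scanner consumes input at pos and returns the next position.
type ScanFn = (ctx: ScanContext, pos: number) => number;

function emit(ctx: ScanContext, kind: string, raw: string, pos: number): void {
  ctx.tokens.push({ kind, raw, pos });
}

// Emit the accumulated plain text (if any) as a single TEXT token.
function flushText(ctx: ScanContext, end: number): void {
  if (ctx.textStart !== -1 && end > ctx.textStart) {
    emit(ctx, 'TEXT', ctx.input.slice(ctx.textStart, end), ctx.textStart);
  }
  ctx.textStart = -1;
}

const scanAsterisk: ScanFn = (ctx, pos) => {
  flushText(ctx, pos);
  emit(ctx, 'ASTERISK', '*', pos);
  return pos + 1;
};

// Dispatch table indexed by ASCII char code; only '*' is registered here.
const SCANNERS = new Array<ScanFn | undefined>(128);
SCANNERS[42] = scanAsterisk;

function tokenize(input: string): Token[] {
  const ctx: ScanContext = { input, tokens: [], textStart: -1 };
  let pos = 0;
  while (pos < input.length) {
    const code = input.charCodeAt(pos);
    const scanner = code < 128 ? SCANNERS[code] : undefined;
    if (scanner) {
      pos = scanner(ctx, pos);
      continue;
    }
    // Plain text: remember where the run started and keep going.
    if (ctx.textStart === -1) ctx.textStart = pos;
    pos++;
  }
  flushText(ctx, input.length);
  emit(ctx, 'EOF', '', input.length);
  return ctx.tokens;
}
```

Under these assumptions, `tokenize('a*b')` yields TEXT, ASTERISK, TEXT, EOF in that order.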
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
12 issues found across 39 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/message-parser-handwritten/src/lexer/constants/charSets.ts">
<violation number="1" location="packages/message-parser-handwritten/src/lexer/constants/charSets.ts:63">
P2: Unicode emoji start detection omits newer `U+1FA00–U+1FAFF` range, causing modern emoji to be tokenized as TEXT instead of `EMOJI_UNICODE`.</violation>
</file>
<file name="packages/message-parser-handwritten/src/lexer/constants/emoticons.ts">
<violation number="1" location="packages/message-parser-handwritten/src/lexer/constants/emoticons.ts:95">
P2: Emoticon key has an accidental trailing space, causing `>.<` not to match unless followed by whitespace.</violation>
</file>
<file name="packages/message-parser-handwritten/src/index.ts">
<violation number="1" location="packages/message-parser-handwritten/src/index.ts:2">
P1: Public `parse` API is exported but always throws, causing deterministic runtime failure for all calls.</violation>
</file>
<file name="packages/message-parser-handwritten/src/lexer/lexer.ts">
<violation number="1" location="packages/message-parser-handwritten/src/lexer/lexer.ts:32">
P2: Lexer silently truncates when `MAX_TOKENS` is reached but still emits `EOF`, making partial tokenization look complete.</violation>
</file>
<file name="packages/message-parser-handwritten/src/lexer/scanners/inline.ts">
<violation number="1" location="packages/message-parser-handwritten/src/lexer/scanners/inline.ts:101">
P2: `scanC` emits `COLOR` without flushing pending text, which can produce overlapping or out-of-order TEXT/COLOR tokens.</violation>
</file>
<file name="packages/message-parser-handwritten/src/ast/definitions.ts">
<violation number="1" location="packages/message-parser-handwritten/src/ast/definitions.ts:204">
P2: `ASTNode`, `Types`, and other node unions are out of sync, so valid node variants are not consistently representable across exported AST typings.</violation>
</file>
<file name="packages/message-parser-handwritten/src/utils.ts">
<violation number="1" location="packages/message-parser-handwritten/src/utils.ts:240">
P2: `timestampFromHours` mixes UTC date extraction with local-time parsing, which can generate a timestamp for the wrong calendar day when `timezone` is omitted.</violation>
<violation number="2" location="packages/message-parser-handwritten/src/utils.ts:242">
P1: Using `| 0` to coerce Unix timestamps causes 32-bit overflow, producing incorrect timestamps for valid future dates.</violation>
</file>
<file name="packages/message-parser-handwritten/benchmarks/lexer.bench.ts">
<violation number="1" location="packages/message-parser-handwritten/benchmarks/lexer.bench.ts:1">
P2: Shebang uses `env` with multi-word interpreter (`npx ts-node`) without `-S`, making direct execution unreliable.</violation>
</file>
<file name="packages/message-parser-handwritten/src/lexer/scanners/emoji.ts">
<violation number="1" location="packages/message-parser-handwritten/src/lexer/scanners/emoji.ts:28">
P2: Dangling trailing ZWJ is consumed into `EMOJI_UNICODE` instead of being rolled back, producing invalid emoji token boundaries.</violation>
<violation number="2" location="packages/message-parser-handwritten/src/lexer/scanners/emoji.ts:30">
P2: ZWJ emoji scanning can advance past input length by blindly doing `i += 2` for any surrogate, allowing scanner to return out-of-bounds positions on malformed surrogate input.</violation>
<violation number="3" location="packages/message-parser-handwritten/src/lexer/scanners/emoji.ts:35">
P2: ZWJ-joined emoji components cannot include skin-tone modifiers, causing valid emoji ZWJ sequences to be split into incorrect tokens.</violation>
</file>
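To illustrate the P1 finding on `utils.ts` above: `| 0` applies a 32-bit signed conversion, so any epoch-seconds value past January 2038 wraps around. The snippet below is a standalone demonstration with an arbitrary date, not the PR's `timestampFromHours` code.

```typescript
// Epoch milliseconds for 2100-01-01T00:00:00Z, well past the 32-bit limit.
const ms = Date.UTC(2100, 0, 1);

// Bitwise OR converts its operand to a signed 32-bit integer and wraps.
const truncated = (ms / 1000) | 0;

// Math.floor keeps the value as a regular double, so nothing overflows.
const floored = Math.floor(ms / 1000);

console.log(truncated); // -192522496
console.log(floored); // 4102444800
```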
Pull request overview
This PR introduces a new message-parser-handwritten package containing Stage 1 of a two-stage parser replacement for Rocket.Chat's current PEG-based message parser. The lexer tokenizes raw message input into a flat token stream in a single linear pass using a scanner dispatch table keyed by character code.
Changes:
- A complete handwritten lexer (`Lexer` class) with scanner dispatch table, character classification, and scanner functions for all message syntax elements (formatting, code, links, emoji, mentions, math, lists, etc.)
- AST type definitions and utility functions mirroring the existing `message-parser` package for future parser compatibility
- Comprehensive test suites covering structural tokens, formatting, code, links, math, inline elements, blocks, and general/edge-case scenarios, plus a benchmark suite
Reviewed changes
Copilot reviewed 38 out of 39 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| `package.json` | New package definition with build scripts, dependencies, and metadata |
| `src/lexer/lexer.ts` | Main lexer loop with fast-lane text path, Unicode path, and scanner dispatch |
| `src/lexer/Token.ts` | Token kind enum and Token interface/factory |
| `src/lexer/ScanContext.ts` | Scan context interface and shared utilities (flushText, emit, consumeRun, tryEmoticon) |
| `src/lexer/helpers.ts` | Helpers for scanning mention bodies and emoji shortcodes |
| `src/lexer/scanners/index.ts` | Scanner dispatch table mapping char codes to scanner functions |
| `src/lexer/scanners/structural.ts` | Scanners for newlines, escapes, and whitespace |
| `src/lexer/scanners/formatting.ts` | Scanners for asterisk, underscore, and tilde emphasis markers |
| `src/lexer/scanners/code.ts` | Scanners for inline code and fenced code blocks |
| `src/lexer/scanners/math.ts` | Scanner for $ and $$ KaTeX delimiters |
| `src/lexer/scanners/links.ts` | Scanners for angle brackets, markdown links, images, and timestamps |
| `src/lexer/scanners/blocks.ts` | Scanners for headings, list bullets, task bullets, and pipes/spoilers |
| `src/lexer/scanners/inline.ts` | Scanners for colons (emoji/URL), @mentions, +phone, color tokens, and emoticons |
| `src/lexer/scanners/emoji.ts` | Unicode emoji scanner with ZWJ/variation selector/skin tone support |
| `src/lexer/constants/charCodes.ts` | Named character code constants |
| `src/lexer/constants/charSets.ts` | Character classification lookup tables and emoji start detection |
| `src/lexer/constants/emoticons.ts` | Emoticon trie and lookup table |
| `src/lexer/constants/regexes.ts` | Sticky regexes for URLs, emails, phones, colors, timestamps |
| `src/ast/definitions.ts` | AST type definitions (mirrored from existing parser) |
| `src/ast/guards.ts` | Type guard utility for AST nodes |
| `src/utils.ts` | AST node factory utilities (mirrored from existing parser) |
| `src/index.ts` | Placeholder public API (throws "Not implemented") |
| `tests/*.test.ts` | Comprehensive test suites for all lexer features |
| `benchmarks/lexer.bench.ts` | Performance benchmark suite |
| Config files | tsconfig, webpack, jest, eslint, gitignore configurations |
| `README.md` | Architecture documentation and performance benchmarks |
Actionable comments posted: 14
🧹 Nitpick comments (7)
packages/message-parser-handwritten/src/lexer/Token.ts (1)
2-67: Remove the implementation comments from this module.
The grouping is already clear from the names and spacing, and this package guideline asks us to keep implementation files comment-free.
As per coding guidelines, `**/*.{ts,tsx,js}`: Avoid code comments in the implementation.
Also applies to: 73-73
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser-handwritten/src/lexer/Token.ts` around lines 2 - 67, Remove the inline implementation/grouping comments and any other non-functional comments from this enum file; leave only the token identifiers (e.g., TRIPLE_BACKTICK, ASTERISK, BACKTICK, KATEX_INLINE_START, IMAGE_OPEN, MENTION_USER, EMOJI_SHORTCODE, TIMESTAMP, UL_BULLET, COLOR, URL, WHITESPACE, ESCAPED, TEXT, CODE_CONTENT, EOF) and their values intact, preserving spacing for readability. Ensure no comment lines remain anywhere in Token.ts.
packages/message-parser-handwritten/src/lexer/constants/charCodes.ts (1)
1-43: Drop the inline character annotations from this constants table.
These comments just restate the constant names/literals and violate the repo rule to keep implementation files comment-free.
As per coding guidelines, "Avoid code comments in the implementation."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser-handwritten/src/lexer/constants/charCodes.ts` around lines 1 - 43, Remove all inline trailing comments in the char code constants table (e.g., the // \t, // A, // z annotations) so the file contains only the exported constants (for symbols like CH_TAB, CH_LF, CH_SPACE, CH_A_UP, CH_A_LO, CH_Z_LO, CH_PIPE, CH_TILDE, etc.); keep the constant names and numeric values unchanged and do not add new comments or explanatory text in this implementation file.
packages/message-parser-handwritten/src/lexer/lexer.ts (1)
7-8: Remove code comment per coding guidelines.
As per coding guidelines: "Avoid code comments in the implementation" for TypeScript files. The constant name `MAX_TOKENS` is self-documenting.
Suggested fix
-// Safety cap to avoid runaway tokenization on malformed input.
 const MAX_TOKENS = 4096;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser-handwritten/src/lexer/lexer.ts` around lines 7 - 8, Remove the inline code comment above the MAX_TOKENS constant; the constant name MAX_TOKENS is self-documenting so delete the comment line "// Safety cap to avoid runaway tokenization on malformed input." and leave the const MAX_TOKENS = 4096; declaration unchanged.
packages/message-parser-handwritten/src/lexer/scanners/index.ts (1)
1-1: Remove code comment per coding guidelines.
As per coding guidelines: "Avoid code comments in the implementation" for TypeScript files.
Suggested fix
-// Scanner dispatch table - maps char codes to scanner functions
-
 import { ScanFn } from '../ScanContext';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser-handwritten/src/lexer/scanners/index.ts` at line 1, Remove the implementation-level comment "// Scanner dispatch table - maps char codes to scanner functions" at the top of the file; per TypeScript coding guidelines, avoid inline code comments in implementation files; leave the file as-is otherwise and ensure no other standalone implementation comments remain in this module (e.g., referencing the scanner dispatch table).
packages/message-parser-handwritten/src/lexer/scanners/math.ts (1)
1-1: Remove code comment per coding guidelines.
As per coding guidelines: "Avoid code comments in the implementation" for TypeScript files.
Suggested fix
-// $ and $$ for KaTeX math - tracks open/close state
-
 import { ScanContext, flushText, emit } from '../ScanContext';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser-handwritten/src/lexer/scanners/math.ts` at line 1, Remove the inline code comment at the top of math.ts ("// $ and $$ for KaTeX math - tracks open/close state") to comply with the TypeScript coding guideline against implementation comments; simply delete that comment line in packages/message-parser-handwritten/src/lexer/scanners/math.ts so the file contains only code and no implementation comments.
packages/message-parser-handwritten/tests/code.test.ts (1)
1-9: Extract the lexer test helpers into one shared module.
`tok`/`kinds`/`kv`/`raw` are duplicated across the lexer suites, so EOF filtering and tuple shaping can drift the next time `Token` changes. A small `tests/utils.ts` would keep those assertions consistent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser-handwritten/tests/code.test.ts` around lines 1 - 9, Extract the duplicated helper functions tok, kinds, kv, and raw into a single shared test helper module (e.g., tests/utils.ts) and update test files to import them instead of redefining; move the EOF filtering and tuple shaping logic (uses of Lexer and TokenKind.EOF) into that module so all lexer test suites use the same tok(s: string) => new Lexer(s).tokenize() and the common kinds/kv/raw wrappers, ensuring TokenKind and Lexer are imported there and exported helpers are reused across tests.
packages/message-parser-handwritten/src/lexer/scanners/blocks.ts (1)
56-70: Potential out-of-bounds access when checking task bullet syntax.
When accessing `input[pos + 3]` on line 60, if the input string is shorter than `pos + 4`, this returns `undefined`. While the subsequent comparisons would fail gracefully, accessing `input.charCodeAt(pos + 4)` and `input.charCodeAt(pos + 5)` on lines 63-64 would return `NaN`, which also fails the comparison safely. However, for defensive coding, consider adding a length check.
Suggested defensive check
 if (
   next === CH_SPACE &&
   input.charCodeAt(pos + 2) === CH_LBRACKET
 ) {
+  if (pos + 5 >= input.length) {
+    // Not enough characters for task bullet
+  } else {
   const flag = input[pos + 3];
   if (
     (flag === 'x' || flag === ' ') &&
     input.charCodeAt(pos + 4) === CH_RBRACKET &&
     input.charCodeAt(pos + 5) === CH_SPACE
   ) {
     flushText(ctx, pos);
     const raw = `- [${flag}] `;
     emit(ctx, TokenKind.TASK_BULLET, raw, flag, pos);
     return pos + raw.length;
   }
+  }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser-handwritten/src/lexer/scanners/blocks.ts` around lines 56 - 70, The task-bullet detection in scanners/blocks.ts reads input[pos + 3], pos + 4 and pos + 5 without ensuring the input has that many characters, risking out-of-bounds reads; update the conditional around TokenKind.TASK_BULLET detection to first check that input.length >= pos + 6 (or equivalent) before accessing input[pos + 3], input.charCodeAt(pos + 4) and input.charCodeAt(pos + 5), then proceed to call flushText(ctx, pos) and emit(ctx, TokenKind.TASK_BULLET, ...) only when the length check and existing character checks pass.
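One way to express the length guard from the task-bullet nitpick is a single early return, which avoids the empty `if` branch. This is a sketch under assumptions: `matchTaskBullet`, `CH_DASH`, and `TASK_BULLET_LEN` are hypothetical names, and the real scanner would call `flushText` and `emit` instead of returning the matched text.

```typescript
const CH_SPACE = 32;
const CH_DASH = 45; // hypothetical name; the review does not show this constant
const CH_LBRACKET = 91;
const CH_RBRACKET = 93;
const TASK_BULLET_LEN = 6; // "- [x] " is six UTF-16 code units

// Returns the raw task-bullet text at pos ("- [x] " or "- [ ] "), or null.
function matchTaskBullet(input: string, pos: number): string | null {
  // Bail out before any indexed read when the input is too short.
  if (pos + TASK_BULLET_LEN > input.length) return null;
  const flag = input[pos + 3];
  if (
    input.charCodeAt(pos) === CH_DASH &&
    input.charCodeAt(pos + 1) === CH_SPACE &&
    input.charCodeAt(pos + 2) === CH_LBRACKET &&
    (flag === 'x' || flag === ' ') &&
    input.charCodeAt(pos + 4) === CH_RBRACKET &&
    input.charCodeAt(pos + 5) === CH_SPACE
  ) {
    return input.slice(pos, pos + TASK_BULLET_LEN);
  }
  return null;
}
```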
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/message-parser-handwritten/benchmarks/lexer.bench.ts`:
- Around line 1-11: The package is missing tinybench in its devDependencies
causing the import of "Bench" and "Task" from 'tinybench' in
benchmarks/lexer.bench.ts to fail in isolated installs; add an explicit
devDependency entry for "tinybench" (choose a compatible version, e.g. a caret
range) to packages/message-parser-handwritten/package.json and run an install
(or yarn workspace install) so the benchmark file importing Bench/Task resolves
in CI and local isolated environments.
In `@packages/message-parser-handwritten/package.json`:
- Around line 5-12: Update the package metadata in package.json so homepage,
bugs.url, and repository.url point to the Rocket.Chat monorepo instead of
RocketChat/fuselage: change "homepage" to the Rocket.Chat repo readme URL, set
"bugs.url" to the Rocket.Chat issues URL, and update "repository.url" to the
git+https URL for the Rocket.Chat repository while preserving
"repository.directory" (package directory) as
"packages/message-parser-handwritten"; ensure the values replace the current
fuselage references everywhere they appear.
In `@packages/message-parser-handwritten/README.md`:
- Around line 45-55: Add the language specifier "text" to both fenced code
blocks in README so markdownlint stops flagging them; edit the first fenced
block containing "# Hello **world** :smile: `@john`" and the second block
containing the token sequence (HEADING_MARKER, WS, TEXT, ASTERISK,
EMOJI_SHORTCODE, MENTION_USER, EOF) and prefix each opening ``` with "text"
(i.e., ```text).
In `@packages/message-parser-handwritten/src/index.ts`:
- Around line 1-3: The exported parse function in index.ts currently throws and
echoes raw input; either remove its public export so the package only exposes
the lexer (keep parse as an internal/non-exported function and export the
lexer-only entrypoint instead), or if parse must remain exported temporarily,
replace the thrown Error to a static, non-sensitive placeholder (e.g. throw new
Error("Parsing not implemented")) and remove any interpolation of the input;
update references to the exported symbol accordingly.
In `@packages/message-parser-handwritten/src/lexer/constants/charSets.ts`:
- Around line 51-64: isUnicodeEmojiStart currently treats the broad BMP range
0x2300–0x27BF as emoji which lets non-emoji chars (e.g. U+2500 box-drawing) be
routed to scanUnicodeEmoji and emitted as EMOJI_UNICODE; narrow the check by
either (A) replacing the single range test with explicit emoji BMP ranges (e.g.
0x2600–0x26FF and 0x2700–0x27BF) or (preferred) perform a Unicode property test
inside isUnicodeEmojiStart (e.g. use a /\p{Emoji}/u test on input.charAt(pos))
so only true emoji return true, and ensure scanUnicodeEmoji/EMOJI_UNICODE are
unchanged otherwise.
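A minimal sketch of the narrower check suggested for `isUnicodeEmojiStart`, under some assumptions: the `(input, pos)` signature is inferred from the comment, and the sketch swaps in `\p{Extended_Pictographic}` rather than the suggested `\p{Emoji}`, because `\p{Emoji}` also matches ASCII digits, `#` and `*`. Regional-indicator flags are not Extended_Pictographic, so a real implementation would likely need a separate check for them.

```typescript
// Built with the RegExp constructor so the pattern is only interpreted at
// runtime; Unicode property escapes need an ES2018-capable engine.
const PICTOGRAPHIC_START = new RegExp('^\\p{Extended_Pictographic}', 'u');

function isUnicodeEmojiStart(input: string, pos: number): boolean {
  const c = input.charCodeAt(pos);
  if (c < 0x80) return false; // ASCII never starts a Unicode emoji here
  // Take two code units when pos sits on a high surrogate, otherwise one.
  const width = c >= 0xd800 && c <= 0xdbff ? 2 : 1;
  return PICTOGRAPHIC_START.test(input.slice(pos, pos + width));
}
```

With this check, U+2500 (box drawing) is rejected while U+2600 and U+1F600 are accepted.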
In `@packages/message-parser-handwritten/src/lexer/scanners/code.ts`:
- Around line 36-38: The comment and call for the unclosed fence are
misindented; align the comment "// unclosed fence" and the emitCodeBody(ctx,
bodyStart, pos); statement with the surrounding function block indentation
(match the other statements in the function that use the same indentation level)
so the comment and the emitCodeBody invocation sit at the same indent as nearby
statements handling code bodies (referencing emitCodeBody, ctx, bodyStart, pos).
In `@packages/message-parser-handwritten/src/lexer/scanners/emoji.ts`:
- Around line 10-11: The surrogate-range check around the input advance is too
broad: change the condition that uses c1 to only detect high surrogates
(0xD800–0xDBFF) so you only advance by 2 when the first code unit is a valid
high surrogate; leave advancement as 1 for everything else (including standalone
low surrogates). Update the logic in emoji.ts where c1 is read (variable c1 and
the subsequent advance) to use the precise high-surrogate range check instead of
0xD800–0xDFFF to avoid incorrectly treating lone low surrogates as part of a
surrogate pair.
- Around line 25-37: The ZWJ sequence loop in emoji.ts currently treats any
surrogate (0xd800–0xdfff) as a high surrogate; update the surrogate pair check
inside the while loop (the branch that inspects the variable `next`) to only
treat high surrogates (0xD800–0xDBFF) as the start of a surrogate pair, leaving
low surrogates (0xDC00–0xDFFF) to be handled as malformed input; adjust the
conditional that now reads `if (next >= 0xd800 && next <= 0xdfff)` to use the
correct high-surrogate range and keep the rest of the ZWJ/VS16 handling intact.
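The two surrogate fixes above amount to one rule: advance by two code units only for a well-formed pair. The helper below is a sketch with hypothetical names (`codePointWidth`, `isHighSurrogate`, `isLowSurrogate`). Checking the following low surrogate and the input bound goes slightly beyond what the comment asks for; it is added here as an assumption to cover the out-of-bounds case on malformed input.

```typescript
const isHighSurrogate = (c: number): boolean => c >= 0xd800 && c <= 0xdbff;
const isLowSurrogate = (c: number): boolean => c >= 0xdc00 && c <= 0xdfff;

// Width, in UTF-16 code units, of the code point that starts at pos.
function codePointWidth(input: string, pos: number): 1 | 2 {
  const c1 = input.charCodeAt(pos);
  if (
    isHighSurrogate(c1) &&
    pos + 1 < input.length &&
    isLowSurrogate(input.charCodeAt(pos + 1))
  ) {
    return 2;
  }
  // Lone high or low surrogates and all BMP characters advance by one.
  return 1;
}
```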
In `@packages/message-parser-handwritten/src/lexer/scanners/inline.ts`:
- Around line 96-104: The COLOR token path is missing a call to flushText before
emitting, causing prior accumulated text to be merged incorrectly; update the
inline scanner so that when input.startsWith('color:#', pos) and COLOR_RE
matches, you call flushText(ctx, pos) immediately before emit(ctx,
TokenKind.COLOR, m[0], m[1], pos), then return pos + m[0].length; reference the
functions flushText and emit, the variables ctx, pos, input, and the regex
COLOR_RE/TokenKind.COLOR to locate and modify the code.
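The ordering described above, flushing pending text before emitting `COLOR`, can be sketched in isolation. The context shape, the `COLOR_RE` pattern, and the `scanColor` name are assumptions for illustration; only the `flushText` then `emit` order comes from the review.

```typescript
type Token = { kind: 'TEXT' | 'COLOR'; raw: string; value: string; pos: number };

interface ScanContext {
  input: string;
  tokens: Token[];
  textStart: number; // start of the pending plain-text run, or -1 if none
}

// Hypothetical sticky pattern; the PR's COLOR_RE may accept a different syntax.
const COLOR_RE = /color:#([0-9a-fA-F]{6})/y;

function flushText(ctx: ScanContext, end: number): void {
  if (ctx.textStart !== -1 && end > ctx.textStart) {
    const raw = ctx.input.slice(ctx.textStart, end);
    ctx.tokens.push({ kind: 'TEXT', raw, value: raw, pos: ctx.textStart });
  }
  ctx.textStart = -1;
}

// Returns the position after the COLOR token, or -1 when nothing matched.
function scanColor(ctx: ScanContext, pos: number): number {
  COLOR_RE.lastIndex = pos;
  const m = COLOR_RE.exec(ctx.input);
  if (m === null) return -1;
  flushText(ctx, pos); // emit pending text first so tokens stay in source order
  ctx.tokens.push({ kind: 'COLOR', raw: m[0], value: m[1], pos });
  return pos + m[0].length;
}

// "hi " is pending text when the scanner reaches the 'c' at position 3.
const ctx: ScanContext = { input: 'hi color:#ff0000', tokens: [], textStart: 0 };
const next = scanColor(ctx, 3);
```

Here the `TEXT` token for `hi ` is pushed before the `COLOR` token, so the two never overlap or swap order.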
In `@packages/message-parser-handwritten/src/lexer/scanners/structural.ts`:
- Around line 20-23: The branches in structural.ts that handle escaped KaTeX
delimiters emit KATEX tokens but do not toggle the parser state, so add the same
ctx flag updates as scanDollar(): after emitting KATEX_BLOCK_START set
ctx.katexBlockOpen = true; after emitting KATEX_BLOCK_END set ctx.katexBlockOpen
= false; after emitting KATEX_INLINE_START set ctx.katexInlineOpen = true; after
emitting KATEX_INLINE_END set ctx.katexInlineOpen = false; keep the existing
flushText(ctx, pos) and emit(...) calls (TokenKind.KATEX_BLOCK_START /
KATEX_BLOCK_END / KATEX_INLINE_START / KATEX_INLINE_END) and return the same pos
+ 2 values so escaped delimiters enter/exit KaTeX mode correctly.
In `@packages/message-parser-handwritten/src/lexer/Token.ts`:
- Around line 1-68: Replace the exported const enum TokenKind with a safe public
representation (either a plain exported enum TokenKind or an exported as-const
object + union type) so it no longer emits a const enum in the .d.ts surface;
update any usages to match the chosen representation. Also remove all
implementation/section comments inside this file (the large block comments
around token groups and the single field comment) so only the token declarations
remain. Ensure the exported symbol name TokenKind is preserved and all
references compile after the change.
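For the as-const alternative mentioned above, a sketch could look like the following. The member list and numeric values are placeholders rather than the PR's actual token set, `kindName` is a hypothetical helper, and in the real module both declarations would carry `export`.

```typescript
// Plain object plus a union type, so the name TokenKind works as both a
// value and a type without emitting a const enum in the declaration output.
const TokenKind = {
  TEXT: 0,
  ASTERISK: 1,
  EOF: 2,
} as const;

type TokenKind = (typeof TokenKind)[keyof typeof TokenKind];

// Reverse lookup, which a const enum cannot offer at runtime.
function kindName(kind: TokenKind): string {
  for (const key of Object.keys(TokenKind) as Array<keyof typeof TokenKind>) {
    if (TokenKind[key] === kind) return key;
  }
  return 'UNKNOWN';
}
```

Call sites such as `TokenKind.EOF` keep the same spelling, which is the reason this form is often chosen over a plain enum.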
In `@packages/message-parser-handwritten/src/utils.ts`:
- Around line 237-270: timestampFromHours() and timestampFromIsoTime() must stop
using bitwise truncation and must build the epoch seconds in the correct
timezone; replace the "| 0" truncation with Math.floor(.../1000) in both
functions, and change timestamp construction so you compute the timestamp in UTC
(use Date.UTC with the current date's UTC year/month/day for timestampFromHours
and for timestampFromIsoTime parse
year/month/day/hours/minutes/seconds/milliseconds into numeric components and
call Date.UTC) then apply any explicit timezone offset parsing when a timezone
string is provided (convert "+HH:MM"/"-HH:MM" into minutes and add/subtract that
offset from the UTC ms value) so that both functions produce correct epoch
seconds without 2038 overflow and without local/UTC date mismatches.
- Around line 83-97: autoLink currently treats inputs like "javascript://..." as
links because tldParse and isValidLink allow non-HTTP schemes; update autoLink
to reject any src that has a scheme other than "http:" or "https:" (except allow
protocol-relative "//") before creating href. Concretely, in the autoLink
function add a guard that returns plain(src) unless src startsWith("//") or has
a URL scheme of "http" or "https" (use a simple scheme check on the raw src
string or via URL parsing), then proceed to compute href and call link;
reference autoLink, tldParse and isValidLink when making the change.
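The scheme guard for `autoLink` could be factored into a small predicate like the one below. `hasAllowedScheme` and `SCHEME_RE` are hypothetical names. As an assumption, only the `scheme://` form is treated as a scheme, so that `host:port` inputs are not mistaken for one; whether slash-less forms can reach `autoLink` is not shown in this review.

```typescript
// Matches a leading "scheme://" prefix; scheme characters follow RFC 3986.
const SCHEME_RE = /^([a-zA-Z][a-zA-Z0-9+.-]*):\/\//;

function hasAllowedScheme(src: string): boolean {
  if (src.startsWith('//')) return true; // protocol-relative URL
  const m = SCHEME_RE.exec(src);
  if (m === null) return true; // no explicit scheme, e.g. "rocket.chat/path"
  const scheme = m[1].toLowerCase();
  return scheme === 'http' || scheme === 'https';
}
```

In `autoLink`, the guard would sit before the `href` is computed, returning `plain(src)` whenever `hasAllowedScheme(src)` is false.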
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 516195fa-ba54-4ae8-99d4-fadb55f4f1bd
⛔ Files ignored due to path filters (1)
yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (38)
- packages/message-parser-handwritten/.eslintrc.json
- packages/message-parser-handwritten/.gitignore
- packages/message-parser-handwritten/README.md
- packages/message-parser-handwritten/benchmarks/lexer.bench.ts
- packages/message-parser-handwritten/jest.config.ts
- packages/message-parser-handwritten/package.json
- packages/message-parser-handwritten/src/ast/definitions.ts
- packages/message-parser-handwritten/src/ast/guards.ts
- packages/message-parser-handwritten/src/index.ts
- packages/message-parser-handwritten/src/lexer/ScanContext.ts
- packages/message-parser-handwritten/src/lexer/Token.ts
- packages/message-parser-handwritten/src/lexer/constants/charCodes.ts
- packages/message-parser-handwritten/src/lexer/constants/charSets.ts
- packages/message-parser-handwritten/src/lexer/constants/emoticons.ts
- packages/message-parser-handwritten/src/lexer/constants/regexes.ts
- packages/message-parser-handwritten/src/lexer/helpers.ts
- packages/message-parser-handwritten/src/lexer/lexer.ts
- packages/message-parser-handwritten/src/lexer/scanners/blocks.ts
- packages/message-parser-handwritten/src/lexer/scanners/code.ts
- packages/message-parser-handwritten/src/lexer/scanners/emoji.ts
- packages/message-parser-handwritten/src/lexer/scanners/formatting.ts
- packages/message-parser-handwritten/src/lexer/scanners/index.ts
- packages/message-parser-handwritten/src/lexer/scanners/inline.ts
- packages/message-parser-handwritten/src/lexer/scanners/links.ts
- packages/message-parser-handwritten/src/lexer/scanners/math.ts
- packages/message-parser-handwritten/src/lexer/scanners/structural.ts
- packages/message-parser-handwritten/src/utils.ts
- packages/message-parser-handwritten/tests/blocks.test.ts
- packages/message-parser-handwritten/tests/code.test.ts
- packages/message-parser-handwritten/tests/formatting.test.ts
- packages/message-parser-handwritten/tests/general.test.ts
- packages/message-parser-handwritten/tests/inline.test.ts
- packages/message-parser-handwritten/tests/links.test.ts
- packages/message-parser-handwritten/tests/math.test.ts
- packages/message-parser-handwritten/tests/structural.test.ts
- packages/message-parser-handwritten/tsconfig.build.json
- packages/message-parser-handwritten/tsconfig.json
- packages/message-parser-handwritten/webpack.config.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: cubic · AI code reviewer
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (.cursor/rules/playwright.mdc)
**/*.{ts,tsx,js}: Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests
Avoid code comments in the implementation
Files:
- packages/message-parser-handwritten/src/lexer/scanners/math.ts
- packages/message-parser-handwritten/src/index.ts
- packages/message-parser-handwritten/tests/links.test.ts
- packages/message-parser-handwritten/tests/formatting.test.ts
- packages/message-parser-handwritten/webpack.config.ts
- packages/message-parser-handwritten/src/ast/guards.ts
- packages/message-parser-handwritten/tests/code.test.ts
- packages/message-parser-handwritten/src/lexer/scanners/emoji.ts
- packages/message-parser-handwritten/src/lexer/scanners/formatting.ts
- packages/message-parser-handwritten/src/lexer/constants/regexes.ts
- packages/message-parser-handwritten/src/lexer/scanners/structural.ts
- packages/message-parser-handwritten/src/lexer/scanners/index.ts
- packages/message-parser-handwritten/src/lexer/scanners/blocks.ts
- packages/message-parser-handwritten/tests/blocks.test.ts
- packages/message-parser-handwritten/src/lexer/scanners/links.ts
- packages/message-parser-handwritten/src/lexer/helpers.ts
- packages/message-parser-handwritten/tests/math.test.ts
- packages/message-parser-handwritten/src/lexer/scanners/inline.ts
- packages/message-parser-handwritten/src/lexer/lexer.ts
- packages/message-parser-handwritten/src/lexer/scanners/code.ts
- packages/message-parser-handwritten/src/lexer/Token.ts
- packages/message-parser-handwritten/src/lexer/constants/charSets.ts
- packages/message-parser-handwritten/tests/general.test.ts
- packages/message-parser-handwritten/src/lexer/constants/emoticons.ts
- packages/message-parser-handwritten/tests/structural.test.ts
- packages/message-parser-handwritten/tests/inline.test.ts
- packages/message-parser-handwritten/jest.config.ts
- packages/message-parser-handwritten/benchmarks/lexer.bench.ts
- packages/message-parser-handwritten/src/lexer/ScanContext.ts
- packages/message-parser-handwritten/src/lexer/constants/charCodes.ts
- packages/message-parser-handwritten/src/ast/definitions.ts
- packages/message-parser-handwritten/src/utils.ts
🧠 Learnings (21)
📚 Learning: 2026-02-26T19:25:44.063Z
Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 38778
File: packages/ui-voip/src/providers/useMediaSession.ts:192-192
Timestamp: 2026-02-26T19:25:44.063Z
Learning: In the Rocket.Chat repository, do not reference Biome lint rules in code review feedback. Biome is not used even if biome.json exists; only reference Biome rules if there is explicit, project-wide usage documented. For TypeScript files, review lint implications without Biome guidance unless the project enables Biome rules.
Applied to files:
packages/message-parser-handwritten/src/lexer/scanners/math.ts
packages/message-parser-handwritten/src/index.ts
packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/webpack.config.ts
packages/message-parser-handwritten/src/ast/guards.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/src/lexer/scanners/emoji.ts
packages/message-parser-handwritten/src/lexer/scanners/formatting.ts
packages/message-parser-handwritten/src/lexer/constants/regexes.ts
packages/message-parser-handwritten/src/lexer/scanners/structural.ts
packages/message-parser-handwritten/src/lexer/scanners/index.ts
packages/message-parser-handwritten/src/lexer/scanners/blocks.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/src/lexer/scanners/links.ts
packages/message-parser-handwritten/src/lexer/helpers.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/src/lexer/scanners/inline.ts
packages/message-parser-handwritten/src/lexer/lexer.ts
packages/message-parser-handwritten/src/lexer/scanners/code.ts
packages/message-parser-handwritten/src/lexer/Token.ts
packages/message-parser-handwritten/src/lexer/constants/charSets.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/src/lexer/constants/emoticons.ts
packages/message-parser-handwritten/tests/structural.test.ts
packages/message-parser-handwritten/tests/inline.test.ts
packages/message-parser-handwritten/jest.config.ts
packages/message-parser-handwritten/benchmarks/lexer.bench.ts
packages/message-parser-handwritten/src/lexer/ScanContext.ts
packages/message-parser-handwritten/src/lexer/constants/charCodes.ts
packages/message-parser-handwritten/src/ast/definitions.ts
packages/message-parser-handwritten/src/utils.ts
📚 Learning: 2026-02-26T19:25:44.063Z
Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 38778
File: packages/ui-voip/src/providers/useMediaSession.ts:192-192
Timestamp: 2026-02-26T19:25:44.063Z
Learning: In this repository (RocketChat/Rocket.Chat), Biome lint rules are not used even if a biome.json exists. When reviewing TypeScript files (e.g., packages/ui-voip/src/providers/useMediaSession.ts), ensure lint suggestions do not reference Biome-specific rules. Rely on general ESLint/TypeScript lint rules and project conventions instead.
Applied to files:
packages/message-parser-handwritten/src/lexer/scanners/math.ts
packages/message-parser-handwritten/src/index.ts
packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/webpack.config.ts
packages/message-parser-handwritten/src/ast/guards.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/src/lexer/scanners/emoji.ts
packages/message-parser-handwritten/src/lexer/scanners/formatting.ts
packages/message-parser-handwritten/src/lexer/constants/regexes.ts
packages/message-parser-handwritten/src/lexer/scanners/structural.ts
packages/message-parser-handwritten/src/lexer/scanners/index.ts
packages/message-parser-handwritten/src/lexer/scanners/blocks.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/src/lexer/scanners/links.ts
packages/message-parser-handwritten/src/lexer/helpers.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/src/lexer/scanners/inline.ts
packages/message-parser-handwritten/src/lexer/lexer.ts
packages/message-parser-handwritten/src/lexer/scanners/code.ts
packages/message-parser-handwritten/src/lexer/Token.ts
packages/message-parser-handwritten/src/lexer/constants/charSets.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/src/lexer/constants/emoticons.ts
packages/message-parser-handwritten/tests/structural.test.ts
packages/message-parser-handwritten/tests/inline.test.ts
packages/message-parser-handwritten/jest.config.ts
packages/message-parser-handwritten/benchmarks/lexer.bench.ts
packages/message-parser-handwritten/src/lexer/ScanContext.ts
packages/message-parser-handwritten/src/lexer/constants/charCodes.ts
packages/message-parser-handwritten/src/ast/definitions.ts
packages/message-parser-handwritten/src/utils.ts
📚 Learning: 2025-12-10T21:00:54.909Z
Learnt from: KevLehman
Repo: RocketChat/Rocket.Chat PR: 37091
File: ee/packages/abac/jest.config.ts:4-7
Timestamp: 2025-12-10T21:00:54.909Z
Learning: Rocket.Chat monorepo: Jest testMatch pattern '<rootDir>/src/**/*.spec.(ts|js|mjs)' is valid in this repo and used across multiple packages (e.g., packages/tools, ee/packages/omnichannel-services). Do not flag it as invalid in future reviews.
Applied to files:
packages/message-parser-handwritten/.eslintrc.json
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tsconfig.json
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/tests/inline.test.ts
packages/message-parser-handwritten/jest.config.ts
📚 Learning: 2026-02-26T19:26:01.702Z
Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 38778
File: packages/ui-voip/src/providers/useMediaSession.ts:192-192
Timestamp: 2026-02-26T19:26:01.702Z
Learning: The RocketChat/Rocket.Chat project does not use Biome for linting, despite the presence of a biome.json file in the repository. Lint-related suggestions should not reference Biome rules.
Applied to files:
packages/message-parser-handwritten/.eslintrc.json
📚 Learning: 2026-01-08T15:03:59.621Z
Learnt from: d-gubert
Repo: RocketChat/Rocket.Chat PR: 38068
File: apps/meteor/tests/data/apps/app-packages/README.md:14-16
Timestamp: 2026-01-08T15:03:59.621Z
Learning: For the RocketChat/Rocket.Chat repository, do not analyze or report formatting issues (such as hard tabs vs spaces, line breaks, etc.). The project relies on automated linting tools to enforce formatting standards.
Applied to files:
packages/message-parser-handwritten/.eslintrc.json
📚 Learning: 2025-11-19T12:32:29.696Z
Learnt from: d-gubert
Repo: RocketChat/Rocket.Chat PR: 37547
File: packages/i18n/src/locales/en.i18n.json:634-634
Timestamp: 2025-11-19T12:32:29.696Z
Learning: Repo: RocketChat/Rocket.Chat
Context: i18n workflow
Learning: In this repository, new translation keys should be added to packages/i18n/src/locales/en.i18n.json only; other locale files are populated via the external translation pipeline and/or fall back to English. Do not request adding the same key to all locale files in future reviews.
Applied to files:
packages/message-parser-handwritten/.eslintrc.json
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Utilize Playwright fixtures (`test`, `page`, `expect`) for consistency in test files
Applied to files:
packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/tests/structural.test.ts
packages/message-parser-handwritten/jest.config.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to **/*.spec.ts : Use descriptive test names that clearly communicate expected behavior in Playwright tests
Applied to files:
packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/tests/inline.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use `expect` matchers for assertions (`toEqual`, `toContain`, `toBeTruthy`, `toHaveLength`, etc.) instead of `assert` statements in Playwright tests
Applied to files:
packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts
📚 Learning: 2025-11-24T17:08:26.531Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/test-cases.mdc:0-0
Timestamp: 2025-11-24T17:08:26.531Z
Learning: Applies to **/*test-case*.md : Write test cases in Markdown format following the standardized template with ALL necessary components: Title, Description, Preconditions, Type, Steps, and Expected Result
Applied to files:
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
📚 Learning: 2025-11-24T17:08:26.531Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/test-cases.mdc:0-0
Timestamp: 2025-11-24T17:08:26.531Z
Learning: Applies to **/*test-case*.md : Define clear, measurable expected results in test cases that can be validated and later converted into automated tests
Applied to files:
packages/message-parser-handwritten/tests/formatting.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to **/*.{ts,tsx,js} : Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests
Applied to files:
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tsconfig.json
packages/message-parser-handwritten/webpack.config.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/benchmarks/lexer.bench.ts
packages/message-parser-handwritten/src/utils.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Group related tests in the same file
Applied to files:
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Ensure tests run reliably in parallel without shared state conflicts
Applied to files:
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/tests/inline.test.ts
📚 Learning: 2026-02-25T20:10:16.987Z
Learnt from: ahmed-n-abdeltwab
Repo: RocketChat/Rocket.Chat PR: 38913
File: packages/ddp-client/src/legacy/types/SDKLegacy.ts:34-34
Timestamp: 2026-02-25T20:10:16.987Z
Learning: In the RocketChat/Rocket.Chat monorepo, packages/ddp-client and apps/meteor do not use TypeScript project references. Module augmentations in apps/meteor (e.g., declare module 'rocket.chat/rest-typings') are not visible when compiling packages/ddp-client in isolation, which is why legacy SDK methods that depend on OperationResult types for OpenAPI-migrated endpoints must remain commented out.
Applied to files:
packages/message-parser-handwritten/tsconfig.json
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Maintain test isolation between test cases in Playwright tests
Applied to files:
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use `test.step()` for complex test scenarios to improve organization in Playwright tests
Applied to files:
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use `test.beforeAll()` and `test.afterAll()` for setup/teardown in Playwright tests
Applied to files:
packages/message-parser-handwritten/tests/code.test.ts
📚 Learning: 2025-11-19T18:20:07.720Z
Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 37419
File: packages/i18n/src/locales/en.i18n.json:918-921
Timestamp: 2025-11-19T18:20:07.720Z
Learning: Repo: RocketChat/Rocket.Chat — i18n/formatting
Learning: This repository uses a custom message formatting parser in UI blocks/messages; do not assume standard Markdown rules. For keys like Call_ended_bold, Call_not_answered_bold, Call_failed_bold, and Call_transferred_bold in packages/i18n/src/locales/en.i18n.json, retain the existing single-asterisk emphasis unless maintainers request otherwise.
Applied to files:
packages/message-parser-handwritten/src/lexer/scanners/formatting.ts
packages/message-parser-handwritten/package.json
📚 Learning: 2026-02-24T19:05:56.710Z
Learnt from: ahmed-n-abdeltwab
Repo: RocketChat/Rocket.Chat PR: 0
File: :0-0
Timestamp: 2026-02-24T19:05:56.710Z
Learning: Rocket.Chat repo context: When a workspace manifest on develop already pins a dependency version (e.g., packages/web-ui-registration → "rocket.chat/ui-contexts": "27.0.1"), a lockfile change in a feature PR that upgrades only that dependency’s resolution is considered a manifest-driven sync and can be kept, preferably as a small "chore: sync yarn.lock with manifests" commit.
Applied to files:
packages/message-parser-handwritten/package.json
📚 Learning: 2026-03-06T18:09:14.766Z
Learnt from: tassoevan
Repo: RocketChat/Rocket.Chat PR: 39397
File: packages/gazzodown/src/elements/Timestamp/DateTimeFormats.spec.tsx:20-23
Timestamp: 2026-03-06T18:09:14.766Z
Learning: In the RocketChat/Rocket.Chat gazzodown package (`packages/gazzodown`), tests are intended to run under the UTC timezone, but as of PR `#39397` this is NOT yet explicitly enforced in `jest.config.ts` or the `package.json` test scripts (which just run `jest` without `TZ=UTC`). To make timezone-sensitive snapshot tests reliable across all environments, `TZ=UTC` should be added to the test scripts in `package.json` or to `jest.config.ts` via `testEnvironmentOptions.timezone`. Without explicit UTC enforcement, snapshot tests involving date-fns formatted output or `toLocaleString()` will fail for contributors in non-UTC timezones.
Applied to files:
packages/message-parser-handwritten/jest.config.ts
🪛 markdownlint-cli2 (0.21.0)
packages/message-parser-handwritten/README.md
[warning] 45-45: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 51-51: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (31)
packages/message-parser-handwritten/webpack.config.ts (1)
5-12: No changes needed. The webpack configuration paths resolve correctly in this monorepo setup. Yarn and turbo automatically execute workspace scripts with the working directory set to the package directory (`packages/message-parser-handwritten/`), so relative paths like `./src`, `./dist`, and `./tsconfig.build.json` resolve as intended. The build configuration works without modification.
> Likely an incorrect or invalid review comment.
packages/message-parser-handwritten/src/lexer/constants/regexes.ts (1)
1-13: No issues found. Each use of these sticky regexes is preceded by an explicit `lastIndex` assignment before calling `.exec()`. The shared state is properly managed throughout the codebase across all scanner files (links.ts, inline.ts, blocks.ts).
> Likely an incorrect or invalid review comment.
packages/message-parser-handwritten/src/lexer/scanners/math.ts (1)
7-27: LGTM! The `scanDollar` implementation is correct. The lookahead for `$$` is properly handled before the single `$` case, state toggling is symmetric, and position advancement is accurate.
packages/message-parser-handwritten/tests/blocks.test.ts (1)
1-167: LGTM! Comprehensive test suite covering headings, unordered/ordered list bullets, task bullets, pipes, and spoiler fences. Tests properly verify line-start context behavior, newline handling, emoticon fallbacks, and edge cases for invalid patterns.
packages/message-parser-handwritten/tests/structural.test.ts (1)
1-118: LGTM! Well-structured test suite covering newlines (LF, CRLF, CR), whitespace collapsing, escape sequences, and round-trip integrity. The round-trip tests using `test.each` are a good practice for verifying token reconstruction.
packages/message-parser-handwritten/tests/formatting.test.ts (1)
1-110: LGTM! Comprehensive formatting test suite covering asterisk, underscore, and tilde markers with proper edge case coverage for line-start bullet behavior and emoticon fallbacks.
packages/message-parser-handwritten/src/lexer/scanners/index.ts (1)
28-84: LGTM! Clean dispatch table design with explicit character code registrations. The loop for digit registration and the categorized grouping of scanner registrations make the code maintainable.
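For readers unfamiliar with the pattern, here is a minimal sketch of a char-code dispatch table, assuming a 128-slot array indexed by ASCII code. `Scanner`, `SCANNERS`, `register`, `tokenize`, and both scanner bodies are hypothetical names made up for illustration; the PR's actual `scanners/index.ts` may be organized differently.

```typescript
// Hypothetical token and scanner shapes; the PR's real types may differ.
type Token = { kind: string; value: string; pos: number };
type Scanner = (input: string, pos: number, out: Token[]) => number;

// One slot per ASCII code; undefined means "no scanner, treat as plain text".
const SCANNERS: Array<Scanner | undefined> = new Array(128).fill(undefined);

const register = (ch: string, scanner: Scanner): void => {
  SCANNERS[ch.charCodeAt(0)] = scanner;
};

// Emits a single-character marker token and advances by one.
const scanAsterisk: Scanner = (_input, pos, out) => {
  out.push({ kind: 'ASTERISK', value: '*', pos });
  return pos + 1;
};

// Consumes a run of ASCII digits (codes 48..57) as one token.
const scanDigit: Scanner = (input, pos, out) => {
  let end = pos;
  while (end < input.length && input.charCodeAt(end) >= 48 && input.charCodeAt(end) <= 57) end++;
  out.push({ kind: 'NUMBER', value: input.slice(pos, end), pos });
  return end;
};

register('*', scanAsterisk);
// Digits are registered in a loop, as the review comment describes.
for (let d = 0; d <= 9; d++) register(String(d), scanDigit);

function tokenize(input: string): Token[] {
  const out: Token[] = [];
  let textStart = -1;
  let pos = 0;
  // Emits any accumulated plain text before a special token or at end of input.
  const flush = (end: number): void => {
    if (textStart === -1) return;
    out.push({ kind: 'TEXT', value: input.slice(textStart, end), pos: textStart });
    textStart = -1;
  };
  while (pos < input.length) {
    const code = input.charCodeAt(pos);
    const scanner = code < 128 ? SCANNERS[code] : undefined;
    if (scanner) {
      flush(pos);
      pos = scanner(input, pos, out);
    } else {
      if (textStart === -1) textStart = pos;
      pos++;
    }
  }
  flush(pos);
  return out;
}
```

The lookup is a single array index per character, which is presumably why the review singles out the design; unregistered characters simply accumulate as text.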
packages/message-parser-handwritten/src/lexer/scanners/emoji.ts (1)
1-42: Overall implementation is solid. The emoji scanner correctly handles variation selectors (VS-15/VS-16), skin tone modifiers, and ZWJ sequences. The structure follows the Unicode emoji specification pattern.
packages/message-parser-handwritten/tests/links.test.ts (1)
1-133: LGTM! Thorough test suite covering markdown links, images, angle brackets, blockquote markers, emoticons, and timestamps with comprehensive edge case handling for empty parts, nested brackets, and fallback behaviors.
packages/message-parser-handwritten/src/lexer/lexer.ts (1)
10-68: LGTM! The Lexer implementation is well-structured with clear separation of concerns:
- Plain ASCII fast path for common text
- Non-ASCII path with emoji detection
- Dispatch table for special characters with graceful fallback
The `MAX_TOKENS` safety cap prevents runaway tokenization, and `flushText` before EOF ensures no accumulated text is lost.
packages/message-parser-handwritten/src/lexer/scanners/links.ts (3)
1-32: LGTM - scanAngleOpen correctly handles timestamp, emoticon, and angle bracket cases. The timestamp detection logic properly validates the `<t:` prefix and uses regex for inner content before emitting. The emoticon fallback and plain angle bracket emission are correctly ordered.
34-47: LGTM - scanAngleClose correctly distinguishes blockquote markers from angle brackets. The `isLineStart` check correctly identifies blockquote markers at line start (pos === 0 or after newline), with emoticon fallback properly prioritized.
49-79: LGTM - Link-related scanners are straightforward and correct. `scanExclamation`, `scanCloseBracket`, `scanBracketOpen`, and `scanParenClose` correctly handle their respective markdown link constructs with proper text flushing and token emission.
packages/message-parser-handwritten/src/lexer/scanners/blocks.ts (3)
12-45: LGTM - scanHash correctly handles headings, emoticons, and channel mentions. The heading detection logic properly limits to 4 levels and requires trailing whitespace. Channel mention scanning via `scanMentionBody` is correctly sequenced after emoticon fallback.
90-128: LGTM - scanDigit handles ordered lists, emoticons, and URLs correctly. The ordered list detection at line start with digit-dot-space pattern is correct. URL matching with trailing punctuation trimming follows the established pattern.
130-150: LGTM - scanPipe correctly distinguishes block vs inline spoiler fences. The block spoiler detection requiring line-start position and line-end (or EOF) following `||` is correctly implemented. The token kind distinction aligns with Token.ts definitions.
packages/message-parser-handwritten/src/lexer/scanners/code.ts (4)
5-14: LGTM - scanBacktick correctly dispatches between fenced and inline code. The triple-backtick detection and delegation to specialized scanners is clean and correct.
17-39: LGTM - scanFencedBlock correctly handles fenced code blocks. The scanner properly emits opening/closing triple backticks with CODE_CONTENT body between them. Unclosed fences emit accumulated content as CODE_CONTENT, which is appropriate for partial code blocks.
41-66: LGTM - scanInlineCode correctly handles single backtick code spans. The newline-termination behavior (emitting unclosed inline code as TEXT) matches typical markdown behavior where inline code cannot span lines. EOF handling is also correct.
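The newline-termination rule can be sketched roughly as follows. This is an illustration under assumptions, not the PR's `scanInlineCode`: the `BACKTICK` token kind, the function signature, and the choice to fold the unmatched opening backtick into the TEXT token are all hypothetical.

```typescript
// Hypothetical token shape; only CODE_CONTENT and TEXT are named in the review.
type Token = { kind: 'BACKTICK' | 'CODE_CONTENT' | 'TEXT'; value: string };

// Scans from an opening backtick at `pos` and returns the position to resume at.
function scanInlineCode(input: string, pos: number, out: Token[]): number {
  let end = pos + 1;
  // Stop at the closing backtick, a line break, or end of input.
  while (end < input.length) {
    const ch = input.charAt(end);
    if (ch === '`' || ch === '\n' || ch === '\r') break;
    end++;
  }
  if (end < input.length && input.charAt(end) === '`') {
    out.push({ kind: 'BACKTICK', value: '`' });
    // Skip the body token for an empty span such as "``".
    if (end > pos + 1) out.push({ kind: 'CODE_CONTENT', value: input.slice(pos + 1, end) });
    out.push({ kind: 'BACKTICK', value: '`' });
    return end + 1;
  }
  // Unclosed span: everything scanned so far is emitted as plain text (assumption:
  // the opening backtick is included in that text).
  out.push({ kind: 'TEXT', value: input.slice(pos, end) });
  return end;
}
```

Returning `end` rather than `end + 1` in the unclosed case leaves the newline for whichever scanner handles line breaks.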
68-80: LGTM - Helper functions are clean and correct. `emitCodeBody` and `emitTextBody` correctly handle the empty-range case and emit appropriate token kinds.
packages/message-parser-handwritten/src/lexer/constants/emoticons.ts (3)
95-95: Trailing space in emoticon may be unintentional. The emoticon `'>.< '` (line 95) contains a trailing space. Verify if this is intentional, as it would require users to type `>.< ` (with space) to trigger the `persevere` emoji, which differs from other emoticons that don't require trailing whitespace.
1-9: LGTM - TrieNode interface and factory are well-designed. Using `Object.create(null)` for the character map avoids prototype pollution and is a good practice for dictionary-like structures.
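A small sketch of why the prototype-free map matters, together with a longest-match lookup built on it. The names `TrieNode`, `createNode`, `buildTrie`, and `longestMatch` and the sample emoticons are assumptions for illustration; they are not copied from `emoticons.ts`.

```typescript
// Hypothetical node shape; `raw` marks a terminal node and stores the full emoticon.
interface TrieNode {
  children: Record<string, TrieNode | undefined>;
  raw: string | undefined;
}

// Object.create(null) yields a map with no prototype, so keys such as
// "constructor" or "toString" are absent unless explicitly inserted.
const createNode = (): TrieNode => ({
  children: Object.create(null) as Record<string, TrieNode | undefined>,
  raw: undefined,
});

// Inserts every word character by character, marking the last node as terminal.
function buildTrie(words: readonly string[]): TrieNode {
  const root = createNode();
  for (const word of words) {
    let node = root;
    for (let i = 0; i < word.length; i++) {
      const ch = word.charAt(i);
      let next = node.children[ch];
      if (next === undefined) {
        next = createNode();
        node.children[ch] = next;
      }
      node = next;
    }
    node.raw = word;
  }
  return root;
}

// Walks as far as the trie allows from `pos`, remembering the last terminal seen.
function longestMatch(root: TrieNode, input: string, pos: number): string | undefined {
  let node = root;
  let last: string | undefined;
  for (let i = pos; i < input.length; i++) {
    const next = node.children[input.charAt(i)];
    if (next === undefined) break;
    node = next;
    if (node.raw !== undefined) last = node.raw;
  }
  return last;
}
```

With a plain `{}` literal, a lookup for the key `constructor` would return the inherited `Object` function instead of `undefined`, which is the kind of accidental hit the prototype-free map rules out.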
128-149: LGTM - Trie construction and lookup are correct. `buildEmoticonTrie` correctly builds a prefix trie with longest-match semantics (storing `raw` at terminal nodes). The dual lookup structure (trie for matching, object for shortcode resolution) is efficient.
packages/message-parser-handwritten/src/lexer/ScanContext.ts (3)
5-12: LGTM - ScanContext interface is well-structured. The interface cleanly separates readonly input data from mutable state (`textStart`, `katexBlockOpen`, `katexInlineOpen`). The token array is mutable but the reference is readonly, which is appropriate.
14-42: LGTM - Utility functions are correct and efficient. `isLineStart`, `flushText`, `emit`, and `consumeRun` are all pure/predictable functions with correct implementations. The `flushText` early return on `textStart === -1` prevents unnecessary work.
44-69: LGTM - tryEmoticon implements greedy longest-match correctly. The trie traversal records the longest valid match (`lastMatch`/`lastMatchEnd`) as it walks, ensuring patterns like `:-)` are preferred over `:)` when both exist in the trie. The fallback to `raw` when `getEmoticonShortCode` returns undefined handles edge cases gracefully.
packages/message-parser-handwritten/src/lexer/scanners/inline.ts (5)
8-33: LGTM - scanColon correctly handles URL schemes, emoji shortcodes, and emoticons. The priority order (URL scheme → emoji shortcode → emoticon → plain text) is correct for typical markdown-like syntax.
35-56: LGTM - tryUrlScheme cleverly rescans from previous TEXT token. The approach of popping the previous TEXT token to capture the URL scheme prefix (e.g., `http` before `://`) is efficient and correct. The fallback to plain text on invalid URLs handles edge cases properly.
58-67: LGTM - scanAt correctly handles user mentions. The mention body scanning and token emission are straightforward and correct.
131-136: LGTM - scanEmoticonStarter is a clean fallback scanner. Simple and correct delegation to `tryEmoticon` with text accumulation fallback.
69-90: No ReDoS vulnerability in PHONE_RE regex. Empirical testing of the regex `/\+(?:\(?\d+\)?[\s\-]?){1,}\d/y` with pathological inputs (100+ parentheses, repeated non-digits) completes in microseconds with no exponential backtracking. The pattern requires at least one digit in the repeating group and a final digit as a boundary condition, which prevents catastrophic backtracking. Malformed inputs fail efficiently.
> Likely an incorrect or invalid review comment.
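As an illustration of how a sticky regex like this is typically driven from a scanner, the sketch below resets `lastIndex` before every `exec()` call. The regex literal is copied from the comment above; the `matchPhoneAt` helper is hypothetical and is not claimed to exist in `inline.ts`.

```typescript
// Regex copied from the review comment; the sticky flag (y) anchors the match at lastIndex.
const PHONE_RE = /\+(?:\(?\d+\)?[\s\-]?){1,}\d/y;

// Hypothetical helper: returns the phone-like match starting exactly at `pos`, if any.
function matchPhoneAt(input: string, pos: number): string | undefined {
  // The regex object is shared, so its position must be set before every exec.
  PHONE_RE.lastIndex = pos;
  const m = PHONE_RE.exec(input);
  return m ? m[0] : undefined;
}
```

Because the match is anchored at `pos`, a failed attempt does not scan forward through the rest of the input, which keeps each call cheap even on malformed text such as a `+` followed by a long run of parentheses.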
…code emoji detection in lexer
…emoticon key '>.<'
…_TOKENS is reached
…terpreter in shebang
…with skin-tone modifiers
… a valid emoji component
Actionable comments posted: 3
♻️ Duplicate comments (2)
packages/message-parser/LEXER.md (1)
45-55: ⚠️ Potential issue | 🟡 Minor
Add language identifiers to fenced code blocks.
`markdownlint` MD040 is triggered at Line 45 and Line 51. Please add `text` to both opening fences.
Suggested fix
-```
+```text
 # Hello **world** :smile: `@john`
@@
-```
+```text
 HEADING_MARKER("1") → WS(" ") → TEXT("Hello") → WS(" ") → ASTERISK("") →
 TEXT("world") → ASTERISK("") → WS(" ") → EMOJI_SHORTCODE("smile") →
 WS(" ") → MENTION_USER("john") → EOF
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/LEXER.md` around lines 45 - 55, The two fenced code blocks that start with "```" containing the sample Markdown ("# Hello **world** :smile: `@john`") and the token sequence (HEADING_MARKER...EOF) need explicit language identifiers to satisfy markdownlint MD040; update both opening fences from ``` to ```text so they read "```text" to mark them as plain text. Ensure you only modify the two opening fence lines (for the human-readable example and the flat array of tokens) and leave the block contents and closing fences unchanged.
packages/message-parser/src/lexer/constants/charSets.ts (1)
51-65: ⚠️ Potential issue | 🟠 Major
Narrow BMP emoji range or add validation; remove inline comments.
The range `0x2300–0x27BF` still includes non-emoji characters (box-drawing U+2500–U+257F, block elements U+2580–U+259F). Characters like `─` (U+2500) are incorrectly tokenized as `EMOJI_UNICODE`. The `scanUnicodeEmoji` function lacks validation to filter these out. Either narrow the BMP range to actual emoji blocks (e.g., U+2600–U+27BF for Miscellaneous Symbols and Dingbats) or add precise character validation before tokenizing.
Also, per coding guidelines, remove inline comments (lines 50, 53).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/src/lexer/constants/charSets.ts` around lines 51 - 65, The BMP range in isUnicodeEmojiStart is too broad and includes non-emoji chars; tighten the BMP check to the proper emoji block(s) (e.g., use 0x2600–0x27BF) or replace the single-range test with a precise whitelist/lookup of allowed BMP codepoints, and add a corresponding validation step in scanUnicodeEmoji to verify the actual character(s) against that whitelist before emitting EMOJI_UNICODE (use the same validation helper to avoid duplication). Also remove the inline end-of-line comments inside isUnicodeEmojiStart and ensure any explanatory notes live in surrounding doc/comments instead.
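A rough sketch of the narrowed check, assuming the predicate takes a code point and returns a boolean. The function body and both ranges are illustrative guesses, not the contents of `charSets.ts`; in particular the supplementary-plane range is a coarse assumption.

```typescript
// Hypothetical predicate; the ranges below are illustrative, not the PR's actual table.
function isUnicodeEmojiStart(cp: number): boolean {
  // Miscellaneous Symbols (U+2600–U+26FF) and Dingbats (U+2700–U+27BF) only,
  // which excludes box-drawing (U+2500–U+257F) and block elements (U+2580–U+259F).
  if (cp >= 0x2600 && cp <= 0x27bf) return true;
  // Coarse supplementary-plane range for the main pictograph blocks (an assumption).
  return cp >= 0x1f300 && cp <= 0x1faff;
}
```

One trade-off to keep in mind: a plain range cut at U+2600 also drops genuine emoji that sit below it, such as U+231A (watch), so a whitelist may be the safer of the two options the comment proposes.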
🧹 Nitpick comments (5)
packages/message-parser/src/index.ts (1)
27-28: Keep the handwritten lexer off the root barrel until the API settles.
The PR still describes this as a Stage 1 prototype. Re-exporting it here makes `Lexer`, `Token`, `TokenKind`, and `makeToken` part of the stable package contract immediately; a dedicated subpath export would leave room to evolve the token model before Stage 2 lands. The inline banner on Line 27 can go away with that move.
As per coding guidelines: "Avoid code comments in the implementation".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/src/index.ts` around lines 27 - 28, The root barrel currently re-exports the handwritten lexer (Lexer, Token, TokenKind, makeToken) and includes an inline banner comment; remove that re-export and delete the inline comment from the root index so these symbols are not part of the stable package contract, then create a dedicated subpath module that re-exports Lexer/Token/TokenKind/makeToken and add a corresponding package subpath export (so consumers can import the lexer from the dedicated subpath while the root API remains stable).
packages/message-parser/jest.config.ts (1)
10-23: Narrow the transform regex to exclude JSX extensions since this package has no JSX files.
The pattern `^.+\.m?(t|j)sx?$` at line 10 advertises support for `.tsx`/`.jsx` files, but there are no JSX/TSX sources in `packages/message-parser`. The SWC parser also omits JSX syntax configuration. Narrowing the regex to `^.+\.m?[tj]s$` clarifies the actual scope.
Suggested patch
- '^.+\\.m?(t|j)sx?$': [
+ '^.+\\.m?[tj]s$': [
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/jest.config.ts` around lines 10 - 23, The transform regex in the Jest config currently allows JSX/TSX (the pattern '^.+\\.m?(t|j)sx?$') while the SWC jsc.parser settings disable JSX and this package has no JSX/TSX files; update the transform key to a narrower regex '^.+\\.m?[tj]s$' so it only matches .js/.mjs/.ts/.mts files and remove JSX/TSX support, keeping the existing '@swc/jest' transformer and jsc config (refer to the transform entry and the jsc.parser block in the snippet to locate the change).
packages/message-parser/src/lexer/ScanContext.ts (1)
46-50: Potential infinite loop if `pos` is negative.
`consumeRun` doesn't validate that `pos` is non-negative. If called with a negative `pos`, `input.charCodeAt(pos + count)` would access negative indices, returning `NaN` on each iteration. Since `NaN !== charCode`, the loop would exit immediately with `count = 0`, so this is functionally safe but unexpected input could cause subtle issues.
The current implementation is safe due to JavaScript's `charCodeAt` behavior, but callers must ensure valid positions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/src/lexer/ScanContext.ts` around lines 46 - 50, The function consumeRun should guard against invalid starting positions: detect if pos is negative or beyond input.length (or not an integer) and return 0 early rather than calling input.charCodeAt with an out-of-range index; update the consumeRun implementation to validate pos (e.g., ensure Number.isInteger(pos) && pos >= 0 && pos < input.length) before entering the while loop that compares input.charCodeAt(pos + count) to charCode so callers with bad positions won't produce unexpected behavior.
packages/message-parser/src/lexer/constants/charSets.ts (1)
10-16: Consider removing inline comments per coding guidelines.The coding guidelines specify to "avoid code comments in the implementation." While these comments serve as documentation, they could be moved to JSDoc or external documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/src/lexer/constants/charSets.ts` around lines 10 - 16, Remove the inline implementation comment above the CHAR_CLASS declaration and any other inline comments in this file; instead add a brief JSDoc block above the export of CHAR_CLASS describing its purpose and the kinds of characters it marks (used for tokenization), leaving the code that initializes CHAR_CLASS and the uses of symbols like CHAR_CLASS, CH_0, CH_9, and CH_C_LO unchanged so behavior is preserved.
packages/message-parser/src/lexer/scanners/links.ts (1)
14-25: Potential out-of-bounds access when checking timestamp pattern.
Lines 15-16 access `pos + 1` and `pos + 2` without bounds checking. If `pos` is near the end of input, `charCodeAt` returns `NaN`, which won't match the expected codes, so the condition fails safely. However, line 20 accesses `pos + 3 + m[0].length`, which could also be out of bounds; `charCodeAt` will return `NaN` in that case too, causing the comparison to fail safely.
The current implementation is functionally correct due to `charCodeAt` returning `NaN` for out-of-bounds indices, but explicit bounds checks would improve clarity.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/message-parser/src/lexer/scanners/links.ts` around lines 14 - 25, The timestamp scanner should avoid implicit out-of-bounds charCodeAt reads; add explicit length checks before indexing: ensure pos + 2 < input.length before testing the CH_T_LO and CH_COLON checks, and after matching TS_INNER (use TS_INNER.exec) ensure pos + 3 + m[0].length < input.length (or <= depending on slice semantics) before calling input.charCodeAt for CH_GT; only then call flushText(ctx, pos), compute raw and call emit(ctx, TokenKind.TIMESTAMP, raw, m[1], pos). Keep TS_INNER, flushText, emit and TokenKind.TIMESTAMP references intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/message-parser/benchmarks/lexer.bench.ts`:
- Around line 2-8: Remove the top banner comment that starts with "Benchmark
suite for the Lexer" (it references the stale path
`packages/message-parser-handwritten/`) and delete the other section/banner
comments scattered in the file that merely repeat what the types/functions
already convey; if any brief context must remain, replace the stale path with
the current location and keep a single one-line comment, otherwise remove them
entirely to comply with the "avoid code comments in the implementation"
guideline.
In `@packages/message-parser/src/lexer/constants/regexes.ts`:
- Around line 4-5: The EMAIL_RE currently captures sentence-ending punctuation
(e.g. "foo@example.com.") so emitted EMAIL tokens include trailing punctuation;
fix this by mirroring the URL handling: either tighten EMAIL_RE to exclude
terminal punctuation with a lookahead (e.g. require next char be whitespace,
end, or a non-trailing-punct boundary) or, more simply, after matching with
EMAIL_RE strip trailing punctuation using the existing TRAIL_PUNCT trimming
logic before emitting the token; update the code path that emits EMAIL tokens to
call the same TRAIL_PUNCT trim routine used for URLs so emitted EMAIL tokens
contain only the clean address.
- Around line 10-11: The COLOR_RE alternation currently favors shorter hex
branches first causing 8- and 4-digit alpha hex colors to be mis-tokenized
(e.g., "#ffffffff" -> "#ffffff"); update the COLOR_RE so the 8-digit and 4-digit
branches come before the 6- and 3-digit branches and add a hex boundary (e.g., a
negative lookahead or explicit boundary after the hex group) so that
alpha-containing hex colors are matched atomically; locate and edit the exported
constant COLOR_RE in the lexer/constants/regexes.ts to apply this change.
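The ordering problem described for `COLOR_RE` can be illustrated with a small sketch. The real pattern is not reproduced in this review, so `NAIVE_COLOR_RE`, `ORDERED_COLOR_RE`, and `matchColor` below are hypothetical stand-ins, and the longest-first ordering with a negative lookahead is just one way to get the atomic match the comment asks for.

```typescript
// Hypothetical stand-ins: the PR's actual COLOR_RE is not shown in this review.

// Shorter branches are tried first, so an 8-digit color is cut at 6 digits.
const NAIVE_COLOR_RE = /^#(?:[0-9a-f]{6}|[0-9a-f]{3}|[0-9a-f]{8}|[0-9a-f]{4})/i;

// Longest branches first, plus a boundary so no hex digit may follow the match.
const ORDERED_COLOR_RE = /^#(?:[0-9a-f]{8}|[0-9a-f]{6}|[0-9a-f]{4}|[0-9a-f]{3})(?![0-9a-f])/i;

// Returns the matched color text, or null when the input is not a color.
function matchColor(re: RegExp, input: string): string | null {
	const m = re.exec(input);
	return m ? m[0] : null;
}
```

With these stand-ins, `matchColor(NAIVE_COLOR_RE, '#ffffffff')` yields `'#ffffff'`, while `matchColor(ORDERED_COLOR_RE, '#ffffffff')` yields the full `'#ffffffff'`. The lookahead also rejects a 5-digit run such as `#fffff` instead of silently truncating it.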
---
Duplicate comments:
In `@packages/message-parser/LEXER.md`:
- Around line 45-55: The two fenced code blocks that start with "```" containing
the sample Markdown ("# Hello **world** :smile: `@john`") and the token sequence
(HEADING_MARKER...EOF) need explicit language identifiers to satisfy
markdownlint MD040; update both opening fences from ``` to ```text so they read
"```text" to mark them as plain text. Ensure you only modify the two opening
fence lines (for the human-readable example and the flat array of tokens) and
leave the block contents and closing fences unchanged.
In `@packages/message-parser/src/lexer/constants/charSets.ts`:
- Around line 51-65: The BMP range in isUnicodeEmojiStart is too broad and
includes non-emoji chars; tighten the BMP check to the proper emoji block(s)
(e.g., use 0x2600–0x27BF) or replace the single-range test with a precise
whitelist/lookup of allowed BMP codepoints, and add a corresponding validation
step in scanUnicodeEmoji to verify the actual character(s) against that
whitelist before emitting EMOJI_UNICODE (use the same validation helper to avoid
duplication). Also remove the inline end-of-line comments inside
isUnicodeEmojiStart and ensure any explanatory notes live in surrounding
doc/comments instead.
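A minimal sketch of the narrowed BMP check, assuming the `0x2600–0x27BF` range suggested above. The helper names (`isBmpEmojiCandidate`, `startsWithBmpEmoji`) are hypothetical and do not come from the PR; the point is only that one shared predicate can serve both `isUnicodeEmojiStart` and `scanUnicodeEmoji`.

```typescript
// Hypothetical helper names; the range is the one proposed in the review,
// not necessarily what the PR ends up using.
const BMP_EMOJI_START = 0x2600;
const BMP_EMOJI_END = 0x27bf;

// Single source of truth for the BMP range test.
function isBmpEmojiCandidate(code: number): boolean {
	return code >= BMP_EMOJI_START && code <= BMP_EMOJI_END;
}

// Position-aware wrapper that refuses out-of-range positions up front.
function startsWithBmpEmoji(input: string, pos: number): boolean {
	if (pos < 0 || pos >= input.length) {
		return false;
	}
	return isBmpEmojiCandidate(input.charCodeAt(pos));
}
```

A single range is still an approximation: it admits some symbols that are not emoji and excludes BMP emoji that live elsewhere (for example U+231A), which is why the comment also mentions a whitelist or lookup as the more precise alternative.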
---
Nitpick comments:
In `@packages/message-parser/jest.config.ts`:
- Around line 10-23: The transform regex in the Jest config currently allows
JSX/TSX (the pattern '^.+\\.m?(t|j)sx?$') while the SWC jsc.parser settings
disable JSX and this package has no JSX/TSX files; update the transform key to a
narrower regex '^.+\\.m?[tj]s$' so it only matches .js/.mjs/.ts/.mts files and
remove JSX/TSX support, keeping the existing '@swc/jest' transformer and jsc
config (refer to the transform entry and the jsc.parser block in the snippet to
locate the change).
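To see what the narrower transform key changes in practice, the two patterns quoted above can be compared against a handful of file names. The file names and the `matchedBy` helper are made up for illustration; only the two regex strings come from the review.

```typescript
// The two patterns quoted in the review, built from the same string literals
// that would appear as keys in the Jest `transform` map.
const BROAD_TRANSFORM = new RegExp('^.+\\.m?(t|j)sx?$');
const NARROW_TRANSFORM = new RegExp('^.+\\.m?[tj]s$');

// Illustrative file names only; none of these are taken from the package.
const SAMPLE_FILES = ['lexer.ts', 'index.mjs', 'config.mts', 'view.tsx', 'widget.jsx', 'data.json'];

// Lists the sample files a given transform pattern would pick up.
function matchedBy(re: RegExp): string[] {
	return SAMPLE_FILES.filter((file) => re.test(file));
}
```

Here `matchedBy(BROAD_TRANSFORM)` includes `view.tsx` and `widget.jsx`, while `matchedBy(NARROW_TRANSFORM)` keeps only `lexer.ts`, `index.mjs`, and `config.mts`.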
In `@packages/message-parser/src/index.ts`:
- Around line 27-28: The root barrel currently re-exports the handwritten lexer
(Lexer, Token, TokenKind, makeToken) and includes an inline banner comment;
remove that re-export and delete the inline comment from the root index so these
symbols are not part of the stable package contract, then create a dedicated
subpath module that re-exports Lexer/Token/TokenKind/makeToken and add a
corresponding package subpath export (so consumers can import the lexer from the
dedicated subpath while the root API remains stable).
In `@packages/message-parser/src/lexer/constants/charSets.ts`:
- Around line 10-16: Remove the inline implementation comment above the
CHAR_CLASS declaration and any other inline comments in this file; instead add a
brief JSDoc block above the export of CHAR_CLASS describing its purpose and the
kinds of characters it marks (used for tokenization), leaving the code that
initializes CHAR_CLASS and the uses of symbols like CHAR_CLASS, CH_0, CH_9, and
CH_C_LO unchanged so behavior is preserved.
In `@packages/message-parser/src/lexer/ScanContext.ts`:
- Around line 46-50: The function consumeRun should guard against invalid
starting positions: detect if pos is negative or beyond input.length (or not an
integer) and return 0 early rather than calling input.charCodeAt with an
out-of-range index; update the consumeRun implementation to validate pos (e.g.,
ensure Number.isInteger(pos) && pos >= 0 && pos < input.length) before entering
the while loop that compares input.charCodeAt(pos + count) to charCode so
callers with bad positions won't produce unexpected behavior.
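The guard described above might look roughly like the following. The real `consumeRun` signature is not shown in this review, so `consumeRunSafe` and its `(input, pos, charCode)` parameter list are assumptions made for the sake of a self-contained sketch.

```typescript
// Hypothetical variant of consumeRun; the actual signature in ScanContext.ts
// is not reproduced in this review.
function consumeRunSafe(input: string, pos: number, charCode: number): number {
	// Reject negative, fractional, or past-the-end positions before indexing.
	if (!Number.isInteger(pos) || pos < 0 || pos >= input.length) {
		return 0;
	}

	let count = 0;
	// charCodeAt returns NaN past the end, which never equals charCode,
	// so the loop always terminates at the end of the input.
	while (input.charCodeAt(pos + count) === charCode) {
		count++;
	}
	return count;
}
```

For example, `consumeRunSafe('***a', 0, 42)` counts three asterisks, and any invalid starting position returns `0` without touching the string.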
In `@packages/message-parser/src/lexer/scanners/links.ts`:
- Around line 14-25: The timestamp scanner should avoid implicit out-of-bounds
charCodeAt reads; add explicit length checks before indexing: ensure pos + 2 <
input.length before testing the CH_T_LO and CH_COLON checks, and after matching
TS_INNER (use TS_INNER.exec) ensure pos + 3 + m[0].length < input.length (or <=
depending on slice semantics) before calling input.charCodeAt for CH_GT; only
then call flushText(ctx, pos), compute raw and call emit(ctx,
TokenKind.TIMESTAMP, raw, m[1], pos). Keep TS_INNER, flushText, emit and
TokenKind.TIMESTAMP references intact.
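A sketch of the explicit bounds checks for the timestamp scanner, written as a pure matcher so it can be read on its own. The inner pattern is an assumption: the PR's `TS_INNER` is not shown here, so the regex below is a hypothetical placeholder, as are `matchTimestamp` and the `TimestampMatch` shape. A real scanner would call `flushText` and `emit` where this sketch returns a result.

```typescript
const CH_LT = 60; // '<'
const CH_GT = 62; // '>'
const CH_T_LO = 116; // 't'
const CH_COLON = 58; // ':'

// Hypothetical inner pattern; the PR's TS_INNER is not shown in the review.
const TS_INNER = /^(\d+)(?::[a-zA-Z])?/;

interface TimestampMatch {
	raw: string;
	value: string;
	end: number;
}

function matchTimestamp(input: string, pos: number): TimestampMatch | null {
	// "<t:" must fit entirely inside the input before any indexing happens.
	if (pos < 0 || pos + 2 >= input.length) {
		return null;
	}
	if (input.charCodeAt(pos) !== CH_LT) {
		return null;
	}
	if (input.charCodeAt(pos + 1) !== CH_T_LO || input.charCodeAt(pos + 2) !== CH_COLON) {
		return null;
	}

	const m = TS_INNER.exec(input.slice(pos + 3));
	if (!m) {
		return null;
	}

	// The closing '>' must also be inside the input.
	const closeAt = pos + 3 + m[0].length;
	if (closeAt >= input.length || input.charCodeAt(closeAt) !== CH_GT) {
		return null;
	}

	return { raw: input.slice(pos, closeAt + 1), value: m[1] ?? '', end: closeAt + 1 };
}
```

The behavior is the same as relying on `NaN` comparisons; the explicit checks only make the failure paths visible to the reader.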
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 95910465-d4aa-45b0-b4e7-19656aa4c6d9
📒 Files selected for processing (31)
packages/message-parser/LEXER.md
packages/message-parser/benchmarks/lexer.bench.ts
packages/message-parser/jest.config.ts
packages/message-parser/package.json
packages/message-parser/src/index.ts
packages/message-parser/src/lexer/ScanContext.ts
packages/message-parser/src/lexer/Token.ts
packages/message-parser/src/lexer/constants/charCodes.ts
packages/message-parser/src/lexer/constants/charSets.ts
packages/message-parser/src/lexer/constants/emoticons.ts
packages/message-parser/src/lexer/constants/regexes.ts
packages/message-parser/src/lexer/helpers.ts
packages/message-parser/src/lexer/index.ts
packages/message-parser/src/lexer/lexer.ts
packages/message-parser/src/lexer/scanners/blocks.ts
packages/message-parser/src/lexer/scanners/code.ts
packages/message-parser/src/lexer/scanners/emoji.ts
packages/message-parser/src/lexer/scanners/formatting.ts
packages/message-parser/src/lexer/scanners/index.ts
packages/message-parser/src/lexer/scanners/inline.ts
packages/message-parser/src/lexer/scanners/links.ts
packages/message-parser/src/lexer/scanners/math.ts
packages/message-parser/src/lexer/scanners/structural.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/general.test.ts
packages/message-parser/tests/lexer/inline.test.ts
packages/message-parser/tests/lexer/links.test.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/tests/lexer/structural.test.ts
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{ts,tsx,js}
📄 CodeRabbit inference engine (.cursor/rules/playwright.mdc)
**/*.{ts,tsx,js}: Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests
Avoid code comments in the implementation
Files:
packages/message-parser/src/lexer/scanners/code.ts
packages/message-parser/src/lexer/scanners/index.ts
packages/message-parser/src/lexer/helpers.ts
packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/src/lexer/Token.ts
packages/message-parser/src/lexer/lexer.ts
packages/message-parser/src/lexer/scanners/links.ts
packages/message-parser/src/lexer/scanners/structural.ts
packages/message-parser/src/lexer/scanners/math.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/src/lexer/scanners/emoji.ts
packages/message-parser/src/lexer/scanners/blocks.ts
packages/message-parser/tests/lexer/links.test.ts
packages/message-parser/src/lexer/constants/regexes.ts
packages/message-parser/src/lexer/index.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/src/lexer/ScanContext.ts
packages/message-parser/src/lexer/scanners/formatting.ts
packages/message-parser/src/lexer/constants/charSets.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/benchmarks/lexer.bench.ts
packages/message-parser/src/lexer/scanners/inline.ts
packages/message-parser/tests/lexer/general.test.ts
packages/message-parser/jest.config.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/src/lexer/constants/emoticons.ts
packages/message-parser/src/index.ts
packages/message-parser/tests/lexer/inline.test.ts
packages/message-parser/src/lexer/constants/charCodes.ts
🧠 Learnings (18)
📚 Learning: 2026-02-26T19:25:44.063Z
Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 38778
File: packages/ui-voip/src/providers/useMediaSession.ts:192-192
Timestamp: 2026-02-26T19:25:44.063Z
Learning: In the Rocket.Chat repository, do not reference Biome lint rules in code review feedback. Biome is not used even if biome.json exists; only reference Biome rules if there is explicit, project-wide usage documented. For TypeScript files, review lint implications without Biome guidance unless the project enables Biome rules.
Applied to files:
packages/message-parser/src/lexer/scanners/code.ts
packages/message-parser/src/lexer/scanners/index.ts
packages/message-parser/src/lexer/helpers.ts
packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/src/lexer/Token.ts
packages/message-parser/src/lexer/lexer.ts
packages/message-parser/src/lexer/scanners/links.ts
packages/message-parser/src/lexer/scanners/structural.ts
packages/message-parser/src/lexer/scanners/math.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/src/lexer/scanners/emoji.ts
packages/message-parser/src/lexer/scanners/blocks.ts
packages/message-parser/tests/lexer/links.test.ts
packages/message-parser/src/lexer/constants/regexes.ts
packages/message-parser/src/lexer/index.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/src/lexer/ScanContext.ts
packages/message-parser/src/lexer/scanners/formatting.ts
packages/message-parser/src/lexer/constants/charSets.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/benchmarks/lexer.bench.ts
packages/message-parser/src/lexer/scanners/inline.ts
packages/message-parser/tests/lexer/general.test.ts
packages/message-parser/jest.config.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/src/lexer/constants/emoticons.ts
packages/message-parser/src/index.ts
packages/message-parser/tests/lexer/inline.test.ts
packages/message-parser/src/lexer/constants/charCodes.ts
📚 Learning: 2026-02-26T19:25:44.063Z
Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 38778
File: packages/ui-voip/src/providers/useMediaSession.ts:192-192
Timestamp: 2026-02-26T19:25:44.063Z
Learning: In this repository (RocketChat/Rocket.Chat), Biome lint rules are not used even if a biome.json exists. When reviewing TypeScript files (e.g., packages/ui-voip/src/providers/useMediaSession.ts), ensure lint suggestions do not reference Biome-specific rules. Rely on general ESLint/TypeScript lint rules and project conventions instead.
Applied to files:
packages/message-parser/src/lexer/scanners/code.ts
packages/message-parser/src/lexer/scanners/index.ts
packages/message-parser/src/lexer/helpers.ts
packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/src/lexer/Token.ts
packages/message-parser/src/lexer/lexer.ts
packages/message-parser/src/lexer/scanners/links.ts
packages/message-parser/src/lexer/scanners/structural.ts
packages/message-parser/src/lexer/scanners/math.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/src/lexer/scanners/emoji.ts
packages/message-parser/src/lexer/scanners/blocks.ts
packages/message-parser/tests/lexer/links.test.ts
packages/message-parser/src/lexer/constants/regexes.ts
packages/message-parser/src/lexer/index.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/src/lexer/ScanContext.ts
packages/message-parser/src/lexer/scanners/formatting.ts
packages/message-parser/src/lexer/constants/charSets.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/benchmarks/lexer.bench.ts
packages/message-parser/src/lexer/scanners/inline.ts
packages/message-parser/tests/lexer/general.test.ts
packages/message-parser/jest.config.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/src/lexer/constants/emoticons.ts
packages/message-parser/src/index.ts
packages/message-parser/tests/lexer/inline.test.ts
packages/message-parser/src/lexer/constants/charCodes.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Utilize Playwright fixtures (`test`, `page`, `expect`) for consistency in test files
Applied to files:
packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/links.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/jest.config.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Ensure tests run reliably in parallel without shared state conflicts
Applied to files:
packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/jest.config.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to **/*.{ts,tsx,js} : Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests
Applied to files:
packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/jest.config.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/tests/lexer/inline.test.ts
📚 Learning: 2026-03-09T21:20:12.687Z
Learnt from: pierre-lehnen-rc
Repo: RocketChat/Rocket.Chat PR: 39386
File: apps/meteor/server/services/push/tokenManagement/findDocumentToUpdate.ts:12-15
Timestamp: 2026-03-09T21:20:12.687Z
Learning: In `apps/meteor/server/services/push/tokenManagement/findDocumentToUpdate.ts`, the early return `if (data.voipToken) return null` (Lines 13-15) is intentionally correct. VoIP token updates always include an `_id`, so they are handled by the `_id` lookup block above (Lines 5-9) and never reach this guard. The guard is only a safety net for edge cases where `_id` is absent or no document was found, preventing an incorrect `token + appName` fallback match for VoIP-only payloads.
Applied to files:
packages/message-parser/src/lexer/lexer.ts
📚 Learning: 2025-11-24T17:08:26.531Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/test-cases.mdc:0-0
Timestamp: 2025-11-24T17:08:26.531Z
Learning: Applies to **/*test-case*.md : Write test cases in Markdown format following the standardized template with ALL necessary components: Title, Description, Preconditions, Type, Steps, and Expected Result
Applied to files:
packages/message-parser/tests/lexer/formatting.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use `expect` matchers for assertions (`toEqual`, `toContain`, `toBeTruthy`, `toHaveLength`, etc.) instead of `assert` statements in Playwright tests
Applied to files:
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/jest.config.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to **/*.spec.ts : Use descriptive test names that clearly communicate expected behavior in Playwright tests
Applied to files:
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/tests/lexer/general.test.ts
📚 Learning: 2025-11-19T18:20:07.720Z
Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 37419
File: packages/i18n/src/locales/en.i18n.json:918-921
Timestamp: 2025-11-19T18:20:07.720Z
Learning: Repo: RocketChat/Rocket.Chat — i18n/formatting
Learning: This repository uses a custom message formatting parser in UI blocks/messages; do not assume standard Markdown rules. For keys like Call_ended_bold, Call_not_answered_bold, Call_failed_bold, and Call_transferred_bold in packages/i18n/src/locales/en.i18n.json, retain the existing single-asterisk emphasis unless maintainers request otherwise.
Applied to files:
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/src/lexer/scanners/formatting.ts
packages/message-parser/LEXER.md
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Group related tests in the same file
Applied to files:
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/tests/lexer/inline.test.ts
📚 Learning: 2025-12-10T21:00:54.909Z
Learnt from: KevLehman
Repo: RocketChat/Rocket.Chat PR: 37091
File: ee/packages/abac/jest.config.ts:4-7
Timestamp: 2025-12-10T21:00:54.909Z
Learning: Rocket.Chat monorepo: Jest testMatch pattern '<rootDir>/src/**/*.spec.(ts|js|mjs)' is valid in this repo and used across multiple packages (e.g., packages/tools, ee/packages/omnichannel-services). Do not flag it as invalid in future reviews.
Applied to files:
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/src/lexer/constants/regexes.ts
packages/message-parser/jest.config.ts
packages/message-parser/tests/lexer/inline.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use `test.step()` for complex test scenarios to improve organization in Playwright tests
Applied to files:
packages/message-parser/tests/lexer/formatting.test.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Maintain test isolation between test cases in Playwright tests
Applied to files:
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/jest.config.ts
📚 Learning: 2026-01-17T01:51:47.764Z
Learnt from: tassoevan
Repo: RocketChat/Rocket.Chat PR: 38219
File: packages/core-typings/src/cloud/Announcement.ts:5-6
Timestamp: 2026-01-17T01:51:47.764Z
Learning: In packages/core-typings/src/cloud/Announcement.ts, the AnnouncementSchema.createdBy field intentionally overrides IBannerSchema.createdBy (object with _id and optional username) with a string enum ['cloud', 'system'] to match existing runtime behavior. This is documented as technical debt with a FIXME comment at apps/meteor/app/cloud/server/functions/syncWorkspace/handleCommsSync.ts:53 and should not be flagged as an error until the runtime behavior is corrected.
Applied to files:
packages/message-parser/src/lexer/scanners/inline.ts
📚 Learning: 2026-03-06T18:09:17.867Z
Learnt from: tassoevan
Repo: RocketChat/Rocket.Chat PR: 39397
File: packages/gazzodown/src/elements/Timestamp/DateTimeFormats.spec.tsx:20-23
Timestamp: 2026-03-06T18:09:17.867Z
Learning: In the RocketChat/Rocket.Chat gazzodown package (`packages/gazzodown`), tests are intended to run under the UTC timezone, but as of PR `#39397` this is NOT yet explicitly enforced in `jest.config.ts` or the `package.json` test scripts (which just run `jest` without `TZ=UTC`). To make timezone-sensitive snapshot tests reliable across all environments, `TZ=UTC` should be added to the test scripts in `package.json` or to `jest.config.ts` via `testEnvironmentOptions.timezone`. Without explicit UTC enforcement, snapshot tests involving date-fns formatted output or `toLocaleString()` will fail for contributors in non-UTC timezones.
Applied to files:
packages/message-parser/jest.config.ts
📚 Learning: 2025-12-16T17:29:45.163Z
Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 37834
File: apps/meteor/tests/e2e/page-objects/fragments/admin-flextab-emoji.ts:12-22
Timestamp: 2025-12-16T17:29:45.163Z
Learning: In page object files under `apps/meteor/tests/e2e/page-objects/`, always import `expect` from `../../utils/test` (Playwright's async expect), not from Jest. Jest's `expect` has a synchronous signature and will cause TypeScript errors when used with web-first assertions like `toBeVisible()`.
Applied to files:
packages/message-parser/jest.config.ts
📚 Learning: 2025-11-24T17:08:17.065Z
Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : All test files must be created in `apps/meteor/tests/e2e/` directory
Applied to files:
packages/message-parser/jest.config.ts
🪛 markdownlint-cli2 (0.21.0)
packages/message-parser/LEXER.md
[warning] 45-45: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 51-51: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (28)
packages/message-parser/package.json (1)
38-40: Nice split between parser and lexer benchmarks.
Keeping `bench:parser` and `bench:lexer` separately addressable makes local profiling and CI targeting much easier.
packages/message-parser/src/lexer/scanners/math.ts (1)
1-31: LGTM!
The scanner correctly handles `$` (inline) and `$$` (block) KaTeX delimiters with proper state toggling and token emission. The boundary check at `pos + 1` is implicitly safe since `charCodeAt` returns `NaN` for out-of-bounds indices, which won't equal `CH_DOLLAR`.
packages/message-parser/src/lexer/scanners/index.ts (1)
1-86: LGTM!
The scanner dispatch table is well-organized with clear groupings. Character code registrations align with the `CHAR_CLASS` special character classifications. The sparse array approach is efficient for O(1) dispatch lookups.
packages/message-parser/tests/lexer/inline.test.ts (1)
1-202: LGTM!
Comprehensive test coverage for inline token scanning including emoji shortcodes, mentions, URLs, colors, phone numbers, and emoticons. Edge cases are well-covered (unclosed shortcodes, trailing punctuation stripping, invalid patterns falling through to TEXT).
packages/message-parser/tests/lexer/math.test.ts (1)
1-114: LGTM!
Thorough test coverage for KaTeX delimiter handling across inline (`$`), block (`$$`), and backslash (`\[`, `\]`, `\(`, `\)`) variants. The tests correctly verify the toggle-based state machine behavior and round-trip preservation.
packages/message-parser/tests/lexer/links.test.ts (1)
1-133: LGTM!
Excellent coverage of link-related tokenization including markdown links, images, angle brackets, emoticons (`<3`, `>:)`), and timestamps. Edge cases are well-tested: empty parts, nested brackets, blockquote detection at line start vs mid-line, and malformed timestamp fallbacks.
packages/message-parser/src/lexer/lexer.ts (2)
69-71: EOF token may exceed MAX_TOKENS limit.
The EOF token is pushed unconditionally after the loop breaks at `MAX_TOKENS`. While `flushText` internally guards against exceeding the cap, the direct `push()` call for EOF doesn't, potentially resulting in `MAX_TOKENS + 1` tokens.
Suggested fix
```diff
 flushText(ctx, pos);
-ctx.tokens.push(makeToken(TokenKind.EOF, '', '', pos));
+if (ctx.tokens.length < MAX_TOKENS) {
+  ctx.tokens.push(makeToken(TokenKind.EOF, '', '', pos));
+}
 return ctx.tokens;
```
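The effect of guarding the EOF push can be seen in a stripped-down sketch. Everything here is hypothetical (`Tok`, `tokenizeCapped`, and the one-token-per-character loop are placeholders for the real lexer); it only demonstrates that the guarded push keeps the output at or below the cap.

```typescript
type Kind = 'TEXT' | 'EOF';

interface Tok {
	kind: Kind;
	raw: string;
	start: number;
}

// Hypothetical stand-in for the lexer loop: one TEXT token per character.
function tokenizeCapped(input: string, maxTokens: number): Tok[] {
	const tokens: Tok[] = [];
	let pos = 0;

	while (pos < input.length && tokens.length < maxTokens) {
		tokens.push({ kind: 'TEXT', raw: input.charAt(pos), start: pos });
		pos++;
	}

	// Guarded EOF, mirroring the suggested fix: never exceed the cap.
	if (tokens.length < maxTokens) {
		tokens.push({ kind: 'EOF', raw: '', start: pos });
	}
	return tokens;
}
```

One trade-off worth noting: with this guard, a stream that hits the cap ends without an EOF token, so consumers cannot assume the last token is always EOF. Reserving one slot for EOF inside the loop condition would be an alternative if that invariant matters.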
1-73: Well-structured lexer implementation.
The three-path dispatch (plain ASCII fast path, non-ASCII/emoji path, special char scanner dispatch) is efficient. The `CHAR_CLASS` check enables O(1) classification, and scanner table dispatch avoids conditional chains.
packages/message-parser/tests/lexer/formatting.test.ts (1)
1-110: LGTM!
Comprehensive formatting marker tests covering asterisk, underscore, and tilde behaviors. Good coverage of context-sensitive tokenization (line-start bullets vs mid-line emphasis) and emoticon fallbacks (`*)`, `*-)`).
packages/message-parser/tests/lexer/general.test.ts (1)
1-263: LGTM!
Excellent comprehensive test suite covering boundary conditions, token position invariants, round-trips, MAX_TOKENS limits, Unicode emoji (including ZWJ sequences, variation selectors, skin tone modifiers), and non-ASCII text handling. The position invariant checks (contiguous tokens, `end = start + raw.length`) are particularly valuable for ensuring lexer correctness.
packages/message-parser/src/lexer/scanners/formatting.ts (1)
1-51: LGTM!
The formatting scanners are well-structured with clear separation of concerns. The asterisk scanner correctly handles the three cases (list bullet, emoticon, emphasis) with proper priority, and the underscore/tilde scanners follow a consistent pattern for consuming runs of their respective characters.
packages/message-parser/tests/lexer/structural.test.ts (1)
1-118: LGTM!
Comprehensive test coverage for structural tokenization. The helper functions (`tok`, `kinds`, `kv`, `raw`) provide clean abstractions for assertions. The round-trip tests on lines 109-117 are particularly valuable for ensuring the lexer preserves input faithfully.
packages/message-parser/src/lexer/scanners/code.ts (1)
1-83: LGTM!
The code scanner correctly distinguishes between fenced blocks (which can span multiple lines) and inline code spans (which cannot cross line boundaries). The unclosed-fence handling is appropriate: emitting `CODE_CONTENT` for fenced blocks but `TEXT` for unclosed inline spans reflects the semantic difference between the two constructs.
packages/message-parser/src/lexer/helpers.ts (1)
1-66: LGTM!
Both helper functions are well-designed with proper boundary checks and backtracking logic. The `scanMentionBody` helper correctly handles optional sub-address segments (`:` or `@` separators) with safe backtracking when the segment after a separator is invalid. The callers in `inline.ts` and `blocks.ts` properly guard against empty strings and -1 return values.
packages/message-parser/src/lexer/scanners/emoji.ts (1)
14-15: Surrogate range check may be overly broad.
Line 15 checks the full surrogate range (`0xd800-0xdfff`), which includes both high surrogates (`0xd800-0xdbff`) and low surrogates (`0xdc00-0xdfff`). In valid UTF-16, a low surrogate should never appear independently at the start of a character. While this doesn't cause incorrect behavior for well-formed input (the `i + 1 < len` guard prevents out-of-bounds access), narrowing to high surrogates would be more precise.
The same pattern appears on line 45 in the ZWJ loop.
Proposed refinement
```diff
-i += (c1 >= 0xd800 && c1 <= 0xdfff && i + 1 < len) ? 2 : 1;
+i += (c1 >= 0xd800 && c1 <= 0xdbff && i + 1 < len) ? 2 : 1;
```
```diff
-if (next >= 0xd800 && next <= 0xdfff && i + 1 < len && isUnicodeEmojiStart(input, i)) { i += 2; }
+if (next >= 0xd800 && next <= 0xdbff && i + 1 < len && isUnicodeEmojiStart(input, i)) { i += 2; }
```
packages/message-parser/src/lexer/scanners/structural.ts (1)
1-53: LGTM!
The structural scanners are well-implemented. The `scanEscape` function correctly handles the EOF edge case (backslash at end of input) by falling through to plain text; this is verified by the test `expect(kinds('\\')).toEqual(['TEXT'])`. The KaTeX delimiter handling with state flag toggles integrates cleanly with the broader lexer state machine.
packages/message-parser/tests/lexer/blocks.test.ts (2)
1-167: Good test coverage for block-level constructs.
The tests comprehensively cover headings, list bullets (unordered, task, ordered), and pipe/spoiler syntax. The emoticon edge cases (e.g., `#)`, `-_-`, `8-)`) are particularly valuable for ensuring the lexer correctly prioritizes emoticon matching in ambiguous contexts.
87-98: Test expectation is correct; no changes needed.
The `]` character in `'- [y] text'` doesn't produce a separate token because `scanCloseBracket()` only emits a token when `]` is immediately followed by `(` (link href syntax). Otherwise, it accumulates `]` into pending text without emitting a token. This is correct behavior: `[y]` is not a valid link pattern, so the `]` is treated as regular text and merged with `y` into a single TEXT token.
packages/message-parser/src/lexer/scanners/inline.ts (3)
64-73: Email addresses may be misclassified as user mentions. For input like `alice@example.com`, the text `alice` accumulates as TEXT, then `scanAt` is invoked at `@` and emits `MENTION_USER` for `@example` (or potentially `@example.com`, depending on the MENTION_OK character set), leaving the rest as separate tokens instead of producing a single email token. Email detection via `EMAIL_RE` only runs from `scanPlus` (for `+...` patterns) and `scanC`, but neither handles the common case of an email address starting with a letter.
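One possible way to address this, sketched under assumptions: when the scanner reaches `@`, look left for local-part characters and right for a dotted domain before deciding it is a mention. The helper name `emailSpanAt` and both patterns are hypothetical and deliberately simplified; the PR's actual `EMAIL_RE` may differ.

```typescript
// Simplified patterns for illustration only; not the PR's EMAIL_RE.
const LOCAL_CHAR = /[A-Za-z0-9._%+-]/;
// Sticky flag: exec() matches only at exactly lastIndex.
const DOMAIN_RE = /[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+/y;

// Hypothetical helper: given the index of '@', returns the [start, end) span
// of an email address around it, or null if it looks like a plain mention.
function emailSpanAt(input: string, atPos: number): [number, number] | null {
  // Walk left over characters that may belong to the local part.
  let start = atPos;
  while (start > 0 && LOCAL_CHAR.test(input[start - 1])) start--;
  if (start === atPos) return null; // nothing before '@': treat as a mention

  // Require a dotted domain immediately after '@'.
  DOMAIN_RE.lastIndex = atPos + 1;
  const m = DOMAIN_RE.exec(input);
  if (m === null) return null;
  return [start, atPos + 1 + m[0].length];
}
```

A scanner using this approach would still need to back-patch the already accumulated `alice` text into the email token, which this sketch leaves out.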
41-61: LGTM - Back-patching logic for URL scheme detection. The `tryUrlScheme` function correctly handles the case where a URL scheme prefix (like `http`) has already been emitted as TEXT. The back-patch via `ctx.tokens.pop()` is safe since `ctx.tokens` is a plain array per the ScanContext definition. The `TRAIL_PUNCT` replacement ensures trailing punctuation (like periods at end of sentences) isn't included in the URL.
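The back-patching idea can be illustrated with a rough sketch. Everything here is an assumption made for illustration: the `Token` shape, the `backPatchUrl` name, the `TRAIL_PUNCT` pattern, and the rule that a URL ends at whitespace. It is not the PR's `tryUrlScheme`.

```typescript
type Token = { kind: 'TEXT' | 'URL'; value: string };

// Assumed trailing-punctuation pattern; the PR's TRAIL_PUNCT may differ.
const TRAIL_PUNCT = /[.,;:!?]+$/;

// Hypothetical: colonPos points at ':' in "://". If the previous TEXT token
// ends with a scheme, pop it and emit a URL token instead.
// Returns the next scan position (colonPos unchanged if nothing was patched).
function backPatchUrl(tokens: Token[], input: string, colonPos: number): number {
  const last = tokens[tokens.length - 1];
  if (!last || last.kind !== 'TEXT') return colonPos;
  const m = /https?$/.exec(last.value);
  if (!m || !input.startsWith('://', colonPos)) return colonPos;

  // Assume the URL runs until the next whitespace character.
  let end = colonPos + 3;
  while (end < input.length && !/\s/.test(input[end])) end++;
  const body = input.slice(colonPos, end).replace(TRAIL_PUNCT, '');
  if (body.length <= 3) return colonPos; // nothing after "://"

  tokens.pop(); // remove the TEXT token that contained the scheme
  const before = last.value.slice(0, m.index);
  if (before) tokens.push({ kind: 'TEXT', value: before });
  tokens.push({ kind: 'URL', value: m[0] + body });
  return colonPos + body.length;
}
```

Because the token list is a plain array, `pop()` followed by `push()` is enough to rewrite the tail of the stream without touching earlier tokens.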
102-139: LGTM - Multi-pattern scanner for 'c' character. The `scanC` function correctly prioritizes patterns: `color:#hex` first (most specific), then URL, then email. The `flushText` call before emitting COLOR (line 110) addresses the previously flagged issue.
packages/message-parser/tests/lexer/code.test.ts (1)
1-109: LGTM! Comprehensive test coverage for code tokenization. The test file is well-structured with clear helper functions (`tok`, `kinds`, `kv`, `raw`) that reduce boilerplate. Good coverage of edge cases including empty spans, unclosed code at newline/EOF, multiple spans, and round-trip integrity verification.
packages/message-parser/src/lexer/ScanContext.ts (1)
1-81: LGTM! Well-designed context and utility functions. The `ScanContext` interface provides a clean abstraction for scanner state. The safety cap (`MAX_TOKENS`) prevents runaway tokenization. Utility functions are well-scoped and correctly implemented.
packages/message-parser/src/lexer/scanners/blocks.ts (3)
26-29: Type cast `String(count) as '1' | '2' | '3' | '4'` assumes count is 1-4.
The while loop (line 22) can result in `count = 0` if the character at `pos` isn't `#`, though since this scanner is dispatched for `#`, `count` will always be ≥ 1. However, if `count` exceeds 4 (not possible due to the `count < 4` guard), the cast would be incorrect.
The current logic is safe because:
- Scanner is only called when current char is `#`
- Loop stops at `count < 4`
So `count` is guaranteed to be 1-4. The code is correct.
62-76: Task bullet detection could fail on short input. Lines 66-70 access `input[pos + 3]`, `charCodeAt(pos + 4)`, and `charCodeAt(pos + 5)` without explicit length checks. However, `charCodeAt` returns `NaN` for out-of-bounds indices, and `input[pos + 3]` returns `undefined`. The comparisons will fail safely, so no runtime error occurs.
Functionally correct, though explicit bounds checks could improve clarity.
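As a sketch of what an explicit bounds check might look like, assuming a task bullet has the six-character shape `- [x] ` or `- [ ] ` starting at `pos` (the function name `isTaskBullet` and the accepted mark characters are assumptions, not the PR's code):

```typescript
// Hypothetical check, not the PR's implementation.
function isTaskBullet(input: string, pos: number): boolean {
  // Explicit bounds check: all six characters must exist.
  if (pos + 5 >= input.length) return false;
  const mark = input[pos + 3];
  return (
    input.charCodeAt(pos) === 0x2d && // '-'
    input.charCodeAt(pos + 1) === 0x20 && // ' '
    input.charCodeAt(pos + 2) === 0x5b && // '['
    (mark === 'x' || mark === ' ') &&
    input.charCodeAt(pos + 4) === 0x5d && // ']'
    input.charCodeAt(pos + 5) === 0x20 // ' '
  );
}

// Without the guard the comparisons would still fail safely, because
// out-of-bounds reads yield NaN and undefined rather than throwing.
const outOfRangeCode = 'abc'.charCodeAt(10); // NaN
const outOfRangeChar = 'abc'[10]; // undefined at runtime
```

The early return does not change behavior; it only makes the length requirement visible to the reader.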
123-133: URL regex behavior is correct with sticky flag. The regex uses the `y` (sticky) flag, which ensures `exec()` only matches at exactly the `lastIndex` position. Setting `lastIndex = pos` and then calling `exec()` correctly enforces that the match begins at the intended position.
> Likely an incorrect or invalid review comment.
packages/message-parser/src/lexer/constants/charCodes.ts (1)
1-45: LGTM! Clear and correct ASCII character code constants.All character codes are accurate. The inline comments serve as useful documentation showing the actual character each code represents, which aids maintainability for this reference file.
packages/message-parser/src/lexer/scanners/links.ts (1)
77-88: Standalone brackets and parentheses always emit link tokens. This is intentional and tested. The test suite (links.test.ts:36–38) explicitly validates that a lone `[` produces `LINK_OPEN` and a lone `)` produces `LINK_HREF_CLOSE`. This confirms the behavior is by design for the Stage 1 flat token stream, with disambiguation deferred to Stage 2 parsing.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/message-parser/src/index.ts">
<violation number="1" location="packages/message-parser/src/index.ts:22">
P1: `parse()` now returns `Token[]` for the handwritten engine and exposes a `Root | Token[]` public return type, breaking the prior AST-return contract.</violation>
</file>
…se entry to scan context
… and unicode mentions
You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment |
@cubic-dev-ai review. |
@Se3do I have started the AI code review. It will take a few minutes to complete. |
Summary
Prototype of a handwritten lexer for `message-parser` that replaces regex/PEG-based tokenization with a single-pass, scanner-dispatched architecture.
The Goal
A complete O(N) handwritten TypeScript lexer that tokenizes Rocket.Chat messages
into a flat token stream, designed as Stage 1 of a two-stage pipeline replacing Peggy.
Why
Peggy's PEG grammar relies on regex-heavy backtracking and 7 global skip flags
that break packrat memoization guarantees. The handwritten approach eliminates
both problems:
scanner dispatch table (`SCANNER_TABLE[charCode]`), no backtracking.
The lexer carries only a minimal ScanContext (input, position, token list).
regardless of parse context. Nesting decisions are deferred entirely to the
parser stage (Stage 2).
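The dispatch-table idea described above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: the `Token` shape, the `textStart` field, the `STAR` token kind, and the single registered scanner are all assumptions; only the names `SCANNER_TABLE`, `ScanContext`, and `flushText` come from the discussion.

```typescript
type Token = { kind: string; value: string };

// Assumed shape; the PR's ScanContext may hold different fields.
interface ScanContext {
  input: string;
  len: number;
  tokens: Token[];
  textStart: number; // start of the pending plain-text run (assumption)
}

// A scanner consumes input at pos and returns the next position.
type Scanner = (ctx: ScanContext, pos: number) => number;

// Emit any pending plain text up to (not including) end.
function flushText(ctx: ScanContext, end: number): void {
  if (end > ctx.textStart) {
    ctx.tokens.push({ kind: 'TEXT', value: ctx.input.slice(ctx.textStart, end) });
  }
}

// One slot per ASCII code; empty slots mean "plain text".
const SCANNER_TABLE = new Array<Scanner | undefined>(128);
SCANNER_TABLE[0x2a] = (ctx, pos) => {
  // '*' : hypothetical STAR token
  flushText(ctx, pos);
  ctx.tokens.push({ kind: 'STAR', value: '*' });
  ctx.textStart = pos + 1;
  return pos + 1;
};

function tokenize(input: string): Token[] {
  const ctx: ScanContext = { input, len: input.length, tokens: [], textStart: 0 };
  let pos = 0;
  while (pos < ctx.len) {
    const code = input.charCodeAt(pos);
    const scanner = code < 128 ? SCANNER_TABLE[code] : undefined;
    // Every branch advances pos, so each character is visited once.
    pos = scanner ? scanner(ctx, pos) : pos + 1;
  }
  flushText(ctx, ctx.len);
  ctx.tokens.push({ kind: 'EOF', value: '' });
  return ctx.tokens;
}
```

In this sketch, `tokenize('a*b')` yields a flat stream of TEXT, STAR, TEXT, and EOF tokens; deciding whether the `*` opens or closes emphasis would be left to the parser stage.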
Performance vs. current parser
The lexer alone is 250–650× faster. The future parser will add AST overhead but
operates on the pre-tokenized stream — it never re-scans raw input.
Next Steps
The next stage of this project is implementing the handwritten parser that
consumes the token stream produced by the lexer.
Planned work includes:
Summary by CodeRabbit
New Features
Documentation