feat(message-parser): Handwritten Lexer for Message Parser by Se3do · Pull Request #39458 · RocketChat/Rocket.Chat

Se3do · 2026-03-08T19:27:20Z

Summary

Prototype of a handwritten lexer for message-parser that replaces regex/PEG-based
tokenization with a single-pass, scanner-dispatched architecture.

The Goal

A complete O(N) handwritten TypeScript lexer that tokenizes Rocket.Chat messages
into a flat token stream, designed as Stage 1 of a two-stage pipeline replacing Peggy.

Why

Peggy's PEG grammar relies on regex-heavy backtracking and 7 global skip flags
that break packrat memoization guarantees. The handwritten approach eliminates
both problems:

Single linear pass: each character is visited exactly once via a
scanner dispatch table (SCANNER_TABLE[charCode]), with no backtracking.
No global state: every scanner has the shape (ctx, pos) → newPos and has no side effects beyond the shared context it is given.
The lexer carries only a minimal ScanContext (input, position, token list).
Deterministic output: a given input always produces the same flat token array,
regardless of parse context. Nesting decisions are deferred entirely to the
parser stage (Stage 2).
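To make the dispatch idea concrete, here is a minimal sketch of a table-driven loop. It assumes scanners of shape `(ctx, pos) => newPos` and a 128-entry ASCII table as described above; the token kinds, `scanStar`, `scanSpace`, and the `textStart` buffering field are illustrative assumptions, not the PR's actual code. The real lexer has many more scanners and a separate Unicode emoji path.

```typescript
type Kind = 'TEXT' | 'ASTERISK' | 'WHITESPACE' | 'EOF';

interface Token {
  kind: Kind;
  value: string;
  start: number;
}

interface ScanContext {
  input: string;
  tokens: Token[];
  textStart: number; // start of the pending plain-text run, or -1 if none
}

// A scanner consumes input at `pos` and returns the position after what it consumed.
type ScanFn = (ctx: ScanContext, pos: number) => number;

// Emit the pending plain-text run (if any) that ends just before `end`.
function flushText(ctx: ScanContext, end: number): void {
  if (ctx.textStart >= 0 && end > ctx.textStart) {
    ctx.tokens.push({ kind: 'TEXT', value: ctx.input.slice(ctx.textStart, end), start: ctx.textStart });
  }
  ctx.textStart = -1;
}

const scanStar: ScanFn = (ctx, pos) => {
  flushText(ctx, pos);
  ctx.tokens.push({ kind: 'ASTERISK', value: '*', start: pos });
  return pos + 1;
};

const scanSpace: ScanFn = (ctx, pos) => {
  flushText(ctx, pos);
  let end = pos;
  while (end < ctx.input.length && ctx.input.charCodeAt(end) === 0x20) end++;
  ctx.tokens.push({ kind: 'WHITESPACE', value: ctx.input.slice(pos, end), start: pos });
  return end;
};

// One slot per ASCII code; an empty slot means "part of a plain-text run".
const SCANNER_TABLE: Array<ScanFn | undefined> = new Array(128);
SCANNER_TABLE[0x2a] = scanStar; // '*'
SCANNER_TABLE[0x20] = scanSpace; // ' '

function tokenize(input: string): Token[] {
  const ctx: ScanContext = { input, tokens: [], textStart: -1 };
  let pos = 0;
  while (pos < input.length) {
    const code = input.charCodeAt(pos);
    const scanner = code < 128 ? SCANNER_TABLE[code] : undefined;
    if (scanner) {
      pos = scanner(ctx, pos); // scanners always move forward
    } else {
      if (ctx.textStart < 0) ctx.textStart = pos;
      pos++;
    }
  }
  flushText(ctx, pos);
  ctx.tokens.push({ kind: 'EOF', value: '', start: pos });
  return ctx.tokens;
}
```

Usage: `tokenize('hi *there*')` yields TEXT, WHITESPACE, ASTERISK, TEXT, ASTERISK, EOF. Plain characters never call a function; they only extend the pending TEXT run, which is flushed when a scanner fires or the input ends.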

Performance vs. current parser

Input	Current Parser	Handwritten Lexer
Short text	~12k ops/s	~7.8M ops/s
Medium text	~2.8k ops/s	~816k ops/s
Long text	~234 ops/s	~64k ops/s

The lexer alone is roughly 270–650× faster than the full current parser. The future parser will add AST-construction overhead, but it
operates on the pre-tokenized stream and never re-scans raw input.
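The speedup range can be checked directly from the table: divide the lexer's throughput by the current parser's for each row. Using the approximate figures above:

```typescript
// [label, current parser ops/s, handwritten lexer ops/s], approximate values from the table.
const rows: Array<[string, number, number]> = [
  ['Short text', 12_000, 7_800_000],
  ['Medium text', 2_800, 816_000],
  ['Long text', 234, 64_000],
];

// Speedup per row = lexer throughput / current parser throughput, rounded.
function speedups(data: Array<[string, number, number]>): Record<string, number> {
  const out: Record<string, number> = {};
  for (const [label, current, lexer] of data) {
    out[label] = Math.round(lexer / current);
  }
  return out;
}

console.log(speedups(rows)); // Short text: 650, Medium text: 291, Long text: 274
```

Because this compares the lexer alone against the full parser, these ratios are an upper bound on the eventual end-to-end gain rather than a prediction of it.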

Next Steps

The next stage of this project is implementing the handwritten parser that
consumes the token stream produced by the lexer.

Planned work includes:

recursive-descent parser implementation
AST construction
nested formatting resolution
full parser performance benchmarking
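For the planned recursive-descent stage, one possible shape is sketched below. It is hypothetical: the token set is reduced to TEXT, ASTERISK, UNDERSCORE, and EOF, the node names are modelled loosely on the existing message-parser AST (PLAIN_TEXT, BOLD, ITALIC), and unmatched markers are demoted to plain text by rewinding to just after the opener. The rewind re-reads tokens (never raw input), but many unmatched openers can make it worst-case quadratic; precomputing closer positions is one way to avoid that.

```typescript
type Kind = 'TEXT' | 'ASTERISK' | 'UNDERSCORE' | 'EOF';

interface Tok {
  kind: Kind;
  value: string;
}

type AstNode =
  | { type: 'PLAIN_TEXT'; value: string }
  | { type: 'BOLD' | 'ITALIC'; value: AstNode[] };

class Parser {
  private readonly toks: Tok[];
  private i = 0;

  // Assumes the stream ends with an EOF token, as the lexer appends one.
  constructor(toks: Tok[]) {
    this.toks = toks;
  }

  parse(): AstNode[] {
    // A null closer means "until EOF", which always succeeds.
    return this.inline(null) ?? [];
  }

  // Parse inline content until `closer` is consumed. Returns null if EOF comes
  // first, so the caller can demote its opener to plain text.
  private inline(closer: Kind | null): AstNode[] | null {
    const out: AstNode[] = [];
    for (;;) {
      const t = this.toks[this.i];
      if (t.kind === 'EOF') return closer === null ? out : null;
      if (t.kind === closer) {
        this.i++;
        return out;
      }
      if (t.kind === 'ASTERISK' || t.kind === 'UNDERSCORE') {
        const resume = ++this.i;
        const inner = this.inline(t.kind);
        if (inner !== null) {
          out.push({ type: t.kind === 'ASTERISK' ? 'BOLD' : 'ITALIC', value: inner });
        } else {
          this.i = resume; // no closer: rewind and keep the marker as literal text
          out.push({ type: 'PLAIN_TEXT', value: t.value });
        }
        continue;
      }
      out.push({ type: 'PLAIN_TEXT', value: t.value });
      this.i++;
    }
  }
}
```

Usage: the token stream for `*a _b_*` parses to BOLD[PLAIN_TEXT "a ", ITALIC[PLAIN_TEXT "b"]], while the stream for `*a` yields two PLAIN_TEXT nodes.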

Summary by CodeRabbit

New Features
- Introduced a new message tokenizer for improved text parsing and formatting support, enabling better handling of markdown-like syntax including headings, lists, code blocks, links, images, emojis, mentions, and KaTeX expressions.
Documentation
- Added comprehensive documentation describing the tokenizer architecture, design, and tokenization strategy.

…nd jest setup

Extract scanning logic into dedicated modules:
- blocks, code, emoji, formatting, inline, links, math, structural
- central dispatch via scanners/index.ts

dionisio-bot · 2026-03-08T19:27:24Z

Looks like this PR is not ready to merge, because of the following issues:

This PR is missing the 'stat: QA assured' label
This PR is missing the required milestone or project

Please fix the issues and try again

If you have any trouble, please check the PR guidelines

changeset-bot · 2026-03-08T19:27:24Z

⚠️ No Changeset found

Latest commit: 6f0c6f1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


CLAassistant · 2026-03-08T19:27:27Z

All committers have signed the CLA.

coderabbitai · 2026-03-08T19:27:47Z

Walkthrough

This PR introduces a comprehensive handwritten lexer for the message-parser package that tokenizes Rocket.Chat messages into a flat stream of tokens. The implementation uses a three-layer architecture with an orchestrator lexer, character-code-based scanner dispatch table, and utility helpers, supported by extensive constants (character codes, emoji, regexes) and a complete test suite.

Changes

Cohort / File(s)	Summary
Documentation & Configuration `LEXER.md`, `jest.config.ts`, `package.json`	Added lexer documentation describing architecture and workflow; updated Jest config to explicitly set `testEnvironment: 'node'`, enable `@swc/jest` transform with TypeScript and ES2020 targets; added `bench:lexer` and `bench:parser` scripts.
Core Lexer Interfaces & Types `src/index.ts`, `src/lexer/index.ts`, `src/lexer/Token.ts`, `src/lexer/ScanContext.ts`	Introduced Token, TokenKind enum, ScanContext interface, and core utility functions (makeToken, flushText, emit, consumeRun, tryEmoticon); added MAX_TOKENS safety cap; re-exported lexer entities through barrel files.
Character Classification Constants `src/lexer/constants/charCodes.ts`, `src/lexer/constants/charSets.ts`	Added 45 ASCII character code constants and six Uint8Array character classification tables (CHAR_CLASS, ESCAPABLE, MENTION_OK, SHORTCODE_OK, DIGIT_ASCII, WS_ASCII); implemented isUnicodeEmojiStart utility for emoji detection.
Emoticon & Regex Constants `src/lexer/constants/emoticons.ts`, `src/lexer/constants/regexes.ts`	Introduced emoticon trie builder, EMOTICONS list, EMOTICON_TRIE dispatch structure, and getEmoticonShortCode lookup; added six regex patterns for URL, email, phone, color, timestamp, and trailing punctuation matching.
Helper Functions `src/lexer/helpers.ts`	Added scanMentionBody and scanEmojiShortCode utilities for validating `@mention`/`@channel` names and `:emoji:` shortcodes with support for sub-addressing and character validation.
Lexer Orchestrator `src/lexer/lexer.ts`	Implemented main Lexer class with tokenize() method; orchestrates ScanContext iteration, dispatches to SCANNER_TABLE, handles Unicode emoji detection, manages text buffering and katex state, appends EOF token.
Scanner Modules `src/lexer/scanners/blocks.ts`, `src/lexer/scanners/code.ts`, `src/lexer/scanners/emoji.ts`	Added scanners for block-level tokens (headings via #, bullets via - and digits, pipes/spoilers), inline code spans and fenced blocks, and Unicode emoji sequences with modifier/ZWJ sequence handling.
Scanner Modules (continued) `src/lexer/scanners/formatting.ts`, `src/lexer/scanners/inline.ts`, `src/lexer/scanners/links.ts`	Added scanners for formatting markers (asterisk, underscore, tilde), inline tokens (emoji shortcodes, mentions, URLs, colors, phones, emoticons), and link/image syntax (brackets, parentheses, angle brackets, timestamps).
Scanner Modules (final) `src/lexer/scanners/math.ts`, `src/lexer/scanners/structural.ts`, `src/lexer/scanners/index.ts`	Added scanners for KaTeX math delimiters (`$`, `$$`, `\[`, `\(`), structural tokens (newlines, escapes, whitespace); built SCANNER_TABLE dispatch mechanism mapping 128 character codes to scanner functions.
Benchmark Suite `benchmarks/lexer.bench.ts`	Introduced lexer performance benchmark script with multiple fixture categories (Plain Text, URLs, Emoji, Mentions, Code, KaTeX, Escapes, Adversarial), throughput/latency measurements via tinybench, and formatted result output.
Comprehensive Test Suite `tests/lexer/blocks.test.ts`, `tests/lexer/code.test.ts`, `tests/lexer/formatting.test.ts`, `tests/lexer/general.test.ts`, `tests/lexer/inline.test.ts`, `tests/lexer/links.test.ts`, `tests/lexer/math.test.ts`, `tests/lexer/structural.test.ts`	Added 8 test files covering headings, lists, tasks, spoilers, inline/fenced code, formatting markers, boundary conditions, emoji handling, mentions, URLs, colors, phones, links, images, KaTeX delimiters, newlines, whitespace, escapes, and round-trip token integrity.
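The emoticon trie listed above can be illustrated with a small longest-match sketch. The `buildTrie`/`matchEmoticon` names and the emoticon-to-shortcode entries are hypothetical, and the word-boundary checks a real `tryEmoticon` would need are omitted. Keying nodes by char code keeps the lookup to one map probe per character.

```typescript
interface TrieNode {
  next: Map<number, TrieNode>;
  shortCode?: string; // set when a complete emoticon ends at this node
}

interface EmoticonMatch {
  shortCode: string;
  end: number; // exclusive end index in the input
}

function buildTrie(entries: Record<string, string>): TrieNode {
  const root: TrieNode = { next: new Map() };
  for (const [emoticon, shortCode] of Object.entries(entries)) {
    let node = root;
    for (let i = 0; i < emoticon.length; i++) {
      const code = emoticon.charCodeAt(i);
      let child = node.next.get(code);
      if (!child) {
        child = { next: new Map() };
        node.next.set(code, child);
      }
      node = child;
    }
    node.shortCode = shortCode;
  }
  return root;
}

// Walk the trie from `pos`, remembering the longest complete emoticon seen.
function matchEmoticon(root: TrieNode, input: string, pos: number): EmoticonMatch | null {
  let node: TrieNode = root;
  let best: EmoticonMatch | null = null;
  for (let i = pos; i < input.length; i++) {
    const child: TrieNode | undefined = node.next.get(input.charCodeAt(i));
    if (!child) break;
    node = child;
    if (child.shortCode !== undefined) best = { shortCode: child.shortCode, end: i + 1 };
  }
  return best;
}
```

The same structure shows why a key stored with an accidental trailing space (the `>.<` issue cubic reports below) never matches unless a space actually follows it.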

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Lexer
    participant ScanContext
    participant ScannerTable
    participant Scanner
    participant TokenArray

    Caller->>Lexer: tokenize()
    Lexer->>ScanContext: create context<br/>(input, len, tokens[])
    
    loop for each character position
        Lexer->>Lexer: check CHAR_CLASS<br/>(fast path for ASCII)
        alt ASCII special char
            Lexer->>ScannerTable: lookup scanner<br/>for charCode
            ScannerTable->>Scanner: dispatch
            Scanner->>ScanContext: flushText()<br/>emit token
            Scanner-->>Lexer: return next pos
        else Unicode emoji
            Lexer->>Scanner: scanUnicodeEmoji
            Scanner->>ScanContext: emit EMOJI token
            Scanner-->>Lexer: return next pos
        else Plain text
            Lexer->>Lexer: accumulate text
        end
    end
    
    Lexer->>ScanContext: flushText()<br/>(remaining)
    Lexer->>ScanContext: emit EOF token
    Lexer->>TokenArray: return tokens[]
    TokenArray-->>Caller: flat token stream

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 70.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main addition: a handwritten lexer for the message-parser package, which is the primary focus of this substantial PR.


cubic-dev-ai

12 issues found across 39 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/message-parser-handwritten/src/lexer/constants/charSets.ts">

<violation number="1" location="packages/message-parser-handwritten/src/lexer/constants/charSets.ts:63">
P2: Unicode emoji start detection omits newer `U+1FA00–U+1FAFF` range, causing modern emoji to be tokenized as TEXT instead of `EMOJI_UNICODE`.</violation>
</file>

<file name="packages/message-parser-handwritten/src/lexer/constants/emoticons.ts">

<violation number="1" location="packages/message-parser-handwritten/src/lexer/constants/emoticons.ts:95">
P2: Emoticon key has an accidental trailing space, causing `>.<` not to match unless followed by whitespace.</violation>
</file>

<file name="packages/message-parser-handwritten/src/index.ts">

<violation number="1" location="packages/message-parser-handwritten/src/index.ts:2">
P1: Public `parse` API is exported but always throws, causing deterministic runtime failure for all calls.</violation>
</file>

<file name="packages/message-parser-handwritten/src/lexer/lexer.ts">

<violation number="1" location="packages/message-parser-handwritten/src/lexer/lexer.ts:32">
P2: Lexer silently truncates when `MAX_TOKENS` is reached but still emits `EOF`, making partial tokenization look complete.</violation>
</file>

<file name="packages/message-parser-handwritten/src/lexer/scanners/inline.ts">

<violation number="1" location="packages/message-parser-handwritten/src/lexer/scanners/inline.ts:101">
P2: `scanC` emits `COLOR` without flushing pending text, which can produce overlapping or out-of-order TEXT/COLOR tokens.</violation>
</file>

<file name="packages/message-parser-handwritten/src/ast/definitions.ts">

<violation number="1" location="packages/message-parser-handwritten/src/ast/definitions.ts:204">
P2: `ASTNode`, `Types`, and other node unions are out of sync, so valid node variants are not consistently representable across exported AST typings.</violation>
</file>

<file name="packages/message-parser-handwritten/src/utils.ts">

<violation number="1" location="packages/message-parser-handwritten/src/utils.ts:240">
P2: `timestampFromHours` mixes UTC date extraction with local-time parsing, which can generate a timestamp for the wrong calendar day when `timezone` is omitted.</violation>

<violation number="2" location="packages/message-parser-handwritten/src/utils.ts:242">
P1: Using `| 0` to coerce Unix timestamps causes 32-bit overflow, producing incorrect timestamps for valid future dates.</violation>
</file>

<file name="packages/message-parser-handwritten/benchmarks/lexer.bench.ts">

<violation number="1" location="packages/message-parser-handwritten/benchmarks/lexer.bench.ts:1">
P2: Shebang uses `env` with multi-word interpreter (`npx ts-node`) without `-S`, making direct execution unreliable.</violation>
</file>

<file name="packages/message-parser-handwritten/src/lexer/scanners/emoji.ts">

<violation number="1" location="packages/message-parser-handwritten/src/lexer/scanners/emoji.ts:28">
P2: Dangling trailing ZWJ is consumed into `EMOJI_UNICODE` instead of being rolled back, producing invalid emoji token boundaries.</violation>

<violation number="2" location="packages/message-parser-handwritten/src/lexer/scanners/emoji.ts:30">
P2: ZWJ emoji scanning can advance past input length by blindly doing `i += 2` for any surrogate, allowing the scanner to return out-of-bounds positions on malformed surrogate input.</violation>

<violation number="3" location="packages/message-parser-handwritten/src/lexer/scanners/emoji.ts:35">
P2: ZWJ-joined emoji components cannot include skin-tone modifiers, causing valid emoji ZWJ sequences to be split into incorrect tokens.</violation>
</file>
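The two utils.ts findings above can be illustrated together. Bitwise operators coerce to signed 32-bit integers, so `seconds | 0` wraps negative for any time after 2038-01-19T03:14:07Z, while `Math.floor` does not. The helper below is a hypothetical sketch (its name and numeric parameters are not the PR's signature) that takes both the calendar day and the wall-clock time in one explicit offset, defaulting to UTC, so the host's local zone never enters the calculation.

```typescript
// Why `| 0` is unsafe: this value is above 2^31 - 1.
const SECONDS_2040 = Date.UTC(2040, 0, 1) / 1000; // 2208988800

// Hypothetical helper: epoch seconds for a wall-clock time "today", where both
// the day and the time are read in a fixed "+HH:MM"/"-HH:MM" offset.
function epochSecondsAt(
  hours: number,
  minutes: number,
  seconds = 0,
  timezone = '+00:00',
  now: Date = new Date(),
): number {
  const m = /^([+-])(\d{2}):(\d{2})$/.exec(timezone);
  if (!m) throw new Error('Invalid timezone offset');
  const offsetMinutes = (m[1] === '-' ? -1 : 1) * (Number(m[2]) * 60 + Number(m[3]));
  // Take the calendar day in the target offset, not in the host's local zone.
  const zoned = new Date(now.getTime() + offsetMinutes * 60_000);
  const wallClockMs = Date.UTC(zoned.getUTCFullYear(), zoned.getUTCMonth(), zoned.getUTCDate(), hours, minutes, seconds);
  // Wall-clock time = UTC + offset, so UTC = wall-clock time - offset.
  const utcMs = wallClockMs - offsetMinutes * 60_000;
  return Math.floor(utcMs / 1000); // exact for post-2038 values, unlike `| 0`
}
```

For example, at 22:00 UTC on 2040-01-01 it is already 2040-01-02 in `+05:00`, so `epochSecondsAt(10, 0, 0, '+05:00', now)` resolves to 05:00 UTC on January 2.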


Copilot

Pull request overview

This PR introduces a new message-parser-handwritten package containing Stage 1 of a two-stage parser replacement for Rocket.Chat's current PEG-based message parser. The lexer tokenizes raw message input into a flat token stream in a single linear pass using a scanner dispatch table keyed by character code.

Changes:

A complete handwritten lexer (Lexer class) with scanner dispatch table, character classification, and scanner functions for all message syntax elements (formatting, code, links, emoji, mentions, math, lists, etc.)
AST type definitions and utility functions mirroring the existing message-parser package for future parser compatibility
Comprehensive test suites covering structural tokens, formatting, code, links, math, inline elements, blocks, and general/edge-case scenarios, plus a benchmark suite

Reviewed changes

Copilot reviewed 38 out of 39 changed files in this pull request and generated 3 comments.


File	Description
`package.json`	New package definition with build scripts, dependencies, and metadata
`src/lexer/lexer.ts`	Main lexer loop with fast-lane text path, Unicode path, and scanner dispatch
`src/lexer/Token.ts`	Token kind enum and Token interface/factory
`src/lexer/ScanContext.ts`	Scan context interface and shared utilities (flushText, emit, consumeRun, tryEmoticon)
`src/lexer/helpers.ts`	Helpers for scanning mention bodies and emoji shortcodes
`src/lexer/scanners/index.ts`	Scanner dispatch table mapping char codes to scanner functions
`src/lexer/scanners/structural.ts`	Scanners for newlines, escapes, and whitespace
`src/lexer/scanners/formatting.ts`	Scanners for asterisk, underscore, and tilde emphasis markers
`src/lexer/scanners/code.ts`	Scanners for inline code and fenced code blocks
`src/lexer/scanners/math.ts`	Scanner for $ and $$ KaTeX delimiters
`src/lexer/scanners/links.ts`	Scanners for angle brackets, markdown links, images, and timestamps
`src/lexer/scanners/blocks.ts`	Scanners for headings, list bullets, task bullets, and pipes/spoilers
`src/lexer/scanners/inline.ts`	Scanners for colons (emoji/URL), @mentions, +phone, color tokens, and emoticons
`src/lexer/scanners/emoji.ts`	Unicode emoji scanner with ZWJ/variation selector/skin tone support
`src/lexer/constants/charCodes.ts`	Named character code constants
`src/lexer/constants/charSets.ts`	Character classification lookup tables and emoji start detection
`src/lexer/constants/emoticons.ts`	Emoticon trie and lookup table
`src/lexer/constants/regexes.ts`	Sticky regexes for URLs, emails, phones, colors, timestamps
`src/ast/definitions.ts`	AST type definitions (mirrored from existing parser)
`src/ast/guards.ts`	Type guard utility for AST nodes
`src/utils.ts`	AST node factory utilities (mirrored from existing parser)
`src/index.ts`	Placeholder public API (throws "Not implemented")
`tests/*.test.ts`	Comprehensive test suites for all lexer features
`benchmarks/lexer.bench.ts`	Performance benchmark suite
Config files	tsconfig, webpack, jest, eslint, gitignore configurations
`README.md`	Architecture documentation and performance benchmarks


coderabbitai

Actionable comments posted: 14

🧹 Nitpick comments (7)

packages/message-parser-handwritten/src/lexer/Token.ts (1)
2-67: Remove the implementation comments from this module.

The grouping is already clear from the names and spacing, and this package guideline asks us to keep implementation files comment-free.

As per coding guidelines, **/*.{ts,tsx,js}: Avoid code comments in the implementation.

Also applies to: 73-73
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser-handwritten/src/lexer/Token.ts` around lines 2 - 67,
Remove the inline implementation/grouping comments and any other non-functional
comments from this enum file; leave only the token identifiers (e.g.,
TRIPLE_BACKTICK, ASTERISK, BACKTICK, KATEX_INLINE_START, IMAGE_OPEN,
MENTION_USER, EMOJI_SHORTCODE, TIMESTAMP, UL_BULLET, COLOR, URL, WHITESPACE,
ESCAPED, TEXT, CODE_CONTENT, EOF) and their values intact, preserving spacing
for readability—ensure no comment lines remain anywhere in Token.ts.
packages/message-parser-handwritten/src/lexer/constants/charCodes.ts (1)
1-43: Drop the inline character annotations from this constants table.

These comments just restate the constant names/literals and violate the repo rule to keep implementation files comment-free.

As per coding guidelines, "Avoid code comments in the implementation."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser-handwritten/src/lexer/constants/charCodes.ts` around
lines 1 - 43, Remove all inline trailing comments in the char code constants
table (e.g., the // \t, // A, // z annotations) so the file contains only the
exported constants (for symbols like CH_TAB, CH_LF, CH_SPACE, CH_A_UP, CH_A_LO,
CH_Z_LO, CH_PIPE, CH_TILDE, etc.); keep the constant names and numeric values
unchanged and do not add new comments or explanatory text in this implementation
file.
packages/message-parser-handwritten/src/lexer/lexer.ts (1)
7-8: Remove code comment per coding guidelines.

As per coding guidelines: "Avoid code comments in the implementation" for TypeScript files. The constant name MAX_TOKENS is self-documenting.
Suggested fix
-// Safety cap to avoid runaway tokenization on malformed input.
 const MAX_TOKENS = 4096;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser-handwritten/src/lexer/lexer.ts` around lines 7 - 8,
Remove the inline code comment above the MAX_TOKENS constant; the constant name
MAX_TOKENS is self-documenting so delete the comment line "// Safety cap to
avoid runaway tokenization on malformed input." and leave the const MAX_TOKENS =
4096; declaration unchanged.
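Related to cubic's finding that hitting `MAX_TOKENS` still appends `EOF`: one option, sketched here under assumptions, is to report truncation explicitly and keep the unscanned tail as a single TEXT token so no input is silently dropped. The one-token-per-character scanner is a stand-in, and `TokenizeResult`/`tokenizeCapped` are hypothetical names.

```typescript
const MAX_TOKENS = 4096;

interface Tok {
  kind: 'TEXT' | 'EOF';
  value: string;
}

interface TokenizeResult {
  tokens: Tok[];
  truncated: boolean; // true when the cap stopped scanning before the end of input
}

function tokenizeCapped(input: string, maxTokens: number = MAX_TOKENS): TokenizeResult {
  const tokens: Tok[] = [];
  let pos = 0;
  let truncated = false;
  while (pos < input.length) {
    if (tokens.length >= maxTokens) {
      // Keep the unscanned tail as one TEXT token instead of dropping it.
      tokens.push({ kind: 'TEXT', value: input.slice(pos) });
      truncated = true;
      break;
    }
    // Stand-in scanner: one TEXT token per UTF-16 code unit.
    tokens.push({ kind: 'TEXT', value: input.charAt(pos) });
    pos++;
  }
  tokens.push({ kind: 'EOF', value: '' });
  return { tokens, truncated };
}
```

With a cap of 3, `'abcdef'` produces `a`, `b`, `c`, `def`, EOF and `truncated: true`; the caller (or the future parser) can then decide whether to fall back to plain text.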
packages/message-parser-handwritten/src/lexer/scanners/index.ts (1)
1-1: Remove code comment per coding guidelines.

As per coding guidelines: "Avoid code comments in the implementation" for TypeScript files.
Suggested fix
-// Scanner dispatch table - maps char codes to scanner functions
-
 import { ScanFn } from '../ScanContext';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser-handwritten/src/lexer/scanners/index.ts` at line 1,
Remove the implementation-level comment "// Scanner dispatch table - maps char
codes to scanner functions" at the top of the file; per TypeScript coding
guidelines, avoid inline code comments in implementation files—leave the file
as-is otherwise and ensure no other standalone implementation comments remain in
this module (e.g., referencing the scanner dispatch table).
packages/message-parser-handwritten/src/lexer/scanners/math.ts (1)
1-1: Remove code comment per coding guidelines.

As per coding guidelines: "Avoid code comments in the implementation" for TypeScript files.
Suggested fix
-// $ and $$ for KaTeX math - tracks open/close state
-
 import { ScanContext, flushText, emit } from '../ScanContext';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser-handwritten/src/lexer/scanners/math.ts` at line 1,
Remove the inline code comment at the top of math.ts ("// $ and $$ for KaTeX
math - tracks open/close state") to comply with the TypeScript coding guideline
against implementation comments; simply delete that comment line in
packages/message-parser-handwritten/src/lexer/scanners/math.ts so the file
contains only code and no implementation comments.
packages/message-parser-handwritten/tests/code.test.ts (1)
1-9: Extract the lexer test helpers into one shared module.

tok / kinds / kv / raw are duplicated across the lexer suites, so EOF filtering and tuple shaping can drift the next time Token changes. A small tests/utils.ts would keep those assertions consistent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser-handwritten/tests/code.test.ts` around lines 1 - 9,
Extract the duplicated helper functions tok, kinds, kv, and raw into a single
shared test helper module (e.g., tests/utils.ts) and update test files to import
them instead of redefining; move the EOF filtering and tuple shaping logic (uses
of Lexer and TokenKind.EOF) into that module so all lexer test suites use the
same tok(s: string) => new Lexer(s).tokenize() and the common kinds/kv/raw
wrappers, ensuring TokenKind and Lexer are imported there and exported helpers
are reused across tests.
packages/message-parser-handwritten/src/lexer/scanners/blocks.ts (1)
56-70: Potential out-of-bounds access when checking task bullet syntax.

When accessing input[pos + 3] on line 60, if the input string is shorter than pos + 4, this returns undefined. While the subsequent comparisons would fail gracefully, accessing input.charCodeAt(pos + 4) and input.charCodeAt(pos + 5) on lines 63-64 would return NaN, which also fails the comparison safely. However, for defensive coding, consider adding a length check.
Suggested defensive check
         if (
             next === CH_SPACE &&
             input.charCodeAt(pos + 2) === CH_LBRACKET
         ) {
+            if (pos + 5 >= input.length) {
+                // Not enough characters for task bullet
+            } else {
             const flag = input[pos + 3];
             if (
                 (flag === 'x' || flag === ' ') &&
                 input.charCodeAt(pos + 4) === CH_RBRACKET &&
                 input.charCodeAt(pos + 5) === CH_SPACE
             ) {
                 flushText(ctx, pos);
                 const raw = `- [${flag}] `;
                 emit(ctx, TokenKind.TASK_BULLET, raw, flag, pos);
                 return pos + raw.length;
             }
+            }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser-handwritten/src/lexer/scanners/blocks.ts` around
lines 56 - 70, The task-bullet detection in scanners/blocks.ts reads input[pos +
3], pos + 4 and pos + 5 without ensuring the input has that many characters,
risking out-of-bounds reads; update the conditional around TokenKind.TASK_BULLET
detection to first check that input.length >= pos + 6 (or equivalent) before
accessing input[pos + 3], input.charCodeAt(pos + 4) and input.charCodeAt(pos +
5), then proceed to call flushText(ctx, pos) and emit(ctx,
TokenKind.TASK_BULLET, ...) only when the length check and existing character
checks pass.
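The length guard can also be written as a single up-front check. A minimal standalone sketch, assuming the same `- [x] ` / `- [ ] ` shape as the suggested fix; `matchTaskBullet` is a hypothetical name, and the real scanner would additionally call `flushText` and `emit`:

```typescript
// Returns the task-bullet match at `pos`, or null.
function matchTaskBullet(input: string, pos: number): { raw: string; flag: 'x' | ' ' } | null {
  // A task bullet is exactly six code units: '-', ' ', '[', flag, ']', ' '.
  if (pos + 6 > input.length) return null;
  if (input.charCodeAt(pos) !== 0x2d || input.charCodeAt(pos + 1) !== 0x20 || input.charCodeAt(pos + 2) !== 0x5b) {
    return null;
  }
  const flag = input.charAt(pos + 3);
  if (flag !== 'x' && flag !== ' ') return null;
  if (input.charCodeAt(pos + 4) !== 0x5d || input.charCodeAt(pos + 5) !== 0x20) return null;
  return { raw: input.slice(pos, pos + 6), flag };
}
```

As the nitpick itself notes, the original reads already fail safely (out-of-range indexing gives `undefined`, `charCodeAt` gives `NaN`), so this is a readability change rather than a bug fix.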

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/message-parser-handwritten/benchmarks/lexer.bench.ts`:
- Around line 1-11: The package is missing tinybench in its devDependencies
causing the import of "Bench" and "Task" from 'tinybench' in
benchmarks/lexer.bench.ts to fail in isolated installs; add an explicit
devDependency entry for "tinybench" (choose a compatible version, e.g. a caret
range) to packages/message-parser-handwritten/package.json and run an install
(or yarn workspace install) so the benchmark file importing Bench/Task resolves
in CI and local isolated environments.

In `@packages/message-parser-handwritten/package.json`:
- Around line 5-12: Update the package metadata in package.json so homepage,
bugs.url, and repository.url point to the Rocket.Chat monorepo instead of
RocketChat/fuselage: change "homepage" to the Rocket.Chat repo readme URL, set
"bugs.url" to the Rocket.Chat issues URL, and update "repository.url" to the
git+https URL for the Rocket.Chat repository while preserving
"repository.directory" (package directory) as
"packages/message-parser-handwritten"; ensure the values replace the current
fuselage references everywhere they appear.

In `@packages/message-parser-handwritten/README.md`:
- Around line 45-55: Add the language specifier "text" to both fenced code
blocks in README so markdownlint stops flagging them; edit the first fenced
block containing "# Hello **world** :smile: `@john`" and the second block
containing the token sequence (HEADING_MARKER, WS, TEXT, ASTERISK,
EMOJI_SHORTCODE, MENTION_USER, EOF) and prefix each opening ``` with "text"
(i.e., ```text).

In `@packages/message-parser-handwritten/src/index.ts`:
- Around line 1-3: The exported parse function in index.ts currently throws and
echoes raw input; either remove its public export so the package only exposes
the lexer (keep parse as an internal/non-exported function and export the
lexer-only entrypoint instead), or if parse must remain exported temporarily,
replace the thrown Error with a static, non-sensitive placeholder (e.g. throw new
Error("Parsing not implemented")) and remove any interpolation of the input;
update references to the exported symbol accordingly.

In `@packages/message-parser-handwritten/src/lexer/constants/charSets.ts`:
- Around line 51-64: isUnicodeEmojiStart currently treats the broad BMP range
0x2300–0x27BF as emoji which lets non-emoji chars (e.g. U+2500 box-drawing) be
routed to scanUnicodeEmoji and emitted as EMOJI_UNICODE; narrow the check by
either (A) replacing the single range test with explicit emoji BMP ranges (e.g.
0x2600–0x26FF and 0x2700–0x27BF) or (preferred) perform a Unicode property test
inside isUnicodeEmojiStart (e.g. use a /\p{Emoji}/u test on input.charAt(pos))
so only true emoji return true, and ensure scanUnicodeEmoji/EMOJI_UNICODE are
unchanged otherwise.
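One caveat on the `/\p{Emoji}/u` suggestion: `input.charAt(pos)` returns a single UTF-16 code unit, so for astral emoji (including the `U+1FA00–U+1FAFF` range cubic flags) it yields a lone high surrogate and the test fails; `\p{Emoji}` also matches ASCII digits, `#`, and `*`. A sketch that reads the full code point and uses `Extended_Pictographic` instead, assuming regional-indicator flag halves should also count as emoji starts:

```typescript
// \p{Extended_Pictographic} excludes ASCII digits, '#' and '*', which \p{Emoji} includes.
const PICTOGRAPHIC = new RegExp('\\p{Extended_Pictographic}', 'u');
const REGIONAL_INDICATOR_FIRST = 0x1f1e6;
const REGIONAL_INDICATOR_LAST = 0x1f1ff;

function isUnicodeEmojiStart(input: string, pos: number): boolean {
  // codePointAt combines a valid surrogate pair into one astral code point.
  const cp = input.codePointAt(pos);
  if (cp === undefined || cp < 0x80) return false; // ASCII is left to the scanner table
  if (cp >= REGIONAL_INDICATOR_FIRST && cp <= REGIONAL_INDICATOR_LAST) return true; // flag halves
  return PICTOGRAPHIC.test(String.fromCodePoint(cp));
}
```

Box-drawing characters such as `U+2500` then fall through to plain text, while newer pictographs match without a hand-maintained range list, subject to the Unicode version of the runtime's regex engine.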

In `@packages/message-parser-handwritten/src/lexer/scanners/code.ts`:
- Around line 36-38: The comment and call for the unclosed fence are
misindented; align the comment "// unclosed fence" and the emitCodeBody(ctx,
bodyStart, pos); statement with the surrounding function block indentation
(match the other statements in the function that use the same indentation level)
so the comment and the emitCodeBody invocation sit at the same indent as nearby
statements handling code bodies (referencing emitCodeBody, ctx, bodyStart, pos).

In `@packages/message-parser-handwritten/src/lexer/scanners/emoji.ts`:
- Around line 10-11: The surrogate-range check around the input advance is too
broad: change the condition that uses c1 to only detect high surrogates
(0xD800–0xDBFF) so you only advance by 2 when the first code unit is a valid
high surrogate; leave advancement as 1 for everything else (including standalone
low surrogates). Update the logic in emoji.ts where c1 is read (variable c1 and
the subsequent advance) to use the precise high-surrogate range check instead of
0xD800–0xDFFF to avoid incorrectly treating lone low surrogates as part of a
surrogate pair.
- Around line 25-37: The ZWJ sequence loop in emoji.ts currently treats any
surrogate (0xd800–0xdfff) as a high surrogate; update the surrogate pair check
inside the while loop (the branch that inspects the variable `next`) to only
treat high surrogates (0xD800–0xDBFF) as the start of a surrogate pair, leaving
low surrogates (0xDC00–0xDFFF) to be handled as malformed input; adjust the
conditional that now reads `if (next >= 0xd800 && next <= 0xdfff)` to use the
correct high-surrogate range and keep the rest of the ZWJ/VS16 handling intact.
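The emoji.ts issues raised by both reviewers (surrogate range, bounds, skin tones inside ZWJ sequences, dangling ZWJ) can be handled together. A hypothetical sketch, not the PR's `scanUnicodeEmoji`: only a high surrogate followed by a low surrogate advances by two, each component may carry VS16 and a skin-tone modifier, and a ZWJ joins only when another pictograph follows, so a trailing ZWJ stays outside the token.

```typescript
const ZWJ = 0x200d;
const VS16 = 0xfe0f;
const PICTOGRAPHIC = new RegExp('\\p{Extended_Pictographic}', 'u');

function isHigh(c: number): boolean {
  return c >= 0xd800 && c <= 0xdbff;
}

function isLow(c: number): boolean {
  return c >= 0xdc00 && c <= 0xdfff;
}

// UTF-16 length of the code point at `i`: 2 only for a complete surrogate pair.
function unitLength(input: string, i: number): number {
  return isHigh(input.charCodeAt(i)) && i + 1 < input.length && isLow(input.charCodeAt(i + 1)) ? 2 : 1;
}

// Skin-tone modifiers U+1F3FB..U+1F3FF (always a full surrogate pair when matched).
function isSkinTone(input: string, i: number): boolean {
  const cp = input.codePointAt(i);
  return cp !== undefined && cp >= 0x1f3fb && cp <= 0x1f3ff;
}

// One component: base code point, optional VS16, optional skin-tone modifier.
function scanComponent(input: string, i: number): number {
  i += unitLength(input, i);
  if (i < input.length && input.charCodeAt(i) === VS16) i++;
  if (i < input.length && isSkinTone(input, i)) i += 2;
  return i;
}

// Exclusive end of the emoji sequence at `pos`; for pos < input.length it never exceeds input.length.
function scanEmojiSequence(input: string, pos: number): number {
  let end = scanComponent(input, pos);
  while (end + 1 < input.length && input.charCodeAt(end) === ZWJ) {
    const cp = input.codePointAt(end + 1);
    // A ZWJ joins only if another pictograph follows; otherwise it is left out.
    if (cp === undefined || !PICTOGRAPHIC.test(String.fromCodePoint(cp))) break;
    end = scanComponent(input, end + 1);
  }
  return end;
}
```

For 👩🏽‍💻 the sequence spans all seven UTF-16 code units, and a trailing ZWJ or lone high surrogate at the end of a message no longer pushes the position out of bounds.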

In `@packages/message-parser-handwritten/src/lexer/scanners/inline.ts`:
- Around line 96-104: The COLOR token path is missing a call to flushText before
emitting, causing prior accumulated text to be merged incorrectly; update the
inline scanner so that when input.startsWith('color:#', pos) and COLOR_RE
matches, you call flushText(ctx, pos) immediately before emit(ctx,
TokenKind.COLOR, m[0], m[1], pos), then return pos + m[0].length; reference the
functions flushText and emit, the variables ctx, pos, input, and the regex
COLOR_RE/TokenKind.COLOR to locate and modify the code.
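Why the missing `flushText` matters: pending plain text must be emitted before the COLOR token, or the token array stops reflecting source order. A minimal sketch of the fixed path; the context shape, the `-1` return convention, and this `COLOR_RE` pattern are assumptions, since the PR's actual regex may differ:

```typescript
interface Tok {
  kind: 'TEXT' | 'COLOR';
  value: string;
  start: number;
}

interface Ctx {
  input: string;
  tokens: Tok[];
  textStart: number; // start of pending plain text, or -1
}

// Hypothetical sticky pattern for "color:#rgb" and "color:#rrggbb".
const COLOR_RE = new RegExp('color:#([0-9a-fA-F]{6}|[0-9a-fA-F]{3})\\b', 'y');

function flushText(ctx: Ctx, end: number): void {
  if (ctx.textStart >= 0 && end > ctx.textStart) {
    ctx.tokens.push({ kind: 'TEXT', value: ctx.input.slice(ctx.textStart, end), start: ctx.textStart });
  }
  ctx.textStart = -1;
}

// Returns the position after the color token, or -1 if none starts at `pos`.
function scanColor(ctx: Ctx, pos: number): number {
  COLOR_RE.lastIndex = pos;
  const m = COLOR_RE.exec(ctx.input);
  if (!m) return -1;
  flushText(ctx, pos); // pending TEXT first, so tokens stay in source order
  ctx.tokens.push({ kind: 'COLOR', value: m[1], start: pos });
  return pos + m[0].length;
}
```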

In `@packages/message-parser-handwritten/src/lexer/scanners/structural.ts`:
- Around line 20-23: The branches in structural.ts that handle escaped KaTeX
delimiters emit KATEX tokens but do not toggle the parser state, so add the same
ctx flag updates as scanDollar(): after emitting KATEX_BLOCK_START set
ctx.katexBlockOpen = true; after emitting KATEX_BLOCK_END set ctx.katexBlockOpen
= false; after emitting KATEX_INLINE_START set ctx.katexInlineOpen = true; after
emitting KATEX_INLINE_END set ctx.katexInlineOpen = false; keep the existing
flushText(ctx, pos) and emit(...) calls (TokenKind.KATEX_BLOCK_START /
KATEX_BLOCK_END / KATEX_INLINE_START / KATEX_INLINE_END) and return the same pos
+ 2 values so escaped delimiters enter/exit KaTeX mode correctly.
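A table-driven version of the escaped-delimiter branch keeps the KaTeX flags in sync with the emitted token. This is a hypothetical sketch: the flag and token names come from the comment above, while `ESCAPED_DELIMS`, `scanEscapedKatex`, and the `-1` return convention are illustrative, and text flushing is left out for brevity.

```typescript
type KatexKind = 'KATEX_BLOCK_START' | 'KATEX_BLOCK_END' | 'KATEX_INLINE_START' | 'KATEX_INLINE_END';
type KatexFlag = 'katexBlockOpen' | 'katexInlineOpen';

interface KatexCtx {
  tokens: Array<{ kind: KatexKind; start: number }>;
  katexBlockOpen: boolean;
  katexInlineOpen: boolean;
}

// Character after '\' -> token to emit and the state it leaves behind.
const ESCAPED_DELIMS: Record<string, { kind: KatexKind; flag: KatexFlag; open: boolean } | undefined> = {
  '[': { kind: 'KATEX_BLOCK_START', flag: 'katexBlockOpen', open: true },
  ']': { kind: 'KATEX_BLOCK_END', flag: 'katexBlockOpen', open: false },
  '(': { kind: 'KATEX_INLINE_START', flag: 'katexInlineOpen', open: true },
  ')': { kind: 'KATEX_INLINE_END', flag: 'katexInlineOpen', open: false },
};

// Returns pos + 2 for \[, \], \( or \) at `pos`; otherwise -1.
function scanEscapedKatex(ctx: KatexCtx, input: string, pos: number): number {
  if (input.charCodeAt(pos) !== 0x5c) return -1; // not a backslash
  const d = ESCAPED_DELIMS[input.charAt(pos + 1)];
  if (!d) return -1;
  ctx.tokens.push({ kind: d.kind, start: pos });
  ctx[d.flag] = d.open; // same bookkeeping scanDollar does for $ and $$
  return pos + 2;
}
```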

In `@packages/message-parser-handwritten/src/lexer/Token.ts`:
- Around line 1-68: Replace the exported const enum TokenKind with a safe public
representation (either a plain exported enum TokenKind or an exported as-const
object + union type) so it no longer emits a const enum in the .d.ts surface;
update any usages to match the chosen representation. Also remove all
implementation/section comments inside this file (the large block comments
around token groups and the single field comment) so only the token declarations
remain. Ensure the exported symbol name TokenKind is preserved and all
references compile after the change.
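A sketch of the as-const alternative, showing only a handful of the token names from the file; string values are an assumption chosen because they make token dumps readable. A plain object plus a derived union type keeps `TokenKind.EOF`-style access and the `TokenKind` type name, while avoiding a `const enum` in the published `.d.ts`, which consumers compiling with `isolatedModules` cannot reference.

```typescript
// Runtime object: survives transpile-only builds and can be iterated or logged.
const TokenKind = {
  TEXT: 'TEXT',
  ASTERISK: 'ASTERISK',
  BACKTICK: 'BACKTICK',
  URL: 'URL',
  EOF: 'EOF',
} as const;

// Type with the same name: the union of all values ('TEXT' | 'ASTERISK' | ...).
type TokenKind = (typeof TokenKind)[keyof typeof TokenKind];

interface Token {
  kind: TokenKind;
  value: string;
}

function isEof(t: Token): boolean {
  return t.kind === TokenKind.EOF;
}
```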

In `@packages/message-parser-handwritten/src/utils.ts`:
- Around line 237-270: timestampFromHours() and timestampFromIsoTime() truncate
with bitwise "| 0", which coerces the result to a signed 32-bit integer and
overflows for dates after 2038-01-19, and they do not build the epoch seconds in
the correct timezone. In both functions, replace the "| 0" truncation with
Math.floor(.../1000). Compute the timestamp in UTC: for timestampFromHours, call
Date.UTC with the current date's UTC year/month/day; for timestampFromIsoTime,
parse year/month/day/hours/minutes/seconds/milliseconds into numeric components
and call Date.UTC. When a timezone string is provided, convert "+HH:MM"/"-HH:MM"
into minutes and subtract that offset from the UTC ms value (a wall-clock time
at +02:00 is two hours ahead of UTC). Both functions then produce correct epoch
seconds without the 2038 overflow and without local/UTC date mismatches.
- Around line 83-97: autoLink currently treats inputs like "javascript://..." as
links because tldParse and isValidLink allow non-HTTP schemes. Update autoLink
to reject any src whose explicit scheme is something other than "http:" or
"https:", while still allowing protocol-relative "//" URLs and inputs that carry
no scheme at all. Concretely, add a guard at the top of autoLink that returns
plain(src) when src has a non-HTTP(S) scheme (use a simple scheme check on the
raw src string or URL parsing), then proceed to compute href and call link;
reference autoLink, tldParse, and isValidLink when making the change.
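A sketch of the scheme guard for autoLink. `hasAllowedScheme` is a hypothetical helper; the regex follows the RFC 3986 scheme grammar, and the normalization step mirrors how the WHATWG URL parser treats tabs, newlines, and leading control characters. Inputs without an explicit scheme, such as bare domains, fall through to the existing tldParse/isValidLink checks. One trade-off: a bare `host:port` string like `localhost:3000` also matches the scheme grammar and is rejected, which errs on the safe side.

```typescript
// RFC 3986 scheme grammar: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":".
const SCHEME_RE = /^([a-z][a-z0-9+.-]*):/i;
const ALLOWED_SCHEMES = new Set(['http', 'https']);

// The WHATWG URL parser ignores ASCII tab/newline anywhere and strips leading
// C0 controls or spaces, so normalize the same way before checking the scheme.
const normalizeForSchemeCheck = (src: string): string => src.replace(/[\t\n\r]/g, '').replace(/^[\u0000-\u0020]+/, '');

// True for protocol-relative URLs, inputs without an explicit scheme, and http(s) URLs.
export function hasAllowedScheme(src: string): boolean {
	const normalized = normalizeForSchemeCheck(src);
	if (normalized.startsWith('//')) return true;
	const match = SCHEME_RE.exec(normalized);
	if (!match) return true;
	return ALLOWED_SCHEMES.has(match[1].toLowerCase());
}

// Inside autoLink (sketch): if (!hasAllowedScheme(src)) return plain(src);
```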

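For the timestampFromHours()/timestampFromIsoTime() item above, a sketch of the UTC-based computation. The signatures are hypothetical (the real functions in utils.ts may take and return strings), `parseOffsetMinutes` and `toEpochSeconds` are illustrative helpers, an ISO string without an offset is treated as UTC, and the accepted ISO shape (`YYYY-MM-DDTHH:MM[:SS[.sss]]` plus an optional `Z` or `+HH:MM`/`-HH:MM`) is an assumption.

```typescript
const OFFSET_RE = /^([+-])(\d{2}):?(\d{2})$/;
const ISO_RE = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})(?::(\d{2})(?:\.(\d{1,3}))?)?(Z|z|[+-]\d{2}:?\d{2})?$/;
const MS_PER_MINUTE = 60000;

// "+HH:MM" / "-HH:MM" -> minutes east of UTC; "Z" and unrecognized input count as UTC.
export function parseOffsetMinutes(tz: string): number {
	const m = OFFSET_RE.exec(tz);
	if (!m) return 0;
	const minutes = Number(m[2]) * 60 + Number(m[3]);
	return m[1] === '-' ? -minutes : minutes;
}

// A wall-clock time at +HH:MM is that far ahead of UTC, so subtract the offset.
// Math.floor keeps full double precision, unlike `| 0`, which wraps at 2^31.
function toEpochSeconds(utcMs: number, tz?: string): number {
	const offset = tz ? parseOffsetMinutes(tz) : 0;
	return Math.floor((utcMs - offset * MS_PER_MINUTE) / 1000);
}

export function timestampFromHours(hours: number, minutes: number, seconds = 0, tz?: string, now: Date = new Date()): number {
	const utcMs = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), hours, minutes, seconds);
	return toEpochSeconds(utcMs, tz);
}

export function timestampFromIsoTime(iso: string): number | undefined {
	const m = ISO_RE.exec(iso);
	if (!m) return undefined;
	const [, year, month, day, hours, minutes, seconds = '0', fraction = '0', tz] = m;
	// ".5" means 500 ms, so right-pad the fraction to three digits.
	const millis = Number((fraction + '00').slice(0, 3));
	const utcMs = Date.UTC(Number(year), Number(month) - 1, Number(day), Number(hours), Number(minutes), Number(seconds), millis);
	return toEpochSeconds(utcMs, tz);
}
```

With the old `| 0`, a 2040 timestamp would wrap to a negative number; `Math.floor` returns it unchanged.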
---

Nitpick comments:
In `@packages/message-parser-handwritten/src/lexer/constants/charCodes.ts`:
- Around line 1-43: Remove all inline trailing comments in the char code
constants table (e.g., the // \t, // A, // z annotations) so the file contains
only the exported constants (for symbols like CH_TAB, CH_LF, CH_SPACE, CH_A_UP,
CH_A_LO, CH_Z_LO, CH_PIPE, CH_TILDE, etc.); keep the constant names and numeric
values unchanged and do not add new comments or explanatory text in this
implementation file.

In `@packages/message-parser-handwritten/src/lexer/lexer.ts`:
- Around line 7-8: Remove the inline code comment above the MAX_TOKENS constant;
the constant name MAX_TOKENS is self-documenting so delete the comment line "//
Safety cap to avoid runaway tokenization on malformed input." and leave the
const MAX_TOKENS = 4096; declaration unchanged.

In `@packages/message-parser-handwritten/src/lexer/scanners/blocks.ts`:
- Around line 56-70: The task-bullet detection in scanners/blocks.ts reads
input[pos + 3], input.charCodeAt(pos + 4), and input.charCodeAt(pos + 5) without
first checking that the input is long enough. JavaScript returns undefined/NaN
for such reads instead of throwing, so the comparisons currently just fail, but
the guard relies on that implicitly. Update the conditional around
TokenKind.TASK_BULLET detection to check input.length >= pos + 6 (or
equivalent) before those reads, and call flushText(ctx, pos) and
emit(ctx, TokenKind.TASK_BULLET, ...) only when the length check and the
existing character checks pass.
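A minimal sketch of the bounded check, assuming the task bullet is the six-character prefix `- [ ] ` or `- [x] ` with the dash at `pos`. `isTaskBullet` and the uppercase `X` variant are assumptions, not necessarily what blocks.ts accepts.

```typescript
const CH_SPACE = 32;
const CH_DASH = 45;
const CH_LBRACKET = 91;
const CH_RBRACKET = 93;
const TASK_PREFIX_LENGTH = 6; // "- [x] "

export function isTaskBullet(input: string, pos: number): boolean {
	// Bail out before reading pos + 3 .. pos + 5 if the prefix cannot fit.
	if (input.length < pos + TASK_PREFIX_LENGTH) return false;
	if (input.charCodeAt(pos) !== CH_DASH || input.charCodeAt(pos + 1) !== CH_SPACE || input.charCodeAt(pos + 2) !== CH_LBRACKET) {
		return false;
	}
	const mark = input[pos + 3];
	return (mark === ' ' || mark === 'x' || mark === 'X') && input.charCodeAt(pos + 4) === CH_RBRACKET && input.charCodeAt(pos + 5) === CH_SPACE;
}
```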

In `@packages/message-parser-handwritten/src/lexer/scanners/index.ts`:
- Line 1: Remove the implementation-level comment "// Scanner dispatch table -
maps char codes to scanner functions" at the top of the file; per TypeScript
coding guidelines, avoid inline code comments in implementation files—leave the
file as-is otherwise and ensure no other standalone implementation comments
remain in this module (e.g., referencing the scanner dispatch table).

In `@packages/message-parser-handwritten/src/lexer/scanners/math.ts`:
- Line 1: Remove the inline code comment at the top of math.ts ("// $ and $$ for
KaTeX math - tracks open/close state") to comply with the TypeScript coding
guideline against implementation comments; simply delete that comment line in
packages/message-parser-handwritten/src/lexer/scanners/math.ts so the file
contains only code and no implementation comments.

In `@packages/message-parser-handwritten/src/lexer/Token.ts`:
- Around line 2-67: Remove the inline implementation/grouping comments and any
other non-functional comments from this enum file; leave only the token
identifiers (e.g., TRIPLE_BACKTICK, ASTERISK, BACKTICK, KATEX_INLINE_START,
IMAGE_OPEN, MENTION_USER, EMOJI_SHORTCODE, TIMESTAMP, UL_BULLET, COLOR, URL,
WHITESPACE, ESCAPED, TEXT, CODE_CONTENT, EOF) and their values intact,
preserving spacing for readability—ensure no comment lines remain anywhere in
Token.ts.

In `@packages/message-parser-handwritten/tests/code.test.ts`:
- Around line 1-9: Extract the duplicated helper functions tok, kinds, kv, and
raw into a single shared test helper module (e.g., tests/utils.ts) and update
test files to import them instead of redefining; move the EOF filtering and
tuple shaping logic (uses of Lexer and TokenKind.EOF) into that module so all
lexer test suites use the same tok(s: string) => new Lexer(s).tokenize() and the
common kinds/kv/raw wrappers, ensuring TokenKind and Lexer are imported there
and exported helpers are reused across tests.
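A sketch of such a shared module. To stay self-contained it takes the tokenizer and the EOF kind as parameters; `createLexerTestHelpers`, the `{ kind, value }` token shape, and `raw` joining token values (useful for round-trip checks) are assumptions about what the existing helpers do.

```typescript
export interface TokenLike<K> {
	kind: K;
	value: string;
}

export function createLexerTestHelpers<K>(tokenize: (s: string) => TokenLike<K>[], eof: K) {
	const tok = (s: string): TokenLike<K>[] => tokenize(s);
	// Every helper except tok drops the trailing EOF token.
	const visible = (s: string): TokenLike<K>[] => tok(s).filter((t) => t.kind !== eof);
	return {
		tok,
		kinds: (s: string): K[] => visible(s).map((t) => t.kind),
		kv: (s: string): Array<[K, string]> => visible(s).map((t): [K, string] => [t.kind, t.value]),
		raw: (s: string): string => visible(s).map((t) => t.value).join(''),
	};
}

// tests/utils.ts would then wire in the real lexer, e.g.:
// export const { tok, kinds, kv, raw } = createLexerTestHelpers((s) => new Lexer(s).tokenize(), TokenKind.EOF);
```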

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 516195fa-ba54-4ae8-99d4-fadb55f4f1bd

📥 Commits

Reviewing files that changed from the base of the PR and between 4f43a85 and d5d139b.

⛔ Files ignored due to path filters (1)

yarn.lock is excluded by !**/yarn.lock, !**/*.lock

📒 Files selected for processing (38)

packages/message-parser-handwritten/.eslintrc.json
packages/message-parser-handwritten/.gitignore
packages/message-parser-handwritten/README.md
packages/message-parser-handwritten/benchmarks/lexer.bench.ts
packages/message-parser-handwritten/jest.config.ts
packages/message-parser-handwritten/package.json
packages/message-parser-handwritten/src/ast/definitions.ts
packages/message-parser-handwritten/src/ast/guards.ts
packages/message-parser-handwritten/src/index.ts
packages/message-parser-handwritten/src/lexer/ScanContext.ts
packages/message-parser-handwritten/src/lexer/Token.ts
packages/message-parser-handwritten/src/lexer/constants/charCodes.ts
packages/message-parser-handwritten/src/lexer/constants/charSets.ts
packages/message-parser-handwritten/src/lexer/constants/emoticons.ts
packages/message-parser-handwritten/src/lexer/constants/regexes.ts
packages/message-parser-handwritten/src/lexer/helpers.ts
packages/message-parser-handwritten/src/lexer/lexer.ts
packages/message-parser-handwritten/src/lexer/scanners/blocks.ts
packages/message-parser-handwritten/src/lexer/scanners/code.ts
packages/message-parser-handwritten/src/lexer/scanners/emoji.ts
packages/message-parser-handwritten/src/lexer/scanners/formatting.ts
packages/message-parser-handwritten/src/lexer/scanners/index.ts
packages/message-parser-handwritten/src/lexer/scanners/inline.ts
packages/message-parser-handwritten/src/lexer/scanners/links.ts
packages/message-parser-handwritten/src/lexer/scanners/math.ts
packages/message-parser-handwritten/src/lexer/scanners/structural.ts
packages/message-parser-handwritten/src/utils.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/tests/inline.test.ts
packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/tests/structural.test.ts
packages/message-parser-handwritten/tsconfig.build.json
packages/message-parser-handwritten/tsconfig.json
packages/message-parser-handwritten/webpack.config.ts

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Agent
GitHub Check: cubic · AI code reviewer

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (.cursor/rules/playwright.mdc)

**/*.{ts,tsx,js}: Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests
Avoid code comments in the implementation

Files:

packages/message-parser-handwritten/src/lexer/scanners/math.ts
packages/message-parser-handwritten/src/index.ts
packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/webpack.config.ts
packages/message-parser-handwritten/src/ast/guards.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/src/lexer/scanners/emoji.ts
packages/message-parser-handwritten/src/lexer/scanners/formatting.ts
packages/message-parser-handwritten/src/lexer/constants/regexes.ts
packages/message-parser-handwritten/src/lexer/scanners/structural.ts
packages/message-parser-handwritten/src/lexer/scanners/index.ts
packages/message-parser-handwritten/src/lexer/scanners/blocks.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/src/lexer/scanners/links.ts
packages/message-parser-handwritten/src/lexer/helpers.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/src/lexer/scanners/inline.ts
packages/message-parser-handwritten/src/lexer/lexer.ts
packages/message-parser-handwritten/src/lexer/scanners/code.ts
packages/message-parser-handwritten/src/lexer/Token.ts
packages/message-parser-handwritten/src/lexer/constants/charSets.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/src/lexer/constants/emoticons.ts
packages/message-parser-handwritten/tests/structural.test.ts
packages/message-parser-handwritten/tests/inline.test.ts
packages/message-parser-handwritten/jest.config.ts
packages/message-parser-handwritten/benchmarks/lexer.bench.ts
packages/message-parser-handwritten/src/lexer/ScanContext.ts
packages/message-parser-handwritten/src/lexer/constants/charCodes.ts
packages/message-parser-handwritten/src/ast/definitions.ts
packages/message-parser-handwritten/src/utils.ts

🧠 Learnings (21)

📚 Learning: 2026-02-26T19:25:44.063Z

Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 38778
File: packages/ui-voip/src/providers/useMediaSession.ts:192-192
Timestamp: 2026-02-26T19:25:44.063Z
Learning: In the Rocket.Chat repository, do not reference Biome lint rules in code review feedback. Biome is not used even if biome.json exists; only reference Biome rules if there is explicit, project-wide usage documented. For TypeScript files, review lint implications without Biome guidance unless the project enables Biome rules.

Applied to files:

packages/message-parser-handwritten/src/lexer/scanners/math.ts
packages/message-parser-handwritten/src/index.ts
packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/webpack.config.ts
packages/message-parser-handwritten/src/ast/guards.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/src/lexer/scanners/emoji.ts
packages/message-parser-handwritten/src/lexer/scanners/formatting.ts
packages/message-parser-handwritten/src/lexer/constants/regexes.ts
packages/message-parser-handwritten/src/lexer/scanners/structural.ts
packages/message-parser-handwritten/src/lexer/scanners/index.ts
packages/message-parser-handwritten/src/lexer/scanners/blocks.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/src/lexer/scanners/links.ts
packages/message-parser-handwritten/src/lexer/helpers.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/src/lexer/scanners/inline.ts
packages/message-parser-handwritten/src/lexer/lexer.ts
packages/message-parser-handwritten/src/lexer/scanners/code.ts
packages/message-parser-handwritten/src/lexer/Token.ts
packages/message-parser-handwritten/src/lexer/constants/charSets.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/src/lexer/constants/emoticons.ts
packages/message-parser-handwritten/tests/structural.test.ts
packages/message-parser-handwritten/tests/inline.test.ts
packages/message-parser-handwritten/jest.config.ts
packages/message-parser-handwritten/benchmarks/lexer.bench.ts
packages/message-parser-handwritten/src/lexer/ScanContext.ts
packages/message-parser-handwritten/src/lexer/constants/charCodes.ts
packages/message-parser-handwritten/src/ast/definitions.ts
packages/message-parser-handwritten/src/utils.ts

📚 Learning: 2026-02-26T19:25:44.063Z

Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 38778
File: packages/ui-voip/src/providers/useMediaSession.ts:192-192
Timestamp: 2026-02-26T19:25:44.063Z
Learning: In this repository (RocketChat/Rocket.Chat), Biome lint rules are not used even if a biome.json exists. When reviewing TypeScript files (e.g., packages/ui-voip/src/providers/useMediaSession.ts), ensure lint suggestions do not reference Biome-specific rules. Rely on general ESLint/TypeScript lint rules and project conventions instead.

Applied to files:

packages/message-parser-handwritten/src/lexer/scanners/math.ts
packages/message-parser-handwritten/src/index.ts
packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/webpack.config.ts
packages/message-parser-handwritten/src/ast/guards.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/src/lexer/scanners/emoji.ts
packages/message-parser-handwritten/src/lexer/scanners/formatting.ts
packages/message-parser-handwritten/src/lexer/constants/regexes.ts
packages/message-parser-handwritten/src/lexer/scanners/structural.ts
packages/message-parser-handwritten/src/lexer/scanners/index.ts
packages/message-parser-handwritten/src/lexer/scanners/blocks.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/src/lexer/scanners/links.ts
packages/message-parser-handwritten/src/lexer/helpers.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/src/lexer/scanners/inline.ts
packages/message-parser-handwritten/src/lexer/lexer.ts
packages/message-parser-handwritten/src/lexer/scanners/code.ts
packages/message-parser-handwritten/src/lexer/Token.ts
packages/message-parser-handwritten/src/lexer/constants/charSets.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/src/lexer/constants/emoticons.ts
packages/message-parser-handwritten/tests/structural.test.ts
packages/message-parser-handwritten/tests/inline.test.ts
packages/message-parser-handwritten/jest.config.ts
packages/message-parser-handwritten/benchmarks/lexer.bench.ts
packages/message-parser-handwritten/src/lexer/ScanContext.ts
packages/message-parser-handwritten/src/lexer/constants/charCodes.ts
packages/message-parser-handwritten/src/ast/definitions.ts
packages/message-parser-handwritten/src/utils.ts

📚 Learning: 2025-12-10T21:00:54.909Z

Learnt from: KevLehman
Repo: RocketChat/Rocket.Chat PR: 37091
File: ee/packages/abac/jest.config.ts:4-7
Timestamp: 2025-12-10T21:00:54.909Z
Learning: Rocket.Chat monorepo: Jest testMatch pattern '<rootDir>/src/**/*.spec.(ts|js|mjs)' is valid in this repo and used across multiple packages (e.g., packages/tools, ee/packages/omnichannel-services). Do not flag it as invalid in future reviews.

Applied to files:

packages/message-parser-handwritten/.eslintrc.json
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tsconfig.json
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/tests/inline.test.ts
packages/message-parser-handwritten/jest.config.ts

📚 Learning: 2026-02-26T19:26:01.702Z

Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 38778
File: packages/ui-voip/src/providers/useMediaSession.ts:192-192
Timestamp: 2026-02-26T19:26:01.702Z
Learning: The RocketChat/Rocket.Chat project does not use Biome for linting, despite the presence of a biome.json file in the repository. Lint-related suggestions should not reference Biome rules.

Applied to files:

packages/message-parser-handwritten/.eslintrc.json

📚 Learning: 2026-01-08T15:03:59.621Z

Learnt from: d-gubert
Repo: RocketChat/Rocket.Chat PR: 38068
File: apps/meteor/tests/data/apps/app-packages/README.md:14-16
Timestamp: 2026-01-08T15:03:59.621Z
Learning: For the RocketChat/Rocket.Chat repository, do not analyze or report formatting issues (such as hard tabs vs spaces, line breaks, etc.). The project relies on automated linting tools to enforce formatting standards.

Applied to files:

packages/message-parser-handwritten/.eslintrc.json

📚 Learning: 2025-11-19T12:32:29.696Z

Learnt from: d-gubert
Repo: RocketChat/Rocket.Chat PR: 37547
File: packages/i18n/src/locales/en.i18n.json:634-634
Timestamp: 2025-11-19T12:32:29.696Z
Learning: Repo: RocketChat/Rocket.Chat
Context: i18n workflow
Learning: In this repository, new translation keys should be added to packages/i18n/src/locales/en.i18n.json only; other locale files are populated via the external translation pipeline and/or fall back to English. Do not request adding the same key to all locale files in future reviews.

Applied to files:

packages/message-parser-handwritten/.eslintrc.json

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Utilize Playwright fixtures (`test`, `page`, `expect`) for consistency in test files

Applied to files:

packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/tests/structural.test.ts
packages/message-parser-handwritten/jest.config.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to **/*.spec.ts : Use descriptive test names that clearly communicate expected behavior in Playwright tests

Applied to files:

packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/tests/inline.test.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use `expect` matchers for assertions (`toEqual`, `toContain`, `toBeTruthy`, `toHaveLength`, etc.) instead of `assert` statements in Playwright tests

Applied to files:

packages/message-parser-handwritten/tests/links.test.ts
packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts

📚 Learning: 2025-11-24T17:08:26.531Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/test-cases.mdc:0-0
Timestamp: 2025-11-24T17:08:26.531Z
Learning: Applies to **/*test-case*.md : Write test cases in Markdown format following the standardized template with ALL necessary components: Title, Description, Preconditions, Type, Steps, and Expected Result

Applied to files:

packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts

📚 Learning: 2025-11-24T17:08:26.531Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/test-cases.mdc:0-0
Timestamp: 2025-11-24T17:08:26.531Z
Learning: Applies to **/*test-case*.md : Define clear, measurable expected results in test cases that can be validated and later converted into automated tests

Applied to files:

packages/message-parser-handwritten/tests/formatting.test.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to **/*.{ts,tsx,js} : Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests

Applied to files:

packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tsconfig.json
packages/message-parser-handwritten/webpack.config.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/benchmarks/lexer.bench.ts
packages/message-parser-handwritten/src/utils.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Group related tests in the same file

Applied to files:

packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Ensure tests run reliably in parallel without shared state conflicts

Applied to files:

packages/message-parser-handwritten/tests/formatting.test.ts
packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts
packages/message-parser-handwritten/tests/general.test.ts
packages/message-parser-handwritten/tests/inline.test.ts

📚 Learning: 2026-02-25T20:10:16.987Z

Learnt from: ahmed-n-abdeltwab
Repo: RocketChat/Rocket.Chat PR: 38913
File: packages/ddp-client/src/legacy/types/SDKLegacy.ts:34-34
Timestamp: 2026-02-25T20:10:16.987Z
Learning: In the RocketChat/Rocket.Chat monorepo, packages/ddp-client and apps/meteor do not use TypeScript project references. Module augmentations in apps/meteor (e.g., declare module '@rocket.chat/rest-typings') are not visible when compiling packages/ddp-client in isolation, which is why legacy SDK methods that depend on OperationResult types for OpenAPI-migrated endpoints must remain commented out.

Applied to files:

packages/message-parser-handwritten/tsconfig.json

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Maintain test isolation between test cases in Playwright tests

Applied to files:

packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts
packages/message-parser-handwritten/tests/math.test.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use `test.step()` for complex test scenarios to improve organization in Playwright tests

Applied to files:

packages/message-parser-handwritten/tests/code.test.ts
packages/message-parser-handwritten/tests/blocks.test.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use `test.beforeAll()` and `test.afterAll()` for setup/teardown in Playwright tests

Applied to files:

packages/message-parser-handwritten/tests/code.test.ts

📚 Learning: 2025-11-19T18:20:07.720Z

Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 37419
File: packages/i18n/src/locales/en.i18n.json:918-921
Timestamp: 2025-11-19T18:20:07.720Z
Learning: Repo: RocketChat/Rocket.Chat — i18n/formatting
Learning: This repository uses a custom message formatting parser in UI blocks/messages; do not assume standard Markdown rules. For keys like Call_ended_bold, Call_not_answered_bold, Call_failed_bold, and Call_transferred_bold in packages/i18n/src/locales/en.i18n.json, retain the existing single-asterisk emphasis unless maintainers request otherwise.

Applied to files:

packages/message-parser-handwritten/src/lexer/scanners/formatting.ts
packages/message-parser-handwritten/package.json

📚 Learning: 2026-02-24T19:05:56.710Z

Learnt from: ahmed-n-abdeltwab
Repo: RocketChat/Rocket.Chat PR: 0
File: :0-0
Timestamp: 2026-02-24T19:05:56.710Z
Learning: Rocket.Chat repo context: When a workspace manifest on develop already pins a dependency version (e.g., packages/web-ui-registration → "@rocket.chat/ui-contexts": "27.0.1"), a lockfile change in a feature PR that upgrades only that dependency’s resolution is considered a manifest-driven sync and can be kept, preferably as a small "chore: sync yarn.lock with manifests" commit.

Applied to files:

packages/message-parser-handwritten/package.json

📚 Learning: 2026-03-06T18:09:14.766Z

Learnt from: tassoevan
Repo: RocketChat/Rocket.Chat PR: 39397
File: packages/gazzodown/src/elements/Timestamp/DateTimeFormats.spec.tsx:20-23
Timestamp: 2026-03-06T18:09:14.766Z
Learning: In the RocketChat/Rocket.Chat gazzodown package (`packages/gazzodown`), tests are intended to run under the UTC timezone, but as of PR `#39397` this is NOT yet explicitly enforced in `jest.config.ts` or the `package.json` test scripts (which just run `jest` without `TZ=UTC`). To make timezone-sensitive snapshot tests reliable across all environments, `TZ=UTC` should be added to the test scripts in `package.json` or to `jest.config.ts` via `testEnvironmentOptions.timezone`. Without explicit UTC enforcement, snapshot tests involving date-fns formatted output or `toLocaleString()` will fail for contributors in non-UTC timezones.

Applied to files:

packages/message-parser-handwritten/jest.config.ts

🪛 markdownlint-cli2 (0.21.0)

packages/message-parser-handwritten/README.md

[warning] 45-45: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 51-51: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (31)

packages/message-parser-handwritten/webpack.config.ts (1)
5-12: No changes needed. The webpack configuration paths resolve correctly in this monorepo setup. Yarn and turbo automatically execute workspace scripts with the working directory set to the package directory (packages/message-parser-handwritten/), so relative paths like ./src, ./dist, and ./tsconfig.build.json resolve as intended. The build configuration works without modification.
			> Likely an incorrect or invalid review comment.
packages/message-parser-handwritten/src/lexer/constants/regexes.ts (1)
1-13: No issues found. Each use of these sticky regexes is preceded by an explicit lastIndex assignment before calling .exec(). The shared state is properly managed throughout the codebase across all scanner files (links.ts, inline.ts, blocks.ts).
			> Likely an incorrect or invalid review comment.
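For reference, the lastIndex-reset pattern described above looks roughly like this. `TIMESTAMP_RE` and `matchTimestampAt` are hypothetical, and the `<t:SECONDS[:FORMAT]>` shape with a single-letter format is an assumption modeled on the timestamp handling in scanAngleOpen.

```typescript
// Sticky (/y) regexes match only at lastIndex and keep that state between calls.
const TIMESTAMP_RE = /<t:(\d+)(?::([a-zA-Z]))?>/y;

export interface TimestampMatch {
	end: number;
	seconds: number;
	format?: string;
}

export function matchTimestampAt(input: string, pos: number): TimestampMatch | null {
	// Reset before every exec: a previous match (or failure) leaves lastIndex elsewhere.
	TIMESTAMP_RE.lastIndex = pos;
	const m = TIMESTAMP_RE.exec(input);
	if (!m) return null;
	return { end: TIMESTAMP_RE.lastIndex, seconds: Number(m[1]), format: m[2] };
}
```

Without the sticky flag, `exec` would scan ahead and could report a match that starts after `pos`.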
packages/message-parser-handwritten/src/lexer/scanners/math.ts (1)

7-27: LGTM!

The scanDollar implementation is correct. The lookahead for $$ is properly handled before the single $ case, state toggling is symmetric, and position advancement is accurate.

packages/message-parser-handwritten/tests/blocks.test.ts (1)

1-167: LGTM!

Comprehensive test suite covering headings, unordered/ordered list bullets, task bullets, pipes, and spoiler fences. Tests properly verify line-start context behavior, newline handling, emoticon fallbacks, and edge cases for invalid patterns.

packages/message-parser-handwritten/tests/structural.test.ts (1)

1-118: LGTM!

Well-structured test suite covering newlines (LF, CRLF, CR), whitespace collapsing, escape sequences, and round-trip integrity. The round-trip tests using test.each are a good practice for verifying token reconstruction.

packages/message-parser-handwritten/tests/formatting.test.ts (1)

1-110: LGTM!

Comprehensive formatting test suite covering asterisk, underscore, and tilde markers with proper edge case coverage for line-start bullet behavior and emoticon fallbacks.

packages/message-parser-handwritten/src/lexer/scanners/index.ts (1)

28-84: LGTM!

Clean dispatch table design with explicit character code registrations. The loop for digit registration and the categorized grouping of scanner registrations make the code maintainable.
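A stripped-down sketch of the dispatch-table idea behind SCANNER_TABLE[charCode], including a digit-registration loop. The two scanners, the `NUMBER` kind, and the 128-entry ASCII table size are illustrative assumptions rather than the contents of scanners/index.ts.

```typescript
// Minimal stand-in for the PR's ScanContext (assumed shape).
interface ScanContext {
	readonly input: string;
	readonly tokens: { kind: string; value: string }[];
}

// Scanners take the context and current position and return the new position.
type Scanner = (ctx: ScanContext, pos: number) => number;

const CH_0 = 48;
const CH_9 = 57;

const scanDigit: Scanner = (ctx, pos) => {
	let end = pos;
	while (end < ctx.input.length) {
		const c = ctx.input.charCodeAt(end);
		if (c < CH_0 || c > CH_9) break;
		end++;
	}
	ctx.tokens.push({ kind: 'NUMBER', value: ctx.input.slice(pos, end) });
	return end;
};

const scanAsterisk: Scanner = (ctx, pos) => {
	ctx.tokens.push({ kind: 'ASTERISK', value: '*' });
	return pos + 1;
};

// One slot per ASCII code; undefined means "no scanner, treat as text".
export const SCANNER_TABLE: Array<Scanner | undefined> = new Array<Scanner | undefined>(128).fill(undefined);

for (let c = CH_0; c <= CH_9; c++) SCANNER_TABLE[c] = scanDigit;
SCANNER_TABLE['*'.charCodeAt(0)] = scanAsterisk;

// Returns `pos` unchanged when no scanner is registered, so the caller can fall back to text.
export function dispatch(ctx: ScanContext, pos: number): number {
	const code = ctx.input.charCodeAt(pos);
	const scanner = code < 128 ? SCANNER_TABLE[code] : undefined;
	return scanner ? scanner(ctx, pos) : pos;
}
```

An array lookup per character keeps dispatch constant-time, and registering digits in a loop avoids ten near-identical assignments.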

packages/message-parser-handwritten/src/lexer/scanners/emoji.ts (1)

1-42: Overall implementation is solid.

The emoji scanner correctly handles variation selectors (VS-15/VS-16, U+FE0E/U+FE0F), skin tone modifiers (U+1F3FB to U+1F3FF), and ZWJ (U+200D) sequences, following the emoji sequence forms defined in Unicode Technical Standard #51.
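A simplified sketch of that sequence handling. `scanEmojiSequence` is hypothetical, and the base-emoji check uses two coarse code point ranges instead of the full Extended_Pictographic property, so it is illustrative only.

```typescript
const VS15 = 0xfe0e;
const VS16 = 0xfe0f;
const ZWJ = 0x200d;

// Coarse approximation of "emoji base"; real code should use Unicode properties.
const isEmojiBase = (cp: number): boolean => (cp >= 0x1f300 && cp <= 0x1faff) || (cp >= 0x2600 && cp <= 0x27bf);
const isSkinTone = (cp: number): boolean => cp >= 0x1f3fb && cp <= 0x1f3ff;
// Code points above U+FFFF occupy two UTF-16 code units.
const width = (cp: number): number => (cp > 0xffff ? 2 : 1);

// Returns the end index of the emoji sequence starting at `pos`, or `pos` if none starts there.
export function scanEmojiSequence(input: string, pos: number): number {
	const first = input.codePointAt(pos);
	if (first === undefined || !isEmojiBase(first)) return pos;
	let i = pos + width(first);
	while (i < input.length) {
		const cp = input.codePointAt(i) as number;
		if (cp === VS15 || cp === VS16 || isSkinTone(cp)) {
			i += width(cp);
			continue;
		}
		if (cp === ZWJ) {
			const next = input.codePointAt(i + 1);
			if (next !== undefined && isEmojiBase(next)) {
				i += 1 + width(next);
				continue;
			}
		}
		break;
	}
	return i;
}
```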

packages/message-parser-handwritten/tests/links.test.ts (1)

1-133: LGTM!

Thorough test suite covering markdown links, images, angle brackets, blockquote markers, emoticons, and timestamps with comprehensive edge case handling for empty parts, nested brackets, and fallback behaviors.

packages/message-parser-handwritten/src/lexer/lexer.ts (1)

10-68: LGTM!

The Lexer implementation is well-structured with clear separation of concerns:

- Plain ASCII fast path for common text
- Non-ASCII path with emoji detection
- Dispatch table for special characters with graceful fallback

The MAX_TOKENS safety cap prevents runaway tokenization, and flushText before EOF ensures no accumulated text is lost.
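A compressed sketch of that control flow, covering the text fast path, the MAX_TOKENS cap, and the final flush before EOF. It uses a single `*` scanner, treats all non-ASCII input as text where the real lexer would try emoji detection, and keeps the unscanned remainder as one TEXT token when the cap is hit; those behaviors and the `tokenize` name are assumptions.

```typescript
const MAX_TOKENS = 4096;
const CH_ASTERISK = 42;

type Kind = 'TEXT' | 'ASTERISK' | 'EOF';

interface Token {
	kind: Kind;
	value: string;
}

export function tokenize(input: string): Token[] {
	const tokens: Token[] = [];
	let textStart = -1;
	const flushText = (end: number): void => {
		if (textStart !== -1 && end > textStart) tokens.push({ kind: 'TEXT', value: input.slice(textStart, end) });
		textStart = -1;
	};

	let pos = 0;
	while (pos < input.length && tokens.length < MAX_TOKENS) {
		const code = input.charCodeAt(pos);
		if (code !== CH_ASTERISK) {
			// Fast path: ordinary characters just extend the pending text run.
			if (textStart === -1) textStart = pos;
			pos++;
			continue;
		}
		flushText(pos);
		tokens.push({ kind: 'ASTERISK', value: '*' });
		pos++;
	}

	// Cap reached: keep the rest of the input as text instead of dropping it.
	if (pos < input.length && textStart === -1) textStart = pos;
	flushText(input.length);
	tokens.push({ kind: 'EOF', value: '' });
	return tokens;
}
```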

packages/message-parser-handwritten/src/lexer/scanners/links.ts (3)

1-32: LGTM - scanAngleOpen correctly handles timestamp, emoticon, and angle bracket cases.

The timestamp detection logic properly validates the <t: prefix and uses a regex to match the inner content before emitting. The emoticon fallback and plain angle bracket emission are correctly ordered.

34-47: LGTM - scanAngleClose correctly distinguishes blockquote markers from angle brackets.

The isLineStart check correctly identifies blockquote markers at line start (pos === 0 or after newline), with emoticon fallback properly prioritized.

49-79: LGTM - Link-related scanners are straightforward and correct.

scanExclamation, scanCloseBracket, scanBracketOpen, and scanParenClose correctly handle their respective markdown link constructs with proper text flushing and token emission.

packages/message-parser-handwritten/src/lexer/scanners/blocks.ts (3)

12-45: LGTM - scanHash correctly handles headings, emoticons, and channel mentions.

The heading detection logic properly limits headings to 4 levels and requires trailing whitespace. Channel mention scanning via scanMentionBody is correctly sequenced after the emoticon fallback.

90-128: LGTM - scanDigit handles ordered lists, emoticons, and URLs correctly.

The ordered list detection at line start with digit-dot-space pattern is correct. URL matching with trailing punctuation trimming follows the established pattern.

130-150: LGTM - scanPipe correctly distinguishes block vs inline spoiler fences.

The block spoiler detection, which requires || at line start followed by a line end (or EOF), is correctly implemented. The token kind distinction aligns with the Token.ts definitions.
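The line-start checks used by scanHash, scanDigit, and scanPipe can be sketched as small predicates. The names are hypothetical, the ordered-list rule assumes one or more digits followed by `. `, and treating CR as a line break mirrors the LF/CRLF/CR newline tests rather than confirmed scanner behavior.

```typescript
const CH_TAB = 9;
const CH_LF = 10;
const CH_CR = 13;
const CH_SPACE = 32;
const CH_HASH = 35;
const CH_DOT = 46;
const CH_0 = 48;
const CH_9 = 57;
const CH_PIPE = 124;

// Start of input, or directly after a LF or CR line break.
export function isLineStart(input: string, pos: number): boolean {
	if (pos === 0) return true;
	const prev = input.charCodeAt(pos - 1);
	return prev === CH_LF || prev === CH_CR;
}

const isBlank = (c: number): boolean => c === CH_SPACE || c === CH_TAB;
const isDigit = (c: number): boolean => c >= CH_0 && c <= CH_9;

// 1 to 4 '#' at line start followed by whitespace; returns 0 when it is not a heading.
export function headingLevel(input: string, pos: number): number {
	if (!isLineStart(input, pos)) return 0;
	let end = pos;
	while (input.charCodeAt(end) === CH_HASH) end++;
	const level = end - pos;
	return level >= 1 && level <= 4 && isBlank(input.charCodeAt(end)) ? level : 0;
}

// Digits, '.', then a space at line start; returns the marker length or 0.
export function orderedListMarkerLength(input: string, pos: number): number {
	if (!isLineStart(input, pos)) return 0;
	let end = pos;
	while (end < input.length && isDigit(input.charCodeAt(end))) end++;
	if (end === pos || input.charCodeAt(end) !== CH_DOT || input.charCodeAt(end + 1) !== CH_SPACE) return 0;
	return end + 2 - pos;
}

// '||' at line start with nothing else before the line break (or EOF) marks a block spoiler fence.
export function isBlockSpoilerFence(input: string, pos: number): boolean {
	if (!isLineStart(input, pos)) return false;
	if (input.charCodeAt(pos) !== CH_PIPE || input.charCodeAt(pos + 1) !== CH_PIPE) return false;
	const next = input.charCodeAt(pos + 2);
	return pos + 2 >= input.length || next === CH_LF || next === CH_CR;
}
```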

packages/message-parser-handwritten/src/lexer/scanners/code.ts (4)

5-14: LGTM - scanBacktick correctly dispatches between fenced and inline code.

The triple-backtick detection and delegation to specialized scanners is clean and correct.

17-39: LGTM - scanFencedBlock correctly handles fenced code blocks.

The scanner properly emits opening/closing triple backticks with CODE_CONTENT body between them. Unclosed fences emit accumulated content as CODE_CONTENT, which is appropriate for partial code blocks.

41-66: LGTM - scanInlineCode correctly handles single backtick code spans.

The newline-termination behavior (emitting unclosed inline code as TEXT) matches typical markdown behavior where inline code cannot span lines. EOF handling is also correct.
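A sketch of that newline-terminated inline code scan follows. `scanInlineCodeSketch` and the `"BACKTICK"` kind are hypothetical, not the PR's API. The sketch stops before the newline so that the structural scanner still sees it. It also skips the empty-body case, matching what the review says emitCodeBody does.

```typescript
// Hypothetical token shape; "BACKTICK" is an illustrative kind name.
interface SimpleToken {
  kind: "BACKTICK" | "CODE_CONTENT" | "TEXT";
  value: string;
}

const CH_BACKTICK = 96; // '`'
const CH_NEWLINE = 10; // '\n'

// Scans an inline code span whose opening backtick is at pos.
function scanInlineCodeSketch(input: string, pos: number): { tokens: SimpleToken[]; next: number } {
  let i = pos + 1;
  while (i < input.length) {
    const code = input.charCodeAt(i);
    if (code === CH_NEWLINE) break;
    if (code === CH_BACKTICK) {
      const tokens: SimpleToken[] = [{ kind: "BACKTICK", value: "`" }];
      if (i > pos + 1) tokens.push({ kind: "CODE_CONTENT", value: input.slice(pos + 1, i) });
      tokens.push({ kind: "BACKTICK", value: "`" });
      return { tokens, next: i + 1 };
    }
    i++;
  }
  // No closing backtick before a newline or EOF: the whole run becomes TEXT,
  // and scanning resumes at the newline (or EOF).
  return { tokens: [{ kind: "TEXT", value: input.slice(pos, i) }], next: i };
}
```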

68-80: LGTM - Helper functions are clean and correct.

emitCodeBody and emitTextBody correctly handle the empty-range case and emit appropriate token kinds.

packages/message-parser-handwritten/src/lexer/constants/emoticons.ts (3)

95-95: Trailing space in emoticon may be unintentional.

The emoticon '>.< ' (line 95) contains a trailing space. Verify if this is intentional, as it would require users to type >.< (with space) to trigger the persevere emoji, which differs from other emoticons that don't require trailing whitespace.

1-9: LGTM - TrieNode interface and factory are well-designed.

Using Object.create(null) for the character map avoids prototype pollution and is a good practice for dictionary-like structures.

128-149: LGTM - Trie construction and lookup are correct.

buildEmoticonTrie correctly builds a prefix trie with longest-match semantics (storing raw at terminal nodes). The dual lookup structure (trie for matching, object for shortcode resolution) is efficient.
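The trie structure described in the last two comments can be sketched as follows. The example covers a prototype-free child map, a raw value stored at terminal nodes, and a walk that remembers the last terminal it passed. `createNode`, `buildTrie`, and `longestMatch` are hypothetical names, and the emoticon list in the usage is illustrative only.

```typescript
interface TrieNode {
  children: Record<string, TrieNode>;
  raw?: string; // set when the path from the root spells a complete emoticon
}

// Object.create(null) yields a map without a prototype, so lookups such as
// children["constructor"] return undefined instead of an inherited value.
function createNode(): TrieNode {
  return { children: Object.create(null) as Record<string, TrieNode> };
}

function buildTrie(patterns: readonly string[]): TrieNode {
  const root = createNode();
  for (const pattern of patterns) {
    let node = root;
    for (let i = 0; i < pattern.length; i++) {
      let next: TrieNode | undefined = node.children[pattern[i]];
      if (next === undefined) {
        next = createNode();
        node.children[pattern[i]] = next;
      }
      node = next;
    }
    node.raw = pattern;
  }
  return root;
}

// Walks from pos and returns the longest pattern that matches there, if any.
function longestMatch(root: TrieNode, input: string, pos: number): string | undefined {
  let node = root;
  let best: string | undefined;
  for (let i = pos; i < input.length; i++) {
    const next: TrieNode | undefined = node.children[input[i]];
    if (next === undefined) break;
    node = next;
    if (node.raw !== undefined) best = node.raw;
  }
  return best;
}
```

With the patterns `":)"`, `":-"`, and `":-)"`, input ":-)" yields ":-)" and input ":-x" falls back to ":-". This backoff is the reason the walk keeps the last terminal node instead of failing at the first mismatch.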

packages/message-parser-handwritten/src/lexer/ScanContext.ts (3)

5-12: LGTM - ScanContext interface is well-structured.

The interface cleanly separates readonly input data from mutable state (textStart, katexBlockOpen, katexInlineOpen). The token array is mutable but the reference is readonly, which is appropriate.

14-42: LGTM - Utility functions are correct and efficient.

isLineStart, flushText, emit, and consumeRun are all pure/predictable functions with correct implementations. The flushText early return on textStart === -1 prevents unnecessary work.
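The pending-text pattern behind flushText and emit can be sketched as below. The field and token shapes are assumptions. The review only says ScanContext holds the input, a token list, textStart, and KaTeX flags, which are omitted here.

```typescript
// Hypothetical token and context shapes, simplified for illustration.
interface Token {
  kind: string;
  value: string;
  start: number;
}

interface ScanContext {
  readonly input: string;
  readonly tokens: Token[]; // the reference is readonly, the array is mutable
  textStart: number; // -1 means no plain-text run is pending
}

function emit(ctx: ScanContext, kind: string, value: string, start: number): void {
  ctx.tokens.push({ kind, value, start });
}

// Emits the pending plain-text run [textStart, end) as a single TEXT token.
// The early return keeps the call cheap when no text has accumulated.
function flushText(ctx: ScanContext, end: number): void {
  if (ctx.textStart === -1) return;
  if (end > ctx.textStart) {
    emit(ctx, "TEXT", ctx.input.slice(ctx.textStart, end), ctx.textStart);
  }
  ctx.textStart = -1;
}
```

A scanner that hits a significant character at `pos` would call `flushText(ctx, pos)` first and then emit its own token. Plain characters are never emitted one at a time.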

44-69: LGTM - tryEmoticon implements greedy longest-match correctly.

The trie traversal records the longest valid match (lastMatch/lastMatchEnd) as it walks, so when one emoticon is a prefix of another that also matches at the current position, the longer one wins. The fallback to raw when getEmoticonShortCode returns undefined handles edge cases gracefully.

packages/message-parser-handwritten/src/lexer/scanners/inline.ts (5)
8-33: LGTM - scanColon correctly handles URL schemes, emoji shortcodes, and emoticons.

The priority order (URL scheme → emoji shortcode → emoticon → plain text) is correct for typical markdown-like syntax.

35-56: LGTM - tryUrlScheme cleverly rescans from previous TEXT token.

The approach of popping the previous TEXT token to capture the URL scheme prefix (e.g., http before ://) is efficient and correct. The fallback to plain text on invalid URLs handles edge cases properly.
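The "pop the previous TEXT token" idea can be illustrated with the sketch below. Everything here is hypothetical: the function name, both regexes, and the `URL` kind. The sketch also skips the trailing-punctuation trimming that the real scanner applies. It assumes pending text has already been flushed into a TEXT token ending exactly at the colon.

```typescript
interface Token {
  kind: string;
  value: string;
  start: number;
}

// Hypothetical patterns: a scheme at the end of the previous TEXT token,
// and a non-whitespace tail starting with "//" right after the colon.
const SCHEME_SUFFIX_RE = /[A-Za-z][A-Za-z0-9+.-]*$/;
const URL_REST_RE = /\/\/\S+/y;

// Called when input[pos] is ':'. Returns the end of the URL, or -1 to fall back.
function tryUrlSchemeSketch(input: string, pos: number, tokens: Token[]): number {
  const prev: Token | undefined = tokens[tokens.length - 1];
  if (prev === undefined || prev.kind !== "TEXT" || prev.start + prev.value.length !== pos) return -1;
  const scheme = SCHEME_SUFFIX_RE.exec(prev.value);
  if (scheme === null) return -1;
  URL_REST_RE.lastIndex = pos + 1;
  const rest = URL_REST_RE.exec(input);
  if (rest === null) return -1;
  tokens.pop();
  const prefix = prev.value.slice(0, scheme.index);
  if (prefix.length > 0) tokens.push({ kind: "TEXT", value: prefix, start: prev.start });
  const urlStart = prev.start + scheme.index;
  const end = pos + 1 + rest[0].length;
  tokens.push({ kind: "URL", value: input.slice(urlStart, end), start: urlStart });
  return end;
}
```

The trade-off is that the scanner never has to look backwards in the raw input. It only revisits the one token it just produced, so the single-pass property holds.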

58-67: LGTM - scanAt correctly handles user mentions.

The mention body scanning and token emission are straightforward and correct.

131-136: LGTM - scanEmoticonStarter is a clean fallback scanner.

Simple and correct delegation to tryEmoticon with text accumulation fallback.

69-90: No ReDoS vulnerability in PHONE_RE regex.

Empirical testing of the regex /\+(?:\(?\d+\)?[\s\-]?){1,}\d/y with pathological inputs (100+ parentheses, repeated non-digits) completes in microseconds with no exponential backtracking. The pattern requires at least one digit in each repetition of the group and a final digit as a boundary condition, which prevents catastrophic backtracking. Malformed inputs fail efficiently.
			> Likely an incorrect or invalid review comment.

…code emoji detection in lexer

…emoticon key '>.<'

…_TOKENS is reached

…OLOR token

…terpreter in shebang

…with skin-tone modifiers

… a valid emoji component

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (2)

packages/message-parser/LEXER.md (1)
45-55: ⚠️ Potential issue | 🟡 Minor

Add language identifiers to fenced code blocks.

markdownlint MD040 is triggered at line 45 and line 51. Please add the text language identifier to both opening fences.
Suggested fix
-```
+```text
 # Hello **world** :smile: `@john`
@@
-```
+```text
 HEADING_MARKER("1") → WS(" ") → TEXT("Hello") → WS(" ") → ASTERISK("**") →
 TEXT("world") → ASTERISK("**") → WS(" ") → EMOJI_SHORTCODE("smile") →
 WS(" ") → MENTION_USER("john") → EOF
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser/LEXER.md` around lines 45 - 55, The two fenced code
blocks that start with "```" containing the sample Markdown ("# Hello **world**
:smile: `@john`") and the token sequence (HEADING_MARKER...EOF) need explicit
language identifiers to satisfy markdownlint MD040; update both opening fences
from ``` to ```text so they read "```text" to mark them as plain text. Ensure
you only modify the two opening fence lines (for the human-readable example and
the flat array of tokens) and leave the block contents and closing fences
unchanged.
packages/message-parser/src/lexer/constants/charSets.ts (1)
51-65: ⚠️ Potential issue | 🟠 Major

Narrow BMP emoji range or add validation; remove inline comments.

The range 0x2300–0x27BF still includes non-emoji characters (box-drawing U+2500–U+257F, block elements U+2580–U+259F). Characters like ─ (U+2500) are incorrectly tokenized as EMOJI_UNICODE. The scanUnicodeEmoji function lacks validation to filter these out. Either narrow the BMP range to actual emoji blocks (e.g., U+2600–U+27BF for Miscellaneous Symbols and Dingbats) or add precise character validation before tokenizing.

Also, per coding guidelines, remove inline comments (lines 50, 53).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser/src/lexer/constants/charSets.ts` around lines 51 -
65, The BMP range in isUnicodeEmojiStart is too broad and includes non-emoji
chars; tighten the BMP check to the proper emoji block(s) (e.g., use
0x2600–0x27BF) or replace the single-range test with a precise whitelist/lookup
of allowed BMP codepoints, and add a corresponding validation step in
scanUnicodeEmoji to verify the actual character(s) against that whitelist before
emitting EMOJI_UNICODE (use the same validation helper to avoid duplication).
Also remove the inline end-of-line comments inside isUnicodeEmojiStart and
ensure any explanatory notes live in surrounding doc/comments instead.
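The two options can be compared with a short sketch. The first is a narrowed range prefilter. The second is a precise check using the Unicode `Extended_Pictographic` property through a JavaScript regex property escape (ES2018+). Both helpers are hypothetical, not the PR's isUnicodeEmojiStart. Note that the narrowed range alone would drop real BMP emoji outside U+2600..U+27BF, such as U+231A (watch) and U+2B50 (star).

```typescript
// Narrowed BMP prefilter as suggested: Miscellaneous Symbols (U+2600..U+26FF)
// plus Dingbats (U+2700..U+27BF). Excludes box drawing (U+2500..U+257F).
function inNarrowedBmpRange(code: number): boolean {
  return code >= 0x2600 && code <= 0x27bf;
}

// Precise validation via the Extended_Pictographic property. Built with the
// RegExp constructor so older TypeScript targets do not reject the literal.
const EXTENDED_PICTOGRAPHIC = new RegExp("^\\p{Extended_Pictographic}", "u");

// Checks the full code point at pos, so surrogate pairs are handled too.
function isEmojiCodePointAt(input: string, pos: number): boolean {
  const cp = input.codePointAt(pos);
  if (cp === undefined) return false;
  return EXTENDED_PICTOGRAPHIC.test(String.fromCodePoint(cp));
}
```

A cheap range check in the dispatch table followed by the property test before emitting EMOJI_UNICODE would keep the hot path fast. Box-drawing characters would then stay TEXT.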

🧹 Nitpick comments (5)

packages/message-parser/src/index.ts (1)
27-28: Keep the handwritten lexer off the root barrel until the API settles.

The PR still describes this as a Stage 1 prototype. Re-exporting it here makes Lexer, Token, TokenKind, and makeToken part of the stable package contract immediately; a dedicated subpath export would leave room to evolve the token model before Stage 2 lands. The inline banner on Line 27 can go away with that move.

As per coding guidelines: "Avoid code comments in the implementation".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser/src/index.ts` around lines 27 - 28, The root barrel
currently re-exports the handwritten lexer (Lexer, Token, TokenKind, makeToken)
and includes an inline banner comment; remove that re-export and delete the
inline comment from the root index so these symbols are not part of the stable
package contract, then create a dedicated subpath module that re-exports
Lexer/Token/TokenKind/makeToken and add a corresponding package subpath export
(so consumers can import the lexer from the dedicated subpath while the root API
remains stable).
packages/message-parser/jest.config.ts (1)
10-23: Narrow the transform regex to exclude JSX extensions since this package has no JSX files.

The pattern ^.+\.m?(t|j)sx?$ at line 10 advertises support for .tsx/.jsx files, but there are no JSX/TSX sources in packages/message-parser. The SWC parser also omits JSX syntax configuration. Narrowing the regex to ^.+\.m?[tj]s$ clarifies the actual scope.
Suggested patch
-		'^.+\\.m?(t|j)sx?$': [
+		'^.+\\.m?[tj]s$': [
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser/jest.config.ts` around lines 10 - 23, The transform
regex in the Jest config currently allows JSX/TSX (the pattern
'^.+\\.m?(t|j)sx?$') while the SWC jsc.parser settings disable JSX and this
package has no JSX/TSX files; update the transform key to a narrower regex
'^.+\\.m?[tj]s$' so it only matches .js/.mjs/.ts/.mts files and remove JSX/TSX
support, keeping the existing '@swc/jest' transformer and jsc config (refer to
the transform entry and the jsc.parser block in the snippet to locate the
change).
packages/message-parser/src/lexer/ScanContext.ts (1)
46-50: Potential infinite loop if pos is negative.

consumeRun doesn't validate that pos is non-negative. If called with a negative pos, input.charCodeAt(pos + count) reads a negative index and returns NaN. Since NaN !== charCode, the loop exits immediately with count = 0, so this is functionally safe, but unexpected input could still cause subtle issues.

The current implementation is safe due to JavaScript's charCodeAt behavior, but callers must ensure valid positions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser/src/lexer/ScanContext.ts` around lines 46 - 50, The
function consumeRun should guard against invalid starting positions: detect if
pos is negative or beyond input.length (or not an integer) and return 0 early
rather than calling input.charCodeAt with an out-of-range index; update the
consumeRun implementation to validate pos (e.g., ensure Number.isInteger(pos) &&
pos >= 0 && pos < input.length) before entering the while loop that compares
input.charCodeAt(pos + count) to charCode so callers with bad positions won't
produce unexpected behavior.
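Here is a sketch of the guarded version the prompt describes. The signature `(input, pos, charCode)` is inferred from the description above and may differ from the PR's.

```typescript
// Counts consecutive occurrences of charCode starting at pos.
// Returns 0 early for positions that are not valid integer indices into input.
function consumeRun(input: string, pos: number, charCode: number): number {
  if (!Number.isInteger(pos) || pos < 0 || pos >= input.length) return 0;
  let count = 0;
  while (pos + count < input.length && input.charCodeAt(pos + count) === charCode) {
    count++;
  }
  return count;
}
```

The in-loop length check is technically redundant, because charCodeAt returns NaN past the end. Keeping it makes the bound explicit and does not rely on that behavior.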
packages/message-parser/src/lexer/constants/charSets.ts (1)
10-16: Consider removing inline comments per coding guidelines.

The coding guidelines specify to "avoid code comments in the implementation." While these comments serve as documentation, they could be moved to JSDoc or external documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser/src/lexer/constants/charSets.ts` around lines 10 -
16, Remove the inline implementation comment above the CHAR_CLASS declaration
and any other inline comments in this file; instead add a brief JSDoc block
above the export of CHAR_CLASS describing its purpose and the kinds of
characters it marks (used for tokenization), leaving the code that initializes
CHAR_CLASS and the uses of symbols like CHAR_CLASS, CH_0, CH_9, and CH_C_LO
unchanged so behavior is preserved.
packages/message-parser/src/lexer/scanners/links.ts (1)
14-25: Potential out-of-bounds access when checking timestamp pattern.

Lines 15-16 access pos + 1 and pos + 2 without bounds checking. If pos is near the end of input, charCodeAt returns NaN, which won't match the expected codes, so the condition fails safely. However, line 20 accesses pos + 3 + m[0].length which could also be out of bounds—charCodeAt will return NaN in that case too, causing the comparison to fail safely.

The current implementation is functionally correct due to charCodeAt returning NaN for out-of-bounds indices, but explicit bounds checks would improve clarity.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/message-parser/src/lexer/scanners/links.ts` around lines 14 - 25,
The timestamp scanner should avoid implicit out-of-bounds charCodeAt reads; add
explicit length checks before indexing: ensure pos + 2 < input.length before
testing the CH_T_LO and CH_COLON checks, and after matching TS_INNER (use
TS_INNER.exec) ensure pos + 3 + m[0].length < input.length (or <= depending on
slice semantics) before calling input.charCodeAt for CH_GT; only then call
flushText(ctx, pos), compute raw and call emit(ctx, TokenKind.TIMESTAMP, raw,
m[1], pos). Keep TS_INNER, flushText, emit and TokenKind.TIMESTAMP references
intact.
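The explicit bounds checks could look like the sketch below. TS_INNER here is a hypothetical stand-in: digits plus an optional one-letter format. The PR's actual pattern is not shown in this review. The token shape, with both raw and value, follows the emit call quoted above but is otherwise assumed.

```typescript
interface Token {
  kind: string;
  raw: string;
  value: string;
  start: number;
}

const CH_LT = 60; // '<'
const CH_T_LO = 116; // 't'
const CH_COLON = 58; // ':'
const CH_GT = 62; // '>'

// Hypothetical inner pattern (sticky): unix seconds plus an optional ":X" format.
const TS_INNER = /(\d+)(?::[tTdDfFR])?/y;

// Returns the position after a "<t:...>" timestamp at pos, or -1 if there is none.
function scanTimestamp(input: string, pos: number, tokens: Token[]): number {
  if (pos + 2 >= input.length) return -1;
  if (
    input.charCodeAt(pos) !== CH_LT ||
    input.charCodeAt(pos + 1) !== CH_T_LO ||
    input.charCodeAt(pos + 2) !== CH_COLON
  ) return -1;
  TS_INNER.lastIndex = pos + 3;
  const m = TS_INNER.exec(input);
  if (m === null) return -1;
  const close = pos + 3 + m[0].length;
  if (close >= input.length || input.charCodeAt(close) !== CH_GT) return -1;
  tokens.push({ kind: "TIMESTAMP", raw: input.slice(pos, close + 1), value: m[1], start: pos });
  return close + 1;
}
```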

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/message-parser/benchmarks/lexer.bench.ts`:
- Around line 2-8: Remove the top banner comment that starts with "Benchmark
suite for the Lexer" (it references the stale path
`packages/message-parser-handwritten/`) and delete the other section/banner
comments scattered in the file that merely repeat what the types/functions
already convey; if any brief context must remain, replace the stale path with
the current location and keep a single one-line comment, otherwise remove them
entirely to comply with the "avoid code comments in the implementation"
guideline.

In `@packages/message-parser/src/lexer/constants/regexes.ts`:
- Around line 4-5: The EMAIL_RE currently captures sentence-ending punctuation
(e.g. "foo@example.com.") so emitted EMAIL tokens include trailing punctuation;
fix this by mirroring the URL handling: either tighten EMAIL_RE to exclude
terminal punctuation with a lookahead (e.g. require next char be whitespace,
end, or a non-trailing-punct boundary) or, more simply, after matching with
EMAIL_RE strip trailing punctuation using the existing TRAIL_PUNCT trimming
logic before emitting the token; update the code path that emits EMAIL tokens to
call the same TRAIL_PUNCT trim routine used for URLs so emitted EMAIL tokens
contain only the clean address.
- Around line 10-11: The COLOR_RE alternation currently favors shorter hex
branches first causing 8- and 4-digit alpha hex colors to be mis-tokenized
(e.g., "#ffffffff" -> "#ffffff"); update the COLOR_RE so the 8-digit and 4-digit
branches come before the 6- and 3-digit branches and add a hex boundary (e.g., a
negative lookahead or explicit boundary after the hex group) so that
alpha-containing hex colors are matched atomically; locate and edit the exported
constant COLOR_RE in the lexer/constants/regexes.ts to apply this change.
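Both regexes.ts suggestions fit in one sketch. All three patterns are hypothetical stand-ins; the PR's EMAIL_RE, TRAIL_PUNCT, and COLOR_RE may differ. For emails, the sketch trims sentence punctuation after matching. For colors, it orders the longer alternatives first and adds a negative lookahead so that a partial hex run is never accepted.

```typescript
// Hypothetical patterns for illustration; sticky (y) so they match exactly at lastIndex.
const TRAIL_PUNCT = /[.,;:!?]+$/;
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+/y;

// Longest alternatives first, plus a lookahead so a run of 5 or 7 hex digits
// is rejected rather than split into a shorter color and leftover text.
const COLOR_RE = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{4}|[0-9a-fA-F]{3})(?![0-9a-fA-F])/y;

// Returns the address at pos with trailing sentence punctuation removed, or null.
function matchEmailAt(input: string, pos: number): string | null {
  EMAIL_RE.lastIndex = pos;
  const m = EMAIL_RE.exec(input);
  if (m === null) return null;
  const email = m[0].replace(TRAIL_PUNCT, "");
  return email.endsWith("@") ? null : email;
}

function matchColorAt(input: string, pos: number): string | null {
  COLOR_RE.lastIndex = pos;
  const m = COLOR_RE.exec(input);
  return m === null ? null : m[0];
}
```

In this design, the caller advances by the trimmed length, so the dropped "." is scanned again as ordinary TEXT.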



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 95910465-d4aa-45b0-b4e7-19656aa4c6d9

📥 Commits

Reviewing files that changed from the base of the PR and between 5e0100d and 2ac67d4.

📒 Files selected for processing (31)

packages/message-parser/LEXER.md
packages/message-parser/benchmarks/lexer.bench.ts
packages/message-parser/jest.config.ts
packages/message-parser/package.json
packages/message-parser/src/index.ts
packages/message-parser/src/lexer/ScanContext.ts
packages/message-parser/src/lexer/Token.ts
packages/message-parser/src/lexer/constants/charCodes.ts
packages/message-parser/src/lexer/constants/charSets.ts
packages/message-parser/src/lexer/constants/emoticons.ts
packages/message-parser/src/lexer/constants/regexes.ts
packages/message-parser/src/lexer/helpers.ts
packages/message-parser/src/lexer/index.ts
packages/message-parser/src/lexer/lexer.ts
packages/message-parser/src/lexer/scanners/blocks.ts
packages/message-parser/src/lexer/scanners/code.ts
packages/message-parser/src/lexer/scanners/emoji.ts
packages/message-parser/src/lexer/scanners/formatting.ts
packages/message-parser/src/lexer/scanners/index.ts
packages/message-parser/src/lexer/scanners/inline.ts
packages/message-parser/src/lexer/scanners/links.ts
packages/message-parser/src/lexer/scanners/math.ts
packages/message-parser/src/lexer/scanners/structural.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/general.test.ts
packages/message-parser/tests/lexer/inline.test.ts
packages/message-parser/tests/lexer/links.test.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/tests/lexer/structural.test.ts

📜 Review details

🧰 Additional context used

📓 Path-based instructions (1)

**/*.{ts,tsx,js}

📄 CodeRabbit inference engine (.cursor/rules/playwright.mdc)

**/*.{ts,tsx,js}: Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests
Avoid code comments in the implementation

Files:

packages/message-parser/src/lexer/scanners/code.ts
packages/message-parser/src/lexer/scanners/index.ts
packages/message-parser/src/lexer/helpers.ts
packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/src/lexer/Token.ts
packages/message-parser/src/lexer/lexer.ts
packages/message-parser/src/lexer/scanners/links.ts
packages/message-parser/src/lexer/scanners/structural.ts
packages/message-parser/src/lexer/scanners/math.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/src/lexer/scanners/emoji.ts
packages/message-parser/src/lexer/scanners/blocks.ts
packages/message-parser/tests/lexer/links.test.ts
packages/message-parser/src/lexer/constants/regexes.ts
packages/message-parser/src/lexer/index.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/src/lexer/ScanContext.ts
packages/message-parser/src/lexer/scanners/formatting.ts
packages/message-parser/src/lexer/constants/charSets.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/benchmarks/lexer.bench.ts
packages/message-parser/src/lexer/scanners/inline.ts
packages/message-parser/tests/lexer/general.test.ts
packages/message-parser/jest.config.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/src/lexer/constants/emoticons.ts
packages/message-parser/src/index.ts
packages/message-parser/tests/lexer/inline.test.ts
packages/message-parser/src/lexer/constants/charCodes.ts

🧠 Learnings (18)

📚 Learning: 2026-02-26T19:25:44.063Z

Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 38778
File: packages/ui-voip/src/providers/useMediaSession.ts:192-192
Timestamp: 2026-02-26T19:25:44.063Z
Learning: In the Rocket.Chat repository, do not reference Biome lint rules in code review feedback. Biome is not used even if biome.json exists; only reference Biome rules if there is explicit, project-wide usage documented. For TypeScript files, review lint implications without Biome guidance unless the project enables Biome rules.

Applied to files:

packages/message-parser/src/lexer/scanners/code.ts
packages/message-parser/src/lexer/scanners/index.ts
packages/message-parser/src/lexer/helpers.ts
packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/src/lexer/Token.ts
packages/message-parser/src/lexer/lexer.ts
packages/message-parser/src/lexer/scanners/links.ts
packages/message-parser/src/lexer/scanners/structural.ts
packages/message-parser/src/lexer/scanners/math.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/src/lexer/scanners/emoji.ts
packages/message-parser/src/lexer/scanners/blocks.ts
packages/message-parser/tests/lexer/links.test.ts
packages/message-parser/src/lexer/constants/regexes.ts
packages/message-parser/src/lexer/index.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/src/lexer/ScanContext.ts
packages/message-parser/src/lexer/scanners/formatting.ts
packages/message-parser/src/lexer/constants/charSets.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/benchmarks/lexer.bench.ts
packages/message-parser/src/lexer/scanners/inline.ts
packages/message-parser/tests/lexer/general.test.ts
packages/message-parser/jest.config.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/src/lexer/constants/emoticons.ts
packages/message-parser/src/index.ts
packages/message-parser/tests/lexer/inline.test.ts
packages/message-parser/src/lexer/constants/charCodes.ts

📚 Learning: 2026-02-26T19:25:44.063Z

Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 38778
File: packages/ui-voip/src/providers/useMediaSession.ts:192-192
Timestamp: 2026-02-26T19:25:44.063Z
Learning: In this repository (RocketChat/Rocket.Chat), Biome lint rules are not used even if a biome.json exists. When reviewing TypeScript files (e.g., packages/ui-voip/src/providers/useMediaSession.ts), ensure lint suggestions do not reference Biome-specific rules. Rely on general ESLint/TypeScript lint rules and project conventions instead.

Applied to files:

packages/message-parser/src/lexer/scanners/code.ts
packages/message-parser/src/lexer/scanners/index.ts
packages/message-parser/src/lexer/helpers.ts
packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/src/lexer/Token.ts
packages/message-parser/src/lexer/lexer.ts
packages/message-parser/src/lexer/scanners/links.ts
packages/message-parser/src/lexer/scanners/structural.ts
packages/message-parser/src/lexer/scanners/math.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/src/lexer/scanners/emoji.ts
packages/message-parser/src/lexer/scanners/blocks.ts
packages/message-parser/tests/lexer/links.test.ts
packages/message-parser/src/lexer/constants/regexes.ts
packages/message-parser/src/lexer/index.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/src/lexer/ScanContext.ts
packages/message-parser/src/lexer/scanners/formatting.ts
packages/message-parser/src/lexer/constants/charSets.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/benchmarks/lexer.bench.ts
packages/message-parser/src/lexer/scanners/inline.ts
packages/message-parser/tests/lexer/general.test.ts
packages/message-parser/jest.config.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/src/lexer/constants/emoticons.ts
packages/message-parser/src/index.ts
packages/message-parser/tests/lexer/inline.test.ts
packages/message-parser/src/lexer/constants/charCodes.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Utilize Playwright fixtures (`test`, `page`, `expect`) for consistency in test files

Applied to files:

packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/links.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/jest.config.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Ensure tests run reliably in parallel without shared state conflicts

Applied to files:

packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/jest.config.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to **/*.{ts,tsx,js} : Write concise, technical TypeScript/JavaScript with accurate typing in Playwright tests

Applied to files:

packages/message-parser/tests/lexer/structural.test.ts
packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/jest.config.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/tests/lexer/inline.test.ts

📚 Learning: 2026-03-09T21:20:12.687Z

Learnt from: pierre-lehnen-rc
Repo: RocketChat/Rocket.Chat PR: 39386
File: apps/meteor/server/services/push/tokenManagement/findDocumentToUpdate.ts:12-15
Timestamp: 2026-03-09T21:20:12.687Z
Learning: In `apps/meteor/server/services/push/tokenManagement/findDocumentToUpdate.ts`, the early return `if (data.voipToken) return null` (Lines 13-15) is intentionally correct. VoIP token updates always include an `_id`, so they are handled by the `_id` lookup block above (Lines 5-9) and never reach this guard. The guard is only a safety net for edge cases where `_id` is absent or no document was found, preventing an incorrect `token + appName` fallback match for VoIP-only payloads.

Applied to files:

packages/message-parser/src/lexer/lexer.ts

📚 Learning: 2025-11-24T17:08:26.531Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/test-cases.mdc:0-0
Timestamp: 2025-11-24T17:08:26.531Z
Learning: Applies to **/*test-case*.md : Write test cases in Markdown format following the standardized template with ALL necessary components: Title, Description, Preconditions, Type, Steps, and Expected Result

Applied to files:

packages/message-parser/tests/lexer/formatting.test.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use `expect` matchers for assertions (`toEqual`, `toContain`, `toBeTruthy`, `toHaveLength`, etc.) instead of `assert` statements in Playwright tests

Applied to files:

packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/jest.config.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to **/*.spec.ts : Use descriptive test names that clearly communicate expected behavior in Playwright tests

Applied to files:

packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/tests/lexer/general.test.ts

📚 Learning: 2025-11-19T18:20:07.720Z

Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 37419
File: packages/i18n/src/locales/en.i18n.json:918-921
Timestamp: 2025-11-19T18:20:07.720Z
Learning: Repo: RocketChat/Rocket.Chat — i18n/formatting
Learning: This repository uses a custom message formatting parser in UI blocks/messages; do not assume standard Markdown rules. For keys like Call_ended_bold, Call_not_answered_bold, Call_failed_bold, and Call_transferred_bold in packages/i18n/src/locales/en.i18n.json, retain the existing single-asterisk emphasis unless maintainers request otherwise.

Applied to files:

packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/src/lexer/scanners/formatting.ts
packages/message-parser/LEXER.md

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Group related tests in the same file

Applied to files:

packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/tests/lexer/blocks.test.ts
packages/message-parser/tests/lexer/math.test.ts
packages/message-parser/tests/lexer/inline.test.ts

📚 Learning: 2025-12-10T21:00:54.909Z

Learnt from: KevLehman
Repo: RocketChat/Rocket.Chat PR: 37091
File: ee/packages/abac/jest.config.ts:4-7
Timestamp: 2025-12-10T21:00:54.909Z
Learning: Rocket.Chat monorepo: Jest testMatch pattern '<rootDir>/src/**/*.spec.(ts|js|mjs)' is valid in this repo and used across multiple packages (e.g., packages/tools, ee/packages/omnichannel-services). Do not flag it as invalid in future reviews.

Applied to files:

packages/message-parser/tests/lexer/formatting.test.ts
packages/message-parser/src/lexer/constants/regexes.ts
packages/message-parser/jest.config.ts
packages/message-parser/tests/lexer/inline.test.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Use `test.step()` for complex test scenarios to improve organization in Playwright tests

Applied to files:

packages/message-parser/tests/lexer/formatting.test.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : Maintain test isolation between test cases in Playwright tests

Applied to files:

packages/message-parser/tests/lexer/code.test.ts
packages/message-parser/jest.config.ts

📚 Learning: 2026-01-17T01:51:47.764Z

Learnt from: tassoevan
Repo: RocketChat/Rocket.Chat PR: 38219
File: packages/core-typings/src/cloud/Announcement.ts:5-6
Timestamp: 2026-01-17T01:51:47.764Z
Learning: In packages/core-typings/src/cloud/Announcement.ts, the AnnouncementSchema.createdBy field intentionally overrides IBannerSchema.createdBy (object with _id and optional username) with a string enum ['cloud', 'system'] to match existing runtime behavior. This is documented as technical debt with a FIXME comment at apps/meteor/app/cloud/server/functions/syncWorkspace/handleCommsSync.ts:53 and should not be flagged as an error until the runtime behavior is corrected.

Applied to files:

packages/message-parser/src/lexer/scanners/inline.ts

📚 Learning: 2026-03-06T18:09:17.867Z

Learnt from: tassoevan
Repo: RocketChat/Rocket.Chat PR: 39397
File: packages/gazzodown/src/elements/Timestamp/DateTimeFormats.spec.tsx:20-23
Timestamp: 2026-03-06T18:09:17.867Z
Learning: In the RocketChat/Rocket.Chat gazzodown package (`packages/gazzodown`), tests are intended to run under the UTC timezone, but as of PR `#39397` this is NOT yet explicitly enforced in `jest.config.ts` or the `package.json` test scripts (which just run `jest` without `TZ=UTC`). To make timezone-sensitive snapshot tests reliable across all environments, `TZ=UTC` should be added to the test scripts in `package.json` or to `jest.config.ts` via `testEnvironmentOptions.timezone`. Without explicit UTC enforcement, snapshot tests involving date-fns formatted output or `toLocaleString()` will fail for contributors in non-UTC timezones.

Applied to files:

packages/message-parser/jest.config.ts

📚 Learning: 2025-12-16T17:29:45.163Z

Learnt from: gabriellsh
Repo: RocketChat/Rocket.Chat PR: 37834
File: apps/meteor/tests/e2e/page-objects/fragments/admin-flextab-emoji.ts:12-22
Timestamp: 2025-12-16T17:29:45.163Z
Learning: In page object files under `apps/meteor/tests/e2e/page-objects/`, always import `expect` from `../../utils/test` (Playwright's async expect), not from Jest. Jest's `expect` has a synchronous signature and will cause TypeScript errors when used with web-first assertions like `toBeVisible()`.

Applied to files:

packages/message-parser/jest.config.ts

📚 Learning: 2025-11-24T17:08:17.065Z

Learnt from: CR
Repo: RocketChat/Rocket.Chat PR: 0
File: .cursor/rules/playwright.mdc:0-0
Timestamp: 2025-11-24T17:08:17.065Z
Learning: Applies to apps/meteor/tests/e2e/**/*.spec.ts : All test files must be created in `apps/meteor/tests/e2e/` directory

Applied to files:

packages/message-parser/jest.config.ts

🪛 markdownlint-cli2 (0.21.0)

packages/message-parser/LEXER.md

[warning] 45-45: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 51-51: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (28)

packages/message-parser/package.json (1)

38-40: Nice split between parser and lexer benchmarks.

Keeping bench:parser and bench:lexer separately addressable makes local profiling and CI targeting much easier.

packages/message-parser/src/lexer/scanners/math.ts (1)

1-31: LGTM!

The scanner correctly handles $ (inline) and $$ (block) KaTeX delimiters with proper state toggling and token emission. The boundary check at pos + 1 is implicitly safe since charCodeAt returns NaN for out-of-bounds indices, which won't equal CH_DOLLAR.
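To make the `pos + 1` argument concrete, here is a minimal sketch of `$`/`$$` delimiter toggling. The context shape, the boolean flags, and the `KATEX_*` token kinds are hypothetical simplifications, not the PR's actual `ScanContext` or `TokenKind` values.

```typescript
const CH_DOLLAR = 0x24; // '$'

interface MathToken { kind: string; raw: string; start: number }

// Hypothetical, simplified scan state (not the PR's ScanContext).
interface MathCtx {
  input: string;
  inInlineKatex: boolean;
  inBlockKatex: boolean;
  tokens: MathToken[];
}

// Scans a `$` or `$$` delimiter at `pos`, toggles the matching flag, emits an
// open/close token, and returns the position after the delimiter.
function scanDollar(ctx: MathCtx, pos: number): number {
  // When pos is the last index, charCodeAt(pos + 1) is NaN and NaN !== CH_DOLLAR,
  // so a lone trailing `$` falls through to the inline branch without a length check.
  if (ctx.input.charCodeAt(pos + 1) === CH_DOLLAR) {
    ctx.inBlockKatex = !ctx.inBlockKatex;
    ctx.tokens.push({ kind: ctx.inBlockKatex ? "KATEX_BLOCK_OPEN" : "KATEX_BLOCK_CLOSE", raw: "$$", start: pos });
    return pos + 2;
  }
  ctx.inInlineKatex = !ctx.inInlineKatex;
  ctx.tokens.push({ kind: ctx.inInlineKatex ? "KATEX_INLINE_OPEN" : "KATEX_INLINE_CLOSE", raw: "$", start: pos });
  return pos + 1;
}
```

The toggle means the same character emits an opening token the first time and a closing token the second time; pairing them into nodes is left to the parser stage.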

packages/message-parser/src/lexer/scanners/index.ts (1)

1-86: LGTM!

The scanner dispatch table is well-organized with clear groupings. Character code registrations align with the CHAR_CLASS special character classifications. The sparse array approach is efficient for O(1) dispatch lookups.
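A minimal sketch of sparse-array dispatch keyed by char code, assuming a simplified `Scanner` signature `(input, pos) => newPos` that returns `-1` to decline. `register`, `dispatch`, and the two sample scanners are illustrative, not the contents of the PR's `scanners/index.ts`.

```typescript
// Hypothetical scanner signature: returns the new position, or -1 if the
// scanner declines and the character should be treated as plain text.
type Scanner = (input: string, pos: number) => number;

// Sparse array indexed by char code; unregistered slots stay undefined.
const SCANNER_TABLE: (Scanner | undefined)[] = [];

function register(ch: string, scanner: Scanner): void {
  SCANNER_TABLE[ch.charCodeAt(0)] = scanner;
}

// Example scanners (illustrative only).
register("*", (_input, pos) => pos + 1);
register("`", (input, pos) => {
  const close = input.indexOf("`", pos + 1);
  return close === -1 ? -1 : close + 1;
});

// O(1) lookup: a single array index per special character, no if/else chain.
function dispatch(input: string, pos: number): number {
  const scanner = SCANNER_TABLE[input.charCodeAt(pos)];
  return scanner ? scanner(input, pos) : -1;
}
```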

packages/message-parser/tests/lexer/inline.test.ts (1)

1-202: LGTM!

Comprehensive test coverage for inline token scanning including emoji shortcodes, mentions, URLs, colors, phone numbers, and emoticons. Edge cases are well-covered (unclosed shortcodes, trailing punctuation stripping, invalid patterns falling through to TEXT).

packages/message-parser/tests/lexer/math.test.ts (1)

1-114: LGTM!

Thorough test coverage for KaTeX delimiter handling across inline ($), block ($$), and backslash (\[, \], \(, \)) variants. The tests correctly verify the toggle-based state machine behavior and round-trip preservation.

packages/message-parser/tests/lexer/links.test.ts (1)

1-133: LGTM!

Excellent coverage of link-related tokenization including markdown links, images, angle brackets, emoticons (`<3`, `>:)`), and timestamps. Edge cases are well-tested: empty parts, nested brackets, blockquote detection at line start vs mid-line, and malformed timestamp fallbacks.
packages/message-parser/src/lexer/lexer.ts (2)
69-71: EOF token may exceed MAX_TOKENS limit.

The EOF token is pushed unconditionally after the loop breaks at MAX_TOKENS. While flushText internally guards against exceeding the cap, the direct push() call for EOF doesn't, potentially resulting in MAX_TOKENS + 1 tokens.
Suggested fix
         flushText(ctx, pos);
-        ctx.tokens.push(makeToken(TokenKind.EOF, '', '', pos));
+        if (ctx.tokens.length < MAX_TOKENS) {
+            ctx.tokens.push(makeToken(TokenKind.EOF, '', '', pos));
+        }
         return ctx.tokens;
1-73: Well-structured lexer implementation.

The three-path dispatch (plain ASCII fast path, non-ASCII/emoji path, special char scanner dispatch) is efficient. The CHAR_CLASS check enables O(1) classification, and scanner table dispatch avoids conditional chains.
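The three paths can be sketched as a single loop, assuming a hypothetical `Uint8Array`-based `CHAR_CLASS` and placeholder handling for non-ASCII and special characters. It also applies the `MAX_TOKENS` guard to the EOF push, in the spirit of the suggested fix; the token kinds, the special-character set, and the cap value are illustrative.

```typescript
// Hypothetical class codes; the PR's CHAR_CLASS values and special set differ.
const PLAIN = 0;
const SPECIAL = 1;
const SPECIAL_CHARS = "*_~`$#[]()<>:@\\\n";

// One byte per ASCII code: classifying a character is a single array read.
const CHAR_CLASS = new Uint8Array(128);
for (let i = 0; i < SPECIAL_CHARS.length; i++) {
  CHAR_CLASS[SPECIAL_CHARS.charCodeAt(i)] = SPECIAL;
}

const MAX_TOKENS = 10000; // illustrative cap

interface Tok {
  kind: "TEXT" | "SPECIAL" | "NON_ASCII" | "EOF";
  raw: string;
  start: number;
}

function lex(input: string): Tok[] {
  const tokens: Tok[] = [];
  let textStart = 0;
  let pos = 0;

  // Emits the pending plain-text run [textStart, end) as one TEXT token.
  const flush = (end: number): void => {
    if (end > textStart && tokens.length < MAX_TOKENS) {
      tokens.push({ kind: "TEXT", raw: input.slice(textStart, end), start: textStart });
    }
    textStart = end;
  };

  while (pos < input.length && tokens.length < MAX_TOKENS) {
    const code = input.charCodeAt(pos);
    // Path 1: plain ASCII only extends the pending text run.
    if (code < 128 && CHAR_CLASS[code] === PLAIN) {
      pos++;
      continue;
    }
    flush(pos);
    if (tokens.length >= MAX_TOKENS) break;
    // Paths 2 and 3: a real lexer would call the emoji scanner (non-ASCII) or
    // SCANNER_TABLE[code] (special char); this sketch emits one code unit.
    tokens.push({ kind: code >= 128 ? "NON_ASCII" : "SPECIAL", raw: input.charAt(pos), start: pos });
    pos++;
    textStart = pos;
  }
  flush(pos);
  // The EOF push is guarded too, so the result never exceeds MAX_TOKENS.
  if (tokens.length < MAX_TOKENS) {
    tokens.push({ kind: "EOF", raw: "", start: pos });
  }
  return tokens;
}
```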
packages/message-parser/tests/lexer/formatting.test.ts (1)

1-110: LGTM!

Comprehensive formatting marker tests covering asterisk, underscore, and tilde behaviors. Good coverage of context-sensitive tokenization (line-start bullets vs mid-line emphasis) and emoticon fallbacks (`*)`, `*-)`).

packages/message-parser/tests/lexer/general.test.ts (1)

1-263: LGTM!

Excellent comprehensive test suite covering boundary conditions, token position invariants, round-trips, MAX_TOKENS limits, Unicode emoji (including ZWJ sequences, variation selectors, skin tone modifiers), and non-ASCII text handling. The position invariant checks (contiguous tokens, end = start + raw.length) are particularly valuable for ensuring lexer correctness.
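The position invariants mentioned above can be expressed as a small checker, assuming tokens carry `raw`, `start`, and `end` fields (hypothetical names; the PR's test helpers may differ). Contiguous coverage plus a matching input slice for every token implies the round-trip property.

```typescript
// Minimal token shape for this sketch (hypothetical field names).
interface PosToken { raw: string; start: number; end: number }

// Returns a description of the first violated invariant, or null if all hold:
//  1. end === start + raw.length for every token
//  2. tokens are contiguous: each starts where the previous one ended
//  3. each raw equals the input slice it claims to cover
// Together with full coverage of the input, 2 and 3 imply the round-trip
// property: tokens.map(t => t.raw).join("") === input.
function checkInvariants(input: string, tokens: PosToken[]): string | null {
  let expected = 0;
  let i = 0;
  for (const t of tokens) {
    if (t.end !== t.start + t.raw.length) return `token ${i}: end !== start + raw.length`;
    if (t.start !== expected) return `token ${i}: expected start ${expected}, got ${t.start}`;
    if (input.slice(t.start, t.end) !== t.raw) return `token ${i}: raw does not match input`;
    expected = t.end;
    i++;
  }
  return expected === input.length ? null : `input covered only up to ${expected} of ${input.length}`;
}
```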

packages/message-parser/src/lexer/scanners/formatting.ts (1)

1-51: LGTM!

The formatting scanners are well-structured with clear separation of concerns. The asterisk scanner correctly handles the three cases (list bullet, emoticon, emphasis) with proper priority, and the underscore/tilde scanners follow a consistent pattern for consuming runs of their respective characters.

packages/message-parser/tests/lexer/structural.test.ts (1)

1-118: LGTM!

Comprehensive test coverage for structural tokenization. The helper functions (tok, kinds, kv, raw) provide clean abstractions for assertions. The round-trip tests on lines 109-117 are particularly valuable for ensuring the lexer preserves input faithfully.

packages/message-parser/src/lexer/scanners/code.ts (1)

1-83: LGTM!

The code scanner correctly distinguishes between fenced blocks (which can span multiple lines) and inline code spans (which cannot cross line boundaries). The unclosed-fence handling is appropriate - emitting CODE_CONTENT for fenced blocks but TEXT for unclosed inline spans reflects the semantic difference between the two constructs.

packages/message-parser/src/lexer/helpers.ts (1)

1-66: LGTM!

Both helper functions are well-designed with proper boundary checks and backtracking logic. The scanMentionBody correctly handles optional sub-address segments (: or @ separators) with safe backtracking when the segment after a separator is invalid. The callers in inline.ts and blocks.ts properly guard against empty strings and -1 return values.
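A sketch of the sub-address backtracking, assuming an ASCII-only `isMentionChar` set (the PR's `MENTION_OK` differs and, per a later commit, also accepts Unicode names). The function returns `-1` for an empty body, which is the case the callers are said to guard against.

```typescript
// Hypothetical ASCII-only mention character set.
const isMentionChar = (c: number): boolean =>
  (c >= 0x30 && c <= 0x39) || // 0-9
  (c >= 0x41 && c <= 0x5a) || // A-Z
  (c >= 0x61 && c <= 0x7a) || // a-z
  c === 0x2d || c === 0x2e || c === 0x5f; // - . _

// Consumes a run of mention characters starting at `pos`; returns its end.
function scanRun(input: string, pos: number): number {
  while (pos < input.length && isMentionChar(input.charCodeAt(pos))) pos++;
  return pos;
}

// Returns the end of the mention body at `pos`, or -1 if the body is empty.
// An optional sub-address (`:` or `@` plus more mention chars) is included
// only when the segment after the separator is non-empty; otherwise we
// backtrack to just before the separator and leave it for the next scanner.
function scanMentionBody(input: string, pos: number): number {
  const end = scanRun(input, pos);
  if (end === pos) return -1;
  const sep = input.charCodeAt(end);
  if (sep === 0x3a /* : */ || sep === 0x40 /* @ */) {
    const subEnd = scanRun(input, end + 1);
    if (subEnd > end + 1) return subEnd; // valid sub-address
  }
  return end; // backtrack
}
```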
packages/message-parser/src/lexer/scanners/emoji.ts (1)
14-15: Surrogate range check may be overly broad.

Line 15 checks the full surrogate range (0xd800-0xdfff), which includes both high surrogates (0xd800-0xdbff) and low surrogates (0xdc00-0xdfff). In valid UTF-16, a low surrogate should never appear independently at the start of a character. While this doesn't cause incorrect behavior for well-formed input (the i + 1 < len guard prevents out-of-bounds access), narrowing to high surrogates would be more precise.

The same pattern appears on line 45 in the ZWJ loop.
Proposed refinement
-    i += (c1 >= 0xd800 && c1 <= 0xdfff && i + 1 < len) ? 2 : 1;
+    i += (c1 >= 0xd800 && c1 <= 0xdbff && i + 1 < len) ? 2 : 1;
-        if (next >= 0xd800 && next <= 0xdfff && i + 1 < len && isUnicodeEmojiStart(input, i)) { i += 2; }
+        if (next >= 0xd800 && next <= 0xdbff && i + 1 < len && isUnicodeEmojiStart(input, i)) { i += 2; }
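A sketch of code-point stepping that only pairs a high surrogate with a following low surrogate. It is slightly stricter than the proposed diff, which checks only the leading unit; `codePointWidthBuiltin` shows the same result via the standard `String.prototype.codePointAt`. Both function names are illustrative.

```typescript
// Returns how many UTF-16 code units the code point at index `i` occupies.
// Only a high surrogate (0xD800-0xDBFF) followed by a low surrogate
// (0xDC00-0xDFFF) counts as a pair; lone surrogates advance by one unit.
function codePointWidth(input: string, i: number): 1 | 2 {
  const c1 = input.charCodeAt(i);
  if (c1 >= 0xd800 && c1 <= 0xdbff && i + 1 < input.length) {
    const c2 = input.charCodeAt(i + 1);
    if (c2 >= 0xdc00 && c2 <= 0xdfff) return 2;
  }
  return 1;
}

// Equivalent via the built-in: codePointAt() combines only well-formed pairs,
// so a result above 0xFFFF means a two-unit code point.
function codePointWidthBuiltin(input: string, i: number): 1 | 2 {
  const cp = input.codePointAt(i);
  return cp !== undefined && cp > 0xffff ? 2 : 1;
}
```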
packages/message-parser/src/lexer/scanners/structural.ts (1)

1-53: LGTM!

The structural scanners are well-implemented. The scanEscape function correctly handles the EOF edge case (backslash at end of input) by falling through to plain text - this is verified by the test expect(kinds('\\')).toEqual(['TEXT']). The KaTeX delimiter handling with state flag toggles integrates cleanly with the broader lexer state machine.

packages/message-parser/tests/lexer/blocks.test.ts (2)

1-167: Good test coverage for block-level constructs.

The tests comprehensively cover headings, list bullets (unordered, task, ordered), and pipe/spoiler syntax. The emoticon edge cases (e.g., `#)`, `-_-`, `8-)`) are particularly valuable for ensuring the lexer correctly prioritizes emoticon matching in ambiguous contexts.

87-98: Test expectation is correct; no changes needed.

The ] character in '- [y] text' doesn't produce a separate token because scanCloseBracket() only emits a token when ] is immediately followed by ( (link href syntax). Otherwise, it accumulates ] into pending text without emitting a token. This is correct behavior: [y] is not a valid link pattern, so the ] is treated as regular text and merged with y into a single TEXT token.
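A sketch of the `]` behavior described above, assuming pending text is tracked by a `textStart` offset, so that simply not flushing is enough to keep `]` inside the TEXT run. `BracketCtx` and the `LINK_HREF_OPEN` kind are hypothetical names.

```typescript
interface BracketToken { kind: string; raw: string; start: number }

// Hypothetical, simplified context; pending text is the span [textStart, pos).
interface BracketCtx {
  input: string;
  textStart: number;
  tokens: BracketToken[];
}

// Emits the pending span [textStart, end) as one TEXT token, if non-empty.
function flushText(ctx: BracketCtx, end: number): void {
  if (end > ctx.textStart) {
    ctx.tokens.push({ kind: "TEXT", raw: ctx.input.slice(ctx.textStart, end), start: ctx.textStart });
  }
  ctx.textStart = end;
}

// `]` becomes a token only when immediately followed by `(`; otherwise it is
// left in the pending span and ends up inside the surrounding TEXT token.
function scanCloseBracket(ctx: BracketCtx, pos: number): number {
  if (ctx.input.charCodeAt(pos + 1) !== 0x28 /* ( */) {
    return pos + 1; // no flush, no token
  }
  flushText(ctx, pos);
  ctx.tokens.push({ kind: "LINK_HREF_OPEN", raw: "](", start: pos });
  ctx.textStart = pos + 2;
  return pos + 2;
}
```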

packages/message-parser/src/lexer/scanners/inline.ts (3)

64-73: Email addresses may be misclassified as user mentions.

For input like alice@example.com, the text alice accumulates as TEXT, then scanAt is invoked at @ and emits MENTION_USER for @example (or potentially @example.com depending on MENTION_OK character set), leaving the rest as separate tokens instead of producing a single EMAIL token.

Email detection via EMAIL_RE only runs from scanPlus (for +... patterns) and scanC, but neither handles the common case of an email address starting with a letter.
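One possible mitigation, sketched under assumptions: at `@`, walk back over local-part characters and try an email match from there before falling back to a mention. `EMAIL_RE` here is a deliberately loose stand-in, not the PR's regex. In the real lexer the local part would already be pending as TEXT, so a caller would have to trim it, much as `tryUrlScheme` back-patches the scheme. Treating a non-email `word@` as plain text rather than a mention is also an assumption.

```typescript
// Deliberately loose, hypothetical email pattern (sticky: matches only at lastIndex).
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}/y;
const LOCAL_PART_RE = /[A-Za-z0-9._%+-]/;
const MENTION_CHAR_RE = /[A-Za-z0-9._-]/;

interface AtResult { kind: "EMAIL" | "MENTION_USER" | "TEXT"; start: number; end: number }

// Classifies the construct around the `@` at index `at`.
function classifyAt(input: string, at: number): AtResult {
  // Walk back over characters that could belong to an email local part.
  let start = at;
  while (start > 0 && LOCAL_PART_RE.test(input.charAt(start - 1))) start--;
  if (start < at) {
    EMAIL_RE.lastIndex = start;
    const m = EMAIL_RE.exec(input);
    if (m) return { kind: "EMAIL", start, end: start + m[0].length };
    return { kind: "TEXT", start: at, end: at + 1 }; // `word@` that is not an email
  }
  // Nothing glued to the left of `@`: treat it as a user mention.
  let end = at + 1;
  while (end < input.length && MENTION_CHAR_RE.test(input.charAt(end))) end++;
  return end > at + 1 ? { kind: "MENTION_USER", start: at, end } : { kind: "TEXT", start: at, end: at + 1 };
}
```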

41-61: LGTM - Back-patching logic for URL scheme detection.

The tryUrlScheme function correctly handles the case where a URL scheme prefix (like http) has already been emitted as TEXT. The back-patch via ctx.tokens.pop() is safe since ctx.tokens is a plain array per the ScanContext definition. The TRAIL_PUNCT replacement ensures trailing punctuation (like periods at end of sentences) isn't included in the URL.
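A simplified sketch of the back-patching pattern, assuming the scheme was emitted as the tail of the last TEXT token, that token ends exactly at the `:`, and only `http`, `https`, and `ftp` are recognized. `SCHEME_RE`, `URL_TAIL_RE`, and `TRAIL_PUNCT` here are illustrative stand-ins for the PR's patterns.

```typescript
interface UrlTok { kind: "TEXT" | "URL"; raw: string; start: number }

// Illustrative stand-ins for the PR's scheme/URL patterns and TRAIL_PUNCT.
const SCHEME_RE = /\b(?:https?|ftp)$/i;
const URL_TAIL_RE = /:\/\/[^\s<>]+/y;
const TRAIL_PUNCT = /[.,;:!?)]+$/;

// Called at a `:` whose scheme was already emitted at the end of the last
// TEXT token. Returns the new position, or -1 if there is no URL here.
function tryUrlScheme(input: string, pos: number, tokens: UrlTok[]): number {
  const last = tokens[tokens.length - 1];
  if (!last || last.kind !== "TEXT" || last.start + last.raw.length !== pos) return -1;
  const scheme = SCHEME_RE.exec(last.raw);
  if (!scheme) return -1;
  URL_TAIL_RE.lastIndex = pos;
  const m = URL_TAIL_RE.exec(input);
  if (!m) return -1;
  const tail = m[0].replace(TRAIL_PUNCT, ""); // drop sentence-ending punctuation
  if (tail.length <= 3) return -1; // nothing left after "://"

  // Back-patch: pop the TEXT token and re-push whatever preceded the scheme.
  tokens.pop();
  const keep = last.raw.slice(0, last.raw.length - scheme[0].length);
  if (keep) tokens.push({ kind: "TEXT", raw: keep, start: last.start });
  tokens.push({ kind: "URL", raw: scheme[0] + tail, start: pos - scheme[0].length });
  return pos + tail.length;
}
```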

102-139: LGTM - Multi-pattern scanner for 'c' character.

The scanC function correctly prioritizes patterns: `color:#hex` first (most specific), then URL, then email. The flushText call before emitting COLOR (line 110) addresses the previously flagged issue.

packages/message-parser/tests/lexer/code.test.ts (1)

1-109: LGTM! Comprehensive test coverage for code tokenization.

The test file is well-structured with clear helper functions (tok, kinds, kv, raw) that reduce boilerplate. Good coverage of edge cases including empty spans, unclosed code at newline/EOF, multiple spans, and round-trip integrity verification.

packages/message-parser/src/lexer/ScanContext.ts (1)

1-81: LGTM! Well-designed context and utility functions.

The ScanContext interface provides a clean abstraction for scanner state. The safety cap (MAX_TOKENS) prevents runaway tokenization. Utility functions are well-scoped and correctly implemented.
packages/message-parser/src/lexer/scanners/blocks.ts (3)
26-29: Type cast String(count) as '1' | '2' | '3' | '4' assumes count is 1-4.

In isolation, the while loop (line 22) would leave count = 0 if the character at pos were not #, but because this scanner is only dispatched for #, count is always ≥ 1. The cast would also be wrong if count could exceed 4, which the count < 4 guard rules out.

The current logic is safe because:

- the scanner is only called when the current char is #, and
- the loop stops once count reaches 4 (count < 4 guard).

So count is guaranteed to be 1-4, and the code is correct.
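If this reasoning ever needs to be enforced by the compiler instead of by review, one hypothetical option is to index a typed tuple so that no `as` cast is needed. `headingLevel` and `LEVELS` are illustrative names, not the PR's code.

```typescript
type HeadingLevel = "1" | "2" | "3" | "4";
const LEVELS: readonly HeadingLevel[] = ["1", "2", "3", "4"];

// Counts up to four leading `#` at `pos`; returns null when there is none.
// Indexing the typed array replaces the `String(count) as ...` cast.
function headingLevel(input: string, pos: number): HeadingLevel | null {
  let count = 0;
  while (count < 4 && input.charCodeAt(pos + count) === 0x23 /* # */) count++;
  return count === 0 ? null : LEVELS[count - 1] ?? null;
}
```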

62-76: Task bullet detection could fail on short input.

Lines 66-70 access input[pos + 3], charCodeAt(pos + 4), and charCodeAt(pos + 5) without explicit length checks. However, charCodeAt returns NaN for out-of-bounds indices, and input[pos + 3] returns undefined. The comparisons will fail safely, so no runtime error occurs.

Functionally correct, though explicit bounds checks could improve clarity.
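For the explicit-bounds variant, a sketch that checks the length once up front. The assumed layout is `- [x] ` or `- [ ] ` (dash, space, bracket, mark, bracket, space), inferred from the offsets `pos + 3` through `pos + 5` quoted above; accepting only lowercase `x` and the name `matchTaskBullet` are assumptions.

```typescript
// Returns true for "- [x] ", false for "- [ ] ", or null if `pos` does not
// start a task bullet. The single length check makes every later index
// access in bounds, instead of relying on NaN/undefined comparisons failing.
function matchTaskBullet(input: string, pos: number): boolean | null {
  if (pos + 6 > input.length) return null; // need six chars: "- [x] "
  if (input.charAt(pos) !== "-" || input.charAt(pos + 1) !== " " || input.charAt(pos + 2) !== "[") return null;
  if (input.charCodeAt(pos + 4) !== 0x5d /* ] */ || input.charCodeAt(pos + 5) !== 0x20 /* space */) return null;
  const mark = input.charAt(pos + 3);
  if (mark === " ") return false;
  if (mark === "x") return true;
  return null;
}
```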

123-133: URL regex behavior is correct with sticky flag.

The regex uses the y (sticky) flag, which ensures exec() only matches at exactly the lastIndex position. Setting lastIndex = pos and then calling exec() correctly enforces that the match begins at the intended position.
			> Likely an incorrect or invalid review comment.
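The sticky-flag behavior is easy to confirm in isolation. The patterns below are simplified stand-ins for the PR's URL regex; the point is the contrast between `y`, which only attempts a match at `lastIndex`, and `g`, which searches forward from it.

```typescript
// Same pattern, two flags: sticky (y) vs global (g).
const STICKY = /https?:\/\/\S+/y;
const GLOBAL = /https?:\/\/\S+/g;

const text = "see https://rocket.chat now";

STICKY.lastIndex = 0;
const stickyAt0 = STICKY.exec(text); // null: no URL starts at index 0

GLOBAL.lastIndex = 0;
const globalAt0 = GLOBAL.exec(text); // scans ahead and finds the URL at index 4

STICKY.lastIndex = 4;
const stickyAt4 = STICKY.exec(text); // matches "https://rocket.chat" at index 4
```

This is why a scanner can set `lastIndex = pos` and trust that any match begins exactly at `pos`.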
packages/message-parser/src/lexer/constants/charCodes.ts (1)

1-45: LGTM! Clear and correct ASCII character code constants.

All character codes are accurate. The inline comments serve as useful documentation showing the actual character each code represents, which aids maintainability for this reference file.

packages/message-parser/src/lexer/scanners/links.ts (1)

77-88: Standalone brackets and parentheses always emit link tokens—this is intentional and tested.

The test suite (links.test.ts:36–38) explicitly validates that a lone [ produces LINK_OPEN and a lone ) produces LINK_HREF_CLOSE. This confirms the behavior is by design for the Stage 1 flat token stream, with disambiguation deferred to Stage 2 parsing.

cubic-dev-ai

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/message-parser/src/index.ts">

<violation number="1" location="packages/message-parser/src/index.ts:22">
P1: `parse()` now returns `Token[]` for the handwritten engine and exposes a `Root | Token[]` public return type, breaking the prior AST-return contract.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}
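One way to keep the AST contract, sketched with hypothetical stand-in types and engines: TypeScript overloads, so that callers who do not opt in still get `Root`, while `engine: "handwritten"` is typed as `Token[]`. Exporting the token stream through a separately named function would be another option; neither is what the PR currently does.

```typescript
// Hypothetical stand-ins for the package's real Root and Token types.
interface Root { type: "ROOT"; children: string[] }
interface Token { kind: string; raw: string; start: number }

// Placeholder engines (illustrative only, not the package's implementations).
const peggyParse = (input: string): Root => ({ type: "ROOT", children: [input] });
const handwrittenLex = (input: string): Token[] => [{ kind: "TEXT", raw: input, start: 0 }];

// Overloads: callers that do not opt in still see `Root`, so the existing
// AST-returning contract is preserved at the type level.
function parse(input: string, options: { engine: "handwritten" }): Token[];
function parse(input: string, options?: { engine?: "peggy" }): Root;
function parse(input: string, options?: { engine?: "peggy" | "handwritten" }): Root | Token[] {
  return options?.engine === "handwritten" ? handwrittenLex(input) : peggyParse(input);
}

const ast = parse("hello"); // typed as Root
const toks = parse("hello", { engine: "handwritten" }); // typed as Token[]
```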

cubic-dev-ai · 2026-04-23T19:04:36Z

You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment @cubic-dev-ai review.

Se3do · 2026-04-25T09:59:22Z

@cubic-dev-ai review.

cubic-dev-ai · 2026-04-25T09:59:55Z

@Se3do I have started the AI code review. It will take a few minutes to complete.

Se3do added 8 commits March 8, 2026 20:42

chore(message-parser-handwritten): initialize project with tsconfig and jest setup

2e8369a

feat(message-parser-handwritten): add AST node types and utilities

c14651a

feat(message-parser-handwritten): add token types and scan context

d983da5

feat(message-parser-handwritten): add modular scanners

761acae

Extract scanning logic into dedicated modules:
- blocks, code, emoji, formatting, inline, links, math, structural
- central dispatch via scanners/index.ts

feat(message-parser-handwritten): add main lexer entry point

bc1209b

test(message-parser-handwritten): add comprehensive test suite

ad06544

perf(message-parser-handwritten): add benchmark suite for the lexer

b009a1b

docs(message-parser-handwritten): add project README

d5d139b

Se3do marked this pull request as ready for review March 8, 2026 19:29

Copilot AI review requested due to automatic review settings March 8, 2026 19:29

Copilot started reviewing on behalf of Se3do March 8, 2026 19:30

coderabbitai Bot added the type: feature Pull requests that introduces new feature label Mar 8, 2026

dionisio-bot Bot added the stat: conflict label Mar 8, 2026

cubic-dev-ai Bot reviewed Mar 8, 2026

Copilot AI reviewed Mar 8, 2026

Comment thread packages/message-parser-handwritten/package.json Outdated

Comment thread packages/message-parser/src/lexer/scanners/inline.ts

Comment thread packages/message-parser-handwritten/src/lexer/scanners/code.ts Outdated

chore: restore upstream yarn.lock

d425f3d

dionisio-bot Bot removed the stat: conflict label Mar 8, 2026

chore(message-parser-handwritten): remove unused files

6ea0d54

coderabbitai Bot reviewed Mar 8, 2026

Se3do added 7 commits March 8, 2026 21:51

fix(message-parser-handwritten): include U+1FA00–U+1FAFF range in unicode emoji detection in lexer

376183d

fix(message-parser-handwritten): remove accidental trailing space in emoticon key '>.<'

c9ac88e

fix(message-parser-handwritten): fixed silently emitting EOF when MAX_TOKENS is reached

e462709

fix(message-parser-handwritten): flush pending text before emitting COLOR token

e47852c

fix(message-parser-handwritten): add -S flag to env for multi-word interpreter in shebang

0f6bf3a

fix(message-parser-handwritten): correctly parse ZWJ emoji sequences with skin-tone modifiers

894eaa8

fix(message-parser-handwritten): avoid consuming trailing ZWJ without a valid emoji component

83875c1

chore: remove message-parser-handwritten package

2ac67d4

coderabbitai Bot removed the type: feature Pull requests that introduces new feature label Mar 13, 2026

dionisio-bot Bot added the stat: conflict label Mar 13, 2026

Merge branch 'develop' into feat/handwritten-parser-prototype

65fd9c5

dionisio-bot Bot removed the stat: conflict label Mar 13, 2026

coderabbitai Bot reviewed Mar 13, 2026

Comment thread packages/message-parser/benchmarks/lexer.bench.ts Outdated

Comment thread packages/message-parser/src/lexer/constants/regexes.ts

Comment thread packages/message-parser/src/lexer/constants/regexes.ts Outdated

Se3do added 2 commits March 13, 2026 16:18

feat(message-parser): add handwritten engine option to parse()

4b5862b

chore(message-parser): remove stale and redundant lexer benchmark comments

3572ee7

cubic-dev-ai Bot reviewed Mar 13, 2026

Comment thread packages/message-parser/src/index.ts Outdated

Se3do added 17 commits March 13, 2026 16:48

fix(message-parser): trim trailing punctuation from EMAIL lexer tokens

9fa3f09

fix(message-parser): make COLOR_RE match 8/4-digit hex atomically with boundary

c03fcb9

feat(message-parser): add handwritten lexer options plumbing from parse entry to scan context

16490fd

feat(message-parser): gate handwritten lexer features with parser-style flags

d330b0d

feat(message-parser): align URL and email autolink validity with tld-based checks

5eab63b

feat(message-parser): broaden URL detection to all ascii word starts

e7045fa

feat(message-parser): support unicode mention names in handwritten lexer

0b00a07

test(message-parser): add parity tests for handwritten options, urls, and unicode mentions

0eeb2ad

fix(message-parser): emit LINK_HREF_CLOSE for standalone closing paren

921afb4

fix(message-parser): restrict #channel mentions to word boundaries

30f31e8

feat(message-parser): add Parser skeleton and options

b097c22

feat(message-parser): wire handwritten parser into parse() entry point

8f6e447

refactor(message-parser): switch parser engine branch logic

bff72b5

feat(message-parser): add parser foundation and heading block parsing

3196485

feat(message-parser): add heading and code fence block parsing

8e010b7

feat(message-parser): add blockquote, list, and task parsing

8d99e0e

feat(message-parser): add katex, spoiler block, and big emoji parsing

11d2b60

Merge branch 'RocketChat:develop' into feat/handwritten-parser-prototype

6f0c6f1

Conversation

Se3do commented Mar 8, 2026 • edited by coderabbitai Bot

Summary

The Goal

Why

Performance vs. current parser

Next Steps

Summary by CodeRabbit

dionisio-bot Bot commented Mar 8, 2026 • edited

changeset-bot Bot commented Mar 8, 2026 • edited

⚠️ No Changeset found

CLAassistant commented Mar 8, 2026 • edited

coderabbitai Bot commented Mar 8, 2026 • edited

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

❌ Failed checks (1 warning)

cubic-dev-ai Bot left a comment

Copilot AI left a comment

Pull request overview

Reviewed changes

coderabbitai Bot left a comment

coderabbitai Bot left a comment

cubic-dev-ai Bot left a comment

cubic-dev-ai Bot commented Apr 23, 2026

Se3do commented Apr 25, 2026

cubic-dev-ai Bot commented Apr 25, 2026
