-
Notifications
You must be signed in to change notification settings - Fork 0
Filtering raw Markdown syntax #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
"Determiner" now accepts a database instance. I'm not sure why the tests were not updated for this change, so I implemented a MockDatabase here.
|
Warning Rate limit exceeded@pan93412 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 2 minutes and 21 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds database abstraction interfaces and in-memory test mocks, refactors Determiner and web wiring to accept those interfaces, implements exclusion logic for code/links in spelling checks, updates tests to use mocks and async flows, upgrades GitHub workflows to pnpm and newer actions, and bumps several dependencies. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Determiner
participant ExclusionLogic
participant SpellingDB
participant CaseDB
User->>Determiner: checkSpelling(content)
Determiner->>ExclusionLogic: buildExclusionRanges(content)
ExclusionLogic->>ExclusionLogic: find code blocks, inline code, links
ExclusionLogic->>ExclusionLogic: merge overlapping ranges
ExclusionLogic-->>Determiner: exclusionRanges[]
Determiner->>SpellingDB: getRules()
SpellingDB-->>Determiner: spellingRules[]
loop for each rule match
Determiner->>ExclusionLogic: isPositionExcluded(matchIndex)
alt excluded
ExclusionLogic-->>Determiner: true (skip)
else not excluded
ExclusionLogic-->>Determiner: false
Determiner->>Determiner: record spelling mistake
end
end
Determiner->>CaseDB: getRules()
CaseDB-->>Determiner: caseRules[]
loop for each case match
Determiner->>ExclusionLogic: isPositionExcluded(matchIndex)
alt excluded
ExclusionLogic-->>Determiner: true (skip)
else not excluded
ExclusionLogic-->>Determiner: false
Determiner->>Determiner: record case correction
end
end
Determiner-->>User: mistakes[], corrections[]
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
pnpx = npx. `npm install prettier` is redundant.
d66fe3b to
f0011b0
Compare
f0011b0 to
972d0e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR makes several improvements including dependency updates, refactoring database interactions to use interfaces for better testability, and enhancing the spell-checker logic to exclude markdown-specific syntax. Additionally, it adds comprehensive test coverage and establishes a CI/CD pipeline for automated testing.
- Dependency version updates across multiple packages (esbuild, rollup, vitest, discord.js, mongoose, dotenv, fastify, etc.)
- Refactored database code to use interfaces (
ISpellingDatabase,ICaseDatabase) with mock implementations for testing - Enhanced spell-checking logic to ignore content within code blocks, markdown links, and auto-links
- Added comprehensive test coverage for the new exclusion logic
- Set up GitHub Actions workflows for testing and updated existing workflows
Reviewed Changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| package.json | Updated dependency versions for production and development packages |
| pnpm-lock.yaml | Lock file reflecting updated dependency versions and their transitive dependencies |
| pnpm-workspace.yaml | Added workspace configuration to only build dependencies for esbuild |
| web.ts | Changed import from concrete classes to interfaces for dependency injection |
| determiner.ts | Changed constructor parameters to use interfaces; added logic to exclude markdown/code syntax from spell-checking |
| determiner.test.ts | Updated tests to use mock implementations and added comprehensive test coverage for exclusion logic |
| database.ts | Updated classes to implement new interfaces |
| database-interfaces.ts | New file defining interfaces for database operations |
| database-test-utils.ts | New file providing mock implementations for testing |
| .github/workflows/test.yaml | New workflow for running automated tests |
| .github/workflows/check.yaml | Updated to use pnpm and newer action versions |
| .github/workflows/release.yaml | Updated actions/checkout version |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated no new comments.
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
determiner.ts (1)
226-226: Bug: Incorrect mutation of sanitizedContent.Using
sanitizedContent.replace(match, rule.term)replaces only the first occurrence ofmatchin the entire string, not the specific occurrence atmatchIndexthat was just processed. This could:
- Replace the wrong occurrence if the same text appears earlier
- Cause inconsistent behavior across rule checks
- Not achieve the intended normalization effect
If the intent is to prevent duplicate reports, consider tracking reported positions instead of mutating the content.
Consider one of these fixes:
Option 1: Remove the mutation (if duplicate prevention isn't critical):
- sanitizedContent = sanitizedContent.replace(match, rule.term);Option 2: Track reported positions (if you need duplicate prevention):
+ const reportedPositions = new Set<number>(); for (const rule of caseRules) { ... while ((matchResult = regex.exec(content)) !== null) { - // 如果 match 的文字在排除區塊內,跳過大小寫檢查 - if (isExcluded(matchResult.index)) { + const matchIndex = matchResult.index; + // 如果 match 的文字在排除區塊內或已報告,跳過大小寫檢查 + if (isExcluded(matchIndex) || reportedPositions.has(matchIndex)) { continue; } ... if (!/[A-Za-z]/.test(charBefore) && !/[A-Za-z]/.test(charAfter)) { + reportedPositions.add(matchIndex); mistakes.push({...}); } - sanitizedContent = sanitizedContent.replace(match, rule.term); }
♻️ Duplicate comments (1)
determiner.ts (1)
71-71: Empty code blocks not handled.The regex uses
+which requires at least one character, so empty code fences like``````won't be matched and excluded. Valid Markdown can have empty code blocks.Apply this diff to allow empty code blocks:
- const tripleBacktickRegex = /```[\s\S]+?```/g; + const tripleBacktickRegex = /```[\s\S]*?```/g;Based on past review comments.
🧹 Nitpick comments (3)
.github/workflows/test.yaml (1)
14-17: Pin pnpm to the repo’s declared version
package.json’spackageManagerpin ispnpm@10.20.0, but this step letspnpm/action-setupchoose whatever the latest is on run day. When a new pnpm major releases, the workflow can suddenly pick it up and behave differently from local installs. Please pin the action to the same version.- uses: pnpm/action-setup@v4 name: Install pnpm with: + version: 10.20.0 run_install: false.github/workflows/check.yaml (1)
23-26: Keep CI pnpm aligned with package.jsonThe repo declares
packageManager: "pnpm@10.20.0"in package.json, but this setup step installs whatever pnpm version happens to be current. That can introduce CI/local drift and sudden breakage on pnpm releases. Please pin the action to 10.20.0 here as well.- uses: pnpm/action-setup@v4 name: Install pnpm with: + version: 10.20.0 run_install: falsedatabase-test-utils.ts (1)
39-41: Optional: Remove unnecessary optional chaining.The optional chaining
rule._id?.toString()is unnecessary since_idis defined as a required field inMockSpellingRule(line 6). The same applies to line 81 forMockCaseRule.Apply this diff:
- this.#rules = this.#rules.filter(rule => rule._id?.toString() !== ruleId.toString()); + this.#rules = this.#rules.filter(rule => rule._id.toString() !== ruleId.toString());And similarly for line 81:
- this.#rules = this.#rules.filter(rule => rule._id?.toString() !== ruleId.toString()); + this.#rules = this.#rules.filter(rule => rule._id.toString() !== ruleId.toString());
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (11)
.github/workflows/check.yaml(1 hunks).github/workflows/release.yaml(1 hunks).github/workflows/test.yaml(1 hunks)database-interfaces.ts(1 hunks)database-test-utils.ts(1 hunks)database.ts(2 hunks)determiner.test.ts(1 hunks)determiner.ts(5 hunks)package.json(1 hunks)pnpm-workspace.yaml(1 hunks)web.ts(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
database.ts (1)
database-interfaces.ts (2)
ISpellingDatabase(7-58)ICaseDatabase(63-105)
web.ts (1)
database-interfaces.ts (2)
ISpellingDatabase(7-58)ICaseDatabase(63-105)
determiner.test.ts (2)
database-test-utils.ts (2)
MockSpellingDatabase(16-56)MockCaseDatabase(61-91)determiner.ts (1)
Determiner(15-232)
determiner.ts (1)
database-interfaces.ts (2)
ISpellingDatabase(7-58)ICaseDatabase(63-105)
database-interfaces.ts (1)
models.ts (3)
SpellingRule(6-11)SpellingRuleType(20-20)CaseRule(47-49)
database-test-utils.ts (2)
models.ts (3)
SpellingRule(6-11)CaseRule(47-49)SpellingRuleType(20-20)database-interfaces.ts (2)
ISpellingDatabase(7-58)ICaseDatabase(63-105)
🪛 ast-grep (0.39.6)
determiner.ts
[warning] 167-167: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(searchWord, "gi")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
[warning] 167-167: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(\\b${searchWord}\\b, "gi")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🔇 Additional comments (6)
database-interfaces.ts (1)
1-105: Well-designed interface abstraction!The interfaces are comprehensive, well-documented, and provide a clean contract for database operations. The separation between
ISpellingDatabaseandICaseDatabasefollows good design principles.web.ts (1)
7-16: Clean dependency injection refactor.The migration to interface-based dependencies improves testability and follows the dependency inversion principle. The type-only import is appropriate for interfaces.
database-test-utils.ts (1)
16-91: Well-implemented mock databases for testing.The mock implementations correctly implement the database interfaces with in-memory storage, making them suitable for unit tests. The CRUD operations and query methods are appropriately simplified for testing purposes.
determiner.ts (3)
2-22: Interface-based dependency injection implemented correctly.The refactoring to use
ISpellingDatabaseandICaseDatabaseinterfaces improves testability and follows dependency inversion principles.
44-98: Well-designed exclusion logic for Markdown constructs.The exclusion mechanism correctly identifies code blocks (single, double, triple backticks), Markdown links, and autolinks. The overlap detection prevents double-counting ranges that are already excluded by earlier patterns (e.g., double backticks excluding single backticks within them).
100-147: Excellent range merging and adaptive search implementation.The merging logic correctly consolidates overlapping/adjacent ranges, and the adaptive
isExcludedfunction optimally chooses between linear search (≤10 ranges) and binary search (>10 ranges) based on the range count. This addresses the performance optimization suggested in past reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
determiner.ts (1)
54-59: Fix regex to match empty code blocks.The regex
/```[\s\S]+?```/gwon't match empty triple-backtick code blocks (e.g.,``````). While rare, empty code fences are valid in Markdown and should be excluded from spell checking.Apply this diff:
- const tripleBacktickRegex = /```[\s\S]+?```/g; + const tripleBacktickRegex = /```[\s\S]*?```/g;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
determiner.test.ts(1 hunks)determiner.ts(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- determiner.test.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-31T12:45:58.617Z
Learnt from: pan93412
Repo: Edit-Mr/WinstonBot PR: 28
File: determiner.ts:167-178
Timestamp: 2025-10-31T12:45:58.617Z
Learning: In the WinstonBot project's determiner.ts file, the `searchWord` variable in spelling rules intentionally supports regex patterns. The rules are stored in MongoDB as controlled input, so constructing RegExp from these patterns is by design and not a security concern.
Applied to files:
determiner.ts
🧬 Code graph analysis (1)
determiner.ts (1)
database-interfaces.ts (2)
ISpellingDatabase(7-58)ICaseDatabase(63-105)
🪛 ast-grep (0.39.6)
determiner.ts
[warning] 150-150: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(searchWord, "gi")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
[warning] 150-150: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(\\b${searchWord}\\b, "gi")
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🔇 Additional comments (5)
determiner.ts (5)
1-22: LGTM! Clean interface adoption.The refactoring to use
ISpellingDatabaseandICaseDatabaseinterfaces improves testability and follows dependency injection best practices. The type-only imports are appropriate.
84-98: LGTM! Correct interval merging.The range merging logic correctly handles both overlapping and adjacent ranges. The sort followed by linear merge is the standard algorithm for interval consolidation.
100-130: LGTM! Excellent adaptive optimization.The dual-strategy approach (linear for ≤10 ranges, binary search for >10 ranges) is well-designed. The binary search correctly finds the candidate range that might contain the position, then validates containment. This addresses the performance concerns from previous reviews.
150-170: LGTM! Correct exclusion integration for spelling checks.The exclusion logic is properly integrated—each match position is checked against excluded ranges before reporting a mistake. The use of the original
content(notsanitizedContent) for position checking is correct.Note: The static analysis warnings about ReDoS on line 151 are false positives. Based on learnings,
searchWordintentionally supports regex patterns from controlled MongoDB input, so this is by design.
183-210: LGTM! Correct exclusion integration for case checks.The exclusion check at line 186 correctly skips matches inside code blocks before performing case validation. The flow ensures that only non-excluded matches are processed for case errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
determiner.test.ts (1)
283-283: Remove unnecessaryasynckeyword from suite definition.Suite callbacks in Vitest are not asynchronous and don't return promises. The
asynckeyword here has no effect and should be removed for clarity.Apply this diff to fix:
- suite("超過兩個反引號內的文字應該被忽略 (inline code block)", async () => { + suite("超過兩個反引號內的文字應該被忽略 (inline code block)", () => {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
determiner.test.ts(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: pan93412
Repo: Edit-Mr/WinstonBot PR: 28
File: determiner.ts:167-178
Timestamp: 2025-10-31T12:45:58.617Z
Learning: In the WinstonBot project's determiner.ts file, the `searchWord` variable in spelling rules intentionally supports regex patterns. The rules are stored in MongoDB as controlled input, so constructing RegExp from these patterns is by design and not a security concern.
📚 Learning: 2025-10-31T12:45:58.617Z
Learnt from: pan93412
Repo: Edit-Mr/WinstonBot PR: 28
File: determiner.ts:167-178
Timestamp: 2025-10-31T12:45:58.617Z
Learning: In the WinstonBot project's determiner.ts file, the `searchWord` variable in spelling rules intentionally supports regex patterns. The rules are stored in MongoDB as controlled input, so constructing RegExp from these patterns is by design and not a security concern.
Applied to files:
determiner.test.ts
🧬 Code graph analysis (1)
determiner.test.ts (2)
database-test-utils.ts (2)
MockSpellingDatabase(16-56)MockCaseDatabase(61-91)determiner.ts (1)
Determiner(15-215)
🔇 Additional comments (9)
determiner.test.ts (9)
1-6: LGTM!The imports are well-organized and include all necessary dependencies for the test suite.
8-78: LGTM!The traditional Chinese rule tests are comprehensive and correctly converted to async. The use of mock databases provides good test isolation.
80-150: LGTM!The code block exclusion tests comprehensively cover single-line and multi-line code blocks with various formats. The tests correctly verify that spelling errors inside code blocks are ignored.
152-170: LGTM!The Markdown link test correctly verifies that link text is spell-checked while URLs are excluded. This is the expected behavior for context-aware checking.
172-207: LGTM!The autolink exclusion test and mixed exclusion test provide good coverage of the exclusion logic. They correctly verify that excluded blocks are ignored while non-excluded text is still checked.
209-244: LGTM!The case sensitivity tests provide excellent coverage, verifying that case rules respect exclusion blocks for code but still check Markdown link text and regular text.
246-281: LGTM!Excellent edge case coverage. The multiple exclusion blocks test verifies that different exclusion types work together correctly, and the empty autolink test handles a potential corner case.
299-335: LGTM!Excellent edge case coverage. The empty backticks test verifies that empty code blocks don't incorrectly exclude subsequent text, and the URL test confirms that URLs are properly sanitized before spell-checking.
285-296: Implementation correctly handles 4-6 backtick inline code blocks.The regex pattern
/```[\s\S]+?```/gsuccessfully matches 4+ backtick patterns because extra backticks are consumed by the[\s\S]+?portion. For example, with 4 backticks (javascript), the pattern matches: positions 0-2 as the openingtriplet, positions 3-13 as the middle content (including the 4th backtick), and positions 14-16 as the closingtriplet. This approach correctly extends to 5 and 6 backticks. The test expectations are valid.
This pull request introduces significant improvements to the codebase, focusing on interface abstraction, enhanced test coverage, and improved logic for excluding code and link blocks during spelling and case checks. The changes also update CI workflows for better dependency management and caching.
Interface Abstraction and Mock Implementations
ISpellingDatabaseandICaseDatabaseindatabase-interfaces.tsto define contracts for spelling and case database operations, improving code modularity and testability.SpellingDatabaseandCaseDatabaseclasses indatabase.tsto implement the new interfaces, ensuring consistent usage throughout the codebase. [1] [2]MockSpellingDatabaseandMockCaseDatabaseindatabase-test-utils.tsfor use in tests, enabling isolated and reliable testing without a real database.Improved Determiner Logic
Determinerto use the new interfaces, and added advanced logic to exclude spelling and case checks within code blocks (inline, multi-line, double-backtick), Markdown links, and autolinks, including merging overlapping ranges and optimizing exclusion checks. [1] [2] [3] [4]Expanded and Enhanced Test Coverage
determiner.test.tsto use mock databases and converted all tests to async. Added comprehensive new tests for exclusion logic (code blocks, Markdown links, autolinks, multiple exclusion blocks, empty blocks), ensuring correctness and robustness of the new logic.CI Workflow Modernization
check.yaml,release.yaml,test.yaml) to use newer versions of actions, switch topnpmfor dependency management and caching, and added a new workflow for logic tests with Vitest. This improves CI speed, reliability, and consistency. [1] [2] [3]These changes collectively improve the maintainability, testability, and reliability of the codebase, while ensuring spelling and case checks are context-aware and CI processes are up-to-date.
Fixed #27
Summary by CodeRabbit
Tests
Improvements
Chores
Refactor