Add repeated mistake detection loop #64
Code Review
This pull request introduces the devflow mistakes feature, allowing the recording, listing, and detection of repeated agent mistakes (such as PowerShell range syntax and unavailable Playwright modules) across the CLI, core, and MCP packages. The feedback highlights opportunities to make the detection logic more robust by allowing optional whitespace in the PowerShell range regex, limiting the wildcard search window in Playwright module detection to prevent false positives, and ensuring type consistency by parsing the CLI exit-code option as an integer.
/Select-Object\s+-Index\s+\d+\.\.\d+/i.test(commandText) ||
(/Cannot bind parameter 'Index'/i.test(combined) &&
  /Cannot convert value "\d+\.\.\d+" to type "System\.Int32"/i.test(combined))
The regular expressions used to detect PowerShell range syntax mistakes do not account for optional whitespace around the range operator (..). In PowerShell, spaces are permitted around .. (e.g., 108 .. 156), and agents might generate such commands. Updating the regexes to allow optional whitespace (\s*) around \.\. will make the detection more robust and prevent false negatives.
Suggested change:
- /Select-Object\s+-Index\s+\d+\.\.\d+/i.test(commandText) ||
- (/Cannot bind parameter 'Index'/i.test(combined) &&
-   /Cannot convert value "\d+\.\.\d+" to type "System\.Int32"/i.test(combined))
+ /Select-Object\s+-Index\s+\d+\s*\.\.\s*\d+/i.test(commandText) ||
+ (/Cannot bind parameter 'Index'/i.test(combined) &&
+   /Cannot convert value "\d+\s*\.\.\s*\d+" to type "System\.Int32"/i.test(combined))
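As a quick check of the whitespace-tolerant patterns, the sketch below wraps them in a function and runs them against three sample commands. The function name `detectPowerShellRangeIndex` and the sample commands are assumptions made for illustration; only the regular expressions come from the suggestion.

```javascript
// Hypothetical wrapper name; the regexes mirror the suggested change.
function detectPowerShellRangeIndex(commandText, combined) {
  return (
    /Select-Object\s+-Index\s+\d+\s*\.\.\s*\d+/i.test(commandText) ||
    (/Cannot bind parameter 'Index'/i.test(combined) &&
      /Cannot convert value "\d+\s*\.\.\s*\d+" to type "System\.Int32"/i.test(combined))
  );
}

// Illustrative commands: compact range, spaced range, and a single index.
const samples = [
  "Get-Content file.txt | Select-Object -Index 108..156",
  "Get-Content file.txt | Select-Object -Index 108 .. 156",
  "Get-Content file.txt | Select-Object -Index 108",
];

const results = samples.map((command) => detectPowerShellRangeIndex(command, ""));
console.log(results); // [ true, true, false ]
```

The original patterns would report `false` for the second sample, which is the false negative the comment describes.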
function detectPlaywrightModuleUnavailable(combined) {
  return (
    /Cannot find (module|package) ['"]@?playwright(?:\/test)?['"]/i.test(combined) ||
    /ERR_MODULE_NOT_FOUND[\s\S]*@?playwright(?:\/test)?/i.test(combined)
The regular expression /ERR_MODULE_NOT_FOUND[\s\S]*@?playwright(?:\/test)?/i uses a greedy wildcard [\s\S]* which can match across the entire combined output. If an unrelated module fails to load (triggering ERR_MODULE_NOT_FOUND) and the word playwright appears anywhere else in the logs (such as in the command text or stdout), this will trigger a false positive. Limiting the search window (e.g., to 300 characters) will make the detection much more reliable.
Suggested change:
- /ERR_MODULE_NOT_FOUND[\s\S]*@?playwright(?:\/test)?/i.test(combined)
+ /ERR_MODULE_NOT_FOUND[\s\S]{0,300}@?playwright(?:\/test)?/i.test(combined)
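To make the false positive concrete, the sketch below compares the unbounded and bounded patterns on two fabricated log strings. The log contents (including the `left-pad` package name and the file path) are invented for illustration and are not taken from the pull request.

```javascript
const unbounded = /ERR_MODULE_NOT_FOUND[\s\S]*@?playwright(?:\/test)?/i;
const bounded = /ERR_MODULE_NOT_FOUND[\s\S]{0,300}@?playwright(?:\/test)?/i;

// Fabricated output: an unrelated module fails to load, and the word
// "playwright" only appears more than 300 characters later.
const unrelatedFailure =
  "Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'left-pad'\n" +
  "x".repeat(500) +
  "\nnpx playwright test";

// Fabricated output: the missing module really is playwright.
const genuineFailure =
  "Error [ERR_MODULE_NOT_FOUND]: Cannot find package 'playwright' imported from /tmp/run.mjs";

const comparison = {
  unboundedOnUnrelated: unbounded.test(unrelatedFailure), // true (false positive)
  boundedOnUnrelated: bounded.test(unrelatedFailure), // false
  boundedOnGenuine: bounded.test(genuineFailure), // true
};
console.log(comparison);
```

The 300-character window is a heuristic: a mention of playwright that happens to fall within 300 characters after an unrelated `ERR_MODULE_NOT_FOUND` would still match, so the bound reduces false positives rather than eliminating them.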
command: options.command,
stderr: options.stderr,
stdout: options.stdout,
exitCode: options["exit-code"],
Command line arguments parsed from the CLI are always strings. Passing options["exit-code"] directly to createMistakeDetection results in a string type for exitCode (e.g., "1"), whereas other parts of the system (such as gate evidence) expect exitCode to be a number or null. Parsing it as an integer ensures type consistency across the JSON contracts.
exitCode: options["exit-code"] !== undefined ? parseInt(options["exit-code"], 10) : undefined,
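One caveat with the one-line suggestion: `parseInt` returns `NaN` for non-numeric input such as `"abc"` and accepts trailing garbage such as `"1abc"`. A stricter variant might look like the sketch below, where `parseExitCode` is a hypothetical helper that is not part of the pull request, and returning `undefined` for invalid input is an assumption about the desired behaviour.

```javascript
// Hypothetical helper: converts a CLI option value to an integer exit code,
// or undefined when the option is absent or not a plain integer.
function parseExitCode(raw) {
  if (typeof raw === "number") {
    return Number.isInteger(raw) ? raw : undefined;
  }
  if (typeof raw !== "string") {
    return undefined;
  }
  const text = raw.trim();
  // Accept only an optional minus sign followed by digits.
  if (!/^-?\d+$/.test(text)) {
    return undefined;
  }
  return Number.parseInt(text, 10);
}

console.log(parseExitCode("1")); // 1
console.log(parseExitCode("1abc")); // undefined
console.log(parseExitCode(undefined)); // undefined
```

With such a helper, the call site would read `exitCode: parseExitCode(options["exit-code"])`.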
Summary
Verification