Files with invalid UTF-8 bytes (and no encoding magic comment) were silently skipped with empty diagnostics. RuboCop reports these as a fatal Lint/Syntax "Invalid byte sequence in utf-8." offense at line 1. Added `emit_invalid_utf8_diagnostic()` in `lint_file()` to emit the diagnostic instead of returning empty. This resolves 21 of 27 FN. The remaining 6 FN are semantic parse errors (e.g., "Invalid retry without rescue") and config/context mismatches that cannot be fixed without causing FP regressions due to Prism reporting these more broadly than RuboCop's Parser gem. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
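The check described above can be sketched roughly as follows. This is a minimal illustration, not the actual body of `emit_invalid_utf8_diagnostic()`: the `Diagnostic` struct, the helper names, and the magic-comment heuristic (a `#` comment containing `coding:` or `coding=` on one of the first two lines) are all assumptions made for the example.

```rust
/// Hypothetical diagnostic type; the real struct in nitrocop likely differs.
#[derive(Debug, PartialEq)]
struct Diagnostic {
    cop: &'static str,
    line: usize,
    message: &'static str,
}

/// Assumed heuristic: an encoding magic comment is a `#` comment on one of
/// the first two lines that contains `coding:` or `coding=`.
fn has_encoding_magic_comment(bytes: &[u8]) -> bool {
    bytes.split(|&b| b == b'\n').take(2).any(|line| {
        // Lossy decoding is fine here: we only look for ASCII markers.
        let text = String::from_utf8_lossy(line).to_ascii_lowercase();
        let text = text.trim_start();
        text.starts_with('#') && (text.contains("coding:") || text.contains("coding="))
    })
}

/// Returns a fatal Lint/Syntax diagnostic at line 1 when the file is not
/// valid UTF-8 and declares no other encoding; otherwise returns None.
fn invalid_utf8_diagnostic(bytes: &[u8]) -> Option<Diagnostic> {
    if has_encoding_magic_comment(bytes) {
        return None;
    }
    match std::str::from_utf8(bytes) {
        Ok(_) => None,
        Err(_) => Some(Diagnostic {
            cop: "Lint/Syntax",
            line: 1,
            message: "Invalid byte sequence in utf-8.",
        }),
    }
}
```

The point of the sketch is the ordering: the magic-comment check runs first, so a file that declares another encoding is not reported even if its bytes are not valid UTF-8.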
Contributor
|
✅ Cop-check results (8 shards)
All shards passed — ready to merge. |
6
approved these changes
Mar 30, 2026
Owner
6
left a comment
Reviewed: correctly detects invalid UTF-8 files without encoding magic comments. Proper config/filter handling. Resolves 21 of 27 FNs.
Automated fix for `Lint/Syntax` corpus conformance.
Details
`Lint/Syntax`
Result
Changes
Agent conversation excerpt
I'll start by reading the key source files and documentation.
Let me look at the linter's syntax diagnostic emission logic and understand how parser errors are handled.
Let me understand how the source file parsing handles invalid UTF-8 bytes.
Let me look at how `lint_source_inner` handles files - particularly the parsing and how invalid UTF-8 is handled.
Let me read the full `lint_source_once` function to see the complete flow.
Let me run the existing tests to see the current state, and also check what Prism does with invalid UTF-8 and "retry" outside rescue.
Let me look at how the corpus check compares offenses - is it message-exact or just location-based?
Let me test what Prism reports for files with invalid UTF-8 bytes and for "retry" outside rescue.
Let me check what Prism actually reports for invalid UTF-8 bytes.
Let me write a proper Rust test to check Prism's behavior with invalid UTF-8.
fn main() {
// We'll just use cargo test with a temporary te`
Let me look at how the linter handles non-UTF-8 files. There may be early file reading logic that skips them.
Now let me read the key section around line 426 where invalid UTF-8 files are handled.
Let me check what file scope the workflow enforces for cop fixes.
Let me look at the Cop trait to see what methods are available.
Task prompt (3165 tokens)
Fix Lint/Syntax — 0 FP, 27 FN
Instructions
You are fixing ONE cop in nitrocop, a Rust Ruby linter that uses Prism for parsing.
Current state: 6,600 matches, 0 false positives, 27 false negatives.
Focus on: FN (RuboCop flags code nitrocop misses).
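For orientation, the three counts above can be read as set operations over the offenses each tool reports. The sketch below is only an illustration of that reading: the `(file, line)` key and the `tally` helper are hypothetical, and whether the real corpus check also compares messages or columns is not stated here.

```rust
use std::collections::BTreeSet;

/// Hypothetical offense key: (file path, line number).
type Key = (String, usize);

struct Tally {
    matches: usize,
    false_positives: usize,
    false_negatives: usize,
}

/// Compares RuboCop's offenses (the reference) against nitrocop's.
fn tally(rubocop: &BTreeSet<Key>, nitrocop: &BTreeSet<Key>) -> Tally {
    Tally {
        // Reported by both tools.
        matches: rubocop.intersection(nitrocop).count(),
        // Reported by nitrocop only: RuboCop accepts the code.
        false_positives: nitrocop.difference(rubocop).count(),
        // Reported by RuboCop only: nitrocop misses the offense.
        false_negatives: rubocop.difference(nitrocop).count(),
    }
}
```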
⚠ 6,600 existing matches must not regress. Validate with `check_cop.py` before committing.
Workflow
behavior on BOTH the specific FP case AND the general pattern:
`tests/fixtures/cops/lint/syntax/offense.rb` with `^` annotation
`tests/fixtures/cops/lint/syntax/no_offense.rb`
`cargo test --lib -- cop::lint::syntax`
`src/cop/lint/syntax.rs`
`cargo test --lib -- cop::lint::syntax`
`///` doc comment on the cop struct documenting what you found and fixed
Fixture Format
Mark offenses with `^` markers on the line AFTER the offending source line.
The `^` characters must align with the offending columns. The message format is `Lint/Syntax: <message text>`.
See the Current Fixture sections below for real examples from this cop.
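As a rough illustration of the fixture format described above, the following sketch reads a fixture and collects the expected offenses. It is not nitrocop's actual fixture loader: the `Expected` struct and `parse_fixture` function are hypothetical, and the sketch assumes that any line whose first non-blank character is `^` is an annotation rather than Ruby source.

```rust
/// Hypothetical representation of one expected offense.
#[derive(Debug, PartialEq)]
struct Expected {
    line: usize,   // 1-based line of the offending source
    column: usize, // 0-based column where the `^` run starts
    length: usize, // number of `^` characters
    cop: String,
    message: String,
}

fn parse_fixture(fixture: &str) -> Vec<Expected> {
    let mut expected = Vec::new();
    let mut source_line = 0; // number of real source lines seen so far
    for raw in fixture.lines() {
        let trimmed = raw.trim_start();
        if !trimmed.starts_with('^') {
            source_line += 1; // ordinary source line
            continue;
        }
        // Annotation line: it refers to the source line just above it.
        let column = raw.len() - trimmed.len();
        let length = trimmed.chars().take_while(|&c| c == '^').count();
        let rest = trimmed[length..].trim();
        // Annotations without a `Cop/Name: message` part are ignored.
        if let Some((cop, message)) = rest.split_once(": ") {
            expected.push(Expected {
                line: source_line,
                column,
                length,
                cop: cop.to_string(),
                message: message.to_string(),
            });
        }
    }
    expected
}
```

Annotation lines are not counted as source lines, so the recorded line numbers refer to the Ruby code with the markers stripped out.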
Mixed issues: some code bugs, some config issues
Pre-diagnostic shows SOME patterns are correctly detected in isolation (config issues)
and SOME are genuinely missed (code bugs). See the per-example diagnosis below.
If your test passes immediately
If you add a test case and it passes without code changes, the corpus mismatch is
caused by config/context differences, not a detection bug.
Do NOT loop trying to make the test fail. Instead:
`src/config/` or the cop's config handling, not detection logic
a `///` comment on the cop struct and commit
CRITICAL: Avoid regressions in the opposite direction
When fixing FPs, your change MUST NOT suppress legitimate detections. When fixing FNs,
your change MUST NOT flag code that RuboCop accepts. A fix that eliminates a few issues
in one direction but introduces hundreds in the other is a catastrophic regression.
Before exempting a category of patterns, verify with RuboCop that the general case
is still an offense:
If RuboCop flags the general pattern but not your specific case, the difference is in
a narrow context (e.g., enclosing structure, receiver type, argument count) — your fix
must target that specific context, not the broad category.
Rule of thumb: if your fix adds an early `return` or `continue` that skips a whole
node type, operator class, or naming pattern, it's probably too broad. Prefer adding a
condition that matches the SPECIFIC differentiating context.
Rules
`src/cop/lint/syntax.rs` and `tests/fixtures/cops/lint/syntax/`
`cargo test --lib -- cop::lint::syntax` to verify your fix (do NOT run the full test suite)
`python3 scripts/check_cop.py Lint/Syntax --rerun --clone --sample 15` before committing to catch regressions
`git stash`
Current Fixture: offense.rb
`tests/fixtures/cops/lint/syntax/offense.rb`
Current Fixture: no_offense.rb
`tests/fixtures/cops/lint/syntax/no_offense.rb`
Key Source Files
`src/cop/lint/syntax.rs`
`vendor/rubocop/lib/rubocop/cop/lint/syntax.rb`
`vendor/rubocop/spec/rubocop/cop/lint/syntax_spec.rb`
Read these files before making changes.
Start Here
Use the existing corpus data to focus on the most concentrated regressions first.
Helpful local commands:
`python3 scripts/investigate_cop.py Lint/Syntax --repos-only`
`python3 scripts/investigate_cop.py Lint/Syntax --context`
`python3 scripts/verify_cop_locations.py Lint/Syntax`
Top FN repos:
`rubycocoa__rubycocoa__773ca0c` (14 FN) - example `template/Xcode3.x/ProjectBuilder/Application/Cocoa-Ruby Application/rb_main.rb:1`
`castwide__solargraph__676da4c` (3 FN) - example `spec/fixtures/invalid_byte.rb:1`
`ruby-formatter__rufo__a90e654` (2 FN) - example `spec/lib/rufo/formatter_source_specs/retry.rb.spec:3`
Representative FN examples:
`castwide__solargraph__676da4c: spec/fixtures/invalid_byte.rb:1` - Invalid byte sequence in utf-8.
`castwide__solargraph__676da4c: spec/fixtures/invalid_node_comment.rb:1` - Invalid byte sequence in utf-8.
`castwide__solargraph__676da4c: spec/fixtures/invalid_utf8.rb:1` - Invalid byte sequence in utf-8.
Pre-diagnostic Results
Diagnosis Summary
Each example was tested by running nitrocop on the extracted source in isolation
with `--force-default-config` to determine if the issue is a code bug or config issue.
Note: source context is truncated and may not parse perfectly. If a diagnosis
seems wrong (e.g., your test passes immediately for a 'CODE BUG'), treat it as
a config/context issue instead.
FN #1: `castwide__solargraph__676da4c: spec/fixtures/invalid_byte.rb:1`
NOT DETECTED - CODE BUG
The cop fails to detect this pattern. Fix the detection logic.
Prism AST at offense line: `ProgramNode > StatementsNode`
Message: `Invalid byte sequence in utf-8.`
Ready-made test snippet (add to offense.rb, adjust `^` count):
Full file context (30 lines before offense):
FN #2: `castwide__solargraph__676da4c: spec/fixtures/invalid_node_comment.rb:1`
NOT DETECTED - CODE BUG
The cop fails to detect this pattern. Fix the detection logic.
Prism AST at offense line: `ProgramNode > StatementsNode`
Message: `Invalid byte sequence in utf-8.`
Ready-made test snippet (add to offense.rb, adjust `^` count):
Full file context (30 lines before offense):
FN #3: `castwide__solargraph__676da4c: spec/fixtures/invalid_utf8.rb:1`
DETECTED in isolation - CONFIG/CONTEXT issue
The cop correctly detects this pattern with default config.
The corpus FN is caused by the target repo's configuration
(Include/Exclude patterns, cop disabled, file outside scope,
or `rubocop:disable` comment). Investigate config resolution.
Message: `Invalid byte sequence in utf-8.`
Ready-made test snippet (add to offense.rb, adjust `^` count):
Full source context:
FN #4: `expertiza__expertiza__594a6fb: db/migrate/112_create_response_maps.rb:133`
DETECTED in isolation - CONFIG/CONTEXT issue
The cop correctly detects this pattern with default config.
The corpus FN is caused by the target repo's configuration
(Include/Exclude patterns, cop disabled, file outside scope,
or `rubocop:disable` comment). Investigate config resolution.
Message: `Invalid return in class/module body (Using Ruby 4.0 parser; configure using TargetRubyVersion parameter, under AllCops)`
Ready-made test snippet (add to offense.rb, adjust `^` count):
Full source context:
FN #5: `inferiorhumanorgans__meishi__f731fd8: app/controllers/carddav/address_book_controller.rb:1`
DETECTED in isolation - CONFIG/CONTEXT issue
The cop correctly detects this pattern with default config.
The corpus FN is caused by the target repo's configuration
(Include/Exclude patterns, cop disabled, file outside scope,
or `rubocop:disable` comment). Investigate config resolution.
Message: `Invalid byte sequence in utf-8.`
Ready-made test snippet (add to offense.rb, adjust `^` count):
Full source context:
FN #6: `mbj__unparser__15c57a1: test/corpus/literal/before/34.rb:1`
DETECTED in isolation - CONFIG/CONTEXT issue
The cop correctly detects this pattern with default config.
The corpus FN is caused by the target repo's configuration
(Include/Exclude patterns, cop disabled, file outside scope,
or `rubocop:disable` comment). Investigate config resolution.
Message: `Invalid retry without rescue (Using Ruby 4.0 parser; configure using TargetRubyVersion parameter, under AllCops)`
Ready-made test snippet (add to offense.rb, adjust `^` count):
Full source context:
FN #7: `ruby-formatter__rufo__a90e654: spec/lib/rufo/formatter_source_specs/retry.rb.spec:3`
NOT DETECTED - CODE BUG
The cop fails to detect this pattern. Fix the detection logic.
Message: `Invalid retry without rescue (Using Ruby 4.0 parser; configure using TargetRubyVersion parameter, under AllCops)`
Ready-made test snippet (add to offense.rb, adjust `^` count):
Full file context (30 lines before offense):
FN #8: `ruby-formatter__rufo__a90e654: spec/lib/rufo/formatter_source_specs/retry.rb.spec:6`
NOT DETECTED - CODE BUG
The cop fails to detect this pattern. Fix the detection logic.
Message: `Invalid retry without rescue (Using Ruby 4.0 parser; configure using TargetRubyVersion parameter, under AllCops)`
Ready-made test snippet (add to offense.rb, adjust `^` count):
Full file context (30 lines before offense):
Omitted 7 additional diagnosed FN example(s) for brevity.