@vries (Contributor) commented Nov 20, 2025

This series consists of four patches.

Patches 1, 2 and 4 each fix a problem with --ignore-multiline-regex (described in the commit messages), and add a corresponding unit test.

Patch 3 is a refactoring patch, making patch 4 smaller.

Patch 2 fixes this issue.

Closes #3642

Consider the following text:
```
$ cat -n test.txt
     1  codespell:ignore-begin
     2  Thsi 1
     3  codespell:ignore-end
     4  thsi 2
```

When checking the file, as expected line 2 is ignored:
```
$ codespell \
    --ignore-multiline-regex 'codespell:ignore-begin.*codespell:ignore-end' \
    test.txt
test.txt:4: thsi ==> this
$
```

However, if we do the same using stdin, line 2 is not ignored:
```
$ cat test.txt \
  | codespell \
      --ignore-multiline-regex \
      'codespell:ignore-begin.*codespell:ignore-end' \
      -
2: Thsi 1
        Thsi ==> This
4: thsi 2
        thsi ==> this
```

Fix this in the filename == "-" handling in parse_file, by using
FileOpener.get_lines instead of io.IOBase.readlines.
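The difference between the two read paths can be sketched as follows. This is a minimal illustration, not codespell's actual code: the two helper functions are hypothetical stand-ins, and compiling the pattern with `re.DOTALL` is an assumption made so that `.` can span newlines.

```python
import io
import re

# Assumption: the pattern is compiled with re.DOTALL so "." matches newlines.
IGNORE = re.compile(r"codespell:ignore-begin.*codespell:ignore-end", re.DOTALL)

TEXT = "codespell:ignore-begin\nThsi 1\ncodespell:ignore-end\nthsi 2\n"


def lines_without_ignore(stream):
    """Hypothetical stdin path: plain readlines, the regex is never applied."""
    return stream.readlines()


def lines_with_ignore(stream):
    """Hypothetical file path: read everything, blank out matches, then split."""
    text = stream.read()
    # Emit one newline per ignored newline so later line numbers stay stable.
    blanked = IGNORE.sub(lambda m: "\n" * m.group().count("\n"), text)
    return blanked.splitlines(keepends=True)


naive = lines_without_ignore(io.StringIO(TEXT))
filtered = lines_with_ignore(io.StringIO(TEXT))
```

In `naive`, line 2 still contains `Thsi 1` and would be reported. In `filtered`, lines 1 to 3 are empty and only line 4 is left to check.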

Consider the following file:
```
$ cat test.txt
Thsi line contains a typo
While this line is correct
$
```

As expected, using --write-changes fixes the typo in the file:
```
$ codespell --write-changes test.txt
FIXED: test.txt
$ cat test.txt
This line contains a typo
While this line is correct
$
```

However, if we use --ignore-multiline-regex as well, we get instead:
```
$ codespell \
    --ignore-multiline-regex 'codespell:ignore-begin.*codespell:ignore-end' \
    -w \
    test.txt
FIXED: test.txt
$ cat test.txt
This line contains a typoWhile this line is correct$
```

Fix this in the self.ignore_multiline_regex != None case in
FileOpener.get_lines, by using str.splitlines instead of str.split.

Fixes:
- codespell-project#3642
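The effect of the two string methods can be shown in isolation. This sketch assumes the original call was `split("\n")` and that the fix passes `keepends=True`; the commit message only names the methods, so both details are assumptions.

```python
text = "This line contains a typo\nWhile this line is correct\n"

# str.split("\n") drops the terminators and leaves a trailing empty string.
split_lines = text.split("\n")

# str.splitlines(keepends=True) keeps each line's terminator.
kept_lines = text.splitlines(keepends=True)

# Writing lines back is modeled here as plain concatenation,
# which is also what file.writelines does.
roundtrip_split = "".join(split_lines)
roundtrip_kept = "".join(kept_lines)
```

`roundtrip_split` has lost every newline, which matches the corrupted file shown above, while `roundtrip_kept` is identical to the input.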

No functional changes.

This makes parse_file a bit more readable, and makes the following
patch smaller.

Consider the following file:
```
$ cat test.txt
codespell:ignore-begin
Thsi line contains a typo
codespell:ignore-end
Thsi line also contains a typo
While this line is correct
```

If we use codespell to fix the second typo:
```
$ codespell \
    --ignore-multiline-regex 'codespell:ignore-begin.*codespell:ignore-end' \
    --write-changes \
    test.txt
FIXED: test.txt
```
indeed that typo is fixed, but the text matching the multiline regexp is gone:
```
$ cat test.txt



This line also contains a typo
While this line is correct
$
```

The problem is that FileOpener.get_lines (returning a list of strings)
implements --ignore-multiline-regex by blanking out the text matching the
regexp.
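
A rough reproduction of the data loss, under stated assumptions: `blank_out` is a hypothetical helper, the pattern is assumed to be compiled with `re.DOTALL`, and the `str.replace` call is only a stand-in for the real dictionary-based fix.

```python
import re

# Assumption: the pattern is compiled with re.DOTALL so "." matches newlines.
IGNORE = re.compile(r"codespell:ignore-begin.*codespell:ignore-end", re.DOTALL)

original = (
    "codespell:ignore-begin\n"
    "Thsi line contains a typo\n"
    "codespell:ignore-end\n"
    "Thsi line also contains a typo\n"
    "While this line is correct\n"
)


def blank_out(text):
    """Hypothetical helper: replace each match with the newlines it contained."""
    return IGNORE.sub(lambda m: "\n" * m.group().count("\n"), text)


lines = blank_out(original).splitlines(keepends=True)
# Stand-in for the spelling fix; a real checker consults a dictionary.
fixed = [line.replace("Thsi", "This") for line in lines]
written_back = "".join(fixed)
```

Because the ignored text was already replaced by empty lines before the fix ran, `written_back` starts with three empty lines and the ignored block cannot be restored.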

Fix this by changing FileOpener.get_lines to return a list of fragments, each
modeled as a tuple (ignored: bool, line_number: int, lines: list[str]), and
handling this new format elsewhere.
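
The fragment model can be sketched as below. Only the tuple shape comes from the commit message; the function names, the handling of matches that end mid-line, and the `str.replace` stand-in for the spelling fix are assumptions made for illustration.

```python
import re

# Tuple shape from the commit message: (ignored, line_number, lines).
Fragment = tuple[bool, int, list[str]]

# Assumption: the pattern is compiled with re.DOTALL so "." matches newlines.
IGNORE = re.compile(r"codespell:ignore-begin.*codespell:ignore-end", re.DOTALL)


def split_into_fragments(text: str) -> list[Fragment]:
    """Hypothetical helper: cut text into ignored and checked fragments."""
    fragments: list[Fragment] = []
    pos = 0
    line_number = 1
    for m in IGNORE.finditer(text):
        for ignored, chunk in ((False, text[pos:m.start()]), (True, m.group())):
            if chunk:
                lines = chunk.splitlines(keepends=True)
                fragments.append((ignored, line_number, lines))
                line_number += chunk.count("\n")
        pos = m.end()
    tail = text[pos:]
    if tail:
        fragments.append((False, line_number, tail.splitlines(keepends=True)))
    return fragments


def fix_and_join(fragments: list[Fragment]) -> str:
    """Fix only non-ignored fragments, then write every fragment back."""
    out: list[str] = []
    for ignored, _line_number, lines in fragments:
        if not ignored:
            # Stand-in for the real dictionary-based correction.
            lines = [line.replace("Thsi", "This") for line in lines]
        out.extend(lines)
    return "".join(out)


original = (
    "codespell:ignore-begin\n"
    "Thsi line contains a typo\n"
    "codespell:ignore-end\n"
    "Thsi line also contains a typo\n"
    "While this line is correct\n"
)
fragments = split_into_fragments(original)
result = fix_and_join(fragments)
```

Since ignored text is carried along rather than blanked, the ignored block survives the write-back verbatim, and each fragment's starting line number still lets a checker report correct positions.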

@DimitriPapadopoulos (Collaborator) commented:

Looks good to me. I think the coverage issue predates these patches. @larsoner Could you have a look too?

@larsoner larsoner merged commit 200c31b into codespell-project:main Nov 24, 2025
16 of 17 checks passed
@larsoner (Member) commented:

Thanks @vries !

@vries vries deleted the ignore-multiline-regex branch November 25, 2025 14:03
Linked issue: bug: With ignore-multiline-regex set, --write doesn't work properly