
Add support for comments in --ignore-words file #2068

Open
DimitriPapadopoulos wants to merge 1 commit into codespell-project:main from DimitriPapadopoulos:comments_in_ignore_words


@DimitriPapadopoulos
Collaborator

Closes #2063.

Collaborator

@peternewman left a comment


So what happens if my typo has a hash in it? Requiring the hash to be the first character, or to be preceded by a space, or something similar might work, but we really need to update the dictionary tests to catch this too (so we can't have dictionary entries that can't be ignored).

@DimitriPapadopoulos
Collaborator Author

Do we intend to catch typos with a hash? Whatever the answer, requiring a space before # when it's not the first character looks like a good idea; it's more readable. I'll fix that.

@DimitriPapadopoulos
Collaborator Author

I've modified the code to match lines starting with "# " (a hash followed by a space, which avoids matching typos that start with #) and the part of a line starting with " #" (a space followed by a hash). So these would be comments:

# comment
errror #comment

and these would not be comments:

#notcomment
errror# notcomment
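Since codespell is written in Python, the rule above can be sketched in a few lines of Python. This is a minimal, hypothetical sketch (the function name `strip_comment_v1` is illustrative, not codespell's actual code), assuming a comment is either a whole line starting with "# " or the part of a line from the first " #" onward, which is how the four examples behave:

```python
def strip_comment_v1(line: str) -> str:
    """Return the non-comment part of one --ignore-words line.

    Hypothetical sketch of the intermediate rule:
    - a line starting with "# " (hash, then space) is entirely a comment;
    - otherwise, everything from the first " #" (space, then hash) is a comment.
    """
    if line.startswith("# "):
        return ""
    index = line.find(" #")
    if index != -1:
        line = line[:index]  # cut the inline comment
    return line.strip()


for sample in ["# comment", "errror #comment", "#notcomment", "errror# notcomment"]:
    print(f"{sample!r} -> {strip_comment_v1(sample)!r}")
```

With this sketch, "# comment" and "errror #comment" reduce to "" and "errror", while "#notcomment" and "errror# notcomment" come back unchanged, so a word can still contain a hash as long as no space precedes it.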

My concern is that this is getting quite confusing for end-users, especially those used to Python comments. Instead I would recommend not supporting typos containing hashes. If end-users do ask for this functionality in the future, we could introduce escaping with '\' instead, like this:

\#rror->error
err\#r->error

which could be extended to support commas too, as in:

yesido->yes\, I do
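To make the escaping proposal concrete, here is a hypothetical Python sketch of how a "typo->fix1, fix2," line with backslash escapes might be parsed. The names `parse_entry` and `unescape` are illustrative assumptions, not existing codespell functions, and the sketch assumes a comment starts at the first unescaped '#':

```python
import re
from typing import List, Optional, Tuple

# A '#' or ',' is special only when no backslash precedes it.
# Limitation of this simple sketch: an escaped backslash right before
# '#' or ',' (as in "\\#") is not handled.
_UNESCAPED_HASH = re.compile(r"(?<!\\)#")
_UNESCAPED_COMMA = re.compile(r"(?<!\\),")
_ESCAPE = re.compile(r"\\(.)")


def unescape(text: str) -> str:
    """Replace every backslash-escaped character with the character itself."""
    return _ESCAPE.sub(r"\1", text)


def parse_entry(line: str) -> Optional[Tuple[str, List[str]]]:
    """Parse a hypothetical escaped 'typo->fix1, fix2,' line.

    Returns None for blank and comment-only lines.
    """
    match = _UNESCAPED_HASH.search(line)
    if match:  # drop the comment, starting at the first unescaped '#'
        line = line[: match.start()]
    line = line.strip()
    if not line:
        return None
    typo, _, fixes = line.partition("->")
    corrections = [unescape(fix.strip()) for fix in _UNESCAPED_COMMA.split(fixes)]
    # Drop empty items, e.g. the one produced by a trailing comma.
    return unescape(typo.strip()), [fix for fix in corrections if fix]
```

For example, `parse_entry(r"\#rror->error")` returns `('#rror', ['error'])` and `parse_entry(r"yesido->yes\, I do")` returns `('yesido', ['yes, I do'])`. Splitting only on unescaped separators keeps the existing unescaped format working unchanged.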

By the way, is codespell supposed to support typos with spaces? For example, I know codespell supports warmup->warm up, warm-up,, but does it support warm up->warmup, warm-up,? I cannot find such an example:

$ cd codespell_lib/data
$ 
$ awk -F '->' '{ print $1; }' *.txt | grep ' '
$ 

@peternewman
Collaborator

Do we intend to catch typos with a hash?

I don't know specifically, but if we stop them from being ignored, we ought to stop them from getting into the dictionary for now (so we can rethink the issue if any come up).

and these would not be comments:

#notcomment
errror# notcomment

I think the first of these should be a comment too, i.e. the whole line is commented out. I think I'd agree with you on the latter one.

My concern is that this is getting quite confusing for end-users,

Agreed. It should maybe throw a warning if it matches a non-comment.
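The variant suggested in this review could look roughly like the following hypothetical Python sketch (`strip_comment_v2` is an illustrative name, not codespell code). It assumes any line whose first character is '#' is a whole-line comment, " #" starts an inline comment, and any other '#' is kept as part of the word but triggers a warning:

```python
import warnings


def strip_comment_v2(line: str) -> str:
    """Return the non-comment part of one --ignore-words line.

    Hypothetical sketch of the rule suggested in review:
    - a line whose first character is '#' is a whole-line comment;
    - " #" (space, then hash) starts an inline comment;
    - any other '#' is kept as part of the word, with a warning.
    """
    if line.startswith("#"):
        return ""
    index = line.find(" #")
    if index != -1:
        line = line[:index]  # cut the inline comment
    word = line.strip()
    if "#" in word:
        warnings.warn(f"'#' kept as part of the word {word!r}; is it a comment?")
    return word
```

Here "#notcomment" becomes a comment, "errror #comment" yields "errror", and "errror# notcomment" is returned unchanged but warns, which flags the confusing case instead of silently accepting it.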

especially those used to Python comments.

Yeah, they would be worth considering.

Instead I would recommend not supporting typos containing hashes.

That was mostly the direction I was suggesting you go in, but it does mean the dictionary needs a test so that none sneak in.

which could be extended to support commas too, as in:

yesido->yes\, I do

Makes sense.

By the way, is codespell supposed to support typos with spaces? For example, I know codespell supports warmup->warm up, warm-up,, but does it support warm up->warmup, warm-up,?

IIRC it doesn't currently (probably because of the correction thing), but a few people have asked for it.

@DimitriPapadopoulos
Collaborator Author

I've kept the simplest comment definition: comments start with a hash (#), like Python or shell comments, and we don't require whitespace before or after the hash. So all of the following are comments:

#comment
# comment
foobar#comment
foobar #comment
foobar# comment

I would recommend whitespace before and after the hash, as PEP 8 does for inline comments:

foobar  # comment

But that's not enforced.
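A minimal Python sketch of this final rule, assuming the --ignore-words file holds one word per line; `parse_ignore_words` is a hypothetical name for illustration, not necessarily how the PR implements it. Everything from the first '#' to the end of the line is discarded, with or without surrounding whitespace:

```python
from typing import Set


def parse_ignore_words(text: str) -> Set[str]:
    """Collect the words of an --ignore-words file, one word per line.

    Hypothetical sketch: everything from the first '#' to the end of the
    line is a comment, so a word itself can never contain '#'.
    """
    words = set()
    for line in text.splitlines():
        word = line.split("#", 1)[0].strip()  # drop the comment, then blanks
        if word:  # skip empty and comment-only lines
            words.add(word)
    return words


sample = """\
# Names of people on the team that look like typos
errror  # a developer's surname
#comment
foobar#comment
"""
print(sorted(parse_ignore_words(sample)))  # ['errror', 'foobar']
```

Splitting at the first '#' is what makes this rule simple to explain, and it is also why typos containing '#' can no longer be ignored.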

In addition to whitespace and commas, hashes can no longer be supported in typos. I've updated the dictionary tests accordingly.
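A dictionary check of this kind might look like the hypothetical Python sketch below; `bad_typos` and `check_dictionaries` are illustrative names, not codespell's actual test functions. It assumes each dictionary line has the form "typo->correction" and that the files live in codespell_lib/data, as in the awk command earlier in this thread:

```python
import re
from pathlib import Path
from typing import Iterable, List

# Characters a typo cannot contain: '#' now starts a comment in the
# --ignore-words file, and whitespace and ',' were already unsupported.
_FORBIDDEN = re.compile(r"[#,\s]")


def bad_typos(lines: Iterable[str]) -> List[str]:
    """Return the typos (the part before '->') containing a forbidden character."""
    bad = []
    for line in lines:
        typo = line.rstrip("\r\n").split("->", 1)[0]
        if _FORBIDDEN.search(typo):
            bad.append(typo)
    return bad


def check_dictionaries(data_dir: str = "codespell_lib/data") -> None:
    """Hypothetical test: fail if any dictionary file has such a typo."""
    for path in sorted(Path(data_dir).glob("*.txt")):
        with path.open(encoding="utf-8") as f:
            bad = bad_typos(f)
        assert not bad, f"{path.name}: {bad}"
```

For instance, `bad_typos(["abandonned->abandoned\n", "err#r->error\n"])` returns `['err#r']`, so an entry that could never be ignored is caught before it reaches the dictionary.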

@DimitriPapadopoulos force-pushed the comments_in_ignore_words branch 2 times, most recently from 43bd356 to 5bf7863 on September 27, 2021 19:50
@glebsts commented Mar 24, 2022

Hello, what's up? I recently discovered codespell and like it, but I need to add comments to the ignore file to note why we accept one misspelling or another (e.g. # because we have a developer whose surname looks like a misspelling).
I can take over this PR if somebody explains to me what is needed.
@peternewman

Like Python comments, comments start with a hash ('#').

This means we cannot support hashes in typos, so check the dictionaries
for hashes.


Development

Successfully merging this pull request may close these issues.

Add support for comments in --ignore-words file

3 participants