Can we skip files which end in ".pem/.crt" #2135

Open
clickthisnick opened this issue Nov 3, 2021 · 9 comments

Comments

@clickthisnick
Contributor

What are people's thoughts on skipping files that end with ".pem" and ".crt", so that certificates and similar files don't get falsely flagged by accident?

@vikivivi
Contributor

vikivivi commented Nov 4, 2021

You might want to see "skip" in https://github.com/codespell-project/codespell/blob/master/README.rst

@clickthisnick
Contributor Author

Yeah, that's what we're doing. I just didn't know if the community thought it would be okay to skip them by default, without that being set explicitly.
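
For reference, the README describes --skip as taking a comma-separated list of files that also accepts globs, so the explicit opt-out discussed here is along the lines of codespell --skip "*.pem,*.crt". The Python sketch below only illustrates the idea of matching paths against such globs with fnmatch; parse_skip and should_skip are hypothetical helpers, not codespell's actual implementation.

```python
from fnmatch import fnmatch
from pathlib import PurePath


def parse_skip(skip: str) -> list[str]:
    """Split a comma-separated skip string such as "*.pem,*.crt" into glob patterns."""
    return [p.strip() for p in skip.split(",") if p.strip()]


def should_skip(path: str, patterns: list[str]) -> bool:
    """Hypothetical helper: skip if the full path or its base name matches any glob."""
    name = PurePath(path).name
    return any(fnmatch(path, p) or fnmatch(name, p) for p in patterns)


patterns = parse_skip("*.pem,*.crt")
print(should_skip("certs/test/dummy.pem", patterns))  # True
print(should_skip("src/main.py", patterns))  # False
```

Matching the base name as well as the full path means a pattern like *.crt catches certificates at any depth, whichever way the path was given.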

@peternewman
Collaborator

If I look at some random .pem and .crt files, some do have some plain English in them too, although mostly just the example ones. Is there some reason they shouldn't be scanned automatically?

Also, what is it tripping up on in them? Two-letter character combinations? Can we resolve it by just moving them to the code dictionary?

@clickthisnick
Contributor Author

Yeah, it's a bunch of two- and three-letter things like FLE -> FILE. We started enabling codespell automatically on a bunch of repos, and people have been fixing "typos" in their testing/dummy certs and then wondering why the certs are broken/invalid.
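
To see why such a "fix" breaks a cert: the body of a PEM file is base64, where every character encodes 6 bits, so inserting a single letter shifts or invalidates everything after it. A minimal Python illustration using the FLE+blah fragment mentioned later in this thread (a toy string, not a real certificate):

```python
import base64
import binascii

original = "FLE+blah"                        # 8 valid base64 characters -> 6 bytes
corrected = original.replace("FLE", "FILE")  # what applying the suggested fix produces

decoded = base64.b64decode(original, validate=True)
print(len(decoded))  # 6

try:
    base64.b64decode(corrected, validate=True)
except binascii.Error as exc:
    # 9 characters is not a valid base64 length, so decoding fails outright
    print("corrupted:", exc)
```

Even when a replacement happens to keep the length valid, the decoded bytes change, so any signature over them no longer verifies.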

I don't think moving it to the code dictionary would work, as fle is likely a genuine typo.

Looking at my specific example, the cert has a line FLE+blah and FLE is being flagged. It seems like + is treated as a delimiter, like a space, so FLE is considered a word, but I wonder if it should be?

@peternewman
Collaborator

> Yeah, it's a bunch of two- and three-letter things like FLE -> FILE. We started enabling codespell automatically on a bunch of repos, and people have been fixing "typos" in their testing/dummy certs and then wondering why the certs are broken/invalid.

Oh dear. I was going to suggest something clever for hex, then realised it's base64, so that's a non-starter.

> I don't think moving it to the code dictionary would work, as fle is likely a genuine typo.

Yeah, agreed. Again, if it were just hex we could do something clever, but with base64 it could be any typo.

> Looking at my specific example, the cert has a line FLE+blah and FLE is being flagged. It seems like + is treated as a delimiter, like a space, so FLE is considered a word, but I wonder if it should be?

I think you want it to be, so you catch typos in your variables when you're doing foo+bar=baz.
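
A quick way to see both sides of this: if words are extracted with a pattern like [\w\-'’]+ (an assumption here, modelled on codespell's default word regex; the exact pattern may differ between versions), then + and = are not word characters and split tokens just like a space does. That is what catches foo+bar=baz, and also what exposes FLE.

```python
import re

# Assumed word pattern: letters, digits, underscore, hyphen and apostrophes.
WORD_RE = re.compile(r"[\w\-'’]+")


def words(line: str) -> list[str]:
    """Return the tokens a regex-based spell checker would look up for this line."""
    return WORD_RE.findall(line)


print(words("FLE+blah"))     # ['FLE', 'blah']
print(words("foo+bar=baz"))  # ['foo', 'bar', 'baz']
```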

I'm sort of ambivalent either way on this personally; perhaps we should have a straight vote: 👍 or 👎 on @clickthisnick's first post in this topic, as to whether we should change the default skip (when nothing is set) to include these types of files.

If we do so, we should probably make sure it logs the files it's skipping by default, so we're not silently hiding some typos.
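
Logging what a default skip hides could look roughly like this minimal Python sketch; DEFAULT_SKIP, files_to_check and the log wording are all hypothetical and are not codespell's API or output.

```python
import logging
from fnmatch import fnmatch

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("skip-sketch")

# Hypothetical built-in default, as proposed in this thread (not codespell's).
DEFAULT_SKIP: tuple[str, ...] = ("*.pem", "*.crt")


def files_to_check(paths: list[str], skip: tuple[str, ...] = DEFAULT_SKIP) -> list[str]:
    """Drop skipped paths, logging each one so no typo is hidden silently."""
    kept = []
    for path in paths:
        # First pattern that matches this path, or None if nothing matches.
        matched = next((p for p in skip if fnmatch(path, p)), None)
        if matched is None:
            kept.append(path)
        else:
            log.info("skipping %s (matches %r)", path, matched)
    return kept


print(files_to_check(["README.md", "tests/dummy.pem", "ca.crt"]))  # ['README.md']
```

The two skipped files are reported as INFO lines on stderr, so a user who never set a skip can still see what was left out.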

@matkoniecz
Contributor

If skipping were done automatically, would there be any way to actually scan .pem/.crt files?

I see no way to override skip in the parameters (which could be useful, BTW, though the workaround of running codespell multiple times is also viable).

And codespell */**/*.crt would not scan a .crt file two folders deep.

@clickthisnick
Contributor Author

After reading the Jupyter notebook filter issue, I think having to maintain and include a bunch of custom file extensions in the core product would be annoying and time-consuming.

For my use case, we had a script add the codespell config to repos (via pre-commit), so we can definitely just ignore the specific extensions we have found to be problematic in our environment, rather than make this tool much more complicated.

@clickthisnick
Contributor Author

clickthisnick commented Nov 12, 2021

I'm okay with closing this issue and saying it's up to the user to use the tool as they see fit, rather than changing the tool to take a non-intuitive action for each specific case.

@peternewman
Collaborator

> If skipping were done automatically, would there be any way to actually scan .pem/.crt files?

Possibly not with how it's written currently, but we could set things up so the default skip argument skipped those two extensions (and maybe .git). If you then supplied any skip argument, it would replace the default, but you could add those extensions back manually there, alongside whatever else you wanted to skip.
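
The proposal where any explicit skip replaces the built-in one maps naturally onto an argparse default. A hypothetical Python sketch (DEFAULT_SKIP and skip_patterns are invented for illustration and do not describe codespell's current behaviour); under this design an empty skip value would be the way to scan .pem/.crt files, which answers matkoniecz's question above.

```python
import argparse

# Hypothetical default proposed in this thread; not codespell's actual default.
DEFAULT_SKIP = "*.pem,*.crt,.git"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="skip-sketch")
    parser.add_argument(
        "--skip",
        default=DEFAULT_SKIP,
        help="comma-separated globs to skip; replaces the default entirely",
    )
    return parser


def skip_patterns(argv: list[str]) -> list[str]:
    """Parse argv and return the effective list of skip globs."""
    args = build_parser().parse_args(argv)
    return [p.strip() for p in args.skip.split(",") if p.strip()]


print(skip_patterns([]))                         # ['*.pem', '*.crt', '.git']
print(skip_patterns(["--skip", "*.svg"]))        # ['*.svg'], so certs are scanned again
print(skip_patterns(["--skip", "*.svg,*.pem"]))  # ['*.svg', '*.pem']
print(skip_patterns(["--skip", ""]))             # [], so everything is scanned
```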

> After reading the Jupyter notebook filter issue, I think having to maintain and include a bunch of custom file extensions in the core product would be annoying and time-consuming.

Personally I wouldn't be so against it for something like this, which has far broader usage, at least in the sense that nearly everyone uses certs (though perhaps not many people scan them with codespell). I guess we need to work out whether these are extensions to codespell (i.e. special processing via a module/function when it matches a particular type of file) or uses of codespell in external tools.

> For my use case, we had a script add the codespell config to repos (via pre-commit), so we can definitely just ignore the specific extensions we have found to be problematic in our environment, rather than make this tool much more complicated.

That's great. You could also look at an ignore regex that matches the header, base64 body and footer pattern, which would still find typos elsewhere in those files.
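
The idea of ignoring only the armored block can be sketched in Python as a filter that blanks every line from a -----BEGIN ...----- marker through the matching -----END ...----- marker, leaving comments and descriptive text to be checked. This is a hypothetical pre-processing step that assumes checking happens line by line; it is not codespell's own ignore-regex option, whose exact semantics should be taken from its README.

```python
import re

BEGIN_RE = re.compile(r"^-----BEGIN [A-Z0-9 ]+-----\s*$")
END_RE = re.compile(r"^-----END [A-Z0-9 ]+-----\s*$")


def mask_pem_blocks(text: str) -> str:
    """Blank out PEM armored blocks (markers included), keeping line numbers stable."""
    out = []
    inside = False
    for line in text.splitlines():
        if not inside and BEGIN_RE.match(line):
            inside = True
        if inside:
            out.append("")  # empty line, so reported line numbers still match the file
            if END_RE.match(line):
                inside = False
        else:
            out.append(line)
    return "\n".join(out)


sample = """Dummy cert for tests, do not use in producton.
-----BEGIN CERTIFICATE-----
FLE+blah
-----END CERTIFICATE-----
"""
masked = mask_pem_blocks(sample)
print(re.findall(r"[\w\-'’]+", masked))
# ['Dummy', 'cert', 'for', 'tests', 'do', 'not', 'use', 'in', 'producton']
```

The deliberate typo "producton" outside the block is still reported, while the base64 body never reaches the spell checker.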

Projects: None yet
Development: No branches or pull requests
4 participants