Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set exit status if a file does not exist #2157

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

DimitriPapadopoulos
Copy link
Collaborator

Fixes #2129.

@DimitriPapadopoulos
Copy link
Collaborator Author

DimitriPapadopoulos commented Nov 16, 2021

Should we print("ERROR: ...") too?

@peternewman
Copy link
Collaborator

Should we print("ERROR: ...") too?

I'd have thought so, otherwise we'll just get a load of new bug reports for people not sure why they're getting non-zero exit codes.

I guess the only question is whether there's a legitimate reason for this behaviour at all? And hence whether we need an option to be able to skip it. ls doesn't seem to have and you can't work around it with an unmatching glob.

@DimitriPapadopoulos
Copy link
Collaborator Author

Note that parse_file()is called both for files explictly passed as arguments, and for files within directories passed as arguments. We want to catch the former case of files explictly passed as arguments that don't exist. We don't really want to catch the latter case of files returned by os.walk(), but for which os.path.exists( ) is false. It could happen if a directory is being erased while running codespell on it. While this really is a corner case, I've moved the os.path.exists( ) outside parse_file().

@DimitriPapadopoulos
Copy link
Collaborator Author

Also, I don't think there are legitimate reasons to explictly call codespell on a non-existant file and not expect an error. Yet I have kept if not os.path.exists(filename) under if not glob_match.match(filename), so that it remains possible to pass a non-existant file as an argument and skip it with a non matching glob - as you wrote.

@peternewman
Copy link
Collaborator

Note that parse_file()is called both for files explictly passed as arguments, and for files within directories passed as arguments. We want to catch the former case of files explictly passed as arguments that don't exist. We don't really want to catch the latter case of files returned by os.walk(), but for which os.path.exists( ) is false. It could happen if a directory is being erased while running codespell on it. While this really is a corner case, I've moved the os.path.exists( ) outside parse_file().

Makes sense, could you chuck a comment in before parse_file in the directory case so someone doesn't got and add it in future.

I assume it will complete successfully at least if that corner case was hit?

Also, I don't think there are legitimate reasons to explictly call codespell on a non-existant file and not expect an error. Yet I have kept if not os.path.exists(filename) under if not glob_match.match(filename),

Glad you agree and nice workaround if required. I was imaginging some complicated flag to maintain.

so that it remains possible to pass a non-existant file as an argument and skip it with a non matching glob - as you wrote.

Cool, although I meant with ls at least, you can't match a non-existent file with a glob as there's nothing to expand on.

Copy link
Collaborator

@peternewman peternewman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the comment on the dir walk to add.

Actually also are things like skip, exclude and ignore files covered too?

@DimitriPapadopoulos
Copy link
Collaborator Author

I assume it will complete successfully at least if that corner case was hit?

Indeed, after my latest changes, I have left absolutely no change in the relevant code path (directory passed as argument).

@DimitriPapadopoulos
Copy link
Collaborator Author

Makes sense, could you chuck a comment in before parse_file in the directory case so someone doesn't got and add it in future.

It is irrelevant in the directory case, because in the directory case the directory exists (otherwise we don't know it is a directory):

if os.path.isdir(filename):

@peternewman
Copy link
Collaborator

I assume it will complete successfully at least if that corner case was hit?

Indeed, after my latest changes, I have left absolutely no change in the relevant code path (directory passed as argument).

That's not quite the same thing as whether the current code doesn't break or not...

It is irrelevant in the directory case, because in the directory case the directory exists (otherwise we don't know it is a directory):

I'm not disputing that, just suggesting we put a comment before the directory branch of parse_files saying something like "File iterated from directory which exists so don't need to check the file exists".

@DimitriPapadopoulos
Copy link
Collaborator Author

I assume it will complete successfully at least if that corner case was hit?

Indeed, after my latest changes, I have left absolutely no change in the relevant code path (directory passed as argument).

That's not quite the same thing as whether the current code doesn't break or not...

Ah, I had misunderstood: you're asking about the current code, not my changes. Function parse_file() currently mostly ignores files that would be deleted between the moment os.walk() returns the file name and the moment parse_file() starts processing the file:

# ignore irregular files
if not os.path.isfile(filename):
return bad_count

It might fail with an exception later on, in this specific piece of code, including if the file itself it unreadable (but the enclosing directory is readable):
text = is_text_file(filename)

Then, it would again ignore the missing file:
try:
lines, encoding = file_opener.open(filename)
except Exception:
return bad_count

I could protect text = is_text_file(filename) with a try, then add a test for the case of unreadable (or deleted) files, something like this:

$ mkdir -p foo/bar
$ touch foo/bar/file
$ chmod a= foo/bar/file
$ 
$ cd foo
$ 
$ codespell
Traceback (most recent call last):
  File "/home/user/.local/bin/codespell", line 11, in <module>
    load_entry_point('codespell', 'console_scripts', 'codespell')()
  File "/my/path/codespell/codespell_lib/_codespell.py", line 746, in _script_main
    return main(*sys.argv[1:])
  File "/my/path/codespell/codespell_lib/_codespell.py", line 892, in main
    uri_ignore_words, context, options)
  File "/my/path/codespell/codespell_lib/_codespell.py", line 630, in parse_file
    text = is_text_file(filename)
  File "/my/path/codespell/codespell_lib/_codespell.py", line 488, in is_text_file
    with open(filename, mode='rb') as f:
PermissionError: [Errno 13] Permission denied: './bar/file'
$ 

I'd rather do that in a different issue / merge request though.

It is irrelevant in the directory case, because in the directory case the directory exists (otherwise we don't know it is a directory):

I'm not disputing that, just suggesting we put a comment before the directory branch of parse_files saying something like "File iterated from directory which exists so don't need to check the file exists".

There is no directory branch in parse_files(), this function operates only on files. All the os.walk() logic is in main() and the directory branch is precisely protected by:

if os.path.isdir(filename):

@DimitriPapadopoulos
Copy link
Collaborator Author

Rebased, @peternewman and @larsoner, could you have a look, time permitting?

context,
options,
)
if not os.path.exists(filename):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than a new conditional here, why not modify line 842 to have:

            print("ERROR: file does not exist: %s" % filename, file=sys.stderr)
            bad_count += 1

it seems like a better place to add the check

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@larsoner I had to rebase, so I am not certain which line 842 you are referring to. Perhaps this one?

bad_count += 1

In any case, the idea is that we report an error if the files passed as arguments do not exist. We are merely checking command line arguments here.

@DimitriPapadopoulos DimitriPapadopoulos force-pushed the nonexistent_file branch 4 times, most recently from 946b9b8 to 38f8626 Compare March 18, 2023 17:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Codespell exit 0 silently when nonexistent file is passed in as its argument
3 participants