Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disallow reading directories as files #36605

Open
jakobnissen opened this issue Jul 10, 2020 · 17 comments
Open

Disallow reading directories as files #36605

jakobnissen opened this issue Jul 10, 2020 · 17 comments
Assignees
Labels
domain:filesystem Underlying file system and functions that use it

Comments

@jakobnissen
Copy link
Contributor

Right now, you can do:

julia> open("/Users/jakobnissen/Documents/") do file
           read(file)
       end
0-element Array{UInt8,1}

That doesn't make much sense to me - presumably, attempting to read a directory is always a mistake, so we may as well error there. In fact, I discovered this because I accidentally read a directory and couldn't figure out why my program didn't give any output!

As a sanity check, Python refuses to do this:

>>> f = open("/Users/jakobnissen/Documents")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IsADirectoryError: [Errno 21] Is a directory: '/Users/jakobnissen/Documents'
@aviks
Copy link
Member

aviks commented Jul 10, 2020

Surprisingly I hit the same issue yesterday. In my case, the open was hidden behind a file parser, which also did not error on empty input, but returned an empty object instead. So that was even harder to debug.

@StefanKarpinski
Copy link
Sponsor Member

This should definitely error.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Jul 13, 2020

We used to have an fstat test here, but it was slow (#35907)

@jakobnissen
Copy link
Contributor Author

That is unfortunate. Perhaps we could check if it's a directory iff the file is size zero? I think a zero-size file can be determined when filling the initial buffer, so it should be very fast. It will, of course, slow down opening zero-size files that are actually files.

@PallHaraldsson
Copy link
Contributor

You can do e.g.:

julia> open("/dev/null")   # would the check conflict with something, e.g. in /dev/input/

@KristofferC
Copy link
Sponsor Member

Consequence:

julia> using TOML

julia> TOML.parsefile(homedir())
Dict{String, Any}()

@staticfloat
Copy link
Sponsor Member

I'm pretty sure this is the branch that needs a special case: https://github.com/JuliaLang/julia/pull/35925/files#r509678679

@fredrikekre
Copy link
Member

Just a note that some filesystems (this is CephFS) allows reading directories:

$ cat bin/
entries:                      8
 files:                       7
 subdirs:                     1
rentries:                  1538
 rfiles:                   1425
 rsubdirs:                  113
rbytes:                44240688
rctime:    1618950776.329878379

$ julia -q
julia> open("bin") do dir
           println(read(dir, String))
       end
entries:                      8
 files:                       7
 subdirs:                     1
rentries:                  1538
 rfiles:                   1425
 rsubdirs:                  113
rbytes:                44240688
rctime:    1618950776.329878379

@StefanKarpinski
Copy link
Sponsor Member

This whole situation still seems pretty unfortunate to me. @vtjnash, how big a performance hit would it be to reinstate the explicit check for whether the thing we're opening is a directory and erroring when it is?

@giordano
Copy link
Contributor

Perhaps we could check if it's a directory iff the file is size zero?

But directories don't have zero size according to fstat

@cmcaine
Copy link
Contributor

cmcaine commented Jul 26, 2021

I think it's different notions of file size, Mose: e.g. filesize(open("/")) == 122 for me, but eof(open("/")) == true. Assuming eof or a lower level version of it are fast enough, that could be a trigger to do an extra conditional check to see if the thing is a directory.

And on file systems where directories are readable, Julia will read them, which is probably(?) the correct behaviour?

It's what cat does anyway: error when reading a directory on normal filesystems, print the directory on filesystems where directories are readable.

@Mathnerd314
Copy link

Mathnerd314 commented Jul 26, 2021

read(2)

       EISDIR fd refers to a directory.

C and Go use very thin wrappers around read(2) and you get this EISDIR "is a directory" error.

Julia's handling of all error codes is to set EOF:

julia/src/support/ios.c

Lines 312 to 315 in bb5b98e

if (_os_read(s->fd, s->buf, (size_t)s->maxsize, &got)) {
s->_eof = 1;
return tot;
}

I suggest handling the error codes properly, then you won't need an explicit check before opening the file, and it will work properly with directories that are also readable as files.

edit: I looked at this a bit more and there is an errcode value in the ios struct, unused AFAICT, but maybe you could store the error code there in the C code and then read it from Julia with a new ios_geterrcode function if the read returns zero bytes.

@StefanKarpinski
Copy link
Sponsor Member

Assigning @vtjnash since he made the change that broke this 😁

@ViralBShah ViralBShah added the domain:filesystem Underlying file system and functions that use it label Mar 13, 2022
@vtjnash
Copy link
Sponsor Member

vtjnash commented Nov 3, 2022

Thinking on this more, I wonder if we could call faccessat on the new fd, and if that doesn't instantly fail (unlike fstat which slowly succeeded), then we know we accidentally opened a directory

@cmcaine
Copy link
Contributor

cmcaine commented Nov 16, 2022

That would mean that julia couldn't read directories on filesystems where directories are readable (CephFS example above), which is probably better than the status quo, but (arguably) worse than perfect.

@ParadaCarleton
Copy link
Contributor

Ran into this bug yesterday.

@PallHaraldsson
Copy link
Contributor

Isn't it best to disallow reading as files (as done before?) and have a keyword-arg allowing it, the current default behavior? Since it might sometimes be useful:

#36605 (comment)

#35907 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:filesystem Underlying file system and functions that use it
Projects
None yet
Development

No branches or pull requests