New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename and rework File::Stat as File::Info #5584

Merged
merged 3 commits into from Apr 14, 2018

Conversation

@RX14
Member

RX14 commented Jan 13, 2018

I've aimed to make File::Info easier to use, more portable, and nicer than File::Stat.

flags |= ::File::Flags::SetGroup if (stat.st_mode & LibC::S_ISGID) != 0
flags |= ::File::Flags::Sticky if (stat.st_mode & LibC::S_ISVTX) != 0
modification_time = ::Time.new(stat.st_mtim, ::Time::Kind::Utc)

This comment has been minimized.

@straight-shoota

straight-shoota Jan 13, 2018

Contributor

you should better use ::Time.utc

This comment has been minimized.

@RX14

RX14 Jan 13, 2018

Member

st_mtim is a LibC::Timespec. You can't use Time.utc.

This comment has been minimized.

@straight-shoota

straight-shoota Jan 13, 2018

Contributor

ah, nvm

@RX14 RX14 force-pushed the RX14:feature/file-info branch 2 times, most recently from 030495b to 148b1f6 Jan 13, 2018

src/file.cr Outdated
@@ -16,7 +16,7 @@ class File < IO::FileDescriptor
{% end %}
# :nodoc:
DEFAULT_CREATE_MODE = LibC::S_IRUSR | LibC::S_IWUSR | LibC::S_IRGRP | LibC::S_IROTH
DEFAULT_CREATE_PERMISSIONS = 0o644

This comment has been minimized.

@lbguilherme

lbguilherme Jan 13, 2018

Contributor

Maybe it is better to use File::Permissions.flags(OwnerRead, OwnerWrite, GroupRead, OtherRead) here?

This comment has been minimized.

@RX14

RX14 Jan 14, 2018

Member

I don't think so, the shorter octal syntax is far easier to read. "everyone" knows unix octal permission syntax.

This comment has been minimized.

@ysbaddaden

ysbaddaden Apr 9, 2018

Member

Then why bother with a File::Permissions enum?

# The binary representation of this enum is defined to be same representation
# as the permission bits of a unix `st_mode` field. `File::Permissions`
# can also be compared to it's underlying bitset, for example
# `File::Permissions::All == 0o777` will always be `true`.

This comment has been minimized.

@asterite

asterite Jan 13, 2018

Contributor

What about the opposite? Does 0o777 == File::Permissions::All work?

This comment has been minimized.

@asterite

asterite Jan 13, 2018

Contributor

Also, maybe always isn't necessary here.

This comment has been minimized.

@RX14

RX14 Jan 14, 2018

Member

Ah, yeah. Perhaps this equality was a bad idea. Or should I just reopen struct Int? It's a bit ugly how these equalities are defined in two places.

@asterite

Looks awesome! And much cleaner than before. Plus the inspect of permissions is really nice. Thank you!

@RX14

This comment has been minimized.

Member

RX14 commented Jan 14, 2018

@asterite yeah, while debugging I found that I really wanted to be able to see the permissions better so I just implemented it!

@ysbaddaden

Instead of always evaluating and copying to populate a generic struct with lots of data —which misses information btw, such as the creation time for the most obvious one— we could keep the simple wrapper over C structs.

If all I need is accessing the mtime for thousands of files, I'd like to avoid copying data for each and every file, or just checking over and over if it's a file, symlink or something else (pointless).

# :nodoc:
def initialize(@size, @permissions, @type, @flags, @modification_time, @owner, @group)
end

This comment has been minimized.

@ysbaddaden

ysbaddaden Jan 14, 2018

Member

It lost query methods, such as symlink?, file?, ... which is an unwelcomed change.

This comment has been minimized.

@RX14

RX14 Jan 14, 2018

Member

It's just type.file? now. The functionality is still there.

This comment has been minimized.

@straight-shoota

straight-shoota Jan 14, 2018

Contributor

We could think about forwarding these methods to type similar to the weekday names in Time. Allthough this would effectively be aliases. But the API would be simpler that way. You shouldn't have to write type in between to ask a File::Info if it describes a symlink.

This comment has been minimized.

@RX14

RX14 Jan 14, 2018

Member

Why not? It's really just some extra typing and it means you have a clear separation between permissions, types, and flags on files. Which I think is really nice.

This comment has been minimized.

@straight-shoota

straight-shoota Jan 17, 2018

Contributor

I see your argument. But I think that's too bulky.

This comment has been minimized.

@Sija

Sija Apr 13, 2018

Contributor

Why not use delegate here?

@RX14

This comment has been minimized.

Member

RX14 commented Jan 14, 2018

@ysbaddaden Information has been intentionally left out to make it portable. atime is disabled on most modern distros, either with relatime or noatime mount options (and nobody uses it) and ctime is usually not what people want or think it is (they want btime which basically no filesystems support). mtime is really the only widely supported and actually useful time. Go only provides modification time too.

I'll benchmark old stat vs new stat but i'm not sure that a bit of bit twiddling is going to make much difference when put next to the price of a syscall, and making it wrap LibC::Stat would move platform-specifics into the definition of File::Info instead of just the construction. Keep in mind that on windows I'm not going to be using LibC::Stat at all, but recostructing the same information from GetFileInformationByHandle.

@RX14

This comment has been minimized.

Member

RX14 commented Jan 14, 2018

Benchmarks say it doesn't matter much. I'd definitely say it's not worth optimizing for a while.

old:

 File.stat   1.77M (565.92ns) (± 4.62%)  0 B/op   1.77× slower
File.lstat   1.76M (568.21ns) (± 4.58%)  0 B/op   1.78× slower
 File#stat   3.13M (319.46ns) (± 3.19%)  0 B/op        fastest

new:

 File.stat   1.74M (575.88ns) (± 4.60%)  0 B/op   1.79× slower
File.lstat   1.69M (590.26ns) (±10.11%)  0 B/op   1.84× slower
 File#stat   3.11M (321.26ns) (± 3.22%)  0 B/op        fastest
@RX14

This comment has been minimized.

Member

RX14 commented Jan 14, 2018

Also keep in mind that benchmark is a best-case for stat: one file only in a loop. When you add in a typical usage where the stat may not be in cache all the time you're going to see far less than the 2-4% overhead seen here.

it "can take File::Permissions" do
path = "#{__DIR__}/data/chmod.txt"
begin
File.write(path, "")

This comment has been minimized.

@bew

bew Jan 14, 2018

Contributor

You can use File.touch(path) here I think

This comment has been minimized.

@RX14

RX14 Jan 14, 2018

Member

It's copied from the spec two above it with minor tweaks.

This comment has been minimized.

@bew

bew Jan 14, 2018

Contributor

oh ok, probably not worth to change it here then

@RX14

This comment has been minimized.

Member

RX14 commented Jan 14, 2018

I do wonder whether we should rename File.stat? as File.info? and have File.info?(follow_symlink: false).

Also I'm probably going to just rename File::Type::SymbolicLink to File::Type::Symlink, almost everyone uses the shortened name these days.

@asterite

This comment has been minimized.

Contributor

asterite commented Jan 14, 2018

I actually learned in this PR that lstat doesn't (or does) follow symlinks, so a named argument to select that behavior is better in my opinion. lstat is too cryptic. So maybe it's better to just use info.

I believe Ruby started with "works only in unix" and eventually had to keep all those cryptic, unix-specific names. Go for example doesn't do this, it provides a clear interface to the computer without using ancient strange names. I think we should do the same, even if it means a bigger barrier to those coming from Ruby.

@RX14

This comment has been minimized.

Member

RX14 commented Jan 15, 2018

@asterite its called stat and lstat on go too...

@asterite

This comment has been minimized.

Contributor

asterite commented Jan 15, 2018

Maybe because go isn't that flexible with method names.

@RX14

This comment has been minimized.

Member

RX14 commented Jan 15, 2018

@asterite I do agree it should be File.info. (eventually Path#info and File#info for stat/lstat and fstat respectively)

@RX14

This comment has been minimized.

Member

RX14 commented Jan 21, 2018

Updated this PR to take into account the above suggestions.

@RX14

This comment has been minimized.

Member

RX14 commented Jan 21, 2018

Looks like I accidentally made this PR depend on #5619... I'll fix it tomorrow.

@RX14 RX14 force-pushed the RX14:feature/file-info branch from 7c874b8 to f8e9bd5 Jan 21, 2018

@RX14

This comment has been minimized.

Member

RX14 commented Jan 21, 2018

Pushed again to fix this, this can be re-reviewed now.

@RX14 RX14 force-pushed the RX14:feature/file-info branch from f8e9bd5 to d9f73d4 Jan 21, 2018

@ysbaddaden

I'm sorry I'll feel grumpy, but I still have goosebumps, and no explanation for many (unwelcomed) changes:

Why change stat naming to info? it's still an abbreviation and doesn't tell any better.

Why decode & copy everything (check each file type, decode/copy permissions, generate Time instances) instead of using a wrapper for on-demand generation? Sure File::Stat would be a mere delegation struct for the underlying POSIX vs Windows implementations, mostly to hold documentation, but the optimizer will inline calls, or we can just flag them with @[AlwaysInline].

Having to type .info.type.directory? over .stat.directory? can feel minor, but that's still an unwelcomed hick; forward calls from the struct to the enum, please.

Also:

  • ctime & atime are still missing (!)
end
struct Int
def ==(other : File::Permissions)

This comment has been minimized.

@ysbaddaden

ysbaddaden Jan 22, 2018

Member

I'm not sure reopening Int is a good idea.

This comment has been minimized.

@RX14

RX14 Jan 22, 2018

Member

And the alternative is?

This comment has been minimized.

@ysbaddaden

ysbaddaden Jan 29, 2018

Member

Not support Int==(File::Permissions) nor File::Info#permissions==(Int). You must wrap the octal value. I believe it's better than reopening Int for a corner case —checking exact permissions equality doesn't seem like a usual case.

@RX14

This comment has been minimized.

Member

RX14 commented Apr 7, 2018

I'll rebase it when it gets reviewed.

@ysbaddaden

ysbaddaden approved these changes Apr 9, 2018 edited

Looks good! Thanks for adapting 🙇

I just wish there was type delegators on File::Info itself. Introducing a .type really feels artificial. It can be useful, but most of the time —and actually every single use we have for— it is file.info.type.xxx? when it could be a mere file.info.x? (or even file.x?).

src/file.cr Outdated
@@ -16,7 +16,7 @@ class File < IO::FileDescriptor
{% end %}
# :nodoc:
DEFAULT_CREATE_MODE = LibC::S_IRUSR | LibC::S_IWUSR | LibC::S_IRGRP | LibC::S_IROTH
DEFAULT_CREATE_PERMISSIONS = 0o644

This comment has been minimized.

@ysbaddaden

ysbaddaden Apr 9, 2018

Member

Then why bother with a File::Permissions enum?

RX14 added some commits Jan 13, 2018

@RX14 RX14 force-pushed the RX14:feature/file-info branch from 3372d31 to 7541889 Apr 9, 2018

@RX14

This comment has been minimized.

Member

RX14 commented Apr 13, 2018

I've added some shortcuts for file? directory? and symlink?, as they are the common options. I've also fixed DEFAULT_CREATE_PERMISSIONS. Looking forward to finally getting this merged!

@straight-shoota

This comment has been minimized.

Contributor

straight-shoota commented Apr 13, 2018

I don't like delegating only a few type predicates. It should be either all (my preference) or none to provide the least surprise.

@sdogruyol

This comment has been minimized.

Member

sdogruyol commented Apr 13, 2018

I prefer having the most common ones than having all / nothing. IMHO this is good to go as it is 👍

@sdogruyol sdogruyol merged commit 96bcc34 into crystal-lang:master Apr 14, 2018

4 checks passed

ci/circleci: test_darwin Your tests passed on CircleCI!
Details
ci/circleci: test_linux Your tests passed on CircleCI!
Details
ci/circleci: test_linux32 Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@sdogruyol sdogruyol added this to the Next milestone Apr 14, 2018

@RX14

This comment has been minimized.

Member

RX14 commented May 29, 2018

@ysbaddaden After looking at crystal-lang/shards#203 I'm wondering if making File::Info#== perform samefile was correct. Perhaps it should have been File.info(foo).same_file? File.info(bar)? File.same_file?(foo, bar)? I'd like other people's input on this API.

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented May 29, 2018

Maybe File.same?(foo, bar) or just keep exposing the inode on POSIX systems.

@RX14

This comment has been minimized.

Member

RX14 commented May 29, 2018

I don't think we should expose inode and device numbers (nicely). I'd much rather just expose samefile.

@bcardiff

This comment has been minimized.

Member

bcardiff commented May 30, 2018

File.same?(foo, bar) LGTM, but I would propose something else

Given the following files:

$ touch a.txt
$ ln -s a.txt b.txt
$ touch c.txt

With the stdlib at master we currently have:

a = File.new("./a.txt") 
a_copy = File.new("./a.txt") 

How to check if a1 and a1_copy refer to the same file?

a == a_copy # => false
a.info == a_copy.info # => true

Ok, but how that works with b?

b = File.new("./b.txt")
a.info == b.info # => true

How can I know that a and b are different files?

File.info?(a.path, follow_symlinks: false) == File.info?(a_copy.path, follow_symlinks: false) # => true
File.info?(a.path, follow_symlinks: false) == File.info?(b.path, follow_symlinks: false) # => false

Ok, but what about if I use only the path of the File?
File instances initialized with different path indeed have different path

a_diff = File.new("a.txt")
a.path      # => "./a.txt"
a_diff.path # => "a.txt"

You can use File.expand_path, but there are a lot of rules to expand the
path and we had already a File instance

File.expand_path(a.path)      # => /full/path/to/a.txt
File.expand_path(a_diff.path) # => /full/path/to/a.txt

So, what about?

  1. Changing File#info to File#info(follow_symlinks = true)
  2. Add File.same?(file1, file2, follow_symlinks = true)
  3. Remove override of File::Info==

That would make simpler all of the above scenarios with File.same? having follow_symlinks set to one or another value.

@RX14

This comment has been minimized.

Member

RX14 commented May 30, 2018

Because file.info(follow_symlinks: false) doesn't make sense. file.info != File.info(file.path). file.info is fstat and File.info is stat/lstat.

@RX14

This comment has been minimized.

Member

RX14 commented May 30, 2018

How can I know that a and b are different files?

They are the same file. The whole premise for that question is wrong on posix. The confusing part here is actually storing File#path in the first place. File#path is never actually guaranteed to mean anything. Not all files even have a (valid) path.

You can unlink a path from the filesystem while it is open in a program just fine. And any open programs will keep the inode around in the filesystem, and the file will continue to use up disk space until the program is closed. Then File#path will continue to report the path of the deleted file where it may well have been replaced by a completely different file. That's part of the reason why fstat exists: because open is atomic and then you can call fstat on the file descriptor to check permissions, then perform operations with the FD while being sure that an attacker hasn't raced you to swap out the file for another malicious one under your nose.

Path equality and file equality are entirely different things, and are pretty much entirely unrelated and unreliable until the moment in time you call open.

@bcardiff

This comment has been minimized.

Member

bcardiff commented May 31, 2018

Ok, so @RX14 you say to only

  1. Add File.same?(file1, file2) that will always "follow symlinks" in the comparison
  2. Remove override of File::Info==
  3. Treat path comparison in another way. Later.

I'm fine with that also.

Although the premise for the question might seems wrong, but I think it depends on the use case. If someone wants to deal with lower level of file system some api are not the best. But is not the more common use case and it is still doable.

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented May 31, 2018

Mixing symlinks with hard links to detect file identify is IMHO introducing confusing behavior. Furthermore, detecting hard links is quite specific, and not widely used. We don't need a abstraction for this, hence why I'm proposing to keep exposing inodes on POSIX systems and be done with it.

So:

  • detecting hard links: File.info(a).inode == File.info(b).inode;
  • detecting symlinks: File.real_path(a) == File.real_path(b);
  • detecting hard links through potential symlinks:
    File.info(a, follow_symlinks: true).inode == File.info(b, follow_symlinks: true).inode.

Simple, does the job, no confusing behavior, and allows finer grained detection. All it needs is to expose a value that we already have.

@RX14

This comment has been minimized.

Member

RX14 commented May 31, 2018

@bcardiff having a follow_symlinks in File.same is fine because symlinks are a property of a path, which File.same is working with. It's not fine to have File#info with a follow_symlinks option because you've already dereferenced the symlink and got to a specific device and inode when you call open(2).

@ysbaddaden the reason why I want to only expose equality only is because your examples are wrong! You must compare device and inode to be sure they're the same file. If even you get this wrong, then surely thats a fantastic reason to expose this nicely instead of just expose the platform-specific inode (which won't even exist on windows).

Let's add File.same with a follow_symlinks option, then move File::Info#== to File#== so it's just file == file and skip the .info step (it reads badly and is a bit weird).

@bcardiff

This comment has been minimized.

Member

bcardiff commented Jun 4, 2018

PR sent for this. I don't think we need to expose inode for file comparison in order to stay posix independent for this api.

Users are free to go to info and extract more information, but for comparison File.same? is good API.

@ysbaddaden

This comment has been minimized.

Member

ysbaddaden commented Jun 4, 2018

My wrong, you're both right. Let's go with File.same?.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment