Meaning of "hidden files" in Dir.glob #13196

Open
HertzDevil opened this issue Mar 16, 2023 · 19 comments
Labels
platform:darwin platform:windows status:discussion topic:stdlib:files tough-cookie Multi-faceted and challenging topic, making it difficult to arrive at a straightforward decision.

Comments

@HertzDevil
Contributor

HertzDevil commented Mar 16, 2023

Dir.glob has a match_hidden parameter, which to my knowledge is the only standard library API that deals with the notion of hidden files. It says:

> If match_hidden is true the pattern will match hidden files and folders.

A file or directory is hidden if and only if its name begins with a .:

crystal/src/dir/glob.cr

Lines 197 to 200 in c931553

each_child(path) do |entry|
  next if !match_hidden && entry.name.starts_with?('.')
  yield join(path, entry.name) if cmd.matches?(entry.name)
end

But this is not how hidden files in Windows work. A file is hidden there if dwFileAttributes & FILE_ATTRIBUTE_HIDDEN != 0 in any of the relevant structs returned by Win32 functions; that file has a transparent icon in Windows Explorer.

Should we change how Dir.glob works even if it means the same directory structure will behave differently depending on the host system? (Of course, the directory structure isn't technically the same on most occasions, unless the standard library attempts to support NTFS natively on Unix-like systems)

@straight-shoota
Member

straight-shoota commented Mar 16, 2023

It would be a very bad choice to make the definition of hidden dependent on the OS.

File names starting with a dot need to be considered as hidden everywhere or nowhere. We've chosen everywhere so that must apply on Windows, too.

Files marked as hidden in the file system on Windows should be considered hidden as well. We can do the same on Unix if available, but that's probably not very relevant.

For reference, the API was generally inspired by Ruby. But Ruby has no match_hidden parameter. There's only a generic flags parameter and the value File::FNM_DOTMATCH would have a similar effect (except that it also includes . and ..).
In Ruby there seems to be no way to exclude files that are marked as hidden in the file system on Windows.

# dotfile
Dir.glob("*")                     # => []
Dir.glob("*", File::FNM_DOTMATCH) # => [".dotfile"]

# FS hidden attribute
Dir.glob("C:/System Volume Information") # => ["C:/System Volume Information"]
Dir.glob("C:/System Volume Information", File::FNM_DOTMATCH) #=> ["C:/System Volume Information"]

@mominshaikhdevs

> Should we change how Dir.glob works even if it means the same directory structure will behave differently depending on the host system?

IMO the Kernel should be the main focus here. So basically, yes.

If it runs under the NT kernel or an NT clone such as ReactOS, then yes.
If it runs under Wine or a Unix clone, then no.

File names shouldn't have any effect on file attributes. That's basically forcing Unix conventions on NT.

@Sija
Contributor

Sija commented Mar 16, 2023

It would be clear if the option was called match_dotfiles.

It could be renamed with deprecation.

@straight-shoota
Member

> IMO the Kernel should be the main focus here.

That would blatantly break the premise of portability that guarantees Crystal programs using the standard library behave the same on all platforms (at least as much as technically possible).

With a folder containing the same files, the result of Dir.glob("*", match_hidden: false) must be identical on Windows and Unix.

> It would be clear if the option was called match_dotfiles.

That would perhaps more accurately describe the current implementation on Unix systems.

But what would it mean on Windows? Would a hidden file system attribute count as "dotfile"? If not, what would be a different way to include/exclude such files?

@HertzDevil
Contributor Author

HertzDevil commented Mar 17, 2023

Similarly there are several other notions of "hidden files" exclusive to macOS / HFS+ as well: https://stackoverflow.com/a/15236292

IMO if a filesystem supports a hidden attribute and a file uses it, then it is no longer the "same" as an otherwise identical file on a filesystem not supporting such an attribute; therefore making Dir.glob omit those files does not break the portability guarantee.

@HertzDevil
Contributor Author

Here are two examples of Microsoft casually beginning filenames not meant to be hidden with a period:

image

image

Meanwhile desktop.ini and thumbs.db are hidden files, despite using "normal" filenames. The temporary and hidden lock files for Microsoft Office start with a tilde (~).

@Sija
Contributor

Sija commented Mar 17, 2023

> That would perhaps more accurately describe the current implementation on Unix systems.
>
> But what would it mean on Windows? Would a hidden file system attribute count as "dotfile"? If not, what would be a different way to include/exclude such files?

@straight-shoota I'd differentiate between the hidden and dot files by means of giving two separate options.

@straight-shoota
Member

@HertzDevil Good point. Sigh.

I guess we might have to use two separate parameters then in order to indicate different semantics as appropriate.

@straight-shoota
Member

I did some research looking for prior art on how other languages or libraries represent hidden files. It seems most either do not have a generic concept for that, or it's platform-specific, i.e. the semantics on Windows and Unix are different. I did not find a single instance of a cross-platform abstraction that would allow you to use Unix semantics on Windows.

There are certainly use cases for that, but I suppose it's also easy to implement the Unix semantics when iterating the file names. That could be a reason why people choose not to bother with a generic API. However, I still think it would be convenient to have a portable abstraction ready to use.

@HertzDevil
Contributor Author

HertzDevil commented Mar 27, 2023

We could define the overloads like *, match_hidden = false, match_dotfiles = match_hidden. Afterwards match_hidden will only affect files hidden using the filesystem attributes. I think this is the only way to preserve the existing behavior whether the match_hidden argument is specified in a call or not.
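
To illustrate how that defaulting would behave, here is a rough sketch in Ruby (the language the API was modeled on), run against an in-memory list rather than a real directory. The `Entry` struct, its `fs_hidden` field, and the `visible_entries` helper are all hypothetical names made up for this example; nothing here is an existing Crystal or Ruby API.

```ruby
# Hypothetical in-memory entry: a name plus a filesystem "hidden" attribute.
Entry = Struct.new(:name, :fs_hidden)

# match_dotfiles defaults to whatever match_hidden is, so callers that pass
# only match_hidden keep today's dotfile handling.
def visible_entries(entries, match_hidden: false, match_dotfiles: match_hidden)
  entries.reject do |entry|
    (!match_dotfiles && entry.name.start_with?(".")) ||
      (!match_hidden && entry.fs_hidden)
  end.map(&:name)
end

ENTRIES = [
  Entry.new("readme.txt", false),
  Entry.new(".config", false),
  Entry.new("desktop.ini", true), # assumed to carry the hidden attribute
]

p visible_entries(ENTRIES)                                            # => ["readme.txt"]
p visible_entries(ENTRIES, match_hidden: true)                        # => ["readme.txt", ".config", "desktop.ini"]
p visible_entries(ENTRIES, match_hidden: true, match_dotfiles: false) # => ["readme.txt", "desktop.ini"]
```

Only the third call uses the new parameter, which is the point of deriving its default from match_hidden.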

Also perhaps we should define File::Info#hidden? as well?

@straight-shoota
Member

Yes, I suppose that could be a viable solution.

However, the word "hidden" is problematic due to its ambiguity.
Maybe we can use a more specific term? Like filesystem_hidden? That might be a bit bulky, but accurate.
Or just system_hidden? To follow whatever the (operating) system considers hidden (usually nothing on most Unix file systems).

I wish match_hidden = true were the default. It would make portability easier if you had to explicitly opt in to the platform-specific convention of considering dotfiles as hidden. 😢

@straight-shoota
Member

straight-shoota commented Mar 27, 2023

Another idea, but it might be too complex / over-engineered: How about expressing the match_hidden strategy as an enum? Having named values would clearly express the intention, which could also be to follow whatever the current operating system considers as hidden.

So there would be two original members, file_system and dotfiles, plus a native member that matches whatever is typical for the specific platform, as well as none and all.

For semantic compatibility, the mapping for boolean false would be file_system | dotfiles and for true only file_system.

@HertzDevil
Contributor Author

HertzDevil commented Mar 27, 2023

It might be more appropriate to name it operating_system rather than dotfiles, in order to support all conventions that are not backed by the file system (e.g. the macOS finder infos). Or we could have operating_system as a third flag.

@HertzDevil
Contributor Author

On BSD-like systems, including macOS, LibC::Stat#st_flags includes extra file attributes. If that field includes UF_HIDDEN = 0x8000, then the file is hidden; if a symlink itself is hidden then it is considered hidden in Finder, regardless of whether the target is. Apparently FreeBSD states that Windows' FILE_ATTRIBUTE_HIDDEN maps to UF_HIDDEN. So far all the hidden files I have seen on my M2 have this attribute set.

@straight-shoota
Member

> It might be more appropriate to name it operating_system rather than dotfiles, in order to support all conventions that are not backed by the file system (e.g. the macOS finder infos). Or we could have operating_system as a third flag.

Yes, those would need to be different options. Dotfiles are not really an OS policy. It's just a convention used by programs independent of the OS.
It totally makes sense to choose dotfiles on Windows, for example.

@mjblack

mjblack commented Apr 18, 2023

> Or just system_hidden? To follow whatever the (operating) system considers hidden (usually nothing on most Unix file systems).

Windows has two attributes to hide files: the first is the standard hidden attribute and the second is the system attribute.

@HertzDevil
Contributor Author

HertzDevil commented Apr 26, 2023

That means the file_system and operating_system flags would be consistent with the "Show hidden files, folders, and drives" and (the negation of) the "Hide protected operating system files" options in Explorer. If a folder has these files:

  • __.txt (normal)
  • _h.txt (hidden)
  • s_.txt (system)
  • sh.txt (system + hidden)

then:

  • __.txt is always present;
  • _h.txt is present only if file_system is used;
  • s_.txt is also always present, because the system attribute alone does not hide a file in Explorer;
  • sh.txt is present only if file_system | operating_system is used.

If additionally those filenames start with a period then dotfiles must also be used; the file must satisfy all specified flags for it to be returned, not just any of them.
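
The rule above can be sketched as a small bitmask check. This is only an illustration in Ruby: the flag constants `FILE_SYSTEM`, `OPERATING_SYSTEM`, and `DOTFILES` are hypothetical values named after the members proposed in this thread, and `required_flags`, `matched?`, and `glob_all` are made-up helpers. The two `FILE_ATTRIBUTE_*` values are the Win32 attribute bits.

```ruby
# Hypothetical flag values, named after the members proposed in this thread.
FILE_SYSTEM      = 0b001 # also match files carrying the hidden attribute
OPERATING_SYSTEM = 0b010 # also match hidden files carrying the system attribute
DOTFILES         = 0b100 # also match files whose name starts with "."

# Win32 file attribute bits.
FILE_ATTRIBUTE_HIDDEN = 0x2
FILE_ATTRIBUTE_SYSTEM = 0x4

# Flags a file needs before it may be returned. The system attribute alone
# requires nothing, because it does not hide a file in Explorer.
def required_flags(name, attributes)
  required = 0
  if (attributes & FILE_ATTRIBUTE_HIDDEN) != 0
    required |= FILE_SYSTEM
    required |= OPERATING_SYSTEM if (attributes & FILE_ATTRIBUTE_SYSTEM) != 0
  end
  required |= DOTFILES if name.start_with?(".")
  required
end

# A file is returned only if *all* of its required flags were specified.
def matched?(name, attributes, flags)
  required = required_flags(name, attributes)
  (flags & required) == required
end

FILES = {
  "__.txt" => 0,
  "_h.txt" => FILE_ATTRIBUTE_HIDDEN,
  "s_.txt" => FILE_ATTRIBUTE_SYSTEM,
  "sh.txt" => FILE_ATTRIBUTE_SYSTEM | FILE_ATTRIBUTE_HIDDEN,
}

def glob_all(files, flags)
  files.select { |name, attributes| matched?(name, attributes, flags) }.keys
end

p glob_all(FILES, 0)                              # => ["__.txt", "s_.txt"]
p glob_all(FILES, FILE_SYSTEM)                    # => ["__.txt", "_h.txt", "s_.txt"]
p glob_all(FILES, FILE_SYSTEM | OPERATING_SYSTEM) # => ["__.txt", "_h.txt", "s_.txt", "sh.txt"]
```

A hidden file named `.h.txt` would additionally need `DOTFILES`: `matched?(".h.txt", FILE_ATTRIBUTE_HIDDEN, FILE_SYSTEM)` is false, while passing `FILE_SYSTEM | DOTFILES` makes it true.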

We could also respect the current Explorer settings, although this is probably a step too far:

# shlobj_core.cr
lib LibC
  struct SHELLSTATE
    flags1 : BOOL
    dwWin95Unused : DWORD
    uWin95Unused : UInt
    lParamSort : LONG
    iSortDirection : Int
    version : UInt
    uNotUsed : UInt
    flags2 : BOOL
  end

  # bitfield masks
  SHELLSTATE_flags1_fShowAllObjects  = 0x00000001
  SHELLSTATE_flags1_fShowSuperHidden = 0x00008000

  SSF_SHOWALLOBJECTS  = 0x00000001
  SSF_SHOWSUPERHIDDEN = 0x00040000

  fun SHGetSetSettings(psfs : SHELLSTATE*, dwMask : DWORD, bSet : BOOL)
end

LibC.SHGetSetSettings(out shell_settings, LibC::SSF_SHOWALLOBJECTS | LibC::SSF_SHOWSUPERHIDDEN, 0)
shell_settings.flags1.bits_set?(LibC::SHELLSTATE_flags1_fShowAllObjects)  # Show hidden files, folders, and drives
shell_settings.flags1.bits_set?(LibC::SHELLSTATE_flags1_fShowSuperHidden) # (don't) Hide protected operating system files

> For semantic compatibility, the mapping for boolean false would be file_system | dotfiles and for true only file_system.

It's the opposite because the flags would be used to control which files are matched, not excluded. So false becomes file_system | operating_system and true becomes all.

@straight-shoota straight-shoota added the tough-cookie Multi-faceted and challenging topic, making it difficult to arrive at a straightforward decision. label Oct 5, 2023
@elebow
Contributor

elebow commented Oct 11, 2023

I would suggest the possibility that any file exclusion behavior in Dir.glob is a misfeature, and the behavior should be left entirely to the application.

(I don't necessarily advocate strongly for this position; just suggesting it.)

Given:

  1. The definition of "hidden" is not consistent across operating systems.
  2. The definition of "hidden" is not necessarily consistent within an operating system (e.g., an NTFS filesystem mounted under Unix may have an extended attribute for visibility).
  3. The definition of "hidden" is not even necessarily consistent within a single glob pattern: Consider, for example, an NTFS filesystem and an ext4 filesystem mounted under the same directory on a Linux machine.

I don't believe there is any reasonable way to capture this complexity in a small number of arguments to Dir.glob.

Requiring the application to manually do a separate #reject call (and/or build it into the glob pattern, like "[^.]*") is more explicit and readable.
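
As a rough illustration of what that application-side filtering could look like, here is a sketch in Ruby rather than Crystal, since Ruby's `Dir.glob` was already used for comparison earlier in the thread. The helper name `glob_without_dotfiles` is made up for this example, and the "hidden" policy it applies is only the dotfile convention.

```ruby
require "tmpdir"
require "fileutils"

# Glob everything, dotfiles included, then let the application decide what
# counts as hidden. Here the policy is the Unix dotfile convention.
def glob_without_dotfiles(dir)
  Dir.glob("*", File::FNM_DOTMATCH, base: dir)
     .reject { |name| name.start_with?(".") }
     .sort
end

Dir.mktmpdir do |dir|
  FileUtils.touch(File.join(dir, "visible.txt"))
  FileUtils.touch(File.join(dir, ".dotfile"))

  # Explicit filtering step after the glob.
  p glob_without_dotfiles(dir) # => ["visible.txt"]

  # The same policy folded into the pattern itself.
  p Dir.glob("[^.]*", File::FNM_DOTMATCH, base: dir)
end
```

A filesystem-attribute policy would slot into the same `reject` block, at the cost of one extra stat call per entry.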

@Sija
Contributor

Sija commented Oct 11, 2023

> I would suggest the possibility that any file exclusion behavior in Dir.glob is a misfeature, and the behavior should be left entirely to the application.

IMO that would put an unnecessary burden of boilerplate code on stdlib consumers. Indeed, it's not consistent across different OSes, but at the same time it's a widely used convention: dotfiles are regarded as hidden/special by many different pieces of the OS/web stack (for protection against HTTP access, for instance).

6 participants