Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A robust way to check for a case sensitive file system #1586

Merged
merged 12 commits into from Oct 7, 2015

Conversation

mried
Copy link
Collaborator

@mried mried commented Sep 1, 2015

This is a little improvement to find out if the file system is case sensitive or not. Until now, beets uses the operating system to determine this: On Windows, beets detected a case insensitive file system, on all other systems it is detected as case sensitive. At least this might be wrong: MacOS' HFS+ might be case sensitive or not. And what if the library is located on a FAT formated file system under Linux?

(In fact, the wrong detection was the reason I opened #1496.)

@sampsyo
Copy link
Member

sampsyo commented Sep 1, 2015

Nice! This is a clever way to do the trick. There is one drawback, though: if, on a case-sensitive filesystem, both capitalizations happen to exist as separate files, it will look like a case-insensitive filesystem. Maybe we need an extra check to see whether they're the same inode?

Also, since the check is somewhat generic, it would be great to move the logic to beets.util (into a function that takes a path as an argument). Then beets.library can just invoke it with the library directory as an argument.

Finally, it looks like Travis is complaining about something.

@mried
Copy link
Collaborator Author

mried commented Sep 2, 2015

Well, I guess I missed this case... 😮 - fixed it but it is not easy for windows. Beets has to create a test file there. I also moved the method to beets.util as you suggested.

The failing testcase checks if case sensitivity is detected correctly for windows and MacOS. This has to fail now... But I've got no clue how to write a test case for the new function. Any ideas? Or should I remove the tests?

@@ -61,9 +60,9 @@ def __init__(self, field, pattern, fast=True, case_sensitive=None):
"""
super(PathQuery, self).__init__(field, pattern, fast)

# By default, the case sensitivity depends on the platform.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is probably still relevant (although perhaps "platform" -> "filesystem"?).

@sampsyo
Copy link
Member

sampsyo commented Sep 2, 2015

Aha, thanks for continuing to dig. This does indeed look painful on Windows. I noticed that Python 3 added a samefile implementation for ntpath; I'm looking into whether we can steal that.

https://hg.python.org/cpython/file/tip/Lib/genericpath.py#l87

@sampsyo
Copy link
Member

sampsyo commented Sep 2, 2015

And about the tests: it looks to me like most of the failures, like this one, are because samefile doesn't like being called with a nonexistent path on a case-sensitive system. Am I misreading that?

@mried
Copy link
Collaborator Author

mried commented Sep 3, 2015

I also read the error like you, but that's odd: samefile is only called if both files (paths) exist, see 7 lines above. How can this happen?

@sampsyo
Copy link
Member

sampsyo commented Sep 4, 2015

Yeah, that's pretty strange. Do you get this error if you run the tests locally on your machine?

@mried
Copy link
Collaborator Author

mried commented Sep 4, 2015

Aha, I guess I found something 😄: Yes, the tests fail when executed locally, too. Stepping through the code (and especially into os.path.exists) lead to a module named mock which returns True for every path. So the queried path does not exist on disk, but os.path.exists tells beets it does.
I've added a Patch for os.path.samefile as well so the tests should be fine now.

@sampsyo
Copy link
Member

sampsyo commented Sep 5, 2015

Ah! Good catch; it was an over-broad mock in the test harness. Thanks!

I'll explore the behavior on Windows and check back in.

@mried
Copy link
Collaborator Author

mried commented Sep 7, 2015

I've added some more tests to test the case sensitivity detection for more cases. I also added some comments so we might be able to understand the testing code the next time we read it 😉

@mried
Copy link
Collaborator Author

mried commented Sep 8, 2015

I had a look on the Python 3 implementation of os.path.samefile: It compares the result of os.stat for both files (or, to be more precise: the fields st_ino and st_dev). That's also possible using Python 2.7. But: the values for these two fields are always 0 (zero) here, so a comparison is not useful. Since I guess the results of os.stat come from the underlying, system specific Python code, we can't steal it here...

@sampsyo
Copy link
Member

sampsyo commented Sep 8, 2015

😢

For reference, here's where st_ino support was added on Windows: https://hg.python.org/cpython/rev/9cd1036455e7/
And st_dev was added as part of this bug: http://bugs.python.org/issue11939

The Watchdog project has reimplemented stat on Windows to get around this problem. There's discussion about it in gorakhargosh/watchdog#195. And here's a simpler implementation just for checking for the same file on Windows that also uses GetFileInformationByHandle.

@sampsyo
Copy link
Member

sampsyo commented Sep 8, 2015

It looks like we can get almost there on Python 2 using fstat instead of stat (which works on open file descriptors). Here's a test on Windows 10:

>>> os.fstat(os.open('a.txt', os.O_RDONLY)).st_ino
4222124650707555L

st_dev is still not defined, but this seems good enough for our purposes---if it's possible to use fstat with directories.

Alternatively, we could use GetLongPathName via ctypes to get the canonical path for a file and check whether the case matches.
http://stackoverflow.com/questions/1587816/is-it-possible-to-access-the-getlongpathname-win32-api-in-python

What a mess! I would love to avoid creating a temporary file and then deleting it, which seems like asking for trouble in the case that permissions aren't exactly right, etc.

@mried
Copy link
Collaborator Author

mried commented Sep 9, 2015

I'm afraid these solutions will not work (as far as I can tell): os.fstat returns the stats of a file descriptor, but it is not possible to get the file descriptor of a directory. We could try to find a file inside the directory, but this will lead to other problems, I guess.
Using GetLongPathName also won't help: I've tested this with various file names:

C:\Python34 => C:\Python34
c:\python34 => c:\python34
C:\PYTHON34 => C:\PYTHON34

So we don't get the real case.

I'll see if we can use the GetFileInformationByHandle you mentioned in you previous post tomorrow.

@sampsyo
Copy link
Member

sampsyo commented Sep 9, 2015

Thanks for looking into it. As always: what a mess!

This post suggest a two-step process, long -> short -> long, can recover the correct case: http://stackoverflow.com/questions/2113822/python-getting-filename-case-as-stored-in-windows

…itivity on windows instead of creating a file.
@mried
Copy link
Collaborator Author

mried commented Sep 11, 2015

I have to admit that I misinterpreted the results of GetLongPathName: It works like we expected. We get the real case of the file and path except the drive letter. I've changed my implementation so it uses the new way.

@mried
Copy link
Collaborator Author

mried commented Sep 16, 2015

What about this one? Do you think it can be merged?

@sampsyo sampsyo merged commit b5f1f99 into beetbox:master Oct 7, 2015
sampsyo added a commit that referenced this pull request Oct 7, 2015
A robust way to check for a case sensitive file system
sampsyo added a commit that referenced this pull request Oct 7, 2015
sampsyo added a commit that referenced this pull request Oct 7, 2015
This could match intuition a little more closely and also avoids a coupling
with the configuration.
sampsyo added a commit that referenced this pull request Oct 7, 2015
sampsyo added a commit that referenced this pull request Oct 7, 2015
sampsyo added a commit that referenced this pull request Oct 7, 2015
This should avoid problems when querying for Unicode paths.
@sampsyo
Copy link
Member

sampsyo commented Oct 7, 2015

Sorry for being so slow on this! I've finally merged, it with a few tweaks to the code and functionality. Most prominently, I changed the check to work on the path used in the query, rather than the library path, which I think should be slightly more predictable.

@djl
Copy link
Member

djl commented Oct 8, 2015

Oops. I didn't take the laziness into account when planning that feature.

The latest change seems to have broken something, though:

Traceback (most recent call last):
  File "/Users/djl/Projects/beets/beet", line 20, in <module>
    beets.ui.main()
  File "/Users/djl/Projects/beets/beets/ui/__init__.py", line 1163, in main
    _raw_main(args)
  File "/Users/djl/Projects/beets/beets/ui/__init__.py", line 1149, in _raw_main
    subcommands, plugins, lib = _setup(options, lib)
  File "/Users/djl/Projects/beets/beets/ui/__init__.py", line 1024, in _setup
    mb.configure()
  File "/Users/djl/Projects/beets/beets/autotag/mb.py", line 72, in configure
    musicbrainzngs.set_hostname(config['musicbrainz']['host'].get(unicode))
  File "/Users/djl/Projects/beets/beets/util/confit.py", line 374, in get
    return as_template(template).value(self, template)
  File "/Users/djl/Projects/beets/beets/util/confit.py", line 969, in value
    if view.exists():
  File "/Users/djl/Projects/beets/beets/util/confit.py", line 190, in exists
    self.first()
  File "/Users/djl/Projects/beets/beets/util/confit.py", line 182, in first
    return iter_first(pairs)
  File "/Users/djl/Projects/beets/beets/util/confit.py", line 66, in iter_first
    return it.next()
  File "/Users/djl/Projects/beets/beets/util/confit.py", line 483, in resolve
    for collection, source in self.parent.resolve():
  File "/Users/djl/Projects/beets/beets/util/confit.py", line 483, in resolve
    for collection, source in self.parent.resolve():
  File "/Users/djl/Projects/beets/beets/util/confit.py", line 905, in resolve
    self.read()
  File "/Users/djl/Projects/beets/beets/__init__.py", line 37, in read
    filename = view.as_filename()
AttributeError: 'unicode' object has no attribute 'as_filename'

(Hmm. I replied by email to a different issue and my response ended up here. GitHub bug?)

@sampsyo
Copy link
Member

sampsyo commented Oct 8, 2015

Oops! Thanks. The above commit should do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants