Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Long filename support #114

Closed
philjdf opened this issue Mar 6, 2020 · 11 comments
Closed

Long filename support #114

philjdf opened this issue Mar 6, 2020 · 11 comments
Labels
enhancement New feature or request upstream The issue is in one of the dependencies

Comments

@philjdf
Copy link
Contributor

philjdf commented Mar 6, 2020

ParquetFileReader when used with a long string path fails:

ParquetSharp.ParquetException : class parquet::ParquetStatusException (message: 'IOError: Failed to open local file '<very long filename>'. Detail: [Windows error 3] The system cannot find the path specified.
')
   at ParquetSharp.ExceptionInfo.Check(IntPtr exceptionInfo)
   at ParquetSharp.ParquetFileReader..ctor(String path, ReaderProperties readerProperties)
   ...

Looks like ParquetSharp eventually calls https://github.com/apache/arrow/blob/88e3267ad09ca62643b7fe5ffd98eb03b29728fc/cpp/src/arrow/util/io_util.cc#L838 which is calling _wsopen_s on Windows, so https://docs.microsoft.com/en-us/cpp/c-runtime-library/path-field-limits?view=vs-2019 appears relevant, so it is just a case of needing to prepend the filename with \\?\?

@GPSnoopy
Copy link
Contributor

GPSnoopy commented Mar 6, 2020

@philjdf I believe we've seen this one before. ParquetSharp is irrelevant. The application itself needs to support long paths.

IIRC .NET Core 3 will work out of the box on an up-to-date Windows 10. .NET Framework 4.6 or so needs playing around with the application manifest.

Google and small trial applications are you friends.

@GPSnoopy GPSnoopy closed this as completed Mar 6, 2020
@GPSnoopy
Copy link
Contributor

GPSnoopy commented Mar 9, 2020

I may have spoken too soon. It indeed does not seem to work under .NET Core 3.1 (at least not straight out of the box).

@GPSnoopy GPSnoopy reopened this Mar 9, 2020
@GPSnoopy
Copy link
Contributor

GPSnoopy commented Mar 9, 2020

It looks like the proper Win32 API needs to be used to expose the long path support:
https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew

In particular:

Tip Starting with Windows 10, version 1607, for the unicode version of this function (CreateFileW), you can opt-in to remove the MAX_PATH limitation without prepending "\?". See the "Maximum Path Length Limitation" section of Naming Files, Paths, and Namespaces for details.

But it does not seem that the FD API (such as _wsopen_s) have been updated to cover this. Shame.

@GPSnoopy GPSnoopy added enhancement New feature or request upstream The issue is in one of the dependencies labels Mar 9, 2020
@GPSnoopy
Copy link
Contributor

GPSnoopy commented Mar 9, 2020

See more details at:
https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file?redirectedfrom=MSDN#enable-long-paths-in-windows-10-version-1607-and-later

Worth a try in Arrow: use Win32 API to open the file, but use _open_osfhandle to return a file descriptor.

@tpboudreau
Copy link

Sorry for taking so long to get back to this.

@GPSnoopy , your Win32/_open_osfhandle suggestions works. I have a console application writing/reading a parquet file with path length of 1115 (>> than prior 260 limit).

But this is going to be a bit challenging to write (non-flaky) tests for, because (per your link) executing that test required both that the assembly manifest file embedded in the final .exe file be augmented to include a longPathAware element and that the registry on the executing machine had a corresponding property enabled. I've been using the mt.exe tool for the former (maybe there's a better way in VS? -- I'm a VS novice), but the latter seems beyond the library's test's control.

One thought is to make the feature opt-in at compile time; the default could be either to use the current i/o logic or else just not run any long filename tests. Then at least the opting-in user has chosen to be responsible for preparing the target machines' registries to allow tests to succeed.

Any preferences or suggestions before I open the issue and see what the upstream maintainers think?

@GPSnoopy
Copy link
Contributor

GPSnoopy commented Apr 13, 2020

Hi @tpboudreau,

IMHO I don't think that having two code paths is the correct solution. This is going to lead to different users using different implementations, with all the headaches this creates. Thus I favour your second suggestion of only having the new long-path aware path. Afterall, it should work fine even if the application or OS does not support long paths, and will benefit from the improved code coverage by virtue of everyone using the same implementation. So why keep the current I/O logic?

You make a good point about the unit tests. If we only have one code path, we can assume that it is fairly well unit tested by everyone as part of Arrow, even if the long path specific tests can only be run by a subset of people.

By default, long path is enabled in the registry on any modern installation of Windows 10. So I don't think we should worry too much about it. The manifest part should be under our control when running unit tests, so that's hopefully not an issue either. Apparently it's very easy to add a manifest with CMake nowadays (https://stackoverflow.com/questions/6335352/how-can-i-embed-a-specific-manifest-file-in-a-windows-dll-with-a-cmake-build).

My two questions would be:

  • Does Apache Arrow current CI pipeline run on a modern enough Windows?
  • How much do we worry about 3rd party running the unit tests under Windows and getting a long path unit test failure? Afterall, if the error message is informative enough it might be valuable insight for the user.

If the answer to the first point is "yes" (which seems to be the case according to https://github.com/apache/arrow/blob/master/.github/workflows/cpp.yml), then I would think we can enable those long path unit tests by default. For the second point, if we care about it, I could add the option to turn off those specific tests (e.g. CMAKE option?) and refer to it in the unit test error message.

What do you think?

PS. When using VS directly instead of CMake, this seems to be the way to manually add a manifest (I've not tested):
https://stackoverflow.com/questions/53918205/how-to-enable-long-path-aware-behavior-via-manifest-in-a-c-executable

@tpboudreau
Copy link

tpboudreau commented Apr 15, 2020

You're right @GPSnoopy -- there's no point in leaving the old logic in for any users. Users with short paths should be indifferent to the underlying function(s).

Thanks for the heads up on the manifest injection via cmake -- that turned out to be just minutes of work.

I still think we're going to need some kind of opt-in for running the long-path-under-Windows test. For example, the required long path registry setting wasn't enabled by default on my (admittedly crummy, but fairly new) Windows 10 laptop, and I think I've already seen at least one of the CI build farm machines fail the new test in a manner that suggested the target VM/machine registry flag wasn't set (that's not definitive though). But I like your idea of emitting a message that the test was skipped. I wonder if it would be overkill to dynamically detect the registry setting (https://docs.microsoft.com/en-us/archive/msdn-magazine/2017/may/c-use-modern-c-to-access-the-windows-registry) and accordingly either run the test or skip it (via GTEST_SKIP() or similar) and emit a message.

Either way, it's a bit messy because the manifest (AFAICT) must be injected into the final executable to be effective. (I'll double check this -- 'cause I'm far from a Windows expert -- but in my initial testing, inserting an enabling manifest into the arrow.dll or parquet.dll was permitted, but it didn't allow the final executable(s) to open long filenames.) I'm doing that injection for the unit test .exe, but it seems the end user has to do it for each application. [EDIT: Upon reflection, this last point is mostly just a documentation issue and may not even be that depending on how accustomed Windows devs are to doing this type of thing.]

@tpboudreau
Copy link

I opened this issue upstream: https://issues.apache.org/jira/browse/ARROW-8477 .

Patch is done except that the test is failing in the MinGW build environment. Suspect that manifest is not getting injected properly there. I'll raise the PR after I sort that out.

[WIP here if interested: https://github.com/tpboudreau/arrow/tree/ARROW-8477 ]

@tpboudreau
Copy link

PR ARROW-8477 just merged.

@GPSnoopy
Copy link
Contributor

@tpboudreau You absolute hero!

@GPSnoopy
Copy link
Contributor

GPSnoopy commented Sep 1, 2020

Closed as part of ParquetSharp 2.3.0-beta1 release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request upstream The issue is in one of the dependencies
Projects
None yet
Development

No branches or pull requests

3 participants