-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Long filename support #114
Comments
@philjdf I believe we've seen this one before. ParquetSharp is irrelevant. The application itself needs to support long paths. IIRC .NET Core 3 will work out of the box on an up-to-date Windows 10. .NET Framework 4.6 or so needs playing around with the application manifest. Google and small trial applications are you friends. |
I may have spoken too soon. It indeed does not seem to work under .NET Core 3.1 (at least not straight out of the box). |
It looks like the proper Win32 API needs to be used to expose the long path support: In particular:
But it does not seem that the FD API (such as |
See more details at: Worth a try in Arrow: use Win32 API to open the file, but use |
Sorry for taking so long to get back to this. @GPSnoopy , your Win32/_open_osfhandle suggestions works. I have a console application writing/reading a parquet file with path length of 1115 (>> than prior 260 limit). But this is going to be a bit challenging to write (non-flaky) tests for, because (per your link) executing that test required both that the assembly manifest file embedded in the final .exe file be augmented to include a longPathAware element and that the registry on the executing machine had a corresponding property enabled. I've been using the mt.exe tool for the former (maybe there's a better way in VS? -- I'm a VS novice), but the latter seems beyond the library's test's control. One thought is to make the feature opt-in at compile time; the default could be either to use the current i/o logic or else just not run any long filename tests. Then at least the opting-in user has chosen to be responsible for preparing the target machines' registries to allow tests to succeed. Any preferences or suggestions before I open the issue and see what the upstream maintainers think? |
Hi @tpboudreau, IMHO I don't think that having two code paths is the correct solution. This is going to lead to different users using different implementations, with all the headaches this creates. Thus I favour your second suggestion of only having the new long-path aware path. Afterall, it should work fine even if the application or OS does not support long paths, and will benefit from the improved code coverage by virtue of everyone using the same implementation. So why keep the current I/O logic? You make a good point about the unit tests. If we only have one code path, we can assume that it is fairly well unit tested by everyone as part of Arrow, even if the long path specific tests can only be run by a subset of people. By default, long path is enabled in the registry on any modern installation of Windows 10. So I don't think we should worry too much about it. The manifest part should be under our control when running unit tests, so that's hopefully not an issue either. Apparently it's very easy to add a manifest with CMake nowadays (https://stackoverflow.com/questions/6335352/how-can-i-embed-a-specific-manifest-file-in-a-windows-dll-with-a-cmake-build). My two questions would be:
If the answer to the first point is "yes" (which seems to be the case according to https://github.com/apache/arrow/blob/master/.github/workflows/cpp.yml), then I would think we can enable those long path unit tests by default. For the second point, if we care about it, I could add the option to turn off those specific tests (e.g. CMAKE option?) and refer to it in the unit test error message. What do you think? PS. When using VS directly instead of CMake, this seems to be the way to manually add a manifest (I've not tested): |
You're right @GPSnoopy -- there's no point in leaving the old logic in for any users. Users with short paths should be indifferent to the underlying function(s). Thanks for the heads up on the manifest injection via cmake -- that turned out to be just minutes of work. I still think we're going to need some kind of opt-in for running the long-path-under-Windows test. For example, the required long path registry setting wasn't enabled by default on my (admittedly crummy, but fairly new) Windows 10 laptop, and I think I've already seen at least one of the CI build farm machines fail the new test in a manner that suggested the target VM/machine registry flag wasn't set (that's not definitive though). But I like your idea of emitting a message that the test was skipped. I wonder if it would be overkill to dynamically detect the registry setting (https://docs.microsoft.com/en-us/archive/msdn-magazine/2017/may/c-use-modern-c-to-access-the-windows-registry) and accordingly either run the test or skip it (via GTEST_SKIP() or similar) and emit a message. Either way, it's a bit messy because the manifest (AFAICT) must be injected into the final executable to be effective. (I'll double check this -- 'cause I'm far from a Windows expert -- but in my initial testing, inserting an enabling manifest into the arrow.dll or parquet.dll was permitted, but it didn't allow the final executable(s) to open long filenames.) I'm doing that injection for the unit test .exe, but it seems the end user has to do it for each application. [EDIT: Upon reflection, this last point is mostly just a documentation issue and may not even be that depending on how accustomed Windows devs are to doing this type of thing.] |
I opened this issue upstream: https://issues.apache.org/jira/browse/ARROW-8477 . Patch is done except that the test is failing in the MinGW build environment. Suspect that manifest is not getting injected properly there. I'll raise the PR after I sort that out. [WIP here if interested: https://github.com/tpboudreau/arrow/tree/ARROW-8477 ] |
PR ARROW-8477 just merged. |
@tpboudreau You absolute hero! |
Closed as part of ParquetSharp 2.3.0-beta1 release. |
ParquetFileReader when used with a long string path fails:
Looks like ParquetSharp eventually calls https://github.com/apache/arrow/blob/88e3267ad09ca62643b7fe5ffd98eb03b29728fc/cpp/src/arrow/util/io_util.cc#L838 which is calling
_wsopen_s
on Windows, so https://docs.microsoft.com/en-us/cpp/c-runtime-library/path-field-limits?view=vs-2019 appears relevant, so it is just a case of needing to prepend the filename with\\?\
?The text was updated successfully, but these errors were encountered: