Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-6622: [R] Normalize paths for filesystem API on Windows #5445

Closed
wants to merge 4 commits into from

Conversation

nealrichardson
Copy link
Contributor

I fixed this in R, though I wonder if that's not enough. At a minimum, the C++ docs should note this requirement (@pitrou maybe you know the best place in the docs to add this?), and I still think it would be nice if all of this path normalization were handled in C++ (cf. https://issues.apache.org/jira/browse/ARROW-6324).

@wesm
Copy link
Member

wesm commented Sep 19, 2019

I think I'm with @pitrou here that introducing automatic path normalization into the C++ filesystem API might be a bit too much scope creep -- we could provide a built-in path normalization function for consumers that don't have it (e.g. Python has pathlib, R has its own function here)

@nealrichardson
Copy link
Contributor Author

I can accept that, though we should document the C++ API's requirements accordingly. I created https://issues.apache.org/jira/browse/ARROW-6629 because I didn't see any filesystem docs at all, so nowhere to just add a note in this patch.

@nealrichardson
Copy link
Contributor Author

This is failing, though differently than before: https://ci.appveyor.com/project/nealrichardson/arrow/builds/27536178/job/gu10af0vck7gum81#L4342

The path normalization is making absolute paths with respect to the current working directory in the R process, and that can clash with whatever the FileSystem object has as its path. I'll need to rethink the approach.

@codecov-io
Copy link

codecov-io commented Sep 21, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@2c7fb24). Click here to learn what that means.
The diff coverage is 80%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #5445   +/-   ##
=========================================
  Coverage          ?   77.18%           
=========================================
  Files             ?       58           
  Lines             ?     4011           
  Branches          ?        0           
=========================================
  Hits              ?     3096           
  Misses            ?      915           
  Partials          ?        0
Impacted Files Coverage Δ
r/R/io.R 77.14% <50%> (ø)
r/R/filesystem.R 70.45% <82.6%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2c7fb24...515d710. Read the comment docs.

@nealrichardson
Copy link
Contributor Author

This is passing and ready for review @romainfrancois

Copy link
Contributor

@romainfrancois romainfrancois left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use fs:: here ?

r/R/filesystem.R Outdated
shared_ptr(SubTreeFileSystem, xp)
}

clean_path <- function(path) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like things potentially better fixed with fs:: cc @jimhester. I know fs has been downgraded to Suggests here, but I'm just worried we'd be missing cases with the implementation below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see the fs function that does the winslash normalization we need here, but more than happy to use it if it exists.


fs_relative_path <- function(path) {
# Make sure all path separators are "/", not "\" as on Windows
path_sep <- ifelse(tolower(Sys.info()[["sysname"]]) == "windows", "\\\\", "/")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably use .Platform$file.sep

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not sure about the name of this function, it is not really doing anything to compute a relative path, you could probably just use normalizePath(mustWork = FALSE), assuming path doesn't actually have to exist on the filesystem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re: .Platform$file.sep, I intended to but then on my Windows VM, .Platform$file.sep reported "/" while tempfile() still created paths with "\" 🤷‍♂

I originally did normalizePath but unfortunately for this use case, normalizePath does path expansion, even with mustWork=FALSE. We need to keep the path relative (to whatever the FileSystem object holds as its cwd) and just need Windows \ to be converted to /. Regular expressions were my last resort and I'm happy to use something better or more robust if it exists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks a bit weird to do a no-op replacement on Unix, though I suppose it doesn't work.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look correct to me. The simple solution should be to start with /-separated slashes and only use that.

r/R/filesystem.R Outdated
shared_ptr(Selector, fs___Selector__create(base_dir, allow_non_existent, recursive))
shared_ptr(
Selector,
fs___Selector__create(fs_relative_path(base_dir), allow_non_existent, recursive)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure? Will it work correctly if base_dir is e.g. a path in an S3 filesystem, or will it produce bogus results?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question for all such cases below.

@@ -83,7 +83,6 @@ test_that("SubTreeFilesystem", {
expect_is(st_fs, "FileSystem")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you should simply /-normalize td above. Should be easy :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At worse you simply string-replace \ with / (but only when on Windows :-)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, then you shouldn't call it fs_relative_path...

@pitrou
Copy link
Member

pitrou commented Sep 23, 2019

Also note that SubTreeFileSystem is not really needed by end-users. It's mostly useful for internal C++ testing, because there is a sizable generic filesystem test suite that expects to write at the root of an empty filesystem.

@nealrichardson
Copy link
Contributor Author

I renamed the functions to clean_path_abs() and clean_path_rel(). Hopefully that makes their intent and output more clear.

I wouldn't be surprised if we had to revisit this in the future, but for now this seems like a step forward, so I'll merge unless there are objections.

@nealrichardson nealrichardson deleted the win-fs-fix branch September 25, 2019 14:05
alippai pushed a commit to alippai/arrow that referenced this pull request Sep 26, 2019
I fixed this in R, though I wonder if that's not enough. At a minimum, the C++ docs should note this requirement (@pitrou maybe you know the best place in the docs to add this?), and I still think it would be nice if all of this path normalization were handled in C++ (cf. https://issues.apache.org/jira/browse/ARROW-6324).

Closes apache#5445 from nealrichardson/win-fs-fix and squashes the following commits:

515d710 <Neal Richardson> Rename functions
1775a6d <Neal Richardson> Fix two other absolute paths I missed
0dce0d7 <Neal Richardson> Munge paths directly on windows
f03815e <Neal Richardson> Normalize paths for filesystem API on Windows

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants