Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort case-insensitive to make RcppExports more deterministic #878

Merged
merged 1 commit into from Jul 12, 2018

Conversation

Projects
None yet
6 participants
@jackwasey
Copy link
Contributor

jackwasey commented Jul 12, 2018

compileAttributes uses list.files which gives a locale-dependent ordering of the matched files. This differs between C and en-us.UTF-8, with C ordering putting upper-case file names first. This makes the code output to RcppExports.* different depending on the locale.

It was not obvious to me how to make this locale independent using base R, but at least ignoring case when calling list.files eliminates some of the discrepancies produced with these two widely used collations. People calling compileAttributes() from locales which are not ordered A-Z will see differences in the output order (and thus order of functions in the generated code).

@kevinushey

This comment has been minimized.

Copy link
Contributor

kevinushey commented Jul 12, 2018

A number of packages that have required locale-independent sorting have implemented a sort_c function, e.g. see what roxygen2 does here:

https://github.com/klutometis/roxygen/blob/bbf259ddee033a873426ded1e83cdd344b986595/R/util-locale.R#L1-L14

We could potentially do something similar in Rcpp -- explicitly set LC_COLLATE, sort the list of files, and then restore the old LC_COLLATE.

@kevinushey

This comment has been minimized.

Copy link
Contributor

kevinushey commented Jul 12, 2018

But if setting ignore.case is enough to handle the most common cases then I think that's sufficient.

@codecov-io

This comment has been minimized.

Copy link

codecov-io commented Jul 12, 2018

Codecov Report

Merging #878 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #878   +/-   ##
=======================================
  Coverage   90.09%   90.09%           
=======================================
  Files          71       71           
  Lines        3271     3271           
=======================================
  Hits         2947     2947           
  Misses        324      324
Impacted Files Coverage Δ
R/Attributes.R 90.66% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e5591c5...76819ad. Read the comment docs.

@jackwasey

This comment has been minimized.

Copy link
Contributor Author

jackwasey commented Jul 12, 2018

I'll leave it to your judgement: I agree this is a quick win which, at least for me, covers my fairly vanilla Ubuntu and Mac setups, which were producing different RcppExports code. I'm sure those case-insensitive windows users would also have bad interactions when also developing also on case-sensitive operating systems.

@eddelbuettel

This comment has been minimized.

Copy link
Member

eddelbuettel commented Jul 12, 2018

This is an easy win. We could always look into setting locale to C, sorting, then resetting as @kevinushey suggested. But until then this works for me.

@eddelbuettel eddelbuettel merged commit 1986ded into RcppCore:master Jul 12, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

jackwasey pushed a commit to jackwasey/icd that referenced this pull request Jul 31, 2018

@hadley

This comment has been minimized.

Copy link
Contributor

hadley commented Sep 25, 2018

Could we please change the locale here? That will also ensure consistency from system to system.

@eddelbuettel

This comment has been minimized.

Copy link
Member

eddelbuettel commented Sep 25, 2018

Quoting @jjallaire in the same thread :

A recent change in Rcpp (v0.12.18 from a couple of months ago) changed the sorting behavior to be more stable across locales/operating systems: #878

Net is that this should be a one-time issue and that the order will be better preserved across environments in the future.

@hadley

This comment has been minimized.

Copy link
Contributor

hadley commented Sep 26, 2018

That link is to this issue?

The problem with only ignoring the case is that you will still get a different default ordering in locales that don't order letters the same way as English. This means that collaborators with different locales will still generate different RcppExports.

@jjallaire

This comment has been minimized.

Copy link
Member

jjallaire commented Sep 26, 2018

@jackwasey or @kevinushey Do you see a way to change this so that we can completely stable ordering across all systems and locales?

@kevinushey

This comment has been minimized.

Copy link
Contributor

kevinushey commented Sep 26, 2018

We could set the locale to C temporarily and then sort the discovered files, e.g.

cppFiles <- list.files(srcDir, pattern = "\\.((c(c|pp)?)|(h(pp)?))$", ignore.case = TRUE)

locale <- Sys.getlocale(category = "LC_COLLATE")
Sys.setlocale(category = "LC_COLLATE", locale = "C")
cppFiles <- sort(cppFiles)
Sys.setlocale(category = "LC_COLLATE", locale = locale)
@jackwasey

This comment has been minimized.

Copy link
Contributor Author

jackwasey commented Sep 26, 2018

@kevinushey this looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.