Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sort case-insensitive to make RcppExports more deterministic #878

Merged
merged 1 commit into from
Jul 12, 2018

Conversation

jackwasey
Copy link
Contributor

compileAttributes uses list.files which gives a locale-dependent ordering of the matched files. This differs between C and en-us.UTF-8, with C ordering putting upper-case file names first. This makes the code output to RcppExports.* different depending on the locale.

It was not obvious to me how to make this locale independent using base R, but at least ignoring case when calling list.files eliminates some of the discrepancies produced with these two widely used collations. People calling compileAttributes() from locales which are not ordered A-Z will see differences in the output order (and thus order of functions in the generated code).

@kevinushey
Copy link
Contributor

A number of packages that have required locale-independent sorting have implemented a sort_c function, e.g. see what roxygen2 does here:

https://github.com/klutometis/roxygen/blob/bbf259ddee033a873426ded1e83cdd344b986595/R/util-locale.R#L1-L14

We could potentially do something similar in Rcpp -- explicitly set LC_COLLATE, sort the list of files, and then restore the old LC_COLLATE.

@kevinushey
Copy link
Contributor

But if setting ignore.case is enough to handle the most common cases then I think that's sufficient.

@codecov-io
Copy link

Codecov Report

Merging #878 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #878   +/-   ##
=======================================
  Coverage   90.09%   90.09%           
=======================================
  Files          71       71           
  Lines        3271     3271           
=======================================
  Hits         2947     2947           
  Misses        324      324
Impacted Files Coverage Δ
R/Attributes.R 90.66% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e5591c5...76819ad. Read the comment docs.

@jackwasey
Copy link
Contributor Author

I'll leave it to your judgement: I agree this is a quick win which, at least for me, covers my fairly vanilla Ubuntu and Mac setups, which were producing different RcppExports code. I'm sure those case-insensitive windows users would also have bad interactions when also developing also on case-sensitive operating systems.

@eddelbuettel
Copy link
Member

This is an easy win. We could always look into setting locale to C, sorting, then resetting as @kevinushey suggested. But until then this works for me.

@eddelbuettel eddelbuettel merged commit 1986ded into RcppCore:master Jul 12, 2018
jackwasey pushed a commit to jackwasey/icd that referenced this pull request Jul 31, 2018
@hadley
Copy link
Contributor

hadley commented Sep 25, 2018

Could we please change the locale here? That will also ensure consistency from system to system.

@eddelbuettel
Copy link
Member

Quoting @jjallaire in the same thread :

A recent change in Rcpp (v0.12.18 from a couple of months ago) changed the sorting behavior to be more stable across locales/operating systems: #878

Net is that this should be a one-time issue and that the order will be better preserved across environments in the future.

@hadley
Copy link
Contributor

hadley commented Sep 26, 2018

That link is to this issue?

The problem with only ignoring the case is that you will still get a different default ordering in locales that don't order letters the same way as English. This means that collaborators with different locales will still generate different RcppExports.

@jjallaire
Copy link
Member

@jackwasey or @kevinushey Do you see a way to change this so that we can completely stable ordering across all systems and locales?

@kevinushey
Copy link
Contributor

kevinushey commented Sep 26, 2018

We could set the locale to C temporarily and then sort the discovered files, e.g.

cppFiles <- list.files(srcDir, pattern = "\\.((c(c|pp)?)|(h(pp)?))$", ignore.case = TRUE)

locale <- Sys.getlocale(category = "LC_COLLATE")
Sys.setlocale(category = "LC_COLLATE", locale = "C")
cppFiles <- sort(cppFiles)
Sys.setlocale(category = "LC_COLLATE", locale = locale)

@jackwasey
Copy link
Contributor Author

@kevinushey this looks good to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants