check.names is not working #102

yxomo · 2015-07-20T12:32:23Z

When loading a table from an Excel worksheet using read.xlsx, spaces in column names are still converted to dots (.), even when check.names = FALSE is specified. The reason is most likely due to the fact that column names are modified using the function clean_names at loading. For example, I saw the following line in readWorkbook.R:

m <- .Call("openxlsx_readWorkbook", v, r, string_refs, isDate, nRows, colNames, skipEmptyRows, origin, clean_names, PACKAGE = "openxlsx")

awalker89 · 2015-07-21T08:20:37Z

Yeah. This has been the behaviour of read.xlsx for a few versions now and changing this could upset a lot of people. For now this will stay the way it is.
I'll leave this issue open to see if other people see this as an issue.

yxomo · 2015-07-24T07:18:42Z

Could you at least add a flag to openxlsx that disables the transformation
of spaces? I am generating an Excel output file based on one or more Excel
input files, and keeping column names consistent (without giving sets of
allowed/disallowed characters) would be extremely helpful.

Yeah. This has been the behaviour of read.xlsx for a few versions now and

changing this could upset a lot of people. For now this will stay the way
it is.
I'll leave this issue open to see if other people see this as an issue.

—
Reply to this email directly or view it on GitHub
#102 (comment).

r2evans · 2015-08-16T06:58:59Z

My vote: allow the check.names to do what it implies, perhaps allowing spaces in names. This option and behavior is present in base functions (e.g., read.csv) so should be both consistent and not completely alien to users.

I was surprised when reading this issue to learn that check.names does not default to TRUE. My suggestion is to change the default to TRUE but provide a transition period for the behavior to fully transition.

In package .onLoad (or .onAttach?), mention the default change. This message should stay throughout transition period.

2a. In readWorkbook, if check.names not specified and "bad column name" found, convert as before but provide a warning. (If check.names is being specified, then no warning, just do as it asks.)

2b. On a subsequent transitional version, if check.names is again not specified, then provide a warning but do not convert the column name.

If you change the default behavior from FALSE to TRUE, then you shouldn't be impacting current users: if they don't specify check.names and are not trying to keep spacey-names, then they won't notice a difference. If they do specify it and haven't raised an issue yet, then it perhaps doesn't impact them (at all or enough to be the squeaky wheel).

richierocks · 2015-12-31T06:30:57Z

I just fell over this too. It isn't intuitive that column names are converted when check.names = FALSE. @r2evans' plan sounds reasonable.

warnes · 2016-02-17T21:10:48Z

+1 to have check.names=TRUE be the default, and to not convert spaces in names to dots when check.names==FALSE.

DarwinAwardWinner · 2016-02-22T19:53:38Z

+1 for this. At least document that the option doesn't do what it says, and provide some way to get at the un-modified names.

PeterVermont · 2017-03-13T15:07:23Z

I also ran into this and would appreciate it ifspaces could be preserved in column names.

rorynolan · 2017-08-21T09:55:06Z

I would also like spaces to be preserved. Thanks for the package!

YiyingW · 2017-10-26T23:06:13Z

I would also appreciate it if spaces could be preserved in column names. Currently, I use colnames(df) <- gsub("\\.", " ", colnames(df)) to avoid the problem.

cchng · 2018-02-05T17:42:59Z

+1

r2evans · 2018-02-05T18:42:36Z

@cchng, might I suggest you use the thumbs-up reaction or such on the initial post or most influential comment? Github issues can stretch rather long with individual "+1" comments. In large projects, the maintainers gauge which issues to prioritize based on (among other things) popular support, and it's much easier to look at the "reaction counter" on a single post than manually count all of the "individual positive commenters". Not a rant, just a suggestion. Thanks for piping in!

WilDoane · 2018-07-14T17:58:51Z

The problem is worse than has been described thus far: clean_names() as defined removes leading and trailing runs of spaces as well as converting internal runs of spaces to a single period.

So, when reading a file with a column header of clean_names(" a bc ") -> "a.bc"

In other words, this behavior changes the user's data, irreversibly (or, at least, without clumsy user-bolted-on solutions). And you do so despite the explicit presence of options such as check.names being FALSE.

Opening a workbook and saving it out again should be (IMHO) an identity operation: same data out as in, barring other user caused changes.

r2evans · 2018-07-14T18:27:02Z

@pooranis, I disagree with your suggestion for clean.names, though predominantly because it is inconsistent with the check.names= argument in base functions (which is at the root of my suggested transition).

@awalker89, it's been (almost) three years since this issue was initially opened, and there has been some discussion about the behavior. Any more thoughts?

WilDoane · 2018-07-14T19:19:55Z

Barring a resolution,

read.xlsx(file, sheet, check.names = FALSE) %>%
  magrittr::set_colnames(read.xlsx(file, sheet,
                                   rows = 1,
                                   check.names = FALSE,
                                   colNames = FALSE) %>%
                         as.character()
  )

preserves column datatypes. The messiness of that alone should inspire writing improved code. :) Anyone have a betterish solution?

My only other solution, thus far, converts all columns to character (due to using colnames = FALSE to be able to grab the proper, unaltered names from the first row).

conge · 2018-07-18T19:32:11Z

@WilDoane
your solution works. It runs slower when reading from an xlsx with many sheets in it. But it's much better than nothing. I thank you for that.

taking the solutions from you and @andre Elrico from stacks overflow, now I have a solution to read a multi-sheets xlsx into a list without name checking...

Many thanks to you!

read.xlsx_WilDoane <- function(xlsxFile, sheet) {
  # taken from WilDoane @https://github.com/awalker89/openxlsx/issues/102
  openxlsx::read.xlsx(xlsxFile=xlsxFile, sheet=sheet, check.names = FALSE) %>%
    magrittr::set_colnames(openxlsx::read.xlsx(xlsxFile, sheet,
                                               rows = 1,
                                               check.names = FALSE,
                                               colNames = FALSE) %>%
                             as.character()
                           )
}


read_all_sheets <- function (filename) {
  # solution from https://stackoverflow.com/questions/12945687/read-all-worksheets-in-an-excel-workbook-into-an-r-list-with-data-frames
  # by Andre Elrico
  sheets <- openxlsx::getSheetNames(filename)
  #SheetList <- lapply(sheets,openxlsx::read.xlsx,xlsxFile=filename)
  SheetList <- lapply(sheets,read.xlsx_WilDoane,xlsxFile=filename)
  names(SheetList) <- sheets  
  SheetList
}

sorhawell · 2019-07-12T14:40:59Z

I believe it is time to fix this...
I lost a few hours, as I expected check.names=FALSE meant exactly that and not =TRUE.

always thx for your open source contributions.

Dheinny · 2019-07-28T19:26:54Z

I completely agree with @sorhawell and @r2evans.
It's been 4 years since the issue was opened and nothing was done about that. And implement r2evans' suggested solutions it's not that hard and wouldn't be impacting current users. Please, fix it for us.

amanda-hi · 2019-10-15T18:38:22Z

Just adding this comment as another reminder - the current documentation for the check.names argument is, as described above, deceptive and confusing. I did a quick google search when I found that setting it to "false" wasn't doing what I expected, and it brought me here. I think that shows that this is still a relevant issue for users of the package.

r2evans · 2019-10-15T18:53:08Z

To be frank, @aleonti, while relevant to users of the package, there has been no development on this project at all since commit ead0038 (Aug 2018), so it's not just this bug that is stagnant. There have been other discussion (#484) on the orphaned status of this package. If you have sufficient C++ skills, perhaps you would consider forking and taking over maintenance of the package; unfortunately, it's a tall order to do that, and most R users are unlikely to have the C++ skills to do it solidly. *shrug*

AB-Kent · 2021-03-08T20:38:10Z

Maintenance of this package has moved; this issue is now at ycphs/openxlsx#33

r2evans mentioned this issue Dec 27, 2019

check.names behavior ycphs/openxlsx#33

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

check.names is not working #102

check.names is not working #102

yxomo commented Jul 20, 2015

awalker89 commented Jul 21, 2015

yxomo commented Jul 24, 2015

r2evans commented Aug 16, 2015

richierocks commented Dec 31, 2015

warnes commented Feb 17, 2016

DarwinAwardWinner commented Feb 22, 2016

PeterVermont commented Mar 13, 2017

rorynolan commented Aug 21, 2017

YiyingW commented Oct 26, 2017 •

edited

Loading

cchng commented Feb 5, 2018

r2evans commented Feb 5, 2018

WilDoane commented Jul 14, 2018 •

edited

Loading

r2evans commented Jul 14, 2018

WilDoane commented Jul 14, 2018 •

edited

Loading

conge commented Jul 18, 2018 •

edited

Loading

sorhawell commented Jul 12, 2019 •

edited

Loading

Dheinny commented Jul 28, 2019

amanda-hi commented Oct 15, 2019

r2evans commented Oct 15, 2019

AB-Kent commented Mar 8, 2021

check.names is not working #102

check.names is not working #102

Comments

yxomo commented Jul 20, 2015

awalker89 commented Jul 21, 2015

yxomo commented Jul 24, 2015

r2evans commented Aug 16, 2015

richierocks commented Dec 31, 2015

warnes commented Feb 17, 2016

DarwinAwardWinner commented Feb 22, 2016

PeterVermont commented Mar 13, 2017

rorynolan commented Aug 21, 2017

YiyingW commented Oct 26, 2017 • edited Loading

cchng commented Feb 5, 2018

r2evans commented Feb 5, 2018

WilDoane commented Jul 14, 2018 • edited Loading

r2evans commented Jul 14, 2018

WilDoane commented Jul 14, 2018 • edited Loading

conge commented Jul 18, 2018 • edited Loading

sorhawell commented Jul 12, 2019 • edited Loading

Dheinny commented Jul 28, 2019

amanda-hi commented Oct 15, 2019

r2evans commented Oct 15, 2019

AB-Kent commented Mar 8, 2021

YiyingW commented Oct 26, 2017 •

edited

Loading

WilDoane commented Jul 14, 2018 •

edited

Loading

WilDoane commented Jul 14, 2018 •

edited

Loading

conge commented Jul 18, 2018 •

edited

Loading

sorhawell commented Jul 12, 2019 •

edited

Loading