Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly handle cols argument of measure() #5064

Closed
wants to merge 7 commits into from
Closed

Conversation

tlapak
Copy link
Contributor

@tlapak tlapak commented Jul 4, 2021

Closes #5063

There are two separate issues reported here. One results from col not being properly evaluated at any point. The other from the fact that measure and melt assume that measure was passed the complete list of column names. (measure returns an index into cols and melt applies that to names(data))

The fix feels slightly hacky but seems to be the simplest way of letting measure() know about the full list of column names without exposing this interaction to the user. I also think we should check that cols is only passed a subset of column names.

Todos:

  • Currently fails test 2183.00020 because that call contains two errors and I'm now checking is.character(cols) further up
  • Also fails 2183.78 because I use a slightly different error message for the same test which is now combined with checking that cols is a subset of names(data)
  • New unit tests
  • News item

@tdhock
Copy link
Member

tdhock commented Jul 5, 2021

hi @UweBlock @tlapak thanks for the PR/issue.
The big idea in my design was that the pattern/sep arguments to measure() can and should be used to specify the subset of columns to melt, and the user should not have to specify the cols argument.
So rather than changing the code, maybe we should change the documentation? to say something like, "cols will be automatically set by melt.data.table, so the user should not provide this argument."

@tdhock
Copy link
Member

tdhock commented Jul 5, 2021

or better: "cols: a character vector that is automatically provided, with all of the column names of the data table input to melt. The user should NOT provide this argument; to specify the columns to be measured, use the pattern argument."

@MichaelChirico
Copy link
Member

Maybe I'm missing something obvious, but if cols is not supposed to be supplied by the user, why keep it in the public API? Shouldn't we just remove it from the signature?

@tdhock
Copy link
Member

tdhock commented Jul 6, 2021

well, I would be all for removing the cols argument, because indeed I don't think the user should ever have to use it.
but then how does measure() know what are the column names of the table passed to melt?
I based my design on the patterns() function which also has the cols argument:

patterns = function(..., cols=character(0L)) {

for completeness the cols argument to patterns is documented as follows:

  \item{cols}{A character vector of names to which each pattern is matched.}

@avimallu
Copy link
Contributor

avimallu commented Jul 6, 2021

patterns = function(..., cols=character(0L)) {

Perhaps it could be mentioned along with the default argument to specify that it is not required? To clarify its use further, the help text

Perl-compatible regex with capture groups to match to cols. Columns that match the regex are considered measure variables.

or that for the cols argument can be specified to not require cols to be user specified?

Or, a good way is to mention it identical to the way the group.desc arugment is used. Specify that it is internal?

@tdhock
Copy link
Member

tdhock commented Aug 12, 2021

hi @tlapak would you agree that to move forward we should either close or substantially modify this PR? (remove changes to code, add new docs for cols argument)

@tlapak
Copy link
Contributor Author

tlapak commented Aug 26, 2021

@tdhock, sorry for the long radio silence. I've been a bit swamped. I see what you mean when you say the argument shouldn't have to be used but the original issue also made sense to me. Even if I probably wouldn't have used the argument its presence and documentation is somewhat suggestive. But I didn't mean to step on anyone's toes here.

I don't feel super strongly about this but would agree with @MichaelChirico that having an argument exposed that shouldn't be used is not exactly ideal. Maybe it would work to have a public measure() without the cols argument as a thin wrapper around the current one. Change that to an internal .measure() that gets its arguments filled by the wrapper? Haven't thought about that too much but let me close this PR for now since its not the direction we'll be going either way.

@tlapak tlapak closed this Aug 26, 2021
@tdhock
Copy link
Member

tdhock commented Aug 27, 2021

ok thanks for the feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Odd behaviour of melt() when using the cols argument in measure()
4 participants