FR: expansion of melt functionality for handling names of output #1547
Comments
Good point. It'd be nice to have. @jangorecki, do you think you can take a look at this? We'll need a way to extract the names and levels for the value/variable columns.
@MichaelChirico, it seems like we're thinking along the same lines for these things. For what it's worth, I always find that "stubs" at the end of a variable name are more cumbersome to deal with. This is the case even with base R's […]. Here's something that I had put together. Something in there slows it down to roughly half the speed of just using […]. Even with that function, I would propose changing the name format to "jan_mean1" and so on. Here's an example:
My rudimentary version of this was:
@Arun, I think we were discussing using a character vector in […]
@arunsrinivasan Sure.
@jangorecki, this is not a priority, just so you know :-).
@MichaelChirico I don't see why […]
library(data.table)
set.seed(10239)
DT <- setnames(as.data.table(replicate(8, runif(10))),
               paste0("meas", rep(1:3, c(3, 3, 2)),
                      rep(c("_jan", "_feb", "_mar"))))
melt(DT[ , lapply(.SD, mean)],
     measure.vars = list(paste0("meas", 1:3, "_jan"),
                         paste0("meas", 1:3, "_feb"),
                         paste0("meas", 1:2, "_mar")),
     variable.name = "var.label",
     value.name = c("jan", "feb", "mar"))
# var.label jan feb mar
#1: meas1 0.4542663 0.3788571 0.6180074
#2: meas2 0.4204972 0.4789930 0.4648342
#3: meas3 0.5354431 0.4650098 NA
melt(DT[ , lapply(.SD, mean)],
     measure.vars = patterns("_jan", "_feb", "_mar"),
     variable.name = "var.label",
     value.name = c("jan", "feb", "mar"))
# var.label jan feb mar
#1: meas1 0.4542663 0.3788571 0.6180074
#2: meas2 0.4204972 0.4789930 0.4648342
#3: meas3 0.5354431 0.4650098 NA
It trims […]
@jangorecki Hmm, I thought I had tried that for […]. Certainly, this capability is not clear from the man page:
It's not clear that it accepts names (plural); it reads as if […]. While we're on the man page, this phrasing is misleading:
If it were a function, this workaround/approach would work:
For programmatic use you need to have that call in place. You can build it with substitute() and then evaluate it:
mos <- c("jan", "feb", "mar")  # month labels; assumed here, as in the example above
eval(substitute(
  melt(DT[ , lapply(.SD, mean)],
       measure.vars = .measure.vars,
       value.name = mos),
  list(.measure.vars = as.call(c(as.name("patterns"), paste0("_", mos))))
))
Anyway, I'm not quite convinced to […]
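If the goal is only programmatic use, one alternative that sidesteps substitute() is to compute the column lists yourself with base grep() and hand melt() a plain list, a form it already accepts (see the list example earlier in the thread). This is a minimal sketch, not the thread's proposal: the table below and the vector mos are hypothetical, chosen so every month has all three measures.

```r
library(data.table)

set.seed(10239)
# Hypothetical wide table: 3 measures x 2 months, 10 rows of random data
DT <- setnames(as.data.table(replicate(6, runif(10))),
               paste0("meas", rep(1:3, each = 2), c("_jan", "_feb")))
mos <- c("jan", "feb")  # month labels (assumed for this sketch)

DTm <- DT[ , lapply(.SD, mean)]
# One character vector of matching column names per month, in column order
mv <- lapply(paste0("_", mos, "$"), grep, x = names(DTm), value = TRUE)
out <- melt(DTm, measure.vars = mv, value.name = mos)
print(out)
```

Because mv is an ordinary R object, it can be inspected or reordered before melting, which also makes the row order of the output easy to reason about.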
@jangorecki Certainly the docs can be improved,
It's almost impossible to guess that […]
This may have already been covered, or may be obvious (I see Jan mentioned "this could be extended to extract from names of […]").
(Borrowed from an SO question.)
Oh, I like the named patterns function.
(I think this still falls under the same general issue:)
Suppose the goal is to have (a1, b1) stacked on top of (a2, b2), as rbind would do:
Taken from SO: http://stackoverflow.com/q/42375113/
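For the (a1, b1) over (a2, b2) case, passing one pattern per output column plus a matching value.name appears to do the stacking, in the same way as the patterns("_jan", "_feb", "_mar") example earlier in the thread. A minimal sketch with a hypothetical table DT2:

```r
library(data.table)

# Hypothetical wide table: two "blocks" of (a, b) columns per id
DT2 <- data.table(id = 1:2, a1 = 1:2, b1 = 3:4, a2 = 5:6, b2 = 7:8)
# "^a" collects a1, a2 and "^b" collects b1, b2; block k pairs the k-th match of each
long <- melt(DT2, id.vars = "id",
             measure.vars = patterns("^a", "^b"),
             value.name = c("a", "b"))
print(long)
```

The pairing relies on grep() returning a1 before a2 and b1 before b2, i.e. on the input column order, which is exactly the ordering concern raised in the original post.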
@franknarf1 I was just coming back here to suggest the same thing about naming the "arguments" to […]
@MichaelChirico, @mattdowle, I like the addition of named patterns. However, I think there were two embedded questions in this issue, and only one of them has been resolved. Isn't part of the question also how to make the "variable" column retain more descriptive details when […]
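For the half that was addressed, a sketch of named patterns(): assuming a data.table release in which names given in measure.vars become the value column names (I believe this arrived around 1.11.0; check ?melt for your version), value.name can be omitted. The table DTm is hypothetical. Note that this only names the value columns; it does not by itself relabel the variable column, which is the part still being asked about.

```r
library(data.table)

# Hypothetical one-row summary table
DTm <- data.table(meas1_jan = 1, meas1_feb = 4,
                  meas2_jan = 2, meas2_feb = 5)
# Named arguments to patterns(); in releases that support it,
# the names "jan" and "feb" become the value column names
out <- melt(DTm, measure.vars = patterns(jan = "_jan$", feb = "_feb$"))
print(out)
```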
I'm trying to summarize some data which I have stored (in wide form) as repeated cross-sections. The way to go would appear to be to summarize and melt, but some crucial information is lost in the process: variable names are tossed, and it's not clear we can predict the resulting order.
An example:
We want to melt, grouping each month's data, so patterns is quite helpful:
Looks beautiful, but the output less so:
We've lost a lot of info. Does value1 correspond to jan, feb, or mar? Does variable == 1 mean meas1, meas2, or meas3? I hope that value1 means jan, value2 means feb, and value3 means mar, but this is unclear; ditto that value1 corresponds to variable == 1, and so on. This is especially true if the column order of the input is potentially unknown.
The value.name and variable.name arguments are pretty useless to help us here.
It seems the robust way to deal with this as of now is to specify the measure.vars in a list, like so:
With this, confidence is restored in the order of output, so we can at least rename the output without worrying about misnaming something. But this seems verbose and ugly, and it makes patterns seem a lot less helpful as a function.
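As a stopgap for the "which level is which" question, a sketch of post-processing: with list input to measure.vars, level k of variable refers to the k-th name in each vector, so the stub can be recovered after melting. This is a workaround built on that list semantics, not a built-in melt option; the table DTm and the vector stubs are hypothetical.

```r
library(data.table)

# Hypothetical one-row summary table
DTm <- data.table(meas1_jan = 1, meas2_jan = 2, meas3_jan = 3,
                  meas1_feb = 4, meas2_feb = 5, meas3_feb = 6)
jan <- c("meas1_jan", "meas2_jan", "meas3_jan")
feb <- c("meas1_feb", "meas2_feb", "meas3_feb")

out <- melt(DTm, measure.vars = list(jan, feb), value.name = c("jan", "feb"))
# Level k of `variable` pairs jan[k] with feb[k], so strip the month
# suffix from one vector to get a descriptive label for each level
stubs <- sub("_jan$", "", jan)
out[, variable := stubs[as.integer(variable)]]
print(out)
```

Spelling out the list keeps the pairing explicit, which is the verbosity the post complains about, but it does make the relabelling step safe.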