sep argument ignored in dcast #2122

Closed
RoyalTS opened this issue Apr 19, 2017 · 4 comments

@RoyalTS
Contributor

RoyalTS commented Apr 19, 2017

dcast seems to be ignoring the sep argument:

library(data.table)

dt <- data.table(user=rep(1:10, each=2),
                 measurement=rep(c('a', 'b'), times=10),
                 value=rnorm(20))

dcast(dt, user~measurement, sep='_')

This yields

    user          a           b
 1:    1 -1.1240899 -1.00620808
 2:    2  0.3236956 -1.86065046
 3:    3 -0.9664502  0.21878106
 4:    4  0.9231289 -0.86210577
 5:    5  1.8197930 -0.51468111
 6:    6  0.5600921 -0.64247822
 7:    7  1.0440452 -0.03289074
 8:    8 -0.1192868 -0.79392246
 9:    9  1.0655679 -1.83930778
10:   10  0.2463848  0.43551250

Given _ as the sep – or indeed if I had left the sep argument blank – the column names should be measurement_a and measurement_b, no?

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.4

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.10.4

loaded via a namespace (and not attached):
[1] tools_3.3.2
@MichaelChirico
Member

This is expected behavior, though perhaps ?dcast could be more helpful here. sep is used when you have multiple value.var columns:

dt <- data.table(user=rep(1:10, each=2),
                 measurement=rep(c('a', 'b'), times=10),
                 value=rnorm(20),
                 value2 = rnorm(20))
dcast(dt, user~measurement, sep='_', value.var = c('value', 'value2'))
#     user     value_a    value_b    value2_a     value2_b
#  1:    1  2.47824187 -0.4655328 -1.81186160 -0.295153290
#  2:    2  0.27597669  0.0499813  1.09270848  1.465269972
#  3:    3 -0.51353846 -1.4100579  1.65814414  1.841348213
#  4:    4 -1.14161843  0.3337608 -0.05070586  0.287335488
#  5:    5 -1.11248648 -0.2373563  0.72059394 -1.176779744
#  6:    6 -0.74958142 -0.4015603  2.47068888 -0.759265311
#  7:    7  0.22767623 -1.6260782  1.36925862  0.476299662
#  8:    8  0.61496455  1.6415110 -0.72233829 -0.003333061
#  9:    9  0.51846450  1.1448276  0.16129185  0.444361567
# 10:   10  0.05730503  1.1189461  1.34467875 -0.477063731

@RoyalTS changed the title from "sep argument ignore in dcast" to "sep argument ignored in dcast" on Apr 21, 2017
@RoyalTS
Contributor Author

RoyalTS commented Apr 28, 2017

Wide-format columns named value_a and value_b, joined by _ or some other separator, seem super common to me. That dcast has a sep argument yet offers no easy way to produce that wide format from a long-format table is super confusing.

So yeah, some clarification in the docs at the very least would be great.
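
One possible workaround for the single value.var case, offered only as a sketch: cast first, then rename the new columns with setnames. The prefix "value" and the separator "_" are illustrative choices, not anything dcast applies itself, and this is not presented as the maintainers' recommended approach.

```r
library(data.table)

dt <- data.table(user = rep(1:10, each = 2),
                 measurement = rep(c("a", "b"), times = 10),
                 value = rnorm(20))

# Cast as in the original report; the new columns are named "a" and "b".
wide <- dcast(dt, user ~ measurement, value.var = "value")

# Prefix every non-id column by reference.
# "value" and "_" are assumptions chosen for illustration.
prefix <- "value"
new_cols <- setdiff(names(wide), "user")
setnames(wide, new_cols, paste(prefix, new_cols, sep = "_"))

names(wide)
# [1] "user"    "value_a" "value_b"
```

Because setnames modifies the table by reference, no copy of wide is made.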

@franknarf1
Contributor

sep is used when you have multiple value.var

@MichaelChirico It is also used to separate multiple functions. Probably related:
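
To illustrate the multiple-functions case mentioned above, here is a minimal sketch that reuses the toy table from the original report. It passes a list of two functions as fun.aggregate with a single value.var; the choice of mean and sum, and of "." as the separator, is arbitrary. The separator should then show up in the generated column names (something like value.mean.a, though the exact ordering of the name parts is not confirmed in this thread).

```r
library(data.table)

dt <- data.table(user = rep(1:10, each = 2),
                 measurement = rep(c("a", "b"), times = 10),
                 value = rnorm(20))

# Two aggregation functions, one value.var: dcast has to combine several
# name parts per output column, and sep is what it joins them with.
wide <- dcast(dt, user ~ measurement,
              value.var = "value",
              fun.aggregate = list(mean, sum),
              sep = ".")

names(wide)
```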

@arunsrinivasan
Member

The behaviour here is consistent (as explained above). If you'd like to submit a PR reflecting the necessary changes to the help file, it'd be great!
