Issue with aggregate command (from stats) in notebooks only #116

Closed
etabeta78 opened this issue Mar 20, 2019 · 5 comments
Comments

@etabeta78

Jupyter notebooks with the R kernel fail to display the result of aggregate (from the stats package in base R) in some cases where the aggregating function returns multiple values per group (e.g. a vector).

This can be reproduced by running the following command:

aggregate(. ~ Species, iris, range)

which results in an error:

ERROR while rich displaying an object: Error in vapply(part, format, character(nrow(part))): values must be length 3, but FUN(X[[2]]) result is length 6

The issue is mitigated by the fact that if I run

aggregate(. ~ Species, iris, range, simplify=F)

the code works, but this way I get a data.frame with list columns and not the same object that I can obtain from R console with simplify=T (default value).
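The difference between the two results can be inspected in plain R. This is a minimal sketch using only base R and the built-in iris data; the variable names agg_matrix and agg_list are illustrative only.

```r
# Default (simplify = TRUE): each aggregated column is a numeric matrix
agg_matrix <- aggregate(. ~ Species, iris, range)

# simplify = FALSE: each aggregated column is a list of length-2 vectors
agg_list <- aggregate(. ~ Species, iris, range, simplify = FALSE)

# Compare the column types of the two results
print(sapply(agg_matrix, function(col) class(col)[1]))
print(sapply(agg_list, function(col) class(col)[1]))

# The matrix column has one row per group and one column per returned value
print(dim(agg_matrix$Sepal.Length))
```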

The issue seems to be strictly related to the way IRkernel tries to display the data.frame, since the command

print(aggregate(. ~ Species, iris, range))

produces the correct output. It surprised me that the code works flawlessly in R and R Markdown but fails in IRkernel notebooks.

I was unable to find a similar issue on the GitHub tracker (searching for vapply and aggregate or print), so I hope this is not a duplicate report.

My system has:

  • R 3.5.1
  • IRkernel_0.8.15.9000
  • IRdisplay_0.7.0
  • repr_0.19.2

but the issue is also reproducible on my colleague's laptop.

@flying-sheep
Member

That’s a super weird structure.

> str(aggregate(. ~ Species, iris, range))
 'data.frame':	3 obs. of  5 variables:
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 2 3
 $ Sepal.Length: num [1:3, 1:2] 4.3 4.9 4.9 5.8 7 7.9
 $ Sepal.Width : num [1:3, 1:2] 2.3 2 2.2 4.4 3.4 3.8
 $ Petal.Length: num [1:3, 1:2] 1 3 4.5 1.9 5.1 6.9
 $ Petal.Width : num [1:3, 1:2] 0.1 1 1.4 0.6 1.8 2.5

A data frame with matrices as columns?

Hmm, no idea how to do that best.
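The error message in the report suggests that the display code calls format on each column through vapply and expects one string per row. The sketch below reproduces the same kind of failure in plain R under that assumption. The vapply call here is an illustration modelled on the error text, not the actual repr source.

```r
agg <- aggregate(. ~ Species, iris, range)

# Which columns are matrices rather than plain vectors?
print(sapply(agg, is.matrix))

# format() on a 3x2 matrix column returns 6 strings, but vapply is told
# to expect nrow(agg) == 3 strings per column, so it stops with an error.
res <- tryCatch(
  vapply(agg, format, character(nrow(agg))),
  error = function(e) conditionMessage(e)
)
print(res)
```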

@flying-sheep flying-sheep transferred this issue from IRkernel/IRkernel Mar 20, 2019
@etabeta78
Author

Thanks a lot for taking a look (and for moving it to the proper component repo)!

Base R coerces the matrix columns to separate data.frame columns, appending a column-in-matrix index to the column main name (i.e. you get Sepal.Length.1 & Sepal.Length.2 and so on)...

For what it's worth, this happens with any multivalued function passed through aggregate; range is simply the easiest reproducible case I found.
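The flattened naming can be reproduced explicitly in base R. This is a sketch, assuming the do.call(data.frame, ...) idiom is an acceptable stand-in for what the console print shows; the name flat is illustrative only.

```r
agg <- aggregate(. ~ Species, iris, range)

# The stored object has 5 columns, four of which are 3x2 matrices
print(names(agg))

# Passing the columns back through data.frame() splits each matrix
# into separate columns with an index appended to the name
flat <- do.call(data.frame, agg)
print(names(flat))

# The indexed names exist only in the flattened copy
print("Sepal.Length.1" %in% names(agg))
print("Sepal.Length.1" %in% names(flat))
```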

@flying-sheep
Member

Base R coerces the matrix columns to separate data.frame columns

Yes, for printing. Doesn’t seem like a good idea: People will think df$Sepal.Width.1 is an actual column and will get confused when it isn’t.

I’m thinking that we could handle it by collapsing the rows of the nested matrices with commas:

| Species | Sepal.Length | Sepal.Width |
|---|---|---|
| setosa | 4.3, 5.8 | 2.3, 4.4 |
| versicolor | 4.9, 7.0 | 2.0, 3.4 |
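The collapsing idea could be sketched in plain R as follows. The helper collapse_matrix_cols is hypothetical and written only for illustration; it is not the code that ended up in repr.

```r
# Hypothetical helper: turn every matrix column into a character column
# whose entries are the row values joined with ", ".
collapse_matrix_cols <- function(df) {
  cols <- lapply(df, function(col) {
    if (is.matrix(col)) {
      # format() each row so that values such as 7 are shown as "7.0"
      apply(col, 1, function(row) paste(format(row), collapse = ", "))
    } else {
      col
    }
  })
  as.data.frame(cols, stringsAsFactors = FALSE)
}

agg <- aggregate(. ~ Species, iris, range)
collapsed <- collapse_matrix_cols(agg)
print(collapsed)
```

Every column of the result is a plain vector with one value per row, so a per-column format call that expects one string per row no longer fails.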

@etabeta78
Author

etabeta78 commented Mar 20, 2019

That would be like forcing simplify=F for situations where the simplification triggers vapply errors: I think it is absolutely reasonable :-)

@etabeta78
Author

Thanks a lot!
