New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rowData rownames issue explored at BioC2018 #13

Open
mikelove opened this Issue Jul 26, 2018 · 3 comments

Comments

Projects
None yet
2 participants
@mikelove

mikelove commented Jul 26, 2018

Here is a reproducing example for a potential issue in the devel branch version of SummarizedExperiment. It seems an inconsistency in rownames of rowData and rownames of the SE, which would prevent creation of an SE, is nevertheless allowed during rowData re-assignment.

https://gist.github.com/mikelove/8fb94c610b588d4a6a64848d3fcbdf76

We tried with SummarizedExperiment v1.11.5 (so use.names=TRUE is the default).

Thanks

@hpages

This comment has been minimized.

Show comment
Hide comment
@hpages

hpages Jul 26, 2018

Contributor

Thank Mike. I'll look into this.

Note that it has never been a requirement for z to have the same row names as se when doing rowData(se) <- z so the issue is also present in release. What has changed between release and devel is that the rowData() getter propagates the row names by default in devel whereas in release it didn't so you would need to do z <- rowData(se, use.names=TRUE)[5:1,,drop=FALSE] to propagate them to z.

Also note that a SummarizedExperiment object se is a vector-like object along its 1st dimension. This is formalized by the fact that se is a Vector derivative and that length(se) is the same as nrow(se). This also means that, like any other Vector derivative, se can carry metadata columns that can be accessed with mcols(). In the particular case of a SummarizedExperiment object, the metadata columns can also be accessed with rowData(), which is just an alias for mcols(). So it's important to keep mcols() and rowData() consistent and also to keep the behavior of mcols() consistent across all Vector derivatives. In order to do that, we should consider adding the row names check to the mcols() setter. Then the rowData() setter will automatically follow. This is an important change that can possibly affect many packages around. I need to evaluate this.

Contributor

hpages commented Jul 26, 2018

Thank Mike. I'll look into this.

Note that it has never been a requirement for z to have the same row names as se when doing rowData(se) <- z so the issue is also present in release. What has changed between release and devel is that the rowData() getter propagates the row names by default in devel whereas in release it didn't so you would need to do z <- rowData(se, use.names=TRUE)[5:1,,drop=FALSE] to propagate them to z.

Also note that a SummarizedExperiment object se is a vector-like object along its 1st dimension. This is formalized by the fact that se is a Vector derivative and that length(se) is the same as nrow(se). This also means that, like any other Vector derivative, se can carry metadata columns that can be accessed with mcols(). In the particular case of a SummarizedExperiment object, the metadata columns can also be accessed with rowData(), which is just an alias for mcols(). So it's important to keep mcols() and rowData() consistent and also to keep the behavior of mcols() consistent across all Vector derivatives. In order to do that, we should consider adding the row names check to the mcols() setter. Then the rowData() setter will automatically follow. This is an important change that can possibly affect many packages around. I need to evaluate this.

@mikelove

This comment has been minimized.

Show comment
Hide comment
@mikelove

mikelove Jul 26, 2018

I think the broader consequences should certainly be taken into account, and maybe “fixing” this behavior reported above is not worth the downstream effort of many package maintainers.

mikelove commented Jul 26, 2018

I think the broader consequences should certainly be taken into account, and maybe “fixing” this behavior reported above is not worth the downstream effort of many package maintainers.

@hpages

This comment has been minimized.

Show comment
Hide comment
@hpages

hpages Jul 26, 2018

Contributor

A compromise could be to only issue a warning that the row names differ and are being ignored. That won't break any code and people who have code that triggers the annoying warning will then decide if they want to remove or fix the row names of z before doing mcols(x) <- z.

I'll evaluate the impact of the 1st solution (i.e. raising an error) and if it turns out to be too disruptive we'll go for the 2nd solution (the warning).

Contributor

hpages commented Jul 26, 2018

A compromise could be to only issue a warning that the row names differ and are being ignored. That won't break any code and people who have code that triggers the annoying warning will then decide if they want to remove or fix the row names of z before doing mcols(x) <- z.

I'll evaluate the impact of the 1st solution (i.e. raising an error) and if it turns out to be too disruptive we'll go for the 2nd solution (the warning).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment