Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a bug in child() #8

Merged
merged 2 commits into from
Dec 19, 2018
Merged

Conversation

heavywatal
Copy link
Contributor

@heavywatal heavywatal commented Dec 19, 2018

The old version fails when .data$label contains numeric characters like '6'
because isTRUE('6' == 6) in R, and it causes superfluous match.
Now .node argument must be always numeric and compared with .data$parent.

library(tidytree)
library(ape)
set.seed(2017)
tree = rtree(4)
x = as_tibble(tree)
y = x %>% dplyr::mutate(label = as.character(sample.int(length(node))))
child(y, 6)

Output from the current version:

Warning in .data$parent == .node :
  longer object length is not a multiple of shorter object length
Calls: child -> child.tbl_tree -> which
Empty data.table (0 rows) of 4 cols: parent,node,branch.length,label

Output from the fixed version:

   parent  node branch.length  label
    <int> <int>         <num> <char>
1:      6     3     0.6743315      6
2:      6     7     0.4349056      7

The old version fails when .data$label contains numeric characters like '4'
because isTRUE('4' == 4) in R, and it causes superfluous match.
Now .node argument must be always numeric and compared with .data$parent.
@GuangchuangYu GuangchuangYu merged commit 330ef6c into YuLab-SMU:master Dec 19, 2018
@heavywatal heavywatal deleted the fix-child branch December 19, 2018 13:53
@GuangchuangYu
Copy link
Member

the original version supports passing .node using node label. I modify it to support back with this feature, see 0df361c.

@heavywatal
Copy link
Contributor Author

Good catch, and sorry, I should have mentioned it more explicitly. But 0df361c seems still unsafe to me because it allows comparison between numerics and characters (character .node vs numeric .data$node, or numeric .node vs character .data$label). Maybe the following code will be safer and more efficient:

child.tbl_tree <- function(.data, .node, ...) {
    valid.tbl_tree(.data)
    if (is.character(.node)) {
        .node <- .data$node[.data$label == .node]
    }
    .data[.data$parent == .node & .data$parent != .data$node,]
}

@GuangchuangYu
Copy link
Member

thanks and incorporated.

@heavywatal
Copy link
Contributor Author

Why do you leave these lines?
https://github.com/GuangchuangYu/tidytree/blob/2fbf75e716e2266ee97cd205a8e566bf434485de/R/offspring.R#L13-L14
The first line still allows dangerous comparison between numeric .data$node and character .node (when a user passed a label). The second line just consumes CPU for nothing.

And the effect of this part is also doubtful:
https://github.com/GuangchuangYu/tidytree/blob/2fbf75e716e2266ee97cd205a8e566bf434485de/R/offspring.R#L17-L19
because returning empty data.frame for unmatched .node seems just enough. Even if you want to validate .node in this function, the current version does nothing for unmatched character/label .node.

So my suggestion is... in my previous comment #8 (comment).

@GuangchuangYu
Copy link
Member

oops, my fault. just incorporated.

GuangchuangYu added a commit that referenced this pull request Dec 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants