Fixing crash when attempting to join on character(0) #4272
@@ -21,8 +21,8 @@ merge.data.table = function(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FAL
   if (!missing(by) && !missing(by.x))
     warning("Supplied both `by` and `by.x/by.y`. `by` argument will be ignored.")
   if (!is.null(by.x)) {
-    if ( !is.character(by.x) || !is.character(by.y))
-      stop("A non-empty vector of column names are required for `by.x` and `by.y`.")
+    if (length(by.x) == 0L || !is.character(by.x) || !is.character(by.y))
MichaelChirico
Mar 2, 2020
Member
aha! I was just looking at this code yesterday and something looked funny but I didn't bother stress testing it. nice catch!
@@ -3031,7 +3031,7 @@ isReallyReal = function(x) {
     onsub = as.call(c(quote(c), onsub))
   }
   on = eval(onsub, parent.frame(2L), parent.frame(2L))
-  if (!is.character(on))
+  if (length(on) == 0L || !is.character(on))
MichaelChirico
Mar 2, 2020
Member
Yes, perfect. We also shouldn't have gotten to checking by.x and by.y separately in the first place, because here by.x = by.y, so simply by should be used.
tlapak
Mar 2, 2020
Author
Contributor
I'm not quite sure what you mean. At this point we're not checking separately if we come through merge. merge sets by=by.x and then later calls y[x, on=by]. If we don't check in merge we catch it here but this is the point where it gets caught when using x[y] syntax.
(I would've been really mad if you had pushed a fix yesterday.)
Codecov Report
@@ Coverage Diff @@
## master #4272 +/- ##
=======================================
Coverage 99.60% 99.60%
=======================================
Files 73 73
Lines 14027 14029 +2
=======================================
+ Hits 13972 13974 +2
Misses 55 55

Could you please fix that in bmerge.c? I just ran into that problem using internal functions. Segfaults are pretty severe issues that should be eliminated, not only from the exported API, but in general.

I'll have a look at it, but it may take a while before I get the chance to actually write and test the fix. It should just be the same length check, only in the C function, raising an internal error. I assume you're calling bmerge directly?

Yes, somewhere around
in SEXP bmerge, to check those are non-zero length, should do. If you remove your current fixes, you can easily reach that code with your unit tests, which would probably be good, so that this is handled in a single place.

Now also closes #4499. I opted not to raise an error there, in order to pass tests 2126.1 and 2126.2 and stay consistent with the behavior expected there. I do think there is an argument to be made for all those cases to be an error, or for joins with empty data.tables to return an empty data.table. The current behavior is close-ish, though. I also think it's better to leave the argument checks for joins with

Thanks for incorporating my feedback. It should be safe to put it into the coming release.

Thanks @tlapak! I've invited you to be a project member; please accept using the button that should appear on your GitHub projects or profile page. That way, in future you can create branches in the main project directly. I'll add you to the contributors list as well in a follow-up commit (easier for me than pushing to your fork).

Attempting to join or merge on character(0) currently crashes R in two out of three possible cases, at least on Windows.

Turns out that merge checks the length of by but does not check the length of by.x or by.y (either is sufficient as the equality is checked). Likewise, [.data.table, or rather .parse_on, doesn't check the length of on. I have added the checks as well as tests for all three cases.

(Actually, only checking in .parse_on would be sufficient to prevent the crash, but this way produces a more useful error message when using merge.)

I have also taken the liberty of making a grammar fix to the relevant error message of merge; I hope that is acceptable.

Now also closes #4499
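
The reason a type-only check let character(0) through can be reproduced in base R: an empty character vector is still of type character. The sketch below is not data.table's code. `check_cols_old` and `check_cols_new` are hypothetical helpers that copy only the two conditions shown in the diff, and their error messages are made up for illustration.

```r
# Hypothetical helpers mirroring only the conditions from the diff;
# they are not part of data.table and the messages are illustrative.
check_cols_old <- function(on) {
  if (!is.character(on))
    stop("'on' must be a character vector of column names")
  invisible(on)
}

check_cols_new <- function(on) {
  # The length test comes first, so an empty vector is rejected up front.
  if (length(on) == 0L || !is.character(on))
    stop("'on' must be a non-empty character vector of column names")
  invisible(on)
}

# character(0) has length 0 but is still a character vector,
# so a type-only check does not stop it.
is.character(character(0))

# Record whether each check accepts or rejects the empty vector.
old_result <- tryCatch({ check_cols_old(character(0)); "accepted" },
                       error = function(e) "rejected")
new_result <- tryCatch({ check_cols_new(character(0)); "accepted" },
                       error = function(e) "rejected")
```

With the old condition the empty vector reaches the code that follows the check; with the added length test it is stopped with an ordinary R error, which is the behavior the added tests are described as covering.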