%notin% is not safe in the way it handles NA by default #5481
Comments
I don't see the point in your example. What would be your supposed output for |
My point is that I expected the following
The output below is what I expected for
I don't really know how other users typically use %in% in negation but my typical use is |
@Kamgang-B I don't think this is an issue at all. |
I think it is a good question. |
Maybe I'm missing the meaning... dt[x %notin% 1:3] implies that x is not in 1, 2, 3, and that would include NA because NA is not 1, 2, or 3. It would be great if R could handle a mixed-type vector or list, though I'm not sure that it can. I think NA being skipped in this case makes perfect sense. data.table is especially useful whereby you can assign a value in
I would discourage chaining it or building "NA handling" into the awesome %notin% because you can repeat the above any number of times in a very legible manner, without needing to learn the internal mechanics of the method and its NA handling or type safety. |
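To make the "NA is not 1, 2, or 3" reading concrete, here is a minimal base R sketch. It assumes, as the thread describes, that %notin% behaves like a negated %in%; the vector x and the result names are made up for illustration.

```r
x <- c(1L, 4L, NA)

# %in% never returns NA: a missing value simply counts as "not found"
# unless NA is itself listed in the table.
in_table <- x %in% 1:3        # FALSE for the NA element
not_in_table <- !in_table     # so the negation is TRUE for the NA element

# Elementwise comparison, by contrast, propagates the missing value.
neq <- x != 1L
```

The contrast between not_in_table and neq is the crux of the disagreement: membership tests treat NA as an ordinary value, while comparison operators treat it as unknown.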
I agree with @datocrats-org that the behavior makes sense because just as |
@hdn012 @datocrats-org @mczek
I think that I was not clear enough when talking about the functional form. I know that
That's why I think that it would be nice to complement Kindly consider exporting a new more flexible function |
I like the idea |
If there is nobody willing to submit such extended function |
IMHO, the typical use of %notin% is likely expected to be DT[lhs %notin% rhs, ...], where (1) rhs contains no missing value and (2) the user wants to return/modify rows where lhs contains only values in rhs.

Also, I don't expect users to do something like !lhs %notin% (since %in% is already convenient for this operation).

For these reasons, I think that it is better to be on the safe side by allowing DT[lhs %notin% rhs, ...] to return/modify only rows whose values are in rhs. In doing so, the user will have to explicitly add NA to the rhs if he also wants to include rows with missing values.

Consider the following example:
In doing this operation, I don't really think users expect the rows where x is NA to be modified.
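The concern can be sketched in base R with a plain data.frame. This is only an illustration under the assumption that %notin% matches a negated %in%; the data, the column names and the "changed" label are hypothetical and not taken from the issue.

```r
df <- data.frame(x = c(1L, 5L, NA), flag = "keep", stringsAsFactors = FALSE)

# Negated membership: the NA row is selected too, because NA is not in 1:3.
df$flag[!(df$x %in% 1:3)] <- "changed"
# df$flag is now "keep", "changed", "changed"

# An explicit guard leaves the row with the missing x untouched.
df2 <- data.frame(x = c(1L, 5L, NA), flag = "keep", stringsAsFactors = FALSE)
df2$flag[!is.na(df2$x) & !(df2$x %in% 1:3)] <- "changed"
# df2$flag is now "keep", "changed", "keep"
```

Whether the first or the second result is the "right" default is exactly what the thread debates; the guard shows that the stricter behaviour is already available at the cost of one extra condition.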
So, even if %notin% is meant to provide a more memory-efficient version of !lhs %in% rhs (IIRW), I also think that it would be better to handle missing values more safely.

P.S.: I wonder if it's also possible to export a functional alternative to %notin%, something like notin(x, table, nomatch=-1L).
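One way such a functional form could look is sketched below in base R. The name notin_sketch and the na_result argument are hypothetical and are not part of data.table; the issue does not spell out what nomatch=-1L would mean, so this sketch uses a different, explicitly named switch for the NA case.

```r
# Hypothetical helper, not a data.table function: `na_result` decides what a
# missing value in `x` maps to when NA is not listed in `table`.
notin_sketch <- function(x, table, na_result = FALSE) {
  # TRUE where x has no match in table (an unmatched NA counts as TRUE here).
  out <- match(x, table, nomatch = 0L) == 0L
  # Missing x values that did not match anything get the caller's choice.
  out[is.na(x) & out] <- na_result
  out
}

safe  <- notin_sketch(c(1L, 5L, NA), 1:3)                    # NA row excluded
loose <- notin_sketch(c(1L, 5L, NA), 1:3, na_result = TRUE)  # NA row included
listed <- notin_sketch(c(1L, 5L, NA), c(1:3, NA))            # NA is in table
```

With na_result = TRUE the helper reproduces the negated %in% result, so a switch like this could keep the existing behaviour available while making the stricter one the default.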