Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve helpfulness of warning message during on-assignment type coer… #2989

Merged
merged 5 commits into from Aug 10, 2018

Conversation

MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented Aug 1, 2018

…cion.

Original tweet & some discussion here:

https://twitter.com/sarahbeeysian/status/1021359529789775872

@franknarf1
Copy link
Contributor

@franknarf1 franknarf1 commented Aug 4, 2018

Maybe consider changing tone instead of removing the sentence? Like "Setting the column type as intended up front is the best way to get around this if your RHS has the correct type." or something.

Fwiw, for me...

set the column type correctly up front

... was the most useful part of the message as a new user and probably something I wouldn't think of from reading the new message.

inspect your initial LHS types and be sure they're as expected

... gets at the point too indirectly and no longer contains a recommended fix.

I don't find the twitter complaint ("it's rude because I know better already") convincing as an argument to remove it when other users might know less than they do.


Also, the first pitfall you highlight

A common source of this error is mismatch of return types when assigning by group, especially in edge cases.

...has its own (error) message already:

library(data.table)
DT = data.table(a = 1:2)
DT[, v := if (.GRP==1L) "a" else 1L, by=a]
> Error in `[.data.table`(DT, , `:=`(v, if (.GRP == 1L) "a" else 1L), by = a) : 
>   Type of RHS ('integer') must match LHS ('character'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)

Also, you refer to it as "this error", though it is a warning, which may be confusing.

…r truncation and informs user the first truncated item.
@mattdowle
Copy link
Member

@mattdowle mattdowle commented Aug 10, 2018

Please @franknarf1 and @MichaelChirico review latest refinement before merge. (The drop down under reviewers won't let me pick either of you. Michael because you submitted the PR I guess and Frank because you're not a project member it seems?! But you can both add your reviews I believe.)

Copy link
Contributor

@franknarf1 franknarf1 left a comment

Looks good, but the last sentence of the new message refers to class instead of type (comment further down).

(I've never used the github review interface, so please let me know if I've made a mistake. Thanks)

src/assign.c Outdated
s2 = (char *)type2char(TYPEOF(thisvalue));
if (isReal(thisvalue)) s3="; may have truncated precision"; else s3="";
warning("Coerced '%s' RHS to '%s' to match the column's type%s. Either change the target column ['%s'] to '%s' first (by creating a new '%s' vector length %d (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to '%s' (e.g. 1L, NA_[real|integer]_, as.*, etc.) to make your intent clear and for speed. A common source of this error is mismatch of return types when assigning by group, especially in edge cases. Another common pitfall comes from wrong assumptions about your table's column types; check print(x, class = TRUE) or sapply(x, class) to inspect your initial LHS types and be sure they're as expected.", s2, s1, s3, CHAR(STRING_ELT(names, coln)), s2, s2, LENGTH(VECTOR_ELT(dt,0)), s1);
warning("Coerced %s RHS to %s to match the type of the target column (column %d named '%s'). If the target column's type %s is correct, it's best for efficiency to avoid the coercion and create the RHS as type %s. To achieve that consider R's type postfix: typeof(0L) vs typeof(0), and typeof(NA) vs typeof(NA_integer_) vs typeof(NA_real_). You can wrap the RHS with as.%s() to avoid this warning, but that will still perform the coercion. If the target column's type is not correct, it's best to revisit where the DT was created and fix the column type there; e.g., by using colClasses= in fread(). Otherwise, you can change the column type now by plonking a new column (of the desired type) over the top of it; e.g. DT[, `%s`:=as.%s(`%s`)]. If the RHS of := has nrow(DT) elements, then the assignment is called a column plonk and is the way to change a column's type. Column types can be observed with print(x,class=TRUE) and sapply(x,class).",
Copy link
Contributor

@franknarf1 franknarf1 Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Column types can be observed with print(x,class=TRUE) and sapply(x,class).

Some users won't know types for non-atomic classes (integer = IDate, factor; double = Date). Could change sapply(x,class) to sapply(x,typeof), but I'm not sure how to change the reference to print. Maybe some rewording or an extra sentence could work. Alternately, maybe print could have another option, like verbose = TRUE that prints maximal info (somewhat substituting for str and similar to tables()):

library(data.table)
DT = data.table(id = 1:2, d = as.IDate(Sys.Date()) + 0:1)
setkey(DT, id)
setindex(DT, d, id)

print(DT, verbose = TRUE) # fake code
#       id          d
#    <int>     <IDat>
#               <int>
# 1:     1 2018-08-10
# 2:     2 2018-08-11
# 
# key: id
# indices: 
# - d, id

(... Also displaying new statistics from #2879)

Anyway, I guess the type/class distinction will only matter in rare cases, like...

library(data.table)
DT = data.table(id = 1:2, d = as.IDate(Sys.Date()) + 0:1)

DT[1, d := "1999-01-01"] 
# gets coerced to NA_integer_

DT[1, d := 1999-01-01] 
# worse, if forgetting quotes and class/type, it gets silently handled as an int

In contrast, sub-assigning a character to a factor works as expected.

@mattdowle mattdowle merged commit 375f3c7 into master Aug 10, 2018
2 checks passed
@mattdowle mattdowle deleted the type_warning branch Aug 10, 2018
@MichaelChirico
Copy link
Member Author

@MichaelChirico MichaelChirico commented Aug 15, 2018

Awesome! Much more thorough overhaul 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants