Improve helpfulness of warning message during on-assignment type coer… #2989
Maybe consider changing the tone instead of removing the sentence? Something like "Setting the column type as intended up front is the best way to get around this if your RHS has the correct type." FWIW, for me...
... was the most useful part of the message as a new user, and probably something I wouldn't think of from reading the new message.
... gets at the point too indirectly and no longer contains a recommended fix. I don't find the Twitter complaint ("it's rude because I know better already") convincing as an argument for removing it, when other users might know less than they do. Also, the first pitfall you highlight
...has its own (error) message already:
Also, you refer to it as "this error", though it is a warning, which may be confusing.
…r truncation and informs the user of the first truncated item.
@franknarf1 and @MichaelChirico, please review the latest refinement before merge. (The drop-down under Reviewers won't let me pick either of you: Michael, I guess, because you submitted the PR, and Frank because it seems you're not a project member. But I believe you can both still add your reviews.)
Looks good, but the last sentence of the new message refers to class instead of type (see my comment further down). (I've never used the GitHub review interface, so please let me know if I've made a mistake. Thanks.)
s2 = (char *)type2char(TYPEOF(thisvalue));
if (isReal(thisvalue)) s3="; may have truncated precision"; else s3="";
warning("Coerced '%s' RHS to '%s' to match the column's type%s. Either change the target column ['%s'] to '%s' first (by creating a new '%s' vector length %d (nrows of entire table) and assign that; i.e. 'replace' column), or coerce RHS to '%s' (e.g. 1L, NA_[real|integer]_, as.*, etc.) to make your intent clear and for speed. A common source of this error is mismatch of return types when assigning by group, especially in edge cases. Another common pitfall comes from wrong assumptions about your table's column types; check print(x, class = TRUE) or sapply(x, class) to inspect your initial LHS types and be sure they're as expected.", s2, s1, s3, CHAR(STRING_ELT(names, coln)), s2, s2, LENGTH(VECTOR_ELT(dt,0)), s1);
warning("Coerced %s RHS to %s to match the type of the target column (column %d named '%s'). If the target column's type %s is correct, it's best for efficiency to avoid the coercion and create the RHS as type %s. To achieve that consider R's type postfix: typeof(0L) vs typeof(0), and typeof(NA) vs typeof(NA_integer_) vs typeof(NA_real_). You can wrap the RHS with as.%s() to avoid this warning, but that will still perform the coercion. If the target column's type is not correct, it's best to revisit where the DT was created and fix the column type there; e.g., by using colClasses= in fread(). Otherwise, you can change the column type now by plonking a new column (of the desired type) over the top of it; e.g. DT[, `%s`:=as.%s(`%s`)]. If the RHS of := has nrow(DT) elements, then the assignment is called a column plonk and is the way to change a column's type. Column types can be observed with print(x,class=TRUE) and sapply(x,class).",
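For reviewers skimming the new wording, here is a minimal R sketch of its two recommendations: build the RHS with the column's type up front, or plonk a whole new column when the type itself should change. It assumes data.table is installed; the table `DT` and column `x` are illustrative only.

```r
library(data.table)

# R's literal postfixes fix the RHS type before := ever sees it
typeof(0L)           # "integer"
typeof(0)            # "double"
typeof(NA)           # "logical"
typeof(NA_integer_)  # "integer"
typeof(NA_real_)     # "double"

DT = data.table(x = 1:3)   # illustrative table; x is an integer column
DT[2L, x := 5L]            # RHS is already integer, so no coercion is needed
DT[, x := as.double(x)]    # column plonk: RHS has nrow(DT) elements, so the whole column is replaced
typeof(DT$x)               # "double"
```

As the message itself notes, wrapping the RHS in `as.integer()` would also silence the warning, but the coercion still happens; the plonk is what actually changes the column's type.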
franknarf1
Aug 10, 2018
Contributor
Column types can be observed with print(x,class=TRUE) and sapply(x,class).
Some users won't know the underlying types of classed columns (IDate and factor are stored as integer; Date as double). We could change sapply(x,class) to sapply(x,typeof), but I'm not sure how to change the reference to print. Maybe some rewording or an extra sentence would work. Alternatively, print could get another option, like verbose = TRUE, that prints maximal info (partly substituting for str, and similar to tables()):
library(data.table)
DT = data.table(id = 1:2, d = as.IDate(Sys.Date()) + 0:1)
setkey(DT, id)
setindex(DT, d, id)
print(DT, verbose = TRUE) # fake code
# id d
# <int> <IDat>
# <int>
# 1: 1 2018-08-10
# 2: 2 2018-08-11
#
# key: id
# indices:
# - d, id
(... Also displaying new statistics from #2879)
Anyway, I guess the type/class distinction will only matter in rare cases, like...
library(data.table)
DT = data.table(id = 1:2, d = as.IDate(Sys.Date()) + 0:1)
DT[1, d := "1999-01-01"]
# gets coerced to NA_integer_
DT[1, d := 1999-01-01]
# worse: without the quotes this is arithmetic (1999 - 1 - 1 = 1997), which is silently stored as an integer date
In contrast, sub-assigning a character to a factor works as expected.
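To make the class/type point concrete, a small R sketch (assuming data.table is installed; the column names are illustrative) of how `class()` and `typeof()` diverge for classed atomic columns, and of what the unquoted date in the last example actually evaluates to.

```r
library(data.table)

# Illustrative table mixing classed atomic columns
DT = data.table(
  id = 1:2,
  d  = as.IDate("2018-08-10") + 0:1,  # IDate: stored as integer
  dd = as.Date("2018-08-10") + 0:1,   # Date: stored as double
  f  = factor(c("a", "b"))            # factor: stored as integer codes
)

# class() gives the S3 class (first element only, since IDate has two);
# typeof() gives the storage type that := coerces an RHS to
cls = sapply(DT, function(col) class(col)[1L])
typ = sapply(DT, typeof)
print(rbind(class = cls, typeof = typ))

# Without quotes, a "date" is just subtraction: 1999 - 1 - 1
1999-01-01
```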
Awesome! Much more thorough overhaul.
…cion.
Original tweet & some discussion here:
https://twitter.com/sarahbeeysian/status/1021359529789775872