The documentation for this parameter says the following:
strip.white: default is ‘TRUE’. Strips leading and trailing whitespaces
of unquoted fields. If ‘FALSE’, only header trailing spaces
are removed.
So, according to the documentation, when the flag is FALSE the whitespace should be left intact EXCEPT for trailing whitespace in the header (and it is unclear whether this refers to all header fields or just the last one). I'm not sure why such an exception is warranted, but I tried to see how it works:
(1) It appears that trailing whitespace in the header is not removed after all (and neither is leading whitespace):
> data.table::fread("A ,B \n1,2\n3,4\n", strip.white=F) -> f
> colnames(f)
[1] "A " "B "
> data.table::fread(" A, B\n1,2\n3,4\n", strip.white=F) -> f
> colnames(f)
[1] " A" " B"
I think this behavior is fine; only the documentation needs to be corrected.
(2) Now what about the data? It appears the flag is not respected when the field is not character (here, numeric and logical):
> data.table::fread("A,B,C,D\n 1.0 , 2 , x , true \n 3.7 , 4 , y\t, false \n", strip.white=F) -> f
> str(f)
Classes ‘data.table’ and 'data.frame': 2 obs. of 4 variables:
$ A: num 1 3.7
$ B: int 2 4
$ C: chr " x " " y\t"
$ D: logi TRUE FALSE
- attr(*, ".internal.selfref")=<externalptr>
Thus, it appears the flag only applies to character fields and is ignored for all others. I'm not sure whether this behavior is intentional, but the documentation does not mention it at all.
(Update) Cross-checking with the documentation of read.csv, it says the following:
strip.white: logical. Used only when ‘sep’ has been specified, and
allows the stripping of leading and trailing white space from
unquoted ‘character’ fields (‘numeric’ fields are always
stripped).
So the behavior of fread appears to be (almost) consistent with that of read.csv, and the problem is only in the documentation. The one behavioral discrepancy is that read.csv strips both spaces and tabs, while fread strips only spaces.
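To make the read.csv cross-check concrete, the same input from (2) can be run through read.csv with both values of strip.white. This is only a sketch for comparison: no output is reproduced here, and the object names txt, f_keep and f_strip are arbitrary.

```r
# Same input as in (2): padded numeric, character (one with a trailing tab) and logical fields
txt <- "A,B,C,D\n 1.0 , 2 , x , true \n 3.7 , 4 , y\t, false \n"

# strip.white = FALSE is read.csv's default; per its docs, numeric fields are stripped anyway
f_keep <- read.csv(text = txt, strip.white = FALSE)
str(f_keep)

# strip.white = TRUE additionally strips unquoted character fields
f_strip <- read.csv(text = txt, strip.white = TRUE)
str(f_strip)
```

Comparing columns C in the two results (and against the fread output above) shows which whitespace characters each function removes.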