ARROW-3962: [Go] Accept null values while reading CSV #3129
Fix a bug where the CSV reader could not accept null values.
How to reproduce
Create the following CSV file:
Then run the example function in csv_test.go, which produces the following results.
The reader stops because csv.Reader returns an error when it parses an empty string as a float64 (the error message is
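The failure mode described above can be reproduced outside Arrow. The following is a minimal sketch, assuming the reader converts each cell of a float64 column with `strconv.ParseFloat`; `parseCell` is a hypothetical helper for illustration and not a function from the Arrow code base.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseCell is a hypothetical stand-in for the per-cell conversion of a
// float64 column. Assumption: the conversion goes through strconv.ParseFloat.
func parseCell(cell string) (float64, error) {
	return strconv.ParseFloat(cell, 64)
}

func main() {
	// The second cell is empty, as in a CSV row with a missing value.
	for _, cell := range []string{"1.5", ""} {
		v, err := parseCell(cell)
		if err != nil {
			// An empty string is not a valid float64 literal, so this branch runs.
			fmt.Printf("cell %q: error: %v\n", cell, err)
			continue
		}
		fmt.Printf("cell %q: %v\n", cell, v)
	}
}
```

Any reader that propagates this error unchanged will stop at the first empty cell in a numeric column.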
@@            Coverage Diff            @@
##           master    #3129     +/-   ##
==========================================
+ Coverage   87.02%   87.02%   +<.01%
==========================================
  Files         495      495
  Lines       69679    69686       +7
==========================================
+ Hits        60640    60647       +7
  Misses       8942     8942
  Partials       97       97
Dec 10, 2018
I've always disliked the automatic, by default, handling of missing values of
so I am personally a bit reluctant to have the same behaviour in Go-Arrow, although, here, it's easier to figure out this was a missing value and not a valid zero value.
this behaviour could be activated with a
yes. I think I am asking for the default behaviour of the CSV reader to be to fail early and loudly when encountering missing values.
I'd argue that dataset cleaning shouldn't be coupled to or baked into the CSV reader.
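The "fail early and loudly by default, accept nulls only on request" position can be sketched as follows. This is an illustration under stated assumptions, not the reader's actual API: `parseFloatCell`, its `allowNulls` parameter, and `errMissingValue` are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// errMissingValue is a hypothetical sentinel error for empty cells.
var errMissingValue = errors.New("csv: missing value")

// parseFloatCell is a hypothetical helper. It returns the parsed value and
// whether the cell held a value at all. With allowNulls set to false (the
// strict default argued for above), an empty cell is an error.
func parseFloatCell(cell string, allowNulls bool) (float64, bool, error) {
	if cell == "" {
		if allowNulls {
			return 0, false, nil // null: no value, no error
		}
		return 0, false, errMissingValue
	}
	v, err := strconv.ParseFloat(cell, 64)
	if err != nil {
		return 0, false, err
	}
	return v, true, nil
}

func main() {
	_, _, err := parseFloatCell("", false)
	fmt.Println("strict:", err)

	_, valid, err := parseFloatCell("", true)
	fmt.Println("lenient:", valid, err)
}
```

Returning a separate validity flag keeps a null distinguishable from a legitimate zero value, which is the concern raised earlier in this thread.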
For the record, the C++ CSV reader automatically recognizes null values in most data types. That includes empty values, but also a bunch of conventional "null" notations listed here:
It may be better to have similar characteristics from one Arrow implementation to another.
It would not be unreasonable to recognize a conservative list of null markers by default, like "null" and "NULL". There is also the question of empty cells in columns that contain numeric data (with string columns, you probably want to distinguish an empty string from a null).
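A conservative null check along these lines might look like the sketch below. The names `nullMarkers` and `isNull` are hypothetical, the marker set holds only the two spellings mentioned above rather than the C++ reader's list, and treating an empty cell as null only in non-string columns is one possible answer to the open question, not a decided behaviour.

```go
package main

import "fmt"

// nullMarkers is a deliberately small, illustrative set of null spellings.
// It is not the list used by the C++ reader.
var nullMarkers = map[string]bool{
	"null": true,
	"NULL": true,
}

// isNull is a hypothetical helper that reports whether a cell should be read
// as null. An empty cell counts as null only when the column is not a string
// column, so that string columns can still hold an empty string.
func isNull(cell string, stringColumn bool) bool {
	if cell == "" {
		return !stringColumn
	}
	return nullMarkers[cell]
}

func main() {
	fmt.Println(isNull("", false))    // empty cell in a numeric column
	fmt.Println(isNull("", true))     // empty cell in a string column
	fmt.Println(isNull("NULL", true)) // explicit marker in a string column
	fmt.Println(isNull("0", false))   // a valid zero value
}
```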