ARROW-14644 [R] open_dataset doesn't ignore BOM in csv file #11871
I've found the particular line where things are going wrong, but I'm not sure how to fix it. Column "a" isn't being added to the convert options because it is somehow triggering the condition on line 120 (arrow/cpp/src/arrow/dataset/file_csv.cc, lines 117 to 120 in 9cf4275).
I'm pretty baffled as it seems like the field name is clearly in the column set. Here's my debugger output from a breakpoint at line 120:
(I altered the test so it just contains column "a" to make it easier to grab the value in the set; I get similar results with the original test that has both columns.) I'm fairly sure this is the root problem because when I run
Found it: Looks like
@dragosmg Would you be willing to give me permission to push to your repo? I have a fix in C++ I can push to this branch.
This draft PR only adds a failing unit test for the failure to skip BOMs when reading a CSV file with open_dataset().
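For readers unfamiliar with the failure mode, here is a minimal, standalone sketch of how an unskipped UTF-8 BOM could make a column-name lookup miss. It assumes the BOM bytes end up attached to the first header name, which the comments in this thread do not spell out. The helpers `StripUtf8Bom` and `HasColumn` are hypothetical names for illustration only; they are not Arrow APIs and this is not the fix that was pushed to the branch.

```cpp
#include <string>
#include <string_view>
#include <unordered_set>

// UTF-8 encoding of U+FEFF, the byte order mark some tools write at the
// start of a CSV file.
constexpr std::string_view kUtf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper: return `s` without a leading UTF-8 BOM, if present.
std::string_view StripUtf8Bom(std::string_view s) {
  // substr clamps the count, so inputs shorter than the BOM are safe.
  if (s.substr(0, kUtf8Bom.size()) == kUtf8Bom) {
    s.remove_prefix(kUtf8Bom.size());
  }
  return s;
}

// Hypothetical helper: is `name` one of the header names read from the file?
bool HasColumn(const std::unordered_set<std::string>& columns,
               std::string_view name) {
  return columns.count(std::string(name)) > 0;
}
```

With a header set built from the raw first line, the first name would be the three BOM bytes followed by `a`, so `HasColumn(columns, "a")` returns false even though a debugger may display the entry as `a`. Passing each header name through `StripUtf8Bom` before inserting it makes the same lookup succeed.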