read_csv behavior with blank lines differs between CSV delimiters #2883

Closed
ultrabug opened this issue Oct 31, 2018 · 5 comments

Comments

@ultrabug

Hi,

I was experimenting with pyarrow.csv.read_csv and ran into some strange behavior, and I'm not sure it is expected.

Parsing fails when the CSV file uses a comma as its delimiter and there is a blank line right after the header (see the basic_with_blank.csv example).

Example output:

Traceback (most recent call last):
  File "sorrow.py", line 14, in <module>
    table = pa_csv.read_csv(csv)
  File "pyarrow/_csv.pyx", line 198, in pyarrow._csv.read_csv
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: CSV parse error: Expected 2 columns, got 1
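
To illustrate where "Expected 2 columns, got 1" can come from, here is a minimal stdlib sketch. It is not pyarrow's actual parser, and the helper name naive_column_counts is hypothetical. If a parser splits every physical line on the delimiter without special-casing empty lines, a blank line becomes a single empty field. Its count (1) then no longer matches the header's count (2).

```python
def naive_column_counts(text, delimiter=","):
    """Return (header_count, [(line_no, count), ...]) for each data line.

    Hypothetical helper: it splits every line on the delimiter, blank
    lines included, so a blank line is counted as a 1-field row.
    """
    lines = text.splitlines()
    header_count = len(lines[0].split(delimiter))
    counts = [
        (line_no, len(line.split(delimiter)))
        for line_no, line in enumerate(lines[1:], start=2)
    ]
    return header_count, counts


sample = "a,b\n\n1,2\n3,4\n"
header_count, counts = naive_column_counts(sample)
for line_no, count in counts:
    if count != header_count:
        # Prints: line 2: expected 2 columns, got 1
        print(f"line {line_no}: expected {header_count} columns, got {count}")
```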

If I change the CSV delimiter to a semicolon, the error disappears and everything works fine!
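
One possible explanation follows. This is an assumption and is not confirmed anywhere in this thread. pyarrow.csv.read_csv splits on commas unless told otherwise through its parse options. With the default settings, a semicolon-separated file would therefore be seen as a single column. The header and the blank line would then both have one field, and no mismatch would be reported. The sketch below (hypothetical helper, stdlib only) applies the same naive line splitting to both files.

```python
def field_counts(text, delimiter=","):
    """Number of fields on each line when splitting naively on `delimiter`."""
    return [len(line.split(delimiter)) for line in text.splitlines()]


comma_file = "a,b\n\n1,2\n"
semicolon_file = "a;b\n\n1;2\n"

# Comma delimiter on the comma file: header 2 fields, blank line 1 field.
print(field_counts(comma_file))           # [2, 1, 2]
# Comma delimiter on the semicolon file: every line is a single field.
print(field_counts(semicolon_file))       # [1, 1, 1]
# Matching delimiter on the semicolon file: the blank line mismatches again.
print(field_counts(semicolon_file, ";"))  # [2, 1, 2]
```

If this is what happened, passing parse_options=pa_csv.ParseOptions(delimiter=';') to read_csv would likely make the semicolon file hit the same blank-line error in the version discussed here.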

I'm attaching Python code and CSV samples that compare the results with pandas (which does not have this problem).
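
pandas accepts the file because pandas.read_csv skips blank lines by default (skip_blank_lines=True). The following is a minimal stdlib sketch of that behavior, not pandas' implementation, and the helper name read_rows_skipping_blanks is hypothetical. Python's csv.reader yields an empty list for a blank line, so those rows can be dropped before the column counts are checked.

```python
import csv
import io


def read_rows_skipping_blanks(text, delimiter=","):
    """Parse CSV text, drop blank lines, and require a constant column count."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # csv.reader yields [] for a blank line
    header, data = rows[0], rows[1:]
    for row in data:
        if len(row) != len(header):
            raise ValueError(f"Expected {len(header)} columns, got {len(row)}")
    return header, data


print(read_rows_skipping_blanks("a,b\n\n1,2\n3,4\n"))
# (['a', 'b'], [['1', '2'], ['3', '4']])
```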

Hope this helps, thanks

csv_parse_error.zip

@wesm
Member

wesm commented Oct 31, 2018

@ultrabug this part of the project is still very alpha, so expect to run into issues like this. Thanks for reporting; can you open a JIRA issue?

@ultrabug
Author

ultrabug commented Nov 4, 2018

Hi Wes, thanks for your answer. I'll open a JIRA issue then!

I will reference the JIRA issue here and close this one right after.

@ultrabug
Author

ultrabug commented Nov 4, 2018

ultrabug closed this as completed Nov 4, 2018
@qmilangowin

Hi, has this issue really been fixed? I'm getting the same error:

  File "foo.py", line 7, in <module>
    table = pa_csv.read_csv('hacker_stories.csv')
  File "pyarrow/_csv.pyx", line 198, in pyarrow._csv.read_csv
  File "pyarrow/error.pxi", line 81, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: CSV parse error: Expected 12 columns, got 16

I'm using version 0.11.1
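
"Expected 12 columns, got 16" means that a row has more fields than the header. A blank line produces fewer fields, not more, so this is likely a different problem. One common cause is text fields that contain unquoted delimiters. That cause is an assumption here, because hacker_stories.csv is not attached. The stdlib sketch below (hypothetical helper name find_ragged_rows) lists the rows whose width differs from the header's, so the offending lines can be inspected.

```python
import csv
import io


def find_ragged_rows(text, delimiter=","):
    """Return (line_number, field_count) for each non-blank row whose width
    differs from the header's. line_number is reader.line_num, i.e. the
    physical line on which the row ends."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    bad = []
    for row in reader:
        if row and len(row) != len(header):
            bad.append((reader.line_num, len(row)))
    return bad


# The quoted comma on line 2 is fine; the unquoted one on line 3 adds a field.
sample = 'id,title\n1,"Hello, world"\n2,Hello, again\n'
print(find_ragged_rows(sample))  # [(3, 3)]
```

For a file on disk, pass it the contents of open("hacker_stories.csv", newline="").read().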

@wesm
Member

wesm commented Jan 22, 2019

Please update to 0.12.0 and let us know if it still doesn't work.
