astropy.io.ascii does not read pandas csv file correctly #6694

astrofrog · 2017-10-09T11:04:44Z

Pandas by default will write CSV files where the header for the first column is missing:

In [4]: from pandas import DataFrame

In [5]: df = DataFrame()

In [6]: df['a'] = [1,2,3]

In [7]: df.to_csv('test.csv')

In [8]: %more test.csv
,a
0,1
1,2
2,3

Whether or not this is sensible is debatable, but it means there are a lot of CSV files in the wild missing the first header column name. Astropy doesn't read these in correctly though: it treats the header row as data and falls back to auto-generated column names:

In [10]: from astropy.io.ascii import read

In [11]: read('test.csv')
Out[11]: 
<Table masked=True length=4>
 col1 col2
int64 str1
----- ----
   --    a
    0    1
    1    2
    2    3

I think we might want to special case this, or deal better with cases like this given how common these kinds of files are going to be.

MSeifert04 · 2017-10-09T11:10:20Z

Interesting question. The first column is the index, so it doesn't make sense to read it in as a "normal" column. On the other hand, that raises the question of whether pandas users should write the index column to the CSV at all if they want to use the file in other programs...

drdavella · 2017-10-09T14:38:56Z

Doesn't the leading comma in the header row indicate that the first "column" is really the index? This means there should be a reliable way to detect this case. If instead you wrote the same file with df.to_csv('test.csv', index=False), then you should just see:

In [8]: %more test.csv
a
1
2
3
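
The detection idea above can be sketched with the standard library alone. This is only an illustration of checking for an empty first header field, not what astropy does internally; `has_unnamed_index_column` is a hypothetical helper name, and the check assumes the header is the first line of the file.

```python
import csv
import io


def has_unnamed_index_column(text):
    """Return True if the first header field of the CSV text is empty.

    Hypothetical helper: an empty first header field is taken as a hint
    that pandas wrote its index without a label.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    # Require at least two fields so a blank first line is not a match.
    return header is not None and len(header) > 1 and header[0] == ""


with_index = ",a\n0,1\n1,2\n2,3\n"   # default df.to_csv() output from above
without_index = "a\n1\n2\n3\n"       # output with index=False

print(has_unnamed_index_column(with_index))
print(has_unnamed_index_column(without_index))
```

A reader could use such a check to decide whether to give the first column a placeholder name or to skip it, though which of those is the right behaviour is a separate design question.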

pllim · 2017-10-09T14:41:32Z

At the same time, should pandas fix this on their side too?

drdavella · 2017-10-09T14:46:24Z

@pllim, I don't think it's a bug. I think it's the way that pandas indicates that the first column in a csv is an index, not a real data column. It's possible to give the index column a name, but I believe it's None by default. See the docs here:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
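
For anyone writing these files, here is a short sketch of the two `to_csv` options mentioned in this thread, `index=False` and `index_label`. The label `'idx'` is an arbitrary name chosen for illustration. Calling `to_csv()` without a path returns the CSV text as a string, which makes the header easy to inspect.

```python
from pandas import DataFrame

df = DataFrame()
df['a'] = [1, 2, 3]

# Default: the index is written, and its header field is left empty.
default_csv = df.to_csv()

# Drop the index entirely.
no_index_csv = df.to_csv(index=False)

# Keep the index but give its column an explicit name ('idx' is arbitrary).
labeled_csv = df.to_csv(index_label='idx')

for name, text in [('default', default_csv),
                   ('index=False', no_index_csv),
                   ('index_label', labeled_csv)]:
    # Show only the header line of each variant.
    print(name, repr(text.splitlines()[0]))
```

Either of the last two variants produces a header row with one name per column, which should be easier for readers other than pandas to handle.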

astrofrog · 2017-10-09T14:48:27Z

I've already seen examples in the wild where people are trying to read in these files with Astropy and failing. I know it's frustrating, but I do think we should support this 'format'.

drdavella · 2017-10-09T14:51:19Z

I could volunteer to look into this since I think I'd probably learn something new/useful. But if someone else already has a handle on a fix, that's okay too.

astrofrog added the io.ascii label Oct 9, 2017

drdavella mentioned this issue Oct 12, 2017

Recognize pandas CSV files with empty index labels #6723

Closed
