astropy.io.ascii does not read pandas csv file correctly #6694

Open
astrofrog opened this issue Oct 9, 2017 · 6 comments

@astrofrog
Member

astrofrog commented Oct 9, 2017

Pandas by default will write CSV files where the header for the first column is missing:

In [4]: from pandas import DataFrame

In [5]: df = DataFrame()

In [6]: df['a'] = [1,2,3]

In [7]: df.to_csv('test.csv')

In [8]: %more test.csv
,a
0,1
1,2
2,3

Whether or not this is sensible is debatable, but this means there are a lot of CSV files in the wild missing the first header column name. Astropy doesn't read these in correctly though:

In [10]: from astropy.io.ascii import read

In [11]: read('test.csv')
Out[11]: 
<Table masked=True length=4>
 col1 col2
int64 str1
----- ----
   --    a
    0    1
    1    2
    2    3

I think we might want to special case this, or deal better with cases like this given how common these kinds of files are going to be.

@MSeifert04
Contributor

Interesting question. The first column is the index, so it doesn't make sense to read it in as a "normal" column. On the other hand, that raises the question of whether pandas users should write the index column to the CSV if they want to use the file in other programs...

@drdavella
Contributor

drdavella commented Oct 9, 2017

Doesn't the leading comma in the header row indicate that the first "column" is really the index? This means there should be a reliable way to detect this case. If instead you wrote the same file with df.to_csv('test.csv', index=False), then you should just see

In [8]: %more test.csv
a
1
2
3
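
A minimal sketch of the detection idea above, using only the standard library rather than Astropy's own reader. The helper name `name_index_column` and the default label `"index"` are hypothetical, and the sketch assumes a comma delimiter and a header that fits on the first line:

```python
import csv
import io


def name_index_column(csv_text, name="index"):
    """Label the first header field if it is empty (pandas' default output).

    Text whose first header field already has a name is returned unchanged.
    """
    lines = csv_text.splitlines(keepends=True)
    if not lines:
        return csv_text

    # Parse only the header line so quoting is handled by the csv module.
    header = next(csv.reader([lines[0]]), [])
    if len(header) < 2 or header[0] != "":
        return csv_text

    # Rewrite the header with the placeholder name and keep the data rows as-is.
    header[0] = name
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(header)
    return out.getvalue() + "".join(lines[1:])


pandas_style = ",a\n0,1\n1,2\n2,3\n"
plain = "a\n1\n2\n3\n"

patched = name_index_column(pandas_style)
untouched = name_index_column(plain)
```

Here `patched` starts with the header `index,a`, while `untouched` is identical to `plain`. Whether a reader should keep the index as a named column or drop it entirely is a separate design choice that this sketch does not settle.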

@pllim
Member

pllim commented Oct 9, 2017

At the same time, should pandas fix this on their side too?

@drdavella
Contributor

drdavella commented Oct 9, 2017

@pllim, I don't think it's a bug. I think it's the way that pandas indicates that the first column in a csv is an index, not a real data column. It's possible to give the index column a name, but I believe it's None by default. See the docs here:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
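
For reference, a short sketch of the two `to_csv` options mentioned in this thread, `index=False` and `index_label`. The label `"index"` is an arbitrary choice for illustration, and calling `to_csv()` without a path returns the CSV text as a string:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Default behaviour: the index is written and its header field is left empty.
default_csv = df.to_csv()

# Option 1: do not write the index column at all.
no_index_csv = df.to_csv(index=False)

# Option 2: keep the index but give its header field a name.
labeled_csv = df.to_csv(index_label="index")

print(default_csv.splitlines()[0])
print(no_index_csv.splitlines()[0])
print(labeled_csv.splitlines()[0])
```

Either option is a change on the writing side only; files already written with the default settings still have the empty first header field.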

@astrofrog
Member Author

I've already seen examples in the wild where people are trying to read in these files with Astropy and failing. I know it's frustrating, but I do think we should support this 'format'.

@drdavella
Contributor

I could volunteer to look into this since I think I'd probably learn something new/useful. But if someone else already has a handle on a fix, that's okay too.
