Reading and writing Excel spreadsheets in the unified table interface #3744
Comments
Pandas has a |
Pandas actually relies on a package called |
On the other hand, they've done a lot of the hard work. Check this out:
So should be pretty easy to make a custom reader with that? On the other hand, writing might be harder. |
I would have said that writing is easier because in writing you get to pick the conventions (empty line below the column names? Units?), in reading you have to assume that things are formatted as you think they are. openpyxl and xlrd are closely related: http://www.python-excel.org/ |
I have not looked at the pandas code, but I just needed to read an Excel table and could do so in ~10 lines of code, so this should not be hard. I would not think that we save much by going via pandas, but I'll leave that for the author of a PR to decide. |
I've used |
On the other hand if we can easily avoid the indirection that's fine too. |
If this has not been solved, I would like to work on it. |
@sunilk747 the issue is still open, please go ahead and let us know if you need help at any point! |
@hamogu I have read the comments above. From what I understand, the task here is to add a method that reads an Excel sheet and creates a table from it. Correct me if I am wrong. |
@sunilk747 - that's correct, and as we discussed above, you can do this very easily by using pandas, which already does the hard work of calling |
@astrofrog the |
@sunilk747 - |
@astrofrog From what I understand, I came up with this code segment. Please have a look and tell me if I am going wrong.
Here the path variable holds the path to the excel sheet. |
@sunilk747 The best way to find out if it works, is to try it out. Let us know if you get stuck somewhere. We try to be helpful, but we are all contributing to astropy in our free time, so please try to go as far as you can yourself and then we'll review, comment and suggest improvements and try to help you out to solve any problems. |
With the improved interface to |
Alternatively one idea is to add a pandas 'reader', e.g. |
This is done in #8381, so adding support for excel is likely quite easy. |
tl;dr: look at #8381 and add support for the pandas.read_excel function and tests.
The use of spreadsheet programs is not uncommon in astronomy. We, as a programming community, sometimes tend to think of spreadsheet programs as an inferior way to view and manipulate data, but in fact astropy.table.Table by now (or after open PRs are merged) supports many features that are either directly borrowed from spreadsheet editors or can at least be found in them as well: output values can be formatted for display in a particular way, columns can be hidden from display, columns can have a specific dtype (e.g. a time in mixin columns), columns can be calculated automatically based on values from other columns...
Given how many people use spreadsheets and how often I get xlsx files from collaborators by email, it does not seem unreasonable to provide an xlsx reader and writer for the astropy unified table reader and writer interface, like so:
tab = astropy.table.Table.read('mysheet.xlsx')
Note that #1562 modified the csv reader so that io.ascii can read csv files written by Microsoft Excel and similar spreadsheet programs, but more could be done if we could read those files directly. In particular, dates and times (difficult when exporting to a csv first) and the column formatting could be preserved (bonus points for hidden columns).
Obviously, reading and writing a spreadsheet as a table requires some assumptions which need to be documented, e.g. column headers are in the first row, units (if any) are in the second row, the formatting of numbers is the same for all numbers in a column, the data is read from the active sheet if no sheet name is given, etc. Such a reader and writer will never be feature complete (e.g. xlsx files can contain plots and graphics).
Yet, many tables conform to "reasonable" formatting.
Using openpyxl (https://openpyxl.readthedocs.org/en/latest/) as a dependency, this should not be too hard to implement for a set of basic features.
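To make the layout assumptions concrete, here is a minimal sketch of the row handling such a reader might do. It is not an existing astropy or openpyxl function: the name rows_to_columns and the has_units_row parameter are hypothetical, and the convention (names in the first row, optional units in the second, fully empty rows skipped) is just the one suggested in this issue. It works on plain row tuples, so it only uses the standard library.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple


def rows_to_columns(
    rows: Iterable[Sequence[Any]], has_units_row: bool = True
) -> Tuple[List[str], Dict[str, Optional[str]], Dict[str, List[Any]]]:
    """Split spreadsheet rows into column names, units, and column data.

    Assumed layout (a convention from this issue, not enforced by xlsx):
    row 1 holds column names, row 2 optionally holds units, the rest is data.
    """
    row_iter = iter(rows)
    header = next(row_iter, None)
    if header is None:
        raise ValueError("sheet is empty: no header row found")
    if any(cell is None for cell in header):
        raise ValueError("header row contains empty cells")
    names = [str(cell) for cell in header]
    if len(set(names)) != len(names):
        raise ValueError("column names must be unique")

    # Units default to None; an empty unit cell also means "no unit".
    units: Dict[str, Optional[str]] = {name: None for name in names}
    if has_units_row:
        unit_row = next(row_iter, None)
        if unit_row is not None:
            for name, unit in zip(names, unit_row):
                units[name] = None if unit in (None, "") else str(unit)

    columns: Dict[str, List[Any]] = {name: [] for name in names}
    for row in row_iter:
        # Skip rows in which every cell is empty.
        if all(cell is None for cell in row):
            continue
        if len(row) != len(names):
            raise ValueError("row length does not match the header")
        for name, cell in zip(names, row):
            columns[name].append(cell)
    return names, units, columns


if __name__ == "__main__":
    # In-memory stand-in for the rows of a worksheet.
    example_rows = [("time", "flux"), ("s", None), (1.0, 10.5), (2.0, 11.0)]
    print(rows_to_columns(example_rows))
```

With openpyxl, the rows could come from openpyxl.load_workbook(path, read_only=True, data_only=True).active.iter_rows(values_only=True), and the resulting dictionary of columns could be passed to astropy.table.Table, with units set per column afterwards. Keeping the convention handling separate from the file access makes it easy to test and to document.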
In my opinion, this has a low priority.