Converting a cube to a Pandas dataframe #4526
Comments
Thanks for the insight @kaedonkers! There isn't much Pandas knowledge within team Iris, so if you think this would be valuable: could you, or someone else knowledgeable, put up a PR?
Thanks @trexfeathers - yes, it's something I think I can put up a PR for, but I'm a bit short on time at the moment. Once I've finished the project I'm currently on, I'll take a stab at it.
If you know of any Pandas users in the Iris community who would want in on this, then please let them know! I'm no Pandas power user, but others may be.
@kaedonkers I've started looking at this... #4669 is a start on being more agnostic about the number of cube dimensions...
Good further discussion with @trexfeathers today. A rough summary of our discussion:
In line with general Iris testing philosophy, the majority of the testing should be 'atomic' unit tests that have maximum focus and minimal run time (within reason - no need to get wrapped up with mocking). The round-trip tests you describe make perfect integration tests, since doing a full object equality has a wider focus and longer run time than unit tests, and we could have one going. The current testing module was written before we started the unit/integration distinction:
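The round-trip integration test described above might be sketched as below. Everything here is a hedged illustration - the helper functions and the dict-of-arrays stand-in for a cube are hypothetical, not Iris's real test code or API:

```python
# Sketch of a round-trip integration test: convert a cube-like object to a
# long DataFrame and back, then check full equality of the pieces.
# All names are hypothetical stand-ins, not Iris's actual API.
import numpy as np
import pandas as pd

def cube_to_long_frame(name, data, coords):
    # One row per cell: MultiIndex over the dim coords, one data column.
    index = pd.MultiIndex.from_product(
        list(coords.values()), names=list(coords.keys()))
    return pd.DataFrame({name: data.ravel()}, index=index)

def long_frame_to_cube(df):
    # Invert the conversion: recover the name, data array, and dim coords.
    name = df.columns[0]
    coords = {n: df.index.get_level_values(n).unique().to_numpy()
              for n in df.index.names}
    shape = tuple(len(v) for v in coords.values())
    return name, df[name].to_numpy().reshape(shape), coords

def test_round_trip():
    coords = {"time": np.arange(3), "latitude": np.arange(4)}
    data = np.arange(12.0).reshape(3, 4)
    name, data2, coords2 = long_frame_to_cube(
        cube_to_long_frame("air_temperature", data, coords))
    assert name == "air_temperature"
    np.testing.assert_array_equal(data, data2)
    for key in coords:
        np.testing.assert_array_equal(coords[key], coords2[key])

test_round_trip()
```

The wide focus is the point: any drift in either direction of the conversion surfaces as an equality failure, without needing to mock internals.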
Some new thoughts on this matter: #4669 (review)
Dask DataFrame doesn't yet support multi-indexing (dask/dask#1493), so that's probably a non-starter for both of us? It seems there is plenty of Pandas functionality that doesn't work with a Dask DataFrame.
@trexfeathers @hsteptoe Thank you both for following this up! Progress looks good - do you need any input from me? An example, perhaps? Something I forgot to include in the original feature request is that Xarray and Pandas have this functionality in the form of
Thanks @kaedonkers 😊 @hsteptoe has included a slew of useful examples in #4669, so you can rest easy. I'm just trying to find time to review it as soon as I can!
Closed by #5074
✨ Feature Request
Return a "long" DataFrame which retains all the metadata of the cube (either by default or via the kwarg `table="long"`).

Motivation
Currently `iris.pandas.as_data_frame` turns a 2D cube (a restriction not stated in the documentation) into a "pivot table"/"wide table" DataFrame, with the values of one former dim coord as the column names and the values of the other as the index. The values of the data array at the centre of the cube become the table values.

I would argue that this result is unexpected for Pandas users, not particularly useful, and loses a lot of metadata in the process.
Feature list

To keep track of ideas as I find them:

- Avoid `cube.coord(dimensions=[0/1])`, as this throws unnecessary errors related to the presence of AuxCoords

Proposed change
A better default behaviour would be to generate a "long" table in which all the coord values from the cube sit in separate columns of the DataFrame, each column named after its coord. The data values would be in another column named after the cube. Attributes could also be included as their own columns for maximum metadata retention, although this might want a toggle kwarg, as it could clutter the resulting DataFrame.
This would also allow the conversion to handle more than 2D data (which should really be added to the current documentation as a requirement).
For example:
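The original example did not survive the formatting here, so the sketch below shows what a long-format result might look like. The helper `as_long_frame` and all variable names are hypothetical illustrations of the proposal, not the API that was eventually merged:

```python
# Hypothetical sketch of the proposed "long" table: one column per coord,
# plus a data column named after the cube. Because it builds the index from
# a product of coord values, it works for any number of dimensions.
import numpy as np
import pandas as pd

def as_long_frame(name, data, coords):
    # The Cartesian product of the dim coord values, in C (row-major)
    # order, lines up with data.ravel(); reset_index then turns each
    # coord into its own column.
    index = pd.MultiIndex.from_product(
        list(coords.values()), names=list(coords.keys()))
    return pd.DataFrame({name: data.ravel()}, index=index).reset_index()

coords = {"time": [0, 1], "longitude": [10.0, 20.0, 30.0]}
long_df = as_long_frame("air_temperature",
                        np.arange(6.0).reshape(2, 3), coords)
# long_df has columns ["time", "longitude", "air_temperature"] and one
# row per cell of the original 2x3 array (6 rows in total).
```

Unlike the wide table, every column here carries a name taken from the cube, so the coord names survive the conversion alongside the values.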