Dataframe converter #9

Merged
merged 9 commits into from
Feb 13, 2024
@alxmrs (Owner) commented Feb 11, 2024

Warning: this is an untested sketch!

This change provides a few humble functions to try to adapt the Xarray model to Dask's dataframe model. The conversion is more or less an `itertools.product` and index operation: take the product of the dimension indices, then index into the data. The translation to dataframes honors Xarray's chunks. I've copied weather-tools' `ichunked` function just in case we need that layer of chunking of iterables (it's not used now).
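The product-and-index idea can be sketched with the standard library alone. This is a hypothetical illustration, not the code in this PR: `block_slices` and `block_to_rows` are made-up names, plain nested lists stand in for an Xarray variable, and `chunks` is assumed to be shaped like Xarray's `Dataset.chunks` (a mapping of dimension name to chunk sizes). Every block is enumerated with `itertools.product` over per-dimension slices, and each block is then flattened into rows with a second product over the indices inside it, so one chunk becomes one partition.

```python
import itertools
from typing import Any, Dict, Iterator, List, Sequence


def block_slices(chunks: Dict[str, Sequence[int]]) -> Iterator[Dict[str, slice]]:
    """Yield one {dim: slice} mapping per block, given per-dimension chunk sizes."""
    per_dim = {}
    for dim, sizes in chunks.items():
        # Chunk sizes (2, 1) become boundaries [0, 2, 3], i.e. slices 0:2 and 2:3.
        bounds = list(itertools.accumulate(sizes, initial=0))
        per_dim[dim] = [slice(a, b) for a, b in zip(bounds, bounds[1:])]
    # The cartesian product of per-dimension slices enumerates every block.
    for combo in itertools.product(*per_dim.values()):
        yield dict(zip(per_dim, combo))


def block_to_rows(
    coords: Dict[str, Sequence[Any]], data: Any, sel: Dict[str, slice]
) -> Iterator[Dict[str, Any]]:
    """Flatten one block into rows: one row per combination of coordinate indices."""
    dims = list(coords)
    ranges = [range(sel[d].start, sel[d].stop) for d in dims]
    for idx in itertools.product(*ranges):
        # Index into the nested-list "array" one dimension at a time.
        value = data
        for i in idx:
            value = value[i]
        row = {d: coords[d][i] for d, i in zip(dims, idx)}
        row["value"] = value
        yield row


# Toy 3x2 grid; dimension order in `coords`, `data`, and `chunks` must match.
coords = {"lat": [10.0, 20.0, 30.0], "lon": [0.0, 90.0]}
data = [[1, 2], [3, 4], [5, 6]]
chunks = {"lat": (2, 1), "lon": (2,)}

# One list of rows per block; each list would become one dataframe partition.
partitions: List[List[Dict[str, Any]]] = [
    list(block_to_rows(coords, data, sel)) for sel in block_slices(chunks)
]
print([len(p) for p in partitions])  # [4, 2]
```

With a real Dataset, each `sel` mapping could presumably be handed to `ds.isel(sel)` and the resulting block converted with `to_dataframe()`, which keeps the one-chunk-per-partition correspondence without materializing the whole array.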

There are a few next steps. From here, we can write unit tests to prove out the conversion to a Dask DataFrame. Then we can add dask-sql to this module and see how it works on real SQL queries. I'm pretty sure that before applying `unravel` to `form_map`, we'll need to convert the output to a pandas DataFrame.

@alxmrs (Owner, Author) commented Feb 11, 2024

CC: @mahrsee1997 -- Feel free to contribute to this branch if this is interesting to you!

It lacks support for many dataframe features, and it isn't performant. But at a base level, we can convert Xarray datasets to dataframes. Possible next improvements:
1. Compute slices late (lazily).
2. Add `divisions` metadata (via an integer index).
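One possible reading of the two items above, sketched in plain Python under stated assumptions: "compute slices late" as producing per-block row ranges lazily with generators instead of materializing them up front, and "divisions via an integer index" as numbering rows 0..N-1 across blocks in order, so each partition's first row number is known without computing any data. `rows_per_block`, `row_ranges`, and `divisions` are hypothetical helpers; `chunks` is assumed to map dimension names to chunk sizes (like Xarray's `Dataset.chunks`), and blocks are assumed to be visited in the row-major order that `itertools.product` produces. Dask expects divisions as each partition's first index value plus the last index value of the final partition, which is the kind of tuple `dask.dataframe.from_delayed(..., divisions=...)` accepts.

```python
import itertools
import math
from typing import Dict, Iterator, Sequence, Tuple


def rows_per_block(chunks: Dict[str, Sequence[int]]) -> Iterator[int]:
    """Lazily yield the number of rows each block will produce."""
    # A block's row count is the product of its per-dimension chunk sizes.
    for sizes in itertools.product(*chunks.values()):
        yield math.prod(sizes)


def row_ranges(chunks: Dict[str, Sequence[int]]) -> Iterator[Tuple[int, int]]:
    """Lazily yield each block's [start, stop) range in a global integer index."""
    start = 0
    for n in rows_per_block(chunks):
        yield start, start + n
        start += n


def divisions(chunks: Dict[str, Sequence[int]]) -> Tuple[int, ...]:
    """Dask-style divisions: each partition's first index, plus the last index overall."""
    ranges = list(row_ranges(chunks))
    return tuple(start for start, _ in ranges) + (ranges[-1][1] - 1,)


chunks = {"lat": (2, 1), "lon": (2,)}
print(list(row_ranges(chunks)))  # [(0, 4), (4, 6)]
print(divisions(chunks))  # (0, 4, 5)
```

Because the index is a simple running count, divisions fall out of chunk sizes alone; no block needs to be loaded to know where each partition starts.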
@alxmrs changed the title from "[Draft] An untested POC for #8." to "[Draft] An POC for #8." on Feb 13, 2024
@alxmrs changed the title from "[Draft] An POC for #8." to "[Draft] A POC for #8." on Feb 13, 2024
@alxmrs changed the title from "[Draft] A POC for #8." to "Dataframe converter" on Feb 13, 2024
@alxmrs merged commit c75c10b into main on Feb 13, 2024
@alxmrs deleted the product branch on February 13, 2024 at 09:25