Potential contribution: Add xarray backends for fgout and fgmax files? #598

Open
kbarnhart opened this issue Apr 9, 2024 · 5 comments

@kbarnhart

I have a few postprocessing workflows, built for the old D-Claw, that assume fgout- or fgmax-like files can be read with xarray (I had built some conversion tools for this). Rather than rebuild the conversion tools, I've written an xarray BackendEntrypoint so that I can open fgout or fgmax files with xarray.

For example, the following yields an xarray dataset (with coordinate system information correctly assigned) from an fgmax file.

import xarray as xr
ds = xr.open_dataset("fgmax0001.txt", engine=FGMaxBackend, backend_kwargs={"epsg": epsg})

Perhaps there are others that would be interested in this interface? (or not 🤷‍♀️ ?)

@rjleveque @mandli do you want to consider these backends as a contribution to geoclaw.fgmax_tools and geoclaw.fgout_tools? I can make a PR if you are interested. However, I don't want to clutter up the library with code that may not be broadly used.

If you think these would be broadly usable, I suspect clawpack.geoclaw would be a more natural place for these backends to live rather than my in-development pre- and post-processing package for D-Claw.

Links:
https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html#rst-backend-entrypoint
https://docs.xarray.dev/en/stable/generated/xarray.backends.BackendEntrypoint.html#xarray.backends.BackendEntrypoint

@mandli
Member

mandli commented Apr 9, 2024

@kbarnhart So the backend supplements the current Python classes by importing the data directly into an xarray data structure? Does it use code that is currently available? I am not super familiar with the tools in fgmax_tools and fgout_tools, but I worry about maintainability if we duplicate functionality. Just to be clear, I am not suggesting we not use the code you have, but maybe we should switch to it or build it in more directly.

@kbarnhart
Author

@mandli - I think the best description is that the xarray backend uses the existing code to open the fixed grid (fgout) or fgmax grids and returns them in a format that xarray can use. Each of the two classes looks something like:

from xarray.backends import BackendEntrypoint

class FGOutBackend(BackendEntrypoint):
    def open_dataset(
        self,
        filename,
        drop_variables=None,
    ):
        # Infer the file format from filename (e.g., binary or ASCII).
        # Use code from fgout_tools.py to open filename as a fixed grid object.
        # Organize it into an xarray dataset with dimensions (time, y, x)
        # (fgmax lacks the time dimension).
        # Optionally assign a coordinate system.
        return dataset

I don't think this duplicates anything because I didn't rewrite anything related to the core file I/O. But because I don't know how widely usable it is, it is not clear to me whether this makes sense to maintain within the geoclaw repository or whether it should live with my external tools.
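As an illustration only, the skeleton above could be fleshed out roughly as follows. This is a minimal sketch, not the proposed implementation: `read_fixed_grid` is a hypothetical stand-in that loads arrays from a `.npz` file in place of the real fgout_tools reader, the class name `FGOutSketchBackend` is made up, and storing the EPSG code in `attrs` is just one assumed way to "assign a coordinate system".

```python
import numpy as np
import xarray as xr
from xarray.backends import BackendEntrypoint


def read_fixed_grid(filename):
    """Hypothetical stand-in for the fgout_tools reader.

    Loads named arrays from a .npz file; a real backend would call the
    existing GeoClaw file I/O here instead.
    """
    with np.load(filename) as data:
        return {name: data[name] for name in data.files}


class FGOutSketchBackend(BackendEntrypoint):
    description = "Sketch of a fixed-grid backend (not the real GeoClaw reader)"
    open_dataset_parameters = ("filename_or_obj", "drop_variables", "epsg")

    def open_dataset(self, filename_or_obj, *, drop_variables=None, epsg=None):
        grid = read_fixed_grid(filename_or_obj)

        # Organize the arrays into a dataset with dimensions (time, y, x).
        coords = {"time": grid["time"], "y": grid["y"], "x": grid["x"]}
        data_vars = {
            name: (("time", "y", "x"), grid[name])
            for name in ("h", "hu", "hv", "eta")
            if name in grid
        }
        ds = xr.Dataset(data_vars, coords=coords)

        # Assumption: record the coordinate system as a plain attribute.
        if epsg is not None:
            ds.attrs["epsg"] = int(epsg)

        if drop_variables:
            ds = ds.drop_vars(drop_variables, errors="ignore")
        return ds
```

With a class like this, `xr.open_dataset(path, engine=FGOutSketchBackend, backend_kwargs={"epsg": 32610})` forwards `epsg` to `open_dataset`, which mirrors the call pattern shown in the first comment.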

To help explain my use case: I've been working on a tool that analyzes the energetics of landslide material and water. At its core, it expects gridded data with a known dx and dy that can be opened by xarray and has variables named h, hu, hv, eta, etc.

I wanted this tool to be usable (by me) for the output of other models, so I didn't want to write it around the fixed grid format. Instead, I wrote it to expect a NetCDF file with a specific set of variables.
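To make the expected input concrete, here is a hedged sketch of the kind of check and calculation such a tool might perform on a dataset with h, hu, hv, and eta on a uniform grid. It is not the actual tool: the function names are hypothetical, and a single constant density `rho` is assumed for simplicity (D-Claw mixtures would generally need a variable density). The only physics used is the depth-averaged kinetic energy per unit area, 0.5 * rho * (hu^2 + hv^2) / h.

```python
import numpy as np
import xarray as xr


def check_grid(ds):
    """Return (dx, dy) after confirming required variables and uniform spacing."""
    missing = [name for name in ("h", "hu", "hv", "eta") if name not in ds]
    if missing:
        raise ValueError(f"missing variables: {missing}")
    dx = np.diff(ds["x"].values)
    dy = np.diff(ds["y"].values)
    if not (np.allclose(dx, dx[0]) and np.allclose(dy, dy[0])):
        raise ValueError("grid spacing is not uniform")
    return float(dx[0]), float(dy[0])


def kinetic_energy(ds, rho=1000.0, dry_tolerance=1e-3):
    """Total kinetic energy over the grid, one value per time step.

    Assumes a constant density rho (kg/m^3); cells shallower than
    dry_tolerance are treated as dry and contribute zero.
    """
    dx, dy = check_grid(ds)
    h = ds["h"]
    wet = h > dry_tolerance
    # Masking h to NaN in dry cells avoids dividing by zero there.
    ke_per_area = 0.5 * rho * (ds["hu"] ** 2 + ds["hv"] ** 2) / h.where(wet)
    cell_area = abs(dx * dy)
    return (ke_per_area.fillna(0.0) * cell_area).sum(dim=("y", "x"))
```

Because the functions only rely on variable names and coordinates, they would work the same way on a dataset opened from NetCDF or from a custom backend.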

@mandli
Member

mandli commented Apr 10, 2024

Sounds perfect! We sometimes do this with other data structures and pandas data frames, which can be useful. The other big class that in theory would use pandas is GaugeSolution, but upon further inspection it looks like that was never fully implemented. I know that we also use pandas for some of the storm surge stuff, but that's a bit different.

I am wondering, then, what the right interface for this would be: should xarray hold the original data for the class, or should the class hand back an xarray object when requested?
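One of the two options raised here, a class that keeps its own arrays and hands back an xarray object on request, might look like the sketch below. Everything in it is an assumption for illustration: `FixedGridFrame` and its fields are hypothetical and are not the existing fgout_tools classes; the `to_xarray` name simply follows the convention pandas uses.

```python
from dataclasses import dataclass

import numpy as np
import xarray as xr


@dataclass
class FixedGridFrame:
    """Hypothetical container for one output frame (not the real GeoClaw class)."""

    x: np.ndarray  # 1D x coordinates
    y: np.ndarray  # 1D y coordinates
    t: float       # output time of this frame
    h: np.ndarray  # depth, shape (len(y), len(x))

    def to_xarray(self):
        """Build an xarray Dataset on request; the class keeps its NumPy arrays."""
        return xr.Dataset(
            {"h": (("time", "y", "x"), self.h[np.newaxis, :, :])},
            coords={"time": [self.t], "y": self.y, "x": self.x},
        )
```

In this design xarray stays an optional, on-demand view; the alternative would be to store a Dataset as the class's primary data.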

@rjleveque
Member

Sounds very useful as something to add to geoclaw. The one other place I know of where xarray is currently supported is in topotools.read_netcdf where a topo DEM can be returned as an xarray. It seems like in the long run we should support and use xarray much more broadly.

@kbarnhart
Author

Thanks for this feedback, @mandli and @rjleveque. I'll aim to make a PR in the next week or so.
