
Rewrite cell/swath xarray readers as MultiFileHandlers #64

Draft · wants to merge 58 commits into master

Conversation

@claytharrison (Collaborator) commented Apr 24, 2024

This pull request aims to reimplement the reading/merging logic for swath and cell files in the structure established by MultiFileHandler/ChronFiles/etc in the file_handling module.

As of this commit, readers for cell files are implemented (RaggedArray and OrthoMulti). The most basic usage goes something like this:

from ascat.read_native.cell_collection import RaggedArrayFiles, OrthoMultiArrayFiles
contiguous_ra_source = "/path/to/contiguous/sig0_12.5/metop_a"
indexed_ra_source = "/path/to/indexed/sig0_12.5/metop_a"
multisat_ra_source = "/path/to/indexed/sig0_12.5/"
orthomulti_source = "/path/to/era5_land_2023/"
orthomulti_grid = "/path/to/era5_land_2023/grid.nc"

# Amazon chunk
# you can also query by a list of location_ids, cell numbers, or lon/lat coords
bbox = (-7, -4, -69, -65)

contiguous_ra_files = RaggedArrayFiles(contiguous_ra_source, product_id="sig0_12.5") 
indexed_ra_files = RaggedArrayFiles(indexed_ra_source, product_id="sig0_12.5")

# For now, the "all_sats" parameter just indicates that the files are nested within metop_a/metop_b/metop_c
# directories underneath the root dir. This is of course not general or ideal.
multisat_ra_files = RaggedArrayFiles(multisat_ra_source, product_id="sig0_12.5", all_sats=True)

# For OrthoMulti, you currently pass the grid file path as an argument and a pygeogrids object is generated from it.
# The product_id doesn't do anything in this case.
orthomulti_files = OrthoMultiArrayFiles(orthomulti_source, product_id="this_doesnt_matter_in_this_case", grid=orthomulti_grid)

# extract the data

contiguous_ra_ds = contiguous_ra_files.extract(bbox=bbox)
indexed_ra_ds = indexed_ra_files.extract(bbox=bbox)
# ^ these two should be the same, since contiguous RAs are converted to indexed before merging (see the quick check after this block)

multisat_ra_ds = multisat_ra_files.extract(bbox=bbox)

orthomulti_ds = orthomulti_files.extract(bbox=bbox)
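
As a quick sanity check of the claim in that last comment, something like the following could be used (assuming, as the variable names suggest, that extract() returns xarray Datasets):

import xarray as xr

# the contiguous and indexed sources should produce equivalent datasets, since
# contiguous ragged arrays are converted to indexed before merging
xr.testing.assert_identical(contiguous_ra_ds, indexed_ra_ds)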

To do:

  • Finish swath file reader (a basic version has been added; see the follow-up comment below)
  • Find a robust method of handling product-specific information (grids, etc.), including a way for users to provide it themselves. For the cell reader we only really need to pass the grid, but for the swath reader this will get more complicated.
  • Add the ability to write out according to a different cell scheme (any cell scheme)
  • Try integration with regrid applications and make sure that still works nicely
  • Improve naming
  • Add whatever else is missing compared to the old version

@claytharrison (Collaborator, Author) commented Apr 25, 2024

I added a basic swath reader, but nothing for handling specific products yet. For now, you can steal the information for a given product from xarray_io.py.

It tries to implement a spatial filter on the results of the time-based file search, to relatively quickly exclude unnecessary swath files from reading and merging. The concept was graciously stolen from a script of Pavan's. It seems to work, but I haven't done proper testing yet.
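
For anyone curious, the rough idea of the prefilter is sketched below. This is just an illustration, not the actual implementation: the helper name, the coordinate variable names ("latitude"/"longitude"), and the (latmin, latmax, lonmin, lonmax) bbox ordering are all assumptions on my part.

import xarray as xr

def swath_intersects_bbox(filename, bbox):
    """Hypothetical helper: True if a swath file contains any observations
    inside a (latmin, latmax, lonmin, lonmax) bounding box."""
    latmin, latmax, lonmin, lonmax = bbox
    # only the coordinate arrays are needed to decide whether the file can
    # contribute anything to the requested region
    with xr.open_dataset(filename) as ds:
        lats = ds["latitude"].values
        lons = ds["longitude"].values
    inside = (lats >= latmin) & (lats <= latmax) & (lons >= lonmin) & (lons <= lonmax)
    return bool(inside.any())

# applied to the time-based search results, this drops files that can't
# contribute to the requested bbox before any reading/merging happens:
# spatial_candidates = [f for f in time_candidates if swath_intersects_bbox(f, bbox)]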

Using it should go something like this:

from datetime import datetime

from ascat.read_native.swath_collection import SwathFile
from ascat.read_native.swath_collection import SwathGridFiles
from fibgrid.realization import FibGrid

swath_path = "tests/ascat_test_data/hsaf/h129/swaths"
grid = FibGrid(6.25)
sf = SwathGridFiles(
    swath_path,
    cls=SwathFile,
    fn_templ="W_IT-HSAF-ROME,SAT,SSM-ASCAT-METOP{sat}-6.25-H129_C_LIIB_{date}_{placeholder}_{placeholder1}____.nc",
    sf_templ={"year_folder": "{year}"},
    grid=grid,
    fn_read_fmt=lambda timestamp: {
        "date": timestamp.strftime("%Y%m%d*"),
        "sat": "[ABC]",
        "placeholder": "*",
        "placeholder1": "*"
    },
    sf_read_fmt=lambda timestamp: {
        "year_folder": {
            "year": f"{timestamp.year}"
        },
    },
)
# list the swath files found for the given period
files = sf.search_period(
    datetime(2021, 1, 15),
    datetime(2021, 1, 30),
    date_field_fmt="%Y%m%d%H%M%S"
)

bbox = (-90, -4, -70, 20)

# read and merge all swaths in the period that intersect the bounding box
merged_ds = sf.extract(
    datetime(2021, 1, 15),
    datetime(2021, 1, 30),
    bbox=bbox,
    date_field_fmt="%Y%m%d%H%M%S"
)
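
From there it's standard xarray (assuming extract() returns an xarray Dataset, as the variable name suggests); for example, the merged subset can be inspected and written back out (the filename below is just a placeholder):

# inspect and persist the merged, spatially subset result
print(merged_ds)
merged_ds.to_netcdf("h129_20210115_20210130_subset.nc")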
