Support Swath Data in CF #269

Open
ajelenak opened this issue May 14, 2020 · 19 comments

@ajelenak

ajelenak commented May 14, 2020

Title

Include Swath Data Encodings in the CF Document

Moderator

@erget

Moderator Status Review [last updated: 2020/05/15]

Awaiting a PR implementing the text changes to the Conventions. As stated below, this proposal has been reviewed

  • at least twice by the CF Community at community meetings
  • by CGMS WGI
  • possibly by other groups as well.

Thus the general approach seems good, but of course the final text will need thorough review.

Requirement Summary

This proposal was presented at several past CF workshops during the course of its development. It has also been vetted by a number of subject matter experts. Given that it does not require any change to the data model or to the conventions' current text, it would probably fit best as a new appendix, similar to the current Appendix H for Discrete Sampling Geometries.

Technical Proposal Summary

Earth Science swath data originates as electromagnetic radiation collected from a specific direction into a solid angle and then measured at a number of electromagnetic spectrum intervals. The combination of the direction, the solid angle, and the instrument data acquisition settings defines one observation. At any given instant an instrument sweeps over an area of the Earth while its platform (the object carrying the instrument) moves. Successive observations are usually combined to cover a larger portion of the Earth. When these successive observations are plotted on maps they appear to cover a swath on the Earth’s surface, hence the name for this type of data. The proposed encodings are independent of the observation method and are applicable to swath data acquired by instruments on satellites, airplanes, or unmanned aerial vehicles (UAVs).

Benefits

All providers and users of remotely sensed geoscience data from satellites, airplanes, or UAVs.

Status Quo

Swath data remains unsupported by the CF conventions.

Detailed Proposal

The proposal is at https://github.com/Unidata/EC-netCDF-CF/blob/master/swath/swath.adoc. The new text for the conventions document will be based on the content in Sections 2.3 through 2.6.

@ajelenak ajelenak added the enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format label May 14, 2020
@erget
Member

erget commented May 15, 2020

@ajelenak I think this is a good idea. Since nobody's volunteered yet I'm happy to moderate the discussion but it should be noted that I am clearly in favour of it so it would be more interesting to have somebody with more concerns. At a minimum I'd like to involve some people who can throw a cautious eye on it once you've got a PR ready.

@erget erget self-assigned this May 15, 2020
@JonathanGregory
Contributor

I'd like to review this but I haven't had time yet. Thanks for compiling this carefully written proposal. Jonathan

@erget
Member

erget commented Jun 5, 2020

@ajelenak it looks like nobody's found glaring errors yet ;) would you mind putting together a PR proposing the changes to the text so that we can begin the discussion / approval process?

@hilawe

hilawe commented Mar 4, 2021

@ajelenak this is an excellent and thorough effort. At NCEI we may have more examples for you to consider that could help with further definitions. I will let my colleagues know and see what they think, if it's not too late.

Thank you for your service to the community.

@gaochen-larc

gaochen-larc commented Apr 10, 2023

This is very helpful! Thank you @ajelenak !

For a follow-up question, I am dealing with data like the TRMM case. Should I use different groups to separate data with different dimensions? For example,

Group: lores
lat(time, samp_lo)
lon(time, samp_lo)

Group: hires
lat(time, samp_hi)
lon(time, samp_hi)

Any suggestions?

Thanks!

@lupemba

lupemba commented Dec 19, 2023

@ajelenak
I just had a brief look at the document. The text seems focused on passive imagers. Is the Swath data format also applicable for radar data e.g. SAR or Scatterometers?

@davidhassell
Contributor

Hi - we have a new project starting in 2024 that will involve, as part of it, an extraction of data from climate and NWP models for comparison with L2 swaths, on the grid of the latter. Therefore my interest is piqued and I look forward to reviewing the proposal.

Thanks,
David

@ajelenak
Author

@lupemba Yes, SAR and scatterometer instruments were taken into account during development. Do you have a specific example we could test with?

@hilawe

hilawe commented Dec 19, 2023

For a follow-up question, I am dealing with data like the TRMM case. Should I use different groups to separate data with different dimensions? For example,

Group: lores
lat(time, samp_lo)
lon(time, samp_lo)

Group: hires
lat(time, samp_hi)
lon(time, samp_hi)

@ajelenak I wanted to piggyback on this TRMM comment and provide a sample Passive Microwave Climate Data Record file that should cover what @gaochen-larc brought up. This should cover the maximum complexity of satellite swath files.

Thank you!

@lupemba

lupemba commented Dec 20, 2023

@ajelenak,
The upcoming EPS-SG program is aiming to follow the CF format (where it is practicable). The test data can be found on this site and include test data for SCA (the new scatterometer) https://www.eumetsat.int/eps-sg-user-test-data
The backscatter in the test data is mostly noise and has a lot of missing values, but the geometry of the swath should be close to what is expected for the real data.

I would recommend looking at SCA-1B-SZF and SCA-1B-SZR. The SZF is the full resolution product, where each beam has its own geometry. In the SZR the backscatter is resampled to produce 5 collocated measurements on one swath (two swaths if you want to split the left and right sides).
Note that the swath is quite different for rotating fan beam scatterometers. I don't have any example of this.


@ajelenak
Author

ajelenak commented Jan 3, 2024

@lupemba I got some sample SCA L1B files from the link you provided. The content in the SZF file /data group is in one of the possible swath formats. However, the content in the SZR file /data group is not. Below are several variables from that group in one SCA-1B-SZR file to illustrate how data are organized:

  group: data {
    dimensions:
      number_beams = 5;
      number_points = 3392;
    variables:
      double time(number_points=3392);
        :long_name = "time associated with each point";
        :units = "UTC seconds since 2020-01-01 00:00:00.000";

      int backscatter(number_points=3392, number_beams=5);
        :long_name = "backscatter coefficient (also known as NRCS or sigma0) obtained by spatial averaging the full resolution data around the grid point for the fore VV, mid VV, aft VV, mid HH and mid cross-pol channels";
        :units = "dB";
        :missing_value = -2147483648; // int
        :scale_factor = 1.0E-7; // double
        :add_offset = 0.0; // double

      int latitude(number_points=3392);
        :long_name = "geodetic latitude";
        :units = "degrees_north";
        :missing_value = -2147483648; // int
        :valid_min = -90000000; // int
        :valid_max = 89999999; // int
        :scale_factor = 1.0E-6; // double
        :add_offset = 0.0; // double

      int longitude(number_points=3392);
        :long_name = "longitude";
        :units = "degrees_east";
        :missing_value = -2147483648; // int
        :valid_min = -180000000; // int
        :valid_max = 179999999; // int
        :scale_factor = 1.0E-6; // double
        :add_offset = 0.0; // double

      // ...

      uint line_index(number_points=3392);
        :long_name = "absolute grid index in along track";

      short node_index(number_points=3392);
        :long_name = "grid index in across track (far left swath to far right swath)";
        :valid_max = 53S; // short
        :valid_min = -53S; // short

      // ...
  }

This is not swath format because the backscatter, longitude, and latitude data are spatially stored as 1D. There is nothing wrong with this data organization but in CF this is very similar to a trajectory.
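As a side note on reading the listing above: the integer variables are packed, and under the usual CF/netCDF packing rule the physical value is `packed * scale_factor + add_offset`, with entries equal to `missing_value` treated as absent. A minimal Python sketch of that rule, assuming the attribute values shown in the dump; the `unpack` helper and the sample integers are hypothetical and not taken from a real file:

```python
from typing import List, Optional, Sequence


def unpack(packed: Sequence[int], scale_factor: float, add_offset: float,
           missing_value: int) -> List[Optional[float]]:
    """Apply CF-style unpacking; entries equal to missing_value become None."""
    return [None if p == missing_value else p * scale_factor + add_offset
            for p in packed]


# Attribute values copied from the latitude variable in the dump above.
LAT_ATTRS = dict(scale_factor=1.0e-6, add_offset=0.0, missing_value=-2147483648)

# Hypothetical packed samples, for illustration only.
packed_lat = [45500000, -2147483648, -90000000]
lat = unpack(packed_lat, **LAT_ATTRS)
```

Libraries that read netCDF files often apply this conversion automatically, so the sketch only matters when working with the raw integers.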

@ajelenak
Author

ajelenak commented Jan 3, 2024

@hilawe Yes, your sample file is very "busy" but it is compliant with the swath proposal.

@lupemba

lupemba commented Jan 3, 2024

This is not swath format because the backscatter, longitude, and latitude data are spatially stored as 1D. There is nothing wrong with this data organization but in CF this is very similar to a trajectory.

@ajelenak, I am happy to hear that the SZF format is compliant.
The SZR data is actually also on a grid of 106 nodes x 32 lines (53 nodes for each side). The data has just been flattened to a 1D array for the netCDF format.
I can try to ask around to hear why 1D longitude and latitude were chosen over a 2D grid.
What are the benefits of having the data stored as swaths?
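To illustrate how the flattened and gridded layouts relate, the 2D arrangement can in principle be rebuilt from the two index variables. A minimal Python sketch, assuming each (`line_index`, `node_index`) pair identifies a unique grid cell; the `to_swath` helper and the toy input are hypothetical, and the real SZR files may order or number their nodes differently:

```python
from typing import List, Optional, Sequence


def to_swath(values: Sequence[float], line_index: Sequence[int],
             node_index: Sequence[int]) -> List[List[Optional[float]]]:
    """Rebuild a 2D (line, node) array from flattened points.

    Rows follow the sorted unique line_index values (along track) and
    columns follow the sorted unique node_index values (across track).
    Cells with no matching point stay None.
    """
    lines = sorted(set(line_index))
    nodes = sorted(set(node_index))
    row = {v: i for i, v in enumerate(lines)}
    col = {v: j for j, v in enumerate(nodes)}
    grid: List[List[Optional[float]]] = [[None] * len(nodes) for _ in lines]
    for value, line, node in zip(values, line_index, node_index):
        grid[row[line]][col[node]] = value
    return grid


# Hypothetical toy input: 2 lines x 4 nodes, flattened in arbitrary order.
line_index = [1000, 1000, 1001, 1000, 1001, 1001, 1000, 1001]
node_index = [-2, -1, -2, 1, 2, 1, 2, -1]
values = [0.0, 1.0, 4.0, 2.0, 7.0, 6.0, 3.0, 5.0]
grid = to_swath(values, line_index, node_index)
```

Mapping through the sorted unique index values avoids assuming anything about whether the node numbering skips zero or where the absolute line numbering starts.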

@semmerson

semmerson commented Jan 3, 2024 via email

@ajelenak
Author

ajelenak commented Jan 3, 2024

Also, I suggest adding an appropriate coordinate for the number_beams dimension with alphanumeric identifiers of the five beams.

@davidhassell
Contributor

Hi @ajelenak and all,

I've read through the document a couple of times now. I think that it's very clear and comprehensive, and the numerous examples are great - thank you!

@davidhassell
Contributor

davidhassell commented Jan 4, 2024

(carrying on having pressed "send" too early ...)

I'd like to make a couple of general comments:

Dimension order

There are multiple occasions where there are dimensions that do not have corresponding 1-d (auxiliary) coordinate variables, e.g. atrack, xtrack, ncols, nrows, FOR, obs, scan, etc. Sometimes their order is specified ("with the slowest varying dimension representing forward (along-track) movement of the platform"), other times not, and sometimes (2.4.2. Multiband Image) it seems to imply that the order can be swapped. The dimension order is crucial for correct interpretation, so shouldn't the convention be strict and explicit about not only the dimension order, but also how to identify these dimensions?

For instance, in Example 9 we have float lat(time, FOR, obs) ; we know that time is time because it also has coordinate variable, but I don't know what FOR and obs represent.

EDIT: I do know what FOR and obs represent physically from the text, but I meant that there is nothing to distinguish them in the example file.

This is also an unsolved problem for the storage of tripolar ocean grids, for which there is no indication which dimension is "x" and which is "y", yet that information is needed to correctly manipulate the data.
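The point about unidentified dimensions can be made concrete with a small check that lists every dimension lacking a 1-d variable of the same name. This is only a Python sketch: the `dims_without_coordinate` helper is hypothetical, and the variable table is loosely modelled on Example 9, with the `lon` entry assumed rather than quoted:

```python
from typing import Dict, List, Tuple


def dims_without_coordinate(variables: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Return dimensions that have no 1-d variable of the same name."""
    coord_dims = {name for name, dims in variables.items() if dims == (name,)}
    found: List[str] = []
    for dims in variables.values():
        for dim in dims:
            if dim not in coord_dims and dim not in found:
                found.append(dim)
    return found


# Variable name -> dimension names. Only lat(time, FOR, obs) and the time
# coordinate come from the discussion; lon is an assumed companion variable.
variables = {
    "time": ("time",),
    "lat": ("time", "FOR", "obs"),
    "lon": ("time", "FOR", "obs"),
}
anonymous = dims_without_coordinate(variables)
```

For this table the check reports `FOR` and `obs`, which are exactly the dimensions a reader cannot interpret from the file alone.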

Is this a CF convention?

The proposal describes how the CF conventions can be used to describe swath products using existing functionality. As such it seems more of a profile of CF use rather than an extension to the conventions themselves. Would it be better to maintain them separately and reference them from the Conventions attribute, something like Conventions = "CF-1.11 CF-swath-1.0"?
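If a separate profile were referenced this way, software could detect it by splitting the Conventions attribute into its individual names. A minimal Python sketch, assuming the usual netCDF practice of blank-separated names (or comma-separated when a comma is present); `CF-swath-1.0` is the hypothetical name suggested above, not an existing convention, and `parse_conventions` is an illustrative helper:

```python
from typing import List


def parse_conventions(attr: str) -> List[str]:
    """Split a Conventions attribute into individual convention names.

    Names are taken as comma-separated if the string contains a comma,
    otherwise as blank-separated.
    """
    parts = attr.split(",") if "," in attr else attr.split()
    return [p.strip() for p in parts if p.strip()]


names = parse_conventions("CF-1.11 CF-swath-1.0")
# "CF-swath-" is a hypothetical prefix used only for this illustration.
is_swath_profile = any(name.startswith("CF-swath-") for name in names)
```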

Thanks,
David

@lupemba

lupemba commented Jan 4, 2024

Thanks for all the inputs. I hope that I do not hijack this discussion with SCA data. Maybe another forum/thread would be more suited for this kind of discussion.

@semmerson

If the "UTC" is a suffix rather than a prefix, then parsing will work

This has already been raised at EUMETSAT and has been updated for the next release of the test data.

Unreferenced, logarithmic units aren't supported. Referenced units are,

Normalized radar cross-section (NRCS) is a unitless parameter. The unit of radar cross-section is m^2, and when it is normalized by the area of the target it becomes unitless.

@ajelenak

Also, I suggest adding an appropriate coordinate for the number_beams dimension with alphanumeric identifiers of the five beams.
I will bring this suggestion forward. It would be something like

variables:
    string beam(number_beams) ;
         beam:standard_name = "sensor_beam_identifier";

With the names being ["forVV", "midVV", "aftVV", "midHH", "midXX"].
I will also suggest renaming beam to band to better fit the convention, but this is a bigger change to the format.

@taylor13

taylor13 commented Jan 4, 2024

If indeed no new attributes are needed, I would agree with #269 (comment) that this might better be documented as a profile (similar to the externally documented specifications for "cmorizing" CMIP data). Perhaps a simple example of how to handle swath data could be included with reference to the more detailed information elsewhere. [I should admit that I have not carefully studied the proposal, so hope I haven't missed some critical new extension to CF that is being proposed.]
