-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Geometries Contribution (and github experimentation) #115
Changes from 8 commits
94875c5
c5be80f
54fc86c
177ed57
5f8c085
030ceb4
c0d8cc2
d351c95
958de45
e4ef4b2
9dab561
96005a1
f0cd0e4
cdd8fa0
894df8f
fd74bae
14c7ffb
9756c55
7768e33
9929a61
8c34e5f
c1a224a
a5882f4
e3ff90a
2ebb1df
b6316f4
34eda36
5666769
ef7c9cc
e3276e8
7975d39
91a74e4
edab7e8
d8ea261
90ac4a9
f15f449
64bde66
033cc25
16c9b01
6903131
7ce6b4a
591cb88
f9cef4a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -54,5 +54,4 @@ respect to the specified dimension. | |
| `variance` | __u^2^__ | Variance | ||
|=============== | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -544,3 +544,168 @@ data: // time coordinates translated to date/time format | |
"2000-8-1 6:00:00", "2000-9-1 6:00:00" ; | ||
---- | ||
==== | ||
|
||
[[geometries, Section 7.5, "Geometries"]] | ||
=== Geometries | ||
|
||
For many geospatial applications, data values are associated with a geometry, which is a spatial representation of a real-world feature, for instance a time-series of areal average precipitation over a watershed. | ||
Polygonal cells with an arbitrary number of vertices can be described using <<cell-boundaries>>, but in that case every cell must have the same number of vertices and must be single polygon rings. | ||
In contrast, each geometry may have a different number of nodes, the geometries may be lines (as alternatives to points and polygons), and they may be __multipart__ i.e include several disjoint parts. | ||
While line and point geometries don't describe an interval along a dimension as the traditional cell bounds described above do, they do describe the extent of a geometry or real-world feature so are included in this section. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @JonathanGregory and @davidhassell - How's this? Not sure what words to use to be consistent with the rest of CF, but I think this is what we were getting at? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dblodgett-usgs I think this is looking good! Thanks. Some minor comments from my latest read
All the best, David There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @davidhassell Example 7.15 was meant in part to demonstrate that geometry could be encoded without associated data variables. For example, it shows that you would need a grid_mapping attribute on the geometry container variable if there isn't a data variable providing that attribute. Do you think there's merit in showing an example of geometry without a data variable? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @twhiteaker I see, thanks. However, storing coordinates (a "domain") without a data variable is not currently allowed by CF, and I think that to introduce it would need a ticket and discussion in its own right - for example, how would you encode the "usual" regular lat-lon grid without a data variable to bind it all together? This is similar in principle to insisting that there are also representative coordinates for each cell --- bounds without coordinates are also not currently possible. All the best, David There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @davidhassell Seems like it's better to get this proposal accepted and take broader steps like standalone geometries later then. I'll get a revised example 7.15 into the pull request ASAP. |
||
The approach described here specifies how to encode such geometries following the pattern in **9.3.3 Contiguous ragged array representation** and attach them to variables in a way that is consistent with the cell bounds approach. | ||
|
||
All geometries are made up of one or more nodes. | ||
The geometry type specifies the set of topological assumptions to be applied to relate the nodes. | ||
For example, __multipoint__ and __line__ geometries are nearly the same except nodes are interpreted as being connected for lines. | ||
Lines and polygons are also nearly the same except that it should be assumed that the first and last node are connected. | ||
Note that CF does not require the first and last node to be identical but allows them to be coincident if desired. | ||
Polygons that have holes, such as waterbodies in a land unit, are encoded as a collection of polygon ring parts, each identified as __exterior__ or __interior__ polygons. | ||
Multipart geometries, such as multiple lines representing the same river or multiple islands representing the same jurisdiction, are encoded as collections of un-connected points, lines, or polygons that are logically grouped into a single geometry. | ||
|
||
Any data variable can be given a `geometry` attribute that indicates the geometry for the quantity held in the variable. | ||
One of the dimensions of the data variable must be the number of geometries to which the data applies. | ||
As shown in Example 7.14, if the data variable has a discrete sampling geometry, the number of geometries is the length of the instance dimension (Section 9.2). | ||
|
||
[["timeseries-with-geometry"]] | ||
[caption="Example 7.14. "] | ||
.Timeseries with geometry. | ||
==== | ||
---- | ||
dimensions: | ||
instance = 2 ; | ||
node = 5 ; | ||
time = 4 ; | ||
variables: | ||
int time(time) ; | ||
time:units = "days since 2000-01-01" ; | ||
double lat(instance) ; | ||
lat:units = "degrees_north" ; | ||
lat:standard_name = "latitude" ; | ||
lat:geometry = "geometry_container" ; | ||
double lon(instance) ; | ||
lon:units = "degrees_east" ; | ||
lon:standard_name = "longitude" ; | ||
lon:geometry = "geometry_container" ; | ||
int datum ; | ||
datum:grid_mapping_name = "latitude_longitude" ; | ||
datum:longitude_of_prime_meridian = 0.0 ; | ||
datum:semi_major_axis = 6378137.0 ; | ||
datum:inverse_flattening = 298.257223563 ; | ||
int geometry_container ; | ||
geometry_container:geometry_type = "line" ; | ||
geometry_container:node_count = "node_count" ; | ||
geometry_container:node_coordinates = "x y" ; | ||
int node_count(instance) ; | ||
double x(node) ; | ||
x:units = "degrees_east" ; | ||
x:standard_name = "longitude" ; | ||
x:cf_role = "geometry_x_node" ; | ||
double y(node) ; | ||
y:units = "degrees_north" ; | ||
y:standard_name = "latitude" ; | ||
y:cf_role = "geometry_y_node" ; | ||
double someData(instance, time) ; | ||
someData:coordinates = "time lat lon" ; | ||
someData:grid_mapping = "datum" ; | ||
someData:geometry = "geometry_container" ; | ||
// global attributes: | ||
:Conventions = "CF-1.8" ; | ||
:featureType = "timeSeries" ; | ||
data: | ||
time = 1, 2, 3, 4 ; | ||
lat = 30, 50 ; | ||
lon = 10, 60 ; | ||
someData = | ||
1, 2, 3, 4, | ||
1, 2, 3, 4 ; | ||
node_count = 3, 2 ; | ||
x = 30, 10, 40, 50, 50 ; | ||
y = 10, 30, 40, 60, 50 ; | ||
---- | ||
The time series variable, someData, is associated with line geometries via the geometry attribute. The first line geometry is comprised of three nodes, while the second has two nodes. Client applications unaware of CF geometries can fall back to the lat and lon variables to locate feature instances in space. In this example, lat and lon coordinates are identical to the first node in each line geometry, though any representative point could be used. | ||
==== | ||
|
||
|
||
A __geometry container__ variable acts as a container for attributes that describe a set of geometries. | ||
The **`geometry`** attribute of the data variable contains the name of a geometry container variable. | ||
The geometry container variable must hold **`geometry_type`** and **`node_coordinates`** attributes. | ||
A **`grid_mapping`** attribute, which would usually by carried by a data variable, can be carried by the geometry container variable. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that we should leave this out for now, as there will always be a data variable on which to attach the I'm sure that this will come to pass eventually, but until then I think that it will be easier to accept this proposal without this feature. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This would be an unfortunate omission from the data model. I understand that CF typically assumes a file has data variables in addition to coordinate domains, but I really think inclusion of the What if we change the line to: (bold added to highlight change) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi David, I aggre that allowing "domains without data" would be a good enhancement to CF. The data model represents what CF is rather than what it should be, so when this is allowed in CF, we'll have to update the data model. (Perhaps the existing data model will help to inform us of a good way to proceed!) As a "A Perhaps this will act as a starting point for a future, separate discussion on storing domains without data arrays There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fine by me. Will implement the change. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that that line has actually changed since the line you commented on. Going to change from: |
||
|
||
The **`geometry_type`** attribute indicates the type of geometry present. | ||
Its allowable values are: __point__, __multipoint__, __line__, __multiline__, __polygon__, __multipolygon__. | ||
The **`node_coordinates`** attribute contains the blank delimited names of the variables that contain geometry node coordinates (one variable for each spatial dimension). | ||
|
||
|
||
The geometry node coordinate variables must each have a **`cf_role`** attribute whose allowable values are __geometry_x_node__, __geometry_y_node__, and __geometry_z_node__. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Re. cf_role : I originally didn't like this because a variable was already identified as being a node coordinate variable by virtue of it being named by a node_coordinates attribute of a geometry container variable. However, I subsequently realised that we need to know which of the named variables contains the X (or Y or Z) node coordinates. How about a metadata approach which says that the node coordinate variables must be described by axis attributes, and standard names are recommended:
As node coordinate variables are not [auxiliary] coordinate variables, there would be no confusion with the existing standarised use of the axis attribute. Coordinate variables which correspond to the node coordinates must be present should also be forced to have axis attributes with the usual meaning. There are two reasons for this - 1) for software that can not interpret the geometries, a representative coordinate location is still useful for analysis/plotting and 2) Allowing cells to be defined by bounds alone is new to CF and so, whilst that is something that we may want to explore in the future, I think that going down this route would only delay this ticket unnecessarily. What value the representative coordinates should have is entirely up to the creator. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since these are not "axes", but rather just a collection of nodes that form polygons, I don't think the axis pattern applies. We did discuss it a while back and decided use of cf_role is more appropriate. This is in line with the use of I did mean to have a representative lat and lon point for each geometry which I think is what you are getting at in your comments about coordinate variables? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that cf_role has only been used by DSGs for Re. representative lat and lon variables - that's what I had in mind, thanks There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. I see now. Sorry for the confusion. A while back, I think we had used standard_name for this and were told that cf_role was more appropriate. I see what you mean about needing particular metadata to clarify and how |
||
The geometry node coordinate variables must all have the same single dimension, which is the total number of nodes in all the geometries. | ||
The nodes must be stored consecutively for each geometry and in the order of the geometries, and within each multipart geometry the nodes must be stored consecutively for each part and in the order of the parts. | ||
Polygon exterior rings must be put in anticlockwise order (viewed from above) and polygon interior rings in clockwise order. | ||
They are put in opposite orders to facilitate calculation of area and consistency with the typical implementation pattern. | ||
|
||
|
||
For all geometry types except __point__, in which each geometry contains a single node, when more than one geometry is present, the geometry container variable must have a **`node_count`** attribute that contains the name of a variable indicating the count of nodes per geometry. | ||
The node count is the total nodes in all the parts. | ||
|
||
|
||
For __multiline__, __multipolygon__, and __polygons__ with holes, the geometry container variable must have a **`part_node_count`** attribute indicates a variable of the count of nodes per geometry part. | ||
Note that because multipoint geometries always have a single node per part, the **`part_node_count`** is not required. | ||
The single dimension of the part node count variable should equal the total number of parts in all the geometries. | ||
|
||
For __polygon__ and __multipolygon__ geometries with holes, the geometry container variable must have an **`interior_ring`** attribute that contains the name of a variable that indicates if the polygon parts are interior rings (i.e., holes) or not. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. With regards the difference between polygon and multipolygon, a use-case to consider is if the instance dimension (in the example given) is an unlimited dimension to allow data to come in gradually. In this case, you might not know in advance if each incoming-instance is a polygon or a multipolygon, so you would be forced to use multipolygon, even if some of the geometries consisted of just one polygon (!). So on those grounds, I don't think that the distinction between polygon and multipolygon is useful, and just polygon should suffice, I think. The same argument applies on multipoint and multiline. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I disagree here. Saying that a geometry is a polygon (or multi) type when you declare it allows an implementation to make some pretty important assumptions about the data it is going to be working with, making the distinction very useful. The potential of an unlimited instance dimension, while possible, is an edge case as far as I'm concerned. Even if it were to be required, within a dataset, there would be a policy regarding support for multipolygons. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm afraid that I don't see how specifying multipolygon helps. It is possible that at least one cell of a multipolygon container contains a "non-multi" polygon, so the geometry for each cell needs to be ascertained by inspecting the part_node_count and interior_ring attributes, regardless. Is it right that a geometry is a multipolygon if it has two or more parts that are exterior rings, as determined by the interior_ring attribute or assumed to be exterior in this attribute's absence? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry I'm not being very eloquent... There are a few things at play here. The So yes, it is right that:
but this is not quite the right interpretation.
That said, having the distinction between single and multi types is useful for two reasons (and probably more that I'm not thinking of).
Because of this, if we were to say, "just assume everything in CF is a multi geometry because the multi type is a super class of the single type", everyone would have to support the complexity of multi types. We would also run into situations where software that only works with single geometry types would be in a bind when opening and attempting to read or write data in CF. -- So at the end of the day, this is about being consistent and interoperable with practices in other similar encodings and specifications and playing nice with people who need to implement support for a NetCDF-CF encoding. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we combine polygon+multipolygon:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Re: 1. -- I think it's easy enough to assume multipart and if you don't get multipart, cool! So software can figure it out. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good to me |
||
This interior ring variable should contain the value 0 to indicate an exterior ring polygon and 1 to indicate an interior ring polygon. | ||
The single dimension of the interior ring variable should be the same dimension as that of the part node count variable. | ||
|
||
[[complete-multipolygon-example]] | ||
[caption="Example 7.15. "] | ||
.A multipolygon with holes | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about this for a smaller CDL multipolygon example that includes two features and a hole? First feature has two parts with first part including a hole. Second feature has one part.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Example with correct node order for exterior vs interior.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 I'll get it in here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ... This example needs a data variable ... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @dblodgett-usgs incorporated a data variable into the example. Scroll to the end of the latest version (as I am typing this) here: https://github.com/dblodgett-usgs/cf-conventions/blob/7768e33e7edff459482e8ef8057ea6b8e015c9eb/ch07.adoc |
||
==== | ||
This example demonstrates the use of all potential attributes and variables for encoding geometries. | ||
---- | ||
dimensions: | ||
node = 12 ; | ||
feature = 2 ; | ||
part = 4 ; | ||
variables: | ||
double x(node) ; | ||
x:units = "degrees_east" ; | ||
x:standard_name = "longitude" ; | ||
x:cf_role = "geometry_x_node" ; | ||
double y(node) ; | ||
y:units = "degrees_north" ; | ||
y:standard_name = "latitude" ; | ||
y:cf_role = "geometry_y_node" ; | ||
double lat(feature) ; | ||
lat:units = "degrees_north" ; | ||
lat:standard_name = "latitude" ; | ||
lat:geometry = "geometry_container" ; | ||
double lon(feature) ; | ||
lon:units = "degrees_east" ; | ||
lon:standard_name = "longitude" ; | ||
lon:geometry = "geometry_container" ; | ||
float geometry_container ; | ||
geometry_container:geometry_type = "multipolygon" ; | ||
geometry_container:node_count = "node_count" ; | ||
geometry_container:node_coordinates = "x y" ; | ||
geometry_container:grid_mapping = "crs" ; | ||
geometry_container:coordinates = "lat lon" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since no data variable, I've added coordinates to the geometry_container. Will add a little note about this where we say the grid_mapping can be put on the geometry container. |
||
geometry_container:part_node_count = "part_node_count" ; | ||
geometry_container:interior_ring = "interior_ring" ; | ||
int node_count(feature) ; | ||
int part_node_count(part) ; | ||
int interior_ring(part) ; | ||
float crs ; | ||
crs:grid_mapping_name = "latitude_longitude" ; | ||
crs:semi_major_axis = 6378137. ; | ||
crs:inverse_flattening = 298.257223563 ; | ||
crs:longitude_of_prime_meridian = 0. ; | ||
// global attributes: | ||
:Conventions = "CF-1.8" ; | ||
data: | ||
x = 20, 10, 0, 5, 10, 15, 20, 10, 0, 50, 40, 30 ; | ||
y = 0, 15, 0, 5, 10, 5, 20, 35, 20, 0, 15, 0 ; | ||
node_count = 9, 3 ; | ||
part_node_count = 3, 3, 3, 3 ; | ||
interior_ring = 0, 1, 0, 0 ; | ||
crs = 0. ; | ||
---- | ||
==== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JonathanGregory and @davidhassell - we can now comment on each line individually as needed.