# Geometries Contribution (and github experimentation) #115

Merged
merged 43 commits into from Nov 14, 2018
Commits
Filter file types
Failed to load files and symbols.
+165 −1

#### Just for now

 @@ -54,5 +54,4 @@ respect to the specified dimension. | `variance` | __u^2^__ | Variance |===============
@@ -544,3 +544,164 @@ data: // time coordinates translated to date/time format
"2000-8-1 6:00:00", "2000-9-1 6:00:00" ;
----
====

[[spatial-geometries, Section 7.5, "Spatial Geometries"]]

#### JonathanGregory Jun 14, 2017

Contributor

Could we call it just "Geometries" (without "Spatial"), as you have done elsewhere? Or "Simple geometries", like you originally called it?

#### dblodgett-usgs Jun 14, 2017

Contributor

Yeah. I missed this. Will call it geometries.

=== Geometries

For many geospatial applications, data values are associated with a geometry, which is a spatial representation of a real-world feature, for instance a time-series of areal average precipitation over a watershed.

#### dblodgett-usgs Jun 8, 2017

Contributor

@JonathanGregory and @davidhassell - we can now comment on each line individually as needed.

Polygonal cells with an arbitrary number of vertices can be described using <<cell-boundaries>>, but in that case every cell must have the same number of vertices.
In contrast, each geometry associated with a given data variable may have a different number of nodes, the geometries may be lines (as alternatives to points and polygons), and they may be __multipart__ i.e include several disjoint parts.

#### davidhassell Jun 15, 2017

Contributor

It would be good to point out that points and lines and are not really "cell bounds" in the chapter 7.1 sense, but they describe "cell" extent in a different way. Perhaps adding something about "line and point geometries describe cells that don't have an area in the traditional, polygonal sense" ?

#### dblodgett-usgs Jun 15, 2017

Contributor

Makes sense. Will work something in.

The approach described here specifies how to encode such geometries following the pattern in **9.3.3 Contiguous ragged array representation** and attach them to variables in a way that is consistent with the cell bounds approach.

All geometries are made up of one or more nodes.
The geometry type specifies the set of topological assumptions to be applied to relate the nodes.
For example, __multipoint__ and __line__ geometries are nearly the same except nodes are interpreted as being connected for lines.
Lines and polygons are also nearly the same except that it should be assumed that the first and last node are connected.
Note that CF does not require the first and last node to be identical but allows them to be coincident if desired.
Polygons that have holes, such as waterbodies in a land unit, are encoded as a collection of polygon ring parts, each identified as __exterior__ or __interior__ polygons.
Multipart geometries, such as multiple lines representing the same river or multiple islands representing the same jurisdiction, are encoded as collections of un-connected points, lines, or polygons that are logically grouped into a single geometry.

Any data variable can be given a `geometry` attribute that indicates the geometry for the quantity held in the variable.
One of the dimensions of the data variable must be the number of geometries to which the data applies.
As shown in Example 7.14, if the data variable has a discrete sampling geometry, the number of geometries is the length of the instance dimension (Section 9.2).

[["timeseries-with-geometry"]]
[caption="Example 7.14. "]
.Timeseries with geometry.
====
----
dimensions:
instance = 2 ;
node = 5 ;
time = 4 ;
variables:
int time(time) ;
time:units = "days since 2000-01-01" ;
double lat(instance) ;
lat:units = "degrees_north" ;
lat:standard_name = "latitude" ;
lat:geometry = "geometry_container" ;
double lon(instance) ;
lon:units = "degrees_east" ;
lon:standard_name = "longitude" ;
lon:geometry = "geometry_container" ;
int datum ;
datum:grid_mapping_name = "latitude_longitude" ;
datum:longitude_of_prime_meridian = 0.0 ;
datum:semi_major_axis = 6378137.0 ;
datum:inverse_flattening = 298.257223563 ;
int geometry_container ;
geometry_container:geometry_type = "line" ;
geometry_container:node_count = "node_count" ;
geometry_container:node_coordinates = "x y" ;
int node_count(instance) ;
double x(node) ;
x:units = "degrees_east" ;
x:standard_name = "longitude" ;
x:cf_role = "geometry_x_node" ;
double y(node) ;
y:units = "degrees_north" ;
y:standard_name = "latitude" ;
y:cf_role = "geometry_y_node" ;
double someData(instance, time) ;
someData:coordinates = "time lat lon" ;
someData:grid_mapping = "datum" ;
someData:geometry = "geometry_container" ;
// global attributes:
:Conventions = "CF-1.8" ;
:featureType = "timeSeries" ;
data:
time = 1, 2, 3, 4 ;
lat = 30, 50 ;
lon = 10, 60 ;
someData =
1, 2, 3, 4,
1, 2, 3, 4 ;
node_count = 3, 2 ;
x = 30, 10, 40, 50, 50 ;
y = 10, 30, 40, 60, 50 ;
----
The time series variable, someData, is associated with line geometries via the geometry attribute. The first line geometry is comprised of three nodes, while the second has two nodes. Client applications unaware of CF geometries can fall back to the lat and lon variables to locate feature instances in space. In this example, lat and lon coordinates are identical to the first node in each line geometry, though any representative point could be used.
====

A __geometry container__ variable acts as a container for attributes that describe a set of geometries.
The **`geometry`** attribute of the data variable contains the name of a geometry container variable.
The geometry container variable must hold **`geometry_type`** and **`node_coordinates`** attributes.
A **`grid_mapping`** attribute, which would usually by carried by a data variable, can be carried by the geometry container variable.

#### davidhassell Jul 12, 2017

Contributor

I think that we should leave this out for now, as there will always be a data variable on which to attach the `grid_mapping` attribute. To implement this I think that there would have to be a larger, more general discussion on storing "domains" (in the CF data model use of the word) without data.

I'm sure that this will come to pass eventually, but until then I think that it will be easier to accept this proposal without this feature.

#### dblodgett-usgs Jul 13, 2017

Contributor

This would be an unfortunate omission from the data model. I understand that CF typically assumes a file has data variables in addition to coordinate domains, but I really think inclusion of the `grid_mapping` attribute in the geometry container variable would be helpful.

What if we change the line to: (bold added to highlight change)
A `grid_mapping` attribute, which must also by carried by a data variable, can be carried by the geometry container variable.

#### davidhassell Jul 13, 2017

Contributor

Hi David,

I aggre that allowing "domains without data" would be a good enhancement to CF. The data model represents what CF is rather than what it should be, so when this is allowed in CF, we'll have to update the data model. (Perhaps the existing data model will help to inform us of a good way to proceed!)

As a `grid_mapping` is optional, how about:

"A `grid_mapping` attribute can be carried by the geometry container variable provided it is also carried by the data variables associated with the container."
[conformance will have to check that these are the same]

Perhaps this will act as a starting point for a future, separate discussion on storing domains without data arrays

#### dblodgett-usgs Jul 13, 2017

Contributor

Fine by me. Will implement the change.

#### dblodgett-usgs Jul 13, 2017

Contributor

Note that that line has actually changed since the line you commented on. Going to change from:
"A `grid_mapping` and `coordinates` attribute, which would usually by carried by a data variable, can be carried by the geometry container variable."
to
"The `grid_mapping` and `coordinates` attributes can be carried by the geometry container variable provided they are also carried by the data variables associated with the container."

The **`geometry_type`** attribute indicates the type of geometry present.
Its allowable values are: __point__, __multipoint__, __line__, __multiline__, __polygon__, __multipolygon__.
The **`node_coordinates`** attribute contains the blank delimited names of the variables that contain geometry node coordinates (one variable for each spatial dimension).

The geometry node coordinate variables must each have a **`cf_role`** attribute whose allowable values are __geometry_x_node__, __geometry_y_node__, and __geometry_z_node__.

#### davidhassell Jun 15, 2017

Contributor

Re. cf_role : I originally didn't like this because a variable was already identified as being a node coordinate variable by virtue of it being named by a node_coordinates attribute of a geometry container variable. However, I subsequently realised that we need to know which of the named variables contains the X (or Y or Z) node coordinates.

How about a metadata approach which says that the node coordinate variables must be described by axis attributes, and standard names are recommended:

``````int geometry_container ;
geometry_container:geometry_type = "line" ;
geometry_container:node_count = "node_count" ;
geometry_container:node_coordinates = "x y" ;
double x(node) ;
x:units = "degrees_east" ;
x:standard_name = "longitude" ;
x:axis = "X" ;
double y(node) ;
y:units = "degrees_north" ;
y:standard_name = "latitude" ;
y:axis = "Y" ;
``````

As node coordinate variables are not [auxiliary] coordinate variables, there would be no confusion with the existing standarised use of the axis attribute.

Coordinate variables which correspond to the node coordinates must be present should also be forced to have axis attributes with the usual meaning. There are two reasons for this - 1) for software that can not interpret the geometries, a representative coordinate location is still useful for analysis/plotting and 2) Allowing cells to be defined by bounds alone is new to CF and so, whilst that is something that we may want to explore in the future, I think that going down this route would only delay this ticket unnecessarily.

What value the representative coordinates should have is entirely up to the creator.

#### dblodgett-usgs Jun 15, 2017

Contributor

Since these are not "axes", but rather just a collection of nodes that form polygons, I don't think the axis pattern applies. We did discuss it a while back and decided use of cf_role is more appropriate. This is in line with the use of `cf_role: projection_x_coordinate` etc. when using projected coordinate variables.

I did mean to have a representative lat and lon point for each geometry which I think is what you are getting at in your comments about coordinate variables?

#### davidhassell Jun 16, 2017

Contributor

I think that cf_role has only been used by DSGs for `timeseries_id`, `profile_id`, and `trajectory_id` variables which are not otherwise linked to their data variable. A projected coordinate variable has a `standard_name` of `projected_[xy]_coordinate`. My main point is that the node coordinate variable are linked to their data variable so their role is not in question, but they still require particular metadata to understand them (just like a latitude coordinate variable must have certain metadata).

Re. representative lat and lon variables - that's what I had in mind, thanks

#### dblodgett-usgs Jun 16, 2017

Contributor

OK. I see now. Sorry for the confusion. A while back, I think we had used standard_name for this and were told that cf_role was more appropriate. I see what you mean about needing particular metadata to clarify and how `axis` makes sense. I guess I can accept the "axis" is in reference to the coordinate-frame axis, which makes a bit more sense. Will update the text. See what you think after it's checked in.

The geometry node coordinate variables must all have the same single dimension, which is the total number of nodes in all the geometries.
The nodes must be stored consecutively for each geometry and in the order of the geometries, and within each multipart geometry the nodes must be stored consecutively for each part and in the order of the parts.
Polygon exterior rings must be put in anticlockwise order (viewed from above) and polygon interior rings in clockwise order.
They are put in opposite orders to facilitate calculation of area and consistency with the typical implementation pattern.

For all geometry types except __point__, in which each geometry contains a single node, when more than one geometry is present, the geometry container variable must have a **`node_count`** attribute that contains the name of a variable indicating the count of nodes per geometry.
The node count is the total nodes in all the parts.

For __multiline__, __multipolygon__, and __polygons__ with holes, the geometry container variable must have a **`part_node_count`** attribute indicates a variable of the count of nodes per geometry part.
Note that because multipoint geometries always have a single node per part, the **`part_node_count`** is not required.
The single dimension of the part node count variable should equal the total number of parts in all the geometries.

For __polygon__ and __multipolygon__ geometries with holes, the geometry container variable must have an **`interior_ring`** attribute that contains the name of a variable that indicates if the polygon parts are interior rings (i.e., holes) or not.

#### davidhassell Jun 15, 2017

Contributor

With regards the difference between polygon and multipolygon, a use-case to consider is if the instance dimension (in the example given) is an unlimited dimension to allow data to come in gradually. In this case, you might not know in advance if each incoming-instance is a polygon or a multipolygon, so you would be forced to use multipolygon, even if some of the geometries consisted of just one polygon (!). So on those grounds, I don't think that the distinction between polygon and multipolygon is useful, and just polygon should suffice, I think. The same argument applies on multipoint and multiline.

#### dblodgett-usgs Jun 15, 2017 • edited

Contributor

I disagree here. Saying that a geometry is a polygon (or multi) type when you declare it allows an implementation to make some pretty important assumptions about the data it is going to be working with, making the distinction very useful.

The potential of an unlimited instance dimension, while possible, is an edge case as far as I'm concerned. Even if it were to be required, within a dataset, there would be a policy regarding support for multipolygons.

#### davidhassell Jun 16, 2017

Contributor

I'm afraid that I don't see how specifying multipolygon helps. It is possible that at least one cell of a multipolygon container contains a "non-multi" polygon, so the geometry for each cell needs to be ascertained by inspecting the part_node_count and interior_ring attributes, regardless.

Is it right that a geometry is a multipolygon if it has two or more parts that are exterior rings, as determined by the interior_ring attribute or assumed to be exterior in this attribute's absence?

#### dblodgett-usgs Jun 16, 2017

Contributor

Sorry I'm not being very eloquent... There are a few things at play here.

The `interior_ring` attribute indicates whether a given ring is interior or exterior. While this could be inferred through inspection of the node winding direction, in most cases it is much more convenient to have this in an attribute.

So yes, it is right that:

a geometry is a multipolygon if it has two or more parts that are exterior rings

but this is not quite the right interpretation.

as determined by the `interior_ring` attribute.

`interior_ring` is an indication of whether a ring is a hole or not. It is very convenient to have this flag when reading data into memory where you don't want to test the node winding direction to tell if a ring is a hole or not, this is true for both single and multi polygons. `interior_ring` is not meant to help tell between single and multi geometry types but it could be used to infer something about the distinction on a geometry by geometry basis.

That said, having the distinction between single and multi types is useful for two reasons (and probably more that I'm not thinking of).

1. Some software only needs to support single polygon types. They need to be able to tell that the data they are about to read is unambiguously a type they are designed to handle.
2. Typical geometry objects (in code / in memory) need to be instantiated as single or multi types. It would be very problematic to have to scan a file to see if it's truly a multi type or not.

Because of this, if we were to say, "just assume everything in CF is a multi geometry because the multi type is a super class of the single type", everyone would have to support the complexity of multi types. We would also run into situations where software that only works with single geometry types would be in a bind when opening and attempting to read or write data in CF. -- So at the end of the day, this is about being consistent and interoperable with practices in other similar encodings and specifications and playing nice with people who need to implement support for a NetCDF-CF encoding.

#### twhiteaker Sep 13, 2017

Contributor

If we combine polygon+multipolygon:

1. Do we need a variable on the instance dimension indicating whether the geometry is multipart? Or shall software figure it out by looking at, e.g., node count and part node count?
2. Shall we combine line and multiline into "line"? I think so.
3. Shall we combine point and multipoint into "point"? I don't think so. To me, points and multipoints are more distinct than line/multiline and polygon/multipolygon, as in I have a hard time coming up with use cases for mixing them. I note that Esri offers point, multipoint, line, and polygon classes, for what I assume are good reasons (I've asked and will report back if I get a response).

#### dblodgett-usgs Sep 15, 2017

Contributor

Re: 1. -- I think it's easy enough to assume multipart and if you don't get multipart, cool! So software can figure it out.
Re: 2. -- yes. At the end of the day this distinction shouldn't matter. Just treat everything as if it's multi.
Re: 3. -- I might lean toward everything being multipoint (which can have a single point per geometry) in this part of the spec and if you want to do single point, it is basically the DSG "point" type. Make sense?

#### twhiteaker Sep 15, 2017

Contributor

Sounds good to me

This interior ring variable should contain the value 0 to indicate an exterior ring polygon and 1 to indicate an interior ring polygon.
The single dimension of the interior ring variable should be the same dimension as that of the part node count variable.

[[complete-multipolygon-example]]
[caption="Example 7.15. "]
.A multipolygon with holes

#### twhiteaker Jun 12, 2017

Contributor

How about this for a smaller CDL multipolygon example that includes two features and a hole? First feature has two parts with first part including a hole. Second feature has one part.

``````dimensions:
node = 12 ;
feature = 2 ;
part = 4 ;
variables:
double x(node) ;
x:units = "degrees_east" ;
x:standard_name = "longitude" ;
x:cf_role = "geometry_x_node" ;
double y(node) ;
y:units = "degrees_north" ;
y:standard_name = "latitude" ;
y:cf_role = "geometry_y_node" ;
float geometry_container ;
geometry_container:geometry_type = "multipolygon" ;
geometry_container:node_count = "node_count" ;
geometry_container:node_coordinates = "x y" ;
geometry_container:grid_mapping = "crs" ;
geometry_container:part_node_count = "part_node_count" ;
geometry_container:interior_ring = "interior_ring" ;
int node_count(feature) ;
int part_node_count(part) ;
int interior_ring(part) ;
float crs ;
crs:grid_mapping_name = "latitude_longitude" ;
crs:semi_major_axis = 6378137. ;
crs:inverse_flattening = 298.257223563 ;
crs:longitude_of_prime_meridian = 0. ;
// global attributes:
:Conventions = "CF-1.8" ;
data:
x = 0, 10, 20, 5, 10, 15, 0, 10, 20, 30, 40, 50 ;
y = 0, 15, 0, 5, 10, 5, 20, 35, 20, 0, 15, 0 ;
geometry_container = 0. ;
node_count = 9, 3 ;
part_node_count = 3, 3, 3, 3 ;
interior_ring = 0, 1, 0, 0 ;
crs = 0. ;
``````

#### twhiteaker Jun 12, 2017

Contributor

Example with correct node order for exterior vs interior.

``````dimensions:
node = 12 ;
feature = 2 ;
part = 4 ;
variables:
double x(node) ;
x:units = "degrees_east" ;
x:standard_name = "longitude" ;
x:cf_role = "geometry_x_node" ;
double y(node) ;
y:units = "degrees_north" ;
y:standard_name = "latitude" ;
y:cf_role = "geometry_y_node" ;
float geometry_container ;
geometry_container:geometry_type = "multipolygon" ;
geometry_container:node_count = "node_count" ;
geometry_container:node_coordinates = "x y" ;
geometry_container:grid_mapping = "crs" ;
geometry_container:part_node_count = "part_node_count" ;
geometry_container:interior_ring = "interior_ring" ;
int node_count(feature) ;
int part_node_count(part) ;
int interior_ring(part) ;
float crs ;
crs:grid_mapping_name = "latitude_longitude" ;
crs:semi_major_axis = 6378137. ;
crs:inverse_flattening = 298.257223563 ;
crs:longitude_of_prime_meridian = 0. ;
// global attributes:
:Conventions = "CF-1.8" ;
data:
x = 20, 10, 0, 5, 10, 15, 20, 10, 0, 50, 40, 30 ;
y = 0, 15, 0, 5, 10, 5, 20, 35, 20, 0, 15, 0 ;
geometry_container = 0. ;
node_count = 9, 3 ;
part_node_count = 3, 3, 3, 3 ;
interior_ring = 0, 1, 0, 0 ;
crs = 0. ;
``````

#### dblodgett-usgs Jun 14, 2017

Contributor

👍 I'll get it in here.

#### davidhassell Oct 14, 2017

Contributor

... This example needs a data variable ...

#### twhiteaker Oct 17, 2017

Contributor

@dblodgett-usgs incorporated a data variable into the example. Scroll to the end of the latest version (as I am typing this) here: https://github.com/dblodgett-usgs/cf-conventions/blob/7768e33e7edff459482e8ef8057ea6b8e015c9eb/ch07.adoc

====
This example demonstrates the use of all potential attributes and variables for encoding geometries.
----
dimensions:
node = 25 ;
feature = 1 ;
part = 6 ;
variables:
double x(node) ;
x:units = "degrees_east" ;
x:standard_name = "longitude" ;
x:cf_role = "geometry_x_node" ;
double y(node) ;
y:units = "degrees_north" ;
y:standard_name = "latitude" ;
y:cf_role = "geometry_y_node" ;
float geometry_container ;
geometry_container:geometry_type = "multipolygon" ;
geometry_container:node_count = "node_count" ;
geometry_container:node_coordinates = "x y" ;
geometry_container:grid_mapping = "crs" ;
geometry_container:part_node_count = "part_node_count" ;
geometry_container:interior_ring = "interior_ring" ;
int node_count(feature) ;
node_count:long_name = "count of coordinates in each feature geometry" ;
int part_node_count(part) ;
part_node_count:long_name = "count of nodes in each geometry part" ;
int interior_ring(part) ;
interior_ring:long_name = "type of each geometry part" ;
float crs ;
crs:grid_mapping_name = "latitude_longitude" ;
crs:semi_major_axis = 6378137. ;
crs:inverse_flattening = 298.257223563 ;
crs:longitude_of_prime_meridian = 0. ;
// global attributes:
:Conventions = "CF-1.8" ;
data:
x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5,
11, 15, 13, 11 ;
y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29,
25, 25, 25, 29, 25 ;
geometry_container = 0. ;
node_count = 25 ;
part_node_count = 5, 4, 4, 4, 4, 4 ;
interior_ring = 0, 1, 1, 1, 0, 0 ;
crs = 0. ;
----
====
 @@ -157,5 +157,9 @@ Replaced first para in Section 9.6. "Missing Data". Added verbiage to Section 2. .06 April 2017 . Ticket #70, Connecting coordinates to Grid Mapping variables: revisions in Section 5.6 and Examples 5.10 and 5.12 .07 April 2017 . New Simple Geometry section 7.5 and content in appendix E. .25 April 2017 . Ticket #100, Clarifications to the preamble of sections 4 and 5.
ProTip! Use n and p to navigate between commits in a pull request.