Simple Geometry Contribution #109

dblodgett-usgs · 2017-04-22T13:31:34Z

Dear CF,

This addition has been discussed on the email list heavily. It has not been submitted through trac as it seems that this ticket represents consensus to stop using trac in favor of github. I think it may be useful to consider a contribution not vetted in trac to pilot a workflow in github. If it is preferably to the community, I will circle back and use trac as in other contributions. Just trying to nudge the group and give a bit of food for thought.

Note that I have additional commit history that demonstrates peer review of the actual text on this branch. If the group would rather have a rich commit history, we can close this and open a PR from that branch.

For completeness of the history in case people come to this PR without having been part of the conversation on the email list, this work has been developed and vetted in a github repository here. The issue history in that repository contains nearly all the communication between the team that developed the proposal. The wiki in that repository also contains a more verbose description of the new section 7.5 being contributed here. That repository also contains a reference python implementation of the format. This repository contains an R implementation.

Best,

Dave

See https://cf-trac.llnl.gov/trac/ticket/160

Added link from trac ticket to this PR?

Applied via https://github.com/cf-convention/cf-conventions/pull/XXX

Updated "Revision History"? (Use the date you applied the changes.)

JonathanGregory · 2017-05-02T17:37:32Z

Dear Dave

Thanks for doing this. I think it's in pretty good shape but I have quite a few detailed comments and suggestions to make, nearly all on the proposed text rather than the convention itself. I'm not sure how to do that in GitHub, so it's a useful exercise to see how this would work. If your proposal was in trac, I would reply to your posting on the ticket, edit the wiki-markup text to show my changes and suggestions for the parts affected, and repost it to the trac ticket. If it was on a wiki, I would make a copy of it on the wiki page, and edit it similarly.

The way GitHub is set up, I suppose the natural way to do it is to make a new branch and edit that, but (a) I don't know how to do that, (b) it's not obvious to me that the changes I suggested would be clear to you. In fact I find the proposal in this form not as easy to follow as it would be in trac. I can view it as deltas of the files, but these have little context and are hard to read as text because the markup isn't translated, or I can read the properly rendered modified files, but these don't show what's been changed, and of course they show much more that isn't affected, especially as it's in several different complete files.

So I'm inclined to think that it would be easier to use GitHub issues in the same way as we use trac. That is, you would post your entire proposed text to the "issue". Then I presume I could copy your posting and edit it, as in trac. Unfortunately, the markup isn't the same, is it - the issues use markdown, I believe, whereas the convention text uses AsciiDoc. This is a technical obstacle. Is there an automatic translator? If not, once the text is agreed, it would have to be manually transposed into the conventions document, as we have been doing from agreed trac tickets.

You and others have much more experience with this, so perhaps you could suggest how to proceed, both in general with using GitHub issues (but that might be better on ticket 160) and specifically to give you detailed suggestions for your text.

Best wishes and thanks

Jonathan

dblodgett-usgs · 2017-05-03T15:31:41Z

Dear Jonathan,

I agree that this is a useful test case for how we might do this in github. I somewhat intentionally submitted this in a way that would provoke such discussion.

I've been pondering this a bit and I think that the path of least resistance may be to submit an issue rather than a pull request highlighting additions or changes. I'll close this pull request and open an Issue instead. That way we can use markdown to document text review.

My suggestion would be to use ~~strikethrough text~~ (~~text wrapped in double tilde~~) to show that text is to be changed, and bold text (**text wrapped in double asterisk**) to indicate new additions.

Once the text is finalized in an issue, someone would submit a pull request like this one that includes the finalized content. That way the PR review will be about the details of the asciidoc and not the details of the proposed change to the CF spec information content.

Best,

Dave

ChrisBarker-NOAA · 2017-05-03T16:46:47Z

hmm -- I'm going to have to disagree here -- if there is more than a small amount of text, then a
PR is really a better way to deal with it.

@JonathanGregory posted in a TRAC ticket, which then ended up on the mailing list, where I read it, but I can't comment on TRAC tickets, so I'll address the issues here:

I have
quite a few detailed comments and suggestions to make, nearly all on the
proposed text rather than the convention itself.

With a PR, it is easy to make comments on individual lines of text -- this is a really nice way to intersperse comments and particularly when it is small word-smithing issue or whatever. And you can also post larger comments that are not tied to individual parts of the text. Not sure what the confusion on this is -- this particular use case is really well supported by gitHub PRs and not nearly as well supported by dumping a large block of text in an issue.

Go to the "Files Changed" tab on the PR, and you can click on any line of text and add a comment there.

The way !GitHub is set up, I suppose the natural way to do it is to make
a new branch and edit that, but (a) I don't know how to do that, (b) it's
not obvious to me that the changes I suggested would be clear to you.

A workflow like this would be the way to go if you wanted to suggest large changes to the text. IN this case, you are moving to using git (rather than gitHub per-se) features -- then you would want to create a new branch or fork, and make your changes, and then submit them as a PR. This required a fair bit of git-fu -- but yes, the changes would be very clear. Perhaps we should write up a tutorial on how to do that.

BTW -- there are (kinda) two workflows for this kind of thing. In general, gitHub supports two kinds of contributions:

(A) Contributions from the "core team" -- in that case, these folks have permissions to push to the main gitHub repository. Then they can either: directly make a change, or (better option for non-tirivial changes) create a branch, make the changes there, and then do a PR to merge the branch back into master -- it can then be reviewed and discussed by the team before merging.

(B) Contributions from non-members of the core team. In this case, the workflow is to fork the repository, make the changes in that fork, and do a PR to the main repo to merge. The changes are then discussed and reviewed, and can be merged in when ready.

These seem very similar, but (B) can be a quite a bit more awkward if others want to do more than comment, and actually go in and make changes to the file in question.

If you are using workflow (A) then any core member can pull the branch, and look at it, and make changes an push them up -- so a smaller group can collaborate on a set of changes easily.

If you are using workflow (B) (as is the case here), the then "new" stuff is in a fork of the repo in someone else's gitHub account (@dblodgett-usgs in this case). So no one else has permission to edit that -- we can comment on the PR, but only the proposer can actually change anything. So if you do want to edit the doc itself and suggest those changes, you need to fork the forks' repo, and do it there, and then submit a PR to that repo -- this gets pretty complicated, though It's not actually that hard to do.

I can view it as deltas of the files, but these have little
context and are hard to read as text because the markup isn't translated,

I guess ASCIIDOC is not as fabulously readable as we'd like :-(

As for the context, that is a gitHub limitation -- you can either see only the changes, without much context (but able to comment on it line by line) -- or see the whole file, but then teh cnages aren't highlighted and you can't comment line by line. However, I find it only slightly painful to switch back and forth -- look at the changes, and if I need context, switch to the whole file to see what's going on and then back again to comment.

or I can read the properly rendered modified files, but these don't show
what's been changed, and of course they show much more that isn't
affected, and it's several different complete files.

I can't really see how this is made better by jamming all the text into the issue. -- now to see the changes, the proposer is going to have to mark it up by hand somehow....

Unfortunately, the markup isn't the same, is it - the issues use
markdown, I believe, whereas the convention text uses !AsciiDoc. This is a
technical obstacle. Is there an automatic translator?

It looks like gitHub does know how to render asciidoc:

https://github.com/dblodgett-usgs/cf-conventions/blob/94875c55a97596c9055174244714baaa1d9ffee4/appe.adoc

BTW, you can open different "views" of the same PR in multiple tabs in your browser. This can be very helpful if you want to look at the changes and make comments, and also switch quickly and easily to the full page to see context, for example.

In short -- let's give gitHub a chance! -- This kind of PR review is one of the features we WANTED from gitHub.

ChrisBarker-NOAA · 2017-05-03T16:51:20Z

@dblodgett-usgs wrote:

My suggestion would be to use strikethrough text (~~text wrapped in double tilde~~) to show that text is to be changed, and bold text (text wrapped in double asterisk) to indicate new additions.

That's exactly the kind of hand-editing that I think is a big waste of time -- let the computer tell you what's changed! This is a lot of work for the proposer

Once the text is finalized in an issue, someone would submit a pull request like this one that includes the finalized content. That way the PR review will be about the details of the asciidoc and not the details of the proposed change to the CF spec information content.

I guess I'm confused about where this proposal is at -- if it's still about the CF spec issues, then yes, a Issue may be a better way to discuss -- but then there shouldn't be any asciidoc required at all -- or ADDING markup for changes, etc.

When Jonathan wrote:

quite a few detailed comments and suggestions to make, nearly all on the
proposed text rather than the convention itself"

That sounds to me like we ARE talking about details of text and format, and a PR is the best way to manage that stage of the process.

dblodgett-usgs · 2017-05-03T17:02:32Z

This is all really good analysis. Thanks! I may have (probably did) swing too far with hand-markup of text changes.

Two things:

I totally forgot about line-by-line comments in the PR diff. I think that may be the solution @JonathanGregory is looking for.

This proposal is at the point of wordsmithing and minor details. It does not HAVE to be at the point of a formatted contribution to the repository that get's pushed out to cfconventions.org. I'm not sure that distinction is worth getting hung up on, but maybe it is?

So, I'll reopen this and we can give commenting on the diff a chance.

davidhassell · 2017-05-03T17:11:00Z

Hi David and Chris, I'm a bit lost - how can we make/view diffs of proposed text, such as that given at #112 Thanks! David

…

On 3 May 2017 at 18:02, David Blodgett ***@***.***> wrote: This is all really good analysis. Thanks! I may have (probably did) swing too far with hand-markup of text changes. Two things: I totally forgot about line-by-line comments in the PR diff. I think that may be the solution @JonathanGregory <https://github.com/JonathanGregory> is looking for. This proposal is at the point of wordsmithing and minor details. It does not HAVE to be at the point of a formatted contribution to the repository that get's pushed out to cfconventions.org. I'm not sure that distinction is worth getting hung up on, but maybe it is? So, I'll reopen this and we can give commenting on the diff a chance. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#109 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AHwAcJro-80e-nLq3Ad7SAlhfmTeSnphks5r2LMqgaJpZM4NFFoo> .

-- David Hassell National Centre for Atmospheric Science Department of Meteorology, University of Reading, Earley Gate, PO Box 243, Reading RG6 6BB Tel: +44 118 378 5613 http://www.met.reading.ac.uk/

dblodgett-usgs · 2017-05-03T17:15:17Z

Line by line comments are possible in the diff view of a pull request. Click on the line number on this page: https://github.com/cf-convention/cf-conventions/pull/109/files?diff=unified .

You should see something like this:

ChrisBarker-NOAA · 2017-05-03T17:16:31Z

On Wed, May 3, 2017 at 10:11 AM, David Hassell <notifications@github.com> wrote:

Hi David and Chris, I'm a bit lost - how can we make/view diffs of proposed text, such as that given at #112

That is an "Issue", rather than a PR (Pull Request) -- and there is not way to make/view/comment on diffs in an issue -- and Issue is really just a conversation. Which is why, for this kind of thing, a PR is a good option -- in that case, someone is proposing actuall cahnges to the doc, and you can see teh changes, comment on them line by line, etc: if you go here: #109 then click on the "Files Changed" tab -- you will see the changes, and can comment on them. There have got to be a lot of tutorials on the web about the details...

…

-CHB

Thanks! David On 3 May 2017 at 18:02, David Blodgett ***@***.***> wrote: > This is all really good analysis. Thanks! I may have (probably did) swing > too far with hand-markup of text changes. > > Two things: > > I totally forgot about line-by-line comments in the PR diff. I think that > may be the solution @JonathanGregory <https://github.com/JonathanGregory > > is looking for. > > This proposal is at the point of wordsmithing and minor details. It does > not HAVE to be at the point of a formatted contribution to the repository > that get's pushed out to cfconventions.org. I'm not sure that distinction > is worth getting hung up on, but maybe it is? > > So, I'll reopen this and we can give commenting on the diff a chance. > > — > You are receiving this because you are subscribed to this thread. > Reply to this email directly, view it on GitHub > <#109 (comment)- 298972768>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AHwAcJro-80e- nLq3Ad7SAlhfmTeSnphks5r2LMqgaJpZM4NFFoo> > . > -- David Hassell National Centre for Atmospheric Science Department of Meteorology, University of Reading, Earley Gate, PO Box 243, Reading RG6 6BB Tel: +44 118 378 5613 <+44%20118%20378%205613> http://www.met.reading.ac.uk/ — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#109 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA38YFt4Xl95LcHnesAgbTegxa192Bdqks5r2LUmgaJpZM4NFFoo> .

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

JonathanGregory · 2017-05-04T14:18:55Z

Thanks for this discussion. In trac, I would have to do quite a lot of work as well to propose changes and make comments in a clear way, and I'd of course prefer it to be automated as far as possible. What I want to do in this case is about word-smithing, and just a few points about the detail of the proposal. Writing it all as comments is inefficient e.g. "I suggest changing A to B". It is much easier for me to change A to B in the text, and if it's not obvious why I would have suggested that, I would add a comment as well. So I want to make changes and comments, like one would in a Word document with tracked changes. Of course I am not suggesting that dreadful and hideous alternative, but is there a way to do this in GitHub? So far it sounds like that can't be done without forking a branch, but I fear that would be too much hard work for everyone who might want to contribute to debates about CF text as they do in trac tickets.

dblodgett-usgs · 2017-05-04T14:25:39Z

ch07.adoc

+[[spatial-geometries, Section 7.5, "Spatial Geometries"]]
+=== Spatial Geometries
+
+For many geospatial applications, data values are associated with a spatial geometry (e.g., the average monthly rainfall in the UK).  Although cells with an arbitrary number of multiple vertices can be described using <<cell-boundaries>>, spatial geometries contain an arbitrary number of nodes for each geometry and include line and __multipart__ geometries (e.g., the different islands of the UK).  The approach described here specifies how to encode such geometries following the pattern in **9.3.3 Contiguous ragged array representation** and attach them to variables in a way that is consistent with the cell bounds approach.


For @JonathanGregory If you have a comment on a small substring of a paragraph it could be done like:

Although cells with an arbitrary number of multiple vertices can be described using

to

Although cells with an arbitrary number of vertices can be described using

Or, just copy the whole paragraph and rewrite as needed.

Or, this could just be a general comment indicating how you think this should be rewritten.

How can I (someone who doen't have write access to the repo) make comment in this manner? I tried via https://github.com/cf-convention/cf-conventions/pull/109/files?diff=unified but couldn't work it out.

Thanks, David

When you roll over the line numbers, you will see a + appear. Click the + and it opens a comment box. That should be available to anyone if the repository is public.

dblodgett-usgs · 2017-05-04T14:26:44Z

I think the easiest might be to use this view select the line you want to modify, and go to town. I've added a comment over there. Here's a link to the comment.

davidhassell · 2017-05-05T14:23:42Z

appe.adoc

+  y = 10, 30, 40, 60, 50 ;
+----
+The time series variable, someData, is associated with line geometries via the geometry attribute. The first line geometry is comprised of three nodes, while the second has two nodes. Client applications unaware of CF geometries can fall back to the lat and lon variables to locate feature instances in space. In this example, lat and lon coordinates are identical to the first node in each line geometry, though any representative point could be used.
+====


Test comment - please ignore

@dblodgett-usgs That's great - thanks

rsignell-usgs · 2017-05-03T21:01:58Z

ch07.adoc

+
+All geometries are made up of one or more nodes. The geometry type specifies the set of topological assumptions to be applied to relate the nodes. For example, __multipoint__ and __line__ geometries are nearly the same except nodes are interpreted as being connected for lines. Lines and polygons are also nearly the same except the first and last nodes must be identical for polygons. Polygons that have holes, such as waterbodies in a land unit, are encoded as a collection of polygon ring parts, each identified as __exterior__ or __interior__ polygons. Multipart geometries, such as multiple lines representing the same river or multiple islands representing the same jurisdiction, are encoded as collections of un-connected points, lines, or polygons that are logically grouped into a single geometry.
+
+While this geometry encoding is applicable to any variable that shares a dimension with a set of geometry, the application it was originally designed for requires that geometry be joined to the instance dimension of a Discrete Sampling Geometry `timeSeries` featureType. In this case, any data variable can be given a `geometry` attribute that is to be interpreted as the representative geometry for the quantity held in the variable. An example of this is areal average precipitation over a watershed. An example of line geometry with time series data is given in <<appendix-cell-methods>>.


Could the instance dimension be renamed the features dimension? I was very confused about what this dimension was until I read the caption below the example, where I learned this was the number of simpleFeatures in the dataset.

The instance term is taken from the DSG spec. I would be in favor of renaming it the feature dimension, but that would require changes to the DSG spec in addition to this proposal.

Ah @dblodgett-usgs , thanks. I see that for DSG trajectories (CF-1.5 Appendix H.5) it says:

The instance dimension in the case of trajectories specifies the number of trajectories in the collection and is also referred to as the trajectory dimension.

So maybe we could say:

"The instance dimension in the case of features specifies the number of features in the collection and is also referred to as the feature dimension."

I suppose. Yeah. I'll work through the text and make a change to that affect pendingfeedback from @JonathanGregory

Updated spec incoming to this PR. I've added the feature dimension text and changed "instance" to "feature" in the cdl.

cameronsmith1 · 2017-05-05T20:10:51Z

I had assumed that edits to CF are frequent enough and complicated enough, that it would make sense to make a branch for each proposed change, on which all of the discussions and changes will occur, which then get merged to the master version once the final version is agreed to.

(note: I am familiar with github, but am not an expert)

ChrisBarker-NOAA · 2017-05-05T20:24:26Z

I think "complicated enough" is key -- if's a typo fix, then don't bother, but larger edits should get a branch.

However, you can only really do a branch if yu have commit rights to the main repo. So anyone that doesn't can to a PR instead, which is pretty much the same thing.

(and a PR can be merged into a branch as well -- it doesn't have to go straight to master -- if there is a big change that needs further review by the core contributors.

cameronsmith1 · 2017-05-05T21:18:38Z

Hi @ChrisBarker-NOAA , I normally think of a PR as the mechanism to merge one branch with another (usually on to the master branch). How does a PR work without a branch?

Is this discussion actually happening on a branch, because it is a little different than I am used to?

ChrisBarker-NOAA · 2017-05-05T21:23:46Z

A PR can be for a merge of one branch to another within the same project -- but that can only be done by people that have write permissions to the central repo.

A PR from a fork, like this one, is essentially the same, but the different "branch" is in someone else's repo (in this case dblodgett-usgs's account) as it is a different repo, it doesn't matter what it's called "master" in dblodgett-usges' repo is separate from "master" in this repo.

clear as mud?

cameronsmith1 · 2017-05-05T21:25:14Z

@ChrisBarker-NOAA , so the real distinction is between a branch and a fork?

ChrisBarker-NOAA · 2017-05-05T21:30:11Z

@ChrisBarker-NOAA <https://github.com/ChrisBarker-NOAA> , so the real distinction is between a branch and a fork?

yup.

cameronsmith1 · 2017-05-06T00:13:05Z

Thanks. BTW, I am submitting this reply by responding via email, to show what it looks like.

----------------------------------------------------------------------------------- Philip Cameron-Smith, pjc@llnl.gov, CliMA Group Leader, Lawrence Livermore Nat. Lab. ----------------------------------------------------------------------------------- From: Chris Barker [mailto:notifications@github.com] Sent: Friday, May 05, 2017 2:30 PM To: cf-convention/cf-conventions <cf-conventions@noreply.github.com> Cc: Cameron-smith, Philip <cameronsmith1@llnl.gov>; Comment <comment@noreply.github.com>

Subject: Re: [cf-convention/cf-conventions] Simple Geometry Contribution (#109) @ChrisBarker-NOAA <https://github.com/ChrisBarker-NOAA> , so the real distinction is between a branch and a fork?

yup. — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#109 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AIkyUEKhm2hNX0wCNFE8IFQ9hBt7laZcks5r25TkgaJpZM4NFFoo>.

dblodgett-usgs · 2017-05-06T14:16:49Z

The group needs to decide if a branched workflow is going to be helpful. I prefer the fork / pull request onto master workflow because it grants full control of the fork to the person making changes, but I’d be open to other alternatives as discussed above. Note that some email client reply headers work better with GitHub than others. My Mac mail client is usually pretty tidy, but it looks like whatever is used at LLNL isn’t parsed out nicely by GitHub.

…

On May 5, 2017, at 7:13 PM, cameronsmith1 ***@***.***> wrote: Thanks. BTW, I am submitting this reply by responding via email, to show what it looks like. ----------------------------------------------------------------------------------- Philip Cameron-Smith, ***@***.***, CliMA Group Leader, Lawrence Livermore Nat. Lab. ----------------------------------------------------------------------------------- From: Chris Barker ***@***.*** Sent: Friday, May 05, 2017 2:30 PM To: cf-convention/cf-conventions ***@***.***> Cc: Cameron-smith, Philip ***@***.***>; Comment ***@***.***> Subject: Re: [cf-convention/cf-conventions] Simple Geometry Contribution (#109) > @ChrisBarker-NOAA <https://github.com/ChrisBarker-NOAA> , so the real > distinction is between a branch and a fork? > yup. — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#109 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AIkyUEKhm2hNX0wCNFE8IFQ9hBt7laZcks5r25TkgaJpZM4NFFoo>. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#109 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABbHQ8GtO2IKDSnABaZh2FtBML0BrZ8Kks5r27sSgaJpZM4NFFoo>.

ChrisBarker-NOAA · 2017-05-07T17:05:45Z

The group needs to decide if a branched workflow is going to be helpful. I prefer the fork / pull request onto master workflow because it grants full control of the fork to the person making changes,

Wry good point -- I think which is best is situation dependent -- a fork-based workflow provides better control to the proposer. A branched-based workflow provides easier collaboration on a change. Got is very flexible though, so if there is a small subgroup that wants to collaborate on a change, then multiple people can be granted access to the forked version. I tend to prefer the branch-beaded workflow, but it does require giving permission to more people to push to the many repo, which may be too much admin overhead.

…

-CHB

cameronsmith1 · 2017-05-08T20:13:16Z

When I was looking at the current Pull Request in Github, there were various options that it wouldn't let me access because I didn't have permission (presumably because it is a fork). What would it take to give people permission to edit the current Fork/PR, so people can see the editing features that will be available? (FYI, I am responding by email again, but this time I am stripping off everything else in my email.)

ChrisBarker-NOAA · 2017-05-08T21:00:22Z

The form belongs to @dblodgett-usgs -- so he would have to give yu permission to edit his repo.

Or you can fork his fork -- make changes and do a PR against his fork -- this gets complicated fast!

-CHB

cameronsmith1 · 2017-05-08T21:13:24Z

As I understand it, then, there are 4 options for handling changes:

People will provide feedback on a proposed change only by making comments.
Give a lot of people write-access to the main CF git repository, and use branches.
Someone proposing a change will fork the git repository, and give write-access to people (probably a select few).
Someone proposing a change will fork the git repository, and others wanting to alter the proposed changes will fork the fork, then issue a new pull request.

Personally, I think (1) will be limiting, and (4) would get very messy and confusing for a large and controversial change.

ChrisBarker-NOAA · 2017-05-08T21:39:24Z

agreed, though:

is not that limiting, and in this case @dblodgett-usgs suggested that he wanted to make any changes himself.
required write access to the core folks working on a change -- I don't know how practical that is.
is a fine option -- a major change may be getting proposed by a small group of folks -- make sense for them (and only them) to share a repo to do it.

I agree - (4) is pretty unweildy.

For small changes, (1) is fine.

For larger changes, (3) should work.

dblodgett-usgs · 2017-05-09T12:55:04Z

I'll make the observation that, for most large submissions, only one person is going to be the editor of the text. So, like in this case, putting in a PR from a fork (owned by the submitting editor) works well.

In this case, I forked the repository, added my draft contribution, had two collaborators review it (which you can see in these two closed PRs), then submitted the PR here.

My intention is to take all the comments I receive and reconcile them into the open PR with a new commit. When I do that, you will see that comments on old (now stale) content will be hidden and the review process can continue fresh.

So the way I've done this, which has been completely natural and easy, has been a mix of 1 (for community review) and 4 (for peer review prior to the community). Note that for this proposal we had significant community review on the email list and in another github repository / google docs as well prior to getting to this point. We are NOT starting from scratch with a github pull request.

On a somewhat unrelated not, I don't think pushing to the shared group repository should ever be allowed. Curious how those who have been maintaining this repo feel about it. IMHO, all changes should come in through pull requests. This keeps the requirement for review on all community members. Pulling commits from/to each other's forks is not all that hard once you have multiple remote repositories in your local copy.

Finally, I'm looking forward to @JonathanGregory's line by line critique of this work. After all this conversation, it seems that the pull request review feature is really what you are looking for. When you go to make a comment on a line there is an option to "start a review". It would be great to have one comprehensive review from you. Maybe we would have a comment that at-least two community members have completed a peer review of the material prior to merging?

…sions

dblodgett-usgs · 2017-05-09T13:13:50Z

Note that there is now an "outdated" comment thread from @rsignell-usgs above. The commit just above this comment addresses his comment about the use of "instance" dimension for "features". Open that commit to see what was changed. That commit is also now part of the total PR diff visible at the "files changed" link at the top of this page.

rsignell-usgs · 2017-05-09T13:25:01Z

After all this conversation, it seems that the pull request review feature is really what you are looking for. When you go to make a comment on a line there is an option to "start a review".

👍

JonathanGregory · 2017-05-09T15:03:41Z

OK, thanks for all this discussion. I'll try to find time to do this in the next week or so. I'm a bit anxious that this might seem more complicated and consequently off-putting than trac would be for the average CF user who wants to propose a change to the convention text. But I need to try it to see how it works.

JonathanGregory · 2017-06-04T12:29:22Z

Dear Dave

At last I have managed to make time to work on your proposal. I'm sorry it took me so long. It turned out that I had a lot of suggestions to make, and I have edited it extensively. However, these changes are nearly all about presentation, clarity, logical order (to my mind - you may disagree) and removing repetition. It didn't seem possible to do this with GitHub comments so instead I have edited your AsciiDoc text. AsciiDoc is not the same as MarkDown, so GitHub can't render it correctly, but it's not very different.

In my next posting to the issue, I paste in my edited AsciiDoc, with the sole extra change of using four tildes to delimit verbatim code, as required by MarkDown. The AsciiDoc markup for headings doesn't work. Having gone through this exercise, I wonder whether in future it would be best to develop conventions changes in GitHub in MarkDown format, and convert it to AsciiDoc when it's agreed. There is at least one program (pandoc) which is said to be able to do this. Or could we store the conventions document in MarkDown rather than AsciiDoc? I assume that MarkDown doesn't have all the required facilities, since AsciiDoc appears to be more powerful.

In my third posting to the issue, I've given my revised text without the deletions and comments. I derived it this with sed '/^>/d; s/~~[^~][^~]*~~//g'.

Working on this did raise some substantive issues about your proposal, which I've commented on in the text. They are (a) I don't like the duplication of first and last points in a polygon. (b) Is it really necessary to use opposite orders for exterior and interior polygons? (c) It seems better to me to regard a polygon with holes as a multipolygon. (d) If there is only a single geometry, the node count variable could be omitted. (e) I have two concerns with the part about CRSs.

Best wishes

Jonathan

JonathanGregory · 2017-06-04T12:30:34Z

Could we call the new features just "geometries"? What does "spatial" add, since geometries are usually in space? Perhaps some other adjective would be more informative.

[[spatial-geometries, Section 7.5, "Spatial Geometries"]]
=== Spatial Geometries

The difference of this new convention from bounds is that there can be a different number of nodes for each of the elements of the axis, hence my suggestions the next para. The ref to 9.3.3 should be a named reference, I suppose. I rearranged the first sentence, and moved and modified the first two sentences from your second para into this one, as definitions. I moved the watershed example here from a later para.

For many geospatial applications, data values (e.g., the average monthly rainfall in the UK) are associated with a [spatial] geometry, which is a spatial representation of a real-world feature, for instance a time-series of areal average precipitation over a watershed. Polygonal cells with an arbitrary number of multiple vertices are geometries, and can be described using <>, but in that case every cell must have the same number of vertices. In contrast, each [spatial] geometry associated with a given data variable may have a different number of nodes, the geometries may be lines (as alternatives to points and polygons), and they may be ~~spatial geometries contain an arbitrary number of nodes for each geometry and include line and~~ multipart ~~geometries~~ i.e. including several disjoint parts (e.g., the different islands of the UK). Other geometry types exist and may be introduced in a later version of the ~~specification~~ CF convention. The approach described here specifies how to encode such geometries following the pattern in 9.3.3 Contiguous ragged array representation and attach them to variables in a way that is consistent with the cell bounds approach.

I've put all the next para in other places, except that I don't think we need to list the types here, because that's done when the geometry attribute is defined.

A geometry is usually thought to be a spatial representation of a real-world feature. It can be disjoint, having multiple parts. Geometry types are limited to point, multipoint, line, multiline, polygon and multipolygon types. Similar to other geospatial data formats, geometries are encoded as ordered sets of geospatial nodes. The connection between nodes is assumed to be linear in the coordinate reference system the nodes are defined in. Parametric geometries or otherwise curved features may be supported in the future.

All geometries are made up of one or more nodes. As in other geospatial data formats, geometries are encoded as ordered sets of nodes. The connection between nodes is assumed to be linear in the coordinate reference system the nodes are defined in. (Parametric geometries or otherwise curved features may be supported in the future.)

It is redundant to supply the last point of a polygon. If you replicate the last point, there's the possibility for inconsistency, and it needs to be checked; if the replicated point is not supplied, no mistake can be made! Hence my change here. It also saves a small amount of space.

The geometry type specifies the set of topological assumptions to be applied to relate the nodes. Multipoint and line geometries are nearly the same except nodes are interpreted as being connected for lines. Lines and polygons are also nearly the same except ~~the first and last nodes must be identical for polygons~~ that a polygon is completed by connecting the last node back to the first. Polygons that have holes, such as waterbodies in a land unit, are encoded as a collection of polygon ring parts, each identified as exterior or interior polygons. Multipart geometries, such as multiple lines representing the same river or multiple islands representing the same jurisdiction, are encoded as collections of un-connected points, lines, or polygons that are logically grouped into a single geometry.

I think we should omit the first sentence of this para because we don't need to explain the history of the convention. I have moved the idea of the second sentence to the end of the paragraph. I don't think your timeseries example belongs in Appendix E: it should be in this subsection, with the other example. It could also be referred to somewhere in Sect 9.

While this geometry encoding is applicable to any variable that shares a dimension with a set of geometry, the application it was originally designed for requires that geometry be joined to the instance dimension of a Discrete Sampling Geometry timeSeries featureType. The instance dimension, in the more general case of features, specifies the number of features in the collection and is also referred to as the feature dimension. In this case, any Any data variable can be given a geometry attribute that ~~is to be interpreted as~~ indicates the ~~representative~~ geometry for the quantity held in the variable. ~~An example of this is areal average precipitation over a watershed. An example of line geometry with time series data is given in <>.~~ One of the dimensions of the data variable must be the number of geometries to which the data applies. If the data variable has a discrete sampling geometry, the number of geometries is the instance dimension (Section 9.2).

I think the subheading is probably not needed.

==== Geometry Variables and Attributes

I suggest omitting the last two sentences because it seems easy enough to read if we just introduce the attributes in turn.

~~A set of geometries can be added to a file by inserting all required data variables and a~~ A geometry container variable ~~that~~ acts as a container for attributes that describe a set of geometries. The geometry attribute of the data variable contains ~~containing~~ the name of a geometry container variable~~can be added to any variable that shares the feature dimension with the geometries~~. The geometry container variable must hold geometry_type and node_coordinates attributes. Depending on the geometry_type, the geometry container may also need to contain a node_count, part_node_count, and interior_ring attribute. These attributes are described in detail below.

In the next two paras, we don't need to repeat the requirements that these attributes must be supplied. I avoided naming x,y,z because that is obvious from the next sentence, which was a later para. I moved it here and shortened it by removing repetition. I put "blank-delimited" because that's the phrase we normally use.

The geometry_type attribute ~~must be carried by a geometry container variable and~~ indicates the type of geometry present. Its allowable values are: point, multipoint, line, multiline, polygon, multipolygon.

The node_coordinates attribute ~~must be carried by a geometry container variable and~~ contains the ~~space~~ blank delimited names of the ~~x and y (and z)~~ variables that contain geometry node coordinates (one variable for each spatial dimension).

Is it really necessary to use opposite orders for exterior and interior? It trikes me that the convention would be simpler and hence more reliable if they were all anticlockwise. You know which are interior ones from the interior_ring attribute and so can reverse them if necessary when doing calculations.

The ~~variables that contain~~ geometry node coordinate variables ~~data, indicated by the node_coordinates attribute on the geometry container variable, are also identifiable through the use of a required~~ must each have a cf_role attribute, whose allowable values are geometry_x_node, geometry_y_node, and geometry_z_node. The geometry node coordinate variables must all have the same single dimension, which is the total number of nodes in all the geometries. The nodes must be stored consecutively for each geometry and and in the order of the geometries, and within each multipart geometry the nodes must be stored consecutively for each part and in the order of the parts. Polygon exterior rings must be put in anticlockwise order (viewed from above) and polygon interior rings in clockwise order. They are put in opposite orders to facilitate calculation of area.

If there is only a single geometry, the node count variable could be omitted, couldn't it, because all the nodes must belong to that geometry. That would be a simplification.

For all geometry types except point, in which each geometry contains a single node, the geometry container variable must have a node_count attribute that contains the name of a variable indicating the count of nodes per geometry. ~~Note that~~ For a multipart geometry, the node count ~~may span~~ is the total for all the ~~multiple geometry~~ parts. The single dimension of the node count variable should be the number of geometries.

I gather from the next para that you regard a single polygon with holes as a polygon still. Isn't it simpler to treat it as a multipolygon? That means the requirements for attributes are simpler, and in terms of processing the data it seems natural to me. I have changed it to match that assumption, but you may well have counter-arguments.

For multiline and multipolygon geometries (~~and~~ including polygons with holes), the geometry container variable must have a part_node_count attribute that contains the name of a variable indicating the count of nodes per geometry part. Note that because multipoint geometries always have a single node per part, the part_node_count is not required. The single dimension of the part node count variable should equal the total number of parts in all the geometries.

See last para concerning polygons with holes.

For polygon and multipolygon geometries with holes, the geometry container variable must have an interior_ring attribute that contains the name of a variable that indicates if the polygon parts are interior rings (i.e., holes) or not. ~~The variable indicated by the interior_ring attribute~~ This interior ring variable should contain the value 0 to indicate an exterior ring polygon and 1 to indicate an interior ring polygon. ~~Note that single part polygons can have interior rings; multipart polygons are distinct in that they have more than one exterior ring.~~ The single dimension of the interior ring variable should be the same dimension as that of the part node count variable.

I moved this final para to an earlier point

The variables that contain geometry node coordinate data, indicated by the node_coordinates attribute on the geometry container variable, are also identifiable through the use of a required cf_role attribute. Allowable values are geometry_x_node, geometry_y_node, and geometry_z_node.

I don't think the following section is needed. See below.

==== Encoding Geometries

The first sentence of this is "history", and I think the point is better made earlier (see above). I have put the rest of the requirements at earlier points where these variables are introduced.

Geometry encoding follows a similar pattern to the contiguous ragged array approach in 9.3.3 Contiguous ragged array representation with some modification to suit the spatial geometry use case rather than observational time series. All spatial data are encoded in the variables indicated by the node_coordinates and appropriate cf_role attribute. These node variables should be one dimensional and total number of nodes long. There are three one dimensional variables that are used to break up and interpret the node variabes: node_count, part_node_count, and interior_ring.

Some of the next two paragraphs is repetition, and I have put the other requirements at earlier points.

For geometry types requiring a node_count attribute, the node count variable should be the number of geometries long and indicate the number of nodes per geometry. For geometry types requireing a part_node_count attribute, the part node count variable should be the number of geometry parts long and indicate the number of nodes per geometry part. For geometry types requireing an interior_ring attribute, the interior ring variable should be the number of geometry parts long and contain 0s and 1s to indicate exterior or interior.

The ecosystem of polygon specifications and software implementations of those specifications varies in how polygons are encoded. Nodes within a polygon exterior or interior ring are typically encoded in opposite clockwise or anticlockwise direction around the polygon. This is important for operations such as caluclating area. CF requires that outer rings be encoded in anticlockwise order and interior rings be encoded in clockwise order. CF also requires that the first and last node in a polygon be identical to ensure polygon rings are complete.

I have two concerns with the next paragraph. First, why should a grid mapping be required in the case of geometries when it's not otherwise? I'm sure that in many real-world applications it is desirable and should be included, but it's possible this convention could be used with model or other data in which a grid mapping isn't needed or useful. (NB a grid mapping doesn't exactly correspond to a CRS. Parts of a CRS reside in other parts of CF metadata.) I would prefer a recommendation to include a grid_mapping for real-world data, rather than a requirement. Second, if a grid mapping is needed for a geometry when there is no data variable, why not put a grid_mapping attribute on the geometry container variable? As far as I can see, the proposed crs attribute is the same thing. But why do we need this anyway, since there is always a data variable in the file?

A coordinate reference system (CRS) (referred to as a grid mapping elsewhere in the CF convention) is strictly required for geometries. The normal CF practice, of attaching a grid_mapping attribute--containing the name of a CRS container variable--to a data variable, can be used and the grid_mapping CRS should be assumed to apply to the geometry. However, the normal grid_mapping, which typically applies to auxiliary coordinate variables and remains optional for use with geometries, can be overridden by attaching a crs attribute that contains the name of a CRS container variable to the geometry container variable. If a grid_mapping is not present on a data variable linked to geometry, a crs attribute is required.

[[complete-multipolygon-example]]
[caption="Example 7.14. "]
.A multipolygon with holes

This example demonstrates the use of all potential attributes and variables for encoding geometries.

dimensions:
  node = 25 ;
  feature = 1 ;
  part = 6 ;
variables:
  double x(node) ;
    x:units = "degrees_east" ;
    x:standard_name = "longitude" ;
    x:cf_role = "geometry_x_node" ;
  double y(node) ;
    y:units = "degrees_north" ;
    y:standard_name = "latitude" ;
    y:cf_role = "geometry_y_node" ;
  float geometry_container ;
    geometry_container:geometry_type = "multipolygon" ;
    geometry_container:node_count = "node_count" ;
    geometry_container:node_coordinates = "x y" ;
    geometry_container:crs = "crs" ; // I'm doubtful about this - see above
    geometry_container:part_node_count = "part_node_count" ;
    geometry_container:interior_ring = "interior_ring" ;
  int node_count(feature) ; // could be omitted since feature=1 - see above
    node_count:long_name = "count of coordinates in each feature geometry" ;
  int part_node_count(part) ;
    part_node_count:long_name = "count of nodes in each geometry part" ;
  int interior_ring(part) ;
    interior_ring:long_name = "type of each geometry part" ;
  float crs ;
    crs:grid_mapping_name = "latitude_longitude" ;
    crs:semi_major_axis = 6378137. ;
    crs:inverse_flattening = 298.257223563 ;
    crs:longitude_of_prime_meridian = 0. ;
// global attributes:
  :Conventions = "CF-1.8" ;
data:
 x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5, 
    11, 15, 13, 11 ;
 y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29, 
    25, 25, 25, 29, 25 ;
 geometry_container = 0. ;
 node_count = 25 ;
 part_node_count = 5, 4, 4, 4, 4, 4 ;
 interior_ring = 0, 1, 1, 1, 0, 0 ;
 crs = 0. ;

I think your timeseries example should appear here (not in Appendix E).
====

JonathanGregory · 2017-06-04T12:31:20Z

[[spatial-geometries, Section 7.5, "Spatial Geometries"]]
=== Spatial Geometries

For many geospatial applications, data values (e.g., the average monthly rainfall in the UK) are associated with a [spatial] geometry, which is a spatial representation of a real-world feature, for instance a time-series of areal average precipitation over a watershed. Polygonal cells with an arbitrary number of multiple vertices are geometries, and can be described using <>, but in that case every cell must have the same number of vertices. In contrast, each [spatial] geometry associated with a given data variable may have a different number of nodes, the geometries may be lines (as alternatives to points and polygons), and they may be multipart i.e. including several disjoint parts (e.g., the different islands of the UK). Other geometry types exist and may be introduced in a later version of the CF convention. The approach described here specifies how to encode such geometries following the pattern in 9.3.3 Contiguous ragged array representation and attach them to variables in a way that is consistent with the cell bounds approach.

All geometries are made up of one or more nodes. As in other geospatial data formats, geometries are encoded as ordered sets of nodes. The connection between nodes is assumed to be linear in the coordinate reference system the nodes are defined in. (Parametric geometries or otherwise curved features may be supported in the future.)

The geometry type specifies the set of topological assumptions to be applied to relate the nodes. Multipoint and line geometries are nearly the same except nodes are interpreted as being connected for lines. Lines and polygons are also nearly the same except that a polygon is completed by connecting the last node back to the first. Polygons that have holes, such as waterbodies in a land unit, are encoded as a collection of polygon ring parts, each identified as exterior or interior polygons. Multipart geometries, such as multiple lines representing the same river or multiple islands representing the same jurisdiction, are encoded as collections of un-connected points, lines, or polygons that are logically grouped into a single geometry.

Any data variable can be given a geometry attribute that indicates the geometry for the quantity held in the variable. One of the dimensions of the data variable must be the number of geometries to which the data applies. If the data variable has a discrete sampling geometry, the number of geometries is the instance dimension (Section 9.2).

==== Geometry Variables and Attributes

A geometry container variable acts as a container for attributes that describe a set of geometries. The geometry attribute of the data variable contains the name of a geometry container variable. The geometry container variable must hold geometry_type and node_coordinates attributes.

The geometry_type attribute indicates the type of geometry present. Its allowable values are: point, multipoint, line, multiline, polygon, multipolygon.

The node_coordinates attribute contains the blank delimited names of the variables that contain geometry node coordinates (one variable for each spatial dimension).

The geometry node coordinate variables must each have a cf_role attribute, whose allowable values are geometry_x_node, geometry_y_node, and geometry_z_node. The geometry node coordinate variables must all have the same single dimension, which is the total number of nodes in all the geometries. The nodes must be stored consecutively for each geometry and and in the order of the geometries, and within each multipart geometry the nodes must be stored consecutively for each part and in the order of the parts. Polygon exterior rings must be put in anticlockwise order (viewed from above) and polygon interior rings in clockwise order. They are put in opposite orders to facilitate calculation of area.

For all geometry types except point, in which each geometry contains a single node, the geometry container variable must have a node_count attribute that contains the name of a variable indicating the count of nodes per geometry. For a multipart geometry, the node count is the total for all the parts. The single dimension of the node count variable should be the number of geometries.

For multiline and multipolygon geometries ( including polygons with holes), the geometry container variable must have a part_node_count attribute that contains the name of a variable indicating the count of nodes per geometry part. Note that because multipoint geometries always have a single node per part, the part_node_count is not required. The single dimension of the part node count variable should equal the total number of parts in all the geometries.

For multipolygon geometries with holes, the geometry container variable must have an interior_ring attribute that contains the name of a variable that indicates if the polygon parts are interior rings (i.e., holes) or not. This interior ring variable should contain the value 0 to indicate an exterior ring polygon and 1 to indicate an interior ring polygon. The single dimension of the interior ring variable should be the same dimension as that of the part node count variable.

==== Encoding Geometries

[[complete-multipolygon-example]]
[caption="Example 7.14. "]
.A multipolygon with holes

This example demonstrates the use of all potential attributes and variables for encoding geometries.

dimensions:
  node = 25 ;
  feature = 1 ;
  part = 6 ;
variables:
  double x(node) ;
    x:units = "degrees_east" ;
    x:standard_name = "longitude" ;
    x:cf_role = "geometry_x_node" ;
  double y(node) ;
    y:units = "degrees_north" ;
    y:standard_name = "latitude" ;
    y:cf_role = "geometry_y_node" ;
  float geometry_container ;
    geometry_container:geometry_type = "multipolygon" ;
    geometry_container:node_count = "node_count" ;
    geometry_container:node_coordinates = "x y" ;
    geometry_container:crs = "crs" ; // I'm doubtful about this - see above
    geometry_container:part_node_count = "part_node_count" ;
    geometry_container:interior_ring = "interior_ring" ;
  int node_count(feature) ; // could be omitted since feature=1 - see above
    node_count:long_name = "count of coordinates in each feature geometry" ;
  int part_node_count(part) ;
    part_node_count:long_name = "count of nodes in each geometry part" ;
  int interior_ring(part) ;
    interior_ring:long_name = "type of each geometry part" ;
  float crs ;
    crs:grid_mapping_name = "latitude_longitude" ;
    crs:semi_major_axis = 6378137. ;
    crs:inverse_flattening = 298.257223563 ;
    crs:longitude_of_prime_meridian = 0. ;
// global attributes:
  :Conventions = "CF-1.8" ;
data:
 x = 0, 20, 20, 0, 0, 1, 10, 19, 1, 5, 7, 9, 5, 11, 13, 15, 11, 5, 9, 7, 5, 
    11, 15, 13, 11 ;
 y = 0, 0, 20, 20, 0, 1, 5, 1, 1, 15, 19, 15, 15, 15, 19, 15, 15, 25, 25, 29, 
    25, 25, 25, 29, 25 ;
 geometry_container = 0. ;
 node_count = 25 ;
 part_node_count = 5, 4, 4, 4, 4, 4 ;
 interior_ring = 0, 1, 1, 1, 0, 0 ;
 crs = 0. ;

====

ChrisBarker-NOAA · 2017-06-05T15:52:24Z

On a phone, so only a few comments for now. I have edited your AsciiDoc text. AsciiDoc is not the same as MarkDown, so GitHub can't render it correctly, but it's not *very* different. I'm pretty sure gitHub can render asciidoc. But in any case, if you want to know exactly how it's going to render, you need to use the same renderer and format. So there is really no point in putting markdown in here. If you want to work on the content and structure, just do it in plain text -- that's the whole point of these plain text-based markup formats. In my next posting to the issue, I paste in my edited AsciiDoc, If you want to make major changes to the entire doc, rather than commenting on bits, you're really better off forking the repo and making the changes and doing a pull request. That's exactly what the gitHub workflow is for,

…

-CHB

ChrisBarker-NOAA · 2017-06-05T15:59:14Z

Comment on the real content.,. Working on this did raise some substantive issues about your proposal, which I've commented on in the text. They are (a) I don't like the duplication of first and last points in a polygon. This is legacy from older geo formats: "You’ll also notice that the *first coordinate is repeated at the end* of each ring. There’s no particular reason why this is necessary besides GeoJSON’s heritage in older formats." In some, repetition of the point distinguishes between a polygon and a polyline. Not required here, but compatibility with other formats is s good thing. (b) Is it really necessary to use opposite orders for exterior and interior polygons? (c) It seems better to me to regard a polygon with holes as a multipolygon. A polygon with holes is a Different Thing than multiple polygons. For instance, what is the area of a polygon with a hole? Vs two polygons that happen to overlap? And using winding order makes processing easier. And again, compatibility with other formats. (d) If there is only a single geometry, the node count variable could be omitted. It could be, but it makes it easier to write processing code if it's always there.

…

-CHB

JonathanGregory · 2017-06-05T17:24:23Z

Dear Chris

I'm pretty sure gitHub can render asciidoc.

Maybe it can. How do you tell the issue that you're giving it AsciiDoc rather than MarkDown?

If you want to make major changes to the entire doc, rather than commenting on bits, you're really better off forking the repo and making the changes and doing a pull request. That's exactly what the gitHub workflow is for.

Yes, we discussed it above, and the conclusion was that I should comment on the pull request. However, I really do not think that's very human-friendly for editing text like this (not friendly to this human, anyway). To make things easy to appreciate, you want to see the inserted text, deleted text, and comments all in front of you. That is what I have tried to achieve above, using a bit of MarkDown. (I forgot to point out that I used italic for inserted text and ~~strikethrough~~ for deleted - hopefully obvious.) I think the workflow involving comments on pull requests would be appropriate just for correcting final errors or making minor insertions.

Duplication of first and last points in a polygon ... is legacy from older geo formats. ... compatibility with other formats is a good thing.

Yes, if it's free, but this has a cost (redundancy and possible inconsistency) so I'm not convinced. This is a different file format so if you're translating from another one you will have to do some work anyway. Deleting or inserting a duplicated point as part of the translation wouldn't be complicated.

A polygon with holes is a Different Thing than multiple polygons. For instance, what is the area of a polygon with a hole? Vs two polygons that happen to overlap?

I agree those are different things, but they're clearly distinguished by the interior and exterior ring marking. A polygon with a hole is a multipolygon where one is exterior and one interior. Two overlapping polygons are both exterior.

If there is only a single geometry, the node count variable could be omitted. It could be, but it makes it easier to write processing code if it's always there.

Maybe, but by that argument the part_node_count variable and the interior_ring variable should always be present too.

Best wishes

Jonathan

dblodgett-usgs · 2017-06-05T22:28:49Z

Dear Jonathan,

Thanks for getting back with us on this.

I think I've covered most of the comments with my personal disposition on them. This issue is getting a bit ungainly. I think that I will probably close it and move your comments to a google doc where we can actually do word by word diffing. (see my comment on this below) This level of review of the text is not supported by github, where we should probably just be vetting small changes and the actual asciidoc syntax. I'll wait to do this till tomorrow or the next day in case there's a real push to keep this review here...

GitHub Usage Response

I'm pretty sure gitHub can render asciidoc.

Comments only support markdown and the diff view of a pull request will never be "rendered". .adoc files do render in github. The best solution would probably to comment paragraph by paragraph in plain text in the diff view. I'll migrate your comments to that style as an example.

If you want to make major changes to the entire doc, rather than commenting on bits, you're really better off forking the repo and making the changes and doing a pull request. That's exactly what the gitHub workflow is for.

As I expected, for editing paragraphs of text, word-by-word diffing is desirable and it seems that the only way to get this is with some manual indication of what's been deleted and what's been inserted. (I suggested this above, but commenting line by line with this kind of thing as needed may be a good happy medium?)
Additionally, I think we may consider having contributions vetted in a more text-friendly system, such as google docs, or another more ad-hoc space then use this mechanism for review of the actual ascii doc syntax of agreed upon text. I've suggested it before, but this is a good example of how we are mixing purposes (content review and syntax review). I used google docs for a previous submission, would it be preferable to move this review there so we can use normal document track changes till the text is finalized?

Proposal Response .

Duplication of first and last points in a polygon ... is legacy from older geo formats. ... compatibility with other formats is a good thing.

I have the same stance as @ChrisBarker-NOAA on this. It's just a common way of being defensive in the data encoding. It is an extra bit of certainty that the linear ring is complete and wasn't corrupted on the way in or out of a file. I'm not attached, but the idea of multiple layers of certainty for protection against data corruption is a positive that you may not be thinking about? As another point of reference, the OGC Simple Features Access spec follows this convention as well.

A polygon with holes is a Different Thing than multiple polygons. For instance, what is the area of a polygon with a hole? Vs two polygons that happen to overlap?

I feel pretty strongly that we should NOT say that a polygon with a hole is a multipolygon.

A polygon is a thing with some area. You calculate the area of the outer ring and subtract the area of the inner rings. The shoelace formula relies on opposite ordering for this.
Being consistent with other geospatial formats is a big positive for adoptability and familiarity of a format.
This is also consistent with the standing OGC Simple Feature Access standard, which is the canonical reference for this kind of data at this point IMHO.

If there is only a single geometry, the node count variable could be omitted. It could be, but it makes it easier to write processing code if it's always there.

I have no problem allowing the node_count to be omitted.

I think we should omit the first sentence of this para because we don't need to explain the history of the convention.

I've seen the history of the convention described elsewhere (specifically in the DSG spec), and was following that. No problem removing the content there.

First, why should a grid mapping be required in the case of geometries when it's not otherwise?

As long as it can be required for data associated with real-world coordinates. I am uncomfortable allowing anyone to not-state their datum/projection assumptions when recording lat/lon or other real-world coordinates.

As far as I can see, the proposed crs attribute is the same thing. But why do we need this anyway, since there is always a data variable in the file?

While "crs" is the same thing, the grid_mapping variable name is problematic because we are not dealing with grids here. If it's objectionable, I'm happy to just use grid_mapping, but it could be confusing. Regarding putting a crs attribute on data variables, that has always confused me. Why not attach the spatial metadata to the spatial data directly, as we've suggested here - attaching a crs to the geometry container variable? It's indirect and just odd that the metadata goes on a data variable that references the spatial data. I have no problem following the existing convention, but it's a really odd artifact of CF that I've never really understood the logic for.

Best,

Dave

davidhassell · 2017-06-06T09:00:21Z

Dear Dave, Chris and Jonathan, A few comments on the proposal . I haven't though about github usage yet , but I will try a few text edit suggestions based on my comments below. I don't think that we need to have the cf_role attribute on the node variables. Their role is well defined by them being named geometry_container:node_coordinates attribute, in just the same way that auxiliary coordinates are identified from the coordinate attribute of a data variable. The use of cf_role in DSGs is different, I think - in that case there is nothing else to say how it relates to the data variable. It would be very useful to include a data variable with a geometry attribute in example 7.14 (A multipolygon, one part of which has holes). One thing that is not clear to me is how you would specify multiple geometries along a dimension. For example, if I have a data variable with shape 12x3 that contains three equal length timeseries of countrywide averages for the UK, Netherlands and France, how would I attach the three geometries defining the three countries?

Duplication of first and last points in a polygon ... is legacy from older geo formats. ... compatibility with other formats is a good thing. I have the same stance as @ChrisBarker-NOAA <https://github.com/chrisbarker-noaa> on this. It's just a common way of being defensive in the data encoding. It is an extra bit of certainty that the linear ring is complete and wasn't corrupted on the way in or out of a file. I'm not attached, but the idea of multiple layers of certainty for protection against data corruption is a positive that you may not be thinking about? As another point of reference, the OGC Simple Features Access <https://en.wikipedia.org/wiki/Shoelace_formula> spec follows this convention as well.

I'm not keen on the duplicating the start/end node. We don't duplicate points for other cyclic entities (e.g. global longitude coordinates) and if the repeated nodes were there, I'd feel duty bound to check them for equality, which is never nice with floats. A polygon with holes is a Different Thing than multiple polygons. For

instance, what is the area of a polygon with a hole? Vs two polygons that happen to overlap? I feel pretty strongly that we should NOT say that a polygon with a hole is a multipolygon. 1. A polygon is a thing with some area. You calculate the area of the outer ring and subtract the area of the inner rings. The shoelace formula <https://en.wikipedia.org/wiki/Shoelace_formula> relies on opposite ordering for this. 2. Being consistent with other geospatial formats is a big positive for adoptability and familiarity of a format. 3. This is also consistent with the standing OGC Simple Feature Access <https://en.wikipedia.org/wiki/Shoelace_formula> standard, which is the canonical reference for this kind of data at this point IMHO.

I'm happy with this definition - we should have some text that more precisely defines a polygon (happy to initialise).

First, why should a grid mapping be required in the case of geometries when it's not otherwise? As long as it can be required for data associated with real-world coordinates. I am uncomfortable allowing anyone to not-state their datum/projection assumptions when recording lat/lon or other real-world coordinates. As far as I can see, the proposed crs attribute is the same thing. But why do we need this anyway, since there is always a data variable in the file? While "crs" is the same thing, the grid_mapping variable name is problematic because we are not dealing with grids here. If it's objectionable, I'm happy to just use grid_mapping, but it could be confusing. Regarding putting a crs attribute on data variables, that has always confused me. Why not attach the spatial metadata to the spatial data directly, as we've suggested here - attaching a crs to the geometry container variable? It's indirect and just odd that the metadata goes on a data variable that references the spatial data. I have no problem following the existing convention, but it's a really odd artifact of CF that I've never really understood the logic for.

I don't favour a special rule for one type of bounds over another, so I would not like to see the crs as mandatory, nor attachable to the geometry container. Viewing the geometries as bounds, I think that we do have grid ... which raises the issue that a geometry is bounds without an associated coordinate value. This is new in CF. Is it desirable? I can see use for a representative location that is useful (e.g. the average of the nodes' lats and lons). All the best, David

…

-- David Hassell National Centre for Atmospheric Science Department of Meteorology, University of Reading, Earley Gate, PO Box 243, Reading RG6 6BB Tel: +44 118 378 5613 http://www.met.reading.ac.uk/

JonathanGregory · 2017-06-06T13:36:21Z

Dear Dave and David

Thanks for your comments.

I agree with David that a multipolygon example would be useful.

Regarding duplicating points, another argument is for consistency within, as David says. CF can already describe polygonal bounds for a cell. In that case, the points are ordered anticlockwise (like a polygon in the new convention) and the first and last point are joined, with no repeated point. I think it's more important for CF to be consistent with CF than with other conventions. CF generally avoids redundancy.

I could accept that a polygon with holes is different from a multipolygon, but I'd appreciate some further clarification. A single polygon with a single hole is a two-part geometry, needing a part_node_count, isn't it? So are two polygons. Thus, structurally, a polygon with holes is like a multipolygon.

Dave says "I am uncomfortable allowing anyone to not-state their datum/projection assumptions when recording lat/lon or other real-world coordinates." However, that is the situation in CF. Even for the real world, it is optional to state these assumptions, because they are not always relevant to the dataset in question. It wouldn't be consistent to take a different view in this particular part of the convention.

Consistency is also the reason for using the attribute for grid_mapping, not introduce a new attribute with exactly the same function. Also CRS isn't an accurate description of the grid mapping variable, which contains only part of the CRS. David has thought about this in most detail.

Best wishes

Jonathan

dblodgett-usgs · 2017-06-06T18:56:40Z

Dear Jonathan and David,

I can accept the consistency within CF argument, and don't think it's a big enough problem to allow polygons to not self-close.

Will attempt to petter define geometry such that a single-part geometry can have holes.

The CRS issue is an old horse I've been riding for a long time. My issue is that:

Even for the real world, it is optional to state these assumptions, because they are not always relevant to the dataset in question.

is simply not true. There are no real world coordinates that do not require a statement of the reference system they use. My stance on the issue is that, if a grid_mapping is to be optional, CF should have a default, such as WGS84. In addition, if people are using a coordinate system that has no connection to the earth, then that should be explicit by defining a grid_mapping with grid_mapping_name "non_earth_coordinates" or something. However, this is not the issue at hand and I'm fine following the existing convention on this front for now.

Still curious about my suggestion re: modifying text in google docs or another editor designed for collaborative writing (not coding)... thoughts?

Best,

Dave

ChrisBarker-NOAA · 2017-06-06T20:57:20Z

I can accept the consistency within CF argument, and don't think it's a big enough problem to allow polygons to not self-close.

I agree here -- I think we should think of this as a way to represent OGC-standard data in netCDF in a CF-compatible way, rather than "A way to specify geometries in CF"

I have enough trouble getting GIS folks to deal with netcdf as it is....

Will attempt to petter define geometry such that a single-part geometry can have holes.

Good idea, and good to provide a Multipolygon (and polygons with holes) example.

And if we do keep the "OGC compatible" emphasis, then we can point folks to OGC standards for reference.

is simply not true. There are no real world coordinates that do not require a statement of the reference system they use. My stance on the issue is that, if a grid_mapping is to be optional, CF should have a default, such as WGS84.

I agree -- It's probably too late to require that it be specified, but using WGS84 as an official default is a good way to capture (most) use-cases.

There is no such thinking as "no reference system", there is only sloppiness. :-)

In addition, if people are using a coordinate system that has no connection to the earth, then that should be explicit by defining a grid_mapping with grid_mapping_name "non_earth_coordinates" or something.

and an "unknown" DRS may be needed for when people are pulling data from another source that was already sloppy...

Still curious about my suggestion re: modifying text in google docs or another editor designed for collaborative writing (not coding)... thoughts?

Google docs is a great way to collaborate on a document -- much better than passing MSWord files around, etc.

But one of the great things about asciidoc (and it's like) is that they're plain text, just like code, and thus tools designed for code work will with it -- git, github, diff, etc....

So the gitHub workflow is actually a pretty good one for this:

If you want to comment on specific bits of a doc, put line by line comments in the Pull Request.
If you want to make large changes to a page, then fork (or branch) the repo and make your changes. Then do a PR -- In the PR, the changes can be seen in diff view, which is a good tool for review.

Also, it looks like asciidoc supported comments:

http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/#comments

so you can add comments and/or comment out chunks of text, rather than deleting them.

All that being said, if there are 2-3 people that want to collaborate on it in Google docs, then bring it back into gitHub, go for it.

What will be lost is the process -- but that probably doesn't matter.

dblodgett-usgs · 2017-06-07T05:50:15Z

The issue with comments on documents (as opposed to code) is that documents have lines of text that are much more than the typical 60-80 characters that's typical of code. GitHub is optimized for syntax and short-line commenting. It works well if comments are conceptual or like, "can you reword this?", but not so good for detailed editing such as Jonathan's latest. For now, I'll close this PR, reconcile Jonathan's suggestions into a new version of the doc and open a new PR that references this one. This comment string is enough to rinse and repeat with a reference back here for old conversation.

davidhassell · 2017-06-08T09:01:41Z

Hi Dave,

Does this closing this mean that discussion is currently not possible, and will opening a new pull request mean that the discussion will ultimately be split across different threads?

Thanks,

David

ChrisBarker-NOAA · 2017-06-08T15:47:25Z

@dblodgett-usgs wrote:

The issue with comments on documents (as opposed to code) is that documents have lines of text that are much more than the typical 60-80 characters that's typical of code. GitHub is optimized for syntax and short-line commenting.

True, but:

if you have a document that has entire paragraphs as single line,s it makes it easy to comment on a paragraph, so not so bad, really.

Ans ascidoc and other markups don't assign significance to a single line break, so you can, (and should) use moderate sized lines anyway.

But this by no means needs to be resolved on this issue, and at the end of the day any method that works for the folks taking part is the one to go with.

ChrisBarker-NOAA · 2017-06-08T15:50:36Z

@davidhassell:

closing the issue means it shows up as closed when you search for it. But you can still add to it, and what you add will be captured for posterity (like this one...)

Opening a new PR will create a new discussion thread for that PR -- so yes, the discussion will be split, but I think that's probably a good thing -- folks can look here for the history, but the new PR will have discussion on what will hopefully be a close to final proposal, having included all this discussion.

dblodgett-usgs · 2017-06-08T16:05:58Z

And ascidoc and other markups don't assign significance to a single line break, so you can, (and should) use moderate sized lines anyway.

Good point! I hadn't thought about that. I'll format my next PR like that and we can try it out.

…ventions into reshel-109

Added sentence to introductory part of Section 4.3 beginning with "Optionally" Added "Recommendations" sentence after the paragraph starting with "Optionally" of Section 4.3

Added line to Section 4.3, Recommendations

simple geometry chapter 7 and appendix e.

94875c5

dblodgett-usgs closed this May 3, 2017

dblodgett-usgs mentioned this pull request May 3, 2017

Simple Geometry Contribution and github test case #112

Closed

dblodgett-usgs reopened this May 3, 2017

latest changes merged in

c5be80f

dblodgett-usgs commented May 4, 2017

View reviewed changes

davidhassell reviewed May 5, 2017

View reviewed changes

rsignell-usgs reviewed May 5, 2017

View reviewed changes

addressing comment from @rsignell-usgs re: instance and feature dimen…

54fc86c

…sions

dblodgett-usgs closed this Jun 7, 2017

dblodgett-usgs mentioned this pull request Jun 8, 2017

Geometries Contribution (and github experimentation) #115

Merged

reshel3 pushed a commit to reshel3/cf-conventions that referenced this pull request Jun 29, 2017

Merge branch 'cf-convention#109' of https://github.com/reshel3/cf-con…

5c9cd6a

…ventions into reshel-109

marqh pushed a commit to marqh/cf-conventions that referenced this pull request Jul 19, 2017

cf-convention#109

fd4ac99

Added sentence to introductory part of Section 4.3 beginning with "Optionally" Added "Recommendations" sentence after the paragraph starting with "Optionally" of Section 4.3

erget pushed a commit that referenced this pull request Dec 27, 2019

Ticket #109

667e0b2

Added line to Section 4.3, Recommendations

Simple Geometry Contribution #109

Simple Geometry Contribution #109

Conversation

dblodgett-usgs commented Apr 22, 2017

JonathanGregory commented May 2, 2017

dblodgett-usgs commented May 3, 2017

ChrisBarker-NOAA commented May 3, 2017 • edited by rsignell-usgs Loading

ChrisBarker-NOAA commented May 3, 2017

dblodgett-usgs commented May 3, 2017

davidhassell commented May 3, 2017 via email

dblodgett-usgs commented May 3, 2017 • edited Loading

ChrisBarker-NOAA commented May 3, 2017 via email

JonathanGregory commented May 4, 2017

dblodgett-usgs May 4, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dblodgett-usgs commented May 4, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cameronsmith1 commented May 5, 2017

ChrisBarker-NOAA commented May 5, 2017

cameronsmith1 commented May 5, 2017

ChrisBarker-NOAA commented May 5, 2017

cameronsmith1 commented May 5, 2017

ChrisBarker-NOAA commented May 5, 2017 via email

cameronsmith1 commented May 6, 2017 via email

dblodgett-usgs commented May 6, 2017 via email

ChrisBarker-NOAA commented May 7, 2017 via email

cameronsmith1 commented May 8, 2017 via email

ChrisBarker-NOAA commented May 8, 2017

cameronsmith1 commented May 8, 2017

ChrisBarker-NOAA commented May 8, 2017

dblodgett-usgs commented May 9, 2017

dblodgett-usgs commented May 9, 2017

rsignell-usgs commented May 9, 2017 • edited Loading

JonathanGregory commented May 9, 2017

JonathanGregory commented Jun 4, 2017

JonathanGregory commented Jun 4, 2017

[[complete-multipolygon-example]] [caption="Example 7.14. "] .A multipolygon with holes

JonathanGregory commented Jun 4, 2017

[[complete-multipolygon-example]] [caption="Example 7.14. "] .A multipolygon with holes

ChrisBarker-NOAA commented Jun 5, 2017 via email

ChrisBarker-NOAA commented Jun 5, 2017 via email

JonathanGregory commented Jun 5, 2017

dblodgett-usgs commented Jun 5, 2017

GitHub Usage Response

Proposal Response .

davidhassell commented Jun 6, 2017 via email

JonathanGregory commented Jun 6, 2017

dblodgett-usgs commented Jun 6, 2017

ChrisBarker-NOAA commented Jun 6, 2017

dblodgett-usgs commented Jun 7, 2017

davidhassell commented Jun 8, 2017

ChrisBarker-NOAA commented Jun 8, 2017

ChrisBarker-NOAA commented Jun 8, 2017

dblodgett-usgs commented Jun 8, 2017

ChrisBarker-NOAA commented May 3, 2017 •

edited by rsignell-usgs

Loading

dblodgett-usgs commented May 3, 2017 •

edited

Loading

dblodgett-usgs May 4, 2017 •

edited

Loading

rsignell-usgs commented May 9, 2017 •

edited

Loading

[[complete-multipolygon-example]]
[caption="Example 7.14. "]
.A multipolygon with holes

[[complete-multipolygon-example]]
[caption="Example 7.14. "]
.A multipolygon with holes