Simple Geometry Contribution #109
Dear Dave Thanks for doing this. I think it's in pretty good shape but I have quite a few detailed comments and suggestions to make, nearly all on the proposed text rather than the convention itself. I'm not sure how to do that in GitHub, so it's a useful exercise to see how this would work. If your proposal was in trac, I would reply to your posting on the ticket, edit the wiki-markup text to show my changes and suggestions for the parts affected, and repost it to the trac ticket. If it was on a wiki, I would make a copy of it on the wiki page, and edit it similarly. The way GitHub is set up, I suppose the natural way to do it is to make a new branch and edit that, but (a) I don't know how to do that, (b) it's not obvious to me that the changes I suggested would be clear to you. In fact I find the proposal in this form not as easy to follow as it would be in trac. I can view it as deltas of the files, but these have little context and are hard to read as text because the markup isn't translated, or I can read the properly rendered modified files, but these don't show what's been changed, and of course they show much more that isn't affected, especially as it's in several different complete files. So I'm inclined to think that it would be easier to use GitHub issues in the same way as we use trac. That is, you would post your entire proposed text to the "issue". Then I presume I could copy your posting and edit it, as in trac. Unfortunately, the markup isn't the same, is it - the issues use markdown, I believe, whereas the convention text uses AsciiDoc. This is a technical obstacle. Is there an automatic translator? If not, once the text is agreed, it would have to be manually transposed into the conventions document, as we have been doing from agreed trac tickets. 
You and others have much more experience with this, so perhaps you could suggest how to proceed, both in general with using GitHub issues (but that might be better on ticket 160) and specifically to give you detailed suggestions for your text. Best wishes and thanks Jonathan |
Dear Jonathan, I agree that this is a useful test case for how we might do this in github. I somewhat intentionally submitted this in a way that would provoke such discussion. I've been pondering this a bit and I think that the path of least resistance may be to submit an issue rather than a pull request highlighting additions or changes. I'll close this pull request and open an Issue instead. That way we can use markdown to document text review. My suggestion would be to use Once the text is finalized in an issue, someone would submit a pull request like this one that includes the finalized content. That way the PR review will be about the details of the asciidoc and not the details of the proposed change to the CF spec information content. Best,
|
hmm -- I'm going to have to disagree here -- if there is more than a small amount of text, then a @JonathanGregory posted in a TRAC ticket, which then ended up on the mailing list, where I read it, but I can't comment on TRAC tickets, so I'll address the issues here:
With a PR, it is easy to make comments on individual lines of text -- this is a really nice way to intersperse comments, particularly when it is a small word-smithing issue or whatever. And you can also post larger comments that are not tied to individual parts of the text. Not sure what the confusion on this is -- this particular use case is really well supported by gitHub PRs and not nearly as well supported by dumping a large block of text in an issue. Go to the "Files Changed" tab on the PR, and you can click on any line of text and add a comment there.
A workflow like this would be the way to go if you wanted to suggest large changes to the text. In this case, you are moving to using git (rather than gitHub per se) features -- you would create a new branch or fork, make your changes, and then submit them as a PR. This requires a fair bit of git-fu -- but yes, the changes would be very clear. Perhaps we should write up a tutorial on how to do that.

BTW -- there are (kinda) two workflows for this kind of thing. In general, gitHub supports two kinds of contributions:

(A) Contributions from the "core team" -- these folks have permissions to push to the main gitHub repository. They can either directly make a change, or (better option for non-trivial changes) create a branch, make the changes there, and then do a PR to merge the branch back into master -- it can then be reviewed and discussed by the team before merging.

(B) Contributions from non-members of the core team. In this case, the workflow is to fork the repository, make the changes in that fork, and do a PR to the main repo to merge. The changes are then discussed and reviewed, and can be merged in when ready.

These seem very similar, but (B) can be quite a bit more awkward if others want to do more than comment, and actually go in and make changes to the file in question. If you are using workflow (A), then any core member can pull the branch, look at it, make changes and push them up -- so a smaller group can collaborate on a set of changes easily. If you are using workflow (B) (as is the case here), then the "new" stuff is in a fork of the repo in someone else's gitHub account (@dblodgett-usgs in this case). So no one else has permission to edit that -- we can comment on the PR, but only the proposer can actually change anything.
So if you do want to edit the doc itself and suggest those changes, you need to fork the fork's repo, do it there, and then submit a PR to that repo -- this gets pretty complicated, though it's not actually that hard to do.
I guess ASCIIDOC is not as fabulously readable as we'd like :-( As for the context, that is a gitHub limitation -- you can either see only the changes, without much context (but able to comment on it line by line) -- or see the whole file, but then the changes aren't highlighted and you can't comment line by line. However, I find it only slightly painful to switch back and forth -- look at the changes, and if I need context, switch to the whole file to see what's going on and then back again to comment.
I can't really see how this is made better by jamming all the text into the issue. -- now to see the changes, the proposer is going to have to mark it up by hand somehow....
It looks like gitHub does know how to render asciidoc: BTW, you can open different "views" of the same PR in multiple tabs in your browser. This can be very helpful if you want to look at the changes and make comments, and also switch quickly and easily to the full page to see context, for example. In short -- let's give gitHub a chance! -- This kind of PR review is one of the features we WANTED from gitHub. |
@dblodgett-usgs wrote:
That's exactly the kind of hand-editing that I think is a big waste of time -- let the computer tell you what's changed! This is a lot of work for the proposer
I guess I'm confused about where this proposal is at -- if it's still about the CF spec issues, then yes, an Issue may be a better way to discuss -- but then there shouldn't be any asciidoc required at all -- or ADDING markup for changes, etc. When Jonathan wrote:
That sounds to me like we ARE talking about details of text and format, and a PR is the best way to manage that stage of the process. |
This is all really good analysis. Thanks! I may have swung too far (I probably did) with hand-markup of text changes. Two things: I totally forgot about line-by-line comments in the PR diff. I think that may be the solution @JonathanGregory is looking for. This proposal is at the point of wordsmithing and minor details. It does not HAVE to be at the point of a formatted contribution to the repository that gets pushed out to cfconventions.org. I'm not sure that distinction is worth getting hung up on, but maybe it is? So, I'll reopen this and we can give commenting on the diff a chance. |
Hi David and Chris,
I'm a bit lost - how can we make/view diffs of proposed text, such as that
given at #112
Thanks!
David
--
David Hassell
National Centre for Atmospheric Science
Department of Meteorology, University of Reading,
Earley Gate, PO Box 243, Reading RG6 6BB
Tel: +44 118 378 5613
http://www.met.reading.ac.uk/
|
Line by line comments are possible in the diff view of a pull request. Click on the line number on this page: https://github.com/cf-convention/cf-conventions/pull/109/files?diff=unified . |
On Wed, May 3, 2017 at 10:11 AM, David Hassell <notifications@github.com>
wrote:
> Hi David and Chris,
> I'm a bit lost - how can we make/view diffs of proposed text, such as that
> given at #112
That is an "Issue", rather than a PR (Pull Request) -- and there is no way
to make/view/comment on diffs in an issue -- an Issue is really just a
conversation.
Which is why, for this kind of thing, a PR is a good option -- in that
case, someone is proposing actual changes to the doc, and you can see the
changes, comment on them line by line, etc:
if you go here:
#109
then click on the "Files Changed" tab -- you will see the changes, and can
comment on them.
There have got to be a lot of tutorials on the web about the details...
…-CHB
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
|
Thanks for this discussion. In trac, I would have to do quite a lot of work as well to propose changes and make comments in a clear way, and I'd of course prefer it to be automated as far as possible. What I want to do in this case is about word-smithing, and just a few points about the detail of the proposal. Writing it all as comments is inefficient e.g. "I suggest changing A to B". It is much easier for me to change A to B in the text, and if it's not obvious why I would have suggested that, I would add a comment as well. So I want to make changes and comments, like one would in a Word document with tracked changes. Of course I am not suggesting that dreadful and hideous alternative, but is there a way to do this in GitHub? So far it sounds like that can't be done without forking a branch, but I fear that would be too much hard work for everyone who might want to contribute to debates about CF text as they do in trac tickets. |
[[spatial-geometries, Section 7.5, "Spatial Geometries"]]
=== Spatial Geometries

For many geospatial applications, data values are associated with a spatial geometry (e.g., the average monthly rainfall in the UK). Although cells with an arbitrary number of multiple vertices can be described using <<cell-boundaries>>, spatial geometries contain an arbitrary number of nodes for each geometry and include line and __multipart__ geometries (e.g., the different islands of the UK). The approach described here specifies how to encode such geometries following the pattern in **9.3.3 Contiguous ragged array representation** and attach them to variables in a way that is consistent with the cell bounds approach.
For @JonathanGregory If you have a comment on a small substring of a paragraph it could be done like:
Although cells with an arbitrary number of multiple vertices can be described using
to
Although cells with an arbitrary number of vertices can be described using
Or, just copy the whole paragraph and rewrite as needed.
Or, this could just be a general comment indicating how you think this should be rewritten.
How can I (someone who doesn't have write access to the repo) make comments in this manner? I tried via https://github.com/cf-convention/cf-conventions/pull/109/files?diff=unified but couldn't work it out.
Thanks, David
When you roll over the line numbers, you will see a `+` appear. Click the `+` and it opens a comment box. That should be available to anyone if the repository is public.
I think the easiest might be to use this view select the line you want to modify, and go to town. I've added a comment over there. Here's a link to the comment. |
y = 10, 30, 40, 60, 50 ;
----
The time series variable, someData, is associated with line geometries via the geometry attribute. The first line geometry is comprised of three nodes, while the second has two nodes. Client applications unaware of CF geometries can fall back to the lat and lon variables to locate feature instances in space. In this example, lat and lon coordinates are identical to the first node in each line geometry, though any representative point could be used.
====
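The node layout described in this caption can be sketched in Python. This is only an illustration, not part of the proposed text: the `split_by_node_count` helper and the `node_count` list are hypothetical names, and the counts (3 and 2) are taken from the caption. Only the `y` values come from the CDL excerpt above.

```python
from itertools import accumulate


def split_by_node_count(values, node_count):
    """Split a flat node array into one list per geometry.

    `node_count[i]` is assumed to hold the number of nodes that belong
    to geometry i, in the order the nodes appear in `values`.
    """
    if sum(node_count) != len(values):
        raise ValueError("node counts must add up to the number of nodes")
    ends = list(accumulate(node_count))   # exclusive end index of each geometry
    starts = [0] + ends[:-1]              # start index of each geometry
    return [values[s:e] for s, e in zip(starts, ends)]


y = [10, 30, 40, 60, 50]   # node y coordinates from the example
node_count = [3, 2]        # assumed: first line has three nodes, second has two
print(split_by_node_count(y, node_count))  # [[10, 30, 40], [60, 50]]
```

The same split would apply to the x coordinates, since both node arrays share the node dimension.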
Test comment - please ignore
@dblodgett-usgs That's great - thanks
ch07.adoc
Outdated
All geometries are made up of one or more nodes. The geometry type specifies the set of topological assumptions to be applied to relate the nodes. For example, __multipoint__ and __line__ geometries are nearly the same except nodes are interpreted as being connected for lines. Lines and polygons are also nearly the same except the first and last nodes must be identical for polygons. Polygons that have holes, such as waterbodies in a land unit, are encoded as a collection of polygon ring parts, each identified as __exterior__ or __interior__ polygons. Multipart geometries, such as multiple lines representing the same river or multiple islands representing the same jurisdiction, are encoded as collections of un-connected points, lines, or polygons that are logically grouped into a single geometry.

While this geometry encoding is applicable to any variable that shares a dimension with a set of geometry, the application it was originally designed for requires that geometry be joined to the instance dimension of a Discrete Sampling Geometry `timeSeries` featureType. In this case, any data variable can be given a `geometry` attribute that is to be interpreted as the representative geometry for the quantity held in the variable. An example of this is areal average precipitation over a watershed. An example of line geometry with time series data is given in <<appendix-cell-methods>>.
Could the `instance` dimension be renamed the `features` dimension? I was very confused about what this dimension was until I read the caption below the example, where I learned this was the number of simpleFeatures in the dataset.
The `instance` term is taken from the DSG spec. I would be in favor of renaming it the `feature` dimension, but that would require changes to the DSG spec in addition to this proposal.
Ah @dblodgett-usgs , thanks. I see that for DSG trajectories (CF-1.5 Appendix H.5) it says:
The instance dimension in the case of trajectories specifies the number of trajectories in the collection and is also referred to as the trajectory dimension.
So maybe we could say:
"The instance dimension in the case of features specifies the number of features in the collection and is also referred to as the feature dimension."
I suppose. Yeah. I'll work through the text and make a change to that effect pending feedback from @JonathanGregory
Updated spec incoming to this PR. I've added the feature dimension text and changed "instance" to "feature" in the cdl.
I had assumed that edits to CF are frequent enough and complicated enough, that it would make sense to make a branch for each proposed change, on which all of the discussions and changes will occur, which then get merged to the master version once the final version is agreed to. (note: I am familiar with github, but am not an expert) |
I think "complicated enough" is key -- if it's a typo fix, then don't bother, but larger edits should get a branch. However, you can only really do a branch if you have commit rights to the main repo. So anyone that doesn't can do a PR instead, which is pretty much the same thing. (And a PR can be merged into a branch as well -- it doesn't have to go straight to master -- if there is a big change that needs further review by the core contributors.) |
Hi @ChrisBarker-NOAA , I normally think of a PR as the mechanism to merge one branch with another (usually on to the master branch). How does a PR work without a branch? Is this discussion actually happening on a branch, because it is a little different than I am used to? |
A PR can be for a merge of one branch to another within the same project -- but that can only be done by people that have write permissions to the central repo. A PR from a fork, like this one, is essentially the same, but the different "branch" is in someone else's repo (in this case dblodgett-usgs's account). As it is a different repo, it doesn't matter what it's called: "master" in dblodgett-usgs's repo is separate from "master" in this repo. Clear as mud? |
@ChrisBarker-NOAA , so the real distinction is between a branch and a fork? |
> @ChrisBarker-NOAA <https://github.com/ChrisBarker-NOAA> , so the real
> distinction is between a branch and a fork?
yup.
|
Thanks. BTW, I am submitting this reply by responding via email, to show what it looks like.
-----------------------------------------------------------------------------------
Philip Cameron-Smith, pjc@llnl.gov, CliMA Group Leader, Lawrence Livermore Nat. Lab.
-----------------------------------------------------------------------------------
From: Chris Barker [mailto:notifications@github.com]
Sent: Friday, May 05, 2017 2:30 PM
To: cf-convention/cf-conventions <cf-conventions@noreply.github.com>
Cc: Cameron-smith, Philip <cameronsmith1@llnl.gov>; Comment <comment@noreply.github.com>
Subject: Re: [cf-convention/cf-conventions] Simple Geometry Contribution (#109)
@ChrisBarker-NOAA <https://github.com/ChrisBarker-NOAA> , so the real
distinction is between a branch and a fork?
yup.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub<#109 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AIkyUEKhm2hNX0wCNFE8IFQ9hBt7laZcks5r25TkgaJpZM4NFFoo>.
|
The group needs to decide if a branched workflow is going to be helpful. I prefer the fork / pull request onto master workflow because it grants full control of the fork to the person making changes, but I’d be open to other alternatives as discussed above.
Note that some email client reply headers work better with GitHub than others. My Mac mail client is usually pretty tidy, but it looks like whatever is used at LLNL isn’t parsed out nicely by GitHub.
|
> The group needs to decide if a branched workflow is going to be helpful. I prefer the fork / pull request onto master workflow because it grants full control of the fork to the person making changes,
Very good point -- I think which is best is situation dependent -- a
fork-based workflow provides better control to the proposer. A
branch-based workflow provides easier collaboration on a change.
Git is very flexible though, so if there is a small subgroup that
wants to collaborate on a change, then multiple people can be granted
access to the forked version.
I tend to prefer the branch-based workflow, but it does require
giving permission to more people to push to the main repo, which may
be too much admin overhead.
…-CHB
|
When I was looking at the current Pull Request in Github, there were various options that it wouldn't let me access because I didn't have permission (presumably because it is a fork). What would it take to give people permission to edit the current Fork/PR, so people can see the editing features that will be available?
(FYI, I am responding by email again, but this time I am stripping off everything else in my email.)
|
The fork belongs to @dblodgett-usgs -- so he would have to give you permission to edit his repo. Or you can fork his fork -- make changes and do a PR against his fork -- this gets complicated fast! -CHB |
As I understand it, then, there are 4 options for handling changes:
Personally, I think (1) will be limiting, and (4) would get very messy and confusing for a large and controversial change. |
agreed, though:
I agree - (4) is pretty unwieldy. For small changes, (1) is fine. For larger changes, (3) should work. |
I'll make the observation that, for most large submissions, only one person is going to be the editor of the text. So, like in this case, putting in a PR from a fork (owned by the submitting editor) works well. In this case, I forked the repository, added my draft contribution, had two collaborators review it (which you can see in these two closed PRs), then submitted the PR here. My intention is to take all the comments I receive and reconcile them into the open PR with a new commit. When I do that, you will see that comments on old (now stale) content will be hidden and the review process can continue fresh.

So the way I've done this, which has been completely natural and easy, has been a mix of 1 (for community review) and 4 (for peer review prior to the community). Note that for this proposal we had significant community review on the email list and in another github repository / google docs as well prior to getting to this point. We are NOT starting from scratch with a github pull request.

On a somewhat unrelated note, I don't think pushing to the shared group repository should ever be allowed. Curious how those who have been maintaining this repo feel about it. IMHO, all changes should come in through pull requests. This keeps the requirement for review on all community members. Pulling commits from/to each other's forks is not all that hard once you have multiple remote repositories in your local copy.

Finally, I'm looking forward to @JonathanGregory's line by line critique of this work. After all this conversation, it seems that the pull request review feature is really what you are looking for. When you go to make a comment on a line there is an option to "start a review". It would be great to have one comprehensive review from you. Maybe we would have a comment that at least two community members have completed a peer review of the material prior to merging? |
Note that there is now an "outdated" comment thread from @rsignell-usgs above. The commit just above this comment addresses his comment about the use of "instance" dimension for "features". Open that commit to see what was changed. That commit is also now part of the total PR diff visible at the "files changed" link at the top of this page. |
👍 |
OK, thanks for all this discussion. I'll try to find time to do this in the next week or so. I'm a bit anxious that this might seem more complicated and consequently off-putting than trac would be for the average CF user who wants to propose a change to the convention text. But I need to try it to see how it works. |
Dear Dave At last I have managed to make time to work on your proposal. I'm sorry it took me so long. It turned out that I had a lot of suggestions to make, and I have edited it extensively. However, these changes are nearly all about presentation, clarity, logical order (to my mind - you may disagree) and removing repetition. It didn't seem possible to do this with GitHub comments so instead I have edited your AsciiDoc text. AsciiDoc is not the same as MarkDown, so GitHub can't render it correctly, but it's not very different. In my next posting to the issue, I paste in my edited AsciiDoc, with the sole extra change of using four tildes to delimit verbatim code, as required by MarkDown. The AsciiDoc markup for headings doesn't work. Having gone through this exercise, I wonder whether in future it would be best to develop conventions changes in GitHub in MarkDown format, and convert it to AsciiDoc when it's agreed. There is at least one program ( In my third posting to the issue, I've given my revised text without the deletions and comments. I derived it this with Working on this did raise some substantive issues about your proposal, which I've commented on in the text. They are (a) I don't like the duplication of first and last points in a polygon. (b) Is it really necessary to use opposite orders for exterior and interior polygons? (c) It seems better to me to regard a polygon with holes as a multipolygon. (d) If there is only a single geometry, the node count variable could be omitted. (e) I have two concerns with the part about CRSs. Best wishes Jonathan |
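Point (a) concerns whether the first node of a polygon is repeated as its last node. A minimal Python sketch of the two conventions follows, assuming a ring is a plain list of (x, y) tuples. The helper names `close_ring` and `open_ring` are hypothetical and appear in neither the proposal nor any library.

```python
def close_ring(ring):
    """Return the ring with its first node repeated at the end."""
    if len(ring) > 0 and ring[0] != ring[-1]:
        return ring + [ring[0]]
    return list(ring)


def open_ring(ring):
    """Return the ring without a duplicated closing node."""
    if len(ring) > 1 and ring[0] == ring[-1]:
        return ring[:-1]
    return list(ring)


triangle = [(0, 0), (1, 0), (0, 1)]    # three distinct nodes
closed = close_ring(triangle)          # explicit closure, as in the original draft
print(closed)                          # [(0, 0), (1, 0), (0, 1), (0, 0)]
print(open_ring(closed) == triangle)   # True
```

Either form can be derived from the other, so the choice mainly affects storage size and how readily existing tools can read the nodes.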
[[spatial-geometries, Section 7.5, "Spatial Geometries"]]
For many geospatial applications, data values (e.g., the average monthly rainfall in the UK) are associated with a [spatial] geometry, which is a spatial representation of a real-world feature, for instance a time-series of areal average precipitation over a watershed. Polygonal cells with an arbitrary number of multiple vertices are geometries, and can be described using <>, but in that case every cell must have the same number of vertices. In contrast, each [spatial] geometry associated with a given data variable may have a different number of nodes, the geometries may be lines (as alternatives to points and polygons), and they may be
All geometries are made up of one or more nodes. As in other geospatial data formats, geometries are encoded as ordered sets of nodes. The connection between nodes is assumed to be linear in the coordinate reference system the nodes are defined in. (Parametric geometries or otherwise curved features may be supported in the future.)
The geometry type specifies the set of topological assumptions to be applied to relate the nodes. Multipoint and line geometries are nearly the same except nodes are interpreted as being connected for lines. Lines and polygons are also nearly the same except
==== Geometry Variables and Attributes
The The
The
For all geometry types except point, in which each geometry contains a single node, the geometry container variable must have a
For multiline and multipolygon geometries (
For
==== Encoding Geometries
[[complete-multipolygon-example]]
|
[[spatial-geometries, Section 7.5, "Spatial Geometries"]] For many geospatial applications, data values (e.g., the average monthly rainfall in the UK) are associated with a [spatial] geometry, which is a spatial representation of a real-world feature, for instance a time-series of areal average precipitation over a watershed. Polygonal cells with an arbitrary number of multiple vertices are geometries, and can be described using <>, but in that case every cell must have the same number of vertices. In contrast, each [spatial] geometry associated with a given data variable may have a different number of nodes, the geometries may be lines (as alternatives to points and polygons), and they may be multipart i.e. including several disjoint parts (e.g., the different islands of the UK). Other geometry types exist and may be introduced in a later version of the CF convention. The approach described here specifies how to encode such geometries following the pattern in 9.3.3 Contiguous ragged array representation and attach them to variables in a way that is consistent with the cell bounds approach. All geometries are made up of one or more nodes. As in other geospatial data formats, geometries are encoded as ordered sets of nodes. The connection between nodes is assumed to be linear in the coordinate reference system the nodes are defined in. (Parametric geometries or otherwise curved features may be supported in the future.) The geometry type specifies the set of topological assumptions to be applied to relate the nodes. Multipoint and line geometries are nearly the same except nodes are interpreted as being connected for lines. Lines and polygons are also nearly the same except that a polygon is completed by connecting the last node back to the first. Polygons that have holes, such as waterbodies in a land unit, are encoded as a collection of polygon ring parts, each identified as exterior or interior polygons. 
Multipart geometries, such as multiple lines representing the same river or multiple islands representing the same jurisdiction, are encoded as collections of un-connected points, lines, or polygons that are logically grouped into a single geometry. Any data variable can be given a ==== Geometry Variables and Attributes A geometry container variable acts as a container for attributes that describe a set of geometries. The The The The geometry node coordinate variables must each have a For all geometry types except point, in which each geometry contains a single node, the geometry container variable must have a For multiline and multipolygon geometries ( including polygons with holes), the geometry container variable must have a For multipolygon geometries with holes, the geometry container variable must have an ==== Encoding Geometries [[complete-multipolygon-example]]
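As a rough illustration of the encoding described above, the following Python sketch packs several geometries into flat node coordinate arrays plus a per-geometry node count, in the style of a contiguous ragged array. The names `node_x`, `node_y`, `pack` and `unpack` are hypothetical and chosen only for this example; `node_count` follows the term used in the discussion, but nothing here is the proposal's actual netCDF layout.

```python
def pack(geometries):
    """Flatten a list of geometries (each a list of (x, y) nodes)
    into two coordinate lists and a list of per-geometry node counts."""
    node_x, node_y, node_count = [], [], []
    for nodes in geometries:
        node_count.append(len(nodes))
        for x, y in nodes:
            node_x.append(x)
            node_y.append(y)
    return node_x, node_y, node_count


def unpack(node_x, node_y, node_count):
    """Rebuild the list of geometries from the flat arrays."""
    geometries = []
    start = 0
    for n in node_count:
        xs = node_x[start:start + n]
        ys = node_y[start:start + n]
        geometries.append(list(zip(xs, ys)))
        start += n
    return geometries


# Two geometries with different numbers of nodes (hypothetical data).
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
square = [(2.0, 0.0), (3.0, 0.0), (3.0, 1.0), (2.0, 1.0)]
node_x, node_y, node_count = pack([triangle, square])
```

Because each geometry only contributes a count, geometries with different numbers of nodes can share the same flat coordinate arrays, which is the property that distinguishes this approach from fixed-size cell bounds.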
|
On a phone, so only a few comments for now.
I have edited your AsciiDoc text. AsciiDoc is not the same as MarkDown, so
GitHub can't render it correctly, but it's not *very* different.
I'm pretty sure GitHub can render AsciiDoc.
But in any case, if you want to know exactly how it's going to render, you
need to use the same renderer and format. So there is really no point in
putting markdown in here.
If you want to work on the content and structure, just do it in plain text
-- that's the whole point of these plain text-based markup formats.
In my next posting to the issue, I paste in my edited AsciiDoc,
If you want to make major changes to the entire doc, rather than commenting
on bits, you're really better off forking the repo and making the changes
and doing a pull request. That's exactly what the GitHub workflow is for.
…-CHB
|
Comments on the real content.
Working on this did raise some substantive issues about your proposal,
which I've commented on in the text. They are (a) I don't like the
duplication of first and last points in a polygon.
This is legacy from older geo formats:
"You’ll also notice that the *first coordinate is repeated at the end* of
each ring. There’s no particular reason why this is necessary besides
GeoJSON’s heritage in older formats."
In some, repetition of the point distinguishes between a polygon and a
polyline. Not required here, but compatibility with other formats is a good
thing.
(b) Is it really necessary to use opposite orders for exterior and interior
polygons? (c) It seems better to me to regard a polygon with holes as a
multipolygon.
A polygon with holes is a Different Thing than multiple polygons.
For instance, what is the area of a polygon with a hole? Vs two polygons
that happen to overlap? And using winding order makes processing easier.
And again, compatibility with other formats.
(d) If there is only a single geometry, the node count variable could be
omitted.
It could be, but it makes it easier to write processing code if it's always
there.
…-CHB
|
Dear Chris
Maybe it can. How do you tell the issue that you're giving it AsciiDoc rather than MarkDown?
Yes, we discussed it above, and the conclusion was that I should comment on the pull request. However, I really do not think that's very human-friendly for editing text like this (not friendly to this human, anyway). To make things easy to appreciate, you want to see the inserted text, deleted text, and comments all in front of you. That is what I have tried to achieve above, using a bit of MarkDown. (I forgot to point out that I used italic for inserted text and
Yes, if it's free, but this has a cost (redundancy and possible inconsistency) so I'm not convinced. This is a different file format so if you're translating from another one you will have to do some work anyway. Deleting or inserting a duplicated point as part of the translation wouldn't be complicated.
I agree those are different things, but they're clearly distinguished by the interior and exterior ring marking. A polygon with a hole is a multipolygon where one is exterior and one interior. Two overlapping polygons are both exterior.
Maybe, but by that argument the Best wishes Jonathan |
Dear Jonathan, Thanks for getting back with us on this. I think I've covered most of the comments with my personal disposition on them. This issue is getting a bit ungainly. I think that I will probably close it and move your comments to a google doc where we can actually do word by word diffing. (see my comment on this below) This level of review of the text is not supported by github, where we should probably just be vetting small changes and the actual asciidoc syntax. I'll wait to do this till tomorrow or the next day in case there's a real push to keep this review here... GitHub Usage Response
Comments only support markdown and the diff view of a pull request will never be "rendered". .adoc files do render in GitHub. The best solution would probably be to comment paragraph by paragraph in plain text in the diff view. I'll migrate your comments to that style as an example.
As I expected, for editing paragraphs of text, word-by-word diffing is desirable and it seems that the only way to get this is with some manual indication of what's been deleted and what's been inserted. (I suggested this above, but commenting line by line with this kind of thing as needed may be a good happy medium?) Proposal Response .
I have the same stance as @ChrisBarker-NOAA on this. It's just a common way of being defensive in the data encoding. It is an extra bit of certainty that the linear ring is complete and wasn't corrupted on the way in or out of a file. I'm not attached, but the idea of multiple layers of certainty for protection against data corruption is a positive that you may not be thinking about? As another point of reference, the OGC Simple Features Access spec follows this convention as well.
I feel pretty strongly that we should NOT say that a polygon with a hole is a multipolygon.
I have no problem allowing the node_count to be omitted.
I've seen the history of the convention described elsewhere (specifically in the DSG spec), and was following that. No problem removing the content there.
As long as it can be required for data associated with real-world coordinates. I am uncomfortable allowing anyone to not-state their datum/projection assumptions when recording lat/lon or other real-world coordinates.
While "crs" is the same thing, the grid_mapping variable name is problematic because we are not dealing with grids here. If it's objectionable, I'm happy to just use grid_mapping, but it could be confusing. Regarding putting a crs attribute on data variables, that has always confused me. Why not attach the spatial metadata to the spatial data directly, as we've suggested here - attaching a crs to the geometry container variable? It's indirect and just odd that the metadata goes on a data variable that references the spatial data. I have no problem following the existing convention, but it's a really odd artifact of CF that I've never really understood the logic for. Best,
|
Dear Dave, Chris and Jonathan,
A few comments on the proposal.
I haven't thought about GitHub usage yet, but I will try a few text edit
suggestions based on my comments below.
I don't think that we need to have the cf_role attribute on the node
variables. Their role is well defined by their being named in the
geometry_container:node_coordinates attribute, in just the same way that
auxiliary coordinates are identified from the coordinate attribute of a
data variable. The use of cf_role in DSGs is different, I think - in that
case there is nothing else to say how it relates to the data variable.
It would be very useful to include a data variable with a geometry
attribute in example 7.14 (A multipolygon, one part of which has holes).
One thing that is not clear to me is how you would specify multiple
geometries along a dimension. For example, if I have a data variable with
shape 12x3 that contains three equal length timeseries of countrywide
averages for the UK, Netherlands and France, how would I attach the three
geometries defining the three countries?
Duplication of first and last points in a polygon ... is legacy from older
geo formats. ... compatibility with other formats is a good thing.
I have the same stance as @ChrisBarker-NOAA
<https://github.com/chrisbarker-noaa> on this. It's just a common way of
being defensive in the data encoding. It is an extra bit of certainty that
the linear ring is complete and wasn't corrupted on the way in or out of a
file. I'm not attached, but the idea of multiple layers of certainty for
protection against data corruption is a positive that you may not be
thinking about? As another point of reference, the OGC Simple Features
Access <https://en.wikipedia.org/wiki/Shoelace_formula> spec follows this
convention as well.
I'm not keen on duplicating the start/end node. We don't duplicate
points for other cyclic entities (e.g. global longitude coordinates) and if
the repeated nodes were there, I'd feel duty bound to check them for
equality, which is never nice with floats.
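To make the trade-off concrete, here is a minimal Python sketch of converting between a ring that repeats its first node at the end and one that does not. The function names and the tolerances are assumptions made for this illustration; the tolerance-based comparison stands in for the float equality check mentioned above.

```python
import math


def nodes_equal(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two (x, y) nodes with a tolerance rather than exact equality."""
    return all(
        math.isclose(p, q, rel_tol=rel_tol, abs_tol=abs_tol)
        for p, q in zip(a, b)
    )


def open_ring(ring):
    """Return the ring without a repeated closing node, if one is present."""
    if len(ring) > 1 and nodes_equal(ring[0], ring[-1]):
        return list(ring[:-1])
    return list(ring)


def close_ring(ring):
    """Return the ring with the first node repeated at the end."""
    if len(ring) > 1 and nodes_equal(ring[0], ring[-1]):
        return list(ring)
    return list(ring) + [ring[0]]


closed = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
opened = open_ring(closed)
```

Either direction of the translation is a few lines, but the reader of a closed ring still has to choose a tolerance, which is the cost raised in this comment.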
A polygon with holes is a Different Thing than multiple polygons. For
instance, what is the area of a polygon with a hole? Vs two polygons that
happen to overlap?
I feel pretty strongly that we should NOT say that a polygon with a hole
is a multipolygon.
1. A polygon is a thing with some area. You calculate the area of the
outer ring and subtract the area of the inner rings. The shoelace
formula <https://en.wikipedia.org/wiki/Shoelace_formula> relies on
opposite ordering for this.
2. Being consistent with other geospatial formats is a big positive
for adoptability and familiarity of a format.
3. This is also consistent with the standing OGC Simple Feature Access
<https://en.wikipedia.org/wiki/Shoelace_formula> standard, which is
the canonical reference for this kind of data at this point IMHO.
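The area argument in point 1 can be sketched in Python as follows. This is an illustrative example only: the function name and the sample coordinates are made up, rings are assumed to be open (the last node joins back to the first without being repeated), and planar coordinates are assumed.

```python
def signed_area(ring):
    """Shoelace formula for an open ring of (x, y) nodes.

    The result is positive for anticlockwise rings and negative
    for clockwise rings.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        total += x0 * y1 - x1 * y0
    return total / 2.0


# Exterior ring ordered anticlockwise, interior ring (hole) ordered clockwise.
exterior = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
hole = [(1.0, 1.0), (1.0, 2.0), (2.0, 2.0), (2.0, 1.0)]

# With opposite winding orders, summing the signed areas subtracts the hole.
polygon_area = signed_area(exterior) + signed_area(hole)
```

With these sample rings the exterior contributes 16 and the hole contributes -1, so the sum is the area of the polygon with the hole removed, without the code needing a separate exterior/interior flag.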
I'm happy with this definition - we should have some text that more
precisely defines a polygon (happy to initialise).
First, why should a grid mapping be required in the case of geometries
when it's not otherwise?
As long as it can be required for data associated with real-world
coordinates. I am uncomfortable allowing anyone to not-state their
datum/projection assumptions when recording lat/lon or other real-world
coordinates.
As far as I can see, the proposed crs attribute is the same thing. But why
do we need this anyway, since there is always a data variable in the file?
While "crs" is the same thing, the grid_mapping variable name is
problematic because we are not dealing with grids here. If it's
objectionable, I'm happy to just use grid_mapping, but it could be
confusing. Regarding putting a crs attribute on data variables, that has
always confused me. Why not attach the spatial metadata to the spatial data
directly, as we've suggested here - attaching a crs to the geometry
container variable? It's indirect and just odd that the metadata goes on a
data variable that references the spatial data. I have no problem following
the existing convention, but it's a really odd artifact of CF that I've
never really understood the logic for.
I don't favour a special rule for one type of bounds over another, so I
would not like to see the crs as mandatory, nor attachable to the geometry
container. Viewing the geometries as bounds, I think that we do have grid
... which raises the issue that a geometry is bounds without an associated
coordinate value. This is new in CF. Is it desirable? I can see use for a
representative location that is useful (e.g. the average of the nodes' lats
and lons).
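A representative location of the kind suggested here could be computed as in the short Python sketch below. The function name and the sample values are hypothetical, and a plain average of longitudes is only reasonable when the geometry does not straddle the antimeridian.

```python
def representative_point(lons, lats):
    """Return the plain average of the node longitudes and latitudes."""
    return sum(lons) / len(lons), sum(lats) / len(lats)


# Four nodes of a simple box (hypothetical values, in degrees).
lons = [0.0, 2.0, 2.0, 0.0]
lats = [50.0, 50.0, 52.0, 52.0]
lon0, lat0 = representative_point(lons, lats)
```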
All the best,
David
…--
David Hassell
National Centre for Atmospheric Science
Department of Meteorology, University of Reading,
Earley Gate, PO Box 243, Reading RG6 6BB
Tel: +44 118 378 5613
http://www.met.reading.ac.uk/
|
Dear Dave and David Thanks for your comments. I agree with David that a multipolygon example would be useful. Regarding duplicating points, another argument is for consistency within, as David says. CF can already describe polygonal bounds for a cell. In that case, the points are ordered anticlockwise (like a polygon in the new convention) and the first and last point are joined, with no repeated point. I think it's more important for CF to be consistent with CF than with other conventions. CF generally avoids redundancy. I could accept that a polygon with holes is different from a multipolygon, but I'd appreciate some further clarification. A single polygon with a single hole is a two-part geometry, needing a Dave says "I am uncomfortable allowing anyone to not-state their datum/projection assumptions when recording lat/lon or other real-world coordinates." However, that is the situation in CF. Even for the real world, it is optional to state these assumptions, because they are not always relevant to the dataset in question. It wouldn't be consistent to take a different view in this particular part of the convention. Consistency is also the reason for using the attribute for Best wishes Jonathan |
Dear Jonathan and David, I can accept the consistency within CF argument, and don't think it's a big enough problem to allow polygons to not self-close. Will attempt to better define geometry such that a single-part geometry can have holes. The CRS issue is an old horse I've been riding for a long time. My issue is that:
is simply not true. There are no real world coordinates that do not require a statement of the reference system they use. My stance on the issue is that, if a Still curious about my suggestion re: modifying text in google docs or another editor designed for collaborative writing (not coding)... thoughts? Best,
|
I agree here -- I think we should think of this as a way to represent OGC-standard data in netCDF in a CF-compatible way, rather than "A way to specify geometries in CF". I have enough trouble getting GIS folks to deal with netCDF as it is....
Good idea, and good to provide a Multipolygon (and polygons with holes) example. And if we do keep the "OGC compatible" emphasis, then we can point folks to OGC standards for reference.
I agree -- It's probably too late to require that it be specified, but using WGS84 as an official default is a good way to capture (most) use-cases. There is no such thing as "no reference system", there is only sloppiness. :-) In addition, if people are using a coordinate system that has no connection to the earth, then that should be made explicit by defining a grid_mapping with grid_mapping_name "non_earth_coordinates" or something. And an "unknown" CRS may be needed for when people are pulling data from another source that was already sloppy...
Google docs is a great way to collaborate on a document -- much better than passing MSWord files around, etc. But one of the great things about AsciiDoc (and its like) is that they're plain text, just like code, and thus tools designed for code work well with it -- git, GitHub, diff, etc.... So the GitHub workflow is actually a pretty good one for this:
Also, it looks like AsciiDoc supports comments: http://asciidoctor.org/docs/asciidoc-syntax-quick-reference/#comments so you can add comments and/or comment out chunks of text, rather than deleting them. All that being said, if there are 2-3 people that want to collaborate on it in Google docs, then bring it back into GitHub, go for it. What will be lost is the process -- but that probably doesn't matter. |
The issue with comments on documents (as opposed to code) is that documents have lines of text that are much longer than the 60-80 characters typical of code. GitHub is optimized for syntax and short-line commenting. It works well if comments are conceptual or like, "can you reword this?", but not so good for detailed editing such as Jonathan's latest. For now, I'll close this PR, reconcile Jonathan's suggestions into a new version of the doc and open a new PR that references this one. This comment string is enough to rinse and repeat with a reference back here for old conversation. |
Hi Dave, Does closing this mean that discussion is currently not possible, and will opening a new pull request mean that the discussion will ultimately be split across different threads? Thanks, David |
@dblodgett-usgs wrote:
True, but: if you have a document that has entire paragraphs as single lines, it makes it easy to comment on a paragraph, so not so bad, really. And AsciiDoc and other markups don't assign significance to a single line break, so you can (and should) use moderate sized lines anyway. But this by no means needs to be resolved on this issue, and at the end of the day any method that works for the folks taking part is the one to go with. |
Closing the issue means it shows up as closed when you search for it. But you can still add to it, and what you add will be captured for posterity (like this one...) Opening a new PR will create a new discussion thread for that PR -- so yes, the discussion will be split, but I think that's probably a good thing -- folks can look here for the history, but the new PR will have discussion on what will hopefully be a close to final proposal, having included all this discussion. |
Good point! I hadn't thought about that. I'll format my next PR like that and we can try it out. |
Added sentence to introductory part of Section 4.3 beginning with "Optionally" Added "Recommendations" sentence after the paragraph starting with "Optionally" of Section 4.3
Dear CF,
This addition has been discussed heavily on the email list. It has not been submitted through trac as it seems that this ticket represents consensus to stop using trac in favor of GitHub. I think it may be useful to consider a contribution not vetted in trac to pilot a workflow in GitHub. If it is preferable to the community, I will circle back and use trac as in other contributions. Just trying to nudge the group and give a bit of food for thought.
Note that I have additional commit history that demonstrates peer review of the actual text on this branch. If the group would rather have a rich commit history, we can close this and open a PR from that branch.
For completeness of the history in case people come to this PR without having been part of the conversation on the email list, this work has been developed and vetted in a github repository here. The issue history in that repository contains nearly all the communication between the team that developed the proposal. The wiki in that repository also contains a more verbose description of the new section 7.5 being contributed here. That repository also contains a reference python implementation of the format. This repository contains an R implementation.
Best,
Dave
See https://cf-trac.llnl.gov/trac/ticket/160