Add CRS JSON export (refs #1545) #1547

Merged
merged 16 commits into from
Aug 20, 2019

Conversation

rouault
Member

@rouault rouault commented Jul 6, 2019

(output updated with latest developments)
$ src/projinfo EPSG:32631 -o JSON -q

{
  "type": "ProjectedCRS",
  "name": "WGS 84 / UTM zone 31N",
  "base_crs": {
    "name": "WGS 84",
    "datum": {
      "type": "GeodeticReferenceFrame",
      "name": "World Geodetic System 1984",
      "ellipsoid": {
        "name": "WGS 84",
        "semi_major_axis": 6378137,
        "inverse_flattening": 298.257223563
      }
    },
    "coordinate_system": {
      "subtype": "ellipsoidal",
      "axis": [
        {
          "name": "Geodetic latitude",
          "abbreviation": "Lat",
          "direction": "north",
          "unit": "degree"
        },
        {
          "name": "Geodetic longitude",
          "abbreviation": "Lon",
          "direction": "east",
          "unit": "degree"
        }
      ]
    },
    "id": {
      "authority": "EPSG",
      "code": 4326
    }
  },
  "conversion": {
    "name": "UTM zone 31N",
    "method": {
      "name": "Transverse Mercator",
      "id": {
        "authority": "EPSG",
        "code": 9807
      }
    },
    "parameters": [
      {
        "name": "Latitude of natural origin",
        "value": 0,
        "unit": "degree",
        "id": {
          "authority": "EPSG",
          "code": 8801
        }
      },
      {
        "name": "Longitude of natural origin",
        "value": 3,
        "unit": "degree",
        "id": {
          "authority": "EPSG",
          "code": 8802
        }
      },
      {
        "name": "Scale factor at natural origin",
        "value": 0.9996,
        "unit": "unity",
        "id": {
          "authority": "EPSG",
          "code": 8805
        }
      },
      {
        "name": "False easting",
        "value": 500000,
        "unit": "metre",
        "id": {
          "authority": "EPSG",
          "code": 8806
        }
      },
      {
        "name": "False northing",
        "value": 0,
        "unit": "metre",
        "id": {
          "authority": "EPSG",
          "code": 8807
        }
      }
    ]
  },
  "coordinate_system": {
    "subtype": "Cartesian",
    "axis": [
      {
        "name": "Easting",
        "abbreviation": "E",
        "direction": "east",
        "unit": "metre"
      },
      {
        "name": "Northing",
        "abbreviation": "N",
        "direction": "north",
        "unit": "metre"
      }
    ]
  },
  "area": "World - N hemisphere - 0°E to 6°E - by country",
  "bbox": {
    "south_latitude": 0,
    "west_longitude": 0,
    "north_latitude": 84,
    "east_longitude": 6
  },
  "id": {
    "authority": "EPSG",
    "code": 32631
  }
}
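
Because the export is plain JSON, any JSON parser can consume it. A minimal sketch in Python, assuming the output above has been captured into a string; the trimmed `projjson_text` excerpt and the `conversion_parameters` helper are illustrative and not part of PROJ:

```python
import json

# Trimmed excerpt of the projinfo output above (only two parameters kept).
projjson_text = """
{
  "type": "ProjectedCRS",
  "name": "WGS 84 / UTM zone 31N",
  "conversion": {
    "name": "UTM zone 31N",
    "method": {
      "name": "Transverse Mercator",
      "id": {"authority": "EPSG", "code": 9807}
    },
    "parameters": [
      {"name": "Longitude of natural origin", "value": 3, "unit": "degree",
       "id": {"authority": "EPSG", "code": 8802}},
      {"name": "Scale factor at natural origin", "value": 0.9996,
       "unit": "unity", "id": {"authority": "EPSG", "code": 8805}}
    ]
  },
  "id": {"authority": "EPSG", "code": 32631}
}
"""


def conversion_parameters(crs):
    """Map EPSG parameter code -> (value, unit) for a ProjectedCRS dict."""
    return {
        p["id"]["code"]: (p["value"], p["unit"])
        for p in crs["conversion"]["parameters"]
    }


crs = json.loads(projjson_text)
params = conversion_parameters(crs)
print(crs["name"], params[8802])
```

Keying the parameters by EPSG code rather than by name relies on the parameter IDs being kept in the export, as they are in the output above.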

@rouault rouault mentioned this pull request Jul 6, 2019
@rouault rouault force-pushed the json_export branch 6 times, most recently from 7123662 to d824832 Compare July 6, 2019 15:55
@rouault
Member Author

rouault commented Jul 6, 2019

I've more or less implemented the same elision rules as WKT2, that is, if an object has an ID, then omit the IDs of its children, except for the base CRS of the projected CRS, the method and parameter name IDs. Similarly, the "usage" (area & bbox) is only reported on the top object.
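
The rule described above can be sketched on plain dictionaries. This is only an illustration of the idea, not the PROJ implementation: the `elide` helper, the `KEEP_ID_UNDER` set and the input values (including the datum's "area") are assumptions made for the example.

```python
import copy

# Keys whose "id" survives even below an identified ancestor: the base CRS
# of a projected CRS, the method and the parameters.
KEEP_ID_UNDER = {"base_crs", "method", "parameters"}
# "Usage" keys, reported only on the top object.
USAGE_KEYS = {"area", "bbox"}


def elide(obj, key=None, ancestor_has_id=False, top=True):
    """Return a compacted copy of obj (hypothetical helper)."""
    if isinstance(obj, list):
        return [elide(item, key, ancestor_has_id, top=False) for item in obj]
    if not isinstance(obj, dict):
        return obj
    out = {}
    for k, v in obj.items():
        if k == "id":
            # Keep the id on the top object, on objects with no identified
            # ancestor, and on the listed exceptions.
            if top or not ancestor_has_id or key in KEEP_ID_UNDER:
                out[k] = copy.deepcopy(v)
        elif k in USAGE_KEYS and not top:
            continue
        else:
            out[k] = elide(v, k, ancestor_has_id or "id" in obj, top=False)
    return out


crs = {
    "type": "ProjectedCRS",
    "name": "WGS 84 / UTM zone 31N",
    "base_crs": {
        "name": "WGS 84",
        "datum": {
            "name": "World Geodetic System 1984",
            "area": "World",  # illustrative value
            "id": {"authority": "EPSG", "code": 6326},
        },
        "id": {"authority": "EPSG", "code": 4326},
    },
    "conversion": {
        "name": "UTM zone 31N",
        "method": {
            "name": "Transverse Mercator",
            "id": {"authority": "EPSG", "code": 9807},
        },
    },
    "area": "World - N hemisphere - 0°E to 6°E - by country",
    "id": {"authority": "EPSG", "code": 32631},
}

compact = elide(crs)
print(compact["base_crs"]["datum"])
```

In this sketch the datum loses both its id and its area, while the base CRS and the method keep their ids.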

A few potential ideas to make it more compact:

  • for interior objects, most of the time the "type" attribute is useless and directly inferred by the parent context. For example "ellipsoid": { "type": "Ellipsoid", ... }. The cases where I'd keep the type would be for units (to know if it is linear, angular, etc..).
  • For { "type": "GeographicCRS", ... } as the value of "base_crs", perhaps keep it. Or replace the generic "base_crs" by "base_geographic_crs" ? Similarly for "datum": { "type": "GeodeticReferenceFrame" } --> "geodetic_reference_frame": { ... } ?
  • for units, we could potentially recognize a few common ones under a short form: "unit": "metre", "unit": "degree"
  • for Ellipsoid, we could potentially decide to conventionally omit unit completely, if the semi major axis is metre. Similarly for the longitude of PrimeMeridian if it is degree ? (WKT2 didn't go that far though)
  • omit PrimeMeridian if it is Greenwich ?
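
If such short forms are adopted, a reader has to restore the omitted defaults. A minimal sketch of that step, assuming an omitted prime meridian means Greenwich and a bare semi-major axis is in metres; the `expand_datum_defaults` helper and the `prime_meridian`, `longitude` and `{"value", "unit"}` shapes are assumptions for illustration, not a confirmed format:

```python
# Assumed long form of the default prime meridian.
GREENWICH = {"name": "Greenwich", "longitude": 0}


def expand_datum_defaults(datum):
    """Return a copy of a compact datum dict with defaults filled in."""
    full = dict(datum)
    # An omitted prime meridian is taken to be Greenwich.
    full.setdefault("prime_meridian", dict(GREENWICH))
    ellipsoid = dict(full["ellipsoid"])
    axis = ellipsoid["semi_major_axis"]
    # A bare number is taken to be in metres; an explicit form is kept as-is.
    if not isinstance(axis, dict):
        ellipsoid["semi_major_axis"] = {"value": axis, "unit": "metre"}
    full["ellipsoid"] = ellipsoid
    return full


datum = {
    "type": "GeodeticReferenceFrame",
    "name": "World Geodetic System 1984",
    "ellipsoid": {
        "name": "WGS 84",
        "semi_major_axis": 6378137,
        "inverse_flattening": 298.257223563,
    },
}

full = expand_datum_defaults(datum)
print(full["prime_meridian"], full["ellipsoid"]["semi_major_axis"])
```

The input dictionary is left untouched, so the compact and expanded forms can be compared side by side.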

@rouault rouault force-pushed the json_export branch 5 times, most recently from 0e30264 to 0fecb4a Compare July 6, 2019 17:56
@kbevers
Member

kbevers commented Jul 8, 2019

I think this looks quite good already. Impressive work, Even!

for interior objects, most of the time the "type" attribute is useless and directly inferred by the parent context. For example "ellipsoid": { "type": "Ellipsoid", ... }. The cases where I'd keep the type would be for units (to know if it is linear, angular, etc..).

Sounds good to me. Especially if combined with the short form unit descriptor described below.

For { "type": "GeographicCRS", ... } as the value of "base_crs", perhaps keep it. Or replace the generic "base_crs" by "base_geographic_crs" ? Similarly for "datum": { "type": "GeodeticReferenceFrame" } --> "geodetic_reference_frame": { ... } ?

I am indifferent to this one. Do you want to do this because snake_case is more "JSONic" than CamelCase?

for units, we could potentially recognize a few common ones under a short form: "unit": "metre", "unit": "degree"

Sounds good.

for Ellipsoid, we could potentially decide to conventionally omit unit completely, if the semi major axis is metre. Similarly for the longitude of PrimeMeridian if it is degree ? (WKT2 didn't go that far though)

Sounds very reasonable to me.

omit PrimeMeridian if it is Greenwich ?

Yes.

In general, I think that a short form approach to the JSON output is preferred when the properties of a CRS or transformation are "standard". That is, SI units, Greenwich prime meridian and whatever else is intuitively understood by people familiar with the topic.

Is the JSON output from projinfo enabled by default or do you have to toggle it with -o JSON? I prefer the latter.

Before this is merged, I would like to have the docs for projinfo updated and preferably also a description or specification of the JSON format (especially if the ideas above are implemented).

@rouault
Member Author

rouault commented Jul 8, 2019

I am indifferent to this one. Do you want to do this because snake_case is more "JSONic" than CamelCase?

yes, for key values, I use snake_case. For "type" values, I've imitated GeoJSON, where you have the CamelCase convention ("type": "Feature", "type": "GeometryCollection", etc). But my comment here wasn't about the naming convention, but more about how to specify the type of an object. Currently with "datum": { "type": "GeodeticReferenceFrame" }, this is generic, and if we implement "type": "DynamicGeodeticReferenceFrame" this is easily possible. If we omit the "type", we need to move its indication one level up in some way.

Is the JSON output from projinfo enabled by default or do you have to toggle it with -o JSON

Not enabled by default. Have to toggle it with -o JSON

I would like to have the docs for projinfo updated

Sure, but we aren't there yet. And I guess the reading side should be there too. Which will bring the issue of having a JSON parser. I have in GDAL a hand-made streaming parser (made to be able to parse gigabyte-sized GeoJSON files), mixed with libjson-c, but a pure streaming parser is not convenient to use here, and a standard "DOM"-oriented parser, if I may say so, would be better. I guess I could possibly have this DOM layer on top of the streaming parser, or use an external library.

preferably also a description or specification of the JSON format

Like https://github.com/rouault/proj.4/blob/json_export/data/crsjson.schema.json possibly augmented with a few comments ?

@kbevers
Member

kbevers commented Jul 8, 2019

my comment here wasn't about the naming convention, but more about how to specify the type of an object. Currently with "datum": { "type": "GeodeticReferenceFrame" }, this is generic, and if we implement "type": "DynamicGeodeticReferenceFrame" this is easily possible. If we omit the "type", we need to move its indication one level up in some way.

Sorry, I misinterpreted that. I understand now. I think the current approach is fine - it is in line with the WKT2 standard and is more flexible in case new types are introduced.

Like rouault/proj.4:data/crsjson.schema.json@json_export possibly augmented with a few comments ?

Yes, I think that will be enough. Something stating that this is based on WKT2:2018 and how it deviates from the standard. I have a feeling that once we provide this type of output (and eventually input) people are going to rely on it quickly, so some form of governance of the format is required. Good documentation helps with that.

@hobu
Contributor

hobu commented Jul 9, 2019

I think the canonical name for this should be projson or something more descriptive than simply json.

@snowman2
Contributor

snowman2 commented Jul 9, 2019

Or confuse people and do WKTJSON_2019 😛

@mwtoews
Member

mwtoews commented Jul 9, 2019

I'd prefer keeping the output format as JSON, since this is a valid format for the output, and not some special format that is inspired by JSON.

Member

@mwtoews mwtoews left a comment

Amend docs/source/apps/projinfo.rst lines 57, 72, 93 and 226. (Also, I don't mind doing this, if you prefer)

@rouault rouault force-pushed the json_export branch 2 times, most recently from e11ca45 to c5dd688 Compare August 9, 2019 11:24
@rouault
Member Author

rouault commented Aug 9, 2019

Folks, I believe this work is feature-complete now and ready for review. I've opted for PROJJSON as suggested by @hobu. I've made changes to the export so that it is a bit less verbose. All CRS, Datum and CoordinateOperation objects can be exported and imported. C function proj_as_projjson() for export added (import is done through existing proj_create())

For parsing JSON, I use the single file header https://github.com/nlohmann/json/blob/develop/single_include/nlohmann/json.hpp . I've integrated it in the source tree. I can imagine that some distributions might not like that (it is sometimes available as a package). It should be easy to patch PROJ source to include the system version if present (actually it should probably be just a matter of removing the include/proj/internal/nlohmann directory). I didn't go as far as tweaking the build systemS to make that a configurable option.

@hobu
Contributor

hobu commented Aug 9, 2019

We are using nlohmann in PDAL as well to replace jsoncpp which has been terrible with interface changes and deployment issues. It has worked well for us, but our upcoming 2.0 release is the first public release with it.

@hobu
Contributor

hobu commented Aug 9, 2019

We are using nlohmann in PDAL

I forgot to add we buried it in a namespace so it wouldn't clash, and we are careful about not exposing it in public APIs.

@rouault
Member Author

rouault commented Aug 9, 2019

I forgot to add we buried it in a namespace so it wouldn't clash, and we are careful about not exposing it in public APIs.

The same here with this little #define trick : https://github.com/rouault/proj.4/blob/json_export/include/proj/internal/include_nlohmann_json.hpp#L5

@snowman2
Contributor

snowman2 commented Aug 12, 2019

First of all, this is fantastic!

One thing I randomly bumped into is the difference in the area of use in the datum representations of WKT versus PROJ JSON. Also, noticed this difference between the direct Datum PROJ JSON versus the datum information in the CRS PROJ JSON.

>>> from pyproj.crs import CRS, Datum
>>> crs_utm = CRS.from_epsg(26915)
>>> crs_utm.to_json_dict()["base_crs"]["datum"]
{'type': 'GeodeticReferenceFrame', 'name': 'North American Datum 1983', 'ellipsoid': {'name': 'GRS 1980', 'semi_major_axis': 6378137, 'inverse_flattening': 298.257222101}}
>>> dd = Datum.from_json_dict(crs_utm.to_json_dict()["base_crs"]["datum"])
>>> dd
DATUM["North American Datum 1983",
    ELLIPSOID["GRS 1980",6378137,298.257222101,
        LENGTHUNIT["metre",1,
            ID["EPSG",9001]]]]
>>> dd.to_json_dict()
{'type': 'GeodeticReferenceFrame', 'name': 'North American Datum 1983', 'ellipsoid': {'name': 'GRS 1980', 'semi_major_axis': 6378137, 'inverse_flattening': 298.257222101}}
>>> crs_utm.datum.to_json_dict()
{'type': 'GeodeticReferenceFrame', 'name': 'North American Datum 1983', 'ellipsoid': {'name': 'GRS 1980', 'semi_major_axis': 6378137, 'inverse_flattening': 298.257222101}, 'area': 'North America - NAD83', 'bbox': {'south_latitude': 14.92, 'west_longitude': 167.65, 'north_latitude': 86.46, 'east_longitude': -47.74}, 'id': {'authority': 'EPSG', 'code': 6269}}
>>> dd2 = Datum.from_json_dict(crs_utm.datum.to_json_dict())
>>> dd2
DATUM["North American Datum 1983",
    ELLIPSOID["GRS 1980",6378137,298.257222101,
        LENGTHUNIT["metre",1]],
    ID["EPSG",6269]]
>>> dd2.to_json_dict()
{'type': 'GeodeticReferenceFrame', 'name': 'North American Datum 1983', 'ellipsoid': {'name': 'GRS 1980', 'semi_major_axis': 6378137, 'inverse_flattening': 298.257222101}, 'area': 'North America - NAD83', 'bbox': {'south_latitude': 14.92, 'west_longitude': 167.65, 'north_latitude': 86.46, 'east_longitude': -47.74}, 'id': {'authority': 'EPSG', 'code': 6269}}

Not sure if this matters or not, but figured it was noteworthy.

@rouault
Member Author

rouault commented Aug 12, 2019

Not sure if this matters or not, but figured it was noteworthy.

This is expected. The export follows the same rules as WKT (normally you should get similar behaviour, but I didn't try to check to that level of detail...), which includes omitting "details" of embedded objects when you export a top-level object. When you export a CRS, its datum id and datum area of use are omitted for compactness. So when you instantiate a CRS back from that, that information is no longer there (it should be queried from the database if needed). Whereas when you instantiate a CRS from its code and export its datum right away, the datum object of this CRS has id, area of use etc, which can be exported if it is exported as a standalone object.
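
The difference can be made explicit by comparing the two dictionaries reported above. A small sketch using the values from the pyproj session; the `elided_keys` helper is hypothetical:

```python
# Datum as embedded in the exported CRS (from the session above).
embedded = {
    "type": "GeodeticReferenceFrame",
    "name": "North American Datum 1983",
    "ellipsoid": {
        "name": "GRS 1980",
        "semi_major_axis": 6378137,
        "inverse_flattening": 298.257222101,
    },
}

# Same datum exported as a standalone object (from the session above).
standalone = dict(
    embedded,
    area="North America - NAD83",
    bbox={
        "south_latitude": 14.92,
        "west_longitude": 167.65,
        "north_latitude": 86.46,
        "east_longitude": -47.74,
    },
    id={"authority": "EPSG", "code": 6269},
)


def elided_keys(standalone, embedded):
    """Top-level keys present in the standalone export but not the embedded one."""
    return sorted(set(standalone) - set(embedded))


print(elided_keys(standalone, embedded))  # ['area', 'bbox', 'id']
```

Only the usage and id keys differ; the name, type and ellipsoid are identical in both exports.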

Member

@kbevers kbevers left a comment

Sorry for the late review. All in all I think this is a great piece of work. Good job, Even.

I have a few remarks regarding the JSON schema. Or rather the filename and URL, as you can see in the inlined comments. In addition to those, I am wondering how updates to the schema are dealt with? This version of the schema is obviously tied to PROJ 6.2.0, but what happens in the case we need to fix a bug in the schema for version 6.3.0? As far as I can see, no versioning info is present in the schema.

@@ -0,0 +1,935 @@
{
"$id": "https://proj.org/crsjson.schema.json",
Member

Now that you have decided to go with the name PROJJSON, shouldn't it then be projjson.schema.json?

@@ -44,6 +44,7 @@ Synopsis
(*added in 6.2*)
- a OGC URN combining references for concatenated operations
(e.g. "urn:ogc:def:coordinateOperation,coordinateOperation:EPSG::3895,coordinateOperation:EPSG::1618")
- a PROJJSON string. The jsonschema is at https://github.com/OSGeo/proj/blob/master/data/crsjson.schema.json (*added in 6.2*)
Member

In the schema file this is said to be located at https://proj.org/crsjson.schema.json. I would prefer to use the same location here too. Having a direct link to master on github is a potential risk if the schema is updated before a new release is published.

{
"$id": "https://proj.org/crsjson.schema.json",
"$schema": "http://json-schema.org/draft-07/schema#",
"description": "Schema for CRS JSON",
Member

Should also be PROJJSON instead of CRS JSON.

@rouault
Member Author

rouault commented Aug 15, 2019

In addition to those, I am wondering how updates to the schema are dealt with? This version of the schema is obviously tied to PROJ 6.2.0, but what happens in the case we need to fix a bug in the schema for version 6.3.0? As far as I can see, no versioning info is present in the schema.

Hum I thought this was perfect work, peer reviewed by hundreds of people, so there is no bug, right ;-) ? More seriously, AFAICS there is no dedicated field in https://json-schema.org/specification.html to put a version number for a schema, so I think this should be captured in the $id property if we wanted to, like https://proj.org/schemas/v0.1/projjson.schema.json ? (not sure we want to advertise a 1.0 version number at this point)
Versioning the schema would be one thing, but I'm not sure how useful this would be given that PROJJSON content itself doesn't contain a link to it. Should we include a "$schema": "https://proj.org/schemas/v0.1/projjson.schema.json" link in the top element of a PROJJSON export ?
Note: GeoJSON doesn't point to a schema (but it had no json schema developed officially for it, so that's probably not a good reference).
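
If the "$schema" link is added as proposed, a consumer could recover the schema version from it. A minimal sketch, assuming the URL layout suggested above; the `schema_version` helper and the regular expression are illustrative only:

```python
import re

# URL layout as proposed in this comment; treated here as an assumption.
SCHEMA_URL_RE = re.compile(
    r"^https://proj\.org/schemas/v(\d+)\.(\d+)/projjson\.schema\.json$"
)


def schema_version(doc):
    """Return (major, minor) from a document's "$schema" link, or None if absent."""
    url = doc.get("$schema")
    if url is None:
        return None
    match = SCHEMA_URL_RE.match(url)
    if match is None:
        raise ValueError(f"unrecognised schema URL: {url}")
    return int(match.group(1)), int(match.group(2))


doc = {
    "$schema": "https://proj.org/schemas/v0.1/projjson.schema.json",
    "type": "GeodeticReferenceFrame",
    "name": "World Geodetic System 1984",
}
print(schema_version(doc))  # (0, 1)
```

Documents without the link return None, so exports that predate the link could still be read.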

@kbevers
Member

kbevers commented Aug 16, 2019

so I think this should be captured in the $id property if we wanted to, like proj.org/schemas/v0.1/projjson.schema.json ?

Yes, it was something like that I was thinking about.

Should we include a "$schema": "proj.org/schemas/v0.1/projjson.schema.json" link in the top element of a PROJJSON export ?
Note: GeoJSON doesn't point to a schema (but it had no json schema developed officially for it, so that's probably not a good reference).

I think that would be a smart thing to do. As soon as this is released it is going to be (ab)used by lots of people and changes in the schema will affect them in some way or other. If there's no standardised way of pointing towards the schema in use (similar to XML) this is as good as it gets, I guess.

Keeping the schema as version 0.1 is fine with me. We can use the 6.2.x releases up until 7.0 as a preview phase and change it to 1.0 with PROJ 7. Hopefully any potential kinks will be ironed out by then.

@kbevers
Member

kbevers commented Aug 20, 2019

@rouault As far as I can tell you have addressed my comments so if you are happy with the PR as it is now I think it is time to merge the code.

@rouault rouault merged commit 2c9c015 into OSGeo:master Aug 20, 2019
@rouault
Member Author

rouault commented Aug 20, 2019

Merged!

@kbevers
Member

kbevers commented Aug 20, 2019

Awesome. Impressive work on this!

@snowman2
Contributor

Thanks for adding this. Is this planned to be added to the 6.2.0 release?

@rouault rouault added this to the 6.2.0 milestone Aug 21, 2019
@rouault
Member Author

rouault commented Aug 21, 2019

Is this planned to be added to the 6.2.0 release?

Yes, this is in master, and master will be released as 6.2
