Coordinates modelling (add support for WKT to Flatten Tool) #10

Closed
duncandewhurst opened this issue Aug 9, 2022 · 8 comments · Fixed by #262


duncandewhurst commented Aug 9, 2022

We plan to reuse GeoJSON's Feature object to represent the physical location of a node (as a Point) and the route of a link between its endpoints (as a LineString) in both the JSON and GeoJSON formats.

Points

If we use Flatten Tool for conversion from the JSON format to the CSV format, Point geometries would be represented as a semi-colon separated list:

| location/geometry/type | location/geometry/coordinates |
| --- | --- |
| Point | 26.081;-24.405 |

This poses two potential problems for users:

  1. The ordering of longitude and latitude is not explicit, so it's easy to mix them up.
  2. When importing the data into a GIS tool, additional processing might be required to split the coordinates. This is the case in QGIS, for example.
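To make the second point concrete, here is a minimal Python sketch of the extra splitting step. It assumes the cell follows GeoJSON order (longitude first), which the CSV itself does not state. `split_point` is a hypothetical helper for illustration, not part of Flatten Tool.

```python
def split_point(cell):
    """Split a 'lon;lat' cell into two floats.

    Assumption: longitude comes first, as in GeoJSON. Nothing in the
    CSV header confirms this, which is the first problem listed above.
    """
    lon_text, lat_text = cell.split(";")
    return float(lon_text), float(lat_text)


lon, lat = split_point("26.081;-24.405")
print(lon, lat)  # 26.081 -24.405
```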

There are a couple of possible alternatives; either would require some special-casing in Flatten Tool:

Separate fields for longitude and latitude

| location/geometry/type | location/geometry/longitude | location/geometry/latitude |
| --- | --- | --- |
| Point | 26.081 | -24.405 |

This seems like the most user-friendly alternative. It is readily supported by QGIS, and presumably other GIS tools, and is equally usable for users who are not using GIS-specific tooling.

Well known text

| location/geometry |
| --- |
| POINT (26.081 -24.405) |

This option is readily supported by QGIS, and presumably other GIS tools, but is less usable for users who are not using GIS-specific tooling.
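As a rough illustration of the conversion this option implies, the following Python sketch turns a GeoJSON Point geometry into well-known text. `point_to_wkt` is a hypothetical helper, not an existing Flatten Tool function, and it assumes two-dimensional coordinates in GeoJSON order (longitude, latitude), which matches the x-then-y order of the WKT example above.

```python
def point_to_wkt(geometry):
    """Convert a GeoJSON Point geometry (a dict) to a WKT string."""
    if geometry.get("type") != "Point":
        raise ValueError("expected a GeoJSON Point geometry")
    # Keep only the first two values and ignore any optional elevation.
    lon, lat = geometry["coordinates"][:2]
    return f"POINT ({lon} {lat})"


point = {"type": "Point", "coordinates": [26.081, -24.405]}
print(point_to_wkt(point))  # POINT (26.081 -24.405)
```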

Linestrings

I ran into some problems trying to flatten a GeoJSON LineString in Flatten Tool, so I don't know what the default behaviour is. One possibility is a semi-colon separated list of semi-colon separated lists:

| location/geometry/type | location/geometry/coordinates |
| --- | --- |
| LineString | [26.081;-24.405]; [26.09; -24.416] |

Another possibility is a multi-table representation, related by id:

| id | location/geometry/type |
| --- | --- |
| 1 | LineString |
| 2 | LineString |

| id | location/geometry/coordinates |
| --- | --- |
| 1 | [26.081;-24.405] |
| 1 | [26.09; -24.416] |
| 2 | [25.05;-23.234] |
| 2 | [25.16; -23.332] |

Neither seems particularly desirable in terms of usability. Both would require substantial additional processing to import into GIS tools and the ordering of longitude and latitude is not explicit in either.
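To give a sense of the additional processing involved, here is a minimal Python sketch that rebuilds each route from the multi-table representation. The row layout and the helper names (`parse_pair`, `rebuild_linestrings`) are assumptions for illustration, as is the longitude-first ordering.

```python
from collections import defaultdict

# Rows of the coordinates table as (id, cell) pairs.
rows = [
    ("1", "[26.081;-24.405]"),
    ("1", "[26.09; -24.416]"),
    ("2", "[25.05;-23.234]"),
    ("2", "[25.16; -23.332]"),
]


def parse_pair(cell):
    """Parse a '[lon;lat]' cell. Assumes longitude comes first."""
    lon_text, lat_text = cell.strip().strip("[]").split(";")
    return [float(lon_text), float(lat_text)]


def rebuild_linestrings(rows):
    """Group positions by id, preserving the row order within each id."""
    routes = defaultdict(list)
    for row_id, cell in rows:
        routes[row_id].append(parse_pair(cell))
    return dict(routes)


routes = rebuild_linestrings(rows)
print(routes["1"])  # [[26.081, -24.405], [26.09, -24.416]]
```

Note that the result depends on the rows staying in their original order, which a CSV file does not guarantee once it has been sorted or filtered.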

In terms of alternatives, separating longitude and latitude into separate fields would only work for the multi-table representation, which would still have significant usability issues. However, well-known text is an option:

| location/geometry |
| --- |
| LINESTRING (26.081 -24.405, 26.09 -24.416) |
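A minimal Python sketch of the corresponding conversion, using the coordinates from the semi-colon separated example earlier in this section. `linestring_to_wkt` is a hypothetical helper, and it assumes two-dimensional positions in GeoJSON order (longitude, latitude).

```python
def linestring_to_wkt(geometry):
    """Convert a GeoJSON LineString geometry (a dict) to a WKT string."""
    if geometry.get("type") != "LineString":
        raise ValueError("expected a GeoJSON LineString geometry")
    # Each position becomes 'lon lat'; positions are comma-separated.
    pairs = ", ".join(f"{pos[0]} {pos[1]}" for pos in geometry["coordinates"])
    return f"LINESTRING ({pairs})"


route = {
    "type": "LineString",
    "coordinates": [[26.081, -24.405], [26.09, -24.416]],
}
print(linestring_to_wkt(route))  # LINESTRING (26.081 -24.405, 26.09 -24.416)
```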

Summary

Based on the analysis above, there are 3 options:

  1. If consistency in the representation of Point and LineString geometries in the CSV format is desirable, then both could be represented using well-known text.

  2. If consistency is not important, then Point geometries could be represented using separate longitude and latitude fields and LineString geometries could be represented using well-known text.

  3. The detailed routes of links could simply be omitted from the CSV representation, since they are adequately handled by the JSON and GeoJSON formats.

The purpose of this issue is to surface any other options that should be considered, to seek feedback on the preferred option and to explore the implications for tooling.

duncandewhurst added the CSV format, Tooling and Schema labels Aug 9, 2022

lgs85 commented Aug 9, 2022

Thanks for laying this out @duncandewhurst. Am coming into this with minimal background so feel free to ignore if out of scope. One thing I noticed is that flatterer seems to deal with geojson, including points and linestrings, quite well already. Here's what an example output csv looks like:

| type | geometry_type | geometry_coordinates |
| --- | --- | --- |
| Feature | Point | "[102,0.5]" |
| Feature | LineString | "[[102,0],[103,1],[104,0],[102,0]]" |

More generally, my view is that for the vast majority of applications that would use the geospatial data, it will be easier to import a geojson file directly, so we shouldn't worry too much about presenting lat/long in analysis-ready format in a csv export. That being said, I don't love the idea of leaving the geospatial data out of the csv export entirely, as this could cause confusion e.g. if the csv conversion is used for database imports. So option 1, or something like the flatterer output if that's an option, might be the best bet.

duncandewhurst commented

Thanks for the reminder about flatterer, @lgs85. I've opened a separate issue (#14) on deciding what tool to use so that we can keep this issue focused on the desired modelling.

> More generally, my view is that for the vast majority of applications that would use the geospatial data, it will be easier to import a geojson file directly, so we shouldn't worry too much about presenting lat/long in analysis-ready format in a csv export. That being said, I don't love the idea of leaving the geospatial data out of the csv export entirely, as this could cause confusion e.g. if the csv conversion is used for database imports.

I agree that GeoJSON would be easier for many use cases, but it would also be desirable to have the same information available in each publication format.

I don't think we should use Flatterer's representation of geometry_coordinates since it shares the same usability issues as Flatten Tool's representation and, arguably, it's worse since users would also need to handle the extra set of square brackets.
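For illustration, this is roughly what a user would need to do to turn a flatterer-style cell into something a GIS tool accepts. It is a sketch under assumptions: the cell must be valid JSON, positions are taken to be in GeoJSON order (longitude, latitude), and `cell_to_wkt` is a hypothetical helper rather than anything provided by flatterer.

```python
import json


def cell_to_wkt(geometry_type, coordinates_cell):
    """Convert a JSON-encoded coordinates cell to WKT."""
    coordinates = json.loads(coordinates_cell)  # handles the nested brackets
    if geometry_type == "Point":
        lon, lat = coordinates[:2]
        return f"POINT ({lon} {lat})"
    if geometry_type == "LineString":
        pairs = ", ".join(f"{pos[0]} {pos[1]}" for pos in coordinates)
        return f"LINESTRING ({pairs})"
    raise ValueError(f"unsupported geometry type: {geometry_type}")


print(cell_to_wkt("Point", "[102,0.5]"))  # POINT (102 0.5)
print(cell_to_wkt("LineString", "[[102,0],[103,1]]"))  # LINESTRING (102 0, 103 1)
```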

duncandewhurst added this to the Alpha milestone Aug 12, 2022
duncandewhurst commented

The W3C's Spatial Data on the Web Best Practice 8, "State how coordinate values are encoded", is a useful reference for this issue.


lgs85 commented Sep 9, 2022

Coming back to this, I see no benefit in representing points as separate long/lat fields and lineStrings as WKT, as i) very few users will want to use just node data, so ii) they will have to parse the WKT for lineStrings anyway, iii) representing both as WKT has the advantage of consistency, which iv) makes conversion and conversion tooling easier.

Therefore suggest we represent both points and lineStrings as WKT.

duncandewhurst commented

For the Alpha, we'll use the default format provided by Flatten Tool and look to update the tool to provide WKT format in the Beta.

duncandewhurst modified the milestones: Alpha, Beta Sep 15, 2022
duncandewhurst changed the title from "Coordinates modelling" to "Coordinates modelling (add support for WKT to Flatten Tool)" Oct 17, 2022
duncandewhurst removed their assignment Oct 17, 2022
duncandewhurst commented

Feedback from the World Bank's infrastructure map team is that WKT is expected for CSV files so I think we do want to use WKT for both point and linestring geometries.


duncandewhurst commented Apr 26, 2023

@Bjwebb, I've created a draft PR with updated CSV examples showing what I expect the WKT format to look like. Please could you check that you're happy with it from a Flatten Tool perspective?

Edit: Noting that I've replaced the whole Node.location and Span.route objects with WKT fields, rather than only replacing Node.location.coordinates and Span.route.coordinates, since the WKT format encodes both the geometry type and the coordinates.
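A minimal Python sketch of that replacement, to show why a single WKT field is enough. It assumes that `Node.location` and `Span.route` hold GeoJSON geometry objects with `type` and `coordinates` members; the helper names and the sample records are hypothetical and are not taken from the draft PR.

```python
def geometry_to_wkt(geometry):
    """Encode a GeoJSON Point or LineString as WKT (type and coordinates)."""
    kind = geometry["type"]
    if kind == "Point":
        lon, lat = geometry["coordinates"][:2]
        return f"POINT ({lon} {lat})"
    if kind == "LineString":
        pairs = ", ".join(f"{pos[0]} {pos[1]}" for pos in geometry["coordinates"])
        return f"LINESTRING ({pairs})"
    raise ValueError(f"unsupported geometry type: {kind}")


def replace_geometry_with_wkt(record, field):
    """Return a copy of record with the whole geometry object replaced by WKT."""
    result = dict(record)
    if result.get(field) is not None:
        result[field] = geometry_to_wkt(result[field])
    return result


# Hypothetical sample records for illustration only.
node = {"id": "1", "location": {"type": "Point", "coordinates": [26.081, -24.405]}}
span = {
    "id": "2",
    "route": {"type": "LineString", "coordinates": [[26.081, -24.405], [26.09, -24.416]]},
}

print(replace_geometry_with_wkt(node, "location"))
print(replace_geometry_with_wkt(span, "route"))
```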


Bjwebb commented Apr 28, 2023

This looks like what I'm expecting.
