Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for H3 #22

Merged
merged 33 commits into from
Jul 23, 2018
Merged

Support for H3 #22

merged 33 commits into from
Jul 23, 2018

Conversation

willcohen
Copy link
Contributor

@willcohen willcohen commented Jul 2, 2018

  • Adds h3 as dependency and adds the GeoCoord class to Point/Shapelike
  • Create an h3 namespace
  • Add basic conversion functions (to-string, to-long, pt->h3, h3->pt)
  • Add basic operations (get-resolution, hex-ring, hex-range, jts-boundary, geo-coords, polyfill, compact, uncompact)
  • Add additional functionality (edge functions, multi-polygon, pentagon? check, is-valid? check, neighbors? check)
  • Add polygons helper function to jts as inverse of geo.jts/multi-polygon, to create polygons that can be passed to polyfill.
  • Add geometries helper function to jts as inverse of geo.jts/geometry-collection, to create geometries from a collection.
  • Add tests
  • Add type hints

@willcohen willcohen changed the title [WIP] Support for H3 Support for H3 Jul 3, 2018
@willcohen
Copy link
Contributor Author

@worace This PR is ready for review -- I think it now covers a sufficiently full subset of functions available in the H3 library. There's a few remaining helpers that we could later consider but the differences between them what's currently included here are, I think, pretty minor.

@worace
Copy link
Contributor

worace commented Jul 4, 2018

Looks promising @willcohen. I probably won't be able to give this a super thorough look until after the holiday but a few things stand out that I'd like to understand better:

  • Needing another protocol for Polygonal in Spatial is kind of a bummer; I guess H3 does not operate on all JTS geometries? It's only closed ones? Would like to think about this a bit more
  • It's a little unfortunate to need to support so many operations on Strings as well as Longs. Do you have any sense as a user of the library of whether one or the other of these is more common? If we need both I guess that's fine but if it were at all reasonable to consider one to be the sanctioned format and have a means of converting the other into it I'd at least want to consider that.

BTW did you uncover any javadocs for H3 Java while doing this? Or are you just figuring it out from looking at their source?

@willcohen
Copy link
Contributor Author

Thanks for giving this a first pass.

H3 doesn't operate on geometries at all – only indices of cells, GeoCoords, and nested arrays of GeoCoords. The only thing we need Polygonal for at the moment is the polyfill function, so the protocol could definitely be eliminated. That said, neither Point nor Shapelike fit the bill. H3's polyfill works on an input of n rings of coordinates, with optional internal rings acting as holes, similar to geo's existing linear-ring and polygon. I could instead make polyfill work on a combination of something like GeoHash, LinearRing, RectangleImpl, and so on, but it'd be a conditional function using instance? similar to what ultimately got replaced with the GeoJSON protocols in #12. Part of what suggested to me that it might deserve be a standalone protocol is the fact that a Polygonal interface exists in JTS already.

As best as I can tell, the string versus long question is an implemention detail, sort of. The overall spec only refers to a 64-bit value for the representation of the index. On the one hand, the provided unix CLI commands use strings. On the other hand, java's Long is actually probably closer to the intent of H3Index than an unbounded string, since it actually is 64-bit.

On a personal level, if I had to pick only one implementation between strings and longs, I'd probably go with strings, if only to be more like geohashes. That said, pretty much every single method is implemented for strings as well as longs (though sometimes the string ones are often just wrappers around the Long versions), and the test suite covers both pretty evenly. That makes it hard for me to see how one is favored over the other. The distinction between the two might just be that strings are better though slightly slower if humans are going to see the index values, and longs might be more performant when it all stays internal.

Lastly, no javadocs that I can see – only the docs in H3Core.java itself.

@willcohen
Copy link
Contributor Author

willcohen commented Jul 4, 2018

Re #18 and #24, the difference between String and Long does appear to be one of efficiency. The "with reflection" version of the (extremely rough) benchmarks below is as of 5c0db47, and the "with protocol" version is as of 7913eb1, with the (into []) also temporarily removed on each for edges, to remove that overhead. Removing reflection increases speed by about 5x overall. Operating on Long is consistently faster than String with and without reflection, with ~20% gain on simple operations and a fair bit more on more intensive ones.

This leads me to think it's probably the right move to support both String and Long – it's a little more complicated to implement, but means that users can transparently prioritize either readability or speed depending on their needs. We might think of the Long as more like the GeoHash object itself, and the String as the human-readable representation, with the difference that the H3 library is happy to perform relatively efficient operations on either natively. @worace

With h3->pt:

;; With reflection 
(time
  (dotimes [_ 100000]
    (h3->pt "871f24ac5ffffff")))
"Elapsed time: 551.70143 msecs"

(time
  (dotimes [_ 100000]
    (h3->pt 608533827635118079)))
"Elapsed time: 513.701249 msecs"

;; With protocol
(time
  (dotimes [_ 100000]
    (h3->pt "871f24ac5ffffff")))
"Elapsed time: 104.001963 msecs"

(time
  (dotimes [_ 100000]
    (h3->pt 608533827635118079)))
"Elapsed time: 85.89182 msecs"

With edges:

;; With reflection
(time
  (dotimes [_ 100000]
    (edges "871f24ac5ffffff")))
"Elapsed time: 418.895697 msecs"

(time
  (dotimes [_ 100000]
    (edges 608533827635118079)))
"Elapsed time: 381.749192 msecs"

;; With protocol
(time
  (dotimes [_ 100000]
    (edges "871f24ac5ffffff")))
"Elapsed time: 96.562609 msecs"

(time
  (dotimes [_ 100000]
    (edges 608533827635118079)))
"Elapsed time: 55.966168 msecs"

@willcohen
Copy link
Contributor Author

@worace have you given any more thought to how you’d like to proceed here? After taking some time to think about it, I still think doing both string and long makes sense — since one’s more human readable and one’s closer to the 64-bit index they both seem to have their uses.

Finally, if we do keep the new protocol, it’s probably true that Polygonal isn’t the right name. Per JTS, a linestring isn’t actually polygonal. Maybe it gets called Polygonlike, with the idea that the first function it even implements is what actually makes something Polygonal pop out?

@worace
Copy link
Contributor

worace commented Jul 11, 2018

Yeah i think keeping both string and long is fine. I think Polygonal is fine; I'd kind of prefer to move it into the h3 namespace since it's mostly a concern for that module. What do you think?

@willcohen
Copy link
Contributor Author

Sounds good -- rebased to master and moved Polygonal to h3.

@willcohen
Copy link
Contributor Author

willcohen commented Jul 16, 2018

After putting this through its paces on real projects a little more over the past week, I also am realizing that geo.io has a blind spot when it comes to multiple geometries. Mapping H3's to-jts to the output from H3's hex-range means that sequences of Geometries get generated. Since QGIS and other tools don't operate nicely on plain GeometryCollection geojsons, it's important to be able to turn a GeometryCollection into sets of feature maps, so that they can be written out using to-geojson-feature-collection. Accordingly, I extended the (to-features) in geo.io to also work with JTS's Geometry and GeometryCollection, since it's a pretty handy normalizing function we've already got.

The other big change since last week was to have h3's to-jts return polygons instead of linear rings, to better match the geohash library's functionality.

@worace

@worace
Copy link
Contributor

worace commented Jul 16, 2018

have h3's to-jts return polygons instead of linear rings, to better match the geohash library's functionality

Seems good, polygons are probably more intuitive.

Accordingly, I extended the (to-features) in geo.io to also work with JTS's Geometry and GeometryCollection

I guess this is ok but it does make me wonder how or if we should handle the actual GeometryCollection GeoJSON type, or the Multi{Point,Polygon,LineString} types.

Does QGIS handle MultiPolygon any better?

@worace
Copy link
Contributor

worace commented Jul 16, 2018

Otherwise I think this is looking pretty good. The main reason I have held off on merging is to figure out how I'm going to handle the next (2.0) release. I think there's already quite a lot of big changes for that release, so I'd like to hold this for a subsequent 2.1 release. More info in #22

@worace
Copy link
Contributor

worace commented Jul 16, 2018

On the Geom/Feature-Collection thing...I'm not really opposed to just defaulting to FeatureCollection even if GeometryCollection is technically more correct since I think it's what people tend to use more often. But I think it at least warrants considering whether there are any common/legit GeometryCollection use-cases that we aren't thinking of.

@willcohen
Copy link
Contributor Author

willcohen commented Jul 16, 2018

Okay. If you want to hold off on this until 2.1, do you think 2.1 would proceed relatively soon after 2.0, or would you want to let this bake for a more extended period of time first?

I also think it's totally fine to default to FeatureCollection for most situations -- the question I have is just around what to actually DO with the GeometryCollections. We don't really have ways of running to-geojson and so on on sequences of other Geometries (or Geometries that should become Features), so it currently seems to make some sense to have some GeometryCollection support if only to help combine those sequences into single objects that can be themselves manipulated.

Current state of affairs with this PR:

A plain MultiPolygon works well in QGIS, since it's just a geometry:

(spit "/Users/foo/Desktop/multipoly.geojson"
      (geo.io/to-geojson
        (geo.jts/multi-polygon [(geo.jts/polygon-wkt [[-1 -1 11 -1 11 11 -1 -1]])
                                (geo.jts/polygon-wkt [[0 0 10 0 10 10 0 0]])])))

A GeometryCollection of that same thing works poorly:

(spit "/Users/foo/Desktop/geocoll.geojson"
      (geo.io/to-geojson
        (geo.jts/geometry-collection 
          [(geo.jts/multi-polygon [(geo.jts/polygon-wkt [[-1 -1 11 -1 11 11 -1 -1]])
                                   (geo.jts/polygon-wkt [[0 0 10 0 10 10 0 0]])])])))

That GeometryCollection converted into a set of feature maps and FeatureCollection works well again.

(spit "/Users/foo/Desktop/featurecoll.geojson"
  (geo.io/to-geojson-feature-collection
    (geo.io/to-features
      (geo.jts/geometry-collection 
        [(geo.jts/multi-polygon [(geo.jts/polygon-wkt [[-1 -1 11 -1 11 11 -1 -1]])
                                 (geo.jts/polygon-wkt [[0 0 10 0 10 10 0 0]])])]))))

@worace
Copy link
Contributor

worace commented Jul 16, 2018

If you want to hold off on this until 2.1, do you think 2.1 would proceed relatively soon after 2.0

That would be my plan. I don't see any reason not to do a release shortly after.

src/geo/jts.clj Outdated
(-> c
(.getGeometryN n)
(set-srid srid)))]
(into [] (map #(geom-n c %) (range n)))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the goal here just to produce a vector? If so you can use mapv

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed with 10b8bd5 -- I always forget about mapv.

@willcohen
Copy link
Contributor Author

willcohen commented Jul 19, 2018

@worace How would you like modifications to this PR to go, now that you've merged the branch? My last commit happened after you merged to release/2.1.0. Should I change the PR target to release/2.1.0, fix the last commit to apply to 2.1.0-rc-0, and keep adding additional changes to oush to that branch?

I've started putting 2.1.0-rc-0 into use and trying it out and in polyfills with edge-case sized geometries and fine resolutions the JVM crashes when it calls to C. They're working on some fixes but we should definitely try to make sure that we aren't crashing the JVM here, I think.

Regardless of the eventual upstream solution, I think the workaround solution for here should be first checking whether the polyfill will produce too many hexagons and if so, sufficiently subdivide the JTS geometry as needed and run the polyfill on the smaller shapes. Adjacent polyfills should return non-overlapping hexagonal indices so the only thing the user will see is a slower polyfill, but it means that the h3 namespace now has need for something like hex-area and a few other JTS helpers to do this (like centroids and envelopes) so the subdivision can proceed efficiently.

@willcohen willcohen changed the base branch from master to release/2.1.0 July 20, 2018 15:33
@willcohen
Copy link
Contributor Author

@worace Polyfill is now much more cautious when it runs. There's two magic numbers defined at the top of h3. When safe-polyfill-hexagon-maximum is exceeded by polyfill's estimate (based on how h3's maxPolyfillSize works), the polygon is subdivided into pieces and polyfill is run individually on each of those shapes recursively. The polyfill tries to uncompact at the end, but only if safe-uncompact-hexagon-maximum isn't exceeded. This generally should only be an issue when people are polyfilling huge shapes at a very high resolution -- the max checks are still always run, but heavy recursion or compacted polyfill returns generally only come into play when the operations are getting large.

That said, my choices for these magic numbers is based only on what my laptop's stock-ish openjdk JVM seems okay with. If the current settings still lead to crashes, we can always lower them.

Many of these new functions in the h3 namespace are internal, because they duplicate current internal functionality of h3's C core. If h3 changes its polyfill functionality to be more efficient, we'll want to be able to do the same without breaking the API.

@worace
Copy link
Contributor

worace commented Jul 21, 2018

My last commit happened after you merged to release/2.1.0. Should I change the PR target to release/2.1.0, fix the last commit to apply to 2.1.0-rc-0

I think leave it pointing to master and we can merge from master to the 2.1.0 release branch as needed.

I got to try this out a bit over the last few days for a project and it seems to be working well. I did not run into any of the large-geometry/fine-resolution issues you are describing but glad that you caught them. Bummer that we have to deal with that but but oh well.

I'm not sure we've fully addressed the GeomCollection / GeoJSON issues you raised, but as I understand it, it's not any better or worse off after this PR, so I think we can address it as a separate issue when we figure out what to do.

@willcohen willcohen changed the base branch from release/2.1.0 to master July 23, 2018 20:29
@willcohen
Copy link
Contributor Author

Okay. Switched it back to master. Note that it's got release/2.1.0 merged in there one time because this branch was starting to diverge and i wanted to be synced up to 2.1.0-rc-0 as a baseline. If you'd like me to roll that merge back, let me know. I figure I'll push pause on this for right now until 2.0 gets released so there's not too many moving pieces, but as a discrete PR I think this is basically done.

I agree that it might make sense to deal with the geomcollection stuff in a separate request, since that should be a smaller change. As long as 2.1 waits until we resolve it, it's fine. This PR does have the geometries function, which does affect geometrycollections on some level, but I suspect that'll be useful no matter what the final decision is. @worace

@worace worace merged commit efdd488 into Factual:master Jul 23, 2018
@willcohen willcohen deleted the h3 branch July 24, 2018 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants