geospark geomesa interoperation #253
The suggested workaround seems to work partially. However, when I also rename the UDFs (I actually want GeoSpark's speedup), the functions do not seem to be registered properly.
A reproducible example can be found at https://github.com/geoHeil/geomesa-geospark
I get the following problem: a clash of classes
when changing the scope from
@geoHeil This is probably because GeoMesa, like GeoSpark, has its own customized Kryo serializer for Geometry. GeoSpark contains a lot of code to put spatial indexes and geometries into an array. Since both projects use JTS geometries, the two serializers could conflict.
See the latest updates to https://github.com/geoHeil/geomesa-geospark
One problem remains:
GeoSpark & GeoMesa together: regular join
GeoSpark alone: optimized range join
@jiayuasu Do you believe this conflict is causing Spark to fall back to regular, i.e. no longer optimized, joins?
@geoHeil You can probably try registering the GeoSpark join strategy manually:
https://github.com/DataSystemsLab/GeoSpark/blob/master/sql/src/main/scala/org/datasyslab/geosparksql/utils/GeoSparkSQLRegistrator.scala
In other words, add the following line:
@jiayuasu Thanks a lot. That was correct: it was missing from my registrator, and now optimized range joins are used as well. Do you have an opinion regarding UDT registration and the clashing class names? Or a better idea than my own one above, using shading?
geospark
geomesa
GeoMesa will serialize using JTS
GeoSpark using
Is there any problem if JTS geometries (from GeoMesa) are serialized via the GeoSpark serializer? Are there any efficiency concerns?
According to James (from the GeoMesa Gitter chat):
Is there any interest from both projects in collaborating here?
This then leaves the problem of clashing UDTs. How could this be resolved (quickly)?
With the upgrade to GeoMesa 2.4.x, GeoTools was upgraded to version 21, which also internally switches to the LocationTech-based JTS. Unfortunately, having two versions of GeoTools on the classpath is causing trouble for me.
Instead of scattering the discussion across the Gitter channels of GeoSpark and GeoMesa, perhaps this issue (#253) is the better place to continue it. Tasks to be done:
Short term
Long term
@jiayuasu, @jnh5y What do you think about this?
Just as a clarification: shading refers to packaging the transitive dependencies into an uber-jar, whereas what you describe is shading + relocation, which hides the transitive dependencies from everything else on the classpath. That seems like a good medium-term solution. Note, however, that you may run into issues in your end project if you use the shade plugin there while also depending on an artifact that is itself shaded/relocated (i.e. two levels of shading will likely not work).
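For an sbt build, shading + relocation as described above could look roughly like the following `build.sbt` fragment. It is a sketch under assumptions: it requires the sbt-assembly plugin, and the subproject name `geosparkShaded`, its directory, the target package `geospark.shaded.geotools` and the artifact versions are placeholders. It relocates only GeoTools, the library reported to clash. Keeping GeoSpark in a dedicated subproject matters because `.inAll` then rewrites only GeoSpark and its own dependencies, not GeoMesa.

```scala
// build.sbt (sketch). Assumes sbt 1.x with sbt-assembly enabled in project/plugins.sbt.
// Subproject name, directory, target package and versions are placeholders.
lazy val geosparkShaded = (project in file("geospark-shaded"))
  .settings(
    libraryDependencies ++= Seq(
      "org.datasyslab" % "geospark" % "1.2.0",         // placeholder version
      "org.datasyslab" % "geospark-sql_2.3" % "1.2.0"  // placeholder version
    ),
    assembly / assemblyShadeRules := Seq(
      // Shading + relocation: GeoTools classes are copied into the uber-jar under a
      // new package, and GeoSpark's bytecode references are rewritten to match.
      // Inside this subproject, .inAll only covers GeoSpark and its own dependencies.
      ShadeRule.rename("org.geotools.**" -> "geospark.shaded.geotools.@1").inAll
    )
  )
```

The main project would then use the assembled jar instead of depending on GeoSpark directly. This only helps if no GeoTools types cross the boundary between GeoSpark and user code, and, as mentioned above, the end project should not shade this jar a second time.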
As a data scientist, I want to be able to mix and match spatial libraries for Spark. Currently, it is rather an either/or (XOR) choice, as the libraries do not integrate with each other and have overlapping classes and UDF names.
In particular, I would like to be able to integrate GeoSpark and GeoMesa easily.
One possibility could be to write my own UDF registrator: https://github.com/DataSystemsLab/GeoSpark/blob/master/sql/src/main/scala/org/datasyslab/geosparksql/UDF/UdfRegistrator.scala
However, this still does not handle the overlapping classes (JTS, GeoTools).
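The own-registrator idea can be sketched in plain Scala. This is a hypothetical stand-in, not Spark's or GeoSpark's API: `Registry`, `Fn`, `clashes` and `registerWithPrefix` are invented for illustration, and real Spark SQL functions are Catalyst expressions registered in the session's function registry. The sketch finds names that both libraries would claim and registers one library's functions under a prefix (here `gs_`), so the other library's bare names such as `ST_Contains` stay intact. As noted above, it does nothing about overlapping classes.

```scala
import scala.collection.mutable

// Illustrative only: a stand-in for a SQL function registry. Nothing here calls
// Spark, GeoSpark or GeoMesa; all names and types are hypothetical.
object PrefixedUdfs {
  // Hypothetical function type; real Spark SQL functions are Catalyst expressions.
  type Fn = Seq[Any] => Any

  // Names are stored lower-case to mimic case-insensitive function lookup.
  final class Registry {
    private val fns = mutable.LinkedHashMap.empty[String, Fn]
    def register(name: String, fn: Fn): Unit = fns(name.toLowerCase) = fn
    def lookup(name: String): Option[Fn] = fns.get(name.toLowerCase)
    def names: Seq[String] = fns.keys.toSeq
  }

  // Function names that two libraries would both try to register.
  def clashes(a: Iterable[String], b: Iterable[String]): Set[String] =
    a.map(_.toLowerCase).toSet.intersect(b.map(_.toLowerCase).toSet)

  // Register every function of `lib` as prefix + name, so it cannot overwrite
  // an entry that another library registered under the bare name.
  def registerWithPrefix(reg: Registry, prefix: String, lib: Map[String, Fn]): Unit =
    lib.foreach { case (name, fn) => reg.register(prefix + name, fn) }

  def main(args: Array[String]): Unit = {
    val fromGeoMesa: Fn = _ => "geomesa"
    val fromGeoSpark: Fn = _ => "geospark"
    val geomesaLib = Map("ST_Contains" -> fromGeoMesa)
    val geosparkLib = Map("ST_Contains" -> fromGeoSpark, "ST_Distance" -> fromGeoSpark)

    val reg = new Registry
    // Assume the other library already owns the bare names.
    geomesaLib.foreach { case (name, fn) => reg.register(name, fn) }
    println(clashes(geomesaLib.keys, geosparkLib.keys)) // Set(st_contains)

    registerWithPrefix(reg, "gs_", geosparkLib)
    println(reg.lookup("ST_Contains").map(_.apply(Nil)))    // Some(geomesa)
    println(reg.lookup("gs_ST_Contains").map(_.apply(Nil))) // Some(geospark)
  }
}
```

Prefixing keeps both implementations callable in the same session, at the cost of non-standard function names in SQL queries.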