
Radius units and ordering by distance in JoinQuery.DistanceJoinQuery #100

Closed
deil87 opened this issue Jun 2, 2017 · 9 comments


@deil87

commented Jun 2, 2017

Hi!
Unfortunately, I couldn't find the units for the radius parameter in the CircleRDD constructor.
It seems that it's not in km.
Second question: wouldn't it be a good idea to return the joined points sorted by distance (not just as a HashSet)?

@jiayuasu

Member

commented Jun 2, 2017

@deil87

  1. The radius is always in the same units as the input data. GeoSpark assumes that all objects lie in a planar space, so you don't need to specify a unit for the radius.

If you want accurate results in metres, first use GeoSpark's CRS transformation function to transform your input data to a metre-based CRS such as epsg:3857.
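A minimal sketch of what "metre-based" means for the radius, assuming the spherical Web Mercator formulas behind EPSG:3857 (sphere radius 6378137 m). EPSG:3857 units equal ground metres only at the equator; at latitude φ, lengths are stretched by a factor of 1/cos φ, so a desired ground radius would need scaling before it is passed to CircleRDD. The object and function names (`WebMercatorSketch`, `toWebMercator`, `projectedRadius`) are hypothetical and not part of GeoSpark; in a real pipeline, GeoSpark's CRS transformation performs the projection itself.

```scala
object WebMercatorSketch {
  // Sphere radius used by EPSG:3857 (the WGS84 semi-major axis), in metres.
  val EarthRadius: Double = 6378137.0

  // Project a WGS84 lon/lat pair (degrees) to spherical Web Mercator x/y (EPSG:3857 units).
  def toWebMercator(lonDeg: Double, latDeg: Double): (Double, Double) = {
    val x = EarthRadius * math.toRadians(lonDeg)
    val y = EarthRadius * math.log(math.tan(math.Pi / 4 + math.toRadians(latDeg) / 2))
    (x, y)
  }

  // Web Mercator stretches lengths by 1 / cos(latitude), so convert a desired
  // ground radius in metres into the equivalent radius in EPSG:3857 units.
  def projectedRadius(groundMetres: Double, latDeg: Double): Double =
    groundMetres / math.cos(math.toRadians(latDeg))

  def main(args: Array[String]): Unit = {
    println(toWebMercator(13.4, 52.5))       // a lon/lat point projected to x/y
    println(projectedRadius(1000.0, 0.0))    // 1000.0 at the equator
    println(projectedRadius(1000.0, 60.0))   // roughly 2000 at 60 degrees latitude
  }
}
```

The correction is exact only at the chosen latitude; for data spanning a wide latitude range, one fixed radius in EPSG:3857 units can only approximate a fixed ground distance.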

  2. Sorting the joined points is not a bad idea. However, it adds unnecessary overhead for users who are not interested in sorted results, so I leave this option to users. If you want sorted results, you can simply add an extra map step that sorts the data.
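A minimal sketch of such an extra map step, assuming each join result pairs a query point with an unordered set of matched points, and that distance is plain Euclidean distance in the data's planar units. `Pt`, `sortByDistance` and the sample data are hypothetical stand-ins, not GeoSpark types; on the pair RDD returned by the join, the same per-record function could be applied with Spark's `mapValues` or an equivalent map over each pair.

```scala
object SortJoinResultSketch {
  // Hypothetical planar point standing in for the geometry type in the join result.
  final case class Pt(x: Double, y: Double)

  // Euclidean distance in the data's own (planar) units.
  def distance(a: Pt, b: Pt): Double = math.hypot(a.x - b.x, a.y - b.y)

  // Turn an unordered set of matches into a list ordered from nearest to farthest.
  def sortByDistance(query: Pt, matches: Set[Pt]): List[Pt] =
    matches.toList.sortBy(p => distance(query, p))

  def main(args: Array[String]): Unit = {
    // Hypothetical join output: each query point with its unordered matches.
    val joined: Map[Pt, Set[Pt]] = Map(
      Pt(0, 0) -> Set(Pt(3, 4), Pt(1, 0), Pt(0, 2))
    )
    // The extra map step: sort each value and keep the key unchanged.
    val sorted: Map[Pt, List[Pt]] = joined.map { case (q, ms) => q -> sortByDistance(q, ms) }
    sorted.foreach { case (q, ms) => println(s"$q -> $ms") }
  }
}
```

Because the sort runs only in this optional step, users who don't need ordered results pay nothing for it.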

@deil87

Author

commented Jun 2, 2017

  1. So I'm trying to do it like this:
    new PointRDD(ss.sparkContext, inputLocationCSV, pointRddOffset, pointRDDSplitter, true, StorageLevel.MEMORY_ONLY, "epsg:4326", "epsg:3857")

I got this error:
    Exception in thread "main" java.lang.Exception: [JoinQuery][DistanceJoinQuery]one input RDD doesn't perform necessary CRS transformation. Please check your RDD constructors.
        at org.datasyslab.geospark.spatialOperator.JoinQuery.executeDistanceJoinUsingIndex(JoinQuery.java:150)
        at org.datasyslab.geospark.spatialOperator.JoinQuery.DistanceJoinQuery(JoinQuery.java:647)
        at com.geo.SpatialKNNMaterialInterpolator$$anonfun$testDistanceJoinQueryUsingIndex$1$1.apply$mcVI$sp(SpatialKNNMaterialInterpolator.scala:81)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166)
        at com.geo.SpatialKNNMaterialInterpolator$.testDistanceJoinQueryUsingIndex$1(SpatialKNNMaterialInterpolator.scala:79)
        at com.geo.SpatialKNNMaterialInterpolator$.main(SpatialKNNMaterialInterpolator.scala:88)
        at com.geo.SpatialKNNMaterialInterpolator.main(SpatialKNNMaterialInterpolator.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)

(My input data are in lat/lon, and I want the circle distance to be in metres.)

What is the right source EPSG code for lat/lon (let's say, from Google Maps' Mercator)?

  2. Sorry for that question :)

@jiayuasu

Member

commented Jun 2, 2017

@deil87
Since you are doing a DistanceJoin, which involves two Spatial RDDs, you need to make sure you performed the CRS transformation on both of them. It seems that you only did it on one Spatial RDD.

@deil87

Author

commented Jun 2, 2017

@jiayuasu
As far as I understand what I'm doing, I only use one input CSV, and it serves as both the source and the target RDD for the DistanceJoin:

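import org.apache.spark.storage.StorageLevel
import org.datasyslab.geospark.enums.{GridType, IndexType}
import org.datasyslab.geospark.spatialOperator.JoinQuery
import org.datasyslab.geospark.spatialRDD.{CircleRDD, PointRDD}

// Read lat/lon points (epsg:4326) and transform them to a metre-based CRS (epsg:3857)
val objectRDD = new PointRDD(ss.sparkContext, inputLocationCSV, pointRddOffset, pointRDDSplitter, true, StorageLevel.MEMORY_ONLY, "epsg:4326", "epsg:3857")
// Circles around the same points; the radius is in the units of the transformed CRS
val queryWindowRDD = new CircleRDD(objectRDD, 0.01)

// Partition both RDDs with the same grids so that candidate pairs land in the same partition
objectRDD.spatialPartitioning(GridType.RTREE)
queryWindowRDD.spatialPartitioning(objectRDD.grids)

// Build an R-tree index on the spatially partitioned points
objectRDD.buildIndex(IndexType.RTREE, true)

objectRDD.indexedRDD.persist(StorageLevel.MEMORY_ONLY)
queryWindowRDD.spatialPartitionedRDD.persist(StorageLevel.MEMORY_ONLY)

// Distance join: for each circle, find the points that fall within it
val result = JoinQuery.DistanceJoinQuery(objectRDD, queryWindowRDD, true, true)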


@jiayuasu

Member

commented Jun 2, 2017

@deil87

There is a bug in GeoSpark related to the newly added CRS transformation function: CircleRDD does not copy the CRS information when a user constructs a new CircleRDD from another SpatialRDD.
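To illustrate this failure mode, here is a deliberately simplified, hypothetical model; none of these classes, fields or messages are GeoSpark's. A dataset carries metadata recording whether and to which CRS it was transformed, the join refuses to run unless both inputs agree, and deriving a circle dataset has to copy that metadata from its source. `buggyCircles` reproduces the reported symptom, and `fixedCircles` shows the kind of propagation the fix restores.

```scala
object CrsPropagationSketch {
  // Hypothetical CRS metadata: whether a transformation was applied, and to which target CRS.
  final case class CrsInfo(transformed: Boolean, targetEpsg: Option[String])

  final case class PointData(crs: CrsInfo)
  final case class CircleData(centers: PointData, radius: Double, crs: CrsInfo)

  // Buggy derivation: builds circles but drops the source's CRS metadata.
  def buggyCircles(src: PointData, radius: Double): CircleData =
    CircleData(src, radius, CrsInfo(transformed = false, targetEpsg = None))

  // Fixed derivation: circles inherit the CRS metadata of the points they were built from.
  def fixedCircles(src: PointData, radius: Double): CircleData =
    CircleData(src, radius, src.crs)

  // Join-time check: both inputs must carry the same CRS, otherwise distances are meaningless.
  def checkSameCrs(objects: PointData, windows: CircleData): Either[String, Unit] =
    if (objects.crs == windows.crs) Right(())
    else Left("one input doesn't carry the same CRS transformation")

  def main(args: Array[String]): Unit = {
    val points = PointData(CrsInfo(transformed = true, targetEpsg = Some("epsg:3857")))
    println(checkSameCrs(points, buggyCircles(points, 0.01)))  // a Left: the join would be rejected
    println(checkSameCrs(points, fixedCircles(points, 0.01)))  // Right(())
  }
}
```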

I have just fixed this bug and pushed the fix to 0.7.1-snapshot. Refresh your GeoSpark dependency coordinate and make sure your IDE downloads the latest version.

@deil87

Author

commented Jun 2, 2017

Good news! Thanks!
How soon will the latest version appear in the Maven Central repository? It seems the old build is still there:
(geospark-0.7.1-snapshot.jar 2017-06-02 00:59 13031304)

@jiayuasu

Member

commented Jun 2, 2017

@deil87

I think it was synced to Maven Central several hours ago. Please try it out.

@deil87

Author

commented Jun 3, 2017

@jiayuasu
For some reason I can't see the fixed build:
http://central.maven.org/maven2/org/datasyslab/geospark/0.7.1-snapshot/
Could you please check one more time?

@jiayuasu

Member

commented Jun 3, 2017

@deil87 It seems that the auto-sync between Sonatype and Maven Central ran into problems. Please try 0.7.1-snapshot2.
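For sbt users, the updated dependency would look roughly like this. The group and artifact IDs are read off the Maven Central URL posted earlier in this thread (org/datasyslab/geospark/…), and the version is the one suggested in this comment; Maven users would put the same three coordinates into a `<dependency>` element.

```scala
// build.sbt (sketch): groupId/artifactId taken from the Maven Central path in this thread
libraryDependencies += "org.datasyslab" % "geospark" % "0.7.1-snapshot2"
```

A single `%` is used because the artifact name in that path has no Scala version suffix.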
