SpatialJoinQuery throws exception for RDDs partitioned the same way #186

Closed
tociek opened this issue Feb 9, 2018 · 4 comments

Comments

@tociek
Contributor

tociek commented Feb 9, 2018

Hi,

Here is my code. For some reason, SpatialJoinQuery still throws an exception even though both RDDs are partitioned the same way. Am I missing something here?

val nh = ShapefileReader.readToPolygonRDD(...)
nh.analyze
nh.spatialPartitioning(GridType.QUADTREE)
nh.spatialPartitionedRDD.repartition(100)

val trains = ShapefileReader.readToPointRDD(...)
trains.analyze
trains.spatialPartitioning(GridType.QUADTREE)
trains.spatialPartitionedRDD.repartition(100)

val res = JoinQuery.SpatialJoinQuery(nh,trains,false,false)
java.lang.IllegalArgumentException: [JoinQuery] queryRDD is not partitioned by the same grids with spatialRDD. Please make sure they both use the same grids otherwise wrong results will appear.
  at org.datasyslab.geospark.spatialOperator.JoinQuery.verifyPartitioningMatch(JoinQuery.java:60)
  at org.datasyslab.geospark.spatialOperator.JoinQuery.spatialJoin(JoinQuery.java:383)
  at org.datasyslab.geospark.spatialOperator.JoinQuery.SpatialJoinQuery(JoinQuery.java:158)

@jiayuasu
Member

jiayuasu commented Feb 9, 2018

@tociek Your code doesn't follow the example we provide:
https://github.com/jiayuasu/GeoSparkTemplateProject/blob/master/geospark/scala/src/main/scala/ScalaExample.scala
This is the correct way to partition both RDDs.

    arealmRDD.analyze()
    tripRDD.analyze()

    tripRDD.spatialPartitioning(GridType.KDBTREE)
    tripRDD.buildIndex(IndexType.QUADTREE, true)
    arealmRDD.spatialPartitioning(tripRDD.getPartitioner)
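Applied to the code from the original report, the same pattern might look like the sketch below. The function name and its parameters (`sc`, `nhPath`, `trainsPath`) are hypothetical, the shapefile paths stay elided as in the report, and the reader and join signatures are assumed to match GeoSpark 1.x. Note that the `repartition(100)` calls in the report return a new RDD that is never used, so they are dropped here.

```scala
import org.apache.spark.api.java.JavaSparkContext
import org.datasyslab.geospark.enums.GridType
import org.datasyslab.geospark.formatMapper.shapefileParser.ShapefileReader
import org.datasyslab.geospark.spatialOperator.JoinQuery

// Hypothetical helper: name and parameters are illustrative, not part of GeoSpark.
def joinNeighbourhoodsWithTrains(sc: JavaSparkContext, nhPath: String, trainsPath: String) = {
  val nh = ShapefileReader.readToPolygonRDD(sc, nhPath)
  val trains = ShapefileReader.readToPointRDD(sc, trainsPath)

  // Compute boundary and count statistics before partitioning.
  nh.analyze()
  trains.analyze()

  // Build the grid from ONE dataset only...
  trains.spatialPartitioning(GridType.QUADTREE)
  // ...and reuse that exact grid for the other dataset.
  nh.spatialPartitioning(trains.getPartitioner)

  // Same argument order as in the report: no index, no boundary intersection.
  JoinQuery.SpatialJoinQuery(nh, trains, false, false)
}
```

The important line is `nh.spatialPartitioning(trains.getPartitioner)`: it passes the partitioner object itself rather than a `GridType`, so both RDDs end up on identical grids.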

@tociek
Contributor Author

tociek commented Feb 11, 2018

@jiayuasu Thanks for pointing that out - it took me a while, but now I understand what happened here... (I think ;) )
I thought it was enough for both RDDs to be partitioned using the same method, while in fact they must share the same partition boundary values in order to get a correct result. In other words, arealmRDD.spatialPartitioning(tripRDD.getPartitioner) assigns tripRDD's grid (its actual boundary values) to arealmRDD, whereas tripRDD.spatialPartitioning(GridType.KDBTREE) only specifies the partitioning method and builds a grid from tripRDD's own data.

Am I right?

PS. I would be happy to contribute to the GeoSpark tutorials (for dummies like me) once I get a better understanding of your great work.

@jiayuasu
Member

jiayuasu commented Feb 12, 2018

@tociek

Yes, you understand the meaning of "partitioned by the same grids". We intentionally divide a spatial join query into multiple phases so that users can control and optimize each phase on their own, although this is confusing for new users.
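
The difference between "same method" and "same grids" can be illustrated without GeoSpark at all. The following is a toy one-dimensional sketch, not GeoSpark code: `equalCountBoundaries` and `cellOf` are made-up helpers standing in for a data-dependent partitioning method such as a quadtree or KDB-tree.

```scala
object SameMethodDifferentGrids {
  // Toy "partitioning method": split the data into `cells` equal-count cells
  // and return the interior cell boundaries. The result depends on the data.
  def equalCountBoundaries(xs: Seq[Double], cells: Int): Vector[Double] = {
    val sorted = xs.sorted.toVector
    (1 until cells).map(i => sorted(i * sorted.length / cells)).toVector
  }

  // Index of the cell that x falls into for a given set of boundaries.
  def cellOf(boundaries: Vector[Double], x: Double): Int =
    boundaries.count(_ <= x)

  val trains = Seq(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)
  val nh = Seq(10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0)

  // Same method, applied independently to each dataset: two different grids.
  val trainGrid = equalCountBoundaries(trains, 4)
  val nhGrid = equalCountBoundaries(nh, 4)

  def main(args: Array[String]): Unit = {
    println(s"grid built from trains: $trainGrid")
    println(s"grid built from nh:     $nhGrid")
    // The same location lands in different cells under the two grids.
    println(s"cell of 6.0 in trainGrid: ${cellOf(trainGrid, 6.0)}")
    println(s"cell of 6.0 in nhGrid:    ${cellOf(nhGrid, 6.0)}")
  }
}
```

Because the same coordinate maps to different cell numbers under the two grids, a join that only compares partition `i` of one dataset with partition `i` of the other would miss matches. Reusing one grid for both datasets removes the mismatch, which is what passing the partitioner from one RDD to the other achieves.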

You are more than welcome to help edit the GeoSpark tutorial. We do need GeoSpark users to collaborate with us on improving GeoSpark, including the source code, tutorials, demos, and use cases.

@tociek
Contributor Author

tociek commented Feb 12, 2018

@jiayuasu
In my humble opinion, the overloaded spatialPartitioning method is confusing, as it does different things depending on the parameter type. The overload that takes a partitioner as a parameter should be called something like assignPartitioningGrid, setPartitionGrid, or setPartitioningScheme. In that case the partitioner should have a method like getGrid/getScheme, whose result would in turn be the argument for that method... Just a thought...
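
As a rough illustration of the naming proposed above, here is a toy sketch. Everything in it is hypothetical: `Grid`, `ToySpatialRDD`, `setPartitioningGrid`, and `getGrid` do not exist in GeoSpark, and the one-dimensional equal-count grid merely stands in for a real spatial grid.

```scala
// Hypothetical sketch only: none of these types or methods exist in GeoSpark.
final case class Grid(boundaries: Vector[Double])

final class ToySpatialRDD(val data: Seq[Double]) {
  private var grid: Option[Grid] = None

  // "Partition by method": derives a NEW grid from this dataset's own values.
  def spatialPartitioning(cells: Int): Unit = {
    val sorted = data.sorted.toVector
    val bounds = (1 until cells).map(i => sorted(i * sorted.length / cells)).toVector
    grid = Some(Grid(bounds))
  }

  // "Partition by grid": reuses boundaries computed elsewhere, under a distinct name.
  def setPartitioningGrid(shared: Grid): Unit = {
    grid = Some(shared)
  }

  def getGrid: Grid =
    grid.getOrElse(throw new IllegalStateException("not partitioned yet"))
}
```

With separate names, the call site states its intent: `trips.spatialPartitioning(4)` computes a grid, while `areas.setPartitioningGrid(trips.getGrid)` clearly copies an existing one.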

@jiayuasu jiayuasu pinned this issue May 25, 2019
@jiayuasu jiayuasu unpinned this issue Jul 12, 2020
jiayuasu pushed a commit that referenced this issue May 8, 2024