Expected behavior
ST_Union_Aggr should perform well, even when aggregating many polygons.
Actual behavior
ST_Union_Aggr takes too long when there are many polygons. On the cluster, the task appears stuck for a long time.
I'm aggregating buffers (ST_Buffer). Aggregations with few buffers, say 10000, finish quickly, but aggregations with 80000 buffers take about 2 hours.
| Number of buffers | Time |
| --- | --- |
| 10000 | 10 min |
| 80000 | 2 h |
I know it's an expensive operation and that it produces heavily skewed data, which ends up hammering a single executor.
Is there anything I can do to improve the performance?
GeoSpark version = 1.2.0
Apache Spark version = 2.3.0
JRE version = 1.8
API type = Scala