[Question] ST_Union_Aggr performance #360

Description

@pedromorfeu

Expected behavior

ST_Union_Aggr should perform reasonably well, even when aggregating a large number of polygons.

Actual behavior

ST_Union_Aggr takes too long when there are many polygons. On the cluster, the task appears to be stuck for a long time.

I'm aggregating buffers (ST_Buffer). Aggregations with relatively few buffers, say 10000, finish in about 10 minutes, but aggregations with 80000 buffers take about 2 hours.

| Number of buffers | Time   |
|-------------------|--------|
| 10000             | 10 min |
| 80000             | 2 h    |

I know it's a tough operation and that it generates a lot of skewed data, which ends up hammering a single executor.

Is there anything I can do to improve the performance?
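
For context, here is a minimal sketch of the shape of the aggregation, together with a two-level variant that might spread the union over more tasks before the final merge. It is untested on the real data, and whether it helps at all is an open question. The table name `points`, the columns `group_id`, `point_id` and `geom`, the buffer radius, and the salt count are all hypothetical placeholders. The block only builds the SQL strings, so it runs without a cluster.

```scala
object UnionAggrSketch {
  // Hypothetical schema: <table>(group_id, point_id, geom).
  // Single pass: every buffer of a group is unioned in one aggregation,
  // so one large group can end up on a single executor.
  def singlePassSql(table: String, radius: Double): String =
    s"""SELECT group_id, ST_Union_Aggr(ST_Buffer(geom, $radius)) AS merged
       |FROM $table
       |GROUP BY group_id""".stripMargin

  // Two levels: first union within (group_id, salt) buckets, then union
  // the partial results per group. The salt is derived from a hypothetical
  // point_id column using Spark SQL's built-in hash and pmod functions.
  def twoLevelSql(table: String, radius: Double, numSalts: Int): String = {
    require(numSalts > 0, "numSalts must be positive")
    val salt = s"pmod(hash(point_id), $numSalts)"
    s"""SELECT group_id, ST_Union_Aggr(partial_union) AS merged
       |FROM (
       |  SELECT group_id, $salt AS salt,
       |         ST_Union_Aggr(ST_Buffer(geom, $radius)) AS partial_union
       |  FROM $table
       |  GROUP BY group_id, $salt
       |) partials
       |GROUP BY group_id""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    println(singlePassSql("points", 0.01))
    println(twoLevelSql("points", 0.01, 32))
  }
}
```

With a `SparkSession` named `spark` on which `GeoSparkSQLRegistrator.registerAll(spark)` has been called, either string could be passed to `spark.sql(...)`. The two-level form assumes that ST_Union_Aggr accepts the output of an earlier ST_Union_Aggr as input, which I have not verified on 1.2.0.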

GeoSpark version = 1.2.0
Apache Spark version = 2.3.0
JRE version = 1.8
API type = Scala
