[Question] ST_Union_Aggr performance

## Expected behavior
ST_Union_Aggr should complete in a reasonable time, even when aggregating many polygons.

## Actual behavior
ST_Union_Aggr takes too long when there are many polygons. On the cluster, the task gets stuck for a long time.

I'm aggregating buffers (created with ST_Buffer). Aggregations over relatively few buffers, say 10000, finish fairly quickly, but aggregations over 80000 buffers take about 2 hours.

|Number of buffers|Time|
|--|--|
|10000|10 min|
|80000|**2h**|

I know it's an expensive operation and that it generates heavily skewed data, which ends up hammering a single executor.

Is there anything I can do to improve the performance?

GeoSpark version = 1.2.0
Apache Spark version = 2.3.0
JRE version = 1.8
API type = Scala

