
ST_Transform slow due to lock contention #456

Description

@devyn

Expected behavior

ST_Transform should cache the results of the CRS calls (decode, findMathTransform) itself, in a thread-local cache, so that threads do not wait on the locks guarding the caches internal to CRS.
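A minimal sketch of the kind of thread-local cache meant here, using only the JDK. `ThreadLocalCache` and its `loader` argument are hypothetical names for illustration, not part of GeoSpark or geotools; in ST_Transform the loader would presumably wrap `CRS.decode` (keyed on the CRS code) or `CRS.findMathTransform` (keyed on the source/target pair) and handle their checked exceptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical per-thread memoizing cache. Each thread owns a private HashMap,
 * so a lookup after the first one for a key never touches a shared lock.
 */
final class ThreadLocalCache<K, V> {
    // One independent map per thread, created lazily on first access.
    private final ThreadLocal<Map<K, V>> perThread = ThreadLocal.withInitial(HashMap::new);

    // The expensive lookup to memoize (assumed to stand in for a CRS call).
    private final Function<K, V> loader;

    ThreadLocalCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        Map<K, V> cache = perThread.get();
        V value = cache.get(key);
        if (value == null) {
            // First request for this key on this thread: do the real lookup once.
            value = loader.apply(key);
            cache.put(key, value);
        }
        return value;
    }
}
```

With this shape the underlying lookup runs at most once per key per thread instead of once per row. The cost is one cached copy per thread, which should be small when a job only uses a handful of coordinate reference systems.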

GeoSpark uses the CRS utilities in a way that I don't think the geotools authors anticipated: it looks up the same spatial referencing information for every single row, across many threads.

Actual behavior

The synchronization inside the caches that geotools' CRS utility singleton eventually references means that the vast majority of ST_Transform work ends up single-threaded within each executor.

Steps to reproduce the problem

Run ST_Transform on a large dataset with a single executor and watch thread execution (either by CPU usage or with VisualVM). Threads end up waiting their turn for access to the cache in CRS.
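If attaching VisualVM is not practical, the same symptom can be checked programmatically. The sketch below uses only the JDK and would have to be called from code running inside the executor JVM; `ThreadStateSnapshot` and `countByState` are hypothetical names, and the thread-name prefix to pass in is an assumption (Spark task threads are typically named starting with "Executor task launch worker", but verify that against a thread dump from your cluster).

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical helper: a point-in-time count of thread states in this JVM. */
final class ThreadStateSnapshot {
    /** Counts live threads whose name starts with namePrefix, grouped by state. */
    static Map<Thread.State, Integer> countByState(String namePrefix) {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns a snapshot of all live threads in the JVM.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(namePrefix)) {
                counts.merge(t.getState(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Sampling this a few times during the ST_Transform stage and seeing most task threads in `BLOCKED` rather than `RUNNABLE` would be consistent with the lock contention described above.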

Settings

GeoSpark version = 1.2.0

Apache Spark version = 2.4.4

JRE version = 1.8

API type = Scala
