Expected behavior
ST_Transform should cache the results of its CRS calls (decode, findMathTransform) in its own thread-local cache, so that threads do not wait on the locks guarding the caches internal to CRS.
GeoSpark looks up the same spatial referencing information for every single row, across many threads. I don't think the authors of geotools anticipated the CRS utilities being used this way.
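A minimal sketch of the kind of thread-local cache I have in mind. ThreadLocalCache is a hypothetical helper, not an existing GeoSpark or GeoTools class, and it uses only the Java standard library. The loader function is a stand-in for the expensive lookup (for example a wrapper around CRS.decode or CRS.findMathTransform).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of GeoSpark or GeoTools): memoizes the result
 * of an expensive lookup separately for each thread, so repeated lookups of
 * the same key never touch a shared, lock-protected cache.
 */
final class ThreadLocalCache<K, V> {
    // Each thread gets its own private map, so no synchronization is needed.
    private final ThreadLocal<Map<K, V>> perThread = ThreadLocal.withInitial(HashMap::new);
    private final Function<K, V> loader;

    ThreadLocalCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for this thread, calling the loader on a miss. */
    V get(K key) {
        Map<K, V> cache = perThread.get();
        V value = cache.get(key);
        if (value == null) {
            // First lookup of this key on this thread: pay the cost once.
            // A null result from the loader is not cached and will be retried.
            value = loader.apply(key);
            cache.put(key, value);
        }
        return value;
    }
}
```

In ST_Transform this could be used as one cache keyed by the CRS code string, and a second keyed by the (source, target) pair for the math transform. Since CRS.decode and CRS.findMathTransform throw checked exceptions, the loader would have to wrap them. The shared CRS caches would then be hit once per distinct key per thread instead of once per row.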
Actual behavior
The synchronization inside the caches that geotools' CRS utility singleton eventually references means that the vast majority of ST_Transform work ends up single-threaded within each executor.
Steps to reproduce the problem
Run ST_Transform on a large dataset with a single executor and watch thread activity (either via CPU usage or with VisualVM). The threads end up waiting their turn for access to the cache in CRS.
Settings
GeoSpark version = 1.2.0
Apache Spark version = 2.4.4
JRE version = 1.8
API type = Scala