geoarrow-based multithreaded coordinate reprojection #337

Merged 5 commits on Jan 29, 2024

@kylebarron kylebarron commented Jan 29, 2024

Change list

  • Include the PROJJSON CRS in the geometry array's metadata when converting from GeoPandas
  • Move CRS checking and reprojection to the GeoArrow level instead of the GeoPandas level
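To make the first point concrete, here is a rough sketch of what CRS-on-metadata can look like. It assumes the GeoArrow convention of storing a JSON document with a `crs` key under the `ARROW:extension:metadata` field metadata key; the abbreviated CRS dict and the helper names `crs_from_field_metadata` and `has_epsg_code` are hypothetical and are not taken from this PR. It uses plain dicts rather than pyarrow so it runs on its own.

```python
import json

# Hypothetical, heavily abbreviated PROJJSON stand-in. A real PROJJSON
# document (e.g. from pyproj's CRS.to_json_dict()) carries many more keys.
crs = {
    "type": "GeographicCRS",
    "name": "WGS 84",
    "id": {"authority": "EPSG", "code": 4326},
}

# Arrow field metadata maps bytes keys to bytes values, so the extension
# metadata is stored as a JSON-encoded string. The extension name used
# here is an assumption for illustration.
field_metadata = {
    b"ARROW:extension:name": b"geoarrow.multipolygon",
    b"ARROW:extension:metadata": json.dumps({"crs": crs}).encode(),
}


def crs_from_field_metadata(metadata):
    """Return the CRS dict stored on a geometry field, or None if absent."""
    raw = metadata.get(b"ARROW:extension:metadata")
    if raw is None:
        return None
    return json.loads(raw).get("crs")


def has_epsg_code(metadata, code):
    """Naive CRS check: compare only the EPSG identifier, if one is present."""
    found = crs_from_field_metadata(metadata)
    if found is None:
        return False
    crs_id = found.get("id", {})
    return crs_id.get("authority") == "EPSG" and crs_id.get("code") == code
```

Because the CRS travels with the field, a check like this can run without going back to the GeoDataFrame. A real comparison would likely go through a CRS library rather than matching on the identifier alone.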

Example

Low-level example and benchmark. Reprojecting 100 copies of the geometry chunk takes 693 ms with 8 workers versus 3.22 s with 1 worker, a 4.6x speedup across 8 threads!

import geodatasets
import geopandas as gpd
import pyarrow as pa

from lonboard._geoarrow.geopandas_interop import geopandas_to_geoarrow
from lonboard._geoarrow.ops import reproject


# Load the NYC boroughs sample dataset and convert it to a GeoArrow table.
path = geodatasets.get_path('nybb')
gdf = gpd.read_file(path)
table = geopandas_to_geoarrow(gdf)

# The geometry field carries the CRS metadata; look it up by name
# rather than by position.
field = table.schema.field('geometry')
column = table['geometry']
chunk = column.chunk(0)

# Repeat the single chunk 100 times so there is enough work to parallelize.
new_chunked_array = pa.chunked_array([chunk] * 100)

# %timeit is an IPython magic, so run these lines in IPython or Jupyter.
%timeit out = reproject(field, new_chunked_array, max_workers=8)
# 693 ms ± 85.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit out = reproject(field, new_chunked_array, max_workers=1)
# 3.22 s ± 22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
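
For readers unfamiliar with the pattern, the per-chunk parallelism benchmarked above can be sketched roughly as follows. This is not the code from this PR: `to_web_mercator` is a hypothetical pure-Python stand-in for the real coordinate transform, and `reproject_chunks` is an illustrative name. Chunks are plain lists of `(lon, lat)` tuples so the sketch runs with the standard library only.

```python
import math
from concurrent.futures import ThreadPoolExecutor

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


def to_web_mercator(chunk):
    """Hypothetical stand-in transform: lon/lat degrees to EPSG:3857 metres."""
    return [
        (
            EARTH_RADIUS_M * math.radians(lon),
            EARTH_RADIUS_M
            * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2)),
        )
        for lon, lat in chunk
    ]


def reproject_chunks(chunks, transform, max_workers=None):
    """Apply `transform` to every chunk on a thread pool, keeping chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, so the output chunks
        # line up with the input chunks regardless of which finishes first.
        return list(executor.map(transform, chunks))


# Mirror the benchmark setup: one chunk repeated 100 times.
chunks = [[(0.0, 0.0), (-74.0, 40.7)]] * 100
out = reproject_chunks(chunks, to_web_mercator, max_workers=8)
```

Threads only help when the per-chunk transform releases the GIL, which is presumably the case for the native transform measured above. This pure-Python stand-in shows the structure but would not reproduce the speedup.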

@kylebarron kylebarron merged commit 655e55d into main Jan 29, 2024
5 checks passed
@kylebarron kylebarron deleted the kyle/reproject-geoarrow branch January 29, 2024 23:04