Description
🚀 Feature
The Kaggle Python environment currently pins geopandas==v0.14.4, which is now 16 months old, with a note saying that the learntools are broken with higher versions. Is this still true? Has anyone checked lately?
It also uses pyarrow==19.0.1 but the most recent version is 21.
Motivation
This is starting to impact the ability to use new geospatial functionality. E.g. PyArrow v21 has added native support for geospatial data types in Parquet:
- GH-45522: [Parquet][C++] Parquet GEOMETRY and GEOGRAPHY logical type implementations apache/arrow#45459
- [Parquet][C++] Implement Geography and Geometry types in the C++ Parquet implementation apache/arrow#45522
GeoParquet outputs generated with pyarrow v21 now seem to result in issues with the older geopandas / pyarrow on Kaggle. E.g.
# geopandas 0.14.4 + pyarrow 19.0.1
import geopandas as gpd
gpd.read_parquet("s3://pudl.catalyst.coop/nightly/out_censusdp1tract__states")
Results in:
OSError: Error creating dataset.
Could not read schema from 'pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet'.
Is this a 'parquet' file?: Could not open Parquet input source 'pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet': Metadata contains Thrift LogicalType that is not recognized
The same expression works fine with geopandas 1.1.1 + pyarrow 21.0.0.
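To confirm which of the two combinations a given environment actually has, a small check like the following may help. It is only a sketch: `installed_versions` is a hypothetical helper name, not part of geopandas or pyarrow, and it reports `None` for anything that is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("geopandas", "pyarrow")):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the packages themselves.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Running this in a Kaggle notebook and in a local environment side by side should show whether the failing read lines up with the pinned versions.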
However, if the GeoParquet output is generated using pyarrow v20.0.0 (before native geospatial type support) + custom b"geo" metadata, the older geopandas & pyarrow setup is able to read it.
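For anyone unfamiliar with the custom b"geo" metadata approach, here is a rough sketch of what it can look like. It is not the actual code that produced the files above: `build_geo_metadata` is a hypothetical helper, the field layout follows my understanding of the GeoParquet 1.0 file-level metadata (`version`, `primary_column`, `columns` with `encoding` and `geometry_types`), and the geometry is stored as a plain binary WKB column rather than a native Parquet geometry type. The pyarrow part is skipped if pyarrow is not installed.

```python
import io
import json


def build_geo_metadata(geometry_column="geometry", geometry_types=()):
    """Return schema metadata holding a GeoParquet 1.0-style b"geo" entry.

    Hypothetical helper; CRS is omitted, which readers treat as the
    spec default.
    """
    geo = {
        "version": "1.0.0",
        "primary_column": geometry_column,
        "columns": {
            geometry_column: {
                "encoding": "WKB",
                "geometry_types": list(geometry_types),
            }
        },
    }
    return {b"geo": json.dumps(geo).encode("utf-8")}


metadata = build_geo_metadata(geometry_types=["Point"])

try:
    import pyarrow as pa
    import pyarrow.parquet as pq
except ImportError:  # sketch still runs without pyarrow
    pa = None

if pa is not None:
    # Little-endian WKB for POINT (1 2), stored as ordinary binary.
    wkb_point = bytes.fromhex("0101000000000000000000F03F0000000000000040")
    table = pa.table(
        {"id": [1], "geometry": pa.array([wkb_point], type=pa.binary())}
    )
    # Attach the b"geo" key as file-level schema metadata.
    table = table.replace_schema_metadata(metadata)
    buffer = io.BytesIO()  # in-memory target instead of a real path
    pq.write_table(table, buffer)
```

Because the geometry column is an ordinary binary column and the geospatial information lives only in the key-value metadata, the file contains no logical type that an older Parquet reader would fail to recognize.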