
Update to geopandas v1.1.x, pyarrow v21 #1491

@zaneselvans

Description


🚀 Feature

The Kaggle Python environment currently pins geopandas==0.14.4, which is now 16 months old, with a note saying that learntools breaks with newer versions. Is this still true? Has anyone checked lately?

It also uses pyarrow==19.0.1, but the most recent version is 21.
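To check which versions a given notebook actually has, a quick metadata lookup like the one below could help. This is a minimal sketch: the helper names (`installed_version`, `major_version`, `report`) are hypothetical, and the minimum major versions simply mirror what this issue requests (geopandas 1.x, pyarrow 21). It only reads installed package metadata; it does not test whether learntools still works.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str | None:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string: str) -> int:
    """Extract the leading (major) component, e.g. "19.0.1" -> 19."""
    return int(version_string.split(".")[0])


def report(minimums: dict[str, int]) -> dict[str, tuple[str | None, bool]]:
    """Map each package to (installed version, whether it meets the assumed minimum major)."""
    result = {}
    for package, minimum in minimums.items():
        found = installed_version(package)
        ok = found is not None and major_version(found) >= minimum
        result[package] = (found, ok)
    return result


if __name__ == "__main__":
    # Assumed minimums, taken from the versions requested in this issue.
    for pkg, (found, ok) in report({"geopandas": 1, "pyarrow": 21}).items():
        status = "ok" if ok else "older than requested"
        print(f"{pkg}: {found or 'not installed'} ({status})")
```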

Motivation

This is starting to limit access to new geospatial functionality. For example, PyArrow v21 added native support for geospatial data types in Parquet.

GeoParquet files written with pyarrow v21 now appear to be unreadable by the older geopandas/pyarrow combination on Kaggle. For example:

```python
# geopandas 0.14.4 + pyarrow 19.0.1
import geopandas as gpd

gpd.read_parquet("s3://pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet")
```

Results in:

```
OSError: Error creating dataset.
Could not read schema from 'pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet'.
Is this a 'parquet' file?: Could not open Parquet input source 'pudl.catalyst.coop/nightly/out_censusdp1tract__states.parquet': Metadata contains Thrift LogicalType that is not recognized
```

The same call works fine with geopandas 1.1.1 and pyarrow 21.0.0.

However, if the GeoParquet output is written with pyarrow 20.0.0 (which predates the native geospatial type support) plus custom b"geo" metadata, the older geopandas and pyarrow setup can read it.
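The workaround described above can be sketched roughly as follows: store geometries as plain WKB in an ordinary binary column and attach GeoParquet metadata under the b"geo" schema key, so no new Parquet logical type is involved. This is a minimal sketch under stated assumptions, not PUDL's actual pipeline: the helper names (`point_wkb`, `geo_metadata`, `write_legacy_geoparquet`) are hypothetical, only 2D points are encoded, and the metadata follows the GeoParquet 1.0.0 layout, which we assume geopandas 0.14.x accepts. The issue reports this working with pyarrow 20.0.0; whether a plain binary column also avoids the new logical type under pyarrow 21 is unverified.

```python
from __future__ import annotations

import json
import struct


def point_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB: byte-order flag 1, geometry type 1, then x, y."""
    return struct.pack("<BIdd", 1, 1, x, y)


def geo_metadata(column: str, geometry_types: list[str]) -> bytes:
    """Build a minimal GeoParquet 1.0.0 'geo' metadata value for a single WKB column.

    No 'crs' key is written; the GeoParquet spec treats a missing crs as OGC:CRS84.
    """
    meta = {
        "version": "1.0.0",
        "primary_column": column,
        "columns": {column: {"encoding": "WKB", "geometry_types": geometry_types}},
    }
    return json.dumps(meta).encode("utf-8")


def write_legacy_geoparquet(path: str, names: list[str], points: list[tuple[float, float]]) -> None:
    """Write WKB geometries as a plain binary column plus b"geo" schema metadata."""
    # Imported here so the pure-Python helpers above work without pyarrow installed.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table(
        {
            "name": pa.array(names, type=pa.string()),
            "geometry": pa.array([point_wkb(x, y) for x, y in points], type=pa.binary()),
        }
    )
    # Merge with any existing schema metadata instead of overwriting it.
    metadata = dict(table.schema.metadata or {})
    metadata[b"geo"] = geo_metadata("geometry", ["Point"])
    pq.write_table(table.replace_schema_metadata(metadata), path)
```

A round trip worth trying on Kaggle would be `write_legacy_geoparquet("points.parquet", ["a"], [(0.0, 0.0)])` followed by `gpd.read_parquet("points.parquet")` under geopandas 0.14.4.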
