Propagate SRID from EWKB as PROJJSON CRS in geoarrow.wkb metadata

The driver already emits `geoarrow.wkb` Arrow extension metadata (thanks! confirmed working in #1), but the `ARROW:extension:metadata` field has no CRS information. Since Redshift sends geometry as EWKB over the wire — which includes the SRID in the header — the driver could propagate it as PROJJSON CRS in the geoarrow metadata.

## Why this matters

Without CRS metadata, consumers (DuckDB, GeoParquet writers, pyarrow) receive geometry with no coordinate reference system. When the data gets written to GeoParquet, the CRS field is empty. This means downstream tools can't distinguish EPSG:4326 from EPSG:3857 or any other projection — they have to guess or require the user to specify it out-of-band.

## What the driver does today

1. Receives EWKB from Redshift (flag bit `0x20000000` + 4-byte SRID in header)
2. Strips the SRID, outputs plain WKB
3. Tags the Arrow column with `ARROW:extension:name = "geoarrow.wkb"`
4. `ARROW:extension:metadata` is empty (no CRS)
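
To make steps 1 and 2 concrete, here is a minimal Python sketch of reading the SRID out of an EWKB header instead of discarding it. It is not the driver's actual code: `split_ewkb` and `EWKB_SRID_FLAG` are hypothetical names, and the sketch only handles the top-level header (Z/M flag bits and nested geometries are left untouched).

```python
import struct
from typing import Optional, Tuple

# EWKB marks "an SRID follows" with this bit in the 4-byte geometry type.
EWKB_SRID_FLAG = 0x20000000


def split_ewkb(ewkb: bytes) -> Tuple[Optional[int], bytes]:
    """Return (srid, wkb_without_srid) for one EWKB value (hypothetical helper)."""
    if len(ewkb) < 5:
        raise ValueError("EWKB value too short for a header")
    byte_order = ewkb[0]
    if byte_order not in (0, 1):
        raise ValueError("invalid byte-order marker")
    endian = "<" if byte_order == 1 else ">"

    # Bytes 1..4 hold the geometry type, including the EWKB flag bits.
    (geom_type,) = struct.unpack_from(endian + "I", ewkb, 1)
    if not geom_type & EWKB_SRID_FLAG:
        return None, ewkb  # no SRID present, nothing to strip

    if len(ewkb) < 9:
        raise ValueError("SRID flag set but SRID bytes are missing")
    # Bytes 5..8 hold the SRID, in the same byte order as the type.
    (srid,) = struct.unpack_from(endian + "i", ewkb, 5)

    # Clear the flag and drop the 4 SRID bytes to get the stripped form.
    plain_type = geom_type & ~EWKB_SRID_FLAG
    plain = ewkb[:1] + struct.pack(endian + "I", plain_type) + ewkb[9:]
    return srid, plain
```

The point of the sketch is that the SRID is available at exactly the place where the header is already being rewritten, so keeping it costs one extra 4-byte read per value (or just one read for the first non-null value).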

## Proposed change

Extract the SRID from the first non-null EWKB value and encode it as PROJJSON in `ARROW:extension:metadata`. The `type` is `"GeographicCRS"` or `"ProjectedCRS"` depending on the SRID. For example, for SRID 4326:

```json
{
  "crs": {
    "type": "GeographicCRS",
    "name": "EPSG:4326",
    "id": { "authority": "EPSG", "code": 4326 }
  }
}
```

This is the same approach used in the Databricks ADBC driver: [adbc-drivers/databricks#350](https://github.com/adbc-drivers/databricks/pull/350). The Databricks case is more complex because it needs to flatten `Struct<srid, wkb>` first — here the SRID is already in the EWKB header, so it's just a matter of reading it during decode and attaching it to the schema.

For SRID 0 (unset), the metadata should remain empty per the geoarrow spec (no CRS = unknown).
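
As a rough illustration of the encoding step, the Python sketch below builds the metadata string in the abbreviated shape shown above. Everything named here is an assumption made for the example: `geoarrow_metadata` is a hypothetical helper, `GEOGRAPHIC_SRIDS` is a tiny hard-coded set standing in for a real SRID lookup, and the SRID is assumed to be an EPSG code. Consumers may also expect a fuller PROJJSON definition than this abbreviated object.

```python
import json
from typing import Optional

# Illustrative only: a real driver would need a proper SRID -> CRS lookup.
GEOGRAPHIC_SRIDS = {4326, 4269, 4258}


def geoarrow_metadata(srid: Optional[int]) -> Optional[str]:
    """Return the extension metadata JSON, or None when the CRS is unknown."""
    if not srid:
        # SRID 0 (or no SRID at all): leave the metadata as it is today.
        return None
    crs_type = "GeographicCRS" if srid in GEOGRAPHIC_SRIDS else "ProjectedCRS"
    return json.dumps(
        {
            "crs": {
                "type": crs_type,
                "name": f"EPSG:{srid}",  # assumes the SRID is an EPSG code
                "id": {"authority": "EPSG", "code": srid},
            }
        }
    )
```

Returning `None` for SRID 0 keeps the "no CRS = unknown" case distinct from any real CRS, so the caller can simply skip setting the metadata.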

## Two-phase schema

Since the SRID isn't known until the first record batch is read, this likely needs the same two-phase approach as Databricks: emit the schema without CRS initially, then rebuild it with CRS after seeing the first non-null geometry. Alternatively, the driver could query `SELECT ST_SRID(geom_col) FROM table LIMIT 1` upfront, but reading it from the EWKB bytes is simpler and doesn't require an extra round-trip.
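
The two-phase idea could look roughly like the Python sketch below, using plain dicts and lists in place of real Arrow schemas and record batches. The names `read_srid`, `first_srid`, and `finalize_field_metadata` are hypothetical, and `encode_crs` is a caller-supplied function that turns an SRID into the metadata JSON string.

```python
import struct
from typing import Callable, Dict, Iterable, Optional

EWKB_SRID_FLAG = 0x20000000


def read_srid(ewkb: bytes) -> Optional[int]:
    """Read the SRID from an EWKB header, or None if the flag is not set."""
    endian = "<" if ewkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", ewkb, 1)
    if not geom_type & EWKB_SRID_FLAG:
        return None
    (srid,) = struct.unpack_from(endian + "i", ewkb, 5)
    return srid


def first_srid(batch: Iterable[Optional[bytes]]) -> Optional[int]:
    """SRID of the first non-null geometry in the batch, if any."""
    for value in batch:
        if value is not None:
            return read_srid(value)
    return None


def finalize_field_metadata(
    initial: Dict[str, str],
    first_batch: Iterable[Optional[bytes]],
    encode_crs: Callable[[int], str],
) -> Dict[str, str]:
    """Phase 2: copy the phase-1 metadata and add the CRS once it is known."""
    final = dict(initial)
    srid = first_srid(first_batch)
    if srid:  # None or 0 means unknown: keep the phase-1 metadata unchanged
        final["ARROW:extension:metadata"] = encode_crs(srid)
    return final
```

Phase 1 would be `{"ARROW:extension:name": "geoarrow.wkb"}` with no CRS. Phase 2 runs once the first batch arrives. The sketch trusts the first non-null value to represent the whole column, which matches the proposal but does not detect mixed-SRID columns.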

## Context

We're using the Redshift ADBC driver from DuckDB (`adbc_scanner`) to export geospatial data to GeoParquet. DuckDB picks up `geoarrow.wkb` as native GEOMETRY, which is great — but the missing CRS means the output GeoParquet has no coordinate reference system. We recently got this working end-to-end for Databricks via the PR linked above, and the Redshift case should be strictly simpler.
