beta(0.4.0): GeoBrix 0.4.0 — 154 functions across a full lightweight tier (pyrx/pygx/pyvx) + heavyweight by mjohns-databricks · Pull Request #33 · databrickslabs/geobrix

mjohns-databricks · 2026-05-28T14:03:21Z

Summary

GeoBrix v0.4.0 — a major release. 154 functions across RasterX (107), GridX (40), VectorX (6), and PMTiles (1), now delivered through two interchangeable execution tiers:

Lightweight — pure-Python/PySpark (pyrx / pygx / pyvx); no JAR, no init script, no native GDAL. Runs on Serverless, standard/shared clusters, Lakeflow declarative pipelines, and ARM.
Heavyweight — Scala + GDAL/OGR on Apache Spark, for distributed processing on classic x86 clusters.

Both tiers register the same function names, so moving between them is a one-line import swap, with exact cross-tier result parity.

New capabilities since 0.3.0

RasterX

Vector↔raster bridge (rasterize / polygonize); raster→quadbin aggregators (parallel to the H3 family)
Web-mercator XYZ tile output (to_webmercator, tilexyz, xyzpyramid)
Terrain analysis (slope, aspect, hillshade, TRI, TPI, roughness, color-relief)
Spectral indices (EVI, SAVI, NDWI, NBR + generic dispatcher)
Resample + IDW interpolation; pixel ops/extraction; COG, proximity, contour, viewshed; TIN DTM rasters

GridX

CARTO quadbin v0 subpackage (10 functions)
Custom user-defined grids (7 functions)
British National Grid (BNG) carried forward

VectorX

Mapbox Vector Tile encoding (st_asmvt) + tile pyramid generator (st_asmvt_pyramid)
TIN surface modeling (st_triangulate, interpolate-elevation)
Legacy-Mosaic geometry migration

PMTiles

PMTiles v3 archive .write.format("pmtiles") DataSource + gbx_pmtiles_agg aggregate

Lightweight tier (the headline of 0.4.0)

The pure-Python tier reaches full parity with the heavyweight one across all three packages:

pyrx (RasterX) — every rst_* function; native raster_gbx / gtiff_gbx DataSource readers + writer; lightweight vector readers/writers; PMTiles light writer.
pyvx (VectorX) — MVT encoding + pyramid, TIN surface modeling, legacy-geometry migration.
pygx (GridX) — quadbin, BNG, and custom grids, with exact cell-set parity.

Heavyweight remains the choice for the OGR vector readers, the Steiner-point conforming triangulation mode, and the heavy PMTiles DataSource writer.

Benchmarks

Light-vs-heavy benchmarks across the function surface (cluster run, 1,000 tiles, 1 warmup / 5 measured iterations) are published in docs/docs/api/benchmarking.mdx. The GridX families (BNG 23, quadbin 10, custom 7) run at roughly the same speed in both tiers and pass exact cross-tier parity gates on their output; per-op reads and the UDF-boundary cost model are documented.

Verification

JAR-gated cross-tier exact parity suites across RasterX, VectorX, and GridX.
Binding parity 154/154 (Scala override def name / Python functions.py / function-info.json), enforced on every push.
Doc-coverage and user-facing-voice gates green.

Full per-feature changelog: docs/docs/beta-release-notes.mdx § What's new in v0.4.0.

This pull request and its description were written by Isaac.

Ports the custom cell->geometry decoders so the lightweight tier can turn a cell id back into its polygon/centroid. Emits plain WKB with no SRID (matching heavy JTS.toWKB, the 2D no-SRID variant) so geometry parity holds across tiers. Co-authored-by: Isaac
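To make the "plain WKB with no SRID" distinction concrete, here is a minimal sketch of the two byte layouts for a 2D point. It is not GeoBrix code: the helper names are hypothetical, and it assumes little-endian encoding and the PostGIS-style EWKB convention of flagging an embedded SRID with the 0x20000000 type bit.

```python
import struct

WKB_POINT = 1                 # OGC geometry type code for Point
EWKB_SRID_FLAG = 0x20000000   # PostGIS-style EWKB bit: "an SRID follows the type"


def point_wkb(x: float, y: float) -> bytes:
    """Plain 2D WKB point: byte order, geometry type, x, y. No SRID field."""
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def point_ewkb(x: float, y: float, srid: int) -> bytes:
    """EWKB point: same layout, plus the SRID flag and a 4-byte SRID."""
    return struct.pack("<BIIdd", 1, WKB_POINT | EWKB_SRID_FLAG, srid, x, y)


def has_srid(blob: bytes) -> bool:
    """Report whether a WKB/EWKB blob carries the embedded-SRID flag."""
    order = "<" if blob[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", blob, 1)
    return bool(geom_type & EWKB_SRID_FLAG)
```

Two tiers that emit the first layout produce byte-identical output for the same coordinates, which is what makes a geometry parity check straightforward.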

Ports CustomGridSystem.polyfill so the lightweight tier can enumerate the cells covering a geometry. Keeps heavy's intentional +1 bbox over-scan then filters by cell-centroid containment, so the light cell set is exact-equal to heavy. Co-authored-by: Isaac
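The over-scan-then-filter idea can be sketched in plain Python. This is an illustration under stated assumptions, not the CustomGridSystem code: it assumes a square grid anchored at the origin, cells addressed as (col, row) pairs, a single polygon ring, and a hypothetical even-odd point-in-polygon helper in place of a geometry library.

```python
import math


def point_in_polygon(px, py, ring):
    """Even-odd ray casting against a ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def polyfill(ring, cell_size, cols, rows):
    """Cells whose centroid lies inside the ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    # Over-scan the bounding box by one cell on each side, clamped to the grid.
    c0 = max(math.floor(min(xs) / cell_size) - 1, 0)
    c1 = min(math.floor(max(xs) / cell_size) + 1, cols - 1)
    r0 = max(math.floor(min(ys) / cell_size) - 1, 0)
    r1 = min(math.floor(max(ys) / cell_size) + 1, rows - 1)
    cells = set()
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            cx = (c + 0.5) * cell_size
            cy = (r + 0.5) * cell_size
            # The over-scanned candidates are filtered by centroid containment.
            if point_in_polygon(cx, cy, ring):
                cells.add((c, r))
    return cells
```

The over-scan only widens the candidate window; the centroid test decides membership, so two implementations that share both steps should return the same cell set.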

Ports CustomGridSystem.kRing so the lightweight tier can expand a cell to its Chebyshev (square) neighborhood -- the cell-set primitive callers need for grid-local windowing without dropping to heavy. Clamped to grid bounds (verbatim heavy), so edge cells return fewer neighbors rather than off-grid ids. Co-authored-by: Isaac
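A Chebyshev neighborhood clamped to the grid can be sketched as follows. The function name and the (col, row) addressing are assumptions made for illustration; the real port works on cell ids.

```python
def k_ring(col, row, k, cols, rows):
    """Cells within Chebyshev distance k of (col, row), clamped to the grid."""
    c0, c1 = max(col - k, 0), min(col + k, cols - 1)
    r0, r1 = max(row - k, 0), min(row + k, rows - 1)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]
```

An interior cell with k=1 yields the full 3x3 window, while a corner cell yields only the 2x2 block that exists on the grid, which matches the "fewer neighbors rather than off-grid ids" behavior described above.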

Wire the custom-gridding SQL surface over the completed _custom.py core:
- _serde.CUSTOM_GRID_SCHEMA: 8-field grid STRUCT matching heavy Custom_GridSpec.gridStructType (4 LONG bounds + cell_splits/root_x/root_y/srid INT, all non-null).
- _env.assert_custom_available (shapely-only guard).
- gbx_custom_grid: validating @udf (eager xMax>xMin/yMax>yMin/splits>=2/rootX,Y>0 like Custom_Grid.eval); 7-arg form defaults srid=-1.
- pointascell/cellaswkb/cellaswkt/centroid: pandas_udf; pointascell takes the geometry's FIRST coordinate (heavy getCoordinate, not centroid).
- polyfill/kring: plain @udf (array-output, scale-safe).
- 7 custom_* Column wrappers (custom_grid always supplies all 8 args).

Geom inputs via parse_geom ([E]WKB/[E]WKT); cell geometry plain WKB no SRID. Serverless-safe (udf.register + Column exprs only). 8 register-fn tests green. Co-authored-by: Isaac
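The eager validation rules listed for gbx_custom_grid can be illustrated with a small dataclass. This is a sketch only: the class name, field order, and error messages are hypothetical, with the field order assumed from the 8-field description above, and it is not the UDF itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GridSpec:
    """Hypothetical stand-in for the 8-field grid struct."""
    x_min: int
    x_max: int
    y_min: int
    y_max: int
    cell_splits: int
    root_x: int
    root_y: int
    srid: int = -1  # the 7-argument form defaults the SRID to -1

    def __post_init__(self):
        # Validate eagerly, at construction time.
        if self.x_max <= self.x_min:
            raise ValueError("x_max must be greater than x_min")
        if self.y_max <= self.y_min:
            raise ValueError("y_max must be greater than y_min")
        if self.cell_splits < 2:
            raise ValueError("cell_splits must be at least 2")
        if self.root_x <= 0 or self.root_y <= 0:
            raise ValueError("root_x and root_y must be positive")


def validation_error(*args):
    """Return the validation message for the given fields, or None if valid."""
    try:
        GridSpec(*args)
    except ValueError as exc:
        return str(exc)
    return None
```

Validating at construction means a malformed grid fails once, up front, rather than inside every per-row call that consumes it.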

The require guard checked !x.isNaN twice, so a NaN northing was never caught by the NaN guard; it was only incidentally rejected by the later Y-bounds require (NaN comparisons are false), surfacing a misleading out-of-bounds message instead. Mirrors the light pygx guard that surfaced it. Co-authored-by: Isaac
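The failure mode is easy to reproduce outside Scala. The sketch below uses hypothetical function names and messages; it only demonstrates why a duplicated NaN check on x lets a NaN y fall through to the bounds check, where every comparison against NaN is false.

```python
import math


def check_buggy(x, y, y_min, y_max):
    # Bug pattern: x is tested twice, so a NaN y is never caught here.
    if math.isnan(x) or math.isnan(x):
        raise ValueError("coordinate is NaN")
    # NaN comparisons are always false, so a NaN y lands in this branch.
    if not (y_min <= y <= y_max):
        raise ValueError("y out of bounds")


def check_fixed(x, y, y_min, y_max):
    if math.isnan(x) or math.isnan(y):
        raise ValueError("coordinate is NaN")
    if not (y_min <= y <= y_max):
        raise ValueError("y out of bounds")


def message(fn, *args):
    """Return the error message raised by fn(*args), or None if it passes."""
    try:
        fn(*args)
    except ValueError as exc:
        return str(exc)
    return None
```

Both versions reject the point, but only the fixed one reports the actual cause.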

Cross-tier light(pygx) vs heavy(gridx.custom) parity for all 7 gbx_custom_* functions, gating on the staged product JAR (auto-skips without it). Validates the verbatim CustomGridSystem/GridConf port against heavy Scala:
- pointascell EXACT BIGINT across interior/origin/max-corner cells, res 0 and deeper res, cell_splits 2 and 4, and srid 27700 vs -1 (srid is metadata only -> identical ids).
- polyfill and kring EXACT cell-set equality. kring includes the max-corner (999,999) seed so the upper clamp min(pos+k, totalCells) is exercised -- the unit tests only reached the lower/origin corner; light == heavy holds.
- cellaswkb/cellaswkt/centroid geometry within 1e-6 with SRID == 0 in BOTH tiers (custom carries no SRID; JTS.toWKB is the 2D no-SRID variant).
- all-4-encodings (WKB/EWKB/WKT/EWKT) for pointascell and polyfill: identical within each tier and light == heavy.
- Y-NaN lock-in (CG-T8): light point_to_cell_id and heavy gbx_custom_pointascell both reject a NaN-Y point.

No _custom.py/functions.py port fixes were required to reach green. Co-authored-by: Isaac
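The two kinds of parity gate described here, exact equality for ids and cell sets and a tolerance for decoded geometry, might look like the following. The helper names are hypothetical and the inputs are assumed to be already-collected Python values, not Spark DataFrames.

```python
def cell_sets_equal(light, heavy):
    """Exact gate: both tiers must return the same set of cell ids."""
    return set(light) == set(heavy)


def coords_within(light, heavy, tol=1e-6):
    """Tolerance gate: paired coordinates must differ by less than tol."""
    if len(light) != len(heavy):
        return False
    return all(
        abs(lx - hx) < tol and abs(ly - hy) < tol
        for (lx, ly), (hx, hy) in zip(light, heavy)
    )
```

Integer outputs get the exact gate because any difference is a real divergence, while decoded coordinates get a tolerance because floating-point results can differ in the last bits between runtimes.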

Mirror the BNG/quadbin grid bench legs for the 7 gbx_custom_* functions so pygx (light) is benchmarked against gridx.custom (heavy) with exact-output parity. Adds a fixed custom-grid corpus (the doc example grid 0,1000000,0,1000000,2,1000,1000,27700) with point/polygon/cell generators, four representative run_custom_* legs (pointascell, polyfill, kring, cellaswkb), a _CELL_GRID_CUSTOM dispatch + parity cell, and the --benchmark-grid-custom / --grid-custom-only launcher flags. pointascell uses the geometry's first coordinate (heavy geom.getCoordinate), not the centroid (unlike BNG). cellaswkt/centroid share cellaswkb's cell->geometry UDF boundary, and gbx_custom_grid is the validating STRUCT constructor exercised on every leg as the inline grid build, so none of these three is benchmarked as a standalone leg. Co-authored-by: Isaac

The pygx custom-grid port shipped with exact cross-tier parity, so GridX is now fully 1:1 light<->heavy. Flip the 7 gbx_custom_* tier tags heavy->both, correct the coverage claims on intro/quick-start/execution-tiers/README, and mark the design spec IMPLEMENTED (supersedes the custom-is-heavy-only out-of-scope note). Benchmarking numbers deferred to the staged cluster run. Co-authored-by: Isaac

Release notes under-told the headline 0.4.0 story (lightweight tier was described as raster-only). Cover pyrx/pygx/pyvx and the now-both-tier GridX + VectorX; fix the quadbin count (9->10, add cellunion_agg); note gbx_pmtiles_agg is both-tier; frame custom grids as both-tier; add the BNG string-resolution consistency fix, the light-BINARY-vs-heavy-STRUCT agg deviation, and the custom pointToCellID NaN-Y fix. Landing page: drop the stale 'custom grids are heavyweight-only' claims (GridX feature + tier callout) now that custom-grid light parity shipped, and retitle the readers card to 'Powerful Readers & Writers'. Co-authored-by: Isaac

The custom-grid cluster bench (CG-T11) shipped only 4 of the 7 gbx_custom_* functions and its parity gate covered just those 4. Add the 3 missing legs -- custom_cellaswkt (cell -> WKT string), custom_centroid (cell -> WKB point), and custom_grid (8 scalar args -> validated STRUCT) -- so the custom set reaches the same completeness as the 23-leg BNG set, and extend the _CELL_GRID_CUSTOM hard parity gate to all 7 (cellaswkt/centroid: decoded geometry < 1e-6; grid: exact struct field-tuple equality). _CUSTOM_N_LEGS 4 -> 7. Parity on a cluster run is captured by the dedicated verdict block inside the grid cell (same mechanism as BNG/quadbin), NOT by comparison.csv fingerprints -- no grid leg (BNG, quadbin, or custom) emits an output_fingerprint, so all grid rows show comparison.csv consistency=na by design; the real per-fn verdict is the asserted PASS/FAIL gate. Local light-vs-heavy smoke (JAR present): all 14 leg runs ok, all 7 parity gates exact. Co-authored-by: Isaac

Adds the custom equal-area grid (light pygx vs heavy gridx.custom) cluster benchmark to the GridX results tab, mirroring the existing quadbin/BNG subsections. Records exact cross-tier parity for all 7 gbx_custom_* functions and the spark-path timings (1000 tiles), with the UDF-boundary framing so the sub-second ~0.5-0.8x gap on cheap per-row ops reads as the Python/Arrow serialization fixed cost, not an algorithmic deficiency. Completes GridX (quadbin + BNG + custom) bench coverage on this page. Co-authored-by: Isaac

The custom-grid table was built from a single-measured-iteration run whose sub-millisecond timings were dominated by run-to-run noise (reported light as ~0.5-0.8x / 2-3x slower). Re-running the grid legs at the BNG-consistent 1 warmup + 5 measured iterations produces stable medians: custom now reads roughly at parity (0.87x-1.34x). The same 5-measured run finally makes the quadbin per-function speedups stable enough to publish, so this adds the 10-row quadbin table that previously did not exist (prose had argued the 1-iter numbers were too noisy to print) and aligns all three grid tables on the same methodology note. Co-authored-by: Isaac
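The 1 warmup + 5 measured methodology can be sketched as a small timing harness. This is illustrative only: the function names are hypothetical, and the ratio convention (heavy time divided by light time, so values below 1 mean light is slower) is an assumption about how the published speedups are read.

```python
import statistics
import time


def bench(fn, warmup=1, measured=5):
    """Run fn `warmup` times untimed, then return the median of `measured` runs."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(measured):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    # The median damps a single noisy iteration far better than one sample can.
    return statistics.median(samples)


def speedup(heavy_seconds, light_seconds):
    """Assumed convention: > 1 means light is faster, < 1 means it is slower."""
    return heavy_seconds / light_seconds
```

With a single measured iteration the reported number is whatever that one run happened to be, which is why very short per-row operations need several samples before a ratio is worth publishing.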

The two Overview pages floated at the top of 'Readers & Writers', above the Readers and Writers category parents. Move each to be the first item under its respective parent so the overview reads as the section intro. Co-authored-by: Isaac

On the API overview, replace the redundant heading text + oversized icon pairs with the package wordmark logo as the heading itself (RasterX/GridX/VectorX). On each function-reference page, put the wordmark logo left of the title at 3em, drop the now-redundant package word, and remove the duplicate full-size body logo. Co-authored-by: Isaac

DBR 17.3 LTS and 18 LTS both ship Scala 2.13.16; align the build to the runtime. The scoverage scalacPluginVersion pin stays at 2.3.0: Maven Central publishes only 2.3.0 for scalac-scoverage-plugin_2.13.16 (the 2.1.5 plugin default 2.5.2 is 404 for this scala minor), same as 2.13.12. Refreshed the pin comment to reference 2.13.16. skipScoverage and standard (scoverage:test, 20247 statements instrumented) builds both green under 2.13.16. Co-authored-by: Isaac

Add a prominent DBR LTS compatibility table (Ubuntu/Spark/Python/Scala/Java + GeoBrix support) to README, installation, and intro; note DBR 19 (Ubuntu 26.04) will require rebuilding the heavyweight native GDAL libs. Normalize stale 'DBR 17.3-only' support/requirement references to 17.3 + 18 (single wheel + Java-17-bytecode JAR run on both); keep factual bench-environment records, the product ST/H3 17.1+ floor, and historical notes. Co-authored-by: Isaac

Michael Johns and others added 29 commits June 15, 2026 09:24

docs(landing): simplify hero CTA to 'Get Started' -> intro

a7bbe83

The hero button read 'Get Started — Lightweight ->' and pointed at the tier-scoped quick-start; simplify to a single 'Get Started' that lands on the tier-neutral intro page. Co-authored-by: Isaac

docs(readme): drop redundant function-count line (badges convey it)

7a67b98

Co-authored-by: Isaac

style(pygx): black-format test_bng_tessellate (CI lint gate)

5715931

The committed file failed the Docker black check (CI gate); reformat in-container to match CI. Co-authored-by: Isaac

docs(landing): trim tier-detail aside from GridX/VectorX cards

6e295fd

The em-dash aside ('pure Python, Serverless/ARM-ready -- and the heavyweight Scala tier') repeated the tier story already told in the callout below; drop it from both the GridX and VectorX cards. Co-authored-by: Isaac

docs(landing): heavyweight tier framed as 'full GDAL/OGR'

164afdb

Frame the heavyweight rationale around full GDAL/OGR rather than just the readers -- OGR writers and format-specific GDAL options are also heavyweight-only, so 'full GDAL/OGR' is the accurate draw. Co-authored-by: Isaac

docs(landing): trim repetitive tier boilerplate from feature cards

cc1e927

The RasterX/GridX/VectorX cards each repeated the same lightweight/heavyweight tier sentence; the dedicated tier callout below already covers that. Pare the cards down to capability highlights. Co-authored-by: Isaac

style(pygx): drop unused imports in custom test files (flake8 F401)

10c828a

Remove an unused 'import math' (test_parity_custom) and an unused 'to_wkb as _towkb' (test_custom_core) left over from CG-T9; both tripped the flake8 F401 gate. Co-authored-by: Isaac

docs(sidebar): Writers Overview first under Writers (match Readers)

ccbdab4

Writers Overview was left at the bottom of the Writers category; move it to the first item so both Readers and Writers lead with their overview. Co-authored-by: Isaac

Merge pull request #39 from databrickslabs/pygx-light

33b7482

Lightweight GridX (pygx): BNG, quadbin & custom grids at 1:1 parity + benchmarks & docs

docs(claude): correct Scala version note to 2.13.16

d43743c

DBR 17.3 LTS (and 18 LTS) ship Scala 2.13.16; the note said 2.13.12. Co-authored-by: Isaac

docs(intro): move runtime table below the function-count boxes

6e1cc79

Per review, the Supported Databricks Runtimes table reads better after the function-count stat cards than before them; move it below the boxes (after the GridX breakdown line), before Background. Co-authored-by: Isaac

docs(images): regenerate pills for DBR 17.3/18 + Scala 2.13.16

40f6023

Re-render the RasterX info-graphic SVGs and rasterize to PNG so the version pill reads 'DBR 17.3 / 18 LTS / Scala 2.13.16' instead of the stale '17.3 LTS / Scala 2.13'. Co-authored-by: Isaac

docs(readme): move runtime table below the Tiers section

622fff5

Mirror the intro reorder: the Supported Databricks Runtimes table reads better after the Tiers overview than at the very top, so move it down to just before Quick start. Co-authored-by: Isaac

mjohns-databricks mentioned this pull request Jun 19, 2026

beta(0.4.0): lightweight-tier hardening — Serverless, geometry consistency, reader defaults #40

Merged

mjohns-databricks merged 822 commits into main from beta/0.4.0
