beta(0.4.0): GeoBrix 0.4.0, 154 functions across a full lightweight tier (pyrx/pygx/pyvx) + heavyweight #33
Merged
Ports the custom cell->geometry decoders so the lightweight tier can turn a cell id back into its polygon/centroid. Emits plain WKB with no SRID (matching heavy JTS.toWKB, the 2D no-SRID variant) so geometry parity holds across tiers. Co-authored-by: Isaac
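Here is a minimal stdlib sketch of what "plain WKB with no SRID" means for a cell polygon. It assumes a hypothetical square-cell grid (origin x_min/y_min, uniform cell_size) and a hypothetical cell_to_wkb helper, so it is not the pygx decoder. The type word is the bare OGC code 3 (Polygon) with none of the EWKB flag bits set, which means no SRID follows the header.

```python
import struct

WKB_POLYGON = 3  # bare OGC type code; EWKB would OR in 0x20000000 and append an SRID


def cell_to_wkb(col: int, row: int, cell_size: float, x_min: float, y_min: float) -> bytes:
    """Encode a grid cell as a 2D little-endian WKB Polygon (hypothetical helper)."""
    x0 = x_min + col * cell_size
    y0 = y_min + row * cell_size
    x1, y1 = x0 + cell_size, y0 + cell_size
    # Closed exterior ring: the first vertex is repeated at the end.
    ring = [(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)]
    out = struct.pack("<BII", 1, WKB_POLYGON, 1)  # byte order, type, ring count
    out += struct.pack("<I", len(ring))           # point count of the single ring
    for x, y in ring:
        out += struct.pack("<dd", x, y)           # 2D only: no Z/M ordinates
    return out


wkb = cell_to_wkb(2, 3, 1000.0, 0.0, 0.0)
print(len(wkb))  # 1 + 4 + 4 + 4 + 5 * 16 = 93 bytes
```

Since no SRID is embedded, any reader decoding these bytes reports SRID 0, on either tier.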
Ports CustomGridSystem.polyfill so the lightweight tier can enumerate the cells covering a geometry. Keeps heavy's intentional +1 bbox over-scan then filters by cell-centroid containment, so the light cell set is exact-equal to heavy. Co-authored-by: Isaac
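The following sketch shows the over-scan-then-filter pattern. It uses a hypothetical square-cell grid addressed by (col, row), and it substitutes a plain even-odd ray-casting test for shapely, so it illustrates the technique rather than reproducing the port.

```python
import math


def point_in_ring(x: float, y: float, ring: list[tuple[float, float]]) -> bool:
    """Even-odd ray-casting test against a simple polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polyfill(ring, cell_size, x_min, y_min, n_cols, n_rows):
    """Cells whose centroid lies inside `ring` (hypothetical square-cell grid)."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    # Over-scan the bbox by one cell on every side, clamped to the grid.
    c0 = max(math.floor((min(xs) - x_min) / cell_size) - 1, 0)
    c1 = min(math.floor((max(xs) - x_min) / cell_size) + 1, n_cols - 1)
    r0 = max(math.floor((min(ys) - y_min) / cell_size) - 1, 0)
    r1 = min(math.floor((max(ys) - y_min) / cell_size) + 1, n_rows - 1)
    cells = []
    for c in range(c0, c1 + 1):
        for r in range(r0, r1 + 1):
            cx = x_min + (c + 0.5) * cell_size
            cy = y_min + (r + 0.5) * cell_size
            if point_in_ring(cx, cy, ring):  # centroid containment decides membership
                cells.append((c, r))
    return cells


square = [(0.0, 0.0), (3.0, 0.0), (3.0, 3.0), (0.0, 3.0)]
print(len(polyfill(square, 1.0, 0.0, 0.0, 10, 10)))  # 9
```

The over-scan only enlarges the candidate set. Membership is decided entirely by the centroid test, so the extra ring of candidates costs a few containment checks and never adds cells by itself.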
Ports CustomGridSystem.kRing so the lightweight tier can expand a cell to its Chebyshev (square) neighborhood -- the cell-set primitive callers need for grid-local windowing without dropping to heavy. Clamped to grid bounds (verbatim heavy), so edge cells return fewer neighbors rather than off-grid ids. Co-authored-by: Isaac
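Below is a minimal sketch of a clamped Chebyshev k-ring over a hypothetical (col, row) addressing scheme. Each axis is clamped to the inclusive index range [0, n - 1]. That is an assumption of the sketch, and the ported code may express its upper bound differently.

```python
def kring(col: int, row: int, k: int, n_cols: int, n_rows: int) -> list[tuple[int, int]]:
    """Chebyshev (square) k-neighborhood of a cell, clamped to the grid bounds."""
    c_lo, c_hi = max(col - k, 0), min(col + k, n_cols - 1)
    r_lo, r_hi = max(row - k, 0), min(row + k, n_rows - 1)
    return [(c, r) for c in range(c_lo, c_hi + 1) for r in range(r_lo, r_hi + 1)]


print(len(kring(500, 500, 1, 1000, 1000)))  # interior: (2k+1)^2 = 9
print(len(kring(999, 999, 1, 1000, 1000)))  # max corner: 2 x 2 = 4
```

Edge and corner seeds therefore return fewer cells instead of off-grid ids, which is the behavior the commit describes.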
Wire the custom-gridding SQL surface over the completed _custom.py core:
- _serde.CUSTOM_GRID_SCHEMA: 8-field grid STRUCT matching heavy Custom_GridSpec.gridStructType (4 LONG bounds + cell_splits/root_x/root_y/srid INT, all non-null).
- _env.assert_custom_available (shapely-only guard).
- gbx_custom_grid: validating @udf (eager xMax>xMin/yMax>yMin/splits>=2/rootX,Y>0 like Custom_Grid.eval); 7-arg form defaults srid=-1.
- pointascell/cellaswkb/cellaswkt/centroid: pandas_udf; pointascell takes the geometry's FIRST coordinate (heavy getCoordinate, not centroid).
- polyfill/kring: plain @udf (array-output, scale-safe).
- 7 custom_* Column wrappers (custom_grid always supplies all 8 args).
Geom inputs via parse_geom ([E]WKB/[E]WKT); cell geometry plain WKB no SRID. Serverless-safe (udf.register + Column exprs only). 8 register-fn tests green.
Co-authored-by: Isaac
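The validating constructor can be sketched in plain Python as below. The four bound field names (x_min, x_max, y_min, y_max), the argument order (taken from the bench corpus "0,1000000,0,1000000,2,1000,1000,27700"), the make_custom_grid name, and the error messages are all hypothetical. The checks and the srid=-1 default follow the commit message.

```python
from typing import NamedTuple


class CustomGrid(NamedTuple):
    """Hypothetical Python mirror of the 8-field grid STRUCT."""
    x_min: int
    x_max: int
    y_min: int
    y_max: int
    cell_splits: int
    root_x: int
    root_y: int
    srid: int


def make_custom_grid(x_min, x_max, y_min, y_max, cell_splits, root_x, root_y, srid=-1):
    """Validate eagerly, then build the grid; the 7-arg call defaults srid to -1."""
    if not x_max > x_min:
        raise ValueError("xMax must be greater than xMin")
    if not y_max > y_min:
        raise ValueError("yMax must be greater than yMin")
    if cell_splits < 2:
        raise ValueError("cell_splits must be >= 2")
    if root_x <= 0 or root_y <= 0:
        raise ValueError("root_x and root_y must be > 0")
    return CustomGrid(x_min, x_max, y_min, y_max, cell_splits, root_x, root_y, srid)


grid = make_custom_grid(0, 1000000, 0, 1000000, 2, 1000, 1000)
print(grid.srid)  # -1
```

Validating when the grid is built means a malformed grid fails once at the call site, before it can be passed to any per-row function.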
The require guard checked !x.isNaN twice, so a NaN northing was never caught by the NaN guard; it was only incidentally rejected by the later Y-bounds require (NaN comparisons are false), surfacing a misleading out-of-bounds message instead. Mirrors the light pygx guard that surfaced it. Co-authored-by: Isaac
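The failure mode is easy to reproduce. This sketch uses Python in place of the original Scala require calls, with hypothetical function names and messages. Every comparison involving NaN is False, so a NaN northing that slips past the NaN check is caught by the bounds check and reported with the wrong message.

```python
import math


def guard_buggy(x: float, y: float, y_min: float, y_max: float) -> None:
    # Bug reproduced for illustration: x is checked twice, y never.
    if math.isnan(x) or math.isnan(x):
        raise ValueError("NaN coordinate")
    if not (y_min <= y <= y_max):  # NaN makes the comparison False
        raise ValueError("y out of bounds")


def guard_fixed(x: float, y: float, y_min: float, y_max: float) -> None:
    if math.isnan(x) or math.isnan(y):
        raise ValueError("NaN coordinate")
    if not (y_min <= y <= y_max):
        raise ValueError("y out of bounds")


def error_of(guard, *args) -> str:
    """Return the guard's error message, or 'ok' if it accepts the point."""
    try:
        guard(*args)
    except ValueError as exc:
        return str(exc)
    return "ok"


print(error_of(guard_buggy, 10.0, float("nan"), 0.0, 1e6))  # y out of bounds
print(error_of(guard_fixed, 10.0, float("nan"), 0.0, 1e6))  # NaN coordinate
```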
Cross-tier light (pygx) vs heavy (gridx.custom) parity for all 7 gbx_custom_* functions, gating on the staged product JAR (auto-skips without it). Validates the verbatim CustomGridSystem/GridConf port against heavy Scala:
- pointascell: EXACT BIGINT across interior/origin/max-corner cells, res 0 and deeper res, cell_splits 2 and 4, and srid 27700 vs -1 (srid is metadata only -> identical ids).
- polyfill and kring: EXACT cell-set equality. kring includes the max-corner (999,999) seed so the upper clamp min(pos+k, totalCells) is exercised -- the unit tests only reached the lower/origin corner; light == heavy holds.
- cellaswkb/cellaswkt/centroid: geometry within 1e-6 with SRID == 0 in BOTH tiers (custom carries no SRID; JTS.toWKB is the 2D no-SRID variant).
- all 4 encodings (WKB/EWKB/WKT/EWKT) for pointascell and polyfill: identical within each tier and light == heavy.
- Y-NaN lock-in (CG-T8): light point_to_cell_id and heavy gbx_custom_pointascell both reject a NaN-Y point.
No _custom.py/functions.py port fixes were required to reach green.
Co-authored-by: Isaac
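For the all-4-encodings check, it helps to see that WKB and EWKB differ only in their header. The sketch below follows the generic PostGIS EWKB layout and is not the parse_geom implementation. It reads the type word, detects the SRID flag bit, and recovers the base geometry type. The coordinates that follow are the same bytes in both encodings.

```python
import struct
from typing import Optional, Tuple

SRID_FLAG = 0x20000000  # EWKB extension bit: a 4-byte SRID follows the type word


def read_header(buf: bytes) -> Tuple[int, Optional[int]]:
    """Return (base geometry type, SRID or None) from a WKB/EWKB header."""
    fmt = "<" if buf[0] == 1 else ">"  # byte-order marker: 1 = little-endian
    (gtype,) = struct.unpack(fmt + "I", buf[1:5])
    srid = None
    if gtype & SRID_FLAG:
        (srid,) = struct.unpack(fmt + "i", buf[5:9])
    return gtype & 0x0FFFFFFF, srid  # mask off the EWKB Z/M/SRID flag bits


wkb = struct.pack("<BIdd", 1, 1, 500.0, 500.0)                     # POINT, no SRID
ewkb = struct.pack("<BIidd", 1, 1 | SRID_FLAG, 27700, 500.0, 500.0)  # SRID=27700;POINT
print(read_header(wkb), read_header(ewkb))  # (1, None) (1, 27700)
```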
The hero button read 'Get Started — Lightweight ->' and pointed at the tier-scoped quick-start; simplify to a single 'Get Started' that lands on the tier-neutral intro page. Co-authored-by: Isaac
Mirror the BNG/quadbin grid bench legs for the 7 gbx_custom_* functions so pygx (light) is benchmarked against gridx.custom (heavy) with exact-output parity. Adds a fixed custom-grid corpus (the doc example grid 0,1000000,0,1000000,2,1000,1000,27700) with point/polygon/cell generators, four representative run_custom_* legs (pointascell, polyfill, kring, cellaswkb), a _CELL_GRID_CUSTOM dispatch + parity cell, and the --benchmark-grid-custom / --grid-custom-only launcher flags. pointascell uses the geometry's first coordinate (heavy geom.getCoordinate), not the centroid (unlike BNG). cellaswkt/centroid share cellaswkb's cell->geometry UDF boundary, and gbx_custom_grid is the validating STRUCT constructor exercised on every leg as the inline grid build, so neither is benchmarked as a standalone leg. Co-authored-by: Isaac
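The following sketch shows why the first-coordinate rule matters for pointascell. It uses the corpus grid bounds and assumes that root_x/root_y (1000) are root cell sizes in CRS units, with each resolution dividing a cell cell_splits ways per axis. point_to_cell is a hypothetical helper. For a straight segment, the first vertex and the centroid (the midpoint) can fall in different cells.

```python
import math

# Corpus grid from the bench: bounds 0..1,000,000 on both axes.
# ASSUMPTION: root_x/root_y (1000) are root cell sizes in CRS units.
X_MIN, Y_MIN, ROOT = 0.0, 0.0, 1000.0


def point_to_cell(x: float, y: float, res: int, splits: int = 2) -> tuple[int, int]:
    """Hypothetical (col, row) of the cell containing (x, y) at resolution `res`."""
    size = ROOT / splits ** res  # each resolution splits a cell splits x splits
    return math.floor((x - X_MIN) / size), math.floor((y - Y_MIN) / size)


line = [(1500.0, 2500.0), (5500.0, 2500.0)]
first = line[0]  # what pointascell uses (heavy geom.getCoordinate)
mid = ((line[0][0] + line[1][0]) / 2, (line[0][1] + line[1][1]) / 2)  # segment centroid
print(point_to_cell(*first, res=0))  # (1, 2)
print(point_to_cell(*mid, res=0))    # (3, 2)
```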
Co-authored-by: Isaac
The committed file failed the Docker black check (CI gate); reformat in-container to match CI. Co-authored-by: Isaac
The pygx custom-grid port shipped with exact cross-tier parity, so GridX is now fully 1:1 light<->heavy. Flip the 7 gbx_custom_* tier tags heavy->both, correct the coverage claims on intro/quick-start/execution-tiers/README, and mark the design spec IMPLEMENTED (supersedes the custom-is-heavy-only out-of-scope note). Benchmarking numbers deferred to the staged cluster run. Co-authored-by: Isaac
Release notes under-told the headline 0.4.0 story (lightweight tier was described as raster-only). Cover pyrx/pygx/pyvx and the now-both-tier GridX + VectorX; fix the quadbin count (9->10, add cellunion_agg); note gbx_pmtiles_agg is both-tier; frame custom grids as both-tier; add the BNG string-resolution consistency fix, the light-BINARY-vs-heavy-STRUCT agg deviation, and the custom pointToCellID NaN-Y fix. Landing page: drop the stale 'custom grids are heavyweight-only' claims (GridX feature + tier callout) now that custom-grid light parity shipped, and retitle the readers card to 'Powerful Readers & Writers'. Co-authored-by: Isaac
The em-dash aside ('pure Python, Serverless/ARM-ready -- and the
heavyweight Scala tier') repeated the tier story already told in the
callout below; drop it from both the GridX and VectorX cards.
Co-authored-by: Isaac
Frame the heavyweight rationale around full GDAL/OGR rather than just the readers -- OGR writers and format-specific GDAL options are also heavyweight-only, so 'full GDAL/OGR' is the accurate draw. Co-authored-by: Isaac
The RasterX/GridX/VectorX cards each repeated the same lightweight/heavyweight tier sentence; the dedicated tier callout below already covers that. Pare the cards down to capability highlights. Co-authored-by: Isaac
The custom-grid cluster bench (CG-T11) shipped only 4 of the 7 gbx_custom_* functions and its parity gate covered just those 4. Add the 3 missing legs -- custom_cellaswkt (cell -> WKT string), custom_centroid (cell -> WKB point), and custom_grid (8 scalar args -> validated STRUCT) -- so the custom set reaches the same completeness as the 23-leg BNG set, and extend the _CELL_GRID_CUSTOM hard parity gate to all 7 (cellaswkt/centroid: decoded geometry < 1e-6; grid: exact struct field-tuple equality). _CUSTOM_N_LEGS 4 -> 7. Parity on a cluster run is captured by the dedicated verdict block inside the grid cell (same mechanism as BNG/quadbin), NOT by comparison.csv fingerprints -- no grid leg (BNG, quadbin, or custom) emits an output_fingerprint, so all grid rows show comparison.csv consistency=na by design; the real per-fn verdict is the asserted PASS/FAIL gate. Local light-vs-heavy smoke (JAR present): all 14 leg runs ok, all 7 parity gates exact. Co-authored-by: Isaac
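The three gate kinds (exact cell-set equality, decoded geometry within 1e-6, exact struct field-tuple equality) can be sketched as small predicates that feed a PASS/FAIL verdict. The function names and the GRID_FIELDS ordering are hypothetical. Only the comparison rules come from the commit messages.

```python
def gate_cells(light, heavy) -> bool:
    """Exact cell-set equality (order-insensitive)."""
    return set(light) == set(heavy)


def gate_geometry(light_xy, heavy_xy, tol: float = 1e-6) -> bool:
    """Same vertex count, and every decoded ordinate differs by less than tol."""
    return len(light_xy) == len(heavy_xy) and all(
        abs(a - b) < tol
        for p, q in zip(light_xy, heavy_xy)
        for a, b in zip(p, q)
    )


# Hypothetical field order for the 8-field grid STRUCT.
GRID_FIELDS = ("x_min", "x_max", "y_min", "y_max", "cell_splits", "root_x", "root_y", "srid")


def gate_struct(light: dict, heavy: dict) -> bool:
    """Exact equality of the grid STRUCT compared as an ordered field tuple."""
    return tuple(light[f] for f in GRID_FIELDS) == tuple(heavy[f] for f in GRID_FIELDS)


def verdict(*gates: bool) -> str:
    return "PASS" if all(gates) else "FAIL"


print(verdict(gate_cells([(1, 2), (3, 4)], [(3, 4), (1, 2)]),
              gate_geometry([(0.0, 0.0)], [(0.0, 5e-7)])))  # PASS
```

Because this verdict is asserted per function, it does not depend on comparison.csv fingerprints. That explains why the grid rows can legitimately show consistency=na.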
Remove an unused 'import math' (test_parity_custom) and an unused 'to_wkb as _towkb' (test_custom_core) left over from CG-T9; both tripped the flake8 F401 gate. Co-authored-by: Isaac
Adds the custom equal-area grid (light pygx vs heavy gridx.custom) cluster benchmark to the GridX results tab, mirroring the existing quadbin/BNG subsections. Records exact cross-tier parity for all 7 gbx_custom_* functions and the spark-path timings (1000 tiles), with the UDF-boundary framing so the sub-second ~0.5-0.8x gap on cheap per-row ops reads as the Python/Arrow serialization fixed cost, not an algorithmic deficiency. Completes GridX (quadbin + BNG + custom) bench coverage on this page. Co-authored-by: Isaac
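The UDF-boundary framing can be illustrated with a deliberately simple cost model. All parameters are hypothetical and none of the numbers are measurements. In the model, heavy pays only the per-row operation cost, and light pays that cost plus a fixed boundary setup and a per-row serialization cost. The ratio is heavy_time / light_time, so 0.5x means light took twice as long.

```python
def light_vs_heavy(n_rows: int, op_cost: float, fixed: float, ser_per_row: float) -> float:
    """Speedup as heavy_time / light_time under a toy fixed-plus-per-row model.

    ASSUMPTION: heavy pays only op_cost per row; light pays the same op_cost
    plus a fixed UDF-boundary cost and a per-row serialization cost.
    """
    heavy = n_rows * op_cost
    light = fixed + n_rows * (op_cost + ser_per_row)
    return heavy / light


cheap = light_vs_heavy(1000, op_cost=1e-6, fixed=0.05, ser_per_row=1e-6)
costly = light_vs_heavy(1000, op_cost=1e-2, fixed=0.05, ser_per_row=1e-6)
print(round(cheap, 3), round(costly, 3))
```

The model isolates only the boundary overhead. With a cheap per-row op, the fixed and serialization terms dominate and the ratio drops well below 1x. As per-row work grows, the ratio approaches op_cost / (op_cost + ser_per_row), which is close to 1x.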
The custom-grid table was built from a single-measured-iteration run whose sub-millisecond timings were dominated by run-to-run noise (reported light as ~0.5-0.8x / 2-3x slower). Re-running the grid legs at the BNG-consistent 1 warmup + 5 measured iterations produces stable medians: custom now reads roughly at parity (0.87x-1.34x). The same 5-measured run finally makes the quadbin per-function speedups stable enough to publish, so this adds the 10-row quadbin table that previously did not exist (prose had argued the 1-iter numbers were too noisy to print) and aligns all three grid tables on the same methodology note. Co-authored-by: Isaac
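Here is a minimal sketch of the 1 warmup + 5 measured methodology. The bench and speedup names are hypothetical and this is not the bench harness itself. Warmup runs are discarded, each measured run is timed separately, and the median is reported, so one noisy sub-millisecond sample cannot move the result the way a single-iteration run can.

```python
import statistics
import time


def bench(fn, warmup: int = 1, measured: int = 5) -> float:
    """Median wall-clock seconds over `measured` runs after `warmup` discarded runs."""
    for _ in range(warmup):
        fn()  # result and timing discarded
    samples = []
    for _ in range(measured):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)


def speedup(heavy_s: float, light_s: float) -> float:
    """heavy / light: 0.5x means light took twice as long."""
    return heavy_s / light_s


calls = []
median_s = bench(lambda: calls.append(1))
print(len(calls))  # 6 = 1 warmup + 5 measured
```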
The two Overview pages floated at the top of 'Readers & Writers', above the Readers and Writers category parents. Move each to be the first item under its respective parent so the overview reads as the section intro. Co-authored-by: Isaac
Writers Overview was left at the bottom of the Writers category; move it to the first item so both Readers and Writers lead with their overview. Co-authored-by: Isaac
On the API overview, replace the redundant heading text + oversized icon pairs with the package wordmark logo as the heading itself (RasterX/GridX/VectorX). On each function-reference page, put the wordmark logo left of the title at 3em, drop the now-redundant package word, and remove the duplicate full-size body logo. Co-authored-by: Isaac
Lightweight GridX (pygx): BNG, quadbin & custom grids at 1:1 parity + benchmarks & docs
DBR 17.3 LTS and 18 LTS both ship Scala 2.13.16; align the build to the runtime. The scoverage scalacPluginVersion pin stays at 2.3.0: Maven Central publishes only 2.3.0 for scalac-scoverage-plugin_2.13.16 (the 2.1.5 plugin's default, 2.5.2, returns 404 for this Scala version), same as for 2.13.12. Refreshed the pin comment to reference 2.13.16. Both the skipScoverage and standard (scoverage:test, 20247 statements instrumented) builds are green under 2.13.16. Co-authored-by: Isaac
DBR 17.3 LTS (and 18 LTS) ship Scala 2.13.16; the note said 2.13.12. Co-authored-by: Isaac
Add a prominent DBR LTS compatibility table (Ubuntu/Spark/Python/Scala/Java + GeoBrix support) to README, installation, and intro; note DBR 19 (Ubuntu 26.04) will require rebuilding the heavyweight native GDAL libs. Normalize stale 'DBR 17.3-only' support/requirement references to 17.3 + 18 (single wheel + Java-17-bytecode JAR run on both); keep factual bench-environment records, the product ST/H3 17.1+ floor, and historical notes. Co-authored-by: Isaac
Per review, the Supported Databricks Runtimes table reads better after the function-count stat cards than before them; move it below the boxes (after the GridX breakdown line), before Background. Co-authored-by: Isaac
Re-render the RasterX info-graphic SVGs and rasterize to PNG so the version pill reads 'DBR 17.3 / 18 LTS / Scala 2.13.16' instead of the stale '17.3 LTS / Scala 2.13'. Co-authored-by: Isaac
Mirror the intro reorder: the Supported Databricks Runtimes table reads better after the Tiers overview than at the very top, so move it down to just before Quick start. Co-authored-by: Isaac
Summary
GeoBrix v0.4.0 — a major release. 154 functions across RasterX (107), GridX (40), VectorX (6), and PMTiles (1), now delivered through two interchangeable execution tiers:
pyrx/pygx/pyvx); no JAR, no init script, no native GDAL. Runs on Serverless, standard/shared clusters, Lakeflow declarative pipelines, and ARM.
Both tiers register the same function names, so moving between them is a one-line import swap, with exact cross-tier result parity.
New capabilities since 0.3.0
RasterX
rasterize/polygonize); raster→quadbin aggregators (parallel to the H3 family)
to_webmercator, tilexyz, xyzpyramid)
GridX
VectorX
st_asmvt) + tile pyramid generator (st_asmvt_pyramid)
st_triangulate, interpolate-elevation)
PMTiles
.write.format("pmtiles") DataSource + gbx_pmtiles_agg aggregate
Lightweight tier (the headline of 0.4.0)
The pure-Python tier reaches full parity with the heavyweight one across all three packages:
- pyrx (RasterX): every rst_* function; native raster_gbx/gtiff_gbx DataSource readers + writer; lightweight vector readers/writers; PMTiles light writer.
- pyvx (VectorX): MVT encoding + pyramid, TIN surface modeling, legacy-geometry migration.
- pygx (GridX): quadbin, BNG, and custom grids, with exact cell-set parity.
Heavyweight remains the choice for the OGR vector readers, the Steiner-point conforming triangulation mode, and the heavy PMTiles DataSource writer.
Benchmarks
Light-vs-heavy benchmarks across the function surface (cluster, 1,000 tiles, 1 warmup / 5 measured) are published in docs/docs/api/benchmarking.mdx. The GridX families (BNG 23, quadbin 10, custom 7) run at roughly performance parity, with exact cross-tier output parity gates; per-op interpretations and the UDF-boundary cost model are documented.
Verification
override def name / Python functions.py / function-info.json), enforced on every push.
Full per-feature changelog: docs/docs/beta-release-notes.mdx § What's new in v0.4.0.
This pull request and its description were written by Isaac.