[GH-2700] Add 02-vegetation-change notebook: end-to-end raster workflow #2896
Merged
jiayuasu merged 2 commits into apache:master on May 3, 2026
Pull request overview
Adds a new end-to-end raster use-case notebook to the Sedona docker example suite, demonstrating a full “raster → analytics → vector aggregation → COG output → visualization” workflow on Sedona’s 1.9 raster APIs.
Changes:
- Introduces `02-vegetation-change.ipynb`, synthesizing two small GeoTIFF scenes and computing NDVI + ΔNDVI via `RS_MapAlgebra`.
- Aggregates raster results to vector parcels using `RS_ZonalStats`, ranks parcels, then clips and exports a COG with `RS_AsCOG`.
- Visualizes the pipeline outputs in a 4-panel matplotlib figure.
Comment on lines +78 to +88
"## 3. Load both scenes with the `raster` data source\n",
"\n",
"`sedona.read.format(\"raster\")` (new in 1.9) auto-tiles GeoTIFFs and yields one `Raster`-typed row per tile, sidestepping Spark's 2 GB record-size ceiling on large GeoTIFFs. Our scenes are tiny so each yields a single tile, but the call shape is identical for multi-gigabyte inputs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "scenes = (\n sedona.read.format(\"raster\")\n .load(f\"{WORK}/scene_*.tif\")\n .selectExpr(\"split(name, '\\\\\\\\.')[0] as scene\", \"rast\")\n)\nscenes.cache()\nscenes.show(truncate=80)"
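The `selectExpr` in this cell derives the scene label from the `name` column. As a rough plain-Python mirror of that expression (the helper name `scene_label` is hypothetical and not part of the notebook), the same split can be sketched with `re.split`. Spark SQL's `split` takes a regular expression as its pattern, which is why the dot is escaped in both places.

```python
import re


def scene_label(name: str) -> str:
    """Return the part of a file name before its first dot."""
    # The dot is escaped because the pattern is a regex; an unescaped "."
    # would match every character and leave nothing useful to index.
    return re.split(r"\.", name)[0]


# File names as they would appear in the reader's `name` column.
names = ["scene_before.tif", "scene_after.tif"]
labels = [scene_label(n) for n in names]
print(labels)
```

Taking element `[0]` keeps everything before the first dot, so a name containing several dots would be cut at the first one.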
Comment on lines +135 to +142
"delta = sedona.sql(\"\"\"\n",
" SELECT RS_MapAlgebra(\n",
" (SELECT rast FROM ndvi WHERE scene = 'scene_after'),\n",
" (SELECT rast FROM ndvi WHERE scene = 'scene_before'),\n",
" 'D',\n",
" 'out[0] = rast0[0] - rast1[0];',\n",
" -9999.0\n",
" ) AS rast\n",
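The script in this hunk, `out[0] = rast0[0] - rast1[0];`, is a per-pixel "after minus before" subtraction on the first band. A minimal plain-Python sketch of that arithmetic on nested lists follows. The function name `delta_band` and the sample values are hypothetical, and the rule that a nodata pixel in either input stays nodata is this sketch's own choice, not a statement about how `RS_MapAlgebra` treats its last argument.

```python
NODATA = -9999.0  # same sentinel value the SQL passes as the final argument


def delta_band(after, before, nodata=NODATA):
    """Per-pixel after - before over two equally sized 2-D lists."""
    same_shape = len(after) == len(before) and all(
        len(a) == len(b) for a, b in zip(after, before)
    )
    if not same_shape:
        raise ValueError("rasters must have identical dimensions")
    out = []
    for row_a, row_b in zip(after, before):
        # Assumption of this sketch: nodata in either input propagates.
        out.append([
            nodata if a == nodata or b == nodata else a - b
            for a, b in zip(row_a, row_b)
        ])
    return out


# Made-up NDVI values for a 2x2 raster.
ndvi_before = [[0.10, 0.10], [0.20, NODATA]]
ndvi_after = [[0.10, 0.60], [0.70, 0.30]]
delta = delta_band(ndvi_after, ndvi_before)
print(delta)
```

Positive values mark pixels that became greener between the two scenes; unchanged pixels come out as zero.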
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "import matplotlib.patches as mpatches\nimport matplotlib.pyplot as plt\n\n\ndef ndvi_arr(path):\n with rasterio.open(path) as ds:\n red = ds.read(1).astype(\"float32\")\n nir = ds.read(2).astype(\"float32\")\n return (nir - red) / (nir + red + 1e-6)\n\n\nndvi_before = ndvi_arr(f\"{WORK}/scene_before.tif\")\nndvi_after = ndvi_arr(f\"{WORK}/scene_after.tif\")\ndelta_arr = ndvi_after - ndvi_before\nwith rasterio.open(f\"{WORK}/delta_topparcel_cog.tif\") as ds:\n top_arr = ds.read(1)\n top_extent = (ds.bounds.left, ds.bounds.right, ds.bounds.bottom, ds.bounds.top)\n\nfig, axes = plt.subplots(1, 4, figsize=(16, 4))\nextent = (AOI[0], AOI[2], AOI[1], AOI[3])\naxes[0].imshow(ndvi_before, vmin=-0.2, vmax=0.8, cmap=\"RdYlGn\", extent=extent)\naxes[0].set_title(\"NDVI before\")\naxes[1].imshow(ndvi_after, vmin=-0.2, vmax=0.8, cmap=\"RdYlGn\", extent=extent)\naxes[1].set_title(\"NDVI after\")\naxes[2].imshow(delta_arr, vmin=-0.5, vmax=0.5, cmap=\"PiYG\", extent=extent)\naxes[2].set_title(\"\u0394NDVI (after \u2212 before)\")\naxes[3].imshow(top_arr, vmin=-0.5, vmax=0.5, cmap=\"PiYG\", extent=top_extent)\naxes[3].set_title(f\"Top parcel ({top_id}) \u0394NDVI\")\nfor ax in axes:\n ax.set_xticks([])\n ax.set_yticks([])\nfig.tight_layout()\nfig"
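Two small pieces of arithmetic in the plotting cell above are easy to misread: the NDVI formula with its `1e-6` guard term, and the reordering of `AOI` into the `(left, right, bottom, top)` tuple that matplotlib's `imshow` expects for `extent`. The sketch below restates both in plain Python. It assumes `AOI` is ordered `(minx, miny, maxx, maxy)`, which is what the index order `AOI[0], AOI[2], AOI[1], AOI[3]` suggests but which this excerpt does not show; the helper names are hypothetical.

```python
EPS = 1e-6  # same guard term as ndvi_arr in the cell above


def ndvi(nir: float, red: float) -> float:
    """Normalized difference of NIR and red, safe when both bands are 0."""
    return (nir - red) / (nir + red + EPS)


def bbox_to_extent(bbox):
    """(minx, miny, maxx, maxy) -> (left, right, bottom, top) for imshow."""
    # Assumption: the bounding box uses the minx/miny/maxx/maxy order.
    minx, miny, maxx, maxy = bbox
    return (minx, maxx, miny, maxy)


# Made-up reflectance values and a made-up bounding box.
bare = ndvi(nir=1000.0, red=1000.0)
vegetated = ndvi(nir=3000.0, red=1000.0)
extent = bbox_to_extent((-1.0, 2.0, 3.0, 4.0))
print(bare, vegetated, extent)
```

The guard term keeps an all-zero pixel from dividing by zero, at the cost of a negligible bias on real reflectance values.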
…, parcel overlay)
Pushed
Re-verified end-to-end through the local mirror of Output unchanged: top-greening parcel
Did you read the Contributor Guide?
Is this PR related to a ticket?
`[GH-XXX] my subject`. Closes part of "Sedona example notebooks in the docker image are very out of date" (#2700).

What changes were proposed in this PR?
Continues the docker-image notebook refresh series (issue #2700, milestone 1.9.1). Adds the first raster-pipeline notebook in the series.
`docs/usecases/02-vegetation-change.ipynb` answers:

End-to-end on Sedona's 1.9 raster surface:

- `/tmp/veg-change/` (uint16, EPSG:4326, tiled GeoTIFF). The "before" scene is mostly bare; the "after" scene has a circular field of vegetation in the south-west corner with elevated NIR. Written with `tiled=True, blockxsize=256, blockysize=256` because the Sedona raster reader rejects strip-based GeoTIFFs as "too thin".
- `sedona.read.format("raster")`: the new auto-tiling reader (#2672, "Add a new raster data source reader that can automatically tile GeoTiffs and bypass the Spark record limit", 1.9.0).
- `RS_MapAlgebra` to compute NDVI per scene.
- `RS_MapAlgebra` to compute the per-pixel ΔNDVI.
- `RS_ZonalStats(rast, geom, 'mean')`: the canonical raster→vector aggregation.
- `RS_Clip` on the top-ranked parcel for a close-up.
- `RS_AsCOG` (#2652, "Add RS_AsCOG (Cloud Optimized GeoTiff) writer with necessary configs", 1.9.0) round-trip through a Cloud-Optimized GeoTIFF; read back via the same `raster` reader to prove it's valid for cloud-hosted streaming.

The synthesized greening pattern places its peak in parcel P10, which is what the workflow ranks top, giving built-in ground truth for the answer.
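The aggregation and ranking steps described above (mean ΔNDVI per parcel, then pick the top parcel) can be sketched in plain Python on a toy grid. This is only an illustration of the idea under simplifying assumptions: parcels are axis-aligned boxes in pixel indices, the parcel IDs `A` to `D` and all values are made up, and nothing here reproduces how `RS_ZonalStats` handles real geometries or partial pixels.

```python
def zonal_mean(grid, parcel):
    """Mean of the grid cells inside a parcel box.

    parcel = (col_min, row_min, col_max, row_max), with max bounds exclusive.
    """
    c0, r0, c1, r1 = parcel
    values = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(values) / len(values)


# Toy delta raster: row 0 is the northern edge, so the greening patch
# in rows 2-3, columns 0-1 sits in the south-west corner.
delta = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
]

# Hypothetical 2x2 parcel grid covering the raster.
parcels = {
    "A": (0, 0, 2, 2),
    "B": (2, 0, 4, 2),
    "C": (0, 2, 2, 4),
    "D": (2, 2, 4, 4),
}

# Rank parcels by mean delta, highest first.
ranked = sorted(
    ((zonal_mean(delta, box), pid) for pid, box in parcels.items()),
    reverse=True,
)
top_mean, top_id = ranked[0]
print(top_id, top_mean)
```

Because the synthetic patch is placed deliberately, the top-ranked parcel is known in advance, which is the same ground-truth idea the description relies on.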
The notebook is structured as numbered markdown sections (`## 1.` through `## 9.`), matching the convention from `01-mobility-pulse` and `05-geopandas-on-spark`. The notebook intro flags `**Requires Sedona ≥ 1.9.0.**` explicitly because the auto-tiling raster reader and `RS_AsCOG` are 1.9-only.

No new data shipped. No network required.
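Since the notebook depends on 1.9-only features, a reader might want an early guard before running the later cells. The following is a hypothetical sketch of such a check, not something this PR adds: it reads the installed version of the `apache-sedona` distribution via `importlib.metadata` and compares it against `1.9.0`. The helpers `parse_version` and `meets_minimum` are illustrative names, and the parsing is deliberately simple (numeric major, minor, patch only).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """'1.9.0' -> (1, 9, 0); non-numeric suffixes on a component are dropped."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as "1.9" so tuple comparison behaves.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="1.9.0"):
    """True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


try:
    installed = version("apache-sedona")
    print(installed, "OK" if meets_minimum(installed) else "too old")
except PackageNotFoundError:
    print("apache-sedona is not installed")
```

Comparing tuples of integers avoids the string-comparison trap where `"1.10.0"` sorts before `"1.9.0"`.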
How was this patch tested?
End-to-end through the local mirror of `docker/test-notebooks.sh` (matched docker stack: Python 3.10, `pyspark==4.0.1`, `apache-sedona==1.9.0`, JDK 17, `local[*]`, `DRIVER_MEM=4g`, Sedona JAR via `PYSPARK_SUBMIT_ARGS` Maven coords).

Output sanity-checked: top-greening parcel `P10` matches the synthesized field location; COG round-trip read-back as 65×65 REAL_64BITS as expected; all `RS_*` results have the right dimensions.

Three real failure modes surfaced and were fixed during local verification before this commit:
- `/tmp` pollution intercepted Spark's directory listing for the input glob → use a dedicated `/tmp/veg-change/` subdir for the synthetic rasters.
- `raster` data source schema is `[rast, x, y, name]` (not `path`); derive the scene label from `name`.
- The Sedona raster reader rejects strip-based GeoTIFFs as "too thin" → pass `tiled=True, blockxsize=256, blockysize=256` to `rasterio.open`.

The CI Docker-build workflow (path-filter widening landed in #2889) will run on this PR: the `apache/sedona:latest` matrix leg builds the image with this notebook bundled and runs `test-notebooks.sh` against it, so the in-container PASS line lands in CI.

Did this PR include necessary documentation updates?
- `**Requires Sedona ≥ 1.9.0.**` and lists the gotchas (tiled GeoTIFF requirement, `name` not `path` in the schema).
- `docs/usecases/data/README.md` updates.
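The first failure mode listed under testing, stray files in a shared directory interfering with the input glob, is the reason the synthetic rasters live in their own subdirectory. The sketch below illustrates that isolation idea with Python's standard `glob` module in a throwaway temp directory. It does not reproduce Spark's directory listing or the exact failure seen during verification, and the leftover file name `scene_old.tif` is made up.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Dedicated working subdirectory, analogous to /tmp/veg-change/.
    work = os.path.join(root, "veg-change")
    os.makedirs(work)
    for name in ("scene_before.tif", "scene_after.tif"):
        open(os.path.join(work, name), "wb").close()

    # An unrelated leftover in the shared parent that matches the pattern.
    open(os.path.join(root, "scene_old.tif"), "wb").close()

    # Globbing the shared parent picks up the leftover file.
    shared = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(root, "scene_*.tif"))
    )
    # Globbing the dedicated subdirectory sees only the two scenes.
    isolated = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(work, "scene_*.tif"))
    )

print(shared)
print(isolated)
```

Scoping the glob to a directory the workflow owns means the match set depends only on files the workflow itself wrote.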