Skip to content

[GH-2700] Refresh docker example notebooks: scaffolding + quickstart#2876

Merged
jiayuasu merged 2 commits intoapache:masterfrom
jiayuasu:docker-notebooks-refresh
May 1, 2026
Merged

[GH-2700] Refresh docker example notebooks: scaffolding + quickstart#2876
jiayuasu merged 2 commits intoapache:masterfrom
jiayuasu:docker-notebooks-refresh

Conversation

@jiayuasu
Copy link
Copy Markdown
Member

@jiayuasu jiayuasu commented May 1, 2026

Summary

Issue: #2700. Milestone: 1.9.1.

The notebooks bundled in the Sedona docker image are several releases behind. This is the first of a planned series of PRs that refresh them. It's intentionally scoped to scaffolding so the rest of the series (vector / raster / STAC notebooks) can land independently on top of it.

  • Move legacy notebooks to docs/usecases/legacy/. All five existing notebooks (ApacheSedonaCore, ApacheSedonaSQL, ApacheSedonaSQL_SpatialJoin_AirportsPerCountry, ApacheSedonaRaster, Sedona_OvertureMaps_GeoParquet) now live under docs/usecases/legacy/. Existing GitHub URLs continue to work; the docker image stops bundling them because the Dockerfile's COPY docs/usecases/*.ipynb is non-recursive.
  • Add 00-quickstart.ipynb. A nine-cell, ~30-second walkthrough — read two shapefiles, spatial join, aggregate, write GeoParquet 1.1, render a SedonaKepler choropleth. Uses the Natural Earth data already shipped under docs/usecases/data/, so no new bytes and no network are required.
  • docker/test-notebooks.sh:
    • Bump per-notebook timeout 600s → 900s to absorb network variance for upcoming notebooks that hit STAC and remote object stores.
    • Honour SEDONA_NOTEBOOK_OFFLINE=1: notebooks tagged requires-network: true are skipped, so the harness still passes in sandboxed CI environments without outbound network access. The skip count is reported in the summary.
  • docs/tutorial/sql.md: redirect the two existing links from the moved AirportsPerCountry notebook to its new legacy/ path.

Test plan

  • docker build -f docker/sedona-docker.dockerfile -t sedona:dev . succeeds.
  • docker run --rm sedona:dev /opt/sedona/docker/test-notebooks.sh exits 0 and reports 00-quickstart passing.
  • docker run --rm -e SEDONA_NOTEBOOK_OFFLINE=1 sedona:dev /opt/sedona/docker/test-notebooks.sh still exits 0 (no network-tagged notebooks exist yet, so the offline path is a no-op until the next PR).
  • Manual smoke: docker run -p 8888:8888 -p 8080:8080 sedona:dev, open 00-quickstart.ipynb in JupyterLab, run all cells, eyeball the Kepler map.
  • Existing tutorial links from docs/tutorial/sql.md resolve to docs/usecases/legacy/ApacheSedonaSQL_SpatialJoin_AirportsPerCountry.ipynb.

…start

The notebooks bundled in the Sedona docker image were several releases
behind. Start a refresh series (issue apache#2700, milestone 1.9.1) by:

- moving the five legacy notebooks under docs/usecases/legacy/. They
  remain reachable via existing GitHub URLs but are no longer copied
  into the image (the Dockerfile's COPY of *.ipynb is non-recursive).
- adding 00-quickstart.ipynb: a ten-cell, ~30-second walkthrough that
  reads two shapefiles, runs a spatial join, aggregates, writes
  GeoParquet 1.1, and renders a SedonaKepler choropleth. Uses the
  Natural Earth data already shipped under docs/usecases/data/, so it
  needs no new bytes and no network.
- bumping docker/test-notebooks.sh per-notebook timeout 600s -> 900s
  to absorb network variance for upcoming notebooks that hit STAC and
  remote object stores.
- teaching docker/test-notebooks.sh to honour SEDONA_NOTEBOOK_OFFLINE=1
  so notebooks tagged "requires-network: true" can be skipped in
  sandboxed CI environments without outbound network access.
- redirecting the two existing tutorial/sql.md links from the moved
  AirportsPerCountry notebook to its new legacy/ path.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refreshes the notebooks shipped in the Sedona docker image by introducing a new quickstart notebook, moving older notebooks into a non-shipped legacy/ folder, and extending the docker notebook test harness (timeout + offline skipping).

Changes:

  • Added a new 00-quickstart.ipynb notebook intended for the docker image.
  • Moved existing example notebooks under docs/usecases/legacy/ and updated tutorial links accordingly.
  • Updated docker/test-notebooks.sh to support offline-mode skipping via a requires-network: true marker and increased the per-notebook timeout.

Reviewed changes

Copilot reviewed 3 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
docs/usecases/00-quickstart.ipynb Adds a new quickstart notebook workflow (read shapefiles → spatial join → GeoParquet write/read → Kepler viz).
docker/test-notebooks.sh Adds offline skip logic + increases timeout + summary includes skipped count.
docs/tutorial/sql.md Updates links to the AirportsPerCountry notebook’s new legacy/ path.
docs/usecases/legacy/ApacheSedonaCore.ipynb Legacy notebook relocated under legacy/.
docs/usecases/legacy/ApacheSedonaSQL.ipynb Legacy notebook relocated under legacy/.
docs/usecases/legacy/ApacheSedonaSQL_SpatialJoin_AirportsPerCountry.ipynb Legacy notebook relocated under legacy/ (and referenced by docs).
docs/usecases/legacy/Sedona_OvertureMaps_GeoParquet.ipynb Legacy notebook relocated under legacy/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docker/test-notebooks.sh
Comment thread docs/usecases/00-quickstart.ipynb Outdated
Comment thread docs/usecases/00-quickstart.ipynb Outdated
- docker/test-notebooks.sh: enable `set -o pipefail`. The per-notebook
  pipeline `timeout 900 python3 ... | tee ...` was returning tee's exit
  status, so a notebook crash or timeout would be silently misreported
  as a pass. With pipefail the `if` block now sees the real exit code.
- docs/usecases/00-quickstart.ipynb: drop the bullet list pointing to
  notebooks 01..05 — they don't exist yet and would be dead links until
  the rest of the refresh series lands. Replace with a one-line note
  that more notebooks are coming. Also drop the "ten cells" wording
  (the notebook has nine) and the trailing forward-reference to
  01-mobility-pulse in the closing cell.
@jiayuasu
Copy link
Copy Markdown
Member Author

jiayuasu commented May 1, 2026

All three review comments addressed in 43e47e1:

  • docker/test-notebooks.sh: enabled set -o pipefail so the per-notebook if timeout 900 python3 ... | tee ... block sees the real exit code instead of tee's. Crashes and timeouts will no longer be silently misreported as passes.
  • docs/usecases/00-quickstart.ipynb: dropped the bullet list pointing at notebooks 01..05 and the trailing forward-reference to 01-mobility-pulse.ipynb — those don't exist yet and would be dead links until the rest of the refresh series lands. Replaced the list with a one-line note that more notebooks are coming.
  • Same notebook: dropped the "ten cells" wording (notebook is nine cells).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants