pipeline and configuration improvements #279
Merged
andersy005 merged 10 commits into main on Oct 22, 2025
…torConfig
Add ocr.pipeline.partition.partition_buildings_by_geography, which uses DuckDB to partition regional geoparquet into per-state and per-county parquet files. Wire the new pipeline into the deploy CLI as the `partition-buildings` command (and update scheduling/command names accordingly). Refactor VectorConfig: remove the cached building_geoparquet_uri, build the buildings path on the fly in building_geoparquet_glob (ensuring parent dirs exist), and update pretty_paths to display the glob. This aligns config with the new partitioning workflow.
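As a rough illustration of the partitioning idea described above, the sketch below builds a DuckDB `COPY ... PARTITION_BY` statement that writes Hive-style partitioned parquet. It is not the PR's implementation: the function name `build_partition_sql` and the column names `state_fips` and `county_fips` are assumptions made for the example, and the real pipeline may partition differently.

```python
def _quote(value: str) -> str:
    """Escape single quotes so the value can sit inside a SQL string literal."""
    return value.replace("'", "''")


def build_partition_sql(
    input_glob: str,
    output_dir: str,
    partition_cols: tuple[str, ...] = ("state_fips", "county_fips"),  # hypothetical columns
) -> str:
    """Return a DuckDB COPY statement that writes partitioned parquet files.

    The statement reads every parquet file matching `input_glob` and writes
    one directory level per partition column under `output_dir`.
    """
    if not partition_cols:
        raise ValueError("at least one partition column is required")
    cols = ", ".join(partition_cols)
    return (
        f"COPY (SELECT * FROM read_parquet('{_quote(input_glob)}')) "
        f"TO '{_quote(output_dir)}' "
        f"(FORMAT PARQUET, PARTITION_BY ({cols}))"
    )


if __name__ == "__main__":
    # Print the statement; it could then be passed to a DuckDB connection.
    print(build_partition_sql("regions/*.parquet", "buildings_by_geography"))
```

The resulting string could be run with a DuckDB connection's `execute` method. Keeping SQL construction separate from execution makes the statement easy to inspect in logs and tests.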
When running on Platform.LOCAL the `run` command incorrectly scheduled `ocr aggregate`. Update to submit `ocr partition-buildings` and adjust the job name so local runs partition buildings by geography (state/county).
Replace consolidated_buildings_path with buildings_path_glob / buildings_path in fire_wind_risk_regional_aggregator.py and write_aggregated_region_analysis_files.py. Update parameter names, types, function calls, and debug log messages accordingly.
Add new ocr.pipeline.create_building_pmtiles module that exports create_building_pmtiles to convert consolidated building geoparquet to PMTiles using DuckDB + tippecanoe and upload the result. Update CLI: rename create_pmtiles -> create_building_pmtiles, change the dispatched command/name to "ocr create-building-pmtiles", and import/call the new pipeline function.
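The tippecanoe step of such a conversion can be sketched as a small command builder. This is an illustration under assumptions, not the module's actual code: the function name, the layer name `buildings`, the intermediate newline-delimited GeoJSON file, and the choice of flags are all hypothetical. Whether the PR enables `--read-parallel` is not clear from this page, so it is left as a parameter.

```python
import shlex


def build_tippecanoe_command(
    input_geojsonl: str,
    output_pmtiles: str,
    layer: str = "buildings",      # hypothetical layer name
    read_parallel: bool = False,   # the PR's actual choice is not shown here
) -> list[str]:
    """Build a tippecanoe argument list that writes a PMTiles archive."""
    cmd = [
        "tippecanoe",
        "-o", output_pmtiles,          # output file; format follows the extension
        "-l", layer,                   # layer name inside the tileset
        "-zg",                         # let tippecanoe guess the max zoom
        "--drop-densest-as-needed",    # thin features when a tile is too large
        "--force",                     # overwrite an existing output file
    ]
    if read_parallel:
        # Only valid when the input is line-delimited GeoJSON.
        cmd.append("--read-parallel")
    cmd.append(input_geojsonl)
    return cmd


if __name__ == "__main__":
    command = build_tippecanoe_command(
        "buildings.geojsonl", "buildings.pmtiles", read_parallel=True
    )
    print(shlex.join(command))
```

Returning a list rather than a shell string lets the caller pass it directly to `subprocess.run` without shell quoting concerns.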
Replace occurrences of the old `create-pmtiles` command with `create-building-pmtiles` in the deployment CLI job submissions (both Coiled and local dispatch) and update the docs/examples to match. Also normalize markdown list indentation/formatting in the updated docs.
Replace the previous 'ocr aggregate' submission with 'ocr partition-buildings' and update the job name to partition-buildings-{environment} so the Coiled run submits the partitioning pipeline instead of the old aggregate command.
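The command and job-name change for the Coiled run can be pictured with a minimal sketch. The `JobSpec` dataclass and the `partition_buildings_job` helper are hypothetical names introduced for illustration; only the command string `ocr partition-buildings` and the name pattern `partition-buildings-{environment}` come from the commit message. The job name used for local runs is not stated on this page, so it is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical container for a submitted job's name and command."""

    name: str
    command: str


def partition_buildings_job(environment: str) -> JobSpec:
    """Describe the partitioning job that replaces the old `ocr aggregate` step."""
    return JobSpec(
        name=f"partition-buildings-{environment}",
        command="ocr partition-buildings",
    )


if __name__ == "__main__":
    print(partition_buildings_job("staging"))
```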
Update deployment diagram and data-pipeline tutorial to replace the generic "aggregate" step with the `partition-buildings` job/CLI and correct related CLI examples and subcommand names.
… tippecanoe --read-parallel flag
andersy005 added a commit that referenced this pull request on Nov 4, 2025
* main: (46 commits)
  Chage summary stats geoparquet filepaths from `output` to `intermediate` (#299)
  Update data downloads page (#300)
  Bump prefix-dev/setup-pixi from 0.9.1 to 0.9.2 in the actions group (#298)
  Update data download documentation (#293)
  migrate vector input datasets to unified ingestion and remove unused datasets (#297)
  Fix duplicate `avg_name` (#296)
  fix California and Tennessee region IDs in staging automatic deploy (#294)
  Add additional region IDs to QA PR automatic deploy (#292)
  create a unified infrastructure for ingesting and processing input datasets (#289)
  Combine county, tract and block PMTiles layers into a single regions.pmtiles layer (#291)
  Pyramid (#284)
  Use buffered slices to remove edge effects from neighborhood operations (#288)
  Bumps up RAM for `write-aggregated-region-analysis-files` job (#290)
  fix block dataset path construction in wind risk regional aggregation (#282)
  Adds a bbox struct for region pmtiles (#281)
  compute Dask-backed data before assert_equal/assert_all_close (#283)
  pipeline and configuration improvements (#279)
  Add cached valid_region_ids.json and use it in ChunkingConfig (#280)
  Combining wind-smeared data and Riley BP + smoothing (#278)
  update-docs: add first draft of all docs pages (#275)
  ...