Add evaluation/analysis notebooks#302

Merged
andersy005 merged 47 commits into main from eval
Nov 10, 2025
@orianac orianac commented Nov 4, 2025

Adding more analysis notebooks for the docs site.

@andersy005 could you move the following notebooks such that they can appear on the docs site?

  • compare-risk-buildings.ipynb
  • benchmarking.ipynb
  • score_bins.ipynb
  • california_comparison.ipynb
  • compare-risk-raster.ipynb

We'll want to update these with the latest prod run so that the figures are representative of our data, so hold off on merging until we do that.

Also, please add any comments - very appreciated!

@orianac orianac requested a review from andersy005 November 4, 2025 03:00
* main: (46 commits)
  Chage summary stats geoparquet filepaths from `output` to `intermediate` (#299)
  Update data downloads page (#300)
  Bump prefix-dev/setup-pixi from 0.9.1 to 0.9.2 in the actions group (#298)
  Update data download documentation (#293)
  migrate vector input datasets to unified ingestion and remove unused datasets (#297)
  Fix duplicate `avg_name` (#296)
  fix California and Tennessee region IDs in staging automatic deploy (#294)
  Add additional region IDs to QA PR automatic deploy (#292)
  create a unified infrastructure for ingesting and processing input datasets (#289)
  Combine county, tract and block PMTiles layers into a single regions.pmtiles layer (#291)
  Pyramid (#284)
  Use buffered slices to remove edge effects from neighborhood operations (#288)
  Bumps up RAM for `write-aggregated-region-analysis-files` job (#290)
  fix block dataset path construction in wind risk regional aggregation (#282)
  Adds a bbox struct for region pmtiles (#281)
  compute Dask-backed data before assert_equal/assert_all_close (#283)
  pipeline and configuration improvements (#279)
  Add cached valid_region_ids.json and use it in ChunkingConfig (#280)
  Combining wind-smeared data and Riley BP + smoothing (#278)
  update-docs: add first draft of all docs pages (#275)
  ...

@orianac, i'm unable to run this notebook because it appears to reference local files that weren't added to this branch. are these files available in our S3 bucket? if so, can you point me to their location? if not, can you upload them to the S3 bucket?

tracts_dict = {}
for statistic in ['corr_low', 'low_bias', 'high_bias']:
    tracts_dict[statistic] = gpd.read_file(f'{statistic}_tracts_{version}.shp', index_col=0)
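
One way to surface this kind of problem early is to check for the expected files before loading them. The snippet below is only a sketch: the helper name `missing_tract_files` and the `base_dir` parameter are hypothetical and not part of the notebook, and it assumes the shapefiles follow the `{statistic}_tracts_{version}.shp` naming used above.

```python
from pathlib import Path

# Statistic names taken from the notebook loop quoted above.
STATISTICS = ['corr_low', 'low_bias', 'high_bias']


def missing_tract_files(version: str, base_dir: str = '.') -> list[str]:
    """Return the expected tract shapefile names that are absent from base_dir.

    Hypothetical helper: it only checks for the .shp file itself, not the
    sidecar files (.dbf, .shx, ...) that a shapefile also needs.
    """
    expected = [f'{stat}_tracts_{version}.shp' for stat in STATISTICS]
    return [name for name in expected if not (Path(base_dir) / name).exists()]
```

Calling `missing_tract_files(version)` at the top of the notebook and raising if the list is non-empty would give a clearer error than a failed `gpd.read_file` call.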

@andersy005 andersy005 left a comment

for consistency, can we centralize our inputs/outputs to use the carbonplan-ocr bucket instead of introducing yet another bucket?

for example, we're currently loading inputs and saving outputs to carbonplan-risks:

# Loading input
states = gpd.read_file('s3://carbonplan-risks/shapefiles/cb_2018_us_state_20m.zip')

# Saving output
buildings_in_census_tracts.to_parquet(
    f's3://carbonplan-risks/shapefiles/buildings_tracts_{version}_geo.parquet'
)

could we migrate these to use carbonplan-ocr instead?
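
A rough sketch of what centralizing could look like: build every path through one small helper so the bucket name lives in a single place. The helper `ocr_uri` is hypothetical, and keeping the `shapefiles/` prefix inside carbonplan-ocr is an assumption; the actual key layout in that bucket is not specified here.

```python
# Single source of truth for the bucket name (taken from the comment above).
OCR_BUCKET = 'carbonplan-ocr'


def ocr_uri(*parts: str) -> str:
    """Join key segments into an s3:// URI under the carbonplan-ocr bucket.

    Hypothetical helper: empty segments are skipped and stray slashes are
    trimmed so callers cannot produce double slashes by accident.
    """
    key = '/'.join(p.strip('/') for p in parts if p)
    return f's3://{OCR_BUCKET}/{key}'
```

The notebook reads and writes would then become, for example, `gpd.read_file(ocr_uri('shapefiles', 'cb_2018_us_state_20m.zip'))`, assuming the file has been copied to that location first.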

katamartin commented Nov 7, 2025

Next steps:

  1. @andersy005 to merge change adding consolidated parquet file generation to production pipeline
  2. @andersy005 to run production pipeline
  3. @andersy005 to update these notebooks with new data + parquet file
  4. @andersy005 to merge this PR!
  5. @orianac to update score bins notebook to add a 0 < RPS < 0.01 bin (follow-up PR)
  6. @orianac to update methods doc to point back to score bins notebook (or at least describe the binning logic in alignment with what happens in the notebook)
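
For step 5, a minimal sketch of how a separate 0 < RPS < 0.01 bin could sit alongside an exact-zero bin. Only the 0.01 edge comes from this thread; the other edges in `UPPER_EDGES`, the function name `score_bin`, and the choice to put an RPS of exactly 0.01 in the next bin up are placeholders, not the notebook's actual binning.

```python
from bisect import bisect_right

# Upper edges of the non-zero bins. Only 0.01 is taken from the discussion;
# 0.1 and 1.0 are hypothetical placeholders.
UPPER_EDGES = [0.01, 0.1, 1.0]


def score_bin(rps: float) -> int:
    """Map an RPS value to a bin index.

    Bin 0 holds RPS == 0 exactly, bin 1 holds 0 < RPS < 0.01, and each
    later bin covers [previous edge, next edge). Values at or above the
    last edge fall in the final bin.
    """
    if rps < 0:
        raise ValueError('RPS must be non-negative')
    if rps == 0:
        return 0
    # bisect_right counts the edges <= rps, so a value equal to an edge
    # lands in the bin above that edge.
    return 1 + bisect_right(UPPER_EDGES, rps)
```

Whatever edges the notebook ends up using, describing them the same way in the methods doc would cover step 6.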

andersy005 commented Nov 10, 2025

@orianac / @katamartin, i updated the following notebooks with the new data (v0.12.0)

@orianac, i wasn't able to update this notebook because it uses some files that appear to be available on your local branch

  • compare-risk-raster.ipynb

i'm going to merge this PR as we discussed on Friday. please open follow up PRs to fix anything i may have missed

@andersy005 andersy005 changed the title from "Eval" to "Add evaluation/analysis notebooks" Nov 10, 2025
@andersy005 andersy005 merged commit 8ec8d27 into main Nov 10, 2025
9 checks passed
@andersy005 andersy005 deleted the eval branch November 10, 2025 08:49
Labels

documentation Improvements or additions to documentation

4 participants