docs: comprehensive data catalog and gap analysis (#2)
896-line catalog covering:
- 5 currently available datasets with full schema docs
- 14 categories of planned datasets (40+ sources) with URLs
- Gap analysis prioritized by impact (high/medium/lower)
- Quick reference URL table for all 30+ data sources

Key sources identified: EIA-860/861/923/930, HIFLD, EPA eGRID/CEMS, NREL (URDB, NSRDB, ATB), DOE AFDC, LBNL trackers, gridstatus.io, ISO queue portals, and PUDL (Catalyst Cooperative) as a meta-source.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cab765037c
| **File Count** | ~3,000 files |
| **Naming Convention** | `{eiaId}.json` (utilities), `iso-{shortName}.json` (ISOs), `cca-{slug}.json` (CCAs), `ba-{slug}.json` (BAs) |
| **Source** | HIFLD ArcGIS (service territories + control areas) + CEC ArcGIS (CCAs) |
| **Format** | GeoJSON (compact, in `public/data/territories/`) |
Point territory file path to actual dataset directory
This entry documents territory GeoJSON as living in public/data/territories/, but in this repository the tracked dataset and contributor workflows use data/territories/ (see README.md and docs/CONTRIBUTING.md, plus the committed files under data/territories). Leaving this path incorrect will send contributors to a non-existent location and cause avoidable failures when they try to inspect or update boundary files.
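For illustration, the naming conventions in the table above can be captured in a small helper. This is a hypothetical sketch (the function name is invented, not part of the repo), using the repository's actual `data/territories/` directory rather than the incorrect documented path:

```typescript
type TerritoryKind = "utility" | "iso" | "cca" | "ba";

// Hypothetical helper mirroring the documented file-naming convention.
// The directory is data/territories/, per the tracked dataset in this repo.
function territoryFile(kind: TerritoryKind, id: string): string {
  const dir = "data/territories";
  switch (kind) {
    case "utility":
      return `${dir}/${id}.json`; // {eiaId}.json
    case "iso":
      return `${dir}/iso-${id}.json`; // iso-{shortName}.json
    case "cca":
      return `${dir}/cca-${id}.json`; // cca-{slug}.json
    case "ba":
      return `${dir}/ba-${id}.json`; // ba-{slug}.json
  }
}
```

For example, `territoryFile("iso", "caiso")` resolves to `data/territories/iso-caiso.json`.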
First PR in the substations 9th-entry-point sequence. Adds the substations
table with EIA/OSM source tracking, voltage ranges, substation type/status,
and soft-delete columns mirroring power_plants + ev_stations patterns.
Columns:
- id, slug (unique), name
- owner_name, owner_utility_id (FK → utilities.id, ON DELETE SET NULL)
- state, county, latitude, longitude
- geography(Point, 4326), geometry(Point, 4326) — PostGIS
- min_voltage_kv, max_voltage_kv
- substation_type ('transmission' | 'distribution' | 'hybrid' | 'unknown')
- status ('in_service' | 'out_of_service' | 'planned' | 'retired' | 'unknown')
- source ('eia' | 'osm' | 'manual' | 'hybrid') — lineage for ODbL attribution
- source_url, eia_id, osm_id, hifld_legacy_id
- search_vector (tsvector), locked_status
- submitted_by, reviewed_at, reviewed_by
- created_at, updated_at, deleted_at, version (soft-delete audit block)
Indexes: 8 btree (slug, owner_utility_id, state, substation_type, status,
source, eia_id, osm_id), 3 spatial (GIST/SPGIST on geography + GIST on
geometry), 2 FTS (GIN on search_vector + GIN trigram on name).
Migration applied to production Neon: substations table created with 0 rows.
No data sync yet — that comes in PR #2 (meridian/substations-sync).
Part of: Substations rollout (PR 1/9)
Research: memory/specs/ninth-entry-point-research.md
Co-authored-by: texture-coding-agent <coding-agent@texturehq.com>
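As a reader aid, the column list above roughly implies the following row shape. This is a sketch inferred from the description; the camelCase names, nullability choices, and the guard function are assumptions, not the actual migration:

```typescript
type SubstationType = "transmission" | "distribution" | "hybrid" | "unknown";
type SubstationStatus =
  | "in_service" | "out_of_service" | "planned" | "retired" | "unknown";

const SUBSTATION_SOURCES = ["eia", "osm", "manual", "hybrid"] as const;
type SubstationSource = (typeof SUBSTATION_SOURCES)[number];

// Sketch of a substations row; nullability is guessed where the text is silent.
// PostGIS geography/geometry, search_vector, and locked_status are omitted
// since they are database-side columns.
interface SubstationRow {
  id: number;
  slug: string; // unique
  name: string;
  ownerName: string | null;
  ownerUtilityId: number | null; // FK → utilities.id, ON DELETE SET NULL
  state: string | null;
  county: string | null;
  latitude: number | null;
  longitude: number | null;
  minVoltageKv: number | null;
  maxVoltageKv: number | null;
  substationType: SubstationType;
  status: SubstationStatus;
  source: SubstationSource; // lineage for ODbL attribution
  sourceUrl: string | null;
  eiaId: string | null;
  osmId: string | null;
  hifldLegacyId: string | null;
  submittedBy: string | null;
  reviewedAt: Date | null;
  reviewedBy: string | null;
  createdAt: Date;
  updatedAt: Date;
  deletedAt: Date | null; // soft delete
  version: number;
}

// Narrow an untrusted string to the source enum, e.g. when ingesting rows.
function isSubstationSource(v: string): v is SubstationSource {
  return (SUBSTATION_SOURCES as readonly string[]).includes(v);
}
```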
…(ALL-733)
Problem: The /utilities list endpoint and /utilities/{slug} detail endpoint
used different code paths for sparse-fieldset projection. The list route had
a local selectFields helper; the detail route had no sparse-fieldset support
at all. This meant:
- Detail endpoint couldn't honor ?fields= (returned the full shape).
- List endpoint's sparse projection wasn't reused anywhere else.
- No invariant that list and detail produce the same per-record shape
when given the same inputs.
Morgan caught the end-user impact in the Relay recon (2026-05-06, bug #2):
for a 3,133-utility sync at the Registered 5k/hr tier, having to fall back
to list-then-detail was ~38 min instead of ~2 sec.
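The ~38 min figure checks out with simple rate-limit arithmetic (a sketch assuming one detail request per utility and ignoring the handful of list-page requests):

```typescript
// With no ?fields= support on the detail route, a full sync must fetch each
// utility individually, throttled by the 5,000 req/hr Registered tier.
const utilities = 3133;
const requestsPerHour = 5000;
const fallbackMinutes = (utilities / requestsPerHour) * 60;
// ≈ 37.6 minutes, i.e. the "~38 min" above; a projected list response
// covers the same data in seconds.
```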
Fix:
- Hoist selectFields + parseFieldsParam into lib/api/public-response.ts so
every public route uses the same serializer pipeline.
- Extend publicJsonResponse and publicPaginatedResponse with an optional
{ fields } option (accepts raw ?fields= string or pre-parsed string[]).
- Enforce order: stripInternal → selectFields. Internal fields can never be
resurrected via ?fields=searchVector etc.
- Wire ?fields= into /utilities/{slug}.
- Swap the ad-hoc selectFields in /utilities route.ts for the shared helpers.
Regression tests (lib/api/__tests__/public-response.test.ts):
- parseFieldsParam: null/empty/whitespace handling, de-duping w/ preserved
order.
- selectFields: existing-keys-only, null/0 preserved, non-object passthrough.
- publicJsonResponse + publicPaginatedResponse with ?fields=.
- Internal-field resurrection guard.
- List/detail shape parity: same keys, same numeric values, same ?fields=
projection across both envelopes.
Fixes ALL-733
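A behavioral sketch of the shared helpers this change describes. Signatures and the INTERNAL_FIELDS contents are assumptions for illustration, not the actual lib/api/public-response.ts code:

```typescript
// Parse a raw ?fields= value: null/empty/whitespace means no projection;
// otherwise a de-duplicated list with first-occurrence order preserved.
function parseFieldsParam(raw: string | null): string[] | null {
  if (raw === null) return null;
  const fields = raw
    .split(",")
    .map((f) => f.trim())
    .filter((f) => f.length > 0);
  return fields.length === 0 ? null : [...new Set(fields)];
}

// Project a record onto the requested keys. Only keys that exist survive;
// present-but-falsy values (null, 0, "") are kept; non-objects pass through.
function selectFields(record: unknown, fields: string[] | null): unknown {
  if (fields === null) return record;
  if (typeof record !== "object" || record === null) return record;
  const src = record as Record<string, unknown>;
  const out: Record<string, unknown> = {};
  for (const key of fields) {
    if (key in src) out[key] = src[key];
  }
  return out;
}

// Assumed internal-column list for the sketch.
const INTERNAL_FIELDS = new Set(["searchVector", "lockedStatus"]);

function stripInternal(
  record: Record<string, unknown>
): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(record).filter(([k]) => !INTERNAL_FIELDS.has(k))
  );
}

// Order matters: stripInternal runs first, so ?fields=searchVector cannot
// resurrect an internal field; the key is gone before projection happens.
function serialize(record: Record<string, unknown>, rawFields: string | null) {
  return selectFields(stripInternal(record), parseFieldsParam(rawFields));
}
```

Running stripInternal before selectFields is what makes `?fields=searchVector` yield an empty projection rather than leaking the internal column, and reusing one pipeline in both envelopes is what gives list/detail shape parity for free.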
…(ALL-733) (#208)
Co-authored-by: texture-coding-agent <coding-agent@texturehq.com>
Data Catalog
Adds `docs/DATA_CATALOG.md` — an 896-line comprehensive catalog of open grid data sources.

What's in it
Currently Available (5 datasets):
Planned / In Progress (14 categories, 40+ sources):
Gap Analysis:
Key insight: PUDL (Catalyst Cooperative) already processes most federal datasets into analysis-ready SQLite/Parquet — worth using as an upstream source.
Quick reference
All 30+ data source URLs compiled in one table at the bottom.