docs: comprehensive data catalog and gap analysis #2

Merged
victorquinn merged 1 commit into main from talos/data-catalog
Feb 19, 2026

@victorquinn

Data Catalog

Adds docs/DATA_CATALOG.md, an 896-line comprehensive catalog of open grid data sources.

What's in it

Currently Available (5 datasets):

  • Utilities (3,132), ISOs/RTOs (7), Balancing Authorities (45), Regions (3,000), Territory Boundaries (3,000+ GeoJSON)

Planned / In Progress (14 categories, 40+ sources):

  • Generation & Power Plants (EIA-860, EIA-923, EPA eGRID, HIFLD)
  • Transmission & Substations (HIFLD, OpenStreetMap)
  • Grid Operations & Market Data (EIA-930, gridstatus.io, ISO portals)
  • Emissions (EPA CEMS, WattTime, Electricity Maps)
  • Rates & Tariffs (NREL URDB, OpenEI)
  • Renewable Energy & DER (NSRDB, WIND Toolkit, LBNL Tracking the Sun)
  • EV Charging (DOE AFDC, Open Charge Map)
  • Energy Storage (DOE GESDB)
  • Interconnection Queues (LBNL + all 7 ISO portals)
  • Reliability & Outages (EIA-861, DOE OE-417, NERC)
  • Demand Response & EE (EIA-861 sub-files)
  • Regulatory & FERC (Forms 1, 714, 2)
  • International (ENTSO-E, Ember, IEA, IRENA)
  • Meta-Sources (PUDL, OEDI, Data.gov)

Gap Analysis:

  • 🔴 High priority: Distribution infrastructure, real-time outages, machine-readable tariffs
  • 🟡 Medium: LMP node geography, curtailment data, behind-the-meter DER
  • 🟢 Lower: State PUC filings, microgrids, hydrogen infrastructure

Key insight: PUDL (Catalyst Cooperative) already processes most federal datasets into analysis-ready SQLite/Parquet — worth using as an upstream source.

Quick reference

All 30+ data source URLs compiled in one table at the bottom.

896-line catalog covering:
- 5 currently available datasets with full schema docs
- 14 categories of planned datasets (40+ sources) with URLs
- Gap analysis prioritized by impact (high/medium/lower)
- Quick reference URL table for all 30+ data sources

Key sources identified: EIA-860/861/923/930, HIFLD, EPA eGRID/CEMS,
NREL (URDB, NSRDB, ATB), DOE AFDC, LBNL trackers, gridstatus.io,
ISO queue portals, and PUDL (Catalyst Cooperative) as a meta-source.

@vercel vercel Bot commented Feb 19, 2026

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| opengrid | Ready | Preview, Comment | Feb 19, 2026 1:10am |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cab765037c

Comment thread docs/DATA_CATALOG.md
| **File Count** | ~3,000 files |
| **Naming Convention** | `{eiaId}.json` (utilities), `iso-{shortName}.json` (ISOs), `cca-{slug}.json` (CCAs), `ba-{slug}.json` (BAs) |
| **Source** | HIFLD ArcGIS (service territories + control areas) + CEC ArcGIS (CCAs) |
| **Format** | GeoJSON (compact, in `public/data/territories/`) |

P2: Point territory file path to actual dataset directory

This entry documents territory GeoJSON as living in public/data/territories/, but in this repository the tracked dataset and contributor workflows use data/territories/ (see README.md and docs/CONTRIBUTING.md, plus the committed files under data/territories). Leaving this path incorrect will send contributors to a non-existent location and cause avoidable failures when they try to inspect or update boundary files.
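
To make the path question concrete, here is a minimal sketch that combines the naming convention from the catalog row with the `data/territories/` directory the review says is tracked. The `TerritoryEntity` type, the `territoryFilePath` helper, and the sample values are hypothetical and do not come from the repository.

```typescript
// Hypothetical helper: the type, function name, and sample values are
// illustrative assumptions, not code from the repository.
type TerritoryEntity =
  | { kind: "utility"; eiaId: number }
  | { kind: "iso"; shortName: string }
  | { kind: "cca"; slug: string }
  | { kind: "ba"; slug: string };

// Directory the review comment identifies as the tracked dataset location.
const TERRITORY_DIR = "data/territories";

// Builds the expected GeoJSON path from the documented naming convention.
function territoryFilePath(entity: TerritoryEntity): string {
  switch (entity.kind) {
    case "utility":
      return `${TERRITORY_DIR}/${entity.eiaId}.json`;
    case "iso":
      return `${TERRITORY_DIR}/iso-${entity.shortName}.json`;
    case "cca":
      return `${TERRITORY_DIR}/cca-${entity.slug}.json`;
    case "ba":
      return `${TERRITORY_DIR}/ba-${entity.slug}.json`;
  }
}

// Sample identifiers below are made up for illustration.
console.log(territoryFilePath({ kind: "utility", eiaId: 12345 }));
console.log(territoryFilePath({ kind: "ba", slug: "example-ba" }));
```

Keeping the directory in one constant means a path correction like the one requested here would touch a single line.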


@victorquinn victorquinn merged commit e7b25d9 into main Feb 19, 2026
3 checks passed
texture-fleet-agent Bot pushed a commit that referenced this pull request May 5, 2026
First PR in the substations 9th-entry-point sequence. Adds the substations
table with EIA/OSM source tracking, voltage ranges, substation type/status,
and soft-delete columns mirroring power_plants + ev_stations patterns.

Columns:
- id, slug (unique), name
- owner_name, owner_utility_id (FK → utilities.id, ON DELETE SET NULL)
- state, county, latitude, longitude
- geography(Point, 4326), geometry(Point, 4326) — PostGIS
- min_voltage_kv, max_voltage_kv
- substation_type ('transmission' | 'distribution' | 'hybrid' | 'unknown')
- status ('in_service' | 'out_of_service' | 'planned' | 'retired' | 'unknown')
- source ('eia' | 'osm' | 'manual' | 'hybrid') — lineage for ODbL attribution
- source_url, eia_id, osm_id, hifld_legacy_id
- search_vector (tsvector), locked_status
- submitted_by, reviewed_at, reviewed_by
- created_at, updated_at, deleted_at, version (soft-delete audit block)
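
As a rough illustration of the enumerated columns above, the sketch below models the three enum-like columns and the voltage range as TypeScript types with a small runtime check. The camelCase field names, the `checkSubstation` helper, and the rule that `min_voltage_kv` should not exceed `max_voltage_kv` are assumptions; the commit message does not say how these constraints are enforced.

```typescript
// Allowed values copied from the column list; everything else is hypothetical.
const SUBSTATION_TYPES = ["transmission", "distribution", "hybrid", "unknown"] as const;
const SUBSTATION_STATUSES = ["in_service", "out_of_service", "planned", "retired", "unknown"] as const;
const SUBSTATION_SOURCES = ["eia", "osm", "manual", "hybrid"] as const;

// Subset of the row, with assumed camelCase names for the snake_case columns.
interface SubstationInput {
  substationType: string;
  status: string;
  source: string;
  minVoltageKv: number | null;
  maxVoltageKv: number | null;
}

function isAllowed(allowed: readonly string[], value: string): boolean {
  return allowed.indexOf(value) !== -1;
}

// Returns a list of problems; an empty list means the row passed every check.
function checkSubstation(row: SubstationInput): string[] {
  const problems: string[] = [];
  if (!isAllowed(SUBSTATION_TYPES, row.substationType)) {
    problems.push(`unknown substation_type: ${row.substationType}`);
  }
  if (!isAllowed(SUBSTATION_STATUSES, row.status)) {
    problems.push(`unknown status: ${row.status}`);
  }
  if (!isAllowed(SUBSTATION_SOURCES, row.source)) {
    problems.push(`unknown source: ${row.source}`);
  }
  // Assumed sanity rule: when both bounds are present, min must not exceed max.
  if (row.minVoltageKv !== null && row.maxVoltageKv !== null && row.minVoltageKv > row.maxVoltageKv) {
    problems.push("min_voltage_kv exceeds max_voltage_kv");
  }
  return problems;
}

console.log(checkSubstation({
  substationType: "transmission",
  status: "in_service",
  source: "osm",
  minVoltageKv: 115,
  maxVoltageKv: 345,
}));
```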

Indexes: 8 btree (slug, owner_utility_id, state, substation_type, status,
source, eia_id, osm_id), 3 spatial (GIST/SPGIST on geography + GIST on
geometry), 2 FTS (GIN on search_vector + GIN trigram on name).

Migration applied to production Neon: substations table created with 0 rows.

No data sync yet — that comes in PR #2 (meridian/substations-sync).

Part of: Substations rollout (PR 1/9)
Research: memory/specs/ninth-entry-point-research.md
texture-fleet-agent Bot added a commit that referenced this pull request May 5, 2026
Co-authored-by: texture-coding-agent <coding-agent@texturehq.com>
texture-fleet-agent Bot pushed a commit that referenced this pull request May 6, 2026
…(ALL-733)

Problem: The /utilities list endpoint and /utilities/{slug} detail endpoint
used different code paths for sparse-fieldset projection. The list route had
a local selectFields helper; the detail route had no sparse-fieldset support
at all. This meant:

  - Detail endpoint couldn't honor ?fields= (returned the full shape).
  - List endpoint's sparse projection wasn't reused anywhere else.
  - No invariant that list and detail produce the same per-record shape
    when given the same inputs.

Morgan caught the end-user impact in the Relay recon (2026-05-06, bug #2):
for a 3,133-utility sync at the Registered 5k/hr tier, having to fall back
to list-then-detail was ~38 min instead of ~2 sec.
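
The ~38 min figure is consistent with a simple rate-limit calculation, sketched below. It assumes one detail request per utility, requests spread evenly across the hourly quota, and no other overhead. The ~2 sec figure depends on page size and latency that the message does not state, so it is not derived here.

```typescript
// Figures from the commit message.
const utilityCount = 3133;
const requestsPerHour = 5000;

// Minutes needed to issue `requests` calls when throttled to `perHour` calls per hour.
function minutesAtRateLimit(requests: number, perHour: number): number {
  return (requests / perHour) * 60;
}

const detailMinutes = minutesAtRateLimit(utilityCount, requestsPerHour);
console.log(detailMinutes.toFixed(1)); // 37.6, which rounds to the ~38 min quoted
```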

Fix:
- Hoist selectFields + parseFieldsParam into lib/api/public-response.ts so
  every public route uses the same serializer pipeline.
- Extend publicJsonResponse and publicPaginatedResponse with an optional
  { fields } option (accepts raw ?fields= string or pre-parsed string[]).
- Enforce order: stripInternal → selectFields. Internal fields can never be
  resurrected via ?fields=searchVector etc.
- Wire ?fields= into /utilities/{slug}.
- Swap the ad-hoc selectFields in /utilities route.ts for the shared helpers.
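
A minimal sketch of the pipeline described in this list, assuming simplified signatures. The function names match the commit message, but the bodies, the `INTERNAL_FIELDS` list (only `searchVector` is named in the message), the `toPublic` wrapper, and the choice to treat an empty `?fields=` as "no projection" are illustrative guesses rather than the code in lib/api/public-response.ts.

```typescript
type JsonRecord = { [key: string]: unknown };

// Assumed internal-field list; the commit message only names searchVector.
const INTERNAL_FIELDS = ["searchVector"];

// Parses a raw ?fields= value into a de-duplicated, order-preserving list.
// Returns null (meaning "no projection") for null, empty, or whitespace input.
function parseFieldsParam(raw: string | null | undefined): string[] | null {
  if (raw == null) return null;
  const names: string[] = [];
  for (const part of raw.split(",")) {
    const name = part.trim();
    if (name !== "" && names.indexOf(name) === -1) names.push(name);
  }
  return names.length > 0 ? names : null;
}

// Drops internal fields before any projection happens.
function stripInternal(record: JsonRecord): JsonRecord {
  const out: JsonRecord = {};
  for (const key of Object.keys(record)) {
    if (INTERNAL_FIELDS.indexOf(key) === -1) out[key] = record[key];
  }
  return out;
}

// Keeps only requested keys that exist; null and 0 values are preserved.
function selectFields(record: JsonRecord, fields: string[] | null): JsonRecord {
  if (fields === null) return record;
  const out: JsonRecord = {};
  for (const key of fields) {
    if (Object.prototype.hasOwnProperty.call(record, key)) out[key] = record[key];
  }
  return out;
}

// Hypothetical wrapper showing the enforced order: stripInternal, then selectFields.
function toPublic(record: JsonRecord, rawFields?: string | null): JsonRecord {
  return selectFields(stripInternal(record), parseFieldsParam(rawFields));
}

const row: JsonRecord = { slug: "example", name: "Example", customers: 0, searchVector: "x" };
console.log(JSON.stringify(toPublic(row, "name, searchVector,name,customers")));
// {"name":"Example","customers":0}
```

Because stripping runs first, a requested internal field is simply absent by the time projection happens, so it cannot reappear in the output.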

Regression tests (lib/api/__tests__/public-response.test.ts):
- parseFieldsParam: null/empty/whitespace handling, de-duping w/ preserved
  order.
- selectFields: existing-keys-only, null/0 preserved, non-object passthrough.
- publicJsonResponse + publicPaginatedResponse with ?fields=.
- Internal-field resurrection guard.
- List/detail shape parity: same keys, same numeric values, same ?fields=
  projection across both envelopes.
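
The parity check in the last item could look roughly like the sketch below. The `shapeDiff` helper and the sample records are hypothetical; the actual tests in lib/api/__tests__/public-response.test.ts may compare shapes differently.

```typescript
type JsonRecord = { [key: string]: unknown };

// Compares a list item with a detail record; an empty result means parity.
function shapeDiff(listItem: JsonRecord, detail: JsonRecord): string[] {
  const diffs: string[] = [];
  const listKeys = Object.keys(listItem).sort();
  const detailKeys = Object.keys(detail).sort();
  if (listKeys.join(",") !== detailKeys.join(",")) {
    diffs.push(`keys differ: [${listKeys.join(",")}] vs [${detailKeys.join(",")}]`);
  }
  for (const key of listKeys) {
    if (!Object.prototype.hasOwnProperty.call(detail, key)) continue;
    const a = listItem[key];
    const b = detail[key];
    // Numeric fields must match exactly, including type (10 is not "10").
    if ((typeof a === "number" || typeof b === "number") && a !== b) {
      diffs.push(`numeric mismatch on ${key}`);
    }
  }
  return diffs;
}

// Made-up records standing in for one list item and the matching detail payload.
const fromList: JsonRecord = { slug: "example", customers: 10 };
const fromDetail: JsonRecord = { customers: 10, slug: "example" };
console.log(shapeDiff(fromList, fromDetail).length === 0 ? "parity" : "mismatch");
```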

Fixes ALL-733
texture-fleet-agent Bot added a commit that referenced this pull request May 6, 2026
…(ALL-733) (#208)

Co-authored-by: texture-coding-agent <coding-agent@texturehq.com>