JasonVranek/ethspectoor

The Ethspectoor

Deterministic extraction and exploration of Ethereum specification data. Parses every spec repo (consensus, execution, builder, relay, beacon APIs, execution APIs, remote signing) into structured indexes, then serves them over MCP and a static explorer UI.

1,083 types, 168 endpoints, 355 constants, 47 type aliases across 7 specs.

Live Explorer

Visit ethspectoor.blockspaceforum.com, or open docs/index.html locally. No build step, no dependencies.

Tabs

  • Specs -- overview of all indexed specification repos with item counts, fork timelines, and one-sentence descriptions.
  • Types -- browse and search all types, functions, and classes with fuzzy matching, fork-aware code display, syntax highlighting, and clickable cross-references. Three-panel layout: sidebar filters, item list, detail view.
  • Endpoints -- REST and JSON-RPC endpoints with parameters, response types, SSZ support indicators, and fork variants.
  • Diff -- compare what changed between any two forks per spec. Inline side-by-side code diff with LCS-based line highlighting (green for added, red for removed, aligned gutters).
  • PRs -- browse indexed open pull requests against spec repos. Each PR shows which types it adds, modifies, or removes with inline diff previews against mainline.
  • Visualizer (visualizer.html) -- fork-aware transaction lifecycle diagram. Shows the PBS data flow (consensus, execution, builder, relay, sidecar) for deneb through fulu, and switches to the ePBS path (EIP-7732) for gloas with P2P bid gossip, payload revelation, and PTC voting.
  • About (about.html) -- overview of the project, tools, and setup for first-time visitors.
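
The LCS-based line highlighting used by the Diff tab can be illustrated with a short sketch. This is not the code in diff.js; it is a minimal Python rendition of the same general technique (a longest-common-subsequence table followed by a forward walk), and the function name `lcs_line_diff` is hypothetical.

```python
def lcs_line_diff(old: str, new: str) -> list[tuple[str, str]]:
    """Return (tag, line) pairs: ' ' unchanged, '-' removed, '+' added."""
    a, b = old.splitlines(), new.splitlines()

    # table[i][j] holds the LCS length of a[i:] and b[j:].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk forward, always stepping in the direction that keeps the LCS.
    out: list[tuple[str, str]] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            out.append(("-", a[i]))
            i += 1
        else:
            out.append(("+", b[j]))
            j += 1

    # Whatever remains on either side is a pure removal or addition.
    out.extend(("-", line) for line in a[i:])
    out.extend(("+", line) for line in b[j:])
    return out
```

A UI would map the `-` rows to the red (removed) column and the `+` rows to the green (added) column, padding the opposite gutter so both sides stay aligned.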

PR Viewer

The PR tab tracks open pull requests against spec repos and shows their impact before they merge. For each PR you can see:

  • Which types/functions are added, modified, or removed
  • Inline side-by-side code diff with line-level highlighting
  • Field-level diff summaries (+3 fields, -1 field, ~2 fields)
  • Direct links to the source PR on GitHub

PR data is generated by pr_index.py and embedded as overlays in catalog.json. The PR viewer and MCP server share the same data.
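
As a rough illustration of how a field-level summary such as "+3 fields, -1 field, ~2 fields" could be derived, the sketch below compares two field maps. It assumes each type is represented as a name-to-type mapping; that representation and the function name `summarize_field_diff` are assumptions, not the format pr_index.py actually uses.

```python
def summarize_field_diff(old: dict[str, str], new: dict[str, str]) -> str:
    """Summarize added (+), removed (-), and retyped (~) fields."""
    added = [name for name in new if name not in old]
    removed = [name for name in old if name not in new]
    modified = [name for name in new if name in old and old[name] != new[name]]

    parts = []
    for sign, names in (("+", added), ("-", removed), ("~", modified)):
        if names:
            noun = "field" if len(names) == 1 else "fields"
            parts.append(f"{sign}{len(names)} {noun}")
    return ", ".join(parts) or "no field changes"
```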

Quick Start

Requires Python 3.10+ and git.

# Install dependencies (pyyaml for build, mcp for server)
pip install pyyaml mcp
# or with uv:
uv pip install pyyaml mcp

# Build everything: clones all 7 spec repos, extracts, links, builds catalog
python3 build.py --all

# Open the explorer
# (open is the macOS command; on Linux use xdg-open)
open docs/index.html

# Start the MCP server
python3 server.py --catalog docs/catalog.json

That's it. build.py --all handles cloning repos (to ./repos/specs/), building per-spec indexes, cross-reference linking, and assembling the final catalog.json. First run takes a few minutes to clone; subsequent runs pull updates and rebuild.

To include PR overlays (requires GITHUB_TOKEN):

python3 build.py --all --include-prs

MCP Server

The MCP server exposes 10 tools over stdio transport. AI agents (Claude, Hermes, Cursor, etc.) can query any type, endpoint, or PR across all specs with structured responses.

# stdio transport (for agent integration)
python3 server.py

# custom catalog and repos directory (enables reindex)
python3 server.py --catalog docs/catalog.json --repos-dir ./repos

# rebuild everything before starting
python3 server.py --rebuild --repos-dir ./repos

Or run without a persistent venv:

uv run --with mcp --with pyyaml python3 server.py --catalog docs/catalog.json

Tools

| Tool | Description |
| --- | --- |
| list_specs | List all indexed specs with item counts, endpoint counts, and available forks |
| lookup_type | Look up a type, function, or container by name. Returns fields, code, source link, references, and EIP associations. Supports fuzzy matching and PR fork resolution |
| lookup_endpoint | Search API endpoints by path, operation name, or keyword. Returns parameters, response types, SSZ support, and fork variants |
| what_changed | Show what was added or modified in a specific fork. Includes EIP associations |
| trace_type | Trace a type across spec boundaries. Shows where it is defined, who uses it, and cross-spec references |
| search | Fuzzy search across all spec items, constants, type aliases, and endpoints |
| diff_type | Compare a type or function between two forks. Shows field additions, removals, and code changes |
| list_prs | List indexed PR overlays with PR number, title, author, and what changed |
| index_pr | Index a GitHub PR as a virtual fork. Makes it queryable via pr-NNNN fork syntax |
| reindex | Rebuild spec indexes from source repos and reload. Requires --repos-dir |
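
MCP messages are JSON-RPC 2.0, and a tool is invoked with the `tools/call` method. For clients that do not use an MCP SDK, the sketch below builds such a request for one of the tools above. The argument key `name` is an assumption (check the tool schema returned by `tools/list`), and a real session must complete the MCP `initialize` handshake before sending tool calls.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids


def tool_call_request(tool: str, **arguments) -> str:
    """Serialize an MCP tools/call request as one JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument names; only `fork` appears in the examples in this README.
request_line = tool_call_request("lookup_type", name="BeaconState", fork="gloas")
```

Over stdio transport, each such line would be written to the server's standard input, with the response read back from its standard output.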

Client Configuration

Hermes / Claude Desktop (config.yaml)

mcp:
  ethspectoor:
    command: "uv"
    args:
      - "run"
      - "--with"
      - "mcp"
      - "--with"
      - "pyyaml"
      - "python3"
      - "/path/to/ethspectoor/server.py"
      - "--catalog"
      - "/path/to/ethspectoor/docs/catalog.json"
      - "--indexes-dir"
      - "/path/to/ethspectoor/indexes"
      - "--repos-dir"
      - "/path/to/ethspectoor/repos/specs"

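The --indexes-dir and --repos-dir flags are optional but enable the reindex and index_pr tools to rebuild indexes without restarting. The YAML above is the Hermes format; Claude Desktop reads JSON instead, so put the same command and args under mcpServers in claude_desktop_config.json.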

Data Flow

Both the MCP server and the explorer UI read the same artifact: catalog.json. Types that appear in multiple specs are merged with canonical-source attribution (e.g. BeaconState resolves to consensus-specs, not beacon-apis). Because both consumers read one file, what agents see cannot drift from what the UI shows.

repos/ --> build.py --> indexes/ (per-spec, intermediate)
                            |
                        link.py --> _cross_refs.json
                            |
                    build_catalog.py --> catalog.json (canonical)
                            |                  |
                    pr_index.py (overlays)      |
                                        +------+------+
                                        |             |
                                  server.py (MCP)  docs/ (UI)

PR Shadow Indexes

Track open PRs against spec repos as virtual forks. PRs are indexed but invisible to normal queries. Reference them explicitly to see full resulting types, field-level diffs, and cross-type impact.

# Index all open PRs for consensus-specs
python3 pr_index.py --spec consensus-specs --repo-dir ./repos/specs/consensus-specs

# Index a single PR
python3 pr_index.py --spec consensus-specs --repo-dir ./repos/specs/consensus-specs --pr 1234

# Clean up merged/closed PRs
python3 pr_index.py --spec consensus-specs --cleanup

# List indexed PRs
python3 pr_index.py --list

Requires GITHUB_TOKEN env var or --github-token for API access.

Querying PR Data via MCP

PR forks use the naming convention pr-{number}:

list_prs(spec="consensus-specs")
  -> PR #4123: "Add exit queue to BeaconState" (gloas, 3 items changed)

lookup_type("BeaconState", fork="pr-4123")
  -> full BeaconState as it would look after the PR

diff_type("BeaconState", from_fork="gloas", to_fork="pr-4123")
  -> field-level diff: what the PR adds/removes/modifies

what_changed(fork="pr-4123")
  -> all items the PR touches with action (added/modified/removed)

Normal queries (no PR fork specified) never see PR data.
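
The pr-{number} convention makes it easy to separate PR forks from named forks. The sketch below shows one way a query layer might classify the `fork` argument; the function `parse_fork` and its return shape are hypothetical and are not taken from server.py.

```python
import re
from typing import Optional, Tuple, Union

PR_FORK = re.compile(r"pr-(\d+)")


def parse_fork(fork: Optional[str]) -> Tuple[str, Union[int, str, None]]:
    """Classify a fork argument as mainline, a PR overlay, or a named fork."""
    if fork is None:
        # No fork given: a normal query, which never sees PR data.
        return ("mainline", None)
    match = PR_FORK.fullmatch(fork)
    if match:
        return ("pr", int(match.group(1)))
    return ("fork", fork)
```

Using `fullmatch` keeps near-misses such as `pr-` or `pr-12a` from being treated as PR overlays.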

Building with PR Overlays

# Build catalog including PR overlays
python3 build_catalog.py --indexes-dir ./indexes --output docs/catalog.json --include-prs

# Or start MCP server with --rebuild to include PRs
python3 server.py --rebuild --repos-dir ./repos --include-prs

CI/CD

GitHub Actions rebuilds the catalog and deploys to GitHub Pages on every push to main. A scheduled workflow runs daily to pull upstream spec changes and re-index open PRs.

See .github/workflows/deploy.yml. The pipeline runs:

python3 build.py --all --include-prs

This clones/updates all spec repos, builds indexes, links cross-references, indexes open PRs, and assembles the catalog. The docs/ directory is then deployed to GitHub Pages.

Spec Coverage

| Spec | Items | Endpoints | Constants | Extractor | Forks |
| --- | --- | --- | --- | --- | --- |
| consensus-specs | 528 | -- | 218 | Python AST | phase0 through heze |
| execution-specs | 298 | -- | 135 | Python AST | frontier through amsterdam |
| execution-apis | 93 | 72 | -- | OpenRPC | paris through amsterdam |
| beacon-apis | 77 | 84 | -- | OpenAPI + Markdown | phase0 through gloas |
| remote-signing-api | 59 | 2 | -- | OpenAPI | phase0 through fulu |
| builder-specs | 16 | 5 | 2 | OpenAPI + Markdown | bellatrix through fulu |
| relay-specs | 12 | 5 | -- | OpenAPI + Markdown | bellatrix through fulu |

Build

Full build (recommended)

python3 build.py --all

This clones all spec repos (if not already present), builds per-spec indexes, runs cross-reference linking, and assembles docs/catalog.json.

Single spec

# Auto-clones the repo if needed
python3 build.py --profile consensus-specs

# Or point at an existing local clone
python3 build.py --profile builder-specs --repo-dir /path/to/builder-specs

After building individual specs, run linking and catalog assembly manually:

python3 link.py --indexes-dir ./indexes
python3 build_catalog.py --indexes-dir ./indexes --output docs/catalog.json

What each step does

  • build.py extracts types, endpoints, constants, and fork metadata from a spec repo and writes a {spec}_index.json to ./indexes/.
  • link.py resolves cross-spec type references (e.g. beacon-apis types referencing consensus-specs containers).
  • build_catalog.py merges all indexes into catalog.json, deduplicating shared types across specs using canonical-source attribution. This is the single artifact consumed by both the MCP server and the explorer UI.
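
Canonical-source deduplication can be pictured as a priority merge: when the same type name appears in several specs, the definition from the highest-priority spec wins and the others are recorded as secondary sources. The sketch below assumes a simple priority list and a `{spec: {type_name: item}}` input shape; both are illustrative guesses, not the structures build_catalog.py actually uses.

```python
# Assumed priority order; the real attribution rules may differ.
CANONICAL_PRIORITY = ["consensus-specs", "execution-specs", "beacon-apis"]


def merge_indexes(indexes: dict[str, dict[str, dict]]) -> dict[str, dict]:
    """Merge per-spec indexes, keeping one canonical entry per type name."""

    def rank(spec: str) -> int:
        if spec in CANONICAL_PRIORITY:
            return CANONICAL_PRIORITY.index(spec)
        return len(CANONICAL_PRIORITY)  # unlisted specs come last

    catalog: dict[str, dict] = {}
    for spec in sorted(indexes, key=rank):
        for name, item in indexes[spec].items():
            if name not in catalog:
                # First spec to define the name is the canonical source.
                catalog[name] = {**item, "canonical_source": spec, "also_in": []}
            else:
                catalog[name]["also_in"].append(spec)
    return catalog
```

With this ordering, a BeaconState present in both beacon-apis and consensus-specs resolves to the consensus-specs definition, matching the behaviour described above.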

Architecture

.
├── build.py                  # orchestrates extraction per spec profile
├── build_catalog.py          # merges indexes into catalog.json (canonical artifact)
├── pr_index.py               # PR shadow indexer (fetch, extract, diff open PRs)
├── link.py                   # cross-spec reference resolution
├── server.py                 # MCP server (10 tools, reads catalog.json)
├── fetch_repos.sh            # clones all spec repos
├── extractors/
│   ├── profiles.py           # spec profiles (paths, fork orders, extractor config)
│   ├── extract_python.py     # Python AST extractor (consensus-specs, execution-specs)
│   ├── extract_openapi.py    # OpenAPI extractor (beacon-apis, builder-specs, relay-specs, remote-signing-api)
│   ├── extract_openrpc.py    # OpenRPC extractor (execution-apis)
│   ├── extract_markdown.py   # Markdown type/endpoint extractor (beacon-apis, builder-specs, relay-specs)
│   ├── enrich.py             # structural annotation (fields, params, references, domains)
│   └── fetch_examples.py     # test fixture fetcher (standalone)
├── indexes/                  # generated per-spec indexes (intermediate build artifacts)
│   └── pr/                   # PR overlay indexes (per-spec, per-PR)
├── docs/
│   ├── index.html            # HTML shell (loads app.js, no inline logic)
│   ├── about.html            # project overview, MCP docs, skill card
│   ├── visualizer.html       # fork-aware transaction lifecycle diagram (PBS + ePBS)
│   ├── catalog.json          # canonical data (from build_catalog.py)
│   ├── SKILL.md              # MCP skill document for AI agents
│   ├── logo.svg              # site logo
│   ├── favicon.svg           # browser tab icon
│   ├── css/
│   │   ├── styles.css        # shared styles (layout, nav, search, detail panels)
│   │   ├── about.css         # about page styles
│   │   └── visualizer.css    # visualizer page styles
│   ├── js/
│   │   ├── app.js            # entry point (init, routing, global bindings)
│   │   ├── state.js          # shared state (catalog data, selections)
│   │   ├── constants.js      # fork orders, spec colors, kind/method badges
│   │   ├── utils.js          # HTML escaping, ID sanitization
│   │   ├── forks.js          # fork sorting, code-for-fork resolution
│   │   ├── search.js         # fuzzy scoring and highlighting
│   │   ├── diff.js           # LCS-based line diff engine
│   │   ├── router.js         # hash-based routing and navigation
│   │   ├── url.js            # URL parameter parsing
│   │   └── views/
│   │       ├── home.js       # specs overview + setup/MCP/skill sections
│   │       ├── types.js      # type browser (three-panel, filters, detail)
│   │       ├── endpoints.js  # endpoint browser
│   │       ├── prs.js        # PR browser with inline diffs
│   │       ├── diff-view.js  # fork-to-fork diff comparison
│   │       └── skill-modal.js # SKILL.md viewer/copy modal
│   └── js/__tests__/
│       ├── constants.test.js
│       ├── diff.test.js
│       ├── forks.test.js
│       ├── router.test.js
│       ├── search.test.js
│       └── utils.test.js
├── .github/workflows/
│   └── deploy.yml            # CI: deno test -> rebuild catalog -> deploy to GitHub Pages
├── SCHEMA.md                 # index JSON schema documentation
├── CLAUDE.md                 # agent context (enzyme CLI, project conventions)
└── PLAN.md                   # development roadmap

Extractors

Each extractor handles one source format:

  • Python AST (extract_python.py): Walks Python source files, extracts class/function definitions with full code, tracks fork modifications via [New in fork] / [Modified in fork] annotations.
  • OpenAPI (extract_openapi.py): Parses OpenAPI YAML, resolves $ref chains, extracts endpoints with parameters, response types, SSZ support, and fork variants.
  • OpenRPC (extract_openrpc.py): Parses OpenRPC JSON, extracts JSON-RPC methods with params, results, error codes, and content descriptors.
  • Markdown (extract_markdown.py): Extracts type definitions and endpoint descriptions from Markdown spec pages (used alongside OpenAPI for specs that document types in prose).
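
To make the Python AST approach concrete, here is a minimal sketch that pulls top-level class and function definitions, with their full source text, out of a Python module using the standard `ast` library. It is not the code in extract_python.py; the function name `extract_definitions` and the output keys are hypothetical.

```python
import ast


def extract_definitions(source: str) -> list[dict]:
    """Collect top-level classes and functions with their source code."""
    tree = ast.parse(source)
    items = []
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.ClassDef, ast.FunctionDef)):
            items.append(
                {
                    "name": node.name,
                    "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                    # Exact source text of the definition.
                    "code": ast.get_source_segment(source, node),
                    "line": node.lineno,
                }
            )
    return items


SAMPLE = '''class Checkpoint(Container):
    epoch: Epoch
    root: Root

def is_active(validator, epoch):
    return validator.activation_epoch <= epoch
'''

definitions = extract_definitions(SAMPLE)
```

Parsing does not execute the module, so undefined names such as `Container` in the sample are harmless.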

Enrichment

enrich.py adds structural metadata after extraction: field lists for containers, function signatures, reference graphs between types, domain classification, and fork diff annotations (is_new, is_modified).
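
A simplified picture of this enrichment step: derive the field list of a container from its annotated assignments, and set is_new / is_modified from [New in fork] / [Modified in fork] comments. The sketch below is an assumption about how such a pass could work, not a copy of enrich.py. In particular it treats any annotation found anywhere in the item's code as applying to the whole item, which is cruder than per-field tracking.

```python
import ast
import re

# Matches "[New in Deneb" or "[Modified in Electra"; any suffix such as ":EIP4844" is ignored.
ANNOTATION = re.compile(r"\[(New|Modified) in (\w+)", re.IGNORECASE)


def enrich(code: str, fork: str) -> dict:
    """Return field list and fork flags for a single extracted item."""
    fields = []
    tree = ast.parse(code)
    if tree.body and isinstance(tree.body[0], ast.ClassDef):
        for stmt in tree.body[0].body:
            # Container fields are written as annotated names: `slot: Slot`.
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                fields.append(
                    {"name": stmt.target.id, "type": ast.unparse(stmt.annotation)}
                )

    # Comments are invisible to the AST, so annotations are read from raw text.
    tags = {(kind.lower(), name.lower()) for kind, name in ANNOTATION.findall(code)}
    return {
        "fields": fields,
        "is_new": ("new", fork.lower()) in tags,
        "is_modified": ("modified", fork.lower()) in tags,
    }
```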

Profiles

profiles.py defines the extraction configuration for each spec: which extractors to run, directory paths within the repo, fork ordering, GitHub URL templates, and any spec-specific extraction options.
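
One plausible shape for such a profile is a small dataclass. Everything in the sketch below is illustrative: the class name `SpecProfile`, its field names, the placeholder directory, the branch name, and the URL template are assumptions and do not mirror the definitions in profiles.py. Only the fork range (bellatrix through fulu) and the extractor pairing for builder-specs come from the coverage table.

```python
from dataclasses import dataclass, field


@dataclass
class SpecProfile:
    name: str
    extractors: tuple[str, ...]
    source_dirs: tuple[str, ...]
    fork_order: tuple[str, ...]
    github_url_template: str
    options: dict = field(default_factory=dict)

    def source_url(self, path: str, line: int, ref: str = "main") -> str:
        """Build a link back to the line where an item is defined."""
        return self.github_url_template.format(ref=ref, path=path, line=line)

    def forks_through(self, fork: str) -> tuple[str, ...]:
        """All forks up to and including `fork`, in activation order."""
        return self.fork_order[: self.fork_order.index(fork) + 1]


# Hypothetical profile; directory names and URL template are placeholders.
builder_specs = SpecProfile(
    name="builder-specs",
    extractors=("openapi", "markdown"),
    source_dirs=("path/to/openapi",),
    fork_order=("bellatrix", "capella", "deneb", "electra", "fulu"),
    github_url_template="https://github.com/ethereum/builder-specs/blob/{ref}/{path}#L{line}",
)
```

Keeping fork order in the profile lets fork-aware features such as diffs and "what changed" queries share one definition of which fork precedes which.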
