Skip to content

Refactor multi-source commands and add eCFR package support#40

Merged
chris-c-thomas merged 16 commits intomainfrom
feat/multi-source-refactor
Mar 14, 2026
Merged

Refactor multi-source commands and add eCFR package support#40
chris-c-thomas merged 16 commits intomainfrom
feat/multi-source-refactor

Conversation

@chris-c-thomas
Copy link
Copy Markdown
Owner

This pull request introduces a major refactor to support multiple legal XML sources, most notably adding eCFR (Code of Federal Regulations) support alongside the existing U.S. Code. It updates documentation, expands CLI commands, improves output file structure, and enhances frontmatter metadata for source discrimination. Several minor and patch changes also improve usability and accuracy. The most important changes are grouped below:

Multi-source architecture and eCFR support:

  • Refactored core packages to handle multiple XML sources, added the @lexbuild/ecfr package, and implemented conversion of bulk eCFR XML to structured Markdown.
  • Updated documentation (CLAUDE.md) throughout the repo to describe the new multi-source architecture, identifier schemes, output file structure, and the process for adding new source types. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

CLI and conversion command improvements:

  • Refactored download and convert CLI commands to be namespaced by source (e.g., download-usc, convert-ecfr), with bare commands now showing a source selection error. [1] [2] [3]

Output and metadata enhancements:

  • Added a fourth granularity option ("by chapter") to the eCFR converter, and updated output file naming conventions for both USC and eCFR. [1] [2]
  • Updated frontmatter in output Markdown files to include source and legal_status fields for source discrimination, and bumped format_version to 1.1.0. [1] [2] [3] [4] [5] [6] [7]

Downloader and configuration improvements:

  • Adjusted the downloader to gracefully skip reserved titles (e.g., Title 35 in eCFR), and updated .changeset/config.json to include the new @lexbuild/ecfr package in the fixed group. [1] [2] [3]

Documentation and minor updates:

  • Updated README files for core, usc, ecfr, and cli packages to reflect the new architecture and features.

These changes lay the groundwork for supporting additional legal XML sources, improve usability and clarity for users, and ensure output files are properly annotated for downstream consumers.

@chris-c-thomas chris-c-thomas added enhancement Enhancements made to exisiting features or codebase feature New feature implemented dev labels Mar 14, 2026
Copilot AI review requested due to automatic review settings March 14, 2026 08:19
@chris-c-thomas chris-c-thomas added main dependencies Pull requests that update a dependency file labels Mar 14, 2026
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Mar 14, 2026

Greptile Summary

This PR introduces a significant multi-source architecture refactor, adding the @lexbuild/ecfr package alongside the existing USC support. It namespaces CLI commands (download-usc, convert-ecfr, etc.), adds source and legal_status discriminator fields to all output frontmatter, and bumps format_version to 1.1.0. The eCFR converter follows the same collect-then-write pattern as the USC converter and supports four granularity modes (section, part, chapter, title).

Key issues found in the new @lexbuild/ecfr package:

  • Broken link registration for duplicate sections (converter.ts L213-218): The condition suffix && occurrence === 1 is always false (suffix is only set when occurrence > 1), making the second linkResolver.register call dead code. As a result, the canonical identifier for duplicate sections is re-registered on every occurrence, ultimately pointing to the last duplicate's path rather than the primary one.
  • enrichPartLevelNotes is a no-op (converter.ts L342-373): The function searches collected for part nodes, but since emitAt = "section" for section granularity, parts are never emitted and never appear in collected. Authority and regulatory source notes from part-level AUTH/SOURCE elements will never be propagated to section frontmatter via this path.
  • Chapter granularity produces double-nested paths (ecfr-path.ts L33-37): buildEcfrOutputPath adds a chapter-X directory from the ancestor lookup and then appends chapter-X.md as a child of that directory, producing paths like title-17/chapter-I/chapter-I.md instead of title-17/chapter-I.md.
  • Redundant !dryRun guard (converter.ts L242-244): An early return at L166 already handles dryRun; the guard inside the section block is always true.
  • Unused import (ecfr-builder.ts L38): ECFR_TABLE_ELEMENTS is imported but never referenced — table elements are matched via direct string comparisons instead.

Confidence Score: 2/5

  • Not safe to merge as-is — chapter granularity will write files to incorrect paths, duplicate sections will have broken cross-reference resolution, and part-level authority/source enrichment is silently broken.
  • Two of the three logic bugs directly affect correctness of output files at runtime: chapter granularity writes to wrong nested paths (users get broken file layouts), and duplicate sections resolve links to the wrong file. The enrichPartLevelNotes issue is a silent data loss in frontmatter. These are not edge-case concerns — chapter granularity is a newly advertised feature and duplicate section numbers occur in real CFR titles.
  • packages/ecfr/src/converter.ts and packages/ecfr/src/ecfr-path.ts need the most attention before merge.

Important Files Changed

Filename Overview
packages/ecfr/src/converter.ts New eCFR conversion orchestrator with three logic issues: dead/incorrect link registration for duplicate sections, a no-op enrichPartLevelNotes function that can never find part nodes, and a redundant !dryRun guard that is always true at its call site.
packages/ecfr/src/ecfr-path.ts Output path builder for eCFR; chapter-granularity paths are double-nested (e.g. title-17/chapter-I/chapter-I.md) because the chapter ancestor is added to segments from both the ancestor lookup and the node-type branch.
packages/ecfr/src/ecfr-builder.ts New SAX-driven AST builder for eCFR XML; well-structured stack-based parser. Minor: imports ECFR_TABLE_ELEMENTS but never uses it — table elements are handled via direct string comparisons instead.
packages/ecfr/src/ecfr-frontmatter.ts Builds FrontmatterData from eCFR AST nodes; correctly sets source/legal_status. Roman-numeral CFR chapter designations silently produce no chapter_number (parseInt returns NaN), which is acknowledged in a comment but leaves that frontmatter field absent for most eCFR chapters.
packages/ecfr/src/downloader.ts Clean eCFR bulk XML downloader; correctly skips reserved titles (e.g. Title 35), streams responses directly to disk, and handles HTTP errors gracefully.
packages/cli/src/commands/convert-ecfr.ts New CLI command for eCFR conversion with single-file, title-range, and --all modes; summary display and spinner logic mirrors the USC command cleanly. No logic issues found.
packages/cli/src/index.ts Updated CLI entry point registers source-specific commands and adds helpful download/convert stub commands that redirect users to the appropriate namespaced variants.
packages/core/src/ast/types.ts Added source, legal_status fields and LegalStatus/SourceType discriminator types to FrontmatterData; clean extensible design for future sources.
packages/usc/src/converter.ts Minor update to populate source: "usc" and legal_status fields in all three frontmatter-building sites (section, chapter, title); straightforward and correct.
packages/ecfr/src/ecfr-elements.ts Complete element classification for eCFR GPO/SGML-derived XML; well-commented and complete. ECFR_TABLE_ELEMENTS is exported but unused in the builder.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[eCFR XML File] --> B[XMLParser SAX stream]
    B --> C[EcfrASTBuilder\nonOpenElement / onCloseElement / onText]
    C --> D{emitAt level}
    D -->|section| E[Emit LevelNode at section]
    D -->|part| F[Emit LevelNode at part]
    D -->|title| G[Emit LevelNode at title]
    E --> H[collected: CollectedSection array]
    F --> H
    G --> H
    H --> I{granularity}
    I -->|section| J[Two-pass duplicate detection\n+ link registration]
    I -->|chapter| K[Group by chapter ancestor\nBuild synthetic LevelNode]
    I -->|part / title| L[Filter to target level]
    J --> M[buildEcfrFrontmatter]
    K --> M
    L --> M
    M --> N[buildEcfrOutputPath]
    N --> O[renderDocument → Markdown]
    O --> P[writeFile to disk]
    P --> Q{section granularity?}
    Q -->|yes| R[writeMetaFiles\n_meta.json + README.md]
    Q -->|no| S[Return EcfrConvertResult]
    R --> S
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: packages/ecfr/src/converter.ts
Line: 213-219

Comment:
**Dead code and overwritten link registration for duplicates**

The condition `suffix && occurrence === 1` can **never be true**. `suffix` is only set when `occurrence > 1` (see the assignment on line 195: `const suffix = total > 1 && occurrence > 1 ? \`-${occurrence}\` : ""`). So the inner `linkResolver.register` call is unreachable dead code.

More critically, this means that for duplicate sections, the canonical identifier gets registered **twice** — once pointing to the first occurrence's path (correct), and then overwritten to point to the last duplicate's suffixed path. Cross-references using the canonical identifier will resolve to the last duplicate instead of the primary occurrence.

The intended behaviour is likely to register the canonical identifier once (to the first occurrence) and skip re-registering for subsequent duplicates. The fix should be:

```suggestion
      if (node.identifier) {
        // Only register the canonical identifier for the first occurrence so that
        // cross-references resolve to the primary file, not a later duplicate.
        if (occurrence === 1) {
          linkResolver.register(node.identifier, suffixedPath);
        }
      }
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: packages/ecfr/src/converter.ts
Line: 354-372

Comment:
**`enrichPartLevelNotes` is always a no-op for section granularity**

This function searches `collected` for a node whose `identifier` matches the part ancestor. However, `collected` contains only **section-level** nodes — because `emitAt` is set to `"section"` for section granularity (line 119), part-level nodes are never emitted and never appear in `collected`. The `for` loop at line 357 will never find a match, so authority and regulatory source are never enriched.

If part-level `AUTH`/`SOURCE` notes need to be propagated to sections, the builder needs to capture those notes when the part node is being closed (e.g., store them in a side map keyed by part identifier) rather than searching for a part node in the section-level collected array.

```typescript
// Current: will never find a part node in collected
for (const item of collected) {
  if (item.node.identifier === partIdentifier) { // always false
    ...
  }
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: packages/ecfr/src/ecfr-path.ts
Line: 33-37

Comment:
**Chapter granularity creates a double-nested path**

When `node.levelType === "chapter"`, `segments` already contains `chapter-${chapterNum}` (added at line 27–29 because the chapter is found in `context.ancestors`). Then `chapter-${chapNum}.md` is appended again, producing paths like `output/ecfr/title-17/chapter-I/chapter-I.md` instead of the expected `output/ecfr/title-17/chapter-I.md`.

The test suite confirms that sections emit their chapter ancestor in `context.ancestors` (see `ecfr-builder.test.ts` line 61), so `chapterNum` will always be found when the node is a chapter.

The fix is to skip the ancestor-based chapter directory segment when the node itself is a chapter:

```suggestion
  } else if (node.levelType === "chapter") {
    // Chapter-level file — one file per chapter directly inside title dir (no subdirectory)
    const chapNum = node.numValue ?? "0";
    return join(outputRoot, "ecfr", titleDir, `chapter-${chapNum}.md`);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: packages/ecfr/src/converter.ts
Line: 242-245

Comment:
**Redundant `!dryRun` guard**

At this point in the code `dryRun` is always `false` — the function returns early at line 166–168 when `dryRun` is `true`. This `if (!dryRun)` guard can be removed to reduce dead-branch confusion.

```suggestion
    await writeMetaFiles(sectionMetas, titleNumber, titleName, output, granularity, input);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: packages/ecfr/src/ecfr-builder.ts
Line: 38

Comment:
**Unused import `ECFR_TABLE_ELEMENTS`**

`ECFR_TABLE_ELEMENTS` is imported here but never referenced anywhere in this file. Table elements (`TABLE`, `TR`, `TH`, `TD`) are handled through explicit string comparisons in `onOpenElement` and `onCloseElement`. The import can be removed to keep the import list accurate.

```suggestion
  ECFR_REF_ELEMENTS,
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: 2a400d2

Comment thread packages/ecfr/src/converter.ts Outdated
Comment thread packages/ecfr/src/converter.ts Outdated
Comment thread packages/ecfr/src/ecfr-path.ts
Comment thread packages/ecfr/src/converter.ts Outdated
Comment thread packages/ecfr/src/ecfr-builder.ts Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors LexBuild into a multi-source architecture and adds first-class eCFR support alongside existing U.S. Code (USC) conversion, including updated frontmatter for source/provenance discrimination and CLI command namespacing.

Changes:

  • Introduces new @lexbuild/ecfr package (builder, converter, downloader, fixtures/tests) and integrates it into the workspace/CLI.
  • Updates @lexbuild/core for multi-source identifiers (USC + CFR), frontmatter schema (source, legal_status, format version bump), and USLM element classification renames.
  • Namespaces CLI commands by source (download-usc/convert-usc, download-ecfr/convert-ecfr) and updates docs/READMEs accordingly.

Reviewed changes

Copilot reviewed 71 out of 73 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
pnpm-lock.yaml Adds workspace linkage for @lexbuild/ecfr and lockfile updates.
packages/usc/src/converter.ts Adds source/legal_status to USC frontmatter.
packages/usc/src/converter.test.ts Adjusts status assertion to avoid legal_status substring collisions.
packages/usc/README.md Updates USC package description for multi-source messaging.
packages/ecfr/tsup.config.ts Adds build config for new @lexbuild/ecfr package.
packages/ecfr/tsconfig.json Adds TS config for new @lexbuild/ecfr package.
packages/ecfr/src/index.ts Barrel exports for eCFR converter/downloader/builder/constants.
packages/ecfr/src/ecfr-path.ts Implements eCFR output path conventions.
packages/ecfr/src/ecfr-frontmatter.ts Implements eCFR frontmatter construction.
packages/ecfr/src/ecfr-elements.ts Defines eCFR element classification sets/maps.
packages/ecfr/src/ecfr-builder.ts Implements SAX→AST builder for eCFR GPO/SGML XML.
packages/ecfr/src/ecfr-builder.test.ts Adds unit tests for eCFR builder behavior.
packages/ecfr/src/downloader.ts Adds govinfo eCFR title downloader.
packages/ecfr/src/converter.ts Adds eCFR conversion orchestrator + metadata emission.
packages/ecfr/package.json Defines new published package metadata/scripts/deps.
packages/ecfr/README.md Documents eCFR package usage, output, and frontmatter.
packages/ecfr/CLAUDE.md Adds package architecture/schema notes for eCFR.
packages/core/src/xml/uslm-elements.ts Renames/clarifies USLM element+namespace constants module.
packages/core/src/xml/uslm-elements.test.ts Updates tests to import the renamed module.
packages/core/src/xml/parser.ts Updates parser imports to new USLM constants module.
packages/core/src/xml/parser.test.ts Updates parser tests to import renamed constants module.
packages/core/src/markdown/links.ts Extends identifier parsing and fallback URLs to CFR.
packages/core/src/markdown/frontmatter.ts Bumps FORMAT_VERSION and emits source/legal_status + new optional fields.
packages/core/src/markdown/frontmatter.test.ts Updates tests for required source/legal_status and new version.
packages/core/src/index.ts Updates barrel exports (USLM constants path; USLM builder aliasing; new types).
packages/core/src/index.test.ts Updates expected FORMAT_VERSION.
packages/core/src/ast/uslm-builder.ts Updates imports for renamed USLM constants module.
packages/core/src/ast/uslm-builder.test.ts Updates test imports for builder rename.
packages/core/src/ast/types.ts Adds SourceType/LegalStatus; makes source/legal_status required in FrontmatterData.
packages/core/README.md Updates docs for multi-source support and new APIs/version.
packages/core/CLAUDE.md Updates architecture docs for multi-source and new module names/version.
packages/cli/src/parse-titles.ts Adds maxTitle parameter to support differing title ranges (54 vs 50).
packages/cli/src/index.ts Registers source-specific commands; adds stub download/convert guidance.
packages/cli/src/index.test.ts Updates CLI help tests for new commands and stub behavior.
packages/cli/src/commands/download-usc.ts Renames command to download-usc and updates help text.
packages/cli/src/commands/download-ecfr.ts Adds new download-ecfr command.
packages/cli/src/commands/convert-usc.ts Renames command to convert-usc and updates help text.
packages/cli/src/commands/convert-usc.test.ts Updates imports for renamed USC convert command module.
packages/cli/src/commands/convert-ecfr.ts Adds new convert-ecfr command with multi-title support.
packages/cli/package.json Adds dependency on @lexbuild/ecfr.
packages/cli/README.md Updates CLI docs for new command names + eCFR usage/output.
packages/cli/CLAUDE.md Updates CLI architecture docs for multi-source commands.
fixtures/fragments/ecfr/title-structure.xml Adds eCFR fixture for hierarchy parsing.
fixtures/fragments/ecfr/simple-section.xml Adds basic eCFR section fixture.
fixtures/fragments/ecfr/section-with-table.xml Adds eCFR fixture for HTML-table parsing.
fixtures/fragments/ecfr/section-with-notes.xml Adds eCFR fixture for AUTH/SOURCE/CITA/SECAUTH notes.
fixtures/fragments/ecfr/section-with-emphasis.xml Adds eCFR emphasis/FP fixture.
fixtures/fragments/ecfr/section-with-authority.xml Adds eCFR part-level authority/source fixture.
fixtures/fragments/ecfr/appendix.xml Adds eCFR appendix fixture.
fixtures/expected/title-granularity.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/table.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/subsections.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/status-transferred.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/status-reserved.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/status-repealed.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/status-current.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/simple-section.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/notes-statutory-only.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/notes-none.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/notes-amendments-only.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/notes-all.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/layout.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/duplicate-second.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/duplicate-other.md Updates USC expected output for new frontmatter fields/version.
fixtures/expected/duplicate-first.md Updates USC expected output for new frontmatter fields/version.
README.md Updates root README for multi-source support, commands, and output/docs.
CLAUDE.md Updates root architecture guide for multi-source + eCFR details.
.changeset/thin-pants-mix.md Changeset: README updates across packages.
.changeset/odd-crabs-mate.md Changeset: adds chapter granularity for eCFR.
.changeset/four-years-reply.md Changeset: source-namespaced CLI commands.
.changeset/floppy-kiwis-cheat.md Changeset: multi-source refactor + eCFR support.
.changeset/config.json Adds @lexbuild/ecfr to the fixed versioning group.
.changeset/brave-pets-create.md Changeset: reserved-title skipping in downloader.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

Comment thread packages/ecfr/src/converter.ts Outdated
Comment thread packages/ecfr/src/ecfr-frontmatter.ts
Comment thread packages/ecfr/src/ecfr-path.ts
Comment thread packages/ecfr/src/converter.ts
Comment thread packages/ecfr/src/converter.ts Outdated
Comment thread packages/ecfr/src/converter.ts
Comment thread packages/ecfr/src/ecfr-builder.ts
Comment thread packages/core/src/ast/types.ts
Comment thread packages/ecfr/src/converter.ts Outdated
Comment thread packages/ecfr/src/converter.ts
…lder

  - Fix broken duplicate section link registration
  - Fix enrichPartLevelNotes being a no-op
  - Fix chapter granularity producing double-nested paths
  - Remove redundant dryRun guard around writeMetaFiles
  - Remove unused ECFR_TABLE_ELEMENTS import from ecfr-builder.ts
  - Implement true two-pass link registration: pass 1 computes output
    paths and registers all identifiers, pass 2 renders and writes files.
    Forward cross-references now resolve correctly.
  - Fix misleading chapter_number comment in ecfr-frontmatter.ts
  - Add empty string fallback in sanitizeFilename to prevent bare .md
    output filenames
  - Remove unused ignoreDepth field from EcfrASTBuilder
  - Compute meaningful partCount for part and chapter granularities
    instead of always returning 0
  - Replace non-null assertions with safe narrowing in pass 2 loop
@sonarqubecloud
Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
7 Security Hotspots
6.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@chris-c-thomas chris-c-thomas merged commit 49e6e61 into main Mar 14, 2026
6 of 7 checks passed
@github-actions github-actions Bot mentioned this pull request Mar 14, 2026
@chris-c-thomas chris-c-thomas deleted the feat/multi-source-refactor branch March 15, 2026 02:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file dev enhancement Enhancements made to exisiting features or codebase feature New feature implemented main

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants