Refactor multi-source commands and add eCFR package support#40
Refactor multi-source commands and add eCFR package support#40chris-c-thomas merged 16 commits intomainfrom
Conversation
…structured markdown.
Greptile SummaryThis PR introduces a significant multi-source architecture refactor, adding the Key issues found in the new
Confidence Score: 2/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[eCFR XML File] --> B[XMLParser SAX stream]
B --> C[EcfrASTBuilder\nonOpenElement / onCloseElement / onText]
C --> D{emitAt level}
D -->|section| E[Emit LevelNode at section]
D -->|part| F[Emit LevelNode at part]
D -->|title| G[Emit LevelNode at title]
E --> H[collected: CollectedSection array]
F --> H
G --> H
H --> I{granularity}
I -->|section| J[Two-pass duplicate detection\n+ link registration]
I -->|chapter| K[Group by chapter ancestor\nBuild synthetic LevelNode]
I -->|part / title| L[Filter to target level]
J --> M[buildEcfrFrontmatter]
K --> M
L --> M
M --> N[buildEcfrOutputPath]
N --> O[renderDocument → Markdown]
O --> P[writeFile to disk]
P --> Q{section granularity?}
Q -->|yes| R[writeMetaFiles\n_meta.json + README.md]
Q -->|no| S[Return EcfrConvertResult]
R --> S
Prompt To Fix All With AIThis is a comment left during a code review.
Path: packages/ecfr/src/converter.ts
Line: 213-219
Comment:
**Dead code and overwritten link registration for duplicates**
The condition `suffix && occurrence === 1` can **never be true**. `suffix` is only set when `occurrence > 1` (see the assignment on line 195: `const suffix = total > 1 && occurrence > 1 ? \`-${occurrence}\` : ""`). So the inner `linkResolver.register` call is unreachable dead code.
More critically, this means that for duplicate sections, the canonical identifier gets registered **twice** — once pointing to the first occurrence's path (correct), and then overwritten to point to the last duplicate's suffixed path. Cross-references using the canonical identifier will resolve to the last duplicate instead of the primary occurrence.
The intended behaviour is likely to register the canonical identifier once (to the first occurrence) and skip re-registering for subsequent duplicates. The fix should be:
```suggestion
if (node.identifier) {
// Only register the canonical identifier for the first occurrence so that
// cross-references resolve to the primary file, not a later duplicate.
if (occurrence === 1) {
linkResolver.register(node.identifier, suffixedPath);
}
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/ecfr/src/converter.ts
Line: 354-372
Comment:
**`enrichPartLevelNotes` is always a no-op for section granularity**
This function searches `collected` for a node whose `identifier` matches the part ancestor. However, `collected` contains only **section-level** nodes — because `emitAt` is set to `"section"` for section granularity (line 119), part-level nodes are never emitted and never appear in `collected`. The `for` loop at line 357 will never find a match, so authority and regulatory source are never enriched.
If part-level `AUTH`/`SOURCE` notes need to be propagated to sections, the builder needs to capture those notes when the part node is being closed (e.g., store them in a side map keyed by part identifier) rather than searching for a part node in the section-level collected array.
```typescript
// Current: will never find a part node in collected
for (const item of collected) {
if (item.node.identifier === partIdentifier) { // always false
...
}
}
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/ecfr/src/ecfr-path.ts
Line: 33-37
Comment:
**Chapter granularity creates a double-nested path**
When `node.levelType === "chapter"`, `segments` already contains `chapter-${chapterNum}` (added at line 27–29 because the chapter is found in `context.ancestors`). Then `chapter-${chapNum}.md` is appended again, producing paths like `output/ecfr/title-17/chapter-I/chapter-I.md` instead of the expected `output/ecfr/title-17/chapter-I.md`.
The test suite confirms that sections emit their chapter ancestor in `context.ancestors` (see `ecfr-builder.test.ts` line 61), so `chapterNum` will always be found when the node is a chapter.
The fix is to skip the ancestor-based chapter directory segment when the node itself is a chapter:
```suggestion
} else if (node.levelType === "chapter") {
// Chapter-level file — one file per chapter directly inside title dir (no subdirectory)
const chapNum = node.numValue ?? "0";
return join(outputRoot, "ecfr", titleDir, `chapter-${chapNum}.md`);
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/ecfr/src/converter.ts
Line: 242-245
Comment:
**Redundant `!dryRun` guard**
At this point in the code `dryRun` is always `false` — the function returns early at line 166–168 when `dryRun` is `true`. This `if (!dryRun)` guard can be removed to reduce dead-branch confusion.
```suggestion
await writeMetaFiles(sectionMetas, titleNumber, titleName, output, granularity, input);
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: packages/ecfr/src/ecfr-builder.ts
Line: 38
Comment:
**Unused import `ECFR_TABLE_ELEMENTS`**
`ECFR_TABLE_ELEMENTS` is imported here but never referenced anywhere in this file. Table elements (`TABLE`, `TR`, `TH`, `TD`) are handled through explicit string comparisons in `onOpenElement` and `onCloseElement`. The import can be removed to keep the import list accurate.
```suggestion
ECFR_REF_ELEMENTS,
```
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: 2a400d2 |
There was a problem hiding this comment.
Pull request overview
This PR refactors LexBuild into a multi-source architecture and adds first-class eCFR support alongside existing U.S. Code (USC) conversion, including updated frontmatter for source/provenance discrimination and CLI command namespacing.
Changes:
- Introduces new
@lexbuild/ecfrpackage (builder, converter, downloader, fixtures/tests) and integrates it into the workspace/CLI. - Updates
@lexbuild/corefor multi-source identifiers (USC + CFR), frontmatter schema (source,legal_status, format version bump), and USLM element classification renames. - Namespaces CLI commands by source (
download-usc/convert-usc,download-ecfr/convert-ecfr) and updates docs/READMEs accordingly.
Reviewed changes
Copilot reviewed 71 out of 73 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| pnpm-lock.yaml | Adds workspace linkage for @lexbuild/ecfr and lockfile updates. |
| packages/usc/src/converter.ts | Adds source/legal_status to USC frontmatter. |
| packages/usc/src/converter.test.ts | Adjusts status assertion to avoid legal_status substring collisions. |
| packages/usc/README.md | Updates USC package description for multi-source messaging. |
| packages/ecfr/tsup.config.ts | Adds build config for new @lexbuild/ecfr package. |
| packages/ecfr/tsconfig.json | Adds TS config for new @lexbuild/ecfr package. |
| packages/ecfr/src/index.ts | Barrel exports for eCFR converter/downloader/builder/constants. |
| packages/ecfr/src/ecfr-path.ts | Implements eCFR output path conventions. |
| packages/ecfr/src/ecfr-frontmatter.ts | Implements eCFR frontmatter construction. |
| packages/ecfr/src/ecfr-elements.ts | Defines eCFR element classification sets/maps. |
| packages/ecfr/src/ecfr-builder.ts | Implements SAX→AST builder for eCFR GPO/SGML XML. |
| packages/ecfr/src/ecfr-builder.test.ts | Adds unit tests for eCFR builder behavior. |
| packages/ecfr/src/downloader.ts | Adds govinfo eCFR title downloader. |
| packages/ecfr/src/converter.ts | Adds eCFR conversion orchestrator + metadata emission. |
| packages/ecfr/package.json | Defines new published package metadata/scripts/deps. |
| packages/ecfr/README.md | Documents eCFR package usage, output, and frontmatter. |
| packages/ecfr/CLAUDE.md | Adds package architecture/schema notes for eCFR. |
| packages/core/src/xml/uslm-elements.ts | Renames/clarifies USLM element+namespace constants module. |
| packages/core/src/xml/uslm-elements.test.ts | Updates tests to import the renamed module. |
| packages/core/src/xml/parser.ts | Updates parser imports to new USLM constants module. |
| packages/core/src/xml/parser.test.ts | Updates parser tests to import renamed constants module. |
| packages/core/src/markdown/links.ts | Extends identifier parsing and fallback URLs to CFR. |
| packages/core/src/markdown/frontmatter.ts | Bumps FORMAT_VERSION and emits source/legal_status + new optional fields. |
| packages/core/src/markdown/frontmatter.test.ts | Updates tests for required source/legal_status and new version. |
| packages/core/src/index.ts | Updates barrel exports (USLM constants path; USLM builder aliasing; new types). |
| packages/core/src/index.test.ts | Updates expected FORMAT_VERSION. |
| packages/core/src/ast/uslm-builder.ts | Updates imports for renamed USLM constants module. |
| packages/core/src/ast/uslm-builder.test.ts | Updates test imports for builder rename. |
| packages/core/src/ast/types.ts | Adds SourceType/LegalStatus; makes source/legal_status required in FrontmatterData. |
| packages/core/README.md | Updates docs for multi-source support and new APIs/version. |
| packages/core/CLAUDE.md | Updates architecture docs for multi-source and new module names/version. |
| packages/cli/src/parse-titles.ts | Adds maxTitle parameter to support differing title ranges (54 vs 50). |
| packages/cli/src/index.ts | Registers source-specific commands; adds stub download/convert guidance. |
| packages/cli/src/index.test.ts | Updates CLI help tests for new commands and stub behavior. |
| packages/cli/src/commands/download-usc.ts | Renames command to download-usc and updates help text. |
| packages/cli/src/commands/download-ecfr.ts | Adds new download-ecfr command. |
| packages/cli/src/commands/convert-usc.ts | Renames command to convert-usc and updates help text. |
| packages/cli/src/commands/convert-usc.test.ts | Updates imports for renamed USC convert command module. |
| packages/cli/src/commands/convert-ecfr.ts | Adds new convert-ecfr command with multi-title support. |
| packages/cli/package.json | Adds dependency on @lexbuild/ecfr. |
| packages/cli/README.md | Updates CLI docs for new command names + eCFR usage/output. |
| packages/cli/CLAUDE.md | Updates CLI architecture docs for multi-source commands. |
| fixtures/fragments/ecfr/title-structure.xml | Adds eCFR fixture for hierarchy parsing. |
| fixtures/fragments/ecfr/simple-section.xml | Adds basic eCFR section fixture. |
| fixtures/fragments/ecfr/section-with-table.xml | Adds eCFR fixture for HTML-table parsing. |
| fixtures/fragments/ecfr/section-with-notes.xml | Adds eCFR fixture for AUTH/SOURCE/CITA/SECAUTH notes. |
| fixtures/fragments/ecfr/section-with-emphasis.xml | Adds eCFR emphasis/FP fixture. |
| fixtures/fragments/ecfr/section-with-authority.xml | Adds eCFR part-level authority/source fixture. |
| fixtures/fragments/ecfr/appendix.xml | Adds eCFR appendix fixture. |
| fixtures/expected/title-granularity.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/table.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/subsections.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/status-transferred.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/status-reserved.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/status-repealed.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/status-current.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/simple-section.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/notes-statutory-only.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/notes-none.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/notes-amendments-only.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/notes-all.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/layout.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/duplicate-second.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/duplicate-other.md | Updates USC expected output for new frontmatter fields/version. |
| fixtures/expected/duplicate-first.md | Updates USC expected output for new frontmatter fields/version. |
| README.md | Updates root README for multi-source support, commands, and output/docs. |
| CLAUDE.md | Updates root architecture guide for multi-source + eCFR details. |
| .changeset/thin-pants-mix.md | Changeset: README updates across packages. |
| .changeset/odd-crabs-mate.md | Changeset: adds chapter granularity for eCFR. |
| .changeset/four-years-reply.md | Changeset: source-namespaced CLI commands. |
| .changeset/floppy-kiwis-cheat.md | Changeset: multi-source refactor + eCFR support. |
| .changeset/config.json | Adds @lexbuild/ecfr to the fixed versioning group. |
| .changeset/brave-pets-create.md | Changeset: reserved-title skipping in downloader. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
…lder - Fix broken duplicate section link registration - Fix enrichPartLevelNotes being a no-op - Fix chapter granularity producing double-nested paths - Remove redundant dryRun guard around writeMetaFiles - Remove unused ECFR_TABLE_ELEMENTS import from ecfr-builder.ts
- Implement true two-pass link registration: pass 1 computes output
paths and registers all identifiers, pass 2 renders and writes files.
Forward cross-references now resolve correctly.
- Fix misleading chapter_number comment in ecfr-frontmatter.ts
- Add empty string fallback in sanitizeFilename to prevent bare .md
output filenames
- Remove unused ignoreDepth field from EcfrASTBuilder
- Compute meaningful partCount for part and chapter granularities
instead of always returning 0
- Replace non-null assertions with safe narrowing in pass 2 loop
|


This pull request introduces a major refactor to support multiple legal XML sources, most notably adding eCFR (Code of Federal Regulations) support alongside the existing U.S. Code. It updates documentation, expands CLI commands, improves output file structure, and enhances frontmatter metadata for source discrimination. Several minor and patch changes also improve usability and accuracy. The most important changes are grouped below:
Multi-source architecture and eCFR support:
@lexbuild/ecfrpackage, and implemented conversion of bulk eCFR XML to structured Markdown.CLAUDE.md) throughout the repo to describe the new multi-source architecture, identifier schemes, output file structure, and the process for adding new source types. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]CLI and conversion command improvements:
download-usc,convert-ecfr), with bare commands now showing a source selection error. [1] [2] [3]Output and metadata enhancements:
sourceandlegal_statusfields for source discrimination, and bumpedformat_versionto1.1.0. [1] [2] [3] [4] [5] [6] [7]Downloader and configuration improvements:
.changeset/config.jsonto include the new@lexbuild/ecfrpackage in the fixed group. [1] [2] [3]Documentation and minor updates:
These changes lay the groundwork for supporting additional legal XML sources, improve usability and clarity for users, and ensure output files are properly annotated for downstream consumers.