feat: add document inventory queue by elasticdotventures · Pull Request #31 · PromptExecution/ledgrrr

elasticdotventures · 2026-04-17T11:01:02Z

Closes #23

Summary

- add a first-class l3dg3rr_document_inventory MCP tool for deterministic document discovery and triage
- derive queue state from filesystem PDFs plus known ingested artifacts, with no claim/prioritization semantics in this first cut
- expose stable status fields (invalid_name, ready, ingested) plus small-model hints and filename metadata

Validation

cargo test -p ledgerr-mcp
just test

…cture

Merges origin/main (PRs #27-#30: MCP surface collapse, codegen contract, workbook export, persist state) into codex-issue-23-document-queue.

Resolution strategy:
- Keep origin/main's 7-tool contract architecture throughout
- Retain document inventory helpers (lib.rs) and service call path
- Add DocumentInventory as a new action variant in DocumentsArgs so document_inventory is now reachable as ledgerr_documents + action
- Add handle_document_inventory + parse helpers to mcp_adapter.rs
- Update PUBLISHED_TOOLS actions list and mcp-capability-contract.md to include document_inventory in the ledgerr_documents row

All 41 test suites pass including contract_codegen schema/doc drift checks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Adds a deterministic “document inventory / queue” capability to the ledgerr_documents MCP surface, enabling agents to discover PDFs in a directory and triage them into stable status buckets (invalid_name, ready, ingested) with small-model hints.

Changes:

- Extend the ledgerr_documents contract/action set with document_inventory and document status types.
- Implement filesystem-derived inventory in TurboLedgerService and expose it via the MCP adapter with a summary payload.
- Add service-level tests covering ready/ingested/invalid classification and status filtering.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| docs/mcp-capability-contract.md | Advertises `document_inventory` as an action under `ledgerr_documents`. |
| crates/ledgerr-mcp/src/contract.rs | Adds `document_inventory` to the published actions and introduces `DocumentsArgs::DocumentInventory`. |
| crates/ledgerr-mcp/src/lib.rs | Implements document inventory discovery, status derivation, and response structs. |
| crates/ledgerr-mcp/src/mcp_adapter.rs | Wires `document_inventory` through the transport adapter and adds `summary.status_counts`. |
| crates/ledgerr-mcp/tests/document_inventory.rs | Adds tests for deterministic ordering, statuses, and filtering. |

Copilot · 2026-04-17T23:23:29Z

 | Tool | Purpose | Common actions |
 |---|---|---|
-| `ledgerr_documents` | document intake, routing, manifest/account discovery, raw context retrieval | `list_accounts`, `pipeline_status`, `validate_filename`, `ingest_pdf`, `ingest_rows`, `get_raw_context` |
+| `ledgerr_documents` | document intake, routing, manifest/account discovery, raw context retrieval | `list_accounts`, `pipeline_status`, `validate_filename`, `ingest_pdf`, `ingest_rows`, `get_raw_context`, `document_inventory` |


PR description mentions a first-class l3dg3rr_document_inventory MCP tool, but the published surface documented here adds document_inventory as an action under ledgerr_documents (and there is no corresponding l3dg3rr_* compatibility alias wired). Either update the PR description to match the contract approach, or add the promised alias/tool if that’s required for back-compat.

Copilot · 2026-04-17T23:23:30Z

+#[test]
+fn document_inventory_lists_ready_ingested_and_invalid_documents_deterministically() {
+    let tmp = tempfile::tempdir().expect("tempdir");
+    let workbook_path = tmp.path().join("tax-ledger.xlsx");
+    let service = service_for(&workbook_path);


These tests cover the service-level document_inventory behavior, but the PR also introduces an MCP-facing action (ledgerr_documents/document_inventory) with a specific JSON envelope (including summary). Consider adding an adapter/contract-level test that calls mcp_adapter::handle_documents_tool for this action so the transport payload shape and argument validation remain deterministic over time.

Copilot · 2026-04-17T23:23:30Z

+fn source_ref_matches(source_ref: &std::path::Path, expected: &std::path::Path) -> bool {
+    let source_canonical = std::fs::canonicalize(source_ref).ok();
+    let expected_canonical = std::fs::canonicalize(expected).ok();
+    source_canonical.as_ref() == expected_canonical.as_ref()
+        || source_ref == expected
+        || source_ref.file_name() == expected.file_name()


source_ref_matches falls back to comparing only file_name() when canonicalization/path equality fails. This can incorrectly mark a document as ingested if two different directories contain the same *.rkyv filename (or if one path doesn’t exist), producing false-positive Ingested statuses. Prefer matching by canonicalized absolute path (and/or a normalized path relative to the allowed base) and remove the filename-only fallback.

Suggested change

-fn source_ref_matches(source_ref: &std::path::Path, expected: &std::path::Path) -> bool {
-    let source_canonical = std::fs::canonicalize(source_ref).ok();
-    let expected_canonical = std::fs::canonicalize(expected).ok();
-    source_canonical.as_ref() == expected_canonical.as_ref()
-        || source_ref == expected
-        || source_ref.file_name() == expected.file_name()
+fn normalized_path_for_match(path: &std::path::Path) -> PathBuf {
+    let mut normalized = PathBuf::new();
+    for component in path.components() {
+        match component {
+            std::path::Component::CurDir => {}
+            std::path::Component::ParentDir => {
+                normalized.pop();
+            }
+            other => normalized.push(other.as_os_str()),
+        }
+    }
+    normalized
+}
+fn source_ref_matches(source_ref: &std::path::Path, expected: &std::path::Path) -> bool {
+    match (
+        std::fs::canonicalize(source_ref),
+        std::fs::canonicalize(expected),
+    ) {
+        (Ok(source_canonical), Ok(expected_canonical)) => source_canonical == expected_canonical,
+        _ => normalized_path_for_match(source_ref) == normalized_path_for_match(expected),
+    }
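
To illustrate the false-positive concern, here is a minimal standalone sketch (not code from the PR). The helper names `normalize` and `filename_only_match` and the example paths are hypothetical; `normalize` mirrors the lexical normalization in the suggestion above and deliberately does not touch the filesystem.

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically normalize a path: drop `.` and resolve `..` without filesystem access.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// The fallback under discussion: compares only the final path component.
fn filename_only_match(a: &Path, b: &Path) -> bool {
    a.file_name() == b.file_name()
}

fn main() {
    // Hypothetical paths: same file name, different directories.
    let a = Path::new("/data/2023/statement.rkyv");
    let b = Path::new("/data/2024/statement.rkyv");

    // Filename-only comparison treats them as the same artifact.
    assert!(filename_only_match(a, b));
    // Whole-path comparison keeps them distinct.
    assert_ne!(normalize(a), normalize(b));

    // Equivalent spellings of the same path still match after normalization.
    let c = Path::new("/data/2024/tmp/../statement.rkyv");
    assert_eq!(normalize(c), normalize(b));
}
```

Note that lexical normalization does not resolve symlinks, which is why the suggestion prefers `canonicalize` whenever both paths exist.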

Copilot · 2026-04-17T23:23:30Z

+    let resolved = if directory.is_absolute() {
+        directory.to_path_buf()
+    } else {
+        let base = workbook_path
+            .parent()
+            .filter(|parent| !parent.as_os_str().is_empty())
+            .map(std::path::Path::to_path_buf)
+            .unwrap_or(std::env::current_dir().map_err(|e| ToolError::Internal(e.to_string()))?);
+        base.join(directory)
+    };
+
+    if !resolved.is_dir() {
+        return Err(ToolError::InvalidInput(format!(
+            "directory '{}' does not exist or is not a directory",
+            resolved.display()
+        )));
+    }
+    Ok(resolved)


resolve_document_inventory_directory currently accepts any absolute directory path. This makes the MCP-exposed inventory capable of listing PDFs anywhere on the host filesystem, which is inconsistent with the existing source_ref/rkyv_ref safety model that constrains paths under workbook_path.parent(). Consider applying the same allowed-base restriction (and ideally canonicalizing before the prefix check to avoid symlink escapes).

Suggested change

-    let resolved = if directory.is_absolute() {
-        directory.to_path_buf()
-    } else {
-        let base = workbook_path
-            .parent()
-            .filter(|parent| !parent.as_os_str().is_empty())
-            .map(std::path::Path::to_path_buf)
-            .unwrap_or(std::env::current_dir().map_err(|e| ToolError::Internal(e.to_string()))?);
-        base.join(directory)
-    };
-
-    if !resolved.is_dir() {
-        return Err(ToolError::InvalidInput(format!(
-            "directory '{}' does not exist or is not a directory",
-            resolved.display()
-        )));
-    }
-    Ok(resolved)
+    let base = workbook_path
+        .parent()
+        .filter(|parent| !parent.as_os_str().is_empty())
+        .map(std::path::Path::to_path_buf)
+        .unwrap_or(std::env::current_dir().map_err(|e| ToolError::Internal(e.to_string()))?);
+    let canonical_base =
+        std::fs::canonicalize(&base).map_err(|e| ToolError::Internal(e.to_string()))?;
+    let resolved = if directory.is_absolute() {
+        directory.to_path_buf()
+    } else {
+        base.join(directory)
+    };
+    let canonical_resolved = std::fs::canonicalize(&resolved).map_err(|_| {
+        ToolError::InvalidInput(format!(
+            "directory '{}' does not exist or is not a directory",
+            resolved.display()
+        ))
+    })?;
+    if !canonical_resolved.is_dir() {
+        return Err(ToolError::InvalidInput(format!(
+            "directory '{}' does not exist or is not a directory",
+            resolved.display()
+        )));
+    }
+    if !canonical_resolved.starts_with(&canonical_base) {
+        return Err(ToolError::InvalidInput(format!(
+            "directory '{}' must be within '{}'",
+            resolved.display(),
+            canonical_base.display()
+        )));
+    }
+    Ok(canonical_resolved)
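
The containment check can be exercised in isolation. Below is a minimal sketch, assuming a hypothetical helper `resolve_within` that uses `std::io::Error` in place of the crate's `ToolError`; the directory names are made up for the demonstration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve `requested` against `base` and reject anything
/// that, after canonicalization, is not a directory inside `base`.
fn resolve_within(base: &Path, requested: &Path) -> io::Result<PathBuf> {
    let canonical_base = fs::canonicalize(base)?;
    let joined = if requested.is_absolute() {
        requested.to_path_buf()
    } else {
        base.join(requested)
    };
    // canonicalize fails if the path does not exist, and resolves `..` and symlinks.
    let canonical = fs::canonicalize(&joined)?;
    if !canonical.is_dir() {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "not a directory"));
    }
    // Path::starts_with compares whole components, so `/x/ledger2` is not inside `/x/ledger`.
    if !canonical.starts_with(&canonical_base) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "outside allowed base"));
    }
    Ok(canonical)
}

fn main() -> io::Result<()> {
    // Scratch layout under the system temp dir (names are illustrative only).
    let root = std::env::temp_dir().join(format!("inventory-sketch-{}", std::process::id()));
    let base = root.join("ledger");
    let outside = root.join("elsewhere");
    fs::create_dir_all(base.join("inbox"))?;
    fs::create_dir_all(&outside)?;

    assert!(resolve_within(&base, Path::new("inbox")).is_ok());
    // A relative escape and an absolute path outside the base are both rejected.
    assert!(resolve_within(&base, Path::new("../elsewhere")).is_err());
    assert!(resolve_within(&base, &outside).is_err());

    fs::remove_dir_all(&root)?;
    Ok(())
}
```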

Copilot · 2026-04-17T23:23:31Z

+        let known_source_refs = self
+            .classification_state
+            .lock()
+            .map_err(|_| ToolError::Internal("classification lock poisoned".to_string()))?
+            .tx_rows


This implementation builds known_source_refs and then, for each discovered PDF, scans the entire list to derive ingested_tx_ids (and source_ref_matches may canonicalize paths during each comparison). For large ledgers/directories this becomes O(num_docs × num_rows) with significant filesystem calls. Consider precomputing an index (e.g., canonical raw-context path -> tx_ids) once per request to make per-document lookup O(1) and avoid repeated canonicalize() calls.
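
One possible shape for such an index, as a standalone sketch. The `(source_ref, tx_id)` row tuple and the helper names `lookup_key` and `build_index` are assumptions for illustration; the real row type in `classification_state` is not shown in this diff.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};

/// Canonicalize once per path; fall back to the path as given when it does not exist.
fn lookup_key(path: &Path) -> PathBuf {
    fs::canonicalize(path).unwrap_or_else(|_| path.to_path_buf())
}

/// Build the index once per request: lookup key -> tx ids recorded for that source.
fn build_index(rows: &[(PathBuf, String)]) -> HashMap<PathBuf, Vec<String>> {
    let mut index: HashMap<PathBuf, Vec<String>> = HashMap::new();
    for (source_ref, tx_id) in rows {
        index.entry(lookup_key(source_ref)).or_default().push(tx_id.clone());
    }
    index
}

fn main() {
    // Hypothetical rows; the directory is assumed not to exist, so keys stay as written.
    let rows = vec![
        (PathBuf::from("/no-such-sketch-dir/a.rkyv"), "tx-1".to_string()),
        (PathBuf::from("/no-such-sketch-dir/b.rkyv"), "tx-2".to_string()),
        (PathBuf::from("/no-such-sketch-dir/a.rkyv"), "tx-3".to_string()),
    ];
    let index = build_index(&rows);

    // Per-document lookup is a single hash probe instead of a scan over all rows.
    let hits = index.get(&lookup_key(Path::new("/no-such-sketch-dir/a.rkyv")));
    assert_eq!(hits, Some(&vec!["tx-1".to_string(), "tx-3".to_string()]));
    assert!(index.get(&lookup_key(Path::new("/no-such-sketch-dir/c.rkyv"))).is_none());
}
```

With this shape, `canonicalize` runs once per row and once per discovered document rather than once per (document, row) pair.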

Copilot · 2026-04-17T23:23:31Z

+        } => {
+            let status_filter: Vec<DocumentQueueStatusRequest> = statuses
+                .into_iter()
+                .filter_map(|s| DocumentQueueStatusRequest::parse(s.as_str()))
+                .collect();
+            handle_document_inventory(
+                service,
+                &json!({
+                    "directory": directory,
+                    "recursive": recursive,
+                    "statuses": status_filter.iter().map(|s| s.as_str()).collect::<Vec<_>>(),
+                }),
+            )
+        }


In the contract-driven path, invalid status strings are silently dropped via filter_map, which means callers can pass unsupported values without getting an error (and may get an unfiltered result set). Since parse_document_inventory_request already treats unsupported statuses as InvalidInput, consider validating here too (e.g., return an error when any status fails to parse) so contract + direct JSON parsing behave consistently.

Suggested change

-        } => {
-            let status_filter: Vec<DocumentQueueStatusRequest> = statuses
-                .into_iter()
-                .filter_map(|s| DocumentQueueStatusRequest::parse(s.as_str()))
-                .collect();
-            handle_document_inventory(
-                service,
-                &json!({
-                    "directory": directory,
-                    "recursive": recursive,
-                    "statuses": status_filter.iter().map(|s| s.as_str()).collect::<Vec<_>>(),
-                }),
-            )
-        }
+        } => handle_document_inventory(
+            service,
+            &json!({
+                "directory": directory,
+                "recursive": recursive,
+                "statuses": statuses,
+            }),
+        )
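
The behavioral difference between dropping and rejecting unknown statuses can be shown with a small standalone sketch. The `Status` enum and the functions `parse_lenient` and `parse_strict` are hypothetical stand-ins, not the crate's types; only the three status strings come from this PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    InvalidName,
    Ready,
    Ingested,
}

impl Status {
    fn parse(s: &str) -> Option<Self> {
        match s {
            "invalid_name" => Some(Self::InvalidName),
            "ready" => Some(Self::Ready),
            "ingested" => Some(Self::Ingested),
            _ => None,
        }
    }
}

/// filter_map silently discards anything that fails to parse.
fn parse_lenient(inputs: &[&str]) -> Vec<Status> {
    inputs.iter().filter_map(|s| Status::parse(s)).collect()
}

/// Collecting into Result stops at the first unsupported value and reports it.
fn parse_strict(inputs: &[&str]) -> Result<Vec<Status>, String> {
    inputs
        .iter()
        .map(|s| Status::parse(s).ok_or_else(|| format!("unsupported status '{s}'")))
        .collect()
}

fn main() {
    let requested = ["ready", "bogus"];

    // Lenient: the typo disappears and the caller never learns about it.
    assert_eq!(parse_lenient(&requested), vec![Status::Ready]);

    // Strict: the same input is rejected with a descriptive error.
    assert_eq!(
        parse_strict(&requested),
        Err("unsupported status 'bogus'".to_string())
    );
    assert_eq!(
        parse_strict(&["invalid_name", "ingested"]),
        Ok(vec![Status::InvalidName, Status::Ingested])
    );
}
```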

Copilot · 2026-04-17T23:23:31Z

+pub enum DocumentQueueStatusRequest {
+    InvalidName,
+    Ready,
+    Ingested,
+}
+
+impl DocumentQueueStatusRequest {


DocumentQueueStatusRequest is used both to filter requests (DocumentInventoryRequest.statuses) and as the response status type (DocumentRecordResponse.status). The *Request suffix is misleading for a type that is also returned to clients; consider renaming to a neutral type name (e.g., DocumentQueueStatus) and introducing a separate request filter type only if needed.

Suggested change

-pub enum DocumentQueueStatusRequest {
-    InvalidName,
-    Ready,
-    Ingested,
-}
-
-impl DocumentQueueStatusRequest {
+pub enum DocumentQueueStatus {
+    InvalidName,
+    Ready,
+    Ingested,
+}
+
+pub type DocumentQueueStatusRequest = DocumentQueueStatus;
+
+impl DocumentQueueStatus {
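
As a quick check that the alias keeps existing call sites compiling, here is a minimal standalone sketch. The `as_str` body is an assumption based on the status strings named in this PR, not a copy of the crate's implementation.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocumentQueueStatus {
    InvalidName,
    Ready,
    Ingested,
}

// Old name kept as an alias so existing request-side code continues to compile.
pub type DocumentQueueStatusRequest = DocumentQueueStatus;

impl DocumentQueueStatus {
    // Assumed mapping to the wire strings used in this PR.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::InvalidName => "invalid_name",
            Self::Ready => "ready",
            Self::Ingested => "ingested",
        }
    }
}

fn main() {
    // Variants and methods are reachable through either name.
    let legacy: DocumentQueueStatusRequest = DocumentQueueStatusRequest::Ready;
    let renamed: DocumentQueueStatus = DocumentQueueStatus::Ready;
    assert_eq!(legacy, renamed);
    assert_eq!(legacy.as_str(), "ready");
}
```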

feat: add document inventory queue

0dcd8f2

elasticdotventures mentioned this pull request Apr 17, 2026

Add document inventory and queue capabilities for back-office operation #23

Closed

Claude Sonnet (coordinator) and others added 2 commits April 17, 2026 23:15

fix(docker): copy docs and scripts into builder so contract tests can…

5b6f7d7

… read generated artifacts Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

elasticdotventures marked this pull request as ready for review April 17, 2026 23:17

Copilot AI review requested due to automatic review settings April 17, 2026 23:17

elasticdotventures merged commit dbe3fdb into main Apr 17, 2026
3 checks passed

elasticdotventures merged 3 commits into main from codex-issue-23-document-queue
