tests: add tests for OCRDriver: [#2964] #3028
Merged
Adds offline OCRDriver.test.ts covering both providers:
• test_mode short-circuit; argument validation (missing actor, missing
source, unknown provider, AWS/Mistral not configured)
• aws-textract: raw-bytes vs S3Object source selection (regional
client when fsEntry has a bucket), block normalisation (PAGE/WORD/
TABLE filtered, LINE/LAYOUT_TITLE → text/textract:* blocks),
402 on insufficient credits, per-page metering
• mistral: image vs PDF chunk packaging (image_url with base64 data
URL vs document_url with documentName), pass-through of pages /
annotation / image-limit options, markdown → LINE-block
normalisation with page indices, per-page metering, additional
annotations metering when bbox/document annotation formats are set
• default-provider selection (AWS preferred → Mistral fallback →
500 when neither is configured)
• getReportedCosts mirrors costs.ts
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
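A minimal sketch of two of the behaviours listed above: Textract block normalisation and default-provider selection. This is not the driver's code; the type shapes, the function names `normaliseBlocks` and `pickDefaultProvider`, the `HttpError` class, and the exact `text/textract:<BlockType>` type string are assumptions made for illustration. Only the filtered block types, the provider names, and the 500 status come from the description.

```typescript
// Hypothetical shapes: the real driver's types are not shown in this PR.
interface TextractBlock {
  BlockType: string;
  Text?: string;
  Page?: number;
}

interface NormalisedBlock {
  type: string;
  text: string;
  page: number | null;
}

// Block types the description says are filtered out.
const DROPPED_BLOCK_TYPES = new Set(["PAGE", "WORD", "TABLE"]);

// Keep everything except the dropped types and tag each survivor
// with an assumed "text/textract:<BlockType>" type string.
function normaliseBlocks(blocks: TextractBlock[]): NormalisedBlock[] {
  return blocks
    .filter((b) => !DROPPED_BLOCK_TYPES.has(b.BlockType))
    .map((b) => ({
      type: `text/textract:${b.BlockType}`,
      text: b.Text ?? "",
      page: b.Page ?? null,
    }));
}

// Hypothetical error type carrying an HTTP status code.
class HttpError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Hypothetical config flags standing in for "is this provider configured?".
interface ProviderConfig {
  awsConfigured: boolean;
  mistralConfigured: boolean;
}

// AWS preferred, Mistral as fallback, 500 when neither is configured.
function pickDefaultProvider(cfg: ProviderConfig): "aws-textract" | "mistral" {
  if (cfg.awsConfigured) return "aws-textract";
  if (cfg.mistralConfigured) return "mistral";
  throw new HttpError(500, "no OCR provider configured");
}
```

Keeping both steps as pure functions is only a convenience for this sketch: it lets the expected behaviour be checked without any SDK or network access.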
Coverage Report: no changed files found.
Drops the manual config/clients/stores/services stub apparatus and the loadFileInput mock in favour of the live wired driver from server.drivers.aiOcr.

The Textract and Mistral SDKs are still mocked at the module boundary (the real network egress points); inputs go through the real loadFileInput against real fs/store wiring (data URLs for most cases; FSService.write produces a real fsEntry for the PDF documentName test). Aligns with AGENTS.md: "Prefer test server over mocking deps."

The S3Object-source/regional-client assertion was dropped because it isn't deterministic against the in-memory S3 store and the driver's per-region TextractClient cache leaks across tests. That branch is better exercised by a real-cloud integration test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
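To illustrate why a per-region client cache can make such an assertion order-dependent, here is a small sketch. It is not the driver's code: `FakeClient`, `getRegionalClient`, the factory parameter, and the region string are hypothetical stand-ins, and the real cache may be structured differently.

```typescript
// Hypothetical stand-in for an SDK client; only the region matters here.
interface FakeClient {
  region: string;
  createdBy: string;
}

type ClientFactory = (region: string) => FakeClient;

// Module-level cache: it lives as long as the module does,
// so it is shared by every test that imports the module.
const clientCache = new Map<string, FakeClient>();

function getRegionalClient(region: string, create: ClientFactory): FakeClient {
  let client = clientCache.get(region);
  if (client === undefined) {
    client = create(region);
    clientCache.set(region, client);
  }
  return client;
}

// "Test A" supplies its own factory and populates the cache.
const fromTestA = getRegionalClient("us-west-2", (region) => ({
  region,
  createdBy: "test A",
}));

// "Test B" supplies a fresh factory, but the client cached by test A
// is returned, so test B's factory is never called.
let testBFactoryCalls = 0;
const fromTestB = getRegionalClient("us-west-2", (region) => {
  testBFactoryCalls += 1;
  return { region, createdBy: "test B" };
});

console.log(fromTestB.createdBy, testBFactoryCalls); // prints "test A 0"
```

In this sketch, an assertion in test B such as "my mocked constructor was called with this region" passes or fails depending on whether test A ran first, which is the kind of cross-test leakage the commit message describes.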