OCR tool with adapter support by vishnuszipstack · Pull Request #15 · Zipstack/unstract

vishnuszipstack · 2024-02-29T04:23:13Z

What

OCR tool with adapter support

TODO

Create docker image and publish
Raise PR against unstract-cloud-platform for supporting this tool

Why

...

How

...

Relevant Docs

Related Issues or PRs

Dependencies Versions / Env Variables

Notes on Testing

...

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

chandrasekharan-zipstack

LGTM, left some nit comments

sdk version update in requirements ocr_adapter.process() used to write output readme updated tools json typo fix

athul-rs

LGTM.

gaya3-zipstack

Looks good.

OCR tool with adapter support

Critical: * run-worker.sh:300 pgrep regression — anchored matcher required a trailing `-` after `-worker`, but default hostname is `${type}-worker@%h` (no dash unless WORKER_INSTANCE_ID set). Status, restart, and kill_one_worker silently reported STOPPED for any worker not setting that env var. Pattern now matches `(@|-)`. * test_queue_backend_seam.py — replaced `hasattr(t, "apply")` tautology with real Celery registration assertions (force PromiseProxy resolution, verify `current_app.tasks` membership, round-trip via `.apply().get()`). * test_queue_backend_seam.py — added explicit decorator-kwarg passthrough test pinning `name`, `bind`, `autoretry_for`, `max_retries`, `default_retry_delay`. Guards against a refactor like `return shared_task(*args)` silently stripping every retry policy. Important: * run-worker.sh:423 kill_workers — bulk path used loose `pgrep -f "uv run celery.*worker"` which would kill unrelated celery procs. Iterate canonical dir list and delegate to kill_one_worker so the anchored matcher is the single source of truth. * run-worker.sh:414 silent EPERM — dropped `2>/dev/null || true` from kill invocations; re-check PIDs after sleep; warn on survivors. * test_dispatch_sites_characterisation.py — replaced regex-based inventory canary with AST walker. Now catches aliased imports (`from celery import current_app as app`), locally-constructed apps (`Celery(...).send_task(...)`), and `apply_async`. Excludes `plugins/manual_review` (pre-existing `.apply_async` baggage, out of this PR's scope). * test_dispatch_sites_characterisation.py — added `test_substrate_failure_surfaces_as_false` patching at the substrate boundary (`queue_backend.dispatch.current_app.send_task`) with `ConnectionError`. Complements the helper-side test so the full helper → seam → broker failure path is exercised without either side being trivially mocked. Suggestions: * dispatch.py — dropped redundant `args is not None else []` coercion; Celery normalises `None` internally. Avoids subtle list-vs-tuple default behaviour change for third-party routers checking `isinstance(args, tuple)`. * dispatch.py — return type Any → `DispatchHandle` Protocol exposing `.id: str` (already in use at scheduler/tasks.py). Documents the real invariant the PG Queue handle will also need. Relaxed `list[Any]` → `Sequence[Any]`, `dict[str, Any]` → `Mapping[str, Any]`. * Removed `PR #15` placeholder + env var name pinning (`WORKER_PG_QUEUE_ENABLED_TASKS`) from docstrings — describe behaviour, not unstable identifiers. * test_queue_backend_seam.py:11 — replaced stale `scheduler/tasks.py:157` line ref with symbol-based citation (`_execute_scheduled_workflow` / `send_notification_to_worker`). SonarCloud (S1192): * run-worker.sh — defined `readonly IDE_CALLBACK_WORKER_TYPE` constant and replaced the 8 literal occurrences, mirroring the existing `EXECUTOR_WORKER_TYPE` pattern. Test count: 28 → 31 (seam 15→17, characterisation 13→14). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ites (#2001) * UN-3497 [FEAT] Introduce queue_backend seam and migrate Celery call sites Wrap @shared_task and current_app.send_task behind a single workers/queue_backend/ module so a later phase can route specific tasks through PG Queue without touching call sites. Pure passthrough today — behavioural no-op over Celery, locked in by 28 equivalence and characterisation tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3497 [FIX] Address PR review feedback on queue_backend seam Critical: * run-worker.sh:300 pgrep regression — anchored matcher required a trailing `-` after `-worker`, but default hostname is `${type}-worker@%h` (no dash unless WORKER_INSTANCE_ID set). Status, restart, and kill_one_worker silently reported STOPPED for any worker not setting that env var. Pattern now matches `(@|-)`. * test_queue_backend_seam.py — replaced `hasattr(t, "apply")` tautology with real Celery registration assertions (force PromiseProxy resolution, verify `current_app.tasks` membership, round-trip via `.apply().get()`). * test_queue_backend_seam.py — added explicit decorator-kwarg passthrough test pinning `name`, `bind`, `autoretry_for`, `max_retries`, `default_retry_delay`. Guards against a refactor like `return shared_task(*args)` silently stripping every retry policy. Important: * run-worker.sh:423 kill_workers — bulk path used loose `pgrep -f "uv run celery.*worker"` which would kill unrelated celery procs. Iterate canonical dir list and delegate to kill_one_worker so the anchored matcher is the single source of truth. * run-worker.sh:414 silent EPERM — dropped `2>/dev/null || true` from kill invocations; re-check PIDs after sleep; warn on survivors. * test_dispatch_sites_characterisation.py — replaced regex-based inventory canary with AST walker. Now catches aliased imports (`from celery import current_app as app`), locally-constructed apps (`Celery(...).send_task(...)`), and `apply_async`. Excludes `plugins/manual_review` (pre-existing `.apply_async` baggage, out of this PR's scope). * test_dispatch_sites_characterisation.py — added `test_substrate_failure_surfaces_as_false` patching at the substrate boundary (`queue_backend.dispatch.current_app.send_task`) with `ConnectionError`. Complements the helper-side test so the full helper → seam → broker failure path is exercised without either side being trivially mocked. Suggestions: * dispatch.py — dropped redundant `args is not None else []` coercion; Celery normalises `None` internally. Avoids subtle list-vs-tuple default behaviour change for third-party routers checking `isinstance(args, tuple)`. * dispatch.py — return type Any → `DispatchHandle` Protocol exposing `.id: str` (already in use at scheduler/tasks.py). Documents the real invariant the PG Queue handle will also need. Relaxed `list[Any]` → `Sequence[Any]`, `dict[str, Any]` → `Mapping[str, Any]`. * Removed `PR #15` placeholder + env var name pinning (`WORKER_PG_QUEUE_ENABLED_TASKS`) from docstrings — describe behaviour, not unstable identifiers. * test_queue_backend_seam.py:11 — replaced stale `scheduler/tasks.py:157` line ref with symbol-based citation (`_execute_scheduled_workflow` / `send_notification_to_worker`). SonarCloud (S1192): * run-worker.sh — defined `readonly IDE_CALLBACK_WORKER_TYPE` constant and replaced the 8 literal occurrences, mirroring the existing `EXECUTOR_WORKER_TYPE` pattern. Test count: 28 → 31 (seam 15→17, characterisation 13→14). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3497 [FIX] Extract get_worker_pids_oneline helper Address SonarCloud S1192: ``get_worker_pids ... | tr '\n' ' ' | sed 's/ $//'`` was repeated at 4 call sites in run-worker.sh. Wrap the formatting pipeline in a single helper so the literal lives in one place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chandrasekharan-zipstack

…irness/dispatch comments Per @chandrasekharan-zipstack's NITs on PR #2003: code comments that name specific roadmap stages (``Phase 8 (PG Queue Gate)``, ``PR #15``) or capture migration history (``earlier iteration of this module put the key in kwargs``) go stale fast and should describe what the code does, not when in the plan it lives. Changes: * fairness.py - ``Tier`` Literal comment now explains what tiers actually do (cross-tenant resource-allocation tag for preemption / capacity decisions), addressing Chandra's clarifying question on line 33. - MIN/MAX_PRIORITY comment dropped ``Phase 8 then has to special-case`` rationale; the bound itself is the contract. - FAIRNESS_HEADER_NAME comment dropped the kwargs-iteration history. - Module docstring + ``FairnessKey``/``system()`` docstrings: ``Phase 8`` / ``PG Queue Gate`` / ``the future PG Queue scheduler`` -> ``a future dispatch scheduler`` / ``the scheduler``. * dispatch.py - Module docstring + DispatchHandle Protocol + ``dispatch()`` docstring: same genericisation. No behaviour change. 53/53 tests still pass in 5.51s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…w-execution dispatches; trim comments Three corrections rolled together, all triggered by review feedback: WorkloadType is now a StrEnum (one source of truth) Replaced the Literal["api", "non_api"] form with class WorkloadType(StrEnum). Producers import WorkloadType.API / NON_API instead of scattering string literals across call sites. StrEnum members are str subclasses so JSON serialisation is free; to_dict() still calls .value explicitly so the wire payload is a plain string (downstream consumers don't need to import the enum to compare). Fairness applies only to workflow-execution dispatches Earlier confusion treated workload_type as a worker-type tag. Corrected: it's the type of the WORKFLOW being executed (API-deployment vs ETL/Task/App). Notification dispatch (send_webhook_notification on the notifications queue) is a worker-internal task, not a workflow execution — it now passes fairness=None explicitly. The scheduler dispatch (async_execute_bin) does start a workflow execution and keeps FairnessKey(..., workload_type=WorkloadType.NON_API). The inventory canary still requires fairness= to be present on every dispatch (any value, including None) — the point is the conscious choice at the call site, not the value. New test test_explicit_fairness_none_no_header_sent proves fairness=None produces the same on-wire shape as omitting the arg, without forcing producers to invent a workflow_type for tasks that aren't workflow executions. Comment hygiene Trimmed docstrings and inline comments to keep only what's necessary (external citations, surprising design choices, non-obvious invariants). Removed: * Plan-stage references (Phase 8, PR #15). * Private-repo path (Zipstack/labs:labs-ali/...) from the public OSS codebase — same rule as PR descriptions. * Migration history paragraphs that belong in commit messages, not source. * Test docstrings that paraphrased the test name. * Inline comments that restated what the code obviously does. Net: ~267 lines deleted, ~114 added across 7 files (~50% reduction in the seam module's documentation footprint). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@chandrasekharan-zipstack

* UN-3501 [FEAT] Plumb fairness key on every dispatch() call site Phase 5.1 of the PG Queue rollout (epic UN-3445). Adds multi-tenant routing metadata to every dispatch() — org_id, pipeline_priority, tier — emitted under kwargs["_fairness_key"]. No consumer reads it yet; Phase 8 (PG Queue Gate) will introduce the reader. What * New workers/queue_backend/fairness.py: FairnessKey frozen dataclass + FairnessKey.system() / FairnessKey.for_org(...) constructors + FAIRNESS_KWARG_NAME constant. to_dict() produces a JSON-safe nested dict that round-trips through Celery's serializer. * queue_backend.dispatch() accepts optional fairness= kwarg; when provided, merges into a *copy* of the outgoing kwargs (no caller mutation). * Two existing production dispatch sites pass fairness: - shared/patterns/notification/helper.py::send_notification_to_worker derives org_id from payload.organization_id. - scheduler/tasks.py::_execute_scheduled_workflow derives org_id from context.organization_id. * New tests/test_fairness_key.py (14 tests): - FairnessKey value-object semantics (frozen, JSON-safe, defaults). - dispatch() integration (omit ≡ no field added, provided ≡ slotted under _fairness_key, caller kwargs not mutated). - AST-based inventory canary: every production dispatch(...) must pass fairness=. Restricted to bare-name callees so ExecutionDispatcher.dispatch(...) (different concept) isn't audited. - Additive-only canary: no consumer in workers/ references _fairness_key or FAIRNESS_KWARG_NAME. Phase 8 lifts this. Why PG Queue's eventual fairness scheduler needs structured routing data to dispatch on. Producer-side plumbing now means Phase 8 can land the reader without ever having to backfill payloads at runtime. No regression risk * Additive only — kwargs gain one extra entry under an underscored slot. No business kwargs change shape. * No worker code reads it (proved by the additive-only canary). * Default-on is identical to default-off — no flag needed. Test count: 31 → 45 (new fairness suite 14; __all__ update from 2 to 3 exports in the seam suite). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3501 [FIX] Carry fairness key in Celery message headers, not kwargs Bug: producer-side dispatch() merged the fairness key into the task body's kwargs as ``_fairness_key``. Celery then passed it to the task function as a keyword argument, and any task whose signature does NOT accept ``**kwargs`` blew up at consumption time with: TypeError: send_webhook_notification() got an unexpected keyword argument '_fairness_key' Observed on send_webhook_notification (fixed signature). async_execute_bin tolerated it silently via its ``**kwargs`` catch-all — semantically wrong either way: fairness is routing metadata, not business payload. Fix: route fairness via Celery's message-header slot (``headers={"x-fairness-key": ...}``). Headers travel with the AMQP message and reach consumers via ``self.request.headers`` (bind=True), but are NEVER passed to the task body's function signature. Phase 8's reader will look at ``self.request.headers`` regardless, so this is the right slot from the start. Changes: * queue_backend/dispatch.py — pass fairness as ``headers={...}`` instead of merging into ``kwargs``. Caller kwargs are now strictly untouched. * queue_backend/fairness.py — rename ``FAIRNESS_KWARG_NAME`` to ``FAIRNESS_HEADER_NAME`` ("x-fairness-key", HTTP-style spelling matching AMQP convention). Module docstring updated. * tests/test_fairness_key.py — assertions look at ``send_task.call_args.kwargs["headers"]`` instead of kwargs; new positive assertion that business kwargs are unchanged when fairness is present; "no consumer reads it yet" canary updated to grep for the new tokens. Test count unchanged: 14 fairness tests + 17 seam + 14 characterisation = 45 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3501 [FIX] Address PR review feedback on fairness key Comment Analyzer — stale docs (2 places): * fairness.py module docstring: ``_fairness_key today`` was dead vocabulary post header-migration; replaced with ``x-fairness-key`` and pinned the canary class. * to_dict() docstring no longer references kwargs. Type Design Analyzer: * Bound pipeline_priority to [0, 100] via __post_init__ — out-of-range values raise ValueError at construction. New MIN_PRIORITY / MAX_PRIORITY constants. * Tightened tier from str to Literal["standard", "enterprise", "system"]. New SYSTEM_TIER constant. * FairnessKey.system() now returns cls(org_id=None, tier="system") — encodes the partition in the message shape rather than leaving it implicit via org_id is None. Phase 8 scheduler matches on a single closed-set field. Silent Failure Hunter (HIGH): * for_org replaced **overrides catch-all with explicit keyword-only params (pipeline_priority, tier). A typo like priority=80 or tiers="enterprise" now raises TypeError at the call site instead of silently dropping the override. Code Simplifier (minor): * dispatch.py: ternary instead of mutable headers rebinding. PR Test Analyzer: * test_is_frozen now uses pytest.raises(FrozenInstanceError) — catches the actual exception type, not "any Exception with 'frozen' in the message". * New tests: priority range rejection (below/above/boundaries), for_org typo rejection, system-key tier encoding. * Inventory canary hardened: new test_dispatch_must_be_imported_unaliased walks ImportFrom nodes and forbids ``from queue_backend import dispatch as <alias>``. Without it the bare-name AST visitor would miss alias-imported call sites — exactly the failure mode the canary exists to prevent. * New end-to-end test_header_present_on_outbound_message constructs a real Celery app on a memory broker, wraps send_task, and asserts the fairness header is attached in the documented shape. Catches the rare-but-expensive case where a future Celery/kombu upgrade silently drops unknown headers. Test count: 14 -> 21 fairness tests (45 -> 52 total). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3501 [FIX] Lower cognitive complexity in fairness canary tests SonarCloud (python:S3776) flagged test_dispatch_must_be_imported_unaliased at cognitive complexity 22 (threshold 15). The companion test_every_production_dispatch_passes_fairness sat just under the threshold but carried the same nesting shape. Refactor: extract three module-level helpers so both tests collapse to flat list comprehensions. * _iter_production_trees() walks workers/, skips tests/__pycache__/htmlcov/.venv/queue_backend, parses each file, swallows SyntaxError, yields (rel_path, tree). * _aliased_dispatch_imports(tree) returns (lineno, alias) for every ``from queue_backend import dispatch as <name>``. * _dispatch_calls_missing_fairness(tree) returns linenos of bare dispatch(...) calls without ``fairness=``. Each test body is now ~6 lines, no nested visitor class, no for/if/try/for/if ladder. Same coverage — both tests still pass and still catch the same offences. The helpers are private (underscore prefix) and live in the test module since no other module needs them. Test count unchanged: 52 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3501 [FIX] Address Greptile + CodeRabbit findings on fairness key Greptile #1 (P2) — for_org silently accepts org_id=None: * fairness.py: tightened ``for_org`` signature to ``org_id: str`` (was ``str | None``) and added an explicit ``ValueError`` guard. Passing None would have produced ``FairnessKey(org_id=None, tier="standard")`` — inconsistent with the module's own promise that no-tenant tasks use ``FairnessKey.system()`` (``tier="system"``). Phase 8's scheduler no longer has to special-case a None org in the "standard" partition. * New test ``test_for_org_rejects_none_org_id`` locks the rejection. CodeRabbit (Major) — call site can hit the None path: * shared/patterns/notification/helper.py: ``payload.organization_id`` is ``str | None`` (NotificationPayload default). Callback paths build payloads without org context. With Greptile #1's tightening, ``for_org(None)`` raises, so the call site now picks the right constructor: ``FairnessKey.for_org(org_id) if org_id else FairnessKey.system()``. * Scheduler site unchanged — ``ScheduledPipelineContext.organization_id`` is non-Optional and validated in __post_init__, so it cannot be None at the call site. Greptile #2 (P2) — duplicated skip_top_dirs: * test_fairness_key.py: ``test_no_consumer_reads_fairness_header`` used a local ``skip_top_dirs`` copy of the module-level ``_SKIP_TOP_DIRS`` frozenset. If a new dir is added to one set the two canaries would scope differently. Switched to ``_SKIP_TOP_DIRS`` (and ``_WORKERS_ROOT``) so both canaries share one source of truth. Test count: 52 -> 53. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3501 [DOCS] Trim plan-stage references and verbose history from fairness/dispatch comments Per @chandrasekharan-zipstack's NITs on PR #2003: code comments that name specific roadmap stages (``Phase 8 (PG Queue Gate)``, ``PR #15``) or capture migration history (``earlier iteration of this module put the key in kwargs``) go stale fast and should describe what the code does, not when in the plan it lives. Changes: * fairness.py - ``Tier`` Literal comment now explains what tiers actually do (cross-tenant resource-allocation tag for preemption / capacity decisions), addressing Chandra's clarifying question on line 33. - MIN/MAX_PRIORITY comment dropped ``Phase 8 then has to special-case`` rationale; the bound itself is the contract. - FAIRNESS_HEADER_NAME comment dropped the kwargs-iteration history. - Module docstring + ``FairnessKey``/``system()`` docstrings: ``Phase 8`` / ``PG Queue Gate`` / ``the future PG Queue scheduler`` -> ``a future dispatch scheduler`` / ``the scheduler``. * dispatch.py - Module docstring + DispatchHandle Protocol + ``dispatch()`` docstring: same genericisation. No behaviour change. 53/53 tests still pass in 5.51s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3501 [FIX] Replace invented tier vocabulary with labs-grounded workload_type Following @chandrasekharan-zipstack's question on PR #2003: traced the ``tier`` concept back to the labs design at ``Zipstack/labs:labs-ali/workflow-execution-architecture/docs/pg-queue-implementation-guide.md`` and found three real discrepancies in my implementation: 1. **Tier is server-side, not on the wire.** Per labs: ``CREATE TABLE org_config (org_id UUID PRIMARY KEY, burst_max INTEGER, tier_priority INTEGER NOT NULL DEFAULT 5 -- 1=highest priority)``. The scheduler JOINs ``staging_queue`` against ``org_config`` to find tier; it isn't carried on the task payload. 2. **The third payload field is ``workload_type``, not ``tier``.** Labs ORDER BY: L1: ``oc.tier_priority ASC`` — from org_config JOIN L2: ``(sq.workload_type = 'api')::int DESC`` — from payload L3: ``sq.pipeline_priority DESC`` — from payload So the payload triple is ``(org_id, workload_type, pipeline_priority)``, not ``(org_id, pipeline_priority, tier)``. My memory's ``tier``-named summary fused the L1 dimension's name with the L2 payload field — they're different things. 3. **Priority range is 1..10, not 0..100.** Labs schema: ``pipeline_priority INTEGER DEFAULT 5, -- 1-10``. My bounds were off by an order of magnitude. Changes: * fairness.py - ``Tier = Literal["standard", "enterprise", "system"]`` -> ``WorkloadType = Literal["api", "etl"]`` (matches labs SQL exactly). - ``tier`` field on ``FairnessKey`` -> ``workload_type``. - ``MIN_PRIORITY/MAX_PRIORITY`` 0/100 -> 1/10; ``DEFAULT_PRIORITY`` 50 -> 5. - Dropped ``DEFAULT_TIER``, ``SYSTEM_TIER``, the ``Tier`` alias. - Dropped ``FairnessKey.system()`` and ``FairnessKey.for_org()`` helpers — direct ``FairnessKey(org_id=..., workload_type=...)`` construction is small, typo-safe (dataclass kwargs are checked), and removes guess-y semantics about what tier "system" tasks should carry. - Module docstring now cites the labs source file inline so the architectural source of truth is one click away. * shared/patterns/notification/helper.py — webhook delivery is customer-facing API traffic, so ``workload_type="api"``. ``org_id`` passed as-is (can be None on callback paths). * scheduler/tasks.py — scheduled pipelines fire ETL-style batch executions, so ``workload_type="etl"``. ``context.organization_id`` is non-Optional and validated upstream. * tests/test_fairness_key.py — restructured. Direct construction throughout (no for_org / system helpers to test). New tests pin workload_type semantics, priority bounds at 1..10, and typo-rejection via the dataclass. Test count: 53 -> 51 (lost 4 helper-method tests, gained 2 workload_type tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3501 [REFACTOR] WorkloadType -> StrEnum; scope fairness to workflow-execution dispatches; trim comments Three corrections rolled together, all triggered by review feedback: WorkloadType is now a StrEnum (one source of truth) Replaced the Literal["api", "non_api"] form with class WorkloadType(StrEnum). Producers import WorkloadType.API / NON_API instead of scattering string literals across call sites. StrEnum members are str subclasses so JSON serialisation is free; to_dict() still calls .value explicitly so the wire payload is a plain string (downstream consumers don't need to import the enum to compare). Fairness applies only to workflow-execution dispatches Earlier confusion treated workload_type as a worker-type tag. Corrected: it's the type of the WORKFLOW being executed (API-deployment vs ETL/Task/App). Notification dispatch (send_webhook_notification on the notifications queue) is a worker-internal task, not a workflow execution — it now passes fairness=None explicitly. The scheduler dispatch (async_execute_bin) does start a workflow execution and keeps FairnessKey(..., workload_type=WorkloadType.NON_API). The inventory canary still requires fairness= to be present on every dispatch (any value, including None) — the point is the conscious choice at the call site, not the value. New test test_explicit_fairness_none_no_header_sent proves fairness=None produces the same on-wire shape as omitting the arg, without forcing producers to invent a workflow_type for tasks that aren't workflow executions. Comment hygiene Trimmed docstrings and inline comments to keep only what's necessary (external citations, surprising design choices, non-obvious invariants). Removed: * Plan-stage references (Phase 8, PR #15). * Private-repo path (Zipstack/labs:labs-ali/...) from the public OSS codebase — same rule as PR descriptions. * Migration history paragraphs that belong in commit messages, not source. * Test docstrings that paraphrased the test name. * Inline comments that restated what the code obviously does. Net: ~267 lines deleted, ~114 added across 7 files (~50% reduction in the seam module's documentation footprint). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * UN-3501 [FIX] Canary error message pointed at deleted FairnessKey.system() Greptile P1: the failure message for the inventory canary told future developers to call ``FairnessKey.system()`` — a classmethod removed in the prior commit when helper constructors were dropped in favour of direct dataclass construction. A developer who triggered the canary and followed the message verbatim would have hit ``AttributeError: type object 'FairnessKey' has no attribute 'system'``, defeating the canary's self-service purpose. Updated the message to point at the two real options the canary actually accepts: * ``fairness=FairnessKey(org_id=..., workload_type=WorkloadType...)`` for a workflow-execution dispatch. * ``fairness=None`` for a worker-internal task that doesn't start a workflow execution (notifications, callbacks, healthchecks). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ocr tool with adapter support

f9921f8

vishnuszipstack requested review from chandrasekharan-zipstack, gaya3-zipstack, jaseemjaskp, muhammad-ali-e and nehabagdia February 29, 2024 04:23

Merge branch 'main' into feature/OCR-Tool

f71a405

gaya3-zipstack reviewed Feb 29, 2024

View reviewed changes

Comment thread tools/ocr/requirements.txt Outdated

gaya3-zipstack reviewed Feb 29, 2024

View reviewed changes

Comment thread tools/ocr/src/main.py Outdated

chandrasekharan-zipstack approved these changes Feb 29, 2024

View reviewed changes

Comment thread tools/ocr/README.md

Comment thread tools/ocr/requirements.txt Outdated

Comment thread tools/ocr/src/config/properties.json

gaya3-zipstack reviewed Feb 29, 2024

View reviewed changes

Comment thread unstract/tool-registry/src/unstract/tool_registry/public_tools.json Outdated

vishnuszipstack and others added 2 commits February 29, 2024 13:23

Merge branch 'main' into feature/OCR-Tool

cd136e6

PR comments addressed

8484bf6

sdk version update in requirements ocr_adapter.process() used to write output readme updated tools json typo fix

vishnuszipstack requested review from chandrasekharan-zipstack and gaya3-zipstack February 29, 2024 08:02

nehabagdia approved these changes Feb 29, 2024

View reviewed changes

Merge branch 'main' into feature/OCR-Tool

77adf4f

athul-rs approved these changes Feb 29, 2024

View reviewed changes

gaya3-zipstack approved these changes Feb 29, 2024

View reviewed changes

nehabagdia merged commit 0071dd8 into main Feb 29, 2024

nehabagdia deleted the feature/OCR-Tool branch February 29, 2024 08:20

pk-zipstack pushed a commit that referenced this pull request Aug 20, 2025

Merge pull request #15 from Zipstack/feature/OCR-Tool

46d0424

OCR tool with adapter support

kirtimanmishrazipstack mentioned this pull request May 28, 2026

UN-3056 [FEAT] Failure-only notifications with batched dispatch #1936

Open

muhammad-ali-e mentioned this pull request Jun 1, 2026

UN-3497 [FEAT] Introduce queue_backend seam and migrate Celery call sites #2001

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

OCR tool with adapter support#15

OCR tool with adapter support#15
nehabagdia merged 5 commits into
mainfrom
feature/OCR-Tool

vishnuszipstack commented Feb 29, 2024 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

chandrasekharan-zipstack left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

athul-rs left a comment

Uh oh!

gaya3-zipstack left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

vishnuszipstack commented Feb 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

TODO

Why

How

Relevant Docs

Related Issues or PRs

Dependencies Versions / Env Variables

Notes on Testing

Screenshots

Checklist

Uh oh!

Uh oh!

Uh oh!

chandrasekharan-zipstack left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

athul-rs left a comment

Choose a reason for hiding this comment

Uh oh!

gaya3-zipstack left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

vishnuszipstack commented Feb 29, 2024 •

edited

Loading