Skip to content

OCR tool with adapter support#15

Merged
nehabagdia merged 5 commits into
mainfrom
feature/OCR-Tool
Feb 29, 2024
Merged

OCR tool with adapter support#15
nehabagdia merged 5 commits into
mainfrom
feature/OCR-Tool

Conversation

@vishnuszipstack
Copy link
Copy Markdown
Contributor

@vishnuszipstack vishnuszipstack commented Feb 29, 2024

What

  • OCR tool with adapter support

TODO

  • Create docker image and publish
  • Raise PR against unstract-cloud-platform for supporting this tool

Why

...

How

...

Relevant Docs

Related Issues or PRs

Dependencies Versions / Env Variables

Notes on Testing

...

Screenshots

Screenshot_20240228_193726

Checklist

I have read and understood the Contribution Guidelines.

Comment thread tools/ocr/requirements.txt Outdated
Comment thread tools/ocr/src/main.py Outdated
Copy link
Copy Markdown
Contributor

@chandrasekharan-zipstack chandrasekharan-zipstack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left some nit comments

Comment thread tools/ocr/README.md
Comment thread tools/ocr/requirements.txt Outdated
Comment thread tools/ocr/src/config/properties.json
Comment thread unstract/tool-registry/src/unstract/tool_registry/public_tools.json Outdated
vishnuszipstack and others added 2 commits February 29, 2024 13:23
sdk version update in requirements
ocr_adapter.process() used to write output
readme updated
tools json typo fix
Copy link
Copy Markdown
Contributor

@athul-rs athul-rs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Copy Markdown
Contributor

@gaya3-zipstack gaya3-zipstack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@nehabagdia nehabagdia merged commit 0071dd8 into main Feb 29, 2024
@nehabagdia nehabagdia deleted the feature/OCR-Tool branch February 29, 2024 08:20
pk-zipstack pushed a commit that referenced this pull request Aug 20, 2025
muhammad-ali-e added a commit that referenced this pull request Jun 1, 2026
Critical:
* run-worker.sh:300 pgrep regression — anchored matcher required a
  trailing `-` after `-worker`, but default hostname is
  `${type}-worker@%h` (no dash unless WORKER_INSTANCE_ID set). Status,
  restart, and kill_one_worker silently reported STOPPED for any worker
  not setting that env var. Pattern now matches `(@|-)`.
* test_queue_backend_seam.py — replaced `hasattr(t, "apply")` tautology
  with real Celery registration assertions (force PromiseProxy
  resolution, verify `current_app.tasks` membership, round-trip via
  `.apply().get()`).
* test_queue_backend_seam.py — added explicit decorator-kwarg
  passthrough test pinning `name`, `bind`, `autoretry_for`,
  `max_retries`, `default_retry_delay`. Guards against a refactor like
  `return shared_task(*args)` silently stripping every retry policy.

Important:
* run-worker.sh:423 kill_workers — bulk path used loose
  `pgrep -f "uv run celery.*worker"` which would kill unrelated celery
  procs. Iterate canonical dir list and delegate to kill_one_worker so
  the anchored matcher is the single source of truth.
* run-worker.sh:414 silent EPERM — dropped `2>/dev/null || true` from
  kill invocations; re-check PIDs after sleep; warn on survivors.
* test_dispatch_sites_characterisation.py — replaced regex-based
  inventory canary with AST walker. Now catches aliased imports
  (`from celery import current_app as app`), locally-constructed apps
  (`Celery(...).send_task(...)`), and `apply_async`. Excludes
  `plugins/manual_review` (pre-existing `.apply_async` baggage, out of
  this PR's scope).
* test_dispatch_sites_characterisation.py — added
  `test_substrate_failure_surfaces_as_false` patching at the substrate
  boundary (`queue_backend.dispatch.current_app.send_task`) with
  `ConnectionError`. Complements the helper-side test so the full
  helper → seam → broker failure path is exercised without either side
  being trivially mocked.

Suggestions:
* dispatch.py — dropped redundant `args is not None else []` coercion;
  Celery normalises `None` internally. Avoids subtle list-vs-tuple
  default behaviour change for third-party routers checking
  `isinstance(args, tuple)`.
* dispatch.py — return type Any → `DispatchHandle` Protocol exposing
  `.id: str` (already in use at scheduler/tasks.py). Documents the
  real invariant the PG Queue handle will also need. Relaxed
  `list[Any]` → `Sequence[Any]`, `dict[str, Any]` → `Mapping[str, Any]`.
* Removed `PR #15` placeholder + env var name pinning
  (`WORKER_PG_QUEUE_ENABLED_TASKS`) from docstrings — describe
  behaviour, not unstable identifiers.
* test_queue_backend_seam.py:11 — replaced stale `scheduler/tasks.py:157`
  line ref with symbol-based citation
  (`_execute_scheduled_workflow` / `send_notification_to_worker`).

SonarCloud (S1192):
* run-worker.sh — defined `readonly IDE_CALLBACK_WORKER_TYPE` constant
  and replaced the 8 literal occurrences, mirroring the existing
  `EXECUTOR_WORKER_TYPE` pattern.

Test count: 28 → 31 (seam 15→17, characterisation 13→14).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
chandrasekharan-zipstack pushed a commit that referenced this pull request Jun 1, 2026
…ites (#2001)

* UN-3497 [FEAT] Introduce queue_backend seam and migrate Celery call sites

Wrap @shared_task and current_app.send_task behind a single
workers/queue_backend/ module so a later phase can route specific tasks
through PG Queue without touching call sites. Pure passthrough today —
behavioural no-op over Celery, locked in by 28 equivalence and
characterisation tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3497 [FIX] Address PR review feedback on queue_backend seam

Critical:
* run-worker.sh:300 pgrep regression — anchored matcher required a
  trailing `-` after `-worker`, but default hostname is
  `${type}-worker@%h` (no dash unless WORKER_INSTANCE_ID set). Status,
  restart, and kill_one_worker silently reported STOPPED for any worker
  not setting that env var. Pattern now matches `(@|-)`.
* test_queue_backend_seam.py — replaced `hasattr(t, "apply")` tautology
  with real Celery registration assertions (force PromiseProxy
  resolution, verify `current_app.tasks` membership, round-trip via
  `.apply().get()`).
* test_queue_backend_seam.py — added explicit decorator-kwarg
  passthrough test pinning `name`, `bind`, `autoretry_for`,
  `max_retries`, `default_retry_delay`. Guards against a refactor like
  `return shared_task(*args)` silently stripping every retry policy.

Important:
* run-worker.sh:423 kill_workers — bulk path used loose
  `pgrep -f "uv run celery.*worker"` which would kill unrelated celery
  procs. Iterate canonical dir list and delegate to kill_one_worker so
  the anchored matcher is the single source of truth.
* run-worker.sh:414 silent EPERM — dropped `2>/dev/null || true` from
  kill invocations; re-check PIDs after sleep; warn on survivors.
* test_dispatch_sites_characterisation.py — replaced regex-based
  inventory canary with AST walker. Now catches aliased imports
  (`from celery import current_app as app`), locally-constructed apps
  (`Celery(...).send_task(...)`), and `apply_async`. Excludes
  `plugins/manual_review` (pre-existing `.apply_async` baggage, out of
  this PR's scope).
* test_dispatch_sites_characterisation.py — added
  `test_substrate_failure_surfaces_as_false` patching at the substrate
  boundary (`queue_backend.dispatch.current_app.send_task`) with
  `ConnectionError`. Complements the helper-side test so the full
  helper → seam → broker failure path is exercised without either side
  being trivially mocked.

Suggestions:
* dispatch.py — dropped redundant `args is not None else []` coercion;
  Celery normalises `None` internally. Avoids subtle list-vs-tuple
  default behaviour change for third-party routers checking
  `isinstance(args, tuple)`.
* dispatch.py — return type Any → `DispatchHandle` Protocol exposing
  `.id: str` (already in use at scheduler/tasks.py). Documents the
  real invariant the PG Queue handle will also need. Relaxed
  `list[Any]` → `Sequence[Any]`, `dict[str, Any]` → `Mapping[str, Any]`.
* Removed `PR #15` placeholder + env var name pinning
  (`WORKER_PG_QUEUE_ENABLED_TASKS`) from docstrings — describe
  behaviour, not unstable identifiers.
* test_queue_backend_seam.py:11 — replaced stale `scheduler/tasks.py:157`
  line ref with symbol-based citation
  (`_execute_scheduled_workflow` / `send_notification_to_worker`).

SonarCloud (S1192):
* run-worker.sh — defined `readonly IDE_CALLBACK_WORKER_TYPE` constant
  and replaced the 8 literal occurrences, mirroring the existing
  `EXECUTOR_WORKER_TYPE` pattern.

Test count: 28 → 31 (seam 15→17, characterisation 13→14).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3497 [FIX] Extract get_worker_pids_oneline helper

Address SonarCloud S1192: ``get_worker_pids ... | tr '\n' ' ' | sed 's/ $//'``
was repeated at 4 call sites in run-worker.sh. Wrap the formatting pipeline
in a single helper so the literal lives in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
muhammad-ali-e added a commit that referenced this pull request Jun 2, 2026
…irness/dispatch comments

Per @chandrasekharan-zipstack's NITs on PR #2003: code comments that
name specific roadmap stages (``Phase 8 (PG Queue Gate)``, ``PR #15``)
or capture migration history (``earlier iteration of this module put
the key in kwargs``) go stale fast and should describe what the code
does, not when in the plan it lives.

Changes:
* fairness.py
  - ``Tier`` Literal comment now explains what tiers actually do
    (cross-tenant resource-allocation tag for preemption / capacity
    decisions), addressing Chandra's clarifying question on line 33.
  - MIN/MAX_PRIORITY comment dropped ``Phase 8 then has to special-case``
    rationale; the bound itself is the contract.
  - FAIRNESS_HEADER_NAME comment dropped the kwargs-iteration history.
  - Module docstring + ``FairnessKey``/``system()`` docstrings:
    ``Phase 8`` / ``PG Queue Gate`` / ``the future PG Queue scheduler``
    -> ``a future dispatch scheduler`` / ``the scheduler``.
* dispatch.py
  - Module docstring + DispatchHandle Protocol + ``dispatch()``
    docstring: same genericisation.

No behaviour change. 53/53 tests still pass in 5.51s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
muhammad-ali-e added a commit that referenced this pull request Jun 2, 2026
…w-execution dispatches; trim comments

Three corrections rolled together, all triggered by review feedback:

WorkloadType is now a StrEnum (one source of truth)

Replaced the Literal["api", "non_api"] form with class
WorkloadType(StrEnum). Producers import WorkloadType.API /
NON_API instead of scattering string literals across call sites.
StrEnum members are str subclasses so JSON serialisation is free;
to_dict() still calls .value explicitly so the wire payload is a
plain string (downstream consumers don't need to import the enum
to compare).

Fairness applies only to workflow-execution dispatches

Earlier confusion treated workload_type as a worker-type tag.
Corrected: it's the type of the WORKFLOW being executed
(API-deployment vs ETL/Task/App). Notification dispatch
(send_webhook_notification on the notifications queue) is a
worker-internal task, not a workflow execution — it now passes
fairness=None explicitly. The scheduler dispatch (async_execute_bin)
does start a workflow execution and keeps
FairnessKey(..., workload_type=WorkloadType.NON_API). The inventory
canary still requires fairness= to be present on every dispatch (any
value, including None) — the point is the conscious choice at the
call site, not the value.

New test test_explicit_fairness_none_no_header_sent proves
fairness=None produces the same on-wire shape as omitting the arg,
without forcing producers to invent a workflow_type for tasks that
aren't workflow executions.

Comment hygiene

Trimmed docstrings and inline comments to keep only what's necessary
(external citations, surprising design choices, non-obvious
invariants). Removed:

* Plan-stage references (Phase 8, PR #15).
* Private-repo path (Zipstack/labs:labs-ali/...) from the public
  OSS codebase — same rule as PR descriptions.
* Migration history paragraphs that belong in commit messages, not
  source.
* Test docstrings that paraphrased the test name.
* Inline comments that restated what the code obviously does.

Net: ~267 lines deleted, ~114 added across 7 files (~50% reduction
in the seam module's documentation footprint).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
johnyrahul pushed a commit that referenced this pull request Jun 2, 2026
* UN-3501 [FEAT] Plumb fairness key on every dispatch() call site

Phase 5.1 of the PG Queue rollout (epic UN-3445). Adds multi-tenant
routing metadata to every dispatch() — org_id, pipeline_priority, tier —
emitted under kwargs["_fairness_key"]. No consumer reads it yet;
Phase 8 (PG Queue Gate) will introduce the reader.

What

* New workers/queue_backend/fairness.py: FairnessKey frozen dataclass +
  FairnessKey.system() / FairnessKey.for_org(...) constructors +
  FAIRNESS_KWARG_NAME constant. to_dict() produces a JSON-safe nested
  dict that round-trips through Celery's serializer.
* queue_backend.dispatch() accepts optional fairness= kwarg; when
  provided, merges into a *copy* of the outgoing kwargs (no caller
  mutation).
* Two existing production dispatch sites pass fairness:
  - shared/patterns/notification/helper.py::send_notification_to_worker
    derives org_id from payload.organization_id.
  - scheduler/tasks.py::_execute_scheduled_workflow derives org_id
    from context.organization_id.
* New tests/test_fairness_key.py (14 tests):
  - FairnessKey value-object semantics (frozen, JSON-safe, defaults).
  - dispatch() integration (omit ≡ no field added, provided ≡ slotted
    under _fairness_key, caller kwargs not mutated).
  - AST-based inventory canary: every production dispatch(...) must
    pass fairness=. Restricted to bare-name callees so
    ExecutionDispatcher.dispatch(...) (different concept) isn't
    audited.
  - Additive-only canary: no consumer in workers/ references
    _fairness_key or FAIRNESS_KWARG_NAME. Phase 8 lifts this.

Why

PG Queue's eventual fairness scheduler needs structured routing data
to dispatch on. Producer-side plumbing now means Phase 8 can land the
reader without ever having to backfill payloads at runtime.

No regression risk

* Additive only — kwargs gain one extra entry under an underscored
  slot. No business kwargs change shape.
* No worker code reads it (proved by the additive-only canary).
* Default-on is identical to default-off — no flag needed.

Test count: 31 → 45 (new fairness suite 14; __all__ update from 2 to 3
exports in the seam suite).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3501 [FIX] Carry fairness key in Celery message headers, not kwargs

Bug: producer-side dispatch() merged the fairness key into the task
body's kwargs as ``_fairness_key``. Celery then passed it to the task
function as a keyword argument, and any task whose signature does NOT
accept ``**kwargs`` blew up at consumption time with:

  TypeError: send_webhook_notification() got an unexpected keyword
  argument '_fairness_key'

Observed on send_webhook_notification (fixed signature). async_execute_bin
tolerated it silently via its ``**kwargs`` catch-all — semantically wrong
either way: fairness is routing metadata, not business payload.

Fix: route fairness via Celery's message-header slot
(``headers={"x-fairness-key": ...}``). Headers travel with the AMQP
message and reach consumers via ``self.request.headers`` (bind=True),
but are NEVER passed to the task body's function signature. Phase 8's
reader will look at ``self.request.headers`` regardless, so this is the
right slot from the start.

Changes:
* queue_backend/dispatch.py — pass fairness as ``headers={...}``
  instead of merging into ``kwargs``. Caller kwargs are now strictly
  untouched.
* queue_backend/fairness.py — rename ``FAIRNESS_KWARG_NAME`` to
  ``FAIRNESS_HEADER_NAME`` ("x-fairness-key", HTTP-style spelling
  matching AMQP convention). Module docstring updated.
* tests/test_fairness_key.py — assertions look at
  ``send_task.call_args.kwargs["headers"]`` instead of kwargs; new
  positive assertion that business kwargs are unchanged when fairness
  is present; "no consumer reads it yet" canary updated to grep for
  the new tokens.

Test count unchanged: 14 fairness tests + 17 seam + 14 characterisation
= 45 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3501 [FIX] Address PR review feedback on fairness key

Comment Analyzer — stale docs (2 places):
* fairness.py module docstring: ``_fairness_key today`` was dead
  vocabulary post header-migration; replaced with ``x-fairness-key``
  and pinned the canary class.
* to_dict() docstring no longer references kwargs.

Type Design Analyzer:
* Bound pipeline_priority to [0, 100] via __post_init__ — out-of-range
  values raise ValueError at construction. New MIN_PRIORITY /
  MAX_PRIORITY constants.
* Tightened tier from str to Literal["standard", "enterprise",
  "system"]. New SYSTEM_TIER constant.
* FairnessKey.system() now returns cls(org_id=None, tier="system") —
  encodes the partition in the message shape rather than leaving it
  implicit via org_id is None. Phase 8 scheduler matches on a single
  closed-set field.

Silent Failure Hunter (HIGH):
* for_org replaced **overrides catch-all with explicit keyword-only
  params (pipeline_priority, tier). A typo like priority=80 or
  tiers="enterprise" now raises TypeError at the call site instead
  of silently dropping the override.

Code Simplifier (minor):
* dispatch.py: ternary instead of mutable headers rebinding.

PR Test Analyzer:
* test_is_frozen now uses pytest.raises(FrozenInstanceError) — catches
  the actual exception type, not "any Exception with 'frozen' in the
  message".
* New tests: priority range rejection (below/above/boundaries),
  for_org typo rejection, system-key tier encoding.
* Inventory canary hardened: new test_dispatch_must_be_imported_unaliased
  walks ImportFrom nodes and forbids ``from queue_backend import
  dispatch as <alias>``. Without it the bare-name AST visitor would
  miss alias-imported call sites — exactly the failure mode the canary
  exists to prevent.
* New end-to-end test_header_present_on_outbound_message constructs a
  real Celery app on a memory broker, wraps send_task, and asserts the
  fairness header is attached in the documented shape. Catches the
  rare-but-expensive case where a future Celery/kombu upgrade silently
  drops unknown headers.

Test count: 14 -> 21 fairness tests (45 -> 52 total).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3501 [FIX] Lower cognitive complexity in fairness canary tests

SonarCloud (python:S3776) flagged
test_dispatch_must_be_imported_unaliased at cognitive complexity 22
(threshold 15). The companion test_every_production_dispatch_passes_fairness
sat just under the threshold but carried the same nesting shape.

Refactor: extract three module-level helpers so both tests collapse to
flat list comprehensions.

* _iter_production_trees() walks workers/, skips
  tests/__pycache__/htmlcov/.venv/queue_backend, parses each file,
  swallows SyntaxError, yields (rel_path, tree).
* _aliased_dispatch_imports(tree) returns (lineno, alias) for every
  ``from queue_backend import dispatch as <name>``.
* _dispatch_calls_missing_fairness(tree) returns linenos of bare
  dispatch(...) calls without ``fairness=``.

Each test body is now ~6 lines, no nested visitor class, no
for/if/try/for/if ladder. Same coverage — both tests still pass and
still catch the same offences. The helpers are private (underscore
prefix) and live in the test module since no other module needs them.

Test count unchanged: 52 passed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3501 [FIX] Address Greptile + CodeRabbit findings on fairness key

Greptile #1 (P2) — for_org silently accepts org_id=None:
* fairness.py: tightened ``for_org`` signature to ``org_id: str`` (was
  ``str | None``) and added an explicit ``ValueError`` guard. Passing
  None would have produced ``FairnessKey(org_id=None, tier="standard")``
  — inconsistent with the module's own promise that no-tenant tasks
  use ``FairnessKey.system()`` (``tier="system"``). Phase 8's scheduler
  no longer has to special-case a None org in the "standard" partition.
* New test ``test_for_org_rejects_none_org_id`` locks the rejection.

CodeRabbit (Major) — call site can hit the None path:
* shared/patterns/notification/helper.py: ``payload.organization_id``
  is ``str | None`` (NotificationPayload default). Callback paths build
  payloads without org context. With Greptile #1's tightening,
  ``for_org(None)`` raises, so the call site now picks the right
  constructor: ``FairnessKey.for_org(org_id) if org_id else
  FairnessKey.system()``.
* Scheduler site unchanged — ``ScheduledPipelineContext.organization_id``
  is non-Optional and validated in __post_init__, so it cannot be None
  at the call site.

Greptile #2 (P2) — duplicated skip_top_dirs:
* test_fairness_key.py: ``test_no_consumer_reads_fairness_header`` used
  a local ``skip_top_dirs`` copy of the module-level
  ``_SKIP_TOP_DIRS`` frozenset. If a new dir is added to one set the
  two canaries would scope differently. Switched to ``_SKIP_TOP_DIRS``
  (and ``_WORKERS_ROOT``) so both canaries share one source of truth.

Test count: 52 -> 53.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3501 [DOCS] Trim plan-stage references and verbose history from fairness/dispatch comments

Per @chandrasekharan-zipstack's NITs on PR #2003: code comments that
name specific roadmap stages (``Phase 8 (PG Queue Gate)``, ``PR #15``)
or capture migration history (``earlier iteration of this module put
the key in kwargs``) go stale fast and should describe what the code
does, not when in the plan it lives.

Changes:
* fairness.py
  - ``Tier`` Literal comment now explains what tiers actually do
    (cross-tenant resource-allocation tag for preemption / capacity
    decisions), addressing Chandra's clarifying question on line 33.
  - MIN/MAX_PRIORITY comment dropped ``Phase 8 then has to special-case``
    rationale; the bound itself is the contract.
  - FAIRNESS_HEADER_NAME comment dropped the kwargs-iteration history.
  - Module docstring + ``FairnessKey``/``system()`` docstrings:
    ``Phase 8`` / ``PG Queue Gate`` / ``the future PG Queue scheduler``
    -> ``a future dispatch scheduler`` / ``the scheduler``.
* dispatch.py
  - Module docstring + DispatchHandle Protocol + ``dispatch()``
    docstring: same genericisation.

No behaviour change. 53/53 tests still pass in 5.51s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3501 [FIX] Replace invented tier vocabulary with labs-grounded workload_type

Following @chandrasekharan-zipstack's question on PR #2003: traced the
``tier`` concept back to the labs design at
``Zipstack/labs:labs-ali/workflow-execution-architecture/docs/pg-queue-implementation-guide.md``
and found three real discrepancies in my implementation:

1. **Tier is server-side, not on the wire.** Per labs:
   ``CREATE TABLE org_config (org_id UUID PRIMARY KEY, burst_max INTEGER,
   tier_priority INTEGER NOT NULL DEFAULT 5  -- 1=highest priority)``.
   The scheduler JOINs ``staging_queue`` against ``org_config`` to find
   tier; it isn't carried on the task payload.

2. **The third payload field is ``workload_type``, not ``tier``.** Labs
   ORDER BY:
     L1: ``oc.tier_priority ASC``         — from org_config JOIN
     L2: ``(sq.workload_type = 'api')::int DESC`` — from payload
     L3: ``sq.pipeline_priority DESC``    — from payload
   So the payload triple is ``(org_id, workload_type, pipeline_priority)``,
   not ``(org_id, pipeline_priority, tier)``. My memory's
   ``tier``-named summary fused the L1 dimension's name with the L2
   payload field — they're different things.

3. **Priority range is 1..10, not 0..100.** Labs schema:
   ``pipeline_priority INTEGER DEFAULT 5,  -- 1-10``. My bounds were
   off by an order of magnitude.

Changes:
* fairness.py
  - ``Tier = Literal["standard", "enterprise", "system"]`` ->
    ``WorkloadType = Literal["api", "etl"]`` (matches labs SQL exactly).
  - ``tier`` field on ``FairnessKey`` -> ``workload_type``.
  - ``MIN_PRIORITY/MAX_PRIORITY`` 0/100 -> 1/10; ``DEFAULT_PRIORITY``
    50 -> 5.
  - Dropped ``DEFAULT_TIER``, ``SYSTEM_TIER``, the ``Tier`` alias.
  - Dropped ``FairnessKey.system()`` and ``FairnessKey.for_org()``
    helpers — direct ``FairnessKey(org_id=..., workload_type=...)``
    construction is small, typo-safe (dataclass kwargs are checked),
    and removes guess-y semantics about what tier "system" tasks
    should carry.
  - Module docstring now cites the labs source file inline so the
    architectural source of truth is one click away.
* shared/patterns/notification/helper.py — webhook delivery is
  customer-facing API traffic, so ``workload_type="api"``. ``org_id``
  passed as-is (can be None on callback paths).
* scheduler/tasks.py — scheduled pipelines fire ETL-style batch
  executions, so ``workload_type="etl"``. ``context.organization_id``
  is non-Optional and validated upstream.
* tests/test_fairness_key.py — restructured. Direct construction
  throughout (no for_org / system helpers to test). New tests pin
  workload_type semantics, priority bounds at 1..10, and typo-rejection
  via the dataclass.

Test count: 53 -> 51 (lost 4 helper-method tests, gained 2
workload_type tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3501 [REFACTOR] WorkloadType -> StrEnum; scope fairness to workflow-execution dispatches; trim comments

Three corrections rolled together, all triggered by review feedback:

WorkloadType is now a StrEnum (one source of truth)

Replaced the Literal["api", "non_api"] form with class
WorkloadType(StrEnum). Producers import WorkloadType.API /
NON_API instead of scattering string literals across call sites.
StrEnum members are str subclasses so JSON serialisation is free;
to_dict() still calls .value explicitly so the wire payload is a
plain string (downstream consumers don't need to import the enum
to compare).

Fairness applies only to workflow-execution dispatches

Earlier confusion treated workload_type as a worker-type tag.
Corrected: it's the type of the WORKFLOW being executed
(API-deployment vs ETL/Task/App). Notification dispatch
(send_webhook_notification on the notifications queue) is a
worker-internal task, not a workflow execution — it now passes
fairness=None explicitly. The scheduler dispatch (async_execute_bin)
does start a workflow execution and keeps
FairnessKey(..., workload_type=WorkloadType.NON_API). The inventory
canary still requires fairness= to be present on every dispatch (any
value, including None) — the point is the conscious choice at the
call site, not the value.

New test test_explicit_fairness_none_no_header_sent proves
fairness=None produces the same on-wire shape as omitting the arg,
without forcing producers to invent a workflow_type for tasks that
aren't workflow executions.

Comment hygiene

Trimmed docstrings and inline comments to keep only what's necessary
(external citations, surprising design choices, non-obvious
invariants). Removed:

* Plan-stage references (Phase 8, PR #15).
* Private-repo path (Zipstack/labs:labs-ali/...) from the public
  OSS codebase — same rule as PR descriptions.
* Migration history paragraphs that belong in commit messages, not
  source.
* Test docstrings that paraphrased the test name.
* Inline comments that restated what the code obviously does.

Net: ~267 lines deleted, ~114 added across 7 files (~50% reduction
in the seam module's documentation footprint).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* UN-3501 [FIX] Canary error message pointed at deleted FairnessKey.system()

Greptile P1: the failure message for the inventory canary told future
developers to call ``FairnessKey.system()`` — a classmethod removed in
the prior commit when helper constructors were dropped in favour of
direct dataclass construction. A developer who triggered the canary
and followed the message verbatim would have hit
``AttributeError: type object 'FairnessKey' has no attribute 'system'``,
defeating the canary's self-service purpose.

Updated the message to point at the two real options the canary
actually accepts:

* ``fairness=FairnessKey(org_id=..., workload_type=WorkloadType...)``
  for a workflow-execution dispatch.
* ``fairness=None`` for a worker-internal task that doesn't start a
  workflow execution (notifications, callbacks, healthchecks).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants