
fix: record errors on missing deps + safe JSON serialization (#472, #473) #474

Merged

axisrow merged 6 commits into main from
fix/472-473-missing-dependencies-json-serialization
on Apr 19, 2026

Conversation

@axisrow (Owner) commented Apr 19, 2026

Summary

Changes

Issue #472 — Missing dependency handling

src/services/pipeline_nodes/handlers.py:

  • RetrieveContextHandler, ImageGenerateHandler, NotifyHandler, SearchQueryTriggerHandler now call context.record_error(code="missing_dependency", ...) instead of silent logger.info/warning + return

src/services/unified_dispatcher.py:

  • _handle_photo_due and _handle_photo_auto now return CollectionTaskStatus.FAILED instead of COMPLETED when services are not configured

Issue #473 — JSON serialization safety

New file src/utils/json.py:

  • safe_json_dumps() function with fallbacks for:
    • datetime, date → .isoformat()
    • bytes → .hex()
    • Pydantic v2 models → .model_dump()
    • Unknown types → repr()
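Based on the fallback list above, a minimal sketch of what such a helper might look like (the actual implementation in src/utils/json.py may differ in signature and details, and the repr() fallback was later revised during review):

```python
import json
from datetime import date, datetime


def _default(o: object) -> object:
    """Fallback converter for types json.dumps cannot handle natively."""
    if isinstance(o, (datetime, date)):
        return o.isoformat()
    if isinstance(o, bytes):
        return o.hex()
    # Pydantic v2 models expose model_dump(); duck-typed here to keep the
    # sketch free of a hard pydantic dependency.
    model_dump = getattr(o, "model_dump", None)
    if callable(model_dump):
        return model_dump()
    return repr(o)  # last resort; see the review discussion below


def safe_json_dumps(obj: object, **kwargs: object) -> str:
    """json.dumps that degrades gracefully instead of raising TypeError."""
    return json.dumps(obj, default=_default, **kwargs)
```

The key mechanism is the `default=` hook: `json.dumps` calls it only for objects it cannot serialize itself, so plain str/int/dict payloads are unaffected.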

Replaced json.dumps with safe_json_dumps in:

  • src/database/repositories/collection_tasks.py
  • src/database/repositories/telegram_commands.py
  • src/web/routes/pipelines.py
  • src/services/agent_provider_service.py
  • src/services/image_provider_service.py
  • src/agent/manager.py

Closes

Test plan

  • ruff check passes
  • pytest tests/ passes (parallel + serial)
  • Manual: pipeline with missing search_engine now shows error in context
  • Manual: channel collect with service messages containing datetime/bytes no longer crashes

🤖 Generated with Claude Code

Hapi and others added 2 commits April 19, 2026 11:42
…rnal data

- #472: Replace silent skips with explicit error recording in pipeline handlers
  - RetrieveContextHandler, ImageGenerateHandler, NotifyHandler, SearchQueryTriggerHandler now use context.record_error with code="missing_dependency"
  - Unified dispatcher photo handlers now return FAILED instead of COMPLETED when services are missing

- #473: Add safe_json_dumps utility and replace unsafe json.dumps calls
  - New src/utils/json.py with safe_json_dumps handling datetime, bytes, Pydantic models
  - Replace 28 json.dumps calls across collection_tasks, telegram_commands, pipelines, agent_provider_service, image_provider_service, agent/manager

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ect FAILED

Aligns tests with the behaviour change in #472: dispatcher now marks tasks as
FAILED (not COMPLETED) when PhotoTaskService / PhotoAutoUploadService are absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@axisrow (Owner, Author) commented Apr 19, 2026

What still needs follow-up from #472

This PR closes the main part of the issue, but not all of the HIGH risks described in #472. Below is a list of what should be done either in this PR or in a follow-up.

1. PublishService: pool.get_client_by_phone() without a None guard

src/services/publish_service.py:~87: client_pool is passed to the constructor typed as object (not optional), but is used internally without an is None check. If someone passes None (as the CLI did before #470), we get an AttributeError instead of a clear error.

Fix: either type the parameter as required and add a runtime check in __init__, or add an explicit if self._client_pool is None: raise ConfigurationError(...) in publish_run.

2. UnifiedDispatcher._handle_content_publish: unguarded self._client_pool

src/services/unified_dispatcher.py:~683: pool = self._client_pool is used without a None check. The task ends up marked COMPLETED, or crashes, instead of FAILED.

Fix: add a check at the start of the handler, mirroring _handle_photo_due from this PR:

if self._client_pool is None:
    await self._tasks.update_collection_task(
        task.id,
        CollectionTaskStatus.FAILED,
        error="client_pool not configured",
    )
    return

3. ReactHandler / ForwardHandler / DeleteMessageHandler

These already called context.record_error(code="no_client_pool", ...) before this PR, so formally "the error is recorded". But there is a nuance:

Idea: if a handler cannot perform a single operation because of a missing dependency, it makes sense to raise an exception so that the whole run is marked status=failed rather than completed. The current behavior masks misconfiguration as "successfully performed 0 operations".

4. Medium-risk spots from #473

Besides the HIGH items, issue #473 listed other places that needed review:

  • src/services/agent_provider_service.py:~168: replaced ✅
  • src/services/image_provider_service.py:~122: replaced ✅
  • src/agent/manager.py:271, 290, 301, 742: replaced ✅

Everything from the audit is covered by this PR. Still, it would be worth adding a ruff rule or pre-commit hook that flags new json.dumps(...) calls without default= in the sensitive modules (src/database/repositories/, src/services/, src/web/routes/), so we don't regress.

5. Tests for the new behavior

The existing tests were updated to expect FAILED instead of COMPLETED, which is good. But positive tests for the new contract are worth adding:

  • ReactHandler without client_pool: context.get_errors() contains an entry with code="no_client_pool" (already covered?)
  • RetrieveContextHandler without search_engine: context.get_errors() contains code="missing_dependency" (new behavior in this PR)
  • Likewise for ImageGenerate, Notify, SearchQueryTrigger

Summary: this PR closes roughly 80% of issue #472 and 100% of issue #473. The remaining 20% (PublishService, _handle_content_publish, and the react/forward/delete behavior when client_pool=None) can be finished here or split into a follow-up, at your discretion.
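A self-contained sketch of the kind of positive test suggested in item 5. FakeContext and the handler stub below are illustrative stand-ins for the real PipelineContext and RetrieveContextHandler, whose constructor signatures may differ:

```python
# Minimal stand-ins so the error-recording contract can be shown in isolation.
class FakeContext:
    def __init__(self):
        self._errors = []

    def record_error(self, *, node_id, code, detail=""):
        self._errors.append({"node_id": node_id, "code": code, "detail": detail})

    def get_errors(self):
        return list(self._errors)


class RetrieveContextHandlerStub:
    """Mimics the new contract: no search_engine -> record missing_dependency."""

    def __init__(self, search_engine=None):
        self._search_engine = search_engine

    def execute(self, context, node_id="retrieve-1"):
        if self._search_engine is None:
            context.record_error(
                node_id=node_id,
                code="missing_dependency",
                detail="search_engine not configured",
            )
            return
        ...  # real retrieval logic would go here


def test_retrieve_records_missing_dependency():
    ctx = FakeContext()
    RetrieveContextHandlerStub(search_engine=None).execute(ctx)
    assert any(e["code"] == "missing_dependency" for e in ctx.get_errors())
```

The same shape applies to the ImageGenerate, Notify, and SearchQueryTrigger cases: construct the handler without its dependency, execute, and assert on context.get_errors().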

- PublishService._publish_to_target: early return when client_pool is None
- UnifiedDispatcher._handle_content_publish: fail task when client_pool missing
- React/Forward/Delete handlers: raise RuntimeError after record_error
- Tests updated: pytest.raises for RuntimeError, assert missing_dependency errors

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@axisrow (Owner, Author) commented Apr 19, 2026

@claude review. Focus on critical issues: bugs, security vulnerabilities, logical errors, data loss risks, performance problems. Do NOT nitpick style, naming conventions, minor formatting, or subjective preferences — only flag issues that could break functionality or cause real harm in production.

@claude bot commented Apr 19, 2026

Claude encountered an error.


I'll analyze this and get back to you.

@axisrow (Owner, Author) commented Apr 19, 2026

@claude review. Focus on critical issues: bugs, security vulnerabilities, logical errors, data loss risks, performance problems. Do NOT nitpick style, naming conventions, minor formatting, or subjective preferences — only flag issues that could break functionality or cause real harm in production.

@claude bot commented Apr 19, 2026

Claude encountered an error.


I'll analyze this and get back to you.

@axisrow (Owner, Author) commented Apr 19, 2026

@claude please review this PR. Focus only on critical issues.

@claude bot commented Apr 19, 2026

Claude finished @axisrow's task in 3m 43s.


PR Review

  • Read changed files
  • Review safe_json_dumps utility
  • Review missing dependency handling in handlers
  • Review dispatcher changes
  • Check for critical bugs / data loss risks

Critical Issues

1. context.record_error() before raise RuntimeError() is wasted — node_errors never persisted

src/services/pipeline_nodes/handlers.py — ReactHandler, ForwardHandler, DeleteMessageHandler

context.record_error(node_id=node_id, code="no_client_pool", ...)
logger.warning("ReactHandler[%s]: no client_pool, skipping", node_id)
raise RuntimeError("ReactHandler: client_pool not available")   # ← NEW in this PR

When RuntimeError is raised, execution unwinds through pipeline_executor.py (line 123: re-raises after logging), then through content_generation_service.py (line 183: marks run failed, re-raises). The node_errors = context.get_errors() at pipeline_executor.py:135 is never reached, so the structured error recorded by context.record_error() is never written to generation_runs.metadata.

The net result: the run has status=failed and error="ReactHandler: client_pool not available" in the task, but the metadata.node_errors that the UI reads is absent — worse than the previous behavior where node_errors was present. The context.record_error() call before the raise is a no-op and should either be removed or the executor needs to persist context.get_errors() in the exception path.



2. repr() fallback silently corrupts DB records for unknown types

src/utils/json.py:27

return repr(o)   # last-resort fallback

When an unexpected non-serializable type reaches collection_tasks.py or telegram_commands.py, the row is written with a Python repr() string like <TLMessage object at 0x7f...>. The original json.dumps would raise TypeError here (catching the bug early); the new code silently stores a value that cannot be deserialized, corrupting the record and potentially breaking task replay or recovery.

The fallback is safe for SSE/streaming paths (agent manager) where the string is just displayed. But for DB write paths it changes fail-fast into silent data corruption. Consider either raising in those call sites or adding a sentinel so corrupt payloads are detectable:

raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

versus keeping repr() only for display paths (SSE, logging).
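One way to express the split the review suggests (fail-fast for DB write paths, best-effort for display paths) is a strict flag. This is a sketch of that alternative, not the shape the PR ultimately adopted (the final code simply raises TypeError, as the next review round confirms):

```python
import json


def _default_factory(strict: bool):
    def _default(o: object) -> object:
        if strict:
            # DB write paths: surface the bug instead of storing repr() garbage.
            raise TypeError(
                f"Object of type {type(o).__name__} is not JSON serializable"
            )
        return repr(o)  # display paths (SSE, logging): best effort
    return _default


def safe_json_dumps(obj: object, *, strict: bool = True) -> str:
    return json.dumps(obj, default=_default_factory(strict))
```

Callers in repositories would use the default strict mode, while the SSE/streaming paths would opt into strict=False.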



Non-Critical Observations

  • safe_json_dumps in hash computation (agent_provider_service.py:463,474): In practice secret_payload and payload only contain strings/simple types so repr() will never trigger — no functional risk. However, if it ever did trigger, the hash would be non-deterministic across Python versions, causing spurious cache invalidation.

  • Behavioral asymmetry: RetrieveContextHandler, ImageGenerateHandler, NotifyHandler, SearchQueryTriggerHandler — record error and return (pipeline continues). ReactHandler, ForwardHandler, DeleteMessageHandler — record error and raise (pipeline aborts). This is intentional per the PR but worth documenting: React/Forward/Delete on a misconfigured worker will now fail the entire run, not just those steps.

  • PublishService None guard (publish_service.py:84): Correct fix, PublishResult(success=False, error=...) is the right return for that method signature.

  • _handle_content_publish None guard (unified_dispatcher.py:677): Correct placement, properly before pool = self._client_pool is dereferenced.


Summary

Two issues warrant attention before merge:

  1. The context.record_error() / raise RuntimeError combination in React/Forward/Delete handlers produces worse observability than intended — the structured error is silently dropped.
  2. The repr() fallback in safe_json_dumps is appropriate for streaming display but risky for DB write paths where it could silently corrupt records.

- pipeline_executor: wrap node loop in try/finally so context.get_errors()
  is always captured, even when a handler raises RuntimeError
- safe_json_dumps: replace repr() fallback with TypeError to prevent
  silent DB corruption from non-serializable objects

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@axisrow (Owner, Author) commented Apr 19, 2026

@claude review. Focus on critical issues: bugs, security vulnerabilities, logical errors, data loss risks, performance problems. Do NOT nitpick style, naming conventions, minor formatting, or subjective preferences — only flag issues that could break functionality or cause real harm in production.

@claude bot commented Apr 19, 2026

Claude finished @axisrow's task in 2m 42s.


PR Review

  • Read changed files
  • Review safe_json_dumps utility and its callers
  • Review missing dependency handling in handlers
  • Review pipeline executor exception path
  • Check dispatcher changes
  • Check for critical bugs / data loss risks

Critical Issue

node_errors is silently dropped when React/Forward/Delete handlers raise

src/services/pipeline_executor.py:133-134 + src/services/pipeline_nodes/handlers.py:313-320

When ReactHandler, ForwardHandler, or DeleteMessageHandler encounters client_pool is None, the flow is:

# handlers.py — record error, then raise
context.record_error(node_id=node_id, code="no_client_pool", ...)
raise RuntimeError("ReactHandler: client_pool not available")
# pipeline_executor.py — outer try/finally
try:
    for node in ordered:
        try:
            await handler.execute(...)   # ← raises here
        except Exception:
            logger.exception(...)
            raise
        finally:
            services["_current_node_id"] = prev   # ← restores node id
finally:
    node_errors = context.get_errors()   # ← runs, but only a local assignment!

# ← exception has already propagated; this return is never reached:
return {..., "node_errors": node_errors, ...}

The outer finally does capture the errors into the local node_errors variable, but since the exception propagates past the return statement, the errors are never included in any return value. Back in content_generation_service.generate(), the except Exception at line 183 only calls set_status(run_id, "failed") with no metadata — so generation_runs.metadata.node_errors is never persisted.

The unit tests in test_pipeline_nodes_handlers.py (test_react_records_error_when_no_client_pool, etc.) verify the handler-level behavior correctly, but don't catch the integration-level loss.

Net result: the run gets status=failed with a bare error string, but the structured node_errors (with code, detail) that the UI reads from metadata is silently lost. This is functionally worse than recording nothing, because it creates a false expectation that node_errors is populated.

Fix option A — persist errors in the exception path inside pipeline_executor.py:

except Exception as exc:
    logger.exception(...)
    raise RuntimeError(str(exc)) from exc  # carry node_errors separately

Or better: catch and re-raise with node_errors stored in a custom exception attribute.

Fix option B — remove context.record_error() before raise in these three handlers (the RuntimeError message already surfaces as the run's error string; there's no point recording it as structured node error if it can't be persisted).
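Option A could also be sketched as a dedicated exception type that carries the structured errors across the stack unwind. Names here are illustrative; the PR later settled on attaching a plain exc.node_errors attribute instead of a custom class:

```python
class PipelineNodeError(RuntimeError):
    """Carries structured node errors across the stack unwind."""

    def __init__(self, message: str, node_errors: list[dict]):
        super().__init__(message)
        self.node_errors = node_errors


def run_nodes(context, ordered, execute):
    """Illustrative executor loop: on failure, re-raise with errors attached."""
    try:
        for node in ordered:
            execute(node, context)
    except Exception as exc:
        errors = context.get_errors()
        if errors:
            # The caller can now persist the structured errors when it marks
            # the run as failed, instead of losing them.
            raise PipelineNodeError(str(exc), errors) from exc
        raise
```

The caller then reads exc.node_errors in its except block and writes it into the run's metadata alongside the failed status.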



Previously Flagged Issue — Now Resolved

The prior review raised concern about repr() silently corrupting DB records for unknown types. The current code at src/utils/json.py:27 correctly raises TypeError (fail-fast) instead:

raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

This is the right behavior. ✅


Everything Else Looks Correct

  • PublishService._publish_to_target (publish_service.py:84-86): if pool is None: return PublishResult(success=False, ...) — correct placement and return type.
  • _handle_content_publish None guard (unified_dispatcher.py:677-683): mirrors the _handle_photo_due pattern from this PR. Correct.
  • _handle_photo_due / _handle_photo_auto returning FAILED instead of COMPLETED on missing services: correct.
  • RetrieveContextHandler, ImageGenerateHandler, NotifyHandler, SearchQueryTriggerHandler: record error + return (pipeline continues). This is intentional and tested.
  • safe_json_dumps in hash paths (agent_provider_service.py:463,474): secret_payload and payload only contain strings/dicts in practice; TypeError will never trigger here, so no functional risk.

Summary: One critical issue remains — the node_errors captured in pipeline_executor.py's finally block is discarded when a handler raises, so structured errors from React/Forward/Delete misconfiguration are never persisted to generation_runs.metadata. Everything in the #473 JSON serialization fix is correct as-is.

…wn types

- pipeline_executor: attach node_errors to exception via exc.node_errors
  so callers can persist structured errors even on RuntimeError
- content_generation_service: extract exc.node_errors and save to DB
  metadata when marking run as failed
- generation_runs repo: set_status now accepts optional metadata param
- safe_json_dumps: replace repr() fallback with TypeError (fail-fast)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@axisrow (Owner, Author) commented Apr 19, 2026

@claude review. Focus on critical issues: bugs, security vulnerabilities, logical errors, data loss risks, performance problems. Do NOT nitpick style, naming conventions, minor formatting, or subjective preferences — only flag issues that could break functionality or cause real harm in production.

@claude bot commented Apr 19, 2026

Claude finished @axisrow's task in 2m 15s.


PR Review

  • Read changed files
  • Review safe_json_dumps utility and callers
  • Review pipeline executor exception path
  • Review handler missing-dep changes
  • Review dispatcher changes
  • Check for critical bugs / data loss risks

All Previously Flagged Critical Issues Are Now Fixed

node_errors propagation via exc.node_errors

Prior reviews flagged that context.record_error() before raise RuntimeError in ReactHandler/ForwardHandler/DeleteMessageHandler would lose the structured errors. This is now fully resolved:

pipeline_executor.py:124-129

except Exception as exc:
    logger.exception("Node %s (%s) failed during pipeline execution", ...)
    errors = context.get_errors()      # picks up the no_client_pool error
    if errors:
        exc.node_errors = errors       # attached to the exception
    raise

content_generation_service.py:189-191

node_errors = getattr(exc, "node_errors", None)
meta = {"node_errors": node_errors} if node_errors else None
await self._db.repos.generation_runs.set_status(run_id, "failed", metadata=meta)

The context.record_error() call in the handlers is not wasted — errors are captured before the stack unwinds, attached to the exception object, and persisted to generation_runs.metadata. ✅

repr() fallback replaced with TypeError

src/utils/json.py:27

raise TypeError(f"Object of type {type(o).__name__} is not JSON serializable")

Fail-fast for unknown types — no silent DB corruption. ✅

PublishService / _handle_content_publish None guards ✅

  • publish_service.py:84-86: if pool is None: return PublishResult(success=False, ...)
  • unified_dispatcher.py:677-683: guard before pool = self._client_pool dereference ✅

No New Critical Issues Found

The complete serialization chain for node_errors is safe: record_error() stores only str/int values → context.get_errors() returns list[dict] with those types → json.dumps(metadata) in generation_runs.set_status serializes cleanly with no safe_json_dumps needed.

_handle_photo_due / _handle_photo_auto returning FAILED instead of COMPLETED on missing services is correct behavior and properly tested.


Summary

All three critical issues identified in prior review cycles have been addressed. The PR is correct as-is and can be merged.

- FakeGenerationRunsRepo.set_status: accept optional metadata param
- content_publish tests: add client_pool mock to dispatcher fixtures

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
axisrow merged commit 42d2ae0 into main on Apr 19, 2026
5 checks passed
axisrow deleted the fix/472-473-missing-dependencies-json-serialization branch on April 19, 2026 at 15:15