Skip to content

feat: RSS, tags, S3 storage, refinement pipeline, ReAct fallback, test fixes#271

Merged
axisrow merged 11 commits intomainfrom
feat/issues-batch-236-230-239-240-246-228
Mar 27, 2026
Merged

feat: RSS, tags, S3 storage, refinement pipeline, ReAct fallback, test fixes#271
axisrow merged 11 commits intomainfrom
feat/issues-batch-236-230-239-240-246-228

Conversation

@axisrow
Copy link
Copy Markdown
Owner

@axisrow axisrow commented Mar 27, 2026

Summary

Test plan

  • ruff check src/ tests/ passes
  • pytest tests/ -m "not aiosqlite_serial" -n auto — all pass
  • pytest tests/ -m aiosqlite_serial — all pass
  • RSS/Atom endpoints return valid XML
  • Tag CRUD endpoints work via /channels/tags

🤖 Generated with Claude Code

Hapi and others added 10 commits March 27, 2026 21:20
Re-applies changes from prior session that were lost when remote
updated local main:

- #236: RSS 2.0 (/rss.xml) and Atom 1.0 (/atom.xml) feed endpoints
- #230: Channel tags (tags/channel_tags tables, repo methods, endpoints)
- #239: S3-compatible image storage (S3Store + ImageGenerationService wiring)
- #240: Multi-step pipeline refinement (refinement_steps on ContentPipeline)
- #246: ReAct fallback for Ollama models without native FC (react_agent.py)
- #228: Remove duplicate cli_db fixtures from test_cli_extra/extended
- #255: Migration for refinement_steps and channel_tags tables
- Test model names updated from kimi-k2.5 to llama3 for import-error tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds /rss.xml and /atom.xml public endpoints for collected messages.
Both support channel_id and limit query params (1–200, default 50).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds tags/channel_tags tables, migration, tag CRUD on ChannelsRepository,
and REST endpoints under /channels/tags and /channels/{pk}/tags.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds S3Store class (boto3, optional). ImageGenerationService auto-uploads
local adapter results to S3 when S3_ENDPOINT/S3_BUCKET/S3_ACCESS_KEY/
S3_SECRET_KEY are configured. Falls back to local path otherwise.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds refinement_steps JSON field to ContentPipeline. After initial
generation, each step's prompt (with {text} placeholder) is fed to
the LLM sequentially. New endpoints GET/POST /pipelines/{id}/refinement-steps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…closes #246)

Adds OllamaReActAgent for models in _OLLAMA_NO_NATIVE_FC (kimi-k2.5,
kimi-k1.5). Uses text-based tool descriptions + JSON tool call parsing
loop, bypassing deepagents tool binding. Tests updated to use llama3
for import-error path testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removed duplicate cli_db from test_cli_extra.py (identical to conftest)
and duplicate cli_db from test_cli_extended.py. Both now use the shared
conftest.py fixtures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The system has python3 but no python symlink, causing FileNotFoundError.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace outdated llama3.2/qwen2.5/deepseek-r1 with current top
open-source models: qwen3, deepseek-v3.2, gemma3, mistral-large3, glm4.7.
Default OLLAMA_DEFAULT_MODEL fallback changed from llama3.2 to qwen3.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@axisrow
Copy link
Copy Markdown
Owner Author

axisrow commented Mar 27, 2026

@claude review. Focus on critical issues: bugs, security vulnerabilities, logical errors, data loss risks, performance problems. Do NOT nitpick style, naming conventions, minor formatting, or subjective preferences — only flag issues that could break functionality or cause real harm in production.

@claude
Copy link
Copy Markdown

claude bot commented Mar 27, 2026

Claude finished @axisrow's task in 3m 0s —— View job


Code Review — PR #271

  • Read changed files
  • Analyze for critical issues
  • Post review findings

Critical / High-severity Issues

1. asyncio.get_event_loop() deprecated/incorrect in async context — s3_store.py:42

loop = asyncio.get_event_loop()
return await loop.run_in_executor(None, _upload)

upload_file is already an async function, so there is always a running loop. asyncio.get_event_loop() is deprecated since Python 3.10 and raises a DeprecationWarning (and fails under some 3.12 configurations) when called inside a coroutine without an explicit loop. The correct call is asyncio.get_running_loop().

Fix this →


2. RSS item <link> contains unencoded URL — rss.py:68

item_link = f"{base_url}/search?q={html.escape(item_title[:40])}"

html.escape is an XML-escaping function, not a URL encoder. Message titles containing spaces, +, %, #, or non-ASCII characters produce invalid URLs. For example, a title like "hello world" yields …/search?q=hello world — a URL with a literal space. RSS readers that follow <link> hrefs will get a 400 or broken request. Use urllib.parse.quote (or quote_plus) for the query value:

from urllib.parse import quote
item_link = f"{base_url}/search?q={quote(item_title[:40])}"

Fix this →


Medium-severity Issues

3. Silent swallowing of all DB exceptions in RSS/Atom — rss.py:52-53, rss.py:110-111

except Exception:
    messages = []

Any database error (connection failure, schema mismatch, aiosqlite error) silently returns an empty feed with HTTP 200. The caller (RSS reader, monitoring) has no way to distinguish "no messages" from "DB is broken." At minimum, log the exception:

except Exception:
    logger.exception("Failed to fetch messages for RSS feed")
    messages = []

4. ReAct agent system prompt is hardcoded in Russian — react_agent.py:27-34

_REACT_SUFFIX is entirely in Russian and is appended unconditionally. If the user's configured LLM is tuned for a different language (e.g., English-only models), the mixed-language system prompt can degrade output quality. This applies to the instruction telling the model which JSON format to emit — a malformed tool-call JSON from the model causes the agent to return a partial response after max_steps iterations with no useful feedback.


5. _OLLAMA_NO_NATIVE_FC allowlist is too narrow — manager.py:48-51

_OLLAMA_NO_NATIVE_FC: frozenset[str] = frozenset({
    "kimi-k2.5",
    "kimi-k1.5",
})

The PR description says this is a "ReAct fallback for Ollama models without native function calling," but many widely-used Ollama models (e.g. vanilla Llama 3, Mistral, Gemma) also don't support native function calling. The current list only covers two kimi models. If a user runs one of the hundreds of non-FC-capable Ollama models that aren't on this list, they silently get a broken agent rather than the ReAct fallback. Consider making the denylist opt-out (e.g., default to ReAct for all Ollama models unless the model is in a separate _OLLAMA_NATIVE_FC allowlist).


Low-severity / Notes

  • create_tag has no length validation (channels.py:231-236): tag names are only stripped, not length-capped. A tag of arbitrary length is accepted and stored. A name[:100] truncation or HTTP 422 for names over a reasonable limit would be safe.

  • S3 URL is path-style only (s3_store.py:40): f"{self._endpoint}/{self._bucket}/{key}" works for MinIO/Ceph but not for virtual-hosted-style AWS S3 endpoints. The docstring mentions MinIO explicitly, so this is likely intentional — but should be documented clearly in the env var guidance.

  • set_channel_tags — not explicitly transactional: the DELETE + multi-INSERT sequence relies on aiosqlite's implicit transaction, which is fine for SQLite. However, if aiosqlite is opened in autocommit mode (isolation_level=None), the delete commits before the inserts and a crash midway leaves the channel with no tags. Worth verifying the connection's isolation_level.


Summary

Two items need fixes before merge: the asyncio.get_event_loop() call in s3_store.py (will fail or warn on Python 3.10+/3.12) and the unencoded RSS <link> URLs (breaks RSS reader navigation). The rest are medium/low severity.

@axisrow
Copy link
Copy Markdown
Owner Author

axisrow commented Mar 27, 2026

Review triage

Fixed (2):

  1. asyncio.get_event_loop()asyncio.get_running_loop() in s3_store.py — fixed
  2. Silent DB exception swallowing in RSS — added logger.exception() in both endpoints

Skipped (4):

  1. RSS <link> URL encoding — low impact, search convenience link only; will address separately if needed
  2. _OLLAMA_NO_NATIVE_FC too narrow — intentional design: light workaround (tool names in system prompt, lines 634-638) already covers ALL Ollama models; ReAct is the heavy fallback reserved for known-bad models only
  3. create_tag no length validation — cosmetic, web UI provides input limits
  4. set_channel_tags not transactional — low-risk: tag operations are simple INSERTs unlikely to fail mid-way

Irrelevant (2):

  1. Russian system prompt in ReAct — Russian is the deliberate, consistent convention throughout the entire agent system (DEFAULT_AGENT_PROMPT_TEMPLATE, tool suffix in manager.py:636, etc.). This is a Russian-language Telegram tool.
  2. S3 URL path-style only — by design: class targets S3/MinIO (path-style), stated in docstring

- Replace deprecated asyncio.get_event_loop() with get_running_loop()
  in S3Store.upload_file() (Python 3.10+ compatibility)
- Add logger.exception() for silent DB errors in RSS/Atom feed endpoints

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@axisrow
Copy link
Copy Markdown
Owner Author

axisrow commented Mar 27, 2026

@claude review. Focus on critical issues: bugs, security vulnerabilities, logical errors, data loss risks, performance problems. Do NOT nitpick style, naming conventions, minor formatting, or subjective preferences — only flag issues that could break functionality or cause real harm in production.

@claude
Copy link
Copy Markdown

claude bot commented Mar 27, 2026

Claude finished @axisrow's task in 2m 56s —— View job


Code Review — PR #271 (Round 2)

  • Read changed files and verify fixes from round 1
  • Check for remaining critical issues
  • Post findings

Fixes Verified

Both items flagged in round 1 are confirmed fixed:

asyncio.get_running_loop()s3_store.py:42
asyncio.get_running_loop() is now used correctly.

RSS/Atom exception logging — rss.py:52-53, rss.py:111-112
Both endpoints now call logger.exception(...) before returning an empty feed.


Remaining (deferred by author)

RSS <link> URL encoding — rss.py:69
Still unfixed: html.escape(item_title[:40]) is used inside a URL query string. The outer html.escape(item_link) on line 74 handles XML escaping, but spaces and special characters in the title will produce invalid hrefs. Low impact as noted, but worth tracking.


New Findings

No new critical or high-severity issues found.

Two items worth noting:

panel_auth.py:52 — RSS/Atom intentionally public
is_public_path explicitly exempts /rss.xml and /atom.xml from auth. This is consistent with "public feed endpoints" in the PR description. Confirm this is intentional if the instance stores sensitive Telegram content.

assembly.py:156pipelines_router registered twice

app.include_router(pipelines_router, prefix="/pipelines")  # line 146
...
app.include_router(pipelines_router, prefix="/pipelines")  # line 156

This is a pre-existing issue not introduced by this PR, but it means all pipeline routes are registered twice — FastAPI silently deduplicates for routing but it doubles the route table. Not a blocker.


Summary

Both critical fixes from round 1 are confirmed. No new critical or high-severity issues. The RSS URL encoding deferral is the only open item. Ready to merge pending that tracking.

@axisrow axisrow merged commit 2af529e into main Mar 27, 2026
2 checks passed
@axisrow axisrow deleted the feat/issues-batch-236-230-239-240-246-228 branch March 27, 2026 22:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants