
feat(OCR): Add LLM-based OCR as an alternative to PaddleOCR #243

Open
ModerRAS wants to merge 6 commits into master from issue-241-ocr-alternative

Conversation

@ModerRAS
Owner

@ModerRAS ModerRAS commented Mar 19, 2026

Summary of changes

Implements Issue #241 - add LLM-based OCR as an alternative to PaddleOCR.

New features

  1. IOCRService interface and OCREngine enum

    • Defines a unified OCR service interface
    • Supports two engines: PaddleOCR and LLM
  2. LLMOCRService implementation

    • Uses the existing GeneralLLMService.AnalyzeImageAsync for image recognition
    • Automatically converts images to JPEG and saves them to a temporary file
    • Complete error handling and logging
  3. OCR configuration state machine

    • The OCRConfState enum defines the configuration states
    • EditOCRConfRedisHelper persists state to Redis
    • EditOCRConfService handles the state-transition logic
    • EditOCRConfController and EditOCRConfView provide the Bot interaction interface
  4. AutoOCRController enhancements

    • Supports dynamic OCR engine switching
    • Automatically selects the PaddleOCR or LLM engine based on configuration
    • Backward compatible; defaults to PaddleOCR
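The engine-selection behavior described above (read the engine from stored configuration, fall back to PaddleOCR when unset) can be sketched as follows. This is an illustrative Python sketch, not the project's C# implementation; the `"OCR:Engine"` key mirrors the config key mentioned later in the review, while the class and service names here are made up.

```python
from enum import Enum

class OCREngine(Enum):
    PADDLE_OCR = "PaddleOCR"
    LLM = "LLM"

class EngineSelector:
    """Resolve an OCR service from a stored config value.

    `services` maps an OCREngine to its implementation; a missing or
    unrecognized config value falls back to PaddleOCR, matching the
    backward-compatibility note above.
    """
    def __init__(self, services, config):
        self._services = services
        self._config = config  # stands in for the persisted configuration items

    def resolve(self):
        raw = self._config.get("OCR:Engine", "")
        try:
            engine = OCREngine(raw)
        except ValueError:
            engine = OCREngine.PADDLE_OCR  # default when unset or invalid
        return self._services[engine]

services = {OCREngine.PADDLE_OCR: "paddle-ocr-service", OCREngine.LLM: "llm-ocr-service"}
print(EngineSelector(services, {}).resolve())                     # paddle-ocr-service
print(EngineSelector(services, {"OCR:Engine": "LLM"}).resolve())  # llm-ocr-service
```

Treating the default as the catch-all for both missing and invalid values is what keeps existing deployments unchanged after the upgrade.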

Usage

Send the following commands to the Bot in a private chat to configure OCR:

  • Enter the OCR configuration main menu
  • or - Switch to PaddleOCR
  • or - Switch to LLM OCR
  • or - Exit configuration

File changes

New files:

  • OCR service interface
  • Configuration state enum
  • Redis state storage
  • State machine logic
  • LLM OCR implementation
  • Bot controller
  • View component

Modified files:

  • Support engine switching
  • Implement the IOCRService interface

Summary by CodeRabbit

  • New Features

    • Option to run a local Telegram Bot API server at startup (configurable port)
    • Runtime-selectable OCR engines with an LLM-based OCR service and PaddleOCR support
    • Interactive OCR configuration: new management flow, controller, Redis-backed state helper, and a view to edit/persist OCR settings
  • Chores

    • Windows CI workflow updated to cache/build a local Telegram Bot API binary and include it in outputs

Copilot AI and others added 4 commits March 3, 2026 08:33
…e bundle support

- Add EnableLocalBotAPI, TelegramBotApiId, TelegramBotApiHash, LocalBotApiPort to Config/Env
- Auto-set BaseUrl and IsLocalAPI when EnableLocalBotAPI is true
- Launch telegram-bot-api.exe as a ChildProcessManager-managed process in GeneralBootstrap
- Add conditional Content entry for telegram-bot-api.exe in .csproj
- Build telegram-bot-api from source via cmake+vcpkg in GitHub Actions push workflow

Co-authored-by: ModerRAS <28183976+ModerRAS@users.noreply.github.com>
- Add the IOCRService interface and OCREngine enum
- Add the LLMOCRService implementation, using GeneralLLMService.AnalyzeImageAsync
- Add the OCRConfState enum and state machine
- Add EditOCRConfRedisHelper for Redis state storage
- Add EditOCRConfService implementing the OCR configuration state-machine logic
- Add EditOCRConfController and EditOCRConfView for Bot private-chat configuration
- Modify AutoOCRController to support dynamic OCR engine switching
- Modify PaddleOCRService to implement the IOCRService interface

Configuration is available by sending "OCR设置" to the Bot in a private chat
@coderabbitai

coderabbitai Bot commented Mar 19, 2026

📝 Walkthrough

Walkthrough

Adds optional local Telegram Bot API build and startup to CI/startup; introduces an IOCRService abstraction plus an LLM-based OCR implementation; implements a Redis-backed, stateful per-chat OCR configuration flow (service, controller, view) and runtime OCR engine selection in the AutoOCRController.

Changes

Cohort / File(s) Summary
CI / Native Bot API
.github/workflows/push.yml, TelegramSearchBot/TelegramSearchBot.csproj
Windows workflow: cache and conditionally build telegram-bot-api and copy telegram-bot-api.exe into the app; project now includes the exe as content to always copy to output when present.
Env & Startup
TelegramSearchBot.Common/Env.cs, TelegramSearchBot/AppBootstrap/GeneralBootstrap.cs
New config fields for local Bot API (enable, api id/hash, port). Startup may launch telegram-bot-api.exe, register the process, and poll TCP port until ready.
OCR abstraction & services
TelegramSearchBot/Interface/AI/OCR/IOCRService.cs, TelegramSearchBot/Service/AI/OCR/PaddleOCRService.cs, TelegramSearchBot/Service/AI/OCR/LLMOCRService.cs
Added IOCRService + OCREngine enum; PaddleOCR implements interface; new LLMOCRService performs image -> JPEG -> LLM analysis using IGeneralLLMService with temp-file handling.
Auto OCR selection
TelegramSearchBot/Controller/AI/OCR/AutoOCRController.cs
Controller now accepts multiple IOCRService implementations and IAppConfigurationService; selects engine at runtime via stored config, falls back to PaddleOCR, and invokes the chosen service.
OCR config state & helpers
TelegramSearchBot/Model/AI/OCRConfState.cs, TelegramSearchBot/Helper/EditOCRConfRedisHelper.cs
Added OCRConfState enum + description extension; added Redis helper for per-chat state/data/channel get/set/delete.
OCR config flow (service/controller/view)
TelegramSearchBot/Service/Manage/EditOCRConfService.cs, TelegramSearchBot/Controller/Manage/EditOCRConfController.cs, TelegramSearchBot/View/EditOCRConfView.cs
New state-driven EditOCRConfService handling menu, engine/channel/model selection, persistence via EF and Redis; admin-only controller to route commands; view to render responses with fluent builders.
Misc formatting / small edits
many files (migrations, tests, LLM services, helpers, extensions, etc.)
Predominantly whitespace/formatting tweaks across migrations, tests, and LLM-related code with no behavior changes.
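The startup flow summarized above launches telegram-bot-api.exe and then polls a TCP port until the server is ready. A readiness probe of that shape can be sketched in Python; this is a generic illustration under stated assumptions, not the project's `WaitForLocalBotApiReady` method.

```python
import socket
import time

def wait_for_port(port: int, host: str = "127.0.0.1",
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connect succeeds, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True  # something is accepting connections on the port
        except OSError:
            time.sleep(interval)  # not up yet; retry until the deadline
    return False

# Demo: listen on an ephemeral port, then probe it.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
print(wait_for_port(port, timeout=5))  # True
server.close()
```

Polling with a hard deadline (rather than waiting forever) is what keeps startup from hanging when the child process dies before binding the port.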

Sequence Diagram

sequenceDiagram
    actor Admin
    participant Telegram
    participant Controller as EditOCRConfController
    participant Service as EditOCRConfService
    participant Redis as EditOCRConfRedisHelper
    participant DB as DataDbContext
    participant View as EditOCRConfView

    Admin->>Telegram: Send OCR config command
    Telegram->>Controller: Deliver update
    Controller->>Controller: Validate admin & extract command
    Controller->>Service: ExecuteAsync(command, chatId)
    Service->>Redis: Initialize / GetStateAsync()
    Redis-->>Service: Return current state

    alt MainMenu -> Engine selection
        Service->>Redis: SetStateAsync(SelectingEngine)
        Service->>DB: Upsert OCR:Engine (if applicable)
    else SelectingLLMChannel
        Service->>Redis: SetStateAsync(SelectingLLMChannel)
        Service->>DB: Query ChannelsWithModel
    else SelectingLLMModel
        Service->>Redis: GetChannelIdAsync()
        Service->>DB: Upsert LLM model/config items
        Service->>Redis: DeleteKeysAsync()
    end

    Service-->>Controller: (status, message)
    Controller->>View: WithChatId().WithReplyTo().WithMessage()
    Controller->>View: Render()
    View->>Telegram: Send response
    Telegram->>Admin: Show result

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Poem

🐰 I hopped in code and found a trail,

Engines chosen, LLMs set sail.
Redis crumbs across each chat,
Local API boots the hat.
Hoppy configs, bright and neat — a rabbit's cheer for features sweet!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name         | Status     | Explanation                                                            | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.27%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name        | Status    | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled.
Title check       | ✅ Passed | The title accurately summarizes the main change: adding LLM-based OCR as an alternative to PaddleOCR with backward compatibility.



@github-actions
Contributor

github-actions Bot commented Mar 19, 2026

🔍 PR Check Report

📋 Check Overview

🧪 Test Results

Platform | Status    | Details
Ubuntu   | 🔴 Failed | Test results unavailable
Windows  | 🔴 Failed | Test results unavailable

📊 Code Quality

  • ✅ Code formatting check
  • ✅ Security vulnerability scan
  • ✅ Dependency analysis
  • ✅ Code coverage collection

📁 Test Artifacts

  • Test result files uploaded as artifacts
  • Code coverage uploaded to Codecov

🔗 Related Links


This report was generated automatically by GitHub Actions


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 12

🧹 Nitpick comments (4)
TelegramSearchBot/AppBootstrap/GeneralBootstrap.cs (1)

97-131: Windows-only implementation - consider documenting or guarding.

This implementation is Windows-specific:

  1. telegram-bot-api.exe is a Windows binary
  2. childProcessManager uses Windows job objects (see AppBootstrap.cs:19-143)

Linux users with EnableLocalBotAPI = true will see the warning at line 129 but might be confused. Consider adding platform documentation or a more explicit message.

🔧 Suggested improvement for cross-platform clarity
             if (Env.EnableLocalBotAPI) {
+#if !WINDOWS
+                Log.Warning("EnableLocalBotAPI is currently only supported on Windows. Feature disabled.");
+#else
                 string botApiExePath = Path.Combine(AppContext.BaseDirectory, "telegram-bot-api.exe");
                 // ... existing code ...
+#endif
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/AppBootstrap/GeneralBootstrap.cs` around lines 97 - 131,
The current local telegram-bot-api startup block (guarded by
Env.EnableLocalBotAPI) assumes a Windows executable and Windows job-object
management (see botApiExePath, childProcessManager and WaitForLocalBotApiReady);
update the logic to first check the OS (e.g.,
RuntimeInformation.IsOSPlatform(OSPlatform.Windows)) and only attempt to locate
and start "telegram-bot-api.exe" and register it with childProcessManager on
Windows, otherwise log a clear platform-specific message that EnableLocalBotAPI
is not supported on the current OS and suggest the recommended Linux alternative
or documentation link; ensure the log includes which platform was detected and
keep the existing Windows startup flow (ArgumentList, AddProcess,
WaitForLocalBotApiReady) unchanged when on Windows.
.github/workflows/push.yml (1)

36-53: Build step lacks explicit error handling for vcpkg/cmake failures.

If vcpkg install or cmake fails, the step will fail the entire workflow without clear indication of which part failed. Consider adding explicit error checks or using $ErrorActionPreference = 'Stop' at the start.

Also, the recursive clone of tdlib/telegram-bot-api can be slow and bandwidth-intensive. The current caching strategy helps, but initial builds or cache misses will be slow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/push.yml around lines 36 - 53, The "Build telegram-bot-api
from source" step lacks explicit failure handling and does a full recursive
clone; make the step fail-fast and emit clear errors by setting PowerShell to
stop on errors (e.g., set $ErrorActionPreference = 'Stop' at the top of the run
block) and wrap the key commands (git clone, C:\vcpkg\vcpkg.exe install, cmake
configure/build) in try/catch that logs which command failed (include the
command name like git clone, vcpkg install, cmake) and rethrows/exit with a
non‑zero code; also replace the recursive git clone with a shallow non-recursive
clone (e.g., --depth 1 and no --recursive) or explicitly fetch only needed
submodules to reduce bandwidth on cache misses.
TelegramSearchBot/TelegramSearchBot.csproj (1)

99-103: Cross-platform consideration: Windows-only binary packaging.

The project targets both win-x64 and linux-x64 (line 5), but only telegram-bot-api.exe is packaged. Linux deployments would need telegram-bot-api (no .exe extension) built for Linux.

The Condition="Exists('telegram-bot-api.exe')" gracefully handles the missing file case, so Linux builds won't fail. However, users enabling EnableLocalBotAPI on Linux won't have the binary available.

Consider documenting this Windows-only limitation or adding Linux binary support in a future iteration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/TelegramSearchBot.csproj` around lines 99 - 103, The
project currently packages only the Windows binary "telegram-bot-api.exe"
(ItemGroup Content Include="telegram-bot-api.exe") while the runtime identifiers
include both "win-x64" and "linux-x64"; update the packaging to account for
Linux by either adding a conditional Content entry for the Linux binary name
"telegram-bot-api" (and/or a platform-specific MSBuild Condition based on
RuntimeIdentifier) or document that EnableLocalBotAPI is Windows-only so Linux
users know they must provide a Linux build; locate the ItemGroup that references
"telegram-bot-api.exe" and adjust it to include the Linux artifact or add clear
documentation referencing EnableLocalBotAPI and the runtime identifiers.
TelegramSearchBot/Service/Manage/EditOCRConfService.cs (1)

215-234: Persist LLM channel/model atomically to avoid partial configuration

SetLLMConfigAsync performs two separate writes, each with its own SaveChangesAsync. If the second write fails, config can become partially updated.

💡 Suggested fix
 private async Task SetLLMConfigAsync(int channelId, string modelName) {
-    await SetConfigAsync(OCRLLMChannelIdKey, channelId.ToString());
-    await SetConfigAsync(OCRLLMModelNameKey, modelName);
+    await UpsertConfigEntityAsync(OCRLLMChannelIdKey, channelId.ToString());
+    await UpsertConfigEntityAsync(OCRLLMModelNameKey, modelName);
+    await DataContext.SaveChangesAsync();
 }
 
-private async Task SetConfigAsync(string key, string value) {
+private async Task UpsertConfigEntityAsync(string key, string value) {
     var config = await DataContext.AppConfigurationItems
         .FirstOrDefaultAsync(x => x.Key == key);
     
     if (config == null) {
         await DataContext.AppConfigurationItems.AddAsync(new AppConfigurationItem {
             Key = key,
             Value = value
         });
     } else {
         config.Value = value;
     }
-    
-    await DataContext.SaveChangesAsync();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Service/Manage/EditOCRConfService.cs` around lines 215 -
234, SetLLMConfigAsync currently calls SetConfigAsync twice which invokes
DataContext.SaveChangesAsync for each key, risking partial updates; change
SetLLMConfigAsync to set or upsert both OCRLLMChannelIdKey and
OCRLLMModelNameKey on the DataContext.AppConfigurationItems in-memory first and
then call a single SaveChangesAsync (or wrap both SetConfigAsync operations in a
database transaction using DataContext.Database.BeginTransactionAsync and commit
once) so both values are persisted atomically; update or add
AppConfigurationItem entries for OCRLLMChannelIdKey and OCRLLMModelNameKey then
perform one DataContext.SaveChangesAsync (or commit the transaction) to avoid
partial configuration.
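The atomicity point above generalizes beyond EF Core: stage both writes first, then commit once, so either both keys land or neither does. A minimal sqlite sketch of that pattern follows; the key strings mirror the constant names in the review (`OCRLLMChannelIdKey`, `OCRLLMModelNameKey`), but their exact values and the table shape are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE app_config (key TEXT PRIMARY KEY, value TEXT)")

def set_llm_config(conn, channel_id: int, model_name: str) -> None:
    """Upsert both OCR LLM keys in a single transaction: both land or neither."""
    with conn:  # commits on success, rolls back on any exception
        for key, value in (("OCR:LLMChannelId", str(channel_id)),
                           ("OCR:LLMModelName", model_name)):
            conn.execute(
                "INSERT INTO app_config(key, value) VALUES (?, ?) "
                "ON CONFLICT(key) DO UPDATE SET value = excluded.value",
                (key, value),
            )

set_llm_config(conn, 42, "example-model")
print(dict(conn.execute("SELECT key, value FROM app_config")))
# {'OCR:LLMChannelId': '42', 'OCR:LLMModelName': 'example-model'}
```

The `with conn:` block is the single-commit boundary; in the C# suggestion that role is played by one `SaveChangesAsync` (or an explicit EF transaction) after both entities are staged.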
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/push.yml:
- Around line 54-57: The copy step "Copy telegram-bot-api binary to project"
runs unconditionally but will fail if neither the cache hit nor the build
produced telegram-bot-api-bin\telegram-bot-api.exe; update that job step to only
run when the binary is available by adding an if condition checking the cache
hit or build success (for example: if: steps.cache.outputs.cache-hit == 'true'
|| steps.build.outcome == 'success'), or alternatively check the file existence
before copying (using a conditional script or an if: expression) so the step is
skipped when the binary doesn't exist.

In `@TelegramSearchBot.Common/Env.cs`:
- Around line 86-89: Add documentation entries in the Docs/ folder for the newly
persisted configuration fields added to the Config/Env class: EnableLocalBotAPI,
TelegramBotApiId, TelegramBotApiHash, and LocalBotApiPort. Update the
user-facing configuration reference (e.g., the config schema docs and any
example config files) to describe each field, its type (bool, string, string,
int), default values (false, null/empty, null/empty, 8081), expected usage, and
any security notes for TelegramBotApiId/TelegramBotApiHash; also add a short
migration note so users know these are now persistent. Ensure the docs reference
the same symbols (EnableLocalBotAPI, TelegramBotApiId, TelegramBotApiHash,
LocalBotApiPort) so they match the code.

In `@TelegramSearchBot/AppBootstrap/GeneralBootstrap.cs`:
- Around line 108-126: The ProcessStartInfo currently sets
RedirectStandardOutput and RedirectStandardError to true but never consumes
those streams, which can cause the child process to hang; fix by either
disabling redirection (set RedirectStandardOutput = false and
RedirectStandardError = false on the ProcessStartInfo used to start the bot API)
or, if you need logs, consume the streams asynchronously after Process.Start by
attaching asynchronous readers to botApiProcess.StandardOutput and
botApiProcess.StandardError (or use BeginOutputReadLine/BeginErrorReadLine or
Task-based reads) and forward them to Log before calling
childProcessManager.AddProcess and awaiting WaitForLocalBotApiReady so buffers
never fill.

In `@TelegramSearchBot/Controller/AI/OCR/AutoOCRController.cs`:
- Around line 74-79: The controller currently awaits LLM OCR inline
(GetOCREngineAsync -> selecting _ocrServices where Engine == OCREngine.LLM and
calling LLMOCRService.ExecuteAsync), causing long-running work inside the
IOnUpdate path; change the flow so the controller does not run LLM OCR
synchronously: detect when GetOCREngineAsync returns OCREngine.LLM and instead
enqueue or dispatch the image to the existing subprocess/scheduler service used
for PaddleOCR (or the AppBootstrap-launched OCR subprocess) rather than calling
LLMOCRService.ExecuteAsync directly; keep synchronous/fast handling for update
handlers by selecting the local OCR service (e.g., PaddleOCR) for immediate
ExecuteAsync and push LLM requests to the background scheduler/service (use the
project’s scheduler enqueue method or subprocess launcher used elsewhere) so
IOnUpdate never performs image conversion/temp-file I/O or LLM network calls
inline.

In `@TelegramSearchBot/Controller/Manage/EditOCRConfController.cs`:
- Around line 18-20: EditOCRConfController currently holds a mutable
EditOCRConfView instance that gets reused by ControllerExecutor, causing
cross-request state corruption; remove the view field from EditOCRConfController
and either (A) change EditOCRConfView to be stateless by replacing instance
fields/_chatId/_replyToMessageId/_messageText with a Render(chatId,
replyToMessageId, messageText) method, or (B) instantiate a new EditOCRConfView
inside the controller execution path each request before calling its render/send
methods; update all places in EditOCRConfController that read or mutate
EditOCRConfView (e.g., where you set chatId/replyToMessageId/messageText) to use
the stateless Render signature or the fresh instance, ensuring the controller
remains stateless as required by ControllerExecutor.

In `@TelegramSearchBot/Helper/EditOCRConfRedisHelper.cs`:
- Around line 30-31: The Redis keys for the transient edit flow are never
expiring; update SetStateAsync (and the other write methods referenced around
lines 38-39 and 50-51) to set a short TTL and refresh it on every write:
introduce a single TTL constant (e.g., TimeSpan editFlowTtl =
TimeSpan.FromMinutes(10)) in the class and use GetDatabase().StringSetAsync(key,
value, editFlowTtl) (or call KeyExpireAsync immediately after StringSetAsync)
for StateKey and the other keys written by SetSourceAsync / SetDocIdAsync so
each write resets the expiry and stale keys are automatically removed.
- Around line 42-51: Change the channel ID methods and storage to use long
instead of int: update GetChannelIdAsync to parse the stored string with
long.TryParse and return a long? (nullable long) and update SetChannelIdAsync to
accept a long channelId and store channelId.ToString(); ensure you reference
GetChannelIdAsync, SetChannelIdAsync and ChannelKey (and align with the existing
_chatId constructor parameter) so the Redis read/write uses 64-bit values and
preserves negative -100... Telegram IDs.
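The TTL advice above — every write (re)sets a short expiry so abandoned edit flows clean themselves up — can be modeled with a tiny in-memory store. This is a behavioral sketch only; the real fix would pass an expiry to Redis `StringSetAsync`, and the key name below is invented.

```python
import time

class TransientStateStore:
    """Key/value store where every write refreshes a short TTL, mimicking the
    suggested Redis behavior for the edit-flow keys."""
    def __init__(self, ttl_seconds: float = 600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str) -> None:
        self._data[key] = (value, self._clock() + self._ttl)  # refresh TTL on write

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop stale keys
            return None
        return value

# Demo with a fake clock so expiry is deterministic.
now = [0.0]
store = TransientStateStore(ttl_seconds=600, clock=lambda: now[0])
store.set("ocr:conf:state:12345", "SelectingEngine")
now[0] = 599
print(store.get("ocr:conf:state:12345"))  # SelectingEngine
now[0] = 601
print(store.get("ocr:conf:state:12345"))  # None
```

Refreshing on write (not read) matches the review's suggestion: an active configuration session keeps its state alive, while an abandoned one expires ten minutes after the last command.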

In `@TelegramSearchBot/Service/AI/OCR/LLMOCRService.cs`:
- Around line 37-38: The call to _generalLLMService.AnalyzeImageAsync(tempPath,
0) discards the originating chat context; update IOCRService.ExecuteAsync to
accept a chatId parameter (thread e.Message.Chat.Id from the caller), propagate
that chatId into LLMOCRService (or wherever ExecuteAsync is implemented), and
replace the hardcoded 0 with the passed chatId when calling
_generalLLMService.AnalyzeImageAsync(tempPath, chatId) so chat-scoped
routing/configuration in GeneralLLMService receives the correct ChatId.
- Around line 30-35: The current use of Path.GetTempFileName() + ".jpg" creates
an unused .tmp file each request; change the tempPath generation in
LLMOCRService (tempPath variable) to create a unique JPG path without
pre-creating a file (e.g., Path.Combine(Path.GetTempPath(),
Guid.NewGuid().ToString() + ".jpg") or Path.GetRandomFileName() + ".jpg"), write
the image to that path in your existing FileStream block, and ensure you delete
the tempPath in a finally/cleanup block after processing to avoid leaking temp
files.
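The temp-file advice maps directly onto stdlib primitives in most languages: build a unique path without pre-creating a file, and delete it in a finally block. A Python sketch, with the LLM call stubbed out:

```python
import os
import tempfile
import uuid

def ocr_with_temp_jpeg(image_bytes: bytes) -> str:
    """Write the image to a unique .jpg path, hand it to a (stubbed) analyzer,
    and always clean the file up afterwards."""
    # Unique path, no extra pre-created .tmp file (unlike GetTempFileName()).
    temp_path = os.path.join(tempfile.gettempdir(), f"{uuid.uuid4()}.jpg")
    try:
        with open(temp_path, "wb") as f:
            f.write(image_bytes)
        # Stand-in for the actual LLM call (AnalyzeImageAsync in this PR).
        return f"analyzed {os.path.getsize(temp_path)} bytes"
    finally:
        if os.path.exists(temp_path):
            os.remove(temp_path)  # never leak the temp file

print(ocr_with_temp_jpeg(b"0123456789"))  # analyzed 10 bytes
```

The finally-block deletion is the half of the review comment that matters most operationally: without it every OCR request leaves a file behind in the temp directory.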

In `@TelegramSearchBot/Service/Manage/EditOCRConfService.cs`:
- Around line 60-63: The handler currently treats both "退出" and "返回" the same
and clears state via redis.DeleteKeysAsync(), which conflicts with the UI prompt
that says 输入`返回`返回主菜单; update the logic so only the "退出" command triggers
redis.DeleteKeysAsync() and returns the exit message, while "返回" should return
(true, "返回主菜单") or equivalent without clearing state; adjust any related
prompt/response text (the view prompt that mentions 输入`返回`返回主菜单) to remain
consistent with this new behavior.
- Around line 77-91: HandleMainMenuAsync currently ignores the incoming command
and always re-renders the menu; update it to parse the command and set the next
interaction state on the EditOCRConfRedisHelper so the flow can transition
(e.g., set redis.State to SelectingEngine when command == "1", to ViewingConfig
when command == "2", and handle "3" as return/exit). Keep returning the menu
text when command is empty or invalid, but when a valid option is chosen return
the appropriate state change (and a brief confirmation message) so downstream
handlers (SelectingEngine/ViewingConfig) are reachable; reference the
HandleMainMenuAsync method, the redis parameter, and the
SelectingEngine/ViewingConfig state names when making the change.
- Around line 70-75: The code currently returns (false, string.Empty) when
currentState is not found in _stateHandlers, which leaves the conversation
dead-ended; instead reset the conversation state to MainMenu and return a
friendly message. In the else branch where
_stateHandlers.TryGetValue(currentState, out var handler) is false, write the
new state ("MainMenu") back to Redis (or call the existing state setter used
elsewhere), ensure any in-memory state is updated, and return (true, "<brief
prompt for main menu>") or similar so the user is guided back to MainMenu;
reference _stateHandlers, currentState, handler and MainMenu when making the
change.
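The last three comments describe the same state-machine hygiene: the main menu must actually transition on a valid option, and an unknown stored state must reset to the main menu instead of dead-ending. A minimal sketch of that dispatch shape follows; the state and message names echo the review but the logic is illustrative, not the actual EditOCRConfService.

```python
from enum import Enum, auto

class OCRConfState(Enum):
    MAIN_MENU = auto()
    SELECTING_ENGINE = auto()

class ConfFlow:
    """Tiny state machine: the main menu transitions on a valid option, and an
    unrecognized state resets to MainMenu rather than returning nothing."""
    def __init__(self):
        self.state = OCRConfState.MAIN_MENU
        self._handlers = {
            OCRConfState.MAIN_MENU: self._main_menu,
            OCRConfState.SELECTING_ENGINE: self._selecting_engine,
        }

    def execute(self, command: str):
        handler = self._handlers.get(self.state)
        if handler is None:  # e.g. a stale value left in Redis by an old version
            self.state = OCRConfState.MAIN_MENU
            return True, "State reset; back at the main menu."
        return handler(command)

    def _main_menu(self, command: str):
        if command == "1":
            self.state = OCRConfState.SELECTING_ENGINE  # actually transition
            return True, "Choose an engine: PaddleOCR or LLM."
        return True, "Main menu: send 1 to select an OCR engine."

    def _selecting_engine(self, command: str):
        if command in ("PaddleOCR", "LLM"):
            self.state = OCRConfState.MAIN_MENU
            return True, f"Engine set to {command}."
        return True, "Unknown engine; reply PaddleOCR or LLM."

flow = ConfFlow()
print(flow.execute("1")[1])    # Choose an engine: PaddleOCR or LLM.
print(flow.execute("LLM")[1])  # Engine set to LLM.
```

Note that every branch returns a user-visible message — the failure mode the review flags is precisely a branch that returns an empty response and leaves the conversation stuck.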


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8202ab74-6587-4b38-8fed-2af0e43f4739

📥 Commits

Reviewing files that changed from the base of the PR and between d68516d and b8431f8.

📒 Files selected for processing (13)
  • .github/workflows/push.yml
  • TelegramSearchBot.Common/Env.cs
  • TelegramSearchBot/AppBootstrap/GeneralBootstrap.cs
  • TelegramSearchBot/Controller/AI/OCR/AutoOCRController.cs
  • TelegramSearchBot/Controller/Manage/EditOCRConfController.cs
  • TelegramSearchBot/Helper/EditOCRConfRedisHelper.cs
  • TelegramSearchBot/Interface/AI/OCR/IOCRService.cs
  • TelegramSearchBot/Model/AI/OCRConfState.cs
  • TelegramSearchBot/Service/AI/OCR/LLMOCRService.cs
  • TelegramSearchBot/Service/AI/OCR/PaddleOCRService.cs
  • TelegramSearchBot/Service/Manage/EditOCRConfService.cs
  • TelegramSearchBot/TelegramSearchBot.csproj
  • TelegramSearchBot/View/EditOCRConfView.cs

Comment on lines +54 to +57
      - name: Copy telegram-bot-api binary to project
        shell: pwsh
        run: |
          Copy-Item "telegram-bot-api-bin\telegram-bot-api.exe" "TelegramSearchBot\telegram-bot-api.exe"

⚠️ Potential issue | 🟡 Minor

Copy step may fail if both cache miss and build fail.

The copy step runs unconditionally, but telegram-bot-api-bin\telegram-bot-api.exe only exists if the cache was hit OR the build succeeded. If the build step fails (e.g., network issue cloning, CMake error), this step will fail the workflow.

Consider adding an if condition or combining with the build step.

🔧 Proposed fix
     - name: Copy telegram-bot-api binary to project
+      if: success()
       shell: pwsh
       run: |
+        if (Test-Path "telegram-bot-api-bin\telegram-bot-api.exe") {
           Copy-Item "telegram-bot-api-bin\telegram-bot-api.exe" "TelegramSearchBot\telegram-bot-api.exe"
+        } else {
+          Write-Warning "telegram-bot-api.exe not found, skipping copy"
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

      - name: Copy telegram-bot-api binary to project
        shell: pwsh
        run: |
          Copy-Item "telegram-bot-api-bin\telegram-bot-api.exe" "TelegramSearchBot\telegram-bot-api.exe"

After:

      - name: Copy telegram-bot-api binary to project
        if: success()
        shell: pwsh
        run: |
          if (Test-Path "telegram-bot-api-bin\telegram-bot-api.exe") {
            Copy-Item "telegram-bot-api-bin\telegram-bot-api.exe" "TelegramSearchBot\telegram-bot-api.exe"
          } else {
            Write-Warning "telegram-bot-api.exe not found, skipping copy"
          }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/push.yml around lines 54 - 57, The copy step "Copy
telegram-bot-api binary to project" runs unconditionally but will fail if
neither the cache hit nor the build produced
telegram-bot-api-bin\telegram-bot-api.exe; update that job step to only run when
the binary is available by adding an if condition checking the cache hit or
build success (for example: if: steps.cache.outputs.cache-hit == 'true' ||
steps.build.outcome == 'success'), or alternatively check the file existence
before copying (using a conditional script or an if: expression) so the step is
skipped when the binary doesn't exist.

Comment on lines +86 to +89
public bool EnableLocalBotAPI { get; set; } = false;
public string TelegramBotApiId { get; set; }
public string TelegramBotApiHash { get; set; }
public int LocalBotApiPort { get; set; } = 8081;

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if documentation exists and mentions these new config options
fd -t f -e md . Docs/ 2>/dev/null | head -5
rg -l "EnableLocalBotAPI|TelegramBotApiId|TelegramBotApiHash|LocalBotApiPort" Docs/ 2>/dev/null || echo "No documentation found for new config fields"

Repository: ModerRAS/TelegramSearchBot

Length of output: 268


Update documentation in Docs/ for new configuration fields.

New persistent configuration fields (EnableLocalBotAPI, TelegramBotApiId, TelegramBotApiHash, LocalBotApiPort) have been added to the Config class but are not documented. Per coding guidelines, user documentation must be updated when modifying persistent configuration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot.Common/Env.cs` around lines 86 - 89, Add documentation
entries in the Docs/ folder for the newly persisted configuration fields added
to the Config/Env class: EnableLocalBotAPI, TelegramBotApiId,
TelegramBotApiHash, and LocalBotApiPort. Update the user-facing configuration
reference (e.g., the config schema docs and any example config files) to
describe each field, its type (bool, string, string, int), default values
(false, null/empty, null/empty, 8081), expected usage, and any security notes
for TelegramBotApiId/TelegramBotApiHash; also add a short migration note so
users know these are now persistent. Ensure the docs reference the same symbols
(EnableLocalBotAPI, TelegramBotApiId, TelegramBotApiHash, LocalBotApiPort) so
they match the code.

Comment on lines +108 to +126
var startInfo = new ProcessStartInfo {
FileName = botApiExePath,
UseShellExecute = false,
RedirectStandardOutput = true,
RedirectStandardError = true
};
startInfo.ArgumentList.Add("--local");
startInfo.ArgumentList.Add($"--api-id={Env.TelegramBotApiId}");
startInfo.ArgumentList.Add($"--api-hash={Env.TelegramBotApiHash}");
startInfo.ArgumentList.Add($"--dir={botApiDataDir}");
startInfo.ArgumentList.Add($"--http-port={Env.LocalBotApiPort}");
var botApiProcess = Process.Start(startInfo);
if (botApiProcess == null) {
Log.Warning("telegram-bot-api 进程启动失败");
} else {
childProcessManager.AddProcess(botApiProcess);
Log.Information("telegram-bot-api 已启动,等待端口 {Port} 就绪...", Env.LocalBotApiPort);
await WaitForLocalBotApiReady(Env.LocalBotApiPort);
}

⚠️ Potential issue | 🟠 Major

Redirected stdout/stderr are never consumed - potential process hang.

When RedirectStandardOutput and RedirectStandardError are set to true, the output streams must be read. If the internal buffer fills up (typically 4KB), the child process will block on writes, potentially causing a hang.

Either disable redirection (if you don't need the output) or consume the streams asynchronously.

🔧 Option 1: Disable redirection (simplest)
                         var startInfo = new ProcessStartInfo {
                             FileName = botApiExePath,
                             UseShellExecute = false,
-                            RedirectStandardOutput = true,
-                            RedirectStandardError = true
+                            RedirectStandardOutput = false,
+                            RedirectStandardError = false
                         };
🔧 Option 2: Consume streams asynchronously
                         var botApiProcess = Process.Start(startInfo);
                         if (botApiProcess == null) {
                             Log.Warning("telegram-bot-api 进程启动失败");
                         } else {
+                            // Consume stdout/stderr to prevent buffer blocking
+                            botApiProcess.OutputDataReceived += (s, e) => { if (e.Data != null) Log.Debug("[bot-api stdout] {Line}", e.Data); };
+                            botApiProcess.ErrorDataReceived += (s, e) => { if (e.Data != null) Log.Debug("[bot-api stderr] {Line}", e.Data); };
+                            botApiProcess.BeginOutputReadLine();
+                            botApiProcess.BeginErrorReadLine();
                             childProcessManager.AddProcess(botApiProcess);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/AppBootstrap/GeneralBootstrap.cs` around lines 108 - 126,
The ProcessStartInfo currently sets RedirectStandardOutput and
RedirectStandardError to true but never consumes those streams, which can cause
the child process to hang; fix by either disabling redirection (set
RedirectStandardOutput = false and RedirectStandardError = false on the
ProcessStartInfo used to start the bot API) or, if you need logs, consume the
streams asynchronously after Process.Start by attaching asynchronous readers to
botApiProcess.StandardOutput and botApiProcess.StandardError (or use
BeginOutputReadLine/BeginErrorReadLine or Task-based reads) and forward them to
Log before calling childProcessManager.AddProcess and awaiting
WaitForLocalBotApiReady so buffers never fill.

Comment on lines +74 to +79
var engine = await GetOCREngineAsync();
var ocrService = _ocrServices.FirstOrDefault(s => s.Engine == engine)
?? _ocrServices.First(s => s.Engine == OCREngine.PaddleOCR);

logger.LogInformation($"使用OCR引擎: {engine}");
OcrStr = await ocrService.ExecuteAsync(new MemoryStream(PhotoStream));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep LLM OCR off the update-handler path.

With OCREngine.LLM, this controller now awaits LLMOCRService.ExecuteAsync(...), which does image conversion, temp-file I/O, and the LLM call inline. That makes ExecuteAsync a long-running handler and can stall the controller pipeline for other updates. Dispatch the LLM OCR through the same subprocess/scheduler boundary as PaddleOCR instead of doing it inside IOnUpdate. As per coding guidelines, "OCR, ASR, and QR code processing must be delegated to independent subprocess services launched via AppBootstrap" and "Long-running operations like OCR and downloads must be placed in Service/Scheduler or separate processes, not in IOnUpdate handlers".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Controller/AI/OCR/AutoOCRController.cs` around lines 74 -
79, The controller currently awaits LLM OCR inline (GetOCREngineAsync ->
selecting _ocrServices where Engine == OCREngine.LLM and calling
LLMOCRService.ExecuteAsync), causing long-running work inside the IOnUpdate
path; change the flow so the controller does not run LLM OCR synchronously:
detect when GetOCREngineAsync returns OCREngine.LLM and instead enqueue or
dispatch the image to the existing subprocess/scheduler service used for
PaddleOCR (or the AppBootstrap-launched OCR subprocess) rather than calling
LLMOCRService.ExecuteAsync directly; keep synchronous/fast handling for update
handlers by selecting the local OCR service (e.g., PaddleOCR) for immediate
ExecuteAsync and push LLM requests to the background scheduler/service (use the
project’s scheduler enqueue method or subprocess launcher used elsewhere) so
IOnUpdate never performs image conversion/temp-file I/O or LLM network calls
inline.
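
A minimal sketch of the background hand-off this comment asks for, assuming a channel-based queue (the OcrJob and OcrJobQueue names are illustrative, not the project's actual API):

```csharp
// Hypothetical sketch: the update handler only enqueues and returns,
// while a hosted background service owns the slow LLM call.
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;

public record OcrJob(long ChatId, long MessageId, byte[] ImageBytes);

public class OcrJobQueue {
    private readonly Channel<OcrJob> _channel = Channel.CreateUnbounded<OcrJob>();

    // Fast path, safe to call from IOnUpdate: never blocks.
    public bool TryEnqueue(OcrJob job) => _channel.Writer.TryWrite(job);

    // Consumed by the background worker that performs the LLM OCR.
    public IAsyncEnumerable<OcrJob> DequeueAllAsync(CancellationToken ct) =>
        _channel.Reader.ReadAllAsync(ct);
}
```

The controller would then branch on the engine: enqueue for OCREngine.LLM and keep the existing inline call only for PaddleOCR.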

Comment on lines +18 to +20
protected readonly AdminService AdminService;
protected readonly EditOCRConfService EditOCRConfService;
protected readonly EditOCRConfView EditOCRConfView;

⚠️ Potential issue | 🟠 Major

Don't keep a mutable view instance on the controller.

TelegramSearchBot/Executor/ControllerExecutor.cs:19-37 reuses controller instances, but this controller stores EditOCRConfView as a field and mutates its _chatId/_replyToMessageId/_messageText per request. Two overlapping admin updates can overwrite each other and send the response to the wrong chat/message. Make the view stateless (Render(chatId, replyTo, message)) or create a fresh instance per execution. As per coding guidelines, "Controllers must be stateless; delegate all persistent operations to Service layer classes".

Also applies to: 22-30, 49-55

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Controller/Manage/EditOCRConfController.cs` around lines 18
- 20, EditOCRConfController currently holds a mutable EditOCRConfView instance
that gets reused by ControllerExecutor, causing cross-request state corruption;
remove the view field from EditOCRConfController and either (A) change
EditOCRConfView to be stateless by replacing instance
fields/_chatId/_replyToMessageId/_messageText with a Render(chatId,
replyToMessageId, messageText) method, or (B) instantiate a new EditOCRConfView
inside the controller execution path each request before calling its render/send
methods; update all places in EditOCRConfController that read or mutate
EditOCRConfView (e.g., where you set chatId/replyToMessageId/messageText) to use
the stateless Render signature or the fresh instance, ensuring the controller
remains stateless as required by ControllerExecutor.
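
As a rough illustration of option (A), a stateless view would take its per-request data as parameters instead of fields (the signature and message format below are assumptions, not the real EditOCRConfView API):

```csharp
// Hypothetical stateless view: no _chatId/_replyToMessageId/_messageText
// fields, so a controller instance reused across updates cannot leak one
// request's state into another.
public class EditOCRConfView {
    public string Render(long chatId, int replyToMessageId, string messageText) {
        // Pure function of its inputs; safe under concurrent calls.
        return $"[chat {chatId}, reply to {replyToMessageId}]\n{messageText}";
    }
}
```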

Comment on lines +30 to +35
var tempPath = Path.GetTempFileName() + ".jpg";

try {
using (var fs = new FileStream(tempPath, FileMode.Create, FileAccess.Write)) {
tg_img_data.SaveTo(fs);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In .NET, does Path.GetTempFileName() create a file on disk, and what is the recommended way to generate a temporary filename with a custom .jpg extension without leaving the original temp file behind?

💡 Result:

Yes, Path.GetTempFileName() creates a uniquely named, zero-byte temporary file on disk with a .tmp extension in the user's temporary folder (Path.GetTempPath()). The recommended way to generate a temporary filename with a custom .jpg extension, without creating the original temp file, is to combine Path.GetTempPath() with Path.GetRandomFileName() (which generates a random filename string without creating a file) and change the extension to .jpg: string tempJpgPath = Path.Combine(Path.GetTempPath(), Path.ChangeExtension(Path.GetRandomFileName(), ".jpg")); This produces a cryptographically strong random name that is statistically unique, avoiding collisions and unnecessary file creation. When ready to use it, create the file yourself (e.g., via File.Create) and delete it after use, as temp files are not auto-deleted. An alternative using a GUID (unique, though not cryptographically random): string tempJpgPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() + ".jpg");

🏁 Script executed:

cat -n TelegramSearchBot/Service/AI/OCR/LLMOCRService.cs | head -50

Repository: ModerRAS/TelegramSearchBot

Length of output: 2305


Don't leak a temp file on every OCR request.

Path.GetTempFileName() creates a file on disk immediately. Appending .jpg to that path writes the image to a separate file while leaving the original .tmp file behind, steadily growing the temp directory under load.

Fix
-                var tempPath = Path.GetTempFileName() + ".jpg";
+                var tempPath = Path.Combine(Path.GetTempPath(), $"{Guid.NewGuid():N}.jpg");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Service/AI/OCR/LLMOCRService.cs` around lines 30 - 35, The
current use of Path.GetTempFileName() + ".jpg" creates an unused .tmp file each
request; change the tempPath generation in LLMOCRService (tempPath variable) to
create a unique JPG path without pre-creating a file (e.g.,
Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString() + ".jpg") or
Path.GetRandomFileName() + ".jpg"), write the image to that path in your
existing FileStream block, and ensure you delete the tempPath in a
finally/cleanup block after processing to avoid leaking temp files.

Comment on lines +37 to +38
_logger.LogInformation("正在使用LLM进行OCR识别...");
var result = await _generalLLMService.AnalyzeImageAsync(tempPath, 0);

⚠️ Potential issue | 🟠 Major

Pass the real chat id into the LLM call.

TelegramSearchBot/Interface/AI/LLM/IGeneralLLMService.cs:23-24 defines the second AnalyzeImageAsync argument as ChatId. Hardcoding 0 here drops the originating chat context, so any chat-scoped LLM routing/configuration inside GeneralLLMService is bypassed. Please thread e.Message.Chat.Id through IOCRService.ExecuteAsync(...) and use it here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Service/AI/OCR/LLMOCRService.cs` around lines 37 - 38, The
call to _generalLLMService.AnalyzeImageAsync(tempPath, 0) discards the
originating chat context; update IOCRService.ExecuteAsync to accept a chatId
parameter (thread e.Message.Chat.Id from the caller), propagate that chatId into
LLMOCRService (or wherever ExecuteAsync is implemented), and replace the
hardcoded 0 with the passed chatId when calling
_generalLLMService.AnalyzeImageAsync(tempPath, chatId) so chat-scoped
routing/configuration in GeneralLLMService receives the correct ChatId.
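
One way the threading could look, sketched with assumed signatures (the real IOCRService and GeneralLLMService shapes may differ):

```csharp
// Hypothetical sketch: ExecuteAsync gains a chatId parameter that is
// forwarded to AnalyzeImageAsync instead of the hardcoded 0.
using System.IO;
using System.Threading.Tasks;

public enum OCREngine { PaddleOCR, LLM }

public interface IOCRService {
    OCREngine Engine { get; }
    // chatId threads the originating chat into chat-scoped LLM routing.
    Task<string> ExecuteAsync(Stream image, long chatId);
}

// In LLMOCRService the call site would become:
//     var result = await _generalLLMService.AnalyzeImageAsync(tempPath, chatId);
// with AutoOCRController passing e.Message.Chat.Id through.
```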

Comment on lines +60 to +63
if (command == "退出" || command == "返回") {
await redis.DeleteKeysAsync();
return (true, "已退出OCR配置");
}

⚠️ Potential issue | 🟡 Minor

返回 command behavior conflicts with the view prompt

Line 187 tells users 输入"返回"返回主菜单 (enter "返回" to go back to the main menu), but Line 60 currently treats 返回 as an exit command and clears state. This is misleading UX and can terminate the flow unexpectedly.

💡 Suggested fix
- if (command == "退出" || command == "返回") {
+ if (command == "退出") {
     await redis.DeleteKeysAsync();
     return (true, "已退出OCR配置");
 }
- sb.AppendLine("输入\"返回\"返回主菜单");
+ sb.AppendLine("输入任意内容返回主菜单");

Also applies to: 187-190

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Service/Manage/EditOCRConfService.cs` around lines 60 - 63,
The handler currently treats both "退出" and "返回" the same and clears state via
redis.DeleteKeysAsync(), which conflicts with the UI prompt that says
输入`返回`返回主菜单; update the logic so only the "退出" command triggers
redis.DeleteKeysAsync() and returns the exit message, while "返回" should return
(true, "返回主菜单") or equivalent without clearing state; adjust any related
prompt/response text (the view prompt that mentions 输入`返回`返回主菜单) to remain
consistent with this new behavior.

Comment on lines +70 to +75
if (_stateHandlers.TryGetValue(currentState, out var handler)) {
return await handler(redis, command);
}

return (false, string.Empty);
}

⚠️ Potential issue | 🟡 Minor

Unknown Redis state should self-recover instead of returning empty response

At Line 74, returning (false, string.Empty) on an unrecognized state gives no user guidance and can dead-end the conversation. Resetting to MainMenu is safer.

💡 Suggested fix
 if (_stateHandlers.TryGetValue(currentState, out var handler)) {
     return await handler(redis, command);
 }
 
-return (false, string.Empty);
+await redis.SetStateAsync(OCRConfState.MainMenu.GetDescription());
+return await HandleMainMenuAsync(redis, command);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Service/Manage/EditOCRConfService.cs` around lines 70 - 75,
The code currently returns (false, string.Empty) when currentState is not found
in _stateHandlers, which leaves the conversation dead-ended; instead reset the
conversation state to MainMenu and return a friendly message. In the else branch
where _stateHandlers.TryGetValue(currentState, out var handler) is false, write
the new state ("MainMenu") back to Redis (or call the existing state setter used
elsewhere), ensure any in-memory state is updated, and return (true, "<brief
prompt for main menu>") or similar so the user is guided back to MainMenu;
reference _stateHandlers, currentState, handler and MainMenu when making the
change.

Comment on lines +77 to +91
private async Task<(bool, string)> HandleMainMenuAsync(EditOCRConfRedisHelper redis, string command) {
var sb = new StringBuilder();
sb.AppendLine("🔧 OCR配置");
sb.AppendLine();

var currentEngine = await GetCurrentEngineAsync();
sb.AppendLine($"当前OCR引擎: {currentEngine}");
sb.AppendLine();
sb.AppendLine("请选择操作:");
sb.AppendLine("1. 切换OCR引擎");
sb.AppendLine("2. 查看配置详情");
sb.AppendLine("3. 返回");

return (true, sb.ToString());
}

⚠️ Potential issue | 🔴 Critical

Main menu never transitions state, so config flow cannot proceed

At Line 77, the handler ignores command and always re-renders the menu. Inputs like 1/2 never move to SelectingEngine/ViewingConfig, making those flows unreachable from normal interaction.

💡 Suggested fix
 private async Task<(bool, string)> HandleMainMenuAsync(EditOCRConfRedisHelper redis, string command) {
+    if (command == "1" || command == "切换OCR引擎") {
+        await redis.SetStateAsync(OCRConfState.SelectingEngine.GetDescription());
+        return (true, "请选择OCR引擎:\n1. PaddleOCR\n2. LLM");
+    }
+
+    if (command == "2" || command == "查看配置详情") {
+        await redis.SetStateAsync(OCRConfState.ViewingConfig.GetDescription());
+        return await HandleViewingConfigAsync(redis, command);
+    }
+
     var sb = new StringBuilder();
     sb.AppendLine("🔧 OCR配置");
     sb.AppendLine();
     
     var currentEngine = await GetCurrentEngineAsync();
     sb.AppendLine($"当前OCR引擎: {currentEngine}");
     sb.AppendLine();
     sb.AppendLine("请选择操作:");
     sb.AppendLine("1. 切换OCR引擎");
     sb.AppendLine("2. 查看配置详情");
     sb.AppendLine("3. 返回");
 
     return (true, sb.ToString());
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Service/Manage/EditOCRConfService.cs` around lines 77 - 91,
HandleMainMenuAsync currently ignores the incoming command and always re-renders
the menu; update it to parse the command and set the next interaction state on
the EditOCRConfRedisHelper so the flow can transition (e.g., set redis.State to
SelectingEngine when command == "1", to ViewingConfig when command == "2", and
handle "3" as return/exit). Keep returning the menu text when command is empty
or invalid, but when a valid option is chosen return the appropriate state
change (and a brief confirmation message) so downstream handlers
(SelectingEngine/ViewingConfig) are reachable; reference the HandleMainMenuAsync
method, the redis parameter, and the SelectingEngine/ViewingConfig state names
when making the change.

Test User added 2 commits March 19, 2026 12:26
Fix whitespace formatting issues detected by CI on Windows

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
TelegramSearchBot/Controller/AI/OCR/AutoOCRController.cs (1)

74-79: ⚠️ Potential issue | 🟠 Major

Keep LLM OCR off the IOnUpdate path.

When GetOCREngineAsync() resolves to OCREngine.LLM, this handler now awaits the full OCR pipeline inline. That puts image conversion, temp-file I/O, and the LLM call on the update path and can stall other updates. Route the LLM branch through the existing subprocess/scheduler boundary instead of calling ExecuteAsync() here.

As per coding guidelines, "OCR, ASR, and QR code processing must be delegated to independent subprocess services launched via AppBootstrap" and "Long-running operations like OCR and downloads must be placed in Service/Scheduler or separate processes, not in IOnUpdate handlers".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Controller/AI/OCR/AutoOCRController.cs` around lines 74 -
79, The handler currently calls GetOCREngineAsync() and then invokes
ocrService.ExecuteAsync inline, which blocks IOnUpdate when engine ==
OCREngine.LLM; change the flow so that if GetOCREngineAsync() returns
OCREngine.LLM you do not call ExecuteAsync here but instead enqueue the OCR job
through the existing subprocess/scheduler boundary (use the app's
subprocess/service bootstrap or scheduler API used elsewhere) and return
immediately, while keeping the existing inline ExecuteAsync behavior only for
non-LLM engines; update references around GetOCREngineAsync, OCREngine.LLM,
_ocrServices, and ExecuteAsync to implement this routing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@TelegramSearchBot/Controller/AI/OCR/AutoOCRController.cs`:
- Around line 75-78: The code in AutoOCRController is fragile: wrap the config
read (_configService.GetConfigurationValueAsync) in a try/catch and treat
failures as a missing/invalid engine so OCR selection continues; when selecting
the OCR service, use _ocrServices.FirstOrDefault(s => s.Engine == engine) ??
_ocrServices.FirstOrDefault(s => s.Engine == OCREngine.PaddleOCR) and if that
returns null throw a clear InvalidOperationException indicating no OCR
implementations are registered; finally change the log to emit the actual
selected engine (logger.LogInformation($"使用OCR引擎: {ocrService.Engine}")) instead
of the requested value; apply the same changes to the second selection block
around lines 116-121 in AutoOCRController.

---

Duplicate comments:
In `@TelegramSearchBot/Controller/AI/OCR/AutoOCRController.cs`:
- Around line 74-79: The handler currently calls GetOCREngineAsync() and then
invokes ocrService.ExecuteAsync inline, which blocks IOnUpdate when engine ==
OCREngine.LLM; change the flow so that if GetOCREngineAsync() returns
OCREngine.LLM you do not call ExecuteAsync here but instead enqueue the OCR job
through the existing subprocess/scheduler boundary (use the app's
subprocess/service bootstrap or scheduler API used elsewhere) and return
immediately, while keeping the existing inline ExecuteAsync behavior only for
non-LLM engines; update references around GetOCREngineAsync, OCREngine.LLM,
_ocrServices, and ExecuteAsync to implement this routing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 68887bb4-edea-4c7d-9ee0-ea1cc6fbb3d3

📥 Commits

Reviewing files that changed from the base of the PR and between b8431f8 and 0ce76ac.

📒 Files selected for processing (23)
  • TelegramSearchBot.Common/Attributes/McpAttributes.cs
  • TelegramSearchBot.Common/Model/Tools/IterationLimitReachedPayload.cs
  • TelegramSearchBot.Database/Migrations/20260303031828_AddUserWithGroupUniqueIndex.cs
  • TelegramSearchBot.Database/Migrations/20260313124507_AddChannelWithModelIsDeleted.cs
  • TelegramSearchBot.LLM.Test/Service/AI/LLM/GeneralLLMServiceTests.cs
  • TelegramSearchBot.LLM.Test/Service/AI/LLM/ModelCapabilityServiceTests.cs
  • TelegramSearchBot.LLM.Test/Service/AI/LLM/OpenAIProviderHistorySerializationTests.cs
  • TelegramSearchBot.LLM/Service/AI/LLM/GeminiService.cs
  • TelegramSearchBot.LLM/Service/AI/LLM/McpToolHelper.cs
  • TelegramSearchBot.LLM/Service/AI/LLM/OllamaService.cs
  • TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs
  • TelegramSearchBot.LLM/Service/Mcp/McpClient.cs
  • TelegramSearchBot.LLM/Service/Mcp/McpServerManager.cs
  • TelegramSearchBot.LLM/Service/Tools/FileToolService.cs
  • TelegramSearchBot.Test/Helper/WordCloudHelperTests.cs
  • TelegramSearchBot/Controller/AI/OCR/AutoOCRController.cs
  • TelegramSearchBot/Extension/ServiceCollectionExtension.cs
  • TelegramSearchBot/Model/AI/OCRConfState.cs
  • TelegramSearchBot/Service/AI/OCR/LLMOCRService.cs
  • TelegramSearchBot/Service/BotAPI/SendMessageService.Streaming.cs
  • TelegramSearchBot/Service/Manage/EditOCRConfService.cs
  • TelegramSearchBot/Service/Storage/MessageService.cs
  • TelegramSearchBot/Service/Vector/FaissVectorService.cs
✅ Files skipped from review due to trivial changes (20)
  • TelegramSearchBot.LLM.Test/Service/AI/LLM/OpenAIProviderHistorySerializationTests.cs
  • TelegramSearchBot.Common/Model/Tools/IterationLimitReachedPayload.cs
  • TelegramSearchBot.Common/Attributes/McpAttributes.cs
  • TelegramSearchBot/Service/BotAPI/SendMessageService.Streaming.cs
  • TelegramSearchBot.LLM.Test/Service/AI/LLM/GeneralLLMServiceTests.cs
  • TelegramSearchBot/Extension/ServiceCollectionExtension.cs
  • TelegramSearchBot.LLM/Service/AI/LLM/GeminiService.cs
  • TelegramSearchBot.Database/Migrations/20260303031828_AddUserWithGroupUniqueIndex.cs
  • TelegramSearchBot/Service/Vector/FaissVectorService.cs
  • TelegramSearchBot.LLM/Service/Mcp/McpClient.cs
  • TelegramSearchBot.LLM/Service/Mcp/McpServerManager.cs
  • TelegramSearchBot.LLM.Test/Service/AI/LLM/ModelCapabilityServiceTests.cs
  • TelegramSearchBot.Database/Migrations/20260313124507_AddChannelWithModelIsDeleted.cs
  • TelegramSearchBot.Test/Helper/WordCloudHelperTests.cs
  • TelegramSearchBot.LLM/Service/AI/LLM/OpenAIService.cs
  • TelegramSearchBot.LLM/Service/Tools/FileToolService.cs
  • TelegramSearchBot.LLM/Service/AI/LLM/McpToolHelper.cs
  • TelegramSearchBot.LLM/Service/AI/LLM/OllamaService.cs
  • TelegramSearchBot/Model/AI/OCRConfState.cs
  • TelegramSearchBot/Service/Storage/MessageService.cs
🚧 Files skipped from review as they are similar to previous changes (2)
  • TelegramSearchBot/Service/AI/OCR/LLMOCRService.cs
  • TelegramSearchBot/Service/Manage/EditOCRConfService.cs

Comment on lines +75 to +78
var ocrService = _ocrServices.FirstOrDefault(s => s.Engine == engine)
?? _ocrServices.First(s => s.Engine == OCREngine.PaddleOCR);

logger.LogInformation($"使用OCR引擎: {engine}");

⚠️ Potential issue | 🟠 Major

Harden the fallback path and log the engine actually used.

This only falls back when the stored value is empty or invalid. If _configService.GetConfigurationValueAsync(...) throws, the handler still fails before OCR starts, and Line 76 still throws if PaddleOCR is not registered. In the fallback case, Line 78 also logs the requested engine instead of the selected service. That regresses the PR’s “default PaddleOCR” behavior under transient config or DI issues.

🛠️ Suggested hardening
-                var engine = await GetOCREngineAsync();
-                var ocrService = _ocrServices.FirstOrDefault(s => s.Engine == engine)
-                    ?? _ocrServices.First(s => s.Engine == OCREngine.PaddleOCR);
-
-                logger.LogInformation($"使用OCR引擎: {engine}");
+                var engine = await GetOCREngineAsync();
+                var ocrService = _ocrServices.FirstOrDefault(s => s.Engine == engine);
+                if (ocrService is null) {
+                    logger.LogWarning(
+                        "Configured OCR engine {ConfiguredEngine} is unavailable. Falling back to PaddleOCR.",
+                        engine);
+                    ocrService = _ocrServices.FirstOrDefault(s => s.Engine == OCREngine.PaddleOCR)
+                        ?? throw new InvalidOperationException("No PaddleOCR service is registered.");
+                }
+
+                logger.LogInformation("使用OCR引擎: {Engine}", ocrService.Engine);
                 OcrStr = await ocrService.ExecuteAsync(new MemoryStream(PhotoStream));
@@
-        private async Task<OCREngine> GetOCREngineAsync() {
-            var engineStr = await _configService.GetConfigurationValueAsync(EditOCRConfService.OCREngineKey);
-            if (!string.IsNullOrEmpty(engineStr) && Enum.TryParse<OCREngine>(engineStr, out var engine)) {
-                return engine;
-            }
-            return OCREngine.PaddleOCR;
-        }
+        private async Task<OCREngine> GetOCREngineAsync() {
+            try {
+                var engineStr = await _configService.GetConfigurationValueAsync(EditOCRConfService.OCREngineKey);
+                if (!string.IsNullOrEmpty(engineStr) &&
+                    Enum.TryParse<OCREngine>(engineStr, ignoreCase: true, out var engine)) {
+                    return engine;
+                }
+            } catch (Exception ex) {
+                logger.LogWarning(ex, "Failed to load OCR engine configuration. Falling back to PaddleOCR.");
+            }
+
+            return OCREngine.PaddleOCR;
+        }

Also applies to: 116-121

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@TelegramSearchBot/Controller/AI/OCR/AutoOCRController.cs` around lines 75 -
78, The code in AutoOCRController is fragile: wrap the config read
(_configService.GetConfigurationValueAsync) in a try/catch and treat failures as
a missing/invalid engine so OCR selection continues; when selecting the OCR
service, use _ocrServices.FirstOrDefault(s => s.Engine == engine) ??
_ocrServices.FirstOrDefault(s => s.Engine == OCREngine.PaddleOCR) and if that
returns null throw a clear InvalidOperationException indicating no OCR
implementations are registered; finally change the log to emit the actual
selected engine (logger.LogInformation($"使用OCR引擎: {ocrService.Engine}")) instead
of the requested value; apply the same changes to the second selection block
around lines 116-121 in AutoOCRController.
