
feat: add Firecrawl web search tools #7764

Merged
Soulter merged 7 commits into AstrBotDevs:master from wjiajian:feat/firecrawl-web-search
Apr 26, 2026

Contributor

@wjiajian wjiajian commented Apr 24, 2026

Closes #7761

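Adds Firecrawl as a built-in web search provider.

This PR integrates Firecrawl into the built-in Function Tool web search layer and aligns it with the existing Tavily usage. Users can select Firecrawl as the web search provider, both for web search and for extracting the content of a specified URL.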

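Modifications

  • Added the built-in web search tool web_search_firecrawl, which uses Firecrawl /v2/search.

  • Added the built-in page content extraction tool firecrawl_extract_web_page, which uses Firecrawl /v2/scrape.

  • Added the provider_settings.websearch_firecrawl_key config option, which supports rotating across multiple Firecrawl API keys.

  • Added firecrawl to the built-in web search provider config options.

  • Updated the Agent web search tool injection logic so that selecting firecrawl registers both the search tool and the page extraction tool.

  • Updated the Dashboard config metadata translations with the Firecrawl API Key text.

  • Updated the legacy ChatUI web search result parsing logic to recognize web_search_firecrawl.

  • Added unit tests covering Firecrawl tool registration, config migration, search parameter mapping, page extraction output, and Agent tool injection.

  • This is NOT a breaking change.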
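The key rotation and search call described in the list above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: `KeyRotator` and `build_search_request` are hypothetical names, the base URL `https://api.firecrawl.dev` and the `query`/`limit` body fields are assumptions about the Firecrawl API, and the request is only described, never sent.

```python
import itertools
from typing import Iterator


class KeyRotator:
    """Hypothetical helper: hand out API keys in round-robin order."""

    def __init__(self, keys: list[str]) -> None:
        # Drop empty or whitespace-only entries before cycling.
        cleaned = [k.strip() for k in keys if k and k.strip()]
        if not cleaned:
            raise ValueError("at least one Firecrawl API key is required")
        self._cycle: Iterator[str] = itertools.cycle(cleaned)

    def next_key(self) -> str:
        return next(self._cycle)


def build_search_request(rotator: KeyRotator, query: str, limit: int = 5) -> dict:
    """Describe a POST to the search endpoint without sending it."""
    return {
        # Assumed base URL; only the /v2/search path comes from the PR text.
        "url": "https://api.firecrawl.dev/v2/search",
        "headers": {
            "Authorization": f"Bearer {rotator.next_key()}",
            "Content-Type": "application/json",
        },
        # Assumed body fields.
        "json": {"query": query, "limit": limit},
    }
```

Each call to `build_search_request` consumes the next key, so consecutive searches spread across all configured keys.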

Screenshots or Test Results

Log 1: [image: test log 1]
Log 2: [image: test log 2]
Config panel: [image: general settings, firecrawl]
Usage screenshot: [image: img_v3_02112_ddf476ed-d66d-4c83-92bb-1fb6300f96fg]


Checklist

  • 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
    I raised this feature in the issue "[Feature] Question about custom engines for web search" #7761

  • 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.

  • 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.

  • 😮 My changes do not introduce malicious code.

Summary by Sourcery

Integrate Firecrawl as a built-in web search provider alongside existing engines, including both search and page-extraction capabilities, configuration, and agent/dashboard wiring.

New Features:

  • Add Firecrawl-based web search tool web_search_firecrawl backed by the Firecrawl Search API.
  • Add Firecrawl-based page extraction tool firecrawl_extract_web_page backed by the Firecrawl scrape API.
  • Expose firecrawl as a selectable web search provider in chat configuration with support for multiple API keys via provider_settings.websearch_firecrawl_key.
  • Enable old ChatUI to parse and render Firecrawl web search results in the same way as other web search tools.

Enhancements:

  • Update agent web search tool injection to register both Firecrawl search and extract tools when Firecrawl is selected as the provider.
  • Normalize legacy config for websearch_firecrawl_key to support list-based key rotation and align with other providers.
  • Ensure Firecrawl tools are registered and retrievable via the built-in function tool manager.
  • Add i18n metadata entries for Firecrawl API key configuration across supported locales.
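The legacy config normalization mentioned above can be illustrated with a small sketch. It assumes an older config might hold a single string (or nothing) where a list of keys is now expected; `normalize_key_setting` is a hypothetical name and is not taken from the PR diff.

```python
def normalize_key_setting(
    provider_settings: dict, name: str = "websearch_firecrawl_key"
) -> dict:
    """Coerce a legacy key setting into a list of non-empty strings (in place)."""
    value = provider_settings.get(name)
    if value is None:
        keys = []
    elif isinstance(value, str):
        # Legacy form: one key stored as a plain string.
        keys = [value.strip()] if value.strip() else []
    else:
        # Already list-like: keep only usable string entries.
        keys = [k.strip() for k in value if isinstance(k, str) and k.strip()]
    provider_settings[name] = keys
    return provider_settings
```

After normalization, downstream code can treat the setting as a list in every case, which is what makes list-based rotation straightforward.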

Tests:

  • Add unit tests covering Firecrawl config migration, search parameter mapping, scrape output handling, builtin tool registration, and agent tool injection behavior.

@auto-assign auto-assign Bot requested review from advent259141 and anka-afk April 24, 2026 07:30
@dosubot dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels Apr 24, 2026
Contributor

@sourcery-ai sourcery-ai Bot left a comment


Hey - I've left some high level feedback:

  • The Firecrawl search and scrape helpers duplicate the same API key/header/ClientSession setup logic; consider extracting a shared internal helper to reduce duplication and keep future changes (e.g., base URL or headers) in one place.
  • Both _firecrawl_search and _firecrawl_scrape raise a generic Exception for HTTP errors; using more specific exception types (e.g., a custom web-search error or RuntimeError/ValueError) would make it easier for callers to distinguish between configuration, network, and API-level failures.
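One way to act on both points is a shared header helper plus a small exception hierarchy. The sketch below is illustrative only: every name in it (`WebSearchError`, `WebSearchConfigError`, `WebSearchAPIError`, `firecrawl_headers`, `ensure_ok`) is hypothetical and does not come from the AstrBot codebase.

```python
class WebSearchError(RuntimeError):
    """Base class for web search failures (hypothetical)."""


class WebSearchConfigError(WebSearchError):
    """Raised when the provider is misconfigured, e.g. no API key."""


class WebSearchAPIError(WebSearchError):
    """Raised when the remote API answers with a non-success status."""

    def __init__(self, status: int, body: str) -> None:
        super().__init__(f"web search API returned HTTP {status}: {body}")
        self.status = status
        self.body = body


def firecrawl_headers(api_key: str) -> dict:
    """Single place to build request headers for search and scrape calls."""
    if not api_key or not api_key.strip():
        raise WebSearchConfigError("Firecrawl API key is not configured")
    return {
        "Authorization": f"Bearer {api_key.strip()}",
        "Content-Type": "application/json",
    }


def ensure_ok(status: int, body: str) -> None:
    """Raise a typed error for any status outside the 2xx range."""
    if not 200 <= status < 300:
        raise WebSearchAPIError(status, body)
```

Callers can then catch `WebSearchConfigError` to prompt for a key and `WebSearchAPIError` to inspect `status`, while a bare `except WebSearchError` still covers both.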

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request integrates Firecrawl as a new web search and page extraction provider, adding the FirecrawlWebSearchTool and FirecrawlExtractWebPageTool. The changes span the core agent logic, configuration defaults, tool implementations, dashboard UI, and unit tests. Review feedback identifies a critical bug in the Firecrawl API response parsing that would lead to an AttributeError, the inclusion of unsupported search parameters (tbs), potential TypeErrors when handling null arguments, and opportunities to improve performance and code reuse by refactoring HTTP session management.
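The failing code is not shown in this conversation view, so the following is only a sketch of the kind of defensive handling the review points toward. It assumes the search response carries its results under `data`, either as a dict with a `web` list or as a bare list, and that each item may have `title`, `url`, and `description` fields; those shapes, and the names `extract_web_results` and `coerce_limit`, are assumptions.

```python
def extract_web_results(payload) -> list[dict]:
    """Pull result items out of a response without assuming one exact shape."""
    data = payload.get("data") if isinstance(payload, dict) else None
    if isinstance(data, dict):
        items = data.get("web") or []
    elif isinstance(data, list):
        items = data
    else:
        items = []
    results = []
    for item in items:
        if not isinstance(item, dict):
            continue  # skip anything that would break .get() calls
        results.append(
            {
                "title": item.get("title") or "",
                "url": item.get("url") or "",
                "snippet": item.get("description") or "",
            }
        )
    return results


def coerce_limit(value, default: int = 5, maximum: int = 20) -> int:
    """Turn a possibly-null tool argument into a bounded integer."""
    if value is None:
        return default
    try:
        number = int(value)
    except (TypeError, ValueError):
        return default
    return max(1, min(number, maximum))
```

Checking types before calling `.get()` avoids the `AttributeError` class of bug, and coercing arguments up front avoids a `TypeError` when the model passes null.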

Comment thread astrbot/core/tools/web_search_tools.py Outdated
Comment thread astrbot/core/tools/web_search_tools.py Outdated
Comment thread astrbot/core/tools/web_search_tools.py Outdated
Comment thread astrbot/core/tools/web_search_tools.py Outdated
Comment thread astrbot/core/tools/web_search_tools.py
Member

@Soulter Soulter left a comment


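It would be better to create a new aiohttp session instance per call, as before. That prevents resource leaks, and users who never use the websearch feature avoid creating a module-level session instance.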
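A per-call session could be sketched as below. `post_json` and its `session_factory` parameter are hypothetical and exist here only so the pattern can be exercised without a network; the default path uses `aiohttp.ClientSession`, imported lazily so nothing is created at module import time.

```python
from typing import Any, Callable, Optional


async def post_json(
    url: str,
    payload: dict,
    headers: dict,
    session_factory: Optional[Callable[[], Any]] = None,
) -> dict:
    """POST JSON using a session that lives only for this call."""
    if session_factory is None:
        # Lazy import: no session (or aiohttp import) until a search happens.
        import aiohttp

        session_factory = aiohttp.ClientSession
    # The context manager closes the session even if the request raises.
    async with session_factory() as session:
        async with session.post(url, json=payload, headers=headers) as resp:
            return await resp.json()
```

The trade-off is that connection pooling is not reused across calls, which is usually acceptable for an occasional tool invocation.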

Comment thread astrbot/core/tools/web_search_tools.py Outdated
@wjiajian wjiajian requested a review from Soulter April 25, 2026 10:41
@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Apr 26, 2026
@Soulter Soulter merged commit 17aea1a into AstrBotDevs:master Apr 26, 2026
20 checks passed


Development

Successfully merging this pull request may close these issues.

[Feature] Question about custom engines for web search

2 participants