Knowledge-enhanced multimodal agent for fire-safety hazard image analysis.
FireAgent 是一个面向消防安全巡检的多模态智能体系统。它不是简单地把图片丢给大模型生成结论,而是将 视觉事实抽取、消防法规 RAG、证据树检索、开放集守护与四层记忆机制 串成一条可解释、可审计、可持续治理的安全分析链路。
一句话概括:先看清现场,再检索依据,最后基于证据给出风险判断和整改建议。
| Dimension | Design |
|---|---|
| Task | Fire-safety hazard image analysis |
| Core method | Two-hop inference + evidence-tree RAG + open-set guardrail |
| Knowledge base | Structured fire-safety rules, scene guides and production-safety playbooks |
| Backend | FastAPI, SQLAlchemy, SQLite, modular services |
| Frontend | Static Web demo and WeChat Mini Program |
| Evaluation | Retrieval ablation, strong baselines, open-set guardrail benchmark |
| Output | Risk level, hazard explanation, cited evidence, corrective actions and history |
消防隐患识别不是普通图像分类任务。真实巡检场景中,一张图片可能同时包含通道堵塞、电气隐患、可燃物堆放、设备遮挡等多类风险;如果直接让大模型自由回答,容易出现无依据判断、幻觉引用、遗漏隐患和过度自信等问题。
FireAgent 的核心目标是把“大模型能看图”升级为“系统能基于证据做安全判断”:
- Evidence-first:输出风险结论前必须先召回规则证据。
- Safety-aware:证据不足时触发开放集守护,避免强行下结论。
- Scene-grounded:根据校园、住宅、施工、工业等场景动态路由规则。
- Traceable:保留查询、候选规则、阈值、守护状态和历史记录。
- Deployable:提供 Web 页面、微信小程序端和 FastAPI 后端原型。
| Capability | Description |
|---|---|
| Multimodal inspection | Upload a fire-safety scene image and extract visual facts. |
| Evidence-grounded reasoning | Retrieve fire-safety rules before producing risk conclusions. |
| Scene-aware retrieval | Route queries by campus, residential, industrial and construction contexts. |
| Conservative refusal | Refuse or ask for human review when evidence is weak. |
| Memory and governance | Track historical cases, recurring hazards, open tasks and risk trends. |
| Debuggable pipeline | Return retrieval queries, evidence IDs, guardrail decisions and latency fields. |
系统将图片分析拆成两个阶段:
Stage 1: Visual Fact Extraction
Y1 = g_vlm(I, s, u) = {o, h, k, r_h}
Stage 2: Evidence-Constrained Reasoning
Y2 = g_llm(Y1, E) = {r, Z, A}
Y1只负责抽取现场事实、候选隐患和检索关键词。E来自消防规则知识库和检索增强模块。Y2基于证据生成风险等级、隐患结论和整改建议。
这种设计将“看到了什么”和“是否构成隐患”解耦,降低多模态模型的自由发挥空间。
FireAgent 使用面向消防规则的混合检索策略,融合关键词匹配、向量相似度、Dense 双塔语义索引和场景化加权:
S_base(d,q) = alpha * S_lex + beta * S_vec + gamma * S_dense
S(d,q) = S_base + b_scene + b_path + b_hint
检索查询由三类路径组成:
- 主查询:由图片观察事实和关键词生成。
- 场景查询:由宿舍、楼道、工业、施工等场景模板生成。
- 隐患查询:由电气、堵塞、烟火、疏散等隐患类型生成。
为了避免单一查询漏召回,系统将检索过程建模为证据树:
Score_tree(d) = sum_{p in P(d)} w_p * s_p(d)
同一条规则如果被主查询、场景查询和隐患查询多路径命中,会获得更高可信度。证据树让规则召回从“单点关键词命中”变成“多视角证据汇聚”。
最终证据不是简单取 Top-k,而是选择一组互补证据:
Delta(d | E)
= w_R * R(d)
+ w_N * novelty(d, E)
+ w_sD * source_diversity(d, E)
+ w_hD * hazard_coverage(d, E)
该目标同时考虑相关性、新颖性、来源多样性和隐患覆盖,避免多条相似法规重复占据证据列表。
复杂图片需要更多规则证据,简单图片不应引入过多噪声。FireAgent 根据查询长度、隐患数量、工业场景和冲突线索估计检索预算:
C(q) = lambda_1 * len(q)
+ lambda_2 * |h|
+ lambda_3 * I(scene = industrial)
+ lambda_4 * conflict(h)
K(q) = K_min + eta * sigmoid(C(q))
消防安全场景中,“错误自信”比“不回答”更危险。系统在证据不足时触发保守策略:
Guard(q) = I(n_cand < tau_n or s_top < tau_s or cover < tau_c)
守护机制从证据数量、最高证据分数和隐患覆盖度三个角度判断是否应该继续生成。
FireAgent 将巡检过程建模为持续治理任务,而不是一次性识别:
- Core Memory:系统级安全边界与输出规范。
- Task Memory:当前巡检任务目标、场景和状态。
- Short-Term Memory:会话内上下文、追问与工具调用轨迹。
- Long-Term Memory:历史案例、复发风险、整改任务和风险画像。
更多算法细节见 docs/ALGORITHMS.md。
flowchart LR
U["Web / WeChat Mini Program"] --> API["FastAPI Service"]
API --> A["FireAgent Orchestrator"]
A --> V["Stage1 VLM<br/>visual facts"]
V --> Q["Query Builder<br/>main / scene / hazard"]
Q --> R["Hybrid Retriever<br/>lexical + vector + dense"]
R --> T["Evidence Tree<br/>path aggregation"]
T --> S["Set Selection<br/>diversity + coverage"]
S --> G["Open-Set Guardrail"]
G --> L["Stage2 LLM<br/>evidence-constrained reasoning"]
L --> DB["SQLite<br/>records / memory / cache"]
DB --> U
.
├── backend/ # FastAPI backend and experiments
│ ├── app/
│ │ ├── api/ # REST APIs
│ │ ├── core/ # config, CORS, settings
│ │ ├── data/ # fire rules and scene playbooks
│ │ ├── db/ # SQLAlchemy models and CRUD
│ │ └── services/ # analyzer, retriever, guardrail, memory
│ ├── eval_*.py # retrieval / guardrail / baseline evaluation
│ └── requirements.txt
├── web/ # static Web demo
├── pages/ utils/ app.* # WeChat mini program entry
├── docs/ # algorithm notes and technical documentation
The project includes reproducible retrieval and guardrail evaluation scripts under backend/.
| Setting | Dataset | Top-1 | Top-3 | MRR |
|---|---|---|---|---|
| Dynamic retrieval | dev-650 | 0.5092 | 0.9123 | 0.7019 |
| Set selection | dev-650 | 0.5092 | 0.9000 | 0.7047 |
| Evidence tree | dev-650 | 0.5938 | 0.9015 | 0.7447 |
| Dynamic retrieval | blind-650 | 0.4554 | 0.8769 | 0.6555 |
| Set selection | blind-650 | 0.4554 | 0.8662 | 0.6560 |
| Evidence tree | blind-650 | 0.5338 | 0.8585 | 0.6899 |
Open-set guardrail evaluation:
| Guardrail | Samples | Refusal Precision | Refusal Recall | False Accept |
|---|---|---|---|---|
| hybrid_local | 200 | 1.000 | 0.290 | 142 |
| strict_guardrail | 200 | 1.000 | 0.365 | 127 |
Interpretation:
- Evidence tree improves Top-1 by about 8.46 points on the dev set and 7.84 points on the blind set compared with dynamic retrieval.
- Guardrail precision is high, but refusal recall remains an important future-work direction.
- The benchmark focuses on rule retrieval and evidence grounding; the system is a research prototype rather than a certified fire-safety product.
cd backend
pip install -r requirements.txt
# Optional: configure real multimodal / text model calls.
$env:SILICONFLOW_API_KEY="your_api_key"
uvicorn app.main:app --host 0.0.0.0 --port 8010 --reloadUseful endpoints:
- API docs:
http://127.0.0.1:8010/docs - Health check:
http://127.0.0.1:8010/health - Web demo mounted by backend:
http://127.0.0.1:8010/web/
The static frontend lives in web/ and uses http://127.0.0.1:8010 as the default backend.
# Option 1: served by FastAPI after backend startup
http://127.0.0.1:8010/web/
# Option 2: standalone static server
python -m http.server 5500
http://127.0.0.1:5500/web/index.htmlOpen the repository root in WeChat DevTools. The mini program entry files are:
app.jsapp.jsonpages/utils/
Set utils/config.js to the reachable backend address, for example http://127.0.0.1:8010 or a LAN IP address during device debugging.
All versioned APIs are under /api/v1.
| Method | Endpoint | Description |
|---|---|---|
POST |
/analysis/upload |
Upload an image and run two-hop analysis |
GET |
/records |
Query historical analysis records |
GET |
/records/{record_id} |
Get one analysis detail |
GET |
/records/insights |
Aggregate recent risk trends |
POST |
/agent/chat |
Chat with the safety inspection agent |
GET |
/memory/snapshot |
Export four-layer memory snapshot |
GET |
/scene-guides |
Get scene-specific inspection guidance |
Representative scripts:
cd backend
# Retrieval ablation
python eval_ablation.py
# Open-set guardrail evaluation
python eval_open_set_guardrail.py
# Publication-style baselines
python eval_publication_baselines.py
python eval_publication_strong_baselines.pyFireAgent is a research prototype. Its outputs should be treated as auxiliary inspection suggestions, not as legally binding fire-safety certification or professional inspection conclusions.
This project is open-sourced under the MIT License.
FireAgent is a research prototype for multimodal fire-safety analysis. The MIT License covers the source code, while real-world fire-safety deployment still requires professional inspection, domain validation and local compliance review.