NLU Intent Recognition Service

Important notice: the code in this project was generated by AI. Use it with caution.

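Features

Training from uploaded Excel/CSV files (columns: text, intent)
Sentence-BERT sentence embeddings with class-centroid similarity
Threshold-based out-of-scope (OOS) detection (inputs that match no known intent return null)
FastAPI prediction endpoint that returns JSON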
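
The centroid-plus-threshold idea listed above can be illustrated with a minimal, dependency-free sketch. This is not the project's actual code: the toy 2-D vectors stand in for Sentence-BERT embeddings, and the names `build_centroids` and `classify` are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_centroids(vectors, intents):
    """Average the vectors of each intent into one centroid per intent."""
    groups = defaultdict(list)
    for vec, intent in zip(vectors, intents):
        groups[intent].append(vec)
    return {
        intent: [sum(col) / len(vecs) for col in zip(*vecs)]
        for intent, vecs in groups.items()
    }


def classify(vec, centroids, threshold=0.55):
    """Return (intent, score); intent is None when the best score is below the threshold."""
    best_intent, best_score = max(
        ((intent, cosine(vec, c)) for intent, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    return (best_intent if best_score >= threshold else None), best_score


# Toy embeddings (assumption: real ones would come from the sentence encoder).
centroids = build_centroids(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    ["greet", "greet", "bye", "bye"],
)
print(classify([1.0, 0.05], centroids))                # close to the "greet" centroid
print(classify([0.7, 0.7], centroids, threshold=0.9))  # ambiguous, so the intent is None
```

Returning None for low-scoring inputs is what lets the service report an unknown intent as null instead of forcing the nearest class.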

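Quick Start (Windows)

Install dependencies:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

Start the service:

uvicorn app.main:app --host 0.0.0.0 --port 8000

Web page: open http://localhost:8000/ to upload a corpus and trigger training manually.
- Accepted file types: .xlsx, .xls, .csv
- Templates (use either one):
  - Long format: data/template.csv (two columns: text,intent)
  - Wide format: data/template_wide.csv (the first row holds the intent names; each column lists the utterances for that intent)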
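
To make the difference between the two templates concrete, here is a small sketch that flattens a wide-format CSV into (text, intent) pairs. The function name `wide_to_long` is hypothetical, and skipping empty cells is an assumption for columns of unequal length; the service's own parser may behave differently.

```python
import csv
import io


def wide_to_long(wide_csv_text):
    """Convert a wide table (header row = intent names) into (text, intent) pairs."""
    rows = list(csv.reader(io.StringIO(wide_csv_text)))
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    pairs = []
    for row in body:
        # zip() stops at the shorter sequence, so ragged rows are handled.
        for intent, cell in zip(header, row):
            text = cell.strip()
            if text:  # assumption: empty cells only pad shorter columns
                pairs.append((text, intent.strip()))
    return pairs


wide = "greet,bye\nhello,goodbye\nhi,\n"
for text, intent in wide_to_long(wide):
    print(f"{text},{intent}")
```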

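Model Download and Network Recommendations

Default model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

The service first tries to load the model from the local directory models/paraphrase-multilingual-MiniLM-L12-v2; if that directory does not exist, it attempts to download the model over the network. On a restricted network, any of the following approaches is recommended:

Use a mirror in mainland China (PowerShell example)

$env:HF_ENDPOINT = "https://hf-mirror.com"
pip install -U huggingface_hub -i https://pypi.tuna.tsinghua.edu.cn/simple

# Enable Git LFS before cloning so the large model files are downloaded; if Git LFS is not installed, install it first: https://git-lfs.com/
git lfs install

# Clone the model repository from the mirror into the local models/ directory
git clone https://hf-mirror.com/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 models/paraphrase-multilingual-MiniLM-L12-v2

Once this is done, the service loads the model from models/paraphrase-multilingual-MiniLM-L12-v2 automatically, with no network access required.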
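
The local-first lookup described in this section could look roughly like the sketch below. It is an illustration only: `resolve_model_source` is a hypothetical helper, and the assumption that the local folder is named after the last path segment of the model name is inferred from the directory shown above.

```python
from pathlib import Path

DEFAULT_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"


def resolve_model_source(model_name=DEFAULT_MODEL, models_dir="models"):
    """Prefer a local copy under models_dir; otherwise fall back to the hub name."""
    # Assumption: the local folder is the last segment of the model name.
    local_dir = Path(models_dir) / model_name.split("/")[-1]
    if local_dir.is_dir():
        return str(local_dir)
    return model_name


print(resolve_model_source())
```

The returned string can then be passed to SentenceTransformer(...), which accepts either a local directory or a hub model name.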

Fully Offline Operation (Optional)

$env:TRANSFORMERS_OFFLINE = "1"
$env:HF_HUB_OFFLINE = "1"

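Offline TF-IDF Mode (No Model Download Required)

On a restricted network, enter offline-tfidf in the "Model name" field on the training page.

Training: extracts TF-IDF vectors (1- to 2-grams) and computes the class centroids.
Prediction: scores the input by its cosine similarity to each class centroid.
The threshold mechanism and the API are unchanged.
The artifacts additionally include tfidf_vectorizer.joblib.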
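
A rough sketch of this mode, assuming scikit-learn's TfidfVectorizer with word-level 1- to 2-grams (the analyzer and other settings used by the project are not stated here, so treat them as assumptions). The function names and the 0.3 threshold are hypothetical and chosen only for the toy data.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def train_tfidf_centroids(texts, intents):
    """Fit TF-IDF (1- to 2-grams) and average the row vectors of each intent."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))
    X = vectorizer.fit_transform(texts)  # sparse matrix, rows are L2-normalized
    labels = sorted(set(intents))
    intents_arr = np.array(intents)
    centroids = np.vstack([
        np.asarray(X[np.flatnonzero(intents_arr == label)].mean(axis=0)).ravel()
        for label in labels
    ])
    # Normalize centroids so a dot product with a unit-length row is a cosine.
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    return vectorizer, labels, centroids


def predict_intent(text, vectorizer, labels, centroids, threshold=0.3):
    """Return (intent or None, best cosine score) for one input text."""
    q = vectorizer.transform([text])
    scores = np.asarray(q.dot(centroids.T)).ravel()
    best = int(np.argmax(scores))
    score = float(scores[best])
    return (labels[best] if score >= threshold else None), score


texts = ["check my balance", "show account balance",
         "reset my password", "forgot password"]
intents = ["balance", "balance", "password", "password"]
vectorizer, labels, centroids = train_tfidf_centroids(texts, intents)
print(predict_intent("account balance", vectorizer, labels, centroids))
print(predict_intent("weather today", vectorizer, labels, centroids))  # no known terms
```

An input with no known terms maps to an all-zero vector, so every similarity is 0 and the input falls below any positive threshold.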

Training (via the API, as an alternative):
- Prepare an Excel file (.xlsx) with two columns: text (the utterance) and intent (the intent name).
- Upload it through the Swagger UI: open http://localhost:8000/docs and call POST /train.

Prediction:

curl -X POST "http://localhost:8000/predict" ^
  -H "Content-Type: application/json" ^
  -d "{\"text\": \"我是本人\", \"threshold\": 0.55, \"top_k\": 3}"
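
For callers who prefer Python over curl, the same request can be sketched with the standard library alone. The helper names `build_predict_request` and `predict` are hypothetical, the base URL assumes the service runs locally on port 8000, and `predict` only works while the service is running.

```python
import json
import urllib.request


def build_predict_request(text, threshold=None, top_k=1,
                          base_url="http://localhost:8000"):
    """Build a POST /predict request; threshold is sent only when it is set."""
    payload = {"text": text, "top_k": top_k}
    if threshold is not None:
        payload["threshold"] = threshold
    return urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, **kwargs):
    """Send the request and decode the JSON response (needs a running service)."""
    with urllib.request.urlopen(build_predict_request(text, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling predict("我是本人", threshold=0.55, top_k=3) sends the same body as the curl command.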

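API Reference

POST /train (multipart/form-data)
- file: Excel file (.xlsx/.xls)
- options.threshold (optional): overrides the default threshold
- options.model_name (optional): name of the Sentence-BERT model

POST /predict (application/json)
- text or texts: a single text or a batch of texts
- threshold (optional): overrides the threshold set at training time or the default
- top_k (default 1): returns the top K candidates with their scores
- Response: if the highest score is below the threshold, top_intent is null and passed_threshold is false.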
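
The response rule can be sketched as follows. Only top_intent and passed_threshold are named in this document; the `candidates` field and the function name `build_response` are hypothetical and stand in for whatever shape the service actually uses for the top-K list.

```python
def build_response(scores, threshold, top_k=1):
    """Rank intents by score and apply the threshold to the best one.

    scores: dict mapping intent name to similarity score.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    passed = bool(ranked) and ranked[0][1] >= threshold
    return {
        "top_intent": ranked[0][0] if passed else None,
        "passed_threshold": passed,
        # Hypothetical field name for the top-K list.
        "candidates": [{"intent": i, "score": s} for i, s in ranked[:top_k]],
    }


print(build_response({"a": 0.4, "b": 0.8, "c": 0.6}, threshold=0.55, top_k=2))
print(build_response({"a": 0.4}, threshold=0.55))
```

Note that in this sketch the candidates are still listed when the threshold is not met, which can help with debugging borderline inputs.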

Artifacts

Saved under artifacts/:

centroids.npy: class centroid vectors
intents.txt: intent names (in the same order as the centroid vectors)
meta.json: model_name and threshold
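
A minimal sketch of writing and reading these three files, assuming one intent name per line in intents.txt and a flat JSON object in meta.json (neither layout is confirmed here). The function names and the float32 dtype are hypothetical.

```python
import json
from pathlib import Path

import numpy as np


def save_artifacts(out_dir, centroids, intents, model_name, threshold):
    """Write centroids.npy, intents.txt and meta.json into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    np.save(out / "centroids.npy", np.asarray(centroids, dtype=np.float32))
    # Assumption: one intent per line, in the same order as the centroid rows.
    (out / "intents.txt").write_text("\n".join(intents), encoding="utf-8")
    (out / "meta.json").write_text(
        json.dumps({"model_name": model_name, "threshold": threshold}),
        encoding="utf-8",
    )


def load_artifacts(out_dir):
    """Read the three files back and check that rows and names line up."""
    out = Path(out_dir)
    centroids = np.load(out / "centroids.npy")
    intents = (out / "intents.txt").read_text(encoding="utf-8").splitlines()
    meta = json.loads((out / "meta.json").read_text(encoding="utf-8"))
    if len(intents) != centroids.shape[0]:
        raise ValueError("intents.txt and centroids.npy are out of sync")
    return centroids, intents, meta
```

The length check matters because the intent names are matched to the centroid rows purely by position.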

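Recommendations

Grid-search the threshold over 0.5–0.7 and choose the value with the best macro-F1 on a validation set.
When the classes are imbalanced, results are more stable with at least 20 samples per intent; augment the data or merge intents if necessary.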
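
The threshold search can be sketched in plain Python. This assumes each validation example already has its best intent and score, and that out-of-scope examples are labeled None and counted as their own class in the macro-F1; both choices, and the function names, are assumptions for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = set(y_true) | set(y_pred)
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def pick_threshold(scored, y_true, thresholds):
    """scored: list of (best_intent, best_score). Returns (threshold, macro_f1)."""
    best = None
    for th in thresholds:
        y_pred = [intent if score >= th else None for intent, score in scored]
        f1 = macro_f1(y_true, y_pred)
        if best is None or f1 > best[1]:  # ties keep the lowest threshold
            best = (th, f1)
    return best


# Toy validation set: the last two examples are out of scope (None).
scored = [("a", 0.9), ("b", 0.8), ("a", 0.58), ("b", 0.52)]
y_true = ["a", "b", None, None]
print(pick_threshold(scored, y_true, [0.50, 0.55, 0.60, 0.65, 0.70]))
```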
