# CapsWriter-Offline Standalone Hotword and Correction System (Portable Version)

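This notebook combines the complete logic for phoneme processing (algo_phoneme), FastRAG accelerated retrieval (rag_fast), pinyin-based correction (PhonemeCorrector), rule-based correction (RuleCorrector), and correction-history RAG (RectificationRAG).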

**Installing dependencies:**
```bash
pip install pypinyin numba numpy
```

In [None]:
# --- A. Data preparation ---

hotwords_data = """
    Claude
    Bilibili
    Microsoft
    买当劳
    肯德基
    # This is a comment
    VsCode
    VsCodes
"""
    
rectify_data = """
# Correction-history demo
把那个锯子给我
把那个句子给我
---
cloud code is good
Claude Code is good
---
今天天其不错
今天天气不错
"""


test_cases_text = """
我想去吃买当劳和肯得鸡
Hello klaude
喜欢刷Bili Bili
请把那个锯子发给我一下
今天天及真的很好
I think klaud code is very good
"""
cases = [line.strip() for line in test_cases_text.splitlines() if line.strip()]
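
The `hotwords_data` and `rectify_data` strings above suggest a simple line-based format: one hotword per line, `#` lines as comments, and correction records separated by `---` with the wrong text first and the corrected text second. The sketch below parses that format under exactly those assumptions. `parse_hotwords` and `parse_rectify` are hypothetical helper names, not the loaders used by `PhonemeCorrector` or `RectificationRAG`, whose actual parsing rules may differ.

```python
from __future__ import annotations


def parse_hotwords(text: str) -> list[str]:
    """Return stripped lines, skipping blanks and '#' comments (assumed format)."""
    words = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.append(line)
    return words


def parse_rectify(text: str) -> list[tuple[str, str]]:
    """Split on '---' lines; assume line 1 of a block is wrong, line 2 is right."""
    pairs = []
    block: list[str] = []
    # The trailing "---" is a sentinel that flushes the last block.
    for line in text.splitlines() + ["---"]:
        line = line.strip()
        if line == "---":
            if len(block) >= 2:
                pairs.append((block[0], block[1]))
            block = []
        elif line and not line.startswith("#"):
            block.append(line)
    return pairs


sample_hotwords = """
    Claude
    # a comment
    VsCode
"""
sample_rectify = """
# demo
cloud code is good
Claude Code is good
---
今天天其不错
今天天气不错
"""

hotwords = parse_hotwords(sample_hotwords)
pairs = parse_rectify(sample_rectify)
print(hotwords)
print(pairs)
```

Blocks with fewer than two usable lines are silently dropped here; a stricter loader might raise an error instead.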

In [None]:
# --- B. System initialization and data loading ---

# Initialize the corrector and the retriever
corrector = PhonemeCorrector(threshold=0.8)
rectifier = RectificationRAG(threshold=0.5)

# Load hotwords and correction history from strings
corrector.update_hotwords(hotwords_data)
rectifier.load_rectify_text(rectify_data)

# Alternatively, load hotwords and correction history from text files
# corrector.load_hotwords_file("hot.txt")
# rectifier.load_rectify_file("hot-rectify.txt")
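
To give a feel for what a `threshold` such as 0.8 means, here is a minimal sketch that accepts a hotword only when its similarity to the input token reaches the threshold. It is a stand-in, not the `PhonemeCorrector` algorithm: the name suggests the real corrector compares phonemes, whereas this sketch uses character-level similarity from the standard library's `difflib`. `similarity` and `best_hotword` are hypothetical names.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive character similarity in [0, 1] (stand-in for phoneme similarity)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_hotword(token: str, hotwords: list, threshold: float = 0.8):
    """Return (hotword, score) for the closest hotword, or (None, score) below threshold."""
    best = max(hotwords, key=lambda w: similarity(token, w))
    score = similarity(token, best)
    return (best, score) if score >= threshold else (None, score)


demo_hotwords = ["Claude", "Bilibili", "Microsoft"]

match, score = best_hotword("klaude", demo_hotwords)
print(match, round(score, 3))

# One character shorter, and the score drops below the 0.8 threshold.
miss, miss_score = best_hotword("klaud", demo_hotwords)
print(miss, round(miss_score, 3))
```

Raising the threshold trades recall for precision: fewer near-misses are corrected, but fewer unrelated words are replaced by mistake.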

In [None]:
# --- C. Run the combined correction demo ---
print("\n" + "="*50)
print("[ CapsWriter-Offline Combined Correction Demo ]")
print("="*50)

for i, t in enumerate(cases):
    print(f"\nCase {i+1}: '{t}'")
    res, matched, similars = corrector.correct(t)
    print(f"  [Corrected] {res}")
    if matched:
        print(f"  [Matched hotwords] {matched}")
    if similars:
        print(f"  [Similar suggestions] {similars}")
    rag_results = rectifier.search(t)
    if rag_results:
        print("  [RAG similar history]")
        for wrong, right, score in rag_results:
            print(f"    - '{wrong}' => '{right}' (similarity: {score:.3f})")
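
The loop above unpacks each `rectifier.search(t)` hit as a `(wrong, right, score)` triple. The sketch below shows one way such a lookup over correction history could work, assuming a character-bigram Dice coefficient as the similarity measure and 0.5 as the cutoff. This is an illustrative stand-in, not the `RectificationRAG` implementation; `bigrams`, `dice`, and `search_history` are hypothetical names.

```python
from __future__ import annotations


def bigrams(text: str) -> set[str]:
    """Set of overlapping two-character slices, lowercased."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams, in [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def search_history(query: str, history: list, threshold: float = 0.5) -> list:
    """Return (wrong, right, score) triples at or above threshold, best first."""
    hits = [(wrong, right, dice(query, wrong)) for wrong, right in history]
    hits = [h for h in hits if h[2] >= threshold]
    return sorted(hits, key=lambda h: h[2], reverse=True)


history = [
    ("把那个锯子给我", "把那个句子给我"),
    ("cloud code is good", "Claude Code is good"),
    ("今天天其不错", "今天天气不错"),
]

results = search_history("请把那个锯子发给我一下", history)
for wrong, right, score in results:
    print(f"- '{wrong}' => '{right}' (similarity: {score:.3f})")
```

Character bigrams need no word segmentation, so the same function handles Chinese and English input, but they only measure surface overlap and miss homophones that share no characters.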