#### **1. Import modules**

Import the Python standard-library modules and this project's custom library.

In [1]:
# Standard library
import os
import sys

# Add the parent directory to the system path
# so that the project's custom library can be imported
sys.path.append(os.path.abspath('..'))

# Project library
from src.llm_client import LLMClient

#### **2. Load the model**

In [2]:
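# Specify the model name
# Available models include:
# qwen-flash, qwen-plus, qwen3-max, glm-4.7, deepseek-v3.2

# Before loading a model, log in to the Alibaba Cloud Bailian platform: https://bailian.console.aliyun.com/
# Request an API key for the LLM service
# and set LLM_API_KEY=sk-******** in the config file

# Newly registered users can call the APIs of some models for free
# After logging in, see the model service page for the list of free models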

model = "deepseek-v3.2"
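
Before initializing the client, it can help to confirm that the key is actually set. The following is a minimal sketch, assuming the config file uses plain `KEY=VALUE` lines as the comment above suggests. The helper names `read_key_value_config` and `has_api_key` are hypothetical and not part of this project; `LLMClient` may well read the config in a different way.

```python
def read_key_value_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments.

    Assumption: the config file is a simple dotenv-style text file.
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def has_api_key(settings, name="LLM_API_KEY"):
    """Return True if the key is present and is not the masked placeholder."""
    value = settings.get(name, "")
    return bool(value) and "*" not in value
```

Reading the file contents with `open(path, encoding="utf-8").read()` and passing them to `read_key_value_config` would then allow a quick `has_api_key(...)` check before any API call.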

In [3]:
# Initialize the LLM API client
client = LLMClient(model)

#### **3. Unstructured output**

A prompt that leaves the output format unspecified, followed by the model's free-form response.

In [4]:
# Prompt for English POS tagging
# No output format is specified for the model

text_prompt = """
You are a professional corpus linguist specialized in Part-of-Speech (POS) tagging for English text.

Your task is to perform POS tagging on the given English sentence.
First tokenize the sentence, then assign a POS tag to each token.

Guideline:
Use the Penn Treebank (PTB) tagset to annotate the given sentence.

Sentence: The Little Countess blinked three times.
"""

In [5]:
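# Call the LLM API
# and start tagging

# Note:
# To save API costs, this project stores generated content in a local cache at data/llm_cache
# After the first call, repeated calls only read the generated result from the local database

# To test whether the API connection works,
# edit the prompt and run the tagging again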

text_result = client.get_text_response(
    prompt=text_prompt,
)

# Print the model's tagging result
print(text_result)

**Tokenization & POS Tagging (Penn Treebank Tagset):**

1. **The** – DT (Determiner)  
2. **Little** – JJ (Adjective)  
3. **Countess** – NN (Noun, singular or mass)  
4. **blinked** – VBD (Verb, past tense)  
5. **three** – CD (Cardinal number)  
6. **times** – NNS (Noun, plural)  
7. **.** – . (Sentence-final punctuation)  

**Tagged Sentence:**  
The/DT Little/JJ Countess/NN blinked/VBD three/CD times/NNS ./.

**Explanation:**  
- *The* is a definite article, tagged as DT.  
- *Little* modifies *Countess*, so it is an adjective (JJ).  
- *Countess* is a singular noun (NN).  
- *blinked* is a past-tense verb (VBD).  
- *three* is a cardinal number (CD).  
- *times* is a plural noun (NNS).  
- The period is sentence-final punctuation (.).
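
The caching behavior described in the note on the API call above can be sketched as follows. This is an illustrative approximation, not the project's actual implementation: the class `PromptCache`, its SQLite table layout, and the `call_api` callback are all assumptions introduced here. The idea is to key each response on a hash of the model name and prompt, so an unchanged prompt is served locally and an edited prompt triggers a fresh API call.

```python
import hashlib
import sqlite3


class PromptCache:
    """Tiny prompt -> response cache backed by SQLite (hypothetical sketch)."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the demo self-contained; a file path would persist
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, response TEXT)"
        )

    @staticmethod
    def _key(model, prompt):
        # Any change to the model name or the prompt yields a different key
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def get_or_call(self, model, prompt, call_api):
        key = self._key(model, prompt)
        row = self.conn.execute(
            "SELECT response FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no API call is made
        response = call_api(prompt)  # cache miss: call the API once
        self.conn.execute(
            "INSERT INTO cache (key, response) VALUES (?, ?)", (key, response)
        )
        self.conn.commit()
        return response
```

Under this scheme, rerunning a cell with the same prompt never reaches the network, which is why editing the prompt is the suggested way to test the API connection.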


#### **4. Structured output**

A prompt that requests a fixed JSON format, followed by the model's structured response.

In [6]:
# Prompt for English POS tagging
# The model is asked to return its result in JSON format

json_prompt = """
You are a professional corpus linguist specialized in Part-of-Speech (POS) tagging for English text.

Your task is to perform POS tagging on the given English sentence.
First tokenize the sentence, then assign a POS tag to each token.

Guideline:
Use the Penn Treebank (PTB) tagset to annotate the given sentence.

Sentence: The Little Countess blinked three times.

Output format:
Return output in JSON format with the following fields:
- tokens: List of tokens (words and punctuation)
- pos_tags: List of POS tags (must correspond one-to-one with tokens)
"""

In [7]:
# Call the LLM API
# and start tagging

json_result = client.get_json_response(
    prompt=json_prompt,
)

# Print the model's tagging result
print(json_result)

{'tokens': ['The', 'Little', 'Countess', 'blinked', 'three', 'times', '.'], 'pos_tags': ['DT', 'JJ', 'NN', 'VBD', 'CD', 'NNS', '.']}
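
Because the prompt requires `pos_tags` to correspond one-to-one with `tokens`, it is worth checking that constraint before using the result, since a model may occasionally return lists of different lengths. The sketch below assumes the response is a Python dict shaped like the printed output above; `pair_tokens_with_tags` is a hypothetical helper, not part of `LLMClient`, and `sample_result` simply copies that output.

```python
def pair_tokens_with_tags(result):
    """Validate a tagging result and return a list of (token, tag) pairs."""
    tokens = result.get("tokens")
    pos_tags = result.get("pos_tags")
    if not isinstance(tokens, list) or not isinstance(pos_tags, list):
        raise ValueError("expected list-valued 'tokens' and 'pos_tags' fields")
    if len(tokens) != len(pos_tags):
        raise ValueError(
            f"{len(tokens)} tokens but {len(pos_tags)} tags: not one-to-one"
        )
    return list(zip(tokens, pos_tags))


# Copied from the printed output above
sample_result = {
    "tokens": ["The", "Little", "Countess", "blinked", "three", "times", "."],
    "pos_tags": ["DT", "JJ", "NN", "VBD", "CD", "NNS", "."],
}

pairs = pair_tokens_with_tags(sample_result)
tagged = " ".join(f"{token}/{tag}" for token, tag in pairs)
print(tagged)  # The/DT Little/JJ Countess/NN blinked/VBD three/CD times/NNS ./.
```

The structured result reproduces the `word/TAG` line from the unstructured response without any text parsing, which is the practical advantage of requesting JSON.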
