# 生成 RAG SFT 数据集



## 1. 环境搭建
使用 langchain 对 PDF 进行分块，后期考虑用 MinerU 进行提取文本

In [None]:
%pip install -q langchain==0.1.10 pypdf pandas tqdm openai

## 2. PDF 准备
读取文件夹中的文件，提取文本，设置 chunk_size 和 chunk_overlap 进行分块

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders.pdf import PyPDFDirectoryLoader


# Load PDF documents from directory
loader = PyPDFDirectoryLoader("/root/app/langchain_data_gen/pdf")
# extract evey page of pdf
documents = loader.load()

# Use recursive character splitter, works better for this PDF data set
text_splitter = RecursiveCharacterTextSplitter(

    # Split documents into small chunks
    chunk_size = 500,

    # Overlap chunks to reduce cutting sentences in half
    chunk_overlap  = 100,
    separators=["\n\n", "\n", "。", "！", "？", "；"],

)

# Split loaded documents into chunks
docs = text_splitter.split_documents(documents)

In [None]:
documents

### 打印分块后的文档长度

In [52]:
avg_doc_length = lambda documents: sum([len(doc.page_content) for doc in documents])//len(documents)
avg_char_count_pre = avg_doc_length(documents)
avg_char_count_post = avg_doc_length(docs)

print(f'已加载 {len(documents)} 篇文档,平均每块字符数为 {avg_char_count_pre}。')
print(f'分割后共有 {len(docs)} 个文本块。') 
print(f'分割后的 {len(docs)} 个文本块平均字符数为 {avg_char_count_post}。')

已加载 278 篇文档,平均每块字符数为 496。
分割后共有 443 个文本块。
分割后的 443 个文本块平均字符数为 343。


### 自定义一个 OpenAI 类，用于调用 LLM 生成数据


In [17]:
from openai import OpenAI

class CustomOpenAI(OpenAI):
    def invoke(self, prompt):
        response = self.chat.completions.create(
            model="deepseek-chat",
            messages=[
                {"role": "user", "content": prompt}
            ]
        )
        return response.choices[0].message.content

# 使用示例
api_key="sk-bba63edb9a2545a2b1b7567329c427"
base_url="https://api.deepseek.com"

llm = CustomOpenAI(api_key=api_key,base_url=base_url)
result = llm.invoke("你好")
print(result)

Hello! How can I assist you today?


In [18]:
llm.invoke("今天天气怎么样？")

'要获取今天的天气信息，您可以查看当地的天气预报或使用天气应用程序。通常，天气预报会提供温度、降水概率、风速和天气状况（如晴、多云、雨等）的详细信息。如果您能提供具体的城市或地区名称，我可以尝试为您查找相关的天气信息。'

## 3. 生成问题

根据上下文中提供的信息生成生成单个问题，可以要求 LLM 通过提示生成多个、按我们需求的问题


In [8]:
from langchain.prompts import PromptTemplate

# Create a prompt template to generate a question a end-user could have about a given context
initial_question_prompt_template = PromptTemplate(
    input_variables=["context"],
    template="""\
<Instructions>
Here is some context:
<context>
{context}
</context>

Your task is to generate 1 question that can be answered using the provided context, following these rules:

<rules>
1. The question should make sense to humans even when read without the given context.
2. The question should be fully answered from the given context.
3. The question should be framed from a part of context that contains important information. It can also be from tables, code, etc.
4. The answer to the question should not contain any links.
5. The question should be of moderate difficulty.
6. The question must be reasonable and must be understood and responded by humans.
7. Do not use phrases like 'provided context', etc. in the question.
8. Avoid framing questions using the word "and" that can be decomposed into more than one question.
9. The question should not contain more than 20 words, make use of abbreviations wherever possible.
10. The question should be in Chinese.
</rules>

To generate the question, first identify the most important or relevant part of the context. Then frame a question around that part that satisfies all the rules above.

Output only the generated question, no other text or characters.
</Instructions>

""")

def generate_question(doc, llm):

    # Pass in values to the input variables
    initial_question_prompt = initial_question_prompt_template.format(context=doc)

    initial_question = llm.invoke(initial_question_prompt)

    return initial_question

In [10]:
docs[1]

Document(metadata={'source': '/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf', 'page': 2}, page_content='导言 \n前言\n敬告用户\n尊敬的用户，感谢您选择小米汽车 SU7 车型（以下简称“SU7”）。SU7 是一款 C 级豪华科技\n轿车，在您使用 SU7 的过程中，小米汽车将竭力为您提供贴心周到的服务。\n在使用 SU7 前请您务必仔细阅读《用户手册》，尤其是“危险”、“注意”、“说明”等提示信\n息。通过《用户手册》您可以了解车辆功能、装备以及车辆维护和定期保养等信息，能够更\n好地帮助您安全驾驶车辆和延长车辆使用寿命。\n您可以通过以下方式查阅 SU7《用户手册》：\n• 登录小米汽车官网。\n• 登录手机端小米汽车 APP。\n• 进入车辆中控屏“用户手册”APP。\n小米汽车及其关联公司拥有本手册全部信息（包括文字、图片、音视频、网页、图表、数据\n等）的所有权利，包括但不限于著作权（包括计算机软件著作权）、专利权、商标权、服务\n标记、域名、商业秘密等。未经小米汽车的事先书面同意，您不能自行实施利用、转让或许\n可任何第三方实施、更改上述权利，您亦不能修改、复制、复印、提取、重新发布、转载、\n翻译本手册的内容。\n小米汽车可能出于遵守法律法规、保障安全等考量或为了提升您的驾乘体验，对车辆及相关')

In [20]:
# generate a question based on a given context
question = generate_question(docs[1], llm)
print(f"Intial question: {question}")

Intial question: 如何查阅小米汽车SU7的《用户手册》？


In [21]:
print(question)

如何查阅小米汽车SU7的《用户手册》？


## 4. 生成答案

In [22]:
# Create a prompt template that takes into consideration the the question and generates an answer
answer_prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""
    <Instructions>
    <Task>
    <role>You are an experienced QA Engineer for building large language model applications.</role>
    <task>It is your task to generate an answer to the following question <question>{question}</question> only based on the <context>{context}</context></task>
    The output should be only the answer generated from the context.

    <rules>
    1. Only use the given context as a source for generating the answer.
    2. Be as precise as possible with answering the question.
    3. Be concise in answering the question and only answer the question at hand rather than adding extra information.
    4. The answer should be in Chinese.
    </rules>

    Only output the generated answer. No extra characters.
    </Task>
    </Instructions>

    Assistant:""")

def generate_answer(question: str, doc, llm):

    answer_prompt = answer_prompt_template.format(question = question, context=doc)

    answer = llm.invoke(answer_prompt)

    return answer

In [23]:
answer = generate_answer(question, docs[1], llm)
print(f"Intial question: {question}")
print("---")
print(f"Reference Answer: {answer}")

Intial question: 如何查阅小米汽车SU7的《用户手册》？
---
Reference Answer: 您可以通过以下方式查阅小米汽车SU7的《用户手册》：
• 登录小米汽车官网。
• 登录手机端小米汽车 APP。
• 进入车辆中控屏“用户手册”APP。


## 5. 提取相关句子
从给定上下文中提取与答案相关句子

In [24]:
# To check whether an answer was correctly formulated by the large language model you get the relevant text passages from the documents used for answering the questions.
source_prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""Human:
    <Instructions>
    Here is the context:
    <context>
    {context}
    </context>

    Your task is to extract the relevant sentences from the given context that can potentially help answer the following question. You are not allowed to make any changes to the sentences from the context.

    <question>
    {question}
    </question>

    Output only the relevant sentences you found, one sentence per line, without any extra characters or explanations.
    </Instructions>
    Assistant:""")

def generate_source(question: str, doc, llm):

    source_prompt = source_prompt_template.format(question = question, context=doc)

    source = llm.invoke(source_prompt)

    return source

In [25]:
source_sentence = generate_source(question, docs[1], llm)
print(f"Intial question: {question}")
print("---")
print(f"Reference Answer: {answer}")
print("---")
print(f"Source Sentence: {source_sentence}")

Intial question: 如何查阅小米汽车SU7的《用户手册》？
---
Reference Answer: 您可以通过以下方式查阅小米汽车SU7的《用户手册》：
• 登录小米汽车官网。
• 登录手机端小米汽车 APP。
• 进入车辆中控屏“用户手册”APP。
---
Source Sentence: 您可以通过以下方式查阅 SU7《用户手册》：
• 登录小米汽车官网。
• 登录手机端小米汽车 APP。
• 进入车辆中控屏“用户手册”APP。


## 6. 生成更简洁的问题-模拟用户行为

当根据整个数据集的相同提示生成问题和答案对时，问题可能会出现重复、形式相似，因此不会模仿真实的最终用户行为。我们将改进现有的生成问题，例如使其更短、更精确。

In [26]:
# To generate a more versatile testing dataset you alternate the questions to see how your RAG systems performs against differently formulated of questions
question_compress_prompt_template = PromptTemplate(
    input_variables=["question"],
    template="""
    <Instructions>
    <role>You are an experienced linguistics expert for building testsets for large language model applications.</role>

    <task>It is your task to rewrite the following question in a more indirect and compressed form, following these rules:

    <rules>
    1. Make the question more indirect
    2. Make the question shorter
    3. Use abbreviations if possible
    </rules>

    <question>
    {question}
    </question>

    Your output should only be the rewritten question with a question mark "?" at the end. Do not provide any other explanation or text.
    </task>
    </Instructions>

    """)


def compress_question(question):
    # Pass in values to the input variables
    question_compress_prompt = question_compress_prompt_template.format(question=question)

    question_compressed = llm.invoke(question_compress_prompt)

    return question_compressed

In [28]:
compressed_question = compress_question(question)
print(f"Intial question: {question}")
print("---")
print(f"Reference Answer: {answer}")
print("---")
print(f"Source Sentence: {source_sentence}")
print("---")
print(f"Compressed Question: {compressed_question}")


Intial question: 如何查阅小米汽车SU7的《用户手册》？
---
Reference Answer: 您可以通过以下方式查阅小米汽车SU7的《用户手册》：
• 登录小米汽车官网。
• 登录手机端小米汽车 APP。
• 进入车辆中控屏“用户手册”APP。
---
Source Sentence: 您可以通过以下方式查阅 SU7《用户手册》：
• 登录小米汽车官网。
• 登录手机端小米汽车 APP。
• 进入车辆中控屏“用户手册”APP。
---
Compressed Question: 小米SU7手册怎么查？


## 7. 自动数据集生成

为了扩展数据集生成的过程，您需要迭代上下文的所有块，为每个块生成问题、答案、相关句子和演变，并将它们保存到 pandas 数据框。

In [56]:
docs_subset = docs[:20] # for testing

In [57]:
from langchain_core.documents.base import Document

def generate_qa_dataset_doc(doc: Document, llm, dataset, doc_number):
    """A function to create a test dataset of questions for a given Document(Langchain Document type)
    Args:
        doc: 一个 Langchain Document 类型的文档
        llm: 一个 LLM 实例
        dataset: 一个 pandas 数据框，用于存储生成的数据
        doc_number: 当前文档的编号
    """

    # generate the initial question for the RAG testdataset
    question = generate_question(doc, llm)
    dataset.at[doc_number, "question"] = question

    # generate compressed  question to variate the dataset
    compressed_question = compress_question(question)
    dataset.at[doc_number, "question_compressed"] = compressed_question


    answer = generate_answer(question, doc, llm)
    dataset.at[doc_number, "reference_answer"] = answer

    source_sentence = generate_source(question, doc, llm)
    dataset.at[doc_number, "source_sentence"] = source_sentence

    source_raw = doc
    dataset.at[doc_number, "source_raw"] = source_raw.page_content

    source_document = doc.metadata["source"]
    dataset.at[doc_number, "source_document"] = source_document


    return dataset


In [58]:
# create a dataset class that in the end can be used to generate the dataset
import pandas as pd
import time

dataset = pd.DataFrame(columns=["question", "question_compressed", "reference_answer", "source_sentence","source_raw","source_document" ])

In [59]:
from langchain_core.documents.base import Document
from tqdm import tqdm

def generate_dataset(documents: Document,llm, dataset):

    print(f"start generating dataset from {len(documents)} docuements")
    print("---")
    generation_time_start = time.time()

    for doc in tqdm(range(len(documents))):
        q_generation_time_start = time.time()
        dataset = generate_qa_dataset_doc(doc = documents[doc], llm = llm, dataset = dataset, doc_number = doc)
        q_generation_time_end = time.time()
        total_elapsed_time_generation = q_generation_time_end - q_generation_time_start


        print(f"Finished creating evaluation data for chunk {doc+1}")
        print(f"Generation time for doc: {total_elapsed_time_generation}")
        print("---")

    generation_time_end = time.time()
    total_elapsed_time= generation_time_end - generation_time_start
    print(f"Generation time for all docs: {total_elapsed_time}")

    return dataset

In [55]:
docs_subset[0]

Document(metadata={'source': '/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf', 'page': 0}, page_content='SU7 用户手册')

In [None]:
dataset_df = generate_dataset(docs_subset, llm, dataset)

num_questions_generated = dataset_df.shape[0]
print(f"Generated a total of {num_questions_generated} questions.")

In [61]:
# display the first rows of the generated dataset
dataset_df.head()

Unnamed: 0,question,question_compressed,reference_answer,source_sentence,source_raw,source_document
0,小米SU7用户手册的来源是什么？,小米SU7用户手册来源？,小米SU7用户手册的来源是/root/app/langchain_data_gen/pdf/...,metadata={'source': '/root/app/langchain_data_...,SU7 用户手册,/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf
1,如何查阅小米汽车SU7的《用户手册》？,小米SU7手册怎么查？?,您可以通过以下方式查阅小米汽车SU7的《用户手册》：\n• 登录小米汽车官网。\n• 登录手...,您可以通过以下方式查阅 SU7《用户手册》：\n• 登录小米汽车官网。\n• 登录手机端小米...,导言 \n前言\n敬告用户\n尊敬的用户，感谢您选择小米汽车 SU7 车型（以下简称“SU7...,/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf
2,小米汽车客服热线是多少？,小米客服电话？,400-182-6888,若您对本手册有任何问题、意见或建议，请致电小米汽车客服热线 400-182-6888。,可任何第三方实施、更改上述权利，您亦不能修改、复制、复印、提取、重新发布、转载、\n翻译本手...,/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf
3,手册中“危险”标识的含义是什么？,手册中“危险”标识的含义？,如未按照该危险标识内容操作可能会直接造成车辆损毁或人身伤亡。,如未按照该危险标识内容操作可能会直接造成车辆损毁或人身伤亡。,本手册中带“*”的描述仅适用于部分配置的车型，您的车辆可能未配备这些功能，请以实车\n为准。...,/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf
4,驾驶车辆时应注意哪些行为以防止事故发生？,驾驶时如何避免事故？?,驾驶车辆时应注意以下行为以防止事故发生：\n1. 保持清醒的驾驶状态，切勿在饮酒或服用具有镇...,• 保持清醒的驾驶状态，切勿在饮酒或服用具有镇静、嗜睡、疲倦、头痛、视力模糊等副\n作用的药...,注意:\n如未按照该注意事项操作可能会导致车辆相关功能无法使用，严重时造成车辆损坏。\n说明...,/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf


In [65]:
dataset_df.to_excel("dataset_df.xlsx", index=False)

## 8. 评估生成数据的质量

**Relevance**

相关性衡量问题对于特定领域或上下文的有用性和适用性。在财务和业务分析的背景下，相关性提示根据以下标准评估问题：

- 这个问题与华尔街金融和商业分析师的工作直接相关吗？
- 该问题是否解决了分析师可能遇到的实际问题或用例？
- 问题是否清晰明确，避免含糊或模糊？
- 该问题是否需要实质性答案来证明对金融主题的理解？
- 回答这个问题是否会提供可应用于现实世界公司评估任务的见解或知识？

相关性得分范围为 1 到 5，得分越高表示对财务和业务分析师的相关性和实用性越高。

**Groundedness**

Groundedness 衡量根据所提供的上下文或信息回答问题的程度。接地性提示根据以下标准评估问题：

- 能否仅使用给定上下文中提供的信息来回答问题？
- 上下文是否提供了回答问题所需的很少、部分、大量或全部信息？

Groundedness 得分也从 1 到 5 不等，具有以下解释：

- 根据给定的上下文根本无法回答这个问题。
- 上下文提供了很少的相关信息来回答这个问题。
- 上下文提供了一些相关信息来部分回答问题。
- 上下文提供了大量信息来回答问题的大部分方面。
- 上下文提供了完整、明确地回答问题所需的所有信息。



In [44]:
groundedness_check_prompt_template = PromptTemplate(
    input_variables=["context","question"],
    template="""
    <Instructions>
    You will be given a context and a question related to that context.

    Your task is to provide an evaluation of how well the given question can be answered using only the information provided in the context. Rate this on a scale from 1 to 5, where:

    1 = The question cannot be answered at all based on the given context
    2 = The context provides very little relevant information to answer the question
    3 = The context provides some relevant information to partially answer the question
    4 = The context provides substantial information to answer most aspects of the question
    5 = The context provides all the information needed to fully and unambiguously answer the question

    First, read through the provided context carefully:

    <context>
    {context}
    </context>

    Then read the question:

    <question>
    {question}
    </question>

    <rules>The evaluation should be in Chinese.</rules>
         
    Evaluate how well you think the question can be answered using only the context information. Provide your reasoning first in an <evaluation> section, explaining what relevant or missing information from the context led you to your evaluation score in only one sentence.

    Provide your evaluation in the following format:

    <rating>(Your rating from 1 to 5)</rating>
    
    <evaluation>(Your evaluation and reasoning for the rating)</evaluation>
    
    </Instructions>

    """)

relevance_check_prompt_template = PromptTemplate(
    input_variables=["question"],
    template="""
    <Instructions>
    You will be given a question related to 小米SU7用户手册. Your task is to evaluate how useful this question would be for a customer who is interested in 小米SU7.

    To evaluate the usefulness of the question, consider the following criteria:

    1. Relevance: Is the question directly relevant to your work? Questions that are too broad or unrelated to this domain should receive a lower rating.

    2. Practicality: Does the question address a practical problem or use case that analysts might encounter? Theoretical or overly academic questions may be less useful.

    3. Clarity: Is the question clear and well-defined? Ambiguous or vague questions are less useful.

    4. Depth: Does the question require a substantive answer that demonstrates understanding of financial topics? Surface-level questions may be less useful.

    5. Applicability: Would answering this question provide insights or knowledge that could be applied to real-world company evaluation tasks? Questions with limited applicability should receive a lower rating.

    <rules>The evaluation should be in Chinese.</rules>

    Provide your evaluation in the following format:

    <rating>(Your rating from 1 to 5)</rating>

    <evaluation>(Your evaluation and reasoning for the rating)</evaluation>

    Here is an example:
    <evaluation>The question is very relevant to the persona because it asks about financial information of a company</evaluation>
    <rating>5</rating>

    Here is the question:

    {question}
    </Instructions>
    """)

In [45]:
def generate_groundedness_check(question, source_raw):
    # Pass in values to the input variables
    groundedness_prompt = groundedness_check_prompt_template.format(question=question, context=source_raw)

    groundedness_rating = llm.invoke(groundedness_prompt)

    return groundedness_rating

def generate_relevance_check(question):
    # Pass in values to the input variables
    relevance_prompt = relevance_check_prompt_template.format(question=question)

    relevance_rating = llm.invoke(relevance_prompt)

    return relevance_rating

In [46]:
# Evaluating one of the generated questions for groundedness and relevance
groundedness_rating = generate_groundedness_check(dataset_df.question[0], dataset_df.source_raw[0])
relevance_rating = generate_relevance_check(dataset_df.question[0])

print("groundedness Score:")
print(groundedness_rating)

print("---")

print("Relevance Score:")
print(relevance_rating)


groundedness Score:
<rating>2</rating>

<evaluation>上下文仅提供了“SU7 用户手册”的标题，没有提供关于其来源的具体信息。</evaluation>
---
Relevance Score:
<rating>3</rating>

<evaluation>
这个问题与小米SU7用户手册直接相关，因此具有一定的相关性。它询问用户手册的来源，这可能涉及到用户手册的编写背景或参考资料，对于理解用户手册的内容和可靠性有一定帮助。然而，这个问题相对简单，不需要深入的分析或解释，因此深度和实用性较低。总体来说，这个问题对于用户手册的初步了解是有用的，但不如更深入的问题有价值。
</evaluation>


In [47]:
import re
# Helper functions to extract values from the string response by the LLM Critique Agents.
def extract_rating(text):
    pattern = r'<rating>(.*?)</rating>'
    match = re.search(pattern, text)
    if match:
        rating = match.group(1)
        return rating
    else:
        return None

def extract_reasoning(text):
    pattern = r'<evaluation>(.*?)</evaluation>'
    match = re.search(pattern, text)
    if match:
        rating = match.group(1)
        return rating
    else:
        return None

In [48]:
def evaluate_dataset(dataset):
    for index, row in dataset.iterrows():

        question = row['question']
        source_raw = row['source_raw']

        # Generate groundedness check
        groundedness_check = generate_groundedness_check(question, source_raw)
        groundedness_score = extract_rating(groundedness_check)
        groundedness_score_reasoning = extract_reasoning(groundedness_check)

        dataset.at[index, 'groundedness_score'] = groundedness_score
        dataset.at[index, 'groundedness_score_reasoning'] = groundedness_score_reasoning

        # Generate relevance check
        relevance_check = generate_relevance_check(question)
        relevancy_score = extract_rating(relevance_check)
        relevancy_score_reasoning = extract_reasoning(relevance_check)

        dataset.at[index, 'relevancy_score'] = relevancy_score
        dataset.at[index, 'relevancy_score_reasoning'] = relevancy_score_reasoning

    return dataset

现在评估已经建立，包括对基础性和相关性分数的提示，迭代生成的数据集数据集并为每个问题分配一个分数。根据需要可以从数据集中消除分数低于特定阈值的问题。

In [49]:
dataset_evaluated = evaluate_dataset(dataset_df)

In [50]:
dataset_evaluated.head()

Unnamed: 0,question,question_compressed,reference_answer,source_sentence,source_raw,source_document,groundedness_score,groundedness_score_reasoning,relevancy_score,relevancy_score_reasoning
0,小米SU7用户手册的来源是什么？,小米SU7用户手册来源？,小米SU7用户手册的来源是/root/app/langchain_data_gen/pdf/...,metadata={'source': '/root/app/langchain_data_...,SU7 用户手册,/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf,2,上下文仅提供了“SU7 用户手册”这一信息，没有具体说明其来源，因此只能提供很少的相关信息来...,3,该问题询问小米SU7用户手册的来源，虽然与小米SU7相关，但并不是一个非常实用或深入的问题。...
1,如何查阅小米汽车SU7的《用户手册》？,小米SU7手册怎么查?,您可以通过以下方式查阅小米汽车SU7的《用户手册》：\n• 登录小米汽车官网。\n• 登录手...,您可以通过以下方式查阅 SU7《用户手册》：\n• 登录小米汽车官网。\n• 登录手机端小米...,导言 \n前言\n敬告用户\n尊敬的用户，感谢您选择小米汽车 SU7 车型（以下简称“SU7...,/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf,5,上下文中明确提供了三种查阅小米汽车SU7《用户手册》的方式，信息完整且具体。,5,该问题非常相关，因为它直接涉及到小米SU7用户手册的查阅方法，这是潜在用户或现有车主可能会遇...
2,小米汽车客服热线号码是多少？,小米客服电话？,400-182-6888,若您对本手册有任何问题、意见或建议，请致电小米汽车客服热线 400-182-6888。,可任何第三方实施、更改上述权利，您亦不能修改、复制、复印、提取、重新发布、转载、\n翻译本手...,/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf,5,上下文中明确提供了小米汽车客服热线号码为400-182-6888。,3,该问题与小米SU7用户手册有一定的相关性，因为它涉及到用户可能需要联系客服的情况。然而，这个...
3,手册中“危险”标识的含义是什么？,手册中“危险”标识的含义？,如未按照该危险标识内容操作可能会直接造成车辆损毁或人身伤亡。,危险:\n如未按照该危险标识内容操作可能会直接造成车辆损毁或人身伤亡。,本手册中带“*”的描述仅适用于部分配置的车型，您的车辆可能未配备这些功能，请以实车\n为准。...,/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf,5,手册中明确解释了“危险”标识的含义，即未按照该标识内容操作可能会直接造成车辆损毁或人身伤亡。,4,该问题直接关联到小米SU7用户手册中的内容，特别是关于“危险”标识的含义。对于用户来说，了解...
4,驾驶车辆时应注意哪些行为以防止事故发生？,驾驶时应注意哪些行为以避免事故？,驾驶车辆时应注意以下行为以防止事故发生：\n1. 保持清醒的驾驶状态，切勿在饮酒或服用具有镇...,• 保持清醒的驾驶状态，切勿在饮酒或服用具有镇静、嗜睡、疲倦、头痛、视力模糊等副\n作用的药...,注意:\n如未按照该注意事项操作可能会导致车辆相关功能无法使用，严重时造成车辆损坏。\n说明...,/root/app/langchain_data_gen/pdf/小米SU7用户手册.pdf,5,上下文中明确提到了驾驶车辆时应注意的行为，如保持清醒、遵守交通法规、避免分散注意力等，这些信...,4,这个问题与小米SU7用户手册直接相关，因为它涉及到驾驶安全，这是任何车辆用户手册中的重要部分...
