@@ -31,6 +31,9 @@
import modelengine.jade.schema.SchemaValidator;

import java.io.IOException;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
@@ -53,9 +56,15 @@
@Order(PromptBuilderOrder.REFERENCE)
public class ReferencePromptBuilder implements PromptBuilder {
    private static final String KNOWLEDGE_PLACEHOLDER = "knowledgeData";
    private static final String CURRENT_DATE_PLACEHOLDER = "currentDate";
    private static final String KNOWLEDGE_SEPARATOR = "\n";
    private static final String KNOWLEDGE_ID = "id";
    private static final String KNOWLEDGE_TEXT = "text";
    private static final String KNOWLEDGE_METADATA = "metadata";
    private static final String METADATA_URL = "url";
    private static final String METADATA_SOURCE = "source";
    private static final String METADATA_TIMESTAMP = "timestamp";
    private static final String METADATA_DATE = "date";
    private static final String KNOWLEDGE_SCHEMA = "/knowledge_reference_schema.json";
    private static final int REFERENCE_ID_LENGTH = 6;
    private static final List<String> REFERENCE_TEMPLATE_LIST =
@@ -93,8 +102,10 @@ public Optional<PromptMessage> build(UserAdvice userAdvice, Map<String, Object>
        String referenceTemplate = this.templateI18nMap.get(templateFilePath);
        Validation.notBlank(referenceTemplate, "The reference prompt template cannot be blank.");

        String currentDate = LocalDate.now().format(DateTimeFormatter.ofPattern("yyyy-MM-dd"));
        String referenceMessage = new DefaultStringTemplate(referenceTemplate).render(MapBuilder.<String, String>get()
                .put(KNOWLEDGE_PLACEHOLDER, this.formatKnowledge(referenceKnowledge))
                .put(CURRENT_DATE_PLACEHOLDER, currentDate)
                .build());
        String systemMessage = this.getBackground(userAdvice.getBackground()) + referenceMessage;
        String humanMessage = new DefaultStringTemplate(userAdvice.getTemplate()).render(userAdvice.getVariables());
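
The hunk above passes today's date to the template as `currentDate`. The following is a minimal standalone sketch of that step, not the project's implementation: `CurrentDateSketch` and its `render` helper are hypothetical stand-ins for `DefaultStringTemplate` (whose actual substitution rules are not shown in this diff), and the `Clock` parameter is an assumption added so the date can be fixed in a test.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CurrentDateSketch {
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    // Matches {{name}} placeholders; this syntax is assumed from the templates in this PR.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    /** Formats the date seen by the given clock as yyyy-MM-dd. */
    static String currentDate(Clock clock) {
        return LocalDate.now(clock).format(DATE_FORMAT);
    }

    /** Hypothetical renderer: replaces each known {{key}} in one pass and leaves unknown ones as-is. */
    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String value = values.getOrDefault(matcher.group(1), matcher.group());
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // A fixed clock keeps the example deterministic.
        Clock fixed = Clock.fixed(Instant.parse("2025-10-13T08:00:00Z"), ZoneOffset.UTC);
        Map<String, String> values = new HashMap<>();
        values.put("currentDate", currentDate(fixed));
        values.put("knowledgeData", "[abc123] sample text\n");
        System.out.print(render("Today is {{currentDate}}.\n# Reference:\n{{knowledgeData}}", values));
    }
}
```

Note that `LocalDate.now()` without arguments uses the system default time zone, so the rendered date reflects the server's zone rather than the user's. Passing a `Clock` or `ZoneId` is one way to make that choice explicit.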
@@ -153,8 +164,45 @@ private List<Map<String, Object>> dedupeKnowledge(List<List<Map<String, Object>>

    private String formatKnowledge(Map<String, Map<String, Object>> referenceKnowledge) {
        StringBuilder sb = new StringBuilder();
        referenceKnowledge.forEach((key, value) -> sb.append(
                StringUtils.format("[{0}] {1}{2}", key, value.get(KNOWLEDGE_TEXT), KNOWLEDGE_SEPARATOR)));
        referenceKnowledge.forEach((key, value) -> {
            StringBuilder knowledgeItem = new StringBuilder();
            knowledgeItem.append(StringUtils.format("[{0}] {1}", key, value.get(KNOWLEDGE_TEXT)));

            // Process the metadata, if any.
            if (value.containsKey(KNOWLEDGE_METADATA)) {
                Map<String, Object> metadata = ObjectUtils.cast(value.get(KNOWLEDGE_METADATA));
                if (metadata != null && !metadata.isEmpty()) {
                    List<String> metadataInfo = new ArrayList<>();

                    // Add the URL (the marker that identifies web search data).
                    if (metadata.containsKey(METADATA_URL) && metadata.get(METADATA_URL) != null) {
                        metadataInfo.add(StringUtils.format("URL: {0}", metadata.get(METADATA_URL)));
                    }

                    // Add the source.
                    if (metadata.containsKey(METADATA_SOURCE) && metadata.get(METADATA_SOURCE) != null) {
                        metadataInfo.add(StringUtils.format("Source: {0}", metadata.get(METADATA_SOURCE)));
                    }

                    // Add the timestamp.
                    if (metadata.containsKey(METADATA_TIMESTAMP) && metadata.get(METADATA_TIMESTAMP) != null) {
                        metadataInfo.add(StringUtils.format("Timestamp: {0}", metadata.get(METADATA_TIMESTAMP)));
                    }

                    // Add the date.
                    if (metadata.containsKey(METADATA_DATE) && metadata.get(METADATA_DATE) != null) {
                        metadataInfo.add(StringUtils.format("Date: {0}", metadata.get(METADATA_DATE)));
                    }

                    if (!metadataInfo.isEmpty()) {
                        knowledgeItem.append(" [").append(String.join(", ", metadataInfo)).append("]");
                    }
                }
            }

            knowledgeItem.append(KNOWLEDGE_SEPARATOR);
            sb.append(knowledgeItem);
        });
        return sb.toString();
    }
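
To make the resulting line format concrete, here is a standalone sketch of the new `formatKnowledge` behavior. It is an approximation under stated assumptions: plain string concatenation replaces the project's `StringUtils.format` and `ObjectUtils.cast` helpers, and the class name, the `sampleKnowledge` data, and the example URL are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FormatKnowledgeSketch {
    // Metadata keys and their display labels, in the order the method in this diff emits them.
    private static final String[][] METADATA_LABELS = {
        {"url", "URL"}, {"source", "Source"}, {"timestamp", "Timestamp"}, {"date", "Date"}
    };

    /** Renders each entry as "[id] text [Label: value, ...]" followed by a newline. */
    static String formatKnowledge(Map<String, Map<String, Object>> referenceKnowledge) {
        StringBuilder sb = new StringBuilder();
        referenceKnowledge.forEach((id, entry) -> {
            sb.append('[').append(id).append("] ").append(entry.get("text"));
            Object raw = entry.get("metadata");
            if (raw instanceof Map) {
                Map<?, ?> metadata = (Map<?, ?>) raw;
                List<String> parts = new ArrayList<>();
                for (String[] label : METADATA_LABELS) {
                    Object value = metadata.get(label[0]);
                    if (value != null) {
                        parts.add(label[1] + ": " + value);
                    }
                }
                // The bracketed suffix is only added when at least one field is present.
                if (!parts.isEmpty()) {
                    sb.append(" [").append(String.join(", ", parts)).append(']');
                }
            }
            sb.append('\n');
        });
        return sb.toString();
    }

    /** Hypothetical sample: one web search result with metadata, one knowledge base entry without. */
    static Map<String, Map<String, Object>> sampleKnowledge() {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("url", "https://example.com/gold");
        metadata.put("date", "2025-10-13");

        Map<String, Object> webResult = new LinkedHashMap<>();
        webResult.put("text", "Gold closed higher.");
        webResult.put("metadata", metadata);

        Map<String, Object> kbEntry = new LinkedHashMap<>();
        kbEntry.put("text", "Internal pricing policy.");

        Map<String, Map<String, Object>> refs = new LinkedHashMap<>();
        refs.put("abc123", webResult);
        refs.put("def456", kbEntry);
        return refs;
    }

    public static void main(String[] args) {
        System.out.print(formatKnowledge(sampleKnowledge()));
    }
}
```

With the sample data, the first entry gains a `[URL: ..., Date: ...]` suffix and the second is rendered exactly as before the change. The presence of a URL is the signal the prompt templates below rely on to tell web search data from knowledge base data.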

@@ -1,4 +1,42 @@
# Ability
You can use the content of # Reference to answer the question. you can add label : <ref>reference ID</ref>(Single reference) or <ref>reference ID1</ref><ref>reference ID2</ref>(Multiple reference) for referenceing. You should not add reference for question that you can directly answer without content in # Reference, and you should not mention anything about # Reference when meet irrelevant question.\n
You can use the content of # Reference to answer questions. You must add citation tags in the corresponding parts of your answer: <ref>reference ID</ref>(single reference) or <ref>reference ID1</ref><ref>reference ID2</ref>(multiple references) to cite sources.

## Data Source Identification
1. **Web Search Data (with URL links)**: This data comes from real-time internet searches and contains the latest information. You must prioritize and cite this data. For time-sensitive questions (such as "today", "latest", "current", etc.), you must use web search results with the most recent date or timestamp.
2. **Knowledge Base Data (without URLs)**: This data comes from knowledge base retrieval and may contain some irrelevant content. You need to screen and filter the references based on the question. You should cite knowledge base data only if it is relevant and helpful to answer the question; if it is irrelevant or not helpful, you do not need to cite it.

## Citation Standards
1. **Citation Position**: You must add citation tags immediately after the corresponding parts of your final answer; do not collect all citations at the end. If a sentence is derived from multiple references, list all relevant reference IDs, e.g., <ref>abc123</ref><ref>def456</ref>.
2. **Separate Thinking from Citations**: If you have a thinking or reasoning process, do not add any citation tags in it. Citation tags <ref>ID</ref> should appear only in the final answer presented to users, not in internal reasoning, to avoid misleading users.
3. **Time Annotation**: Today is {{currentDate}}. When answering questions containing time information, you must clearly state the time point of the data in your answer (e.g., "as of October 13, 2025").
4. **Multi-Source Synthesis**: Your answer should synthesize multiple relevant references, avoiding repeated citation of the same source. Prioritize citing web search results with URLs.
5. **Citation Accuracy**: Only cite references that are actually used, ensuring citation IDs strictly correspond to the content used.

## Response Strategies for Different Question Types
1. **Enumeration Questions** (e.g., "list all flight information", "what are the solutions"):
- Try to limit the answer to within 10 key points
- Prioritize providing complete and most relevant enumeration items
- Tell users they can view citation sources for complete information
- Unless necessary, do not proactively mention content not provided in the references

2. **Creative Questions** (e.g., "write a paper", "draft a report"):
- Must cite corresponding reference numbers in the body paragraphs, not just at the end of the article
- Fully utilize references and extract important information
- The length should be as extended as possible, providing as many angles as possible for each key point
- Must be information-rich, detailed, and professional

3. **Factual Q&A** (e.g., "what is today's gold price", "what is today's weather"):
- If the answer is very brief, appropriately supplement with one or two related pieces of information to enrich the content
- For time-sensitive questions, must use web search data with the latest date and annotate the time in the answer

## Format Requirements
1. **Structured**: If the answer is long, please structure it and summarize in paragraphs. If you need to answer in points, try to limit to within 5 points and merge related content.
2. **Readability**: Choose an appropriate and aesthetically pleasing answer format based on user requirements and answer content, ensuring strong readability.
3. **Language Consistency**: Unless explicitly requested by the user, your answer language should be consistent with the user's question language.

## Notes
- Not all content in the references is closely related to the user's question; you need to screen and filter based on the question.
- If you use content from the references, you must add the reference tag at the end of the corresponding paragraph. When a question is completely irrelevant to the references, you should not mention anything about # Reference.

# Reference:
{{knowledgeData}}
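
The citation rules in this template imply that every `<ref>ID</ref>` tag in an answer should name an ID that was listed under `# Reference`. The sketch below shows one way a caller might check that after generation. It is not part of this PR: `CitationCheckSketch`, `citedIds`, and `unknownIds` are hypothetical names, and the regular expression assumes the tag format shown in the template.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CitationCheckSketch {
    // Matches <ref>ID</ref> and captures the ID.
    private static final Pattern REF_TAG = Pattern.compile("<ref>([^<]+)</ref>");

    /** Returns the distinct IDs cited in the answer, in order of first appearance. */
    static Set<String> citedIds(String answer) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher matcher = REF_TAG.matcher(answer);
        while (matcher.find()) {
            ids.add(matcher.group(1).trim());
        }
        return ids;
    }

    /** Returns the cited IDs that are not among the known reference IDs. */
    static Set<String> unknownIds(String answer, Set<String> knownIds) {
        Set<String> unknown = citedIds(answer);
        unknown.removeAll(knownIds);
        return unknown;
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("abc123", "def456"));
        String answer = "Gold rose today<ref>abc123</ref><ref>zzz999</ref>.";
        System.out.println("cited: " + citedIds(answer));
        System.out.println("unknown: " + unknownIds(answer, known));
    }
}
```

A non-empty result from `unknownIds` would indicate a citation that does not correspond to any supplied reference, which is the case the "Citation Accuracy" rule is meant to prevent.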
@@ -1,4 +1,42 @@
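# Ability
You can use the content of # Reference to answer the question. You can add tags, <ref>reference ID</ref> (single reference) or <ref>reference ID 1</ref><ref>reference ID 2</ref> (multiple references), to cite. Do not add citations to questions that you can answer directly without # Reference, and do not mention anything about # Reference when the question is unrelated.\n
You can use the content of # Reference to answer questions. You must add citation tags in the corresponding parts of your answer, <ref>reference ID</ref> (single reference) or <ref>reference ID 1</ref><ref>reference ID 2</ref> (multiple references), to mark the cited sources.

## Data Source Identification
1. **Web Search Data (with URL links)**: This data comes from real-time internet searches and contains the latest information. You must prioritize and cite this data. For time-sensitive questions (such as "today", "today's", "latest", etc.), be sure to use the web search results that carry the most recent date or timestamp.
2. **Knowledge Base Data (without URLs)**: This data comes from knowledge base retrieval and may contain some irrelevant content. You need to screen and filter the references based on the question. If knowledge base data is relevant and helpful, you may cite it; if it is irrelevant or does not help answer the question, you do not need to cite it.

## Citation Standards
1. **Citation Position**: You must add citation tags immediately in the corresponding parts of your final answer; do not collect all citations at the end. If a sentence is derived from multiple references, list all relevant reference IDs.
2. **Separate Deep Thinking from Citations**: If you have a thinking or reasoning process (thinking/reasoning), do not add any citation tags in it. Citation tags <ref>ID</ref> should appear only in the final answer presented to users, not in internal reasoning, to avoid misleading users.
3. **Timeliness Annotation**: Today is {{currentDate}}. When answering questions that involve time information, you must clearly state the time point of the data in your answer (e.g., "as of October 13, 2025").
4. **Multi-Source Synthesis**: Your answer should synthesize multiple relevant references and avoid citing the same source repeatedly. Prioritize citing web search results that have URLs.
5. **Citation Accuracy**: Cite only references that are actually used, and make sure citation IDs strictly correspond to the content used.

## Response Strategies for Different Question Types
1. **Enumeration Questions** (e.g., "list all flight information", "what solutions are there"):
- Try to keep the answer within 10 key points
- Prioritize the enumeration items that are most complete and most relevant
- Tell users they can view the citation sources for complete information
- Unless necessary, do not proactively tell users about content not provided in the references

2. **Creative Questions** (e.g., "write a paper", "draft a report"):
- Be sure to cite the corresponding reference numbers in the body paragraphs, not only at the end of the article
- Make full use of the references and extract important information
- Make the text as long as possible, and discuss each key point from as many angles as possible
- Be sure the answer is information-rich, detailed, and professional

3. **Factual Q&A** (e.g., "what is today's gold price", "what is the weather today"):
- If the answer to the question is very brief, you may add one or two related sentences to enrich the content
- For time-sensitive questions, you must use web search data with the latest date and annotate the time in the answer

## Format Requirements
1. **Structured**: If the answer is long, structure it as much as possible and summarize in paragraphs. If you need to answer in points, try to keep to 5 points or fewer and merge related content.
2. **Readability**: Choose a suitable and visually pleasing answer format based on the user's requirements and the answer content, ensuring strong readability.
3. **Language Consistency**: Unless the user explicitly requests otherwise, the language of your answer must be consistent with the language of the user's question.

## Notes
- Not all content in the references is closely related to the user's question; you need to screen and filter based on the question.
- If you use content from the references, you must add the reference citation tag after the corresponding paragraph. When a question is completely irrelevant to the references, you should not mention anything about # Reference.

# Reference:
{{knowledgeData}}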