Code snippet\n\n[structure preserved, semantics retained, sensitive information redacted (e.g. phone numbers, confidentiality markers)]",
+ "type": {
+ "level_1": "knowledge_cleaning",
+ "level_2": "generate"
+ },
+ "allowed_prompts": [
+ "KnowledgeCleanerPrompt"
+ ],
+ "parameter": {
+ "init": [
+ {
+ "name": "llm_serving",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "lang",
+ "default": "en",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "prompt_template",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ],
+ "run": [
+ {
+ "name": "storage",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "input_key",
+ "default": "raw_chunk",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "output_key",
+ "default": "cleaned_chunk",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ]
+ },
+ "required": "",
+ "depends_on": [],
+ "mode": ""
+ },
+ {
+ "node": 122,
+ "name": "KBCTextCleanerBatch",
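+ "description": "Knowledge cleaning operator: standardizes raw knowledge content, including HTML tag cleanup, special character normalization, link handling, and structural optimization, to improve the quality of a RAG knowledge base. Main features:\n1. Remove redundant HTML tags while keeping semantic tags\n2. Normalize special characters such as quotation marks and dashes\n3. Process hyperlinks while preserving their text\n4. Preserve the original paragraph structure and code indentation\n5. Ensure factual content is left unmodified",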
+ "type": {
+ "level_1": "knowledge_cleaning",
+ "level_2": "generate"
+ },
+ "allowed_prompts": [
+ "KnowledgeCleanerPrompt"
+ ],
+ "parameter": {
+ "init": [
+ {
+ "name": "llm_serving",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "lang",
+ "default": "en",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "prompt_template",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ],
+ "run": [
+ {
+ "name": "storage",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "input_key",
+ "default": "chunk_path",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "output_key",
+ "default": "cleaned_chunk_path",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ]
+ },
+ "required": "",
+ "depends_on": [],
+ "mode": ""
+ },
+ {
+ "node": 123,
+ "name": "KBCMultiHopQAGeneratorBatch",
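+ "description": "('MultiHopQAGenerator is a multi-hop QA pair generation processor that automatically generates questions and answers requiring multi-step reasoning from text.', 'The pipeline includes text preprocessing, information extraction, question generation, and answer generation, and supports custom language model backends and parameters.', 'The output format is as follows:', 'Input:\\ntext: <original context text>', 'Output:\\n{\\n \"text\": <processed text string>,\\n \"qa_pairs\": [\\n {\\n \"question\": <string: generated question>,\\n \"reasoning_steps\": [\\n {\"step\": <step 1 of the reasoning process>},\\n {\"step\": <step 2>} ...\\n ],\\n \"answer\": <string: final answer>,\\n \"supporting_facts\": [<fact 1 supporting the answer>, <fact 2>, ...],\\n \"type\": <optional: question type, e.g. “biology”, “history”>\\n },\\n ...\\n ],\\n \"metadata\": {\\n \"source\": <data source>,\\n \"timestamp\": <timestamp string>,\\n \"complexity\": <integer: question complexity marker>\\n }\\n}')",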
+ "type": {
+ "level_1": "knowledge_cleaning",
+ "level_2": "generate"
+ },
+ "allowed_prompts": [
+ "Text2MultiHopQAGeneratorPrompt"
+ ],
+ "parameter": {
+ "init": [
+ {
+ "name": "llm_serving",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "seed",
+ "default": 0,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "lang",
+ "default": "en",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "prompt_template",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ],
+ "run": [
+ {
+ "name": "storage",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "input_key",
+ "default": "chunk_path",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "output_key",
+ "default": "enhanced_chunk_path",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ]
+ },
+ "required": "",
+ "depends_on": [],
+ "mode": ""
+ },
+ {
+ "node": 124,
+ "name": "QAExtractor",
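+ "description": "QA pair extractor - converts nested QA_pairs into the Alpaca fine-tuning format\n\nCore functionality:\nExtracts question-answer content from structured QA pair data, automatically integrates reasoning steps and supporting facts,\nand outputs the instruction-input-output format that follows the Stanford Alpaca standard.\n\nInitialization parameters:\n• qa_key: field name of the QA pairs (default: 'QA_pairs')\n• output_json_file: output JSON file path (optional; if not specified, only the DataFrame is updated)\n• instruction: uniform instruction prefix (default: 'Please answer the following question...')\n\nRun parameters (input_key):\n• None - include all fields (question + reasoning_steps + supporting_facts)\n• '' - empty string, no additional context included\n• 'reasoning_steps' - include only the reasoning steps\n• 'question,reasoning_steps' - multiple fields, comma-separated\n• ['question', 'supporting_facts'] - list format\n\nOutput fields:\n• instruction: question instruction\n• input: context information (assembled dynamically according to input_key)\n• output: answer\n\nUse cases: knowledge base QA fine-tuning, domain QA model training",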
+ "type": {
+ "level_1": "knowledge_cleaning",
+ "level_2": "generate"
+ },
+ "allowed_prompts": [],
+ "parameter": {
+ "init": [
+ {
+ "name": "qa_key",
+ "default": "QA_pairs",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "output_json_file",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "instruction",
+ "default": "Please answer the following question based on the provided information.",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ],
+ "run": [
+ {
+ "name": "storage",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "input_key",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "output_key",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ]
+ },
+ "required": "",
+ "depends_on": [],
+ "mode": ""
+ }
+ ],
+ "reasoning": [
+ {
+ "node": 125,
+ "name": "ReasoningAnswerGenerator",
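+ "description": "This operator generates answers to given questions by calling a large language model for reasoning.\nInput parameters:\n- llm_serving: LLM serving instance used to generate answers\n- prompt_template: prompt template object used to build the generation prompt\nOutput parameters:\n- output_key: field for the generated answer, default 'generated_cot'",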
+ "type": {
+ "level_1": "reasoning",
+ "level_2": "generate"
+ },
+ "allowed_prompts": [
+ "MathAnswerGeneratorPrompt",
+ "GeneralAnswerGeneratorPrompt",
+ "DiyAnswerGeneratorPrompt"
+ ],
+ "parameter": {
+ "init": [
+ {
+ "name": "llm_serving",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "prompt_template",
+ "default": "Code snippet\n\n[structure preserved, semantics retained, sensitive information redacted (e.g. phone numbers, confidentiality markers)]",
+ "type": {
+ "level_1": "knowledge_cleaning",
+ "level_2": "generate"
+ },
+ "allowed_prompts": [
+ "KnowledgeCleanerPrompt"
+ ],
+ "parameter": {
+ "init": [
+ {
+ "name": "llm_serving",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "lang",
+ "default": "en",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "prompt_template",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ],
+ "run": [
+ {
+ "name": "storage",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "input_key",
+ "default": "raw_chunk",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "output_key",
+ "default": "cleaned_chunk",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ]
+ },
+ "required": "",
+ "depends_on": [],
+ "mode": ""
+ },
+ {
+ "node": 122,
+ "name": "KBCTextCleanerBatch",
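+ "description": "Knowledge cleaning operator: standardizes raw knowledge content, including HTML tag cleanup, special character normalization, link handling, and structural optimization, to improve the quality of a RAG knowledge base. Main features:\n1. Remove redundant HTML tags while keeping semantic tags\n2. Normalize special characters such as quotation marks and dashes\n3. Process hyperlinks while preserving their text\n4. Preserve the original paragraph structure and code indentation\n5. Ensure factual content is left unmodified",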
+ "type": {
+ "level_1": "knowledge_cleaning",
+ "level_2": "generate"
+ },
+ "allowed_prompts": [
+ "KnowledgeCleanerPrompt"
+ ],
+ "parameter": {
+ "init": [
+ {
+ "name": "llm_serving",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "lang",
+ "default": "en",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "prompt_template",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ],
+ "run": [
+ {
+ "name": "storage",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "input_key",
+ "default": "chunk_path",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "output_key",
+ "default": "cleaned_chunk_path",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ]
+ },
+ "required": "",
+ "depends_on": [],
+ "mode": ""
+ },
+ {
+ "node": 123,
+ "name": "KBCMultiHopQAGeneratorBatch",
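+ "description": "('MultiHopQAGenerator is a multi-hop QA pair generation processor that automatically generates questions and answers requiring multi-step reasoning from text.', 'The pipeline includes text preprocessing, information extraction, question generation, and answer generation, and supports custom language model backends and parameters.', 'The output format is as follows:', 'Input:\\ntext: <original context text>', 'Output:\\n{\\n \"text\": <processed text string>,\\n \"qa_pairs\": [\\n {\\n \"question\": <string: generated question>,\\n \"reasoning_steps\": [\\n {\"step\": <step 1 of the reasoning process>},\\n {\"step\": <step 2>} ...\\n ],\\n \"answer\": <string: final answer>,\\n \"supporting_facts\": [<fact 1 supporting the answer>, <fact 2>, ...],\\n \"type\": <optional: question type, e.g. “biology”, “history”>\\n },\\n ...\\n ],\\n \"metadata\": {\\n \"source\": <data source>,\\n \"timestamp\": <timestamp string>,\\n \"complexity\": <integer: question complexity marker>\\n }\\n}')",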
+ "type": {
+ "level_1": "knowledge_cleaning",
+ "level_2": "generate"
+ },
+ "allowed_prompts": [
+ "Text2MultiHopQAGeneratorPrompt"
+ ],
+ "parameter": {
+ "init": [
+ {
+ "name": "llm_serving",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "seed",
+ "default": 0,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "lang",
+ "default": "en",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "prompt_template",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ],
+ "run": [
+ {
+ "name": "storage",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "input_key",
+ "default": "chunk_path",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "output_key",
+ "default": "enhanced_chunk_path",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ]
+ },
+ "required": "",
+ "depends_on": [],
+ "mode": ""
+ },
+ {
+ "node": 124,
+ "name": "QAExtractor",
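+ "description": "QA pair extractor - converts nested QA_pairs into the Alpaca fine-tuning format\n\nCore functionality:\nExtracts question-answer content from structured QA pair data, automatically integrates reasoning steps and supporting facts,\nand outputs the instruction-input-output format that follows the Stanford Alpaca standard.\n\nInitialization parameters:\n• qa_key: field name of the QA pairs (default: 'QA_pairs')\n• output_json_file: output JSON file path (optional; if not specified, only the DataFrame is updated)\n• instruction: uniform instruction prefix (default: 'Please answer the following question...')\n\nRun parameters (input_key):\n• None - include all fields (question + reasoning_steps + supporting_facts)\n• '' - empty string, no additional context included\n• 'reasoning_steps' - include only the reasoning steps\n• 'question,reasoning_steps' - multiple fields, comma-separated\n• ['question', 'supporting_facts'] - list format\n\nOutput fields:\n• instruction: question instruction\n• input: context information (assembled dynamically according to input_key)\n• output: answer\n\nUse cases: knowledge base QA fine-tuning, domain QA model training",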
+ "type": {
+ "level_1": "knowledge_cleaning",
+ "level_2": "generate"
+ },
+ "allowed_prompts": [],
+ "parameter": {
+ "init": [
+ {
+ "name": "qa_key",
+ "default": "QA_pairs",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "output_json_file",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "instruction",
+ "default": "Please answer the following question based on the provided information.",
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ],
+ "run": [
+ {
+ "name": "storage",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "input_key",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "output_key",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ }
+ ]
+ },
+ "required": "",
+ "depends_on": [],
+ "mode": ""
+ },
+ {
+ "node": 125,
+ "name": "ReasoningAnswerGenerator",
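+ "description": "This operator generates answers to given questions by calling a large language model for reasoning.\nInput parameters:\n- llm_serving: LLM serving instance used to generate answers\n- prompt_template: prompt template object used to build the generation prompt\nOutput parameters:\n- output_key: field for the generated answer, default 'generated_cot'",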
+ "type": {
+ "level_1": "reasoning",
+ "level_2": "generate"
+ },
+ "allowed_prompts": [
+ "MathAnswerGeneratorPrompt",
+ "GeneralAnswerGeneratorPrompt",
+ "DiyAnswerGeneratorPrompt"
+ ],
+ "parameter": {
+ "init": [
+ {
+ "name": "llm_serving",
+ "default": null,
+ "kind": "POSITIONAL_OR_KEYWORD"
+ },
+ {
+ "name": "prompt_template",
+ "default": "Code snippet\n\n[structure preserved, semantics retained, sensitive information redacted (e.g. phone numbers, confidentiality markers)]",
- "type": {
- "level_1": "knowledge_cleaning",
- "level_2": "generate"
- },
- "allowed_prompts": [
- "KnowledgeCleanerPrompt"
- ],
- "parameter": {
- "init": [
- {
- "name": "llm_serving",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "lang",
- "default": "en",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "prompt_template",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ],
- "run": [
- {
- "name": "storage",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "input_key",
- "default": "raw_chunk",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "output_key",
- "default": "cleaned_chunk",
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ]
- },
- "required": "",
- "depends_on": [],
- "mode": ""
- },
- {
- "node": 122,
- "name": "KBCTextCleanerBatch",
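- "description": "Knowledge cleaning operator: standardizes raw knowledge content, including HTML tag cleanup, special character normalization, link handling, and structural optimization, to improve the quality of a RAG knowledge base. Main features:\n1. Remove redundant HTML tags while keeping semantic tags\n2. Normalize special characters such as quotation marks and dashes\n3. Process hyperlinks while preserving their text\n4. Preserve the original paragraph structure and code indentation\n5. Ensure factual content is left unmodified",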
- "type": {
- "level_1": "knowledge_cleaning",
- "level_2": "generate"
- },
- "allowed_prompts": [
- "KnowledgeCleanerPrompt"
- ],
- "parameter": {
- "init": [
- {
- "name": "llm_serving",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "lang",
- "default": "en",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "prompt_template",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ],
- "run": [
- {
- "name": "storage",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "input_key",
- "default": "chunk_path",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "output_key",
- "default": "cleaned_chunk_path",
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ]
- },
- "required": "",
- "depends_on": [],
- "mode": ""
- },
- {
- "node": 123,
- "name": "KBCMultiHopQAGeneratorBatch",
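- "description": "('MultiHopQAGenerator is a multi-hop QA pair generation processor that automatically generates questions and answers requiring multi-step reasoning from text.', 'The pipeline includes text preprocessing, information extraction, question generation, and answer generation, and supports custom language model backends and parameters.', 'The output format is as follows:', 'Input:\\ntext: <original context text>', 'Output:\\n{\\n \"text\": <processed text string>,\\n \"qa_pairs\": [\\n {\\n \"question\": <string: generated question>,\\n \"reasoning_steps\": [\\n {\"step\": <step 1 of the reasoning process>},\\n {\"step\": <step 2>} ...\\n ],\\n \"answer\": <string: final answer>,\\n \"supporting_facts\": [<fact 1 supporting the answer>, <fact 2>, ...],\\n \"type\": <optional: question type, e.g. “biology”, “history”>\\n },\\n ...\\n ],\\n \"metadata\": {\\n \"source\": <data source>,\\n \"timestamp\": <timestamp string>,\\n \"complexity\": <integer: question complexity marker>\\n }\\n}')",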
- "type": {
- "level_1": "knowledge_cleaning",
- "level_2": "generate"
- },
- "allowed_prompts": [
- "Text2MultiHopQAGeneratorPrompt"
- ],
- "parameter": {
- "init": [
- {
- "name": "llm_serving",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "seed",
- "default": 0,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "lang",
- "default": "en",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "prompt_template",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ],
- "run": [
- {
- "name": "storage",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "input_key",
- "default": "chunk_path",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "output_key",
- "default": "enhanced_chunk_path",
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ]
- },
- "required": "",
- "depends_on": [],
- "mode": ""
- },
- {
- "node": 124,
- "name": "QAExtractor",
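- "description": "QA pair extractor - converts nested QA_pairs into the Alpaca fine-tuning format\n\nCore functionality:\nExtracts question-answer content from structured QA pair data, automatically integrates reasoning steps and supporting facts,\nand outputs the instruction-input-output format that follows the Stanford Alpaca standard.\n\nInitialization parameters:\n• qa_key: field name of the QA pairs (default: 'QA_pairs')\n• output_json_file: output JSON file path (optional; if not specified, only the DataFrame is updated)\n• instruction: uniform instruction prefix (default: 'Please answer the following question...')\n\nRun parameters (input_key):\n• None - include all fields (question + reasoning_steps + supporting_facts)\n• '' - empty string, no additional context included\n• 'reasoning_steps' - include only the reasoning steps\n• 'question,reasoning_steps' - multiple fields, comma-separated\n• ['question', 'supporting_facts'] - list format\n\nOutput fields:\n• instruction: question instruction\n• input: context information (assembled dynamically according to input_key)\n• output: answer\n\nUse cases: knowledge base QA fine-tuning, domain QA model training",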
- "type": {
- "level_1": "knowledge_cleaning",
- "level_2": "generate"
- },
- "allowed_prompts": [],
- "parameter": {
- "init": [
- {
- "name": "qa_key",
- "default": "QA_pairs",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "output_json_file",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "instruction",
- "default": "Please answer the following question based on the provided information.",
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ],
- "run": [
- {
- "name": "storage",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "input_key",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "output_key",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ]
- },
- "required": "",
- "depends_on": [],
- "mode": ""
- }
- ],
- "reasoning": [
- {
- "node": 125,
- "name": "ReasoningAnswerGenerator",
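- "description": "This operator generates answers to given questions by calling a large language model for reasoning.\nInput parameters:\n- llm_serving: LLM serving instance used to generate answers\n- prompt_template: prompt template object used to build the generation prompt\nOutput parameters:\n- output_key: field for the generated answer, default 'generated_cot'",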
- "type": {
- "level_1": "reasoning",
- "level_2": "generate"
- },
- "allowed_prompts": [
- "MathAnswerGeneratorPrompt",
- "GeneralAnswerGeneratorPrompt",
- "DiyAnswerGeneratorPrompt"
- ],
- "parameter": {
- "init": [
- {
- "name": "llm_serving",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "prompt_template",
- "default": "Code snippet\n\n[structure preserved, semantics retained, sensitive information redacted (e.g. phone numbers, confidentiality markers)]",
- "type": {
- "level_1": "knowledge_cleaning",
- "level_2": "generate"
- },
- "allowed_prompts": [
- "KnowledgeCleanerPrompt"
- ],
- "parameter": {
- "init": [
- {
- "name": "llm_serving",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "lang",
- "default": "en",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "prompt_template",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ],
- "run": [
- {
- "name": "storage",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "input_key",
- "default": "raw_chunk",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "output_key",
- "default": "cleaned_chunk",
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ]
- },
- "required": "",
- "depends_on": [],
- "mode": ""
- },
- {
- "node": 122,
- "name": "KBCTextCleanerBatch",
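- "description": "Knowledge cleaning operator: standardizes raw knowledge content, including HTML tag cleanup, special character normalization, link handling, and structural optimization, to improve the quality of a RAG knowledge base. Main features:\n1. Remove redundant HTML tags while keeping semantic tags\n2. Normalize special characters such as quotation marks and dashes\n3. Process hyperlinks while preserving their text\n4. Preserve the original paragraph structure and code indentation\n5. Ensure factual content is left unmodified",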
- "type": {
- "level_1": "knowledge_cleaning",
- "level_2": "generate"
- },
- "allowed_prompts": [
- "KnowledgeCleanerPrompt"
- ],
- "parameter": {
- "init": [
- {
- "name": "llm_serving",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "lang",
- "default": "en",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "prompt_template",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ],
- "run": [
- {
- "name": "storage",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "input_key",
- "default": "chunk_path",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "output_key",
- "default": "cleaned_chunk_path",
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ]
- },
- "required": "",
- "depends_on": [],
- "mode": ""
- },
- {
- "node": 123,
- "name": "KBCMultiHopQAGeneratorBatch",
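- "description": "('MultiHopQAGenerator is a multi-hop QA pair generation processor that automatically generates questions and answers requiring multi-step reasoning from text.', 'The pipeline includes text preprocessing, information extraction, question generation, and answer generation, and supports custom language model backends and parameters.', 'The output format is as follows:', 'Input:\\ntext: <original context text>', 'Output:\\n{\\n \"text\": <processed text string>,\\n \"qa_pairs\": [\\n {\\n \"question\": <string: generated question>,\\n \"reasoning_steps\": [\\n {\"step\": <step 1 of the reasoning process>},\\n {\"step\": <step 2>} ...\\n ],\\n \"answer\": <string: final answer>,\\n \"supporting_facts\": [<fact 1 supporting the answer>, <fact 2>, ...],\\n \"type\": <optional: question type, e.g. “biology”, “history”>\\n },\\n ...\\n ],\\n \"metadata\": {\\n \"source\": <data source>,\\n \"timestamp\": <timestamp string>,\\n \"complexity\": <integer: question complexity marker>\\n }\\n}')",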
- "type": {
- "level_1": "knowledge_cleaning",
- "level_2": "generate"
- },
- "allowed_prompts": [
- "Text2MultiHopQAGeneratorPrompt"
- ],
- "parameter": {
- "init": [
- {
- "name": "llm_serving",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "seed",
- "default": 0,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "lang",
- "default": "en",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "prompt_template",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ],
- "run": [
- {
- "name": "storage",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "input_key",
- "default": "chunk_path",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "output_key",
- "default": "enhanced_chunk_path",
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ]
- },
- "required": "",
- "depends_on": [],
- "mode": ""
- },
- {
- "node": 124,
- "name": "QAExtractor",
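- "description": "QA pair extractor - converts nested QA_pairs into the Alpaca fine-tuning format\n\nCore functionality:\nExtracts question-answer content from structured QA pair data, automatically integrates reasoning steps and supporting facts,\nand outputs the instruction-input-output format that follows the Stanford Alpaca standard.\n\nInitialization parameters:\n• qa_key: field name of the QA pairs (default: 'QA_pairs')\n• output_json_file: output JSON file path (optional; if not specified, only the DataFrame is updated)\n• instruction: uniform instruction prefix (default: 'Please answer the following question...')\n\nRun parameters (input_key):\n• None - include all fields (question + reasoning_steps + supporting_facts)\n• '' - empty string, no additional context included\n• 'reasoning_steps' - include only the reasoning steps\n• 'question,reasoning_steps' - multiple fields, comma-separated\n• ['question', 'supporting_facts'] - list format\n\nOutput fields:\n• instruction: question instruction\n• input: context information (assembled dynamically according to input_key)\n• output: answer\n\nUse cases: knowledge base QA fine-tuning, domain QA model training",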
- "type": {
- "level_1": "knowledge_cleaning",
- "level_2": "generate"
- },
- "allowed_prompts": [],
- "parameter": {
- "init": [
- {
- "name": "qa_key",
- "default": "QA_pairs",
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "output_json_file",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "instruction",
- "default": "Please answer the following question based on the provided information.",
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ],
- "run": [
- {
- "name": "storage",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "input_key",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "output_key",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- }
- ]
- },
- "required": "",
- "depends_on": [],
- "mode": ""
- },
- {
- "node": 125,
- "name": "ReasoningAnswerGenerator",
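- "description": "This operator generates answers to given questions by calling a large language model for reasoning.\nInput parameters:\n- llm_serving: LLM serving instance used to generate answers\n- prompt_template: prompt template object used to build the generation prompt\nOutput parameters:\n- output_key: field for the generated answer, default 'generated_cot'",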
- "type": {
- "level_1": "reasoning",
- "level_2": "generate"
- },
- "allowed_prompts": [
- "MathAnswerGeneratorPrompt",
- "GeneralAnswerGeneratorPrompt",
- "DiyAnswerGeneratorPrompt"
- ],
- "parameter": {
- "init": [
- {
- "name": "llm_serving",
- "default": null,
- "kind": "POSITIONAL_OR_KEYWORD"
- },
- {
- "name": "prompt_template",
- "default": "