# LLM as a Judge: Comparing Financial Analysis Reports

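**Purpose:** Use a large language model (LLM) as an expert judge to compare the strengths and weaknesses of two (or three) financial reports generated by different models. The inputs are the report files to compare (Markdown) and an optional reference file. The output is the LLM's detailed comparative evaluation.

**Usage:**
1.  **Set the OpenAI API key:** Make sure the `OPENAI_API_KEY` environment variable is set, or assign the key directly to `API_KEY` in the first code cell (set `BASE_URL` as well if you use an OpenAI-compatible endpoint).
2.  **Configure parameters:** In the configuration cell (section 2), set the report file paths (`report_paths`), the optional reference file path (`reference_path`), the LLM model (`model_name`), the temperature (`temperature`), the output-token limit (`max_tokens`), and the output directory (`save_dir`).
3.  **Run the cells:** Execute all cells in order. The evaluation is shown in the output of the last code cell and is also saved as a Markdown file under `save_dir`.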
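Step 1 depends on how the client receives its settings. A minimal sketch, assuming the openai v1.x Python SDK: to our understanding its client falls back to the `OPENAI_API_KEY` and `OPENAI_BASE_URL` environment variables (and then to the official endpoint) only when the corresponding argument is `None`, so an empty string such as `BASE_URL = ""` is passed through unchanged instead of being replaced by a default. The helper `resolve_api_settings` is hypothetical and not part of the notebook.

```python
from typing import Mapping, Optional, Tuple


def resolve_api_settings(env: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: return (api_key, base_url), mapping unset or empty values to None.

    Handing None (rather than "") to the client lets the SDK apply its own defaults.
    """
    api_key = env.get("OPENAI_API_KEY") or None
    base_url = env.get("OPENAI_BASE_URL") or None
    return api_key, base_url
```

In the notebook this would be called as `API_KEY, BASE_URL = resolve_api_settings(os.environ)` before `openai.Client(api_key=API_KEY, base_url=BASE_URL)`.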


## 1. Imports and API Setup

In [None]:
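import os
import datetime
from typing import List, Dict, Optional

import openai


# Read credentials from the environment by default. None lets the OpenAI SDK
# apply its own defaults; an empty string would be passed through unchanged.
API_KEY: Optional[str] = os.environ.get("OPENAI_API_KEY") or None    # or paste a key here
BASE_URL: Optional[str] = os.environ.get("OPENAI_BASE_URL") or None  # None uses the default endpoint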

## 2. Configure Parameters
Modify the paths of the files to compare and the model settings here.

In [5]:
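# === Input file paths ===
# Report files to compare (at least 2, at most 3)
report_paths: List[str] = [
    "/home/dasoumao/NLP/test/600519SH_20250418_223225/report_600519SH_20250418_223225.md",
    "/home/dasoumao/NLP/test/600519SH_20250418_225415/report_600519SH_20250418_225415.md",
    # "report3.md"  # uncomment and set the path if you have a third report
]

# Path to the reference file (optional; None compares the reports on internal quality only)
# reference_path: Optional[str] = "reference.md"
reference_path: Optional[str] = None

# === OpenAI model settings ===
# Model to use (e.g. "gpt-4", "gpt-4-turbo", "gpt-3.5-turbo")
model_name: str = "gpt-4o"

# Temperature (0.0 gives more deterministic output; higher values give more varied, creative output)
temperature: float = 0.0

# Maximum number of tokens in the API response (must not exceed the model's output-token limit)
max_tokens: int = 10000

# --- Base directory for saving evaluation results ---
save_dir = "/home/dasoumao/NLP/test_judge"  # change to any directory you prefer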
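A wrong path or out-of-range setting only surfaces once the evaluation cell runs. A minimal sketch of a pre-flight check, using a hypothetical `validate_config` helper that mirrors the notebook's own 2-to-3-report rule; the temperature bound of 0 to 2 is the range documented for OpenAI's Chat Completions API and is an assumption for other OpenAI-compatible backends.

```python
import os
from typing import List, Optional


def validate_config(report_paths: List[str], reference_path: Optional[str],
                    temperature: float) -> List[str]:
    """Hypothetical helper: return a list of problems; an empty list means the settings look usable."""
    problems: List[str] = []
    # Same rule as evaluate_reports_comparison: compare 2 or 3 reports.
    if not (2 <= len(report_paths) <= 3):
        problems.append(f"expected 2 or 3 report paths, got {len(report_paths)}")
    for path in report_paths:
        if not os.path.isfile(path):
            problems.append(f"report file not found: {path}")
    # The reference file is optional; only check it when one is configured.
    if reference_path is not None and not os.path.isfile(reference_path):
        problems.append(f"reference file not found: {reference_path}")
    # Documented Chat Completions range for temperature.
    if not (0.0 <= temperature <= 2.0):
        problems.append(f"temperature outside [0, 2]: {temperature}")
    return problems
```

For example, `print(validate_config(report_paths, reference_path, temperature))` at the end of this cell lists anything that needs fixing before the API is called.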

## 3. Helper Functions

In [6]:
def load_file(path: str) -> str:
    """Read a file's contents (Markdown or plain text)."""
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return f.read()
    except FileNotFoundError:
        print(f"Error: file not found '{path}'")
        raise
    except Exception as e:
        print(f"Error while reading file '{path}': {e}")
        raise


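def construct_messages_for_comparison_simple(reports: List[str], reference_content: Optional[str]) -> List[Dict[str, str]]:
    """
    Earlier, simpler prompt builder: compares the reports as a whole and asks for 0-10 scores.
    Renamed because the module-based construct_messages_for_comparison defined below would
    otherwise silently replace it; the module-based version is the one the evaluation uses.

    Builds the Chat Completion `messages` list, with different system and user prompts
    depending on whether reference_content is provided.
    """
    num_reports = len(reports)
    report_mentions = " and ".join([f"Report {i+1}" for i in range(num_reports)])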

    if reference_content:
        # --- Prompt WITH Reference ---
        system_prompt = f"""
You are a highly experienced and objective senior financial analyst. Your primary task is to **critically compare** {num_reports} financial analysis reports provided below, evaluating them **strictly based on their fidelity and relevance to the accompanying reference material**.

**Your Goal:** Determine which report offers a more insightful, accurate, well-structured, and relevant analysis *relative to the others and the provided reference*. Focus on a **direct comparison** using the reference as the ground truth.

**Evaluation Aspects for Comparison (against Reference):**
When comparing {report_mentions}, critically assess their relative performance on:
1.  **Accuracy & Fidelity:** How faithfully and accurately does each report represent the facts, figures, and nuances present *in the reference material*? Which report demonstrates superior accuracy and avoids misinterpretations *of the reference*?
2.  **Analytical Depth & Insight (from Reference):** Compare the depth of analysis *derived from the reference*. Which report goes beyond surface-level summarization of the reference to offer sharper insights, identify key trends *mentioned or implied in the reference*, or draw more meaningful conclusions *supported by the reference*?
3.  **Clarity & Organization:** Evaluate the structure, readability, and conciseness of each report. Which one is better organized and easier for a financial professional to understand? (General criterion)
4.  **Relevance & Focus (to Reference):** How effectively does each report utilize the reference material to address the implied analytical task? Which report is more focused on the reference content and avoids irrelevant information or tangents?

**Output Structure:**
Your response must follow this structure strictly:
1.  **Overall Comparative Assessment:** Begin with a concise paragraph stating which report you judge to be superior overall *based on its handling of the reference material* and briefly outline the core reasons for your judgment.
2.  **Detailed Side-by-Side Comparison:**
    *   **Accuracy (vs. Reference):** Compare the reports on accuracy *against the reference*. State which is better and provide specific examples referencing the reports and the reference material.
    *   **Depth & Insight (from Reference):** Compare the reports on analytical depth *derived from the reference*. State which is better and justify with examples linked to the reference.
    *   **Clarity & Organization:** Compare the reports on clarity. State which is better and justify.
    *   **Relevance & Focus (to Reference):** Compare the reports on relevance *to the reference*. State which is better and justify.
3.  **Strengths & Weaknesses Summary (relative to Reference):** Provide bullet points summarizing the key comparative strengths and weaknesses of each report *in relation to the reference*.
    *   Report 1: Strengths - [...], Weaknesses - [...]
    *   Report 2: Strengths - [...], Weaknesses - [...]
    *   (If applicable) Report 3: Strengths - [...], Weaknesses - [...]
4.  **Relative Scoring (reflecting Reference usage):** *After* the detailed comparison, provide relative scores (0–10) reflecting your comparative judgment *based on the reference*. Explain briefly how the scores reflect the identified differences.
    (Score format example follows)

**Important:** Your analysis must be **grounded in the provided reference material**. Be specific and provide evidence for your claims. Avoid generic statements or using external knowledge. Your value lies in the **comparative judgment based on the reference**.
"""
        user_content = "**Reference Material:**\n```markdown\n" + reference_content + "\n```\n\n"
        for i, rpt in enumerate(reports, start=1):
            user_content += f"**Report {i}:**\n```markdown\n{rpt}\n```\n\n"
        user_content += f"Please perform the detailed comparative evaluation of {report_mentions} **based strictly on the provided reference material** and the instructions above."

    else:
        # --- Prompt WITHOUT Reference ---
        system_prompt = f"""
You are a highly experienced and objective senior financial analyst. **No reference material is provided.** Your primary task is to **critically compare** the {num_reports} financial analysis reports below based on their internal quality, structure, and analytical rigor.

**Your Goal:** Determine which report presents a more insightful, well-reasoned, clear, and potentially plausible financial analysis *relative to the others*. Focus on a **direct comparison** of the reports themselves.

**Evaluation Aspects for Comparison (Internal Quality):**
When comparing {report_mentions}, critically assess their relative performance on:
1.  **Internal Consistency & Logical Flow:** Does the analysis within each report follow logically? Are the arguments and conclusions well-supported by the data or statements *presented within that report*? Which report demonstrates superior internal consistency?
2.  **Analytical Depth & Insight:** Compare the depth and sophistication of the analysis presented. Which report goes beyond surface-level descriptions to offer sharper insights, identify potentially significant patterns or trends (based on its own content), or draw more meaningful conclusions?
3.  **Clarity & Organization:** Evaluate the structure, readability, and conciseness of each report. Which one is better organized and easier for a financial professional to understand?
4.  **Plausibility & Professionalism:** Based on general financial principles (without assuming specific external facts), which report seems more plausible, professional, and well-structured for a typical financial analysis task? Does one report make clearly unrealistic or unsupported claims?

**Output Structure:**
Your response must follow this structure strictly:
1.  **Overall Comparative Assessment:** Begin with a concise paragraph stating which report you judge to be superior overall *based on its internal quality and analytical presentation* and briefly outline the core reasons for your judgment.
2.  **Detailed Side-by-Side Comparison:**
    *   **Consistency & Logic:** Compare the reports on internal consistency and logical flow. State which is better and provide specific examples from the reports.
    *   **Depth & Insight:** Compare the reports on analytical depth. State which is better and justify with examples from the reports.
    *   **Clarity & Organization:** Compare the reports on clarity. State which is better and justify.
    *   **Plausibility & Professionalism:** Compare the reports on plausibility and overall professionalism. State which is better and justify.
3.  **Strengths & Weaknesses Summary (Internal):** Provide bullet points summarizing the key comparative strengths and weaknesses of each report based on their internal characteristics.
    *   Report 1: Strengths - [...], Weaknesses - [...]
    *   Report 2: Strengths - [...], Weaknesses - [...]
    *   (If applicable) Report 3: Strengths - [...], Weaknesses - [...]
4.  **Relative Scoring (reflecting Internal Quality):** *After* the detailed comparison, provide relative scores (0–10) reflecting your comparative judgment *based on internal quality*. Explain briefly how the scores reflect the identified differences.
    (Score format example follows)

**Important:** Your analysis must be based **solely on the content of the reports provided**. Compare them against each other using principles of good financial analysis, logic, and clarity. Do not assume external knowledge or data. Your value lies in the **comparative judgment of the reports themselves**.
"""
        user_content = "" # Start fresh, no reference section
        for i, rpt in enumerate(reports, start=1):
            user_content += f"**Report {i}:**\n```markdown\n{rpt}\n```\n\n"
        user_content += f"Please perform the detailed comparative evaluation of {report_mentions} **based on their internal quality and the instructions above, as no reference material was provided.**"

    # Common part for score format example (added to both system prompts)
    score_format_example = """
    Example Score Format:
    Report 1:
    - [Criteria 1 Name]: X/10
    - [Criteria 2 Name]: X/10
    - [Criteria 3 Name]: X/10
    - [Criteria 4 Name]: X/10
    - Overall: X/10 (Reflects its standing relative to the other reports based on the specified evaluation context)

    Report 2:
    - [Criteria 1 Name]: Y/10
    ...
    - Overall: Y/10

    (If applicable) Report 3:
    ...
    """
    # Append the score format example to the system prompt correctly
    system_prompt = system_prompt.replace("(Score format example follows)", score_format_example)


    return [
        {"role": "system", "content": system_prompt.strip()},
        {"role": "user",   "content": user_content.strip()},
    ]


def construct_messages_for_comparison(reports: List[str], reference_content: Optional[str]) -> List[Dict[str, str]]:
    """
    Generates the system and user prompts for evaluating financial reports.

    Args:
        reports (list): A list of strings, where each string is a financial report.
        reference_content (str, optional): The reference material. Defaults to None.

    Returns:
        list: A list containing the system and user message dictionaries.
    """
    num_reports = len(reports)
    report_mentions = " and ".join([f"Report {i+1}" for i in range(num_reports)])
    if num_reports > 2:
        report_mentions = ", ".join([f"Report {i+1}" for i in range(num_reports-1)]) + f", and Report {num_reports}"

    # --- Define Report Modules (Based on Generation Prompt) ---
    report_modules = [
        "Market review",
        "Risk characteristics",
        "Technical indicators/signals",
        "Recent News Analysis",
        "Additional summary recommendations"
    ]

    # --- Define Scoring Rubric (Inspired by Reference Materials) ---
    # Using a 1-5 scale as suggested by Prometheus example and good practice
    score_rubric = """
**Scoring Rubric (1-5 Scale):**
*   **5 (Excellent):** Significantly exceeds expectations for this module compared to the other report(s). Demonstrates strong adherence to reference (if provided), exceptional insight, clarity, and relevance/completeness for the module's goal.
*   **4 (Good):** Meets expectations well and performs better than the other report(s) on this module. Shows good accuracy, depth, clarity, and relevance.
*   **3 (Average/Adequate):** Meets basic expectations for the module, potentially on par with or slightly better/worse than another report. May have minor shortcomings in accuracy, depth, clarity, or relevance. (Use this as a baseline).
*   **2 (Below Average):** Has noticeable deficiencies in this module compared to the other report(s). Lacks sufficient accuracy, depth, clarity, or relevance.
*   **1 (Poor):** Significantly fails to meet the requirements for this module compared to the other report(s). Major issues with accuracy, depth, clarity, or relevance.
"""

    if reference_content:
        # --- Prompt WITH Reference ---
        system_prompt = f"""
You are a highly experienced, meticulous, and objective senior financial analyst. Your task is to **critically compare** {num_reports} financial analysis reports provided below. Your evaluation must be **strictly based on their fidelity, relevance, and depth derived *from the accompanying reference material***, comparing them against each other module by module.

**Your Goal:** Determine which report offers a more insightful, accurate, well-structured, and relevant analysis *relative to the others and the provided reference material* for each required section of a professional financial report. Focus on a **direct, comparative judgment** using the reference as the ground truth.

**Report Modules (compare the reports on each):** {'; '.join(report_modules)}

**Evaluation Aspects Per Module (Against Reference):**
When comparing {report_mentions} for each module above, critically assess their relative performance based *strictly on the reference material*:
1.  **Accuracy & Fidelity (vs. Reference):** How faithfully does each report represent facts/figures *from the reference* for this module? Which is superior in avoiding misinterpretations *of the reference*?
2.  **Analytical Depth & Insight (from Reference):** Compare the depth of analysis *derived directly from the reference* for this module. Which report provides sharper insights or more meaningful conclusions *supported explicitly or implicitly by the reference*?
3.  **Relevance & Focus (to Reference):** How effectively does each report use the reference material *relevant to this module*? Which is more focused and avoids irrelevant content?
4.  **Clarity & Organization (within Module):** Which report presents the information for this module more clearly and logically?

**Chain-of-Thought (CoT) Instruction:** Before giving a final verdict or score for each module and overall, you MUST first articulate your detailed comparative reasoning process.

**Output Structure (Strictly Follow):**

1.  **Overall Comparative Assessment:**
    *   **Overall Reasoning (CoT):** Briefly explain your thought process comparing the reports holistically based on the reference material.
    *   **Overall Winner:** State which report you judge superior overall *based on its handling of the reference material* and why.

2.  **Detailed Modular Comparison (Perform for EACH module listed above):**
    *   **Module Name:** (e.g., "Market review")
    *   **Comparative Rationale (CoT):** Provide a detailed side-by-side analysis comparing how each report handled this specific module *based on the reference material* and the evaluation aspects. Discuss specific examples.
    *   **Module Winner:** State which report performed better *for this specific module* relative to the others and the reference.
    *   **Justification:** Briefly explain the primary reason for the module winner decision, linking back to the rationale.

3.  **Strengths & Weaknesses Summary (Relative to Reference):**
    *   Report 1: Strengths - [...], Weaknesses - [...] (Focus on comparative performance using reference)
    *   Report 2: Strengths - [...], Weaknesses - [...]
    *   (If applicable) Report 3: Strengths - [...], Weaknesses - [...]

4.  **Detailed Scoring (Reflecting Reference Usage):**
    *   {score_rubric}
    *   **Score Justification Logic:** Scores must reflect the *relative comparison* discussed in the modular analysis. A higher score indicates better performance *compared to the other report(s)* on that module, strictly judged against the reference. Explain *briefly* how scores reflect the comparison.
    *   **Scores:**
        *   Report 1:
            *   Market review: [Score 1-5] - Justification: [...]
            *   Risk characteristics: [Score 1-5] - Justification: [...]
            *   Technical indicators/signals: [Score 1-5] - Justification: [...]
            *   Recent News Analysis: [Score 1-5] - Justification: [...]
            *   Additional summary recommendations: [Score 1-5] - Justification: [...]
            *   **Overall Comparative Score:** [Score 1-5] - Justification: [Reflects overall standing relative to others based on reference usage]
        *   Report 2:
            *   ... (Repeat structure for all modules + Overall) ...
        *   (If applicable) Report 3:
            *   ... (Repeat structure for all modules + Overall) ...

**Important:** Ground your entire analysis **exclusively in the provided reference material**. Be specific, cite examples from the reports and reference. Avoid external knowledge. Your value is the **comparative judgment based on the reference**.
"""
        user_content = "**Reference Material:**\n```markdown\n" + reference_content + "\n```\n\n"
        for i, rpt in enumerate(reports, start=1):
            user_content += f"**Report {i}:**\n```markdown\n{rpt}\n```\n\n"
        user_content += f"Please perform the detailed comparative evaluation of {report_mentions} **based strictly on the provided reference material**, following all instructions including the module-by-module comparison, CoT reasoning, and scoring rubric."

    else:
        # --- Prompt WITHOUT Reference ---
        system_prompt = f"""
You are a highly experienced, meticulous, and objective senior financial analyst. **No reference material is provided.** Your task is to **critically compare** the {num_reports} financial analysis reports below based on their internal quality, structure, analytical rigor, and professional presentation, comparing them against each other module by module.

**Your Goal:** Determine which report presents a more insightful, well-reasoned, clear, and plausible financial analysis *relative to the others* for each required section of a professional financial report. Focus on a **direct, comparative judgment** based on the reports' own content.

**Report Modules (compare the reports on each):** {'; '.join(report_modules)}

**Evaluation Aspects Per Module (Internal Quality):**
When comparing {report_mentions} for each module above, critically assess their relative performance based *solely on the content within the reports*:
1.  **Internal Consistency & Logical Flow:** Does the analysis within this module follow logically? Are arguments supported by data/statements *within the report*? Which report shows superior internal consistency for this module?
2.  **Analytical Depth & Insight:** Compare the depth of analysis presented *within this module*. Which report offers sharper insights, identifies more significant patterns (based on its own content), or draws more meaningful conclusions for this module?
3.  **Plausibility & Professionalism:** Based on general financial principles (without external facts), which report seems more plausible and professional *in its presentation and claims* for this module? Does one make clearly unrealistic or unsupported claims here?
4.  **Clarity & Organization (within Module):** Which report presents the information for this module more clearly and logically?

**Chain-of-Thought (CoT) Instruction:** Before giving a final verdict or score for each module and overall, you MUST first articulate your detailed comparative reasoning process.

**Output Structure (Strictly Follow):**

1.  **Overall Comparative Assessment:**
    *   **Overall Reasoning (CoT):** Briefly explain your thought process comparing the reports holistically based on internal quality and structure.
    *   **Overall Winner:** State which report you judge superior overall *based on its internal quality and analytical presentation* and why.

2.  **Detailed Modular Comparison (Perform for EACH module listed above):**
    *   **Module Name:** (e.g., "Market review")
    *   **Comparative Rationale (CoT):** Provide a detailed side-by-side analysis comparing how each report handled this specific module *based on internal quality* and the evaluation aspects. Discuss specific examples from the reports themselves.
    *   **Module Winner:** State which report performed better *for this specific module* relative to the others based on internal quality.
    *   **Justification:** Briefly explain the primary reason for the module winner decision, linking back to the rationale.

3.  **Strengths & Weaknesses Summary (Internal):**
    *   Report 1: Strengths - [...], Weaknesses - [...] (Focus on comparative internal quality)
    *   Report 2: Strengths - [...], Weaknesses - [...]
    *   (If applicable) Report 3: Strengths - [...], Weaknesses - [...]

4.  **Detailed Scoring (Reflecting Internal Quality):**
    *   {score_rubric}
    *   **Score Justification Logic:** Scores must reflect the *relative comparison* discussed in the modular analysis. A higher score indicates better performance *compared to the other report(s)* on that module, judged on internal quality. Explain *briefly* how scores reflect the comparison.
    *   **Scores:**
        *   Report 1:
            *   Market review: [Score 1-5] - Justification: [...]
            *   Risk characteristics: [Score 1-5] - Justification: [...]
            *   Technical indicators/signals: [Score 1-5] - Justification: [...]
            *   Recent News Analysis: [Score 1-5] - Justification: [...]
            *   Additional summary recommendations: [Score 1-5] - Justification: [...]
            *   **Overall Comparative Score:** [Score 1-5] - Justification: [Reflects overall standing relative to others based on internal quality]
        *   Report 2:
            *   ... (Repeat structure for all modules + Overall) ...
        *   (If applicable) Report 3:
            *   ... (Repeat structure for all modules + Overall) ...

**Important:** Base your analysis **solely on the content of the reports provided**. Compare them against each other using principles of good financial analysis, logic, and clarity. Do not assume external knowledge. Your value is the **comparative judgment of the reports themselves**.
"""
        user_content = "" # Start fresh, no reference section
        for i, rpt in enumerate(reports, start=1):
            user_content += f"**Report {i}:**\n```markdown\n{rpt}\n```\n\n"
        user_content += f"Please perform the detailed comparative evaluation of {report_mentions} **based solely on their internal quality**, following all instructions including the module-by-module comparison, CoT reasoning, and scoring rubric, as no reference material was provided."

    # --- Construct the final message list ---
    return [
        {"role": "system", "content": system_prompt.strip()},
        {"role": "user",   "content": user_content.strip()},
    ]


def evaluate_reports_comparison(report_paths: List[str], reference_path: Optional[str],
                                model: str, temperature: float, max_tokens: int) -> str:
    """
    Read the files (the reference is optional), build the comparison prompt,
    call the OpenAI API, and return the LLM's evaluation text.
    """
    if not (2 <= len(report_paths) <= 3):
        raise ValueError("Please provide 2 or 3 report file paths to compare.")

    reference_content: Optional[str] = None  # stays None when no reference is configured
    try:
        # Load the report contents
        print("Loading report files...")
        reports_content = [load_file(p) for p in report_paths]
        print("Report files loaded.")

        # Load the reference file only if one was configured
        if reference_path:
            print(f"Loading reference file: {reference_path}...")
            reference_content = load_file(reference_path)
            print("Reference file loaded.")
        else:
            print("No reference file provided; comparing the reports on internal quality.")

        # Build the chat messages (the function handles reference_content being None)
        print("Building the prompt...")
        messages = construct_messages_for_comparison(reports_content, reference_content)
        # print("\n--- DEBUG: System Prompt ---")
        # print(messages[0]['content'])
        # print("\n--- DEBUG: User Prompt Snippet ---")
        # print(messages[1]['content'][:500] + "...")  # print part of the user prompt for inspection
        # print("-----------------------------\n")



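        # Call the Chat Completions endpoint
        print(f"Calling the OpenAI API (model: {model}, temperature: {temperature})...")

        client = openai.Client(api_key=API_KEY, base_url=BASE_URL)

        response = client.chat.completions.create(  # openai SDK v1.x+
            model=model,
            messages=messages,
            temperature=temperature,
            max_tokens=max_tokens,
            # top_p=1.0,  # other sampling parameters can be added if needed
            # frequency_penalty=0.0,
            # presence_penalty=0.0
        )
        print("API call completed.")
        choice = response.choices[0]
        if choice.finish_reason == "length":
            print("Warning: the response hit max_tokens and may be truncated.")
        # The SDK types message.content as optional, so guard before calling strip()
        return (choice.message.content or "").strip()
    except FileNotFoundError:
        # load_file has already printed the error; re-raise it unchanged
        raise
    except Exception as e:
        print(f"\nUnexpected error while running the evaluation: {e}")
        raise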
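The module-based prompt only pays off if every module name reaches the judge and comes back in its answer. In the sample run below, the judge labelled the last module "Summary & Outlook" instead of "Additional summary recommendations", which makes automatic score extraction harder. A minimal sketch of a coverage check, assuming the module names from `report_modules` above; the helper `missing_modules` is hypothetical and uses plain case-insensitive substring matching, so a name that appears anywhere in the text counts as covered.

```python
from typing import List

# Module names copied from report_modules in construct_messages_for_comparison
MODULES = [
    "Market review",
    "Risk characteristics",
    "Technical indicators/signals",
    "Recent News Analysis",
    "Additional summary recommendations",
]


def missing_modules(text: str, modules: List[str]) -> List[str]:
    """Return the module names that never occur in `text` (case-insensitive substring match)."""
    lowered = text.lower()
    return [name for name in modules if name.lower() not in lowered]
```

After section 4 has run, `missing_modules(llm_evaluation_result, MODULES)` lists the modules the judge skipped or renamed; the same call on `messages[0]["content"]` checks the system prompt before any tokens are spent.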

## 4. Run the Evaluation

In [7]:
print("Starting the LLM evaluation...")

try:
    # Run the evaluation
    llm_evaluation_result = evaluate_reports_comparison(
        report_paths=report_paths,
        reference_path=reference_path,
        model=model_name,
        temperature=temperature,
        max_tokens=max_tokens
    )

    # Display the result
    print("\n" + "="*20 + " LLM Evaluation Result " + "="*20)
    # Render as Markdown, which usually displays better in Jupyter
    from IPython.display import display, Markdown
    display(Markdown(llm_evaluation_result))
    print("="*63)

    # --- Save the evaluation result ---
    try:
        # 1. Make sure the output directory exists
        os.makedirs(save_dir, exist_ok=True)

        # 2. Build the file name
        # Base name of each report file (without directory or extension)
        report_basenames = [os.path.splitext(os.path.basename(p))[0] for p in report_paths]
        # Join the report names with hyphens (consider truncating or hashing if this gets too long)
        reports_part = "-".join(report_basenames)
        # Current timestamp
        timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
        # Combine into the final file name (saved as Markdown)
        filename = f"{reports_part}_{timestamp}_evaluation.md"

        # 3. Build the full save path
        save_path = os.path.join(save_dir, filename)

        # 4. Write the file (UTF-8)
        with open(save_path, 'w', encoding='utf-8') as f:
            f.write(llm_evaluation_result)

        print(f"\n[+] Evaluation result saved to: {save_path}")

    except Exception as save_e:
        # A failure while saving is reported but does not abort the run
        print(f"\n[!] Error while saving the evaluation result: {save_e}")
    # --- End: save the evaluation result ---

except FileNotFoundError:
    print("\nEvaluation failed: one or more input files were not found. Check the file paths in the configuration cell.")
except ValueError as ve:
    print(f"\nEvaluation failed: {ve}")
except Exception:
    # Details have already been printed inside evaluate_reports_comparison
    print("\nEvaluation aborted due to an unexpected error.")

print("\nLLM evaluation finished.")

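Starting the LLM evaluation...
Loading report files...
Report files loaded.
No reference file provided; comparing the reports on internal quality.
Building the prompt...
Calling the OpenAI API (model: gpt-4o, temperature: 0.0)...
API call completed.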



1. **Overall Comparative Assessment:**
   * **Overall Reasoning (CoT):** Both reports provide a structured analysis of the stock 600519.SH, covering market review, risk characteristics, technical indicators, recent news, and summary outlook. Report 1 and Report 2 are similar in content, but differ slightly in presentation and depth. Report 1 offers a more concise analysis, while Report 2 provides additional details, particularly in the news analysis section. Both reports maintain internal consistency and logical flow, but Report 2's additional insights in the news analysis section give it a slight edge in analytical depth.
   * **Overall Winner:** Report 2 is judged superior overall based on its internal quality and analytical presentation. It provides a more detailed analysis in the news section, offering additional insights that enhance the understanding of the stock's situation.

2. **Detailed Modular Comparison:**

   * **Module Name:** Market Review
     * **Comparative Rationale (CoT):** Both reports present the market review with similar data points, including the 1-year, 3-month, and 1-month price changes, as well as the latest closing price. The presentation is clear and consistent in both reports, with no significant differences in the information provided.
     * **Module Winner:** Tie
     * **Justification:** Both reports offer a clear and consistent market review, with no discernible difference in quality or depth.

   * **Module Name:** Risk Characteristics
     * **Comparative Rationale (CoT):** Both reports describe the stock's risk profile using the same metrics: 1-year annualized daily return volatility and historical maximum drawdown. The presentation is straightforward and consistent in both reports, with no additional insights or analysis provided.
     * **Module Winner:** Tie
     * **Justification:** Both reports present the risk characteristics with equal clarity and consistency, using the same metrics.

   * **Module Name:** Technical Indicators/Signals
     * **Comparative Rationale (CoT):** Both reports note the absence of significant moving average crossover signals, indicating no strong technical indicators for trend changes. The analysis is brief and similar in both reports, with no additional insights or depth.
     * **Module Winner:** Tie
     * **Justification:** Both reports provide a similar level of analysis regarding technical indicators, with no significant differences in content or presentation.

   * **Module Name:** Recent News Analysis
     * **Comparative Rationale (CoT):** Report 2 provides a more detailed analysis of the recent news, including key points and potential impacts. It discusses Moutai's brand value decline and its implications for investor sentiment, while also highlighting Moutai's continued dominance as a top brand. Report 1 offers a concise summary but lacks the depth and additional insights found in Report 2.
     * **Module Winner:** Report 2
     * **Justification:** Report 2 offers a more comprehensive analysis of the recent news, providing additional insights into the potential impact on investor sentiment and stock performance.

   * **Module Name:** Summary & Outlook
     * **Comparative Rationale (CoT):** Both reports summarize the stock's performance and outlook similarly, noting mixed price performance, high volatility, and the impact of brand value changes. Report 2 includes a slightly more detailed discussion of Moutai's brand strength as a mitigating factor, which adds depth to the analysis.
     * **Module Winner:** Report 2
     * **Justification:** Report 2 provides a slightly more detailed summary and outlook, incorporating additional insights into Moutai's brand strength and its potential impact on investor confidence.

3. **Strengths & Weaknesses Summary (Internal):**
   * Report 1: Strengths - Concise presentation, clear structure; Weaknesses - Lacks depth in news analysis.
   * Report 2: Strengths - Detailed news analysis, comprehensive summary; Weaknesses - Slightly verbose in some sections.

4. **Detailed Scoring (Reflecting Internal Quality):**

   * **Scores:**
     * Report 1:
       * Market review: 3 - Justification: Clear and consistent presentation, similar to Report 2.
       * Risk characteristics: 3 - Justification: Equal clarity and consistency with Report 2.
       * Technical indicators/signals: 3 - Justification: Similar level of analysis as Report 2.
       * Recent News Analysis: 2 - Justification: Lacks depth compared to Report 2.
       * Summary & Outlook: 3 - Justification: Clear summary, slightly less detailed than Report 2.
       * **Overall Comparative Score:** 3 - Justification: Adequate overall, but less detailed in news analysis.

     * Report 2:
       * Market review: 3 - Justification: Clear and consistent presentation, similar to Report 1.
       * Risk characteristics: 3 - Justification: Equal clarity and consistency with Report 1.
       * Technical indicators/signals: 3 - Justification: Similar level of analysis as Report 1.
       * Recent News Analysis: 4 - Justification: Provides additional insights and depth.
       * Summary & Outlook: 4 - Justification: More detailed summary and outlook.
       * **Overall Comparative Score:** 4 - Justification: Superior overall due to detailed news analysis and comprehensive summary.


[+] Evaluation result saved to: /home/dasoumao/NLP/test_judge/report_600519SH_20250418_223225-report_600519SH_20250418_225415_20250419_103251_evaluation.md

LLM evaluation finished.
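The save step joins every report's base name into the output file name and notes that long names could be truncated or hashed; with three long report names the result can approach the 255-byte file-name limit common on Linux file systems. A minimal sketch of the hashing option: the helper `evaluation_filename` and its 100-character budget are assumptions, not part of the notebook. It keeps a readable prefix and appends a short SHA-1 digest of the full joined name so that different report sets remain distinguishable.

```python
import hashlib
import os
from typing import List


def evaluation_filename(report_paths: List[str], timestamp: str, max_stem_len: int = 100) -> str:
    """Hypothetical helper: build '<stem>_<timestamp>_evaluation.md', hashing long stems."""
    basenames = [os.path.splitext(os.path.basename(p))[0] for p in report_paths]
    stem = "-".join(basenames)
    if len(stem) > max_stem_len:
        # Readable prefix + "-" + 10 hex chars of the full stem's digest = max_stem_len chars.
        digest = hashlib.sha1(stem.encode("utf-8")).hexdigest()[:10]
        stem = f"{stem[:max_stem_len - 11]}-{digest}"
    return f"{stem}_{timestamp}_evaluation.md"
```

It could replace the `reports_part`/`filename` lines in the save step as `filename = evaluation_filename(report_paths, timestamp)`.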


## 5. (Optional) Further Analysis


You can add more cells here to:
*   Parse the scores from the LLM output.
*   Visualize the results.
*   Compare against earlier runs.
*   And so on.
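For the first item, a minimal sketch of a score parser. It assumes the judge follows the 1-5 scoring layout requested by the module-based prompt, as in the sample output above: a `* Report 1:` line on its own, followed by lines such as `* Market review: 3 - Justification: ...` or `* **Overall Comparative Score:** 3 - Justification: ...`. The 0-10 format of the first, simpler prompt builder, or an answer the model has reformatted, would need different patterns. `parse_scores` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# "* Report 1:" on a line of its own starts a new report's score block.
REPORT_RE = re.compile(r"^\s*[*-]\s*Report\s+(\d+):\s*$")
# "* Market review: 3 - Justification: ..." or "* **Overall Comparative Score:** 3 - Justification: ..."
SCORE_RE = re.compile(r"^\s*[*-]\s*(?:\*\*)?(.+?)(?::\*\*|:)\s*([1-5])\s*-\s*Justification")


def parse_scores(text: str) -> Dict[str, Dict[str, int]]:
    """Collect {report: {criterion: score}} from a judge answer in the 1-5 layout."""
    scores: Dict[str, Dict[str, int]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        report_match = REPORT_RE.match(line)
        if report_match:
            current = f"Report {report_match.group(1)}"
            scores[current] = {}
            continue
        score_match = SCORE_RE.match(line)
        # Ignore score-like lines that appear before any "Report N:" header.
        if score_match and current is not None:
            scores[current][score_match.group(1).strip()] = int(score_match.group(2))
    return scores
```

Calling `parse_scores(llm_evaluation_result)` on the run above should yield one dictionary per report, ready for plotting or for comparison with earlier runs.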