# LLM as a judge

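This notebook demonstrates an evaluation pipeline in which an LLM acts as a judge, assessing how well AI-generated analyses of proteomics data align with the key findings (hypotheses) reported in the original research paper.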

In [1]:
%reload_ext autoreload
%autoreload 2

import configparser

from openai import OpenAI
from IPython.display import Markdown

config = configparser.ConfigParser()
config.read("../secrets.ini")

['../secrets.ini']
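The next cell reads `config["DEFAULT"]["API_KEY"]`, so `../secrets.ini` is expected to hold an `API_KEY` entry in its `[DEFAULT]` section. A minimal sketch of that layout, parsed from a string with a placeholder value so it runs without the real file (`EXAMPLE_INI` and `example_config` are illustrative names, not part of the notebook):

```python
import configparser

# Placeholder contents for ../secrets.ini; only [DEFAULT] API_KEY is required
EXAMPLE_INI = """\
[DEFAULT]
API_KEY = sk-your-key-here
"""

example_config = configparser.ConfigParser()
example_config.read_string(EXAMPLE_INI)
print(example_config["DEFAULT"]["API_KEY"])  # sk-your-key-here

# read() skips files it cannot open and returns the paths it did parse,
# so an empty list (instead of ['../secrets.ini']) points to a wrong path
found = configparser.ConfigParser().read("no/such/secrets.ini")
if not found:
    print("secrets file not found; check the relative path")
```

Checking the return value of `config.read` this way fails early with a clear message, rather than with a `KeyError` when the key is looked up.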

In [2]:
API_KEY = config["DEFAULT"]["API_KEY"]
client = OpenAI(api_key=API_KEY) 

In [None]:
def generate_response(
    ai_response: str,
    paper_key_findings: dict,
    client,
    model: str = "gpt-4o",
    temperature: float = 0.9
) -> str:
    """
    Evaluate an AI-generated analysis against a paper's key findings using an LLM judge.

    Parameters
    ----------
    ai_response : str
        The AI-generated analysis to be evaluated
    paper_key_findings : dict
        Mapping of hypothesis labels to the paper's key findings (the ground truth)
    client : OpenAI client
        The initialized OpenAI client
    model : str, optional
        The model to use for the API call, by default "gpt-4o"
    temperature : float, optional
        Temperature parameter for response generation, by default 0.9
        
    Returns
    -------
    str
        The content of the model's response
    """
    PROMPT = """\
    # Instruction

    Your task is to compare the analysis of proteomics data from an original research paper with an AI-generated response. You will need to match corresponding claims and findings between the two analyses, identifying which parts of the AI analysis are present in the original research paper.

    # Evaluation Process

    ## Matching Procedure

    You will be given a dictionary containing the key findings of the original research paper; treat these as the ground truth.

    STEP 1: Extract key claims and identified protein patterns from the AI-generated response and write them down.

    STEP 2: Compare the key findings of the original research paper with the key claims extracted in STEP 1 and with the AI-generated response itself. Categorize each match as: Aligned, Partially Aligned, Misaligned, or Not Addressed.

    STEP 3: Create a matching table with the following columns:
    * Original Paper Finding
    * Citation from Corresponding AI Analysis
    * Alignment Assessment (Aligned, Partially Aligned, Misaligned, Not Addressed)
    * Notes on Differences or Similarities

    STEP 4: Provide an overall assessment of how well the AI analysis captured the original paper's findings.

    ## Input Materials

    ### Dictionary with key findings from original research paper

    {paper_key_findings}

    ### AI-Generated Analysis

    {ai_response}

    ## Output Format

    ### Matching Table

    | Original Paper Finding | Citation from Corresponding AI Analysis | Alignment Assessment | Notes |
    |------------------------|---------------------------|----------------------|-------|
    | Hypothesis 1: [Description] | [AI phrase that matches] | [Aligned/Partially Aligned/Misaligned/Not Addressed] | [Explanation] |
    | Hypothesis 2: [Description] | [AI phrase that matches] | [Aligned/Partially Aligned/Misaligned/Not Addressed] | [Explanation] |
    | [etc...] | | | |
    
    ### Findings Not Addressed

    List any significant findings from the original paper that were not addressed in the AI analysis.

    ### AI Claims Without Paper Support

    List any claims made in the AI analysis that are not supported by the original paper's key findings.

    ### Overall Assessment

    Provide a summary assessment of how well the AI analysis matched and interpreted the original paper findings, noting strengths and weaknesses in the correspondence.

    ### Task Overview

    Your task is to compare the original analysis from the research paper with the AI-generated analysis of the same proteomics data. You need to identify corresponding points between the two analyses and evaluate how well the AI interpretation aligns with the scientific findings from the paper."""
    
    formatted_prompt = PROMPT.format(
        ai_response=ai_response,
        paper_key_findings=paper_key_findings
    )
    
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are an expert evaluator specializing in proteomics and molecular biology analysis."},
            {"role": "user", "content": formatted_prompt}
        ],
        temperature=temperature
    )
    
    return response.choices[0].message.content
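`PROMPT.format` inserts `paper_key_findings` via `str()`, so the judge sees the dictionary's Python literal (braces, quotes, keys such as `Hypothesis1`). A minimal sketch of an alternative rendering as a Markdown bullet list, which arguably fits the rest of the Markdown prompt better; `format_findings` is a hypothetical helper, not part of the original pipeline:

```python
def format_findings(findings: dict[str, str]) -> str:
    """Render {label: finding} pairs as a Markdown bullet list (hypothetical helper)."""
    return "\n".join(f"- {label}: {text}" for label, text in findings.items())


example_findings = {
    "Hypothesis1": "The proteasome complex is downregulated.",
    "Hypothesis2": "Heat shock proteins are upregulated.",
}

# Default behaviour: str.format inserts str(dict), i.e. the Python literal
print("{f}".format(f=example_findings))
# {'Hypothesis1': 'The proteasome complex is downregulated.', 'Hypothesis2': 'Heat shock proteins are upregulated.'}

# Bullet-list rendering
print(format_findings(example_findings))
# - Hypothesis1: The proteasome complex is downregulated.
# - Hypothesis2: Heat shock proteins are upregulated.
```

One way to use it would be to call it inside `generate_response` just before `PROMPT.format`. Either way, `str.format` only parses the template itself, so braces inside the substituted text are harmless.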

In [7]:
paper_key_findings = {
    "Hypothesis1": "The proteasome complex is downregulated.",
    "Hypothesis2": "Heat shock proteins are upregulated.",
    "Hypothesis3": "The (pre-)ribosome is downregulated.",
    "Hypothesis4": "Cytosolic protein folding machinery is upregulated.",
    "Hypothesis5": "Regulators of proteasome assembly are upregulated.",
    "Hypothesis6": "Proteins involved in sterol synthesis are upregulated.",
    "Hypothesis7": "Multiple targets of the unfolded protein response are upregulated.",
    "Hypothesis10": "Protein homeostasis is disrupted, as evident from increased protein folding efforts and decreased protein synthesis.",
    "Hypothesis11": "Reduced proteasome capacity overly affects ERAD substrates."
}

ai_response = """
# Analysis of RPN4 Knockout Strain

## 1. Functional Analysis

* **Protein Synthesis & Ribosome Biogenesis (Downregulated):** Many ribosomal proteins (RPLs, RPSs, etc.) and factors involved in ribosome assembly and function, such as NSA1 and DED1, are downregulated. This suggests a reduction in protein synthesis capacity and altered ribosome biogenesis in the RPN4 knockout strain. This may affect the cell's ability to translate proteins efficiently.

* **Proteasome and Proteasome Regulatory Proteins (Downregulated):** Several proteins associated with the proteasome complex, like PRE1, RPN1, and RPN6, are downregulated. This could indicate altered protein degradation mechanisms, which might affect protein turnover and stress response.

* **Heat Shock Proteins (Upregulated):** Proteins like HSP82, SSA1, and HSP104 are upregulated, suggesting an activated stress response, likely compensating for proteostasis imbalance.

* **Metabolic Enzymes (Upregulated):** Enzymes like TDH1 (glyceraldehyde-3-phosphate dehydrogenase) and CIT1 (citrate synthase) are upregulated, hinting at alterations in glycolysis and the TCA cycle, potentially adapting to metabolic needs in the altered cellular state due to RPN4 knockout.

## 2. Ontology Analysis

* **Stress Response and Heat Shock Response (Upregulated):** The upregulation of heat shock proteins implies activation of cellular mechanisms to maintain protein stability under stress conditions.

* **Ribosome Biogenesis and Assembly (Downregulated):** Enrichment in downregulated ribosomal proteins reveals diminished synthesis capacity impacting overall cellular growth and proliferation processes.

* **Metabolic Pathways (Upregulated):** Increased expression of metabolic enzymes suggests a shift in cellular metabolism, possibly enhancing pathways to meet energetic or biosynthetic needs under the RPN4-deleted condition.

## 3. Critical Review

* There is consistency across the analyses indicating a downregulation in protein synthesis capability accompanied by compensatory mechanisms (upregulated stress proteins).

* The simultaneous downregulation of proteasome components alongside ribosomal subunits suggests a strategic slowdown in overall protein turnover and synthesis, enabling the cell to handle stress effectively.

## 4. Biological Context

* **Experimental Design:** The comparison examines the RPN4 knockout against wildtype yeast, focusing on initial cellular responses at zero hours post-variable introduction.

* **Biological Implications:** The increased stress response, downregulated protein synthesis, and highlighted metabolic adjustments indicate cellular adaptation strategies to maintain homeostasis in response to genetic deletion of RPN4.

## 5. Focused Analysis

Key aspects worth investigating further:

1. **Heat Shock Response Pathway:** Given its role in managing protein unfolding and aggregation under stress, further exploring this could uncover strategies to mitigate proteotoxic stress.

2. **Metabolic Adaptation in Central Metabolism:** Understanding how glycolysis and TCA cycle enzymes adapt provides insights into metabolic flexibility and energy homeostasis.

3. **Ribosome and Proteasome Interplay:** Analyzing how these complexes are coregulated under stress or genetic modifications offers insight into protein homeostasis control mechanisms.

4. **RPN4's Role in Regulation:** Understanding RPN4's network regulations could clarify its overarching influence on cellular stress mechanisms and adaptation pathways.

Given these aspects, please specify which area you would like to delve further into, and I can perform a deeper, focused analysis in that area.
"""

evaluation = generate_response(
    ai_response,
    paper_key_findings,
    client,
)

Markdown(evaluation)  

### Matching Table

| Original Paper Finding                                  | Citation from Corresponding AI Analysis                                                                                      | Alignment Assessment | Notes                                                                                              |
|---------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|----------------------|----------------------------------------------------------------------------------------------------|
| Hypothesis 1: The proteasome complex is downregulated.  | **Proteasome and Proteasome Regulatory Proteins (Downregulated):** Several proteins associated with the proteasome complex... | Aligned              | The AI analysis correctly identifies the downregulation of the proteasome complex components.      |
| Hypothesis 2: Heat shock proteins are upregulated.      | **Heat Shock Proteins (Upregulated):** Proteins like HSP82, SSA1, and HSP104 are upregulated...                              | Aligned              | The AI correctly notes the upregulation of heat shock proteins, aligning with the original finding.|
| Hypothesis 3: The (pre-)ribosome is downregulated.      | **Ribosome Biogenesis and Assembly (Downregulated):** Enrichment in downregulated ribosomal proteins...                      | Aligned              | The AI analysis accurately matches the original finding regarding downregulation of ribosome components. |
| Hypothesis 4: Cytosolic protein folding machinery is upregulated. | **Stress Response and Heat Shock Response (Upregulated):** The upregulation of heat shock proteins implies activation...     | Partially Aligned    | The AI mentions stress response but does not explicitly address cytosolic protein folding components. |
| Hypothesis 5: Regulators of proteasome assembly are upregulated. | Not Addressed                                                                                                                 | Not Addressed        | The AI analysis does not mention the upregulation of proteasome assembly regulators.              |
| Hypothesis 6: Proteins involved in sterol synthesis are upregulated. | Not Addressed                                                                                                                 | Not Addressed        | The AI analysis does not mention sterol synthesis pathways.                                       |
| Hypothesis 7: Multiple targets of the unfolded protein response are upregulated. | **Heat Shock Response Pathway:** Given its role in managing protein unfolding and aggregation under stress...                 | Partially Aligned    | The AI analysis implies stress response but lacks specific reference to UPR targets.              |
| Hypothesis 10: Protein homeostasis is disrupted...      | **The increased stress response, downregulated protein synthesis, and highlighted metabolic adjustments...**                  | Partially Aligned    | The AI addresses a disruption in homeostasis but focuses more on stress and metabolic changes rather than synthesis. |
| Hypothesis 11: Reduced proteasome capacity overly affects ERAD substrates. | Not Addressed                                                                                                                 | Not Addressed        | The AI analysis does not mention the impact on ERAD substrates specifically.                      |

### Findings Not Addressed

- Hypothesis 5: Regulators of proteasome assembly are upregulated.
- Hypothesis 6: Proteins involved in sterol synthesis are upregulated.
- Hypothesis 11: Reduced proteasome capacity overly affects ERAD substrates.

### AI Claims Without Paper Support

- **Metabolic Enzymes (Upregulated):** AI analysis mentions metabolic enzymes like TDH1 and CIT1, indicating changes in glycolysis and TCA cycle, which are not discussed in the provided original paper findings.

### Overall Assessment

The AI analysis captured several key points from the original research paper, notably the downregulation of the proteasome complex and ribosome components, and the upregulation of heat shock proteins. It effectively communicates the overall theme of disrupted homeostasis due to altered protein synthesis and stress response activation.

However, several original hypotheses are not addressed, particularly those involving proteasome assembly regulators, sterol synthesis proteins, and ERAD substrates. Additionally, the AI analysis introduces aspects of metabolic pathway changes that were not present in the provided findings, which could either reflect a broader analysis or an unsupported claim.

Overall, while the AI analysis aligns well with some of the original paper's findings, it misses several specific hypotheses and includes some unsupported claims, indicating partial alignment with the source material.
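The judge's verdicts are only available as Markdown text. A minimal sketch of turning the Matching Table into structured rows, assuming the judge keeps the four-column layout requested in the prompt and that no cell contains a literal `|`; `parse_matching_table` is a hypothetical helper. For the table above, the tally should come out as three Aligned, three Partially Aligned and three Not Addressed.

```python
import re
from collections import Counter


def parse_matching_table(markdown: str) -> list[dict[str, str]]:
    """Turn the first pipe table in `markdown` into a list of row dictionaries.

    Hypothetical helper: assumes a header row, a separator row and body rows
    with the same number of cells, and no literal '|' inside any cell.
    """
    rows = []
    header = None
    for raw_line in markdown.splitlines():
        line = raw_line.strip()
        if not line.startswith("|"):
            if header is not None and rows:
                break  # the table has ended; ignore anything after it
            continue
        # Drop the outer pipes, then split into trimmed cells
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        elif all(re.fullmatch(r":?-+:?", cell) for cell in cells):
            continue  # separator row such as |---|:---:|
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


example = """\
### Matching Table

| Original Paper Finding | Citation from Corresponding AI Analysis | Alignment Assessment | Notes |
|---|---|---|---|
| Hypothesis 1: The proteasome complex is downregulated. | "Proteasome ... (Downregulated)" | Aligned | Matches. |
| Hypothesis 6: Proteins involved in sterol synthesis are upregulated. | Not Addressed | Not Addressed | No mention of sterols. |

### Findings Not Addressed
"""

rows = parse_matching_table(example)
tally = Counter(row["Alignment Assessment"] for row in rows)
print(tally["Aligned"], tally["Not Addressed"])  # 1 1
```

Applied to the notebook's `evaluation` string instead of `example`, the same calls would give one dictionary per hypothesis row.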

In [8]:
ai_response = """
To provide a thorough analysis of the proteomics data received from your proteomics experiment comparing the RPN4 knockout strain to wild type yeast, we will go through each step of analysis outlined in the provided structure.

## 1. Functional Analysis:

### Upregulated Proteins:

- **Chaperones and Heat Shock Proteins** (HSP82, HSC82, STI1, HSP26, KAR2): Involvement in protein folding and stress response. These proteins are often involved in cellular response to stress conditions and may facilitate protein refolding during stress or damage [PMID:27365453].
- **Metabolic Enzymes** (TDH1, CIT1, ENO1, HXK1, GLK1, SDH2): Enzymes involved in glycolytic pathways, metabolic processes, and the tricarboxylic acid (TCA) cycle. Their upregulation suggests increased cellular metabolism or compensatory mechanisms due to perturbations in energy production.
- **Proteases and Protein Modifying Enzymes** (PEP4, PRB1, PBI2, APE1): Involve in protein turnover and degradation, reflecting possible stress adaptation or altered protein quality control requirements in the RPN4 knockout.
- **Antioxidant and Stress Response Proteins** (PRX1, TSA1): Part of the defense system against oxidative stress, their upregulation can be indicative of increased reactive oxygen species (ROS) management.
- **Transposon and Mobile Elements** (TY1B-PR1, TY2B-LR1): Often stress-responsive, their upregulation points to genomic instability or retrotransposon activity under stress conditions.

### Downregulated Proteins:

- **Ribosomal Proteins and Translation Machinery** (RPL25, RPS9B, RPS13): Downregulation suggests a general reduction in protein synthesis, possibly as a cellular tradeoff in response to stress conditions or as part of a wider adaptive response possibly yielding energy for other processes.
- **Proteasome Subunits** (PRE1, PRE9, PRE8, RPN3): This implies a possible decrease in proteasomal protein degradation pathways, which could result in protein accumulation or functional shifts in protein catabolism.
- **Transport and Trafficking Proteins** (RPT5, RPN2): Suggestive of impaired protein and organelle transport, potentially affecting cellular protein recycling or activation pathways.
- **Energy Metabolism Components** (HXT2, URA1): Loss of competence in glucose transport and related metabolic adaptations could imply an energetic shift or a metabolic bottleneck.

## 2. Ontology Analysis:

- **Proteasome Complex Pathways** (GO:CC, KEGG): Strongly represented, indicating the proteasome's role critically underlies differential response, possibly compensating for increased misfolded protein presence or oxidative damage.
- **Chaperone and Folding Pathways** (GO:MF): Highlighting increased protein refolding and stress adaptation processes, possibly as a compensatory reaction to proteostasis imbalance.
- **Ribosomal and Translation Inhibition** (WP): Correlates with downregulated ribosomal proteins and implies possible overall reduction of protein synthesis effectiveness under RPN4 knockout conditions.

## 3. Critical Review:

The analyses of upregulated and downregulated proteins display congruent patterns with enrichment results, primarily reflecting adaptations in protein folding, degradation, and possible biosynthetic shifts toward survival under stress. There are no notable contradictions evident in the results.

## 4. Biological Context:

The observed changes, particularly in proteostasis regulators such as proteasomes and chaperones, suggest compensatory stress responses in protein damage control. The decrease in translational machinery components suggests a shift in cellular resource allocation priorities, potentially lowering energy expenditure or reallocating resources to manage oxidative stress more effectively. This observation is consistent with a cellular response prioritization strategy, which could influence cell growth, division, and survival under altered environmental or genetic conditions.

## 5. Focused Analysis Recommendations:

- **Proteasome-Related Pathways**: Given their crucial roles in proteostasis and stress response, further exploration into proteasome dynamics might reveal deeper insights into cellular priority shifts in RPN4 knockout.
- **Heat Shock Response**: Understanding the specifics behind chaperone and heat shock protein upregulation may lead to novel regulatory mechanisms underpinning stress responses.
- **Energy Metabolism Dynamics**: Investigation into changes in energy-producing pathways such as glycolysis and TCA cycle, given the altered expression of corresponding enzymes.
- **Translational Machinery**: Reduced ribosomal proteins implicate translational control which can impact cell growth, necessitating in-depth study of ribosome biosynthesis and function.
- **Retrotransposition Activities**: Implications of mobile element activation as part of stress-induced genomic stability responses provide a novel angle on cellular adaptability under perturbation conditions.

I recommend beginning the deeper analysis focusing on the Proteasome-Related Pathways due to their central role in mediating proteostasis and potential broader implications on cellular homeostasis. Would you like to start with this aspect, or is there another focus area you prefer to investigate further?
"""

evaluation = generate_response(
    ai_response,
    paper_key_findings,
    client,
)

Markdown(evaluation)

### Matching Table

| Original Paper Finding                                   | Citation from Corresponding AI Analysis                                                                                                                                   | Alignment Assessment | Notes                                                                                                                                   |
|----------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------|
| Hypothesis 1: The proteasome complex is downregulated.   | **Downregulated Proteins:** "Proteasome Subunits (PRE1, PRE9, PRE8, RPN3): This implies a possible decrease in proteasomal protein degradation pathways..."                | Aligned              | The AI correctly identifies the downregulation of proteasome subunits, aligning with the original paper's finding.                       |
| Hypothesis 2: Heat shock proteins are upregulated.       | **Upregulated Proteins:** "Chaperones and Heat Shock Proteins (HSP82, HSC82, STI1, HSP26, KAR2): Involvement in protein folding and stress response..."                    | Aligned              | The AI analysis identifies heat shock proteins as upregulated, matching the original paper's finding.                                   |
| Hypothesis 3: The (pre-)ribosome is downregulated.       | **Downregulated Proteins:** "Ribosomal Proteins and Translation Machinery (RPL25, RPS9B, RPS13): Downregulation suggests a general reduction in protein synthesis..."     | Aligned              | The AI highlights the downregulation of ribosomal proteins, consistent with the original paper's observation.                           |
| Hypothesis 4: Cytosolic protein folding machinery is upregulated. | **Upregulated Proteins:** "Chaperones and Heat Shock Proteins (HSP82, HSC82, STI1, HSP26, KAR2): Involvement in protein folding and stress response..." | Aligned              | The AI implicitly covers cytosolic protein folding machinery through upregulated chaperones and heat shock proteins.                      |
| Hypothesis 5: Regulators of proteasome assembly are upregulated. | No direct citation found. | Not Addressed | The AI does not mention upregulation of proteasome assembly regulators, missing this original paper finding. |
| Hypothesis 6: Proteins involved in sterol synthesis are upregulated. | No direct citation found. | Not Addressed | The AI analysis does not cover the upregulation of sterol synthesis proteins.                                                          |
| Hypothesis 7: Multiple targets of the unfolded protein response are upregulated. | **Upregulated Proteins:** "Chaperones and Heat Shock Proteins... stress conditions and may facilitate protein refolding during stress or damage" | Partially Aligned | While the AI covers general stress responses, it does not specifically address the unfolded protein response targets.                     |
| Hypothesis 10: Protein homeostasis is disrupted, as evident from increased protein folding efforts and decreased protein synthesis. | Several sections, particularly **Functional Analysis** and **Ontology Analysis**, discuss increased protein folding and reduced protein synthesis. | Aligned | The AI analysis captures both increased protein folding efforts and decreased protein synthesis, consistent with the original finding.   |
| Hypothesis 11: Reduced proteasome capacity overly affects ERAD substrates. | No direct citation found. | Not Addressed | The AI does not directly discuss the impact on ERAD substrates, missing this specific detail from the original paper.                    |

### Findings Not Addressed

- Regulators of proteasome assembly are upregulated.
- Proteins involved in sterol synthesis are upregulated.
- Reduced proteasome capacity overly affects ERAD substrates.

### AI Claims Without Paper Support

- **Metabolic Enzymes:** The AI analysis discusses upregulation of metabolic enzymes involved in glycolysis and the TCA cycle which are not directly supported by the original paper's findings.
- **Antioxidant and Stress Response Proteins:** The focus on antioxidant proteins is not mentioned in the original findings provided.

### Overall Assessment

The AI analysis captures several key findings from the original paper effectively, particularly the broader trends of protein upregulation and downregulation related to proteasomal activity and protein synthesis machinery. However, the AI misses some specific findings such as the impact on proteasome assembly regulators, sterol synthesis proteins, and ERAD substrates. Additionally, the AI introduces some claims related to metabolic enzymes and oxidative stress responses that are not supported by the original paper. Overall, while the AI analysis reflects an understanding of general stress response patterns, it lacks specificity in addressing all the nuances of the original research findings.
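The judge returns categorical labels only. To compare the two AI responses side by side, the labels can be collapsed into a single number. The weights below (Aligned = 1, Partially Aligned = 0.5, Misaligned and Not Addressed = 0) are an arbitrary assumption rather than part of the notebook, and the label lists are transcribed by hand from the two matching tables above, in the order Hypotheses 1 to 7, 10 and 11.

```python
# Assumed weights (not from the notebook); any monotone scheme would do
WEIGHTS = {
    "Aligned": 1.0,
    "Partially Aligned": 0.5,
    "Misaligned": 0.0,
    "Not Addressed": 0.0,
}


def alignment_score(labels: list[str]) -> float:
    """Mean weight over all hypotheses; raises KeyError on an unexpected label."""
    return sum(WEIGHTS[label] for label in labels) / len(labels)


# Labels transcribed from the two matching tables (Hypotheses 1-7, 10, 11)
first_response = [
    "Aligned", "Aligned", "Aligned", "Partially Aligned", "Not Addressed",
    "Not Addressed", "Partially Aligned", "Partially Aligned", "Not Addressed",
]
second_response = [
    "Aligned", "Aligned", "Aligned", "Aligned", "Not Addressed",
    "Not Addressed", "Partially Aligned", "Aligned", "Not Addressed",
]

print(f"{alignment_score(first_response):.2f}")   # 0.50
print(f"{alignment_score(second_response):.2f}")  # 0.61
```

Because the judge samples at `temperature=0.9`, these numbers describe two single runs only; repeated runs could yield different labels and therefore different scores.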