# Introduction

This notebook contains my experiments (and hopefully the final implementation) in using LLMs to analyse Kallisto quantification data.

The ideal final outcome is a workflow that takes in my Kallisto quantification files and performs DEG analysis. Before that, there are some exploratory questions I'm interested in:
- How well does the LLM produce a working R pipeline? (I'm more comfortable working with R.)
- How well does it handle inputs/outputs?
- How well will it handle the METADATA?
- How much guidance do I need to give, e.g. about the libraries that are available? (In theory, I'd like this to be a "step" that the LLM is smart enough to know to implement.) I don't want the LLM to install new packages, as that feels like a security risk.
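
As a rough guard for the "no new packages" concern, generated R code could be scanned for install calls before it is ever run. The sketch below is only an illustration: `find_install_calls` and `INSTALL_PATTERNS` are hypothetical names, and the pattern list is an assumption that will not catch every way of installing a package.

```python
import re

# Patterns for R calls that install packages. This list is an assumption and is
# not exhaustive (e.g. it ignores system() calls to "R CMD INSTALL").
INSTALL_PATTERNS = [
    r"\binstall\.packages\s*\(",
    r"\bBiocManager::install\s*\(",
    r"\b(?:devtools|remotes)::install_\w+\s*\(",
]

def find_install_calls(r_code: str) -> list[str]:
    """Return the lines of R code that appear to install packages."""
    flagged = []
    for line in r_code.splitlines():
        # Drop trailing R comments (naive: also cuts at a '#' inside a string)
        code = line.split("#", 1)[0]
        if any(re.search(pattern, code) for pattern in INSTALL_PATTERNS):
            flagged.append(line.strip())
    return flagged

example = """
# Install BiocManager if it's not already installed
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("DESeq2")
library(DESeq2)
"""
print(find_install_calls(example))
```

A non-empty result would be a reason to reject the generated script or to re-prompt with an explicit list of the packages that are already installed.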

Other notes:
- For the moment, I'll have the LLM use the "LLM Playground" directory to save its outputs
- In my head, this "workflow" will be "hi, here's what I want, do some steps to achieve this" - a bit like the worked example of solving an equation
- I also need to integrate this with 

In [3]:
# Load modules
from openai import OpenAI
import openai # I need this and above
import os
from tqdm import tqdm
import time
import numpy as np
import pandas as pd
from dotenv import load_dotenv
from pydantic import BaseModel, Field
from typing import List, Dict, Literal
import subprocess
import glob
import asyncio
import json

In [4]:
# Quick OpenAI API test - note this does not reflect what I intend my end prompt to be, just want to get a quick idea of what I get...

load_dotenv('../../.env')

openai_api_key = os.getenv('OPENAI_API_KEY')

# Test OpenAI API...

client = OpenAI(
  api_key=openai_api_key,
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Could you provide code to import abundance.tsv Kallisto files into R and identify DEGs?",
        }
    ],
    model="gpt-4o-mini",
)

result = chat_completion.choices[0].message.content
print(result)

Certainly! Below is a step-by-step guide and example R code to import Kallisto abundance estimates from a `abundance.tsv` file, followed by steps to identify differentially expressed genes (DEGs) using the DESeq2 package.

### Step 1: Install and Load Required Libraries

First, ensure you have the necessary R packages installed. You can install them using the `BiocManager` package for Bioconductor packages.

```R
# Install BiocManager if it's not already installed
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Install DESeq2
BiocManager::install("DESeq2")

# Load necessary libraries
library(DESeq2)
```

### Step 2: Read in the Kallisto `abundance.tsv` Files

Assuming you have multiple Kallisto `abundance.tsv` files in a directory, you would typically read them and combine the counts into a single data frame. Here's how to do it:

```R
# Set working directory where the abundance.tsv files are located
setwd("path/to/kallisto/files")

# List a

Obviously, a one-sentence prompt won't get far: the response is generic, assumes DESeq2, and starts by installing packages.

# Investigating metadata

I technically have a separate notebook analysing metadata, but I will more formally do my tests here.

The initial test case is to give a metadata CSV and see if the LLM is able to identify what contrasts would be interesting. However, I would eventually probably want a separate function for finding the CSV, and I would later also need to determine what specific outputs I want.
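
A minimal sketch of what a separate function for finding the CSV might look like. The name `find_metadata_csv` is hypothetical, and the default pattern is an assumption based on the one file name used in this notebook (`GSE268034_series_matrix_metadata.csv`); other datasets may be named differently.

```python
import glob
import os

def find_metadata_csv(dataset_dir: str, pattern: str = "*_series_matrix_metadata.csv") -> str:
    """Return the single metadata CSV inside dataset_dir, or raise if the match is ambiguous."""
    matches = sorted(glob.glob(os.path.join(dataset_dir, pattern)))
    if not matches:
        raise FileNotFoundError(f"No file matching {pattern!r} in {dataset_dir}")
    if len(matches) > 1:
        # Refuse to guess when several candidates exist
        raise ValueError(f"Expected one metadata file, found {len(matches)}: {matches}")
    return matches[0]
```

Raising on zero or multiple matches keeps the later prompt from silently running on the wrong metadata sheet.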

At least at this initial conceptualisation stage, I'm not sure where this will be integrated (i.e. whether this will be something I do separately and then feed as input into the LLM). Nonetheless, my goal is to develop a prompt that gets meaningful results.

In [5]:
meta = pd.read_csv("/home/myuser/work/notebooks/Testing/GSE268034/GSE268034_series_matrix_metadata.csv")
meta

Unnamed: 0,title,geo_accession,status,submission_date,last_update_date,type,channel_count,source_name_ch1,organism_ch1,characteristics_ch1,...,library_selection,library_source,library_strategy,relation,relation.1,supplementary_file_1,cell line:ch1,cell type:ch1,genotype:ch1,treatment:ch1
0,SUDHL4_LacZ_RGFP0_1,GSM8284502,Public on Aug 08 2024,May 21 2024,Aug 08 2024,SRA,1,SU-DHL-4,Homo sapiens,cell line: SU-DHL-4,...,cDNA,transcriptomic,RNA-Seq,BioSample: https://www.ncbi.nlm.nih.gov/biosam...,SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX...,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM8284...,SU-DHL-4,diffuse large B-cell lymphoma cells,WT,DMSO
1,SUDHL4_LacZ_RGFP0_2,GSM8284503,Public on Aug 08 2024,May 21 2024,Aug 08 2024,SRA,1,SU-DHL-4,Homo sapiens,cell line: SU-DHL-4,...,cDNA,transcriptomic,RNA-Seq,BioSample: https://www.ncbi.nlm.nih.gov/biosam...,SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX...,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM8284...,SU-DHL-4,diffuse large B-cell lymphoma cells,WT,DMSO
2,SUDHL4_LacZ_RGFP5_1,GSM8284504,Public on Aug 08 2024,May 21 2024,Aug 08 2024,SRA,1,SU-DHL-4,Homo sapiens,cell line: SU-DHL-4,...,cDNA,transcriptomic,RNA-Seq,BioSample: https://www.ncbi.nlm.nih.gov/biosam...,SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX...,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM8284...,SU-DHL-4,diffuse large B-cell lymphoma cells,WT,RGFP966 (5 µM)
3,SUDHL4_LacZ_RGFP5_2,GSM8284505,Public on Aug 08 2024,May 21 2024,Aug 08 2024,SRA,1,SU-DHL-4,Homo sapiens,cell line: SU-DHL-4,...,cDNA,transcriptomic,RNA-Seq,BioSample: https://www.ncbi.nlm.nih.gov/biosam...,SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX...,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM8284...,SU-DHL-4,diffuse large B-cell lymphoma cells,WT,RGFP966 (5 µM)
4,SUDHL4_GNASKO2_RGFP0_1,GSM8284506,Public on Aug 08 2024,May 21 2024,Aug 08 2024,SRA,1,SU-DHL-4,Homo sapiens,cell line: SU-DHL-4,...,cDNA,transcriptomic,RNA-Seq,BioSample: https://www.ncbi.nlm.nih.gov/biosam...,SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX...,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM8284...,SU-DHL-4,diffuse large B-cell lymphoma cells,GNAS knockout,DMSO
5,SUDHL4_GNASKO2_RGFP0_2,GSM8284507,Public on Aug 08 2024,May 21 2024,Aug 08 2024,SRA,1,SU-DHL-4,Homo sapiens,cell line: SU-DHL-4,...,cDNA,transcriptomic,RNA-Seq,BioSample: https://www.ncbi.nlm.nih.gov/biosam...,SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX...,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM8284...,SU-DHL-4,diffuse large B-cell lymphoma cells,GNAS knockout,DMSO
6,SUDHL4_GNASKO2_RGFP5_1,GSM8284508,Public on Aug 08 2024,May 21 2024,Aug 08 2024,SRA,1,SU-DHL-4,Homo sapiens,cell line: SU-DHL-4,...,cDNA,transcriptomic,RNA-Seq,BioSample: https://www.ncbi.nlm.nih.gov/biosam...,SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX...,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM8284...,SU-DHL-4,diffuse large B-cell lymphoma cells,GNAS knockout,RGFP966 (5 µM)
7,SUDHL4_GNASKO2_RGFP5_2,GSM8284509,Public on Aug 08 2024,May 21 2024,Aug 08 2024,SRA,1,SU-DHL-4,Homo sapiens,cell line: SU-DHL-4,...,cDNA,transcriptomic,RNA-Seq,BioSample: https://www.ncbi.nlm.nih.gov/biosam...,SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX...,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM8284...,SU-DHL-4,diffuse large B-cell lymphoma cells,GNAS knockout,RGFP966 (5 µM)
8,SUDHL4_GNASKO3_RGFP0_1,GSM8284510,Public on Aug 08 2024,May 21 2024,Aug 08 2024,SRA,1,SU-DHL-4,Homo sapiens,cell line: SU-DHL-4,...,cDNA,transcriptomic,RNA-Seq,BioSample: https://www.ncbi.nlm.nih.gov/biosam...,SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX...,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM8284...,SU-DHL-4,diffuse large B-cell lymphoma cells,GNAS knockout,DMSO
9,SUDHL4_GNASKO3_RGFP0_2,GSM8284511,Public on Aug 08 2024,May 21 2024,Aug 08 2024,SRA,1,SU-DHL-4,Homo sapiens,cell line: SU-DHL-4,...,cDNA,transcriptomic,RNA-Seq,BioSample: https://www.ncbi.nlm.nih.gov/biosam...,SRA: https://www.ncbi.nlm.nih.gov/sra?term=SRX...,ftp://ftp.ncbi.nlm.nih.gov/geo/samples/GSM8284...,SU-DHL-4,diffuse large B-cell lymphoma cells,GNAS knockout,DMSO


In [6]:
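meta.to_string  # without parentheses this shows the bound method's repr; meta.to_string() returns the table as a string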

<bound method DataFrame.to_string of                      title geo_accession                 status  \
0      SUDHL4_LacZ_RGFP0_1    GSM8284502  Public on Aug 08 2024   
1      SUDHL4_LacZ_RGFP0_2    GSM8284503  Public on Aug 08 2024   
2      SUDHL4_LacZ_RGFP5_1    GSM8284504  Public on Aug 08 2024   
3      SUDHL4_LacZ_RGFP5_2    GSM8284505  Public on Aug 08 2024   
4   SUDHL4_GNASKO2_RGFP0_1    GSM8284506  Public on Aug 08 2024   
5   SUDHL4_GNASKO2_RGFP0_2    GSM8284507  Public on Aug 08 2024   
6   SUDHL4_GNASKO2_RGFP5_1    GSM8284508  Public on Aug 08 2024   
7   SUDHL4_GNASKO2_RGFP5_2    GSM8284509  Public on Aug 08 2024   
8   SUDHL4_GNASKO3_RGFP0_1    GSM8284510  Public on Aug 08 2024   
9   SUDHL4_GNASKO3_RGFP0_2    GSM8284511  Public on Aug 08 2024   
10  SUDHL4_GNASKO3_RGFP5_1    GSM8284512  Public on Aug 08 2024   
11  SUDHL4_GNASKO3_RGFP5_2    GSM8284513  Public on Aug 08 2024   

   submission_date last_update_date type  channel_count source_name_ch1  \
0      May 21 20

In [11]:
prompt = f"""

## IDENTITY AND PURPOSE

You are an expert in bioinformatic analyses. You will be provided with a metadata sheet, and are tasked with identifying contrasts that could be interesting in the metadata, with the intention of analysing these in an edgeR/limma based pipeline.
Take a deep breath, and carefully follow the steps outlined below to achieve the intended task.

## STEPS

1. Carefully consider each column, inferring what each column means from its name, and also the values in the column. 
2. Determine columns that appear to contain data that would be scientifically and biologically interesting to compare within the column.
- Only include comparisons that can be easily analysed in a limma/edgeR based pipeline
- Only include comparisons that would be generally valuable to scientific and medical literature
- Only include comparisons that can be made within this dataset only - i.e. does not require samples from additional datasets

## OUTPUT

1. For each comparison, include the EXACT column name, as well as the EXACT values that should be used for the comparison. Additionally, justify why the comparison would be interesting using up to 3 sentences

## INPUT

Metadata:
{meta.to_string()}

"""

chat_completion = client.beta.chat.completions.parse(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-4o-mini",
)

result = chat_completion.choices[0].message.content
print(result)

Here are the interesting contrasts identified from the dataset for potential analysis in a limma/edgeR-based pipeline:

1. **Comparison 1: Genotype Comparison**
   - **Column Name:** `genotype:ch1`
   - **Values:** `WT`, `GNAS knockout`
   - **Justification:** Comparing gene expression between wild-type (WT) and GNAS knockout (GNAS KO) cellular models can provide insights into the role of GNAS in diffuse large B-cell lymphoma (DLBCL). This comparison may reveal specific pathways or gene targets regulated by GNAS, contributing to our understanding of tumor biology and potential therapeutic targets.

2. **Comparison 2: Treatment Comparison (DMSO vs RGFP966) in WT Genotype**
   - **Column Name:** `treatment:ch1`
   - **Values:** `DMSO`, `RGFP966 (5 µM)`
   - **Justification:** The evaluation of gene expression changes in response to RGFP966, a GNAS inhibitor, compared to DMSO (a control) in WT cells could highlight the pharmacological effects of GNAS inhibition. Understanding the mechanis

The above does seem pretty good: it captures everything that I want. However, I could imagine improvements if I:
1. Repeat the query multiple times
2. Collate the responses (a bit of experimentation reveals this will most likely need a combination of code and an LLM to remove "loose" duplicates)
3. Score the responses, to determine what the "final" list of contrasts to analyse should be.

I will therefore adapt the approach I took in identifying relevant datasets, and implement it here (since I did perform both).

I will need to give special consideration to how to evaluate/score the contrasts (perhaps Mr. Claude/ChatGPT will be helpful for me...)

In [78]:
class Assessment(BaseModel):
    name: str = Field(description = "A name to be given to describe the contrast")
    column: str = Field(description = "Column, or columns, in the metadata containing the values to be compared")
    values: str = Field(description = "The values in the identified column that are to be compared")
    justification: str = Field(description = "Justification for why the suggested contrast will be of use")

class Contrasts(BaseModel):
    contrasts: list[Assessment]

def identify_contrasts(meta):
    prompt = f"""

## IDENTITY AND PURPOSE

You are an expert in bioinformatic analyses. You will be provided with a metadata sheet, and are tasked with identifying contrasts that could be interesting in the metadata, with the intention of analysing these in an edgeR/limma based pipeline.
Take a deep breath, and carefully follow the steps outlined below to achieve the intended task.

## STEPS

1. Carefully consider each column, inferring what each column means from its name, and also the values in the column. 
2. Determine columns that appear to contain data that would be scientifically and biologically interesting to analyse
- Only consider analyses that would be generally valuable to scientific and medical literature
- Only include analyses that can be made within this dataset only - i.e. does not require samples from additional datasets
- You are permitted to draw comparisons involving multiple different columns
3. Specify the values in the columns that should be used for the comparison
- Only include comparisons that can be easily analysed in a limma/edgeR based pipeline. 
- Specifically take into consideration how a contrast matrix could be set up using the model.matrix and makeContrasts functions.
- You are permitted to draw comparisons involving multiple different columns


## OUTPUT

1. Include output for each proposed comparison
2. Specify the exact column name(s) that will need to be used for the comparison
3. Specify the exact values that will be used for the comparison
4. Justify why the comparison would be interesting using up to 3 sentences

For points 2 and 3, note that this should include enough information for someone to generate an appropriate contrast matrix using model.matrix and makeContrasts.

## INPUT

Metadata:
{meta.to_string()}

"""
    chat_completion = client.beta.chat.completions.parse(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="gpt-4o-mini",
        response_format = Contrasts
        )
    result = chat_completion.choices[0].message.parsed
    return result

async def identify_contrasts_multiple(meta, num_queries: int = 3) -> Contrasts:
    # identify_contrasts makes a blocking API call, so each query runs in a worker
    # thread; calling it directly inside a coroutine would run the queries one by one
    tasks = [asyncio.to_thread(identify_contrasts, meta) for _ in range(num_queries)]
    results = await asyncio.gather(*tasks)

    # Combine the results
    all_contrasts = Contrasts(contrasts=[])
    for result in results:
        all_contrasts.contrasts.extend(result.contrasts)

    # Deduplication process to remove duplicate contrasts
    contrasts_dict = all_contrasts.dict()
    seen = set()
    unique_contrasts = []

    for item in contrasts_dict['contrasts']:
        identifier = (item['column'], item['values'])
        if identifier not in seen:
            unique_contrasts.append(item)
            seen.add(identifier)

    # Replace the original list with the filtered one
    contrasts_dict['contrasts'] = unique_contrasts

    # Convert back to the Contrasts model
    unique_contrasts_model = Contrasts(**contrasts_dict)

    return unique_contrasts_model

In [79]:
contrasts = await identify_contrasts_multiple(meta, num_queries=3)
print(contrasts)

contrasts=[Assessment(name='DMSO vs RGFP966 Treatment in WT Cells', column='treatment:ch1', values='DMSO,RGFP966 (5 µM)', justification='Comparing gene expression between DMSO control and RGFP966 treatment in wild-type (WT) cells will help to characterize the effects of RGFP966 on diffuse large B-cell lymphoma. This analysis is critical for understanding the mechanisms through which this drug may affect tumor behavior and cell signaling.'), Assessment(name='WT vs GNAS Knockout under DMSO Treatment', column='genotype:ch1', values='WT,GNAS knockout', justification='This contrast provides insights into how the genotype influences the response to the DMSO treatment. Understanding any differential expression could elucidate the biological relevance of GNAS in diffuse large B-cell lymphoma.'), Assessment(name='Effect of Treatment (DMSO vs. RGFP966) in WT Cells', column='treatment:ch1', values='DMSO, RGFP966 (5 µM)', justification='Comparing the differential expression in wild-type cells trea
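
The printed result shows why the exact-match key lets duplicates through: `'DMSO,RGFP966 (5 µM)'` and `'DMSO, RGFP966 (5 µM)'` differ only by a space. A minimal sketch of a looser key follows, working on plain dictionaries such as those in `contrasts_dict['contrasts']`. The helper names (`normalise_values`, `contrast_key`, `deduplicate`) are hypothetical, and treating `;` as the column separator and `,` as the value separator is an assumption based on this one run.

```python
import re

def normalise_values(values: str) -> tuple[str, ...]:
    """Split a comma-separated values string and normalise whitespace and case."""
    # Split only on commas outside parentheses, so grouped values such as
    # "(WT, DMSO), (WT, RGFP966 (5 µM))" stay together
    parts, depth, current = [], 0, ""
    for char in values:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += char
    parts.append(current)
    cleaned = (re.sub(r"\s+", " ", part).strip().lower() for part in parts)
    # Sorting makes the key order-insensitive ("A, B" equals "B, A")
    return tuple(sorted(part for part in cleaned if part))

def contrast_key(item: dict) -> tuple:
    """Build a comparison key from the column(s) and values of one contrast."""
    columns = tuple(sorted(c.strip().lower() for c in item["column"].split(";")))
    return columns, normalise_values(item["values"])

def deduplicate(items: list[dict]) -> list[dict]:
    """Keep the first contrast seen for each normalised key."""
    seen, unique = set(), []
    for item in items:
        key = contrast_key(item)
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique
```

Like the exact-match version, this key only looks at `column` and `values`. Two contrasts that differ only in the context given in their name (e.g. "in WT cells") would still collapse into one, so it is a pre-filter rather than a replacement for the LLM-based check.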

In [115]:
class ComparisonEval(BaseModel):
    comparison: str
    score: int
    score_justification: str
    redundant: Literal["Yes", "No"]
    redundant_justification: str
    retain: Literal["Yes", "No"]

class AllEvals(BaseModel):
    evals: list[ComparisonEval]

prompt = f"""

### PURPOSE AND IDENTITY

You are an expert and experienced bioinformatician and scientist, who focuses on clarifying analyses which will be meaningful to perform. 

You have been tasked with evaluating the potential scientific value of proposed comparisons. These comparisons are intended to be performed in an edgeR/limma-based RNA-seq pipeline.

Take a deep breath, and carefully follow the below steps to achieve the best possible outcome.

### STEPS 

1. You will be provided a Python dictionary of scientific comparisons which have been proposed for a limma/edgeR RNAseq pipeline.
- Do not propose any additional scientific comparisons beyond those specified in this Python dictionary
- Throughout your evaluation, keep in mind that the analysis will be based on the construction of a contrast matrix, using the values specified in the column and values.
2. You will also be provided metadata, which contains data that is mentioned in the Python dictionary. 
- Do NOT use this metadata to hallucinate additional comparisons
- Use this metadata ONLY to gather additional context for the defined scientific comparisons.
3. For each proposed analysis, assign a score between 1 - 5, based on the scientific value that can be extracted out of the comparison. Do this independently for each comparison. Use the below as a scoring guide:

Score 5 – Outstanding Scientific Value

	•	The proposed comparison is highly relevant and addresses a significant scientific question or hypothesis.
	•	The comparison is likely to yield new and impactful insights that could lead to meaningful advancements in the field.
	•	The analysis is well-aligned with the biological context provided by the metadata and is expected to generate robust, interpretable results.
	•	The comparison is novel or provides a unique perspective that has not been previously explored.

Score 4 – High Scientific Value

	•	The proposed comparison is scientifically sound and addresses an important question.
	•	The analysis has the potential to contribute valuable insights, though it may be incremental rather than groundbreaking.
	•	The comparison is well-supported by the metadata and is expected to produce meaningful results.
	•	The comparison adds depth to existing knowledge but may not be entirely novel.

Score 3 – Moderate Scientific Value

	•	The proposed comparison is reasonable and could yield useful information.
	•	The analysis addresses a relevant question, though the scientific impact may be limited or somewhat unclear.
	•	The comparison is supported by the metadata but may not be as compelling or novel as higher-scoring comparisons.
	•	The results may be interesting but are likely to confirm existing knowledge rather than provide new insights.

Score 2 – Low Scientific Value

	•	The proposed comparison is somewhat relevant but does not address a particularly important or novel question.
	•	The analysis may yield some useful data, but the scientific impact is expected to be minimal.
	•	The comparison is only partially supported by the metadata, and the results may be difficult to interpret or have limited applicability.
	•	The comparison may be redundant with existing analyses or provide only marginal additional insights.

Score 1 – Minimal or No Scientific Value

	•	The proposed comparison is poorly conceived and unlikely to yield meaningful scientific insights.
	•	The analysis does not address a relevant or important question, or the rationale for the comparison is unclear.
	•	The comparison is not well-supported by the metadata, and the results are likely to be uninterpretable or irrelevant.
	•	The comparison may be redundant, trivial, or based on a flawed premise.

4. For each comparison, also identify if it is redundant and/or overlapping with another comparison.
- An example of this is identical "column" and "values" (e.g. column of "A" and values of "val1, val2" as compared to "val2, val1" or "val1 - val2")
- **Important** Note that if comparison 1 is redundant with comparison 2, BOTH comparisons 1 and 2 should be marked as redundant.
- Comparisons which are similar, but not overlapping, should not be classed as redundant
- Only classify comparisons as redundant if you are highly confident that they are redundant
- Keep in mind the analysis will be based on an edgeR/limma/DESeq2 pipeline - if two analyses are likely to require the identical experimental setup, these are redundant.
- After evaluating all comparisons for redundancy, double check whether the intended response for any other comparison needs to be altered accordingly.
5. Based on your score evaluation and redundancy evaluation, make an evaluation as to whether each comparison should be retained. 
- When there are redundant comparisons, ONLY the comparison with the higher scientific value score should be retained
- If redundant comparisons have the same scientific value score, then only retain one.
6. Prior to reporting results, double check that your responses are reasonable, and you have followed the steps correctly.
7. Report your results in accordance with the instructions in OUTPUT.

### OUTPUT

1. Include output for all proposed comparisons. Use the comparison name to describe each comparison.
2. Specify the scientific evaluation score
3. Include justification for the scientific evaluation score
4. Specify if the comparison is redundant
5. If redundant, justify why it is redundant. If not redundant, specify "Not redundant" for this field.
- The justification for selecting which of the redundant comparisons, if any, should be specified here.
6. Specify if the comparison should be retained or not

### PROPOSED SCIENTIFIC ANALYSES

{contrasts}

### METADATA

{meta.to_string()}
"""

chat_completion = client.beta.chat.completions.parse(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-4o-mini",
    response_format=AllEvals
)
result = chat_completion.choices[0].message.parsed
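
The prompt describes its input as a Python dictionary, but `{contrasts}` interpolates the pydantic repr seen in the earlier printed output (`contrasts=[Assessment(name=...`). Whether this affects the evaluation is untested. If a stricter format is wanted, the contrasts could be serialised first, as sketched below on plain dictionaries (e.g. `contrasts.dict()['contrasts']`). `contrasts_to_prompt_json` is a hypothetical helper name.

```python
import json

def contrasts_to_prompt_json(contrast_dicts: list[dict]) -> str:
    """Serialise contrasts as indented JSON for interpolation into a prompt."""
    # ensure_ascii=False keeps characters such as "µ" readable rather than escaped
    return json.dumps(contrast_dicts, indent=2, ensure_ascii=False)

example = [
    {
        "name": "DMSO vs RGFP966 Treatment in WT Cells",
        "column": "treatment:ch1",
        "values": "DMSO,RGFP966 (5 µM)",
    }
]
print(contrasts_to_prompt_json(example))
```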

In [116]:
df = contrasts.dict()
df = pd.DataFrame(df['contrasts'])
df

Unnamed: 0,name,column,values,justification
0,DMSO vs RGFP966 Treatment in WT Cells,treatment:ch1,"DMSO,RGFP966 (5 µM)",Comparing gene expression between DMSO control and RGFP966 treatment in wild-type (WT) cells will help to characterize the effects of RGFP966 on diffuse large B-cell lymphoma. This analysis is critical for understanding the mechanisms through which this drug may affect tumor behavior and cell signaling.
1,WT vs GNAS Knockout under DMSO Treatment,genotype:ch1,"WT,GNAS knockout",This contrast provides insights into how the genotype influences the response to the DMSO treatment. Understanding any differential expression could elucidate the biological relevance of GNAS in diffuse large B-cell lymphoma.
2,Effect of Treatment (DMSO vs. RGFP966) in WT Cells,treatment:ch1,"DMSO, RGFP966 (5 µM)","Comparing the differential expression in wild-type cells treated with DMSO versus RGFP966 will help to elucidate the treatment effects of RGFP966 on gene expression patterns in diffuse large B-cell lymphoma cells, contributing to understanding its therapeutic efficacy."
3,Effect of Genotype (WT vs. GNAS Knockout) in DMSO Treatment,genotype:ch1,"WT, GNAS knockout",Examining the differences in gene expression between wild-type and GNAS knockout cells under DMSO treatment will provide insights into the role of GNAS in cancer progression and response to treatment.
4,Comparison Between WT and GNAS Knockout under RGFP966 Treatment,genotype:ch1; treatment:ch1,"(WT, RGFP966 (5 µM)), (GNAS knockout, RGFP966 (5 µM))",Understanding the impact of RGFP966 on gene expression in both WT and GNAS knockout cells will be crucial to reveal mechanisms of drug action and resistance in cancer therapy.
5,Comparison of Gene Expression in Two Treatments across Genotypes,treatment:ch1; genotype:ch1,"(WT, DMSO), (WT, RGFP966 (5 µM)), (GNAS knockout, DMSO), (GNAS knockout, RGFP966 (5 µM))","By contrasting gene expression patterns across multiple conditions (treatment and genotype), we can gain a comprehensive view of how these factors interact to influence outcomes, potentially leading to better therapeutic strategies."
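
Since the prompt asks for exact column names, and some contrasts above list several columns in one string (e.g. `genotype:ch1; treatment:ch1`), a cheap check against the metadata could catch invented columns early. This is a sketch only: `unknown_columns` is a hypothetical helper, and the `;` separator is what this run happened to produce, not a guaranteed format.

```python
def unknown_columns(contrast_column: str, metadata_columns: list[str]) -> list[str]:
    """Return the column names in a contrast that are absent from the metadata."""
    # Assumes multiple columns are separated by ";", as observed in one run
    requested = [c.strip() for c in contrast_column.split(";") if c.strip()]
    known = set(metadata_columns)
    return [c for c in requested if c not in known]

columns = ["title", "genotype:ch1", "treatment:ch1"]
print(unknown_columns("genotype:ch1; treatment:ch1", columns))
print(unknown_columns("genotype; treatment:ch1", columns))
```

With the objects in this notebook, the call would look something like `unknown_columns(row["column"], list(meta.columns))` for each row of the contrasts table.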


In [117]:
df = result.dict()
df = pd.DataFrame(df['evals'])
df

Unnamed: 0,comparison,score,score_justification,redundant,redundant_justification,retain
0,DMSO vs RGFP966 Treatment in WT Cells,5,"This comparison directly investigates the impact of RGFP966 on gene expression compared to DMSO in wild-type cells, addressing a vital question about treatment efficacy in diffuse large B-cell lymphoma and exploring potential mechanisms of drug action.",No,Not redundant,Yes
1,WT vs GNAS Knockout under DMSO Treatment,4,"The analysis investigates the differential response of wild-type and GNAS knockout cells to DMSO, providing critical insights into the role of GNAS in modulating treatment responses, which could aid in understanding cancer biology.",No,Not redundant,Yes
2,Effect of Treatment (DMSO vs. RGFP966) in WT Cells,5,"This comparison also evaluates the treatment effects of RGFP966 compared to DMSO, reinforcing findings from the first comparison and potentially elucidating gene expression patterns in a crucial disease context.",Yes,This is redundant with 'DMSO vs RGFP966 Treatment in WT Cells' as both analyses investigate the same comparison of treatments in WT cells.,Yes
3,Effect of Genotype (WT vs. GNAS Knockout) in DMSO Treatment,4,"By comparing expression differences between genotypes under DMSO treatment, this analysis provides valuable insights into cancer progression linked to GNAS, complementing other genetic studies in diffuse large B-cell lymphoma.",No,Not redundant,Yes
4,Comparison Between WT and GNAS Knockout under RGFP966 Treatment,5,"This comparison is critical to understand how RGFP966 facilitates gene expression changes in different genetic backgrounds, potentially revealing mechanisms of drug resistance and action that are vital for therapeutic strategies.",No,Not redundant,Yes
5,Comparison of Gene Expression in Two Treatments across Genotypes,5,"This comprehensive comparison assesses interaction effects between treatment and genotype, providing a broad view on expression changes that can lead to effective therapeutic strategies in diffuse large B-cell lymphoma.",No,Not redundant,Yes


I'm not entirely satisfied with the outcome so far (mainly with the inability to identify redundant contrasts). However, the contrasts it is identifying do seem of interest and, I must admit, are better than the single one I came up with in my initial testing.

I noted several instances of hallucination, particularly the model imagining contrasts that I did not specify. I've tried to stamp these out... a bit concerningly, these were sometimes marked as "retain".
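
Imagined contrasts can also be caught in code by comparing the names the evaluator returns with the names that were actually proposed. A minimal sketch, assuming the evaluator copies names verbatim (a paraphrased name would be flagged as well); `check_eval_names` is a hypothetical helper.

```python
def check_eval_names(proposed_names: list[str], evaluated_names: list[str]) -> dict[str, list[str]]:
    """Compare evaluated comparison names against the proposed contrast names."""
    proposed = set(proposed_names)
    evaluated = set(evaluated_names)
    return {
        # Names the evaluator reported that were never proposed
        "hallucinated": sorted(evaluated - proposed),
        # Proposed contrasts that received no evaluation
        "missing": sorted(proposed - evaluated),
    }

report = check_eval_names(
    ["WT vs KO", "DMSO vs RGFP966"],
    ["WT vs KO", "KO2 vs KO3"],
)
print(report)
```

With the models defined above, the inputs would be `[c.name for c in contrasts.contrasts]` and `[e.comparison for e in result.evals]`.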

My plan is to leave this as is for the moment. When I begin working more explicitly on the prompt that develops the code, I think I might just include another check: "would the code be functionally identical? -> if yes, ignore". This might be sufficient.