# DSPy
DSPy is a framework for algorithmically optimizing LM prompts and weights. It can help you define your tasks more precisely and optimize your prompts for your use case. See the [docs](https://dspy-docs.vercel.app/docs/intro) for more details.

> 💡 DSPy stands for **D**eclarative **S**elf-improving Language Programs, **py**thonically.

In [2]:
import pandas as pd
from fastembed import TextEmbedding
from qdrant_client import QdrantClient
from qdrant_client.http import models
from tqdm import tqdm
from datasets import load_dataset

from typing import Optional
import os
import random

In [3]:
import dspy
from dspy.utils import dotdict

### Load the Training and Testing Data

In [4]:
corpora = load_dataset("nirantk/geneticsQA-corpus", split="train").to_pandas()

In [5]:
train = load_dataset("nirantk/geneticsQA-train", split="train").to_pandas()

Each question in the training data is associated with a `ground_truth` label. We will use these labels to train our model and optimize the prompts.

In [6]:
pd.set_option("display.max_colwidth", 500)
train_data = train[["question", "contexts", "ground_truth"]]
train_data

Unnamed: 0,question,contexts,ground_truth
0,What is Snord116?,"['Further analysis with array-CGH identified a mosaic 847\u2009kb deletion in 15q11-q13, including SNURF-SNRPN, the snoRNA gene clusters SNORD116 (HBII-85), SNORD115, (HBII-52), SNORD109 A and B (HBII-438A and B), SNORD64 (HBII-13), and NPAP1 (C15ORF2).', 'All three deletions included SNORD116, but only two encompassed parts of SNURF-SNRPN, implicating SNORD116 as the major contributor to the Prader-Willi phenotype. Our case adds further information about genotype-phenotype correlation and s...","['SNORD116 is a small nucleolar (sno) RNA gene cluster (HBII-85) implicated as a major contributor the Prader-Willi phenotype. \nSNORD116 genes appears to be responsible for the major features of PWS. \nSNORD116 is a paternally expressed box C/D snoRNA gene cluster.\nThe mouse C/D box snoRNA MBII-85 (SNORD116) is processed into at least five shorter RNAs using processing sites near known functional elements of C/D box snoRNAs.\nSnord116 expression in the medial hypothalamus, particularly wit..."
1,Are ultraconserved elements often transcribed?,"['Starting from a genome-wide expression profiling, we demonstrate for the first time a functional link between oxygen deprivation and the modulation of long noncoding transcripts from ultraconserved regions, termed transcribed-ultraconserved regions (T-UCRs)', 'Our data gives a first glimpse of a novel functional hypoxic network comprising protein-coding transcripts and noncoding RNAs (ncRNAs) from the T-UCRs category', 'Highly conserved elements discovered in vertebrates are present in non...","['Yes. Especially, a large fraction of non-exonic UCEs is transcribed across all developmental stages examined from only one DNA strand.']"
2,List metalloenzyme inhibitors.,"[' Clinically approved inhibitors were selected as well as several other reported metalloprotein inhibitors in order to represent a broad range of metal binding groups (MBGs), including hydroxamic acid, carboxylate, hydroxypyridinonate, thiol, and N-hydroxyurea functional groups.', 'A total of 21 different raltegravir-chelator derivative (RCD) compounds were prepared that differed only in the nature of the MBG. ', 'At least two compounds (RCD-4, RCD-5) containing a hydroxypyrone MBG were fou...",['Foscarnet\nVT-1129\nVT-1161 \nBB-3497\nhydroxamate molecules\nsiderophores']
3,"Which protein phosphatase has been found to interact with the heat shock protein, HSP20?","[' Moreover, protein phosphatase-1 activity is regulated by two binding partners, inhibitor-1 and the small heat shock protein 20, Hsp20. Indeed, human genetic variants of inhibitor-1 (G147D) or Hsp20 (P20L) result in reduced binding and inhibition of protein phosphatase-1, suggesting aberrant enzymatic regulation in human carriers. ', 'Small heat shock protein 20 interacts with protein phosphatase-1 and enhances sarcoplasmic reticulum calcium cycling.', ' Hsp20 overexpression in intact anim...","['Protein phosphatase-1 activity is regulated by two binding partners, inhibitor-1 and the small heat shock protein 20, Hsp20. Cell fractionation, coimmunoprecipitation, and coimmunolocalization studies, revealed an association between Hsp20 and PP1. Small heat shock protein 20 interacts with protein phosphatase-1 and enhances sarcoplasmic reticulum calcium cycling.', 'Moreover, protein phosphatase-1 activity is regulated by two binding partners, inhibitor-1 and the small heat shock protein ..."
4,Do DNA double-strand breaks play a causal role in carcinogenesis?,"['The DNA non-homologous end-joining repair gene XRCC6/Ku70 plays an important role in the repair of DNA double-strand breaks (DSBs) induced by both exogenous and endogenous DNA-damaging agents. Defects in overall DSB repair capacity can lead to genomic instability and carcinogenesis.', 'The tumor suppressor breast cancer susceptibility protein 1 (BRCA1) protects our cells from genomic instability in part by facilitating the efficient repair of DNA double-strand breaks (DSBs). BRCA1 promotes...","['Yes. It has been demonstrated that induction of DNA double-strand breaks (DSBs) and defects in overall DSBs repair capacity can lead to an accumulation of mutations, resulting in genomic instability of cells. Given that genomic instability is the hallmark of cancer, DSBs play a causal role in carcinogenesis.']"
...,...,...,...
1654,Is thyroid hormone therapy indicated in patients with heart failure?,"['Patients with chronic heart failure and subclinical hypothyroidism significantly improved their physical performance when normal TSH levels were reached.', 'Early and sustained physiological restoration of circulating L-T3 levels after MI halves infarct scar size and prevents the progression towards heart failure. This beneficial effect is likely due to enhanced capillary formation and mitochondrial protection.', 'These data indicate that T(3) replacement to euthyroid levels improves systo...",['There are several experimental and clinical evidences of the potential benefits of Thyroid hormone replacement therapy in heart failure. Initial clinical data showed also a good safety profile and tolerance of TH replacement therapy in patients withheart failure. \nHowever currently there is no indication to treat patients with heart failure withTHreplacementtherapy.']
1655,Is protein Fbw7 a SCF type of E3 ubiquitin ligase?,"['FBW7 (F-box and WD repeat domain-containing 7) is the substrate recognition component of an evolutionary conserved SCF (complex of SKP1, CUL1 and F-box protein)-type ubiquitin ligase.', 'However, very few E3 ubiquitin ligases are known to target G-CSFR for ubiquitin-proteasome pathway. Here we identified F-box and WD repeat domain-containing 7 (Fbw7), a substrate recognizing component of Skp-Cullin-F box (SCF) E3 ubiquitin Ligase physically associates with G-CSFR and promotes its ubiquitin...","['Fbxw7 (also known as Fbw7, SEL-10, hCdc4, or hAgo) is the F-box protein subunit of an Skp1-Cul1-F-box protein (SCF)-type ubiquitin ligase complex that plays a central role in the degradation of Notch family members.The F-box protein Fbw7 (also known as Fbxw7, hCdc4 and Sel-10) functions as a substrate recognition component of a SCF-type E3 ubiquitin ligase. SCF(Fbw7) facilitates polyubiquitination and subsequent degradation of various proteins such as Notch, cyclin E, c-Myc and c-Jun.', 'T..."
1656,Is Annexin V an apoptotic marker?,"['The apoptosis of the MSCs was induced by subjecting the cells to OGD conditions for 4 h and was detected by Annexin V/PI and Hoechst 33258 staining. ', 'In addition to the antimicrobial activity, we found that treatment of the cancer cell lines, Jurkat T-cells, Granta cells, and melanoma cells, with the Pseudomonas sp. In5 crude extract increased staining with the apoptotic marker Annexin V while no staining of healthy normal cells, i.e., naïve or activated CD4 T-cells, was observed.', 'At...","['Yes, annexin V is an early apoptotic marker.', 'Yes, Annexin V is an apoptotic marker.']"
1657,Which are the clinical characteristics of Tuberous Sclerosis?,"['Prevalence and long-term outcome of epilepsy in tuberous sclerosis complex (TSC) is reported to be variable', 'Subependymal giant cell astrocytomas (SEGAs) are benign tumors, most commonly associated with tuberous sclerosis complex (TSC).', 'Lymphangioleiomyomatosis (LAM) is a rare, progressive, frequently lethal cystic lung disease that almost exclusively affects women.', 'Rhabdomyoma is the most common type of cardiac tumor in fetuses and is often associated with tuberous sclerosis compl...","['The clinical characteristics of Tuberous Sclerosis include epilepsy, subependymal giant cell astrocytomas, lymphangioleiomyomatosis, rhabdomyoma, renal angiomyolipomas, cortical tubers, neurofibromas, angiofibromas, mental retardation, and behavioral disorders.']"


In [7]:
from sklearn.model_selection import train_test_split

train_data, test_data = train_test_split(train_data, test_size=0.2, random_state=42)

### Upload Contexts to Qdrant Vector Store

In [8]:
corpora.head()

Unnamed: 0,text
0,"Both 7SL genes and Alu elements are transcribed by RNA polymerase III, and we show here that the internal 7SL promoter lies within the Alu-like part of the 7SL gene"
1,"We performed a comparative analysis in vitro and in vivo of the antitumor effects of three different antibodies targeting different epitopes of ErbB2: Herceptin (trastuzumab), 2C4 (pertuzumab) and Erb-hcAb (human anti-ErbB2-compact antibody), a novel fully human compact antibody produced in our laboratory. Herein, we demonstrate that the growth of both androgen-dependent and independent prostate cancer cells was efficiently inhibited by Erb-hcAb. The antitumor effects induced by Erb-hcAb on ..."
2,"The weight-reducing property of molindone, a recently introduced antipsychotic drug, was tested in 9 hospitalized chronic schizophrenic patients. There was an average weight loss of 7.6 kg after 3 months on molindone; most of the loss occurred during the first month."
3,"Our study identifies a unique heterochromatin state marked by the presence of both H3.3 and H3K9me3, and establishes an important role for H3.3 in control of ERV retrotransposition in embryonic stem cells."
4,"Polyneuropathy, organomegaly, endocrinopathy, monoclonal gammopathy, and skin changes (POEMS) syndrome is an uncommon condition related to a paraneoplastic syndrome secondary to an underlying plasma cell disorder."


# FOR DEMONSTRATION PURPOSES, WE ARE USING A VERY SMALL SUBSET OF THE DATASET

In [9]:
corpora = corpora.sample(100)
train_data = train_data.sample(100) # few shot pairs
test_data = test_data.sample(20)

In [10]:
embedding_model = TextEmbedding("BAAI/bge-base-en-v1.5")
qdrant_client = QdrantClient(
    ":memory:"
)  # spin up a local instance if you require more advanced features
# qdrant_client = QdrantClient("http://localhost:6333") # uncomment if you want to use your local instance

if qdrant_client.collection_exists("rag_contexts"):
    qdrant_client.delete_collection("rag_contexts")

qdrant_client.create_collection(
    "rag_contexts",
    vectors_config=models.VectorParams(size=768, distance=models.Distance.COSINE),
)

Fetching 5 files:   0%|          | 0/5 [00:00<?, ?it/s]

True

In [11]:
# Create and upload points to Qdrant
points = []
for idx, row in tqdm(corpora.iterrows(), total=corpora.shape[0]):
    point = models.PointStruct(
        id=idx,  # Use the dataframe index as the point ID
        vector=list(embedding_model.embed(row["text"]))[
            0
        ],  # embed() yields one vector per input text; take the first
        payload={"id": idx, "text": row["text"]},  # Store the passage text as the payload
    )
    points.append(point)
qdrant_client.upload_points(collection_name="rag_contexts", points=points)

100%|██████████| 100/100 [00:03<00:00, 25.73it/s]
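
The collection above is configured with cosine distance, so a query returns the stored vectors that point in the direction most similar to the query embedding. The following is a minimal brute-force sketch of that ranking in plain Python. The three-dimensional vectors and the `top_k` helper are hypothetical stand-ins; the real embeddings have 768 dimensions, and Qdrant's own search does not work by exhaustive scan like this.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors,
        key=lambda point_id: cosine_similarity(query, vectors[point_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy "embeddings" (hypothetical values, for illustration only).
toy_vectors = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.05, 0.0], toy_vectors, k=2)
print(nearest)
```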


### Custom Retriever That Searches the Contexts in the Qdrant Vector Store

In [12]:
# use any embedding model
def generate_embeddings(text):
    return list(embedding_model.embed(text))[0]


class QdrantRetriever(dspy.Retrieve):
    def __init__(self, qdrant_collection_name, qdrant_client, k=10):
        super().__init__(k=k)
        self.client = qdrant_client
        self.collection_name = qdrant_collection_name

    def forward(self, query, k: Optional[int] = None):
        # Generate embedding for the query
        query_embedding = generate_embeddings(query)
        search_results = self.client.search(
            collection_name=self.collection_name,
            query_vector=query_embedding,
            limit=k if k else self.k,
        )
        passages = [result.payload["text"] for result in search_results]
        passages = [dotdict({"long_text": passage}) for passage in passages]
        return passages
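
The retriever does two small things around the vector search: it decides how many results to request (a per-call `k`, falling back to the value set at construction time), and it wraps each passage so that it exposes a `long_text` attribute. Here is a library-free sketch of just those two steps. `Passage`, `resolve_k`, and `wrap_passages` are hypothetical names, and `Passage` only imitates the attribute access that `dotdict` provides.

```python
from typing import Optional


class Passage(dict):
    """Tiny stand-in for dotdict: a dict whose keys are readable as attributes."""

    __getattr__ = dict.get


def resolve_k(call_k: Optional[int], default_k: int) -> int:
    """Prefer the per-call k; fall back to the retriever's configured default."""
    return call_k if call_k is not None else default_k


def wrap_passages(texts, k):
    """Keep the first k texts and wrap each one as {"long_text": ...}."""
    return [Passage(long_text=text) for text in texts[:k]]


hits = ["passage one", "passage two", "passage three"]
wrapped = wrap_passages(hits, resolve_k(None, default_k=2))
print([p.long_text for p in wrapped])
```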

In [13]:
openai_api_key = os.environ["OPENAI_API_KEY"]

In [14]:
turbo = dspy.OpenAI(model="gpt-4o", api_key=openai_api_key, max_tokens=1000)
rm = QdrantRetriever("rag_contexts", qdrant_client)

# Configure DSPy with a retrieval model (RM) and a language model (LM)
dspy.settings.configure(lm=turbo, rm=rm)

In [15]:
sample = test_data["question"].iloc[0]
dspy.Retrieve(k=10)(sample).passages

['Valaciclovir (Valtrex), the L-valyl ester of acyclovir, is undergoing clinical development for the treatment and suppression of herpesviral diseases.',
 'Improved pain, physical functioning and health status in patients with rheumatoid arthritis treated with CP-690,550, an orally active Janus kinase (JAK) inhibitor: results from a randomised, double-blind, placebo-controlled trial.',
 'Mutations in the serine protease inhibitor Kazal type 5 (SPINK5) gene leading to lymphoepithelial Kazal-type-related inhibitor (LEKTI) deficiency cause NS.',
 'The authors conclude that rosiglitazone can be safely administered with metformin and, due to the different mechanisms of action of these agents, may offer a therapeutic advantage in patients with type 2 diabetes mellitus.',
 'The human OX2 receptor (OX2R) belongs to the β branch of the rhodopsin family of GPCRs, and can bind to diverse compounds including the native agonist peptides orexin-A and orexin-B and the potent therapeutic inhibitor suv

### Signature Definition for the Q/A System

In [16]:
# Define the signature for the QA system
class GenerateAnswer(dspy.Signature):
    """Answer questions based on the context."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField()
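
A signature is a declarative description of inputs and outputs; the prompt text is derived from it. The sketch below shows one way a docstring and a list of fields could be turned into the kind of format block that `inspect_history` prints in this notebook. It is only an illustration under that assumption: `render_format` is a hypothetical helper, not DSPy's implementation, and it omits the `Reasoning:` line that `ChainOfThought` adds.

```python
def render_format(instructions, fields):
    """Render instructions plus (name, description) pairs as a prompt skeleton."""
    lines = [instructions, "", "---", "", "Follow the following format.", ""]
    for name, desc in fields:
        # Fields without a description fall back to a ${name} placeholder.
        placeholder = desc if desc else "${" + name + "}"
        lines.append(f"{name.capitalize()}: {placeholder}")
        lines.append("")
    return "\n".join(lines).rstrip()


prompt_skeleton = render_format(
    "Answer questions based on the context.",
    [
        ("context", "may contain relevant facts"),
        ("question", None),
        ("answer", None),
    ],
)
print(prompt_skeleton)
```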

### RAG (Retrieval-Augmented Generation) Pipeline Creation

In [17]:
# Define a Custom RAG Pipeline
class RAG(dspy.Module):
    def __init__(self, collection_name="rag_contexts", num_passages=10):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
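
The control flow of the module is simply "retrieve, then generate". The following sketch reproduces that flow with plain functions so it can be run without an API key. `fake_retrieve` and `fake_generate` are hypothetical stand-ins for the retriever and the language model, and `format_context` mimics the numbered `[1] «...»` layout that appears in the prompt history.

```python
def format_context(passages):
    """Number passages as [1] «...», one per line."""
    return "\n".join(f"[{i}] «{text}»" for i, text in enumerate(passages, start=1))


def rag_answer(question, retrieve, generate, k=2):
    """Retrieve k passages, then hand them and the question to the generator."""
    passages = retrieve(question)[:k]
    answer = generate(context=format_context(passages), question=question)
    return {"context": passages, "answer": answer}


# Hypothetical stand-ins; a real pipeline would call the vector store and the LM.
def fake_retrieve(question):
    return ["Passage about topic A.", "Passage about topic B.", "Unrelated passage."]


def fake_generate(context, question):
    return f"(answer to {question!r} using {context.count('«')} passages)"


result = rag_answer("What is topic A?", fake_retrieve, fake_generate)
print(result["answer"])
```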

In [18]:
uncompiled_rag = RAG()

In [19]:
uncompiled_rag(sample)

Prediction(
    context=['Valaciclovir (Valtrex), the L-valyl ester of acyclovir, is undergoing clinical development for the treatment and suppression of herpesviral diseases.', 'Improved pain, physical functioning and health status in patients with rheumatoid arthritis treated with CP-690,550, an orally active Janus kinase (JAK) inhibitor: results from a randomised, double-blind, placebo-controlled trial.', 'Mutations in the serine protease inhibitor Kazal type 5 (SPINK5) gene leading to lymphoepithelial Kazal-type-related inhibitor (LEKTI) deficiency cause NS.', 'The authors conclude that rosiglitazone can be safely administered with metformin and, due to the different mechanisms of action of these agents, may offer a therapeutic advantage in patients with type 2 diabetes mellitus.', 'The human OX2 receptor (OX2R) belongs to the β branch of the rhodopsin family of GPCRs, and can bind to diverse compounds including the native agonist peptides orexin-A and orexin-B and the potent thera

In [20]:
turbo.inspect_history(n=1)





Answer questions based on the context.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
[1] «Valaciclovir (Valtrex), the L-valyl ester of acyclovir, is undergoing clinical development for the treatment and suppression of herpesviral diseases.»
[2] «Improved pain, physical functioning and health status in patients with rheumatoid arthritis treated with CP-690,550, an orally active Janus kinase (JAK) inhibitor: results from a randomised, double-blind, placebo-controlled trial.»
[3] «Mutations in the serine protease inhibitor Kazal type 5 (SPINK5) gene leading to lymphoepithelial Kazal-type-related inhibitor (LEKTI) deficiency cause NS.»
[4] «The authors conclude that rosiglitazone can be safely administered with metformin and, due to the different mechanisms of action of these agents, may offer a therapeutic advantage in patien

### Metrics and Assessment Signatures

In [21]:
metricLM = dspy.OpenAI(
    model="gpt-4o", api_key=openai_api_key, max_tokens=1000, model_type="chat"
)


# Signature for LLM assessments.
class Assess(dspy.Signature):
    """Assess the quality of an answer to a question."""

    context = dspy.InputField(desc="The context for answering the question.")
    assessed_question = dspy.InputField(desc="The evaluation criterion.")
    assessed_answer = dspy.InputField(desc="The answer to the question.")
    correct_answer = dspy.InputField(desc="The correct answer to the question.")
    assessment_answer = dspy.OutputField(
        desc="A rating between 0 and 5. Only output the rating and nothing else."
    )


def llm_metric(gold, pred, trace=None):
    predicted_answer = pred.answer
    gold_question = gold.question
    gold_answer = gold.answer

    detail = "Is the assessed answer detailed?"
    faithful = "Is the assessed text grounded in the context? Say no if it includes significant facts not in the context."
    correctness = f"Compare the given {predicted_answer} and {gold_answer} and assess how correct the answer is"

    with dspy.context(lm=metricLM):
        context = dspy.Retrieve(k=10)(gold_question).passages
        detail = dspy.ChainOfThought(Assess)(
            context="N/A",
            assessed_question=detail,
            assessed_answer=predicted_answer,
            correct_answer=gold_answer,
        )
        faithful = dspy.ChainOfThought(Assess)(
            context=context,
            assessed_question=faithful,
            assessed_answer=predicted_answer,
            correct_answer=gold_answer,
        )
        correctness = dspy.ChainOfThought(Assess)(
            context=context,
            assessed_question=correctness,
            assessed_answer=predicted_answer,
            correct_answer=gold_answer,
        )

    print(f"Faithful: {faithful.assessment_answer}")
    print(f"Detail: {detail.assessment_answer}")
    print(f"Correctness: {correctness.assessment_answer}")

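    # Each rating is between 0 and 5, so the sum ranges from 0 to 15;
    # dividing by 10 means the returned score can exceed 1.0.
    total = (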
        float(detail.assessment_answer)
        + float(faithful.assessment_answer)
        + float(correctness.assessment_answer)
    )
    return total / 10.0
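
Two details of the metric are worth keeping in mind. `float(...)` raises a `ValueError` if the judge model replies with anything other than a bare number, and three ratings on a 0 to 5 scale sum to at most 15. The sketch below is a variant, not the metric used in this notebook: it pulls the first number out of each reply, clamps it to the scale, and normalizes by the maximum possible total so the score stays in [0, 1]. `parse_rating` and `combine_ratings` are hypothetical helpers.

```python
import re


def parse_rating(raw, low=0.0, high=5.0):
    """Extract the first number from a reply and clamp it to [low, high]."""
    match = re.search(r"-?\d+(?:\.\d+)?", str(raw))
    if match is None:
        return low  # treat an unparseable reply as the lowest rating
    return min(high, max(low, float(match.group())))


def combine_ratings(detail, faithful, correctness, max_rating=5.0):
    """Average three ratings and scale the result to the range [0, 1]."""
    ratings = [parse_rating(r, high=max_rating) for r in (detail, faithful, correctness)]
    return sum(ratings) / (max_rating * len(ratings))


score = combine_ratings("1", "5", "Rating: 5")
print(round(score, 3))
```

Whether to normalize by 15 or keep the original divisor is a design choice; scores from the two variants are not comparable with each other.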

The metric above is adapted from the following sources:
- [Reference_1](https://dspy-docs.vercel.app/docs/building-blocks/metrics#intermediate-using-ai-feedback-for-your-metric)
- [Reference_2](https://github.com/weaviate/recipes/blob/main/integrations/dspy/1.Getting-Started-with-RAG-in-DSPy.ipynb)

Let's format the data the way the DSPy modules expect, then use part of it for training and the rest for evaluation.

In [22]:
trainset_dspy = train_data.sample(frac=0.8)
valset_dspy = train_data.drop(trainset_dspy.index)

In [23]:
from ast import literal_eval
import dspy


def read_list_from_string(s):
    try:
        return literal_eval(s)
    except (ValueError, SyntaxError):
        return s.split() if isinstance(s, str) else []


def stringify_list_elements(lst):
    lst = read_list_from_string(lst)
    return " ".join(str(e) for e in lst)


trainset = [
    dspy.Example(
        question=row["question"],
        #  contexts=stringify_list_elements(row['contexts']),
        answer=stringify_list_elements(row["ground_truth"]),
    ).with_inputs("question")
    for i, row in trainset_dspy.iterrows()
]

valset = [
    dspy.Example(
        question=row["question"],
        # contexts=stringify_list_elements(row['contexts']),
        answer=stringify_list_elements(row["ground_truth"]),
    ).with_inputs("question")
    for i, row in valset_dspy.iterrows()
]
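
To see what the two helpers do to a single cell, here is a small worked example. It assumes the `ground_truth` column holds stringified Python lists, as the displayed rows suggest; `flatten_answer` is a hypothetical name that folds both helpers into one function, and the sample string is copied from the Annexin V row shown earlier.

```python
from ast import literal_eval


def flatten_answer(cell):
    """Parse a stringified Python list and join its items with spaces."""
    try:
        items = literal_eval(cell)
    except (ValueError, SyntaxError):
        # Not a valid Python literal: fall back to whitespace splitting.
        items = cell.split() if isinstance(cell, str) else []
    return " ".join(str(item) for item in items)


raw_cell = (
    "['Yes, annexin V is an early apoptotic marker.', "
    "'Yes, Annexin V is an apoptotic marker.']"
)
flat = flatten_answer(raw_cell)
print(flat)

# A plain sentence is not a Python literal, so it passes through the fallback.
plain = flatten_answer("plain text answer")
```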

In [24]:
# For demonstration purposes, keep the devset to 20 examples. Use it wisely: every evaluation and training step triggers API calls.
devset = valset[:20]

In [25]:
from dspy.evaluate.evaluate import Evaluate

evaluate = Evaluate(
    devset=devset, num_threads=8, display_progress=True, display_table=5
)
uncompile_k_10 = RAG(num_passages=10)
uncompiled_10_metrics = evaluate(
    uncompile_k_10, metric=llm_metric, return_all_scores=True, return_outputs=True
)

Average Metric: 0.1 / 1  (10.0):   5%|▌         | 1/20 [00:05<01:37,  5.11s/it]

Faithful: 0
Detail: 1
Correctness: 0


Average Metric: 2.1 / 4  (52.5):  15%|█▌        | 3/20 [00:06<00:46,  2.71s/it]               

Faithful: 0Faithful: 3
Detail: 3
Correctness: 4

Detail: 3
Correctness: 0
Faithful: 5
Detail: 2
Correctness: 0


Average Metric: 2.2 / 5  (44.0):  25%|██▌       | 5/20 [00:06<00:12,  1.19it/s]

Faithful: 0
Detail: 1
Correctness: 0


Average Metric: 2.4000000000000004 / 6  (40.0):  30%|███       | 6/20 [00:06<00:09,  1.47it/s]

Faithful: 0
Detail: 2
Correctness: 0


Average Metric: 3.0000000000000004 / 7  (42.9):  35%|███▌      | 7/20 [00:07<00:08,  1.55it/s]

Faithful: 5
Detail: 1
Correctness: 0


Average Metric: 3.7 / 8  (46.2):  40%|████      | 8/20 [00:07<00:08,  1.45it/s]               

Faithful: 0
Detail: 2
Correctness: 5


Average Metric: 4.5 / 9  (50.0):  45%|████▌     | 9/20 [00:10<00:12,  1.16s/it]

Faithful: 0
Detail: 4
Correctness: 4


Average Metric: 4.7 / 10  (47.0):  50%|█████     | 10/20 [00:11<00:10,  1.07s/it]

Faithful: 0
Detail: 2
Correctness: 0


Average Metric: 5.5 / 12  (45.8):  60%|██████    | 12/20 [00:11<00:05,  1.53it/s]

Faithful: 0
Detail: 1
Correctness: 4
Faithful: 0
Detail: 2
Correctness: 1


Average Metric: 6.1 / 13  (46.9):  65%|██████▌   | 13/20 [00:11<00:03,  1.94it/s]

Faithful: 5
Detail: 1
Correctness: 0


Average Metric: 6.3 / 14  (45.0):  70%|███████   | 14/20 [00:13<00:04,  1.27it/s]

Faithful: 0
Detail: 2
Correctness: 0


Average Metric: 6.999999999999999 / 16  (43.7):  80%|████████  | 16/20 [00:14<00:02,  1.79it/s] 

Faithful: 0
Detail: 1
Correctness: 0
Faithful: 0
Detail: 1
Correctness: 5


Average Metric: 6.999999999999999 / 17  (41.2):  85%|████████▌ | 17/20 [00:14<00:01,  1.80it/s]

Faithful: 0
Detail: 0
Correctness: 0


Average Metric: 7.299999999999999 / 18  (40.6):  90%|█████████ | 18/20 [00:16<00:01,  1.21it/s]

Faithful: 0
Detail: 3
Correctness: 0


Average Metric: 8.2 / 19  (43.2):  95%|█████████▌| 19/20 [00:16<00:00,  1.35it/s]              

Faithful: 0
Detail: 4
Correctness: 5


Average Metric: 9.299999999999999 / 20  (46.5): 100%|██████████| 20/20 [00:19<00:00,  1.05it/s]

Faithful: 5
Detail: 1
Correctness: 5
Average Metric: 9.299999999999999 / 20  (46.5%)





Unnamed: 0,question,example_answer,context,pred_answer,llm_metric
0,Are there clinical trials using stem cells for the treatment of cardiac disease?,"Yes, there exists clinical trials for cardiac stem cell based treatment. Yes, there are several clinical trials on the use of stem cells for the...","['Recent clinical trials have achieved favorable initial endpoints with improvements in cardiac function and clinical symptoms following cellular therapy.', 'These results suggest that long-term T4...","Yes, there are clinical trials using stem cells for the treatment of cardiac disease.",1.0
1,Is exon skipping correlated with exon circularization?,"Yes. Circularization of exons is widespread and correlates with exon skipping, a feature that adds considerably to the regulatory complexity of the human transcriptome.","['ChIP-sequencing data mapped onto skipped exon events reveal a correlation between histone H3K36 trimethylation peaks and skipped exons, suggesting epigenetic marks being part of alternative...",The context does not provide information on whether exon skipping is correlated with exon circularization.,0.1
2,Which is the main difference between Alu and B1 repeats?,B1 is a murine homolog of the human SINE Alu. B1 (Alu-equivalent) is a murine short interspersed element whose amplification probably involved an RNA intermediate....,"['the Tsix gene (antisense counterpart of Xist)', 'ChIP-sequencing data mapped onto skipped exon events reveal a correlation between histone H3K36 trimethylation peaks and skipped exons,...","The provided contexts do not contain information about Alu and B1 repeats, so the main difference between them cannot be determined from the given data.",0.6
3,Is there a crystal structure of Greek Goat Encephalitis?,Based on results no crustal structure of Greek Goat Encephalitis found.,"['experimental autoimmune encephalomyelitis (EAE), an animal model of multiple sclerosis (MS)', 'Swine dysentery (SD) is a mucohaemorrhagic colitis of pigs resulting from infection of the...",There is no information provided about the crystal structure of Greek Goat Encephalitis in the given context.,0.3
4,Is apremilast effective for psoriasis?,"Yes, apremilast is effective for treatment of psoriasis.","[""NCLUSIONS: Vedolizumab-treated patients with active Crohn's disease were more likely than patients receiving placebo to have a remission, but not a CDAI-100 response, at week...",The context does not provide information on the effectiveness of apremilast for psoriasis.,0.2


In [26]:
def create_score_dataframe(eval_output):
    # Extract questions and answers from the examples
    questions = [ex[0].question for ex in eval_output]
    answers = [ex[1].answer for ex in eval_output]
    scores = [ex[2] for ex in eval_output]
    # Create a DataFrame with questions, answers, and scores
    score_dataframe = pd.DataFrame(
        {"question": questions, "predicted_answer": answers, "score": scores}
    )
    return score_dataframe


In [27]:
pd.set_option("display.max_colwidth", 500)
pd.set_option("display.max_rows", 500)
eval_outs = uncompiled_10_metrics[1]
eval_outs_df = create_score_dataframe(eval_outs)
print(f"Mean Score for the devset is {eval_outs_df['score'].mean()}")
eval_outs_df

Mean Score for the devset is 0.46499999999999997


Unnamed: 0,question,predicted_answer,score
0,Is exon skipping correlated with exon circularization?,The context does not provide information on whether exon skipping is correlated with exon circularization.,0.1
1,Are there clinical trials using stem cells for the treatment of cardiac disease?,"Yes, there are clinical trials using stem cells for the treatment of cardiac disease.",1.0
2,Is there a crystal structure of Greek Goat Encephalitis?,There is no information provided about the crystal structure of Greek Goat Encephalitis in the given context.,0.3
3,What role does CRD-BP play in protecting c-myc mRNA?,The provided context does not contain information about the role of CRD-BP in protecting c-myc mRNA.,0.7
4,List components of the CRSP/Med complex.,The context does not provide information about the components of the CRSP/Med complex.,0.1
5,Is apremilast effective for psoriasis?,The context does not provide information on the effectiveness of apremilast for psoriasis.,0.2
6,Which is the main difference between Alu and B1 repeats?,"The provided contexts do not contain information about Alu and B1 repeats, so the main difference between them cannot be determined from the given data.",0.6
7,what is the role of MEF-2 in cardiomyocyte differentiation?,The provided context does not contain information about the role of MEF-2 in cardiomyocyte differentiation.,0.7
8,Could hypophosphatemic rickets cause craniosynostosis?,"Yes, hypophosphatemic rickets could potentially cause craniosynostosis due to its impact on bone development and metabolism.",0.8
9,Which antibodies cause Riedel thyroiditis?,The context does not provide information about which antibodies cause Riedel thyroiditis.,0.2


### Prompt from the Uncompiled Model

In [28]:
turbo.inspect_history(n=1)





Answer questions based on the context.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
[1] «Congenital cataracts facial dysmorphism neuropathy (CCFDN) syndrome: a novel developmental disorder in Gypsies maps to 18qter.»
[2] «FHX1B mutations in patients with Mowat-Wilson syndrome»
[3] «A familial observation of hypophosphatemic rickets with unusual inheritance and evolution, different from that of X linked hypophosphatemia, is reported.»
[4] «Giant axonal neuropathy (GAN, MIM: 256850) is a devastating autosomal recessive disorder characterized by an early onset severe peripheral neuropathy, varying central nervous system involvement and strikingly frizzly hair. Giant axonal neuropathy is usually caused by mutations in the gigaxonin gene (GAN) but genetic heterogeneity has been demonstrated for a milder variant of this disease

In [29]:
# Let's check the metric LLM prompt as well
metricLM.inspect_history(n=3)





Assess the quality of an answer to a question.

---

Follow the following format.

Context: The context for answering the question.

Assessed Question: The evaluation criterion.

Assessed Answer: The answer to the question.

Correct Answer: The correct answer to the question.

Reasoning: Let's think step by step in order to ${produce the assessment_answer}. We ...

Assessment Answer: A rating between 0 and 5. Only output the rating and nothing else.

---

Context:
[1] «Using computational analysis and exploiting the diversity of teleost genomes, we identified a cluster of highly conserved noncoding sequences surrounding the Six3 gene»
[2] «Long Range Epigenetic Silencing (LRES) is a mechanism of gene inactivation that affects multiple contiguous CpG islands and has been described in different human cancer types.»
[3] «the X chromosome of paternal origin (Xp) is silenced during early embryogenesis owing to imprinted expression of the regulatory RNA, Xist (X-inactive specific transcr

In [30]:
# 'trainset' is a plain list without a 'sample' method, so define a helper to sample from it
def sample_from_list(lst, fraction):
    sample_size = int(len(lst) * fraction)
    return random.sample(lst, sample_size)


# Now use the helper to sample 2% of the dataset
trainset_truncated = sample_from_list(trainset, 0.02)
len(trainset_truncated)

1
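
The result is 1 because the training list holds 80 examples (80% of the 100 sampled rows) and `int(80 * 0.02)` truncates 1.6 down to 1. On a smaller list the same expression would truncate to 0 and return an empty sample. A hedged variant that guards against this is sketched below; `sample_fraction`, its `minimum` parameter, and the seeded generator are assumptions added for illustration.

```python
import random


def sample_fraction(items, fraction, seed=None, minimum=1):
    """Sample a fraction of items, but never fewer than `minimum` of them."""
    rng = random.Random(seed)  # a local generator keeps the sampling reproducible
    size = max(minimum, int(len(items) * fraction))
    return rng.sample(items, min(size, len(items)))


examples = list(range(80))  # stand-in for an 80-item training list
picked = sample_fraction(examples, 0.02, seed=0)
print(len(picked))

tiny = sample_fraction(list(range(10)), 0.02, seed=0)  # int(0.2) == 0, raised to 1
```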

### Optimizer: Bootstrap Random Search Optimization
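
The optimizer's log in this section reports a score per candidate set, a running list of "Scores so far", and a "Best score". The underlying idea of a random search can be sketched in a few lines: score several candidates and keep the best one. This is only a conceptual illustration, not DSPy's implementation; `random_search`, the demo ids, and the scores in `toy_scores` are made up.

```python
import random


def random_search(candidates, score_fn, num_candidates, seed=0):
    """Score a random subset of candidates and return the best (score, candidate)."""
    rng = random.Random(seed)
    tried = rng.sample(candidates, min(num_candidates, len(candidates)))
    best_score, best_candidate = float("-inf"), None
    for candidate in tried:
        score = score_fn(candidate)
        if score > best_score:
            best_score, best_candidate = score, candidate
    return best_score, best_candidate


# Hypothetical candidates: tuples of demo ids mapped to invented scores.
toy_scores = {("d1",): 50.0, ("d2",): 70.0, ("d1", "d2"): 60.0}
best_score, best_demos = random_search(
    list(toy_scores), toy_scores.get, num_candidates=3
)
print(best_score, best_demos)
```

In the real optimizer each score comes from running the metric over evaluation examples, so every extra candidate costs additional API calls.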

In [31]:
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

teleprompter = BootstrapFewShotWithRandomSearch(
    metric=llm_metric,
    max_bootstrapped_demos=2,
    max_labeled_demos=4,
    max_rounds=1,
    num_candidate_programs=2,
    num_threads=8,
)

few_shot_bootstrap_compiled_rag = teleprompter.compile(
    uncompile_k_10, trainset=trainset_truncated
)

Going to sample between 1 and 2 traces per predictor.
Will attempt to train 2 candidate sets.


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:04<00:00,  4.59s/it]


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Score: 70.0 for set: [0]
New best score: 70.0 for seed -3
Scores so far: [70.0]
Best score: 70.0


Average Metric: 0.5 / 1  (50.0): 100%|██████████| 1/1 [00:05<00:00,  5.06s/it]


Faithful: 0
Detail: 2
Correctness: 3
Average Metric: 0.5 / 1  (50.0%)
Score: 50.0 for set: [1]
Scores so far: [70.0, 50.0]
Best score: 70.0


100%|██████████| 1/1 [00:00<00:00,  6.40it/s]


Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:01<00:00,  1.49s/it]


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Score: 70.0 for set: [1]
Scores so far: [70.0, 50.0, 70.0]
Best score: 70.0
Average of max per entry across top 1 scores: 0.7
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.7
Average of max per entry across top 5 scores: 0.7
Average of max per entry across top 8 scores: 0.7
Average of max per entry across top 9999 scores: 0.7


100%|██████████| 1/1 [00:00<00:00,  4.66it/s]


Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  3.43it/s]


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Score: 70.0 for set: [1]
Scores so far: [70.0, 50.0, 70.0, 70.0]
Best score: 70.0
Average of max per entry across top 1 scores: 0.7
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.7
Average of max per entry across top 5 scores: 0.7
Average of max per entry across top 8 scores: 0.7
Average of max per entry across top 9999 scores: 0.7


100%|██████████| 1/1 [00:00<00:00,  2.98it/s]


Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  3.11it/s]

Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Score: 70.0 for set: [1]
Scores so far: [70.0, 50.0, 70.0, 70.0, 70.0]
Best score: 70.0
Average of max per entry across top 1 scores: 0.7
Average of max per entry across top 2 scores: 0.7
Average of max per entry across top 3 scores: 0.7
Average of max per entry across top 5 scores: 0.7
Average of max per entry across top 8 scores: 0.7
Average of max per entry across top 9999 scores: 0.7
5 candidate programs found.
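The per-example log lines above hint at how `llm_metric` combines its three ratings: `Faithful: 5`, `Detail: 1`, `Correctness: 1` is reported as 0.7, and `Faithful: 0`, `Detail: 2`, `Correctness: 3` as 0.5. Both are consistent with summing the ratings and dividing by 10. The sketch below encodes that reading. The formula is inferred from the logged values, not taken from the metric's definition, and `combine_ratings` is a hypothetical name.

```python
def combine_ratings(faithful, detail, correctness):
    """Combine three 0-5 ratings into a single score.

    Assumption: score = (faithful + detail + correctness) / 10, inferred
    from the logged values rather than from the metric's source.
    """
    return (faithful + detail + correctness) / 10


# Ratings copied from the optimizer log above.
print(combine_ratings(5, 1, 1))  # logged as 0.7
print(combine_ratings(0, 2, 3))  # logged as 0.5
```

If this reading is right, a single example can score up to 1.5, so per-question scores above 1.0 are possible and the averages are not strictly bounded by 1.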





In [32]:
# Let's check the prompt for this compiled model
turbo.inspect_history(n=1)





Answer questions based on the context.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
[1] «KN-93, a membrane-permeant calcium/calmodulin- dependent kinase-selective inhibitor, induces apoptosis in some lines of human tumor cells.»
[2] «Two-dimensional tryptic peptide maps of phosphorylated phospholamban indicated that cAMP-dependent protein kinase phosphorylates at a single site, A, and Ca2+-calmodulin-dependent protein kinase phosphorylates at sites C1 and C2 in the low molecular weight form, where A is different from C1 but may be the same as C2.»
[3] «The human OX2 receptor (OX2R) belongs to the β branch of the rhodopsin family of GPCRs, and can bind to diverse compounds including the native agonist peptides orexin-A and orexin-B and the potent therapeutic inhibitor suvorexant.»
[4] «point mutations in RYR2, the gene enc

Notice how the prompt has become somewhat more specific in how it handles the examples, and how extra instructions have been added. Let's now evaluate on the `devset` we created and see how the model performs.

In [33]:
few_shot_bootstrap_compiled_rag_evals = evaluate(
    few_shot_bootstrap_compiled_rag,
    metric=llm_metric,
    return_all_scores=True,
    return_outputs=True,
)

Average Metric: 2.8 / 6  (46.7):  25%|██▌       | 5/20 [00:00<00:01,  7.51it/s] 

Faithful: 3
Detail: 3
Correctness: 4
Faithful: 0
Detail: 2
Correctness: 0
Faithful: 5
Detail: 2
Correctness: 0
Faithful: 0
Detail: 1
Correctness: 0
Faithful: 0
Detail: 1
Correctness: 0
Faithful: 0
Detail: 2
Correctness: 5


Average Metric: 3.6999999999999997 / 8  (46.2):  35%|███▌      | 7/20 [00:00<00:00, 16.36it/s]

Faithful: 5
Detail: 1
Correctness: 0
Faithful: 0
Detail: 3
Correctness: 0
Faithful: 0
Detail: 2
Correctness: 1


Average Metric: 6.3999999999999995 / 15  (42.7):  70%|███████   | 14/20 [00:00<00:00, 16.20it/s]

Faithful: 0
Detail: 1
Correctness: 0
Faithful: 0
Detail: 4
Correctness: 4
Faithful: 0
Detail: 2
Correctness: 0
Faithful: 5
Detail: 1
Correctness: 0
Faithful: 0
Detail: 1
Correctness: 4
Faithful: 0
Detail: 2
Correctness: 0


Average Metric: 9.3 / 20  (46.5): 100%|██████████| 20/20 [00:01<00:00, 16.83it/s]               


Faithful: 0
Detail: 1
Correctness: 5
Faithful: 0
Detail: 0
Correctness: 0
Faithful: 5
Detail: 1
Correctness: 5
Faithful: 0
Detail: 3
Correctness: 0
Faithful: 0
Detail: 4
Correctness: 5
Average Metric: 9.3 / 20  (46.5%)




Unnamed: 0,question,example_answer,context,pred_answer,llm_metric
0,Are there clinical trials using stem cells for the treatment of cardiac disease?,"Yes, there exists clinical trials for cardiac stem cell based treatment. Yes, there are several clinical trials on the use of stem cells for the...","['Recent clinical trials have achieved favorable initial endpoints with improvements in cardiac function and clinical symptoms following cellular therapy.', 'These results suggest that long-term T4...","Yes, there are clinical trials using stem cells for the treatment of cardiac disease.",1.0
1,Is exon skipping correlated with exon circularization?,"Yes. Circularization of exons is widespread and correlates with exon skipping, a feature that adds considerably to the regulatory complexity of the human transcriptome.","['ChIP-sequencing data mapped onto skipped exon events reveal a correlation between histone H3K36 trimethylation peaks and skipped exons, suggesting epigenetic marks being part of alternative...",The context does not provide information on whether exon skipping is correlated with exon circularization.,0.1
2,Which is the main difference between Alu and B1 repeats?,B1 is a murine homolog of the human SINE Alu. B1 (Alu-equivalent) is a murine short interspersed element whose amplification probably involved an RNA intermediate....,"['the Tsix gene (antisense counterpart of Xist)', 'ChIP-sequencing data mapped onto skipped exon events reveal a correlation between histone H3K36 trimethylation peaks and skipped exons,...","The provided contexts do not contain information about Alu and B1 repeats, so the main difference between them cannot be determined from the given data.",0.6
3,Is there a crystal structure of Greek Goat Encephalitis?,Based on results no crustal structure of Greek Goat Encephalitis found.,"['experimental autoimmune encephalomyelitis (EAE), an animal model of multiple sclerosis (MS)', 'Swine dysentery (SD) is a mucohaemorrhagic colitis of pigs resulting from infection of the...",There is no information provided about the crystal structure of Greek Goat Encephalitis in the given context.,0.3
4,Is apremilast effective for psoriasis?,"Yes, apremilast is effective for treatment of psoriasis.","[""NCLUSIONS: Vedolizumab-treated patients with active Crohn's disease were more likely than patients receiving placebo to have a remission, but not a CDAI-100 response, at week...",The context does not provide information on the effectiveness of apremilast for psoriasis.,0.2


In [34]:
pd.set_option("display.max_colwidth", 500)
pd.set_option("display.max_rows", 500)
evals_outs_df = create_score_dataframe(few_shot_bootstrap_compiled_rag_evals[1])
print(evals_outs_df["score"].mean())
evals_outs_df

0.465


Unnamed: 0,question,predicted_answer,score
0,Are there clinical trials using stem cells for the treatment of cardiac disease?,"Yes, there are clinical trials using stem cells for the treatment of cardiac disease.",1.0
1,Is apremilast effective for psoriasis?,The context does not provide information on the effectiveness of apremilast for psoriasis.,0.2
2,What role does CRD-BP play in protecting c-myc mRNA?,The provided context does not contain information about the role of CRD-BP in protecting c-myc mRNA.,0.7
3,List components of the CRSP/Med complex.,The context does not provide information about the components of the CRSP/Med complex.,0.1
4,Is exon skipping correlated with exon circularization?,The context does not provide information on whether exon skipping is correlated with exon circularization.,0.1
5,what is the role of MEF-2 in cardiomyocyte differentiation?,The provided context does not contain information about the role of MEF-2 in cardiomyocyte differentiation.,0.7
6,Which is the main difference between Alu and B1 repeats?,"The provided contexts do not contain information about Alu and B1 repeats, so the main difference between them cannot be determined from the given data.",0.6
7,Is there a crystal structure of Greek Goat Encephalitis?,There is no information provided about the crystal structure of Greek Goat Encephalitis in the given context.,0.3
8,Can life style changes reduce oxidative stress,"Yes, lifestyle changes can reduce oxidative stress. Although the provided context does not directly address this, it is well-documented in scientific literature that adopting a healthy lifestyle, including a balanced diet, regular exercise, and avoiding harmful habits, can help mitigate oxidative stress.",0.3
9,Which microRNAs are involved in exercise adaptation?,The context does not provide information about which microRNAs are involved in exercise adaptation.,0.1


In [35]:
few_shot_bootstrap_compiled_rag(sample)

Prediction(
    context=['Valaciclovir (Valtrex), the L-valyl ester of acyclovir, is undergoing clinical development for the treatment and suppression of herpesviral diseases.', 'Improved pain, physical functioning and health status in patients with rheumatoid arthritis treated with CP-690,550, an orally active Janus kinase (JAK) inhibitor: results from a randomised, double-blind, placebo-controlled trial.', 'Mutations in the serine protease inhibitor Kazal type 5 (SPINK5) gene leading to lymphoepithelial Kazal-type-related inhibitor (LEKTI) deficiency cause NS.', 'The authors conclude that rosiglitazone can be safely administered with metformin and, due to the different mechanisms of action of these agents, may offer a therapeutic advantage in patients with type 2 diabetes mellitus.', 'The human OX2 receptor (OX2R) belongs to the β branch of the rhodopsin family of GPCRs, and can bind to diverse compounds including the native agonist peptides orexin-A and orexin-B and the potent thera

### Signature Optimizer

Optimizing the signature is another way to try to improve your model's performance. You can pass either the bootstrapped compiled program from above or the uncompiled program to this optimizer.

In [36]:
from dspy.teleprompt import MIPRO

llm_prompter = dspy.OpenAI(model="gpt-4o", max_tokens=2000, model_type="chat")

teleprompter = MIPRO(
    task_model=dspy.settings.lm,
    metric=llm_metric,
    prompt_model=llm_prompter,
    verbose=False,
)
kwargs = dict(num_threads=8, display_progress=True, display_table=0)
mipro_compiled_rag = teleprompter.compile(
    uncompile_k_10,
    eval_kwargs=kwargs,
    trainset=trainset_truncated,
    num_trials=20,
    max_bootstrapped_demos=2,
    max_labeled_demos=8,
    requires_permission_to_run=False,
)


Please be advised that based on the parameters you have set, the maximum number of LM calls is projected as follows:

- Task Model: 1 examples in dev set * 20 trials * # of LM calls in your program = (20 * # of LM calls in your program) task model calls
- Prompt Model: # data summarizer calls (max 10) + 10 * 1 lm calls in program = 20 prompt model calls

Estimated Cost Calculation:

Total Cost = (Number of calls to task model * (Avg Input Token Length per Call * Task Model Price per Input Token + Avg Output Token Length per Call * Task Model Price per Output Token) 
            + (Number of calls to prompt model * (Avg Input Token Length per Call * Task Prompt Price per Input Token + Avg Output Token Length per Call * Prompt Model Price per Output Token).

For a preliminary estimate of potential costs, w

100%|██████████| 1/1 [00:00<00:00,  8.77it/s]


Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.


100%|██████████| 1/1 [00:00<00:00, 12.32it/s]


Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.


100%|██████████| 1/1 [00:00<00:00,  4.23it/s]


Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.


100%|██████████| 1/1 [00:00<00:00,  3.27it/s]


Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.


100%|██████████| 1/1 [00:00<00:00,  1.96it/s]


Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.


100%|██████████| 1/1 [00:00<00:00,  2.37it/s]


Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.


100%|██████████| 1/1 [00:00<00:00,  2.40it/s]


Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.


100%|██████████| 1/1 [00:00<00:00,  2.51it/s]


Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.


100%|██████████| 1/1 [00:00<00:00,  2.21it/s]

Faithful: 5
Detail: 1
Correctness: 1
Bootstrapped 1 full traces after 1 examples in round 0.



[I 2024-09-21 21:19:44,527] A new study created in memory with name: no-name-b1d13019-cda2-4704-a8b6-574db6ba1329


Starting trial #0


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:01<00:00,  1.59s/it]
[I 2024-09-21 21:19:46,129] Trial 0 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 1, '14855968592_predictor_demos': 1}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #1


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  3.58it/s]
[I 2024-09-21 21:19:46,472] Trial 1 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 5, '14855968592_predictor_demos': 4}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #2


Average Metric: 0.0 / 1  (0.0): 100%|██████████| 1/1 [00:05<00:00,  5.36s/it]
[I 2024-09-21 21:19:51,838] Trial 2 finished with value: 0.0 and parameters: {'14855968592_predictor_instruction': 3, '14855968592_predictor_demos': 0}. Best is trial 0 with value: 70.0.


Faithful: 0
Detail: 0
Correctness: 0
Average Metric: 0.0 / 1  (0.0%)
Starting trial #3


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  5.67it/s]
[I 2024-09-21 21:19:52,023] Trial 3 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 9, '14855968592_predictor_demos': 3}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #4


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  2.54it/s]
[I 2024-09-21 21:19:52,463] Trial 4 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 8, '14855968592_predictor_demos': 4}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #5


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  2.92it/s]
[I 2024-09-21 21:19:52,830] Trial 5 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 4, '14855968592_predictor_demos': 2}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #6


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  4.13it/s]
[I 2024-09-21 21:19:53,082] Trial 6 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 1, '14855968592_predictor_demos': 9}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #7


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  4.99it/s]
[I 2024-09-21 21:19:53,299] Trial 7 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 0, '14855968592_predictor_demos': 4}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #8


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  2.86it/s]
[I 2024-09-21 21:19:53,663] Trial 8 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 5, '14855968592_predictor_demos': 8}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #9


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  2.45it/s]
[I 2024-09-21 21:19:54,084] Trial 9 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 2, '14855968592_predictor_demos': 2}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #10


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  2.45it/s]
[I 2024-09-21 21:19:54,514] Trial 10 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 7, '14855968592_predictor_demos': 1}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #11


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  2.95it/s]
[I 2024-09-21 21:19:54,863] Trial 11 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 1, '14855968592_predictor_demos': 5}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #12


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  2.39it/s]
[I 2024-09-21 21:19:55,294] Trial 12 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 6, '14855968592_predictor_demos': 7}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #13


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  2.49it/s]
[I 2024-09-21 21:19:55,720] Trial 13 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 5, '14855968592_predictor_demos': 1}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #14


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  1.78it/s]
[I 2024-09-21 21:19:56,295] Trial 14 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 1, '14855968592_predictor_demos': 6}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #15


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  2.30it/s]
[I 2024-09-21 21:19:56,804] Trial 15 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 5, '14855968592_predictor_demos': 4}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #16


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  3.74it/s]
[I 2024-09-21 21:19:57,086] Trial 16 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 2, '14855968592_predictor_demos': 1}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #17


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  2.32it/s]
[I 2024-09-21 21:19:57,529] Trial 17 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 4, '14855968592_predictor_demos': 8}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Starting trial #18


Average Metric: 0.0 / 1  (0.0): 100%|██████████| 1/1 [00:00<00:00,  2.51it/s]
[I 2024-09-21 21:19:57,952] Trial 18 pruned. 


Faithful: 0
Detail: 0
Correctness: 0
Average Metric: 0.0 / 1  (0.0%)
Trial pruned.
Starting trial #19


Average Metric: 0.7 / 1  (70.0): 100%|██████████| 1/1 [00:00<00:00,  2.88it/s]
[I 2024-09-21 21:19:58,321] Trial 19 finished with value: 70.0 and parameters: {'14855968592_predictor_instruction': 0, '14855968592_predictor_demos': 5}. Best is trial 0 with value: 70.0.


Faithful: 5
Detail: 1
Correctness: 1
Average Metric: 0.7 / 1  (70.0%)
Returning generate_answer = ChainOfThought(GenerateAnswer(context, question -> answer
    instructions='Answer questions based on the context.'
    context = Field(annotation=str required=True json_schema_extra={'desc': 'may contain relevant facts', '__dspy_field_type': 'input', 'prefix': 'Context:'})
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    answer = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Answer:', 'desc': '${answer}'})
)) from continue_program


In [37]:
mipro_compiled_rag_eval = evaluate(
    mipro_compiled_rag,
    metric=llm_metric,
    return_all_scores=True,
    return_outputs=True,
)

Average Metric: 0.4 / 3  (13.3):  10%|█         | 2/20 [00:04<00:34,  1.90s/it]                

Faithful: 0
Detail: 2
Correctness: 0
Faithful: 0
Detail: 1
Correctness: 0
Faithful: 0
Detail: 1
Correctness: 0


Average Metric: 0.5 / 4  (12.5):  15%|█▌        | 3/20 [00:04<00:32,  1.90s/it]

Faithful: 0
Detail: 1
Correctness: 0


Average Metric: 1.8 / 5  (36.0):  25%|██▌       | 5/20 [00:05<00:09,  1.51it/s]

Faithful: 5
Detail: 3
Correctness: 5


Average Metric: 2.9000000000000004 / 6  (48.3):  30%|███       | 6/20 [00:06<00:10,  1.35it/s]

Faithful: 5
Detail: 1
Correctness: 5


Average Metric: 3.5000000000000004 / 7  (50.0):  35%|███▌      | 7/20 [00:06<00:09,  1.38it/s]

Faithful: 5
Detail: 1
Correctness: 0


Average Metric: 3.6000000000000005 / 8  (45.0):  40%|████      | 8/20 [00:09<00:14,  1.23s/it]

Faithful: 0
Detail: 1
Correctness: 0


Average Metric: 4.9 / 10  (49.0):  50%|█████     | 10/20 [00:09<00:07,  1.38it/s]             

Faithful: 5
Detail: 2
Correctness: 0
Faithful: 5
Detail: 1
Correctness: 0


Average Metric: 5.0 / 11  (45.5):  55%|█████▌    | 11/20 [00:10<00:05,  1.52it/s]

Faithful: 0
Detail: 1
Correctness: 0


Average Metric: 6.2 / 12  (51.7):  60%|██████    | 12/20 [00:10<00:04,  1.60it/s]

Faithful: 5
Detail: 2
Correctness: 5


Average Metric: 6.8 / 13  (52.3):  65%|██████▌   | 13/20 [00:11<00:05,  1.34it/s]

Faithful: 5
Detail: 1
Correctness: 0


Average Metric: 6.8999999999999995 / 14  (49.3):  70%|███████   | 14/20 [00:13<00:05,  1.04it/s]

Faithful: 0
Detail: 1
Correctness: 0


Average Metric: 7.499999999999999 / 15  (50.0):  75%|███████▌  | 15/20 [00:14<00:04,  1.11it/s] 

Faithful: 5
Detail: 1
Correctness: 0


Average Metric: 7.599999999999999 / 16  (47.5):  80%|████████  | 16/20 [00:14<00:02,  1.39it/s]

Faithful: 0
Detail: 1
Correctness: 0


Average Metric: 8.799999999999997 / 18  (48.9):  85%|████████▌ | 17/20 [00:14<00:01,  1.51it/s]

Faithful: 3
Detail: 3
Correctness: 4
Faithful: 0
Detail: 2
Correctness: 0


Average Metric: 10.599999999999996 / 20  (53.0): 100%|██████████| 20/20 [00:15<00:00,  1.30it/s]

Faithful: 5
Detail: 2
Correctness: 5
Faithful: 0
Detail: 1
Correctness: 5
Average Metric: 10.599999999999996 / 20  (53.0%)





Unnamed: 0,question,example_answer,context,pred_answer,llm_metric
0,Are there clinical trials using stem cells for the treatment of cardiac disease?,"Yes, there exists clinical trials for cardiac stem cell based treatment. Yes, there are several clinical trials on the use of stem cells for the...","['Recent clinical trials have achieved favorable initial endpoints with improvements in cardiac function and clinical symptoms following cellular therapy.', 'These results suggest that long-term T4...","The context does mention clinical trials related to cardiac disease. Specifically, it states, ""Recent clinical trials have achieved favorable initial endpoints with improvements in cardiac...",1.0
1,Is exon skipping correlated with exon circularization?,"Yes. Circularization of exons is widespread and correlates with exon skipping, a feature that adds considerably to the regulatory complexity of the human transcriptome.","['ChIP-sequencing data mapped onto skipped exon events reveal a correlation between histone H3K36 trimethylation peaks and skipped exons, suggesting epigenetic marks being part of alternative...",The context does not provide any information about exon circularization or its correlation with exon skipping.,0.1
2,Which is the main difference between Alu and B1 repeats?,B1 is a murine homolog of the human SINE Alu. B1 (Alu-equivalent) is a murine short interspersed element whose amplification probably involved an RNA intermediate....,"['the Tsix gene (antisense counterpart of Xist)', 'ChIP-sequencing data mapped onto skipped exon events reveal a correlation between histone H3K36 trimethylation peaks and skipped exons,...",The context does not provide any information about Alu and B1 repeats or their differences.,0.6
3,Is there a crystal structure of Greek Goat Encephalitis?,Based on results no crustal structure of Greek Goat Encephalitis found.,"['experimental autoimmune encephalomyelitis (EAE), an animal model of multiple sclerosis (MS)', 'Swine dysentery (SD) is a mucohaemorrhagic colitis of pigs resulting from infection of the...",The context does not provide any information about Greek Goat Encephalitis or its crystal structure.,0.2
4,Is apremilast effective for psoriasis?,"Yes, apremilast is effective for treatment of psoriasis.","[""NCLUSIONS: Vedolizumab-treated patients with active Crohn's disease were more likely than patients receiving placebo to have a remission, but not a CDAI-100 response, at week...",The context does not provide any information about apremilast or its effectiveness for psoriasis.,0.1


In [38]:
pd.set_option("display.max_colwidth", 500)
pd.set_option("display.max_rows", 500)

mipro_compiled_rag_eval_evals_df = create_score_dataframe(
    mipro_compiled_rag_eval[1]
)
print(mipro_compiled_rag_eval_evals_df["score"].mean())
mipro_compiled_rag_eval_evals_df


0.5299999999999999


Unnamed: 0,question,predicted_answer,score
0,Is there a crystal structure of Greek Goat Encephalitis?,The context does not provide any information about Greek Goat Encephalitis or its crystal structure.,0.2
1,Is apremilast effective for psoriasis?,The context does not provide any information about apremilast or its effectiveness for psoriasis.,0.1
2,Is exon skipping correlated with exon circularization?,The context does not provide any information about exon circularization or its correlation with exon skipping.,0.1
3,List components of the CRSP/Med complex.,The context does not provide any information about the components of the CRSP/Med complex.,0.1
4,What role does CRD-BP play in protecting c-myc mRNA?,The context does not provide any information about CRD-BP or its role in protecting c-myc mRNA.,1.3
5,what is the role of MEF-2 in cardiomyocyte differentiation?,The context does not provide any information about MEF-2 or its role in cardiomyocyte differentiation.,1.1
6,Which is the main difference between Alu and B1 repeats?,The context does not provide any information about Alu and B1 repeats or their differences.,0.6
7,Could hypophosphatemic rickets cause craniosynostosis?,The context does not provide any information about hypophosphatemic rickets causing craniosynostosis.,0.1
8,Can NADPH oxidase be inhibited by apocynin and diphenylene iodonium?,The context does not provide any information about the inhibition of NADPH oxidase by apocynin and diphenylene iodonium.,0.7
9,Can life style changes reduce oxidative stress,The context does not provide any information about lifestyle changes or their impact on reducing oxidative stress.,0.6
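Most predicted answers in the rows shown in both score tables take the form "The context does not provide ...", which suggests that retrieval, rather than answer generation, is the bottleneck. One rough way to quantify this is a simple phrase match, sketched below. The marker phrases and the helper names `is_abstention` and `abstention_rate` are illustrative assumptions, not part of the notebook.

```python
# Phrases that signal "the retrieved context did not contain the answer".
# The list is an assumption; extend it to match the wording your LM uses.
ABSTENTION_MARKERS = (
    "does not provide",
    "do not contain",
    "does not contain",
    "no information provided",
)


def is_abstention(answer):
    """Return True if the answer says the context lacks the information."""
    lowered = answer.lower()
    return any(marker in lowered for marker in ABSTENTION_MARKERS)


def abstention_rate(answers):
    """Fraction of answers flagged as abstentions (0.0 for an empty list)."""
    if not answers:
        return 0.0
    return sum(is_abstention(a) for a in answers) / len(answers)


# Three answers copied from the evaluation tables above.
answers = [
    "Yes, there are clinical trials using stem cells for the treatment of cardiac disease.",
    "The context does not provide information on the effectiveness of apremilast for psoriasis.",
    "The context does not provide any information about Alu and B1 repeats or their differences.",
]
print(abstention_rate(answers))  # 2 of the 3 answers are abstentions
```

In the notebook, the same helper could be applied to the `predicted_answer` column of either score dataframe, for example `abstention_rate(evals_outs_df["predicted_answer"].tolist())`.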


In [39]:
turbo.inspect_history(n=1)





Proposed Instruction: Given a set of contexts, answer the questions by providing clear, concise, and grammatically correct responses. Ensure that your answers are self-contained by incorporating relevant information from the provided contexts. If the context does not contain the necessary information to answer the question, indicate that explicitly.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: ${answer}

---

Context:
[1] «KN-93, a membrane-permeant calcium/calmodulin- dependent kinase-selective inhibitor, induces apoptosis in some lines of human tumor cells.»
[2] «Two-dimensional tryptic peptide maps of phosphorylated phospholamban indicated that cAMP-dependent protein kinase phosphorylates at a single site, A, and Ca2+-calmodulin-dependent protein kinase phosphorylates at sites C1 and C2 in the low molecular weight form, where A is different fr