In [1]:
!pip install haystack-ai
!pip install "datasets>=2.6.1"
!pip install "sentence-transformers>=3.0.0"



In [2]:
from haystack.telemetry import tutorial_running

tutorial_running(35)

In [3]:
from datasets import load_dataset
from haystack import Document

dataset = load_dataset("vblagoje/PubMedQA_instruction", split="train")
dataset = dataset.select(range(1000))
all_documents = [Document(content=doc["context"]) for doc in dataset]
all_questions = [doc["instruction"] for doc in dataset]
all_ground_truth_answers = [doc["response"] for doc in dataset]



In [4]:
from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.document_stores.types import DuplicatePolicy

document_store = InMemoryDocumentStore()

document_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
document_writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP)

indexing = Pipeline()
indexing.add_component(instance=document_embedder, name="document_embedder")
indexing.add_component(instance=document_writer, name="document_writer")

indexing.connect("document_embedder.documents", "document_writer.documents")

indexing.run({"document_embedder": {"documents": all_documents}})


{'document_writer': {'documents_written': 1000}}

In [5]:
import os
from getpass import getpass
from haystack.components.builders import AnswerBuilder, ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")

template = [
    ChatMessage.from_user(
        """
        You have to answer the following question based on the given context information only.

        Context:
        {% for document in documents %}
            {{ document.content }}
        {% endfor %}

        Question: {{question}}
        Answer:
        """
    )
]

rag_pipeline = Pipeline()
rag_pipeline.add_component(
    "query_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
)
rag_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store, top_k=3))
rag_pipeline.add_component("prompt_builder", ChatPromptBuilder(template=template))
rag_pipeline.add_component("generator", OpenAIChatGenerator(model="o1-mini-2024-09-12"))
rag_pipeline.add_component("answer_builder", AnswerBuilder())

rag_pipeline.connect("query_embedder", "retriever.query_embedding")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder.prompt", "generator.messages")
rag_pipeline.connect("generator.replies", "answer_builder.replies")
rag_pipeline.connect("retriever", "answer_builder.documents")



<haystack.core.pipeline.pipeline.Pipeline object at 0x7c4a5e7e52d0>
🚅 Components
  - query_embedder: SentenceTransformersTextEmbedder
  - retriever: InMemoryEmbeddingRetriever
  - prompt_builder: ChatPromptBuilder
  - generator: OpenAIChatGenerator
  - answer_builder: AnswerBuilder
🛤️ Connections
  - query_embedder.embedding -> retriever.query_embedding (List[float])
  - retriever.documents -> prompt_builder.documents (List[Document])
  - retriever.documents -> answer_builder.documents (List[Document])
  - prompt_builder.prompt -> generator.messages (List[ChatMessage])
  - generator.replies -> answer_builder.replies (List[ChatMessage])

In [6]:
question = "Do high levels of procalcitonin in the early phase after pediatric liver transplantation indicate poor postoperative outcome?"

response = rag_pipeline.run(
    {
        "query_embedder": {"text": question},
        "prompt_builder": {"question": question},
        "answer_builder": {"query": question},
    }
)
print(response["answer_builder"]["answers"][0].data)


Yes, high levels of procalcitonin (PCT) in the early postoperative phase after pediatric liver transplantation are indicative of poorer postoperative outcomes. In the observed study, pediatric patients with elevated PCT levels on postoperative day 2 experienced several adverse outcomes, including:

- **Higher International Normalized Ratio (INR) Values on Day 5:** Elevated INR suggests impaired liver function and coagulation abnormalities.
- **Increased Incidence of Primary Graft Non-Function:** Patients were more likely to experience failure of the transplanted liver to function properly.
- **Longer Stay in the Pediatric Intensive Care Unit (ICU):** Higher PCT levels were associated with extended ICU hospitalization.
- **Prolonged Duration on Mechanical Ventilation:** Patients required longer periods of respiratory support.

Importantly, the elevated PCT levels were not correlated with systemic infections but were associated with other indicators of liver stress and dysfunction, such 

In [7]:
import random

questions, ground_truth_answers, ground_truth_docs = zip(
    *random.sample(list(zip(all_questions, all_ground_truth_answers, all_documents)), 25)
)
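
Because `random.sample` draws a different subset on each run, the 25 evaluation samples (and therefore the scores) will differ between runs. A minimal sketch, assuming reproducibility is wanted, uses a dedicated seeded `random.Random` instance. The helper name `sample_aligned`, the seed value 42, and the toy lists are illustrative assumptions, not part of the original notebook.

```python
import random

# Toy stand-ins for all_questions, all_ground_truth_answers and all_documents (hypothetical data).
toy_questions = [f"question {i}" for i in range(100)]
toy_answers = [f"answer {i}" for i in range(100)]
toy_docs = [f"document {i}" for i in range(100)]


def sample_aligned(questions, answers, docs, k, seed):
    """Draw k aligned (question, answer, document) triples reproducibly."""
    rng = random.Random(seed)  # local generator; leaves the global random state untouched
    triples = rng.sample(list(zip(questions, answers, docs)), k)
    # Unzip the sampled triples back into three parallel tuples.
    return tuple(zip(*triples))


first = sample_aligned(toy_questions, toy_answers, toy_docs, k=25, seed=42)
second = sample_aligned(toy_questions, toy_answers, toy_docs, k=25, seed=42)
print(first == second)  # True: the same seed yields the same sample
```

Zipping before sampling keeps each question paired with its own answer and document, which the evaluators rely on.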

In [8]:
rag_answers = []
retrieved_docs = []

for question in list(questions):
    response = rag_pipeline.run(
        {
            "query_embedder": {"text": question},
            "prompt_builder": {"question": question},
            "answer_builder": {"query": question},
        }
    )
    print(f"Question: {question}")
    print("Answer from pipeline:")
    print(response["answer_builder"]["answers"][0].data)
    print("\n-----------------------------------\n")

    rag_answers.append(response["answer_builder"]["answers"][0].data)
    retrieved_docs.append(response["answer_builder"]["answers"][0].documents)


Question: Do plasma levels of Galectin-9 reflect disease severity in malaria infection?
Answer from pipeline:
Yes, plasma levels of Galectin-9 reflect disease severity in malaria infection. In the study involving 50 acute malaria cases from Thailand, patients with severe malaria (SM) exhibited significantly higher Galectin-9 levels compared to those with uncomplicated malaria (UM) at both day 0 (923 vs. 617 pg/mL, P = 0.03) and day 7 (659 vs. 348 pg/mL, P = 0.02). Additionally, higher Galectin-9 levels were associated with indicators of more severe disease, such as a blood urea nitrogen to creatinine ratio (BUN/creatinine) ≥20 mg/dL. These findings indicate that elevated plasma Galectin-9 levels are correlated with increased disease severity in malaria infections.

-----------------------------------




Question: Does ginsenoside Rg3 improve erectile function in streptozotocin-induced diabetic rats?
Answer from pipeline:
Yes, ginsenoside Rg3 improves erectile function in streptozotocin-induced diabetic rats. In the study described, diabetic rats treated with 100 mg/kg of Rg3 showed a significantly higher intracavernosal pressure (ICP) relative to mean arterial pressure compared to control groups. Additionally, Rg3 treatment tended to normalize the expression of key proteins related to apoptosis and vascular function, increased the number of positively stained nerve fibers, and reduced the apoptotic index in corpus cavernosum cells. These findings indicate that Rg3 has a protective effect on erectile function in this diabetic rat model.

-----------------------------------




Question: Does dual trigger for final oocyte maturation improve the oocyte retrieval rate of suboptimal responders to gonadotropin-releasing hormone agonist?
Answer from pipeline:
Yes, administering a dual trigger (gonadotropin-releasing hormone agonist [GnRH-a] combined with human chorionic gonadotropin [hCG] at doses of 1,000, 2,000, or 5,000 IU) significantly improves the oocyte retrieval rate in suboptimal responders to GnRH-a. In the referenced study, suboptimal responders who received the dual trigger showed increased oocyte retrieval rates of 60.04%, 68.13%, and 65.76% with hCG doses of 1,000, 2,000, and 5,000 IU, respectively, compared to 48.16% with GnRH-a alone.

-----------------------------------




Question: Are high levels of serum vitamin D associated with a decreased risk of metabolic diseases in both men and women , but an increased risk for coronary artery calcification in Korean men?
Answer from pipeline:
Yes, according to the provided context, high levels of serum vitamin D are associated with a decreased risk of metabolic diseases, including insulin resistance (IR), metabolic syndrome (MS), and fatty liver (FL), in both men and women. However, in Korean men specifically, higher serum vitamin D levels are associated with an increased risk of coronary artery calcification (CAC). There was no significant association between serum vitamin D levels and CAC in women.

-----------------------------------




Question: Does periodontitis promote the diabetic development of obese rat via miR-147 induced classical macrophage activation?
Answer from pipeline:
Yes, based on the provided context, periodontitis promotes the diabetic development of obese rats through miR-147–induced classical (M1) macrophage activation. The study demonstrated that periodontitis exacerbates impaired glucose tolerance in obese OLETF rats by upregulating miR-147, which in turn enhances the proinflammatory response of classically activated macrophages. This miR-147-mediated activation of M1 macrophages contributes to increased inflammation, thereby linking periodontitis to the worsening of diabetes in the obese rat model.

-----------------------------------




Question: Does over-expression of small ubiquitin-like modifier proteases 1 predict chemo-sensitivity and poor survival in non-small cell lung cancer?
Answer from pipeline:
Yes, the over-expression of small ubiquitin-like modifier proteases 1 (SENP1) in non-small cell lung cancer (NSCLC) is associated with poor survival outcomes and decreased chemotherapy sensitivity. Specifically:

- **Poor Survival:** SENP1 was over-expressed in 55% of NSCLC samples and was identified as an independent prognostic factor for patient survival. Patients with higher SENP1 expression exhibited significantly worse survival rates.
  
- **Chemo-Resistance:** Among patients who received postoperative chemotherapy, those with SENP1 over-expression had a higher rate of recurrence or metastasis (64.7%) compared to those with low SENP1 expression (27.6%). This suggests that SENP1 over-expression is linked to reduced sensitivity to chemotherapy.

Overall, SENP1 over-expression serves as a predictor of both poor su


Question: Do whole-genome sequencing and epidemiological analysis provide evidence for cross-transmission of mycobacterium abscessus in a cohort of pediatric cystic fibrosis patients?
Answer from pipeline:
Yes, whole-genome sequencing combined with epidemiological analysis provides evidence for cross-transmission of *Mycobacterium abscessus* in the studied cohort of pediatric cystic fibrosis (CF) patients. While the majority of isolates (from 16 out of 20 patients) were genetically unrelated, indicating independent acquisition events, two distinct clusters of closely related isolates were identified:

1. **Sibling Pair Cluster**: This cluster consisted of 8 isolates from a pair of siblings who had intense exposure to each other both inside and outside the hospital. The isolates differed by a maximum of 17 single-nucleotide polymorphisms (SNPs), strongly suggesting transmission between the siblings.

2. **Unlinked Individuals Cluster**: A second cluster included 3 isolates from 2 indivi


Question: Is community readiness for adolescents ' overweight and obesity prevention low in urban South Africa : a case study?
Answer from pipeline:
Yes, the community readiness for preventing overweight and obesity among adolescents in urban South Africa is low, as demonstrated by the case study. The research conducted in Johannesburg and Soweto utilized the Community Readiness Model (CRM) and found a mean readiness score of **2.57 ± 0.76**, which corresponds to the "**denial/resistance stage**." This indicates that the community is currently not adequately prepared to implement effective overweight and obesity prevention interventions for adolescents. 

Although the dimension for resources scored the highest (**3.77 ± 0.28**) and there was some knowledge of the issue (**3.20 ± 0.51**), the lowest scores were observed in **community knowledge of efforts** (**1.77 ± 1.50**) and **community climate** (**2.00 ± 0.64**). These low scores in critical areas suggest significant challenges in


Question: Does low perfusion index affect the difference in glucose level between capillary and venous blood?
Answer from pipeline:
Yes, a low perfusion index is associated with a greater difference between capillary and venous glucose levels. The study found that in patients with normal vital signs and a perfusion index between 0 and 5, the glucose measurements obtained by finger stick (capillary) and venous methods were less consistent. In contrast, when the perfusion index was 6 or higher, the glucose values from both methods were more consistent. This indicates that a low perfusion index adversely affects the comparability of capillary and venous glucose measurements.

-----------------------------------




Question: Does biolimus-eluting stent with biodegradable polymer improve clinical outcomes in patients with acute myocardial infarction?
Answer from pipeline:
Yes, the use of a biolimus-eluting stent (BES) with a biodegradable polymer significantly improves clinical outcomes in patients with acute myocardial infarction (AMI) compared to a sirolimus-eluting stent (SES) with a durable polymer. 

**Key Findings from the LEADERS Trial:**

1. **Overall AMI Patients:**
   - **Patient-Oriented Composite Endpoint (POCE):** Reduced from 42.3% with SES to 28.9% with BES (Relative Risk [RR] 0.61, p=0.001).
   - **Major Adverse Cardiac Events (MACE):** Decreased from 25.9% with SES to 18.2% with BES (RR 0.67, p=0.025).
   - **Cardiac Death and Stent Thrombosis:** No significant differences observed between BES and SES groups.

2. **Subgroup of Patients with ST-Elevation Myocardial Infarction (STEMI):**
   - **POCE:** Reduced from 39.3% with SES to 24.4% with BES (RR 0.55, p=0.006).
   - **MACE:** 


Question: Does cadmium Exposure enhance Bisphenol A-Induced Genotoxicity through 8-Oxoguanine-DNA Glycosylase-1 OGG1 Inhibition in NIH3T3 Fibroblast Cells?
Answer from pipeline:
Yes, cadmium (Cd) exposure enhances Bisphenol A (BPA)-induced genotoxicity in NIH3T3 fibroblast cells through the inhibition of 8-oxoguanine-DNA glycosylase-1 (OGG1). The study demonstrated that while Cd or BPA alone did not significantly affect cell viability (except BPA at 50 μM), pre-treatment with Cd exacerbated BPA-induced cytotoxic effects. Specifically, Cd exposure led to a decrease in OGG1 expression. Over-expression of OGG1 was able to abolish the enhanced genotoxic and cytotoxic effects caused by the combination of Cd and BPA. This indicates that the inhibition of OGG1 by Cd is a key mechanism by which Cd augments BPA-induced DNA damage and cytotoxicity in NIH3T3 cells.

-----------------------------------




Question: Is cMV specific cytokine release assay in whole blood optimized by combining synthetic CMV peptides and toll like receptor agonists?
Answer from pipeline:
Yes, the CMV-specific cytokine release assay in whole blood was optimized by combining synthetic CMV peptides with toll-like receptor (TLR) agonists. Specifically, the inclusion of TLR agonists such as lipoteichoic acid (LTA) and polyinosinic-polycytidylic acid (Poly I:C) augmented interferon gamma (IFNγ) responses when using synthetic pp65 peptides, native lysate, or recombinant pp65 in CMV seropositive healthy controls. This enhancement improved the assay's ability to detect CMV-specific cellular immunity without affecting seronegative individuals, thereby optimizing the assay's sensitivity and specificity.

-----------------------------------




Question: Does geographical disparity in breast reconstruction following mastectomy have reduced over time?
Answer from pipeline:
Yes, the geographical disparity in breast reconstruction following mastectomy has significantly reduced over time.

-----------------------------------




Question: Does damage Patterns at the Head-Stem Taper Junction help Understand the Mechanisms of Material Loss?
Answer from pipeline:
Yes, analyzing the damage patterns at the head-stem taper junction aids in understanding the mechanisms of material loss in metal-on-metal total hip arthroplasties. The study conducted a detailed examination of material loss patterns in 155 implants, developing a 5-tier classification system based on visual differences. By correlating these damage patterns with surgical, implant, and patient factors known to influence taper damage, the research provided valuable insights into why material loss occurs at this junction. This enhanced understanding can help in identifying the causes of early implant failure and potentially guide improvements in implant design and surgical techniques to mitigate material loss.

-----------------------------------




Question: Do rapid and accurate identification of Xanthomonas citri subspecies citri by fluorescence in situ hybridization?
Answer from pipeline:
Yes, fluorescence in situ hybridization (FISH) enables the rapid and accurate identification of **Xanthomonas citri subsp. citri** (Xcc). According to the provided context:

- **Speed:** The FISH-based diagnostic assay typically requires between **2 and 3 hours** to deliver results.
  
- **Specificity and Sensitivity:** The assay has been validated for specificity against a range of **Xanthomonas species and strains**, demonstrating specificity similar to existing PCR diagnostic tools. Its sensitivity is also comparable to PCR-based techniques, making it effective for identifying Xcc in symptomatic plant material.
  
- **Ease of Use:** The method does not require cultivation or DNA extraction, and it is easily transferable to diagnosticians without prior experience using FISH.

Overall, the FISH-based protocol offers a highly efficient altern


Question: Does puerarin inhibit the inflammatory response in atherosclerosis via modulation of the NF-κB pathway in a rabbit model?
Answer from pipeline:
Yes, puerarin inhibits the inflammatory response in atherosclerosis by modulating the NF-κB pathway in a rabbit model. In the study, rabbits fed a high-lipid diet supplemented with puerarin (PUE group) showed reduced thickness of the intima compared to those on the high-lipid diet alone (HLD group). Puerarin treatment led to decreased protein and mRNA levels of adhesion molecules (ICAM-1, VCAM-1, and E-selectin). This reduction was attributed to the inhibition of phosphorylation and degradation of I-κB, which in turn prevented the nuclear translocation of p65 NF-κB. By hindering the activation of the NF-κB pathway, puerarin effectively reduced the inflammatory response associated with atherosclerosis in the rabbit model.

-----------------------------------




Question: Does mediastinal transposition of the omentum reduce infection severity and pharmacy cost for patients undergoing esophagectomy?
Answer from pipeline:
Yes, mediastinal transposition of the omentum reduces both the severity of intrathoracic infections and pharmacy costs for patients undergoing esophagectomy. According to the study:

- **Infection Severity:** The transposition group experienced a lower intrathoracic infection rate (30.6% vs. 48.3%, P=0.009) and the infections were milder in severity (P=0.005) compared to the non-transposition group.
  
- **Pharmacy Costs:** The pharmacy costs were significantly lower in the transposition group (¥21,668) compared to the non-transposition group (¥27,012, P=0.010).

These findings indicate that mediastinal transposition of the omentum is effective in mitigating the severity of infections and reducing pharmacy-related expenses post-esophagectomy.

-----------------------------------




Question: Does n-pentyl-nitrofurantoin induce apoptosis in HL-60 leukemia cell line by upregulating BAX and downregulating BCL-xL gene expression?
Answer from pipeline:
Yes, n-pentyl-nitrofurantoin (NFP) induces apoptosis in HL-60 leukemia cells by upregulating BAX and downregulating BCL-xL gene expression. According to the provided context, NFP treatment led to increased mRNA levels of BAX and decreased mRNA levels of BCL-xL, which are indicative of the apoptotic process in these cells.

-----------------------------------




Question: Are socioeconomic inequalities still a barrier to full child vaccine coverage in the Brazilian Amazon : a cross-sectional study in Assis Brasil , Acre , Brazil?
Answer from pipeline:
Yes, socioeconomic inequalities remain a significant barrier to achieving full child vaccine coverage in the Brazilian Amazon, specifically in Assis Brasil, Acre. The study found that incomplete vaccination was significantly associated with lower socioeconomic status factors, including insufficient income to purchase a house (adjusted odds ratio [aOR] = 2.12) and low maternal education levels (aOR = 2.60). Additionally, the duration of a child's residence in the urban area was a protective factor against incomplete vaccination (aOR = 0.73). These findings indicate that children from lower-income families and those with less educated mothers are less likely to receive complete vaccination, highlighting ongoing socioeconomic disparities that hinder optimal vaccine coverage in this region.

--------


Question: Does microRNA-19a enhance proliferation of bronchial epithelial cells by targeting TGFβR2 gene in severe asthma?
Answer from pipeline:
Yes, according to the study, microRNA-19a enhances the proliferation of bronchial epithelial cells in severe asthma by targeting the TGF-β receptor 2 (TGFβR2) gene. The research demonstrated that miR-19a was upregulated in the epithelia of severe asthmatic subjects compared to those with mild asthma and healthy controls. Functional assays, including luciferase reporter and Western blot analyses, indicated that miR-19a promotes cell proliferation by targeting TGFβR2 mRNA. Additionally, reducing miR-19a expression led to increased SMAD3 phosphorylation through TGFβR2 signaling and subsequently inhibited bronchial epithelial cell proliferation. This evidence supports the role of miR-19a in facilitating airway remodeling in severe asthma by modulating TGFβR2.

-----------------------------------




Question: Is percutaneous intervention of circumflex chronic total occlusions associated with worse procedural outcomes : insights from a Multicentre US Registry?
Answer from pipeline:
Yes, percutaneous intervention of circumflex chronic total occlusions (LCX CTOs) is associated with worse procedural outcomes based on the Multicentre US Registry data provided. Specifically:

- **Lower Procedural Success:** The success rate for LCX CTO PCI was significantly lower at 84.6%, compared to 91.7% for right coronary artery (RCA) CTOs and 94.7% for left anterior descending artery (LAD) CTOs (P = 0.016).

- **Higher Complication Rates:** Major complications tended to occur more frequently in LCX PCI (4.3%) compared to RCA (1.0%) and LAD (2.3%) PCI, although this difference approached but did not reach statistical significance (P = 0.07).

- **Increased Procedural Complexity:** LCX CTO interventions required longer fluoroscopy times and more contrast administration, indicating greater procedural 


Question: Does metabolomic analysis of CSF indicate brain metabolic impairment precedes hematological indices of anemia in the iron-deficient infant monkey?
Answer from pipeline:
Yes, the metabolomic analysis of cerebrospinal fluid (CSF) indicates that brain metabolic impairments occur before the hematological indices of anemia become evident in iron-deficient infant monkeys. Specifically, the study found that by **4 months of age**, iron-deficient (ID) infants already showed lower ratios of pyruvate/glutamine and phosphocreatine/creatine (PCr/Cr) in the CSF. At this same time point, the ID infants had developed iron deficiency, as evidenced by transferrin (Tf) saturation levels below 25%, but had not yet become anemic. Anemia, characterized by hemoglobin (Hgb) levels below 110 g/L and mean corpuscular volume (MCV) below 60 fL, only developed by **6 months of age**. This timeline demonstrates that metabolomic changes in the brain's CSF precede the onset of traditional blood-based indic


Question: Does beta-blocker Treatment Worsen Critical Limb Ischemia in Patients Receiving Endovascular Therapy?
Answer from pipeline:
No, beta-blocker treatment does not worsen critical limb ischemia (CLI) in patients receiving endovascular therapy (EVT). The study found no significant differences in amputation-free survival (AFS), limb salvage rates, overall survival, or freedom from major adverse limb events (MALE) between CLI patients treated with beta-blockers and those who were not. Specifically, at three years, AFS was 58.8% for beta-blocker-treated patients compared to 58.5% for non-treated patients (log-rank p = 0.76), and similar non-significant differences were observed for limb salvage rate, overall survival, and freedom from MALE. Therefore, beta-blocker therapy does not appear to worsen clinical outcomes in CLI patients undergoing EVT.

-----------------------------------




Question: Does recipient T cell TIM-3 and hepatocyte galectin-9 signalling protect mouse liver transplants against ischemia-reperfusion injury?
Answer from pipeline:
Yes, recipient T cell TIM-3 and hepatocyte galectin-9 (Gal-9) signaling protect mouse liver transplants against ischemia-reperfusion injury (IRI). 

**Evidence from the Context:**

1. **TIM-3 Transgenic Models:**
   - **Protection Against IRI:** Orthotopic liver transplants in groups where both donors and recipients were TIM-3 transgenic (TIM-3Tg→TIM-3Tg) exhibited resistance to IR-stress. This was demonstrated by preserved hepatocellular function (as indicated by serum ALT levels) and intact liver architecture (Suzuki's score).
   - **Susceptibility in Non-Tg Recipients:** In contrast, when only the donor was TIM-3 transgenic but the recipient was wild type (WT or TIM-3Tg→WT), the liver transplants were susceptible to IRI.

2. **Mechanisms of Protection:**
   - **Modulation of T Cell Responses:** TIM-3 induction in recipi


Question: Are secretory phospholipases A2 secreted from ciliated cells and increase mucin and eicosanoid secretion from goblet cells?
Answer from pipeline:
Yes, secretory phospholipases A₂ (sPLA₂) are secreted from ciliated-enriched airway epithelial cells and subsequently increase both mucin and eicosanoid (specifically cysteinyl leukotrienes) secretion from goblet-enriched cells. The study found that:

- **sPLA₂ Expression and Secretion:** sPLA₂ groups IIa, V, and X showed increased mRNA expression in ciliated-enriched cells but not in goblet-enriched cells. Moreover, sPLA₂ proteins were secreted from the apical side of ciliated-enriched cells and were strongly present in these cells based on immunostaining.

- **Effect on Goblet Cells:** The secreted sPLA₂, particularly sPLA₂ V, enhanced the secretion of cysteinyl leukotrienes (cysLTs) from goblet-enriched cells. Additionally, sPLA₂ V was shown to increase mucin secretion from goblet cells. This mucin secretion was inhibited by lipo

Each evaluator is a Haystack component that can be run on its own, but evaluators can also be added to a pipeline. This way, we can construct an `eval_pipeline` that includes an evaluator for every metric we want to measure our RAG pipeline on.

In [9]:
from haystack.components.evaluators.document_mrr import DocumentMRREvaluator
from haystack.components.evaluators.faithfulness import FaithfulnessEvaluator
from haystack.components.evaluators.sas_evaluator import SASEvaluator

eval_pipeline = Pipeline()
eval_pipeline.add_component("doc_mrr_evaluator", DocumentMRREvaluator())
eval_pipeline.add_component("faithfulness", FaithfulnessEvaluator())
eval_pipeline.add_component("sas_evaluator", SASEvaluator(model="sentence-transformers/all-MiniLM-L6-v2"))

results = eval_pipeline.run(
    {
        "doc_mrr_evaluator": {
            # Each question has a single ground-truth document, wrapped in its own list.
            "ground_truth_documents": [[d] for d in ground_truth_docs],
            "retrieved_documents": retrieved_docs,
        },
        "faithfulness": {
            "questions": list(questions),
            # Contexts are passed as one list of strings per question.
            "contexts": [[d.content] for d in ground_truth_docs],
            "predicted_answers": rag_answers,
        },
        "sas_evaluator": {"predicted_answers": rag_answers, "ground_truth_answers": list(ground_truth_answers)},
    }
)

100%|██████████| 25/25 [01:32<00:00,  3.72s/it]
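
To make the `doc_mrr_evaluator` score more concrete, here is a plain-Python sketch of mean reciprocal rank: for each question, take 1/rank of the first retrieved document that matches a ground-truth document (0 if none matches), then average over all questions. Matching on exact content strings, the function names, and the toy data are assumptions made for illustration; this is not Haystack's implementation.

```python
from typing import List


def reciprocal_rank(ground_truth: List[str], retrieved: List[str]) -> float:
    """Return 1/rank of the first retrieved item found in ground_truth, else 0.0."""
    relevant = set(ground_truth)
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(ground_truths: List[List[str]], retrieved_lists: List[List[str]]) -> float:
    """Average the reciprocal rank over all questions."""
    scores = [reciprocal_rank(gt, ret) for gt, ret in zip(ground_truths, retrieved_lists)]
    return sum(scores) / len(scores)


# Hypothetical toy data: one ground-truth document and three retrieved documents per question.
ground_truths = [["doc A"], ["doc B"], ["doc C"]]
retrieved_lists = [
    ["doc A", "doc X", "doc Y"],  # match at rank 1 -> 1.0
    ["doc X", "doc B", "doc Y"],  # match at rank 2 -> 0.5
    ["doc X", "doc Y", "doc Z"],  # no match -> 0.0
]

mrr = mean_reciprocal_rank(ground_truths, retrieved_lists)
print(mrr)  # 0.5
```

Under this definition, with one ground-truth document and `top_k=3` in the retriever, a per-question score can only be 1.0, 0.5, 1/3, or 0.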


### Constructing an Evaluation Report

Once we've run our evaluation pipeline, we can also create a full evaluation report. Haystack provides an `EvaluationRunResult` class, whose `aggregated_report()` method displays the aggregate score for each metric 👇

In [10]:
from haystack.evaluation.eval_run_result import EvaluationRunResult

inputs = {
    "question": list(questions),
    "contexts": [[d.content] for d in ground_truth_docs],
    "answer": list(ground_truth_answers),
    "predicted_answer": rag_answers,
}

evaluation_result = EvaluationRunResult(run_name="pubmed_rag_pipeline", inputs=inputs, results=results)
evaluation_result.aggregated_report()

{'metrics': ['doc_mrr_evaluator', 'faithfulness', 'sas_evaluator'],
 'score': [0.98, np.float64(1.0), np.float64(0.7095784056186676)]}
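
The aggregated report condenses each evaluator's per-sample scores into a single number per metric. The sketch below assumes that this number is the arithmetic mean of the per-sample scores, and that each evaluator's output carries them under an `individual_scores` key. The `toy_results` dict and its values are hypothetical, chosen only to show the shape of the calculation.

```python
from statistics import mean

# Hypothetical evaluator outputs, shaped like the dict returned by eval_pipeline.run().
toy_results = {
    "doc_mrr_evaluator": {"individual_scores": [1.0, 1.0, 0.5, 1.0]},
    "sas_evaluator": {"individual_scores": [0.55, 0.78, 0.83, 0.95]},
}

# Assumption: one aggregated score per metric, computed as the mean over samples.
aggregated = {
    "metrics": list(toy_results),
    "score": [mean(r["individual_scores"]) for r in toy_results.values()],
}
print(aggregated)
```

A single low per-sample score can hide behind a high average, which is why the per-sample view is worth inspecting as well.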

#### Extra: You can also view a detailed report with the scores for each sample in your dataset. Here, we choose a DataFrame as the output format

In [11]:
results_df = evaluation_result.detailed_report(output_format='df')
results_df

| | question | contexts | answer | predicted_answer | doc_mrr_evaluator | faithfulness | sas_evaluator |
|---|---|---|---|---|---|---|---|
| 0 | Do plasma levels of Galectin-9 reflect disease... | [Galectin-9 (Gal-9) is a β-galactoside-binding... | Gal-9 is released during acute malaria, and re... | Yes, plasma levels of Galectin-9 reflect disea... | 1.0 | 1.0 | 0.549785 |
| 1 | Does ginsenoside Rg3 improve erectile function... | [Ginsenoside Rg3 is one of the active ingredie... | Oral gavage with Rg3 appears to both prevent d... | Yes, ginsenoside Rg3 improves erectile functio... | 1.0 | 1.0 | 0.549989 |
| 2 | Does dual trigger for final oocyte maturation ... | [To identify the risk factors for suboptimal r... | Basal LH level was useful predictor of the sub... | Yes, administering a dual trigger (gonadotropi... | 0.5 | 1.0 | 0.777647 |
| 3 | Are high levels of serum vitamin D associated ... | [There are conflicting results for relationshi... | High levels of serum vitamin D were associated... | Yes, according to the provided context, high l... | 1.0 | 1.0 | 0.826247 |
| 4 | Does periodontitis promote the diabetic develo... | [Emerging evidence has indicated the bad effec... | This study provided new evidence for the posit... | Yes, based on the provided context, periodonti... | 1.0 | 1.0 | 0.949997 |
| 5 | Does over-expression of small ubiquitin-like m... | [Non-small cell lung cancer (NSCLC) is one of ... | SENP1 may be a promising predictor of survival... | Yes, the over-expression of small ubiquitin-li... | 1.0 | 1.0 | 0.744036 |
| 6 | Do whole-genome sequencing and epidemiological... | [Mycobacterium abscessus has emerged as a majo... | We have not demonstrated cross-transmission of... | Yes, whole-genome sequencing combined with epi... | 1.0 | 1.0 | 0.618438 |
| 7 | Is community readiness for adolescents ' overw... | [South Africa is undergoing epidemiological an... | Religious leaders recognised that they act as ... | Yes, the community readiness for preventing ov... | 1.0 | 1.0 | 0.771028 |
| 8 | Does low perfusion index affect the difference... | [In emergency cases, finger stick testing is p... | In the emergency department, perfusion index v... | Yes, a low perfusion index is associated with ... | 1.0 | 1.0 | 0.572686 |
| 9 | Does biolimus-eluting stent with biodegradable... | [To investigate clinical outcomes of coronary ... | BES, compared with SES, significantly improved... | Yes, the use of a biolimus-eluting stent (BES)... | 1.0 | 1.0 | 0.494062 |


Having the evaluation results as a pandas DataFrame makes them easy to slice. For example, below we select the 3 samples with the highest semantic answer similarity (`sas_evaluator`) scores and the 3 with the lowest 👇


In [12]:
import pandas as pd

top_3 = results_df.nlargest(3, "sas_evaluator")
bottom_3 = results_df.nsmallest(3, "sas_evaluator")
pd.concat([top_3, bottom_3])

| | question | contexts | answer | predicted_answer | doc_mrr_evaluator | faithfulness | sas_evaluator |
|---|---|---|---|---|---|---|---|
| 4 | Does periodontitis promote the diabetic develo... | [Emerging evidence has indicated the bad effec... | This study provided new evidence for the posit... | Yes, based on the provided context, periodonti... | 1.0 | 1.0 | 0.949997 |
| 15 | Does puerarin inhibit the inflammatory respons... | [The isoflavone puerarin [7-hydroxy-3-(4-hydro... | This study indicates that the effect of puerar... | Yes, puerarin inhibits the inflammatory respon... | 1.0 | 1.0 | 0.885431 |
| 16 | Does mediastinal transposition of the omentum ... | [The greater omentum has been found to be immu... | Mediastinal transposition of the omentum decre... | Yes, mediastinal transposition of the omentum ... | 1.0 | 1.0 | 0.862128 |
| 12 | Does geographical disparity in breast reconstr... | [Breast reconstruction (BR) following mastecto... | Geographical barriers to accessing BR have red... | Yes, the geographical disparity in breast reco... | 1.0 | 1.0 | 0.255245 |
| 9 | Does biolimus-eluting stent with biodegradable... | [To investigate clinical outcomes of coronary ... | BES, compared with SES, significantly improved... | Yes, the use of a biolimus-eluting stent (BES)... | 1.0 | 1.0 | 0.494062 |
| 0 | Do plasma levels of Galectin-9 reflect disease... | [Galectin-9 (Gal-9) is a β-galactoside-binding... | Gal-9 is released during acute malaria, and re... | Yes, plasma levels of Galectin-9 reflect disea... | 1.0 | 1.0 | 0.549785 |
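
Beyond ranking, boolean masks are a convenient way to pull out samples that need a closer look. The sketch below rebuilds a tiny DataFrame from three rows of the reports above (indices 2, 4, and 12) so it runs on its own. The name `toy_df` and the `SAS_THRESHOLD` cut-off of 0.5 are hypothetical choices for illustration, not recommended values.

```python
import pandas as pd

# Three rows copied from the reports above: index, doc_mrr_evaluator, sas_evaluator.
toy_df = pd.DataFrame(
    {
        "doc_mrr_evaluator": [0.5, 1.0, 1.0],
        "sas_evaluator": [0.777647, 0.949997, 0.255245],
    },
    index=[2, 4, 12],
)

SAS_THRESHOLD = 0.5  # hypothetical cut-off, tune for your own data

# Samples where the ground-truth document was not ranked first
imperfect_retrieval = toy_df[toy_df["doc_mrr_evaluator"] < 1.0]

# Samples whose answer is semantically far from the ground-truth answer
low_similarity = toy_df[toy_df["sas_evaluator"] < SAS_THRESHOLD]

print(imperfect_retrieval)
print(low_similarity)
```

The same masks can be applied directly to `results_df`, since it uses the same column names.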


## What's next

🎉 Congratulations! You've learned how to evaluate a RAG pipeline with model-based evaluation frameworks and without any labeling efforts.

If you liked this tutorial, you may also enjoy:
- [Serializing Haystack Pipelines](https://haystack.deepset.ai/tutorials/29_serializing_pipelines)
- [Creating Your First QA Pipeline with Retrieval-Augmentation](https://haystack.deepset.ai/tutorials/27_first_rag_pipeline)

To stay up to date on the latest Haystack developments, you can [sign up for our newsletter](https://landing.deepset.ai/haystack-community-updates). Thanks for reading!