# Tugas Akhir STBI
## "Healthcare Chatbot for Assistance Using QA Model with Word Embedding-Based Information Retrieval"

Anggota Kelompok:
* Rhafael Chandra (22/498550/PA/21528)
* Nathanael Aurelino Sulistyo (22/497480/PA/21422)
* Rahmad Ramadhan (22/494516/PA/21278)

In [12]:
import pandas as pd
import numpy as np
import scipy as sp
from gensim.models import KeyedVectors
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer, AutoModel
import torch
from tqdm.notebook import tqdm
import time
from nltk.translate.chrf_score import sentence_chrf
from rouge_score import rouge_scorer
from nltk.translate.meteor_score import meteor_score
import nltk
import textwrap

In [13]:
import gensim
import sklearn

## Load Gemini API

In [14]:
import google.generativeai as genai
import os
from dotenv import load_dotenv
load_dotenv()

GEMINI_API_KEY =  os.getenv('GEMINI_API_KEY')
genai.configure(api_key=GEMINI_API_KEY)
model_llm = genai.GenerativeModel("gemini-1.5-flash")

## Load Dataset

In [15]:
df_medquad = pd.read_csv("MedQuad-MedicalQnADataset.csv")
df_medquad.head()

Unnamed: 0.1,Unnamed: 0,qtype,Question,Answer
0,0,susceptibility,Who is at risk for Lymphocytic Choriomeningiti...,LCMV infections can occur after exposure to fr...
1,1,symptoms,What are the symptoms of Lymphocytic Choriomen...,LCMV is most commonly recognized as causing ne...
2,2,susceptibility,Who is at risk for Lymphocytic Choriomeningiti...,Individuals of all ages who come into contact ...
3,3,exams and tests,How to diagnose Lymphocytic Choriomeningitis (...,"During the first phase of the disease, the mos..."
4,4,treatment,What are the treatments for Lymphocytic Chorio...,"Aseptic meningitis, encephalitis, or meningoen..."


In [16]:
df_pubmed_train = pd.read_csv("PubMed_200k_RCT/train.csv")
df_pubmed_test = pd.read_csv("PubMed_200k_RCT/test.csv")
df_pubmed = pd.concat([df_pubmed_train, df_pubmed_test], ignore_index=True)
df_pubmed.head()

Unnamed: 0,abstract_id,line_id,abstract_text,line_number,total_lines,target
0,24491034,24491034_0_11,The emergence of HIV as a chronic condition me...,0,11,BACKGROUND
1,24491034,24491034_1_11,This paper describes the design and evaluation...,1,11,BACKGROUND
2,24491034,24491034_2_11,This study is designed as a randomised control...,2,11,METHODS
3,24491034,24491034_3_11,The intervention group will participate in the...,3,11,METHODS
4,24491034,24491034_4_11,The program is based on self-efficacy theory a...,4,11,METHODS


# Preprocessing

In [17]:
df_pubmed = df_pubmed.dropna(subset=['abstract_text'])  # Remove rows with NaN in abstract_text
df_pubmed = df_pubmed.sort_values(by=['line_id'])
df_pubmed = df_pubmed.groupby('abstract_id').agg({'abstract_text':' '.join}).reset_index()
df_pubmed.head()

Unnamed: 0,abstract_id,abstract_text
0,1279170,We conducted this study to assess the clinical...
1,1281030,To determine whether prophylactic treatment wi...
2,1282364,After the discovery of type C hepatitis virus ...
3,1283117,Since it is not clear whether testosterone or ...
4,1283730,The aim was to study the pharmacokinetic param...


In [18]:
# add qa id to medquad
df_medquad['qa_id'] = range(1, len(df_medquad) + 1)
df_medquad = df_medquad[['qa_id', 'Question', 'Answer', 'qtype']]
df_medquad.head()

Unnamed: 0,qa_id,Question,Answer,qtype
0,1,Who is at risk for Lymphocytic Choriomeningiti...,LCMV infections can occur after exposure to fr...,susceptibility
1,2,What are the symptoms of Lymphocytic Choriomen...,LCMV is most commonly recognized as causing ne...,symptoms
2,3,Who is at risk for Lymphocytic Choriomeningiti...,Individuals of all ages who come into contact ...,susceptibility
3,4,How to diagnose Lymphocytic Choriomeningitis (...,"During the first phase of the disease, the mos...",exams and tests
4,5,What are the treatments for Lymphocytic Chorio...,"Aseptic meningitis, encephalitis, or meningoen...",treatment


In [19]:
train_medquad, test_medquad = train_test_split(df_medquad, test_size=0.2, random_state=42)

# Information Retrieval Model 1

## Word Embedding: Word2Vec

In [28]:
model_path = 'GoogleNews-vectors-negative300.bin'
word2vec = KeyedVectors.load_word2vec_format(model_path, binary=True)

In [29]:
def preprocess(sentence):
    return [word for word in sentence.lower().split() if word in word2vec]

def get_sentence_vector(sentence):
    words = preprocess(sentence)
    if words:
        return np.mean([word2vec[word] for word in words], axis=0)
    else:
        return np.zeros(word2vec.vector_size)

In [30]:
df_pubmed['abstract_vector'] = df_pubmed['abstract_text'].apply(get_sentence_vector)
train_medquad['question_vector'] = train_medquad['Question'].apply(get_sentence_vector)

### Retrieval Functions

In [22]:
def find_top_k_abstracts(df, query_vector, abstract_vectors_col, k=3):
    cosine_scores = cosine_similarity([query_vector], np.array(df[abstract_vectors_col]).tolist())[0]
    top_k_indices = np.argsort(cosine_scores)[-k:][::-1]
    return df.iloc[top_k_indices]['abstract_id'].values, cosine_scores[top_k_indices]

def find_top_k_answers(df, query_vector, question_vectors_col, k=3):
    cosine_scores = cosine_similarity([query_vector], np.array(df[question_vectors_col]).tolist())[0]
    top_k_indices = np.argsort(cosine_scores)[-k:][::-1]
    return df.iloc[top_k_indices]['qa_id'].values, cosine_scores[top_k_indices]

def get_qa_pairs(df, qa_ids):
    return df[df['qa_id'].isin(qa_ids)][['qa_id', 'Question', 'Answer']]

def get_abstracts(df, abstract_ids):
    return df[df['abstract_id'].isin(abstract_ids)][['abstract_id', 'abstract_text']]

def query_top_k_answers_and_abstracts(df_1, df_2, query, k=10):
    query_vector = get_sentence_vector(query)

    qa_ids, question_scores = find_top_k_answers(df_1, query_vector, 'question_vector', k)
    abstract_ids, abstract_scores = find_top_k_abstracts(df_2, query_vector, 'abstract_vector', k)

    answers_df = get_qa_pairs(df_1, qa_ids)
    abstracts_df = get_abstracts(df_2, abstract_ids)

    answers_df['Similarity Score'] = question_scores
    abstracts_df['Similarity Score'] = abstract_scores

    return answers_df, abstracts_df

In [23]:
queries = [
    "What is the incubation period of COVID-19?",
    "Cure for fever?",
    "Who is at risk for Lymphocytic Choriomeningit",
    "What are the symptoms of Ligma?"
]

In [33]:
for query in queries:
    print(f"Query: {query}")
    answers_df, abstracts_df = query_top_k_answers_and_abstracts(train_medquad, df_pubmed, query)
    print("Top Answers:")
    display(answers_df)
    print("Top Abstracts:")
    display(abstracts_df)
    print("\n")

Query: What is the incubation period of COVID-19?
Top Answers:


Unnamed: 0,qa_id,Question,Answer,Similarity Score
10753,10754,What is (are) dyskeratosis congenita ?,Dyskeratosis congenita is a disorder that can ...,0.647081
252,253,what is the treatment for vancomycin-resistant...,On this Page General Information What is vanco...,0.637043
11895,11896,What is (are) Anonychia congenita ?,Anonychia congenita is an extremely rare nail ...,0.637043
15727,15728,What is (are) Pachyonychia congenita ?,Pachyonychia congenita (PC) is a rare inherite...,0.637043
8918,8919,What is (are) pachyonychia congenita ?,Pachyonychia congenita is a condition that pri...,0.637043
7678,7679,What is (are) paramyotonia congenita ?,Paramyotonia congenita is a disorder that affe...,0.637043
10058,10059,What is (are) Waldenstrm macroglobulinemia ?,Waldenstrm macroglobulinemia is a rare blood c...,0.637043
392,393,What is the outlook for Agenesis of the Corpus...,Prognosis depends on the extent and severity o...,0.633984
12772,12773,What is (are) Paramyotonia congenita ?,Paramyotonia congenita is an inherited conditi...,0.631612
728,729,What is the outlook for Neurosyphilis ?,Prognosis can change based on the type of neur...,0.630091


Top Abstracts:


Unnamed: 0,abstract_id,abstract_text,Similarity Score
8186,8612858,To establish whether time to down-regulation a...,0.685417
26790,10748773,Irrigation suction drainage ( ISD ) is an addi...,0.684163
44733,12526238,Heart rate has been used to measure infants ' ...,0.683375
51337,14601817,The purpose of this study was to investigate w...,0.682391
60374,15561795,To measure the impact of a computerized guidel...,0.676083
94430,18295761,To determine the optimum time interval between...,0.674333
98287,18577202,Two types of methods are used to assess learni...,0.67266
124381,20696729,The goal was to assess the feasibility of earl...,0.672455
141788,22068638,Near infrared ( NIR ) spectroscopy is a techno...,0.672451
188672,25432920,Does culture in a closed system result in an i...,0.671429




Query: Cure for fever?
Top Answers:


Unnamed: 0,qa_id,Question,Answer,Similarity Score
1327,1328,What are the treatments for Myotonia ?,"Treatment for myotonia may include mexiletine,...",0.604811
9647,9648,What are the treatments for Parkinson disease ?,These resources address the diagnosis or manag...,0.604055
10887,10888,What are the treatments for myotonia congenita ?,These resources address the diagnosis or manag...,0.604055
9412,9413,What are the treatments for retinitis pigmento...,These resources address the diagnosis or manag...,0.601695
919,920,What are the treatments for Leukodystrophy ?,Treatment for most of the leukodystrophies is ...,0.601124
9122,9123,What are the treatments for Alzheimer disease ?,These resources address the diagnosis or manag...,0.601097
10822,10823,"What are the treatments for neuropathy, ataxia...",These resources address the diagnosis or manag...,0.599236
2383,2384,What are the treatments for Indigestion ?,Some people may experience relief from symptom...,0.599236
827,828,What are the treatments for Metachromatic Leuk...,There is no cure for MLD. Bone marrow transpla...,0.599208
1081,1082,What are the treatments for Myotonia Congenita ?,Most people with myotonia congenita dont requi...,0.599208


Top Abstracts:


Unnamed: 0,abstract_id,abstract_text,Similarity Score
201,1392793,To evaluate the effect of short term treatment...,0.575126
40028,12000378,"Tinea capitis , a common clinical pattern of d...",0.558888
48860,12856053,To compare the parasitological and clinical ef...,0.556633
69288,16240515,To determine whether a single dose of Clindess...,0.555034
85533,17504616,Applying three treatment methods for enuresis ...,0.55409
91250,17998493,There is a paucity of data on the efficacy of ...,0.553726
95903,18401974,At the present the clinical treatment of choic...,0.546881
111544,19663597,Treatment of visceral leishmaniasis ( VL ) is ...,0.545302
141414,22039269,The aim of this study is to evaluate the effec...,0.543007
179615,24673608,Long-duration beta-lactam antibiotics are used...,0.540354




Query: Who is at risk for Lymphocytic Choriomeningit
Top Answers:


Unnamed: 0,qa_id,Question,Answer,Similarity Score
3921,3922,Who is at risk for Parkinson's Disease? ?,"About 60,000 Americans are diagnosed with Park...",1.0
1576,1577,Who is at risk for Diverticular Disease? ?,Diverticulosis becomes more common as people a...,0.951077
4546,4547,Who is at risk for Bronchopulmonary Dysplasia? ?,The more premature an infant is and the lower ...,0.947501
4161,4162,Who is at risk for Alpha-1 Antitrypsin Deficie...,Alpha-1 antitrypsin (AAT) deficiency occurs in...,0.943087
4379,4380,Who is at risk for Hemolytic Anemia? ?,Hemolytic anemia can affect people of all ages...,0.942907
3135,3136,Who is at risk for Chronic Lymphocytic Leukemi...,Older age can affect the risk of developing ch...,0.942187
2,3,Who is at risk for Lymphocytic Choriomeningiti...,Individuals of all ages who come into contact ...,0.938165
2761,2762,Who is at risk for Parathyroid Cancer? ?,Having certain inherited disorders can increas...,0.935887
4312,4313,Who is at risk for Electrocardiogram? ?,An electrocardiogram (EKG) has no serious risk...,0.918098
4492,4493,Who is at risk for Thrombocythemia and Thrombo...,Primary Thrombocythemia\n \nThr...,0.918098


Top Abstracts:


Unnamed: 0,abstract_id,abstract_text,Similarity Score
29215,10975790,1-2 % of all patients under non-steroidal anti...,0.81117
33936,11403365,The Hypertension Optimal Treatment ( HOT ) Stu...,0.794185
62696,15742336,The efficacy of allogeneic hematopoietic stem ...,0.790489
117630,20139767,This study evaluates the Alzheimer disease ris...,0.784338
118754,20216073,To determine whether family medical history as...,0.783373
154880,22992357,"Family Healthware , a tool developed by the CD...",0.782582
155669,23046591,Atrial fibrillation ( AF ) is the most common ...,0.781664
179817,24686885,There is little evidence to inform the targete...,0.781331
186065,25188543,Civilian posttraumatic stress disorder ( PTSD ...,0.780802
189146,25467619,To determine the perceived risk of type 2 diab...,0.780116




Query: What are the symptoms of Ligma?
Top Answers:


Unnamed: 0,qa_id,Question,Answer,Similarity Score
14104,14105,What are the symptoms of Phacomatosis pigmento...,What are the signs and symptoms of phacomatosi...,1.0
13641,13642,What are the symptoms of Camptodactyly taurinu...,What are the signs and symptoms of Camptodacty...,1.0
15548,15549,What are the symptoms of Coccygodynia ?,What signs and symptoms are associated with co...,1.0
14526,14527,What are the symptoms of GM1 gangliosidosis ?,What are the signs and symptoms of GM1 ganglio...,1.0
13644,13645,What are the symptoms of Tetramelic monodactyly ?,What are the signs and symptoms of Tetramelic ...,1.0
15074,15075,What are the symptoms of Syringoma ?,What are the signs and symptoms of Syringoma? ...,1.0
1787,1788,What are the symptoms of Cystocele ?,The symptoms of a cystocele may include\n ...,1.0
15997,15998,What are the symptoms of Microhydranencephaly ?,What are the signs and symptoms of Microhydran...,1.0
16349,16350,What are the symptoms of Microtia-Anotia ?,What are the signs and symptoms of Microtia-An...,1.0
11302,11303,What are the symptoms of D-glycericacidemia ?,What are the signs and symptoms of D-glycerica...,1.0


Top Abstracts:


Unnamed: 0,abstract_id,abstract_text,Similarity Score
13964,9300508,The majority of patients presenting to cardiac...,0.77045
33146,11329526,Although the patient experiences the symptoms ...,0.759774
60558,15572868,"According to homeopathic theory , symptoms pro...",0.756948
65381,15938885,To investigate the states of chronic symptoms ...,0.756407
82492,17291177,Expectancy and modeling have been cited as fac...,0.755323
89162,17805217,Generalized anxiety disorder ( GAD ) is a chro...,0.75319
101073,18806203,"Occasional leg symptoms , like feelings of hea...",0.751274
104312,19127072,Subjects with atopic syndrome often perceive s...,0.749022
118662,20211300,The timely and accurate identification of symp...,0.748671
146081,22364776,Tobacco withdrawal symptoms may be confounded ...,0.747349






# Pass Context to LLM

In [34]:
def call_llm(context, query):
    prompt = f"You are a medical expert assistant. Answer the following medical question comprehensively and accurately. If the provided context contains relevant information, use it. If not, use your general medical knowledge to provide the best possible answer."

    prompt += f"\nQuestion: {query}\n"

    prompt += "\nContext:\n"
    prompt += "\nRelevant Medical QA Pairs:\n"
    for i, qa in enumerate(context['answers'], 1):
        prompt += f"\nQA Pair {i}:\nQ: {qa['Question']}\nA: {qa['Answer']}\n"
    
    prompt += "\nRelevant Medical Research Abstracts:\n"
    for i, abstract in enumerate(context['abstracts'], 1):
        prompt += f"\nAbstract {i}:\n{abstract['abstract_text']}\n"
    
    prompt += "\nProvide a clear, direct, and comprehensive answer to the question. Focus on being helpful and informative to the user."

    time.sleep(2)
    response = model_llm.generate_content(prompt)
    
    return response 

In [35]:
def process_medical_query(df_1, df_2, query):
    answers_df, abstracts_df = query_top_k_answers_and_abstracts(df_1, df_2, query)
    
    if len(answers_df) == 0 and len(abstracts_df) == 0:
        return {
            'response': "I apologize, but I don't have enough reliable information to answer this question.",
            'confidence': 'low'
        }
    
    context = {
        'answers': answers_df.to_dict(orient='records'),
        'abstracts': abstracts_df.to_dict(orient='records')
    }
    
    llm_response = call_llm(context, query)
    
    return {
        'response': llm_response.text,
        'confidence': 'high'
    }

test_query = "Who is at risk for Lymphocytic Choriomeningit?"
result = process_medical_query(train_medquad, df_pubmed, test_query)
print(result['response'])

Lymphocytic choriomeningitis (LCM) risk is primarily associated with contact with infected rodents, specifically their urine, feces, saliva, or blood.  This means individuals who handle wild mice or pet mice and hamsters (especially those from potentially contaminated colonies) are at increased risk.  Human fetuses are also at risk of vertical transmission from an infected mother.  Laboratory workers handling the virus or infected animals face a risk, although this can be mitigated through proper safety precautions and using animals from reliably tested sources.  The risk extends to all ages.



# Evaluation

## Test on Medquad Test Dataset

In [36]:
def normalize_text(text):
    # Convert to lowercase
    text = text.lower()
    # Remove extra whitespace
    text = ' '.join(text.split())
    return text

def truncate_or_pad_text(text, target_length):
    words = text.split()
    if len(words) > target_length:
        # Truncate to target length
        return ' '.join(words[:target_length])
    return text

def calculate_rouge(prediction, reference):
    scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)
    scores = scorer.score(prediction, reference)
    return {
        'ROUGE-1': scores['rouge1'].fmeasure,
        'ROUGE-L': scores['rougeL'].fmeasure
    }

def calculate_meteor(prediction, reference):
    pred_tokens = nltk.word_tokenize(prediction.lower())
    ref_tokens = nltk.word_tokenize(reference.lower())
    return meteor_score([ref_tokens], pred_tokens)

def calculate_chrf(prediction, reference):
    return sentence_chrf(reference, prediction, min_len=1, beta=2.0)

def calculate_metrics(predictions, references):
    individual_scores = []
    
    for pred, ref in zip(predictions, references):
        # Normalize texts to lowercase and remove extra whitespace
        pred = ' '.join(pred.lower().split())
        ref = ' '.join(ref.lower().split())
        
        # Calculate length ratio for normalization
        pred_len = len(pred.split())
        ref_len = len(ref.split())
        # Option 1: Use min(pred_len, ref_len) / max(pred_len, ref_len)
        ratio = pred_len / ref_len if pred_len > 0 else 1
        
        scores = {}
        
        # Calculate ROUGE scores and normalize
        rouge_scores = calculate_rouge(pred, ref)
        scores['ROUGE-1'] = rouge_scores['ROUGE-1'] * ratio
        scores['ROUGE-L'] = rouge_scores['ROUGE-L'] * ratio
        
        # Calculate and normalize METEOR score
        scores['METEOR'] = calculate_meteor(pred, ref) * ratio
        
        # Calculate and normalize chrF2 score
        scores['chrF2'] = calculate_chrf(pred, ref) * ratio
        
        individual_scores.append(scores)
    
    # Calculate average scores
    avg_scores = {
        metric: np.mean([s[metric] for s in individual_scores])
        for metric in individual_scores[0].keys()
    }
    
    return avg_scores, individual_scores

def evaluate_llm(test_df, medquad_embeddings, pubmed_embeddings):
    # Define specific test questions
    test_questions = [
    "What is (are) lactate dehydrogenase deficiency ?",
    "Is 21-hydroxylase deficiency inherited ?",
    "Is Loeys-Dietz syndrome inherited ?",
    "What are the genetic changes related to cerebral cavernous malformation ?"
    ]
    
    # Get corresponding true answers from test_df
    test_sample = test_df[test_df['Question'].isin(test_questions)]
    
    questions = test_sample['Question'].tolist()
    true_answers = test_sample['Answer'].tolist()
    
    # Get predictions
    predicted_answers = []
    for question in questions:
        result = process_medical_query(medquad_embeddings, pubmed_embeddings, question)
        predicted_answers.append(str(result['response']))
    
    # Ensure we have predictions
    if len(predicted_answers) == 0:
        raise ValueError("No predictions were generated")
    
    # Calculate all metrics
    avg_metrics, individual_metrics = calculate_metrics(predicted_answers, true_answers)
    
    # Create evaluation dataframe
    eval_data = []
    
    for i, (question, true_ans, pred_ans, metrics) in enumerate(zip(
        questions, true_answers, predicted_answers, individual_metrics)):
        
        true_ans_norm = normalize_text(true_ans)
        pred_ans_norm = truncate_or_pad_text(
            normalize_text(pred_ans), 
            len(true_ans_norm.split())
        )
        
        eval_data.append({
            'Question': question,
            'True Answer': true_ans,
            'Predicted Answer': pred_ans,
            'Normalized True Answer': true_ans_norm,
            'Normalized Predicted Answer': pred_ans_norm,
            'Original True Length': len(true_ans.split()),
            'Original Predicted Length': len(pred_ans.split()),
            'Normalized Length': len(true_ans_norm.split()),
            **metrics
        })
    
    eval_df = pd.DataFrame(eval_data)
    
    print("\nDetailed Evaluation Results:")
    print("\nMetrics Summary:")
    metrics_cols = ['ROUGE-1', 'ROUGE-L', 'METEOR', 'chrF2']
    print(eval_df[metrics_cols].describe())
    
    print("\nLength Analysis:")
    print(f"Average Original True Length: {eval_df['Original True Length'].mean():.1f} words")
    print(f"Average Original Predicted Length: {eval_df['Original Predicted Length'].mean():.1f} words")
    print(f"Average Normalized Length: {eval_df['Normalized Length'].mean():.1f} words")
    
    return eval_df, avg_metrics

# Run evaluation
print("Starting evaluation...")
eval_df, avg_metrics = evaluate_llm(test_medquad, train_medquad, df_pubmed)


Starting evaluation...

Detailed Evaluation Results:

Metrics Summary:
        ROUGE-1   ROUGE-L    METEOR     chrF2
count  4.000000  4.000000  4.000000  4.000000
mean   0.449122  0.297260  0.297185  0.435532
std    0.214061  0.173475  0.187704  0.247642
min    0.260749  0.168720  0.150969  0.206665
25%    0.307863  0.180318  0.172518  0.272271
50%    0.396873  0.238095  0.237522  0.384035
75%    0.538132  0.355036  0.362188  0.547296
max    0.741994  0.544129  0.562728  0.767395

Length Analysis:
Average Original True Length: 133.8 words
Average Original Predicted Length: 120.5 words
Average Normalized Length: 133.8 words


In [37]:
print("\nAverage Metrics:")
avg_metrics = eval_df[['ROUGE-1', 'ROUGE-L', 'METEOR', 'chrF2']].mean()

# Creating a DataFrame with average metrics
avg_metrics_df = pd.DataFrame(avg_metrics, columns=['Average Value']).reset_index()
avg_metrics_df.columns = ['Metric', 'Average Value']
display(avg_metrics_df)

print("\nEvaluation DataFrame:")
# Adjust display options
pd.set_option('display.max_colwidth', 200)

# Create a new DataFrame with shortened answers
eval_df_short = eval_df.copy()
eval_df_short['True Answer'] = eval_df_short['True Answer'].apply(lambda x: textwrap.shorten(x, width=200, placeholder='...'))
eval_df_short['Predicted Answer'] = eval_df_short['Predicted Answer'].apply(lambda x: textwrap.shorten(x, width=200, placeholder='...'))
display(eval_df[['Question', 'True Answer', 'Predicted Answer', 'ROUGE-1', 'ROUGE-L', 'METEOR', 'chrF2']].head())




Average Metrics:


Unnamed: 0,Metric,Average Value
0,ROUGE-1,0.449122
1,ROUGE-L,0.29726
2,METEOR,0.297185
3,chrF2,0.435532



Evaluation DataFrame:


Unnamed: 0,Question,True Answer,Predicted Answer,ROUGE-1,ROUGE-L,METEOR,chrF2
0,Is Loeys-Dietz syndrome inherited ?,"Loeys-Dietz syndrome is considered to have an autosomal dominant pattern of inheritance, which means one copy of the altered gene in each cell is sufficient to cause the disorder. In about 75 per...","Yes, Loeys-Dietz syndrome is inherited. It's an autosomal dominant disorder, meaning that only one altered copy of a gene (either *TGFBR1* or *TGFBR2*) in each cell is sufficient to cause the con...",0.741994,0.544129,0.562728,0.767395
1,What are the genetic changes related to cerebral cavernous malformation ?,"Mutations in at least three genes, KRIT1 (also known as CCM1), CCM2, and PDCD10 (also known as CCM3), cause familial cerebral cavernous malformations. The precise functions of these genes are not...","Cerebral cavernous malformations (CCMs) are caused by genetic mutations in one of three genes: *KRIT1*, *CCM2*, and *PDCD10*. These genes are involved in the regulation of angiogenesis (the form...",0.323567,0.184185,0.179702,0.294139
2,What is (are) lactate dehydrogenase deficiency ?,"Lactate dehydrogenase deficiency is a condition that affects how the body breaks down sugar to use as energy in cells, primarily muscle cells. There are two types of this condition: lactate dehyd...","Lactate dehydrogenase (LDH) deficiency is a rare inherited metabolic disorder affecting the body's ability to break down sugar for energy, primarily within muscle cells. There are two main types:...",0.470178,0.292005,0.295342,0.47393
3,Is 21-hydroxylase deficiency inherited ?,How is 21-hydroxylase-deficient congenital adrenal hyperplasia passed through families? 21-hydroxylase-deficient congenital adrenal hyperplasia has an autosomal recessive pattern of inheritance. I...,"Yes, 21-hydroxylase deficiency is inherited. It follows an autosomal recessive pattern. This means that a person needs to inherit two copies of a mutated gene (one from each parent) to have the c...",0.260749,0.16872,0.150969,0.206665
