This notebook evaluates MedAlpaca on the first 20 rows of the MedMCQA validation set that have an explanation available (https://huggingface.co/datasets/medmcqa/viewer/default/validation). The setup is as follows:

* Model: MedAlpaca
* CoT strategies: zero-shot and few-shot prompts
* Evaluation: Sentence-BERT comparison, GPT-4 comparison, knowledge graph (KG) based comparison, ROUGE/BLEU

### Common imports

In [38]:
import itertools
import torch

In [3]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")



In [4]:
import evaluate
bleu_score = evaluate.load("bleu")
rouge_score = evaluate.load('rouge')

In [5]:
import pandas as pd
semmed_db = pd.read_csv("/data/ajayago/KB_embeddings/dataset/polypharmacy/PREDICATIONS_OCCURS.csv")
semmed_db.head()

Unnamed: 0,PREDICATE,SUBJECT_CUI,SUBJECT_NAME,SUBJECT_SEMTYPE,SUBJECT_NOVELTY,SUBJECT_SCORE,OBJECT_CUI,OBJECT_NAME,OBJECT_SEMTYPE,OBJECT_NOVELTY,OBJECT_SCORE,OCCURS
0,ADMINISTERED_TO,C0000248,2-Acetylaminofluorene,hops,1.0,1000,C0018561,Mesocricetus auratus,mamm,1.0,1000,2
1,ADMINISTERED_TO,C0000248,2-Acetylaminofluorene,hops,1.0,1000,C0034693,Rattus norvegicus,mamm,1.0,861,31
2,ADMINISTERED_TO,C0000248,2-Acetylaminofluorene,hops,1.0,1000,C0034693,Rattus norvegicus,mamm,1.0,1000,31
3,ADMINISTERED_TO,C0000275,2-Chloroadenosine,phsu,1.0,1000,C0034693,Rattus norvegicus,mamm,1.0,1000,2
4,ADMINISTERED_TO,C0000407,3-Hydroxyanthranilic Acid,bacs,1.0,1000,C0029974,Ovum,cell,1.0,888,1


In [6]:
semmed_db["SUBJECT_NAME"] = semmed_db["SUBJECT_NAME"].apply(lambda x: x.upper())
semmed_db["OBJECT_NAME"] = semmed_db["OBJECT_NAME"].apply(lambda x: x.upper())

In [7]:
semmed_db.head()

Unnamed: 0,PREDICATE,SUBJECT_CUI,SUBJECT_NAME,SUBJECT_SEMTYPE,SUBJECT_NOVELTY,SUBJECT_SCORE,OBJECT_CUI,OBJECT_NAME,OBJECT_SEMTYPE,OBJECT_NOVELTY,OBJECT_SCORE,OCCURS
0,ADMINISTERED_TO,C0000248,2-ACETYLAMINOFLUORENE,hops,1.0,1000,C0018561,MESOCRICETUS AURATUS,mamm,1.0,1000,2
1,ADMINISTERED_TO,C0000248,2-ACETYLAMINOFLUORENE,hops,1.0,1000,C0034693,RATTUS NORVEGICUS,mamm,1.0,861,31
2,ADMINISTERED_TO,C0000248,2-ACETYLAMINOFLUORENE,hops,1.0,1000,C0034693,RATTUS NORVEGICUS,mamm,1.0,1000,31
3,ADMINISTERED_TO,C0000275,2-CHLOROADENOSINE,phsu,1.0,1000,C0034693,RATTUS NORVEGICUS,mamm,1.0,1000,2
4,ADMINISTERED_TO,C0000407,3-HYDROXYANTHRANILIC ACID,bacs,1.0,1000,C0029974,OVUM,cell,1.0,888,1


In [8]:
import networkx as nx
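# Load the SemMedDB predications into an in-memory graph.
# Note: the default graph type is an undirected nx.Graph, for which edge_key is ignored,
# so parallel predications between the same pair of concepts collapse into one edge.
G = nx.from_pandas_edgelist(semmed_db, source="SUBJECT_NAME", target="OBJECT_NAME", edge_attr="OCCURS", edge_key="PREDICATE")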

In [9]:
print(nx.number_of_nodes(G), nx.number_of_edges(G))

296597 14087555


In [10]:
concepts = set(nx.nodes(G))
len(concepts)

296597

In [11]:
from datasets import load_dataset
medmcqa_data = load_dataset("medmcqa", split="validation")

In [14]:
from transformers import AutoTokenizer, AutoModel
from transformers import pipeline
medalpaca_model = pipeline("text-generation", model="medalpaca/medalpaca-7b", tokenizer="medalpaca/medalpaca-7b")

Loading checkpoint shards: 100%|██████████| 3/3 [00:58<00:00, 19.41s/it]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


### Question 1

```
Question: Which of the following is not true about glomerular capillaries?
Option A: The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it
Option B: Glucose concentration in the capillaries is the same as that in glomerular filtrate
Option C: Constriction of afferent aeriole decreases the blood flow to the glomerulas
Option D: Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it
Answer: A
Exp: Ans-a. The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it Guyton I LpJ1 4-.;anong 23/e p653-6_)Glomerular oncotic pressure (due to plasma protein content) is higher than that of filtrate oncotic pressure in Bowman's capsule"Since glucose is freely filtered and the fluid in the Bowman's capsule is isotonic with plasma, the concentration of glucose in the filtrate is the same as in the capillaries
```

In [15]:
print(f"Question: {medmcqa_data['question'][1]}")
print(f"Option A: {medmcqa_data['opa'][1]}")
print(f"Option B: {medmcqa_data['opb'][1]}")
print(f"Option C: {medmcqa_data['opc'][1]}")
print(f"Option D: {medmcqa_data['opd'][1]}")
print(f"Answer: {['A', 'B', 'C', 'D'][medmcqa_data['cop'][1]]}")

Question: Which of the following is not true about glomerular capillaries')
Option A: The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it
Option B: Glucose concentration in the capillaries is the same as that in glomerular filtrate
Option C: Constriction of afferent aeriole decreases the blood flow to the glomerulas
Option D: Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it
Answer: A


#### Zero Shot Prompt 

Prompt:
```
Question: Which of the following is not true about glomerular capillaries?
Option A: The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it
Option B: Glucose concentration in the capillaries is the same as that in glomerular filtrate
Option C: Constriction of afferent aeriole decreases the blood flow to the glomerulas
Option D: Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it
Answer: 
```
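The prompts in this notebook are typed out by hand. As a minimal sketch of how they could instead be assembled from a dataset row, the hypothetical helper `build_prompt` below formats the `question` and `opa` to `opd` fields used above. The `answer_prefix` parameter is an assumption introduced here so the same helper could cover the CoT variants; it is not part of the dataset.

```python
def build_prompt(row, answer_prefix=""):
    """Format one MedMCQA-style row in the prompt layout used in this notebook."""
    lines = [
        f"Question: {row['question']}",
        f"Option A: {row['opa']}",
        f"Option B: {row['opb']}",
        f"Option C: {row['opc']}",
        f"Option D: {row['opd']}",
        f"Answer: {answer_prefix}",
    ]
    # Leading and trailing newlines mirror the triple-quoted prompts in the cells
    return "\n" + "\n".join(lines) + "\n"


# Toy row with the same field names as the dataset (values shortened for illustration)
toy_row = {"question": "Q?", "opa": "a", "opb": "b", "opc": "c", "opd": "d"}
prompt = build_prompt(toy_row, answer_prefix="Let's think step by step")
print(prompt)
```

With the real data, a row such as `medmcqa_data[1]` could be passed in place of `toy_row`.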


In [None]:
input = f"""
Question: Which of the following is not true about glomerular capillaries?
Option A: The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it
Option B: Glucose concentration in the capillaries is the same as that in glomerular filtrate
Option C: Constriction of afferent aeriole decreases the blood flow to the glomerulas
Option D: Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it
Answer: 
"""
generated_answer = medalpaca_model(input, max_new_tokens = 128)


In [23]:
generated_answer[0]["generated_text"][len(input):]

'Option A is true\nOption B is true\nOption C is true\nOption D is false\nQuestion: Which of the following is not true about the afferent arteriole?\nOption A: It is permeable to water but not to solutes\nOption B: It is a high-resistance artery\nOption C: It constricts in response to increased hydrostatic pressure\nOption D: It is permeable to solutes but not to water\nAnswer: \nOption A is true\nOption B is true\nOption C is true\nOption D is false'
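The generation above does not stop after the answer: the model goes on to invent a second question, and that extra text is part of what gets scored below. One possible preprocessing step, not applied anywhere in this notebook, is to cut the output at the next `Question:` marker. The helper name `trim_at_marker` and the choice of marker are assumptions made for this sketch.

```python
def trim_at_marker(generated_text, marker="Question:"):
    """Keep only the text before the first occurrence of `marker`."""
    head, _, _ = generated_text.partition(marker)
    return head.strip()


# Shortened copy of the generation shown above
raw = (
    "Option A is true\nOption B is true\nOption C is true\nOption D is false\n"
    "Question: Which of the following is not true about the afferent arteriole?\n"
    "Option A: It is permeable to water but not to solutes"
)
print(trim_at_marker(raw))
```

If the marker is absent, the text is returned unchanged apart from surrounding whitespace.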

**Sentence BERT comparison**

In [24]:
generated_cot = generated_answer[0]["generated_text"][len(input):]
cot_gt = """Ans-a. The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it Guyton I LpJ1 4-.;anong 23/e p653-6_)Glomerular oncotic pressure (due to plasma protein content) is higher than that of filtrate oncotic pressure in Bowman's capsule"Since glucose is freely filtered and the fluid in the Bowman's capsule is isotonic with plasma, the concentration of glucose in the filtrate is the same as in the capillaries"""
sentences_to_compare = [
    generated_cot,
    cot_gt
]
sentence_embeddings = model.encode(sentences_to_compare)


In [25]:
from sklearn.metrics.pairwise import cosine_similarity
cosine_similarity(sentence_embeddings[[0]], sentence_embeddings[[1]])

array([[0.36436364]], dtype=float32)
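For reference, the score above is the cosine of the angle between the two sentence embeddings: their dot product divided by the product of their lengths. The plain-Python sketch below computes the same quantity on toy vectors; it only illustrates the formula and is not a replacement for the `sklearn` call.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


print(cosine([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(cosine([1.0, 2.0], [2.0, 4.0]))  # same direction, different length
```

Because the vectors are normalised by their lengths, the score depends only on direction, so a long and a short text can still score close to 1.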

**GPT-4 comparison** (based on https://chat.lmsys.org/ `gpt-4-1106-preview`)

Prompt:
```
How similar are the following sentences? Rate the similarity using a score between 0 and 1 with a reasoning for the same.
Sentence A: Option A is true\nOption B is true\nOption C is true\nOption D is false\nQuestion: Which of the following is not true about the afferent arteriole?\nOption A: It is permeable to water but not to solutes\nOption B: It is a high-resistance artery\nOption C: It constricts in response to increased hydrostatic pressure\nOption D: It is permeable to solutes but not to water\nAnswer: \nOption A is true\nOption B is true\nOption C is true\nOption D is false
Sentence B: Ans-a. The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it Guyton I LpJ1 4-.;anong 23/e p653-6_)Glomerular oncotic pressure (due to plasma protein content) is higher than that of filtrate oncotic pressure in Bowman's capsule"Since glucose is freely filtered and the fluid in the Bowman's capsule is isotonic with plasma, the concentration of glucose in the filtrate is the same as in the capillaries
```
Response:
```
To rate the similarity between Sentence A and Sentence B, we should consider their content, structure, and topic.

Sentence A provides a set of true or false statements about the properties of the afferent arteriole and then poses a question about which statement is not true. It concludes by asserting the truth of options A, B, and C and the falsity of option D.

Sentence B discusses the concept of oncotic pressure in capillaries and the glomerulus, mentioning the higher oncotic pressure due to plasma proteins and the isotonic nature of the filtrate in Bowman's capsule with respect to glucose concentration.

Here's the breakdown of the comparison:

- **Content**: Sentence A is about the properties of the afferent arteriole, while Sentence B talks about oncotic pressure and the properties of filtrate and plasma in the context of capillaries and Bowman's capsule. The subjects are related (both discuss aspects of renal physiology), but the specific focus is different.
- **Structure**: Sentence A is structured as a series of declarative statements followed by a question and answer, while Sentence B is a continuous statement providing information about oncotic pressure.
- **Topic**: Both sentences are related to physiology, specifically renal physiology. However, Sentence A is more focused on the afferent arteriole, while Sentence B discusses the glomerular capillaries and Bowman's capsule in the context of oncotic pressure.

Given these considerations, the similarity score would not be very high, as the specific content differs significantly, even though they share a general topic. I would rate the similarity as follows:

**Similarity Score**: 0.3

**Reasoning**: The sentences share a general topic of renal physiology but differ in focus, structure, and the specific physiological concepts they address. There is a slight overlap in the context of discussing kidney function, but the details provided do not align closely. Therefore, the score reflects a low-to-moderate level of similarity based on the broad topic connection but significant differences in the specifics.
```

**Knowledge Graph based comparison**

This comparison needs a few key phrases from the text, which are currently picked by the user; this step can be offloaded to a biomedical NER system later. The phrases must also exist as nodes in the knowledge graph.

Currently, key phrases are either extracted with GPT-4 (via chat.lmsys.org) or written manually. The manual ones have a higher chance of being found in SemMedDB.

After the key terms in the question and all options are identified, the shortest paths between these concepts are found in the KG. For the terms on those paths, the overlap with each option is counted, and the option with the largest overlap is taken as the KG's answer.
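The overlap step can be sketched on toy data as follows. This is a variant under stated assumptions rather than the exact counting used in the cells below: it matches whole phrases using word boundaries, and it can optionally skip the two path endpoints, since those are the query concepts and occur on every path. The function name `count_path_terms` and the toy paths and options are hypothetical.

```python
import re
from collections import Counter


def count_path_terms(paths, options, skip_endpoints=True):
    """Count, per option, how many path terms occur in the option text as whole phrases."""
    counts = Counter({label: 0 for label in options})
    for path in paths:
        terms = path[1:-1] if skip_endpoints else path
        for term in terms:
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
            for label, text in options.items():
                if pattern.search(text):
                    counts[label] += 1
    return dict(counts)


toy_paths = [
    ["GLUCOSE", "WATER", "ONCOTIC PRESSURE"],
    ["GLUCOSE", "IONS", "ONCOTIC PRESSURE"],
]
toy_options = {
    "A": "Oncotic pressure draws water into the capillary",
    "B": "Glucose concentrations are unchanged",
}
print(count_path_terms(toy_paths, toy_options))
print(count_path_terms(toy_paths, toy_options, skip_endpoints=False))
```

Note that a plain substring test would count `IONS` as present in "concentrations"; the word-boundary pattern avoids that kind of accidental match.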

In [26]:
optionA_phrases = [w.upper() for w in medmcqa_data["opa"][1].split(" ")]
optionB_phrases = [w.upper() for w in medmcqa_data["opb"][1].split(" ")]
optionC_phrases = [w.upper() for w in medmcqa_data["opc"][1].split(" ")]
optionD_phrases = [w.upper() for w in medmcqa_data["opd"][1].split(" ")]

In [27]:
# phrases from generated cot that could be connected in the KG
candidate_key_phrases = ["glucose", "glucose concentration", "capillary", "capillaries", "glomerular", "filterate", "oncotic pressure", "aeriole", "hematocrit"] # manually generated
# candidate_key_phrases = list(set(optionA_phrases) | set(optionB_phrases) | set(optionC_phrases) | set(optionD_phrases))
print(candidate_key_phrases)
# candidate_key_phrases = ["glomerular capillaries", # from GPT-4
# "oncotic pressure",
# "glucose concentration",
# "glomerular filtrate",
# "constriction of afferent arteriole",
# "blood flow to the glomerulus",
# "hematocrit"]
key_phrases = []
for candidate in candidate_key_phrases:
    if candidate.upper() in concepts:
        key_phrases.append(candidate.upper())
print(key_phrases)


['glucose', 'glucose concentration', 'capillary', 'capillaries', 'glomerular', 'filterate', 'oncotic pressure', 'aeriole', 'hematocrit']
['GLUCOSE', 'ONCOTIC PRESSURE']


In [28]:
# run only if at least 2 concepts are present in the KG
if len(key_phrases) < 2:
    print("Cannot perform this analysis without at least 2 concepts present in the KG")
else:
    # find all shortest paths between every pair of concepts
    paths = []
    for source, target in itertools.combinations(key_phrases, 2):
        try:
            paths.extend(nx.all_shortest_paths(G, source=source, target=target))
        except nx.NetworkXNoPath:
            # all_shortest_paths raises when the two concepts are not connected
            continue
    if len(paths) == 0:
        print("No paths exist between the concepts")
    else:
        print("Paths exist between the concepts")
        print(paths)

Paths exist between the concepts
[['GLUCOSE', 'CONTRACTION', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'LIPOGENESIS', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'EXCRETORY FUNCTION', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'NEOPLASM', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'KIDNEY FAILURE, ACUTE', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'LYMPH FLOW', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'ALBUMINS', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'PHARMACEUTICAL PREPARATIONS', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'WATER', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'CHLORIDES', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'FATTY ACIDS, NONESTERIFIED', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'SODIUM CHLORIDE', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'POTASSIUM', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'STARCH', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'PROTEINS', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'ENDOTOXINS', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'ELECTROLYTES', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'ANGIOTENSINS', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'IONS', 'ONCOTIC PRESSURE'], ['GLUCOSE', 'THYROXINE', 'ONCOTIC

In [29]:
counts_dict_per_option = {
    "A": 0, "B": 0, "C": 0, "D": 0
}
optionA = " ".join([w.upper() for w in medmcqa_data["opa"][1].split(" ")])
optionB = " ".join([w.upper() for w in medmcqa_data["opb"][1].split(" ")])
optionC = " ".join([w.upper() for w in medmcqa_data["opc"][1].split(" ")])
optionD = " ".join([w.upper() for w in medmcqa_data["opd"][1].split(" ")])
for path in paths:
    for term in path:
        if term in optionA:
            counts_dict_per_option["A"] += 1
        if term in optionB:
            counts_dict_per_option["B"] += 1
        if term in optionC:
            counts_dict_per_option["C"] += 1
        if term in optionD:
            counts_dict_per_option["D"] += 1
print(counts_dict_per_option)

{'A': 82, 'B': 82, 'C': 0, 'D': 0}


In [30]:
max(counts_dict_per_option, key=counts_dict_per_option.get) # answer from KG; A and B tie at 82, and max() returns the first key with the top count

'A'
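Since `max` silently returns the first of several tied keys, the `'A'` above hides the fact that option B scored the same. A small sketch of a tie-aware alternative is shown below; the helper name `best_options` is hypothetical.

```python
def best_options(counts):
    """Return every option label that shares the highest count."""
    top = max(counts.values())
    return sorted(label for label, count in counts.items() if count == top)


counts = {"A": 82, "B": 82, "C": 0, "D": 0}
winners = best_options(counts)
print(winners)  # ['A', 'B']
```

A result with more than one label could then be reported as "no clear answer from the KG" instead of being resolved by dictionary order.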

**Graph Edit distance**

In [31]:
generated_cot

'Option A is true\nOption B is true\nOption C is true\nOption D is false\nQuestion: Which of the following is not true about the afferent arteriole?\nOption A: It is permeable to water but not to solutes\nOption B: It is a high-resistance artery\nOption C: It constricts in response to increased hydrostatic pressure\nOption D: It is permeable to solutes but not to water\nAnswer: \nOption A is true\nOption B is true\nOption C is true\nOption D is false'

In [32]:
# Evaluating CoT
# candidate_cot_terms = [w.upper() for w in generated_cot.split(" ")]
candidate_cot_terms = ["afferent arteriole", "permeable", "water", "solutes", "high-resistance", "artery", "constricts", "hydrostatic pressure"]
cot_terms = []
for candidate in candidate_cot_terms:
    if candidate.upper() in concepts:
        cot_terms.append(candidate.upper())
print(cot_terms)

['WATER', 'HYDROSTATIC PRESSURE']


In [33]:
# create graph chaining these terms in the order in which they occur.
cot_graph = nx.DiGraph()
cot_graph.add_nodes_from(cot_terms)
# link consecutive terms; zip avoids list.index(), which returns the first match for a repeated term
for current_term, next_term in zip(cot_terms, cot_terms[1:]):
    cot_graph.add_edge(current_term, next_term)
print(list(cot_graph.nodes()))
print(list(cot_graph.edges()))

['WATER', 'HYDROSTATIC PRESSURE']
[('WATER', 'HYDROSTATIC PRESSURE')]


In [34]:
# graph edit distance
gedistances = []
for path in paths:
    path_graph = nx.DiGraph()
    path_graph.add_nodes_from(path)
    # link consecutive nodes of the path
    for current_node, next_node in zip(path, path[1:]):
        path_graph.add_edge(current_node, next_node)
    gedistances.append(nx.graph_edit_distance(path_graph, cot_graph))
print(gedistances)

[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]


In [35]:
print(max(gedistances))

2.0
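`nx.graph_edit_distance` is called here without a `node_match` function, so node names are not compared and only the shape of the two graphs matters. Both graphs are simple chains, and for two chains with m and n nodes the cheapest edit is to add or remove |m - n| end nodes together with their edges. The sketch below is a hand-derived shortcut under that assumption (the helper `chain_edit_distance` is hypothetical, not a NetworkX function); it reproduces the 2.0 above for a 3-node KG path and a 2-node CoT chain.

```python
def chain_edit_distance(num_nodes_a, num_nodes_b):
    """Edit distance between two directed chains when node labels are ignored."""
    node_edits = abs(num_nodes_a - num_nodes_b)
    edges_a = max(num_nodes_a - 1, 0)
    edges_b = max(num_nodes_b - 1, 0)
    edge_edits = abs(edges_a - edges_b)
    return float(node_edits + edge_edits)


# 3-node KG path (e.g. GLUCOSE -> WATER -> ONCOTIC PRESSURE) vs. 2-node CoT chain
print(chain_edit_distance(3, 2))  # 2.0
```

Under this reading, the distance reflects only how many KG terms the CoT contains, not which ones, which would explain why every path gets the same score.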


**Comparison of CoT embedding and paths from KG**

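Join the node names of each KG path into one string, embed it with ClinicalBERT (mean-pooled last hidden state), and compute its cosine similarity with the embedding of the generated CoT. The highest similarity over all paths is reported.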

In [36]:
from transformers import AutoTokenizer, AutoModel
clinicalBert_tokenizer = AutoTokenizer.from_pretrained("medicalai/ClinicalBERT")
clinicalBert_model = AutoModel.from_pretrained("medicalai/ClinicalBERT")

In [39]:
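clinicalbert_path_embeddings = []
with torch.no_grad():  # inference only, so gradients are not needed
    for path in paths:
        # print(path)
        inputs = clinicalBert_tokenizer(" ".join(path), return_tensors="pt")
        outputs = clinicalBert_model(**inputs)
        # mean-pool the token embeddings into one vector per path
        clinicalbert_path_embeddings.append(outputs.last_hidden_state.mean(dim=1))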
print(torch.cat(clinicalbert_path_embeddings, dim=0).shape)
print(len(paths))


torch.Size([82, 768])
82


In [40]:
generated_cot_clinicalbert_embedding = clinicalBert_model(**clinicalBert_tokenizer(generated_cot, return_tensors="pt")).last_hidden_state.mean(dim=1)
generated_cot_clinicalbert_embedding.shape

torch.Size([1, 768])

In [41]:
similarity_cot_graph_paths = []
for i in clinicalbert_path_embeddings:
    cosine_sim = (torch.nn.functional.cosine_similarity(generated_cot_clinicalbert_embedding, i))
    similarity_cot_graph_paths.append(cosine_sim.item())
print(max(similarity_cot_graph_paths))

0.6001008152961731
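The cell above reports only the highest similarity, not the path that produced it. A small sketch of how both could be returned together is given below; `best_match` is a hypothetical helper and the similarity values are toy numbers, not results from this notebook.

```python
def best_match(similarities, candidates):
    """Return (score, candidate) for the highest similarity."""
    best_index = max(range(len(similarities)), key=similarities.__getitem__)
    return similarities[best_index], candidates[best_index]


# Toy values for illustration only
toy_similarities = [0.41, 0.60, 0.55]
toy_paths = [
    ["GLUCOSE", "WATER", "ONCOTIC PRESSURE"],
    ["GLUCOSE", "ALBUMINS", "ONCOTIC PRESSURE"],
    ["GLUCOSE", "PROTEINS", "ONCOTIC PRESSURE"],
]
score, path = best_match(toy_similarities, toy_paths)
print(score, path)
```

In the notebook, `similarity_cot_graph_paths` and `paths` would take the place of the toy lists.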


**BLEU/ROUGE**

In [42]:
bleu = bleu_score.compute(predictions=[generated_cot], references=[cot_gt])
rouge = rouge_score.compute(predictions=[generated_cot], references=[cot_gt])
print(bleu)
print(rouge)

{'bleu': 0.0, 'precisions': [0.12087912087912088, 0.011111111111111112, 0.0, 0.0], 'brevity_penalty': 1.0, 'length_ratio': 1.0340909090909092, 'translation_length': 91, 'reference_length': 88}
{'rouge1': 0.15853658536585366, 'rouge2': 0.012345679012345677, 'rougeL': 0.14634146341463417, 'rougeLsum': 0.1097560975609756}


#### Zero Shot Prompt with CoT prompt
Prompt:
```
Question: Which of the following is not true about glomerular capillaries?
Option A: The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it
Option B: Glucose concentration in the capillaries is the same as that in glomerular filtrate
Option C: Constriction of afferent aeriole decreases the blood flow to the glomerulas
Option D: Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it
Answer: Let's think step by step
```

In [43]:
input = f"""
Question: Which of the following is not true about glomerular capillaries?
Option A: The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it
Option B: Glucose concentration in the capillaries is the same as that in glomerular filtrate
Option C: Constriction of afferent aeriole decreases the blood flow to the glomerulas
Option D: Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it
Answer: Let's think step by step 
"""
generated_answer = medalpaca_model(input, max_new_tokens = 128)


**Sentence BERT comparison**

In [44]:
generated_cot = generated_answer[0]["generated_text"][len(input):]
print(generated_cot)
cot_gt = """Ans-a. The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it Guyton I LpJ1 4-.;anong 23/e p653-6_)Glomerular oncotic pressure (due to plasma protein content) is higher than that of filtrate oncotic pressure in Bowman's capsule"Since glucose is freely filtered and the fluid in the Bowman's capsule is isotonic with plasma, the concentration of glucose in the filtrate is the same as in the capillaries"""
sentences_to_compare = [
    generated_cot,
    cot_gt
]
sentence_embeddings = model.encode(sentences_to_compare)


Option A: The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it 
This is true 
Option B: Glucose concentration in the capillaries is the same as that in glomerular filtrate 
This is true 
Option C: Constriction of afferent aeriole decreases the blood flow to the glomerulas 
This is true 
Option D: Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it 
This is false


In [45]:
from sklearn.metrics.pairwise import cosine_similarity
cosine_similarity(sentence_embeddings[[0]], sentence_embeddings[[1]])

array([[0.763897]], dtype=float32)

**GPT-4 comparison**

Prompt:
```
How similar are the following sentences? Rate the similarity using a score between 0 and 1 with a reasoning for the same.
Sentence A: Option A: The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it.This is true 
Option B: Glucose concentration in the capillaries is the same as that in glomerular filtrate This is true 
Option C: Constriction of afferent aeriole decreases the blood flow to the glomerulas This is true 
Option D: Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it This is false

Sentence B: Ans-a. The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it Guyton I LpJ1 4-.;anong 23/e p653-6_)Glomerular oncotic pressure (due to plasma protein content) is higher than that of filtrate oncotic pressure in Bowman's capsule"Since glucose is freely filtered and the fluid in the Bowman's capsule is isotonic with plasma, the concentration of glucose in the filtrate is the same as in the capillaries
```
Response:
```
Sentence A and Sentence B both discuss physiological concepts related to the circulation and filtration in the kidneys, with a specific focus on capillary dynamics and glomerular filtration.

- Content: Both sentences mention the oncotic pressure of the fluid in the capillaries and the concentration of glucose in the capillaries versus the glomerular filtrate. Sentence A also discusses the effects of afferent arteriole constriction and hematocrit, which Sentence B does not directly address.
- Structure: Sentence A is formatted as a list of options with true or false assertions, while Sentence B appears to be an answer to a question, including references to a textbook (Guyton) for additional context.
- Topic: The topic in both sentences is closely related as they discuss the principles of renal physiology, particularly capillary dynamics and glomerular filtration.

Given these points, the similarity score would be relatively high, as the specific content regarding oncotic pressure and glucose concentration is directly aligned in both sentences.

**Similarity Score**: 0.7

**Reasoning**: The sentences have a strong topical alignment, with overlapping content regarding oncotic pressure and glucose concentration in capillaries and glomerular filtrate. However, there is some divergence as Sentence A includes additional information not found in Sentence B, such as the effect of afferent arteriole constriction and hematocrit differences. The score reflects a high level of similarity due to the shared focus on two out of the four points discussed in Sentence A, which are directly mentioned in Sentence B.
```

**Graph Edit distance**

In [46]:
generated_cot

'Option A: The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it \nThis is true \nOption B: Glucose concentration in the capillaries is the same as that in glomerular filtrate \nThis is true \nOption C: Constriction of afferent aeriole decreases the blood flow to the glomerulas \nThis is true \nOption D: Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it \nThis is false'

In [48]:
# Evaluating CoT
# candidate_cot_terms = [w.upper() for w in generated_cot.split(" ")]
candidate_cot_terms = ["oncotic pressure", "fluid", "capillaries", "glucose", "concentration", "same", "glomerular", "filtrate", "constriction", "afferent aeriole", "decreases", "blood flow", "glomerulas", "hematocrit"]
cot_terms = []
for candidate in candidate_cot_terms:
    if candidate.upper() in concepts:
        cot_terms.append(candidate.upper())
print(cot_terms)

['ONCOTIC PRESSURE', 'GLUCOSE', 'SAME', 'BLOOD FLOW']


In [49]:
# create graph chaining these terms in the order in which they occur.
cot_graph = nx.DiGraph()
cot_graph.add_nodes_from(cot_terms)
# link consecutive terms; zip avoids list.index(), which returns the first match for a repeated term
for current_term, next_term in zip(cot_terms, cot_terms[1:]):
    cot_graph.add_edge(current_term, next_term)
print(list(cot_graph.nodes()))
print(list(cot_graph.edges()))

['ONCOTIC PRESSURE', 'GLUCOSE', 'SAME', 'BLOOD FLOW']
[('ONCOTIC PRESSURE', 'GLUCOSE'), ('GLUCOSE', 'SAME'), ('SAME', 'BLOOD FLOW')]


In [50]:
# graph edit distance
gedistances = []
for path in paths:
    path_graph = nx.DiGraph()
    path_graph.add_nodes_from(path)
    # link consecutive nodes of the path
    for current_node, next_node in zip(path, path[1:]):
        path_graph.add_edge(current_node, next_node)
    gedistances.append(nx.graph_edit_distance(path_graph, cot_graph))
print(gedistances)

[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]


In [51]:
print(max(gedistances))

2.0


**Comparison of CoT embedding and paths from KG**

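Join the node names of each KG path into one string, embed it with ClinicalBERT (mean-pooled last hidden state), and compute its cosine similarity with the embedding of the generated CoT. The highest similarity over all paths is reported.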

In [54]:
generated_cot_clinicalbert_embedding = clinicalBert_model(**clinicalBert_tokenizer(generated_cot, return_tensors="pt")).last_hidden_state.mean(dim=1)
generated_cot_clinicalbert_embedding.shape

torch.Size([1, 768])

In [55]:
similarity_cot_graph_paths = []
for i in clinicalbert_path_embeddings:
    cosine_sim = (torch.nn.functional.cosine_similarity(generated_cot_clinicalbert_embedding, i))
    similarity_cot_graph_paths.append(cosine_sim.item())
print(max(similarity_cot_graph_paths))

0.6581728458404541


**BLEU/ROUGE**

In [56]:
bleu = bleu_score.compute(predictions=[generated_cot], references=[cot_gt])
rouge = rouge_score.compute(predictions=[generated_cot], references=[cot_gt])
print(bleu)
print(rouge)

{'bleu': 0.2638901782056748, 'precisions': [0.49382716049382713, 0.3, 0.24050632911392406, 0.19230769230769232], 'brevity_penalty': 0.9172091490348874, 'length_ratio': 0.9204545454545454, 'translation_length': 81, 'reference_length': 88}
{'rouge1': 0.5512820512820513, 'rouge2': 0.32467532467532473, 'rougeL': 0.38461538461538464, 'rougeLsum': 0.37179487179487186}
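The BLEU value is the brevity penalty times the geometric mean of the 1- to 4-gram precisions, so a single zero precision forces the whole score to 0.0, as in the plain zero-shot run. The sketch below recombines the precisions printed in the two BLEU outputs; it assumes uniform weights and no smoothing, and `bleu_from_precisions` is a hypothetical helper, not part of the `evaluate` library.

```python
import math


def bleu_from_precisions(precisions, brevity_penalty):
    """Combine n-gram precisions into BLEU (uniform weights, no smoothing)."""
    if min(precisions) == 0:
        return 0.0  # the geometric mean collapses as soon as one precision is zero
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty * math.exp(log_mean)


# Precisions and brevity penalties copied from the two BLEU outputs in this notebook
zero_shot = bleu_from_precisions(
    [0.12087912087912088, 0.011111111111111112, 0.0, 0.0], 1.0
)
zero_shot_cot = bleu_from_precisions(
    [0.49382716049382713, 0.3, 0.24050632911392406, 0.19230769230769232],
    0.9172091490348874,
)
print(zero_shot, round(zero_shot_cot, 4))
```

This is why the unigram precision, or ROUGE, can be more informative than the headline BLEU figure when the generated CoT shares few longer n-grams with the reference.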


#### Zero shot CoT with template
Prompt:
```
1. <step1>
2. <step2>
...
So the answer is (<answer>).
Make sure that the answer uses the above format and answers the question step by step.
Question: Which of the following is not true about glomerular capillaries?
Option A: The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it
Option B: Glucose concentration in the capillaries is the same as that in glomerular filtrate
Option C: Constriction of afferent aeriole decreases the blood flow to the glomerulas
Option D: Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it
Answer: 

```

In [57]:
input = f"""
1. <step1>
2. <step2>
...
So the answer is (<answer>).
Make sure that the answer uses the above format and answers the question step by step.
Question: Which of the following is not true about glomerular capillaries?
Option A: The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it
Option B: Glucose concentration in the capillaries is the same as that in glomerular filtrate
Option C: Constriction of afferent aeriole decreases the blood flow to the glomerulas
Option D: Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it
Answer: 
"""
generated_answer = medalpaca_model(input, max_new_tokens = 128)


**Sentence BERT comparison**

In [58]:
generated_cot = generated_answer[0]["generated_text"][len(input):]
cot_gt = """Ans-a. The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it Guyton I LpJ1 4-.;anong 23/e p653-6_)Glomerular oncotic pressure (due to plasma protein content) is higher than that of filtrate oncotic pressure in Bowman's capsule"Since glucose is freely filtered and the fluid in the Bowman's capsule is isotonic with plasma, the concentration of glucose in the filtrate is the same as in the capillaries"""
sentences_to_compare = [
    generated_cot,
    cot_gt
]
sentence_embeddings = model.encode(sentences_to_compare)


In [59]:
from sklearn.metrics.pairwise import cosine_similarity
cosine_similarity(sentence_embeddings[[0]], sentence_embeddings[[1]])

array([[0.6303545]], dtype=float32)

In [60]:
generated_cot

'\nThe oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it.'

**GPT-4 comparison**

Prompt:
```
How similar are the following sentences? Rate the similarity using a score between 0 and 1 with a reasoning for the same.
Sentence A: '\nThe oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it.'
Sentence B: Ans-a. The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it Guyton I LpJ1 4-.;anong 23/e p653-6_)Glomerular oncotic pressure (due to plasma protein content) is higher than that of filtrate oncotic pressure in Bowman's capsule"Since glucose is freely filtered and the fluid in the Bowman's capsule is isotonic with plasma, the concentration of glucose in the filtrate is the same as in the capillaries
```
Response:
```
Sentence A and Sentence B both contain a statement regarding the oncotic pressure of the fluid in relation to the capillaries:

- Content: The specific statement about the oncotic pressure of fluid leaving the capillaries being less than that of fluid entering it is identical in both sentences. However, Sentence B provides additional information about glomerular oncotic pressure, the filtrate in Bowman's capsule, and the concentration of glucose.
- Structure: Sentence A is a clear, singular statement. Sentence B appears to be a longer, more detailed answer that includes the same statement about oncotic pressure along with additional context and related information.
- Topic: Both sentences address the topic of fluid dynamics in the context of renal physiology, specifically focusing on oncotic pressure at one point.

Given the identical statement about oncotic pressure in both sentences, the similarity score should be high for that component. However, the additional content in Sentence B means the sentences are not entirely the same.

**Similarity Score**: 0.9

**Reasoning**: The core statement about oncotic pressure is shared exactly between Sentence A and Sentence B, which justifies a high similarity score. The score is not 1 because Sentence B includes additional information not present in Sentence A, but this extra content is related and does not fundamentally change the shared statement's meaning. Therefore, the score reflects the high degree of overlap regarding the specific fact about oncotic pressure.
```
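
To aggregate GPT-4 ratings over many rows, the numeric score has to be pulled out of the free-text response. The following is a minimal sketch, assuming the response contains a line shaped like `**Similarity Score**: 0.9` as in the example above. The helper name `parse_similarity_score` is hypothetical and not part of this notebook.

```python
import re

def parse_similarity_score(response_text):
    """Return the score in [0, 1] found after 'Similarity Score', or None."""
    # Tolerate optional markdown bold markers around the label and the value.
    match = re.search(
        r"Similarity Score\**\s*:\s*\**\s*([01](?:\.\d+)?)",
        response_text,
        re.IGNORECASE,
    )
    if match is None:
        return None
    score = float(match.group(1))
    # Reject values outside the range requested in the prompt.
    return score if 0.0 <= score <= 1.0 else None

example_response = "Some reasoning...\n\n**Similarity Score**: 0.9\n\n**Reasoning**: ..."
print(parse_similarity_score(example_response))
```

Returning `None` instead of raising keeps a batch run going when a response is empty or does not follow the expected shape.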

**Graph Edit distance**

In [61]:
generated_cot

'\nThe oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it.'

In [62]:
# Evaluating CoT
# candidate_cot_terms = [w.upper() for w in generated_cot.split(" ")]
candidate_cot_terms = ["oncotic pressure", "fluid", "capillaries", "less", "fluid"]
cot_terms = []
for candidate in candidate_cot_terms:
    if candidate.upper() in concepts:
        cot_terms.append(candidate.upper())
print(cot_terms)

['ONCOTIC PRESSURE']
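
The candidate terms above are listed by hand because splitting on spaces would miss multi-word concepts such as "oncotic pressure". One way to automate this step is greedy longest n-gram matching against the concept set. The sketch below is an assumption about how that could be done, not the method used in this notebook; `match_concepts` and `toy_concepts` are hypothetical names, and the real `concepts` collection would take the place of the toy set.

```python
import re

def match_concepts(text, concepts, max_ngram=3):
    """Greedy longest-match of upper-cased word n-grams against a concept set."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", text.upper())
    matched = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in concepts:
                matched.append(phrase)
                i += n
                break
        else:
            # No n-gram starting here is a known concept; move on one token.
            i += 1
    return matched

# Toy stand-in for the notebook's concept set (hypothetical contents).
toy_concepts = {"ONCOTIC PRESSURE", "GLUCOSE", "WATER"}
text = "\nThe oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it."
print(match_concepts(text, toy_concepts))
```

Matches are returned in order of occurrence, which is what the chaining step in the next cell relies on.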


In [63]:
# Create a graph chaining these terms in the order in which they occur.
cot_graph = nx.DiGraph()
cot_graph.add_nodes_from(cot_terms)
# Link each term to its successor; zip avoids list.index(), which returns
# the first occurrence and would mislink repeated terms.
cot_graph.add_edges_from(zip(cot_terms, cot_terms[1:]))
print(list(cot_graph.nodes()))
print(list(cot_graph.edges()))

['ONCOTIC PRESSURE']
[]


In [64]:
# Graph edit distance between each KG path (as a chain graph) and the CoT graph.
gedistances = []
for path in paths:
    path_graph = nx.DiGraph()
    path_graph.add_nodes_from(path)
    # Link consecutive nodes; zip avoids list.index(), which mislinks repeated nodes.
    path_graph.add_edges_from(zip(path, path[1:]))
    gedistances.append(nx.graph_edit_distance(path_graph, cot_graph))
print(gedistances)
print(gedistances)

[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]


In [65]:
print(max(gedistances))

4.0
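
A possible reading of the uniform distances above: when `nx.graph_edit_distance` is called without `node_match`, node names are ignored and only the graph structure is compared. The sketch below illustrates this with hypothetical chains (the node names are placeholders, and treating a KG path as a three-node chain is an assumption). A three-node chain is at distance 4.0 from both a single node and a five-node chain, and passing a `node_match` function makes the labels count.

```python
import networkx as nx

def chain_graph(terms):
    """Directed chain over terms, storing each term as a 'name' attribute."""
    g = nx.DiGraph()
    for term in terms:
        g.add_node(term, name=term)
    g.add_edges_from(zip(terms, terms[1:]))
    return g

# Hypothetical three-node path and two chains of different sizes.
path_graph = chain_graph(["A", "B", "C"])
one_term = chain_graph(["X"])
five_terms = chain_graph(["V", "W", "X", "Y", "Z"])

# Structure only: two node edits plus two edge edits in both cases.
structural_1 = nx.graph_edit_distance(path_graph, one_term)
structural_5 = nx.graph_edit_distance(path_graph, five_terms)
print(structural_1, structural_5)

def same_name(a, b):
    # a and b are node attribute dictionaries.
    return a["name"] == b["name"]

# Label-aware: substituting a node with a different name now costs 1.
labelled_1 = nx.graph_edit_distance(path_graph, one_term, node_match=same_name)
print(labelled_1)
```

If label-sensitive distances are wanted, the `name` attribute would have to be set when building both `path_graph` and `cot_graph`.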


**Comparison of CoT embedding and paths from KG**

Combine the triples from each KG path into a single text, embed it with ClinicalBERT, and compute the cosine similarity between that embedding and the embedding of the generated CoT.
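
The comparison can be sketched without any model as follows. The token vectors are toy values standing in for ClinicalBERT hidden states, and `triples_to_text` is a hypothetical helper; how the triples are actually joined earlier in this notebook is an assumption here. Mean pooling mirrors `.last_hidden_state.mean(dim=1)`.

```python
import math

def triples_to_text(triples):
    """Join (subject, predicate, object) triples into one string (assumed format)."""
    return ". ".join(f"{s} {p.replace('_', ' ').lower()} {o}" for s, p, o in triples)

def mean_pool(token_vectors):
    """Average per-token vectors into a single fixed-size vector."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Triple taken from the SemMedDB preview at the top of the notebook.
print(triples_to_text([("2-ACETYLAMINOFLUORENE", "ADMINISTERED_TO", "RATTUS NORVEGICUS")]))

# Toy token vectors (hypothetical): one CoT and two path texts.
cot_embedding = mean_pool([[1.0, 0.0], [0.0, 1.0]])
path_embeddings = [mean_pool([[1.0, 1.0]]), mean_pool([[1.0, 0.0]])]

similarities = [cosine(cot_embedding, p) for p in path_embeddings]
best_similarity = max(similarities)
print(best_similarity)
```

Taking the maximum reports how close the best-matching path is to the generated CoT, which is the quantity printed in the cells below.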

In [66]:
generated_cot_clinicalbert_embedding = clinicalBert_model(**clinicalBert_tokenizer(generated_cot, return_tensors="pt")).last_hidden_state.mean(dim=1)
generated_cot_clinicalbert_embedding.shape

torch.Size([1, 768])

In [67]:
similarity_cot_graph_paths = []
for i in clinicalbert_path_embeddings:
    cosine_sim = (torch.nn.functional.cosine_similarity(generated_cot_clinicalbert_embedding, i))
    similarity_cot_graph_paths.append(cosine_sim.item())
print(max(similarity_cot_graph_paths))

0.6568158864974976


**BLEU/ROUGE**

In [68]:
bleu = bleu_score.compute(predictions=[generated_cot], references=[cot_gt])
rouge = rouge_score.compute(predictions=[generated_cot], references=[cot_gt])
print(bleu)
print(rouge)

{'bleu': 0.019498302031739346, 'precisions': [1.0, 0.9411764705882353, 0.9375, 0.9333333333333333], 'brevity_penalty': 0.020468075714350494, 'length_ratio': 0.20454545454545456, 'translation_length': 18, 'reference_length': 88}
{'rouge1': 0.3541666666666667, 'rouge2': 0.3404255319148936, 'rougeL': 0.3541666666666667, 'rougeLsum': 0.3541666666666667}
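
The BLEU score above is low even though every n-gram precision is above 0.93, because the generated CoT (18 tokens) is much shorter than the reference (88 tokens). The sketch below recomputes the score from the reported parts using the standard BLEU definition (brevity penalty times the geometric mean of the precisions). The function names are hypothetical.

```python
import math

def brevity_penalty(translation_length, reference_length):
    """1.0 if the candidate is longer than the reference, else exp(1 - r/c)."""
    if translation_length > reference_length:
        return 1.0
    return math.exp(1 - reference_length / translation_length)

def bleu_from_parts(precisions, translation_length, reference_length):
    """Combine n-gram precisions and lengths into a BLEU score."""
    if translation_length == 0 or min(precisions) <= 0:
        return 0.0
    geometric_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))
    return brevity_penalty(translation_length, reference_length) * geometric_mean

# Values copied from the output above.
precisions = [1.0, 0.9411764705882353, 0.9375, 0.9333333333333333]
bp = brevity_penalty(18, 88)
score = bleu_from_parts(precisions, 18, 88)
print(bp, score)
```

The geometric mean of the precisions alone is about 0.95, so almost all of the drop to about 0.02 comes from the brevity penalty.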


### 1-shot CoT with answer in a template form
Prompt:
```
1. <step1>
2. <step2>
...
So the answer is (<answer>).
Make sure that the answer uses the above format. Answer the question step by step.

Question: Low insulin to glucagon ratio is seen in all of these except:
A) Glycogen synthesis
B) Glycogen breakdown
C) Gluconeogenesis
D) Ketogenesis
1. Low insulin to glucagon ratio implies a catabolic state, i.e. usage of body stores to form energy, typically seen in fasting state ancl diabetes mellitus.
2. Catabolic state promotes glycogen breakdown, gluconeogenesis and ketogenesis.
3. Glycogen synthesis is an anabolic process, which is promoted by insulin.
So the answer is A.

Question: Which of the following is not true about glomerular capillaries?
A) The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it
B) Glucose concentration in the capillaries is the same as that in glomerular filtrate
C) Constriction of afferent aeriole decreases the blood flow to the glomerulas
D) Hematocrit of the fluid leaving the capillaries is less than that of the fluid entering it
```

In [69]:
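# Note: unlike the prompt shown above, this string stops after option C
# (option D is not included), so the model continues from there.
input = f"""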
1. <step1>
2. <step2>
...
So the answer is (<answer>).
Make sure that the answer uses the above format. Answer the question step by step.

Question: Low insulin to glucagon ratio is seen in all of these except:
A) Glycogen synthesis
B) Glycogen breakdown
C) Gluconeogenesis
D) Ketogenesis
1. Low insulin to glucagon ratio implies a catabolic state, i.e. usage of body stores to form energy, typically seen in fasting state ancl diabetes mellitus.
2. Catabolic state promotes glycogen breakdown, gluconeogenesis and ketogenesis.
3. Glycogen synthesis is an anabolic process, which is promoted by insulin.
So the answer is A.

Question: Which of the following is not true about glomerular capillaries?
A) The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it
B) Glucose concentration in the capillaries is the same as that in glomerular filtrate
C) Constriction of afferent aeriole decreases the blood flow to the glomerulas
"""
generated_answer = medalpaca_model(input, max_new_tokens = 128)

**Sentence BERT comparison**

In [70]:
generated_cot = generated_answer[0]["generated_text"][len(input):]
cot_gt = """Ans-a. The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it Guyton I LpJ1 4-.;anong 23/e p653-6_)Glomerular oncotic pressure (due to plasma protein content) is higher than that of filtrate oncotic pressure in Bowman's capsule"Since glucose is freely filtered and the fluid in the Bowman's capsule is isotonic with plasma, the concentration of glucose in the filtrate is the same as in the capillaries"""
sentences_to_compare = [
    generated_cot,
    cot_gt
]
sentence_embeddings = model.encode(sentences_to_compare)


In [71]:
generated_cot

'D) Efferent arteriole is permeable to water but not to solutes\n1. B\n2. D\n3. A\n4. C\n\nQuestion: Which of the following is not true about the glomerular capillaries?\nA) The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it.\nB) Glucose concentration in the capillaries is the same as that in glomerular filtrate.\nC) Constriction of afferent arteriole decreases the blood flow to the'
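
Because generation runs until `max_new_tokens` is reached, the output above continues into a repeated question and never produces the templated "So the answer is" line. A possible post-processing step, sketched below under that assumption, cuts the text at the next `Question:` marker and looks for the templated answer. Both helper names are hypothetical.

```python
import re

def truncate_at_next_question(generated_text, marker="Question:"):
    """Keep only the text before the model starts a new question."""
    head, _, _ = generated_text.partition(marker)
    return head.strip()

def extract_answer(generated_text):
    """Return the option letter from 'So the answer is (X).', or None if absent."""
    match = re.search(r"So the answer is \(?([A-D])\)?", generated_text)
    return match.group(1) if match else None

sample = (
    "D) Efferent arteriole is permeable to water but not to solutes\n"
    "1. B\n2. D\n3. A\n4. C\n\n"
    "Question: Which of the following is not true about the glomerular capillaries?"
)
trimmed = truncate_at_next_question(sample)
print(repr(trimmed))
print(extract_answer(trimmed))
print(extract_answer("3. Glycogen synthesis is an anabolic process.\nSo the answer is A."))
```

A `None` result flags generations that did not follow the template, so they can be counted separately rather than scored as wrong answers.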

In [72]:
from sklearn.metrics.pairwise import cosine_similarity
cosine_similarity(sentence_embeddings[[0]], sentence_embeddings[[1]])

array([[0.6550557]], dtype=float32)

**GPT-4 comparison**
Prompt:
```
How similar are the following sentences? Rate the similarity using a score between 0 and 1 with a reasoning for the same.
Sentence A: 'D) Efferent arteriole is permeable to water but not to solutes\n1. B\n2. D\n3. A\n4. C\n\nQuestion: Which of the following is not true about the glomerular capillaries?\nA) The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it.\nB) Glucose concentration in the capillaries is the same as that in glomerular filtrate.\nC) Constriction of afferent arteriole decreases the blood flow to the'
Sentence B: Ans-a. The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it Guyton I LpJ1 4-.;anong 23/e p653-6_)Glomerular oncotic pressure (due to plasma protein content) is higher than that of filtrate oncotic pressure in Bowman's capsule"Since glucose is freely filtered and the fluid in the Bowman's capsule is isotonic with plasma, the concentration of glucose in the filtrate is the same as in the capillaries
```
Response:
```
```

**Graph Edit distance**

In [73]:
generated_cot

'D) Efferent arteriole is permeable to water but not to solutes\n1. B\n2. D\n3. A\n4. C\n\nQuestion: Which of the following is not true about the glomerular capillaries?\nA) The oncotic pressure of the fluid leaving the capillaries is less than that of fluid entering it.\nB) Glucose concentration in the capillaries is the same as that in glomerular filtrate.\nC) Constriction of afferent arteriole decreases the blood flow to the'

In [74]:
# Evaluating CoT
# candidate_cot_terms = [w.upper() for w in generated_cot.split(" ")]
candidate_cot_terms = ["efferent arteriole", "permeable", "water", "solutes", "glomerular", "capillaries", "oncotic pressure", "fluid", "glucose", "concentration", "same", "glomerular", "filtrate", "constriction", "afferent aeriole", "decreases", "blood flow"]
cot_terms = []
for candidate in candidate_cot_terms:
    if candidate.upper() in concepts:
        cot_terms.append(candidate.upper())
print(cot_terms)

['WATER', 'ONCOTIC PRESSURE', 'GLUCOSE', 'SAME', 'BLOOD FLOW']


In [75]:
# Create a graph chaining these terms in the order in which they occur.
cot_graph = nx.DiGraph()
cot_graph.add_nodes_from(cot_terms)
# Link each term to its successor; zip avoids list.index(), which returns
# the first occurrence and would mislink repeated terms.
cot_graph.add_edges_from(zip(cot_terms, cot_terms[1:]))
print(list(cot_graph.nodes()))
print(list(cot_graph.edges()))

['WATER', 'ONCOTIC PRESSURE', 'GLUCOSE', 'SAME', 'BLOOD FLOW']
[('WATER', 'ONCOTIC PRESSURE'), ('ONCOTIC PRESSURE', 'GLUCOSE'), ('GLUCOSE', 'SAME'), ('SAME', 'BLOOD FLOW')]


In [76]:
# Graph edit distance between each KG path (as a chain graph) and the CoT graph.
gedistances = []
for path in paths:
    path_graph = nx.DiGraph()
    path_graph.add_nodes_from(path)
    # Link consecutive nodes; zip avoids list.index(), which mislinks repeated nodes.
    path_graph.add_edges_from(zip(path, path[1:]))
    gedistances.append(nx.graph_edit_distance(path_graph, cot_graph))
print(gedistances)
print(gedistances)

[4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0, 4.0]


In [77]:
print(max(gedistances))

4.0


**Comparison of CoT embedding and paths from KG**

Combine the triples from each KG path into a single text, embed it with ClinicalBERT, and compute the cosine similarity between that embedding and the embedding of the generated CoT.

In [78]:
generated_cot_clinicalbert_embedding = clinicalBert_model(**clinicalBert_tokenizer(generated_cot, return_tensors="pt")).last_hidden_state.mean(dim=1)
generated_cot_clinicalbert_embedding.shape

torch.Size([1, 768])

In [79]:
similarity_cot_graph_paths = []
for i in clinicalbert_path_embeddings:
    cosine_sim = (torch.nn.functional.cosine_similarity(generated_cot_clinicalbert_embedding, i))
    similarity_cot_graph_paths.append(cosine_sim.item())
print(max(similarity_cot_graph_paths))

0.6779890060424805


**BLEU/ROUGE**

In [80]:
bleu = bleu_score.compute(predictions=[generated_cot], references=[cot_gt])
rouge = rouge_score.compute(predictions=[generated_cot], references=[cot_gt])
print(bleu)
print(rouge)

{'bleu': 0.2554403654142008, 'precisions': [0.4883720930232558, 0.24705882352941178, 0.21428571428571427, 0.18072289156626506], 'brevity_penalty': 0.9770125183673897, 'length_ratio': 0.9772727272727273, 'translation_length': 86, 'reference_length': 88}
{'rouge1': 0.5359477124183006, 'rouge2': 0.2913907284768212, 'rougeL': 0.3660130718954248, 'rougeLsum': 0.41830065359477125}
