# Testing the Fine-Tuned GPT Model on the Original Dataset

The goal of this notebook is to test the fine-tuned GPT (davinci-002) model on the Medical Flashcards dataset.

## Install requirements

In [1]:
!pip uninstall -y openai
!pip install openai==0.28
!pip install datasets
!pip install scikit-learn sentence-transformers
!pip install nltk

Collecting openai==0.28
  Downloading openai-0.28.0-py3-none-any.whl (76 kB)
Installing collected packages: openai
Successfully installed openai-0.28.0
Collecting datasets
  Downloading datasets-2.19.1-py3-none-any.whl (542 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
Collecting multiprocess (from datasets)
  Downlo

In [2]:
import json
import openai
import pandas as pd
import numpy as np
from datasets import load_dataset
from sklearn.model_selection import train_test_split
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from google.colab import drive
import os

## Connect to GDrive

In [4]:
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
path = '_NLP/Project'

os.chdir(f'/content/drive/MyDrive/{path}')
os.getcwd()

'/content/drive/MyDrive/_NLP/Project'

## Add API key

In [14]:
api_key ="sk-proj-###" # ADD YOUR API KEY HERE
openai.api_key = api_key
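Hard-coding the key in the notebook means it travels with every copy of the notebook. A minimal sketch that reads it from an environment variable instead; the variable name `OPENAI_API_KEY` and the helper `load_openai_key` are assumptions for illustration, and in Colab the variable would have to be set before this cell runs.

```python
import os


def load_openai_key(env_var="OPENAI_API_KEY"):
    """Hypothetical helper: read the API key from an environment variable."""
    key = os.environ.get(env_var)
    if not key:
        # Fail loudly instead of sending unauthenticated requests later
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

Usage: `openai.api_key = load_openai_key()` replaces the hard-coded assignment.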

## Load and split the dataset

In [15]:
dataset = load_dataset('arrow', data_files='data-00000-of-00001.arrow')

In [16]:
df = dataset['train'].to_pandas()

In [17]:
df.head()

| | input | output | instruction |
|---|---|---|---|
| 0 | What is the relationship between very low Mg2+... | Very low Mg2+ levels correspond to low PTH lev... | Answer this question truthfully |
| 1 | What leads to genitourinary syndrome of menopa... | Low estradiol production leads to genitourinar... | Answer this question truthfully |
| 2 | What does low REM sleep latency and experienci... | Low REM sleep latency and experiencing halluci... | Answer this question truthfully |
| 3 | What are some possible causes of low PTH and h... | PTH-independent hypercalcemia, which can be ca... | Answer this question truthfully |
| 4 | How does the level of anti-müllerian hormone r... | The level of anti-müllerian hormone is directl... | Answer this question truthfully |


In [18]:
# Drop the "instruction" column; only the question (input) and answer (output) are used
df = df.drop(columns=["instruction"])

In [19]:
# Hold out 20% of the rows as the test set; random_state makes the split reproducible
train_data, test_data = train_test_split(df, test_size=0.2, random_state=42)

In [20]:
test_data.head()

| | input | output |
|---|---|---|
| 27911 | What are some physical signs that may indicate... | What are some physical signs that may indicate... |
| 7251 | What is the name of the amino acid that serves... | Arginine is the amino acid that acts as the pr... |
| 32050 | Do high or low potency typical antipsychotics ... | High potency typical antipsychotics are more l... |
| 7969 | Which type of heart valves are commonly affect... | Viridans streptococci infection is typically s... |
| 6904 | Among all bugs, which one is the most frequent... | Staphylococcus aureus is the bug that is the m... |


## Generate answers from the model

In [21]:
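def generate_answer(question):
    """Query the fine-tuned model with a single question and return its answer."""
    # Separator appended to every question (presumably the fine-tuning prompt format)
    prompt = question + " ->"
    response = openai.Completion.create(
        model='ft:davinci-002:personal::9KLi6nKN',
        prompt=prompt,
        max_tokens=100,        # longer answers are cut off at 100 tokens
        top_p=0.9,
        frequency_penalty=2,   # maximum value: strongly penalizes repeated tokens
        presence_penalty=1,
        stop=["\n"]            # stop generating at the first newline
    )
    return response.choices[0].text.strip()

# Only the first N_EVAL test questions are sent to the model
N_EVAL = 100
total_questions = len(test_data)  # size of the full test set, shown in the progress log
predictions = []
for count, (_, row) in enumerate(test_data.head(N_EVAL).iterrows()):
    print(f"Processed question {count + 1} out of {total_questions}")
    predictions.append(generate_answer(row["input"]))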

Processed question 1 out of 6791
Processed question 2 out of 6791
...
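`generate_answer` appends the `" ->"` separator itself, so a caller that has already added it sends `question -> ->` to the model, a format the fine-tuned model presumably never saw in training. A small guard makes appending the separator idempotent. `build_prompt` and `SEPARATOR` are hypothetical names; adopting it would mean replacing `prompt = question + " ->"` inside `generate_answer` with `prompt = build_prompt(question)`.

```python
# Hypothetical constant: the suffix generate_answer appends to each question
SEPARATOR = " ->"


def build_prompt(question, separator=SEPARATOR):
    """Append the separator exactly once, even if it is already present."""
    question = question.rstrip()
    # Already formatted: return unchanged instead of doubling the separator
    if question.endswith(separator.strip()):
        return question
    return question + separator
```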

## Compute evaluation metrics

In [22]:
# Get the reference answers from the dataset (the same first 100 rows used for the predictions)
references = test_data["output"].head(100).tolist()

In [23]:
# Load a pre-trained sentence-embedding model
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# Encode the reference and predicted answers into embedding vectors
reference_embeddings = model.encode(references)
prediction_embeddings = model.encode(predictions)


In [24]:
# Log the shape of the embeddings for debugging
print(f"Reference embeddings shape: {np.array(reference_embeddings).shape}")
print(f"Prediction embeddings shape: {np.array(prediction_embeddings).shape}")

Reference embeddings shape: (100, 384)
Prediction embeddings shape: (100, 384)


### Cosine Similarity

In [25]:
# Compute cosine similarity for each pair
cosine_similarities = []
for ref_emb, pred_emb in zip(reference_embeddings, prediction_embeddings):
    cos_sim = cosine_similarity([ref_emb], [pred_emb])[0][0]
    cosine_similarities.append(cos_sim)

In [26]:
# Calculate average cosine similarity
average_cosine_similarity = np.mean(cosine_similarities)
print(f'Average Cosine Similarity: {average_cosine_similarity:.2f}')

Average Cosine Similarity: 0.79
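The loop above calls `cosine_similarity` once per pair. Because cosine similarity is cos(u, v) = u·v / (‖u‖ ‖v‖), the same per-pair values can be computed in one vectorized NumPy pass. A minimal sketch; `paired_cosine_similarity` is a hypothetical helper, and it assumes both inputs have the same shape (here (100, 384)) and contain no all-zero rows.

```python
import numpy as np


def paired_cosine_similarity(a, b):
    """Cosine similarity between row i of `a` and row i of `b`, for every i."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Row-wise dot products divided by the product of the two row norms
    dots = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dots / norms
```

Usage: `np.mean(paired_cosine_similarity(reference_embeddings, prediction_embeddings))` should match the loop's average up to floating-point rounding.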


### BLEU Score

In [27]:
# Function to calculate BLEU score
def calculate_bleu(reference, prediction):
    reference_tokens = [nltk.word_tokenize(reference)]
    prediction_tokens = nltk.word_tokenize(prediction)
    # Using smoothing function to avoid zero scores for short sequences
    smoothing_function = SmoothingFunction().method1
    bleu_score = sentence_bleu(reference_tokens, prediction_tokens, smoothing_function=smoothing_function)
    return bleu_score
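Sentence-level BLEU multiplies a brevity penalty BP by the geometric mean of clipped n-gram precisions p_n for n = 1 to 4 (uniform weights of 1/4), where BP = 1 if the prediction is longer than the reference and exp(1 − r/c) otherwise. `SmoothingFunction().method1` replaces a zero n-gram count with a small epsilon (0.1 by default) so a single missing 4-gram does not zero the whole score. The sketch below re-implements that definition in plain Python for one reference, only to make the computation explicit; it is not NLTK's code, `sentence_bleu_sketch` is a hypothetical name, and it assumes already-tokenized input (the notebook tokenizes with `nltk.word_tokenize`).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, hypothesis, n):
    """Clipped n-gram precision, returned as (numerator, denominator)."""
    hyp_counts = Counter(ngrams(hypothesis, n))
    ref_counts = Counter(ngrams(reference, n))
    # Each predicted n-gram is credited at most as often as it occurs in the reference
    clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return clipped, max(1, sum(hyp_counts.values()))


def sentence_bleu_sketch(reference, hypothesis, max_n=4, epsilon=0.1):
    """Single-reference BLEU with uniform weights and epsilon smoothing."""
    precisions = [modified_precision(reference, hypothesis, n) for n in range(1, max_n + 1)]
    # No matching unigrams implies no matching higher-order n-grams: score is 0
    if precisions[0][0] == 0:
        return 0.0
    # Smoothing: a zero numerator is replaced by epsilon so the log is defined
    log_sum = sum(math.log((num if num > 0 else epsilon) / den) for num, den in precisions)
    geo_mean = math.exp(log_sum / max_n)
    # Brevity penalty: predictions shorter than the reference are penalized
    r, c = len(reference), len(hypothesis)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * geo_mean
```

For example, the two-token prediction "the cat" against "the cat sat on the mat" matches every unigram and bigram, but its smoothed 3- and 4-gram precisions and a brevity penalty of exp(−2) pull the score down to about 0.04, which is one reason short answers score low on BLEU even when they are correct.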

In [28]:
# Ensure NLTK resources are downloaded
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [29]:
# Calculate BLEU scores for all predictions
bleu_scores = [calculate_bleu(ref, pred) for ref, pred in zip(references, predictions)]

In [30]:
# Calculate average BLEU score
average_bleu_score = sum(bleu_scores) / len(bleu_scores)
print(f'Average BLEU Score: {average_bleu_score:.2f}')

Average BLEU Score: 0.18
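A single average hides how the scores are distributed across the 100 questions. A minimal sketch that also reports the median, spread and range, using only the standard library; `summarize_scores` is a hypothetical helper that accepts either `bleu_scores` or `cosine_similarities`.

```python
import statistics


def summarize_scores(scores):
    """Mean, median, sample standard deviation and range of per-example scores."""
    # float() also converts NumPy scalars such as float32 cosine similarities
    scores = [float(s) for s in scores]
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "min": min(scores),
        "max": max(scores),
    }
```

Usage: `summarize_scores(bleu_scores)` and `summarize_scores(cosine_similarities)`.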


## Manual check with a random question from the test dataset

In [31]:
import random

# Pick one random row from the test set
random_index = random.randint(0, len(test_data) - 1)
random_row = test_data.iloc[random_index]
question = random_row["input"]
prompt = question + " ->"  # the prompt the model actually receives
actual_answer = random_row["output"]

# generate_answer() appends " ->" itself, so pass the raw question to avoid a doubled separator
bot_answer = generate_answer(question)

print('*************************************')
print('Question: ', prompt)
print('Actual Answer:', actual_answer)
print('Bot Answer: ', bot_answer)


*************************************
Question:  What is methimazole and what are some of the potential teratogenic complications associated with its use during the first trimester of pregnancy? ->
Actual Answer: Methimazole is a medication used to treat hyperthyroidism. However, if taken during the first trimester of pregnancy, it can be teratogenic and cause birth defects in the developing fetus. One potential complication is aplasia cutis, which is the absence of skin on the scalp or other parts of the body. Other potential teratogenic complications of methimazole include choanal atresia, esophageal atresia, and congenital heart defects. It is important for pregnant women to discuss any medications they are taking with their healthcare provider to determine if they are safe to use during pregnancy.
Bot Answer:   Methimazole is a medication used to treat hyperthyroidism, which can cause complications during pregnancy if it is taken in the first trimester. Specifically, methimazole ha

---