
# Advanced Certification Program in Computational Data Science
## A programme by IISc and TalentSprint
### Mini-Project: Medical Q&A using GPT2

**DISCLAIMER:** THIS NOTEBOOK IS PROVIDED ONLY AS A REFERENCE SOLUTION NOTEBOOK FOR THE MINI-PROJECT. THERE MAY BE OTHER POSSIBLE APPROACHES/METHODS TO ACHIEVE THE SAME RESULTS.

## Learning Objectives

At the end of the experiment, you will be able to:

* perform data preprocessing, EDA and feature extraction on the Medical Q&A dataset
* load a pre-trained tokenizer
* fine-tune a GPT-2 language model for medical question answering

## Dataset Description

The dataset used in this project is the *Medical Question Answering Dataset* ([MedQuAD](https://github.com/abachaa/MedQuAD/tree/master)). It contains medical question-answer pairs along with additional information such as the question type, the question *focus*, and its UMLS (Unified Medical Language System) details: the Concept Unique Identifier (*CUI*) and the Semantic *Type* and *Group*.

To learn more about how this data was collected and constructed, refer to this [paper](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3119-4).

The data has been extracted into CSV format with the following features:

- **Focus**: the question focus
- **CUI**: concept unique identifier
- **SemanticType**
- **SemanticGroup**
- **Question**
- **Answer**

## Part-A: Grading = 10 Points

## Information

Healthcare professionals often have to consult medical literature and documents when seeking answers to medical queries. Medical databases and search engines are powerful sources of up-to-date medical knowledge. However, the existing documentation is vast, which makes it difficult for professionals to retrieve answers quickly in a clinical setting. The problem with search engines and information retrieval systems is that they return a list of documents rather than answers. Question answering systems instead let healthcare professionals retrieve short sentences or paragraphs in response to medical queries. Their biggest advantage is that they generate answers and provide hints within a few seconds.

### Problem Statement

Fine-tune the GPT-2 model on the medical question-answering dataset to generate responses to medical queries.

Please refer to ***M6 Assignment-1 Fine-tune GPT2*** to get familiar with loading the pre-trained GPT-2 tokenizer and model.

### Import required packages

In [1]:
!pip -q install -U accelerate
!pip -q install -U transformers
!pip -q install torch

In [2]:
import os
import re
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel, TextDataset, DataCollatorForLanguageModeling
from transformers import Trainer, TrainingArguments

import warnings
warnings.filterwarnings('ignore')

In [3]:
#@title Download the dataset
!wget -q https://cdn.iisc.talentsprint.com/AIandMLOps/MiniProjects/Datasets/MedQuAD.csv
!ls | grep ".csv"

MedQuAD.csv


**Exercise 1: Read the MedQuAD.csv dataset**

**Hint:** pd.read_csv()

In [4]:
df = pd.read_csv("MedQuAD.csv")
df.shape

(16412, 6)

In [5]:
df.head()

| | Focus | CUI | SemanticType | SemanticGroup | Question | Answer |
|---|---|---|---|---|---|---|
| 0 | Adult Acute Lymphoblastic Leukemia | C0751606 | T191 | Disorders | What is (are) Adult Acute Lymphoblastic Leukem... | Key Points - Adult acute lymphoblastic leukemi... |
| 1 | Adult Acute Lymphoblastic Leukemia | C0751606 | T191 | Disorders | What are the symptoms of Adult Acute Lymphobla... | Signs and symptoms of adult ALL include fever,... |
| 2 | Adult Acute Lymphoblastic Leukemia | C0751606 | T191 | Disorders | How to diagnose Adult Acute Lymphoblastic Leuk... | Tests that examine the blood and bone marrow a... |
| 3 | Adult Acute Lymphoblastic Leukemia | C0751606 | T191 | Disorders | What is the outlook for Adult Acute Lymphoblas... | Certain factors affect prognosis (chance of re... |
| 4 | Adult Acute Lymphoblastic Leukemia | C0751606 | T191 | Disorders | Who is at risk for Adult Acute Lymphoblastic L... | Previous chemotherapy and exposure to radiatio... |


### Pre-processing and EDA

**Exercise 2: Perform the operations below on the dataset [0.5 Mark]**

- Handle missing values
- Remove duplicates from data considering `Question` and `Answer` columns

- **Handle missing values**

In [6]:
df.isna().sum()

Unnamed: 0,0
Focus,14
CUI,565
SemanticType,597
SemanticGroup,565
Question,0
Answer,5


In [7]:
# Drop missing values
df = df.dropna()
df.shape

(15810, 6)

- **Remove duplicates from data considering `Question` and `Answer` columns**

In [8]:
# Check duplicates
df.duplicated(subset=['Question', 'Answer']).sum()

48

In [9]:
# Drop duplicates
df = df.drop_duplicates(subset=['Question', 'Answer'])

In [10]:
# Check duplicates
df.duplicated(subset=['Question', 'Answer']).sum()

0

**Exercise 3: Display the category name and the number of records for each of the top 100 categories of the `Focus` column [1 Mark]**

In [11]:
# Total categories in Focus column
print(df['Focus'].nunique())

4770


In [12]:
# Displaying the distinct categories of Focus column and the number of records belonging to each category
# (Top 100 only)

print(df['Focus'].value_counts()[:100])

Focus
Breast Cancer                                                       53
Prostate Cancer                                                     43
Stroke                                                              35
Skin Cancer                                                         34
Alzheimer's Disease                                                 30
                                                                    ..
Sarcoidosis                                                         11
Polycythemia Vera                                                   11
Celiac Disease                                                      11
Down syndrome                                                       10
Microscopic Colitis: Collagenous Colitis and Lymphocytic Colitis    10
Name: count, Length: 100, dtype: int64


In [13]:
# Top 100 Focus categories names

top_100_focuses = df['Focus'].value_counts()[:100].index
top_100_focuses

Index(['Breast Cancer', 'Prostate Cancer', 'Stroke', 'Skin Cancer',
       'Alzheimer's Disease', 'Colorectal Cancer', 'Lung Cancer',
       'High Blood Cholesterol', 'Heart Failure', 'Heart Attack',
       'High Blood Pressure', 'Parkinson's Disease', 'Leukemia', 'Shingles',
       'Osteoporosis', 'Age-related Macular Degeneration', 'Diabetes',
       'Hemochromatosis', 'Diabetic Retinopathy', 'Gum (Periodontal) Disease',
       'Psoriasis', 'Kidney Disease', 'Balance Problems', 'Dry Mouth', 'COPD',
       'Cataract', 'Glaucoma', 'Prescription and Illicit Drug Abuse',
       'Medicare and Continuing Care', 'Gout', 'Wilson Disease',
       'Osteoarthritis', 'Narcolepsy', 'Problems with Taste',
       'Endometrial Cancer', 'Neuroblastoma', 'Short Bowel Syndrome',
       'Rheumatoid Arthritis', 'Dry Eye',
       'Peripheral Arterial Disease (P.A.D.)', 'Anxiety Disorders',
       'Surviving Cancer', 'Pituitary Tumors', 'Kidney Dysplasia',
       'Problems with Smell', 'Urinary Tract Infec

### Create Training and Validation set

**Exercise 4: Create training and validation sets [2 Marks]**

- Take 4 samples per `Focus` category for each of the top 100 categories (this gives 400 samples for training)

- Take 1 further sample per `Focus` category, not already in the training set, for each of the top 100 categories (this gives 100 samples for validation)

In [14]:
train_ques = []
train_ans = []
val_ques = []
val_ans = []

for i in top_100_focuses:
    tmp = df[df['Focus'] == i]
    train_ques.extend(tmp['Question'].values[:4].tolist())
    train_ans.extend(tmp['Answer'].values[:4].tolist())
    val_ques.extend(tmp['Question'].values[4:5].tolist())
    val_ans.extend(tmp['Answer'].values[4:5].tolist())

In [15]:
len(train_ques), len(train_ans), len(val_ques), len(val_ans)

(400, 400, 100, 100)
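
The slicing used above (`[:4]` for training, `[4:5]` for validation) keeps the two sets disjoint within each category, provided every category has at least five records. Below is a minimal, library-free sketch of the same idea on toy data. The `records` list and the `split_per_category` helper are hypothetical and not part of the notebook.

```python
from collections import defaultdict


def split_per_category(records, n_train=4, n_val=1):
    """Take the first n_train items per category for training and the next n_val for validation."""
    by_category = defaultdict(list)
    for category, text in records:
        by_category[category].append(text)

    train, val = [], []
    for category, texts in by_category.items():
        # Guard against categories that are too small to fill both sets
        if len(texts) < n_train + n_val:
            raise ValueError(f"{category!r} has only {len(texts)} records")
        train.extend(texts[:n_train])
        val.extend(texts[n_train:n_train + n_val])
    return train, val


# Toy data: two categories with six records each (illustrative values only)
records = [(cat, f"{cat} Q{i}") for cat in ("A", "B") for i in range(6)]
train, val = split_per_category(records)
print(len(train), len(val))   # 8 2
print(set(train) & set(val))  # set()
```

The explicit size check is a design choice of this sketch: slicing alone would silently return fewer samples for a category with under five records.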

### Pre-process `Question` and `Answer` text

**Exercise 5: Perform the tasks below: [1.5 Marks]**

- Combine `Question` and `Answer` for train and validation data as shown below:
    - sequence = *'\<question\>' + question-text + '\<answer\>' + answer-text*

- Join the combined text using '\n' into a single string for training and validation separately

- Save the training and validation strings as separate text files

- **Combine Question and Answer for train and val data**

In [16]:
# Combine Questions and Answers for train and val data
## sequence = '<question>' + question + '<answer>' + answer

train_data = []
val_data = []

for i in range(len(train_ques)):
    train_data.append('<question> ' + train_ques[i] + ' <answer> ' + train_ans[i])

for i in range(len(val_ques)):
    val_data.append('<question> ' + val_ques[i] + ' <answer> ' + val_ans[i])


In [17]:
len(train_data), len(val_data)

(400, 100)

In [18]:
train_data[0]

'<question> What is (are) Breast Cancer ? <answer> Key Points - Breast cancer is a disease in which malignant (cancer) cells form in the tissues of the breast. - A family history of breast cancer and other factors increase the risk of breast cancer. - Breast cancer is sometimes caused by inherited gene mutations (changes). - The use of certain medicines and other factors decrease the risk of breast cancer. - Signs of breast cancer include a lump or change in the breast. - Tests that examine the breasts are used to detect (find) and diagnose breast cancer. - If cancer is found, tests are done to study the cancer cells. - Certain factors affect prognosis (chance of recovery) and treatment options. Breast cancer is a disease in which malignant (cancer) cells form in the tissues of the breast. The breast is made up of lobes and ducts. Each breast has 15 to 20 sections called lobes. Each lobe has many smaller sections called lobules. Lobules end in dozens of tiny bulbs that can make milk. T

In [19]:
val_data[0]

"<question> What are the symptoms of Breast Cancer ? <answer> Signs of breast cancer include a lump or change in the breast. These and other signs may be caused by breast cancer or by other conditions. Check with your doctor if you have any of the following: - A lump or thickening in or near the breast or in the underarm area. - A change in the size or shape of the breast. - A dimple or puckering in the skin of the breast. - A nipple turned inward into the breast. - Fluid, other than breast milk, from the nipple, especially if it's bloody. - Scaly, red, or swollen skin on the breast, nipple, or areola (the dark area of skin around the nipple). - Dimples in the breast that look like the skin of an orange, called peau dorange."

- **Join the combined text using '\n' into a single string for training and validation separately**

In [20]:
# Train and Validation text for all Q&As

train_text = '\n'.join(train_data)

val_text = '\n'.join(val_data)

- **Save the training and validation strings as text files**

In [21]:
# Save the training and validation data as text files

with open("train.txt", "w") as f:
    f.write(train_text)

with open("val.txt", "w") as f:
    f.write(val_text)

**Exercise 6: Load pre-trained GPT2Tokenizer [0.5 Mark]**

- Use checkpoint = "gpt2"

In [22]:
# Set up the tokenizer
checkpoint = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(checkpoint)    # other checkpoints to try: gpt2-medium, gpt2-large, gpt2-xl

**Exercise 7: Tokenize train and validation data and form TextDataset objects [0.5 Mark]**

- Use the loaded pre-trained tokenizer
- Use training and validation data saved in text files

In [23]:
# Tokenize train text
train_dataset = TextDataset(tokenizer=tokenizer, file_path="train.txt", block_size=128)

# Tokenize validation text
val_dataset = TextDataset(tokenizer=tokenizer, file_path="val.txt", block_size=128)

In [24]:
# Length of train and validation set
len(train_dataset), len(val_dataset)

(1275, 324)

In [25]:
# Shape of a single example: one block of 128 token ids (block_size), not a batch
train_dataset[0].shape, val_dataset[0].shape

(torch.Size([128]), torch.Size([128]))
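
The dataset sizes above (1275 and 324) exceed the 400 and 100 Q&A pairs because `TextDataset` does not keep one example per line. To my understanding, it tokenizes the whole file as a single stream and cuts it into consecutive blocks of `block_size` tokens, dropping the final partial block. Treat that description as an assumption about the library rather than a guarantee. The sketch below shows the chunking on a plain list of integers; `chunk_into_blocks` is a hypothetical helper.

```python
def chunk_into_blocks(token_ids, block_size):
    """Split a flat list of token ids into consecutive full blocks, dropping the remainder."""
    return [
        token_ids[start:start + block_size]
        for start in range(0, len(token_ids) - block_size + 1, block_size)
    ]


# Stand-in for the token ids of an entire text file (illustrative values only)
token_ids = list(range(10))
blocks = chunk_into_blocks(token_ids, block_size=4)
print(blocks)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

One consequence of this scheme is that a block can start in the middle of an answer and can span the boundary between two Q&A pairs.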

**Exercise 8: Create a DataCollator object [0.5 Mark]**

In [26]:
# Create a Data collator object
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="pt")
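
With `mlm=False` the collator prepares batches for causal language modelling. To the best of my knowledge, the labels are a copy of `input_ids` with padding positions replaced by `-100` (the index that PyTorch's cross-entropy loss ignores by default), and the model shifts the labels internally so that each position predicts the next token. The sketch below mimics this with plain lists. The helper names and the pad id `0` are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def make_causal_lm_labels(input_ids, pad_token_id=None):
    """Copy the inputs as labels, masking padding so it does not contribute to the loss."""
    labels = list(input_ids)
    if pad_token_id is not None:
        labels = [IGNORE_INDEX if t == pad_token_id else t for t in labels]
    return labels


def next_token_pairs(input_ids, labels):
    """Pair each input position with the label it has to predict (shift by one)."""
    return [
        (inp, tgt)
        for inp, tgt in zip(input_ids[:-1], labels[1:])
        if tgt != IGNORE_INDEX
    ]


input_ids = [11, 12, 13, 0, 0]  # 0 is a hypothetical pad id
labels = make_causal_lm_labels(input_ids, pad_token_id=0)
print(labels)                               # [11, 12, 13, -100, -100]
print(next_token_pairs(input_ids, labels))  # [(11, 12), (12, 13)]
```

In this notebook every example is a full block of 128 tokens, so no padding should occur and the labels would simply equal the inputs.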

**Exercise 9: Load pre-trained GPT2LMHeadModel [0.5 Mark]**

In [27]:
# Set up the model
model = GPT2LMHeadModel.from_pretrained(checkpoint)    # other checkpoints to try: gpt2-medium, gpt2-large, gpt2-xl

**Exercise 10: Fine-tune GPT2 Model [1 Mark]**

- Specify training arguments and create a TrainingArguments object (Use 30 epochs)

- Train a GPT-2 model using the provided training arguments

- Save the resulting trained model and tokenizer to a specified output directory

In [28]:
# Set up the training arguments

model_output_path = "/content/gpt_model"

training_args = TrainingArguments(
    output_dir = model_output_path,
    overwrite_output_dir = True,
    per_device_train_batch_size = 4,
    per_device_eval_batch_size = 4,
    num_train_epochs = 30,
    save_steps = 1_000,
    save_total_limit = 2,
    logging_dir = './logs',
    )

In [29]:
# Train the model
trainer = Trainer(
    model = model,
    args = training_args,
    data_collator = data_collator,
    train_dataset = train_dataset,
    eval_dataset = val_dataset,
)

trainer.train()

# Save the model
trainer.save_model(model_output_path)

# Save the tokenizer
tokenizer.save_pretrained(model_output_path)

| Step | Training Loss |
|---|---|
| 500 | 2.5022 |
| 1000 | 2.0085 |
| 1500 | 1.6806 |
| 2000 | 1.4327 |
| 2500 | 1.225 |
| 3000 | 1.0425 |
| 3500 | 0.9105 |
| 4000 | 0.7732 |
| 4500 | 0.6901 |
| 5000 | 0.6006 |


('/content/gpt_model/tokenizer_config.json',
 '/content/gpt_model/special_tokens_map.json',
 '/content/gpt_model/vocab.json',
 '/content/gpt_model/merges.txt',
 '/content/gpt_model/added_tokens.json')

**Exercise 11: Test Model with user input prompts [1 Mark]**

- Create `generate_response()` function that takes a trained *model*, *tokenizer*, and a *prompt* string as input and generates a response using the GPT-2 model

- Test it with some user input prompts

In [30]:
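def generate_response(model, tokenizer, prompt, max_length=200):
    """Generate a continuation of `prompt` and return it as decoded text.

    `max_length` counts the prompt tokens plus the generated tokens.
    """
    # Encode the prompt as a batch of one sequence of token ids
    input_ids = tokenizer.encode(prompt, return_tensors="pt")

    # Attend to every prompt token; GPT-2 has no pad token, so reuse EOS for padding
    attention_mask = torch.ones_like(input_ids)
    pad_token_id = tokenizer.eos_token_id

    output = model.generate(
        input_ids,
        max_length=max_length,
        num_return_sequences=1,
        attention_mask=attention_mask,
        pad_token_id=pad_token_id
    )

    # The output includes the prompt followed by the generated continuation
    return tokenizer.decode(output[0], skip_special_tokens=True)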


In [31]:
# Load the fine-tuned model and tokenizer

my_model = GPT2LMHeadModel.from_pretrained(model_output_path)
my_tokenizer = GPT2Tokenizer.from_pretrained(model_output_path)

In [32]:
# Training sample question
train_ques[15]

'What is the outlook for Skin Cancer ?'

In [33]:
# Training sample answer
train_ans[15]

'Certain factors affect prognosis (chance of recovery) and treatment options. The prognosis (chance of recovery) depends mostly on the stage of the cancer and the type of treatment used to remove the cancer. Treatment options depend on the following: - The stage of the cancer (whether it has spread deeper into the skin or to other places in the body). - The type of cancer. - The size of the tumor and what part of the body it affects. - The patients general health.'

In [34]:
# Response from model

prompt = "What is the outlook for Skin Cancer ?"
response = generate_response(my_model, my_tokenizer, prompt)
response

'What is the outlook for Skin Cancer ? <answer> Certain factors affect prognosis (chance of recovery) and treatment options. The prognosis (chance of recovery) depends on the following: - The stage of the cancer (whether it has spread deeper into the skin or to other places in the body). - The type of cancer. - The size of the tumor and how many years and how many years it has been diagnosed. - The patients general health.\n<question> What is (are) Skin Cancer ? <answer> Key Points - Skin cancer is a disease in which malignant (cancer) cells form in the tissues of the skin. - The skin is the bodys largest organ. - The skin is made up of three main groups: - Skin cells: Squamous cells, keratinized cells and epidermis. - Melanocytes: Cells that make melanin. Skin also makes white blood cells, red blood cells, and platelets. Skin also makes hormones and'

In [35]:
# Testing with given prompt 1

prompt = "What are the treatments for Skin Cancer?"
response = generate_response(my_model, my_tokenizer, prompt)
response

"What are the treatments for Skin Cancer? ? <answer> Certain factors affect prognosis (chance of recovery) and treatment options. The prognosis (chance of recovery) depends on the following: - The stage of the cancer (whether it has spread deeper into the skin or to other places in the body). - The type of cancer. - The size of the tumor and how much of the body's tissue is affected. - The patients general health.\n<question> What are the treatments for Skin Cancer ? <answer> The prognosis (chance of recovery) depends on the following: - The stage of the cancer (whether it has spread deeper into the skin or to other places in the body). - The type of cancer. - The size of the tumor and how much of the body's tissue is affected. The patients general health. - The patients general health. The patients general health. The patients general health. The patients general health. The patients general health. The patients general health"

In [36]:
# Testing with given prompt 2

prompt = "What are the precautions for Skin Cancer?"
response = generate_response(my_model, my_tokenizer, prompt)
response

"What are the precautions for Skin Cancer? ? <answer> Take every day precautions to prevent infection. Avoid contact with other people's medicines, including those that contain carcinogens, radiation, or thyroid medications. Talk to your doctor about the medicines you are taking and what you can do to protect your eyes and skin. Follow your treatments for Skin Cancer exactly as your doctor prescribes. Don't smoke. Eat a healthy diet. If you smoke, quit. For important information about quitting, visit www.Smokefree.gov or call 1-800-QUIT NOW (1-800-784-8669). See Also - How to Quit Smoking - The Effects of Smoking - The Effects of Smoking - The Effects of Alcohol - Cigarette Smoking: Health Effects of Tobacco Smoking The Effects of Alcohol Cigarette smoking is the most concentrated form of pollution that most people are exposed to. As a result, the amount of carbon dioxide released into the air increases, the higher the cigarette smoking level. In"
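
The responses above show two effects of the training format: the prompts omit the `<question>` and `<answer>` markers used during fine-tuning, and generation continues past the first answer into a new `<question>` block. Below is a minimal sketch of string helpers that address both. `build_prompt` and `extract_first_answer` are hypothetical names, not part of the notebook, and whether the formatted prompt improves the answers of this model is untested here.

```python
QUESTION_TAG = "<question>"
ANSWER_TAG = "<answer>"


def build_prompt(question):
    """Format a question the same way the training sequences were formatted."""
    return f"{QUESTION_TAG} {question.strip()} {ANSWER_TAG}"


def extract_first_answer(generated):
    """Return the text between the first answer tag and the next question tag."""
    _, found, after = generated.partition(ANSWER_TAG)
    if not found:
        # No answer tag in the output: return the text unchanged
        return generated.strip()
    answer, _, _ = after.partition(QUESTION_TAG)
    return answer.strip()


generated = "What is X ? <answer> X is a condition.\n<question> What causes X ? <answer> Unknown."
print(build_prompt("What is X ?"))      # <question> What is X ? <answer>
print(extract_first_answer(generated))  # X is a condition.
```

These helpers could wrap the existing function, for example `extract_first_answer(generate_response(my_model, my_tokenizer, build_prompt(prompt)))`.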

**Exercise 12: Compare the performance of a *GPT2 model* with the *GPT2 model fine-tuned* on MedQuAD data [1 Mark]**

- Load another pre-trained GPT2LMHeadModel and do not fine-tune it

- To generate a response using the untuned model, pass it as a parameter to the `generate_response()` function

- Test both models (fine-tuned and untuned) with the user input prompts below:

    - "What precautions to take for a healthy life?"
    - "What to do after being diagnosed with cancer?"
    - "What to do when feeling sick?"

In [37]:
# Load a pre-trained GPT2 model, do not finetune it with MedQuAD data

untuned_model = GPT2LMHeadModel.from_pretrained("gpt2")

In [38]:
# Testing with finetuned model: prompt 1

prompt = "What precautions to take for a healthy life?"
response = generate_response(my_model, my_tokenizer, prompt)
response

'What precautions to take for a healthy life? - Ask your doctor about the medicines you are taking. - Check with your doctor before beginning any exercise program. - Talk with your doctor about any medications that may affect blood pressure. - Answer honestly if a doctor or other health care professional asks you about other conditions or risks of exercising. - Answer honestly if a doctor or other health care professional asks you about other conditions or risks of smoking. - Answer honestly if a doctor or other health care professional asks you about other health problems such as stroke or heart disease. Learn more about the risks of smoking. Get tips on quitting smoking.\n<question> What is (are) High Blood Pressure ? <answer> High blood pressure is the force of blood pushing against the walls of the arteries. If your blood pressure rises and stays high over time, it can affect your heart rate, which can then affect your ability to pump blood. Many things can affect blood pressure, i

In [39]:
# Testing with untuned model: prompt 1

prompt = "What precautions to take for a healthy life?"
response = generate_response(untuned_model, my_tokenizer, prompt)
response

"What precautions to take for a healthy life?\n\nThe following are some of the most common questions you'll hear from your doctor or nurse about your health.\n\nWhat are the risks of taking a drug that can cause cancer?\n\nThe risks of taking a drug that can cause cancer are very high.\n\nWhat are the risks of taking a drug that can cause cancer?\n\nThe risks of taking a drug that can cause cancer are very high.\n\nWhat are the risks of taking a drug that can cause cancer?\n\nThe risks of taking a drug that can cause cancer are very high.\n\nWhat are the risks of taking a drug that can cause cancer?\n\nThe risks of taking a drug that can cause cancer are very high.\n\nWhat are the risks of taking a drug that can cause cancer?\n\nThe risks of taking a drug that can cause cancer are very high.\n\nWhat are the risks of taking a drug that can cause"

In [40]:
# Testing with finetuned model: prompt 2

prompt = "What to do after being diagnosed with cancer?"
response = generate_response(my_model, my_tokenizer, prompt)
response

'What to do after being diagnosed with cancer?\n<question> What to do after Being Diagnosed with Cancer ? <answer> Recovering the Ability To Smell Some cancer patients recover their ability to smell when they recover from the cancer. Smell is regained through a process called recovery, in which the patient recites a list of questions and/or details about the illness that caused the loss of smell. The following questions and answers may be asked of patients who have recovered their ability to smell: - When did you first become aware of the illness that is responsible for loss of smell? - Did you have any symptoms or changes during the course of the illness? - Did you have any family members with the illness or was there a close relative who had the illness? - Was the illness caused by a previous diagnosis of the illness or by a new one? Was the loss of smell caused by a previous diagnosis of the illness or by a new one? - Was the smell disorder caused by a substance'

In [41]:
# Testing with untuned model: prompt 2

prompt = "What to do after being diagnosed with cancer?"
response = generate_response(untuned_model, my_tokenizer, prompt)
response

"What to do after being diagnosed with cancer?\n\nThe first step is to get your doctor's approval for a treatment.\n\nIf you have a cancer diagnosis, you may need to get a second opinion.\n\nIf you have a cancer diagnosis, you may need to get a second opinion. If you have a cancer diagnosis, you may need to get a third opinion.\n\nIf you have a cancer diagnosis, you may need to get a third opinion. If you have a cancer diagnosis, you may need to get a fourth opinion.\n\nIf you have a cancer diagnosis, you may need to get a fourth opinion. If you have a cancer diagnosis, you may need to get a fifth opinion.\n\nIf you have a cancer diagnosis, you may need to get a fifth opinion. If you have a cancer diagnosis, you may need to get a sixth opinion.\n\nIf you have a cancer diagnosis, you may need to get a sixth opinion. If you have"

In [42]:
# Testing with finetuned model: prompt 3

prompt = "What to do when feeling sick?"
response = generate_response(my_model, my_tokenizer, prompt)
response

'What to do when feeling sick? How to manage your dizziness and balance problems. How to prevent falls. How to prevent fall damage. How to manage joint hypermobility. How to manage joint hypermobility.\n<question> What causes Balance Problems ? <answer> Problems Caused by Balance Problems Are Often Temporary When a person has a balance problem, it is often temporary and minor. When a person has a balance problem, it is often minor and goes away when the illness or injury clears up. Many balance problems are caused by problems in the inner ear. Some people have balance disorders in the middle of the inner ear. Others may have a balance problem in the upper or outer ear. The inner ear is where our inner ear, nose, and throat are. It is where we experience the most intense sensations of balance and balance. The inner ear is also where the balance problem first begins. There are two types of balance disorders. One affects the middle of the inner ear, called'

In [43]:
# Testing with untuned model: prompt 3

prompt = "What to do when feeling sick?"
response = generate_response(untuned_model, my_tokenizer, prompt)
response

"What to do when feeling sick?\n\nThe first thing you should do is to get your body to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick, you should take a few minutes to relax.\n\nIf you're feeling sick"
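
The untuned model's responses above repeat the same sentence many times, while the fine-tuned responses drift into other Q&A pairs. One rough way to put a number on the repetition is the share of distinct word n-grams in a response, where a lower value means more repetition. This is a sketch of one possible measure, not the evaluation used in the notebook. `distinct_ngram_ratio` is a hypothetical helper and the sample strings are illustrative.

```python
def distinct_ngram_ratio(text, n=3):
    """Return the fraction of word n-grams in `text` that are unique."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


repetitive = "you should relax . " * 5
varied = "rest , drink fluids , and see a doctor if symptoms persist"
print(round(distinct_ngram_ratio(repetitive), 2))  # 0.22
print(distinct_ngram_ratio(varied))                # 1.0
```

The ratio only measures repetition. It says nothing about whether a response is medically correct or relevant to the prompt.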