<a href="https://colab.research.google.com/github/LivingstonTardzenyuy/Generative-AI/blob/main/Virtual_doctor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Develop a patient interaction model using Hugging Face, capable of asking patient-related questions and providing results.

## Define the scope and requirements

### Subtask:
Clearly define the types of interactions the model should handle, the kind of questions it should ask, and the desired output format (e.g., text, structured data). Consider ethical implications and data privacy from the outset.


**Reasoning**:
Define the types of interactions, questions, output format, and ethical considerations for the patient interaction model as requested in the instructions.



In [2]:
# 1. Primary Goal: The primary goal is to build an AI model that can conduct empathetic and informative conversations with patients, gather relevant health information, and potentially provide preliminary insights or guidance based on the input, ultimately aiming to improve patient engagement and streamline information collection for healthcare professionals.

# 2. Types of Interactions:
#    - Information gathering (symptoms, medical history, lifestyle)
#    - Providing general health information and education
#    - Answering frequently asked questions
#    - Scheduling or appointment reminders (optional, depending on integration)
#    - Initial symptom assessment and potential triage guidance (with clear disclaimers)
#    - Follow-up on treatment or condition progress

# 3. Scope of Questions:
#    - Demographic information (age, gender, etc.) - handled with strict privacy
#    - Current symptoms (onset, duration, severity, characteristics)
#    - Medical history (pre-existing conditions, surgeries, allergies, medications)
#    - Lifestyle factors (diet, exercise, smoking, alcohol)
#    - Questions related to specific conditions or treatments
#    - Questions about mental and emotional well-being

# 4. Desired Output Format:
#    - Primarily structured data (e.g., JSON or similar) containing extracted and categorized information from the patient's responses (symptoms, medical history, etc.). This facilitates integration with Electronic Health Records (EHRs) or other healthcare systems.
#    - Free-form text for conversational flow and general information provision.
#    - A summary of the interaction.

# 5. Ethical Considerations and Data Privacy:
#    - **Data Privacy:** Strict adherence to regulations like HIPAA (in the US) or GDPR (in Europe) is paramount. Patient data must be encrypted at rest and in transit. Anonymization and de-identification techniques should be employed where possible. Access controls should be strictly enforced.
#    - **Informed Consent:** Patients must be fully informed about how their data will be used and provide explicit consent.
#    - **Transparency:** The model should clearly identify itself as an AI and not a human healthcare professional.
#    - **Bias:** Efforts must be made to mitigate bias in the training data to ensure equitable treatment across different patient demographics.
#    - **Accuracy and Limitations:** The model should clearly state its limitations and not provide definitive medical diagnoses or replace professional medical advice. Recommendations for consulting a healthcare professional should be standard.
#    - **Security:** Robust security measures are required to protect against data breaches and unauthorized access.
#    - **Empathy and Sensitivity:** The model should be designed to be empathetic and sensitive in its interactions, particularly when discussing sensitive health topics.

## Data collection and preparation

### Subtask:
Gather a relevant dataset of patient-related questions and corresponding responses or data. This might involve creating synthetic data, using publicly available medical Q&A datasets (with appropriate licensing and privacy considerations), or partnering with healthcare professionals to curate data. Clean, preprocess, and format the data for training.


**Reasoning**:
I need to gather a dataset of patient-related questions and responses. Since I don't have access to real patient data due to privacy concerns, I will create a small synthetic dataset for demonstration purposes, focusing on common health questions and plausible answers. I will then format this data into a simple question-answer structure.



In [3]:
from transformers import pipeline

In [1]:
# Check if we are using GPU.

!nvidia-smi -L

GPU 0: Tesla T4 (UUID: GPU-f7195b64-4936-6599-c825-3d003a03902e)


In [4]:
question_answerer = pipeline("question-answering", model = 'deepset/bert-large-uncased-whole-word-masking-squad2')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/540 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.34G [00:00<?, ?B/s]

Some weights of the model checkpoint at deepset/bert-large-uncased-whole-word-masking-squad2 were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


tokenizer_config.json:   0%|          | 0.00/19.0 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

added_tokens.json:   0%|          | 0.00/2.00 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

Device set to use cuda:0


In [5]:
context = "I was very glad to see you today. We discussed your symptoms and decided on a treatment plan."
question = "What did we discuss today?"
result = question_answerer(question=question, context=context)
result

{'score': 0.7576507329940796,
 'start': 47,
 'end': 60,
 'answer': 'your symptoms'}

In [8]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, TrainingArguments, Trainer



In [9]:
# Load the dataset.
dataset = load_dataset("medalpaca/medical_meadow_wikidoc_patient_information")

In [10]:
dataset

DatasetDict({
    train: Dataset({
        features: ['input', 'output', 'instruction'],
        num_rows: 5942
    })
})

In [11]:
dataset['train'][0]

{'input': 'What are the symptoms of Allergy?',
 'output': 'Allergy symptoms vary, but may include:\nBreathing problems (coughing, shortness of breath) Burning, tearing, or itchy eyes Conjunctivitis (red, swollen eyes) Coughing Diarrhea Headache Hives Itching of the nose, mouth, throat, skin, or any other area Runny nose Skin rashes Stomach cramps Vomiting Wheezing\nWhat part of the body is contacted by the allergen plays a role in the symptoms you develop. For example:\nAllergens that are breathed in often cause a stuffy nose, itchy nose and throat, mucus production, cough, or wheezing. Allergens that touch the eyes may cause itchy, watery, red, swollen eyes. Eating something you are allergic to can cause nausea, vomiting, abdominal pain, cramping, diarrhea, or a severe, life-threatening reaction. Allergens that touch the skin can cause a skin rash, hives, itching, blisters, or even skin peeling. Drug allergies usually involve the whole body and can lead to a variety of symptoms.',
 'i

## Preprocess the Data

In [12]:
from transformers import AutoTokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Tokenize the dataset.
def tokenize_function(examples):
  # Use 'instruction' for the question and 'output' for the context/answer
  return tokenizer(examples['instruction'], examples['output'], padding='max_length', truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Print the first tokenized example to verify
print(tokenized_datasets['train'][0])

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

Map:   0%|          | 0/5942 [00:00<?, ? examples/s]

{'input': 'What are the symptoms of Allergy?', 'output': 'Allergy symptoms vary, but may include:\nBreathing problems (coughing, shortness of breath) Burning, tearing, or itchy eyes Conjunctivitis (red, swollen eyes) Coughing Diarrhea Headache Hives Itching of the nose, mouth, throat, skin, or any other area Runny nose Skin rashes Stomach cramps Vomiting Wheezing\nWhat part of the body is contacted by the allergen plays a role in the symptoms you develop. For example:\nAllergens that are breathed in often cause a stuffy nose, itchy nose and throat, mucus production, cough, or wheezing. Allergens that touch the eyes may cause itchy, watery, red, swollen eyes. Eating something you are allergic to can cause nausea, vomiting, abdominal pain, cramping, diarrhea, or a severe, life-threatening reaction. Allergens that touch the skin can cause a skin rash, hives, itching, blisters, or even skin peeling. Drug allergies usually involve the whole body and can lead to a variety of symptoms.', 'ins

In [13]:
tokenized_datasets['train'][0]

{'input': 'What are the symptoms of Allergy?',
 'output': 'Allergy symptoms vary, but may include:\nBreathing problems (coughing, shortness of breath) Burning, tearing, or itchy eyes Conjunctivitis (red, swollen eyes) Coughing Diarrhea Headache Hives Itching of the nose, mouth, throat, skin, or any other area Runny nose Skin rashes Stomach cramps Vomiting Wheezing\nWhat part of the body is contacted by the allergen plays a role in the symptoms you develop. For example:\nAllergens that are breathed in often cause a stuffy nose, itchy nose and throat, mucus production, cough, or wheezing. Allergens that touch the eyes may cause itchy, watery, red, swollen eyes. Eating something you are allergic to can cause nausea, vomiting, abdominal pain, cramping, diarrhea, or a severe, life-threatening reaction. Allergens that touch the skin can cause a skin rash, hives, itching, blisters, or even skin peeling. Drug allergies usually involve the whole body and can lead to a variety of symptoms.',
 'i

## Set up training Arguments

Specify the hyperparamters and training settings

In [14]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir = './results',
    num_train_epochs = 3,
    per_device_train_batch_size = 16,
    per_device_eval_batch_size = 64,
    report_to = ['tensorboard'] # Changed to only report to tensorboard
)

training_args

TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
average_tokens_across_devices=True,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=False,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=IntervalStrategy.NO,
eval_use_gather_object=False,
f

## Initialize the Model.

Load the pre-trained model and define training procedue.

In [15]:
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer

model = AutoModelForQuestionAnswering.from_pretrained('bert-base-uncased', num_labels = 2)

# Initialize the Traner.
trainer = Trainer(
    model = model,
    args = training_args,
    train_dataset = tokenized_datasets['train'],
    # eval_dataset = tokenized_datasets['test'], # Removed as 'test' split does not exist
)

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [16]:
# Train the model.

trainer.train()

ValueError: The model did not return a loss from the inputs, only the following keys: start_logits,end_logits. For reference, the inputs it received are input_ids,token_type_ids,attention_mask.

# Evalute the model.
Assess the performance of our model.

In [None]:
result = trainer.evaluate()
print(result)