<a href="https://colab.research.google.com/github/TienNguyen93/clinical-generation/blob/main/clinical_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Clinical Note Generation**

In [None]:
%pip install datasets evaluate rouge_score


## **Import libraries**

In [None]:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np

## **Load dataset**

In [None]:
ds = load_dataset("316usman/research_clinical_visit_note_summarization_corpus_mts")


In [None]:
ds

DatasetDict({
    train: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 1201
    })
    validation: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 100
    })
    test: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 400
    })
})

### **Find the longest and shortest sequences in the train, validation, and test sets**

In [None]:
# TODO
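The cell above is still a TODO. One possible way to fill it in is sketched below, measuring length in whitespace-separated words. The helper name `length_extremes` and the `toy_splits` data are hypothetical stand-ins so the snippet runs on its own; with the real dataset you would pass `ds[split][column]` instead, and you may prefer token counts from the tokenizer over word counts.

```python
def length_extremes(texts):
    """Return (shortest, longest) length, counted in whitespace-separated words."""
    lengths = [len(text.split()) for text in texts]
    return min(lengths), max(lengths)


# Hypothetical toy data standing in for ds['train'], ds['validation'], ds['test'].
toy_splits = {
    "train": {
        "prompt": ["Doctor: Hello.", "Patient: I have a cough and a fever."],
        "completion": ["Greeting.", "Cough and fever."],
    },
    "validation": {
        "prompt": ["Doctor: Any pain? Patient: No."],
        "completion": ["No pain."],
    },
}

for split_name, split in toy_splits.items():
    for column in ("prompt", "completion"):
        shortest, longest = length_extremes(split[column])
        print(f"{split_name}/{column}: shortest={shortest} words, longest={longest} words")
```

Word counts are only a rough proxy; the truncation limit that matters during training is expressed in tokens.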

### **Prepare dataset**

Convert the dialogue-summary (prompt-completion) pairs into explicit instructions.

In [None]:
"""
The preprocessing function needs to:

* Prefix the input with a prompt so T5 knows this is a summarization task. Some models capable of multiple NLP tasks require prompting for specific tasks.
* Use the text_target keyword argument when tokenizing labels.
* Truncate sequences to the tokenizer's maximum length (max_length is not set here, so the tokenizer default applies).
"""

# tokenize function
# NOTE: relies on t5_tokenizer, which is loaded in the "Load models" section below;
# run that cell before mapping this function over the dataset.
def t5_tokenize_function(example):
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '

    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["prompt"]]

    example['input_ids'] = t5_tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids
    example['labels'] = t5_tokenizer(text_target=example["completion"], padding="max_length", truncation=True, return_tensors="pt").input_ids

    return example

## **Load models**

### **T5 model**

In [None]:
# load T5 model
t5_name = 'google/flan-t5-base'
t5_model = AutoModelForSeq2SeqLM.from_pretrained(t5_name)

# T5 tokenizer
# parameter use_fast switches on fast tokenizer
t5_tokenizer = AutoTokenizer.from_pretrained(t5_name, use_fast=True)

In [None]:
# apply tokenization
t5_tokenized_ds = ds.map(t5_tokenize_function, batched=True)
t5_tokenized_ds = t5_tokenized_ds.remove_columns(['prompt', 'completion'])


In [None]:
# t5_tokenized_ds = t5_tokenized_ds.filter(lambda example, index: index % 100 == 0, with_indices=True)

# check shape
print("Shapes of the datasets:")
print(f"Training: {t5_tokenized_ds['train'].shape}")
print(f"Validation: {t5_tokenized_ds['validation'].shape}")
print(f"Test: {t5_tokenized_ds['test'].shape}")

Shapes of the datasets:
Training: (1201, 2)
Validation: (100, 2)
Test: (400, 2)


In [None]:
t5_tokenized_ds

DatasetDict({
    train: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 1201
    })
    validation: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 100
    })
    test: Dataset({
        features: ['input_ids', 'labels'],
        num_rows: 400
    })
})

In [None]:
# Create batches of examples with DataCollatorForSeq2Seq.
# It is more efficient to dynamically pad the sequences to the longest length in a batch
# during collation than to pad the whole dataset to the maximum length.
# (Left disabled here because the tokenize function above already pads to max_length.)

# from transformers import DataCollatorForSeq2Seq

# data_collator = DataCollatorForSeq2Seq(tokenizer=t5_tokenizer, model=t5_model)
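To make the idea of dynamic padding concrete, here is a minimal pure-Python sketch of what a seq2seq collator does to one batch. It is an illustration only, not the `DataCollatorForSeq2Seq` implementation: `pad_batch` is a hypothetical helper, the pad id of 0 is taken from the `pad_token_id` reported for this model later in the notebook, and -100 is the label padding value that the loss ignores.

```python
def pad_batch(batch, pad_token_id=0, label_pad_id=-100):
    """Pad every example only up to the longest sequence in this batch."""
    max_input_len = max(len(ex["input_ids"]) for ex in batch)
    max_label_len = max(len(ex["labels"]) for ex in batch)

    padded = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in batch:
        n_input = len(ex["input_ids"])
        n_label = len(ex["labels"])
        # inputs are padded with the tokenizer's pad id and masked out
        padded["input_ids"].append(ex["input_ids"] + [pad_token_id] * (max_input_len - n_input))
        padded["attention_mask"].append([1] * n_input + [0] * (max_input_len - n_input))
        # labels are padded with -100 so padded positions do not contribute to the loss
        padded["labels"].append(ex["labels"] + [label_pad_id] * (max_label_len - n_label))
    return padded


# Hypothetical token ids, just to show the shapes.
toy_batch = [
    {"input_ids": [5, 6, 7], "labels": [9]},
    {"input_ids": [5], "labels": [9, 8]},
]
print(pad_batch(toy_batch))
```

With this scheme a batch of short dialogues stays short, whereas `padding="max_length"` pads every example to the full limit regardless of its batch.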

#### Fine-tune T5

In [None]:
output_dir = "./results"

training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=1e-5,
    num_train_epochs=3,
    weight_decay=0.01,
    per_device_train_batch_size=8,
    auto_find_batch_size=True,
    logging_steps=10,
    # max_steps=1,
    eval_strategy='epoch',
    report_to="none",
)

trainer = Trainer(
    model=t5_model,
    tokenizer=t5_tokenizer,
    args=training_args,
    train_dataset=t5_tokenized_ds['train'],
    eval_dataset=t5_tokenized_ds['validation']
)

trainer.train()



Epoch,Training Loss,Validation Loss
1,0.7583,0.504427
2,0.5806,0.314973
3,0.3796,0.283986


TrainOutput(global_step=903, training_loss=2.6102292082238434, metrics={'train_runtime': 1199.7238, 'train_samples_per_second': 3.003, 'train_steps_per_second': 0.753, 'total_flos': 2467180740870144.0, 'train_loss': 2.6102292082238434, 'epoch': 3.0})

In [None]:
t5_instruct_model = AutoModelForSeq2SeqLM.from_pretrained("/content/results/checkpoint-903")

#### Evaluate T5 Qualitatively

In [None]:
index = 100
dialogue = ds['test'][index]['prompt']
human_baseline_summary = ds['test'][index]['completion']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

input_ids = t5_tokenizer(prompt, return_tensors="pt").input_ids

t5_res = t5_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
t5_text_res = t5_tokenizer.decode(t5_res[0], skip_special_tokens=True)

t5_instruct_res = t5_instruct_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200, num_beams=1))
t5_instruct_text_res = t5_tokenizer.decode(t5_instruct_res[0], skip_special_tokens=True)

dash_line = '-' * 99
print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{human_baseline_summary}')
print(dash_line)
print(f'ORIGINAL MODEL:\n{t5_text_res}')
print(dash_line)
print(f'INSTRUCT MODEL:\n{t5_instruct_text_res}')

`generation_config` default values have been modified to match model-specific defaults: {'pad_token_id': 0, 'eos_token_id': 1, 'decoder_start_token_id': 0}. If this is not desired, please set these values explicitly.


---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
The patient is a previously healthy 2-month-old female, who has had a cough and congestion for the past week.  The mother has also reported irregular breathing, which she describes as being rapid breathing associated with retractions.  The mother states that the cough is at times paroxysmal and associated with posttussive emesis.  The patient has had short respiratory pauses following the coughing events.  The patient's temperature has ranged between 102 and 104.  She has had a decreased oral intake and decreased wet diapers.  The brother is also sick with URI symptoms, and the patient has had no diarrhea.  The mother reports that she has begun to regurgitate after her feedings.  She did not do this previously.
---------------------------------------------------------------------------------------------------
ORIGINAL MODEL:
Guest_family's baby is sick.
----------

#### Evaluate T5 Quantitatively

ROUGE Metric

In [None]:
rouge = evaluate.load('rouge')


In [None]:
dialogues = ds['test'][0:3]['prompt']
human_baseline_summaries = ds['test'][0:3]['completion']

original_model_summaries = []
instruct_model_summaries = []

for dialogue in dialogues:
    prompt = f"""
Summarize the following conversation.

{dialogue}

Summary: """
    input_ids = t5_tokenizer(prompt, return_tensors="pt").input_ids

    original_model_outputs = t5_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    original_model_text_output = t5_tokenizer.decode(original_model_outputs[0], skip_special_tokens=True)
    original_model_summaries.append(original_model_text_output)

    instruct_model_outputs = t5_instruct_model.generate(input_ids=input_ids, generation_config=GenerationConfig(max_new_tokens=200))
    instruct_model_text_output = t5_tokenizer.decode(instruct_model_outputs[0], skip_special_tokens=True)
    instruct_model_summaries.append(instruct_model_text_output)

zipped_summaries = list(zip(human_baseline_summaries, original_model_summaries, instruct_model_summaries))

df = pd.DataFrame(zipped_summaries, columns = ['human_baseline_summaries', 'original_model_summaries', 'instruct_model_summaries'])
df

,human_baseline_summaries,original_model_summaries,instruct_model_summaries
0,The patient is a 55-year-old African-American ...,"Patient: Good afternoon, sir. I'm sorry to hea...",Patient: Just turned 50.
1,Positive for stroke and sleep apnea.,Doctor: I have a stroke.,Doctor: Have a stroke.
2,"MSK: Negative myalgia, negative joint pain, ne...",Patient: I have no pain in my muscles.,Doctor:


In [None]:
original_model_results = rouge.compute(
    predictions=original_model_summaries,
    references=human_baseline_summaries[0:len(original_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

instruct_model_results = rouge.compute(
    predictions=instruct_model_summaries,
    references=human_baseline_summaries[0:len(instruct_model_summaries)],
    use_aggregator=True,
    use_stemmer=True,
)

print('ORIGINAL MODEL:')
print(original_model_results)
print('INSTRUCT MODEL:')
print(instruct_model_results)

ORIGINAL MODEL:
{'rouge1': np.float64(0.11225258986453017), 'rouge2': np.float64(0.0), 'rougeL': np.float64(0.11225258986453017), 'rougeLsum': np.float64(0.11225258986453017)}
INSTRUCT MODEL:
{'rouge1': np.float64(0.07759562841530056), 'rouge2': np.float64(0.0), 'rougeL': np.float64(0.07759562841530056), 'rougeLsum': np.float64(0.07759562841530056)}
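To give a feel for what the `rouge1` number measures, the following is a simplified sketch of a unigram-overlap F1 score. It is not the `rouge_score` implementation: `rouge1_f1` is a hypothetical helper that uses plain lowercased whitespace tokens and no stemming, so its values will generally differ from the ones reported by `evaluate`.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Unigram-overlap F1 between a predicted and a reference summary (simplified)."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())

    # clipped overlap: each word counts at most as often as it appears in both texts
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("the cat sat", "the cat sat on the mat"))
```

A very short prediction can have high precision but low recall against a long reference, which is one reason terse model outputs score poorly against detailed clinical summaries.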


BERTScore and BLEURT

### **BART**

In [None]:
# load BART model
bart_name = 'facebook/bart-large-cnn'
bart_model = AutoModelForSeq2SeqLM.from_pretrained(bart_name)

# BART tokenizer
bart_tokenizer = AutoTokenizer.from_pretrained(bart_name, use_fast=True)


In [None]:
def bart_tokenize_function(example):
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '

    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["prompt"]]

    # tokenize inputs and targets separately, padding each to a fixed length
    model_inputs = bart_tokenizer(prompt, padding="max_length", truncation=True, max_length=512)
    labels = bart_tokenizer(example["completion"], padding="max_length", truncation=True, max_length=128)

    # the Trainer expects the target token ids under the "labels" key
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

In [None]:
bart_tokenizer.pad_token = bart_tokenizer.eos_token

# apply tokenization
bart_tokenized_ds = ds.map(bart_tokenize_function, batched=True)
bart_tokenized_ds = bart_tokenized_ds.remove_columns(['prompt', 'completion'])


In [None]:
training_args_bart = TrainingArguments(
    output_dir='./bart-clinical',
    learning_rate=1e-5,
    num_train_epochs=3,
    weight_decay=0.01,
    per_device_train_batch_size=8,
    auto_find_batch_size=True,
    logging_steps=10,
    # max_steps=1,
    eval_strategy='epoch',
    report_to="none",
)

trainer_bart = Trainer(
    model=bart_model,
    tokenizer=bart_tokenizer,
    args=training_args_bart,
    train_dataset=bart_tokenized_ds['train'],
    eval_dataset=bart_tokenized_ds['validation']
)

trainer_bart.train()



Epoch,Training Loss,Validation Loss


## **Evaluation**

In [None]:
# TODO: compute ROUGE, BERTScore, and BLEURT.

# **Examples**

## **View an instance of dialogue**

In [None]:
example_indices = [40, 200]

dash_line = '-' * 99

for i, index in enumerate(example_indices):
    print(dash_line)
    print('Example ', i + 1)
    print(dash_line)
    print('INPUT DIALOGUE:')
    print(ds['test'][index]['prompt'])
    print(dash_line)
    print('BASELINE HUMAN SUMMARY:')
    print(ds['test'][index]['completion'])
    print(dash_line)
    print()

---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT DIALOGUE:
Doctor: So I talked to Doctor X about your problems.
Patient: And?
Doctor: And he said your recent intestine infction is resolved. Which is C Diff colitis. You saw him two weeks back right?
Patient: Oh Thank God! Yes thanks, recently i saw him.
Doctor: Yeah.
Patient: Ok alright. 
Doctor: And we have Hypertension and high Cholesterol as well which we talked about.
Patient: Thanks doc.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Hypertension, hyperlipidemia, recent C. diff colitis, which had resolved based on speaking to Dr. X.  Two weeks ago, he had seen the patient and she was clinically well.
--------------------------------------------------------------------------------------------

In [None]:
# test tokenizer
sentence = "What time is it, Tom?"

sentence_encoded = t5_tokenizer(sentence, return_tensors='pt')

sentence_decoded = t5_tokenizer.decode(
        sentence_encoded["input_ids"][0],
        skip_special_tokens=True
    )

print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"][0])
print('\nDECODED SENTENCE:')
print(sentence_decoded)

ENCODED SENTENCE:
tensor([ 363,   97,   19,   34,    6, 3059,   58,    1])

DECODED SENTENCE:
What time is it, Tom?


## **Summarize Dialogue without Prompt Engineering**

In [None]:
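# NOTE: `models` is not defined anywhere in the original notebook. This mapping is an
# assumption inferred from the "Model: T5" output below; add other (tokenizer, model) pairs as needed.
models = {'T5': (t5_tokenizer, t5_model)}

for model_name, (tokenizer, model) in models.items():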
  print("Model:", model_name)

  for i, index in enumerate(example_indices):
      dialogue = ds['test'][index]['prompt']
      summary = ds['test'][index]['completion']

      # tokenization
      inputs = tokenizer(dialogue, return_tensors='pt')
      output = tokenizer.decode(
          model.generate(
              inputs["input_ids"],
              max_new_tokens=50,
          )[0],
          skip_special_tokens=True
      )

      print(dash_line)
      print('Example ', i + 1)
      print(dash_line)
      print(f'INPUT PROMPT:\n{dialogue}')
      print(dash_line)
      print(f'BASELINE HUMAN SUMMARY:\n{summary}')
      print(dash_line)
      print(f'MODEL GENERATION - WITHOUT PROMPT ENGINEERING:\n{output}\n')

Model: T5
---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
Doctor: So I talked to Doctor X about your problems.
Patient: And?
Doctor: And he said your recent intestine infction is resolved. Which is C Diff colitis. You saw him two weeks back right?
Patient: Oh Thank God! Yes thanks, recently i saw him.
Doctor: Yeah.
Patient: Ok alright. 
Doctor: And we have Hypertension and high Cholesterol as well which we talked about.
Patient: Thanks doc.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Hypertension, hyperlipidemia, recent C. diff colitis, which had resolved based on speaking to Dr. X.  Two weeks ago, he had seen the patient and she was clinically well.
------------------------------------------------------------------------------------



---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:
Doctor: So I talked to Doctor X about your problems.
Patient: And?
Doctor: And he said your recent intestine infction is resolved. Which is C Diff colitis. You saw him two weeks back right?
Patient: Oh Thank God! Yes thanks, recently i saw him.
Doctor: Yeah.
Patient: Ok alright. 
Doctor: And we have Hypertension and high Cholesterol as well which we talked about.
Patient: Thanks doc.
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Hypertension, hyperlipidemia, recent C. diff colitis, which had resolved based on speaking to Dr. X.  Two weeks ago, he had seen the patient and she was clinically well.
----------------------------------------------------------------------------------------------

## **Summarize Dialogue with an Instruction Prompt**

### Zero Shot Inference with an Instruction Prompt

In [None]:
for model_name, (tokenizer, model) in models.items():
  print("Model:", model_name)

  for i, index in enumerate(example_indices):
      dialogue = ds['test'][index]['prompt']
      summary = ds['test'][index]['completion']

      prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
    """

      # tokenization
      inputs = tokenizer(prompt, return_tensors='pt')
      output = tokenizer.decode(
          model.generate(
              inputs["input_ids"],
              max_new_tokens=50,
          )[0],
          skip_special_tokens=True
      )

      print(dash_line)
      print('Example ', i + 1)
      print(dash_line)
      print(f'INPUT PROMPT:\n{prompt}')
      print(dash_line)
      print(f'BASELINE HUMAN SUMMARY:\n{summary}')
      print(dash_line)
      print(f'MODEL GENERATION - ZERO SHOT:\n{output}\n')

Model: T5
---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Summarize the following conversation.

Doctor: So I talked to Doctor X about your problems.
Patient: And?
Doctor: And he said your recent intestine infction is resolved. Which is C Diff colitis. You saw him two weeks back right?
Patient: Oh Thank God! Yes thanks, recently i saw him.
Doctor: Yeah.
Patient: Ok alright. 
Doctor: And we have Hypertension and high Cholesterol as well which we talked about.
Patient: Thanks doc.

Summary:
    
---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Hypertension, hyperlipidemia, recent C. diff colitis, which had resolved based on speaking to Dr. X.  Two weeks ago, he had seen the patient and she was clinically well.
-----------------------------

### Zero Shot Inference with the Prompt Template

In [None]:
for model_name, (tokenizer, model) in models.items():
  print("Model:", model_name)

  for i, index in enumerate(example_indices):
      dialogue = ds['test'][index]['prompt']
      summary = ds['test'][index]['completion']

      prompt = f"""
Dialogue:

{dialogue}

What was going on?
"""

      # tokenization
      inputs = tokenizer(prompt, return_tensors='pt')
      output = tokenizer.decode(
          model.generate(
              inputs["input_ids"],
              max_new_tokens=50,
          )[0],
          skip_special_tokens=True
      )

      print(dash_line)
      print('Example ', i + 1)
      print(dash_line)
      print(f'INPUT PROMPT:\n{prompt}')
      print(dash_line)
      print(f'BASELINE HUMAN SUMMARY:\n{summary}')
      print(dash_line)
      print(f'MODEL GENERATION - ZERO SHOT (another template):\n{output}\n')

Model: T5
---------------------------------------------------------------------------------------------------
Example  1
---------------------------------------------------------------------------------------------------
INPUT PROMPT:

Dialogue:

Doctor: So I talked to Doctor X about your problems.
Patient: And?
Doctor: And he said your recent intestine infction is resolved. Which is C Diff colitis. You saw him two weeks back right?
Patient: Oh Thank God! Yes thanks, recently i saw him.
Doctor: Yeah.
Patient: Ok alright. 
Doctor: And we have Hypertension and high Cholesterol as well which we talked about.
Patient: Thanks doc.

What was going on?

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Hypertension, hyperlipidemia, recent C. diff colitis, which had resolved based on speaking to Dr. X.  Two weeks ago, he had seen the patient and she was clinically well.
---------------------------------------------------

## **Summarize Dialogue with One Shot and Few Shot Inference**

### One Shot Inference

In [None]:
def make_prompt(example_indices_full, example_index_to_summarize):
    prompt = ''
    for index in example_indices_full:
        dialogue = ds['test'][index]['prompt']
        summary = ds['test'][index]['completion']

        # The stop sequence '{summary}\n\n\n' is important for FLAN-T5. Other models may have their own preferred stop sequence.
        prompt += f"""
Dialogue:

{dialogue}

What was going on?
{summary}


"""

    dialogue = ds['test'][example_index_to_summarize]['prompt']

    prompt += f"""
Dialogue:

{dialogue}

What was going on?
"""

    return prompt
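Because `make_prompt` concatenates every example in full, the prompt grows with each added shot. The sketch below shows one way to keep a prompt under a token budget by dropping trailing shots. Everything in it is an assumption for illustration: `build_prompt` and `build_prompt_within_budget` are hypothetical helpers that take explicit `(dialogue, summary)` pairs instead of reading `ds`, the 512 default is an assumed input limit, and the word-based counter is a stand-in for a real tokenizer count.

```python
def build_prompt(shots, dialogue):
    """Same template as make_prompt, but with explicit (dialogue, summary) pairs."""
    prompt = ""
    for shot_dialogue, shot_summary in shots:
        prompt += f"\nDialogue:\n\n{shot_dialogue}\n\nWhat was going on?\n{shot_summary}\n\n\n"
    prompt += f"\nDialogue:\n\n{dialogue}\n\nWhat was going on?\n"
    return prompt


def build_prompt_within_budget(shots, dialogue, count_tokens, max_tokens=512):
    """Drop shots from the end until the prompt fits; return (prompt, shots_kept)."""
    kept = list(shots)
    while kept and count_tokens(build_prompt(kept, dialogue)) > max_tokens:
        kept.pop()
    return build_prompt(kept, dialogue), len(kept)


def count_words(text):
    # Stand-in counter; with a real tokenizer you might count len(tokenizer(text).input_ids).
    return len(text.split())


toy_shots = [("a b", "c"), ("d e f", "g")]
prompt, n_kept = build_prompt_within_budget(toy_shots, "h i", count_words, max_tokens=20)
print(n_kept)
print(prompt)
```

Dropping from the end is an arbitrary policy chosen for simplicity; picking the shortest or most relevant examples would be equally valid.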

In [None]:
example_indices_full = [40]
example_index_to_summarize = 200

one_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(one_shot_prompt)


Dialogue:

Doctor: So I talked to Doctor X about your problems.
Patient: And?
Doctor: And he said your recent intestine infction is resolved. Which is C Diff colitis. You saw him two weeks back right?
Patient: Oh Thank God! Yes thanks, recently i saw him.
Doctor: Yeah.
Patient: Ok alright. 
Doctor: And we have Hypertension and high Cholesterol as well which we talked about.
Patient: Thanks doc.

What was going on?
Hypertension, hyperlipidemia, recent C. diff colitis, which had resolved based on speaking to Dr. X.  Two weeks ago, he had seen the patient and she was clinically well.



Dialogue:

Doctor: Hello, how are you today?
Patient: Not good. 
Doctor: What happened? 
Patient: I have a lot of congestion. I also am coughing a lot. It feels like I am choking on something.

What was going on?



In [None]:
summary = ds['test'][example_index_to_summarize]['completion']

inputs = t5_tokenizer(one_shot_prompt, return_tensors='pt')
output = t5_tokenizer.decode(
    t5_model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - ONE SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Congestion and cough.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - ONE SHOT:
Patient has a lot of congestion and coughing a lot.


### Few Shot Inference

In [None]:
example_indices_full = [40, 80, 120]
example_index_to_summarize = 200

few_shot_prompt = make_prompt(example_indices_full, example_index_to_summarize)

print(few_shot_prompt)


Dialogue:

Doctor: So I talked to Doctor X about your problems.
Patient: And?
Doctor: And he said your recent intestine infction is resolved. Which is C Diff colitis. You saw him two weeks back right?
Patient: Oh Thank God! Yes thanks, recently i saw him.
Doctor: Yeah.
Patient: Ok alright. 
Doctor: And we have Hypertension and high Cholesterol as well which we talked about.
Patient: Thanks doc.

What was going on?
Hypertension, hyperlipidemia, recent C. diff colitis, which had resolved based on speaking to Dr. X.  Two weeks ago, he had seen the patient and she was clinically well.



Dialogue:

Guest_clinician: Is the patient restrained? 
Doctor: No, but she does have a palm protector in her right hand.

What was going on?
RESTRAINTS: None.  She does have a palm protector in her right hand.



Dialogue:

Doctor: So, what's going on with your hand, miss? Is it right or left? 
Patient: It's the right one. It's been on and off and it's been happening for the last several weeks. 
Doctor: 

In [None]:
summary = ds['test'][example_index_to_summarize]['completion']

inputs = t5_tokenizer(few_shot_prompt, return_tensors='pt')
output = t5_tokenizer.decode(
    t5_model.generate(
        inputs["input_ids"],
        max_new_tokens=50,
    )[0],
    skip_special_tokens=True
)

print(dash_line)
print(f'BASELINE HUMAN SUMMARY:\n{summary}\n')
print(dash_line)
print(f'MODEL GENERATION - FEW SHOT:\n{output}')

---------------------------------------------------------------------------------------------------
BASELINE HUMAN SUMMARY:
Congestion and cough.

---------------------------------------------------------------------------------------------------
MODEL GENERATION - FEW SHOT:
Patient has a lot of congestion and coughing a lot.
