# Clinical Question Answering Using BERT

What if we want to support any question a physician might want to ask instead of simpler rule-based questions such as "is a disease present"? To do this, we'll have to use more recent artificial intelligence techniques and large datasets as opposed to mere classification based on tabular data. In this project we go through the pre- and post-processing involved in applying [BERT](https://github.com/google-research/bert) to the problem of question answering. After developing this infrastructure, we use the model to answer questions from clinical notes.

Implementing question answering takes a few steps, even with a pretrained model:
- First, load the model and tokenizer and prepare the input: the tokenizer maps each word (or subword piece) to an ID in its vocabulary and inserts special tokens.
- Then, the model processes the tokenized input to produce contextual embeddings, which it uses for tasks such as question answering!

In [13]:
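```python
import tensorflow as tf
import numpy as np
# import only what the notebook uses instead of a wildcard import
from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering
```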

## 1 - Load the Tokenizer

In [4]:
```python
tokenizer = AutoTokenizer.from_pretrained("./models")
```

```text
loading configuration file ./models/config.json
Model config BertConfig {
  "_name_or_path": "./models",
  "architectures": [
    "BertForQuestionAnswering"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "directionality": "bidi",
  "eos_token_ids": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "output_past": true,
  "pad_token_id": null,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "position_embedding_type": "absolute",
  "transformers_version": "4.39.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28996
}

loading file vocab.txt
loading file tokenizer.json
loading file added_tokens.json
```

## 2 - Preparing the Input

Our first task is to prepare the raw passage and question for input into the model.

Given the strings `p` and `q`, we want to turn them into an input of the following form: 

`[CLS]` `[q_token1]`, `[q_token2]`, ..., `[SEP]` `[p_token1]`, `[p_token2]`, ...

Here, the special tokens `[CLS]` and `[SEP]` tell the model which part of the input is the question and which is the passage.
- The question appears between `[CLS]` and `[SEP]`.
- The passage, which should contain the answer, appears after `[SEP]`.

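We'll also pad the input with zeros to a fixed length, `max_seq_length` (384 by default), so that every input has the same shape. This model cannot accept more than 512 positions (`max_position_embeddings` in the config above).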

We'll return three items:
- First is `input_ids`, which holds the numerical ids of each token. 
- Second, we'll output the `input_mask`, which has 1's in parts of the input tensor representing input tokens, and 0's where there is padding. 
- Finally, we'll output `tokens`, the output of the tokenizer (including the `[CLS]` and `[SEP]` tokens).
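
To make the layout concrete, here is a minimal pure-Python sketch of the same steps, using a tiny hypothetical vocabulary (`TOY_VOCAB`) and pre-split word lists in place of a real WordPiece tokenizer. The IDs are made up and `toy_prepare_input` is a hypothetical helper; only the structure (special tokens, zero padding, mask) mirrors what the notebook builds.

```python
# Hypothetical toy vocabulary; real BERT vocabularies have tens of thousands of entries.
TOY_VOCAB = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "how": 3, "old": 4,
             "is": 5, "she": 6, "86": 7, "years": 8}

def toy_prepare_input(question_tokens, passage_tokens, max_seq_length=12):
    """Build [CLS] question [SEP] passage, then pad ids and mask to max_seq_length."""
    tokens = ["[CLS]"] + question_tokens + ["[SEP]"] + passage_tokens
    input_ids = [TOY_VOCAB[t] for t in tokens]
    input_mask = [1] * len(tokens)
    # Padding positions get id 0 ([PAD] in this toy vocabulary) and mask 0
    padding = max_seq_length - len(tokens)
    input_ids += [TOY_VOCAB["[PAD]"]] * padding
    input_mask += [0] * padding
    return input_ids, input_mask, tokens

ids, mask, toks = toy_prepare_input(["how", "old", "is", "she"], ["she", "is", "86", "years"])
print(toks)  # ['[CLS]', 'how', 'old', 'is', 'she', '[SEP]', 'she', 'is', '86', 'years']
print(ids)   # [1, 3, 4, 5, 6, 2, 6, 5, 7, 8, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
```

Note that `tokens` stays unpadded, while `input_ids` and `input_mask` both have length `max_seq_length`.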

In [5]:
```python
def prepare_bert_input(question, passage, tokenizer, max_seq_length=384):
    """
    Prepare question and passage for input to BERT.

    Args:
        question (string): question string
        passage (string): passage string where answer should lie
        tokenizer (Tokenizer): used for transforming raw string input
        max_seq_length (int): length of BERT input

    Returns:
        input_ids (tf.Tensor): tensor of shape (1, max_seq_length) which holds
                               ids of tokens in input
        input_mask (list): list of length max_seq_length of 1s and 0s with 1s
                           in indices corresponding to input tokens, 0s in
                           indices corresponding to padding
        tokens (list): the (unpadded) string tokens corresponding to the
                       non-padding entries of input_ids
    """
    # tokenize question
    question_tokens = tokenizer.tokenize(question)

    # tokenize passage
    passage_tokens = tokenizer.tokenize(passage)

    # get special tokens
    CLS = tokenizer.cls_token
    SEP = tokenizer.sep_token

    # manipulate tokens to get input in correct form (not adding padding yet)
    # CLS {question_tokens} SEP {passage_tokens}
    tokens = [CLS] + question_tokens + [SEP] + passage_tokens

    # truncate (dropping tokens from the end) so the input never exceeds max_seq_length
    tokens = tokens[:max_seq_length]

    # Convert tokens into integer IDs
    input_ids = tokenizer.convert_tokens_to_ids(tokens)

    # Create an input mask which has integer 1 for each token in the 'tokens' list
    input_mask = [1] * len(tokens)

    # pad input_ids with 0s (the [PAD] id in standard BERT vocabularies) up to max_seq_length
    input_ids += [0] * (max_seq_length - len(input_ids))

    # Do the same to pad the input_mask so its length is max_seq_length
    input_mask += [0] * (max_seq_length - len(input_mask))

    # add a batch dimension: shape (1, max_seq_length)
    return tf.expand_dims(tf.convert_to_tensor(input_ids), 0), input_mask, tokens
```

## 3 - Getting Answer from Model Output

After taking in the tokenized input, the model outputs two vectors. 
- The first vector contains the scores (more formally, logits) for the starting index of the answer. 
    - A higher score means that index is more likely to be the start of the answer span in the passage. 
- The second vector contains the scores for the end index of the answer.
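
Because the scores are logits, they are only meaningful relative to each other. If you also want a rough confidence for a span, one common approach (an addition here, not something this notebook computes) is to apply a softmax to each vector and multiply the start and end probabilities. A minimal sketch with made-up logits; `softmax` and `span_probability` are hypothetical helper names:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def span_probability(start_logits, end_logits, i, j):
    """P(start = i) * P(end = j), treating start and end as independent."""
    return softmax(start_logits)[i] * softmax(end_logits)[j]

# Made-up logits for a 4-token input
start_logits = [0.0, 2.0, 0.0, 0.0]
end_logits = [0.0, 0.0, 3.0, 0.0]
print(round(span_probability(start_logits, end_logits, 1, 2), 3))  # 0.619
```

The independence assumption is a simplification; the raw sum of logits used below is enough for picking the best span.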

We want to output the span that maximizes the sum of the start score and the end score.
- To be valid, the start index must occur at or before the end index. Formally, we want to find:

$$\arg\max_{i \le j,\; mask_i = 1,\; mask_j = 1} \big( start\_scores[i] + end\_scores[j] \big)$$
- In words: for every pair of a start position $i$ and an end position $j$ with $i \le j$, compute the start score at $i$ plus the end score at $j$, then pick the pair $(i, j)$ with the highest sum.
- Furthermore, we want to make sure that $i$ and $j$ are in the relevant parts of the input (i.e., where `input_mask` equals 1).
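
A small worked example with made-up scores shows why both constraints matter: position 4 is padding with the largest scores, and among the unpadded positions the highest start score (index 2) comes after the highest end score (index 1), so neither can simply be taken on its own. This brute-force sketch evaluates the formula directly over all valid pairs:

```python
from itertools import combinations_with_replacement

start_scores = [0.1, 1.0, 4.0, 0.2, 9.0]
end_scores = [0.3, 5.0, 0.5, 2.5, 8.0]
input_mask = [1, 1, 1, 1, 0]  # position 4 is padding

# Only positions with mask 1 are candidates
valid = [k for k, m in enumerate(input_mask) if m == 1]
# combinations_with_replacement over a sorted list yields exactly the pairs (i, j) with i <= j
best = max(combinations_with_replacement(valid, 2),
           key=lambda ij: start_scores[ij[0]] + end_scores[ij[1]])
print(best, start_scores[best[0]] + end_scores[best[1]])  # (2, 3) 6.5
```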


In [21]:
```python
def get_span_from_scores(start_scores, end_scores, input_mask, verbose=False):
    """
    Find start and end indices that maximize the sum of the start score
    and end score, subject to the constraint that start <= end
    and both positions are valid according to input_mask.

    Args:
        start_scores (array-like): scores for start positions, shape (n,)
        end_scores (array-like): scores for end positions, shape (n,)
        input_mask (list): 1 for valid positions and 0 otherwise
        verbose (bool): if True, print the best indices and scores

    Returns:
        max_start_i (int): index of the best start position
        max_end_j (int): index of the best end position
    """
    # Convert to 1-D float arrays (works for lists, NumPy arrays and eager tf.Tensors);
    # np.asarray raises ValueError if any score is non-numeric
    start_scores = np.asarray(start_scores, dtype=float).reshape(-1)
    end_scores = np.asarray(end_scores, dtype=float).reshape(-1)

    n = len(start_scores)
    max_start_i, max_end_j = -1, -1
    max_start_val, max_end_val = -np.inf, -np.inf
    max_sum = -np.inf

    # Find i and j that maximize start_scores[i] + end_scores[j]
    # so that i <= j and input_mask[i] == input_mask[j] == 1
    for i in range(n):
        if input_mask[i] != 1:
            continue
        for j in range(i, n):
            if input_mask[j] != 1:
                continue
            current_sum = start_scores[i] + end_scores[j]
            # keep the pair with the largest combined score seen so far
            if current_sum > max_sum:
                max_sum = current_sum
                max_start_i, max_end_j = i, j
                max_start_val, max_end_val = start_scores[i], end_scores[j]

    if verbose:
        print(f"max start is at index i={max_start_i} and score {max_start_val}")
        print(f"max end is at index j={max_end_j} and score {max_end_val}")
        print(f"max start + max end sum of scores is {max_sum}")
    return max_start_i, max_end_j
```
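
The double loop above checks every $(i, j)$ pair, which is quadratic in the sequence length (73,920 pairs for $n = 384$). A linear-time alternative, sketched here as a hypothetical `get_span_linear` (not part of the original notebook), scans end positions from left to right while remembering the best valid start seen so far. It finds the same maximum sum, though it may break exact ties differently.

```python
def get_span_linear(start_scores, end_scores, input_mask):
    """Hypothetical O(n) alternative to get_span_from_scores.

    For each valid end position j, the best start is the highest-scoring
    valid position i <= j, which can be tracked with a running maximum.
    """
    best_i, best_start = -1, float("-inf")
    best_pair, best_sum = (-1, -1), float("-inf")
    for j in range(len(start_scores)):
        if input_mask[j] != 1:
            continue  # padding can be neither a start nor an end
        # position j may also start the span (single-token answer)
        if start_scores[j] > best_start:
            best_i, best_start = j, start_scores[j]
        total = best_start + end_scores[j]
        if total > best_sum:
            best_pair, best_sum = (best_i, j), total
    return best_pair

print(get_span_linear([0.1, 1.0, 4.0, 0.2, 9.0],
                      [0.3, 5.0, 0.5, 2.5, 8.0],
                      [1, 1, 1, 1, 0]))  # (2, 3)
```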

### 3.1 - Construct the Answer

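The model gives us a span of tokens rather than a string, so we need some post-processing to get the final answer: join the tokens with spaces, merge WordPiece sub-tokens (which start with `##`) back into whole words, and strip leading and trailing whitespace.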

In [7]:
```python
def construct_answer(tokens):
    """
    Combine tokens into a string, merge WordPiece sub-tokens (marked with '##'),
    and remove leading/trailing whitespace.

    Args:
        tokens: a list of tokens (strings)

    Returns:
        out_string: the processed string.
    """
    # join the tokens together with whitespace
    out_string = ' '.join(tokens)

    # merge sub-tokens by removing ' ##', e.g. 'mit ##ral' -> 'mitral'
    out_string = out_string.replace(' ##', '')

    # remove leading and trailing whitespace
    out_string = out_string.strip()

    # if there is an '@' token (e.g. in an email address), remove all whitespace
    if '@' in tokens:
        out_string = out_string.replace(' ', '')

    return out_string
```
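
For a quick check without loading the model, the sketch below restates the same joining rules in a hypothetical `join_wordpieces` helper and applies them to hand-written token lists. The sub-token splits are made up for illustration; the real split depends on the vocabulary. The last example shows a limitation: BERT's tokenizer typically splits punctuation into separate tokens, and these rules do not rejoin it. Hugging Face tokenizers also provide a `convert_tokens_to_string` method that performs the `##` merging.

```python
def join_wordpieces(tokens):
    """Same rules as construct_answer: join on spaces, merge '##' pieces, strip."""
    out = " ".join(tokens).replace(" ##", "").strip()
    # mirrors the notebook's special case: drop all spaces if an '@' token is present
    if "@" in tokens:
        out = out.replace(" ", "")
    return out

print(join_wordpieces(["severe", "mit", "##ral", "regurgitation"]))  # severe mitral regurgitation
print(join_wordpieces(["jane", ".", "doe", "@", "example", ".", "com"]))  # jane.doe@example.com
print(join_wordpieces(["86", "-", "year", "-", "old"]))  # 86 - year - old
```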

## 4 - Putting It All Together

The `get_model_answer` function combines all the functions we've implemented to perform question answering end to end. It needs the pretrained question-answering model, which we load first.

In [8]:
```python
model = TFAutoModelForQuestionAnswering.from_pretrained("./models")
```

```text
loading weights file ./models/tf_model.h5
All model checkpoint layers were used whe
```

In [9]:
```python
def get_model_answer(model, question, passage, tokenizer, max_seq_length=384):
    """
    Identify answer in passage for a given question using BERT.

    Args:
        model (Model): pretrained BERT model which we'll use to answer questions
        question (string): question string
        passage (string): passage string
        tokenizer (Tokenizer): used for preprocessing of input
        max_seq_length (int): length of input for model

    Returns:
        answer (string): answer to input question according to model
    """
    # prepare input: use the function prepare_bert_input
    input_ids, input_mask, tokens = prepare_bert_input(question, passage, tokenizer, max_seq_length)

    # get scores for start of answer and end of answer from the model returned by
    # TFAutoModelForQuestionAnswering.from_pretrained("./models").
    # In transformers 4.x the model returns a ModelOutput object, so read the logits
    # by attribute: tuple-unpacking a ModelOutput yields its keys, not the tensors.
    outputs = model(input_ids)
    start_scores = outputs.start_logits
    end_scores = outputs.end_logits

    # start_scores and end_scores are tensors of shape [1, max_seq_length];
    # take index 0 to get tensors of shape [max_seq_length]
    start_scores = start_scores[0]
    end_scores = end_scores[0]

    # using scores, get the most likely answer span (inclusive indices)
    span_start, span_end = get_span_from_scores(start_scores, end_scores, input_mask)

    # slice the tokens from span_start to span_end (including span_end)
    answer_tokens = tokens[span_start:span_end + 1]

    # combine the tokens into a single string and perform post-processing
    answer = construct_answer(answer_tokens)

    return answer
```

## 5 - Experimentation!

Now let's try it on clinical notes. Below is an excerpt of a doctor's notes for a patient with an abnormal echocardiogram (the sample is taken from [MTSamples](https://www.mtsamples.com/site/pages/sample.asp?Type=6-Cardiovascular%20/%20Pulmonary&Sample=1597-Abnormal%20Echocardiogram)).

In [None]:
```python
# implicit string concatenation avoids putting the continuation lines'
# indentation spaces inside the string, as backslash continuations do
passage = (
    "Abnormal echocardiogram findings and followup. Shortness of breath, congestive heart failure, "
    "and valvular insufficiency. The patient complains of shortness of breath, which is worsening. "
    "The patient underwent an echocardiogram, which shows severe mitral regurgitation and also large "
    "pleural effusion. The patient is an 86-year-old female admitted for evaluation of abdominal pain "
    "and bloody stools. The patient has colitis and also diverticulitis, undergoing treatment. "
    "During the hospitalization, the patient complains of shortness of breath, which is worsening. "
    "The patient underwent an echocardiogram, which shows severe mitral regurgitation and also large "
    "pleural effusion. This consultation is for further evaluation in this regard. As per the patient, "
    "she is an 86-year-old female, has limited activity level. She has been having shortness of breath "
    "for many years. She also was told that she has a heart murmur, which was not followed through "
    "on a regular basis."
)

q1 = "How old is the patient?"
q2 = "Does the patient have any complaints?"
q3 = "What is the reason for this consultation?"
q4 = "What does her echocardiogram show?"
q5 = "What other symptoms does the patient have?"

questions = [q1, q2, q3, q4, q5]

for i, q in enumerate(questions):
    print("Question {}: {}".format(i + 1, q))
    print()
    print("Answer: {}".format(get_model_answer(model, q, passage, tokenizer)))
    print()
    print()
```

### Output:

Question 1: How old is the patient?

Answer: 86


Question 2: Does the patient have any complaints?

Answer: The patient complains of shortness of breath


Question 3: What is the reason for this consultation?

Answer: further evaluation


Question 4: What does her echocardiogram show?

Answer: severe mitral regurgitation and also large pleural effusion


Question 5: What other symptoms does the patient have?

Answer: colitis and also diverticulitis




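Even without fine-tuning on clinical text, the model answers most of the questions reasonably well! Of course, it isn't perfect: the answers are terse (question 2 omits that the shortness of breath is worsening), and for question 5 it returns diagnoses (colitis and diverticulitis) rather than symptoms such as abdominal pain and bloody stools. To improve performance, we would ideally collect a medical QA dataset and fine-tune the model on it.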