<a href="https://colab.research.google.com/github/MarcAtanante/ai-for-fun/blob/main/03-question-answering-model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Question-Answering Model using PyTorch
### Easily deploy a QA HuggingFace model using Docker and FastAPI
Credits to [André Ribeiro: Build a Q&A App with PyTorch](https://towardsdatascience.com/build-a-q-a-app-with-pytorch-cb599480e29) for the documentation.


There are two main types of QA models. The first encodes a large corpus of domain-specific knowledge into the model and generates an answer based on that learned knowledge. The second uses a given context and extracts the best paragraph or answer from that context.

The second approach is more easily generalisable to different domains without retraining or fine-tuning the original model. As such, in this post we will focus on this approach.

To use a context-based QA model, we first need to define our ‘context’. Here, we will use the Stanford Question Answering Dataset (SQuAD) 2.0. To download this dataset, click [here](https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json).

## Loading the dataset from my Google Drive

In [None]:
from google.colab import drive
drive.mount("/content/gdrive")

Mounted at /content/gdrive


In [23]:
import json

with open("/content/gdrive/My Drive/Colab Notebooks/train-v2.0.json", 'r') as f:
  data = json.load(f)

print("Type:", type(data))
print("Length:", len(data))
print(data['data'][0]['paragraphs'][0]['qas'][0])

Type: <class 'dict'>
Length: 2
{'question': 'When did Beyonce start becoming popular?', 'id': '56be85543aeaaa14008c9063', 'answers': [{'text': 'in the late 1990s', 'answer_start': 269}], 'is_impossible': False}


From this data, we will focus on the question and answers fields of the article whose title is 'Premier_League'. This gives us exact answers to a fixed set of questions. If you instead want to extract an answer from a context paragraph, look at the context field.
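
To see how these fields fit together, here is a minimal sketch built on a hand-made toy entry (not real dataset content) that follows the same nesting as train-v2.0.json: data, then articles with a title, then paragraphs with a context, then qas. In SQuAD, answer_start is the character offset of the answer inside context, so slicing the context recovers the answer text. In the training file, unanswerable questions (is_impossible set to True) come with an empty answers list, so they must be skipped before reading answers[0]. The answer_spans helper is hypothetical, for illustration only.

```python
# Hand-made toy entry with the same nesting as SQuAD 2.0 (not real dataset content)
toy = {
    "version": "v2.0",
    "data": [
        {
            "title": "Toy_Topic",
            "paragraphs": [
                {
                    "context": "The league season runs from August to May.",
                    "qas": [
                        {
                            "question": "When does the season start?",
                            "id": "toy-1",
                            "answers": [{"text": "August", "answer_start": 28}],
                            "is_impossible": False,
                        },
                        {
                            "question": "Who won the 1850 season?",
                            "id": "toy-2",
                            "answers": [],  # unanswerable: no answer span in the context
                            "is_impossible": True,
                        },
                    ],
                }
            ],
        }
    ],
}


def answer_spans(dataset):
    """Hypothetical helper: return (question, answer) pairs by slicing each context."""
    pairs = []
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                if qa["is_impossible"]:
                    continue  # qa["answers"][0] would raise IndexError here
                ans = qa["answers"][0]
                start = ans["answer_start"]  # character offset into the context
                pairs.append((qa["question"], context[start:start + len(ans["text"])]))
    return pairs


print(answer_spans(toy))  # → [('When does the season start?', 'August')]
```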

To obtain the questions and answers, define and run the following function, get_qa. It should return 357 question-answer pairs.

In [None]:
# Get the answerable questions (and their first answer) for a given article title
def get_qa(topic, data):
    q = []
    a = []
    for d in data['data']:
        if d['title'] == topic:
            for paragraph in d['paragraphs']:
                for qa in paragraph['qas']:
                    # Skip SQuAD 2.0 questions that have no answer
                    if not qa['is_impossible']:
                        q.append(qa['question'])
                        a.append(qa['answers'][0]['text'])
    # Return after scanning all articles (empty lists if the topic is not found)
    return q, a

questions, answers = get_qa(topic = 'Premier_League', data = data)

print("Number of available questions: {}".format(len(questions)))

Number of available questions: 357


## Build the QA Embedding model
In simple terms, our model will work by comparing a new question from our user with the set of questions in our context, and then returning the answer that belongs to the most similar one.

Since we cannot compare the questions in their raw format (text), we need to transform both the context questions and the unknown questions from the user into a common vector space before performing any similarity evaluation.

To do this, we will define a new text embedding class that will be used to convert the context, and the unknown questions from the user, from text to a numeric vector.
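
The lookup itself does not depend on which embedding is used: embed every context question, embed the new question, pick the context question with the highest similarity, and return its answer. The sketch below shows that flow with a deliberately crude, hypothetical stand-in for the embedding model (word counts over a tiny fixed vocabulary) and cosine similarity; the context answers are placeholders. A bag-of-words like this only matches identical words, so "events" contributes nothing in the example, whereas a pretrained sentence-embedding model is meant to place paraphrases close together even when the wording differs.

```python
import math

# Hypothetical stand-in for a learned embedding model: word counts over a tiny fixed vocabulary
VOCAB = ["how", "many", "matches", "team", "days", "games", "played", "most"]


def toy_embed(text):
    """Map a question to a fixed-length vector (one count per vocabulary word)."""
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


context_questions = ["How many matches does each team play?", "What days are most games played?"]
context_answers = ["answer A", "answer B"]  # placeholder answers


def lookup(new_question):
    """Embed the new question, find the most similar context question, return it and its answer."""
    new_vec = toy_embed(new_question)
    scores = [cosine(new_vec, toy_embed(q)) for q in context_questions]
    best = max(range(len(scores)), key=scores.__getitem__)
    return context_questions[best], context_answers[best]


print(lookup("Which days have the most events played at?"))
# → ('What days are most games played?', 'answer B')
```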

### 1. Download a pretrained embedding model

In [None]:
%%capture
!pip install datasets transformers
!pip install transformers[torch]
!pip install huggingface_hub

In [None]:
from huggingface_hub import notebook_login

notebook_login()

Token is valid.
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
!apt install git-lfs

Reading package lists... Done
Building dependency tree       
Reading state information... Done
git-lfs is already the newest version (2.9.2-1).
0 upgraded, 0 newly installed, 0 to remove and 23 not upgraded.


In [None]:
import transformers
print(transformers.__version__)


from transformers import AutoTokenizer
from transformers import AutoModel
import torch

4.27.3


In [None]:
def get_model(model_name):
    model = AutoModel.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer
  
model, tokenizer = get_model(model_name = 'sentence-transformers/paraphrase-MiniLM-L6-v2')


### 2. Test the embedding model locally
Let’s now run our embedding model over a sample of the context questions, using the cell below.

In [None]:
# Mean Pooling - Take attention mask into account for correct averaging
# source: https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]
    
    input_mask_expanded = (
      attention_mask
      .unsqueeze(-1)
      .expand(token_embeddings.size())
      .float()
    )
    
    pool_emb = (
      torch.sum(token_embeddings * input_mask_expanded, 1) 
      / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    )
    
    return pool_emb

def get_embeddings(questions, tokenizer, model):
  # Tokenize sentences
  encoded_input = tokenizer(questions, padding=True, truncation=True, return_tensors='pt')

  # Compute token embeddings
  with torch.no_grad():
      model_output = model(**encoded_input)

  # Average pooling
  embeddings = mean_pooling(model_output, encoded_input['attention_mask']) 
  
  return embeddings

embeddings = get_embeddings(questions[:3], tokenizer, model)
print("Embeddings shape: {}".format(embeddings.shape))

Embeddings shape: torch.Size([3, 384])
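
The mean_pooling function averages the token vectors in model_output[0] (the last hidden state, here with 384 values per token) but weights each token by its attention-mask value, so the padding tokens added by padding=True do not dilute the sentence embedding. A plain-Python sketch of the same computation for a single sentence, with made-up numbers in place of real model outputs:

```python
def masked_mean(token_embeddings, attention_mask):
    """Average only the tokens whose mask value is 1.

    token_embeddings: list of per-token vectors (seq_len x dim)
    attention_mask:   list of 0/1 values (seq_len)
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + x for t, x in zip(total, vec)]
            count += 1
    count = max(count, 1e-9)  # mirrors torch.clamp(..., min=1e-9) to avoid dividing by zero
    return [t / count for t in total]


# Two real tokens followed by one padding token (values are made up)
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(masked_mean(tokens, mask))  # → [2.0, 3.0]
```

Without the mask, the padding vector would pull the average towards [34.67, 35.33], and the same sentence would get a different embedding depending on how long the other sentences in its batch are.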


### 3. Test the similarity of the context to a new question
Let’s start by checking our previous sample questions:

In [None]:
questions[:3]

['How many club members are there?',
 'How many matches does each team play?',
 'What days are most games played?']

Let's paraphrase the last one as: 'Which days have the most events played at?'

Finally, let’s embed our new question and calculate the squared Euclidean distance between new_embedding and each row of embeddings.

In [None]:
new_question = 'Which days have the most events played at?'
new_embedding = get_embeddings([new_question], tokenizer, model)

# squared Euclidean distance between sample questions and new_question
((embeddings - new_embedding)**2).sum(axis=1)

tensor([71.4030, 59.8726, 23.9430])

The tensor values show that the last question in our sample is the closest to our new question (smallest squared distance, 23.9430).
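
Squaring the distance does not change which candidate is closest, since distances are non-negative and squaring is monotonic on them. Note that the distances above are computed on unnormalized embeddings. For unit-length vectors, the squared Euclidean distance and the cosine similarity are tied by ||a - b||² = 2 - 2·cos(a, b), so after L2 normalization "smallest distance" and "largest cosine similarity" pick the same neighbour; the QASearcher class later in this notebook normalizes its embeddings and ranks by cosine similarity. A small check with made-up 2-D vectors:

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance, as in ((embeddings - new_embedding)**2).sum(axis=1)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def normalize(u):
    """Scale a vector to unit L2 length."""
    n = math.sqrt(sum(a * a for a in u))
    return [a / n for a in u]


def cos_sim(u, v):
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


# Made-up 2-D vectors standing in for embeddings
new = [1.0, 2.0]
candidates = [[4.0, 1.0], [2.0, 3.0], [-1.0, 0.5]]

# For unit vectors, ||a - b||^2 == 2 - 2 * cos(a, b)
for c in candidates:
    lhs = sq_dist(normalize(new), normalize(c))
    assert math.isclose(lhs, 2 - 2 * cos_sim(new, c), abs_tol=1e-9)

nearest_by_dist = min(
    range(len(candidates)),
    key=lambda i: sq_dist(normalize(new), normalize(candidates[i])),
)
nearest_by_cos = max(range(len(candidates)), key=lambda i: cos_sim(new, candidates[i]))
print(nearest_by_dist, nearest_by_cos)  # → 1 1
```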

## Deploy the model using Docker and FastAPI
To make this usable in a production setting, we need to:

- wrap the previous functions in one or more easy-to-use classes;
- define an app and call the required class methods through HTTP;
- wrap the whole app and its dependencies in a container for easy scalability.

### 1. Define the QA Search Model
Let’s wrap the previously introduced concepts into two new classes: QAEmbedder and QASearcher.

The QAEmbedder will define how to load the model (get_model) from the Hugging Face Hub or a local directory, and return a set of embeddings given a set of questions (get_embeddings). Note that, for efficiency, get_embeddings embeds the questions one batch at a time.

In [None]:
class QAEmbedder:
  def __init__(self, model_name = "sentence-transformers/paraphrase-MiniLM-L6-v2"):
    """
    Defines a QA embedding model. That is, given a set of questions,
    this class returns the corresponding embedding vectors.
    
    Args:
      model_name (`str`): Hugging Face model ID or directory containing
        the necessary tokenizer and model files.
    """
    self.model = None
    self.tokenizer = None
    self.model_name = model_name
    self.set_model(model_name)
  
  
  def get_model(self, model_name):
    """
    Loads a general tokenizer and model using the transformers
    'AutoTokenizer' and 'AutoModel' classes.
    
    Args:
      model_name (`str`): Hugging Face model ID or directory containing
        the necessary tokenizer and model files.
    """
    model = AutoModel.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer
  
  
  def set_model(self, model_name):
    """
    Sets a general tokenizer and model using the 'self.get_model'
    method.
    
    Args:
      model_name (`str`): Hugging Face model ID or directory containing
        the necessary tokenizer and model files.
    """
    self.model_name = model_name
    self.model, self.tokenizer = self.get_model(model_name)
  
  
  def _mean_pooling(self, model_output, attention_mask):
    """
    Internal method that averages the token embeddings in a model output,
    ignoring padding tokens as indicated by the attention mask.
    
    Args:
      model_output: output from the transformer model; model_output[0] holds the token embeddings
      attention_mask (`torch.Tensor`): attention mask defined in the QA tokenizer
      
    Returns:
      The averaged tensor.
    """
    token_embeddings = model_output[0]
    
    input_mask_expanded = (
      attention_mask
      .unsqueeze(-1)
      .expand(token_embeddings.size())
      .float()
    )
    
    pool_emb = (
      torch.sum(token_embeddings * input_mask_expanded, 1) 
      / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    )
    
    return pool_emb
  
  
  def get_embeddings(self, questions, batch=32):
    """
    Gets the corresponding embeddings for a set of input 'questions'.
    
    Args:
      questions (`list` of `str`): List of strings defining the questions to be embedded
      batch (`int`): Performs the embedding job 'batch' questions at a time
      
    Returns:
      The embedding vectors.
    """
    question_embeddings = []
    for i in range(0, len(questions), batch):
    
        # Tokenize sentences
        encoded_input = self.tokenizer(questions[i:i+batch], padding=True, truncation=True, return_tensors='pt')

        # Compute token embeddings
        with torch.no_grad():
            model_output = self.model(**encoded_input)

        # Perform mean pooling
        batch_embeddings = self._mean_pooling(model_output, encoded_input['attention_mask'])
        question_embeddings.append(batch_embeddings)
    
    question_embeddings = torch.cat(question_embeddings, dim=0)
    return question_embeddings
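
The loop in get_embeddings walks over the questions in slices of at most batch items, which bounds how much the model processes in one forward pass; with padding=True, each slice is also padded only to the longest question in that slice. The sketch below isolates the slicing logic in plain Python (iter_batches is a hypothetical name) and checks that concatenating the slices in order restores the original sequence, which is why row i of the result of torch.cat(..., dim=0) lines up with questions[i].

```python
def iter_batches(items, batch=32):
    """Yield consecutive slices of at most `batch` items, mirroring the loop in get_embeddings."""
    for i in range(0, len(items), batch):
        yield items[i:i + batch]


sample = [f"question {i}" for i in range(70)]  # 70 placeholder questions
chunks = list(iter_batches(sample, batch=32))
print([len(c) for c in chunks])  # → [32, 32, 6]

# Concatenating the chunks in order restores the original sequence
flattened = [q for c in chunks for q in c]
assert flattened == sample
```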

The QASearcher will set the context of questions and corresponding answers (set_context_qa) and, for each new question from the user, return the answer associated with the most similar question in the context (get_answers).

In [None]:
class QASearcher:
  def __init__(self, model_name = "sentence-transformers/paraphrase-MiniLM-L6-v2"):
    """
    Defines a QA Search model. That is, given a new question, it searches
    for the most similar question in a 'context' set and returns both the best
    question and its associated answer.
    
    Args:
      model_name (`str`): Hugging Face model ID or directory containing
        the necessary tokenizer and model files.
    """
    self.answers = None
    self.questions = None
    self.question_embeddings = None
    self.embedder = QAEmbedder(model_name=model_name)
  
  
  def set_context_qa(self, questions, answers):
    """
    Sets the QA context to be used during search.
    
    Args:
      questions (`list` of `str`):  List of strings defining the questions to be embedded
      answers (`list` of `str`): Best answer for each question in 'questions'
    """
    self.answers = answers
    self.questions = questions
    self.question_embeddings = self.get_q_embeddings(questions)
  
  
  def get_q_embeddings(self, questions):
    """
    Gets the embeddings for the questions in 'context'.
    
    Args:
      questions (`list` of `str`):  List of strings defining the questions to be embedded
    
    Returns:
      The embedding vectors.
    """
    question_embeddings = self.embedder.get_embeddings(questions)
    question_embeddings  = torch.nn.functional.normalize(question_embeddings, p=2, dim=1)
    return question_embeddings.transpose(0,1)
  
  
  def cosine_similarity(self, questions, batch=32):
    """
    Gets the cosine similarity between the new questions and the 'context' questions.
    
    Args:
      questions (`list` of `str`):  List of strings defining the questions to be embedded
      batch (`int`): Performs the embedding job 'batch' questions at a time
    
    Returns:
      The cosine similarity
    """
    question_embeddings = self.embedder.get_embeddings(questions, batch=batch)
    question_embeddings = torch.nn.functional.normalize(question_embeddings, p=2, dim=1)
    
    cosine_sim = torch.mm(question_embeddings, self.question_embeddings)
    
    return cosine_sim
  
  
  def get_answers(self, questions, batch=32):
    """
    Gets the best answers in the stored 'context' for the given new 'questions'.
    
    Args:
      questions (`list` of `str`):  List of strings defining the questions to be embedded
      batch (`int`): Performs the embedding job 'batch' questions at a time
    
    Returns:
      A `list` of `dict`'s containing the original question ('orig_q'), the most similar
      question in the context ('best_q') and the associated answer ('best_a').
    """
    similarity = self.cosine_similarity(questions, batch=batch)
    
    response = []
    for i in range(similarity.shape[0]):
      best_ix = similarity[i].argmax().item()
      best_q = self.questions[best_ix]
      best_a = self.answers[best_ix]
      
      response.append(
        {
          'orig_q':questions[i],
          'best_q':best_q,
          'best_a':best_a,
        }
      )
    
    return response
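
Because get_q_embeddings stores the context embeddings already L2-normalized and transposed, cosine_similarity needs only one matrix product per request: each row of the result holds the cosine similarities between one new question and every context question, and get_answers takes the argmax of each row. The sketch below reproduces those shapes with plain Python lists and made-up 2-D vectors instead of torch tensors; the helper names are illustrative only.

```python
import math


def l2_normalize(rows):
    """Scale every row to unit length (the role of torch.nn.functional.normalize(p=2, dim=1))."""
    out = []
    for r in rows:
        n = math.sqrt(sum(x * x for x in r)) or 1e-12  # avoid division by zero
        out.append([x / n for x in r])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """(n x d) times (d x m) -> (n x m), the role of torch.mm."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Made-up 2-D "embeddings": 3 context questions and 2 new questions
context_t = transpose(l2_normalize([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]))  # d x m, as in get_q_embeddings
new_q = l2_normalize([[0.0, 5.0], [3.0, 0.1]])                             # n x d

sim = matmul(new_q, context_t)  # n x m matrix of cosine similarities
best = [max(range(len(row)), key=row.__getitem__) for row in sim]  # argmax per new question
print(best)  # → [1, 0]
```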

### 2. Define the FastAPI app
Our app should contain 2 POST endpoints, one to set the context (set_context) and one to get the answer to a given unseen question (get_answer).

The set_context endpoint will receive a dictionary containing 2 fields (questions and answers) and update the QASearcher.

The get_answer endpoint will receive a dictionary with 1 field (questions) and return a list with one dictionary per question, each containing the original question (orig_q), the most similar question in the context (best_q) and the associated answer (best_a).
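
For reference, here is a sketch of the JSON bodies a client would send to the two endpoints, built with the standard library's urllib.request. The base URL http://localhost:8000 is an assumption (8000 is uvicorn's default port, but the actual address depends on how the app is launched), build_post is a hypothetical helper, and the answer text is a placeholder. Both endpoints call await data.json(), so the bodies are sent as JSON.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: uvicorn's default port


def build_post(path, payload):
    """Hypothetical helper: build (but do not send) a JSON POST request for an endpoint."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


set_ctx = build_post("/set_context", {
    "questions": ["What days are most games played?"],
    "answers": ["placeholder answer"],  # hypothetical answer text
})
ask = build_post("/get_answer", {"questions": ["Which days have the most events played at?"]})

# Against a running server, the requests could be sent like this:
# with urllib.request.urlopen(ask) as resp:
#     print(json.loads(resp.read()))  # list of {'orig_q', 'best_q', 'best_a'} dicts
```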

In [None]:
%%capture
!pip install uvicorn
!pip install fastapi

In [None]:
import uvicorn
from fastapi import FastAPI, Request

qa_search = QASearcher()
app = FastAPI()

@app.post("/set_context")
async def set_context(data:Request):
  """
  Fastapi POST method that sets the QA context for search.
  
  Args:
    data (`Request`): request whose JSON body has two fields, 'questions'
      (`list` of `str`) and 'answers' (`list` of `str`)
  """
  data = await data.json()
  
  qa_search.set_context_qa(
    data['questions'], 
    data['answers']
  )
  return {"message": "Search context set"}


@app.post("/get_answer")
async def get_answer(data:Request):
  """
  Fastapi POST method that gets the best question and answer 
  in the set context.
  
  Args:
    data (`Request`): request whose JSON body has one field, 'questions' (`list` of `str`)
  
  Returns:
    A `list` of `dict`s, one per question, each containing the original question ('orig_q'),
    the most similar question in the context ('best_q') and the associated answer ('best_a').
  """
  data = await data.json()
  
  response = qa_search.get_answers(data['questions'], batch=1)
  return response

### 3. Build the Docker container
The last step is to wrap our app in a Docker container so that it is easier to distribute and scale.

### I was not able to continue because my device has issues with "Developer Mode", so I cannot run the shell commands needed to complete this step.