# Fire Clauses

This notebook demonstrates the QA pipeline using the fire clauses from `fire-clauses.json`.
It shows how to:
1. Create embeddings using `sentence-transformers` and save them to a numpy file
2. Perform a manual test query (rather than using Qdrant) to simulate the semantic search process
3. Extract answers from the closest matches using a BERT-style question-answering model from `transformers`

## Creating the embeddings

In [1]:
from sentence_transformers import SentenceTransformer
import numpy as np
import pandas as pd
import torch
import os

In [None]:
DATA_DIR = os.path.join(os.getcwd(), '..', 'data')
DATA_DIR

In [None]:
# Read the fire clauses into a dataframe
path = os.path.join(DATA_DIR, 'fire-clauses.json')
df = pd.read_json(path)

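# Because each clause identifier is unique, we can use it as the index
df.set_index('clause', inplace=True)

# Handle limitOnApplication NaNs by replacing them with an empty string.
# Assigning the result back avoids the chained in-place fillna, which newer pandas versions warn about.
df['limitOnApplication'] = df['limitOnApplication'].fillna('')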

# First 10 clauses
df.head(10)

In [None]:
# Load the sentence transformer model - have a look at the comparisons here:
# https://www.sbert.net/docs/pretrained_models.html#sentence-embedding-models/
# multi-qa-MiniLM-L6-cos-v1 is trained for QA and is smaller with very minor loss in performance

model = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')

In [None]:
# Encode the clause contents to create sentence vector embeddings (combine `content` and `limitOnApplication`)

sentences = (df['content'] + ' ' + df['limitOnApplication']).tolist()
vectors = model.encode(sentences, show_progress_bar=True)

# Expect an m x n matrix where m is the number of clauses and n is the embedding dimension of the model
vectors.shape

In [None]:
# Save the vectors to a numpy file for the script to load and insert into Qdrant

save_path = os.path.join(DATA_DIR, 'fire-clauses.npy')
np.save(save_path, vectors, allow_pickle=False)
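As a sanity check on the save step above, the following is a minimal sketch of a save/load round trip. It uses a random stand-in matrix and a temporary file rather than the real `fire-clauses.npy`; the names `demo_vectors` and `demo-clauses.npy` and the shape `(5, 384)` are assumptions made for illustration only.

```python
import os
import tempfile

import numpy as np

# Stand-in for the real embeddings: 5 "clauses" with 384 dimensions each
# (both numbers are assumed here purely for illustration).
rng = np.random.default_rng(seed=0)
demo_vectors = rng.normal(size=(5, 384)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo-clauses.npy')
    np.save(demo_path, demo_vectors, allow_pickle=False)

    # allow_pickle=False on load refuses pickled object arrays as well
    loaded = np.load(demo_path, allow_pickle=False)

# The loaded array should match the original in shape, dtype and values
roundtrip_ok = (
    loaded.shape == demo_vectors.shape
    and loaded.dtype == demo_vectors.dtype
    and np.array_equal(loaded, demo_vectors)
)
print(loaded.shape, loaded.dtype, roundtrip_ok)
```

A similar check against the real file (same row count as `df`) could catch a mismatch between the clauses and their vectors before anything is inserted into Qdrant.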

## Manual Test Query and Semantic Search

To check that the vectors were created as expected, we manually search for a clause and find its closest matches. We don't use Qdrant here yet.

In [None]:
from sentence_transformers import util

In [None]:
# Target clause C3.8
question = 'How high must the smoke be above the floor when firefighters put out a fire with water?'
context = df[df.index == 'C3.8']['content'].values[0]

print(f'Question: {question}')
print()
print(f'Expected context: {context}')

In [None]:
# Encode the question
question_vector = model.encode(question)
question_vector.shape

In [None]:
# Look for the top 3 closest matches using cosine similarity.
# cos_sim returns a torch tensor of shape (1, m); we take the first row to get one score per clause.
# torch.topk returns the 3 highest scores in descending order along with their positions.
# We then use those positions to select the matching clauses from the data frame.

scores = util.cos_sim(np.array([question_vector]), vectors)[0]
top_score_ids = torch.topk(scores, k=3).indices.tolist()
top_scored_rows = df.iloc[top_score_ids][['content']]
top_scored_rows

As we can see, clause C3.8 was the top match. In the actual application, these embeddings need to be persisted in a vector database, for which we use Qdrant. Qdrant provides a client library that performs the semantic search and offers a nicer developer experience.
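The cosine-similarity ranking used in the manual search above can also be sketched in plain NumPy, which may help clarify what the score represents. The helper `top_k_cosine` and the toy vectors below are hypothetical and are not part of the notebook or of `sentence-transformers`.

```python
import numpy as np


def top_k_cosine(query, matrix, k=3):
    """Return (indices, scores) of the k rows of `matrix` most similar to `query`."""
    # Normalise to unit length so that the dot product equals the cosine similarity
    query_unit = query / np.linalg.norm(query)
    matrix_unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = matrix_unit @ query_unit

    # argsort is ascending, so reverse it and keep the first k positions
    top_ids = np.argsort(scores)[::-1][:k]
    return top_ids, scores[top_ids]


# Toy 2-dimensional "embeddings" (illustrative values only)
toy_vectors = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [-1.0, 0.0],
])
toy_query = np.array([2.0, 0.1])

top_ids, top_scores = top_k_cosine(toy_query, toy_vectors, k=2)
print(top_ids, top_scores)
```

Because the vectors are normalised first, only their direction matters: the query here is closest to the first row even though the two differ in length.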

## Answer Extraction with BERT

From our top matches, we will attempt to extract the answer from the context using a BERT-style question-answering model.

In [None]:
# Load the model and tokenizer
from transformers import AutoTokenizer
from transformers import AutoModelForQuestionAnswering

model_name = 'deepset/tinyroberta-squad2'

model = AutoModelForQuestionAnswering.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

### Method 1: Using the `pipeline` from `transformers` for automatic inference

This is the more beginner-friendly approach. The `pipeline` provides a general abstraction that lets you use any of the pre-trained models from `transformers` for a supported inference task. We also show different ways to obtain answers from a varying number of contexts, which could be used for experiments.

In [None]:
from transformers import pipeline

qa_pipeline = pipeline('question-answering', model=model, tokenizer=tokenizer)

In [None]:
# Result from top single context
qa_pipeline(question=question, context=top_scored_rows['content'].iloc[0])

In [None]:
# Concatenate all the top contexts (separated by a space) and get the answer from the concatenated context
qa_pipeline(question=question, context=' '.join(top_scored_rows['content'].tolist()))

In [None]:
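# Get an individual answer from each context, then sort the answers by their probability scores (highest first)
answers = qa_pipeline(question=[question] * len(top_scored_rows), context=top_scored_rows['content'].tolist())
sorted(answers, key=lambda answer: answer['score'], reverse=True)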

### Method 2: Applying the model directly

This approach gives more control over the inference process. It lets you inspect and analyse the model's (intermediate) outputs and integrate the model into custom workflows.

In [None]:
# Encode using the tokenizer
inputs = tokenizer(question, top_scored_rows['content'].iloc[0], return_tensors='pt')

In [None]:
# Use no_grad() to disable gradient calculation: we are only performing inference and don't need backpropagation
with torch.no_grad():
    outputs = model(**inputs)

# Get the tokens of the predicted answer
start_idx, end_idx = outputs.start_logits.argmax(), outputs.end_logits.argmax()
predicted_answer_tokens = inputs['input_ids'][0, start_idx:end_idx + 1]
predicted_answer_tokens

In [None]:
# Decode the tokens back to words to obtain the answer (trim string)
tokenizer.decode(predicted_answer_tokens, skip_special_tokens=True).strip()
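Taking the two `argmax` values independently, as in Method 2 above, can occasionally produce an end position that comes before the start position, which yields an empty answer. One possible refinement is to score every valid span jointly. The sketch below assumes 1-D arrays of start and end logits; `best_span`, `max_answer_len`, and the toy logits are hypothetical names and values, not taken from the notebook.

```python
import numpy as np


def best_span(start_logits, end_logits, max_answer_len=15):
    """Return (start, end) maximising start_logit + end_logit with start <= end."""
    best, best_score = (0, 0), -np.inf
    for start in range(len(start_logits)):
        # Only consider end positions at or after start, within the length limit
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_logits[start] + end_logits[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


# Toy logits where the independent argmax gives an invalid span
toy_start = np.array([0.1, 0.2, 3.0, 0.3, 0.1])
toy_end = np.array([0.1, 2.5, 0.2, 2.0, 0.1])

naive = (int(toy_start.argmax()), int(toy_end.argmax()))
span = best_span(toy_start, toy_end)
print(naive, span)
```

With real model outputs, the inputs would be something like `outputs.start_logits[0].numpy()` and `outputs.end_logits[0].numpy()`. A fuller version would also restrict the search to context tokens so that the question itself cannot be returned as the answer.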