## **Reranking based RAG** 

_**What are your expectations for Fusion Retrieval with the provided manual?**_<br>
I do not expect the responses to be far superior than the vanilla approach due to the main reason being that these LLMs still require context and in-depth domain knowledge to truly understand how to rank the final set of chunks or documents appropriately. However, depending on how good the LLM is (in terms of size and amount of data that it was trained on), the results will vary.

**_How do you plan to test and compare these techniques?_**<br><br>
<img src="./reranking_workflow.png" alt="Flowchart" width="900" /><br><br>
The approach is pretty straight forward here and the main area where the changes take place compared to vanilla RAG is at the last stage where an LLM is prompted to retrieve the top-L indexes from the final retrieved set of chunks. Teh workflow is as follows:<br>
1. The document data is extracted, specifically text and tabular data. 
2. These data are stored in a way where the sequence is maintained, that way there will be more context for a certain text that may have a table before or after it. 
3. This set is chunked and converted to BERT based vector embeddings. BERT was chosen because from what I know, they can represent these chunks in a very context aware fashion, thanks to the utilization os the encoder of a transformers. 
4. Now the query is also converted to BERT based embeddings and using cosine similarity, the top-K chunks are retrieved.
5. Now these chunks, along with the query and chunk IDs is passed into an LLM for further ranking. 
6. The LLM returns a list of IDs of the top-L relevant chunks (L < K). 
7. The chunks are retrieved from the original set of chunks using these L indexes.
8. This set of chunks, which acts as context, along with the query, is passed into a new LLM for a refined information retrieval.

**It must be noted that K > L. In this implementation, they are set as 10 and 8 respectively.**

_**Comparison Strategy**_<br>
There are possibly two main ways in which we can compare this approach with the Reranking apprach. One is by assessing the top-L retrieved chunks and the other is obviously by assessing the final response from the LLM. 

_**Note**: Considering images is important in order to create a robust RAG system. Due to technical/financial constraints, images are omitted for this implementation. However, in the absence of such constraints, what I would have done is have the LLM read the image and prompt it to generate a description. This description will be added into the resulting array while also maintaining the sequence. **One obvious question in that case will be whether or not the LLM knows about the content in the image, provided that it is very domain specific and unfamiliar to the LLM**. One way I thought of on mitigating this issue is by providing some set of surrounding context of the image to the LLM along with the image itself for it to draw better conclusions. These contexts can be the nearest 2 or 3 elements (text, table or another image) surrounding the image in hand. Let this value be J. So, if J is 3, we feed 3 elements before the image and 3 elements after the image as context for the LLM to generate proper a description of the image in hand. This might not be the most efficient solution, but there can be scenarios where this will work._

---

#### Import libraries

In [1]:
import fitz # for text extraction
import camelot # for table extraction
from pathlib import Path
from sentence_transformers import SentenceTransformer, util # for semantic vector embedding creation 
import numpy as np
from groq import Groq
import os
import time
import torch
import re
import ollama
import json

#### 1. Function to extract texts & tables from PDF
The goal is to preserve the sequence, that way there will be more context for a certain text that may have a table before or after it.

In [None]:
def extract_text_and_tables(pdf_path):

    pdf_file = Path(pdf_path)
    if not pdf_file.is_file() or pdf_file.suffix.lower() != ".pdf":
        raise FileNotFoundError("Provided file path is not a valid PDF.")

    doc = fitz.open(str(pdf_file))
    result = []

    # text extraction
    for page_num, page in enumerate(doc, start = 1):
        page_blocks = []

        blocks = page.get_text("dict")["blocks"]
        for block in blocks:
            if block["type"] == 0: # type 0 is text
                text_content = " ".join(
                    span["text"] for line in block["lines"] for span in line["spans"]
                ).strip()
                if text_content:
                    y = block["bbox"][1]
                    page_blocks.append({
                        "type": "TEXT DATA",
                        "y": y,
                        "content": text_content
                    })

        # table extraction
        try:
            tables = camelot.read_pdf(str(pdf_file), pages = str(page_num), flavor = 'lattice') # lattice flavor to extract tables
        except Exception as e:
            print(f"Failed to read tables on page {page_num}: {e}")
            tables = []

        for table in tables:
            table_data = table.data
            bbox = table._bbox
            y = float(bbox[1])
            page_blocks.append({
                "type": "TABLE DATA",
                "y": y,
                "content": table_data
            })

        page_blocks.sort(key = lambda b: b["y"]) # sort contents on current page
        result.extend(page_blocks) # append content to result list

    return result

In [None]:
# extract texts and tables from the maual
pre_result = extract_text_and_tables("manual.pdf")
pre_result[100:105] # few elements from the extracted data

In [2]:
# load manual
with open("manual.json", "r") as file:
    pre_result = json.load(file)

In [3]:
# removing 'fervi.com' background text
result = []
for res in pre_result:
    if res['content'] != 'fervi.com':
        result.append(res)

In [4]:
# sample table data
result[1173]

{'type': 'TABLE DATA',
 'y': 54.94955827871188,
 'content': [['Part No.', 'Description', 'i', 'Description'],
  ['T999/F001', 'Body\nv', '', 'Micrometer'],
  ['T999/F002', 'Flange', '', 'Lock'],
  ['T999/F003', '', 'T999/F026', 'Switch'],
  ['T999/F004', 'r', 'T999/F029', 'Knob'],
  ['T999/F005', '', 'T999/F030', 'Knob'],
  ['T999/F007', 'e', 'T999/F031', 'Allen key'],
  ['T999/F008', '', 'T999/F032', 'Allen key'],
  ['T999/F009\nf', '', 'T999/F033', 'Screw'],
  ['T999/F010', '', 'T999/F034', 'Screw'],
  ['T999/F011', '', 'T999/F035', 'Screw'],
  ['T999/F012', '', 'T999/F036', 'Screw'],
  ['T999/F013', 'Pin', 'T999/F037', 'Nut'],
  ['T999/F014', 'Screw', 'T999/F038', 'Nut'],
  ['T999/F015', 'Sleeve coupling', 'T999/F039', 'Key'],
  ['T999/F016', 'Tie rod', 'T999/F040', 'Washer'],
  ['T999/F019', 'Pin', 'T999/F041', 'Plug'],
  ['T999/F020', 'Lever', 'T999/F041', 'Bearing'],
  ['T999/F021', 'Nut', 'T999/F042', 'Oiler']]}

In [5]:
# list formatting by adding labels for text and table
final = []
for r in result:
    s = f"{r['type']}: {r['content']}"
    final.append(s)

In [6]:
# table data sample after flattening
final[1173]

"TABLE DATA: [['Part No.', 'Description', 'i', 'Description'], ['T999/F001', 'Body\\nv', '', 'Micrometer'], ['T999/F002', 'Flange', '', 'Lock'], ['T999/F003', '', 'T999/F026', 'Switch'], ['T999/F004', 'r', 'T999/F029', 'Knob'], ['T999/F005', '', 'T999/F030', 'Knob'], ['T999/F007', 'e', 'T999/F031', 'Allen key'], ['T999/F008', '', 'T999/F032', 'Allen key'], ['T999/F009\\nf', '', 'T999/F033', 'Screw'], ['T999/F010', '', 'T999/F034', 'Screw'], ['T999/F011', '', 'T999/F035', 'Screw'], ['T999/F012', '', 'T999/F036', 'Screw'], ['T999/F013', 'Pin', 'T999/F037', 'Nut'], ['T999/F014', 'Screw', 'T999/F038', 'Nut'], ['T999/F015', 'Sleeve coupling', 'T999/F039', 'Key'], ['T999/F016', 'Tie rod', 'T999/F040', 'Washer'], ['T999/F019', 'Pin', 'T999/F041', 'Plug'], ['T999/F020', 'Lever', 'T999/F041', 'Bearing'], ['T999/F021', 'Nut', 'T999/F042', 'Oiler']]"

It can be seen that the flattened version somewhat preserves the structure of the actual table by keeping each row inside a list. The LLM can hopefully understand this due to the presence of the label 'TABLE DATA' at the start.

### 2. Chunking

In [7]:
chunks = [" ".join(final[i:i + 10]) for i in range(0, len(final), 10)] # be careful here
print(f"Number of chunks: {len(chunks)}\n")

chunks[15] # sample

Number of chunks: 128



'TEXT DATA: Spindle hole diameter (mm) 38 TEXT DATA: Maximum swing over the bed (mm) 320 TEXT DATA: Maximum swing over the cross slide (mm) 198 TEXT DATA: Turning diameter over cavity (mm) 470 TEXT DATA: Spindle diameter (3 + 3 self centring) (mm) 160 TEXT DATA: Spindle connector Camlock D1-4 TEXT DATA: No. of spindle speeds 8 TEXT DATA: Spindle speed (r/min) 70 - 2000 RPM TEXT DATA: No. of metric threads 32 TEXT DATA: Range of metric threads (mm) 0.44- 10'

### 3. Creating BERT based vector embeddings

In [8]:
model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
sem_embs = model.encode(chunks, convert_to_tensor = True)

### 4. Pipeline to return indeces of top-K chunks that match with the query

In [25]:
def bert_query_pipeline(query, top_k = 10):
     
    device = sem_embs.device
    query_embedding = model.encode(query, convert_to_tensor = True)
    cosine_scores = util.cos_sim(query_embedding, sem_embs)[0] # cosine similarity
    top_indexes = np.argsort(cosine_scores.cpu().numpy())[::-1][:top_k]

    return top_indexes

In [26]:
# query = "Summarize the manual." # 1
# query = "What are some general safety rules when using machine equipment?" # 2
# query = "What does the manual say about unplugging the power cord of the machine from the power outlet?"" # 3
# query = "What are the several manual controls on the tool holder carriage?" # 4
query = "Tell me about the lever for selection of longitudinal feeds." # 5
# query = "What does the document talk about regarding digital displays?" # 6
# query = "What controls does the electric panel have?" # 7
# query = "How to achieve balance when lifting the Lathe?" # 8
# query = "Tell me about using the machine for turning non-ferrous materials." # 9
# query = "What should a grounding conductor be used for?" # 10

In [27]:
# get chunks
bert_top_k_idx = bert_query_pipeline(query)
final_idx = list(set(list(bert_top_k_idx)))

# get top embeddings directly from embeddings tensor
sem_embs_final = torch.stack([sem_embs[idx] for idx in final_idx])

### 5. Function to get the final set of chunks

In [28]:
def bert_final_idx(query):
     
    device = sem_embs.device
    query_embedding = model.encode(query, convert_to_tensor = True)
    cosine_scores = util.cos_sim(query_embedding, sem_embs_final)[0]
    indexes = np.argsort(cosine_scores.cpu().numpy())[::-1]

    return indexes

In [29]:
best = bert_final_idx(query)
max_idx = max(best) # used for LLM later
len(best)

10

### 6. Get final context

In [30]:
context_string = ""
for idx in best:
    context_string += f"{chunks[idx].strip()} (IDX = {idx})."

In [31]:
# remove unnecessary dots
context_string_4_mistral = re.sub(r'\.{2,}', '.', context_string)

In [32]:
context_string_4_mistral



### 7. Setting up LLM A that returns indexes of top-L chunks based on the query

In [33]:
def prompt_rank_contexts(query, context_string, max_idx, top_l = 8):
    # for i, ctx in enumerate(final_context):
    #     context_string += f"Context {i+1}:\n{ctx.strip()}\n\n"
    # context_string = ""

    return f"""
        You are a highly skilled AI assistant that ranks technical contexts from a machinery operations and maintenance manual.

        Your task is to rank the top-{top_l} most relevant contexts for answering the user’s question, based solely on the provided content.

        Each context ends with a tag in the format: **(IDX = N)**, where N is an integer between 0 and {max_idx}. Do not guess or invent index values.

        **Instructions**:
        - Carefully read all the context snippets.
        - Identify the most relevant contexts that directly support answering the question.
        - Return exactly {top_l} `idx` values, in descending order of relevance (most relevant first).
        - Only include integers in the range 0 to {max_idx}.
        - Format your answer as a comma-separated list of integers with no extra text.

        **Example**: 12, 4, 7, 1, 0, 8, 22, 10

        User Question:
        {query}

        Candidate Contexts:
        {context_string}
    """

def llm_a(query, final_context): # mistral might not be the best but no option
    prompt = prompt_rank_contexts(query, final_context, max_idx)

    try:
        response = ollama.generate(
            model = 'mistral', 
            prompt = prompt,
            stream = False
        )
        output = response['response'].strip()

        # parse the returned string for integer idx values
        ranked_indexes = [int(idx.strip()) for idx in output.split(",") if idx.strip().isdigit()]
        return ranked_indexes[:8] # L-value

    except Exception as e:
        print("Error using Ollama (Python client):", str(e))
        return []

In [34]:
# get ids of top-L chunks
top_ids = llm_a(query, context_string_4_mistral)
print(f"IDs of top-L chunks are: {top_ids}")

IDs of top-L chunks are: [7, 2, 1, 0, 5, 3, 4, 9]


### 8. Get final context

In [35]:
final_context = ''
for idx in top_ids:
    final_context += chunks[idx]

In [36]:
# remove unnecessary dots
final_context_string = re.sub(r'\.{2,}', '.', final_context)

In [37]:
final_context_string




### 9. Setting up LLM B for response generation

In [38]:
def llm_b(prompt):
    client = Groq(
        api_key = os.getenv("GROQ_API_KEY"),
    )

    chat_completion = client.chat.completions.create(
        model = "llama-3.3-70b-versatile",
        # model = "llama3-70b-8192",
        # model = "mistral-saba-24b",
        messages = [
            {
                "role": "system",
                "content": "You are an expert technical assistant specialized in interpreting operations and maintenance manuals for machinery."
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        temperature = 0.5,
        max_tokens = 5640,
        top_p = 1,
        stream = True,
    )

    for chunk in chat_completion:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end = '', flush = True)  # print to console without newline, flush immediately
            time.sleep(0.01)  # optional tiny delay for typewriter effect
    

def prompt(query, context):
    return f"""
        You are an expert technical assistant specialized in interpreting operations and maintenance manuals for machinery.

        Given the user question and the relevant extracted context from the manual:

        - Provide a clear, precise, and factual answer to the question.
        - Base your response strictly on the provided context; do not guess beyond it.
        - If the context does not contain enough information, indicate that the answer is not available in the manual or that the context is not sufficient.
        - Keep the answer professional, concise, and focused on practical instructions.
        - Each section of the context begins with a tag: either 'TEXT DATA' or 'TABLE DATA'.
        - 'TEXT DATA' represents plain, unstructured text. 'TABLE DATA' represents information extracted from a table and flattened into a list format.
        - The 'TABLE DATA' is structured as a list of rows, where each row is a list containing the column values in order. The format is as follows: [[column 1 value, column 2 value, ...], [column 1 value, column 2 value, ...], ...]
        - If available, provide references for the information.
        
        User Question:
        {query}

        Context from Manual:
        {context}
    """

### 10. Inference

In [39]:
prompt = prompt(query, final_context) # go to section number 4 to change query
print(f"QUERY: {query}\n")
print('RESPONSE:')
llm_b(prompt)

QUERY: Tell me about the lever for selection of longitudinal feeds.

RESPONSE:

For accurate information about the lever for selection of longitudinal feeds, it is recommended to consult the manual further, specifically sections that describe the machine's controls and operational components, such as section 9 "DESCRIPTION OF CONTROLS" or other relevant parts of the manual that might detail the machine's operational levers and their functions.

Reference: Since the specific information about the lever for selection of longitudinal feeds is not available in the provided context, it is suggested to look into sections 9.2 "Levers and control wheels" or other parts of the manual that might discuss the machine's control mechanisms.

### General points when testing
1. Mistral 7B cannot handle too much context, therefore top-K is set to a low value. Optimal seems to be 10 here. This also depends on the context size, which is set to 10.
2. A better model like GPT will probably do better.
3. Based on the current settings, Fusion Retrieval seems to do better compared to Reranking.