## **Reranking based RAG** 

_**What are your expectations for Fusion Retrieval with the provided manual?**_<br>
I do not expect the responses to be far superior than the vanilla approach due to the main reason being that these LLMs still require context and in-depth domain knowledge to truly understand how to rank the final set of chunks or documents appropriately. However, depending on how good the LLM is (in terms of size and amount of data that it was trained on), the results will vary.

**_How do you plan to test and compare these techniques?_**<br><br>
<img src="./reranking_workflow.png" alt="Flowchart" width="700" /><br><br>
The approach is pretty straight forward here and the main area where the changes take place compared to vanilla RAG is at the last stage where an LLM is prompted to retrieve the top-L indexes from the final retrieved set of chunks. Teh workflow is as follows:<br>
1. The document data is extracted, specifically text and tabular data. 
2. These data are stored in a way where the sequence is maintained, that way there will be more context for a certain text that may have a table before or after it. 
3. This set is chunked and converted to BERT based vector embeddings. BERT was chosen because from what I know, they can represent these chunks in a very context aware fashion, thanks to the utilization os the encoder of a transformers. 
4. Now the query is also converted to BERT based embeddings and using cosine similarity, the top-K chunks are retrieved.
5. Now these chunks, along with the query and chunk IDs is passed into an LLM for further ranking. 
6. The LLM returns a list of IDs of the top-L relevant chunks (L < K). 
7. The chunks are retrieved from the original set of chunks using these L indexes.
8. This set of chunks, which acts as context, along with the query, is passed into a new LLM for a refined information retrieval.

_**Comparison Strategy**_<br>
There are possibly two main ways in which we can compare this approach with the Reranking apprach. One is by assessing the top-L retrieved chunks and the other is obviously by assessing the final response from the LLM. 

---

#### Import libraries

In [None]:
import fitz # for text extraction
import camelot # for table extraction
from pathlib import Path
from sentence_transformers import SentenceTransformer, util # for semantic vector embedding creation 
import numpy as np
from groq import Groq
import os
import time
import torch
import re
import ollama

#### 1. Function to extract texts & tables from PDF
The goal is to preserve the sequence, that way there will be more context for a certain text that may have a table before or after it.

In [2]:
def extract_text_and_tables(pdf_path):

    pdf_file = Path(pdf_path)
    if not pdf_file.is_file() or pdf_file.suffix.lower() != ".pdf":
        raise FileNotFoundError("Provided file path is not a valid PDF.")

    doc = fitz.open(str(pdf_file))
    result = []

    # text extraction
    for page_num, page in enumerate(doc, start = 1):
        page_blocks = []

        blocks = page.get_text("dict")["blocks"]
        for block in blocks:
            if block["type"] == 0: # type 0 is text
                text_content = " ".join(
                    span["text"] for line in block["lines"] for span in line["spans"]
                ).strip()
                if text_content:
                    y = block["bbox"][1]
                    page_blocks.append({
                        "type": "TEXT DATA",
                        "y": y,
                        "content": text_content
                    })

        # table extraction
        try:
            tables = camelot.read_pdf(str(pdf_file), pages = str(page_num), flavor = 'lattice') # lattice flavor to extract tables
        except Exception as e:
            print(f"Failed to read tables on page {page_num}: {e}")
            tables = []

        for table in tables:
            table_data = table.data
            bbox = table._bbox
            y = float(bbox[1])
            page_blocks.append({
                "type": "TABLE DATA",
                "y": y,
                "content": table_data
            })

        page_blocks.sort(key=lambda b: b["y"]) # sort contents on current page
        result.extend(page_blocks) # append content to result list

    return result

In [3]:
# extract texts and tables from the maual
result = extract_text_and_tables("manual.pdf")
result[100:105] # few elements from the extracted data

[{'type': 'TEXT DATA',
  'y': 145.75482177734375,
  'content': 'Follow the instructions contained herein, in addition to the general precautions to be observed while working. Even if the operator is already familiar with the use of manually operated lathes, it is necessary to: In particular:'},
 {'type': 'TEXT DATA', 'y': 173.48190307617188, 'content': 'fervi.com'},
 {'type': 'TEXT DATA',
  'y': 188.8348388671875,
  'content': '\uf0b7 Acquire full knowledge of the machine. For safe operation, this manual must be read carefully in order to acquire the necessary knowledge of the machine and to understand: operation, safety devices and all necessary precautions. \uf0b7 Wear appropriate clothing for the job. The operator must wear appropriate clothing to prevent accidents. \uf0b7 Maintain the machine with care.'},
 {'type': 'TEXT DATA',
  'y': 312.05987548828125,
  'content': 'Risks associated with using the machine'},
 {'type': 'TEXT DATA',
  'y': 342.43487548828125,
  'content': 'The mac

In [4]:
# sample table data
result[156]

{'type': 'TABLE DATA',
 'y': 91.1826731262468,
 'content': [['Description (unit of measurement)', 'T999/230V\nT999/400V'],
  ['Centres distance (mm)', '1000'],
  ['Spindle hole diameter (mm)', '38'],
  ['Maximum swing over the bed (mm)', '320'],
  ['Maximum swing over the cross slide (mm)', '198'],
  ['Turning diameter over cavity (mm)', ''],
  ['Spindle diameter (3 + 3 self centring) (mm)', ''],
  ['Spindle connector', ''],
  ['No. of spindle speeds', 'm'],
  ['Spindle speed (r/min)', ''],
  ['No. of metric threads', ''],
  ['Range of metric threads (mm)', 'o'],
  ['No. of inch threads', ''],
  ['Range of inch threads (mm)', ''],
  ['Range of longitudinal\nfeeds (mm)', '00.78- 1.044\nc'],
  ['Range of transverse feeds (mm)', '0.022- 0.298'],
  ['Outer diameter of the feed screw (mm)\n.', '22'],
  ['Guide length (mm)\ni', '1390'],
  ['Cross carriage travel (mm)\nv', '200'],
  ['Tailstock sleeve diameter (mm)', '32'],
  ['Maximum travel of the tailstock sleeve (mm)\nr', '80'],
  ['Inner

In [5]:
# list formatting by adding labels for text and table
final = []
for r in result:
    s = f"{r['type']}: {r['content']}"
    final.append(s)

In [6]:
# table data sample after flattening
final[156]

'TABLE DATA: [[\'Description (unit of measurement)\', \'T999/230V\\nT999/400V\'], [\'Centres distance (mm)\', \'1000\'], [\'Spindle hole diameter (mm)\', \'38\'], [\'Maximum swing over the bed (mm)\', \'320\'], [\'Maximum swing over the cross slide (mm)\', \'198\'], [\'Turning diameter over cavity (mm)\', \'\'], [\'Spindle diameter (3 + 3 self centring) (mm)\', \'\'], [\'Spindle connector\', \'\'], [\'No. of spindle speeds\', \'m\'], [\'Spindle speed (r/min)\', \'\'], [\'No. of metric threads\', \'\'], [\'Range of metric threads (mm)\', \'o\'], [\'No. of inch threads\', \'\'], [\'Range of inch threads (mm)\', \'\'], [\'Range of longitudinal\\nfeeds (mm)\', \'00.78- 1.044\\nc\'], [\'Range of transverse feeds (mm)\', \'0.022- 0.298\'], [\'Outer diameter of the feed screw (mm)\\n.\', \'22\'], [\'Guide length (mm)\\ni\', \'1390\'], [\'Cross carriage travel (mm)\\nv\', \'200\'], [\'Tailstock sleeve diameter (mm)\', \'32\'], [\'Maximum travel of the tailstock sleeve (mm)\\nr\', \'80\'], [\'I

It can be seen that the flattened version somewhat preserves the structure of the actual table by keeping each row inside a list. The LLM can hopefully understand this due to the presence of the label 'TABLE DATA' at the start.

### 2. Chunking

In [7]:
chunks = [" ".join(final[i:i+10]) for i in range(0, len(final), 30)]
print(f"Number of chunks: {len(chunks)}\n")

chunks[15] # sample

Number of chunks: 46



'TEXT DATA: 8.3 Levelling the machine TEXT DATA: For this operation, it is recommended to use a precision spirit level (0.001 mm). TEXT DATA: 8.3.1 Preliminary phase TEXT DATA: The preliminary phase serves to eliminate the presence of torsions in the lathe table. Proceed to reset the head by adjusting the relative screws and then locking the tailstock with the relative adjustment screws moving the reference mark to zero. TEXT DATA: fervi.com TEXT DATA: 8.3.2 Transverse levelling of the table TEXT DATA: Position the spirit level in a transverse direction on the lathe guides under the spindle and check the bubble. Position the spirit level in a transverse direction on the table guides under the tailstock and check the bubble. Repeat these operations frequently and, if necessary, make small corrections by screwing and/or unscrewing the adjustable feet below the pallet. TEXT DATA: 8.3.3 Levelling the lathe rails TEXT DATA: Place the level on the sides of the carriage and move it slowly alo

### 3. Creating BERT based vector embeddings

In [8]:
model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
sem_embs = model.encode(chunks, convert_to_tensor = True)

### 4. Pipeline to return indeces of top-K chunks that match with the query

In [9]:
def bert_query_pipeline(query, top_k = 5):
     
    device = sem_embs.device
    query_embedding = model.encode(query, convert_to_tensor = True)
    cosine_scores = util.cos_sim(query_embedding, sem_embs)[0]
    top_indexes = np.argsort(cosine_scores.cpu().numpy())[::-1][:top_k]

    return top_indexes

In [None]:
# query = 'What are some general safety rules when using machine equipment?'
query = 'What does the manual say about unplugging the power cord of the machine from the power outlet?'

In [None]:
# get all chunks from chunks retrived from both implementations

bert_top_10_idx = bert_query_pipeline(query)
final_idx = list(set(list(bert_top_10_idx)))

top_context_chunks = [chunks[idx] for idx in final_idx] 

# get top embeddings directly from precomputed tensor
sem_embs_final = torch.stack([sem_embs[idx] for idx in final_idx])

### 5. Function to get the final set of chunks

In [11]:
def bert_final_idx(query):
     
    device = sem_embs.device
    query_embedding = model.encode(query, convert_to_tensor = True)
    cosine_scores = util.cos_sim(query_embedding, sem_embs_final)[0]
    indexes = np.argsort(cosine_scores.cpu().numpy())[::-1]

    return indexes

In [12]:
best = bert_final_idx(query)

### 6. Get final context

In [13]:
context_string = ""
for idx in best:
    context_string += f"{chunks[idx].strip()} (idx = {idx})\n\n"

In [14]:
# remove unnecessary dots
final_context_string = re.sub(r'\.{2,}', '.', context_string)
final_context_string



### 7. Setting up LLM A that returns indexes of top-L chunks based on the query

In [15]:
def prompt_rank_contexts(query, final_context, top_l = 5):
    context_string = ""
    for i, ctx in enumerate(final_context):
        context_string += f"Context {i+1}:\n{ctx.strip()}\n\n"

    return f"""
        You are a highly skilled AI assistant that ranks technical contexts from a machinery operations and maintenance manual.

        Given a user question and several candidate context excerpts from the manual, rank the top-L most relevant ones for answering the question. Relevance means how well the context can be used to answer the question **accurately and directly**.

        Each context ends with a tag in the format **(idx = N)**. Use this identifier to reference the context when deciding relevance.

        User Question:
        {query}

        Candidate Contexts:
        {context_string}

        Return only the `idx` values of the top {top_l} most relevant contexts, in descending order of relevance (most relevant first). Format your response like this:
        22, 8, 15, 4, 31
        """

def llm_a(query, final_context): # mistral might not be the best but no option
    prompt = prompt_rank_contexts(query, final_context)

    try:
        response = ollama.generate(
            model = 'mistral', 
            prompt = prompt,
            stream = False
        )
        output = response['response'].strip()

        # parse the returned string for integer idx values
        ranked_indexes = [int(idx.strip()) for idx in output.split(",") if idx.strip().isdigit()]
        return ranked_indexes[:5]

    except Exception as e:
        print("Error using Ollama (Python client):", str(e))
        return []

In [16]:
# get ids of top-L chunks
top_ids = llm_a(query, final_context_string)
print(f"IDs of top-L chunks are: {top_ids}")

IDs of top-L chunks are: [22, 8, 15, 4, 31]


### 8. Get final context

In [17]:
final_context = ''
for idx in top_ids:
    final_context += chunks[idx]

In [18]:
final_context

"TEXT DATA: \uf0ea TEXT DATA: 0. 0 0 0 X TEXT DATA: Page 38 of 84 TEXT DATA: MACHINES AND TEXT DATA: ACCESSORIES TEXT DATA: 4- D EFAULT DATA The default data allows continuous monitoring of the machining in operation. If, for example, it has a piece as shown in  Figure 29 /a) and you want to get the piece in  Figure 29 /b) you can set all the heights in order to precisely control the actual machining. To set the data, proceed as follows: TEXT DATA: fervi.com TEXT DATA: Figure 29 – Example of machining. - Move the tool to the height A in the Z direction (longitudinal). TEXT DATA: - Press the button TEXT DATA: to set height 5.TEXT DATA: 1 2 TEXT DATA: 22 23 TEXT DATA: Figure 2 – Main parts of the gear head lathe (Art. T999/230V and T999/400V). TEXT DATA: 1 Brake 9 Protective device 17  Tailstock handwheel TEXT DATA: 2 Carriage handwheel 10  Halogen lamp 18  Support bars TEXT DATA: 3 Workbench 11  Turret 19  Lead screw TEXT DATA: 4 Speed switches 12  Coolant tube 20  Turning bar TEXT DATA


### 9. Setting up LLM B for response generation

In [1]:
def llm_b(prompt):
    client = Groq(
        api_key = os.getenv("GROQ_API_KEY"),
    )

    chat_completion = client.chat.completions.create(
        model = "llama-3.3-70b-versatile",
        # model = "llama3-70b-8192",
        # model = "mistral-saba-24b",
        messages = [
            {
                "role": "system",
                "content": "You are an expert technical assistant specialized in interpreting operations and maintenance manuals for machinery."
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        temperature = 0.5,
        max_tokens = 5640,
        top_p = 1,
        stream = True,
    )

    for chunk in chat_completion:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end = '', flush = True)  # print to console without newline, flush immediately
            time.sleep(0.01)  # optional tiny delay for typewriter effect
    

def prompt(query, context):
    return f"""
        You are an expert technical assistant specialized in interpreting operations and maintenance manuals for machinery.

        Given the user question and the relevant extracted context from the manual:

        - Provide a clear, precise, and factual answer to the question.
        - Base your response strictly on the provided context; do not guess beyond it.
        - If the context does not contain enough information, indicate that the answer is not available in the manual or that the context is not sufficient.
        - Keep the answer professional, concise, and focused on practical instructions.
        - Each section of the context begins with a tag: either 'TEXT DATA' or 'TABLE DATA'.
        - 'TEXT DATA' represents plain, unstructured text. 'TABLE DATA' represents information extracted from a table and flattened into a list format.
        - The 'TABLE DATA' is structured as a list of rows, where each row is a list containing the column values in order. The format is as follows: [[column 1 value, column 2 value, ...], [column 1 value, column 2 value, ...], ...]
        
        User Question:
        {query}

        Context from Manual:
        {context}
        """

### 10. Inference

In [20]:
prompt = prompt(query, final_context) # go to section number 4 to change query
llm_b(prompt)

According to the manual, the power cord of the machine should be unplugged from the power outlet in the following situations:

1. When the machine is not being operated.
2. When the machine is left unattended.
3. During maintenance or registration, if the machine does not work properly.
4. If the power cable is damaged.
5. When the tool is replaced.
6. When the machine is being moved or transported.
7. During cleaning operations.

This information is found in point 23 of the manual.