## **Reranking based RAG** 

>**_Your expectations for Reranking with the provided manual._**<br><br>
To be honest, I do not expect the responses to be far superior to the vanilla approach. The main reason is that an LLM still needs context and in-depth domain knowledge to rank the final set of chunks or documents appropriately. That said, the results will vary with how good the LLM is (in terms of its size and the data it was trained on).

>**_How you planned to test and compare these techniques._**<br><br>
><img src="./rr_workflow.jpeg" alt="Flowchart" width="700" /><br><br>
The approach is fairly straightforward. The main change is at the last stage, where we prompt an LLM once more to pick the top-k indices from the retrieved set of chunks.<br><br>
First, the document data is extracted, specifically text and tabular data. The extracted blocks are stored in their original sequence, so a piece of text that has a table before or after it keeps that surrounding context. This set is then chunked and converted to BERT-based vector embeddings. BERT-style encoders were chosen because, from what I know, they represent chunks in a context-aware fashion, thanks to the bidirectional encoder representations from transformers (the model loaded below, `all-mpnet-base-v2`, is strictly an MPNet encoder from the same family). The query is also converted to an embedding with the same model, and the top-k chunks are retrieved using cosine similarity.<br><br>
These chunks, along with the query and each chunk's ID, are then passed to an LLM for further ranking. The LLM returns a list of the most relevant chunk IDs (5 was chosen). The corresponding chunks are looked up in the original set, and this text, which acts as context, is passed along with the query to a second LLM that generates the final answer.
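
The first-stage retrieval described above can be sketched without any model at all. This is a minimal illustration, assuming tiny made-up 3-dimensional vectors in place of real embeddings; the helper names `cosine` and `top_k_by_cosine` are hypothetical and are not part of the notebook.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_by_cosine(query_vec, chunk_vecs, k):
    """Return the IDs (positions in chunk_vecs) of the k most similar chunks, best first."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]

# Toy "embeddings", invented for illustration only.
toy_chunks = [
    [1.0, 0.0, 0.0],  # chunk 0
    [0.9, 0.1, 0.0],  # chunk 1
    [0.0, 1.0, 0.0],  # chunk 2
    [0.0, 0.0, 1.0],  # chunk 3
]
toy_query = [1.0, 0.2, 0.0]

candidates = top_k_by_cosine(toy_query, toy_chunks, k=2)
print(candidates)
```

In the real pipeline the candidate IDs returned here are what the reranking LLM would then reorder.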

>_**Comparison Strategy**_<br><br>
>There are two main ways to compare this model with the Fusion Retrieval model: by assessing the top retrieved chunks, and by assessing the final response from the LLM. This will be done at the end.
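
One possible way to put a number on the "top retrieved chunks" comparison is the overlap between the two ID lists. This is only a sketch: the `overlap_at_k` helper is hypothetical, and the Fusion Retrieval IDs below are invented for illustration, not real results.

```python
def overlap_at_k(ids_a, ids_b, k):
    """Jaccard similarity of the two top-k ID sets (1.0 means identical sets)."""
    top_a, top_b = set(ids_a[:k]), set(ids_b[:k])
    union = top_a | top_b
    return len(top_a & top_b) / len(union) if union else 0.0

# Placeholder chunk IDs; fusion_ids is made up for this example.
rerank_ids = [5, 27, 25, 20, 24]
fusion_ids = [5, 25, 3, 20, 11]

score = overlap_at_k(rerank_ids, fusion_ids, k=5)
print(round(score, 3))
```

A low overlap would not say which pipeline is better, only that they surface different chunks, so the final responses still need to be read side by side.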

---

#### Import libraries

In [115]:
import fitz # for text extraction
import camelot # for table extraction
from pathlib import Path
from sentence_transformers import SentenceTransformer, util # for semantic vector embedding creation 
import numpy as np
from groq import Groq
import os
import time
import torch
import re
import ollama

#### Function to extract texts & tables from PDF

In [None]:
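def extract_text_and_tables(pdf_path):
    """Extract text blocks and tables from a PDF, page by page, in reading order.

    Returns a list of dicts with keys "type", "y" (vertical position on the page)
    and "content" (a string for text, a list of rows for tables).
    """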

    pdf_file = Path(pdf_path)
    if not pdf_file.is_file() or pdf_file.suffix.lower() != ".pdf":
        raise FileNotFoundError("Provided file path is not a valid PDF.")

    doc = fitz.open(str(pdf_file))
    result = []

    for page_num, page in enumerate(doc, start=1):
        page_blocks = []

        blocks = page.get_text("dict")["blocks"]
        for block in blocks:
            if block["type"] == 0:
                text_content = " ".join(
                    span["text"] for line in block["lines"] for span in line["spans"]
                ).strip()
                if text_content:
                    y = block["bbox"][1]
                    page_blocks.append({
                        "type": "text data",
                        "y": y,
                        "content": text_content
                    })

        try:
            tables = camelot.read_pdf(str(pdf_file), pages=str(page_num), flavor='lattice')
        except Exception as e:
            print(f"Failed to read tables on page {page_num}: {e}")
            tables = []

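        for table in tables:
            table_data = table.data
            # Camelot reports the bounding box as (x1, y1, x2, y2) in PDF space
            # (origin at the bottom-left), while PyMuPDF measures y from the top
            # of the page. Convert the table's top edge so that text and tables
            # sort on the same axis (assumes an unrotated page).
            bbox = table._bbox
            y = page.rect.height - float(bbox[3])
            page_blocks.append({
                "type": "table data",
                "y": y,
                "content": table_data
            })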

        page_blocks.sort(key=lambda b: b["y"])
        result.extend(page_blocks)

    return result

result = extract_text_and_tables("manual.pdf")

In [3]:
# few elements from the extracted data
result[100:105]

[{'type': 'text',
  'y': 145.75482177734375,
  'content': 'Follow the instructions contained herein, in addition to the general precautions to be observed while working. Even if the operator is already familiar with the use of manually operated lathes, it is necessary to: In particular:'},
 {'type': 'text', 'y': 173.48190307617188, 'content': 'fervi.com'},
 {'type': 'text',
  'y': 188.8348388671875,
  'content': '\uf0b7 Acquire full knowledge of the machine. For safe operation, this manual must be read carefully in order to acquire the necessary knowledge of the machine and to understand: operation, safety devices and all necessary precautions. \uf0b7 Wear appropriate clothing for the job. The operator must wear appropriate clothing to prevent accidents. \uf0b7 Maintain the machine with care.'},
 {'type': 'text',
  'y': 312.05987548828125,
  'content': 'Risks associated with using the machine'},
 {'type': 'text',
  'y': 342.43487548828125,
  'content': 'The machine must only be used by

In [4]:
# sample table data
result[156]

{'type': 'table',
 'y': 91.1826731262468,
 'content': [['Description (unit of measurement)', 'T999/230V\nT999/400V'],
  ['Centres distance (mm)', '1000'],
  ['Spindle hole diameter (mm)', '38'],
  ['Maximum swing over the bed (mm)', '320'],
  ['Maximum swing over the cross slide (mm)', '198'],
  ['Turning diameter over cavity (mm)', ''],
  ['Spindle diameter (3 + 3 self centring) (mm)', ''],
  ['Spindle connector', ''],
  ['No. of spindle speeds', 'm'],
  ['Spindle speed (r/min)', ''],
  ['No. of metric threads', ''],
  ['Range of metric threads (mm)', 'o'],
  ['No. of inch threads', ''],
  ['Range of inch threads (mm)', ''],
  ['Range of longitudinal\nfeeds (mm)', '00.78- 1.044\nc'],
  ['Range of transverse feeds (mm)', '0.022- 0.298'],
  ['Outer diameter of the feed screw (mm)\n.', '22'],
  ['Guide length (mm)\ni', '1390'],
  ['Cross carriage travel (mm)\nv', '200'],
  ['Tailstock sleeve diameter (mm)', '32'],
  ['Maximum travel of the tailstock sleeve (mm)\nr', '80'],
  ['Inner tape

In [None]:
# list formatting by adding labels for text and table
final = []
for r in result:
    s = f"{r['type']}: {r['content']}"
    final.append(s)

In [6]:
# table data sample after flattening
final[156]

'table - [[\'Description (unit of measurement)\', \'T999/230V\\nT999/400V\'], [\'Centres distance (mm)\', \'1000\'], [\'Spindle hole diameter (mm)\', \'38\'], [\'Maximum swing over the bed (mm)\', \'320\'], [\'Maximum swing over the cross slide (mm)\', \'198\'], [\'Turning diameter over cavity (mm)\', \'\'], [\'Spindle diameter (3 + 3 self centring) (mm)\', \'\'], [\'Spindle connector\', \'\'], [\'No. of spindle speeds\', \'m\'], [\'Spindle speed (r/min)\', \'\'], [\'No. of metric threads\', \'\'], [\'Range of metric threads (mm)\', \'o\'], [\'No. of inch threads\', \'\'], [\'Range of inch threads (mm)\', \'\'], [\'Range of longitudinal\\nfeeds (mm)\', \'00.78- 1.044\\nc\'], [\'Range of transverse feeds (mm)\', \'0.022- 0.298\'], [\'Outer diameter of the feed screw (mm)\\n.\', \'22\'], [\'Guide length (mm)\\ni\', \'1390\'], [\'Cross carriage travel (mm)\\nv\', \'200\'], [\'Tailstock sleeve diameter (mm)\', \'32\'], [\'Maximum travel of the tailstock sleeve (mm)\\nr\', \'80\'], [\'Inner

The flattened version preserves some of the table's structure by keeping each row in its own list. The label prefix should help the LLM recognise the content as a table.
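
As an alternative to the Python-list rendering, a table could also be flattened into one line per row, which avoids the escaped quotes and `\n` sequences. This is a sketch of a different technique from the one used in this notebook; `table_to_lines` is a hypothetical helper, and the sample rows are copied from the table output above.

```python
def table_to_lines(rows):
    """Render a table (list of rows, each a list of cell strings) as one 'a | b' line per row."""
    lines = []
    for row in rows:
        # Collapse newlines and repeated spaces inside cells so each row stays on one line.
        cells = [" ".join(str(cell).split()) for cell in row]
        lines.append(" | ".join(cells))
    return "\n".join(lines)

sample_rows = [
    ["Description (unit of measurement)", "T999/230V\nT999/400V"],
    ["Centres distance (mm)", "1000"],
    ["Spindle hole diameter (mm)", "38"],
]

flattened = table_to_lines(sample_rows)
print(flattened)
```

Whether this reads better to an LLM than the nested-list form is untested here; it is mainly shorter and easier to inspect by eye.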

### Chunking

In [7]:
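# Group every 30 consecutive blocks into one chunk. The window size matches the
# step, so no block is skipped, and the newline keeps block boundaries visible.
CHUNK_SIZE = 30
chunked_final = ["\n".join(final[i:i+CHUNK_SIZE]) for i in range(0, len(final), CHUNK_SIZE)]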
print(f"Number of chunks: {len(chunked_final)}")

Number of chunks: 46
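
Fixed-size chunking has two knobs, the window size and the stride, and their relationship decides how much of the document ends up in the index. The sketch below makes both explicit; `chunk_blocks` and `covered_fraction` are hypothetical helpers and the 60 toy blocks are made up.

```python
def chunk_blocks(blocks, size, stride):
    """Join `size` consecutive blocks into one chunk, starting a new chunk every `stride` blocks."""
    return ["\n".join(blocks[i:i + size]) for i in range(0, len(blocks), stride)]

def covered_fraction(n_blocks, size, stride):
    """Fraction of block positions that end up in at least one chunk."""
    covered = set()
    for start in range(0, n_blocks, stride):
        covered.update(range(start, min(start + size, n_blocks)))
    return len(covered) / n_blocks if n_blocks else 0.0

toy_blocks = [f"block {i}" for i in range(60)]

print(len(chunk_blocks(toy_blocks, size=30, stride=30)))  # number of chunks
print(covered_fraction(60, size=30, stride=30))           # every block is used
print(covered_fraction(60, size=10, stride=30))           # stride > size skips blocks
```

A stride smaller than the window gives overlapping chunks, while a stride larger than the window silently leaves blocks out of every chunk.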


### Creating BERT based vector embeddings

In [8]:
model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
sem_embs = model.encode(chunked_final, convert_to_tensor = True)

### Pipeline to return indices of top-k chunks that match the query

In [85]:
def bert_query_pipeline(query, top_k = 5):
    """Return the indices of the top_k chunks most similar to the query, best first."""
    query_embedding = model.encode(query, convert_to_tensor = True)
    cosine_scores = util.cos_sim(query_embedding, sem_embs)[0]
    top_indices = np.argsort(cosine_scores.cpu().numpy())[::-1][:top_k]

    return top_indices

In [86]:
# retrieve candidate chunks for the query; ask for 10 so the reranker
# has more than 5 candidates to choose from
query = 'What are some general safety rules when using machine equipment?'
bert_top_10_idx = bert_query_pipeline(query, top_k = 10)
# de-duplicate while keeping the similarity ranking (set() would discard the order)
final_idx = list(dict.fromkeys(int(i) for i in bert_top_10_idx))

top_context_chunks = [chunked_final[idx] for idx in final_idx]

# get top embeddings directly from precomputed tensor
sem_embs_final = torch.stack([sem_embs[idx] for idx in final_idx])


### To get the final set of chunks

In [87]:
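def bert_final_idx(query):
    """Rank the candidate embeddings against the query.

    Returns positions into sem_embs_final (and therefore into final_idx),
    not chunk IDs, ordered from most to least similar.
    """
    query_embedding = model.encode(query, convert_to_tensor = True)
    cosine_scores = util.cos_sim(query_embedding, sem_embs_final)[0]
    indices = np.argsort(cosine_scores.cpu().numpy())[::-1]

    return indices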

In [88]:
best = bert_final_idx(query)

### Get final context list

In [92]:
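context_string = ""
for pos in best:
    # `best` holds positions into final_idx, so map each one back to its chunk ID
    chunk_id = final_idx[pos]
    context_string += f"{chunked_final[chunk_id].strip()} (idx = {chunk_id})\n\n"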

In [100]:
final_context_string = re.sub(r'\.{2,}', '.', context_string)
final_context_string



### LLM setup part 1

In [None]:
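def prompt_rank_contexts(query, final_context):
    # final_context is the pre-built string of candidates, each already tagged
    # with "(idx = N)". Looping over it would walk the string one character at
    # a time, so it is used as-is.
    context_string = final_context.strip()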

    return f"""
        You are a highly skilled AI assistant that ranks technical contexts from a machinery operations and maintenance manual.

        Given a user question and several candidate context excerpts from the manual, rank the top 5 most relevant ones for answering the question. Relevance means how well the context can be used to answer the question **accurately and directly**.

        Each context ends with a tag in the format **(idx = N)**. Use this identifier to reference the context when deciding relevance.

        User Question:
        {query}

        Candidate Contexts:
        {context_string}

        Return only the `idx` values of the top 5 most relevant contexts, in descending order of relevance (most relevant first). Format your response like this:
        22, 8, 15, 4, 31
        """

def llama_context_ranker(query, final_context):
    prompt = prompt_rank_contexts(query, final_context)

    try:
        response = ollama.generate(
            model='mistral',  # or 'mistral:instruct' if that's what you're using
            prompt=prompt,
            stream=False
        )
        output = response['response'].strip()

        # Parse the returned string for integer idx values
        ranked_indices = [int(idx.strip()) for idx in output.split(",") if idx.strip().isdigit()]
        return ranked_indices[:5]

    except Exception as e:
        print("Error using Ollama (Python client):", str(e))
        return []

In [110]:
optimal_idx = llama_context_ranker(query, final_context_string)

In [111]:
optimal_idx

[5, 27, 25, 20, 24]
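
Since the ranker is a generative model, nothing guarantees that the IDs it returns were among the candidates it was shown. One defensive option is to filter the reply against the known candidate IDs before indexing into the chunk list. This is a sketch under that assumption; `parse_ranked_ids` is a hypothetical helper and the reply string is made up.

```python
import re

def parse_ranked_ids(llm_output, valid_ids, k=5):
    """Extract up to k distinct integer IDs from the model's reply, keeping only known IDs."""
    allowed = set(valid_ids)
    ranked = []
    for token in re.findall(r"\d+", llm_output):
        chunk_id = int(token)
        # Drop IDs the model invented and any it repeated.
        if chunk_id in allowed and chunk_id not in ranked:
            ranked.append(chunk_id)
    return ranked[:k]

# Invented reply containing an unknown ID (99), a duplicate (5) and trailing punctuation.
reply = "Top contexts: 5, 27, 99, 5, 20."
parsed = parse_ranked_ids(reply, valid_ids=[5, 20, 24, 25, 27])
print(parsed)
```

If fewer than `k` IDs survive the filter, the remaining slots could be filled from the cosine-similarity ranking, although that fallback is not shown here.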

In [112]:
optimal_context = ''
for idx in optimal_idx:
    # separate chunks with a blank line so they do not run together
    optimal_context += chunked_final[idx] + "\n\n"

### LLM setup part 2

In [113]:
def llama_b(prompt):
    client = Groq(
        api_key = os.getenv("GROQ_API_KEY"),
    )

    chat_completion = client.chat.completions.create(
        model = "llama-3.3-70b-versatile",
        # model = "llama3-70b-8192",
        # model = "mistral-saba-24b",
        messages = [
            {
                "role": "system",
                "content": "You are an expert technical assistant specialized in interpreting operations and maintenance manuals for machinery."
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        temperature = 0.5,
        max_tokens = 5640,
        top_p = 1,
        stream = True,
    )

    for chunk in chat_completion:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end='', flush=True)  # print to console without newline, flush immediately
            time.sleep(0.01)  # optional tiny delay for typewriter effect
    

def prompt(query, context):
    return f"""
        You are an expert technical assistant specialized in interpreting operations and maintenance manuals for machinery.

        Given the user question and the relevant extracted context from the manual:

        - Provide a clear, precise, and factual answer to the question.
        - Base your response strictly on the provided context; do not guess beyond it.
        - If the context does not contain enough information, indicate that the answer is not available in the manual or that the context is not sufficient.
        - Keep the answer professional, concise, and focused on practical instructions.

        User Question:
        {query}

        Context from Manual:
        {context}
        """

### Inference

In [114]:
# use a separate name so the prompt() function is not overwritten by its own result
final_prompt = prompt(query, optimal_context)
llama_b(final_prompt)

Based on the provided context from the manual, here are some general safety rules when using machine equipment:

1. **Check the presence and integrity of protective devices and the proper functioning of safety devices before starting operation**. If any defect is detected, do not use the machine.
2. **Do not modify or remove guards, safety devices, labels, and information plates on the machine**.
3. **Ensure the machine is correctly attached to prevent unwanted movement or loss of stability** before using it.
4. **Wear appropriate personal protective equipment (PPE)** such as gloves, goggles, overalls or aprons, and safety shoes.
5. **Check that the machine is stopped before starting work in the vicinity of the spindle**.
6. **Do not extend the continued use of the machine for more than 10 minutes to avoid overheating the machine and the equipment**.
7. **Ensure the working environment is sufficiently well lit** (at least 200 lux) to ensure maximum operational safety.
8. **Operate the 