## **Hybrid Architecture: Fusion Retrieval + Reranking based RAG** 

_**What are your expectations for the Hybrid architecture with the provided manual?**_<br>
The obvious answer would be that it will be better than both the approaches. Hybrid models in general tend to perform better than their individual constituent models. But this requires testing.

**_How do you plan to test and compare these techniques?_**<br><br>
<img src="./hybrid_workflow.png" alt="Flowchart" width="1000" /><br><br>
The addition here as compared to the Fusion Retrieval architecture is the addition of an LLM based reranking approach right before asking an LLM to generate the response. In this case, there will be 3 sets of chunks throughout the workflow. First will be the set that is returned from the initial BERT embeddings and BM25 representations. This will be denoted as top-K. The next set of chunks will be based on the fusion scores from the semantic based retried chunks and keyword based retrieved chunks. This will be denoted as top-L. And finally, this set of L chunks will be passed into the first LLM for reranking. The chunks returned here will be the top-M chunks which will be the final set of context for our second and final LLM for query answering.<br>
1. The document data is extracted, specifically text and tabular data. 
2. Next, these data are stored in a way where the sequence is maintained, that way there will be more context for a certain text that may have a table before or after it. 
3. This set is chunked and converted to BERT based vector embeddings and also to BM25 based representations.
4. Now the query is also converted to BERT based embeddings and BM25 based representations and the top-K chunks are retrieved for both. A union is taken between the retrieved chunks to avoid overlaps.
5. Next, this final set of chunks are again made into 2 copies, one of BERT based embeddings and the other of BM25 based inverted indexes. The query is passed into both structures and the scores are retrieved for each chunk. This gives us a set of chunk IDs, BERT scores, BM25 scores and the final fusion score. 
6. Based on this fusion score, the top-L chunks are retrieved (L < K). 
7. These top-L chunks are passed into the first LLM for reranking, which returns the top-M chunks based on its knowledge and understanding (M < L).
8. This final set of top-M chunks are passed into the second LLM, along with the query for the final response generation.

**It must be noted that K > L > M. In this implementation, they are set as 30, 12 and 8 respectively.**

_**Comparison Strategy**_<br>
There are possibly two main ways in which we can compare this approach with the Fusion Retrieval and Reranking approach. One is by assessing the top-M retrieved chunks from the Hybrid with the top-L retrieved chunks from both the other approaches. The other is obviously by assessing the final response from the LLM for each approach. 

_**Note**: Considering images is important in order to create a robust RAG system. Due to technical/financial constraints, images are omitted for this implementation. However, in the absence of such constraints, what I would have done is have the LLM read the image and prompt it to generate a description. This description will be added into the resulting array while also maintaining the sequence. **One obvious question in that case will be whether or not the LLM knows about the content in the image, provided that it is very domain specific and unfamiliar to the LLM**. One way I thought of on mitigating this issue is by providing some set of surrounding context of the image to the LLM along with the image itself for it to draw better conclusions. These contexts can be the nearest 2 or 3 elements (text, table or another image) surrounding the image in hand. Let this value be J. So, if J is 3, we feed 3 elements before the image and 3 elements after the image as context for the LLM to generate proper a description of the image in hand. This might not be the most efficient solution, but there can be scenarios where this will work._

---

#### Import libraries

In [6]:
import fitz # for text extraction
import camelot # for table extraction
from sentence_transformers import SentenceTransformer, util # for semantic vector embedding creation 
from rank_bm25 import BM25Okapi # for bm25 implementation
import spacy # for stop word removal
import re
from pathlib import Path
import numpy as np
from groq import Groq
import os
import time
import ollama
import json

#### 1. Function to extract texts & tables from PDF
The goal is to preserve the sequence, that way there will be more context for a certain text that may have a table before or after it.

In [2]:
def extract_text_and_tables(pdf_path):

    pdf_file = Path(pdf_path)
    if not pdf_file.is_file() or pdf_file.suffix.lower() != ".pdf":
        raise FileNotFoundError("Provided file path is not a valid PDF.")

    doc = fitz.open(str(pdf_file))
    result = []

    # text extraction
    for page_num, page in enumerate(doc, start = 1):
        page_blocks = []

        blocks = page.get_text("dict")["blocks"]
        for block in blocks:
            if block["type"] == 0: # type 0 is text
                text_content = " ".join(
                    span["text"] for line in block["lines"] for span in line["spans"]
                ).strip()
                if text_content:
                    y = block["bbox"][1]
                    page_blocks.append({
                        "type": "TEXT DATA",
                        "y": y,
                        "content": text_content
                    })

        # table extraction
        try:
            tables = camelot.read_pdf(str(pdf_file), pages = str(page_num), flavor = 'lattice') # lattice flavor to extract tables
        except Exception as e:
            print(f"Failed to read tables on page {page_num}: {e}")
            tables = []

        for table in tables:
            table_data = table.data
            bbox = table._bbox
            y = float(bbox[1])
            page_blocks.append({
                "type": "TABLE DATA",
                "y": y,
                "content": table_data
            })

        page_blocks.sort(key = lambda b: b["y"]) # sort contents on current page
        result.extend(page_blocks) # append content to result list

    return result

In [3]:
# extract texts and tables from the maual
pre_result = extract_text_and_tables("manual.pdf")
pre_result[100:105] # few elements from the extracted data

[{'type': 'TEXT DATA',
  'y': 145.75482177734375,
  'content': 'Follow the instructions contained herein, in addition to the general precautions to be observed while working. Even if the operator is already familiar with the use of manually operated lathes, it is necessary to: In particular:'},
 {'type': 'TEXT DATA', 'y': 173.48190307617188, 'content': 'fervi.com'},
 {'type': 'TEXT DATA',
  'y': 188.8348388671875,
  'content': '\uf0b7 Acquire full knowledge of the machine. For safe operation, this manual must be read carefully in order to acquire the necessary knowledge of the machine and to understand: operation, safety devices and all necessary precautions. \uf0b7 Wear appropriate clothing for the job. The operator must wear appropriate clothing to prevent accidents. \uf0b7 Maintain the machine with care.'},
 {'type': 'TEXT DATA',
  'y': 312.05987548828125,
  'content': 'Risks associated with using the machine'},
 {'type': 'TEXT DATA',
  'y': 342.43487548828125,
  'content': 'The mac

In [7]:
# save manual as json
with open("manual.json", "w") as file:
    json.dump(pre_result, file)

In [8]:
# load manual
with open("manual.json", "r") as file:
    pre_result = json.load(file)

In [11]:
# removing 'fervi.com' background text
result = []
for res in pre_result:
    if res['content'] != 'fervi.com':
        result.append(res)

In [12]:
# sample table data
result[1173]

{'type': 'TABLE DATA',
 'y': 54.94955827871188,
 'content': [['Part No.', 'Description', 'i', 'Description'],
  ['T999/F001', 'Body\nv', '', 'Micrometer'],
  ['T999/F002', 'Flange', '', 'Lock'],
  ['T999/F003', '', 'T999/F026', 'Switch'],
  ['T999/F004', 'r', 'T999/F029', 'Knob'],
  ['T999/F005', '', 'T999/F030', 'Knob'],
  ['T999/F007', 'e', 'T999/F031', 'Allen key'],
  ['T999/F008', '', 'T999/F032', 'Allen key'],
  ['T999/F009\nf', '', 'T999/F033', 'Screw'],
  ['T999/F010', '', 'T999/F034', 'Screw'],
  ['T999/F011', '', 'T999/F035', 'Screw'],
  ['T999/F012', '', 'T999/F036', 'Screw'],
  ['T999/F013', 'Pin', 'T999/F037', 'Nut'],
  ['T999/F014', 'Screw', 'T999/F038', 'Nut'],
  ['T999/F015', 'Sleeve coupling', 'T999/F039', 'Key'],
  ['T999/F016', 'Tie rod', 'T999/F040', 'Washer'],
  ['T999/F019', 'Pin', 'T999/F041', 'Plug'],
  ['T999/F020', 'Lever', 'T999/F041', 'Bearing'],
  ['T999/F021', 'Nut', 'T999/F042', 'Oiler']]}

In [13]:
# list formatting by adding labels for texts and tables
final = []
for r in result:
    s = f"{r['type']}: {r['content']}"
    final.append(s)

In [14]:
# table data sample after flattening
final[1173]

"TABLE DATA: [['Part No.', 'Description', 'i', 'Description'], ['T999/F001', 'Body\\nv', '', 'Micrometer'], ['T999/F002', 'Flange', '', 'Lock'], ['T999/F003', '', 'T999/F026', 'Switch'], ['T999/F004', 'r', 'T999/F029', 'Knob'], ['T999/F005', '', 'T999/F030', 'Knob'], ['T999/F007', 'e', 'T999/F031', 'Allen key'], ['T999/F008', '', 'T999/F032', 'Allen key'], ['T999/F009\\nf', '', 'T999/F033', 'Screw'], ['T999/F010', '', 'T999/F034', 'Screw'], ['T999/F011', '', 'T999/F035', 'Screw'], ['T999/F012', '', 'T999/F036', 'Screw'], ['T999/F013', 'Pin', 'T999/F037', 'Nut'], ['T999/F014', 'Screw', 'T999/F038', 'Nut'], ['T999/F015', 'Sleeve coupling', 'T999/F039', 'Key'], ['T999/F016', 'Tie rod', 'T999/F040', 'Washer'], ['T999/F019', 'Pin', 'T999/F041', 'Plug'], ['T999/F020', 'Lever', 'T999/F041', 'Bearing'], ['T999/F021', 'Nut', 'T999/F042', 'Oiler']]"

It can be seen that the flattened version somewhat preserves the structure of the actual table by keeping each row inside a list. The LLM can hopefully understand this due to the presence of the label 'TABLE DATA' at the start.

### 2. Chunking

In [18]:
chunks = [" ".join(final[i:i + 10]) for i in range(0, len(final), 10)] # be careful here
print(f"Number of chunks: {len(chunks)}\n")

chunks[15] # sample

Number of chunks: 128



'TEXT DATA: Spindle hole diameter (mm) 38 TEXT DATA: Maximum swing over the bed (mm) 320 TEXT DATA: Maximum swing over the cross slide (mm) 198 TEXT DATA: Turning diameter over cavity (mm) 470 TEXT DATA: Spindle diameter (3 + 3 self centring) (mm) 160 TEXT DATA: Spindle connector Camlock D1-4 TEXT DATA: No. of spindle speeds 8 TEXT DATA: Spindle speed (r/min) 70 - 2000 RPM TEXT DATA: No. of metric threads 32 TEXT DATA: Range of metric threads (mm) 0.44- 10'

### 3. Data cleaning for BM25

In [19]:
nlp = spacy.load("en_core_web_sm")

chunks_4_bm25 = []
for chunk in chunks:
    doc = nlp(chunk)
    filtered = [token.text for token in doc if not token.is_stop]
    chunks_4_bm25.append(" ".join(filtered))

chunks_4_bm25[15] # sample

'TEXT DATA : Spindle hole diameter ( mm ) 38 TEXT DATA : Maximum swing bed ( mm ) 320 TEXT DATA : Maximum swing cross slide ( mm ) 198 TEXT DATA : Turning diameter cavity ( mm ) 470 TEXT DATA : Spindle diameter ( 3 + 3 self centring ) ( mm ) 160 TEXT DATA : Spindle connector Camlock D1 - 4 TEXT DATA : . spindle speeds 8 TEXT DATA : Spindle speed ( r / min ) 70 - 2000 RPM TEXT DATA : . metric threads 32 TEXT DATA : Range metric threads ( mm ) 0.44- 10'

### 4. Creating semantic vector embeddings and BM25 inverted index

In [20]:
# for bert
model = SentenceTransformer('sentence-transformers/all-mpnet-base-v2')
sem_embs = model.encode(chunks, convert_to_tensor = True)

In [21]:
# for bm25
tokenized_corpus = [doc.split() for doc in chunks_4_bm25]
bm25 = BM25Okapi(tokenized_corpus)

### 5. Pipeline to return indices of top-K chunks that match with the query

In [77]:
def bert_query_pipeline(query, top_k = 30):
     
    device = sem_embs.device
    query_embedding = model.encode(query, convert_to_tensor = True)
    cosine_scores = util.cos_sim(query_embedding, sem_embs)[0] # cosine similarity
    top_indices = np.argsort(cosine_scores.cpu().numpy())[::-1][:top_k]

    return top_indices

In [78]:
def bm25_query_pipeline(query, top_k = 30):

    tokenized_query = query.split()
    bm25_scores = bm25.get_scores(tokenized_query) # tf-idf like scoring
    top_indices = np.argsort(bm25_scores)[::-1][:top_k]

    return top_indices

In [79]:
# query = "Summarize the manual." # 1
# query = "What are some general safety rules when using machine equipment?" # 2
# query = "What does the manual say about unplugging the power cord of the machine from the power outlet?" # 3
# query = "What are the several manual controls on the tool holder carriage?" # 4
# query = "Tell me about the lever for selection of longitudinal feeds." # 5
# query = "What does the document talk about regarding digital displays?" # 6
# query = "What controls does the electric panel have?" # 7
# query = "How to achieve balance when lifting the Lathe?" # 8
query = "Tell me about using the machine for turning non-ferrous materials." # 9
# query = "What should a grounding conductor be used for?" # 10

In [80]:
# get common chunks from chunks retrived from both implementations

bert_top_k_idx = bert_query_pipeline(query)
bm25_top_k_idx = bm25_query_pipeline(query)
final_idx = list(set(list(bert_top_k_idx) + list(bm25_top_k_idx))) # union operation

staged_context = [chunks[idx] for idx in final_idx]
staged_context_4_bm25 = [chunks_4_bm25[idx] for idx in final_idx]
print(f"Number of staged chunks for context: {len(staged_context)}\n")

Number of staged chunks for context: 46



### 6. Embed staged context using BERT & get inverted indices of staged context using BM25

In [81]:
# for bert
sem_embs_final = model.encode(staged_context, convert_to_tensor = True)

# for bm25
tokenized_corpus_final = [doc.split() for doc in staged_context_4_bm25]
bm25_final = BM25Okapi(tokenized_corpus_final)

### 7. Function to get the final set of scores for the staged context chunks for both BERT & BM25

In [82]:
def bert_final_scores(query):
     
    device = sem_embs_final.device
    query_embedding = model.encode(query, convert_to_tensor = True)
    cosine_scores = util.cos_sim(query_embedding, sem_embs_final)[0]
    indices = np.argsort(cosine_scores.cpu().numpy())[::-1]

    return cosine_scores.cpu().numpy(), indices

In [83]:
bert_final_scores(query)

(array([0.39116088, 0.38352403, 0.29570445, 0.3804564 , 0.44452164,
        0.39220077, 0.39644188, 0.35995543, 0.4849117 , 0.48331106,
        0.4532263 , 0.32886553, 0.3223002 , 0.39964926, 0.5827457 ,
        0.4720347 , 0.4069566 , 0.40395635, 0.40832442, 0.35016885,
        0.45562816, 0.3177841 , 0.30462807, 0.4033295 , 0.34265903,
        0.41078186, 0.38350898, 0.3603382 , 0.40216637, 0.28225404,
        0.342164  , 0.35635427, 0.2311632 , 0.31217343, 0.18172249,
        0.42552388, 0.398474  , 0.44532353, 0.3845288 , 0.41005343,
        0.38993296, 0.43968284, 0.42096245, 0.38777182, 0.3788305 ,
        0.43159762], dtype=float32),
 array([14,  8,  9, 15, 20, 10, 37,  4, 41, 45, 35, 42, 25, 39, 18, 16, 17,
        23, 28, 13, 36,  6,  5,  0, 40, 43, 38,  1, 26,  3, 44, 27,  7, 31,
        19, 24, 30, 11, 12, 21, 33, 22,  2, 29, 32, 34], dtype=int64))

In [84]:
def bm25_query_pipeline(query):

    tokenized_query = query.split()
    bm25_scores = bm25_final.get_scores(tokenized_query)
    indices = np.argsort(bm25_scores)[::-1]

    return bm25_scores, indices

In [85]:
bm25_query_pipeline(query)

(array([1.22307922, 1.24360123, 1.3037595 , 0.        , 0.77988806,
        0.        , 1.30734236, 1.33057134, 1.59755848, 1.49572833,
        1.36654682, 1.48403631, 1.34745026, 0.63633981, 3.83236931,
        1.17929248, 0.79028268, 0.42283521, 1.16768041, 1.18993682,
        2.87768654, 3.39132246, 1.61653746, 1.55262162, 1.41615733,
        1.39500273, 1.17229768, 1.39336364, 3.07085375, 2.65813466,
        1.43206435, 1.33057134, 2.43938289, 1.46660481, 1.98953429,
        1.18676126, 1.3325649 , 0.90553576, 0.67204665, 1.10049604,
        1.22307922, 1.27341906, 0.6203612 , 0.9007311 , 2.36577387,
        0.76578666]),
 array([14, 21, 28, 20, 29, 32, 44, 34, 22,  8, 23,  9, 11, 33, 30, 24, 25,
        27, 10, 12, 36,  7, 31,  6,  2, 41,  1,  0, 40, 19, 35, 15, 26, 18,
        39, 37, 43, 16,  4, 45, 38, 13, 42, 17,  5,  3], dtype=int64))

### 8. Applying fusion scoring
**`α * xi + (1 - α) * yi`**<br><br>
...where `xi` is score of the ith chunk from the bert model and `yi` is score of the ith chunk from bm25.

In [86]:
# function to normalize the scores
def normalize_scores(scores):
    min_s = np.min(scores)
    max_s = np.max(scores)
    return (scores - min_s) / (max_s - min_s) if max_s > min_s else scores

# function to fuse the scores
def fused_scores(query, alpha = 0.2, top_l = 12):
    bm25_scores, bm25_indices = bm25_query_pipeline(query)
    bert_scores, bert_indices = bert_final_scores(query)
    
    # create arrays to hold scores aligned by document index
    num_docs = len(bm25_scores)  # should be same as bert_scores length
    bm25_aligned = np.zeros(num_docs)
    bert_aligned = np.zeros(num_docs)
    
    # align bm25 scores (indices are original document indices)
    for idx, score in zip(bm25_indices, bm25_scores):
        bm25_aligned[idx] = score

    # align BERT scores
    for idx, score in zip(bert_indices, bert_scores):
        bert_aligned[idx] = score

    # normalize
    bm25_norm = normalize_scores(bm25_aligned)
    bert_norm = normalize_scores(bert_aligned)

    # fuse
    fused = alpha * bm25_norm + (1 - alpha) * bert_norm

    # top-L indices by fused score
    top_indices = np.argsort(fused)[::-1][:top_l]

    return top_indices

best = fused_scores(query)
max_idx = max(best) # used for LLM later
len(best)

12

### 9. Get final context

In [87]:
final_context = ''
for idx in best:
    # remove unnecessary dots
    final_string = re.sub(r'\.{2,}', '.', chunks[idx])
    final_context += f"{final_string.strip()} (idx = {idx})\n\n"

final_context



### 10. Setting up LLM A that returns indexes of top-M chunks based on the query

In [88]:
def prompt_rank_contexts(query, final_context, max_idx, top_m = 8):
    # context_string = ""
    # max_idx = len(final_context)
    # for i, ctx in enumerate(final_context):
    #     context_string += f"Context {i+1}:\n{ctx.strip()}\n\n"

    return f"""
        You are a highly skilled AI assistant that ranks technical contexts from a machinery operations and maintenance manual.

        Your task is to rank the top-{top_m} most relevant contexts for answering the user’s question, based solely on the provided content.

        Each context ends with a tag in the format: **(idx = N)**, where N is an integer between 0 and {max_idx}. Do not guess or invent index values.

        **Instructions**:
        - Carefully read all the context snippets.
        - Identify the most relevant contexts that directly support answering the question.
        - Return exactly {top_m} `idx` values, in descending order of relevance (most relevant first).
        - Only include integers in the range 0 to {max_idx}.
        - Format your answer as a comma-separated list of integers with no extra text.

        **Example**: 12, 4, 7, 1, 0, 9, 2, 22

        User Question:
        {query}

        Candidate Contexts:
        {final_context}
    """

def llm_a(query, final_context): # mistral might not be the best but no option
    prompt = prompt_rank_contexts(query, final_context, max_idx)

    try:
        response = ollama.generate(
            model = 'mistral', 
            prompt = prompt,
            stream = False
        )
        output = response['response'].strip()

        # parse the returned string for integer idx values
        ranked_indexes = [int(idx.strip()) for idx in output.split(",") if idx.strip().isdigit()]
        return ranked_indexes[:8]

    except Exception as e:
        print("Error using Ollama (Python client):", str(e))
        return []

In [89]:
# get ids of top-M chunks
top_ids = llm_a(query, final_context)
print(f"IDs of top-M chunks are: {top_ids}")

IDs of top-M chunks are: [18, 35, 16, 4, 16, 26, 19, 11]


### 11. Get final context

In [90]:
final_context = ''
for idx in top_ids:
    final_context += chunks[idx]

In [91]:
# remove unnecessary dots
final_context_string = re.sub(r'\.{2,}', '.', final_context)
final_context_string

'TEXT DATA: MACHINES AND TEXT DATA: ACCESSORIES TEXT DATA: 4 DESCRIPTION OF THE MACHINE TEXT DATA: The  Gear head lathe (Art. T999/230V e T999/400V)  is a machine tool, with a horizontal axis, for the machining of metallic materials by means of cold chip removal. The cutting motion is given by the motion of the workpiece, rotating on its own axis, and the feed motion of the tool. The machine is completely  manually operated , as it can only execute movements under the direct control of the operator. TEXT DATA: 4.1 Intended use and field of application TEXT DATA: The machine is designed and built to perform the following operations on all types of ferrous metals: TEXT DATA: \uf0b7 Cylindrical turning; TEXT DATA: \uf0b7 Taper turning; TEXT DATA: \uf0b7 Facing; TEXT DATA: \uf0b7 ProfilingTABLE DATA: [[\'THE FOLLOWING IS STRICTLY PROHIBITED!\\nm\'], [\'Supplying the machine with voltage from the mains that is different from that shown on\\n\\uf0b7\\nthe identification plate (230V, 50 Hz).\

### 12. Setting up LLM B for response generation

In [92]:
def llm_b(prompt):
    client = Groq(
        api_key = os.getenv("GROQ_API_KEY"),
    )

    chat_completion = client.chat.completions.create(
        model = "llama-3.3-70b-versatile",
        # model = "llama3-70b-8192",
        # model = "mistral-saba-24b",
        messages = [
            {
                "role": "system",
                "content": "You are an expert technical assistant specialized in interpreting operations and maintenance manuals for machinery."
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        temperature = 0.5,
        max_tokens = 5640,
        top_p = 1,
        stream = True,
    )

    for chunk in chat_completion:
        content = chunk.choices[0].delta.content
        if content:
            print(content, end = '', flush = True)  # print to console without newline, flush immediately
            time.sleep(0.01)  # optional tiny delay for typewriter effect
    

def prompt(query, context):
    return f"""
        You are an expert technical assistant specialized in interpreting operations and maintenance manuals for machinery.

        Given the user question and the relevant extracted context from the manual:

        - Provide a clear, precise, and factual answer to the question.
        - Base your response strictly on the provided context; do not guess beyond it.
        - If the context does not contain enough information, indicate that the answer is not available in the manual or that the context is not sufficient.
        - Keep the answer professional, concise, and focused on practical instructions.
        - Each section of the context begins with a tag: either 'TEXT DATA' or 'TABLE DATA'.
        - 'TEXT DATA' represents plain, unstructured text. 'TABLE DATA' represents information extracted from a table and flattened into a list format.
        - The 'TABLE DATA' is structured as a list of rows, where each row is a list containing the column values in order. The format is as follows: [[column 1 value, column 2 value, ...], [column 1 value, column 2 value, ...], ...]
        - If available, provide references for the information.
        
        User Question:
        {query}

        Context from Manual:
        {context}
    """

### 13. Inference

In [93]:
prompt = prompt(query, final_context_string) # go to section number 5 to change query
print(f"QUERY: {query}\n")
print('RESPONSE:')
llm_b(prompt)

QUERY: Tell me about using the machine for turning non-ferrous materials.

RESPONSE:
The machine is designed and built to perform operations on all types of ferrous metals, including cylindrical turning, taper turning, facing, and profiling (TEXT DATA: 4.1 Intended use and field of application). 

Using the machine for turning non-ferrous materials is strictly prohibited, as it can result in serious danger to the safety of the staff and affect the functionality and intrinsic safety of the machine itself (TEXT DATA: 6 FORBIDDEN USES AND HAZARDS, TEXT DATA: Using the machine for turning non-ferrous materials, for unauthorised manoeuvres, its misuse and lack of maintenance can result in serious danger to the safety of the staff, especially the operator, as well as affecting the functionality and the intrinsic safety of the machine itself.). 

Therefore, it is not recommended to use the machine for turning non-ferrous materials. If you need to perform such operations, consider using a diff

### 12. Assessing the final context

In [None]:
print(final_context_string)

---

## **Overall Conclusion** 

**Fusion Retrieval-based RAG** seems to perform better compared to the other two, specifically the one where more wight is given to the keyword matching aspect. I assume this is because the queries require usage of certain domain specific keywords which when used means keyword matching works best. Moreover, the reason the semantic model was not able to perform better, contrary to my initial assumption, could be due to the model behind it (BERT) being trained over general data.<br><br>
**Reranking-based RAG** seems to perform worse but I highly assume this is due to 2 main reasons:
1. The mistral model that was used is a very weak model (8B) and it may not effectively be able to rank the context chunks based on the query. The reason local mistral 7B was used was due technical/financial reasons.
2. The incapability of LLMs in general to handle large context: When passing in K number of chunks into an LLM, along with the query and asking it to rank them is too much since all that context will take up a lot of space, and this space matters for the LLM. Even though some strong LLMs may handle more context, it is not efficient still. So, the only solution is to reduce the K chunks but then we lose a lot of contexts that may have been needed for getting the answer to the query.
3. Based on both of these points, the solution can be to use a state-of-the-art LLM that can handle a good size of context.<br><br>

**Hybrid RAG** performed worse compared to both. Even though a higher weight was given to the keyword scoring aspect of fusion retrieval, it did not perform well. I highly assume this is due to the LLM context issue.<br>

_Possible solutions:_
1. Incorporate images
2. **Query expansion**
3. Use stronger LLM
4. Fine tune LLM on domain specific data<br><br>

## _What did I learn?_

Mainly, learning about the variations of RAG, while also implementing them was something new to me. Now based on the task, I learned the following:
1.	Keyword Matching can be more important that semantics in domain-specific tasks.
2.	Semantic embedding models can struggle with domain specific documents.
3.	Model strength matters when employing LLM based reranking due to large context size.
4.	Hybrid model may not always be best (however, this requires testing by adopting the improvements).

