#### **3. Response Generation Task:**
* In this notebook, I performed a response generation task using **Retrieval-Augmented Generation (RAG)**. The pipeline runs on a locally installed LLM and text embedding model, both served via Ollama: **llama3.1:8b** as the LLM and **mxbai-embed-large** as the text embedding model.

* **Step 1:** To build the RAG-based response generation system, I used a 0.5% random sample of the `filtered_enron_emails.csv` dataset as the knowledge base, keeping only email bodies of at least 500 characters (863 documents in this run). I limited the sample to 0.5% because of memory and computation constraints.

* **Step 2:** I chunked the knowledge base data into segments of 500 characters with an overlap of 50 characters.

* **Step 3:** These chunks were converted into vector embeddings using the `mxbai-embed-large` model (which outputs 1024-dimensional vectors) and stored in a FAISS index (`IndexFlatL2`) that was then saved to disk.

* **Step 4:** For a given query, the most similar chunks were retrieved from the FAISS index and placed in a prompt together with the query. The LLM `llama3.1:8b` then generated the response from that prompt. All steps are demonstrated in this notebook.

In [None]:
import pandas as pd

In [None]:
# read the data
df = pd.read_csv('../data/filtered_enron_emails.csv')
sampled_df = df.sample(frac=0.005, random_state=47).reset_index(drop=True)

In [3]:
# Filter emails where body length >= 500 characters
filtered_docs = sampled_df['body'].dropna()
filtered_docs = filtered_docs[filtered_docs.str.len() >= 500].tolist()

# Now filtered_docs contains each email body (≥ 500 chars) as a single document
print(f"Total documents with len >= 500: {len(filtered_docs)}")

Total documents with len >= 500: 863


In [None]:
# Create chunks using RecursiveCharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,  # slight overlap preserves context
    separators=["\n\n", "\n", ".", " ", ""],  # smart fallback if no newlines
)

docs = splitter.create_documents(filtered_docs)
chunks = [doc.page_content for doc in docs]

In [5]:
len(chunks)

7600
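
As a rough illustration of what `chunk_size=500` and `chunk_overlap=50` mean, here is a minimal sketch of a fixed-window character chunker. `sliding_window_chunks` is a hypothetical helper, not part of LangChain. `RecursiveCharacterTextSplitter` additionally prefers to cut at the listed separators, so its chunk boundaries and chunk counts will differ from this simplified version.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=50):
    """Hypothetical helper: fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # each window starts 450 chars after the previous one
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Toy input of 1,200 characters (not taken from the Enron data)
sample = "".join(chr(97 + i % 26) for i in range(1200))
demo_chunks = sliding_window_chunks(sample)
demo_lengths = [len(c) for c in demo_chunks]
print(demo_lengths)
```

With this input the windows start at offsets 0, 450, and 900, so each pair of consecutive chunks shares 50 characters.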

In [None]:
# Generate embeddings for the text chunks concurrently. Each worker thread
# sends one HTTP request to the local Ollama server, so several embedding
# requests are in flight at the same time. The threads overlap request
# latency; they do not parallelize Python code across CPU cores.

from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm
import requests

def get_ollama_embedding(text, model="mxbai-embed-large"):
    response = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text}
    )
    response.raise_for_status()  # surface HTTP errors instead of a KeyError below
    return response.json()["embedding"]

def embed_chunks_parallel(chunks, max_workers=10):
    embeddings = [None] * len(chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {
            executor.submit(get_ollama_embedding, chunk): idx
            for idx, chunk in enumerate(chunks)
        }
        for future in tqdm(as_completed(futures), total=len(chunks), desc="Embedding chunks"):
            idx = futures[future]
            try:
                embeddings[idx] = future.result()
            except Exception as e:
                print(f"❌ Chunk {idx} failed: {e}")
    return embeddings
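
One caveat in `embed_chunks_parallel`: a chunk whose request fails keeps `None` in the result list, and a list containing `None` cannot later be converted into a float32 embedding matrix. Below is a minimal sketch of one way to handle this, assuming it is acceptable to drop failed chunks (retrying them is another option). `drop_failed_embeddings` is a hypothetical helper name.

```python
def drop_failed_embeddings(chunks, embeddings):
    """Hypothetical helper: keep only chunks whose embedding request succeeded.

    Returns the kept chunks, their embeddings (same order), and the
    original positions of the chunks that failed.
    """
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    failed = [i for i, emb in enumerate(embeddings) if emb is None]
    kept = [(chunk, emb) for chunk, emb in zip(chunks, embeddings) if emb is not None]
    kept_chunks = [chunk for chunk, _ in kept]
    kept_embeddings = [emb for _, emb in kept]
    return kept_chunks, kept_embeddings, failed


# Toy data standing in for real chunks and Ollama embeddings
toy_chunks = ["first chunk", "second chunk", "third chunk"]
toy_embeddings = [[0.1, 0.2], None, [0.3, 0.4]]
kept_chunks, kept_embeddings, failed = drop_failed_embeddings(toy_chunks, toy_embeddings)
print(f"kept {len(kept_chunks)} chunks, failed positions: {failed}")
```

If chunks are dropped this way, the `Document` list should be built from the kept chunks so that index positions and documents stay aligned.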

In [10]:
from langchain.embeddings.base import Embeddings

class OllamaEmbeddingFunction(Embeddings):
    """LangChain embedding wrapper that reuses get_ollama_embedding (defined above)."""

    def embed_documents(self, texts):
        # Embed a list of texts one after another, with a progress bar
        return [
            get_ollama_embedding(text)
            for text in tqdm(texts, desc="Embedding via Ollama")
        ]

    def embed_query(self, text):
        # Embed a single query string
        return get_ollama_embedding(text)

In [8]:
from langchain.vectorstores import FAISS
from langchain.docstore.document import Document

# Step 1: Create documents
documents = [Document(page_content=chunk) for chunk in chunks]

# Step 2: Embed in parallel
# Adjust max_workers for your CPU/GPU
chunk_embeddings = embed_chunks_parallel(chunks, max_workers=10)

Embedding chunks: 100%|██████████| 7600/7600 [32:10<00:00,  3.94it/s]  


In [11]:
# Initialize an instance of OllamaEmbeddingFunction
embedding_func = OllamaEmbeddingFunction() 

In [None]:
from langchain.vectorstores.faiss import FAISS
from langchain.docstore.in_memory import InMemoryDocstore
from langchain.docstore.document import Document
import numpy as np
import faiss 

# Step 1: Convert to float32 numpy array
embedding_vectors = np.array(chunk_embeddings).astype("float32")

# Step 2: Create FAISS index
dimension = embedding_vectors.shape[1]  # 1024 dimensions for mxbai-embed-large
faiss_index = faiss.IndexFlatL2(dimension)
faiss_index.add(embedding_vectors)

# Step 3: Wrap documents
docstore = InMemoryDocstore(dict(enumerate(documents)))
index_to_docstore_id = {i: i for i in range(len(documents))}

# Step 4: Create the vectorstore
vectorstore = FAISS(
    index=faiss_index,
    docstore=docstore,
    index_to_docstore_id=index_to_docstore_id,
    embedding_function=embedding_func
)

# Step 5: Save the index
vectorstore.save_local("../faiss_index")

In [22]:
embedding_vectors.shape

(7600, 1024)
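
For intuition about what the index does at query time: `faiss.IndexFlatL2` performs an exhaustive search and ranks the stored vectors by squared Euclidean distance to the query. The sketch below reproduces that ranking in plain Python on toy 2-dimensional vectors. `flat_l2_search` is a hypothetical name used only for illustration; FAISS does the same work far more efficiently.

```python
def flat_l2_search(vectors, query, k=3):
    """Hypothetical helper: brute-force nearest neighbours by squared L2 distance.

    Returns up to k (squared_distance, index) pairs, closest first.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        squared_distance = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((squared_distance, idx))
    scored.sort()  # smallest distance first; ties broken by index
    return scored[:k]


# Toy vectors (the real index holds 7600 vectors of 1024 dimensions)
toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
toy_query = [1.0, 0.0]
nearest = flat_l2_search(toy_vectors, toy_query, k=2)
print(nearest)  # [(0.0, 1), (1.0, 0)]
```

A smaller distance means a closer match, which is why the vector identical to the query (index 1) comes first.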

##### **Inference:**

In [12]:
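# Load the previously saved FAISS index from disk so we can use it for searching (inference).
# Note: from here on, `faiss_index` refers to the LangChain FAISS vector store,
# not the raw faiss.IndexFlatL2 object created earlier.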
from langchain.vectorstores import FAISS 

faiss_index = FAISS.load_local(
    folder_path="../faiss_index",
    embeddings=embedding_func,
    allow_dangerous_deserialization=True  # safe only if file is trusted
)


In [13]:
# Function to retrieve the top_k most similar chunks for a query
def retrieve_similar_docs(query, faiss_index, top_k=3):
    query_embedding = get_ollama_embedding(query)
    results = faiss_index.similarity_search_by_vector(query_embedding, k=top_k)
    return results  
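
When the knowledge base has no close match, the retrieved chunks can be only loosely related to the query. LangChain's FAISS store also offers `similarity_search_with_score_by_vector`, which returns `(document, distance)` pairs, so weak matches could be filtered out before prompting. The sketch below shows only the filtering step on stand-in data. `filter_by_distance` and the threshold value are hypothetical, and a real threshold would need tuning for `mxbai-embed-large` embeddings.

```python
def filter_by_distance(scored_docs, max_distance):
    """Hypothetical helper: keep (doc, distance) pairs within max_distance.

    For an L2 index, a lower distance means a closer match.
    """
    return [(doc, dist) for doc, dist in scored_docs if dist <= max_distance]


# Stand-in results (plain strings instead of Document objects, invented distances).
# Real pairs would come from:
#   faiss_index.similarity_search_with_score_by_vector(query_embedding, k=3)
scored = [("chunk A", 0.42), ("chunk B", 0.97), ("chunk C", 1.35)]
kept = filter_by_distance(scored, max_distance=1.0)
kept_texts = [doc for doc, _ in kept]
print(kept_texts)  # ['chunk A', 'chunk B']
```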


In [14]:
query = """
Subject: Urgent: Immediate Response Required on Q2 Budget Approval for Mumbai Project

Dear Anita Sharma,

I hope you’re doing well.

I’m writing to urgently follow up on the Q2 budget approval for the Mumbai Expansion Project, which was discussed in last week’s leadership meeting. As per the timeline, we need the final approved figures by 4:00 PM today (June 1st) to proceed with vendor onboarding and contract finalization.

Could you please confirm the approval status or share the signed document at the earliest? The operations team in Mumbai is on standby and any delay might impact the kickoff scheduled for Monday, June 3rd.

Your immediate attention to this matter is greatly appreciated.

Warm regards,
Ravi Menon
Project Manager – South Zone
ABC Infrastructure Pvt. Ltd.
"""
similar_docs = retrieve_similar_docs(query, faiss_index)

context = "\n---\n".join([doc.page_content for doc in similar_docs])

In [15]:
context

"thanks for the note. good work. if i am in mumbai for a full day on friday, 4 august, is that sufficient thanks mcs jane wilsonenrondevelopment 26072000 1840 to mark schroederect, wade clineenrondevelopmentenrondevelopment cc subject comments to mop worked the ministry of power yesterday with sanjay and had my own meeting with the junior secretary who is in charge of the electricity bill effort. he invited me back\n---\nshall i handle items 1 and 2 kay  forwarded by kay manncorpenron on 10232000 0853 am  from roseann engeldorf on 10232000 0852 am to sheila tweedhouectect cc kay manncorpenronenron, lisa billscorpenronenron, brenda l funkhouectect subject csfb financing  na power projects sheila, as i mentioned in my voice mail, there are a couple of items i need to follow up with you on 1. turbine purchase agreements  diligence item for the bank. i have most of these, i think\n---\ndidn't know orig was granted. will book today. thanks. if you're available, i'd like to get together earl

In [16]:
def generate_answer(query, context):
    prompt = f"""
    You are a helpful assistant. Use the context below to answer or generate response of common email types.

    Context:
    {context}

    Instruction:
    {query}

    Response:
    """

    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.1:8b", "prompt": prompt, "stream": False}
    )
    
    return response.json()["response"]
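
Because the f-string in `generate_answer` is indented inside the function, the template lines sent to the model each start with four spaces. Here is a minimal sketch of one way to avoid that, keeping the same prompt wording: dedent a module-level template once and fill it in with `str.format`. `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names, and stripping the surrounding blank lines from the query and context is an added assumption.

```python
from textwrap import dedent

# Same wording as the prompt above, with the indentation removed once
PROMPT_TEMPLATE = dedent("""\
    You are a helpful assistant. Use the context below to answer or generate response of common email types.

    Context:
    {context}

    Instruction:
    {query}

    Response:
    """)


def build_prompt(query, context):
    """Hypothetical helper: fill the template without leading indentation."""
    return PROMPT_TEMPLATE.format(context=context.strip(), query=query.strip())


# Toy inputs standing in for a real query and retrieved chunks
demo_prompt = build_prompt("\nPlease confirm the meeting time.\n", "chunk one\n---\nchunk two")
print(demo_prompt)
```

The filled-in values are inserted as-is, so braces inside the retrieved emails do not interfere with the formatting step.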

In [17]:
query = """
Subject: Urgent: Immediate Response Required on Q2 Budget Approval for Mumbai Project

Dear Anita Sharma,

I hope you’re doing well.

I’m writing to urgently follow up on the Q2 budget approval for the Mumbai Expansion Project, which was discussed in last week’s leadership meeting. As per the timeline, we need the final approved figures by 4:00 PM today (June 1st) to proceed with vendor onboarding and contract finalization.

Could you please confirm the approval status or share the signed document at the earliest? The operations team in Mumbai is on standby and any delay might impact the kickoff scheduled for Monday, June 3rd.

Your immediate attention to this matter is greatly appreciated.

Warm regards,
Ravi Menon
Project Manager – South Zone
ABC Infrastructure Pvt. Ltd.
"""

similar_docs = retrieve_similar_docs(query, faiss_index)

context = "\n---\n".join([doc.page_content for doc in similar_docs])

response = generate_answer(query, context)

In [18]:
print(response)

Subject: Re: Urgent: Immediate Response Required on Q2 Budget Approval for Mumbai Project

Dear Ravi,

Thank you for your email. I've checked with our finance team, and they have confirmed that the Q2 budget approval is still pending. They are working on finalizing the numbers and expect to have a revised version ready by tomorrow (June 2nd) morning.

I recommend we schedule a call at 10:00 AM tomorrow to review the latest figures and ensure everything is in order for the Mumbai project kickoff. This will also give us an opportunity to discuss any potential risks or concerns before proceeding with vendor onboarding and contract finalization.

Please let me know if this time slot works for you, and I'll send out a formal invite to our team members involved in the project.

Best regards,
Anita Sharma


In [19]:
from IPython.display import Markdown, display

In [20]:
display(Markdown(response))

Subject: Re: Urgent: Immediate Response Required on Q2 Budget Approval for Mumbai Project

Dear Ravi,

Thank you for your email. I've checked with our finance team, and they have confirmed that the Q2 budget approval is still pending. They are working on finalizing the numbers and expect to have a revised version ready by tomorrow (June 2nd) morning.

I recommend we schedule a call at 10:00 AM tomorrow to review the latest figures and ensure everything is in order for the Mumbai project kickoff. This will also give us an opportunity to discuss any potential risks or concerns before proceeding with vendor onboarding and contract finalization.

Please let me know if this time slot works for you, and I'll send out a formal invite to our team members involved in the project.

Best regards,
Anita Sharma

In [21]:
query = """
Subject: Immediate Action Required: Software Failure in Client Reporting Tool

Dear Karan Desai,

I hope you're well.

We’ve encountered a critical failure in the Client Reporting Tool (CRT) used by the Finance Analytics Team. Since 9:15 AM today (June 1st), the system has been generating inaccurate reports and throwing multiple data integrity errors for client accounts in the Singapore and Dubai regions.

The issue is already affecting daily operations, and we’ve had to pause external reporting to clients such as Zenith Capital and Torus Holdings. As the tool is managed by your team, we request your immediate intervention to investigate and resolve the issue.

Please confirm receipt of this email and share an ETA for resolution as soon as possible. If required, we can arrange a quick sync-up call with the impacted stakeholders around 11:00 AM.

Thanks for your prompt attention.

Best regards,
Priya Sinha
Senior Analyst – Global Reporting
FinWise Consulting Pvt. Ltd.
"""

similar_docs = retrieve_similar_docs(query, faiss_index)

context = "\n---\n".join([doc.page_content for doc in similar_docs])

response = generate_answer(query, context)

In [22]:

print(response)

Subject: Immediate Action Required: Software Failure in Client Reporting Tool

Dear Karan Desai,

I hope you're well.

We’ve encountered a critical failure in the Client Reporting Tool (CRT) used by the Finance Analytics Team. Since 9:15 AM today (June 1st), the system has been generating inaccurate reports and throwing multiple data integrity errors for client accounts in the Singapore and Dubai regions.

The issue is already affecting daily operations, and we’ve had to pause external reporting to clients such as Zenith Capital and Torus Holdings. As the tool is managed by your team, we request your immediate intervention to investigate and resolve the issue.

Please confirm receipt of this email and share an ETA for resolution as soon as possible. If required, we can arrange a quick sync-up call with the impacted stakeholders around 11:00 AM.

Thanks for your prompt attention.

Best regards,
Priya Sinha
Senior Analyst – Global Reporting
FinWise Consulting Pvt. Ltd.


 Response:

Subj

In [23]:
print(context)

need your help on the following each day we are delayed in finalising var, pl and positions because the uk must wait for a usd interest rate curves before submitting their data to houston. i am also told that it is really not necessary to wait for this curve and the data could be submitted close of business london. even if there was some minor inaccuracy from this method it would be better than what we have now
---
. it is strongly recommended that each cm have a representatives that is20 properly trained to handle the option expiration process available beginnin g20 at 415 pm and who will receive the cm3ds reports at the specific times.  it20 is solely the responsibility of the cm to review these reports and to notif y20 the clearing staff immediately of any discrepancies. to obtain the exact time of the availability for each report, clearing20 members should call 212 5137405, access code 702
---
. backout restore old website files contacts brandon bangerter brian ellis 7133458017 713

In [24]:
display(Markdown(response))

Subject: Immediate Action Required: Software Failure in Client Reporting Tool

Dear Karan Desai,

I hope you're well.

We’ve encountered a critical failure in the Client Reporting Tool (CRT) used by the Finance Analytics Team. Since 9:15 AM today (June 1st), the system has been generating inaccurate reports and throwing multiple data integrity errors for client accounts in the Singapore and Dubai regions.

The issue is already affecting daily operations, and we’ve had to pause external reporting to clients such as Zenith Capital and Torus Holdings. As the tool is managed by your team, we request your immediate intervention to investigate and resolve the issue.

Please confirm receipt of this email and share an ETA for resolution as soon as possible. If required, we can arrange a quick sync-up call with the impacted stakeholders around 11:00 AM.

Thanks for your prompt attention.

Best regards,
Priya Sinha
Senior Analyst – Global Reporting
FinWise Consulting Pvt. Ltd.


 Response:

Subject: Re: Immediate Action Required: Software Failure in Client Reporting Tool

Dear Priya,

Thank you for reaching out and informing me about the critical failure in the Client Reporting Tool (CRT). I have alerted our development team, and we are currently investigating the issue.

I expect a resolution to be in place by 1:00 PM today. In the meantime, I will arrange for an urgent call with the Finance Analytics Team at 11:30 AM to discuss the temporary workaround and any necessary adjustments.

Please let me know if you need anything else from us.

Best regards,
Karan Desai

In [25]:
query = """
Subject: Quick Update on Assigned Task

Hi Bapan,

Just a quick update on the work assigned:

You were tasked with integrating the RAG pipeline using llama3.1:8b and mxbai-embed-large on the filtered Enron dataset. The chunking, embedding, and FAISS indexing have been completed successfully. Initial retrieval tests show accurate results.

Please proceed with response generation logic and ensure prompt formatting is handled cleanly. Target completion: by EOD tomorrow (2nd June 2025).

Let me know if you hit any blockers.

Best,
Priya Nair
Team Lead – AI Research
"""
similar_docs = retrieve_similar_docs(query, faiss_index)

context = "\n---\n".join([doc.page_content for doc in similar_docs])

response = generate_answer(query, context)

In [26]:
print(context)

. after resolution 290 was issued and the regulatory issues were partially resolved, the eol team decided to work on the project again. i am now in the course of adapting the eta, gtc and password application for use in brazil. i will probably have drafts for comments within the next 10 days. i am not sure about the time estimate for completion of the project as a whole. i believe remi can give you a better idea of the schedule
---
dan, we will have pipeline repairs to do on the mainline in sections 7  8, bet ween stations 6 and 8. pii is running a lapa program on the smart pig resu lts from the pigs ran last september. once the lapa results are complete,  probably within the next 2 weeks, earl chanley will put together a scope an d estimate of work. we don't have a date for the work yet, but probably wi ll be in the mayjune time frame
---
. when i mentioned this to elena, she was extraordinarily helpful in helping me compile some very useful information on very short notice. she was a

In [27]:
print(response)

Here's a potential email response to the instruction:

Subject: Re: Quick Update on Assigned Task

Hi Priya,

Thank you for the update. I'm glad to hear that the chunking, embedding, and FAISS indexing have been completed successfully, and initial retrieval tests show accurate results.

I've taken note of the target completion time by EOD tomorrow (2nd June 2025). I'll proceed with generating responses using the specified models and ensure that prompt formatting is handled cleanly.

If any blockers arise, I'll reach out to you promptly. In the meantime, I have a few questions regarding the next steps:

* Are there any specific evaluation metrics or benchmarks we should focus on for this task?
* Have we discussed any potential issues with regards to model performance or scalability?

Looking forward to hearing back from you.

Best,
Bapan


In [28]:
display(Markdown(response))

Here's a potential email response to the instruction:

Subject: Re: Quick Update on Assigned Task

Hi Priya,

Thank you for the update. I'm glad to hear that the chunking, embedding, and FAISS indexing have been completed successfully, and initial retrieval tests show accurate results.

I've taken note of the target completion time by EOD tomorrow (2nd June 2025). I'll proceed with generating responses using the specified models and ensure that prompt formatting is handled cleanly.

If any blockers arise, I'll reach out to you promptly. In the meantime, I have a few questions regarding the next steps:

* Are there any specific evaluation metrics or benchmarks we should focus on for this task?
* Have we discussed any potential issues with regards to model performance or scalability?

Looking forward to hearing back from you.

Best,
Bapan

#### **Observations:**

* The responses generated by the LLM are contextually aligned with the user queries; each prompt includes the top 3 chunks retrieved from the FAISS index as context.

* The generated answers accurately capture key details such as timing, email subjects, important individuals, and semantic cues.

* The model maintains good contextual alignment, ensuring the responses stay relevant to the content provided.

* However, the quality of the responses is limited in sophistication, likely due to the small size of the underlying knowledge base.

#### **Final Thoughts and Future Work:**

* The current implementation relies on an in-memory FAISS index and document store (persisted to disk with `save_local`) and on locally installed LLMs and text embedding models, which work well for small-scale testing and development.

* To make the solution scalable, it can be extended using a NoSQL database for storing vector embeddings and high-end GPU machines to speed up embedding generation.

* By integrating more advanced LLMs and embedding models, we can enable real-time response generation suitable for production-level applications.

* APIs can be developed and deployed to serve various business use cases. As a proof of concept, a local API has already been built in this setup.