Semantic Retrieval System for Support Ticket Intelligence

📌 Project Summary

Built a semantic retrieval system that converts support tickets into embeddings and retrieves the most relevant historical tickets using cosine similarity. The system supports ambiguity detection and serves as a grounding layer for RAG-based response generation.
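
As a minimal illustration of the ranking step, the sketch below computes cosine similarity by hand on made-up 3-dimensional vectors. The vectors and ticket names are assumptions for illustration only; the notebook itself uses 384-dimensional embeddings and scikit-learn's `cosine_similarity`.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy "embeddings" (invented for this example).
query = [1.0, 2.0, 0.0]
tickets = {
    "late delivery": [2.0, 4.0, 0.0],  # same direction as the query
    "login problem": [0.0, 0.0, 3.0],  # orthogonal to the query
    "refund delay": [1.0, 0.0, 0.0],   # partially aligned with the query
}

# Rank ticket names from most to least similar to the query.
ranked = sorted(tickets, key=lambda name: cosine(query, tickets[name]), reverse=True)
print(ranked)
```

Cosine similarity depends only on direction, so the "late delivery" vector scores highest even though it is twice as long as the query.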

📌 Why This Matters

Keyword search fails on paraphrases

Semantic embeddings capture intent, not words

Retrieval enables explainable, evidence-backed responses

📌 Limitations & Safeguards

Low-signal queries handled via confidence thresholds

Ambiguous matches flagged instead of forced decisions
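
One way these two safeguards could be combined is sketched below. The function name `triage`, the `margin` parameter, and both default values are assumptions chosen for illustration, not tuned settings from this project.

```python
def triage(matches, min_score=0.55, margin=0.05):
    """Classify retrieval results as 'low_signal', 'ambiguous', or 'confident'.

    matches: list of (category, similarity) pairs sorted by similarity, descending.
    min_score and margin are illustrative values, not tuned ones.
    """
    # Low-signal query: nothing retrieved, or even the best match is weak.
    if not matches or matches[0][1] < min_score:
        return "low_signal"
    top_category, top_score = matches[0]
    # Best score achieved by any *other* category, if one was retrieved.
    rival = max((s for c, s in matches if c != top_category), default=None)
    # Ambiguous: a competing category scores almost as high as the winner.
    if rival is not None and top_score - rival < margin:
        return "ambiguous"
    return "confident"

print(triage([("Return Issue", 0.68), ("Order Issue", 0.66)]))
```

Flagging the "ambiguous" case separately keeps near-ties between categories visible to a human reviewer rather than silently picking the top hit.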

In [None]:
%pip install sentence-transformers scikit-learn pandas numpy openpyxl




In [1]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from transformers import pipeline



In [2]:
from google.colab import files

# Upload gpt.xlsx from the local machine (Colab only), then load it into a DataFrame.
uploaded = files.upload()
df = pd.read_excel("gpt.xlsx")

Saving gpt.xlsx to gpt.xlsx


In [3]:
df.head()

Unnamed: 0,id,message_text,category,route_to_team,priority_level
0,1,I ordered two phones by mistake. I only need o...,Order Issue,Order Support Team,Medium
1,2,I ordered black shoes but received pink ones. ...,Order Issue,Order Support Team,High
2,3,The speaker arrived broken right out of the bo...,Order Issue,Order Support Team,High
3,4,I ordered 3 t-shirts but only received 2. The ...,Order Issue,Order Support Team,Medium
4,5,You sent the wrong size. I ordered L but recei...,Order Issue,Order Support Team,Medium


In [4]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")



In [5]:
texts = df["message_text"].tolist()
embeddings = model.encode(texts)

In [6]:
print(embeddings.shape)


(51, 384)


In [7]:
df["embedding"] = list(embeddings)
df


Unnamed: 0,id,message_text,category,route_to_team,priority_level,embedding
0,1,I ordered two phones by mistake. I only need o...,Order Issue,Order Support Team,Medium,"[-0.07894631, 0.0073990636, 0.116433665, -0.00..."
1,2,I ordered black shoes but received pink ones. ...,Order Issue,Order Support Team,High,"[-0.061591845, 0.038773794, 0.048442204, 0.004..."
2,3,The speaker arrived broken right out of the bo...,Order Issue,Order Support Team,High,"[0.0018820579, -0.029098466, 0.042026546, -0.0..."
3,4,I ordered 3 t-shirts but only received 2. The ...,Order Issue,Order Support Team,Medium,"[-0.0041140923, 0.02332352, -0.00045263272, -0..."
4,5,You sent the wrong size. I ordered L but recei...,Order Issue,Order Support Team,Medium,"[0.034329295, 0.03363292, 0.105056904, 0.01049..."
5,6,Duplicate order placed and money deducted twic...,Order Issue,Order Support Team,High,"[-0.071353376, -0.0071729324, 0.11344722, -0.0..."
6,7,One side of my headphones has no sound. Defect...,Order Issue,Order Support Team,Medium,"[0.04733394, -0.06322714, -0.027776198, 0.0020..."
7,8,I ordered iPhone 15 Pro Max and you sent iPhon...,Order Issue,Order Support Team,High,"[-0.034224495, 0.06300113, 0.09387579, -0.0508..."
8,9,The kurta color is completely different than s...,Order Issue,Order Support Team,Medium,"[0.026768658, 0.08044653, 0.005431559, 0.01435..."
9,10,The charger is missing from the box. Send it o...,Order Issue,Order Support Team,Medium,"[0.0057803914, 0.040320385, 0.04002239, 0.0445..."


In [8]:
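def semantic_search(query, top_k=20):
    """Return the top_k tickets most similar to `query`, using the global `df` and `model`."""
    # Encode the query as a (1, 384) array so it can be compared against all tickets at once.
    query_embedding = model.encode([query])
    # Stack the per-row embedding arrays into a single (n_tickets, 384) matrix.
    doc_embeddings = np.vstack(df["embedding"].values)
    similarities = cosine_similarity(query_embedding, doc_embeddings)[0]
    # Work on a copy so the similarity column does not leak into the global DataFrame.
    df_results = df.copy()
    df_results["similarity"] = similarities
    return df_results.sort_values("similarity", ascending=False).head(top_k)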

In [9]:
query = "My order came very late again"
results = semantic_search(query)
results[["message_text", "category", "similarity"]]

Unnamed: 0,message_text,category,similarity
18,Everyone else in my area received their parcel...,Delivery Delay,0.527647
17,"Order placed on November 25, still in processi...",Delivery Delay,0.512003
10,It has been 10 days and the order hasn’t even ...,Delivery Delay,0.503599
47,Order confirmation emails for items I didn’t buy.,Fraud/Security,0.48478
41,Someone used my account to place an order I di...,Fraud/Security,0.47923
5,Duplicate order placed and money deducted twic...,Order Issue,0.475638
3,I ordered 3 t-shirts but only received 2. The ...,Order Issue,0.463109
42,Money deducted but no order ID. Where did my m...,Fraud/Security,0.439944
40,I got a call asking for OTP for order confirma...,Fraud/Security,0.436801
13,Expected delivery date keeps changing every da...,Delivery Delay,0.415527


In [10]:
def routing_decision(query, df):
    """Route a ticket only when every retrieved neighbour has the same category."""
    # Note: semantic_search reads the global `df`; the `df` argument is not used here.
    results = semantic_search(query)
    categories = results["category"].tolist()
    unique_categories = set(categories)
    if len(unique_categories) == 1:
        decision = f"Route to {categories[0]} team"
        confidence = "High"
    else:
        decision = "Escalate to human (ambiguous issue)"
        confidence = "Low"
    return {
        "incoming_ticket": query,
        "similar_tickets": results[["message_text", "category", "similarity"]].to_dict(orient="records"),
        "final_decision": decision,
        "confidence": confidence,
    }
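
Because `routing_decision` requires all retrieved tickets to share one category, a single off-category neighbour among the default 20 is enough to trigger escalation. A softer alternative, shown here only as a sketch and not as the notebook's method, is a similarity-weighted vote. The name `weighted_vote` and the `min_share` value are assumptions.

```python
from collections import defaultdict

def weighted_vote(matches, min_share=0.6):
    """Pick a category by similarity-weighted vote over the retrieved tickets.

    matches: list of (category, similarity) pairs.
    Returns (category, share) when one category holds at least min_share of
    the total similarity mass, otherwise (None, share) to signal escalation.
    """
    totals = defaultdict(float)
    for category, similarity in matches:
        totals[category] += max(similarity, 0.0)  # ignore negative similarities
    total = sum(totals.values())
    if total == 0:
        return None, 0.0
    best = max(totals, key=totals.get)
    share = totals[best] / total
    return (best if share >= min_share else None), share

# First five matches from the refund query, with similarities rounded.
refund_matches = [
    ("Return Issue", 0.68), ("Return Issue", 0.67), ("Order Issue", 0.61),
    ("Order Issue", 0.59), ("Return Issue", 0.56),
]
print(weighted_vote(refund_matches))
```

With these five matches, "Return Issue" holds roughly 61% of the similarity mass, so the vote routes the ticket where the unanimity rule would escalate it. Whether 0.6 is an appropriate cut-off would need to be checked against labelled tickets.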

In [11]:
output = routing_decision(
    "I cancelled my order but still no refund",
    df
)
output

{'incoming_ticket': 'I cancelled my order but still no refund',
 'similar_tickets': [{'message_text': 'Refund initiated on Dec 3 but still not processed.',
   'category': 'Return Issue',
   'similarity': 0.6831530332565308},
  {'message_text': 'It’s been 15 days since cancellation but no refund yet.',
   'category': 'Return Issue',
   'similarity': 0.6694768667221069},
  {'message_text': 'Duplicate order placed and money deducted twice. Refund the extra payment immediately.',
   'category': 'Order Issue',
   'similarity': 0.6085870265960693},
  {'message_text': 'I ordered two phones by mistake. I only need one. Please cancel the extra and issue a refund quickly.',
   'category': 'Order Issue',
   'similarity': 0.5929496884346008},
  {'message_text': 'I requested wallet refund but it was sent to bank instead. Fix this.',
   'category': 'Return Issue',
   'similarity': 0.564177393913269},
  {'message_text': 'Money deducted but no order ID. Where did my money go?',
   'category': 'Fraud/S

In [12]:
output = routing_decision(
    "Something feels wrong but I don't know what",
    df
)

In [13]:
output

{'incoming_ticket': "Something feels wrong but I don't know what",
 'similar_tickets': [{'message_text': 'It has been 10 days and the order hasn’t even shipped. What’s going on?',
   'category': 'Delivery Delay',
   'similarity': 0.267078161239624},
  {'message_text': 'I ordered black shoes but received pink ones. This was meant for a birthday. Fix this immediately.',
   'category': 'Order Issue',
   'similarity': 0.23046596348285675},
  {'message_text': nan, 'category': nan, 'similarity': 0.22569313645362854},
  {'message_text': 'Payment keeps failing even though my bank says everything is fine.',
   'category': 'Technical Issue',
   'similarity': 0.20873941481113434},
  {'message_text': 'One side of my headphones has no sound. Defective product.',
   'category': 'Order Issue',
   'similarity': 0.20610231161117554},
  {'message_text': 'The parcel has been stuck at the hub for 8 days with no movement.',
   'category': 'Delivery Delay',
   'similarity': 0.18254554271697998},
  {'message

Grounded Support Response System using Retrieval-Augmented Generation

In [14]:
TOP_K = 5
SIMILARITY_THRESHOLD = 0.55
MIN_SUPPORTING_DOCS = 2

In [15]:
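def retrieve_evidence(query, df):
    """Return the TOP_K nearest tickets whose similarity clears SIMILARITY_THRESHOLD."""
    # Note: semantic_search reads the global `df`; the `df` argument is not used here.
    results = semantic_search(query, top_k=TOP_K)

    # Filter by similarity threshold
    evidence = results[results["similarity"] >= SIMILARITY_THRESHOLD]

    return evidence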

In [16]:
def is_answer_allowed(evidence):
    """Allow an answer only when enough supporting tickets were retrieved."""
    return len(evidence) >= MIN_SUPPORTING_DOCS

In [17]:
def build_context(evidence):
    print("DEBUG: rows received in build_context =", len(evidence))
    print(evidence[['message_text', 'category', 'similarity']])
    context_blocks = []

    for _, row in evidence.iterrows():
        block = f"""
Ticket:
- Text: {row['message_text']}
- Category: {row['category']}
"""
        context_blocks.append(block)

    return "\n".join(context_blocks)


In [18]:
#   RULES (STRICT):
# - Answer ONLY using the information in the context.
# - If the context does NOT contain enough information, say: "No relevant past cases found."
# - Do NOT add new facts.
# - Do NOT guess.
# - Keep response under 80 words.
# - You MAY suggest generic next steps (e.g., contact support)
# - Do NOT invent timelines or guarantees

In [19]:
def build_prompt(user_query, context):
    return f"""
You are a customer support assistant.

RULES (STRICT):
- Base factual statements ONLY on the provided context.
- Do NOT invent timelines, policies, or guarantees.
- Do NOT claim actions were taken.
- You MAY suggest generic next steps that do not require new facts
  (e.g., contacting support, checking order status).
- If the context is insufficient, say so clearly.
- Keep response under 80 words.

CONTEXT:
{context}

USER QUESTION:
{user_query}

Give a short, direct answer.
"""

In [20]:
from transformers import pipeline

llm = pipeline(
    "text2text-generation",
    model="google/flan-t5-small",
    max_new_tokens=120
)


Device set to use cpu


In [21]:
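def call_llm(prompt):
    response = llm(prompt)[0]["generated_text"]
    # A text2text-generation pipeline returns only the generated text, so this
    # split is a no-op unless the model itself emits an "ANSWER:" marker.
    return response.split("ANSWER:")[-1].strip()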


In [22]:
def rag_response(query, df):
    evidence = retrieve_evidence(query, df)

    if not is_answer_allowed(evidence):
        return {
            "status": "REFUSED",
            "reason": "Insufficient evidence",
            "answer": "No relevant past cases found."
        }

    context = build_context(evidence)
    prompt = build_prompt(query, context)

    answer = call_llm(prompt)

    return {
        "status": "ANSWERED",
        "evidence_used": evidence[["message_text", "category", "similarity"]]
                          .to_dict(orient="records"),
        "answer": answer
    }
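
The evidence gate in `rag_response` runs before generation, so it cannot catch an answer that ignores the retrieved context. A crude post-generation check, sketched below, flags answers that share almost no content words with the evidence. The helper names, the stopword list, and the `min_overlap` value are all assumptions; lexical overlap is only a rough proxy for grounding.

```python
import re

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "are", "i", "you", "my", "your", "with",
             "to", "of", "and", "in", "it", "there", "can", "has", "been",
             "what", "else"}

def content_words(text):
    """Lowercased alphabetic tokens minus the stopword list."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def is_grounded(answer, evidence_texts, min_overlap=2):
    """True if the answer shares at least min_overlap content words with the evidence."""
    evidence_vocab = set()
    for text in evidence_texts:
        evidence_vocab |= content_words(text)
    return len(content_words(answer) & evidence_vocab) >= min_overlap

evidence_texts = [
    "Everyone else in my area received their parcel except me. Why the delay?",
    "It has been 10 days and the order hasn’t even shipped. What’s going on?",
]
print(is_grounded("Is there anything else I can help you with?", evidence_texts))
```

A generic reply such as "Is there anything else I can help you with?" shares no content words with these two tickets, so the check rejects it.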


In [24]:
rag_response(
    "My order is delayed again",
    df
)

DEBUG: rows received in build_context = 2
                                         message_text        category  \
18  Everyone else in my area received their parcel...  Delivery Delay   
10  It has been 10 days and the order hasn’t even ...  Delivery Delay   

    similarity  
18     0.62595  
10     0.58585  


{'status': 'ANSWERED',
 'evidence_used': [{'message_text': 'Everyone else in my area received their parcel except me. Why the delay?',
   'category': 'Delivery Delay',
   'similarity': 0.6259496212005615},
  {'message_text': 'It has been 10 days and the order hasn’t even shipped. What’s going on?',
   'category': 'Delivery Delay',
   'similarity': 0.5858501195907593}],
 'answer': 'Is there anything else I can help you with?'}

**Hallucination, Refusal & Safety Evaluation**

In [25]:
rag_response(
    "Is my credit card hacked?",
    df
)

{'status': 'REFUSED',
 'reason': 'Insufficient evidence',
 'answer': 'No relevant past cases found.'}

In [26]:
rag_response(
    "Why was my order delayed due to warehouse fire?",
    df
)


{'status': 'REFUSED',
 'reason': 'Insufficient evidence',
 'answer': 'No relevant past cases found.'}

In [27]:
rag_response(
    "Something is wrong",
    df
)


{'status': 'REFUSED',
 'reason': 'Insufficient evidence',
 'answer': 'No relevant past cases found.'}

In [28]:
rag_response(
    "My order is delayed again",
    df
)


DEBUG: rows received in build_context = 2
                                         message_text        category  \
18  Everyone else in my area received their parcel...  Delivery Delay   
10  It has been 10 days and the order hasn’t even ...  Delivery Delay   

    similarity  
18     0.62595  
10     0.58585  


{'status': 'ANSWERED',
 'evidence_used': [{'message_text': 'Everyone else in my area received their parcel except me. Why the delay?',
   'category': 'Delivery Delay',
   'similarity': 0.6259496212005615},
  {'message_text': 'It has been 10 days and the order hasn’t even shipped. What’s going on?',
   'category': 'Delivery Delay',
   'similarity': 0.5858501195907593}],
 'answer': 'Is there anything else I can help you with?'}

# Task
Modify the `calculate_recall_at_k` function to handle `NaN` values by filtering out rows where 'message_text' or 'category' is `NaN` from the input DataFrame before performing semantic search and calculating recall.

## Summary:

The `calculate_recall_at_k` function was modified to enhance its robustness against missing data. Specifically, rows containing `NaN` values in either the 'message_text' or 'category' columns are now filtered out from the input DataFrame *before* semantic search and recall calculation are performed. This ensures that the recall metric is computed only on valid data points.
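
The `calculate_recall_at_k` function itself is not shown in this notebook, so the following is only a sketch of what the NaN filtering described above might look like. The signature, the `search_fn` retriever, the leave-one-out loop, and the definition of a hit (at least one same-category ticket in the top k) are all assumptions. It uses plain dictionaries so it runs without pandas.

```python
import math

def is_missing(value):
    """True for None or a float NaN (how pandas represents empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def calculate_recall_at_k(tickets, search_fn, k=5):
    """Share of tickets whose top-k neighbours include a ticket of the same category.

    tickets: list of dicts with 'message_text' and 'category' keys.
    search_fn(query, candidates, k): hypothetical retriever returning the k
    most similar candidate dicts.
    """
    # Drop rows with a missing text or label before any search runs.
    # On a DataFrame the equivalent filter is
    # df.dropna(subset=["message_text", "category"]).
    valid = [t for t in tickets
             if not is_missing(t.get("message_text"))
             and not is_missing(t.get("category"))]
    if not valid:
        return 0.0
    hits = 0
    for i, ticket in enumerate(valid):
        candidates = valid[:i] + valid[i + 1:]  # leave the query ticket out
        retrieved = search_fn(ticket["message_text"], candidates, k)
        if any(r["category"] == ticket["category"] for r in retrieved):
            hits += 1
    return hits / len(valid)
```

Filtering first matters for two reasons: a NaN text cannot be meaningfully embedded, and a NaN category can never match, so leaving such rows in would lower the metric for reasons unrelated to retrieval quality.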
