# Telecom Multilingual RAG Pipeline

This notebook implements a Retrieval-Augmented Generation (RAG) system for telecom customer service, using multilingual embeddings and open-source LLMs.

## Key Components

### **Technology Stack**
- **Vector Store**: ChromaDB with BAAI/bge-m3 multilingual embeddings
- **LLM Engine**: Ollama with multiple models (Llama3, Mistral, Zephyr, Granite, WizardLM)
- **Framework**: LangChain for RAG orchestration
- **Embeddings**: BGE-M3 for multilingual semantic search

### **Pipeline Features**
- **Multilingual Support**: Handles English, German, French, and Italian queries
- **Context-Aware Responses**: Uses retrieved telecom conversation context
- **Model Comparison**: Evaluates multiple LLMs for response quality
- **Prompt Engineering**: Optimized templates for customer service scenarios

### **Evaluation Process**
1. **Vector Retrieval**: Semantic search through telecom conversation database
2. **Response Generation**: Context-augmented answers using open-source LLMs
3. **Multilingual Testing**: Validates responses across different languages
4. **Performance Analysis**: RAGAS framework for quality assessment

### **AI Assistant Persona**
- **Name**: Max (Union Mobile's AI assistant)
- **Style**: Professional yet friendly email responses
- **Focus**: Direct problem resolution with minimal assumptions
- **Language**: Responds in user's preferred language


In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/telecom-test-dataset-with-summary-new/test_df_with_summaries.csv
/kaggle/input/telecom-vector-store-new/chromadb/chroma.sqlite3
/kaggle/input/telecom-vector-store-new/chromadb/2d5c59a8-49a6-4769-a848-94fbf2fd1e63/header.bin
/kaggle/input/telecom-vector-store-new/chromadb/2d5c59a8-49a6-4769-a848-94fbf2fd1e63/index_metadata.pickle
/kaggle/input/telecom-vector-store-new/chromadb/2d5c59a8-49a6-4769-a848-94fbf2fd1e63/link_lists.bin
/kaggle/input/telecom-vector-store-new/chromadb/2d5c59a8-49a6-4769-a848-94fbf2fd1e63/length.bin
/kaggle/input/telecom-vector-store-new/chromadb/2d5c59a8-49a6-4769-a848-94fbf2fd1e63/data_level0.bin
/kaggle/input/telecom-test-data/test_df.csv


In [None]:
# Install required packages for RAG pipeline
!pip install -q -U FlagEmbedding  # BGE-M3 multilingual embeddings
!pip install -q langchain-chroma  # ChromaDB vector store integration
!pip install -q langchain-community  # LangChain community components
!pip install -q langchain-huggingface  # Hugging Face embeddings
!pip install -q langchain-openai  # OpenAI integration (if needed)

%pip install -qU langchain-ollama  # Ollama LLM integration
!pip install -q colab-xterm  # Terminal access in Colab
%load_ext colabxterm

!pip install -q ollama  # Ollama local LLM server

In [None]:
# Import core RAG components
from FlagEmbedding import BGEM3FlagModel  # BGE-M3 multilingual embedding model
from langchain_chroma import Chroma  # ChromaDB vector store
from langchain_huggingface.embeddings import HuggingFaceEmbeddings  # Hugging Face embeddings wrapper

2025-07-13 13:22:33.059827: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1752412953.425171      36 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1752412953.536164      36 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'


In [None]:
import shutil

# Setup vector database by copying to writable directory
source_dir = '/kaggle/input/telecom-vector-store-new/chromadb'
destination_dir = '/kaggle/working/chromadb'

# Copy the pre-built ChromaDB database to working directory
if not os.path.exists(destination_dir):
    shutil.copytree(source_dir, destination_dir)
    print("Vector database copied successfully")

In [None]:
# Initialize BGE-M3 multilingual embedding model
# This model supports 100+ languages and is optimized for multilingual retrieval
embedding_model = HuggingFaceEmbeddings(
    model_name="BAAI/bge-m3",  # BGE-M3 multilingual model
    model_kwargs={'device': 'cuda'},  # Use GPU for faster embedding
    encode_kwargs={"normalize_embeddings": True}  # Normalize for cosine similarity
)


In [None]:
# Initialize ChromaDB vector store with telecom conversation embeddings
vector_store = Chroma(
    collection_name="telecom_vector_store",  # Collection containing telecom conversations
    embedding_function=embedding_model,  # Use BGE-M3 for retrieval
    persist_directory="/kaggle/working/chromadb"  # Path to database
)

# Create retriever that returns top 5 most relevant conversations
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
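
Because the embeddings are created with `normalize_embeddings=True`, every vector has unit length, so the dot product of two embeddings equals their cosine similarity. The sketch below illustrates top-k retrieval over normalized vectors in plain Python. The toy 3-dimensional vectors, the document ids, and the `top_k` helper are hypothetical stand-ins and not part of ChromaDB or LangChain; the distance metric configured on the stored collection is not shown in this notebook.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query, docs, k=5):
    """Return the k (doc_id, score) pairs with the highest dot product.

    All vectors are normalized first, so the dot product is the cosine similarity.
    """
    q = normalize(query)
    scored = []
    for doc_id, vec in docs.items():
        d = normalize(vec)
        scored.append((doc_id, sum(a * b for a, b in zip(q, d))))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Toy embeddings (hypothetical); real BGE-M3 dense vectors have 1024 dimensions.
docs = {
    "sim_card_error": [0.9, 0.1, 0.0],
    "billing_dispute": [0.0, 1.0, 0.2],
    "no_service": [0.8, 0.0, 0.3],
}
results = top_k([1.0, 0.0, 0.1], docs, k=2)
print(results)
```

The retriever above plays the role of `top_k` with `k=5`, returning documents rather than ids.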

In [None]:
# Install Ollama for running local LLMs
!curl https://ollama.ai/install.sh | sh
!sudo apt install -y neofetch  # System information display

# Ollama runs large language models locally, without calls to an external API.
# Customer text therefore stays on this machine, and there are no per-request API fees.

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13281    0 13281    0     0  58310      0 --:--:-- --:--:-- --:--:-- 58250
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  caca-utils chafa imagemagick imagemagick-6.q16 jp2a libchafa0 libid3tag0
  libimlib2

In [None]:
# Start Ollama server as a background process
import subprocess
import time

# Launch the Ollama server, which handles LLM requests
command = ["ollama", "serve"]

# Start the server in the background. Output is discarded rather than piped:
# an unread PIPE can fill up and block a long-running server.
process = subprocess.Popen(command,
                           stdout=subprocess.DEVNULL,
                           stderr=subprocess.DEVNULL)
print("Ollama server started with Process ID:", process.pid)

# Wait for server to initialize
time.sleep(5)

Process ID: 564
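
A fixed `time.sleep(5)` may be too short on a slow machine and wasteful on a fast one. As an alternative, the sketch below polls the server address printed by the installer (`127.0.0.1:11434`) until it responds. The `wait_for_server` helper and its parameters are hypothetical, and it assumes the server's root URL answers with HTTP 200 once it is ready.

```python
import time
import urllib.error
import urllib.request

def wait_for_server(url="http://127.0.0.1:11434", timeout_s=30.0, interval_s=0.5):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    Returns True if the server responded in time, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server is not accepting connections yet
        time.sleep(interval_s)
    return False
```

Calling `wait_for_server()` after starting the process and raising an error when it returns `False` makes a failed start visible before any model is pulled.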


In [9]:
!ollama -v

ollama version is 0.9.6


In [10]:
!ollama library

Error: unknown command "library" for "ollama"


In [None]:
# Download LLM models for evaluation
# Starting with Llama3 8B as the primary model
!ollama pull llama3:8b

# Additional models available for comparison (commented out to save time/space)
#!ollama pull mistral:7b      # Mistral 7B - good for multilingual tasks
#!ollama pull zephyr:7b       # Zephyr 7B - instruction-tuned model  
#!ollama pull granite3.3:8b   # IBM Granite - enterprise-focused
#!ollama pull wizardlm2:7b    # WizardLM - strong reasoning capabilities

In [None]:
# Set the primary model for RAG pipeline
model_name = 'llama3:8b'  # Llama3 8B chosen for balanced performance and multilingual capability

In [22]:
!ollama list

NAME         ID              SIZE      MODIFIED      
llama3:8b    365c0bd3c000    4.7 GB    8 minutes ago    


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


### Invoking OllamaLLM Directly without context

In [None]:
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Test LLM without RAG context (baseline comparison)
template = """
You are Max, Union Mobile's AI assistant. Your top priority is to resolve the customer's issue efficiently and clearly.

Follow these rules carefully:

Resolution First: Focus on directly resolving the customer's issue. Avoid unnecessary checks, explanations, or assumptions unless required.
Language: Always respond in {language}, no matter what language the input is in.
Style: Keep the email short, helpful, and friendly — include a greeting, body, and optional closing.
Tone: Professional but warm — be respectful, clear, and supportive.

Keep the salutation general.

Customer Issue: {input}
Compose a short, email-style response in {language}, using the information from the customer's issue to address it.
"""

# Create prompt template without context variable
prompt = ChatPromptTemplate.from_template(template)

# Initialize Ollama LLM
model = OllamaLLM(model=model_name)

# Create simple chain: prompt -> model -> output parser (no RAG)
chain = prompt | model | StrOutputParser()

# Test with a sample customer issue
response = chain.invoke({
    "input": """I am facing a problem with my mobile device monitoring.
I'm unable to access certain features on my phone and keep getting error messages
saying 'Invalid SIM Card' and 'No Service'.
This issue was supposed to be resolved previously, but it hasn't been addressed yet.
I've been experiencing inconvenience due to this ongoing problem, which is preventing my phone from working properly.
""",
    "language": "English",
})

print("Response without RAG context:")
print(response)

Subject: Assistance with Mobile Device Monitoring Issue

Dear Customer,

Thank you for reaching out about the issue you're experiencing with your mobile device monitoring. I'm sorry to hear that you're unable to access certain features on your phone and are receiving error messages saying 'Invalid SIM Card' and 'No Service'. This issue was previously reported, and I apologize that it hasn't been resolved yet.

I've reviewed your account, and I'd like to investigate this further. Can you please provide me with the exact model of your device and the current software version installed? Additionally, have you tried restarting your phone or checking for any recent changes to your SIM card or network provider?

Once I have this information, I'll do my best to assist you in resolving this issue as soon as possible.

Thank you for your patience and cooperation. If you have any further questions or concerns, please don't hesitate to reach out.

Best regards,
Max


### Invoking OllamaLLM Directly with context

In [None]:
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

# Enhanced template with context from vector database
template = """
You are Max, Union Mobile's AI assistant. Your top priority is to resolve the customer's issue efficiently and clearly using the information provided.

Follow these rules carefully:

Resolution First: Focus on directly resolving the customer's issue. Avoid unnecessary checks, explanations, or assumptions unless required by the context.
Language: Always respond in {language}, no matter what language the input is in.
Context Only: Use only the content in the context. Do not invent or infer beyond it.
Style: Keep the email short, helpful, and friendly — include a greeting, body, and optional closing.
Tone: Professional but warm — be respectful, clear, and supportive.

Keep the salutation general.
<context>
{context}
</context>

Customer Issue: {input}
Compose a short, email-style response in {language}, using the context above to address the customer's issue.
"""

# Create RAG chain with context from vector database
prompt = ChatPromptTemplate.from_template(template)
model = OllamaLLM(model=model_name)

# Document chain combines retrieved documents with the prompt
doc_chain = create_stuff_documents_chain(model, prompt)

# Retrieval chain handles vector search + response generation
chain = create_retrieval_chain(retriever, doc_chain)

In [None]:
# Test RAG chain with context from vector database
response = chain.invoke({
    "input": """I am facing a problem with my mobile device monitoring.
I'm unable to access certain features on my phone and keep getting error messages
saying 'Invalid SIM Card' and 'No Service'.
This issue was supposed to be resolved previously, but it hasn't been addressed yet.
I've been experiencing inconvenience due to this ongoing problem, which is preventing my phone from working properly.
""",
    "language": "English",
})

print("Response with RAG context:")
print(response['answer'])
print(f"\nNumber of retrieved documents: {len(response.get('context', []))}")

Subject: Resolving Your Mobile Device Monitoring Issue

Dear [Customer],

Thank you for reaching out to Union Mobile regarding the issue with your mobile device monitoring. I apologize for the inconvenience caused by the errors you're experiencing, including "Invalid SIM Card" and "No Service".

I understand that this problem was previously reported, but unfortunately, it hasn't been fully resolved yet. I'm committed to helping you resolve this issue as soon as possible.

After reviewing your account information, I found that there might be a problem with the SIM card installation. To resolve this, I'll send a replacement SIM card to your address on file. In the meantime, I'd like to offer you a complimentary 3-month subscription to our premium technical service, which includes dedicated tech support and assistance diagnostics.

Please let me know if this resolves the issue or if you have any further questions or concerns. Your satisfaction is my top priority, and I'm committed to ensu

### Trying different prompt templates

In [None]:
# Alternative prompt template with explicit language parameter
# This version enforces language consistency more strictly
prompt_template_with_lang = """
You are Max, Union Mobile's AI assistant that communicates with users via email in a friendly and professional tone.

Please follow these rules carefully:
- Resolution First: Focus on directly resolving the customer's issue. Avoid unnecessary checks, explanations, or assumptions unless required by the context.
- Language: Always respond in {language}, regardless of the context's language.
- Use only the context provided: Do not add or assume anything outside the given context.
- Style: Write a short, helpful, and friendly email (greeting, body, optional closing).
- Tone: Professional but warm — keep it clear and respectful.
- If no relevant information is found: Reply only with "I do not know."
- Keep the salutation general.
<context>
{context}
</context>

Customer Issue: {input}
Compose a short, email-style response in {language}, using the context above to address the customer's issue.
"""

In [None]:
# This version lets the model automatically detect and respond in the user's language
prompt_template_without_lang = """
You are Max, Union Mobile's AI assistant. Your top priority is to resolve the customer's issue efficiently and clearly using the information provided.

Follow these rules carefully:

Resolution First: Focus on directly resolving the customer's issue. Avoid unnecessary checks, explanations, or assumptions unless required by the context.
Language: Always respond in the user's language.
Context Only: Use only the content in the context. Do not invent or infer beyond it.
Style: Keep the email short, helpful, and friendly — include a greeting, body, and optional closing.
Tone: Professional but warm — be respectful, clear, and supportive.

Keep the salutation general.
<context>
{context}
</context>

Customer Issue: {input}
Compose a short, email-style response, using the context above to address the customer's issue.
"""

In [None]:
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

def prompt_template_test(template, user_prompt, language=None):
    """
    Test different prompt templates with the same user input
    Args:
        template: The prompt template string to test
        user_prompt: The customer issue/question
        language: Optional language specification
    Returns:
        Generated response string
    """
    # Create prompt template from string
    prompt = ChatPromptTemplate.from_template(template)

    # Initialize model and chains
    model = OllamaLLM(model=model_name)
    doc_chain = create_stuff_documents_chain(model, prompt)
    chain = create_retrieval_chain(retriever, doc_chain)
    
    # Build input arguments dynamically
    invoke_args = {"input": user_prompt}
    if language is not None:
        invoke_args["language"] = language
    
    # Generate and return response
    response = chain.invoke(invoke_args)
    return response['answer']
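
Whether `language` must be passed depends on which placeholders a template contains: a template with `{language}` fails at invoke time if the key is missing, while one without it does not need the key. The sketch below lists a template's placeholders with the standard library, so a mismatch can be caught before calling the model. The helper names are hypothetical, and the sketch assumes the templates use Python format-style `{name}` placeholders and that `context` is filled in by the retrieval chain.

```python
from string import Formatter

def template_variables(template):
    """Return the set of named {placeholders} in a format-style template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}

def missing_inputs(template, provided, supplied_by_chain=("context",)):
    """List placeholders that neither the caller nor the chain supplies."""
    required = template_variables(template) - set(supplied_by_chain)
    return sorted(required - set(provided))

# Shortened stand-ins for the two templates defined above
with_lang = "Respond in {language}.\n<context>\n{context}\n</context>\nCustomer Issue: {input}"
without_lang = "<context>\n{context}\n</context>\nCustomer Issue: {input}"

print(missing_inputs(with_lang, {"input": "Kein Netz"}))     # ['language']
print(missing_inputs(without_lang, {"input": "Kein Netz"}))  # []
```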

In [None]:
# Test prompt template with German input and explicit language specification
print("=== German Input with Language Parameter ===")
print(prompt_template_test(prompt_template_with_lang,
                    """ Ich habe ein Problem mit meinem aktuellen IoT-Plan. 
                    Ich denke darüber nach, zu einem anderen Anbieter zu wechseln, 
                    da andere Unternehmen anscheinend bessere Angebote für IoT-Pläne haben. 
                    Ich verwende normalerweise etwa 5 GB Daten pro Monat und möchte sicherstellen, 
                    dass ich einen Plan habe, der meinen Bedürfnissen entspricht. Außerdem ist es mir"""
                           ,"German"
                    ))

In [None]:
# Test language-agnostic prompt template with same German input
print("=== German Input without Language Parameter ===")
print(prompt_template_test(prompt_template_without_lang,
                    """ Ich habe ein Problem mit meinem aktuellen IoT-Plan. Ich denke darüber nach, zu einem anderen Anbieter zu wechseln, da andere Unternehmen anscheinend bessere Angebote für IoT-Pläne haben. Ich verwende normalerweise etwa 5 GB Daten pro Monat und möchte sicherstellen, dass ich einen Plan habe, der meinen Bedürfnissen entspricht. Außerdem ist es mir"""
                    ))

In [None]:
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

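# prompt_template_resolution_focus is not defined anywhere in this notebook;
# prompt_template_with_lang is used here as an assumption, since it accepts {language}
template = prompt_template_with_lang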

# Create a prompt template
prompt = ChatPromptTemplate.from_template(template)

model = OllamaLLM(model=model_name)
doc_chain = create_stuff_documents_chain(model, prompt)
chain = create_retrieval_chain(retriever, doc_chain)

## Pipeline - design

In [None]:
import time

# Load test dataset with customer issue summaries
df_test_with_summaries = pd.read_csv("/kaggle/input/telecom-test-dataset-with-summary-new/test_df_with_summaries.csv", encoding='UTF-8')

# Process all test cases for RAGAS evaluation
results = []
start_time = time.time()

print(f"Processing {len(df_test_with_summaries)} test cases...")

for idx, row in df_test_with_summaries.iterrows():
    user_prompt = row['issue_summary']  # Customer issue description
    language = row['language']  # Target response language
    
    # Generate RAG response
    response = chain.invoke({"input": user_prompt, "language": language})
    answer = response['answer']
    context = response.get('context', [])
    
    # Store results in RAGAS format (contexts as plain strings, not Document objects)
    results.append({
        "question": user_prompt,
        "answer": answer,
        "contexts": [doc.page_content for doc in context]
    })
    
    # Progress tracking
    if (idx + 1) % 10 == 0 or (idx + 1) == len(df_test_with_summaries):
        elapsed = (time.time() - start_time) / 60
        print(f"Processed {idx + 1}/{len(df_test_with_summaries)} records... Elapsed time: {elapsed:.2f} minutes")

# Save results for RAGAS evaluation (replace ':' so the model tag yields a portable file name)
df_ragas = pd.DataFrame(results)
df_ragas.to_csv(f"/kaggle/working/df_ragas_{model_name.replace(':', '_')}.csv", index=False)

total_time = (time.time() - start_time) / 60
print(f"Done. Total records processed: {len(df_test_with_summaries)}. Total time: {total_time:.2f} minutes.")
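
The `contexts` column holds a list per row, but CSV stores every cell as text, so the lists come back as plain strings when the file is read again. One possible approach is to encode each list as JSON before writing and decode it after reading. The sketch below uses only the standard library; `Doc` is a hypothetical stand-in for a retrieved document with a `page_content` field, and the sample texts are invented for illustration.

```python
import csv
import io
import json
from dataclasses import dataclass

@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""
    page_content: str

def to_record(question, answer, docs):
    """Build one evaluation row; contexts is a JSON-encoded list of strings."""
    contexts = [d.page_content for d in docs]
    return {
        "question": question,
        "answer": answer,
        "contexts": json.dumps(contexts, ensure_ascii=False),
    }

rows = [to_record(
    "Kein Netz seit gestern",
    "Bitte starten Sie das Gerät neu.",
    [Doc("SIM neu einsetzen"), Doc("Netzstörung prüfen")],
)]

# Write to an in-memory CSV, then read it back
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["question", "answer", "contexts"])
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)
loaded = list(csv.DictReader(buffer))
restored = json.loads(loaded[0]["contexts"])  # back to a list of strings
print(restored)
```

With pandas, the same idea would apply `json.dumps` to the column before `to_csv` and `json.loads` after `read_csv`.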

In [None]:
df_test_with_summaries = pd.read_csv("/kaggle/input/telecom-test-dataset-with-summary-new/test_df_with_summaries.csv", encoding='UTF-8')

### Creating common function for pipeline

In [None]:
from langchain_ollama.llms import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

def create_chain(model_name, retriever):
    """
    Create a RAG chain for a specific model
    Args:
        model_name: Name of the Ollama model to use
        retriever: Vector store retriever for context
    Returns:
        Configured RAG chain
    """
    # Standard prompt template for consistent evaluation
    template = """
You are Max, Union Mobile's AI assistant. Your top priority is to resolve the customer's issue efficiently and clearly using the information provided.

Follow these rules carefully:

Resolution First: Focus on directly resolving the customer's issue. Avoid unnecessary checks, explanations, or assumptions unless required by the context.
Language: Always respond in {language}, no matter what language the input is in.
Context Only: Use only the content in the context. Do not invent or infer beyond it.
Style: Keep the email short, helpful, and friendly — include a greeting, body, and optional closing.
Tone: Professional but warm — be respectful, clear, and supportive.

Keep the salutation general.
<context>
{context}
</context>

Customer Issue: {input}
Compose a short, email-style response in {language}, using the context above to address the customer's issue.
"""
    # Build chain components
    prompt = ChatPromptTemplate.from_template(template)
    model = OllamaLLM(model=model_name)
    doc_chain = create_stuff_documents_chain(model, prompt)
    chain = create_retrieval_chain(retriever, doc_chain)
    return chain

In [None]:
def generate_ragas_eval_df(input_csv, chain, model_name):
    """
    Generate RAGAS evaluation dataset for a specific model
    Args:
        input_csv: Path to test dataset
        chain: RAG chain to evaluate
        model_name: Model identifier for output file
    """
    df_test_with_summaries = pd.read_csv(input_csv, encoding='UTF-8')
    results = []
    start_time = time.time()
    
    print(f"Evaluating {model_name} on {len(df_test_with_summaries)} test cases...")
    
    for idx, row in df_test_with_summaries.iterrows():
        user_prompt = row['issue_summary']
        language = row['language']
        
        # Generate response using RAG chain
        response = chain.invoke({"input": user_prompt, "language": language})
        answer = response['answer']
        context = response.get('context', [])
        
        # Format for RAGAS evaluation (contexts as plain strings, not Document objects)
        results.append({
            "question": user_prompt,
            "answer": answer,
            "contexts": [doc.page_content for doc in context]
        })
        
        # Progress tracking
        if (idx + 1) % 10 == 0 or (idx + 1) == len(df_test_with_summaries):
            elapsed = (time.time() - start_time) / 60
            print(f"Processed {idx + 1}/{len(df_test_with_summaries)} records... Elapsed time: {elapsed:.2f} minutes")
    
    # Save evaluation results
    df_ragas = pd.DataFrame(results)
    output_file = f"/kaggle/working/df_ragas_{model_name.replace(':', '_')}.csv"
    df_ragas.to_csv(output_file, index=False)
    
    total_time = (time.time() - start_time) / 60
    print(f"Done. Results saved to {output_file}. Total time: {total_time:.2f} minutes.")

def run_inference_for_models(input_csv, retriever, model_names):
    """
    Run evaluation pipeline for multiple models sequentially
    Args:
        input_csv: Test dataset path
        retriever: Vector store retriever
        model_names: List of model names to evaluate
    """
    prev_model = None
    
    for model_name in model_names:
        print(f"\n{'='*50}")
        print(f"Processing model: {model_name}")
        print(f"{'='*50}")
        
        # Clean up previous model to save disk space
        if prev_model:
            print(f"Removing previous model: {prev_model}")
            os.system(f"ollama rm {prev_model}")
        
        # Download current model
        print(f"Pulling model: {model_name}")
        os.system(f"ollama pull {model_name}")
        
        # Create RAG chain for this model
        chain = create_chain(model_name, retriever)
        
        # Run evaluation
        generate_ragas_eval_df(input_csv, chain, model_name)
        prev_model = model_name

# Model comparison pipeline
# Note: Only run this if you have sufficient compute resources and time
model_list = [
    # "llama3:8b",      # Already processed above
    "mistral:7b",        # Mistral 7B
    "zephyr:7b",         # Zephyr 7B  
    "granite3.3:8b",     # IBM Granite
    "wizardlm2:7b"       # WizardLM2
]

# Uncomment to run full model comparison
# run_inference_for_models("/kaggle/input/telecom-test-dataset-with-summary-new/test_df_with_summaries.csv", retriever, model_list)
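
`os.system` returns the exit status but the loop above does not check it, so a failed `ollama pull` would only surface later as an error during inference. A minimal sketch of a stricter wrapper follows; `run_cli` is a hypothetical helper, and the demonstration uses the Python interpreter as a stand-in for the `ollama` CLI so that it runs anywhere.

```python
import subprocess
import sys

def run_cli(args):
    """Run a command given as a list of arguments; raise if it exits non-zero."""
    completed = subprocess.run(args, capture_output=True, text=True)
    if completed.returncode != 0:
        raise RuntimeError(
            f"{' '.join(args)} failed ({completed.returncode}): {completed.stderr.strip()}"
        )
    return completed.stdout

# Demonstration with the Python interpreter standing in for the ollama CLI
print(run_cli([sys.executable, "-c", "print('ok')"]).strip())
try:
    run_cli([sys.executable, "-c", "import sys; sys.exit(3)"])
except RuntimeError as err:
    print("caught:", err)
```

In the loop, `run_cli(["ollama", "pull", model_name])` and `run_cli(["ollama", "rm", prev_model])` would take the place of the `os.system` calls.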