### Testing RAG Applications 📑

#### RAG Application
This application loads an article about the Model Context Protocol (MCP) from the web, splits it into chunks, embeds the chunks into a vector store, and retrieves the most relevant chunks at inference time to answer questions about MCP.

<img src="./img/RAG.png" width="500" height="400" style="display: block; margin: auto;">
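
The retrieve-then-prompt flow described above can be sketched with the standard library alone. This is a toy illustration, not the notebook's pipeline: word counts stand in for a real embedding model, a plain list stands in for the vector store, and the names `embed`, `cosine`, `retrieve`, and the sample chunks are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question, best match first."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Hypothetical chunks standing in for the split web page.
chunks = [
    "MCP uses JSON-RPC 2.0 as its message standard.",
    "The weather in Paris is mild in spring.",
    "An MCP server exposes tools and context to the host application.",
]

question = "What does an MCP server expose?"
context = "\n\n".join(retrieve(question, chunks))
prompt = (
    "Answer the question based only on the following context:\n\n"
    f"{context}\n\nQuestion: {question}"
)
print(prompt)
```

Only the two MCP-related chunks reach the prompt; the unrelated chunk is filtered out by the similarity ranking, which is the property the real retriever provides at a much larger scale.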

In [1]:
#!pip install -qU langchain-chroma

In [2]:
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from typing import List
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.document import Document
from langchain_ollama import ChatOllama

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [3]:
llm = ChatOllama(
    base_url="http://localhost:11434",
    model="qwen2.5:latest",
    temperature=0.5,
    num_predict=250,  # cap on generated tokens; Ollama names this option num_predict
)

In [6]:
# Load data from Web
loader = WebBaseLoader("https://www.descope.com/learn/post/mcp")
data = loader.load()

# Split text into documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
splits = text_splitter.split_documents(data)

# Add text to vector db
embedding = OllamaEmbeddings(model="nomic-embed-text:latest")
vectordb = Chroma.from_documents(documents=splits, embedding=embedding)

# Create a retriever
retriever = vectordb.as_retriever()

def format_docs(docs: List[Document]) -> str:
    return "\n\n".join([d.page_content for d in docs])


template = """Answer the question based only on the following context:

    {context}
    
    Give a summary not the full detail

    Question: {question}
    """
prompt = ChatPromptTemplate.from_template(template)


def retrieve_and_format(question: str) -> str:
    # retriever.invoke() replaces the deprecated get_relevant_documents()
    docs = retriever.invoke(question)
    return format_docs(docs)

chain = {"context": retrieve_and_format, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser()
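
To make the `chunk_size=500, chunk_overlap=100` setting in the cell above concrete, here is a minimal sliding-window sketch. It is a simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and newlines and treats `chunk_size` as an upper bound, whereas the hypothetical `chunk_with_overlap` helper below cuts at fixed character offsets.

```python
def chunk_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Cut text into fixed-size windows where each window repeats the tail of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# 1200 characters of filler text, purely for illustration.
sample = "".join(str(i % 10) for i in range(1200))
pieces = chunk_with_overlap(sample)
print([len(p) for p in pieces])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.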


#### Output of the LLM Application

In [None]:
response = chain.invoke("What is MCP")

print(response)


MCP, or Model Context Protocol, is a protocol designed to facilitate interactions between large language models (LLMs) and various applications. It supports two main transport methods: STDIO for local integrations and HTTP+SSE for remote connections. Communication uses JSON-RPC 2.0 as the underlying message standard. The core components of MCP include the host application, which interacts with users; the MCP client, integrated within the host to handle connections; the MCP server, which adds context and exposes specific functions; and the transport layer, handling communication between clients and servers.


### Testing RAG Application with DeepEval
<img src="./img/RAGTesting.png" width="800" height="400" style="display: block; margin: auto;">

In [7]:
from deepeval.test_case import LLMTestCase
from deepeval.dataset import EvaluationDataset

test_case = LLMTestCase(
    input="What is MCP?",
    actual_output=chain.invoke("What is MCP"),
    expected_output="The Model Context Protocol (MCP) addresses this challenge by providing a standardized way for LLMs to connect with external data sources and tools, essentially a “universal remote” for AI apps"
)

dataset = EvaluationDataset()
dataset.add_test_case(test_case=test_case)


In [8]:
dataset

EvaluationDataset(test_cases=[LLMTestCase(input='What is MCP?', actual_output="MCP, or Model Context Protocol, is a protocol designed to facilitate interactions between large language models (LLMs) and various applications. It uses JSON-RPC 2.0 for standardized communication and supports two primary transport methods: STDIO for local integrations and HTTP+SSE for remote connections. MCP's architecture includes host applications that interact with users, MCP clients that handle communications with servers, and MCP servers that provide specific functions and context to AI apps. Overall, MCP enables seamless integration of LLMs into diverse applications by standardizing communication and providing a structured way to exchange data and context.", expected_output='The Model Context Protocol (MCP) addresses this challenge by providing a standardized way for LLMs to connect with external data sources and tools, essentially a “universal remote” for AI apps', context=None, retrieval_contex

In [9]:
from deepeval.test_case import LLMTestCaseParams
from deepeval.metrics import GEval

concise_metrics = GEval(
    name = "Concise",
    criteria="Assess if the actual output remains concise while preserving all essential information.",
    
    evaluation_params=[
        LLMTestCaseParams.ACTUAL_OUTPUT
    ]
)

In [15]:
from deepeval.test_case import LLMTestCaseParams
from deepeval.metrics import GEval

# The criteria compares the output against the input, so the judge needs both fields.
completness_metrics = GEval(
    name="Completeness",
    criteria="Assess whether the actual output retains all the key information from the input",
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
    ],
)
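
A GEval judge can only reason about the test-case fields listed in `evaluation_params`, so a criterion that mentions the input is only meaningful if the input is among them. The sketch below illustrates that idea with the standard library; `ToyTestCase` and `judge_view` are hypothetical names, and this is not how deepeval builds its prompts internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTestCase:
    """Hypothetical stand-in for a test case with the fields used in this notebook."""
    input: str
    actual_output: str
    expected_output: Optional[str] = None


def judge_view(case: ToyTestCase, evaluation_params: list[str]) -> str:
    """Render only the selected fields, i.e. the text a judge model would get to see."""
    return "\n".join(f"{name}: {getattr(case, name)}" for name in evaluation_params)


case = ToyTestCase(
    input="What is MCP?",
    actual_output="MCP is a protocol that connects LLMs to tools and data.",
)

# With only the actual output selected, the question never reaches the judge.
print(judge_view(case, ["actual_output"]))

# Selecting both fields lets the judge compare the answer against the question.
print(judge_view(case, ["input", "actual_output"]))
```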

### Evaluation with GEval 

In [18]:
from deepeval.evaluate import evaluate
from deepeval.metrics import AnswerRelevancyMetric

evaluate(dataset.test_cases, metrics=[
    completness_metrics, 
    AnswerRelevancyMetric(),
    concise_metrics
])



Metrics Summary

  - ✅ Completeness [GEval] (score: 0.9, threshold: 0.5, strict: False, evaluation model: deepseek-r1:8b (Ollama), reason: The response effectively addresses all key aspects of MCP as outlined in the input, providing detailed information on communication protocols, transport methods, architecture components, and integration capabilities. It accurately retains the original meaning and context from the input., error: None)
  - ✅ Answer Relevancy (score: 0.8571428571428571, threshold: 0.5, strict: False, evaluation model: deepseek-r1:8b (Ollama), reason: The score is 0.86 because the output did not directly address what MCP stands for, instead discussing a transport method which is unrelated to the query., error: None)
  - ✅ Concise [GEval] (score: 0.9, threshold: 0.5, strict: False, evaluation model: deepseek-r1:8b (Ollama), reason: The actual output is concise and includes all essential information. It effectively explains MCP's purpose, components, and integrati
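
For reading the numbers above: Answer Relevancy is the share of statements in the output that the judge considers relevant to the input, and a metric passes in non-strict mode when its score reaches the threshold. The sketch below reproduces the arithmetic. The counts (6 relevant statements out of 7) are an assumption that happens to match the reported score; the run does not show the actual counts, and the helper names are hypothetical.

```python
def answer_relevancy(relevant_statements: int, total_statements: int) -> float:
    """Fraction of output statements judged relevant to the input."""
    if total_statements <= 0:
        raise ValueError("total_statements must be positive")
    return relevant_statements / total_statements


def passes(score: float, threshold: float = 0.5) -> bool:
    """Non-strict pass rule: the score must reach the threshold."""
    return score >= threshold


# Assumed counts: 6 of 7 statements relevant (one about a transport method flagged as off-topic).
score = answer_relevancy(6, 7)
print(round(score, 2), passes(score))
```

Under this reading, a single off-topic statement in a seven-statement answer is enough to pull the score from 1.0 down to about 0.86 while still clearing the 0.5 threshold.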


EvaluationResult(test_results=[TestResult(name='test_case_0', success=True, metrics_data=[MetricData(name='Completeness [GEval]', threshold=0.5, success=True, score=0.9, reason='The response effectively addresses all key aspects of MCP as outlined in the input, providing detailed information on communication protocols, transport methods, architecture components, and integration capabilities. It accurately retains the original meaning and context from the input.', strict_mode=False, evaluation_model='deepseek-r1:8b (Ollama)', error=None, evaluation_cost=0.0, verbose_logs='Criteria:\nAssess whether the actual output retains all the key information from the input \n \nEvaluation Steps:\n[\n    "Compare the actual output with the input to identify any missing or altered information.",\n    "Identify and list all key pieces of information from both the input and the actual output.",\n    "Verify that each identified key piece of information is present, accurate, and retains its original mea