# RAG Evaluation with Langfuse

This notebook runs a RAG (Retrieval-Augmented Generation) workflow and sends the results to Langfuse. The actual evaluation is performed on the Langfuse platform using a configured LLM-as-a-Judge evaluator.


## 1. Prepare Workflow

Before running the evaluation, we need to ingest the documents that the RAG pipeline will use to answer questions.


In [None]:
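from langfuse import Langfuse
from workflow import RAGWorkflow

# The client reads its credentials from the LANGFUSE_* environment variables.
langfuse = Langfuse()

workflow = RAGWorkflow()

# Ingest the source documents from the local "data" directory.
# Top-level await works here because notebook cells run inside an event loop.
await workflow.ingest_documents("data")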

## 2. Run Evaluation Pipeline

With the workflow initialized and the documents ingested, we can now run the evaluation pipeline. It executes the RAG workflow for each question in the dataset and logs the results to Langfuse under a unique session ID. The server-side LLM-as-a-Judge evaluator, which you configured in the Langfuse UI, then processes the new traces automatically. For scoring, Langfuse provides its own evaluation metrics alongside RAGAS-based ones.


In [None]:
from evaluation import EvaluationPipeline

pipeline = EvaluationPipeline(workflow, langfuse)

# Run the workflow once per question in the golden dataset (one JSON object per line).
await pipeline.run("eval-data/golden_dataset.jsonl")
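
The internals of `EvaluationPipeline` are not shown in this notebook. As a rough illustration of the loop described above (one workflow call per dataset question, all grouped under a unique session ID), here is a minimal standard-library sketch. The `question` field name, the `rag-eval` prefix, and the helpers `parse_jsonl`, `new_session_id`, `run_all`, and `echo_answer` are all hypothetical; the sketch makes no Langfuse calls and is not the pipeline's actual implementation.

```python
import asyncio
import json
import uuid


def parse_jsonl(text):
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def new_session_id(prefix="rag-eval"):
    """Return a unique ID so all traces from one run can be grouped together."""
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


async def run_all(records, answer_fn, session_id):
    """Call answer_fn once per record, in order, and collect the results.

    Assumes each record has a "question" key (hypothetical schema).
    """
    results = []
    for record in records:
        answer = await answer_fn(record["question"])
        results.append(
            {
                "session_id": session_id,
                "question": record["question"],
                "answer": answer,
            }
        )
    return results


async def echo_answer(question):
    """Stand-in for the RAG workflow: returns a placeholder answer."""
    return f"(answer to: {question})"


if __name__ == "__main__":
    sample = '{"question": "What is RAG?"}\n{"question": "What is Langfuse?"}\n'
    rows = asyncio.run(run_all(parse_jsonl(sample), echo_answer, new_session_id()))
    for row in rows:
        print(row)
```

To try the sketch on a real file, read its contents as text and pass them to `parse_jsonl`. In a notebook cell, `await run_all(...)` can be called directly instead of going through `asyncio.run`.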