## Code for Chapter 10 of the book *LangChain for Life Science and Healthcare*, by Dr. Ivan Reznikov

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ysMJ3XFQto_Asx4y-9E0etJEcKF-hsrD?usp=sharing)


## Langfuse Tutorial - Building an Observable RAG System

This tutorial demonstrates how to build a Retrieval-Augmented Generation (RAG) system with **Langfuse** for comprehensive observability and monitoring. Langfuse is an open-source LLM engineering platform that provides tracing, evaluation, and analytics for AI applications.

## What You'll Learn
- Setting up Langfuse for LLM observability
- Building a RAG system using LangChain and LangGraph
- Implementing tracing and monitoring for production AI systems
- Processing PDF documents for knowledge retrieval

## 1. Environment Setup and Dependencies

First, we'll install all the necessary packages for our RAG system with Langfuse integration.

In [None]:
!pip install -q langfuse langchain langgraph langchain_community langchain_openai openai pypdf

In [None]:
!pip freeze | grep "lang\|openai"

google-ai-generativelanguage==0.6.15
google-cloud-language==2.17.2
langchain==0.3.26
langchain-community==0.3.27
langchain-core==0.3.71
langchain-openai==0.3.28
langchain-text-splitters==0.3.8
langcodes==3.5.0
langfuse==3.2.1
langgraph==0.5.4
langgraph-checkpoint==2.1.1
langgraph-prebuilt==0.5.2
langgraph-sdk==0.1.74
langsmith==0.4.8
language_data==1.3.0
libclang==18.1.1
openai==1.97.1


## 2. Configuration and Authentication

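Next, we set the environment variables that the OpenAI and Langfuse clients read for authentication. The keys come from Google Colab's `userdata` secrets store, so they never appear in the notebook itself.

**Important Notes:**
- If you are not using Colab, replace the `userdata.get(...)` calls with another secret source, such as variables already set in your shell
- Choose the Langfuse host that matches the region of your Langfuse project (EU or US)
- Keep your API keys secure and never commit them to version control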

In [None]:
import os
import openai
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get("LC4LS_OPENAI_API_KEY")
os.environ["LANGFUSE_PUBLIC_KEY"] = userdata.get("LANGFUSE_PUBLIC_KEY")
os.environ["LANGFUSE_SECRET_KEY"] = userdata.get("LANGFUSE_SECRET_KEY")

os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com" # EU region
#os.environ["LANGFUSE_HOST"] = "https://us.cloud.langfuse.com" # US region
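Outside Colab, `google.colab` cannot be imported, so the cell above fails on its import line. A minimal sketch of a fallback is shown below. The helper name `get_secret` is hypothetical and not part of the original notebook; it tries Colab's `userdata` first and then falls back to the process environment.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from Colab userdata if available, else from os.environ.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    try:
        # This import only succeeds inside Google Colab.
        from google.colab import userdata
    except ImportError:
        userdata = None

    value = None
    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:
            # Secret not defined or notebook access not granted: fall through.
            value = None

    if not value:
        value = os.environ.get(name)
    if not value:
        raise KeyError(f"Secret {name!r} not found in Colab userdata or environment")
    return value
```

With this helper, a line such as `os.environ["LANGFUSE_PUBLIC_KEY"] = get_secret("LANGFUSE_PUBLIC_KEY")` would behave the same in Colab and in a local environment where the variable is already exported.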

## 3. Initialize Langfuse Callback Handler

The `CallbackHandler` connects LangChain to Langfuse: every run it is attached to is recorded as a trace.

**What this does:**
- Creates a callback handler that reads the `LANGFUSE_*` environment variables set above
- Traces LangChain and LangGraph runs that receive the handler through their `callbacks` config
- The optional connection check in the second cell (commented out by default) verifies your API keys and the connection to Langfuse

In [None]:
from langfuse.langchain import CallbackHandler

langfuse_handler = CallbackHandler()

In [None]:
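# Optional: test the SDK connection with the server.
# Assumption: in Langfuse SDK v3 the check is exposed on the client rather than on the handler.
#from langfuse import get_client
#get_client().auth_check()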

## 4. Data Preparation - Downloading Research Paper

We'll download a research paper about protein generative models to use as our knowledge base.

**Key Points:**
- We create a data directory to organize our files
- Headers are added to avoid potential blocking by the server
- The paper focuses on watermarking protein generative models

In [None]:
os.makedirs('./data', exist_ok=True)

In [None]:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    'Referer': 'https://github.com/IvanReznikov/LangChain4LifeScience/blob/main/data/articles/2410.20354v4.pdf',
}

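response = requests.get(
    'https://raw.githubusercontent.com/IvanReznikov/LangChain4LifeScience/refs/heads/main/data/articles/2410.20354v4.pdf',
    headers=headers,
    timeout=30,
)
# Fail early on HTTP errors instead of saving an error page as article.pdf
response.raise_for_status()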

pdf_path = "./data/article.pdf"
with open(pdf_path, "wb") as f:
    f.write(response.content)
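A download can return a successful status and still deliver something other than a PDF, for example an HTML error page. The sketch below checks the payload before it is written to disk. It relies on the fact that PDF files begin with the bytes `%PDF-`. The function names are hypothetical and not part of the original notebook.

```python
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload starts with the PDF magic bytes."""
    return data.startswith(PDF_MAGIC)


def require_pdf(data: bytes) -> bytes:
    """Return the payload unchanged, or raise if it is not a PDF."""
    if not looks_like_pdf(data):
        preview = data[:40]
        raise ValueError(f"Downloaded content is not a PDF; starts with {preview!r}")
    return data
```

In the cell above, this could be used as `f.write(require_pdf(response.content))`, so that a bad download fails here rather than later inside the PDF loader.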

## 5. Import Required Libraries

Now we'll import all the necessary components for building our RAG system with LangGraph.

**Library Overview:**
- **LangChain Hub**: Access to pre-built prompts and chains
- **Document Loaders**: For processing various file formats
- **Text Splitters**: Breaking documents into manageable chunks
- **LangGraph**: For building stateful, multi-step AI workflows
- **Vector Stores**: For semantic search and retrieval
- **OpenAI Components**: LLM and embedding models

In [None]:
from typing_extensions import List, TypedDict

from langchain import hub
from langchain_community.document_loaders import PyPDFLoader
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import START, StateGraph



## 6. Initialize the Language Model

We'll use GPT-4o-mini for our RAG system - it's cost-effective and performs well for most use cases.

**Why GPT-4o-mini?**
- Balanced performance and cost
- Good reasoning capabilities for Q&A tasks
- Fast response times
- Suitable for production RAG applications

In [None]:
llm = ChatOpenAI(model="gpt-4o-mini")

## 7. Define Application State

LangGraph passes a shared state object between nodes. Each node reads the keys it needs and returns only the keys it updates, so we first define what information flows between steps.

**State Management:**
- **question**: The user's query that needs to be answered
- **context**: Documents retrieved from the vector store that are relevant to the question
- **answer**: The final generated response combining the question and context

In [None]:
# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

## 8. Define Application Steps

Our RAG system has two main steps: retrieve relevant documents and generate an answer.

**How it works:**
1. **Retrieve**: Searches the vector store for documents similar to the question
2. **Generate**: Combines retrieved context with the question and generates a comprehensive answer

### Step 1: Document Retrieval

In [None]:
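# Define application steps
def retrieve(state: State):
    # `vector_store` is created in Section 9; it is looked up when the graph runs
    retrieved_docs = vector_store.similarity_search(state["question"])
    # Return only the key this step updates
    return {"context": retrieved_docs}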
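To make the retrieval step less of a black box, here is a plain-Python sketch of what a similarity search does conceptually: score every stored vector against the query vector and keep the best `k`. It uses cosine similarity and hand-written toy vectors instead of real embeddings. The default of `k=4` mirrors the default of `similarity_search` in LangChain vector stores. This is an illustration only, not the `InMemoryVectorStore` implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=4):
    """Return the k texts whose vectors are most similar to the query.

    `indexed` is a list of (text, vector) pairs.
    """
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy 3-dimensional "embeddings", invented for this example
toy_index = [
    ("watermark embedding", [1.0, 0.0, 0.0]),
    ("protein folding", [0.0, 1.0, 0.0]),
    ("watermark robustness", [0.9, 0.1, 0.0]),
]
print(top_k([1.0, 0.0, 0.0], toy_index, k=2))
# ['watermark embedding', 'watermark robustness']
```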

### Step 2: Answer Generation

In [None]:
def generate(state: State):
    # Join the retrieved chunks into one context string
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    # `prompt` is pulled from LangChain Hub in Section 9
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}
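The generation step is, at its core, string assembly followed by one model call. The sketch below shows only the assembly part in plain Python. The template text is a hypothetical stand-in written for this example; it is not the wording of the prompt the notebook pulls from LangChain Hub.

```python
# Hypothetical template for illustration; the notebook uses a Hub prompt instead
HYPOTHETICAL_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "Question: {question}\n"
    "Context: {context}\n"
    "Answer:"
)


def build_prompt(question, chunks):
    """Join retrieved chunks and place them in the template with the question."""
    context = "\n\n".join(chunks)
    return HYPOTHETICAL_TEMPLATE.format(question=question, context=context)


example = build_prompt(
    "What is FoldMark?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(example)
```

Separating chunks with a blank line keeps their boundaries visible to the model, which is the same choice `generate` makes with `"\n\n".join(...)`.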

## 9. Document Processing Pipeline

Now we'll load, split, and index our research paper for retrieval.

**Why these parameters?**
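- **chunk_size=1000**: At most 1,000 characters per chunk (the splitter counts characters by default, not tokens); large enough to contain meaningful information, small enough for efficient processing
- **chunk_overlap=100**: Neighboring chunks can share up to 100 characters, so information that straddles a chunk boundary is less likely to be lost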

In [None]:
loader = PyPDFLoader("./data/article.pdf")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = text_splitter.split_documents(documents)
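The effect of `chunk_size` and `chunk_overlap` is easier to see in a simplified fixed-window splitter. The sketch below is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraph and line breaks); it only shows how the two parameters interact. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration: windows start every (chunk_size - chunk_overlap)
    characters, so consecutive chunks share `chunk_overlap` characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With the notebook's settings, a 2,500-character text would produce three windows starting at offsets 0, 900, and 1,800 in this simplified scheme.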


### Vector Store Creation

**Embedding Model Choice:**
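- **text-embedding-3-large**: The larger of OpenAI's two `text-embedding-3` models
- Produces higher-dimensional vectors than `text-embedding-3-small`, at a higher price per token
- Typically gives better retrieval accuracy than the smaller model, which matters for dense scientific text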

In [None]:
vector_store = InMemoryVectorStore(OpenAIEmbeddings(model = "text-embedding-3-large"))
vector_store.add_documents(documents=docs)


### RAG Prompt Template

We pull the `rlm/rag-prompt` template from LangChain Hub. It is written for RAG applications and takes two input variables, `question` and `context`, which are exactly the keys the `generate` step passes to `prompt.invoke`.

In [None]:
prompt = hub.pull("rlm/rag-prompt")



## 10. Build the LangGraph Application

Now we'll create our RAG pipeline using LangGraph's state-based approach.

**Graph Structure:**
1. **START** → **retrieve**: Begin by finding relevant documents
2. **retrieve** → **generate**: Use retrieved documents to generate answer
3. The state flows through each step, accumulating information

In [None]:
# Compile application and test
graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
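The flow of state through this two-node graph can be imitated in a few lines of plain Python. In the sketch below, each node receives the current state and returns a partial update, which is merged into the state before the next node runs. Overwriting existing keys matches LangGraph's default behavior for state keys without a reducer. The function names are hypothetical, and the toy nodes stand in for the real `retrieve` and `generate`.

```python
def run_sequence(initial_state, nodes):
    """Run nodes in order, merging each node's partial update into the state."""
    state = dict(initial_state)
    for node in nodes:
        update = node(state)
        state.update(update)  # keys returned by the node overwrite old values
    return state


def toy_retrieve(state):
    # Stand-in for a vector store lookup
    return {"context": [f"chunk about {state['question']}"]}


def toy_generate(state):
    # Stand-in for the LLM call
    return {"answer": f"Based on {len(state['context'])} chunk(s)."}


final_state = run_sequence({"question": "watermarking"}, [toy_retrieve, toy_generate])
print(final_state)
# {'question': 'watermarking', 'context': ['chunk about watermarking'], 'answer': 'Based on 1 chunk(s).'}
```

This also shows why `graph.invoke` only needs the `question` key as input: the other keys are filled in as the state passes through the nodes.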

## 11. Test the RAG System with Monitoring

Let's test our system with a question about the research paper content.

**What happens here:**
1. The question is processed through our RAG pipeline
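2. Langfuse traces each graph node (retrieve, generate) through the callback handler
3. The LLM call and each node's inputs and outputs are logged; because `retrieve` calls the vector store directly, the embedding request itself may not show up as a separate observation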
4. You can view detailed traces in your Langfuse dashboard


In [None]:
query = "What are the benefits of watermarking protein generative models?"

response = graph.invoke({"question": query}, config={
    "callbacks":[langfuse_handler]
})
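Traces are easier to filter in the dashboard when they carry a session, a user, or tags. The sketch below builds the `config` dictionary for `graph.invoke` with such attributes. It rests on an assumption to verify against the Langfuse documentation for your installed version: that the v3 LangChain integration reads the metadata keys `langfuse_session_id`, `langfuse_user_id`, and `langfuse_tags`. The helper name `build_run_config` and the example values are hypothetical.

```python
def build_run_config(handler, session_id=None, user_id=None, tags=None):
    """Build a LangChain/LangGraph run config that carries trace attributes.

    Assumption: the Langfuse v3 callback handler picks up these metadata keys.
    """
    metadata = {}
    if session_id:
        metadata["langfuse_session_id"] = session_id
    if user_id:
        metadata["langfuse_user_id"] = user_id
    if tags:
        metadata["langfuse_tags"] = list(tags)

    config = {"callbacks": [handler]}
    if metadata:
        config["metadata"] = metadata
    return config


# A placeholder object stands in for `langfuse_handler` so the sketch runs on its own
placeholder_handler = object()
demo_config = build_run_config(
    placeholder_handler,
    session_id="demo-session",
    tags=["chapter-10", "rag"],
)
print(demo_config["metadata"])
# {'langfuse_session_id': 'demo-session', 'langfuse_tags': ['chapter-10', 'rag']}
```

In the notebook, the call would then read `graph.invoke({"question": query}, config=build_run_config(langfuse_handler, session_id="demo-session"))`.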

In [None]:
print(response["answer"])

The benefits of watermarking protein generative models include enabling copyright authentication and tracking of generated protein structures. The proposed FoldMark method effectively embeds user-specific information while maintaining the original quality of the protein structures. Additionally, it offers robustness against post-processing and adaptive attacks, addressing ethical concerns in generative AI applications for protein design.
