# LangChain RAG Demo with AWS Bedrock Nova Pro

This notebook demonstrates two approaches to building RAG (Retrieval-Augmented Generation) applications:
1. **Custom RAG Implementation** - Direct boto3 and ChromaDB usage
2. **LangChain RAG Implementation** - Using LangChain's pipe operators and integrations

Both implementations use the AWS Bedrock Nova Pro model and answer questions from the same car manufacturing knowledge base.
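Both implementations follow the same retrieve-then-generate loop. The sketch below is a toy, stdlib-only model of that loop under stated assumptions. A bag-of-words vector stands in for the SentenceTransformer embedding, an in-memory list stands in for ChromaDB, and the Bedrock call is left out. The names `embed`, `retrieve`, `build_prompt`, and the `DOCS` data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the car manufacturing documents
DOCS = [
    {"title": "Transmission Fluid", "content": "change transmission fluid every 30000 to 60000 miles"},
    {"title": "Brake Pads", "content": "replace brake pads when you hear squealing brakes"},
    {"title": "Oil Change", "content": "change engine oil based on driving conditions"},
]

def embed(text: str) -> Counter:
    """Bag-of-words term counts; a crude stand-in for a sentence embedding."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d["content"])), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[dict]) -> str:
    """Stuff the retrieved documents into the prompt ahead of the question."""
    context = "\n\n".join(f"Title: {d['title']}\nContent: {d['content']}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"

question = "when should i change transmission fluid"
print(build_prompt(question, retrieve(question)))
```

In the notebook, the embedding model, the vector store, and the final model call replace these three toy pieces; the control flow stays the same.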

## Prerequisites

Make sure you have:
- AWS credentials configured
- All dependencies installed (`pip install -r requirements.txt`)
- Access to AWS Bedrock Nova Pro model

In [None]:
!pip install -r requirements.txt

## Part 1: Set Up the ChromaDB Vector Store

First, we'll create the local vector store and populate it with car manufacturing content.

In [1]:
# Execute the setup script to create ChromaDB vector store
!python setup_local_vector_store.py


## Part 2: Custom RAG Implementation

This implementation uses direct boto3 calls and manual ChromaDB operations.
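`collection.query` returns a dict of parallel lists with one inner list per query embedding, which is why `retrieve_relevant_context` indexes `[0]` throughout. The following minimal sketch runs the same formatting step against a hand-built result dict. The data is hypothetical, not notebook output. `format_query_results` is a hypothetical helper name. The `ids` and `distances` keys mirror what Chroma typically returns, but this notebook does not use them.

```python
def format_query_results(results: dict, preview_chars: int = 200) -> tuple[str, list[dict]]:
    """Flatten a Chroma-style result for a single query into LLM context plus short previews."""
    docs = results["documents"][0]    # inner list for the first (and only) query
    metas = results["metadatas"][0]   # metadata dicts aligned with docs by position
    context_parts, previews = [], []
    for text, meta in zip(docs, metas):
        preview = (text[:preview_chars] + "...") if len(text) > preview_chars else text
        previews.append({"title": meta["title"], "category": meta["category"], "content": preview})
        context_parts.append(f"Title: {meta['title']}\nCategory: {meta['category']}\nContent: {text}")
    return "\n\n".join(context_parts), previews

# Hypothetical result in the nested shape Chroma returns (one inner list per query)
fake_results = {
    "ids": [["doc-1", "doc-2"]],
    "documents": [["Automatic transmission fluid should be changed periodically.", "x" * 250]],
    "metadatas": [[{"title": "Transmission Fluid", "category": "maintenance"},
                   {"title": "Long Doc", "category": "repair"}]],
    "distances": [[0.21, 0.58]],
}
context, previews = format_query_results(fake_results)
```

Iterating with `zip` keeps documents and metadata aligned without index bookkeeping. The full text goes to the model, and only the preview is printed.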

In [2]:
# Import required libraries for custom RAG
import chromadb
from sentence_transformers import SentenceTransformer
import boto3
import json
import random
from config import (
    AWS_REGION, BEDROCK_MODEL_ID, EMBEDDING_MODEL_NAME, 
    CHROMA_DB_PATH, COLLECTION_NAME, MAX_TOKENS, TEMPERATURE, 
    TOP_K_RESULTS, SAMPLE_QUESTIONS
)

In [3]:
def connect_to_vector_store():
    """Connect to the ChromaDB vector store"""
    settings = chromadb.config.Settings(
        anonymized_telemetry=False
    )
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH, settings=settings)
    collection = client.get_collection(name=COLLECTION_NAME)
    return collection

def retrieve_relevant_context(query, collection, model):
    """Retrieve relevant documents from vector store"""
    # Generate embedding for the query
    query_embedding = model.encode([query])
    
    # Search for similar documents
    results = collection.query(
        query_embeddings=query_embedding.tolist(),
        n_results=TOP_K_RESULTS
    )
    
    # Format retrieved context and return both formatted and raw results
    context_docs = []
    retrieved_docs = []
    
    for i in range(len(results['documents'][0])):
        doc_text = results['documents'][0][i]
        metadata = results['metadatas'][0][i]
        
        # Store raw doc info for printing
        retrieved_docs.append({
            'title': metadata['title'],
            'category': metadata['category'],
            'content': doc_text[:200] + "..." if len(doc_text) > 200 else doc_text
        })
        
        # Format for context
        context_docs.append(f"Title: {metadata['title']}\nCategory: {metadata['category']}\nContent: {doc_text}")
    
    return "\n\n".join(context_docs), retrieved_docs

def query_bedrock_with_context(query, context):
    """Query AWS Bedrock Nova Pro with context"""
    # Initialize Bedrock runtime client
    bedrock_runtime = boto3.client(
        service_name='bedrock-runtime',
        region_name=AWS_REGION
    )
    
    # System prompt: instructions go in Nova's top-level "system" field
    # rather than in a second consecutive "user" message
    system_prompt = (
        "You are an expert automotive technician and customer service representative. "
        "Use the provided context from car manufacturing documentation to answer questions accurately and helpfully. "
        "Focus on practical advice and safety considerations. If the context doesn't contain relevant information, "
        "say so clearly."
    )
    
    # Prepare the request body for Nova Pro (native invoke_model schema)
    request_body = {
        "system": [{"text": system_prompt}],
        "messages": [
            {
                "role": "user",
                "content": [{"text": f"Context:\n{context}\n\nQuestion: {query}"}]
            }
        ],
        "inferenceConfig": {
            "maxTokens": MAX_TOKENS,
            "temperature": TEMPERATURE
        }
    }
    
    # Call Bedrock Nova Pro
    response = bedrock_runtime.invoke_model(
        modelId=BEDROCK_MODEL_ID,
        body=json.dumps(request_body)
    )
    
    # Parse response
    response_body = json.loads(response['body'].read())
    return response_body['output']['message']['content'][0]['text']
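Amazon Nova's native `invoke_model` schema accepts instructions in a top-level `system` list and returns the generated text under `output.message.content`. The following stdlib-only sketch builds that body and parses a mocked response. The `stopReason` and `usage` field names are assumed from the Nova response schema, and the values are made up, so verify them against the Bedrock documentation for your model version. `build_nova_body` and `parse_nova_response` are hypothetical helpers, and their default `max_tokens` and `temperature` are placeholders for the notebook's `MAX_TOKENS` and `TEMPERATURE`.

```python
import json

def build_nova_body(system_prompt: str, user_text: str,
                    max_tokens: int = 512, temperature: float = 0.3) -> str:
    """Serialize a Nova request with instructions in the `system` field (placeholder defaults)."""
    body = {
        "system": [{"text": system_prompt}],
        "messages": [{"role": "user", "content": [{"text": user_text}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": temperature},
    }
    return json.dumps(body)

def parse_nova_response(raw: str) -> tuple[str, dict]:
    """Join all text blocks and surface stop reason and token usage (assumed field names)."""
    payload = json.loads(raw)
    blocks = payload["output"]["message"]["content"]
    text = "".join(block.get("text", "") for block in blocks)
    meta = {"stop_reason": payload.get("stopReason"), "usage": payload.get("usage", {})}
    return text, meta

# Mocked response body; in the notebook this string comes from response['body'].read()
fake = json.dumps({
    "output": {"message": {"role": "assistant", "content": [{"text": "Check the fluid color."}]}},
    "stopReason": "end_turn",
    "usage": {"inputTokens": 120, "outputTokens": 6, "totalTokens": 126},
})
text, meta = parse_nova_response(fake)
```

Checking the stop reason is one way to notice when an answer was cut off by `maxTokens` rather than finished naturally.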

In [4]:
# Run Custom RAG Demo
print("🔧 Custom RAG Demo with AWS Bedrock Nova Pro")
print("=" * 50)

# Initialize components
collection = connect_to_vector_store()
embedding_model = SentenceTransformer(EMBEDDING_MODEL_NAME)

# Choose a random question
question = random.choice(SAMPLE_QUESTIONS)

print(f"\n📝 User Question: {question}")
print("=" * 50)

# Retrieve relevant context and docs
context, retrieved_docs = retrieve_relevant_context(question, collection, embedding_model)

# Print retrieved documents
print("\n📚 Retrieved Documents:")
print("-" * 30)
for i, doc in enumerate(retrieved_docs, 1):
    print(f"{i}. Title: {doc['title']}")
    print(f"   Category: {doc['category']}")
    print(f"   Content: {doc['content']}")
    print()

# Query Bedrock with context
response = query_bedrock_with_context(question, context)

print("🤖 Final Response:")
print("-" * 30)
print(response)
print("=" * 50)

🔧 Custom RAG Demo with AWS Bedrock Nova Pro





📝 User Question: How do I know if my transmission fluid needs changing?

📚 Retrieved Documents:
------------------------------
1. Title: Transmission Fluid Change Intervals
   Category: maintenance
   Content: Proper transmission maintenance is crucial for vehicle longevity and performance. Automatic transmission fluid should be changed every 30,000 to 60,000 miles depending on driving conditions and vehicl...

2. Title: Oil Change Intervals and Specifications
   Category: maintenance
   Content: Proper engine oil maintenance is the most important factor in engine longevity and performance. Oil change intervals vary based on driving conditions, with normal conditions requiring changes every 7,...

3. Title: Brake Pad Replacement Procedures
   Category: repair
   Content: Brake pad replacement is a critical safety procedure that requires proper tools and techniques to ensure safe operation. Before beginning work, inspect the brake rotors for scoring, warping, or excess...

🤖 Final Resp

## Part 3: LangChain RAG Implementation

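This implementation uses LangChain's Chroma and Hugging Face embedding integrations, composed with LangChain's pipe operator (`|`), for a more structured approach. Generation still goes through a small custom boto3 wrapper around Nova Pro.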
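LangChain's `|` operator chains steps so each step's output feeds the next, and a plain dict of steps fans the same input out to every value. The following deliberately simplified, stdlib-only model of that behavior makes the data flow of the RAG chain explicit. The `Step` class and `as_step` helper are hypothetical and are not LangChain's implementation. The stub retriever, prompt, and LLM are placeholders.

```python
from typing import Any, Callable

class Step:
    """Minimal stand-in for a Runnable: wraps a function and supports `|`."""
    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: Any) -> "Step":
        # Compose left to right: the output of self becomes the input of other
        nxt = as_step(other)
        return Step(lambda v: nxt.invoke(self.invoke(v)))

    def __ror__(self, other: Any) -> "Step":
        # Lets a plain dict sit on the left of `|`, as in the notebook's chain
        return as_step(other) | self

def as_step(obj: Any) -> Step:
    """Coerce a Step, a dict of steps, or a plain callable into a Step."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        # Fan-out: run every value on the same input and collect results by key
        steps = {k: as_step(v) for k, v in obj.items()}
        return Step(lambda v: {k: s.invoke(v) for k, s in steps.items()})
    return Step(obj)

# Stubs mirroring the shape of the notebook's chain (all hypothetical)
retriever = Step(lambda q: [f"doc about {q}"])
format_docs = Step(lambda docs: "\n\n".join(docs))
passthrough = Step(lambda q: q)
prompt = Step(lambda d: f"Context: {d['context']}\n\nQuestion: {d['question']}")
fake_llm = Step(lambda p: p.upper())

chain = {"context": retriever | format_docs, "question": passthrough} | prompt | fake_llm
print(chain.invoke("brakes"))
```

The question string reaches both the retriever branch and the passthrough branch. The prompt step receives a dict with both keys filled.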

### 🔄 Kernel Restart Required

**⚠️ IMPORTANT**: To avoid ChromaDB settings conflicts between the Custom RAG and LangChain implementations, we need to restart the kernel before proceeding to the LangChain section.

**What this cell does**:
- Automatically restarts the Jupyter kernel
- Clears all variables and imports from memory
- Ensures a clean environment for the LangChain implementation

**After running this cell**:
1. The kernel will restart automatically
2. Then proceed with the **LangChain RAG Implementation** section

**Why is this necessary?**
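ChromaDB caches one client system per persistence path for the life of the Python process. The custom implementation opens that path with `anonymized_telemetry=False`, while LangChain's `Chroma` wrapper builds its own client settings. When a second client requests the same path with different settings, ChromaDB raises an error instead of reusing the cached instance. Restarting the kernel clears that cache.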
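The following simplified, stdlib-only model shows the failure mode: the first client for a path fixes its settings, and a later request for the same path with different settings is rejected rather than reused. The `get_system` registry and the `./chroma_db` path are hypothetical. ChromaDB's real logic lives in its client classes and differs in detail.

```python
_systems: dict[str, dict] = {}

def get_system(path: str, settings: dict) -> dict:
    """Return the cached system for `path`, refusing a settings mismatch."""
    cached = _systems.get(path)
    if cached is None:
        _systems[path] = dict(settings)   # first caller fixes the settings for this path
        return _systems[path]
    if cached != settings:
        raise ValueError(f"An instance already exists for {path} with different settings")
    return cached

get_system("./chroma_db", {"anonymized_telemetry": False})       # custom RAG client
try:
    get_system("./chroma_db", {"anonymized_telemetry": True})    # second client, other settings
except ValueError as err:
    print(err)
```

An alternative worth checking, though not used in this notebook, is to create one `chromadb.PersistentClient` and pass it to LangChain's `Chroma` through its `client` argument, so both sections share the same settings.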

---

**Run the cell below to restart the kernel:**


In [1]:
import IPython
print("🔄 Restarting kernel to clear ChromaDB settings...")
IPython.Application.instance().kernel.do_shutdown(True)


🔄 Restarting kernel to clear ChromaDB settings...


{'status': 'ok', 'restart': True}

In [1]:
# Re-import shared libraries and configuration (the kernel restart cleared them)
import chromadb
from sentence_transformers import SentenceTransformer
import boto3
import json
import random
from config import (
    AWS_REGION, BEDROCK_MODEL_ID, EMBEDDING_MODEL_NAME, 
    CHROMA_DB_PATH, COLLECTION_NAME, MAX_TOKENS, TEMPERATURE, 
    TOP_K_RESULTS, SAMPLE_QUESTIONS
)

In [2]:
# Import LangChain libraries
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

In [3]:
# Minimal Bedrock Nova Pro wrapper; it is not a LangChain LLM subclass, so the chain calls it through RunnableLambda
class BedrockNovaLLM:
    def __init__(self):
        self.client = boto3.client('bedrock-runtime', region_name=AWS_REGION)
        self.model_id = BEDROCK_MODEL_ID
    
    def invoke(self, prompt: str) -> str:
        request_body = {
            "messages": [
                {
                    "role": "user",
                    "content": [{"text": prompt}]
                }
            ],
            "inferenceConfig": {
                "maxTokens": MAX_TOKENS,
                "temperature": TEMPERATURE
            }
        }
        
        response = self.client.invoke_model(
            modelId=self.model_id,
            body=json.dumps(request_body)
        )
        
        response_body = json.loads(response['body'].read())
        return response_body['output']['message']['content'][0]['text']

In [4]:
def setup_langchain_rag():
    """Set up the LangChain RAG pipeline with pipe operators"""
    # Initialize embeddings
    embeddings = HuggingFaceEmbeddings(model_name=EMBEDDING_MODEL_NAME)
    
    # Connect to existing ChromaDB using LangChain Chroma
    vectorstore = Chroma(
        persist_directory=CHROMA_DB_PATH,
        embedding_function=embeddings,
        collection_name=COLLECTION_NAME
    )
    
    # Create retriever from vectorstore
    retriever = vectorstore.as_retriever(search_kwargs={"k": TOP_K_RESULTS})
    
    # Initialize Bedrock LLM
    llm = BedrockNovaLLM()
    
    # Create prompt template
    template = """You are an expert automotive technician and customer service representative.
Use the provided context from car manufacturing documentation to answer questions accurately and helpfully.
Focus on practical advice and safety considerations. If the context doesn't contain relevant information,
say so clearly.

Context: {context}

Question: {question}

Answer:"""
    
    prompt = PromptTemplate.from_template(template)
    
    # Format documents function
    def format_docs(docs):
        formatted = []
        for doc in docs:
            metadata = doc.metadata
            content = doc.page_content
            formatted.append(f"Title: {metadata.get('title', 'N/A')}\nCategory: {metadata.get('category', 'N/A')}\nContent: {content}")
        return "\n\n".join(formatted)
    
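    # Create the RAG chain with LangChain pipe operators:
    # the question goes both to the retriever (then format_docs) and through unchanged,
    # the prompt template fills both slots, and the rendered prompt text is sent to Nova
    rag_chain = (
        {
            "context": retriever | RunnableLambda(format_docs),
            "question": RunnablePassthrough()
        }
        | prompt
        | RunnableLambda(lambda prompt_value: llm.invoke(prompt_value.to_string()))
    )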
    
    return rag_chain, retriever

In [5]:
# Run LangChain RAG Demo
print("🔗 LangChain RAG Demo with Pipe Operators and AWS Bedrock Nova Pro")
print("=" * 65)

# Setup RAG chain and retriever
rag_chain, retriever = setup_langchain_rag()

# Choose a random question
question = random.choice(SAMPLE_QUESTIONS)

print(f"\n📝 User Question: {question}")
print("=" * 65)

# Retrieve and show documents (rag_chain runs the retriever again internally)
retrieved_docs = retriever.invoke(question)
print("\n📚 Retrieved Documents:")
print("-" * 30)
for i, doc in enumerate(retrieved_docs, 1):
    content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
    print(f"{i}. Title: {doc.metadata.get('title', 'N/A')}")
    print(f"   Category: {doc.metadata.get('category', 'N/A')}")
    print(f"   Content: {content_preview}")
    print()

# Use the RAG chain with LangChain pipe operators
response = rag_chain.invoke(question)

print("🤖 Final Response:")
print("-" * 30)
print(response)
print("=" * 65)

🔗 LangChain RAG Demo with Pipe Operators and AWS Bedrock Nova Pro





📝 User Question: What are the signs of brake system problems?

📚 Retrieved Documents:
------------------------------
1. Title: Brake System Warranty Coverage
   Category: warranty
   Content: Our comprehensive brake system warranty covers all major brake components for a period of 36 months or 36,000 miles, whichever comes first. This warranty includes brake pads, brake rotors, brake calip...

2. Title: Brake Pad Replacement Procedures
   Category: repair
   Content: Brake pad replacement is a critical safety procedure that requires proper tools and techniques to ensure safe operation. Before beginning work, inspect the brake rotors for scoring, warping, or excess...

3. Title: Suspension System Diagnostics and Repair
   Category: repair
   Content: The suspension system is critical for vehicle handling, ride comfort, and tire wear. Common suspension problems include worn shock absorbers, damaged struts, worn ball joints, and deteriorated bushing...

🤖 Final Response:
----------------
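The demo cell calls the retriever twice per question: once to print the documents and again inside `rag_chain`. The following stdlib-only sketch retrieves once and reuses the same documents for both display and generation. `answer_with_sources` is a hypothetical helper. `fake_retrieve` and the lambda are stand-ins for the LangChain retriever and `BedrockNovaLLM.invoke`. Within LangChain itself, the same idea is usually expressed with `RunnablePassthrough.assign`, which adds computed keys to the input dict.

```python
from typing import Callable

def answer_with_sources(question: str,
                        retrieve: Callable[[str], list[str]],
                        generate: Callable[[str], str]) -> dict:
    """Run retrieval once and return the documents alongside the answer built from them."""
    docs = retrieve(question)
    context = "\n\n".join(docs)
    prompt = f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"
    return {"question": question, "docs": docs, "answer": generate(prompt)}

calls = []  # records every retrieval so the single-call behavior is observable

def fake_retrieve(q: str) -> list[str]:
    calls.append(q)
    return ["Brake Pad Replacement Procedures", "Brake System Warranty Coverage"]

result = answer_with_sources("What are the signs of brake system problems?",
                             fake_retrieve, lambda p: f"{len(p)} chars of prompt")
```

Returning the documents with the answer also guarantees that the printed sources are exactly the ones the model saw.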

## Summary

This notebook demonstrated two approaches to building RAG applications:

### Custom RAG Implementation
- ✅ Direct control over all components
- ✅ Simple and straightforward
- ✅ Easy to customize and debug
- ❌ More manual work required

### LangChain RAG Implementation
- ✅ Built-in integrations and abstractions
- ✅ Pipe operators for clean chain composition
- ✅ Standardized patterns and interfaces
- ✅ Rich ecosystem of components
- ❌ Additional abstraction layer

Both approaches answer questions with AWS Bedrock Nova Pro and print the retrieved documents, so you can inspect the context each response was generated from.