### Clear memory

In [1]:
%reset -f
import gc
gc.collect()

0

### Import

In [None]:
import pandas as pd
import chromadb
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
import torch
from langchain_anthropic import ChatAnthropic
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.output_parsers import StrOutputParser
import os
from dotenv import load_dotenv
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
%matplotlib inline

In [3]:
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=UserWarning)

# Data Loading
Load chunks, prepared in the first notebook

In [5]:
chunks = pd.read_json('../data/processed/chunks.json', orient='records')

chunks.head(3)

Unnamed: 0,chunk_id,text,page_num,char_count,start_char,end_char
0,0,User Guide AWS Toolkit for Microsoft Azure Dev...,1,134,0,134
1,1,AWS Toolkit for Microsoft Azure DevOps User Gu...,2,422,0,422
2,2,"s likely to cause confusion among customers, o...",2,260,322,822


In [6]:
print('Number of chunks:', chunks.shape[0])

Number of chunks: 569


# Embeddings Creation
Embeddings creation is different with LangChain. We need special wrapper that will be used by LangChain when it is needed (no need to explicitly manually create an embedding for each chunk). Here the same sentence transformer `all-MiniLM-L6-v2` model is used that comes within HuggingFace LangChain package. 

In [7]:
embedding_function = HuggingFaceEmbeddings(
    model_name='sentence-transformers/all-MiniLM-L6-v2',
    model_kwargs={'device': device},
    encode_kwargs={'normalize_embeddings': True, 'batch_size': 32}
)

Test the function with a text sample

In [8]:
test_text = 'What is AWS?'
test_embedding = embedding_function.embed_query(test_text)

print('Embedding dimension:', len(test_embedding))

Embedding dimension: 384


# Vector Database Setup

### Convert Chunks to Documents

LangChain requires data in special `Document` format, so, let's turn each chunk into document 

In [9]:
documents = []

for idx, row in chunks.iterrows():
    doc = Document(
        page_content=row['text'],
        metadata={
            'chunk_id': int(row['chunk_id']),
            'page_num': int(row['page_num']),
            'char_count': int(row['char_count']),
            'start_char': int(row['start_char']),
            'end_char': int(row['end_char']),
        }
    )
    documents.append(doc)


Look at the first document 

In [10]:
documents[0]

Document(metadata={'chunk_id': 0, 'page_num': 1, 'char_count': 134, 'start_char': 0, 'end_char': 134}, page_content='User Guide AWS Toolkit for Microsoft Azure DevOps Copyright © 2025 Amazon Web Services, Inc. and/or its aﬃliates. All rights reserved.')

### Delete Old Collection

In [11]:
try:
    client = chromadb.PersistentClient(path='../data/chromadb')
    client.delete_collection('aws_docs_langchain')
except:
    pass

### Create Vector Store 
Create a Vector Store using LangChain. LangChain will automatically embed all documents using provided embedding function and put them into ChromaDB collection. Also, we set up a cosine similarity as a search metric


In [12]:
vectorstore = Chroma.from_documents(
    documents=documents,
    embedding=embedding_function,
    collection_name='aws_docs_langchain',
    persist_directory='../data/chromadb',
    collection_metadata={'hnsw:space': 'cosine'},
)

print('Name of the collection:', vectorstore._collection.name)
print('Number of documents in collection:', vectorstore._collection.count())

Name of the collection: aws_docs_langchain
Number of documents in collection: 569


# Sematic Search 

There is no need to implement semantic search with LangChain, but we can test it

In [13]:
test_query = 'If I do not have an AWS account, what do I do?'

search_results = vectorstore.similarity_search_with_score(query=test_query, k=3)

print('Query:', test_query)
print('-' * 100)

for i,s in enumerate(search_results, 1):
    print(f'Rank: {i} | Similarity: {1-s[1]:.3} | Page: {s[0].metadata['page_num']} | Chunk ID: {search_results[0][0].metadata['chunk_id']}')
    print(s[0].page_content, end='\n\n')

Query: If I do not have an AWS account, what do I do?
----------------------------------------------------------------------------------------------------
Rank: 1 | Similarity: 0.699 | Page: 10 | Chunk ID: 77
WS account 1. Open https://portal.aws.amazon.com/billing/signup. 2. Follow the online instructions. Part of the sign-up procedure involves receiving a phone call or text message and entering a veriﬁcation code on the phone keypad. When you sign up for an AWS account, an AWS account root user is created. The root user has access to all AWS services and resources in the account. As a security best practice, assign administrative access to a user, and use only the root user to perform tasks that re

Rank: 2 | Similarity: 0.639 | Page: 109 | Chunk ID: 77
WS, see Troubleshooting AWS identity and access or the user guide of the AWS service you are using. Service administrator – If you're in charge of AWS resources at your company, you probably have full access to AWS. It's your job to d

# Prompt Engineering
LangChain uses `PromptTemplate` to define reusable prompt structures. 

### Create Custom Prompt Template
Same prompt template is used here as in previous notebook. Input variables must be put into figure brackets.

In [14]:
prompt_template = """You are an expert at answering questions about Amazon Web Services documentation.

INSTRUCTIONS:
1. Read all context chunks from documentation carefully
2. Identify which chunks contain relevant information
3. Synthesize a clear answer using ONLY the provided context
4. Do NOT use your general knowledge and do not make assumptions
5. Cite page numbers for each piece of information
6. Explicitly state if the answer is not in the provided context
7. Write in PLAIN TEXT without any formatting (no bold, no italics, no markdown syntax like ** or __)
8. You may use line breaks and simple numbering/bullet points for clarity

CONTEXT CHUNKS FROM DOCUMENTATION:
{context}

USER QUESTION:
{query}

Think step-by-step, then provide your final ANSWER only without steps.

ANSWER:"""

Use LangChain Prompt Template. Define input variable names 

In [15]:
PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=['context', 'query'],
)

### Test the Prompt Template

In [16]:
sample_question = 'If I do not have an AWS account, what do I do?'
sample_context = 'If you do not have an AWS account, go to AWS website and create it'

formatted_prompt = PROMPT.format(
    context=sample_context,
    query=sample_question,
)

print(formatted_prompt)


You are an expert at answering questions about Amazon Web Services documentation.

INSTRUCTIONS:
1. Read all context chunks from documentation carefully
2. Identify which chunks contain relevant information
3. Synthesize a clear answer using ONLY the provided context
4. Do NOT use your general knowledge and do not make assumptions
5. Cite page numbers for each piece of information
6. Explicitly state if the answer is not in the provided context
7. Write in PLAIN TEXT without any formatting (no bold, no italics, no markdown syntax like ** or __)
8. You may use line breaks and simple numbering/bullet points for clarity

CONTEXT CHUNKS FROM DOCUMENTATION:
If you do not have an AWS account, go to AWS website and create it

USER QUESTION:
If I do not have an AWS account, what do I do?

Think step-by-step, then provide your final ANSWER only without steps.

ANSWER:


# RAG Pipeline with LangChain

LangChain automates the entire RAG process by chaining components together. In this section there are three main elements:
1. Retriever
2. Prompt
3. LLM

### Setup LLM

Use the same LLM as in previous notebook. Initialize Claude API client through LangChain wrapper. Here we restrict model temperature to 0.3 to get more certain answers. 

In [17]:
load_dotenv()
api_key = os.getenv('ANTHROPIC_API_KEY')

LLM = ChatAnthropic(
    model='claude-haiku-4-5-20251001',
    temperature=0.3,
    max_tokens=500,
    anthropic_api_key=api_key,
)

### Create Retriever

Retriever is a LangChain interface for vectorstore that will be used in the chain.

In [18]:
retriever = vectorstore.as_retriever(
    search_type='similarity',
    search_kwargs={'k': 5},
)


### Document Formatting
Retriever returns the *k* number of documents. They have to be inserted into Prompt. Hence, they must be formatted and concatenated together as plain text. 

In [19]:
def format_retrieved_docs(docs):
    """
    Formats and concatenates the list of documents.
    Returns formatted string.
    """
    context = []
    for idx, doc in enumerate(docs, 1):
        context.append(f"[Context chunk {idx} - Page {doc.metadata['page_num']}]\n{doc.page_content}")
    return "\n\n".join(context)

### Build RAG Chain

Build RAG chain with LangChain. 

The first component uses parallel processing, because the Prompt expects both the context (output of the retriever) and the user query (input of the retriever). The query: 

The second component is the Prompt Template that was defined above. 

The third component is the LLM that generates responses, it was also defined above. 

In the end we use `StrOutputParser` that returns only LLM generated text response without metadata. 

In [20]:
retriever_step = RunnableParallel(
    {
        'context': retriever | format_retrieved_docs,
        'query': RunnablePassthrough(),
    }
)

chain = (retriever_step | PROMPT | LLM | StrOutputParser())


### Test the Pipeline
Test the pipeline with relevant question 

In [21]:
query = 'If I do not have an AWS account, what do I do?'

result = chain.invoke(query)

print(result)

Based on the provided documentation, to create an AWS account if you do not have one:

1. Open https://portal.aws.amazon.com/billing/signup
2. Follow the online instructions
3. As part of the sign-up procedure, you will receive a phone call or text message and need to enter a verification code on the phone keypad

When you complete the sign-up process, an AWS account root user will be created, which has access to all AWS services and resources in the account.

(Page 10)


Test the pipeline with irrelevant question 

In [None]:
query = 'How do I make a tasty pizza?'

result = chain.invoke(query)

print(result)

**Observation:** the model provides correct answer for relevant question and do not hallucinate after irrelevant question.

# Summary

In this notebook we built the complete RAG pipeline using LangChain with the same quality results as manual approach. The same steps were created: 
- embedding wrapper function for chunks 
- database to store embeddings and chunks using `Document` chunk structure
- semantic search using cosine similarity 
- clear and well-structured prompt template for LLMs
- complete manual RAG pipeline by chaining components using LCEL (LangChain Expression Language)

## Comparison of Approaches

| Aspect         | Manual (NB2)          | LangChain (NB3)   |
|----------------|-----------------------|-------------------|
| Code Length    | ~150 lines            | ~50 lines         |
| Flexibility    | Full control          | Less control      |
| Debugging      | Easy to trace         | Black box         |
| Learning Curve | Understand RAG deeply | Faster to start   |
| Production     | Custom solutions      | Standard patterns |
| Maintenance    | More effort           | Framework updates |
| Best use case  | Research and learning | Fast prototyping  |


