### Clear memory

In [27]:
%reset -f
import gc
gc.collect()

0

### Import

In [28]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
import pymupdf, json, requests, re, sys
from sentence_transformers import SentenceTransformer
from pathlib import Path
from tqdm.auto import tqdm
from typing import List, Tuple, Dict, Any, Optional
import chromadb
from chromadb.config import Settings
import anthropic
from anthropic import Anthropic
import os
from dotenv import load_dotenv
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
sns.set_style('whitegrid')
%matplotlib inline

In [29]:
import warnings
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)
warnings.filterwarnings('ignore', category=UserWarning)

In [30]:
sys.path.append('..')
from src.data_utils import *

# Data Loading
Load the chunks prepared in the first notebook.

In [31]:
chunks = pd.read_json('../data/processed/chunks.json', orient='records')

chunks.head(3)

Unnamed: 0,chunk_id,text,page_num,char_count,start_char,end_char
0,0,User Guide AWS Toolkit for Microsoft Azure Dev...,1,134,0,134
1,1,AWS Toolkit for Microsoft Azure DevOps User Gu...,2,422,0,422
2,2,"s likely to cause confusion among customers, o...",2,260,322,822


In [32]:
print('Number of chunks:', chunks.shape[0])

Number of chunks: 569


# Embeddings Creation
### Model Selection
This project uses the `all-MiniLM-L6-v2` model because it offers good embedding quality at a relatively small model size.


In [33]:
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')  # all-mpnet-base-v2

### Text Encoding
Convert each chunk into an embedding vector.

In [34]:
embeddings = embedding_model.encode(
    sentences = chunks['text'].tolist(),
    show_progress_bar = True,
)

print('Dimension of each embedding:', embeddings.shape[1])


Dimension of each embedding: 384


### Save Embeddings
Add the embeddings to the chunks dataframe.

In [35]:
chunks['embedding'] = list(embeddings)

chunks.head(3)

Unnamed: 0,chunk_id,text,page_num,char_count,start_char,end_char,embedding
0,0,User Guide AWS Toolkit for Microsoft Azure Dev...,1,134,0,134,"[-0.013883891, 0.039782137, -0.03466231, 0.017..."
1,1,AWS Toolkit for Microsoft Azure DevOps User Gu...,2,422,0,422,"[-0.05496494, 0.04391669, -0.036545783, 0.0084..."
2,2,"s likely to cause confusion among customers, o...",2,260,322,822,"[-0.008677264, -0.07277807, 0.006473142, -0.08..."


# Vector Database Setup
Create a database to store embeddings and speed up similarity search. 
### Initialize Client
This project uses the ChromaDB client because it is free and does not require an external server. `PersistentClient` stores the embeddings on disk rather than in RAM.

In [36]:
client = chromadb.PersistentClient(
    path='../data/chromadb',
    settings=Settings(
        anonymized_telemetry=False,
        allow_reset=True,
    )
)

### Create Collection
Create a collection called `aws_docs` inside the database, deleting the old one first if it exists. Cosine similarity is used as the metric.

In [37]:
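try:
    # Drop the collection left over from a previous run, if any
    client.delete_collection(name='aws_docs')
except Exception:
    # The collection does not exist yet (the exception type varies by ChromaDB version)
    pass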

collection = client.create_collection(
    name = 'aws_docs',
    configuration = {'hnsw': {'space': 'cosine'}},
    metadata = {'description': 'AWS VSTS documentation chunks'},
)

### Fill Collection

First, unpack the ids, embeddings, chunk texts, and chunk metadata.

In [38]:
ids = list(map(str, chunks['chunk_id'].tolist()))
embeddings_list = chunks['embedding'].tolist()
documents = chunks['text'].tolist()
metadatas = chunks[['page_num', 'char_count', 'start_char', 'end_char']].to_dict('records')

Add to collection

In [39]:
collection.add(
    ids=ids,
    embeddings=embeddings_list,
    documents=documents,
    metadatas=metadatas,
)

### Verify Storage

Fetch one document by id (here `99`) from the database to verify that everything is stored correctly.

In [40]:
sample = collection.get(
    ids=['99'],
    include=['documents', 'metadatas', 'embeddings']
)

print('ID:', *sample['ids'])
print('Page:', sample['metadatas'][0]['page_num'])
print('Char quantity:', sample['metadatas'][0]['char_count'])
print('Embedding shape:', sample['embeddings'][0].shape[0])
print('Text preview:', sample['documents'][0][:200])

ID: 99
Page: 15
Char quantity: 441
Embedding shape: 384
Text preview: ironment variables. These variables can be used to get credentials from a custom credentials store. The following are all the supported standard named AWS environment variables: • AWS_ACCESS_KEY_ID – 


Everything is stored fine!

# Semantic Search

Semantic search finds the chunks whose meaning is closest to a query. Closeness is usually measured by the cosine similarity between the query embedding and each chunk embedding.
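To make the idea concrete, here is a minimal sketch of ranking by cosine similarity with NumPy. The vectors are toy 3-dimensional values chosen for illustration, not real model embeddings, and `cosine_similarity` is a hypothetical helper rather than part of this project's code.

```python
import numpy as np

def cosine_similarity(query_vec: np.ndarray, chunk_vecs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and each row of chunk_vecs."""
    query_unit = query_vec / np.linalg.norm(query_vec)
    chunk_units = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    # Dot product of unit vectors equals the cosine of the angle between them
    return chunk_units @ query_unit

# Toy "embeddings" (illustrative values only)
toy_chunks = np.array([
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 0.0, 1.0],
])
toy_query = np.array([1.0, 1.0, 0.0])

scores = cosine_similarity(toy_query, toy_chunks)
ranking = np.argsort(-scores)  # chunk indices, most similar first
print(scores.round(3), ranking)
```

In the notebook itself this ranking is delegated to ChromaDB, which uses an approximate nearest-neighbour index instead of comparing the query against every chunk.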

### Formulate a Search Query

In [41]:
query = 'If I do not have an AWS account, what do I do?'

### Perform Semantic Search
Use a custom function that returns the 3 best chunks by default.

In [42]:
search_results = semantic_search(
    query,
    n_results=3,
    model=embedding_model,
    collection=collection,
)
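The body of `semantic_search` lives in `src.data_utils` and is not shown in this notebook. The sketch below is one plausible shape for such a helper, not the project's actual implementation: the name `semantic_search_sketch` and the keys of the returned dictionaries are assumptions. It relies only on `model.encode` and ChromaDB's `collection.query`.

```python
from typing import Any, Dict, List

def semantic_search_sketch(query: str, model: Any, collection: Any,
                           n_results: int = 3) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for src.data_utils.semantic_search."""
    # Embed the query with the same model that embedded the chunks
    query_embedding = model.encode([query])[0]
    # ChromaDB returns one inner list per query, so index 0 is used below
    raw = collection.query(
        query_embeddings=[[float(x) for x in query_embedding]],
        n_results=n_results,
        include=['documents', 'metadatas', 'distances'],
    )
    results = []
    rows = zip(raw['ids'][0], raw['documents'][0],
               raw['metadatas'][0], raw['distances'][0])
    for rank, (chunk_id, text, meta, dist) in enumerate(rows, start=1):
        results.append({
            'rank': rank,
            'chunk_id': chunk_id,
            'text': text,
            'page_num': meta['page_num'],
            # With the 'cosine' space, the reported distance is 1 - similarity
            'similarity': 1.0 - dist,
        })
    return results
```

Converting distance back to similarity keeps the printed scores on the familiar scale where higher means closer.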

### Print Results
Use a custom function for pretty-printing.

In [43]:
print_search_results(results=search_results, query=query)

Query: If I do not have an AWS account, what do I do?
----------------------------------------------------------------------------------------------------
Rank 1 | Similarity: 0.699 | Page: 10 | Chunk ID: 77 | Text preview below (first 300 chars):
WS account 1. Open https://portal.aws.amazon.com/billing/signup. 2. Follow the online instructions. Part of the sign-up procedure involves receiving a phone call or text message and entering a veriﬁcation code on the phone keypad. When you sign up for an AWS account, an AWS account root user is crea...

Rank 2 | Similarity: 0.639 | Page: 109 | Chunk ID: 521 | Text preview below (first 300 chars):
WS, see Troubleshooting AWS identity and access or the user guide of the AWS service you are using. Service administrator – If you're in charge of AWS resources at your company, you probably have full access to AWS. It's your job to determine which AWS features and resources your service users shoul...

Rank 3 | Similarity: 0.633 | Page: 109 | Chunk 

### Test It One More Time

In [44]:
query = 'What if I want to allow people outside of my AWS account to access my AWS resources?'
search_results = semantic_search(
    query,
    n_results=5,
    model=embedding_model,
    collection=collection,
)
print_search_results(results=search_results, query=query)

Query: What if I want to allow people outside of my AWS account to access my AWS resources?
----------------------------------------------------------------------------------------------------
Rank 1 | Similarity: 0.695 | Page: 114 | Chunk ID: 549 | Text preview below (first 300 chars):
perform: iam:PassRole In this case, Mary's policies must be updated to allow her to perform the iam:PassRole action. If you need help, contact your AWS administrator. Your administrator is the person who provided you with your sign-in credentials. I want to allow people outside of my AWS account to ...

Rank 2 | Similarity: 0.681 | Page: 114 | Chunk ID: 550 | Text preview below (first 300 chars):
r organization can use to access your resources. You can specify who is trusted to assume the role. For services that support resource-based policies or access control lists (ACLs), you can use those policies to grant people access to your resources. To learn more, consult the following: • To learn ...

Rank 3 

# Prompt Engineering
A good prompt should:
- keep a clear structure of instructions;
- define the role of the LLM;
- make the LLM follow the instructions;
- prevent the LLM from hallucinating;
- make the LLM provide citations (e.g., document page numbers).

Here is the template that the custom function will use when generating an answer.

In [45]:
prompt_template = """You are an expert at answering questions about Amazon Web Services documentation.

INSTRUCTIONS:
1. Read all context chunks from documentation carefully
2. Identify which chunks contain relevant information
3. Synthesize a clear answer using ONLY the provided context
4. Do NOT use your general knowledge and do not make assumptions
5. Cite page numbers for each piece of information
6. Explicitly state if the answer is not in the provided context
7. Write in PLAIN TEXT without any formatting (no bold, no italics, no markdown syntax like ** or __)
8. You may use line breaks and simple numbering/bullet points for clarity

CONTEXT CHUNKS FROM DOCUMENTATION:
{context_block}

USER QUESTION:
{query}

Think step-by-step, then provide your final ANSWER only without steps.

ANSWER:"""

Create the prompt from the most recent query and its search results.

In [46]:
prompt = create_prompt(query, search_results)
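`create_prompt` is also defined in `src.data_utils` and is not shown here. The sketch below illustrates how such a function might fill a template: each retrieved chunk is labelled with its page number so the LLM can cite it. The function name, the shortened template, and the `page_num`/`text` keys are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Shortened stand-in for the full template (illustrative only)
TEMPLATE_SKETCH = """CONTEXT CHUNKS FROM DOCUMENTATION:
{context_block}

USER QUESTION:
{query}

ANSWER:"""

def create_prompt_sketch(query: str, results: List[Dict[str, Any]],
                         template: str = TEMPLATE_SKETCH) -> str:
    """Hypothetical stand-in for src.data_utils.create_prompt."""
    context_parts = []
    for i, r in enumerate(results, start=1):
        # Label every chunk with its page so the answer can cite it
        context_parts.append(f"[Chunk {i} | Page {r['page_num']}]\n{r['text']}")
    context_block = '\n\n'.join(context_parts)
    return template.format(context_block=context_block, query=query)

example = create_prompt_sketch(
    'How do I sign up?',
    [{'page_num': 10, 'text': 'Open the sign-up page.'}],
)
print(example)
```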

# Answer Generating
Due to the computational constraints of the local machine, this project uses an external LLM.
### Setup Anthropic API
First, set up access to the LLM via an API. This project uses the Claude API.

In [47]:
load_dotenv()
api_key = os.getenv('ANTHROPIC_API_KEY')

# Note: this rebinds `client`, which previously held the ChromaDB client
client = Anthropic(api_key=api_key)

### Use LLM to Generate Answer
Send the prompt to Claude and get an answer. This example uses the Haiku 4.5 model because it is the fastest and cheapest option. The temperature is set to 0.3 to make the output less creative and more deterministic. Claude Sonnet 4.5 is an alternative: it is slower and more expensive but gives higher-quality responses.

In [48]:
message = client.messages.create(
    model = 'claude-haiku-4-5-20251001',
    max_tokens = 500,
    messages = [{'role': 'user', 'content': prompt}],
    temperature = 0.3,
)
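As a rough illustration of why a lower temperature makes output more deterministic, the sketch below applies the textbook temperature-scaled softmax to made-up logits. It is a generic formulation, not a description of how the Claude API samples internally, and `softmax_with_temperature` is a hypothetical helper.

```python
import numpy as np

def softmax_with_temperature(logits, temperature: float) -> np.ndarray:
    """Turn logits into probabilities after dividing by the temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (1.0, 0.3):
    print(t, softmax_with_temperature(toy_logits, t).round(3))
```

At the lower temperature most of the probability mass moves to the highest-scoring token, so repeated runs are more likely to pick the same continuation.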

Print the answer

In [49]:
print(message.content[0].text)

According to the AWS documentation, if you want to allow people outside of your AWS account to access your AWS resources, you have several options:

1. Create a role that users in other accounts or people outside your organization can use to access your resources. You can specify who is trusted to assume the role.

2. For services that support resource-based policies or access control lists (ACLs), you can use those policies to grant people access to your resources.

The documentation references three specific guides for different scenarios:

- To provide access to your resources across AWS accounts that you own, see Providing access to an IAM user in another AWS account that you own in the IAM User Guide.

- To provide access to your resources to third-party AWS accounts, see Providing access to AWS accounts owned by third parties in the IAM User Guide.

- To provide access through identity federation, see Providing access to externally authenticated users (identity federation) in the

**Observation:** the model returns a clear result with page references.
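The printed answer stops mid-sentence, which may mean generation hit the `max_tokens = 500` limit. One way to check is the `stop_reason` field on the response. The helper below is a sketch: `is_truncated` is a hypothetical name, and a stand-in object is used here in place of a real API response.

```python
from types import SimpleNamespace

def is_truncated(message) -> bool:
    """Return True if generation stopped because the max_tokens limit was hit."""
    return getattr(message, 'stop_reason', None) == 'max_tokens'

# Stand-in object for illustration; a real one comes from client.messages.create(...)
fake_message = SimpleNamespace(stop_reason='max_tokens')
if is_truncated(fake_message):
    print('Answer was cut off; consider raising max_tokens.')
```

Calling `is_truncated(message)` on the real response would show whether raising `max_tokens` is worthwhile.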

# Manual RAG Pipeline
### Build Complete RAG Pipeline

In [50]:
def rag_pipeline(query: str,
                 embedding_model: SentenceTransformer,
                 collection: Collection,
                 n_results: int = 3,
                 llm_name: str = 'claude-haiku-4-5-20251001',
                 temperature: float = 0.3,
                 max_tokens: int = 500,
                 ) -> Dict[str, Any]:
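    """
    Complete RAG pipeline: retrieve relevant chunks, build the prompt,
    and generate an answer. Uses the global Anthropic `client` defined above.
    """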

    # Step 1: Semantic search
    search_results = semantic_search(
        query = query,
        model = embedding_model,
        collection = collection,
        n_results = n_results,
    )

    # Step 2: Create prompt
    prompt = create_prompt(query, search_results)

    # Step 3: Generate answer
    message = client.messages.create(
        model = llm_name,
        max_tokens = max_tokens,
        messages = [{'role': 'user', 'content': prompt}],
        temperature = temperature,
    )

    answer = message.content[0].text

    # Step 4: Return
    return {
        'query': query,
        'answer': answer,
        'search_results': search_results,
        'prompt': prompt,
    }


### Test the Pipeline
Test the pipeline with a relevant question.

In [51]:
# Test query 1
query = 'If I do not have an AWS account, what do I do?'

result = rag_pipeline(
    query=query,
    embedding_model=embedding_model,
    collection=collection,
    n_results=5,
)

print('Question:')
print(result['query'])
print('-'*100)
print('Answer:')
print(result['answer'])

Question:
If I do not have an AWS account, what do I do?
----------------------------------------------------------------------------------------------------
Answer:
Based on the provided documentation, if you do not have an AWS account, you should follow these steps (Page 10):

1. Open https://portal.aws.amazon.com/billing/signup
2. Follow the online instructions
3. Complete the sign-up procedure, which involves receiving a phone call or text message and entering a verification code on the phone keypad

When you complete the sign-up process, an AWS account root user will be created, which will have access to all AWS services and resources in the account.


Test the pipeline with an irrelevant question.

In [26]:
query = 'How do I make a tasty pizza?'

result = rag_pipeline(
    query=query,
    embedding_model=embedding_model,
    collection=collection,
    n_results=3,
)

print('Question:')
print(result['query'])
print('-'*100)
print('Answer:')
print(result['answer'])

Question:
How do I make a tasty pizza?
----------------------------------------------------------------------------------------------------
Answer:
The provided documentation chunks do not contain any information about making pizza. These chunks appear to be from AWS documentation table of contents or reference materials, containing only structural elements like "Synopsis," "Description," and "Parameters" sections.

I cannot answer your question based on the provided context, as it is not related to Amazon Web Services documentation.


**Observation:** the model provides a correct answer to the relevant question and does not hallucinate when asked an irrelevant one.

# Summary
This notebook built:
- embeddings for the chunks
- a database to store the embeddings and chunks
- semantic search using cosine similarity
- a clear and well-structured prompt for LLMs
- a complete manual RAG pipeline that generated accurate, non-hallucinated responses in the tests above