# Chapter 7 - Open-source Frameworks: RAG Pipeline with LlamaIndex

## Overview
This notebook demonstrates how to build a Retrieval-Augmented Generation (RAG) pipeline using LlamaIndex integrated with Amazon Bedrock. We'll explore how to create document indexes, perform semantic search, and generate contextually relevant responses.

## Introduction
The pipeline loads a PDF, splits it into chunks, embeds the chunks with an Amazon Titan embedding model, and stores the vectors in a FAISS index. At query time, the most relevant chunks are retrieved and passed to Claude 3 Sonnet on Amazon Bedrock, which answers questions based on your own data.

## Prerequisites
- AWS account with Amazon Bedrock access
- Model access granted in Amazon Bedrock for Claude 3 Sonnet and the Titan text embedding model
- PDF document for knowledge extraction

## Setup

### Install Required Dependencies

In [None]:
%pip install llama-index --quiet
%pip install llama-index-llms-bedrock --quiet
%pip install llama-index-embeddings-bedrock --quiet
%pip install llama-index-embeddings-huggingface --quiet

In [None]:
# Note: you may need to restart the kernel to use updated packages.

## Additional Dependencies and Imports

In [None]:
%pip install --upgrade pydantic --quiet

In [None]:
%pip install llama-index-readers-file --quiet

In [None]:
%pip install llama-index-vector-stores-faiss --quiet

In [None]:
%pip install llama-index-llms-bedrock-converse --quiet

### Import Required Libraries

In [None]:
from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage
)
from llama_index.core.settings import Settings
from llama_index.llms.bedrock_converse import BedrockConverse  # Updated import
from llama_index.embeddings.bedrock import BedrockEmbedding, Models

## Configure Foundation Models

### Setup LLM and Embedding Models

In [None]:
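# Initialize Claude 3 Sonnet as the LLM (accessed through the Bedrock Converse API)
llm = BedrockConverse(model="anthropic.claude-3-sonnet-20240229-v1:0")
# Initialize the Titan text embedding model used to embed chunks and queries
embedding = BedrockEmbedding(model="amazon.titan-embed-text-v1")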

In [None]:
from llama_index.readers.file import PDFReader
from llama_index.vector_stores.faiss import FaissVectorStore

# Configure the global defaults that LlamaIndex uses for indexing and querying
Settings.llm = llm
Settings.embed_model = embedding
Settings.chunk_size = 512  # maximum chunk size, in tokens
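
The chunk size controls how much text each retrieved passage carries. To make the idea concrete, here is a minimal standalone sketch of fixed-size chunking with overlap. It is not LlamaIndex's splitter: it counts whitespace-separated words rather than tokens, and the function name `chunk_words` and its `overlap` parameter are hypothetical, chosen only for illustration.

```python
def chunk_words(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears intact in one of the two chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_words(sample, chunk_size=4, overlap=1):
        print(chunk)
```

Smaller chunks tend to give more precise matches but less context per match, which is the trade-off to keep in mind when tuning `Settings.chunk_size`.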

## Data Processing

### Load and Process Documents

In [None]:
from pathlib import Path

# Load the PDF; by default PDFReader returns one Document per page
documents = PDFReader().load_data(file=Path("data/generative-ai-report.pdf"))

## Create Vector Index

### Initialize FAISS Vector Store

In [None]:
import faiss

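# Embedding dimension of amazon.titan-embed-text-v1 is 1536.
# (Titan Text Embeddings V2 defaults to 1024; adjust d if you switch models.)
d = 1536
faiss_index = faiss.IndexFlatL2(d)

In [None]:
# Wrap the FAISS index in a vector store and attach it through a StorageContext.
# A bare vector_store keyword is not picked up by from_documents(), so the
# FAISS index would otherwise go unused.
vector_store = FaissVectorStore(faiss_index=faiss_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=embedding,
)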
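
`faiss.IndexFlatL2` performs an exhaustive search: it compares the query against every stored vector and ranks by squared Euclidean distance. The sketch below reproduces that behavior in plain Python to show what the index computes. The class name `FlatL2Index` is hypothetical and this is not how FAISS is implemented internally; it is only an illustration under that assumption.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class FlatL2Index:
    """Toy brute-force index (hypothetical, for illustration only)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: list[list[float]] = []

    def add(self, vector: list[float]) -> None:
        # A flat index has a fixed dimension, so every vector must match it.
        if len(vector) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(vector)}")
        self.vectors.append(list(vector))

    def search(self, query: list[float], k: int) -> list[tuple[float, int]]:
        """Return up to k (distance, position) pairs, nearest first."""
        if len(query) != self.dim:
            raise ValueError(f"expected dimension {self.dim}, got {len(query)}")
        scored = sorted(
            (squared_l2(query, v), i) for i, v in enumerate(self.vectors)
        )
        return scored[:k]


if __name__ == "__main__":
    toy = FlatL2Index(dim=2)
    for vec in ([0, 0], [1, 0], [5, 5]):
        toy.add(vec)
    print(toy.search([1, 1], k=2))
```

The dimension check is the reason the value passed to `faiss.IndexFlatL2` has to equal the output size of the embedding model: vectors of any other length cannot be added to the index.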

## Query the RAG Pipeline

### Create Query Engine and Ask Questions

In [None]:
# Build a query engine over the index and ask a question.
# Tip: set logging to DEBUG to see more detailed output.
query_engine = index.as_query_engine()
response = query_engine.query("Who are the participants?")

In [None]:
print(response)

In [None]:
query_engine = index.as_query_engine()
response = query_engine.query("List the large language models mentioned in this document")
print(response)
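
Behind `query_engine.query(...)`, the retrieved chunks are combined with the question into a single prompt for the LLM. The following sketch shows one way that augmentation step could look. The template wording, the function name `build_prompt`, and the `max_chars` budget are all assumptions made for illustration; LlamaIndex uses its own prompt templates.

```python
def build_prompt(question: str, chunks: list[str], max_chars: int = 2000) -> str:
    """Assemble a grounded prompt from retrieved chunks (hypothetical template).

    Chunks are assumed to be ordered from most to least relevant. They are
    added in that order until the character budget would be exceeded.
    """
    context_parts = []
    used = 0
    for number, chunk in enumerate(chunks, start=1):
        entry = f"[{number}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break
        context_parts.append(entry)
        used += len(entry)

    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("Who attended?", ["Alice attended.", "Bob attended."]))
```

Numbering the chunks makes it easier to ask the model to cite which passage supports its answer, though that is a design choice rather than a requirement.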

## Conclusion

In this notebook, we built a Retrieval-Augmented Generation (RAG) pipeline using LlamaIndex and Amazon Bedrock's foundation models. Claude 3 Sonnet handles generation and Titan Text Embeddings provides the semantic representation, giving us a question-answering system that grounds its responses in the content of a specific document.

The workflow we implemented demonstrates the core components of an effective RAG system:
1. Document ingestion and chunking
2. Vector embedding generation
3. Vector storage in a FAISS index
4. Semantic retrieval of relevant context
5. Augmented generation with retrieved information

This approach addresses one of the fundamental challenges of working with LLMs: providing factual, relevant answers based on specific knowledge sources rather than general pretrained information. Grounding generation in retrieved passages can improve the reliability and contextual accuracy of AI-generated responses.

The pipeline we built is flexible and can be extended to handle various document types, knowledge domains, and use cases. By adjusting parameters like chunk size, retrieval methods, or prompt engineering techniques, you can further optimize the system for your specific needs.

For production deployments, consider implementing additional components such as:
- Persistent storage for your vector index
- Monitoring for response quality
- User feedback mechanisms
- Hybrid retrieval approaches combining semantic and keyword search
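
As an illustration of the last item, one common way to combine a semantic ranking with a keyword ranking is reciprocal rank fusion (RRF), which scores each document by summing `1 / (k + rank)` over the rankings it appears in. The sketch below is a generic, standalone version and is not part of the pipeline built in this notebook. The function name and the example document IDs are hypothetical, and `k = 60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each ranking lists document IDs from best to worst. A document's fused
    score is the sum of 1 / (k + rank) over every ranking that contains it.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    semantic_hits = ["a", "b", "c"]  # e.g. from vector similarity search
    keyword_hits = ["b", "d", "a"]   # e.g. from a keyword search
    print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
```

Because RRF uses only rank positions, it avoids having to normalize similarity scores and keyword scores onto a common scale.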

RAG represents a powerful paradigm for building AI applications that combine the flexibility of generative models with the reliability of information retrieval systems, enabling more trustworthy and useful AI assistants.