# Amazon Bedrock Knowledge Bases
- **Amazon Bedrock Knowledge Bases** is a **fully managed service** that simplifies the **entire RAG workflow**.
- It eliminates the need to **build custom integrations** and **manage data flows manually**.
- Enables **foundation models (FMs) and agents** to retrieve contextual **company-specific data** for **more accurate and personalized responses**.
- Supports **multi-turn conversations** with **session context management**.

## **Steps to Create a Knowledge Base**
### **Step 1: Create Knowledge Base**
- Use the **AWS Management Console** to create a **new knowledge base**.

![image.png](attachment:image.png)

### **Step 2: Configure Details & Permissions**
- Provide a **name** and **description**.
- Create or select an **IAM service role** (the runtime role) that allows Amazon Bedrock to access other AWS services on your behalf.

![image.png](attachment:image.png)

### **Step 3: Select Data Sources**
- Ingest content from **repositories** like:
  - **Amazon S3**
  - **Confluence (preview)**
  - **Salesforce (preview)**
  - **SharePoint (preview)**
- Amazon Bedrock **automatically fetches documents** from these sources.

![image.png](attachment:image.png)

### **Step 4: Choose an Embeddings Model**
- Select an **embeddings model** to convert documents into **vector representations**.

![image.png](attachment:image.png)

### **Step 5: Select a Vector Database**
- Store embeddings in:
  - **Amazon OpenSearch Serverless**
  - **Pinecone**
  - **Redis Cloud**
  - **Amazon Aurora**
  - **MongoDB Atlas**
- **Knowledge Bases also manages workflow complexities**:
  - **Content comparison**
  - **Failure handling**
  - **Throughput control**
  - **Encryption and security**
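
The console steps above map roughly onto two calls in the `bedrock-agent` API: `create_knowledge_base` and `create_data_source`. The sketch below only builds the request payloads, so it runs without AWS credentials. It assumes an OpenSearch Serverless vector store and an S3 data source; every ARN, name, and index field name here is a placeholder, not a value from these notes.

```python
def build_knowledge_base_request(name, description, role_arn,
                                 embedding_model_arn, collection_arn,
                                 index_name):
    """Build keyword arguments for bedrock-agent create_knowledge_base."""
    return {
        "name": name,
        "description": description,
        "roleArn": role_arn,  # Step 2: the runtime (service) role
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                # Step 4: the embeddings model
                "embeddingModelArn": embedding_model_arn,
            },
        },
        "storageConfiguration": {
            # Step 5: the vector database
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration": {
                "collectionArn": collection_arn,
                "vectorIndexName": index_name,
                # Field names are assumptions; they must match your index.
                "fieldMapping": {
                    "vectorField": "embedding",
                    "textField": "text",
                    "metadataField": "metadata",
                },
            },
        },
    }


def build_s3_data_source_request(knowledge_base_id, name, bucket_arn):
    """Build keyword arguments for bedrock-agent create_data_source (Step 3)."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {"bucketArn": bucket_arn},
        },
    }


# Placeholder values for illustration only.
kb_request = build_knowledge_base_request(
    name="example-kb",
    description="Example knowledge base",
    role_arn="arn:aws:iam::123456789012:role/ExampleBedrockKbRole",
    embedding_model_arn=(
        "arn:aws:bedrock:us-east-1::foundation-model/"
        "amazon.titan-embed-text-v2:0"
    ),
    collection_arn="arn:aws:aoss:us-east-1:123456789012:collection/example",
    index_name="example-index",
)

# With credentials configured, the payloads could be sent like this:
# import boto3
# agent = boto3.client("bedrock-agent")
# kb = agent.create_knowledge_base(**kb_request)
# kb_id = kb["knowledgeBase"]["knowledgeBaseId"]
# agent.create_data_source(**build_s3_data_source_request(
#     kb_id, "example-docs", "arn:aws:s3:::example-bucket"))
```

Keeping the payload builders separate from the API calls makes the configuration easy to inspect and test before anything is created in an account.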

## **Customize knowledge bases to deliver accurate responses at runtime**
- **Fine-tune data ingestion and retrieval** for better accuracy.
- **Advanced Parsing**:
  - Parses **unstructured data** (e.g., **PDFs, scanned images**) that contains **complex content** such as **tables**.
- **Chunking Strategies**:
  - Use **built-in** chunking (default, fixed size, hierarchical, semantic).
  - Implement **custom chunking** via **Lambda functions** or **LangChain/LlamaIndex**.
- **Query Reformulation**:
  - Improves **understanding of complex queries** for better retrieval.

## **Retrieve relevant data and augment prompts**
- **Retrieve API**:
  - Fetches **relevant knowledge** from **Amazon Bedrock Knowledge Bases**.
- **RetrieveAndGenerate API**:
  - **Directly augments the prompt** with retrieved data and returns the generated response.
- **Integration with Bedrock Agents**:
  - Provides **real-time contextual data** to **enhance AI agent responses**.
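
The two APIs above are exposed through the `bedrock-agent-runtime` client as `retrieve` and `retrieve_and_generate`. The sketch below builds the request payloads only, so it runs offline. The knowledge base ID and model ARN in the commented usage are placeholders.

```python
def build_retrieve_request(knowledge_base_id, query, top_k=5):
    """Keyword arguments for bedrock-agent-runtime retrieve."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    }


def build_retrieve_and_generate_request(knowledge_base_id, model_arn,
                                        query, session_id=None):
    """Keyword arguments for bedrock-agent-runtime retrieve_and_generate."""
    request = {
        "input": {"text": query},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": knowledge_base_id,
                "modelArn": model_arn,
            },
        },
    }
    # Passing the sessionId from a previous response continues a
    # multi-turn conversation; omit it to start a new session.
    if session_id is not None:
        request["sessionId"] = session_id
    return request


# With credentials configured (placeholder IDs):
# import boto3
# runtime = boto3.client("bedrock-agent-runtime")
# hits = runtime.retrieve(**build_retrieve_request("KB123", "What is the refund policy?"))
# for hit in hits["retrievalResults"]:
#     print(hit["content"]["text"])
# answer = runtime.retrieve_and_generate(
#     **build_retrieve_and_generate_request("KB123", "<model-arn>", "What is the refund policy?"))
# print(answer["output"]["text"], answer["sessionId"])
```

Use `retrieve` when you want to build the prompt yourself, and `retrieve_and_generate` when the managed retrieval-plus-generation flow is sufficient.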

## **Provide source attribution**
- Amazon Bedrock **provides citations** for retrieved information.
- Users can **view source details** to verify responses, which builds trust and helps **catch hallucinations**.

![image.png](attachment:image.png)
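
Citations come back in the `citations` field of a `RetrieveAndGenerate` response. The helper below is a small sketch that pairs each generated text segment with the sources it was grounded in. The sample response is hand-written for illustration and trimmed to the fields the helper reads; it is not real service output.

```python
def extract_citations(response):
    """Pair each generated text segment with its source locations."""
    results = []
    for citation in response.get("citations", []):
        part = citation.get("generatedResponsePart", {}).get(
            "textResponsePart", {}
        )
        sources = []
        for ref in citation.get("retrievedReferences", []):
            location = ref.get("location", {})
            uri = location.get("s3Location", {}).get("uri")
            # Fall back to the location type for non-S3 sources.
            sources.append(uri or location.get("type", "UNKNOWN"))
        results.append({"text": part.get("text", ""), "sources": sources})
    return results


# Hand-written sample in the shape of a RetrieveAndGenerate response.
sample_response = {
    "output": {"text": "Employees accrue 20 vacation days per year."},
    "citations": [
        {
            "generatedResponsePart": {
                "textResponsePart": {
                    "text": "Employees accrue 20 vacation days per year."
                }
            },
            "retrievedReferences": [
                {
                    "content": {"text": "Full-time employees accrue 20 days."},
                    "location": {
                        "type": "S3",
                        "s3Location": {
                            "uri": "s3://example-bucket/hr/policy.pdf"
                        },
                    },
                }
            ],
        }
    ],
}

for item in extract_citations(sample_response):
    print(item["text"], "<-", ", ".join(item["sources"]))
```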

## **How Amazon Bedrock Knowledge Bases Work**
### **Pre-processing Data**
- **Splitting**: Documents are broken into **manageable chunks**.
- **Embedding**: Each chunk is converted into a **vector representation** by the embeddings model.
- **Storage**: Embeddings are **written to a vector index** in the vector database, together with a mapping back to the original document.
- The vector index enables **semantic search**, and the chunk-to-document mapping lets results be traced to their sources.

![image.png](attachment:image.png)
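
The pre-processing steps can be sketched in a few lines of plain Python. This is a toy illustration, not how Bedrock does it: `toy_embed` is a made-up hashed bag-of-words stand-in for a real embeddings model, and the "vector database" is just a list of dictionaries.

```python
import math
import zlib


def toy_embed(text, dim=64):
    """Toy stand-in for an embeddings model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(documents, chunk_words=8):
    """Split each document, embed each chunk, and store it with its source."""
    index = []
    for source, text in documents.items():
        words = text.split()
        for start in range(0, len(words), chunk_words):
            chunk = " ".join(words[start:start + chunk_words])
            index.append({
                "vector": toy_embed(chunk),  # Embedding
                "text": chunk,               # Splitting
                "source": source,            # Mapping to the original document
                "offset": start,
            })
    return index


docs = {
    "policy.txt": "Refunds are issued within 30 days of purchase when a receipt is provided",
    "faq.txt": "Shipping takes five business days",
}
index = build_index(docs)
```

Storing `source` and `offset` next to each vector is what later allows a retrieved chunk to be cited back to its document.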

### **Runtime**
- The **user query** is converted into **a vector** using the same embeddings model that was used during ingestion.
- The **vector index** finds **semantically similar** document chunks.
- The **retrieved chunks** are added to the **user’s prompt**.
- The **augmented prompt** is sent to the **FM for final response generation**.

![image.png](attachment:image.png)
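
The runtime flow can be illustrated with a small similarity search and prompt builder. This sketch assumes the query has already been embedded and that the index is an in-memory list; the two-dimensional vectors, the prompt wording, and the function names are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(query_vector, index, k=2):
    """Return the k index entries most similar to the query vector."""
    scored = [(cosine(query_vector, entry["vector"]), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:k]]


def augment_prompt(question, chunks):
    """Prepend the retrieved chunks to the user's question."""
    context = "\n".join(
        f"[{i}] {c['text']} (source: {c['source']})"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Tiny hand-made index; real embeddings have hundreds of dimensions.
index = [
    {"vector": [1.0, 0.0], "text": "Refunds take 30 days.", "source": "policy.txt"},
    {"vector": [0.0, 1.0], "text": "Shipping takes 5 days.", "source": "faq.txt"},
    {"vector": [0.9, 0.1], "text": "Refunds need a receipt.", "source": "policy.txt"},
]
prompt = augment_prompt(
    "How do refunds work?", top_k_chunks([1.0, 0.0], index, k=2)
)
# The augmented prompt would then be sent to the FM for generation.
```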

## **Optimizing RAG with Amazon Bedrock**
- **Improve Retrieval**:
  - Expand and refine datasets.
  - Use **advanced chunking strategies**.
- **Optimize Embeddings & VectorDB**:
  - Ensure **high-quality embeddings** for better semantic matching.
- **Enhance System Performance**:
  - Deliver the **most relevant information** while minimizing the amount of irrelevant data added to the prompt.
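
One simple way to keep irrelevant data out of the prompt is to drop low-scoring results before augmentation. The sketch below assumes results shaped like the `retrievalResults` list returned by the Retrieve API, each with a `score` field. The threshold and limit are arbitrary example values, and score ranges depend on the vector store, so they would need tuning per setup.

```python
def filter_by_score(retrieval_results, min_score=0.5, max_results=3):
    """Keep the highest-scoring results that meet a minimum relevance score."""
    kept = [r for r in retrieval_results if r.get("score", 0.0) >= min_score]
    kept.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return kept[:max_results]


# Hand-written results for illustration only.
results = [
    {"content": {"text": "Refunds take 30 days."}, "score": 0.82},
    {"content": {"text": "Office hours are 9 to 5."}, "score": 0.21},
    {"content": {"text": "Refunds need a receipt."}, "score": 0.67},
]
relevant = filter_by_score(results, min_score=0.5)
```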

## **Key Takeaways**
- **Amazon Bedrock Knowledge Bases** simplifies **RAG implementation**.
- **Automates data ingestion, retrieval, and augmentation**.
- **Supports multiple data sources** (S3, Confluence, Salesforce, etc.).
- **Offers flexible vector storage** (OpenSearch, Pinecone, Redis Cloud, etc.).
- **Optimizable for accuracy** through **chunking, query reformulation, and parsing**.
- **Enhances transparency** with **source attribution and citations**.