# Retrieval and Generation with Bedrock Foundation Models

### Overview  
This notebook demonstrates how to perform retrieval-augmented generation (RAG) using Amazon Bedrock foundation models. It covers retrieving relevant documents from a knowledge base and generating responses grounded in the retrieved context.

### Build your own Retrieval Augmented Generation (RAG) system
When you build your own RAG system, you combine a retriever with a generator. The retriever can be an embedding model that finds the most relevant chunks in the vector database based on similarity scores. The generator can be a large language model (LLM) that answers the question using the retrieved results (also known as chunks). The following sections also provide tips on optimizing the prompts for your RAG system.
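
As a minimal sketch of the retriever step, the snippet below ranks chunks by cosine similarity against a query vector. The toy three-dimensional vectors and the `top_k_chunks` helper are illustrative assumptions; a real system would obtain its vectors from an embedding model and search them in a vector database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunk_vecs, k):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings (hypothetical values, for illustration only).
chunks = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.1, 0.0]
best = top_k_chunks(query, chunks, k=2)
print(best)
```

The `k` argument plays the same role as the `knn_num` setting used later in this notebook: it caps how many chunks are passed on to the generator.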

# 🔍 Retrieval in Flotorch

[Flotorch](https://www.flotorch.ai/) is a real-time Retrieval-Augmented Generation (RAG) orchestration engine designed to streamline operational complexity and enhance observability in deploying AI workflows.

In Flotorch, **retrieval** refers to the process of fetching relevant information from external knowledge bases to augment the responses generated by language models. This ensures that the AI system provides accurate, timely, and context-aware answers by combining its pre-trained knowledge with up-to-date external data.

---

## 🔧 Key Components of Retrieval in Flotorch

1. **Retriever**  
   Searches external databases or knowledge sources to find relevant information based on the user's query.

2. **Augmentation**  
   Incorporates the retrieved data into the model's input to enhance the quality and relevance of the generated response.

3. **Generator**  
   Synthesizes a response by integrating the retrieved information with the model's existing knowledge.
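
The augmentation step can be sketched as plain string assembly: retrieved chunks are numbered and placed ahead of the question. The `build_augmented_prompt` helper and the prompt wording are assumptions made for illustration, not Flotorch's actual prompt template.

```python
def build_augmented_prompt(question, chunks):
    """Combine retrieved chunks and the user question into one prompt string."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt_text = build_augmented_prompt(
    "What is Amazon Bedrock?",
    ["Amazon Bedrock is a managed service for foundation models."],
)
print(prompt_text)
```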

---

## ✅ Benefits of Retrieval in Flotorch

- **Enhanced Accuracy**  
  Accesses real-time data to minimize the risk of outdated or incorrect information.

- **Contextual Understanding**  
  Provides responses that are tailored to the specific query, ensuring relevance and usefulness.

- **Scalability**  
  Efficiently handles large datasets and complex queries.

- **Cost-Effectiveness**  
  Reduces the need for frequent retraining by dynamically pulling in fresh data.

---

This retrieval mechanism is integral to Flotorch's ability to deliver precise and context-aware AI solutions across various industries.


## 🔧 Step 1: Load the AWS variables

In [1]:
import json
with open("variables.json", "r") as f:
    variables = json.load(f)

variables

{'accountNumber': '677276078734',
 'regionName': 'us-east-1',
 'collectionArn': 'arn:aws:aoss:us-east-1:677276078734:collection/8jt7139u7r4fgi1o7w8d',
 'collectionId': '8jt7139u7r4fgi1o7w8d',
 'vectorIndexName': 'ws-index-',
 'bedrockExecutionRoleArn': 'arn:aws:iam::677276078734:role/advanced-rag-workshop-bedrock_execution_role-us-east-1',
 's3Bucket': '677276078734-us-east-1-advanced-rag-workshop',
 'kbFixedChunk': 'IMXM4XCO1G'}
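
Later cells read `ground_truth_path`, `kbFixedChunk`, and `vectorIndexName` from `variables`, so it can help to check for missing keys right after loading. The sketch below uses a hypothetical `missing_keys` helper and only the key names from the output above, with values replaced by placeholders.

```python
def missing_keys(variables, required):
    """Return the required keys that are absent from the variables dict."""
    return [key for key in required if key not in variables]

# Key names copied from the output above; values replaced by placeholders.
loaded = {
    "accountNumber": "...", "regionName": "...", "collectionArn": "...",
    "collectionId": "...", "vectorIndexName": "...",
    "bedrockExecutionRoleArn": "...", "s3Bucket": "...", "kbFixedChunk": "...",
}
# Keys that the experiment configuration reads from `variables`.
required = ["ground_truth_path", "kbFixedChunk", "vectorIndexName"]
print(missing_keys(loaded, required))  # → ['ground_truth_path']
```

With the variables shown above, `variables["ground_truth_path"]` would raise a `KeyError`, so that entry needs to be present in `variables.json` before the experiment configuration cell runs.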

## Configuring package paths

In [4]:
import sys
import os
print(os.getcwd())
base_path1 = os.path.abspath(os.path.join(os.getcwd(), "flotorchcore"))
base_path2 = os.path.abspath(os.path.join(os.getcwd(), "flotorchcore","flotorchretriever"))
# base_path3 = os.path.abspath(os.path.join(os.getcwd(), "flotorchcore","fargate"))
base_path4 = os.path.abspath(os.path.join(os.getcwd(), "flotorchretriever", "fargate"))
sys.path.append(os.getcwd())
sys.path.append(base_path1)
sys.path.append(base_path2)
# sys.path.append(base_path3)
sys.path.append(base_path4)

/Users/fl_lpt-301/Documents/flotorchnotebooks


In [5]:
sys.path

['/usr/local/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python310.zip',
 '/usr/local/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10',
 '/usr/local/Cellar/python@3.10/3.10.14_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/lib-dynload',
 '',
 '/Users/fl_lpt-301/Documents/projects/crag/crag_ravi/CRAG/crag_env/lib/python3.10/site-packages',
 '/Users/fl_lpt-301/Documents/flotorchnotebooks',
 '/Users/fl_lpt-301/Documents/flotorchnotebooks/flotorchcore',
 '/Users/fl_lpt-301/Documents/flotorchnotebooks/flotorchcore/flotorchretriever',
 '/Users/fl_lpt-301/Documents/flotorchnotebooks',
 '/Users/fl_lpt-301/Documents/flotorchnotebooks/flotorchcore',
 '/Users/fl_lpt-301/Documents/flotorchnotebooks/flotorchcore/flotorchretriever',
 '/Users/fl_lpt-301/Documents/flotorchnotebooks/flotorchretriever/fargate']
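
The output above lists several directories twice, which happens when the path cell is run more than once. A minimal sketch of an idempotent alternative follows; `add_path_once` is a hypothetical helper, shown on a plain list so it does not modify the real `sys.path`.

```python
import os

def add_path_once(path_list, new_path):
    """Append the absolute form of new_path only if it is not already listed."""
    normalized = os.path.abspath(new_path)
    if normalized not in path_list:
        path_list.append(normalized)
        return True
    return False

# Demonstrated on a plain list; pass sys.path instead to use it for real.
paths = []
first = add_path_once(paths, "flotorchcore")
second = add_path_once(paths, "flotorchcore")
print(first, second, len(paths))  # → True False 1
```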

## Load the prompt JSON

In [6]:
prompt_file_path = './data/prompt.json'
with open(prompt_file_path, 'r') as f:
    prompt = json.load(f)

## Sample experiment JSON

In [7]:
exp_config_data = {
            "temp_retrieval_llm": "0.1",
            "gt_data": variables["ground_truth_path"],
            "eval_retrieval_model": "bedrock/amazon.titan-text-express-v1",
            "rerank_model_id": "none",
            "embedding_model": "amazon.titan-embed-text-v2:0",
            "bedrock_knowledge_base": "nsd9kkl5lkjp5j7c885m",
            "kb_data": variables['kbFixedChunk'],
            "retrieval_service": "bedrock",
            "knn_num": "3",
            "knowledge_base": True,
            "retrieval_model": "amazon.nova-pro-v1:0",
            "index_id": variables['vectorIndexName'],
            "gateway_api_key": "sk_MWY1MjY4OGEtMGUwYi00YjUxLTllY2UtY2M2NjM0ZWIyZDVm_dPgUOclec5GOphmUzDF4n8C2B5tLEztAWvRKBu1Z5Ps=",
            "vector_dimension": "1024",
            "experiment_id": "P1A8Q0LG",
            "n_shot_prompts": "1",
            "gateway_enabled": False,
            "gateway_url": "https://qa-gateway.flotorch.cloud",
            "chunking_strategy": "Fixed",
            "aws_region": "us-east-1",
            "n_shot_prompt_guide_obj": prompt
        }
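
Several numeric settings in this configuration are stored as strings (`"knn_num": "3"`, `"temp_retrieval_llm": "0.1"`), which is why later cells wrap them in `int(...)` and `float(...)`. As a sketch, the conversion could be done once up front; `coerce_numeric_fields` is a hypothetical helper, not part of Flotorch.

```python
def coerce_numeric_fields(config, int_fields, float_fields):
    """Return a copy of config with the named fields converted to int or float."""
    typed = dict(config)
    for name in int_fields:
        if name in typed:
            typed[name] = int(typed[name])
    for name in float_fields:
        if name in typed:
            typed[name] = float(typed[name])
    return typed

# A subset of the experiment configuration, using the values shown above.
sample = {
    "knn_num": "3",
    "vector_dimension": "1024",
    "temp_retrieval_llm": "0.1",
    "n_shot_prompts": "1",
}
typed = coerce_numeric_fields(
    sample,
    int_fields=["knn_num", "vector_dimension", "n_shot_prompts"],
    float_fields=["temp_retrieval_llm"],
)
print(typed)
```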

## 🔍 Load env config

In [8]:
from flotorch_core.config.env_config_provider import EnvConfigProvider
from flotorch_core.config.config import Config

In [9]:
env_config_provider = EnvConfigProvider()
config = Config(env_config_provider)

### Load the Retriever class and other dependencies

In [14]:
from flotorchretriever.fargate.retriever_processor import Retriever
from flotorch_core.storage.storage_provider_factory import StorageProviderFactory
from flotorch_core.reader.json_reader import JSONReader
from flotorch_core.storage.db.vector.vector_storage_factory import VectorStorageFactory
from flotorch_core.inferencer.inferencer_provider_factory import InferencerProviderFactory
from flotorch_core.embedding.embedding_registry import embedding_registry

In [15]:
gt_data = exp_config_data['gt_data']
storage = StorageProviderFactory.create_storage_provider(gt_data)
gt_data_path = storage.get_path(gt_data)
json_reader = JSONReader(storage)

In [16]:
if exp_config_data.get("knowledge_base", False) and not exp_config_data.get("bedrock_knowledge_base", False):
    embedding_class = embedding_registry.get_model(exp_config_data.get("embedding_model"))
    embedding = embedding_class(
        exp_config_data.get("embedding_model"), 
        exp_config_data.get("aws_region"), 
        int(exp_config_data.get("vector_dimension")))
    is_opensearch_required = True
else:
    embedding = None
    is_opensearch_required = False

## 🗃️ Vector Storage Initialization

This section initializes the `VectorStorage` component using a factory method that dynamically selects the appropriate vector storage backend (e.g., OpenSearch, Bedrock Knowledge Base) based on the experiment configuration.

---

### 🛠️ `VectorStorageFactory.create_vector_storage(...)`

Creates an instance of vector storage using configuration flags and credentials.

- **Parameters:**
  - `knowledge_base`: *(bool)* – Whether a knowledge base is used as a backend.
  - `use_bedrock_kb`: *(bool)* – If set, uses AWS Bedrock Knowledge Base.
  - `embedding`: *(BaseEmbedding)* – Embedding generator to use for vector creation.
  - `opensearch_host`: *(str | None)* – OpenSearch host (set if required).
  - `opensearch_port`: *(int | None)* – OpenSearch port (set if required).
  - `opensearch_username`: *(str | None)* – OpenSearch authentication username.
  - `opensearch_password`: *(str | None)* – OpenSearch authentication password.
  - `index_id`: *(str | None)* – Identifier for the index to be used.
  - `knowledge_base_id`: *(str | None)* – ID of the Bedrock knowledge base.
  - `aws_region`: *(str | None)* – AWS region for Bedrock and related services.

---

### ⚙️ Dynamic Backend Selection

The factory method chooses the backend as follows:

- If `bedrock_knowledge_base` is enabled → connects to **Bedrock KB**.
- Else if `knowledge_base` is enabled → connects to **custom knowledge base**.
- Else if `is_opensearch_required` is true → initializes **OpenSearch** with provided credentials.
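
The priority order above can be sketched as a small function. This is only an illustration of the stated order, not the factory's actual implementation; `select_backend` and the string labels it returns are hypothetical.

```python
def select_backend(bedrock_knowledge_base, knowledge_base, is_opensearch_required):
    """Return a label for the backend chosen by the priority order listed above."""
    if bedrock_knowledge_base:
        return "bedrock_kb"
    if knowledge_base:
        return "custom_kb"
    if is_opensearch_required:
        return "opensearch"
    return None  # no backend configured

# Flags as set in this notebook: both knowledge base flags are truthy,
# so is_opensearch_required was computed as False.
print(select_backend("nsd9kkl5lkjp5j7c885m", True, False))  # → bedrock_kb
```

Note that any non-empty string counts as enabled here, which matches how the notebook passes the `bedrock_knowledge_base` value straight into `use_bedrock_kb`.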

---

### 📝 Result

Returns a configured `VectorStorage` instance ready for:
- KNN-based vector search
- Bedrock KB search
- Integration into QA or retrieval pipelines



In [17]:
vector_storage = VectorStorageFactory.create_vector_storage(
                knowledge_base=exp_config_data.get("knowledge_base", False),
                use_bedrock_kb=exp_config_data.get("bedrock_knowledge_base", False),
                embedding=embedding,
                opensearch_host=config.get_opensearch_host() if is_opensearch_required else None,
                opensearch_port=config.get_opensearch_port() if is_opensearch_required else None,
                opensearch_username='admin',
                opensearch_password='Flotorch@123',
                index_id=exp_config_data.get("index_id"),
                knowledge_base_id=exp_config_data.get("kb_data"),
                aws_region=exp_config_data.get("aws_region")
            )

## 🤖 Inferencer Initialization

This block initializes the **Inferencer** using a factory method that configures the inference engine for text generation or question answering based on the experiment configuration.

---

### 🏗️ `InferencerProviderFactory.create_inferencer_provider(...)`

Creates and returns an appropriate `Inferencer` instance depending on configuration such as API gateway usage, model settings, region, and credentials.

---

### 🔧 Parameters

- `gateway_enabled`: *(bool)* – Enables API gateway-based invocation if set to `True`.
- `gateway_url`: *(str)* – URL endpoint for the API Gateway (this notebook appends `/api/openai/v1` to the base gateway URL).
- `gateway_api_key`: *(str)* – API key for authenticating requests to the gateway.
- `retrieval_service`: *(str)* – Name of the retrieval service (e.g., `bedrock`, `sagemaker`).
- `retrieval_model`: *(str)* – The model to use for inference (e.g., `anthropic.claude-v2`).
- `aws_region`: *(str)* – AWS region for service provisioning (e.g., `us-east-1`).
- `iam_role`: *(str)* – IAM role ARN for Bedrock invocation permissions.
- `n_shot_prompts`: *(int)* – Number of few-shot examples to include in prompt.
- `temp_retrieval_llm`: *(float)* – Temperature setting for the language model.
- `n_shot_prompt_guide_obj`: *(Any)* – Few-shot guide object for prompt engineering.

---

### ⚙️ Behavior

- If `gateway_enabled` is `True`, connects to the specified API Gateway using credentials.
- If disabled, falls back to direct model invocation through supported services like AWS Bedrock.
- Supports dynamic few-shot prompting and custom temperature configuration.
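
The gateway-versus-direct decision described above can be sketched as follows. `resolve_inference_target` and the dictionary it returns are hypothetical and only illustrate the branching; the real factory returns an `Inferencer` object. Stripping a trailing slash is an addition of this sketch, not something the notebook does.

```python
def resolve_inference_target(gateway_enabled, gateway_url, retrieval_service):
    """Describe where a request would be sent under the behavior listed above."""
    if gateway_enabled:
        # As in the notebook, the /api/openai/v1 path is appended to the base URL.
        base = gateway_url.rstrip("/")
        return {"mode": "gateway", "endpoint": f"{base}/api/openai/v1"}
    # Gateway disabled: invoke the model directly through the named service.
    return {"mode": "direct", "service": retrieval_service}

# Values from this notebook's experiment configuration (gateway disabled).
print(resolve_inference_target(False, "https://qa-gateway.flotorch.cloud", "bedrock"))
```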

---

### 🎯 Outcome

Returns a fully configured `Inferencer` object capable of generating answers or completions for queries using the selected language model.



In [18]:
inferencer = InferencerProviderFactory.create_inferencer_provider(
                exp_config_data.get("gateway_enabled", False),
                f'{exp_config_data.get("gateway_url", "")}/api/openai/v1',
                exp_config_data.get("gateway_api_key", ""),
                exp_config_data.get("retrieval_service"),
                exp_config_data.get("retrieval_model"), 
                exp_config_data.get("aws_region"), 
                'arn:aws:iam::677276078734:role/flotorch-bedrock-role-mainqa',
                int(exp_config_data.get("n_shot_prompts", 0)), 
                float(exp_config_data.get("temp_retrieval_llm", 0)), 
                exp_config_data.get("n_shot_prompt_guide_obj")
            )

## 🔁 Reranker Initialization

This code conditionally initializes the **`BedrockReranker`**, which reorders retrieved documents based on relevance using a reranking model.

---

### 🏗️ `BedrockReranker(...)` Initialization

The reranker is only instantiated if a valid rerank model ID is provided in the experiment configuration.

---

### 🔧 Parameters

- `aws_region`: *(str)* – AWS region where the Bedrock reranking model is hosted.
- `rerank_model_id`: *(str)* – ID of the Bedrock reranking model to be used.

---

### ⚙️ Behavior

- If `rerank_model_id` is **not** `"none"` (case-insensitive), a `BedrockReranker` is created.
- If the value is `"none"`, no reranker is used and the value is set to `None`.

---

### 🎯 Outcome

- A `BedrockReranker` object if reranking is enabled.
- Otherwise, `reranker = None`.
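
A rough sketch of the same on/off logic follows, with a deliberately simple word-overlap scorer standing in for a real reranking model. Both `is_rerank_enabled` and `rerank_by_overlap` are hypothetical helpers; `BedrockReranker` itself calls a Bedrock model and is not reproduced here.

```python
def is_rerank_enabled(rerank_model_id):
    """True when a model ID is set and is not the string 'none' (any casing)."""
    return bool(rerank_model_id) and rerank_model_id.strip().lower() != "none"

def rerank_by_overlap(query, documents):
    """Stand-in reranker: order documents by how many query words they contain."""
    query_words = set(query.lower().split())
    return sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )

docs = ["shareholder diversity report", "amazon bedrock is a managed service"]
model_id = "none"  # same value as rerank_model_id in this notebook
query = "what is amazon bedrock"
ranked = rerank_by_overlap(query, docs) if is_rerank_enabled(model_id) else docs
print(ranked)
```

Unlike a bare `.lower()` call on the configuration value, `is_rerank_enabled` also tolerates a missing (`None`) model ID.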



In [19]:
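# NOTE: BedrockReranker is not imported anywhere in this notebook; import it
# before setting rerank_model_id to anything other than "none".
rerank_model_id = exp_config_data.get("rerank_model_id", "none")
reranker = BedrockReranker(exp_config_data.get("aws_region"), rerank_model_id) \
                if rerank_model_id.lower() != "none" \
                else None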

In [20]:
# processor = RetrieverProcessor(task_token="dummy-token", input_data=exp_config_data)
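# Compare case-insensitively, since the config above capitalizes the strategy name ("Fixed").
hierarchical = str(exp_config_data.get("chunking_strategy", "")).lower() == 'hierarchical'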
retriever = Retriever(json_reader, embedding, vector_storage, inferencer, reranker)
results = retriever.retrieve(
    gt_data_path, 
    "What is the patient's name?",
    int(exp_config_data.get("knn_num")), 
    hierarchical
)

2025-04-14 05:29:22,042 - ERROR - Error retrieving from Bedrock Knowledge Base: An error occurred (AccessDeniedException) when calling the Retrieve operation: Bedrock was unable to assume the specified role. Provide necessary permissions to Bedrock and retry the request.


In [26]:
# Convert each result object into a JSON-serializable dict.
final_results = [each_result.to_json() for each_result in results]

In [27]:
final_results

[{'question': 'What is Amazon Bedrock?',
  'answer': "Sorry, I don't have sufficient information to provide an answer.",
  'guardrails_output_assessment': None,
  'guardrails_context_assessment': None,
  'guardrails_input_assessment': None,
  'guardrails_blocked': False,
  'guardrails_block_level': '',
  'answer_metadata': {'inputTokens': 987,
   'outputTokens': 14,
   'totalTokens': 1001,
   'latencyMs': 320},
  'query_metadata': {'input_token': 0, 'latency_ms': 0},
  'reference_contexts': ["As part of our effort to improve the awareness of the importance of diversity in companies, we offer investors a glimpse into the transparency of more than just who are the shareholders at Amazon. We highlight the company&#x27;s commitment to diversity, inclusiveness, and social responsibility as ...As part of our effort to improve the awareness of the importance of diversity in companies, we offer investors a glimpse into the transparency of more than just who are the shareholders at Amazon. We h

In [29]:
with open("inference_metrics.json", "w") as json_file:
    json.dump(final_results, json_file, indent=4)
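
Once the results are saved, the per-answer metadata can be aggregated for a quick cost and latency check. The sketch below assumes every record carries an `answer_metadata` dict with the fields shown in the output above; `summarize_answer_metadata` is a hypothetical helper.

```python
def summarize_answer_metadata(results):
    """Sum token counts and average latency across result records."""
    total_in = sum(r["answer_metadata"]["inputTokens"] for r in results)
    total_out = sum(r["answer_metadata"]["outputTokens"] for r in results)
    latencies = [r["answer_metadata"]["latencyMs"] for r in results]
    return {
        "inputTokens": total_in,
        "outputTokens": total_out,
        "meanLatencyMs": sum(latencies) / len(latencies) if latencies else 0.0,
    }

# One record, trimmed to the metadata fields shown in the output above.
sample_results = [
    {"answer_metadata": {"inputTokens": 987, "outputTokens": 14,
                         "totalTokens": 1001, "latencyMs": 320}}
]
summary = summarize_answer_metadata(sample_results)
print(summary)
```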