# Jupyter Notebook: AWS Bedrock with LlamaIndex and Elasticsearch

This notebook sets up a project that integrates **AWS Bedrock**, **LlamaIndex**, and **Elasticsearch** for building a retrieval-augmented generation (RAG) application. The notebook provides step-by-step guidance on installation, setup, and implementation.

## 1. Clone the Repository

```sh
git clone https://github.com/GenMindHub/07-LiamaIndex.git
cd 07-LiamaIndex
```

## 2. Create and Activate Virtual Environment

```sh
python3 -m venv .liamaindex
source .liamaindex/bin/activate
```

## 3. Install Jupyter Lab (Optional)

If Jupyter Lab is not installed on your system, [install](https://jupyter.org/install) it, set a notebook password, and start it with the following commands:

```sh
python3 -m pip install -qU jupyterlab
python3 -m pip install -qU notebook
jupyter notebook password
jupyter lab
```

## 4. Install and Set Up Elasticsearch

Follow the instructions [here](https://www.elastic.co/guide/en/elasticsearch/reference/current/run-elasticsearch-locally.html) to install and run Elasticsearch locally.

## 5. Install AWS CLI and Configure IAM Credentials

Install AWS CLI using the guide [here](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

Configure your AWS IAM credentials:

```sh
aws configure
```
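
`aws configure` writes the keys you enter to the shared credentials file (by default `~/.aws/credentials`) in INI format. As a quick sanity check before running the notebook, the sketch below lists the profiles in that file that define both keys. The helper name `configured_profiles` is hypothetical, and the check only confirms that the keys are present, not that they are valid or allowed to call Bedrock.

```python
import configparser
from pathlib import Path


def configured_profiles(credentials_path: Path) -> list[str]:
    """Return profile names in an AWS shared credentials file that define both keys."""
    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist, so a missing file yields [].
    parser.read(credentials_path)
    required = ("aws_access_key_id", "aws_secret_access_key")
    return [
        name
        for name in parser.sections()
        if all(parser.has_option(name, key) for key in required)
    ]


# Default location used by the AWS CLI (assumes no AWS_SHARED_CREDENTIALS_FILE override).
print(configured_profiles(Path.home() / ".aws" / "credentials"))
```

An empty list suggests that `aws configure` has not been run for this user, or that credentials are supplied some other way (for example through environment variables).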

# Python Code Implementation

## Install Required Libraries

Explanation:

- `boto3`: The official AWS SDK for Python, which lets you interact with AWS services programmatically.
- `python-dotenv`: Loads environment variables from a `.env` file. Useful for keeping API keys, AWS credentials, and other sensitive data out of your script.
- `llama-index`: A framework for indexing and querying data with LLMs. Used for building retrieval-augmented generation (RAG) applications.
- `llama-index-vector-stores-elasticsearch`: Elasticsearch vector store integration for LlamaIndex. Lets you store and retrieve embeddings using Elasticsearch.
- `llama-index-embeddings-bedrock`: AWS Bedrock embeddings integration for LlamaIndex. Lets you generate text embeddings using AWS Bedrock foundation models.
- `llama-index-llms-bedrock`: AWS Bedrock LLM integration for LlamaIndex. Lets you use AWS-hosted foundation models for answering queries and processing text.

```python
%pip install boto3
%pip install python-dotenv
%pip install llama-index
%pip install llama-index-vector-stores-elasticsearch
%pip install llama-index-embeddings-bedrock
%pip install llama-index-llms-bedrock
```
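
After installing, it can help to confirm that the modules the notebook imports are visible to the kernel, because a kernel running outside the virtual environment is a common cause of `ModuleNotFoundError`. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module names are the ones used in the import cell of this notebook.

```python
import importlib.util


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names taken from the import cell of this notebook.
REQUIRED = [
    "boto3",
    "dotenv",
    "llama_index.core",
    "llama_index.vector_stores.elasticsearch",
    "llama_index.embeddings.bedrock",
    "llama_index.llms.bedrock",
]
print("Missing modules:", missing_modules(REQUIRED))
```

Note that the import name can differ from the package name: `python-dotenv` is imported as `dotenv`.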

## Import Required Libraries

```python
import os
# AWS SDK for Python; used to create the Bedrock runtime client
import boto3
# Loads AWS credentials and other environment variables from a .env file
from dotenv import load_dotenv

# Holds global configuration for LlamaIndex, such as the LLM and the embedding model
from llama_index.core import Settings
# Manages storage for vector embeddings and index metadata
from llama_index.core import StorageContext
# Reads files (e.g., PDF, TXT, or JSON) from a local directory for processing
from llama_index.core import SimpleDirectoryReader
# Creates an index that stores document embeddings to enable efficient search
from llama_index.core import VectorStoreIndex

# Wrap Python functions and query engines as tools the agent can call
from llama_index.core.tools import FunctionTool
from llama_index.core.tools import QueryEngineTool

# Uses AWS Bedrock embedding models (such as Titan) to generate text embeddings
from llama_index.embeddings.bedrock import BedrockEmbedding, Models
# Stores and retrieves document embeddings using Elasticsearch as a vector database
from llama_index.vector_stores.elasticsearch import ElasticsearchStore

# Uses AWS Bedrock's LLMs for answering queries and processing text
from llama_index.llms.bedrock import Bedrock
# A reasoning-and-acting agent that decides which tool to call at each step
from llama_index.core.agent import ReActAgent
```


## Function Definitions

### 1. Load and Parse PDF Documents

```python
def load_documents(pdf_folder: str):
    """Load every PDF file in `pdf_folder` as LlamaIndex documents."""
    reader = SimpleDirectoryReader(pdf_folder, required_exts=[".pdf"])
    documents = reader.load_data()
    return documents
```
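
`load_documents` passes `required_exts=[".pdf"]`, so only PDF files in the folder are read. Before calling it, you may want to check which files would qualify. The sketch below is a standard-library check, not part of LlamaIndex. The helper name `list_pdfs` is hypothetical, it looks only at the top level of the folder, and it matches the extension case-insensitively, which may differ from the reader's own matching.

```python
from pathlib import Path


def list_pdfs(folder: str) -> list[Path]:
    """Return the PDF files directly inside `folder`, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"Folder not found: {root}")
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

For example, `list_pdfs("./data")` returns an empty list when the folder exists but holds no PDFs, which is a cheap way to catch a wrong working directory before any embedding calls are made.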

### 2. Initialize Elasticsearch as Vector Store

```python
def initialize_elasticsearch():
    """Build an Elasticsearch vector store from settings in the environment."""
    host = os.getenv("ELASTIC_HOST")
    username = os.getenv("ELASTIC_USERNAME")
    password = os.getenv("ELASTIC_PASSWORD")
    index_name = os.getenv("INDEX_NAME")

    vector_store = ElasticsearchStore(
        index_name=index_name, es_url=host, es_user=username, es_password=password
    )

    return vector_store
```
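
`os.getenv` returns `None` for a variable that is not set, so a missing entry in `.env` only surfaces later as a connection or validation error. The sketch below reports missing settings up front. The variable names are the ones this notebook reads; the helper name `missing_env_vars` is hypothetical.

```python
import os
from typing import Mapping

# Environment variables read by the functions in this notebook.
REQUIRED_VARS = (
    "ELASTIC_HOST",
    "ELASTIC_USERNAME",
    "ELASTIC_PASSWORD",
    "INDEX_NAME",
    "AWS_BEDROCK_EMBEDDING_MODEL",
    "AWS_BEDROCK_LLM_MODEL",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Call `missing_env_vars()` right after `load_dotenv()` and stop if the result is not empty.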

### 3. Embed Documents and Store in Vector Store

```python
def create_index(documents, vector_store):
    """Embed the documents with Bedrock and store the vectors in Elasticsearch."""
    # Initialize the Bedrock embedding model
    boto3_bedrock_client = boto3.client(service_name="bedrock-runtime")
    bedrock_embedding_model = BedrockEmbedding(
        model_name=os.getenv("AWS_BEDROCK_EMBEDDING_MODEL"),
        client=boto3_bedrock_client,
    )
    # Set the embedding model globally through `Settings`
    Settings.embed_model = bedrock_embedding_model
    # Assign Elasticsearch as the vector store in the storage context
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    # Create the index; this embeds the documents and writes them to the store
    index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, show_progress=True
    )

    return index
```
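
To make the idea of retrieving by embedding concrete, here is a toy sketch of ranking stored vectors by cosine similarity to a query vector. It is an illustration only: the vectors are made up rather than produced by Bedrock, the names `cosine_similarity` and `top_k` are hypothetical, and the real search runs inside Elasticsearch, which may use a different metric or an approximate algorithm.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], stored: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda doc_id: cosine_similarity(query, stored[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Made-up 3-dimensional vectors standing in for real embeddings.
chunks = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.7, 0.7, 0.0],
    "chunk-c": [0.0, 0.0, 1.0],
}
print(top_k([0.9, 0.1, 0.0], chunks))  # ['chunk-a', 'chunk-b']
```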

### 4. Define Basic Math Functions

```python
def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product."""
    return a * b

def add(a: float, b: float) -> float:
    """Add two numbers and return the sum."""
    return a + b
```
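
These functions are later wrapped with `FunctionTool.from_defaults(fn=...)`, which receives nothing but the function itself. The name, type hints, and docstring are therefore the information available for describing the tool to the LLM, so it pays to keep them accurate. The sketch below shows how that information can be read with the standard library. It is an illustration, not LlamaIndex's implementation, and `describe_tool` is a hypothetical helper.

```python
import inspect


def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the product."""
    return a * b


def describe_tool(fn) -> str:
    """Build a description from a function's name, signature, and docstring."""
    signature = inspect.signature(fn)
    doc = inspect.getdoc(fn) or ""
    return f"{fn.__name__}{signature}\n{doc}"


print(describe_tool(multiply))
```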

### 5. Create Query Engine Tool

```python
def get_budget_tool(index, name, description):
    """Expose the index's query engine as a named tool for the agent."""
    query_engine = index.as_query_engine()
    budget_tool = QueryEngineTool.from_defaults(
        query_engine,
        name=name,
        description=description,
    )
    return budget_tool
```

### 6. Initialize ReAct Agent

```python
def get_react_agent(multiply_tool, add_tool, budget_tool):
    # No llm argument is passed, so the agent uses the LLM set in Settings.llm
    agent = ReActAgent.from_tools(
        [multiply_tool, add_tool, budget_tool], verbose=True
    )
    return agent
```
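
A ReAct agent alternates between reasoning about the question, choosing a tool with arguments, and reading the tool's result before deciding on the next step. The toy sketch below shows only the tool-dispatch part of that loop with a hard-coded sequence of actions. It is not how `ReActAgent` is implemented, and the names `TOOLS` and `run_actions` are hypothetical.

```python
from typing import Callable


def add(a: float, b: float) -> float:
    return a + b


def multiply(a: float, b: float) -> float:
    return a * b


# Registry mapping tool names to the functions they call.
TOOLS: dict[str, Callable[..., float]] = {"add": add, "multiply": multiply}


def run_actions(actions: list[tuple[str, dict]]) -> list[float]:
    """Execute scripted (tool_name, arguments) actions and collect the observations."""
    observations = []
    for tool_name, arguments in actions:
        if tool_name not in TOOLS:
            raise KeyError(f"Unknown tool: {tool_name}")
        observations.append(TOOLS[tool_name](**arguments))
    return observations


# In a real agent the LLM chooses each action after seeing the previous observation;
# here the sequence is hard-coded for illustration.
print(run_actions([("add", {"a": 2, "b": 3}), ("multiply", {"a": 5, "b": 3})]))  # [5, 15]
```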

## Execution Flow

```python
load_dotenv()

Settings.llm = Bedrock(model=os.getenv("AWS_BEDROCK_LLM_MODEL"), temperature=0, max_tokens=1000)

print("Loading documents...")
documents = load_documents("./data")  # path is relative to the current working directory
print(documents)

print("Initializing Elasticsearch...")
vector_store = initialize_elasticsearch()
print(vector_store)


print("Creating index and storing embeddings...")
index = create_index(documents, vector_store)

print("Get Basic Maths Functions and Query Engine Tools...")
multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)
budget_tool = get_budget_tool(
    index,
    "canadian_budget_2023",
    "A RAG engine with some basic facts about the 2023 Canadian federal budget."
)
print(budget_tool)

print("Initialize and Get ReAct Agent for Chat...")
agent = get_react_agent(multiply_tool, add_tool, budget_tool)
response = agent.chat(
    "What is the total amount of the 2023 Canadian federal budget multiplied by 3? Go step by step, using a tool to do any math."
)
print(response)
```

# Summary of workflow
- Load environment variables.
- Initialize AWS client for Bedrock.
- Set up LlamaIndex settings (embedding model, storage).
- Read documents from a directory.
- Create vector embeddings using AWS Bedrock.
- Store embeddings in Elasticsearch for retrieval.
- Set up Bedrock LLM for answering queries.
- Create tools and agents to enhance interactions.
- Use the ReAct agent to process and respond to user queries dynamically.