# Retrieval Augmented Generation (RAG) inference

***This notebook works best with the `conda_python3` kernel on an `ml.t3.large` instance***.

---

At this point, our slide deck data has been ingested into an Amazon OpenSearch Serverless collection. We are now ready to talk to our slide deck using a large multimodal model. We use the [Anthropic’s Claude 3 Sonnet foundation model](https://aws.amazon.com/about-aws/whats-new/2024/03/anthropics-claude-3-sonnet-model-amazon-bedrock/) for this purpose.

## Step 1. Setup

Install the required Python packages and import the relevant modules.

In [None]:
import sys
!{sys.executable} -m pip install -r requirements.txt

In [None]:
# import necessary libraries to run this notebook
import os
import io
import sys
import json
import yaml
import glob
import boto3
import base64
import logging
import requests
import botocore
import sagemaker
import opensearchpy
import numpy as np
import pandas as pd
import globals as g
from PIL import Image as PILImage  # aliased so it is not shadowed by IPython's Image below
from pathlib import Path
from typing import List, Dict, Optional
from IPython.display import Image, display
from urllib.parse import urlparse
from botocore.auth import SigV4Auth
from pandas.core.series import Series
from sagemaker import get_execution_role
from botocore.awsrequest import AWSRequest
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
from utils import get_cfn_outputs, get_text_embedding, get_llm_response, combined_llm_response

In [None]:
# set a logger
logging.basicConfig(format='[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s', level=logging.INFO)
logger = logging.getLogger(__name__)

In [None]:
# global constants
CONFIG_FILE_PATH = "config.yaml"
# read the config yaml file
fpath = CONFIG_FILE_PATH
with open(fpath, 'r') as yaml_in:
    config = yaml.safe_load(yaml_in)
logger.info(f"config read from {fpath} -> {json.dumps(config, indent=2)}")
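Before moving on, it can help to confirm that the loaded configuration contains the keys the rest of the notebook reads. The following is a minimal sketch under stated assumptions: `missing_keys` is a hypothetical helper, the sample dictionary stands in for the real `config.yaml` contents, and `aws.region` is a made-up key used only to show a missing entry (`aws.cfn_stack_name` is the only key taken from this notebook).

```python
from typing import Any, Dict, Iterable, List


def missing_keys(config: Dict[str, Any], dotted_keys: Iterable[str]) -> List[str]:
    """Return the dotted key paths that are absent from a nested config dict."""
    missing = []
    for dotted in dotted_keys:
        node: Any = config
        for part in dotted.split("."):
            if isinstance(node, dict) and part in node:
                node = node[part]
            else:
                # stop at the first missing level and record the full path
                missing.append(dotted)
                break
    return missing


# Stand-in for the dictionary returned by yaml.safe_load (the stack name is a placeholder)
sample_config = {"aws": {"cfn_stack_name": "my-stack"}}
print(missing_keys(sample_config, ["aws.cfn_stack_name", "aws.region"]))
```

In the notebook, a non-empty result from `missing_keys(config, [...])` would be a reasonable point to stop and fix `config.yaml` before querying AWS.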

In [None]:
!pygmentize globals.py

## Step 2. Create two OpenSearch clients for images and texts separately

We create two OpenSearch clients, one for the image index and one for the text index, so that we can query the vector database for the embeddings (generated from the PDF files) that are most similar to the questions we want to ask of our slide deck. The content retrieved from OpenSearch is then passed to Claude 3 on Amazon Bedrock to generate the answer.

Get the OpenSearch Serverless collection endpoint and the text and image index names from the CloudFormation stack outputs.

In [None]:
outputs = get_cfn_outputs(config['aws']['cfn_stack_name'])
host = outputs['MultimodalCollectionEndpoint'].split('//')[1]
text_index_name = outputs['OpenSearchTextIndexName']
img_index_name = outputs['OpenSearchImgIndexName']
logger.info(f"opensearchhost={host}, text index={text_index_name}, image index={img_index_name}")
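The `split('//')[1]` call above assumes the endpoint output always has the form `https://<host>` with no trailing path. A slightly more defensive alternative, sketched here with the standard library's `urlparse` (already imported in this notebook), might look like the following; `endpoint_to_host` is a hypothetical helper and the endpoint value is a made-up placeholder.

```python
from urllib.parse import urlparse


def endpoint_to_host(endpoint: str) -> str:
    """Extract the bare host name from an endpoint URL."""
    parsed = urlparse(endpoint)
    # urlparse only fills in hostname when a scheme is present; otherwise fall back
    # to everything before the first slash of the raw value
    host = parsed.hostname or endpoint.split("/")[0]
    if not host:
        raise ValueError(f"could not extract a host from {endpoint!r}")
    return host


# Placeholder endpoint, not a real collection
print(endpoint_to_host("https://abc123.example.com/"))
```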

In [None]:
session = boto3.Session()
credentials = session.get_credentials()
auth = AWSV4SignerAuth(credentials, g.AWS_REGION, g.OS_SERVICE)

# OpenSearch client for the image index
img_os_client = OpenSearch(
    hosts = [{'host': host, 'port': 443}],
    http_auth = auth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection,
    pool_maxsize = 20
)

# OpenSearch client for the text index
text_os_client = OpenSearch(
    hosts = [{'host': host, 'port': 443}],
    http_auth = auth,
    use_ssl = True,
    verify_certs = True,
    connection_class = RequestsHttpConnection,
    pool_maxsize = 20
)
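Both clients above are configured identically and differ only in the index they are later queried against. One way to avoid repeating the settings is to build the keyword arguments once. The sketch below only assembles the dictionary so that it runs without AWS access; `build_client_kwargs` is a hypothetical helper, not part of `opensearchpy`, and the values passed in are placeholders.

```python
from typing import Any, Dict


def build_client_kwargs(host: str, auth: Any, connection_class: Any = None,
                        pool_maxsize: int = 20) -> Dict[str, Any]:
    """Assemble the keyword arguments shared by both OpenSearch clients."""
    kwargs: Dict[str, Any] = {
        "hosts": [{"host": host, "port": 443}],
        "http_auth": auth,
        "use_ssl": True,
        "verify_certs": True,
        "pool_maxsize": pool_maxsize,
    }
    # only set the connection class when one is supplied
    if connection_class is not None:
        kwargs["connection_class"] = connection_class
    return kwargs


# Placeholder values; in this notebook they would be `host`, `auth` and RequestsHttpConnection
kwargs = build_client_kwargs("example-host", auth=None)
print(kwargs["hosts"])
```

In the notebook, each client could then be created with `OpenSearch(**build_client_kwargs(host, auth, RequestsHttpConnection))`.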

## Step 3. Ready for RAG

We now have all the pieces for RAG. Here is how we _talk to our slide deck_.

1. Convert the user question into embeddings using the Titan Text Embeddings model.

1. Find the most similar slide (image) corresponding to the embeddings (for the user question) from the vector database (OpenSearch Serverless).

1. Now ask Claude 3 to answer the user question using the retrieved image description for the most similar slide.
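The three steps above can be sketched as a single function. To keep the example runnable without AWS access, the embedding model, the vector search and the LLM are passed in as plain callables. The lambdas below are toy stand-ins, not Titan, OpenSearch or Claude 3, and `answer_with_rag` is a hypothetical name.

```python
from typing import Callable, List, Optional


def answer_with_rag(question: str,
                    embed: Callable[[str], List[float]],
                    search: Callable[[List[float]], Optional[str]],
                    generate: Callable[[str, str], str]) -> Optional[str]:
    """Embed the question, retrieve the closest context, then ask the model."""
    embedding = embed(question)          # step 1: question -> vector
    context = search(embedding)          # step 2: vector -> most similar document text
    if context is None:                  # nothing retrieved, so nothing to ground the answer on
        return None
    return generate(question, context)   # step 3: answer using the retrieved context


# Toy stand-ins so the flow can be exercised offline
toy_embed = lambda text: [float(len(text))]
toy_search = lambda vector: "Revenue grew year over year." if vector else None
toy_generate = lambda q, ctx: f"Q: {q} | context: {ctx}"

print(answer_with_rag("How did revenue change?", toy_embed, toy_search, toy_generate))
```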

In [None]:
bedrock = boto3.client(service_name="bedrock-runtime", endpoint_url=g.TITAN_URL)

A handy function for similarity search in the vector database.

In [None]:
def find_similar_data(text_embeddings: np.ndarray, 
                      size: int, 
                      os_client: opensearchpy.client.OpenSearch, 
                      index_name: str) -> Dict:
    query = {
        "size": size,
        "query": {
            "knn": {
                "vector_embedding": {
                    "vector": text_embeddings,
                    "k": size
                }
            }
        }
    }
    try:
        content_based_search = os_client.search(body=query, index=index_name)
        # remove the vector_embedding field from every hit for readability purposes; it was
        # needed during the similarity search (by the vector db), we do not need it any more.
        for hit in content_based_search['hits']['hits']:
            hit['_source'].pop('vector_embedding', None)
    except Exception as e:
        logger.error(f"error occurred while querying OpenSearch index={index_name}, exception={e}")
        content_based_search = None
    return content_based_search
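Conceptually, the `knn` query asks the vector database for the `k` stored vectors closest to the query vector. The brute-force sketch below illustrates that idea with cosine similarity on a tiny in-memory dictionary. It is an illustration under assumptions only: OpenSearch typically relies on an approximate nearest-neighbor index, the distance metric depends on how the index was created, and the vectors and document ids here are made up.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], docs: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k (document id, score) pairs with the highest cosine similarity."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up 2-dimensional vectors standing in for real embeddings
docs = {"slide_1": [1.0, 0.0], "slide_2": [0.0, 1.0], "slide_3": [0.7, 0.7]}
print(top_k([1.0, 0.1], docs, k=2))
```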

## Function to get a response from an index

In [None]:
def get_index_response(question: str,
                       size: int,
                       os_client: opensearchpy.client.OpenSearch,
                       index_name: str) -> Dict:
    # created before the try block so the except branch can always populate it
    index_llm_response_and_context: Dict = {}
    try:
        index_llm_response: Optional[str] = None
        # get the text embedding for the given question
        text_embedding = get_text_embedding(bedrock, question)
        vector_db_response: Dict = find_similar_data(text_embedding, size, os_client, index_name)
        content_path = vector_db_response.get('hits', {}).get('hits')[0].get('_source').get('file_path')
        if index_name == outputs['OpenSearchImgIndexName']:
            # download and display the image
            !aws s3 cp {content_path} .
            local_img_path = os.path.basename(content_path)
            display(Image(filename=local_img_path))
        logger.info(f"going to answer the question=\"{question}\" using the context: \"{content_path}\"")
        file_text = vector_db_response.get('hits', {}).get('hits')[0].get('_source').get('file_text')
        index_llm_response_and_context['file_text'] = file_text
        # logger.info(f"extracted text description for {index_name}: {file_text}")
        index_llm_response = get_llm_response(bedrock, question, file_text)
        index_llm_response_and_context['index_llm_response'] = index_llm_response
    except Exception as e:
        logger.error(f"Could not get a response from the {index_name}: {e}")
        index_llm_response_and_context['index_llm_response'] = None
        index_llm_response_and_context['file_text'] = None
    return index_llm_response_and_context
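`get_index_response` reads the first hit out of the search response twice. A small accessor can make that lookup tolerant of `None` or empty responses. The sketch below uses a hand-written dictionary that mimics the `hits` / `_source` layout used above; `first_hit_source` is a hypothetical helper and the field values are made up.

```python
from typing import Any, Dict, Optional


def first_hit_source(response: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the `_source` of the top hit, or None when there is no usable hit."""
    if not response:
        return None
    hits = response.get("hits", {}).get("hits") or []
    if not hits:
        return None
    return hits[0].get("_source")


# Hand-written response with made-up values, shaped like the dictionaries used above
fake_response = {"hits": {"hits": [{"_source": {"file_path": "s3://bucket/key.jpg",
                                                "file_text": "example description"}}]}}
source = first_hit_source(fake_response)
print(source["file_path"] if source else "no hit")
```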

### Question 1 - Text Only Response

This question targets text contained in one of the 10 PDF files that were ingested.

In [None]:
question: str = "What was the big decision that Amazon announced in 3Q23?"
get_index_response(question, 1, text_os_client, text_index_name)

### Question 2 - Image Index Response

This question targets an image contained in one of the 10 PDF files that were ingested.

In [None]:
# question: str = "What is the 52-week high and low for Boeing based on the Market overview?"
# question: str = "What was the number of outstanding shares for Boeing based on the Market overview?"
question: str = "What is the earnings and growth analysis for Amazon?"
get_index_response(question, 1, img_os_client, img_index_name)

### Question 3 - Combined Response (Both Image and Text Indexes)

#### First, get the response from the text index

In [None]:
# question: str = "Give us a summary of Amazon's financial performance in Q4 2023"
question: str = "What was the guidance for Amazon's financial performance for Q1 2024?"
text_only_response: Dict = get_index_response(question, 1, text_os_client, text_index_name)
text_only_response

#### Next, get the response from the image index

In [None]:
image_only_response: Dict = get_index_response(question, 1, img_os_client, img_index_name)
image_only_response

#### Now, get the combined response from both image and text indexes

In [None]:
combined_response = combined_llm_response(bedrock, question, 
                                          f"{text_only_response.get('file_text')}\n{image_only_response.get('file_text')}")

In [None]:
combined_response
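If either index lookup fails, `get_index_response` returns `None` for `file_text`, and the f-string used to build the combined context would then contain the literal text `None`. A minimal sketch of a guard, using a hypothetical `merge_contexts` helper, is shown below.

```python
from typing import Optional


def merge_contexts(*texts: Optional[str], separator: str = "\n") -> str:
    """Join the non-empty context strings, skipping None and blank entries."""
    return separator.join(t.strip() for t in texts if t and t.strip())


# Placeholder strings standing in for the text and image descriptions
print(merge_contexts("text context", None, "  image context "))
```

In the notebook, the combined context could then be built with `merge_contexts(text_only_response.get('file_text'), image_only_response.get('file_text'))`.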

## Clean Up
