# Intelligent Video and Audio Q&A with LLMs

This practice lab demonstrates how large language models (LLMs) can be used to analyze audio and video media for question-answering tasks. In this solution, you use LangChain, Hugging Face's GPT-J and Falcon models, and Amazon SageMaker to construct a system that answers questions based on the content of transcribed multimedia files.


In [None]:
## Code Cell 1 ##
#Install packages

!pip install --upgrade sagemaker --quiet
!pip install ipywidgets==7.0.0 --quiet
!pip install langchain==0.0.148 --quiet
!pip install faiss-cpu --quiet
!pip install unstructured==0.8.1 --quiet

In [None]:
## Code Cell 2 ##

import time
import os
import sagemaker, boto3, json
from sagemaker.session import Session
from sagemaker.model import Model
from sagemaker import image_uris, model_uris, script_uris, hyperparameters
from sagemaker.predictor import Predictor
from sagemaker.utils import name_from_base
from typing import Any, Dict, List, Optional
from langchain.embeddings import SagemakerEndpointEmbeddings
from langchain.llms.sagemaker_endpoint import ContentHandlerBase

sagemaker_session = Session()

aws_role = sagemaker_session.get_caller_identity_arn()
aws_region = boto3.Session().region_name
sess = sagemaker.Session()

client = boto3.client("runtime.sagemaker")

def query_endpoint_with_json_payload(encoded_json, endpoint_name, content_type="application/json"):
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType=content_type, Body=encoded_json
    )
    return response


def parse_response(query_response):
    model_predictions = json.loads(query_response["Body"].read())
    return model_predictions

def get_model_endpoint_with_prefix(prefix="falcon"):
    # Create a SageMaker client
    client = boto3.client('sagemaker')
    
    # List all SageMaker endpoints
    response = client.list_endpoints(
        MaxResults=5,  
        SortBy='Name'
    )
    
    # Return the first endpoint whose name starts with the given prefix
    for endpoint in response['Endpoints']:
        if endpoint['EndpointName'].startswith(prefix):
            return endpoint['EndpointName']
    
    return None

qa_endpoint_name = get_model_endpoint_with_prefix()

## Ask a question to the LLM without providing the context.

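An LLM can only answer from what it saw during training, so it often fails on questions about recent or private content. Retrieval Augmented Generation (RAG) addresses this by supplying relevant documents alongside the question. To see the problem first, ask your model a question directly, without any context, and see how it responds. This practice lab uses the Falcon 7B Instruct BF16 model to generate the response.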

In [None]:
## Code Cell 3 ##

question = "What is Amazon Bedrock?"

payload = {
        "inputs": question,
        "max_new_tokens": 50,
        "top_k":50,
        "num_return_sequences": 1,
        "top_p":0.95,
        "do_sample":True
    }

query_response = query_endpoint_with_json_payload(
        json.dumps(payload).encode("utf-8"), endpoint_name=qa_endpoint_name
    )
generated_texts = parse_response(query_response)

print(f"{generated_texts[0]}")    

Without supporting context, the generated answer is likely to be wrong or to make little sense.

## Use a RAG-based approach with [LangChain](https://python.langchain.com/en/latest/index.html) and SageMaker endpoints to build a basic Q&A application.


This practice lab uses document embeddings to fetch the most relevant documents in a document knowledge library and combine them with the prompt that is provided to the LLM.

To achieve that, you will:

1. Generate embeddings for each document in the knowledge library with the SageMaker GPT-J 6B Embedding FP16 embedding model.
2. Identify the top K most relevant documents based on the user query.
    - 2.1 For a query of your interest, generate the embedding of the query using the same embedding model.
    - 2.2 Search the indexes of the top K most relevant documents in the embedding space by using an in-memory Faiss search.
    - 2.3 Use the indexes to retrieve the corresponding documents.
3. Combine the retrieved documents with the prompt and question, and send them to the SageMaker LLM.
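
The retrieval in step 2 can be illustrated without any endpoint. The following is a minimal sketch, assuming cosine similarity and a brute-force scan for readability; the Faiss index used in this lab may rank by a different distance metric and is much faster. The three-dimensional vectors and the names `toy_docs`, `toy_doc_vecs`, and `toy_query_vec` are hypothetical stand-ins for real embeddings.

```python
import math
from typing import List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], doc_vecs: List[List[float]], k: int) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: real embeddings have thousands of dimensions.
toy_docs = ["Bedrock overview", "Cooking pasta", "Foundation models on AWS"]
toy_doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.8, 0.3, 0.1]]
toy_query_vec = [1.0, 0.2, 0.0]

toy_hits = top_k(toy_query_vec, toy_doc_vecs, k=2)
for index, score in toy_hits:
    print(f"{toy_docs[index]}: {score:.3f}")
```

The returned indexes play the role described in step 2.3: they point back to the documents whose text is then passed to the LLM.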


Note: The retrieved document or text should be large enough to contain the information needed to answer a question, but it should be small enough to fit into the LLM prompt, which has a maximum sequence length of 1024 tokens. 
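
One way to respect this limit is to keep adding the highest-ranked chunks until the budget is used up. The sketch below is only an illustration: `approx_token_count` is a hypothetical stand-in that counts whitespace-separated words, which differs from the model's real tokenizer, and `fit_chunks_to_budget` is not part of LangChain or SageMaker. In practice you would also reserve room for the generated answer.

```python
from typing import Callable, List


def approx_token_count(text: str) -> int:
    """Rough stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_chunks_to_budget(
    chunks: List[str],
    question: str,
    max_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> List[str]:
    """Keep chunks in ranked order until the next one would exceed the budget."""
    remaining = max_tokens - count_tokens(question)
    selected: List[str] = []
    for chunk in chunks:
        cost = count_tokens(chunk)
        if cost > remaining:
            break
        selected.append(chunk)
        remaining -= cost
    return selected


# Hypothetical ranked chunks of 400, 500, and 300 words.
ranked_chunks = ["alpha " * 400, "beta " * 500, "gamma " * 300]
kept = fit_chunks_to_budget(ranked_chunks, "What is Amazon Bedrock?", max_tokens=1024)
print(len(kept))
```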

---
To build a basic Q&A application with LangChain, you must: 
1. Wrap your SageMaker endpoints for the embedding model and the LLM in `langchain.embeddings.SagemakerEndpointEmbeddings` and `langchain.llms.sagemaker_endpoint.SagemakerEndpoint`, respectively. This requires a small override of the `SagemakerEndpointEmbeddings` class to make it compatible with the SageMaker embedding model.
2. Prepare the dataset to build the knowledge database. 

---

First, wrap your SageMaker endpoint for the embedding model in `langchain.embeddings.SagemakerEndpointEmbeddings`. The subclass below overrides `embed_documents` so that texts are sent to the SageMaker embedding model in small batches.

In [None]:
## Code Cell 4 ##

from langchain.embeddings.sagemaker_endpoint import EmbeddingsContentHandler
from langchain.embeddings import SagemakerEndpointEmbeddings


class SagemakerEndpointEmbeddingsJumpStart(SagemakerEndpointEmbeddings):
    def embed_documents(self, texts: List[str], chunk_size: int = 5) -> List[List[float]]:
        """Compute doc embeddings using a SageMaker Inference Endpoint.

        Args:
            texts: The list of texts to embed.
            chunk_size: The number of input texts grouped together in a
                single request. Capped at the number of texts.

        Returns:
            List of embeddings, one for each text.
        """
        results = []
        # Keep the batch size at least 1 so an empty input does not produce a zero step.
        _chunk_size = max(1, min(chunk_size, len(texts)))

        for i in range(0, len(texts), _chunk_size):
            response = self._embedding_func(texts[i : i + _chunk_size])
            results.extend(response)
        return results


class ContentHandler(EmbeddingsContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: List[str], model_kwargs={}) -> bytes:
        # `prompt` is the batch of texts passed in by `embed_documents`.
        input_str = json.dumps({"text_inputs": prompt, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output) -> List[List[float]]:
        # `output` is the streaming response body, so read it before parsing.
        response_json = json.loads(output.read().decode("utf-8"))
        embeddings = response_json["embedding"]
        return embeddings


content_handler = ContentHandler()

embeddings = SagemakerEndpointEmbeddingsJumpStart(
    endpoint_name=get_model_endpoint_with_prefix("jumpstart"),
    region_name=aws_region,
    content_handler=content_handler,
)

Next, wrap the SageMaker endpoint for the LLM in `langchain.llms.sagemaker_endpoint.SagemakerEndpoint`.


In [None]:
## Code Cell 5 ##

from langchain.llms.sagemaker_endpoint import LLMContentHandler, SagemakerEndpoint

parameters = {
    "max_length": 200,
    "num_return_sequences": 1,
    "top_k": 250,
    "top_p": 0.95,
    "do_sample": False,
    "temperature": 1,
}

class ContentHandler(LLMContentHandler):
    content_type = "application/json"
    accepts = "application/json"

    def transform_input(self, prompt: str, model_kwargs={}) -> bytes:
        input_str = json.dumps({"inputs": prompt, **model_kwargs})
        return input_str.encode("utf-8")

    def transform_output(self, output: bytes) -> str:
        response_json = json.loads(output.read().decode("utf-8"))
        print(response_json)
       
        return response_json[0]["generated_text"]

content_handler = ContentHandler()

sm_llm = SagemakerEndpoint(
    endpoint_name=get_model_endpoint_with_prefix("falcon"),
    region_name=aws_region,
    model_kwargs=parameters,
    content_handler=content_handler,
)

Now, download the transcripts of the media files and prepare them for embedding. You will use a transcript generated by Amazon Transcribe as the knowledge library for your model.


In [None]:
## Code Cell 6 ##

s3 = boto3.client('s3')
transcribes_bucket = [bucket['Name'] for bucket in s3.list_buckets()['Buckets'] if bucket['Name'].startswith('transcribe-')].pop()

!mkdir -p rag_data
!aws s3 cp --recursive s3://$transcribes_bucket rag_data

#Converting transcripts from json to text files
directory='rag_data'
for filename in os.listdir(directory):
    # Check if the current file is a JSON file
    if filename.endswith('.json'):
        json_path = os.path.join(directory, filename)
        # Open and read the JSON file
        with open(json_path) as json_file:
            data = json.load(json_file)
            transcript = data["results"]["transcripts"][0]["transcript"]
        
        # Create a corresponding text file name
        text_filename = filename.replace('.json', '.txt')
        text_path = os.path.join(directory, text_filename)
        
        # Open the text file for writing and write the transcript
        with open(text_path, 'w') as text_file:
            text_file.write(transcript)

Use LangChain to read the .txt data. There are multiple built-in functions in LangChain to read different file formats, such as .txt, .html, and .pdf. For more information, see [LangChain document loaders](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html).

If your data is saved in multiple files, the following code reads all files that end with .txt and loads them together. Be sure that each .txt file has the same format.

In [None]:
## Code Cell 7 ##

# Import the LangChain components used in the rest of the lab
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
from langchain.document_loaders import TextLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.vectorstores import Chroma, AtlasDB, FAISS
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain import PromptTemplate
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import DirectoryLoader

loader = DirectoryLoader("./rag_data/", glob="*.txt")
documents = loader.load()

Now, you can build your Q&A application! LangChain streamlines this process with the following few lines of code.

In [None]:
## Code Cell 8 ##

index_creator = VectorstoreIndexCreator(
    vectorstore_cls=FAISS,
    embedding=embeddings,
    text_splitter=RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=200),
)
index = index_creator.from_loaders([loader])

# Review the question variable content.
print(question)

# Ask the question to the model using the vector store with the transcript embeddings.
index.query(question=question, llm=sm_llm)

## Customize the previous Q&A app with a different prompt.

You have now seen how LangChain lets you create a question-answering application with just a few lines of code. Next, you will take a closer look at what the previous `VectorstoreIndexCreator` does, and you will see how to use a customized prompt rather than the default one.
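
Before anything is embedded, the index creator in Code Cell 8 splits each document into overlapping chunks (`chunk_size=1500`, `chunk_overlap=200`). The following is a simplified sketch of fixed-size chunking with overlap, using a hypothetical helper named `split_with_overlap`. It is not how `RecursiveCharacterTextSplitter` is implemented: that class also tries to break on separators such as paragraphs and sentences, which this sketch ignores.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Small sizes so the overlap is easy to see; the lab uses 1500 and 200.
sample_text = "abcdefghijklmnopqrstuvwxyz"
sample_chunks = split_with_overlap(sample_text, chunk_size=10, chunk_overlap=3)
print(sample_chunks)
```

The overlap means that a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.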

First, generate embeddings for each document in the knowledge library by using the SageMaker GPT-J-6B embedding model.

Then, identify the top K documents that are most relevant to the previously asked question, where K = 3 in this setup.

In [None]:
## Code Cell 9 ##

docsearch = FAISS.from_documents(documents, embeddings)

print(question)

docs = docsearch.similarity_search(question, k=3)

Print out the top 3 most relevant documents, as shown below.

In [None]:
## Code Cell 10 ##

docs

Finally, combine the retrieved documents with the prompt and question, and send them to the SageMaker LLM.

You define a customized prompt, as shown below.

In [None]:
## Code Cell 11 ## 
prompt_template = """Answer based on context:\n\n{context}\n\n{question}"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
chain = load_qa_chain(llm=sm_llm, prompt=PROMPT)

Send the top 3 most relevant documents and the question to the LLM to get an answer.
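
Conceptually, the chain joins the text of the retrieved documents into one context string and fills in the template from Code Cell 11. The sketch below shows that idea with plain strings. `build_prompt` and `TEMPLATE_SKETCH` are hypothetical names used only for illustration, and joining with blank lines is an assumption about the default behavior; LangChain's own implementation handles more cases.

```python
from typing import List

# Same wording as the template in Code Cell 11.
TEMPLATE_SKETCH = "Answer based on context:\n\n{context}\n\n{question}"


def build_prompt(doc_texts: List[str], user_question: str, template: str = TEMPLATE_SKETCH) -> str:
    """Join the retrieved passages and substitute them into the prompt template."""
    context = "\n\n".join(doc_texts)
    return template.format(context=context, question=user_question)


example_prompt = build_prompt(
    ["First retrieved passage.", "Second retrieved passage."],
    "What is Amazon Bedrock?",
)
print(example_prompt)
```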

In [None]:
## Code Cell 12 ##
result = chain({"input_documents": docs, "question": question}, return_only_outputs=True)[
    "output_text"
]

# Display the answer generated from the retrieved context.
print(result)

The final answer from the model should be accurate and based on the document provided with the prompt.