# Question Answering
[Retrieval Augmented Question & Answering with Amazon Bedrock using LangChain](https://github.com/aws-samples/amazon-bedrock-workshop/blob/main/03_QuestionAnswering/01_qa_w_rag_claude.ipynb)

In [1]:
from sagemaker import get_execution_role

In [2]:
strSageMakerRoleName = get_execution_role().rsplit('/', 1)[-1]
print(f"SageMaker Execution Role Name: {strSageMakerRoleName}")

SageMaker Execution Role Name: AmazonSageMakerServiceCatalogProductsUseRole


In [3]:
#!wget https://preview.documentation.bedrock.aws.dev/Documentation/SDK/bedrock-python-sdk.zip
#!unzip bedrock-python-sdk.zip -d bedrock-sdk
#!rm -rf bedrock-python-sdk.zip

In [4]:
install_needed = False

In [5]:
import sys
import IPython

if install_needed:
    print("installing deps and restarting kernel")
    !{sys.executable} -m pip install -U pip
    !{sys.executable} -m pip install -U sagemaker
    !{sys.executable} -m pip install -U ./bedrock-sdk/botocore-1.29.162-py3-none-any.whl
    !{sys.executable} -m pip install -U ./bedrock-sdk/boto3-1.26.162-py3-none-any.whl
    !{sys.executable} -m pip install -U ./bedrock-sdk/awscli-1.27.162-py3-none-any.whl
    !{sys.executable} -m pip install -U langchain
    !{sys.executable} -m pip install opensearch-py --quiet
    !rm -rf bedrock-sdk

    # Restart the kernel last so that every package installed above is picked up.
    IPython.Application.instance().kernel.do_shutdown(True)

In [6]:
import os
module_path = "."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

In [7]:
import boto3
import langchain

In [8]:
bedrock_region = "us-west-2"
bedrock_config = {
    "region_name": bedrock_region,
    "endpoint_url": "https://prod.us-west-2.frontend.bedrock.aws.dev"
}

In [9]:
boto3_bedrock = bedrock.get_bedrock_client(
    region=bedrock_config["region_name"],
    url_override=bedrock_config["endpoint_url"])
    
modelInfo = boto3_bedrock.list_foundation_models()    
print('models: ', modelInfo)

Create new client
  Using region: us-west-2
boto3 Bedrock client successfully created!
bedrock(https://prod.us-west-2.frontend.bedrock.aws.dev)
models:  {'ResponseMetadata': {'RequestId': '4436f3e9-4f29-4a85-b03f-b9a9f369e5b5', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Tue, 25 Jul 2023 03:10:23 GMT', 'content-type': 'application/json', 'content-length': '256', 'connection': 'keep-alive', 'x-amzn-requestid': '4436f3e9-4f29-4a85-b03f-b9a9f369e5b5'}, 'RetryAttempts': 0}, 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-tg1-large', 'modelId': 'amazon.titan-tg1-large'}, {'modelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-e1t-medium', 'modelId': 'amazon.titan-e1t-medium'}]}


In [10]:
from langchain.llms.bedrock import Bedrock

In [11]:
modelId = 'amazon.titan-tg1-large'
llm = Bedrock(model_id=modelId, client=boto3_bedrock)

In [12]:
llm('Who is the president of usa?')

'\nThe current president of the United States of America is Joe Biden. He was born in Scranton, Pennsylvania and the first son of Catherine Eugenia Finnegan Biden and Joseph Robinette Biden, Sr.'

## Data Preparation

In [13]:
if install_needed:
    !pip install PyPDF2 --quiet

In [14]:
import PyPDF2
from io import BytesIO

In [15]:
import sagemaker, boto3, json
from sagemaker.session import Session

In [16]:
sess = sagemaker.Session()
s3_bucket = sess.default_bucket()
s3_prefix = 'docs'

In [17]:
#s3_file_name = 'sample-blog.pdf'
s3_file_name = '2016-3series.pdf'
#s3_file_name = 'gen-ai-aws.pdf'

In [18]:
# Download the PDF from S3 and extract its text page by page.
s3r = boto3.resource("s3")
doc = s3r.Object(s3_bucket, s3_prefix + '/' + s3_file_name)

contents = doc.get()['Body'].read()
reader = PyPDF2.PdfReader(BytesIO(contents))

raw_text = []
for page in reader.pages:
    raw_text.append(page.extract_text())
contents = '\n'.join(raw_text)

In [19]:
#new_contents = str(contents[:8000]).replace("\n"," ") 
new_contents = str(contents).replace("\n"," ") 

#print('new_contents: ', new_contents)

In [20]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the document into chunks of at most 1000 characters, with no overlap.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(new_contents)
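
The splitter above limits each chunk to 1000 characters with no overlap. As a rough illustration of what `chunk_size` and `chunk_overlap` mean, here is a minimal plain-Python sketch of fixed-window chunking. It is not the `RecursiveCharacterTextSplitter` algorithm, which also tries to break on separators such as paragraphs and spaces; `chunk_text` is a hypothetical helper introduced only for this example.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Split text into consecutive windows of at most chunk_size characters.

    Hypothetical helper for illustration only; it cuts at fixed offsets and
    ignores word or sentence boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    # Each new window starts (chunk_size - chunk_overlap) characters after the previous one.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


sample = "abcdefghij" * 250  # 2500 characters of toy text
chunks = chunk_text(sample, chunk_size=1000, chunk_overlap=0)
print([len(c) for c in chunks])  # [1000, 1000, 500]
```

With `chunk_overlap=0` the chunks partition the text exactly, so joining them reproduces the input. A non-zero overlap repeats the tail of each chunk at the start of the next, which can help when an answer straddles a chunk boundary.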

In [21]:
from langchain.docstore.document import Document
# Wrap the chunks as LangChain documents (only the first five chunks are indexed here).
docs = [Document(page_content=t) for t in texts[:5]]

In [22]:
from langchain.embeddings import BedrockEmbeddings
bedrock_embeddings = BedrockEmbeddings(client=boto3_bedrock)

In [23]:
from langchain.vectorstores import OpenSearchVectorSearch

In [24]:
endpoint_url = "https://search-os-rag-ndnwd5kdjwyo6ohcdyc22nufmi.ap-northeast-2.es.amazonaws.com"

In [25]:
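# Read the OpenSearch credentials from the environment instead of hard-coding them.
# OPENSEARCH_USER and OPENSEARCH_PASSWORD are assumed variable names; set them before running.
vectorstore = OpenSearchVectorSearch.from_documents(
    docs,
    bedrock_embeddings,
    opensearch_url=endpoint_url,
    http_auth=(os.environ["OPENSEARCH_USER"], os.environ["OPENSEARCH_PASSWORD"]),
)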

In [26]:
query = "What did the president say about Ketanji Brown Jackson"
docs = vectorstore.similarity_search(query)

In [27]:
docs

[Document(page_content='the text beginning,  “This should only be done by your service  center …” should be disregarded and the  following text should be read in lieu thereof:  “BMW recommends having this work per- formed by a service center as it is important  that this safety feature functions properly.” 7.At page 91, under the heading: \xa0 “Special  windshield,” the paragraph beginning,  “Therefore, have the special windshield …”  should be disregarded and the following  text should be read in lieu thereof: \xa0 “BMW  recommends that you have the special  windshield replaced by the service center.” 8.At page 168 under the heading: “Objects  within the range of movement of the ped- als” and at page 232 under the heading:  “Carpets and floor mats,” the paragraph  that begins: “Only use floor mats …” should  be disregarded and the following language  should be read in lieu thereof: “The manu- facturer of your vehicle recommends that  you use floor mats that have been identified  by it
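
`similarity_search` embeds the query and asks the vector store for the stored chunks whose embeddings lie closest to it. The sketch below shows that idea as a brute-force nearest-neighbour search over toy two-dimensional vectors. It is only an illustration: the Euclidean metric, the helper names `l2_distance` and `top_k`, and the toy vectors are assumptions, and OpenSearch itself typically answers such queries with an approximate k-NN index rather than a full scan.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: Sequence[float],
          doc_vecs: Sequence[Sequence[float]],
          k: int = 4) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k vectors closest to query_vec."""
    scored = [(i, l2_distance(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smaller distance means more similar
    return scored[:k]


# Toy "embeddings"; real embedding vectors have hundreds or thousands of dimensions.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
query_vec = [1.0, 0.0]

nearest = top_k(query_vec, doc_vecs, k=2)
print([i for i, _ in nearest])  # [0, 2]
```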

### Question Answering

In [28]:
from langchain.embeddings import BedrockEmbeddings
bedrock_embeddings = BedrockEmbeddings(client=boto3_bedrock)

In [29]:
query = "Tell me how to use the manual."

In [30]:
from langchain.indexes.vectorstore import VectorStoreIndexWrapper

In [31]:
wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore)

In [32]:
relevant_documents = vectorstore.similarity_search(query)
print(f'{len(relevant_documents)} documents are fetched which are relevant to the query.')
print('----')
for i, rel_doc in enumerate(relevant_documents):
    print_ww(f'## Document {i+1}: {rel_doc.page_content}.......')
    print('---')

4 documents are fetched which are relevant to the query.
----
## Document 1: are free to elect, both during  those periods and thereafter, to have main- tenance
and repair work provided by other  service centers or repair shops. 3.Where the Owner's Manual makes
refer- ence to parts and accessories having been  approved by BMW, those references are  intended to
reflect that those parts and  accessories are recommended by BMW of  North America LLC. You may
elect to use  other parts and accessories, but, if you do, we recommend that you make sure that any
such parts and/or accessories are appropri- ate for use on your vehicle. 4.At page 7, under the
warranty section's dis- cussion of homologation, where it states  that you “cannot lodge warranty
claims for  your vehicle there,” the text should read  that you “may not be able to lodge warranty
claims for your vehicle there.”  5.At page 7, under the “Parts and accesso- ries” section, in the
sixth sentence, the  word “cannot” should read “do

In [33]:
answer = wrapper_store_faiss.query(question=query, llm=llm)

In [34]:
print_ww(answer)


Read this owner's manual before starting off in your new BMW. Also use the Integrated Owner's Manual
in your vehicle. It contains important information on vehicle operation that will help you make full
use of the technical features available in your BMW.



### Customisable option

In [35]:
from langchain.prompts import PromptTemplate

prompt_template = """Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Assistant:"""
PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

In [36]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(
        search_type="similarity", search_kwargs={"k": 3}
    ),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT}
)
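
With `chain_type="stuff"`, the retrieved chunks are placed together into the `{context}` slot of the prompt and sent to the model in a single call. The following plain-Python sketch shows that assembly step under stated assumptions: `build_stuff_prompt` is a hypothetical helper, the chunk strings are placeholders, and the blank-line separator is an assumption about how the chunks are joined.

```python
from typing import List

# Same template text as the prompt defined above.
PROMPT_TEMPLATE = """Human: Use the following pieces of context to provide a concise answer to the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
Assistant:"""


def build_stuff_prompt(question: str, contexts: List[str], separator: str = "\n\n") -> str:
    """Join all retrieved chunks into one context string and fill the template."""
    context = separator.join(contexts)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Placeholder chunks standing in for the k=3 documents returned by the retriever.
retrieved = ["chunk one", "chunk two", "chunk three"]
prompt = build_stuff_prompt("Tell me how to use the manual.", retrieved)
print(prompt)
```

Because every chunk goes into one prompt, the combined length of the retrieved chunks plus the question has to fit within the model's context window, which is one reason to keep `k` and `chunk_size` small.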

In [37]:
result = qa({"query": query})
print('result: ', result)

result:  {'query': 'Tell me how to use the manual.', 'result': " The owner's manual provides important information on vehicle operation and maintenance. It is recommended that you read it before starting off in your new BMW. The manual is available in many countries as an app and additional information can be found on the Internet.", 'source_documents': [Document(page_content="are free to elect, both during  those periods and thereafter, to have main- tenance and repair work provided by other  service centers or repair shops. 3.Where the Owner's Manual makes refer- ence to parts and accessories having been  approved by BMW, those references are  intended to reflect that those parts and  accessories are recommended by BMW of  North America LLC. You may elect to use  other parts and accessories, but, if you do, we recommend that you make sure that any  such parts and/or accessories are appropri- ate for use on your vehicle. 4.At page 7, under the warranty section's dis- cussion of homolo

In [38]:
source_documents = result['source_documents']
print(source_documents)

[Document(page_content="are free to elect, both during  those periods and thereafter, to have main- tenance and repair work provided by other  service centers or repair shops. 3.Where the Owner's Manual makes refer- ence to parts and accessories having been  approved by BMW, those references are  intended to reflect that those parts and  accessories are recommended by BMW of  North America LLC. You may elect to use  other parts and accessories, but, if you do, we recommend that you make sure that any  such parts and/or accessories are appropri- ate for use on your vehicle. 4.At page 7, under the warranty section's dis- cussion of homologation, where it states  that you “cannot lodge warranty claims for  your vehicle there,” the text should read  that you “may not be able to lodge warranty  claims for your vehicle there.”  5.At page 7, under the “Parts and accesso- ries” section, in the sixth sentence, the  word “cannot” should read “does not.” 6.At page 54, in the “Check and replace  s

In [39]:
print('output: ', result['result'])

output:   The owner's manual provides important information on vehicle operation and maintenance. It is recommended that you read it before starting off in your new BMW. The manual is available in many countries as an app and additional information can be found on the Internet.
