# Knowledge Bases for Amazon Bedrock - End to end example

This notebook provides sample code for creating an empty OpenSearch Serverless (OSS) index, creating an Amazon Bedrock knowledge base, and ingesting documents into the index.


#### Notebook Walkthrough

The data pipeline ingests documents (typically stored in Amazon S3) into a knowledge base, that is, a vector database such as Amazon OpenSearch Serverless (AOSS), so that they are available for lookup when a question is received.

- Load the documents into the knowledge base by connecting your S3 bucket (data source).
- Ingestion: the knowledge base splits the documents into smaller chunks (based on the selected strategy), generates embeddings, and stores them in the associated vector store.

![data_ingestion.png](./images/data_ingestion.png)


#### Steps: 
- Create Amazon Bedrock Knowledge Base execution role with necessary policies for accessing data from S3 and writing embeddings into OSS.
- Create an empty OpenSearch serverless index.
- Download documents
- Create Amazon Bedrock knowledge base
- Create a data source within knowledge base which will connect to Amazon S3
- Start an ingestion job using the KB APIs, which reads data from S3, chunks it, converts the chunks into embeddings using the Amazon Titan Embeddings model, and stores the embeddings in AOSS, all without having to build, deploy, and manage a data pipeline.

Once the data is available in the Bedrock Knowledge Base, a question answering application can be built using the Knowledge Base APIs provided by Amazon Bedrock, as covered in the following notebooks in the same folder:
- [1_managed-rag-kb-retrieve-generate-api.ipynb](./1\_managed-rag-kb-retrieve-generate-api.ipynb)
- [2_customized-rag-retrieve-api-claude-v2.ipynb](./2\_customized-rag-retrieve-api-claude-v2.ipynb)
- [3_customized-rag-retrieve-api-langchain-claude-v2.ipynb](./3\_customized-rag-retrieve-api-langchain-claude-v2.ipynb)


#### Pre-requisites
This notebook requires permissions to:
- create and delete Amazon IAM roles
- create, update and delete Amazon S3 buckets
- access Amazon Bedrock
- access Amazon OpenSearch Serverless

If running on SageMaker Studio, you should add the following managed policies to your role:
- IAMFullAccess
- AWSLambda_FullAccess
- AmazonS3FullAccess
- AmazonBedrockFullAccess
- Custom policy for Amazon OpenSearch Serverless such as:
```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "aoss:*",
            "Resource": "*"
        }
    ]
}
```

## Setup
Before running the rest of this notebook, run the cells below to install the necessary libraries and connect to Bedrock.

In [None]:
%pip install -U opensearch-py==2.3.1
%pip install -U boto3==1.33.2
%pip install -U retrying==1.3.4

In [None]:
# restart kernel
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
import json
import os
import boto3
from botocore.exceptions import ClientError
import pprint
from utility import create_bedrock_execution_role, create_oss_policy_attach_bedrock_execution_role, create_policies_in_oss, interactive_sleep
import random
from retrying import retry
suffix = random.randrange(200, 900)

sts_client = boto3.client('sts')
boto3_session = boto3.session.Session()
region_name = boto3_session.region_name
bedrock_agent_client = boto3_session.client('bedrock-agent', region_name=region_name)
service = 'aoss'
s3_client = boto3.client('s3')
account_id = sts_client.get_caller_identity()["Account"]
s3_suffix = f"{region_name}-{account_id}"
bucket_name = f'bedrock-kb-{s3_suffix}' # replace it with your bucket name.
pp = pprint.PrettyPrinter(indent=2)

In [6]:
# Check if bucket exists, and if not create S3 bucket for knowledge base data source
try:
    s3_client.head_bucket(Bucket=bucket_name)
    print(f'Bucket {bucket_name} Exists')
except ClientError as e:
    print(f'Creating bucket {bucket_name}')
    if region_name == 'us-east-1':
        # us-east-1 is the default region and rejects an explicit LocationConstraint
        s3bucket = s3_client.create_bucket(Bucket=bucket_name)
    else:
        s3bucket = s3_client.create_bucket(
            Bucket=bucket_name,
            CreateBucketConfiguration={ 'LocationConstraint': region_name }
        )

Bucket bedrock-kb-us-west-2-976939723775 Exists


## Create a vector store - OpenSearch Serverless index

### Step 1 - Create OSS policies and collection
First of all, we have to create a vector store. In this section we will use *Amazon OpenSearch Serverless*.

Amazon OpenSearch Serverless is a serverless option in Amazon OpenSearch Service. As a developer, you can use OpenSearch Serverless to run petabyte-scale workloads without configuring, managing, and scaling OpenSearch clusters. You get the same interactive millisecond response times as OpenSearch Service with the simplicity of a serverless environment. Pay only for what you use by automatically scaling resources to provide the right amount of capacity for your application—without impacting data ingestion.

In [8]:
import boto3
import time
vector_store_name = f'bedrock-mmrag-{suffix}'
index_name = f"bedrock-mmrag-index-{suffix}"
aoss_client = boto3_session.client('opensearchserverless')
bedrock_kb_execution_role = create_bedrock_execution_role(bucket_name=bucket_name)
bedrock_kb_execution_role_arn = bedrock_kb_execution_role['Role']['Arn']

In [9]:
# create security, network and data access policies within OSS
encryption_policy, network_policy, access_policy = create_policies_in_oss(vector_store_name=vector_store_name,
                       aoss_client=aoss_client,
                       bedrock_kb_execution_role_arn=bedrock_kb_execution_role_arn)
collection = aoss_client.create_collection(name=vector_store_name,type='VECTORSEARCH')

In [10]:
pp.pprint(collection)

{ 'ResponseMetadata': { 'HTTPHeaders': { 'connection': 'keep-alive',
                                         'content-length': '309',
                                         'content-type': 'application/x-amz-json-1.0',
                                         'date': 'Wed, 10 Apr 2024 20:19:41 '
                                                 'GMT',
                                         'x-amzn-requestid': 'abaa8b7b-c0e6-4f75-858b-6abbe59f41a0'},
                        'HTTPStatusCode': 200,
                        'RequestId': 'abaa8b7b-c0e6-4f75-858b-6abbe59f41a0',
                        'RetryAttempts': 0},
  'createCollectionDetail': { 'arn': 'arn:aws:aoss:us-west-2:976939723775:collection/2hssfswq7j8e0s1w7xgk',
                              'createdDate': 1712780381849,
                              'id': '2hssfswq7j8e0s1w7xgk',
                              'kmsKeyArn': 'auto',
                              'lastModifiedDate': 1712780381849,
                             

In [11]:
# Get the OpenSearch serverless collection URL
collection_id = collection['createCollectionDetail']['id']
host = collection_id + '.' + region_name + '.aoss.amazonaws.com'
print(host)

2hssfswq7j8e0s1w7xgk.us-west-2.aoss.amazonaws.com


In [12]:
# wait for collection creation
# This can take a couple of minutes to finish
response = aoss_client.batch_get_collection(names=[vector_store_name])
# Periodically check collection status
while (response['collectionDetails'][0]['status']) == 'CREATING':
    print('Creating collection...')
    interactive_sleep(30)
    response = aoss_client.batch_get_collection(names=[vector_store_name])
print('\nCollection successfully created:')
pp.pprint(response["collectionDetails"])

Creating collection...
Done!.........................

Collection successfully created:
[ { 'arn': 'arn:aws:aoss:us-west-2:976939723775:collection/2hssfswq7j8e0s1w7xgk',
    'collectionEndpoint': 'https://2hssfswq7j8e0s1w7xgk.us-west-2.aoss.amazonaws.com',
    'createdDate': 1712780381849,
    'dashboardEndpoint': 'https://2hssfswq7j8e0s1w7xgk.us-west-2.aoss.amazonaws.com/_dashboards',
    'id': '2hssfswq7j8e0s1w7xgk',
    'kmsKeyArn': 'auto',
    'lastModifiedDate': 1712780409097,
    'name': 'bedrock-mmrag-757',
    'standbyReplicas': 'ENABLED',
    'status': 'ACTIVE',
    'type': 'VECTORSEARCH'}]
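
The loop above keeps polling for as long as the status is `CREATING`. A minimal sketch of the same pattern with an upper bound on the wait is shown here. The helper `wait_for_status` and its parameters are hypothetical (they are not part of boto3 or of this notebook's `utility` module), and the status callable is assumed to wrap the `batch_get_collection` call used above.

```python
import time

def wait_for_status(get_status, pending=("CREATING",), timeout_s=600,
                    interval_s=30, sleep=time.sleep):
    """Poll get_status() until it leaves the pending states or the timeout expires."""
    waited = 0
    status = get_status()
    while status in pending:
        if waited >= timeout_s:
            raise TimeoutError(f"still {status} after {waited}s")
        sleep(interval_s)
        waited += interval_s
        status = get_status()
    return status

# Stand-in for a callable such as:
#   lambda: aoss_client.batch_get_collection(names=[vector_store_name])['collectionDetails'][0]['status']
_fake_statuses = iter(["CREATING", "CREATING", "ACTIVE"])
final_status = wait_for_status(lambda: next(_fake_statuses), sleep=lambda s: None)
print(final_status)
```

Injecting `sleep` keeps the helper easy to exercise without real waiting; in the notebook it could be replaced by `interactive_sleep`.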


In [13]:
# create opensearch serverless access policy and attach it to Bedrock execution role
try:
    create_oss_policy_attach_bedrock_execution_role(collection_id=collection_id,
                                                    bedrock_kb_execution_role=bedrock_kb_execution_role)
    # It can take up to a minute for data access rules to be enforced
    interactive_sleep(60)
except Exception as e:
    print("Policy already exists")
    pp.pprint(e)

Opensearch serverless arn:  arn:aws:iam::976939723775:policy/AmazonBedrockOSSPolicyForKnowledgeBase_671
Done!.......................................................


### Step 2 - Create vector index

In [14]:
# Create the vector index in Opensearch serverless, with the knn_vector field index mapping, specifying the dimension size, name and engine.
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth, RequestError
credentials = boto3.Session().get_credentials()
awsauth = auth = AWSV4SignerAuth(credentials, region_name, service)

index_name = f"bedrock-sample-index-{suffix}"
body_json = {
   "settings": {
      "index.knn": "true",
       "number_of_shards": 1,
       "knn.algo_param.ef_search": 512,
       "number_of_replicas": 0,
   },
   "mappings": {
      "properties": {
         "vector": {
            "type": "knn_vector",
            "dimension": 1536,
             "method": {
                 "name": "hnsw",
                 "engine": "faiss",
                 "space_type": "l2"
             },
         },
         "text": {
            "type": "text"
         },
         "text-metadata": {
            "type": "text"
         }
      }
   }
}

# Build the OpenSearch client
oss_client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=300
)


In [15]:
# Create index
try:
    response = oss_client.indices.create(index=index_name, body=json.dumps(body_json))
    print('\nCreating index:')
    pp.pprint(response)

    # index creation can take up to a minute
    interactive_sleep(60)
except RequestError as e:
    # you can delete the index if it already exists
    # oss_client.indices.delete(index=index_name)
    print(f'Error while trying to create the index, with error {e.error}\nyou may uncomment the delete call above to delete and recreate the index')
    


Creating index:
{ 'acknowledged': True,
  'index': 'bedrock-sample-index-757',
  'shards_acknowledged': True}
Done!.......................................................
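
The vector, text, and metadata field names in the index mapping have to match the `fieldMapping` that is passed to the knowledge base later in this notebook. One way to keep them in sync is to derive both from a single dictionary. This is only a sketch: `FIELDS`, `build_index_body`, and `build_field_mapping` are hypothetical names introduced for illustration, and the settings simply mirror `body_json` above.

```python
FIELDS = {"vector": "vector", "text": "text", "metadata": "text-metadata"}

def build_index_body(fields, dimension=1536):
    """Return an index body shaped like body_json, using the given field names."""
    return {
        "settings": {
            "index.knn": "true",
            "number_of_shards": 1,
            "knn.algo_param.ef_search": 512,
            "number_of_replicas": 0,
        },
        "mappings": {
            "properties": {
                fields["vector"]: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
                },
                fields["text"]: {"type": "text"},
                fields["metadata"]: {"type": "text"},
            }
        },
    }

def build_field_mapping(fields):
    """Return the fieldMapping dictionary expected by the knowledge base storage configuration."""
    return {
        "vectorField": fields["vector"],
        "textField": fields["text"],
        "metadataField": fields["metadata"],
    }

index_body = build_index_body(FIELDS)
field_mapping = build_field_mapping(FIELDS)
# Every mapped field must exist in the index mapping
print(set(field_mapping.values()) == set(index_body["mappings"]["properties"]))
```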


In [26]:
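# Persist the AOSS host and index name for later notebooks.
# (The output of %%capture is only assigned after the cell finishes, so the file is written directly instead.)
aoss_config = f"AOSS_host_name: {host}:443\nAOSS_index_name: {index_name}\n"
with open('./.aoss_config.txt', 'w') as file:
    file.write(aoss_config)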

## Download data to ingest into our knowledge base

In [52]:
# Download and prepare dataset
!mkdir -p ./data

from urllib.request import urlretrieve
urls = [
    'https://s2.q4cdn.com/299287126/files/doc_financials/2023/ar/2022-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2022/ar/2021-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2021/ar/Amazon-2020-Shareholder-Letter-and-1997-Shareholder-Letter.pdf',
    'https://s2.q4cdn.com/299287126/files/doc_financials/2020/ar/2019-Shareholder-Letter.pdf'
]

filenames = [
    'AMZN-2022-Shareholder-Letter.pdf',
    'AMZN-2021-Shareholder-Letter.pdf',
    'AMZN-2020-Shareholder-Letter.pdf',
    'AMZN-2019-Shareholder-Letter.pdf'
]

data_root = "./data/"

for idx, url in enumerate(urls):
    file_path = data_root + filenames[idx]
    urlretrieve(url, file_path)


#### Upload data to S3 Bucket data source

In [28]:
# Upload data to s3 to the bucket that was configured as a data source to the knowledge base
s3_client = boto3.client("s3")
def uploadDirectory(path,bucket_name):
    for root,dirs,files in os.walk(path):
        for file in files:
            s3_client.upload_file(os.path.join(root,file),bucket_name,file)

uploadDirectory(data_root, bucket_name)
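
`uploadDirectory` uses the bare file name as the S3 key, so two files with the same name in different subfolders would overwrite each other. A sketch of an alternative that keeps the relative path in the key follows. The helper `iter_upload_targets` is hypothetical, and the demonstration runs on a temporary directory without calling AWS.

```python
import os
import tempfile

def iter_upload_targets(path):
    """Yield (local_path, s3_key) pairs, using the path relative to `path` as the key."""
    for root, _dirs, files in os.walk(path):
        for file in sorted(files):
            local_path = os.path.join(root, file)
            key = os.path.relpath(local_path, path).replace(os.sep, "/")
            yield local_path, key

# Demonstration on a temporary directory; no AWS calls are made.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "2022"))
    for rel in ("a.pdf", os.path.join("2022", "a.pdf")):
        with open(os.path.join(tmp, rel), "w") as f:
            f.write("placeholder")
    upload_keys = sorted(key for _path, key in iter_upload_targets(tmp))

print(upload_keys)
```

Each pair could then be passed to `s3_client.upload_file(local_path, bucket_name, key)`.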

## Create Knowledge Base
Steps:
- Initialize the OpenSearch Serverless configuration, which includes the collection ARN, index name, vector field, text field, and metadata field.
- Initialize the chunking strategy, which determines how the KB splits documents into pieces no larger than the `maxTokens` value set in `chunkingStrategyConfiguration`.
- Initialize the S3 configuration, which will be used to create the data source object later.
- Initialize the Titan embeddings model ARN, as this model will be used to create the embeddings for each of the text chunks.

In [42]:
opensearchServerlessConfiguration = {
            "collectionArn": collection["createCollectionDetail"]['arn'],
            "vectorIndexName": index_name,
            "fieldMapping": {
                "vectorField": "vector",
                "textField": "text",
                "metadataField": "text-metadata"
            }
        }

# Ingest strategy - How to ingest data from the data source
chunkingStrategyConfiguration = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": 90,
        "overlapPercentage": 10
    }
}

# The data source to ingest documents from, into the OpenSearch serverless knowledge base index
s3Configuration = {
    "bucketArn": f"arn:aws:s3:::{bucket_name}",
    # "inclusionPrefixes":["*.*"] # you can use this if you want to create a KB using data within s3 prefixes.
}

# The embedding model used by Bedrock to embed ingested documents, and realtime prompts
embeddingModelArn = f"arn:aws:bedrock:{region_name}::foundation-model/amazon.titan-embed-text-v1"

name = f"bedrock-sample-knowledge-base-{suffix}"
description = "Amazon shareholder letter knowledge base."
roleArn = bedrock_kb_execution_role_arn
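
To build intuition for the `FIXED_SIZE` strategy configured above, here is a rough sketch of fixed-size chunking with overlap. It treats whitespace-separated words as tokens, which is an assumption made for illustration only: the tokenizer and the exact splitting rules used by the service are not shown in this notebook and will differ. The function name `fixed_size_chunks` is hypothetical.

```python
def fixed_size_chunks(text, max_tokens=90, overlap_percentage=10):
    """Split text into chunks of at most max_tokens words, with overlapping edges."""
    tokens = text.split()  # assumption: one whitespace-separated word == one token
    overlap = max_tokens * overlap_percentage // 100
    step = max(1, max_tokens - overlap)
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks

sample_text = " ".join(f"w{i}" for i in range(200))
chunks = fixed_size_chunks(sample_text, max_tokens=90, overlap_percentage=10)
print(len(chunks), [len(c.split()) for c in chunks])
```

With 90 tokens and 10% overlap, consecutive chunks share 9 words, so a sentence that straddles a chunk boundary is still partly visible in both chunks.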


Provide the above configurations as input to the `create_knowledge_base` method, which will create the Knowledge base.

In [43]:
# Create a KnowledgeBase
from retrying import retry

@retry(wait_random_min=1000, wait_random_max=2000,stop_max_attempt_number=7)
def create_knowledge_base_func():
    create_kb_response = bedrock_agent_client.create_knowledge_base(
        name = name,
        description = description,
        roleArn = roleArn,
        knowledgeBaseConfiguration = {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embeddingModelArn
            }
        },
        storageConfiguration = {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration":opensearchServerlessConfiguration
        }
    )
    return create_kb_response["knowledgeBase"]

In [44]:
try:
    kb = create_knowledge_base_func()
except Exception as err:
    print(f"{err=}, {type(err)=}")
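
The `@retry` decorator used above makes up to seven attempts and waits a random 1 to 2 seconds between them. The plain-Python sketch below shows roughly the same behaviour without the `retrying` dependency. `call_with_retry` and `flaky_create` are hypothetical names, and the failing function is only a stand-in for `create_knowledge_base`.

```python
import random
import time

def call_with_retry(func, max_attempts=7, wait_min_s=1.0, wait_max_s=2.0, sleep=time.sleep):
    """Call func(); on an exception wait a random interval and retry, up to max_attempts calls."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(random.uniform(wait_min_s, wait_max_s))

# Stand-in for the real API call: fails twice, then succeeds.
calls = {"n": 0}

def flaky_create():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return {"status": "CREATING"}

result = call_with_retry(flaky_create, sleep=lambda s: None)
print(result, calls["n"])
```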

In [45]:
pp.pprint(kb)

{ 'createdAt': datetime.datetime(2024, 4, 10, 20, 56, 17, 977227, tzinfo=tzlocal()),
  'description': 'Amazon shareholder letter knowledge base.',
  'knowledgeBaseArn': 'arn:aws:bedrock:us-west-2:976939723775:knowledge-base/FTQA91C0EF',
  'knowledgeBaseConfiguration': { 'type': 'VECTOR',
                                  'vectorKnowledgeBaseConfiguration': { 'embeddingModelArn': 'arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-text-v1'}},
  'knowledgeBaseId': 'FTQA91C0EF',
  'name': 'bedrock-sample-knowledge-base-757',
  'roleArn': 'arn:aws:iam::976939723775:role/AmazonBedrockExecutionRoleForKnowledgeBase_671',
  'status': 'CREATING',
  'storageConfiguration': { 'opensearchServerlessConfiguration': { 'collectionArn': 'arn:aws:aoss:us-west-2:976939723775:collection/2hssfswq7j8e0s1w7xgk',
                                                                   'fieldMapping': { 'metadataField': 'text-metadata',
                                                                    

In [46]:
# Get KnowledgeBase 
get_kb_response = bedrock_agent_client.get_knowledge_base(knowledgeBaseId = kb['knowledgeBaseId'])

Next we need to create a data source, which will be associated with the knowledge base created above. Once the data source is ready, we can then start to ingest the documents.

In [47]:
# Create a DataSource in KnowledgeBase 
create_ds_response = bedrock_agent_client.create_data_source(
    name = name,
    description = description,
    knowledgeBaseId = kb['knowledgeBaseId'],
    dataSourceConfiguration = {
        "type": "S3",
        "s3Configuration":s3Configuration
    },
    vectorIngestionConfiguration = {
        "chunkingConfiguration": chunkingStrategyConfiguration
    }
)
ds = create_ds_response["dataSource"]
pp.pprint(ds)

{ 'createdAt': datetime.datetime(2024, 4, 10, 20, 56, 44, 491631, tzinfo=tzlocal()),
  'dataSourceConfiguration': { 's3Configuration': { 'bucketArn': 'arn:aws:s3:::bedrock-kb-us-west-2-976939723775'},
                               'type': 'S3'},
  'dataSourceId': 'KAGX2R5UX2',
  'description': 'Amazon shareholder letter knowledge base.',
  'knowledgeBaseId': 'FTQA91C0EF',
  'name': 'bedrock-sample-knowledge-base-757',
  'status': 'AVAILABLE',
  'updatedAt': datetime.datetime(2024, 4, 10, 20, 56, 44, 491631, tzinfo=tzlocal()),
  'vectorIngestionConfiguration': { 'chunkingConfiguration': { 'chunkingStrategy': 'FIXED_SIZE',
                                                               'fixedSizeChunkingConfiguration': { 'maxTokens': 90,
                                                                                                   'overlapPercentage': 10}}}}


In [96]:
# Get DataSource 
DS = bedrock_agent_client.get_data_source(knowledgeBaseId = kb['knowledgeBaseId'], dataSourceId = ds["dataSourceId"])

In [107]:
DS['dataSource']['knowledgeBaseId']
DS['dataSource']['dataSourceId']

'KAGX2R5UX2'

### Start ingestion job
Once the KB and data source are created, we can start the ingestion job.
During the ingestion job, the KB fetches the documents from the data source, pre-processes them to extract text, chunks the text based on the configured chunk size, creates an embedding for each chunk, and writes the embeddings to the vector database, in this case OSS.

In [49]:
# Start an ingestion job
start_job_response = bedrock_agent_client.start_ingestion_job(knowledgeBaseId = kb['knowledgeBaseId'], dataSourceId = ds["dataSourceId"])

In [50]:
job = start_job_response["ingestionJob"]
pp.pprint(job)

{ 'dataSourceId': 'KAGX2R5UX2',
  'ingestionJobId': 'HVVSQFV6XG',
  'knowledgeBaseId': 'FTQA91C0EF',
  'startedAt': datetime.datetime(2024, 4, 10, 20, 57, 1, 557898, tzinfo=tzlocal()),
  'statistics': { 'numberOfDocumentsDeleted': 0,
                  'numberOfDocumentsFailed': 0,
                  'numberOfDocumentsScanned': 0,
                  'numberOfModifiedDocumentsIndexed': 0,
                  'numberOfNewDocumentsIndexed': 0},
  'status': 'STARTING',
  'updatedAt': datetime.datetime(2024, 4, 10, 20, 57, 1, 557898, tzinfo=tzlocal())}


In [51]:
# Poll the ingestion job until it reaches a terminal state
while job['status'] not in ('COMPLETE', 'FAILED'):
    time.sleep(10)  # avoid calling the API in a tight loop
    get_job_response = bedrock_agent_client.get_ingestion_job(
        knowledgeBaseId = kb['knowledgeBaseId'],
        dataSourceId = ds["dataSourceId"],
        ingestionJobId = job["ingestionJobId"]
    )
    job = get_job_response["ingestionJob"]
pp.pprint(job)
interactive_sleep(40)

{ 'dataSourceId': 'KAGX2R5UX2',
  'ingestionJobId': 'HVVSQFV6XG',
  'knowledgeBaseId': 'FTQA91C0EF',
  'startedAt': datetime.datetime(2024, 4, 10, 20, 57, 1, 557898, tzinfo=tzlocal()),
  'statistics': { 'numberOfDocumentsDeleted': 0,
                  'numberOfDocumentsFailed': 0,
                  'numberOfDocumentsScanned': 4,
                  'numberOfModifiedDocumentsIndexed': 0,
                  'numberOfNewDocumentsIndexed': 4},
  'status': 'COMPLETE',
  'updatedAt': datetime.datetime(2024, 4, 10, 20, 57, 44, 696478, tzinfo=tzlocal())}
Done!...................................
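
The `statistics` block in the job response is worth checking before moving on, because a job can finish while some documents failed to index. A small sketch of such a check follows. `summarize_ingestion` is a hypothetical helper, and `sample_job` copies the relevant fields from the output above.

```python
def summarize_ingestion(job):
    """Return (ok, summary) for an ingestion job dictionary."""
    stats = job["statistics"]
    indexed = stats["numberOfNewDocumentsIndexed"] + stats["numberOfModifiedDocumentsIndexed"]
    summary = (
        f"{job['status']}: scanned={stats['numberOfDocumentsScanned']}, "
        f"indexed={indexed}, failed={stats['numberOfDocumentsFailed']}"
    )
    ok = job["status"] == "COMPLETE" and stats["numberOfDocumentsFailed"] == 0
    return ok, summary

sample_job = {
    "status": "COMPLETE",
    "statistics": {
        "numberOfDocumentsDeleted": 0,
        "numberOfDocumentsFailed": 0,
        "numberOfDocumentsScanned": 4,
        "numberOfModifiedDocumentsIndexed": 0,
        "numberOfNewDocumentsIndexed": 4,
    },
}
ok, summary = summarize_ingestion(sample_job)
print(ok, summary)
```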


In [114]:
kb_id = kb["knowledgeBaseId"]
# Persist identifiers for later notebooks.
# (The output of %%capture is only assigned after the cell finishes, so the file is written directly instead.)
config_text = (
    f"AOSS_host_name: {host}:443\nAOSS_index_name: {index_name}\n"
    f"KB_id: {DS['dataSource']['knowledgeBaseId']}\nDS_id: {DS['dataSource']['dataSourceId']}\n"
    f"Region: {region_name}\nS3_bucket_name: {bucket_name}\n"
)
with open('./.aoss_config.txt', 'w') as file:
    file.write(config_text)

pp.pprint(kb_id)

'FTQA91C0EF'


In [54]:
# keep the kb_id for invocation later in the invoke request
%store kb_id

Stored 'kb_id' (str)


## Test the knowledge base
### Note: If you plan to run any of the following notebooks, you can skip this section
### Using RetrieveAndGenerate API
Behind the scenes, RetrieveAndGenerate API converts queries into embeddings, searches the knowledge base, and then augments the foundation model prompt with the search results as context information and returns the FM-generated response to the question. For multi-turn conversations, Knowledge Bases manage short-term memory of the conversation to provide more contextual results.

The output of the RetrieveAndGenerate API includes the generated response, source attribution as well as the retrieved text chunks.

In [121]:
# try out KB using RetrieveAndGenerate API
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=region_name)
# Let's see how different Anthropic models respond to the input text we provide
# Bedrock KB only supports Anthropic models as of 4/10/2024
claude_model_ids = [ ["Claude 3 Haiku", "anthropic.claude-3-haiku-20240307-v1:0"], ["Claude 3 Sonnet", "anthropic.claude-3-sonnet-20240229-v1:0"], ["Claude Instant", "anthropic.claude-instant-v1"], ["Claude v2", "anthropic.claude-v2"] ]

In [122]:
def ask_bedrock_llm_with_knowledge_base(query: str, model_arn: str, kb_id: str) -> str:
    response = bedrock_agent_runtime_client.retrieve_and_generate(
        input={
            'text': query
        },
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': kb_id,
                'modelArn': model_arn
            }
        },
    )

    generated_text = response['output']['text']
    return generated_text

In [124]:
# Alternative query: "What is Amazon doing in the field of generative AI?"
query = "What was the main theme in Andy Jassy's 2022 letter to AMZN shareholders?"

for model_id in claude_model_ids:
    model_arn = f'arn:aws:bedrock:{region_name}::foundation-model/{model_id[1]}'
    generated_text = ask_bedrock_llm_with_knowledge_base(query, model_arn, kb_id)
    
    print(f"Generated using Amazon Bedrock and {model_id[0]}:")
    pp.pprint(generated_text)
    print()

Generated using Amazon Bedrock and Claude 3 Haiku:
('Based on the search results, I could not find any information about the main '
 "theme in Andy Jassy's 2022 letter to Amazon shareholders. The search results "
 'do not contain any details about the content or themes of this letter.')

Generated using Amazon Bedrock and Claude 3 Sonnet:
'Sorry, I am unable to assist you with this request.'

Generated using Amazon Bedrock and Claude Instant:
("The main theme in Andy Jassy's 2022 letter to AMZN shareholders was focusing "
 'on customers.')

Generated using Amazon Bedrock and Claude v2:
'Sorry, I am unable to assist you with this request.'



In [64]:
model_arn

'arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-v2'

In [None]:
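## The helper above returns only the generated text, so call the API once more (with the last model ARN) to keep the full response.
## Then print the source attribution/citations from the original documents to check that the generated response is grounded in the retrieved context.
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={'text': query},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': kb_id,
            'modelArn': model_arn
        }
    },
)
citations = response["citations"]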
contexts = []
for citation in citations:
    retrievedReferences = citation["retrievedReferences"]
    for reference in retrievedReferences:
        contexts.append(reference["content"]["text"])

print(f'The attribution/citations for the response generated by {model_id[0]}:')
pp.pprint(contexts)
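
For source attribution it is often useful to keep the document location next to each retrieved chunk. The sketch below extracts `(text, uri)` pairs from a response dictionary. `extract_citations` is a hypothetical helper, `sample_response` is hand-written placeholder data rather than real output, and the `location` / `s3Location` / `uri` nesting is assumed to be present for S3 data sources (the code falls back to `None` when it is missing).

```python
def extract_citations(response):
    """Return (text, uri) pairs for every retrieved reference in the response."""
    pairs = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            text = reference["content"]["text"]
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            pairs.append((text, uri))
    return pairs

# Hand-written placeholder shaped like a RetrieveAndGenerate response
sample_response = {
    "output": {"text": "placeholder answer"},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "placeholder chunk A"},
                    "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/a.pdf"}},
                },
                {"content": {"text": "placeholder chunk B"}},
            ]
        }
    ],
}
citation_pairs = extract_citations(sample_response)
print(citation_pairs)
```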

### Retrieve API
The Retrieve API converts user queries into embeddings, searches the knowledge base, and returns the relevant results, giving you more control to build custom workflows on top of the semantic search results. The output of the Retrieve API includes the retrieved text chunks, the location type and URI of the source data, as well as the relevance scores of the retrievals.

In [None]:
# retrieve api for fetching only the relevant context.
relevant_documents = bedrock_agent_runtime_client.retrieve(
    retrievalQuery= {
        'text': query
    },
    knowledgeBaseId=kb_id,
    retrievalConfiguration= {
        'vectorSearchConfiguration': {
            'numberOfResults': 3 # fetch the top 3 chunks that most closely match the query
        }
    }
)

In [None]:
pp.pprint(relevant_documents["retrievalResults"])
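
Because each retrieval result carries a relevance score, a custom workflow can drop weak matches before building a prompt. This is a minimal sketch: `filter_by_score` and the 0.5 threshold are hypothetical choices for illustration, and `sample_results` is placeholder data shaped like the items in `retrievalResults`.

```python
def filter_by_score(results, min_score=0.5):
    """Keep results whose score is at least min_score, best match first."""
    kept = [r for r in results if r.get("score", 0.0) >= min_score]
    return sorted(kept, key=lambda r: r["score"], reverse=True)

# Placeholder data; real scores depend on the index and the query
sample_results = [
    {"content": {"text": "chunk one"}, "score": 0.42},
    {"content": {"text": "chunk two"}, "score": 0.71},
    {"content": {"text": "chunk three"}, "score": 0.58},
]
strong = filter_by_score(sample_results, min_score=0.5)
print([r["content"]["text"] for r in strong])
```

A sensible threshold depends on the embedding model and the distance metric, so it should be tuned against real queries rather than taken from this example.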

<div class="alert alert-block alert-warning">
<b>Next steps:</b> Proceed to the next labs to learn how to use Bedrock Knowledge bases. Remember to CLEAN_UP at the end of your session.
</div>

## Use KB as a retriever

In [145]:
from langchain_community.retrievers import AmazonKnowledgeBasesRetriever
from langchain_community.chat_models import BedrockChat
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from langchain import hub

# Alternative: model_id = "anthropic.claude-3-haiku-20240307-v1:0"
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

prompt_template = hub.pull("rlm/rag-prompt")
#prompt_template = hub.pull("hwchase17/anthropic-paper-qa")

model_kwargs =  { 
        "max_tokens": 2048,
        "temperature": 0.1,
        "top_k": 50,
        "top_p": 0.85,
        "stop_sequences": ["\n\nHuman"],
    }

chat_claude_v3 = BedrockChat(model_id=model_id, model_kwargs=model_kwargs)
    
retriever = AmazonKnowledgeBasesRetriever(
    knowledge_base_id=kb_id,
    retrieval_config={"vectorSearchConfiguration": {"numberOfResults": 8}},
)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt_template
    | chat_claude_v3
    | StrOutputParser()
)
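
`format_docs` joins the retrieved chunks into one context string. If the answer should be traceable to its sources, a variant can number each chunk and label it. The sketch below uses a small stand-in class instead of the LangChain `Document` so that it runs without extra dependencies. `Doc`, `format_docs_numbered`, and the `source` metadata key are all assumptions made for illustration; the metadata keys actually produced by `AmazonKnowledgeBasesRetriever` may differ.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Stand-in exposing the two attributes that format_docs relies on."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs_numbered(docs):
    """Join documents into one context string, numbering and labelling each chunk."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        source = doc.metadata.get("source", "unknown")  # hypothetical metadata key
        parts.append(f"[{i}] ({source})\n{doc.page_content}")
    return "\n\n".join(parts)

context = format_docs_numbered([
    Doc("First chunk.", {"source": "a.pdf"}),
    Doc("Second chunk."),
])
print(context)
```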

In [146]:
query = "In Andy Jassy's 2024 letter to shareholders, what did he say about generative AI? Please think first and answer the question with details."
rag_chain.invoke(query)

"According to the context provided, in Andy Jassy's 2024 letter to shareholders, he expressed great enthusiasm and optimism about generative AI (GenAI). He highlighted that GenAI may be the largest technology transformation since the cloud and the internet. He mentioned that Amazon is focused on inventing the future in GenAI and believes much of this world-changing AI will be built on top of AWS. He also discussed Amazon's offerings and services related to GenAI, such as Amazon Bedrock and Amazon Q, and mentioned the significant traction they are seeing in their GenAI offerings."

## Test

In [148]:
%pip install arxiv


In [1]:
import sys, os
module_paths = ["./", "./configs"]
for module_path in module_paths:
    sys.path.append(os.path.abspath(module_path))
    
from utils import *

In [3]:
query = "arxiv+MOE+mixture of experts+mistral"
docs = search_arxiv(query, 3, 'pdfs')
docs

[Document(page_content="Fr\\'echet Video Distance (FVD), a prominent metric for evaluating video\ngeneration models, is known to conflict with human perception occasionally. In\nthis paper, we aim to explore the extent of FVD's bias toward per-frame quality\nover temporal realism and identify its sources. We first quantify the FVD's\nsensitivity to the temporal axis by decoupling the frame and motion quality and\nfind that the FVD increases only slightly with large temporal corruption. We\nthen analyze the generated videos and show that via careful sampling from a\nlarge set of generated videos that do not contain motions, one can drastically\ndecrease FVD without improving the temporal quality. Both studies suggest FVD's\nbias towards the quality of individual frames. We further observe that the bias\ncan be attributed to the features extracted from a supervised video classifier\ntrained on the content-biased dataset. We show that FVD with features extracted\nfrom the recent large-sca

In [6]:
arxiv_documents = search_and_convert("Difference between MoE and Mamba", 2)
arxiv_documents[0]

Document(page_content='  State Space Models (SSMs) have become serious contenders in the field of\nsequential modeling, challenging the dominance of Transformers. At the same\ntime, Mixture of Experts (MoE) has significantly improved Transformer-based\nLarge Language Models, including recent state-of-the-art open models. We\npropose that to unlock the potential of SSMs for scaling, they should be\ncombined with MoE. We showcase this on Mamba, a recent SSM-based model that\nachieves remarkable performance. Our model, MoE-Mamba, outperforms both Mamba\nand baseline Transformer-MoE. In particular, MoE-Mamba reaches the same\nperformance as Mamba in $2.35\\times$ fewer training steps while preserving the\ninference performance gains of Mamba against Transformer.\n', metadata={'title': 'MoE-Mamba: Efficient Selective State Space Models with Mixture of\n  Experts', 'authors': ['Maciej Pióro', 'Kamil Ciebiera', 'Krystian Król', 'Jan Ludziejewski', 'Michał Krutul', 'Jakub Krajewski', 'Szymon A

In [74]:
# To download a PDF of a specific paper
import arxiv

#paper_id = result[0]  # Replace with the actual paper ID
paper_id = '2312.03815v2'
paper = next(arxiv.Client().results(arxiv.Search(id_list=[paper_id])))
paper.download_pdf(filename="your_paper_title.pdf")  # Downloads the paper

'./your_paper_title.pdf'

In [241]:
url = 'http://arxiv.org/pdf/2311.03168v1'
url.split("/")[-1]

'2311.03168v1'
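
Splitting on `/` works for the URL above, but arXiv links also appear with a `.pdf` suffix or a trailing slash. A slightly more tolerant sketch follows. `arxiv_id_from_url` is a hypothetical helper, and it assumes new-style identifiers such as `2311.03168v1` (old-style identifiers that contain a slash are not handled).

```python
def arxiv_id_from_url(url):
    """Extract a new-style arXiv identifier from an abs/ or pdf/ URL."""
    last = url.rstrip("/").split("/")[-1]
    if last.endswith(".pdf"):
        last = last[:-len(".pdf")]
    return last

ids = [
    arxiv_id_from_url("http://arxiv.org/pdf/2311.03168v1"),
    arxiv_id_from_url("https://arxiv.org/pdf/2401.04088.pdf"),
    arxiv_id_from_url("https://arxiv.org/abs/2401.04088/"),
]
print(ids)
```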

In [3]:
searcher = newsSearcher()
documents, urls = searcher.search(query)

In [11]:
classes = 'generate picture, search news, ask technical question, else'
query = 'who is Elon musk?'
classify_query(query, classes, modelId='anthropic.claude-3-haiku-20240307-v1:0')

'ask technical question'

## Search Google and Bing with the LangChain API wrappers

In [None]:
%pip install azure-search-documents
%pip install google-api-python-client

In [63]:
#from langchain_community.utilities import GoogleSearchAPIWrapper

In [67]:
from langchain.utilities import GoogleSearchAPIWrapper, BingSearchAPIWrapper
#from langchain_community.utilities.bing_search import GoogleSearchAPIWrapper, BingSearchAPIWrapper
# Set up the Google Search API wrapper

google_search = GoogleSearchAPIWrapper(google_api_key=os.getenv("google_api_key"), google_cse_id='b4660fad40a8f4db5')
#bing_search = BingSearchAPIWrapper(bing_subscription_key="YOUR_BING_SUBSCRIPTION_KEY", bing_search_url="https://YOUR_BING_SEARCH_ENDPOINT/bing/v7.0/search")
# Perform a search
query = "arxiv+moe+Mixture+experts+mistral+router"
google_results = google_search.results(query, num_results=2)
#bing_results = bing_search.run(query)

In [68]:
google_results

[{'title': 'Mixtral of Experts',
  'link': 'https://arxiv.org/abs/2401.04088',
  'snippet': 'Jan 8, 2024 ... We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the\xa0...'},
 {'title': "Can anyone explain MoE like I'm 25 : r/LocalLLaMA",
  'link': 'https://www.reddit.com/r/LocalLLaMA/comments/174f42z/can_anyone_explain_moe_like_im_25/',
  'snippet': 'Oct 10, 2023 ... ... Mistral and so on? ... *A Primer on Mixture of Experts (MoE)*. - Why ... This incentivises the router to develop an MoE where all experts are used\xa0...'}]

In [69]:
from langchain_core.documents import Document

documents = []
for item in google_results:
    doc = Document(
        page_content=item['snippet'],
        metadata={
            'title': item['title'],
            'source': item['link']
        }
    )
    documents.append(doc)

## LangGraph

In [29]:
#from langchain.chat_models import BedrockChat
from langchain_community.chat_models import BedrockChat
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.retrievers.web_research import WebResearchRetriever
GOOGLE_API_KEY = os.getenv('serp_api_token')
google_search = GoogleSearchAPIWrapper(google_api_key=os.getenv('google_api_key'), google_cse_id=os.getenv('google_cse_id')) 
#tavily_search = TavilySearchResults(k=3)

In [30]:
def settings():

    # Vectorstore
    import faiss
    from langchain.vectorstores import FAISS 
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.docstore import InMemoryDocstore  
    embeddings_model = OpenAIEmbeddings()  
    embedding_size = 1536  
    index = faiss.IndexFlatL2(embedding_size)  
    vectorstore_public = FAISS(embeddings_model.embed_query, index, InMemoryDocstore({}), {})

    # LLM
    from langchain_community.embeddings import BedrockEmbeddings
    bedrock_client = boto3.client('bedrock-runtime')
    embedding_bedrock = BedrockEmbeddings(client=bedrock_client, model_id=embedding_model_id)
    model_kwargs =  { 
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_k": top_k,
        "top_p": top_p,
        #"stop_sequences": ["\n\nHuman"],
    }
    model_id = 'anthropic.claude-3-haiku-20240307-v1:0'
    llm = BedrockChat(
        model_id=model_id, client=bedrock_client, model_kwargs=model_kwargs
    )
    

    # Search
    from langchain.utilities import GoogleSearchAPIWrapper
    search = GoogleSearchAPIWrapper()   

    # Initialize 
    web_retriever = WebResearchRetriever.from_llm(
        vectorstore=vectorstore_public,
        llm=llm, 
        search=search, 
        num_search_results=3
    )

    return web_retriever, llm

In [31]:
from langchain.callbacks.base import BaseCallbackHandler
from langchain.chains import RetrievalQAWithSourcesChain
class StreamHandler(BaseCallbackHandler):
    def __init__(self, container, initial_text=""):
        self.container = container
        self.text = initial_text

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.text += token
        self.container.info(self.text)


class PrintRetrievalHandler(BaseCallbackHandler):
    def __init__(self, container):
        self.container = container.expander("Context Retrieval")

    def on_retriever_start(self, query: str, **kwargs):
        self.container.write(f"**Question:** {query}")

    def on_retriever_end(self, documents, **kwargs):
        # self.container.write(documents)
        for idx, doc in enumerate(documents):
            source = doc.metadata["source"]
            self.container.write(f"**Results from {source}**")
            self.container.text(doc.page_content)
