# Bedrock Knowledgebases - Direct Ingestion

### Overview

Amazon Bedrock Knowledge Bases allows you to modify your data source and sync the changes in one step. You can take advantage of this feature if your knowledge base is connected to one of the following types of data sources:

* Amazon S3

* Custom

With direct ingestion, you can directly add, update, or delete files in a knowledge base in a single action and your knowledge base can have access to documents without the need to sync.

For S3 data sources, any changes that you index into the knowledge base directly aren't reflected in the S3 location. You can use these API operations to make changes to your knowledge base immediately available in a single step. However, you should follow up by making the same changes in your S3 location so that they aren't overwritten the next time you sync your data source.

This notebook provides sample code for ingesting documents directly into Amazon Bedrock Knowledgebases using a *Custom* datasource.

### 1. Import libraries

In [2]:
%pip install --force-reinstall -q -r ../requirements.txt

Note: you may need to restart the kernel to use updated packages.



[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip
ERROR: Could not open requirements file: [Errno 2] No such file or directory: '../requirements.txt'


In [3]:
# restart kernel
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [4]:
import warnings
warnings.filterwarnings('ignore')

In [5]:
import boto3
import os
from utils.knowledge_base import BedrockKnowledgeBase
from utils.knowledge_base import interactive_sleep
import time
import base64

os.environ['AWS_DEFAULT_PROFILE'] = "genai-demo-admin"

bedrock_agent_build_time_client = boto3.client("bedrock-agent")
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime') 
session = boto3.session.Session()
region =  session.region_name

# Get the current timestamp
current_time = time.time()

# Format the timestamp as a string
timestamp_str = time.strftime("%Y%m%d%H%M%S", time.localtime(current_time))[-7:]
# Create the suffix using the timestamp
suffix = f"{timestamp_str}"

### 2. Create a Bedrock Knowledgebase

In this step we will create a Bedrock knowledgebase with a *Custom* datasource.

In [6]:
knowledge_base_name = f"bedrock-sample-knowledge-base-{suffix}"
knowledge_base_description = "Multi data source knowledge base."
data_sources = [
    {
        "type": "CUSTOM",
        "description": "Custom data source for Bedrock knowledgebase",
    }
]

bedrock_knowledge_base = BedrockKnowledgeBase(
    kb_name=f'{knowledge_base_name}',
    kb_description=knowledge_base_description,
    data_sources=data_sources,
    chunking_strategy = "FIXED_SIZE", 
    suffix = f'{suffix}-f'
)
knowledge_base_id = bedrock_knowledge_base.knowledge_base["knowledgeBaseId"]
print("knowledge_base_id: " + knowledge_base_id)

for data_source in bedrock_knowledge_base.data_source:
    data_source_id = data_source["dataSourceId"]
    print("data_source_id: " + data_source_id)

Step 1 - Creating or retrieving S3 bucket(s) for Knowledge Base documents
[]
buckets_to_check:  []
Step 2 - Creating Knowledge Base Execution Role (AmazonBedrockExecutionRoleForKnowledgeBase_4185235-f) and Policies
Step 3 - Creating OSS encryption, network and data access policies
Step 4 - Creating OSS Collection (this step takes a couple of minutes to complete)
{ 'ResponseMetadata': { 'HTTPHeaders': { 'connection': 'keep-alive',
                                         'content-length': '320',
                                         'content-type': 'application/x-amz-json-1.0',
                                         'date': 'Fri, 14 Feb 2025 18:52:44 '
                                                 'GMT',
                                         'x-amzn-requestid': 'e6a79279-bc26-4093-91a9-ddb81758a315'},
                        'HTTPStatusCode': 200,
                        'RequestId': 'e6a79279-bc26-4093-91a9-ddb81758a315',
                        'RetryAttempts': 0},
  'creat

### 3. Ingest a sample document into Bedrock knowledgebase

In this step we will ingest a sample pdf document.

In [7]:
with open("everything-about-project-kuiper.pdf", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read())

In [12]:
bedrock_agent_build_time_client.ingest_knowledge_base_documents(
    knowledgeBaseId=knowledge_base_id,
    dataSourceId=data_source_id,
    documents=[
        {
            "content":{
                "dataSourceType": "CUSTOM",
                "custom": {
                    "sourceType": "IN_LINE",
                    "customDocumentIdentifier": {
                        'id': 'doc-2'
                    },
                    "inlineContent": {
                        "byteContent":{
                            "data": str(encoded_string),
                            "mimeType": "application/pdf"
                        },
                        "type": "BYTE"
                    },
                }
            },
            "metadata":{
                "type": "IN_LINE_ATTRIBUTE",
                "inlineAttributes": [
                    {
                        "key": "attribute1",
                        "value": {
                            "type": "STRING",
                            "stringValue": "value2"                           
                        }
                    }
                ]
            }
        },
    ],
)
print("Sleeping for a minute to let the document ingestion complete.")
interactive_sleep(60)

{'ResponseMetadata': {'RequestId': '6699af3b-f2e4-4dab-b562-5f00553a5a7a',
  'HTTPStatusCode': 202,
  'HTTPHeaders': {'date': 'Fri, 14 Feb 2025 18:56:22 GMT',
   'content-type': 'application/json',
   'content-length': '212',
   'connection': 'keep-alive',
   'x-amzn-requestid': '6699af3b-f2e4-4dab-b562-5f00553a5a7a',
   'x-amz-apigw-id': 'F_O5bFPboAMEG4w=',
   'x-amzn-trace-id': 'Root=1-67af91d4-35beaf2d05b74cbc79f80ae4'},
  'RetryAttempts': 0},
 'documentDetails': [{'dataSourceId': 'XYXODTK3OK',
   'identifier': {'custom': {'id': 'doc-2'}, 'dataSourceType': 'CUSTOM'},
   'knowledgeBaseId': 'LIYDJAP04Q',
   'status': 'STARTING',
   'updatedAt': datetime.datetime(2025, 2, 14, 18, 56, 22, 178762, tzinfo=tzutc())}]}

### 4. Test Knowledgebase

In this step we will test the knowledgebase with a sample question related to the document ingested in the above step.

In [9]:
query = "What is Kuiper?"

In [13]:
foundation_model = "anthropic.claude-3-sonnet-20240229-v1:0"

response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            'knowledgeBaseId': knowledge_base_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(region, foundation_model),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":5
                } 
            }
        }
    }
)

print(response['output']['text'],end='\n'*2)

Kuiper is a region of the outer solar system that lies beyond the orbit of Neptune. It is a disc-shaped region consisting mainly of small bodies or remnants from the solar system's formation. Many of these objects are composed largely of frozen gases like methane, ammonia and water. The Kuiper Belt is similar to the asteroid belt, but it is much larger and massier. It extends from about 30 AU (the orbit of Neptune) to about 55 AU from the Sun. The first Kuiper Belt Object (KBO) was discovered in 1992, with the discovery of other dwarf planets like Pluto, Haumea, Makemake and Eris in subsequent years.



### 5. Clean up
Let's delete all the resources to avoid unnecessary costs.

In [15]:
# delete role and policies
print("===============================Knowledge base with custom datasource==============================\n")
bedrock_knowledge_base.delete_kb(delete_s3_bucket=True, delete_iam_roles_and_policies=True)


