# Knowledge Bases for Amazon Bedrock - End-to-end example

This notebook provides sample code for building an empty OpenSearch Serverless (OSS) index, creating an Amazon Bedrock Knowledge Base, and ingesting documents into the index.

The result is a data pipeline that ingests documents (typically stored in Amazon S3) into a knowledge base, that is, a vector database such as Amazon OpenSearch Serverless (OSS), so that they are available for search when a question arrives.

#### Steps:
- Create an Amazon Bedrock Knowledge Base execution role with the policies needed to read data from S3 and write embeddings to OSS.
- Create an empty OpenSearch Serverless index.
- Download the documents.
- Create an Amazon Bedrock Knowledge Base.
- Create a data source within the Knowledge Base that connects to Amazon S3.
- Start an ingestion job using the Knowledge Bases (KB) APIs, which read the data from S3, split it into chunks, convert the chunks into embeddings with the Amazon Titan Embeddings model, and store those embeddings in OSS. All of this happens without you having to build, deploy, and manage the data pipeline.

## Use case
### Dataset
In this example, you will use several years of Amazon's Letters to Shareholders as a text corpus for question answering.

## Setup
Before running the rest of this notebook, run the cells below to make sure the required libraries are installed and to connect to Amazon Bedrock.

In [1]:
%pip install -U opensearch-py==2.3.1
%pip install -U boto3==1.33.2
%pip install -U retrying==1.3.4

Collecting opensearch-py==2.3.1
  Downloading opensearch_py-2.3.1-py2.py3-none-any.whl.metadata (6.9 kB)
Collecting urllib3<2,>=1.21.1 (from opensearch-py==2.3.1)
  Downloading urllib3-1.26.18-py2.py3-none-any.whl.metadata (48 kB)
Downloading opensearch_py-2.3.1-py2.py3-none-any.whl (327 kB)
Downloading urllib3-1.26.18-py2.py3-none-any.whl (143 kB)
Installing collected packages: urllib3, opensearch-py
  Attempting uninstall: urllib3
    Found existing installation: urllib3 2.2.0
    Uninstalling urllib3-2.2.0:
      Successfully uninstalled urllib3-2.2.0
Successfully installed opensearch-py-2.3.1 urlli

In [None]:
# restart kernel
from IPython.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

In [2]:
import warnings
warnings.filterwarnings("ignore")

In [3]:
import json
import os
import boto3
import pprint
import time
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth
from utility import create_bedrock_execution_role, create_oss_policy_attach_bedrock_execution_role, create_policies_in_oss
import random
from retrying import retry
from urllib.request import urlretrieve

suffix = random.randrange(200, 900)

boto3_session = boto3.session.Session()
region_name = boto3_session.region_name
bedrock_client = boto3_session.client("bedrock-runtime", region_name=region_name)
bedrock_agent_client = boto3_session.client("bedrock-agent", region_name=region_name)
aoss_client = boto3_session.client("opensearchserverless")
s3_client = boto3_session.client("s3")

bucket_name = "llm-cquinta-teste" # replace this bucket name with one of your own buckets!

pp = pprint.PrettyPrinter(indent=2)

In [4]:
# Define the question we will ask the model
query = "What is Amazon doing in the field of generative AI?"

In [5]:
# Define the model used in this notebook
model_id = "anthropic.claude-instant-v1" # for Claude v2, use "anthropic.claude-v2"
model_arn = f"arn:aws:bedrock:{region_name}::foundation-model/{model_id}"

In [6]:
# Ask the question before creating the knowledge base, without RAG
# json.dumps escapes the query safely; the prompt follows the "\n\nHuman: ... \n\nAssistant:" format
key_word_args = {
  "modelId": model_id,
  "body": json.dumps({
    "prompt": f"\n\nHuman: {query}\n\nAssistant:",
    "max_tokens_to_sample": 300,
    "temperature": 1,
    "top_k": 250,
    "top_p": 0.999,
    "stop_sequences": ["\n\nHuman:"],
    "anthropic_version": "bedrock-2023-05-31"
  })
}

response = bedrock_client.invoke_model(**key_word_args)
response_body = json.loads(response.get("body").read())
generated_text = response_body.get("completion")

pp.pprint(generated_text)

(' Amazon is actively working on generative AI through several research '
 'projects and products:\n'
 '\n'
 '- Amazon AI is researching techniques like text generation, image '
 'generation, video generation, machine translation, speech synthesis and more '
 'using deep learning models. They publish papers regularly on GANs, '
 'transformers and other generative techniques.\n'
 '\n'
 '- Amazon Polly is a text-to-speech service that uses neural networks to '
 'generate natural-sounding speech from text. It can be integrated into '
 'applications.\n'
 '\n'
 '- Amazon Lex is a service for building conversational bots and agents using '
 'voice and text. It uses generative models for natural language '
 'understanding.\n'
 '\n'
 '- StylePainter is an AI tool from Amazon Research that can transform '
 'sketch-like drawings into realistic images by referencing a large dataset of '
 'images. \n'
 '\n'
 '- Amazon Music HD uses generative models to upsample audio to higher '
 'bitrates and res

## Create a vector store - Amazon OpenSearch Serverless index

### Step 1 - Create the Bedrock KB execution role, policies, and OSS collection
First, we need to create a vector store. In this section, we use *Amazon OpenSearch Serverless*.

Amazon OpenSearch Serverless is a serverless option of Amazon OpenSearch Service. As a developer, you can use OpenSearch Serverless to run petabyte-scale workloads without having to configure, manage, and scale OpenSearch clusters. You get the same interactive millisecond response times as OpenSearch Service with the simplicity of a serverless environment. You pay only for what you use, and resources scale automatically to provide the right capacity for your application without impacting data ingestion.

In [7]:
bedrock_kb_execution_role = create_bedrock_execution_role(bucket_name=bucket_name)
bedrock_kb_execution_role_arn = bedrock_kb_execution_role["Role"]["Arn"]

In [8]:
# Create the security, network, and data access policies
vector_store_name = f"bedrock-sample-rag-{suffix}"
encryption_policy, network_policy, access_policy = create_policies_in_oss(
    vector_store_name=vector_store_name,
    aoss_client=aoss_client,
    bedrock_kb_execution_role_arn=bedrock_kb_execution_role_arn)

In [9]:
# Create the collection in OpenSearch Serverless
collection = aoss_client.create_collection(name=vector_store_name, type="VECTORSEARCH")

In [10]:
pp.pprint(collection)
time.sleep(10)

{ 'ResponseMetadata': { 'HTTPHeaders': { 'connection': 'keep-alive',
                                         'content-length': '314',
                                         'content-type': 'application/x-amz-json-1.0',
                                         'date': 'Thu, 22 Feb 2024 00:28:38 '
                                                 'GMT',
                                         'x-amzn-requestid': '79dee257-318b-4372-8601-9ede209ce196'},
                        'HTTPStatusCode': 200,
                        'RequestId': '79dee257-318b-4372-8601-9ede209ce196',
                        'RetryAttempts': 0},
  'createCollectionDetail': { 'arn': 'arn:aws:aoss:us-east-1:707257249187:collection/racqttjgcqnx7kznnit9',
                              'createdDate': 1708561717875,
                              'id': 'racqttjgcqnx7kznnit9',
                              'kmsKeyArn': 'auto',
                              'lastModifiedDate': 1708561717875,
                             

In [11]:
collection_id = collection["createCollectionDetail"]["id"]
host = collection_id + "." + region_name + ".aoss.amazonaws.com"
print(host)

racqttjgcqnx7kznnit9.us-east-1.aoss.amazonaws.com
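
The fixed `time.sleep(10)` after `create_collection` assumes the collection becomes usable quickly. A more defensive variant, sketched below, polls `batch_get_collection` until the collection reports `ACTIVE`. The helper name `wait_for_collection`, the timeout, and the poll interval are illustrative assumptions and are not part of the original notebook.

```python
import time


def wait_for_collection(aoss_client, name, timeout_s=600, poll_s=10, sleep=time.sleep):
    """Poll OpenSearch Serverless until the named collection is ACTIVE.

    Hypothetical helper: the name, timeout, and interval are illustrative choices.
    """
    waited = 0
    while True:
        details = aoss_client.batch_get_collection(names=[name])["collectionDetails"]
        # Treat a collection that is not listed yet as still being created.
        status = details[0]["status"] if details else "CREATING"
        if status == "ACTIVE":
            return details[0]
        if status == "FAILED":
            raise RuntimeError(f"Collection {name} failed to create")
        if waited >= timeout_s:
            raise TimeoutError(f"Collection {name} still {status} after {timeout_s}s")
        sleep(poll_s)
        waited += poll_s
```

In this notebook it could be called as `wait_for_collection(aoss_client, vector_store_name)` before the index is created.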


In [12]:
# Create the OSS policy and attach it to the Amazon Bedrock execution role
create_oss_policy_attach_bedrock_execution_role(collection_id=collection_id,
                                                bedrock_kb_execution_role=bedrock_kb_execution_role)

Opensearch serverless arn:  arn:aws:iam::707257249187:policy/AmazonBedrockOSSPolicyForKnowledgeBase_657


### Step 2 - Create the vector index

In [13]:
credentials = boto3_session.get_credentials()
awsauth = AWSV4SignerAuth(credentials, region_name, "aoss")

index_name = f"bedrock-sample-index-{suffix}"
body_json = {
   "settings": {
      "index.knn": "true"
   },
   "mappings": {
      "properties": {
         "vector": {
            "type": "knn_vector",
            "dimension": 1536
         },
         "text": {
            "type": "text"
         },
         "text-metadata": {
            "type": "text"
         }
      }
   }
}

oss_client = OpenSearch(
    hosts=[{"host": host, "port": 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
    timeout=300
)

# It can take up to a minute for the data access rules to be applied
time.sleep(60)

In [15]:
# Create the index
response = oss_client.indices.create(index=index_name, body=json.dumps(body_json))
print("\nCreating the index:")
print(response)


Creating the index:
{'acknowledged': True, 'shards_acknowledged': True, 'index': 'bedrock-sample-index-230'}
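
The fixed `time.sleep(60)` before index creation waits for the data access rules to propagate. As an alternative, here is a minimal sketch that retries the call until it succeeds. The helper name `retry_until_authorized`, the attempt count, and the delay are assumptions, and retrying on every exception type is a deliberate simplification.

```python
import time


def retry_until_authorized(action, attempts=6, delay_s=10, sleep=time.sleep):
    """Call `action` until it stops raising, waiting `delay_s` between tries.

    Hypothetical helper; it retries on any exception, which is a simplification.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as err:  # assumption: early failures are propagation delays
            last_error = err
            if attempt < attempts - 1:
                sleep(delay_s)
    # Every attempt failed: surface the most recent error to the caller.
    raise last_error
```

With the names used in this notebook, the call would look like `retry_until_authorized(lambda: oss_client.indices.create(index=index_name, body=json.dumps(body_json)))`.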


## Download the data

In [17]:
!mkdir -p ./data

urls = [
    "https://s2.q4cdn.com/299287126/files/doc_financials/2023/ar/2022-Shareholder-Letter.pdf",
    "https://s2.q4cdn.com/299287126/files/doc_financials/2022/ar/2021-Shareholder-Letter.pdf",
    "https://s2.q4cdn.com/299287126/files/doc_financials/2021/ar/Amazon-2020-Shareholder-Letter-and-1997-Shareholder-Letter.pdf",
    "https://s2.q4cdn.com/299287126/files/doc_financials/2020/ar/2019-Shareholder-Letter.pdf"
]

filenames = [
    "AMZN-2022-Shareholder-Letter.pdf",
    "AMZN-2021-Shareholder-Letter.pdf",
    "AMZN-2020-Shareholder-Letter.pdf",
    "AMZN-2019-Shareholder-Letter.pdf"
]

data_directory = "./data/"

for url, filename in zip(urls, filenames):
    file_path = os.path.join(data_directory, filename)
    urlretrieve(url, file_path)



#### Upload the data to the S3 bucket

In [18]:
def upload_directory(path, bucket_name):
    # Walk the directory tree and upload each file, using the file name as the S3 key
    for root, dirs, files in os.walk(path):
        for file in files:
            s3_client.upload_file(os.path.join(root, file), bucket_name, file)

upload_directory(data_directory, bucket_name)
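
`upload_directory` uses the bare file name as the S3 key, so two files with the same name in different sub-directories would overwrite each other. The sketch below shows one possible variant that keeps the relative path in the key. Both function names are hypothetical and the S3 client is passed in explicitly; only `upload_file` is an actual boto3 call.

```python
import os


def iter_upload_targets(path):
    """Yield (local_path, s3_key) pairs, keeping sub-directory structure in the key."""
    for root, _dirs, files in os.walk(path):
        for file in sorted(files):
            local_path = os.path.join(root, file)
            # Build the key from the path relative to the upload root, with "/" separators.
            key = os.path.relpath(local_path, path).replace(os.sep, "/")
            yield local_path, key


def upload_directory_with_prefix(path, bucket_name, s3_client):
    """Upload every file under `path`; hypothetical variant of upload_directory."""
    for local_path, key in iter_upload_targets(path):
        s3_client.upload_file(local_path, bucket_name, key)
```

For the flat `./data/` directory used here, both versions produce the same keys.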

## Create the knowledge base
Steps:
- Initialize the OpenSearch Serverless configuration, which includes the collection ARN.
- Initialize the chunking strategy, which the KB uses to split documents into chunks of the size set in `chunkingStrategyConfiguration`.
- Initialize the S3 configuration, which is used later to create the data source object.
- Initialize the ARN of the Titan embeddings model, which is used to create an embedding for each text chunk.
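
To build intuition for what a fixed-size strategy with an overlap percentage does, the sketch below splits a list of tokens into overlapping windows. It is only an illustration under stated assumptions: the function `fixed_size_chunks` is hypothetical, and the tokenizer and boundary rules the service actually applies are not shown in this notebook.

```python
def fixed_size_chunks(tokens, max_tokens=512, overlap_percentage=20):
    """Split a token list into windows of `max_tokens` that overlap by a percentage.

    Illustrative only: the service's real tokenizer and boundary rules may differ.
    """
    if max_tokens <= 0 or not 0 <= overlap_percentage < 100:
        raise ValueError("max_tokens must be positive and overlap in [0, 100)")
    overlap = max_tokens * overlap_percentage // 100
    step = max_tokens - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reached the end of the document
    return chunks
```

With `max_tokens=512` and `overlap_percentage=20`, consecutive windows in this sketch share 102 tokens, so text near a chunk boundary appears in both neighbouring chunks.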

In [19]:
opensearchServerlessConfiguration = {
    "collectionArn": collection["createCollectionDetail"]["arn"],
    "vectorIndexName": index_name,
    "fieldMapping": {
        "vectorField": "vector",
        "textField": "text",
        "metadataField": "text-metadata"
    }
}

chunkingStrategyConfiguration = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {
        "maxTokens": 512,
        "overlapPercentage": 20
    }
}

s3Configuration = {
    "bucketArn": f"arn:aws:s3:::{bucket_name}",
}

embeddingModelArn = f"arn:aws:bedrock:{region_name}::foundation-model/amazon.titan-embed-text-v1"

name = f"bedrock-sample-knowledge-base-{suffix}"
description = "Amazon shareholder letter knowledge base."
roleArn = bedrock_kb_execution_role_arn


Pass the configurations above to the `create_knowledge_base` method, which creates the knowledge base.

In [20]:
# Create the knowledge base
@retry(wait_random_min=1000, wait_random_max=2000,stop_max_attempt_number=7)
def create_knowledge_base():
    create_kb_response = bedrock_agent_client.create_knowledge_base(
        name = name,
        description = description,
        roleArn = roleArn,
        knowledgeBaseConfiguration = {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": embeddingModelArn
            }
        },
        storageConfiguration = {
            "type": "OPENSEARCH_SERVERLESS",
            "opensearchServerlessConfiguration":opensearchServerlessConfiguration
        }
    )
    return create_kb_response["knowledgeBase"]

In [21]:
try:
    kb = create_knowledge_base()
except Exception as err:
    print(f"{err=}, {type(err)=}")

In [22]:
pp.pprint(kb)

{ 'createdAt': datetime.datetime(2024, 2, 22, 0, 50, 12, 137310, tzinfo=tzutc()),
  'description': 'Amazon shareholder letter knowledge base.',
  'knowledgeBaseArn': 'arn:aws:bedrock:us-east-1:707257249187:knowledge-base/ZIEZ45COEB',
  'knowledgeBaseConfiguration': { 'type': 'VECTOR',
                                  'vectorKnowledgeBaseConfiguration': { 'embeddingModelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1'}},
  'knowledgeBaseId': 'ZIEZ45COEB',
  'name': 'bedrock-sample-knowledge-base-230',
  'roleArn': 'arn:aws:iam::707257249187:role/AmazonBedrockExecutionRoleForKnowledgeBase_657',
  'status': 'CREATING',
  'storageConfiguration': { 'opensearchServerlessConfiguration': { 'collectionArn': 'arn:aws:aoss:us-east-1:707257249187:collection/racqttjgcqnx7kznnit9',
                                                                   'fieldMapping': { 'metadataField': 'text-metadata',
                                                                       

In [23]:
# Get the Knowledge Base
get_kb_response = bedrock_agent_client.get_knowledge_base(knowledgeBaseId = kb["knowledgeBaseId"])

Next, we need to create a data source and associate it with the knowledge base created above. Once the data source is ready, we can start ingesting the documents.

In [24]:
# Create the data source in the knowledge base
create_ds_response = bedrock_agent_client.create_data_source(
    name = name,
    description = description,
    knowledgeBaseId = kb["knowledgeBaseId"],
    dataSourceConfiguration = {
        "type": "S3",
        "s3Configuration":s3Configuration
    },
    vectorIngestionConfiguration = {
        "chunkingConfiguration": chunkingStrategyConfiguration
    }
)
ds = create_ds_response["dataSource"]
pp.pprint(ds)

{ 'createdAt': datetime.datetime(2024, 2, 22, 0, 51, 20, 904039, tzinfo=tzutc()),
  'dataSourceConfiguration': { 's3Configuration': { 'bucketArn': 'arn:aws:s3:::llm-cquinta-teste'},
                               'type': 'S3'},
  'dataSourceId': 'UIT9LY8343',
  'description': 'Amazon shareholder letter knowledge base.',
  'knowledgeBaseId': 'ZIEZ45COEB',
  'name': 'bedrock-sample-knowledge-base-230',
  'status': 'AVAILABLE',
  'updatedAt': datetime.datetime(2024, 2, 22, 0, 51, 20, 904039, tzinfo=tzutc()),
  'vectorIngestionConfiguration': { 'chunkingConfiguration': { 'chunkingStrategy': 'FIXED_SIZE',
                                                               'fixedSizeChunkingConfiguration': { 'maxTokens': 512,
                                                                                                   'overlapPercentage': 20}}}}


In [25]:
# Get the data source
bedrock_agent_client.get_data_source(knowledgeBaseId = kb["knowledgeBaseId"], dataSourceId = ds["dataSourceId"])

{'ResponseMetadata': {'RequestId': 'b893fbd3-95d5-433a-a9e4-35d192a5cb88',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Thu, 22 Feb 2024 00:51:31 GMT',
   'content-type': 'application/json',
   'content-length': '557',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'b893fbd3-95d5-433a-a9e4-35d192a5cb88',
   'x-amz-apigw-id': 'Tg0XHGPRoAMEIpg=',
   'x-amzn-trace-id': 'Root=1-65d69a93-54b6590f06df52cb71deabb4'},
  'RetryAttempts': 0},
 'dataSource': {'knowledgeBaseId': 'ZIEZ45COEB',
  'dataSourceId': 'UIT9LY8343',
  'name': 'bedrock-sample-knowledge-base-230',
  'status': 'AVAILABLE',
  'description': 'Amazon shareholder letter knowledge base.',
  'dataSourceConfiguration': {'type': 'S3',
   's3Configuration': {'bucketArn': 'arn:aws:s3:::llm-cquinta-teste'}},
  'vectorIngestionConfiguration': {'chunkingConfiguration': {'chunkingStrategy': 'FIXED_SIZE',
    'fixedSizeChunkingConfiguration': {'maxTokens': 512,
     'overlapPercentage': 20}}},
  'createdAt': datetime.datetime(

### Start the ingestion job
Once the knowledge base and the data source have been created, we can start the ingestion job.

During the ingestion job, the knowledge base fetches the documents from the data source, pre-processes them to extract text, splits the text into chunks based on the configured chunk size, creates an embedding for each chunk, and then writes the embeddings to the vector database, in this case Amazon OpenSearch Serverless.

In [26]:
# Start the ingestion job
start_job_response = bedrock_agent_client.start_ingestion_job(knowledgeBaseId = kb["knowledgeBaseId"],
                                                              dataSourceId = ds["dataSourceId"])

In [27]:
job = start_job_response["ingestionJob"]


In [34]:
pp.pprint(job)

{ 'dataSourceId': 'UIT9LY8343',
  'ingestionJobId': 'N4WE7EFPWD',
  'knowledgeBaseId': 'ZIEZ45COEB',
  'startedAt': datetime.datetime(2024, 2, 22, 0, 51, 40, 834095, tzinfo=tzutc()),
  'statistics': { 'numberOfDocumentsDeleted': 0,
                  'numberOfDocumentsFailed': 0,
                  'numberOfDocumentsScanned': 0,
                  'numberOfModifiedDocumentsIndexed': 0,
                  'numberOfNewDocumentsIndexed': 0},
  'status': 'STARTING',
  'updatedAt': datetime.datetime(2024, 2, 22, 0, 51, 40, 834095, tzinfo=tzutc())}


In [35]:
# Poll the ingestion job until it reaches a terminal state, waiting between calls
while job["status"] not in ("COMPLETE", "FAILED"):
    time.sleep(40)
    get_job_response = bedrock_agent_client.get_ingestion_job(
        knowledgeBaseId = kb["knowledgeBaseId"],
        dataSourceId = ds["dataSourceId"],
        ingestionJobId = job["ingestionJobId"]
    )
    job = get_job_response["ingestionJob"]
pp.pprint(job)

{ 'dataSourceId': 'UIT9LY8343',
  'ingestionJobId': 'N4WE7EFPWD',
  'knowledgeBaseId': 'ZIEZ45COEB',
  'startedAt': datetime.datetime(2024, 2, 22, 0, 51, 40, 834095, tzinfo=tzutc()),
  'statistics': { 'numberOfDocumentsDeleted': 0,
                  'numberOfDocumentsFailed': 0,
                  'numberOfDocumentsScanned': 4,
                  'numberOfModifiedDocumentsIndexed': 0,
                  'numberOfNewDocumentsIndexed': 4},
  'status': 'COMPLETE',
  'updatedAt': datetime.datetime(2024, 2, 22, 0, 52, 16, 706196, tzinfo=tzutc())}


In [36]:
kb_id = kb["knowledgeBaseId"]
pp.pprint(kb_id)

'ZIEZ45COEB'


## Testing the knowledge base
### Using the RetrieveAndGenerate API
Under the hood, the RetrieveAndGenerate API converts queries into embeddings, searches the knowledge base, augments the foundation model (FM) prompt with the search results as context, and returns the FM-generated answer to the question. For multi-turn conversations, the KB manages the short-term memory of the conversation to provide more contextual results.

The output of the RetrieveAndGenerate API includes the generated answer, the source attribution, and the retrieved text chunks.
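
For multi-turn conversations, the response carries a `sessionId` that can be sent back on the next request so the service can use the earlier turns. The wrapper below is a minimal sketch of that flow. The function name `ask_knowledge_base` and its argument order are assumptions; the request and response fields mirror those of `retrieve_and_generate`.

```python
def ask_knowledge_base(runtime_client, kb_id, model_arn, text, session_id=None):
    """Send one turn to RetrieveAndGenerate and return (answer, session_id).

    Hypothetical wrapper; pass the returned session_id back in to continue a conversation.
    """
    request = {
        "input": {"text": text},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    }
    # Only include sessionId on follow-up turns; the first turn starts a new session.
    if session_id is not None:
        request["sessionId"] = session_id
    response = runtime_client.retrieve_and_generate(**request)
    return response["output"]["text"], response.get("sessionId")
```

A follow-up question would then reuse the identifier returned by the first call, for example `ask_knowledge_base(client, kb_id, model_arn, "And what about AWS?", session_id=sid)`.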

In [37]:
# Create the runtime client used to query the Knowledge Base through the RetrieveAndGenerate API
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=region_name)

In [38]:
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": query
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id,
            "modelArn": model_arn
        }
    },
)

generated_text = response["output"]["text"]
pp.pprint(generated_text)

('Amazon has been working on their own large language models (LLMs) for '
 'generative AI for a while and believes it will transform and improve '
 'virtually every customer experience. They are continuing to invest '
 'substantially in these models across all of their consumer, seller, brand, '
 'and creator experiences. Additionally, Amazon is democratizing this '
 'technology through AWS so companies of all sizes can leverage generative AI. '
 'AWS is offering machine learning chips like Trainium and Inferentia so '
 'customers can afford to train and run their LLMs in production.')


In [39]:
# Print the source attribution/citations from the original documents to check that the generated answer is grounded in the context.
citations = response["citations"]
contexts = []
for citation in citations:
    retrievedReferences = citation["retrievedReferences"]
    for reference in retrievedReferences:
        contexts.append(reference["content"]["text"])

pp.pprint(contexts)

[ 'This shift was driven by several factors, including access to higher '
  'volumes of compute capacity at lower prices than was ever available. Amazon '
  'has been using machine learning extensively for 25 years, employing it in '
  'everything from personalized ecommerce recommendations, to fulfillment '
  'center pick paths, to drones for Prime Air, to Alexa, to the many machine '
  'learning services AWS offers (where AWS has the broadest machine learning '
  'functionality and customer base of any cloud provider). More recently, a '
  'newer form of machine learning, called Generative AI, has burst onto the '
  'scene and promises to significantly accelerate machine learning adoption. '
  'Generative AI is based on very Large Language Models (trained on up to '
  'hundreds of billions of parameters, and growing), across expansive '
  'datasets, and has radically general and broad recall and learning '
  'capabilities. We have been working on our own LLMs for a while now, believe

### Using the Retrieve API
The Retrieve API converts user queries into embeddings, searches the knowledge base, and returns the relevant results, giving you more control to build custom workflows on top of the semantic search results.

The output of the Retrieve API includes the retrieved text chunks, the location type and URI of the source data, and the relevance score of each retrieval.
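
Because each result carries its text, source location, and score, a custom workflow can post-process them freely. The sketch below condenses a list of results into `(score, source_uri, snippet)` rows. The helper `summarize_results` is hypothetical, and it assumes S3-backed results shaped like the `retrievalResults` entries of a Retrieve response.

```python
def summarize_results(retrieval_results, snippet_chars=80):
    """Return (score, source_uri, snippet) tuples sorted by descending score.

    Hypothetical helper; assumes S3-backed results with content, location, and score.
    """
    rows = []
    for result in retrieval_results:
        location = result.get("location", {})
        uri = location.get("s3Location", {}).get("uri")
        text = result["content"]["text"]
        rows.append((result.get("score"), uri, text[:snippet_chars]))
    # Missing scores sort last by treating them as 0.0.
    return sorted(rows, key=lambda row: row[0] or 0.0, reverse=True)
```

Once a Retrieve response is available, it could be passed in as `summarize_results(response["retrievalResults"])`.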

In [40]:
# Retrieve API call that fetches only the relevant context.
relevant_documents = bedrock_agent_runtime_client.retrieve(
    retrievalQuery= {
        "text": query
    },
    knowledgeBaseId=kb_id,
    retrievalConfiguration= {
        "vectorSearchConfiguration": {
            "numberOfResults": 3 # retrieves the top 3 results that match the query
        }
    }
)

In [41]:
pp.pprint(relevant_documents["retrievalResults"])

[ { 'content': { 'text': 'This shift was driven by several factors, including '
                         'access to higher volumes of compute capacity at '
                         'lower prices than was ever available. Amazon has '
                         'been using machine learning extensively for 25 '
                         'years, employing it in everything from personalized '
                         'ecommerce recommendations, to fulfillment center '
                         'pick paths, to drones for Prime Air, to Alexa, to '
                         'the many machine learning services AWS offers (where '
                         'AWS has the broadest machine learning functionality '
                         'and customer base of any cloud provider). More '
                         'recently, a newer form of machine learning, called '
                         'Generative AI, has burst onto the scene and promises '
                         'to significantly accelerate machine

## Clean up
Remember to delete all the resources you created, because you will incur costs for storing documents in the OSS index.
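
If one deletion call fails, the calls after it never run and resources can be left behind. One way to make clean-up more forgiving is to run every step and collect the failures, as sketched below. The helper `run_cleanup` and the step labels are illustrative assumptions, not part of the original notebook.

```python
def run_cleanup(steps):
    """Run each (label, callable) pair and collect failures instead of stopping.

    Hypothetical helper: returns a list of (label, exception) for the steps that failed.
    """
    failures = []
    for label, action in steps:
        try:
            action()
        except Exception as err:  # keep going so later resources are still deleted
            failures.append((label, err))
    return failures
```

With the objects defined in this notebook, the steps could be given as lambdas, for example `run_cleanup([("index", lambda: oss_client.indices.delete(index=index_name)), ("collection", lambda: aoss_client.delete_collection(id=collection_id))])`, and any entries in the returned list inspected afterwards.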

In [1]:
# Delete the data source, the knowledge base, the index, the collection, and the OSS policies
bedrock_agent_client.delete_data_source(dataSourceId = ds["dataSourceId"], knowledgeBaseId=kb["knowledgeBaseId"])
bedrock_agent_client.delete_knowledge_base(knowledgeBaseId=kb["knowledgeBaseId"])
oss_client.indices.delete(index=index_name)
aoss_client.delete_collection(id=collection_id)
aoss_client.delete_access_policy(type="data", name=access_policy["accessPolicyDetail"]["name"])
aoss_client.delete_security_policy(type="network", name=network_policy["securityPolicyDetail"]["name"])
aoss_client.delete_security_policy(type="encryption", name=encryption_policy["securityPolicyDetail"]["name"])


In [2]:
# Delete the IAM role and policies
from utility import delete_iam_role_and_policies
delete_iam_role_and_policies()
