# Preparing Aurora PostgreSQL to be used as a Knowledge Base for Amazon Bedrock

This notebook provides sample code that prepares a vector database, Amazon Aurora PostgreSQL with the pgvector extension, to serve as the knowledge base for a data pipeline that ingests documents (typically stored in Amazon S3).

This notebook works well on an `ml.t3.medium` instance with the `Python3` kernel in **JupyterLab** or the `Data Science 2.0` kernel in **SageMaker Studio Classic**.

Here is a list of packages that are used in this notebook.

```
!pip list | grep -E -w "boto3|ipython-sql|psycopg|SQLAlchemy"
--------------------------------------------------------------
boto3                                1.34.127
ipython-sql                          0.5.0
psycopg                              3.1.19
psycopg-binary                       3.1.19
psycopg-pool                         3.2.2
SQLAlchemy                           2.0.28
```

# Prerequisites

The following IAM policies need to be attached to the SageMaker execution role that you use to run this notebook:

- AmazonSageMakerFullAccess
- AWSCloudFormationReadOnlyAccess
- AmazonRDSReadOnlyAccess
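To confirm that the execution role has these policies, one option is to compare the response of the IAM `ListAttachedRolePolicies` API (boto3 `iam.list_attached_role_policies`) against the list above. The helper below is a hypothetical sketch: it only inspects the response dictionary, so it can be tried without AWS credentials, and it ignores pagination and inline policies.

```python
from typing import Iterable

# Managed policies this notebook expects on the SageMaker execution role
REQUIRED_POLICIES = {
    "AmazonSageMakerFullAccess",
    "AWSCloudFormationReadOnlyAccess",
    "AmazonRDSReadOnlyAccess",
}


def missing_policies(response: dict, required: Iterable[str] = REQUIRED_POLICIES) -> set:
    """Return required policy names absent from a list_attached_role_policies response.

    Hypothetical helper: only the first page of results is inspected.
    """
    attached = {p["PolicyName"] for p in response.get("AttachedPolicies", [])}
    return set(required) - attached


# Usage with a real client (requires AWS credentials), for example:
#   iam = boto3.client("iam")
#   resp = iam.list_attached_role_policies(RoleName="<your-execution-role-name>")
#   print(missing_policies(resp))
```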

## Step 1: Setup
Install the required packages.

In [None]:
%%capture --no-stderr

!pip install -Uq pip

!pip install -U "boto3>=1.26.159"
!pip install -U ipython-sql==0.5.0
!pip install -U "psycopg[binary]==3.1.19"
!pip install -U SQLAlchemy==2.0.28

In [None]:
!pip list | grep -E -w "boto3|ipython-sql|psycopg|SQLAlchemy"
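As an alternative to `pip list | grep`, the installed versions can be read from Python with the standard library's `importlib.metadata`. This is a small sketch; the package list simply mirrors the distributions installed above, and anything missing is reported as `None` instead of raising.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names installed in Step 1 (psycopg is installed with the [binary] extra)
PACKAGES = ["boto3", "ipython-sql", "psycopg", "SQLAlchemy"]


def installed_versions(packages=PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for pkg, ver in installed_versions().items():
    print(f"{pkg:<12} {ver or 'NOT INSTALLED'}")
```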

#### Get connection info out of your database secret

In [None]:
import boto3

aws_region = boto3.Session().region_name
aws_region

In [None]:
import urllib.parse

from utils import (
    get_cfn_outputs,
    get_secret_name,
    get_secret
)

CFN_STACK_NAME = "BedrockKBAuroraPgVectorStack" # name of CloudFormation stack

secret_id = get_secret_name(CFN_STACK_NAME)
secret = get_secret(secret_id)

db_username = secret['username']
db_password = urllib.parse.quote_plus(secret['password'])
db_port = secret['port']
db_host = secret['host']
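The `utils` module is not shown here, so the exact behavior of `get_secret` is an assumption. A plausible sketch, assuming the secret is stored as a JSON string with the `username`, `password`, `host`, and `port` keys read above, calls the Secrets Manager `GetSecretValue` API and parses `SecretString`. The client is passed in (the name `get_secret_sketch` and the stub client are hypothetical) so the logic can be exercised offline; it also shows why the password is passed through `quote_plus`.

```python
import json
import urllib.parse


def get_secret_sketch(secret_id: str, client) -> dict:
    """Hypothetical stand-in for utils.get_secret.

    `client` is expected to be a boto3 Secrets Manager client, e.g.
    boto3.client("secretsmanager", region_name=aws_region).
    """
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


class _StubSecretsClient:
    """Minimal stand-in that mimics the shape of a GetSecretValue response."""

    def get_secret_value(self, SecretId):
        return {"SecretString": json.dumps({
            "username": "postgres", "password": "p@ss/w:rd",
            "host": "example.cluster.local", "port": 5432,
        })}


secret_demo = get_secret_sketch("demo-secret", _StubSecretsClient())
# quote_plus escapes characters such as "@", "/" and ":" that would otherwise
# break the connection URL built from these values
print(urllib.parse.quote_plus(secret_demo["password"]))  # p%40ss%2Fw%3Ard
```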

In [None]:
bedrock_vector_database_name = get_cfn_outputs(CFN_STACK_NAME, region_name=aws_region)['VectorDBName']
bedrock_vector_database_name

#### Store `bedrock_vector_database_name` for later use

In [None]:
%store bedrock_vector_database_name

#### Load the `ipython-sql` library to access the database from IPython

In [None]:
driver = 'psycopg'
connection_string = f"postgresql+{driver}://{db_username}:{db_password}@{db_host}:{db_port}/{bedrock_vector_database_name}?autocommit=true"
connection_string
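Echoing `connection_string` displays the URL-encoded password in the notebook output. A minimal sketch of an alternative, using only `urllib.parse`, builds the same style of SQLAlchemy URL and masks the password before display, similar to how `ipython-sql` prints `***`. The function names `build_connection_string` and `redact` are hypothetical.

```python
import urllib.parse


def build_connection_string(username, password, host, port, database,
                            driver="psycopg", autocommit=True):
    """Build a SQLAlchemy-style PostgreSQL URL, URL-encoding the credentials."""
    user = urllib.parse.quote_plus(username)
    pwd = urllib.parse.quote_plus(password)
    url = f"postgresql+{driver}://{user}:{pwd}@{host}:{port}/{database}"
    return url + ("?autocommit=true" if autocommit else "")


def redact(url: str) -> str:
    """Replace the password portion of the URL with *** for safe display."""
    parts = urllib.parse.urlsplit(url)
    if parts.password is None:
        return url
    netloc = f"{parts.username}:***@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urllib.parse.urlunsplit(parts._replace(netloc=netloc))
```

In the notebook, `print(redact(connection_string))` would show the same connection details without exposing the password.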

In [None]:
%load_ext sql

In [None]:
%config SqlMagic.style = '_DEPRECATED_DEFAULT' # Run after %load_ext sql; ensures that the SqlMagic style is compatible with the previous version

In [None]:
%sql $connection_string

In [None]:
%%sql

SELECT datname FROM pg_database;

 * postgresql+psycopg://postgres:***@rag-pgvector-demo.cluster-cnrh6fettief.us-east-1.rds.amazonaws.com:5432/bedrock_vector_db?autocommit=true
5 rows affected.


| datname |
|---|
| template0 |
| rdsadmin |
| template1 |
| postgres |
| bedrock_vector_db |


In [None]:
%%sql

SELECT current_database();

 * postgresql+psycopg://postgres:***@rag-pgvector-demo.cluster-cnrh6fettief.us-east-1.rds.amazonaws.com:5432/bedrock_vector_db?autocommit=true
1 rows affected.


| current_database |
|---|
| bedrock_vector_db |


## Step 2: Create a schema and a table to be used for a Knowledge Base for Amazon Bedrock

In [None]:
schema_name = 'bedrock_integration'
table_name = 'bedrock_kb'
bedrock_vectordb_username = 'bedrock_user'
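Because these names are substituted directly into the SQL of the following cell through `{...}` expansion, it can be worth checking that they are plain, unquoted PostgreSQL identifiers. This is a conservative sketch, not a complete validator: it accepts lowercase ASCII letters, digits, underscores, and dollar signs, enforces PostgreSQL's default 63-byte identifier limit, and does not check for reserved keywords.

```python
import re

# Unquoted identifiers: a letter or underscore, then letters, digits,
# underscores or dollar signs. Lowercase only here, because PostgreSQL folds
# unquoted names to lowercase anyway.
_IDENTIFIER_RE = re.compile(r"[a-z_][a-z0-9_$]*")
MAX_IDENTIFIER_BYTES = 63  # default NAMEDATALEN - 1


def is_safe_identifier(name: str) -> bool:
    """Return True if `name` can be interpolated into SQL without quoting."""
    return (
        _IDENTIFIER_RE.fullmatch(name) is not None
        and len(name.encode("utf-8")) <= MAX_IDENTIFIER_BYTES
    )


for name in ("bedrock_integration", "bedrock_kb", "bedrock_user"):
    assert is_safe_identifier(name), name
```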

#### Store variables for later use

In [None]:
%store schema_name
%store table_name
%store bedrock_vectordb_username

In [None]:
%%sql

GRANT ALL PRIVILEGES ON DATABASE {bedrock_vector_database_name} TO {db_username};

-- Set up Pgvector
CREATE EXTENSION IF NOT EXISTS vector;

-- Create a specific schema that Bedrock can use to query the data
CREATE SCHEMA IF NOT EXISTS {schema_name};

-- Create a table to store the embedding vectors
CREATE TABLE IF NOT EXISTS {schema_name}.{table_name} (
    id uuid PRIMARY KEY,
    embedding vector(1536), -- dimension must match the embedding model (1536 for Amazon Titan Embeddings G1 - Text)
    chunks text,
    metadata json,
    file_name varchar(255),
    year int
);
COMMENT ON COLUMN {schema_name}.{table_name}.file_name IS 'source file name used for metadata filtering';
COMMENT ON COLUMN {schema_name}.{table_name}.year IS 'file creation year used for metadata filtering';

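-- Create an HNSW index with the cosine operator class for Bedrock to query the data
-- (named explicitly so that IF NOT EXISTS prevents duplicate indexes on re-runs)
CREATE INDEX IF NOT EXISTS {table_name}_embedding_idx ON {schema_name}.{table_name}
USING hnsw (embedding vector_cosine_ops);

-- Create a new role that Bedrock can use to query the database
-- (this example reuses the password stored in the database secret)
-- Grant the role permission to use the schema and the table, which is owned by the master user
CREATE ROLE {bedrock_vectordb_username} WITH PASSWORD '{secret["password"]}' LOGIN;
GRANT ALL ON SCHEMA {schema_name} TO {bedrock_vectordb_username};
GRANT ALL ON TABLE {schema_name}.{table_name} TO {bedrock_vectordb_username};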
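The `vector_cosine_ops` operator class lets the HNSW index serve queries ordered by pgvector's cosine distance operator `<=>`, which returns 1 minus the cosine similarity. The sketch below is illustrative only and is not how Bedrock itself queries the table. It shows how a Python embedding can be rendered in pgvector's text input format `[x1,x2,...]`, what `<=>` computes, and a hypothetical top-k query that also filters on the `year` column. The helper names, the `top_k` default, and the psycopg `%s` placeholder style are assumptions.

```python
import math

EMBEDDING_DIM = 1536  # must match vector(1536) in the table definition


def to_pgvector_literal(embedding, dim: int = EMBEDDING_DIM) -> str:
    """Render floats in pgvector's text input format, e.g. '[0.1,0.2,0.3]'."""
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(embedding)}")
    values = [float(x) for x in embedding]
    if not all(math.isfinite(v) for v in values):
        raise ValueError("pgvector does not accept NaN or infinite values")
    return "[" + ",".join(repr(v) for v in values) + "]"


def cosine_distance(a, b) -> float:
    """Pure-Python equivalent of pgvector's `a <=> b`: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return float("nan")  # cosine distance is undefined for a zero vector
    # Clamp to [-1, 1] to absorb floating-point error
    similarity = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return 1.0 - similarity


def build_similarity_query(schema: str, table: str, top_k: int = 5) -> str:
    """Hypothetical top-k query ordered by cosine distance, filtered on `year`.

    Placeholders use psycopg's %s style; pass (year, vector_literal) at execution.
    """
    return (
        f"SELECT id, chunks, file_name, year "
        f"FROM {schema}.{table} "
        "WHERE year = %s "
        "ORDER BY embedding <=> %s::vector "
        f"LIMIT {int(top_k)}"
    )


# Example (requires a live psycopg connection, so it is only shown here):
# with psycopg.connect(...) as conn:
#     rows = conn.execute(
#         build_similarity_query("bedrock_integration", "bedrock_kb", top_k=3),
#         (2023, to_pgvector_literal(query_embedding)),
#     ).fetchall()
```

With an approximate index such as HNSW, pgvector applies `WHERE` conditions after the index scan, so a filtered query can return fewer than `top_k` rows; the pgvector documentation discusses raising `hnsw.ef_search` for such cases.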

## Step 3: Verification

Check to see if pgvector is available

In [None]:
%%sql

SELECT typname
FROM pg_type
WHERE typname = 'vector';

 * postgresql+psycopg://postgres:***@rag-pgvector-demo.cluster-cnrh6fettief.us-east-1.rds.amazonaws.com:5432/bedrock_vector_db?autocommit=true
1 rows affected.


| typname |
|---|
| vector |


(Optional) Use the following command to check the installed version of the pgvector extension (registered as `vector`):

In [None]:
%%sql

SELECT *
FROM pg_extension
WHERE extname = 'vector';

 * postgresql+psycopg://postgres:***@rag-pgvector-demo.cluster-cnrh6fettief.us-east-1.rds.amazonaws.com:5432/bedrock_vector_db?autocommit=true
1 rows affected.


| oid | extname | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition |
|---|---|---|---|---|---|---|---|
| 20531 | vector | 10 | 2200 | True | 0.7.0 | | |
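The `extversion` column is a plain string. To check programmatically that the installed extension supports the HNSW index created in Step 2 (HNSW indexes were introduced in pgvector 0.5.0), one approach is to parse the version into a tuple of integers and compare. This sketch assumes a purely numeric `major.minor.patch` string such as the `0.7.0` shown above.

```python
HNSW_MIN_VERSION = (0, 5, 0)  # first pgvector release with HNSW index support


def parse_version(text: str) -> tuple:
    """Turn '0.7.0' into (0, 7, 0); assumes purely numeric dot-separated parts."""
    return tuple(int(part) for part in text.strip().split("."))


def supports_hnsw(extversion: str) -> bool:
    """Return True if the pgvector version string is at least 0.5.0."""
    return parse_version(extversion) >= HNSW_MIN_VERSION


print(supports_hnsw("0.7.0"))  # True
```

Comparing tuples rather than raw strings avoids the trap where `"0.10.0" < "0.5.0"` lexicographically.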


Check that the schema and the table are ready to use

In [None]:
%%sql

SELECT *
FROM pg_catalog.pg_tables
WHERE schemaname != 'pg_catalog' AND
    schemaname != 'information_schema';

 * postgresql+psycopg://postgres:***@rag-pgvector-demo.cluster-cnrh6fettief.us-east-1.rds.amazonaws.com:5432/bedrock_vector_db?autocommit=true
1 rows affected.


| schemaname | tablename | tableowner | tablespace | hasindexes | hasrules | hastriggers | rowsecurity |
|---|---|---|---|---|---|---|---|
| bedrock_integration | bedrock_kb | postgres | | True | False | False | False |


List the indexes using the `pg_indexes` view

In [None]:
%%sql

SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = '{schema_name}'
ORDER BY tablename, indexname;

 * postgresql+psycopg://postgres:***@rag-pgvector-demo.cluster-cnrh6fettief.us-east-1.rds.amazonaws.com:5432/bedrock_vector_db?autocommit=true
2 rows affected.


| tablename | indexname | indexdef |
|---|---|---|
| bedrock_kb | bedrock_kb_embedding_idx | CREATE INDEX bedrock_kb_embedding_idx ON bedrock_integration.bedrock_kb USING hnsw (embedding vector_cosine_ops) |
| bedrock_kb | bedrock_kb_pkey | CREATE UNIQUE INDEX bedrock_kb_pkey ON bedrock_integration.bedrock_kb USING btree (id) |


## (Optional) Clean up

If you no longer need the vector database, you can remove the objects created in this notebook with the following commands.

#### Drop table

In [None]:
%%sql

DROP TABLE IF EXISTS {schema_name}.{table_name};

In [None]:
%%sql

SELECT *
FROM pg_catalog.pg_tables
WHERE schemaname != 'pg_catalog' AND
    schemaname != 'information_schema';

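#### Drop schema

In [None]:
%%sql

DROP SCHEMA IF EXISTS {schema_name};

In [None]:
%%sql

SELECT *
FROM pg_catalog.pg_namespace
ORDER BY nspname;

#### Drop role

Drop the role after the schema, because PostgreSQL refuses to drop a role that still holds privileges on existing objects.

In [None]:
%%sql

DROP ROLE IF EXISTS {bedrock_vectordb_username};

In [None]:
%%sql

SELECT usename AS role_name,
  CASE
     WHEN usesuper AND usecreatedb THEN
       CAST('superuser, create database' AS pg_catalog.text)
     WHEN usesuper THEN
       CAST('superuser' AS pg_catalog.text)
     WHEN usecreatedb THEN
       CAST('create database' AS pg_catalog.text)
     ELSE
       CAST('' AS pg_catalog.text)
  END role_attributes
FROM pg_catalog.pg_user
ORDER BY role_name DESC;

#### Drop database

PostgreSQL cannot drop the database that the current session is connected to, so first switch the `%sql` connection to the default `postgres` database. The `WITH (FORCE)` option (PostgreSQL 13 and later) terminates any other sessions still connected to the target database, including the notebook's earlier connection.

In [None]:
admin_connection_string = f"postgresql+{driver}://{db_username}:{db_password}@{db_host}:{db_port}/postgres?autocommit=true"
%sql $admin_connection_string

In [None]:
%%sql

DROP DATABASE IF EXISTS {bedrock_vector_database_name} WITH (FORCE);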

## References

  * [Using Aurora PostgreSQL as a Knowledge Base for Amazon Bedrock](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.VectorDB.html)
    * [Preparing Aurora PostgreSQL to be used as a Knowledge Base for Amazon Bedrock](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.VectorDB.html#AuroraPostgreSQL.VectorDB.PreparingKB)
  * [(Workshop) Generative AI Use Cases with Aurora PostgreSQL and pgvector](https://catalog.workshops.aws/pgvector/en-US/)
  * [PostgreSQL Tutorial](https://www.postgresqltutorial.com/)