# Set Up Red Cross Vector Databases in SageMaker

This notebook creates the **three Red Cross knowledge bases** (and their S3 Vectors backends) from a **SageMaker Studio** or **SageMaker Notebook Instance**.

## What gets created

1. **Biomedical** – vector bucket, index, knowledge base, data source (blood drives, appointments)
2. **Humanitarian** – vector bucket, index, knowledge base, data source (relief centers, grants)
3. **Training** – vector bucket, index, knowledge base, data source (first aid classes, registrations)

## Prerequisites

- **CloudFormation stack** `RedCrossStackInfra` must be deployed (creates S3 bucket `{account}-{region}-kb-data-bucket` and IAM role).
- **SageMaker execution role** must have:
  - `s3:PutObject`, `s3:ListBucket` on the KB data bucket
  - `bedrock:CreateKnowledgeBase`, `bedrock:CreateDataSource`, `bedrock:StartIngestionJob`, `bedrock:ListKnowledgeBases`, `bedrock:ListDataSources`
  - `s3vectors:CreateVectorBucket`, `s3vectors:CreateIndex`, `s3vectors:GetVectorBucket`, `s3vectors:GetIndex`
  - `ssm:PutParameter`, `ssm:GetParameter`
- **Repo** with `knowledge_base_data/` and `scripts/setup_redcross_knowledge_bases.py` (clone or upload).
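
Before running anything, it can help to confirm the first prerequisite programmatically. The sketch below is one possible preflight check, not part of the workshop repo: `kb_data_bucket_name`, `stack_is_deployed`, and `preflight` are hypothetical helper names, and treating only `CREATE_COMPLETE` and `UPDATE_COMPLETE` as "deployed" is an assumption.

```python
def kb_data_bucket_name(account_id: str, region: str) -> str:
    """Build the data bucket name using the pattern from the prerequisites."""
    return f"{account_id}-{region}-kb-data-bucket"


def stack_is_deployed(cfn_client, stack_name: str = "RedCrossStackInfra") -> bool:
    """Return True if the stack reports a completed create or update.

    describe_stacks raises a ClientError when the stack does not exist;
    that error is left to propagate so the cause stays visible.
    """
    stacks = cfn_client.describe_stacks(StackName=stack_name)["Stacks"]
    # Assumption: only these two statuses count as a usable deployment.
    return bool(stacks) and stacks[0]["StackStatus"] in {"CREATE_COMPLETE", "UPDATE_COMPLETE"}


def preflight() -> None:
    """Hypothetical helper: run both checks with the current credentials."""
    import boto3  # imported lazily so the pure helpers work without AWS access

    session = boto3.Session()
    account = session.client("sts").get_caller_identity()["Account"]
    # session.region_name is None when no region is configured; set one first.
    bucket = kb_data_bucket_name(account, session.region_name)
    print("Stack deployed:", stack_is_deployed(session.client("cloudformation")))
    print("Expected data bucket:", bucket)
```

Calling `preflight()` in a notebook cell prints the stack state and the bucket name the setup is expected to use.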

## 1. Clone the repo (if not already present)

If you're in a fresh SageMaker environment, clone the workshop repo. Otherwise, set `REPO_DIR` to your existing path.

In [None]:
import os
from pathlib import Path

# Option A: Clone the repo (uncomment and set your repo URL)
# !git clone https://github.com/aws-samples/bedrock-agentcore-workshop.git
# REPO_DIR = Path("bedrock-agentcore-workshop")

# Option B: Use existing path (e.g. if you uploaded the project or already cloned)
REPO_DIR = Path(os.getcwd())
if (REPO_DIR / "knowledge_base_data").is_dir() and (REPO_DIR / "scripts" / "setup_redcross_knowledge_bases.py").exists():
    print(f"Using repo at: {REPO_DIR.resolve()}")
else:
    # If you cloned into a subdir, point to it:
    REPO_DIR = Path("bedrock-agentcore-workshop")
    # Check for both the data directory and the setup script, as in the branch above
    if not ((REPO_DIR / "knowledge_base_data").is_dir() and (REPO_DIR / "scripts" / "setup_redcross_knowledge_bases.py").exists()):
        raise FileNotFoundError(
            "knowledge_base_data/ or scripts/setup_redcross_knowledge_bases.py not found. "
            "Set REPO_DIR to the project root that contains both."
        )
    print(f"Using repo at: {REPO_DIR.resolve()}")
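
If the notebook lives in a subdirectory of the project, the fixed paths above will not match. As an alternative, here is a minimal sketch that walks up from a starting directory until it finds a folder containing both `knowledge_base_data/` and the setup script. `is_repo_root` and `find_repo_root` are hypothetical helpers, not part of the repo.

```python
from pathlib import Path
from typing import Optional


def is_repo_root(path: Path) -> bool:
    """True if path holds both the data directory and the setup script."""
    return (path / "knowledge_base_data").is_dir() and (
        path / "scripts" / "setup_redcross_knowledge_bases.py"
    ).is_file()


def find_repo_root(start: Path) -> Optional[Path]:
    """Search start and each of its parents; return the first match or None."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if is_repo_root(candidate):
            return candidate
    return None
```

A typical use would be `REPO_DIR = find_repo_root(Path.cwd())`, followed by a check that the result is not `None`.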

## 2. (Optional) Upgrade boto3 for S3 Vectors support

S3 Vectors is a recent service, and older boto3 releases do not include a client for it. If the setup script fails with `Unknown service: 's3vectors'`, run this cell and then re-run the setup cell.

In [None]:
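# %pip installs into the environment of the running kernel.
# If boto3 was already imported in this kernel, restart the kernel to see the new version.
%pip install --upgrade boto3 -q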
import boto3
print(f"boto3 version: {boto3.__version__}")
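
Rather than waiting for the script to fail, you can ask boto3 whether it knows the service. This sketch relies on `Session.get_available_services()`, which lists the service names bundled with the installed botocore; `has_service` and `check_s3vectors` are hypothetical helper names.

```python
def has_service(session, service_name: str = "s3vectors") -> bool:
    """Return True if the session's botocore data includes service_name."""
    return service_name in session.get_available_services()


def check_s3vectors() -> bool:
    """Check the boto3 installed in the current kernel."""
    import boto3  # imported lazily so has_service can be used with any session object

    return has_service(boto3.Session())
```

If `check_s3vectors()` returns `False`, the upgrade cell above is needed.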

## 3. Set AWS region (if needed)

SageMaker usually inherits the region from the environment. If your CloudFormation stack and Bedrock resources are in a different region, set it here before running the setup script.

In [None]:
# Set only if your CloudFormation stack and Bedrock are in a different region
# os.environ["AWS_DEFAULT_REGION"] = "us-west-2"

import boto3
region = boto3.Session().region_name or os.environ.get("AWS_REGION") or os.environ.get("AWS_DEFAULT_REGION") or "us-west-2"
print(f"Using region: {region}")

## 4. Run the setup script

This uploads `knowledge_base_data/` to S3, creates the three vector buckets, indexes, knowledge bases, and data sources, and writes KB IDs to Parameter Store.

In [None]:
import subprocess
import sys

script = REPO_DIR / "scripts" / "setup_redcross_knowledge_bases.py"
data_dir = REPO_DIR / "knowledge_base_data"

if not script.exists():
    raise FileNotFoundError(f"Setup script not found: {script}")
if not data_dir.is_dir():
    raise FileNotFoundError(f"knowledge_base_data not found: {data_dir}")

cmd = [
    sys.executable,
    str(script),
    "--knowledge-base-data-dir",
    str(data_dir.resolve()),
]
print("Running:", " ".join(cmd))
result = subprocess.run(cmd, cwd=str(REPO_DIR))
if result.returncode != 0:
    raise RuntimeError(f"Setup script exited with code {result.returncode}")
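
When the script fails, the exit code alone says little. One option, sketched below, is to stream the script's output as it runs while keeping the last few lines, so they can be included in the error message. `run_streaming` is a hypothetical helper, and the default of 20 retained lines is arbitrary.

```python
import subprocess
import sys
from collections import deque


def run_streaming(cmd, cwd=None, tail_lines=20):
    """Run cmd, echo its output live, and return (returncode, last lines)."""
    tail = deque(maxlen=tail_lines)
    with subprocess.Popen(
        cmd,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so ordering is preserved
        text=True,
    ) as proc:
        for line in proc.stdout:
            print(line, end="")
            tail.append(line.rstrip("\n"))
    # Leaving the with-block waits for the process, so returncode is set here.
    return proc.returncode, list(tail)
```

With the variables from the cell above, this could be called as `code, tail = run_streaming(cmd, cwd=str(REPO_DIR))`, raising an error that joins `tail` when `code` is nonzero.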

## 5. Verify Parameter Store

After a successful run, the knowledge base IDs are stored in SSM Parameter Store. The cell below lists them with boto3:

In [None]:
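import boto3
import os

# Resolve the region the same way as in step 3 so both cells agree
reg = boto3.Session().region_name or os.environ.get('AWS_REGION') or os.environ.get('AWS_DEFAULT_REGION') or 'us-west-2'
ssm = boto3.client('ssm', region_name=reg)
acc = boto3.client('sts', region_name=reg).get_caller_identity()['Account']
path = f'/{acc}-{reg}/kb'
print(f'SSM parameters under {path}:')
try:
    # get_parameters_by_path returns at most 10 parameters per call, so paginate
    paginator = ssm.get_paginator('get_parameters_by_path')
    for page in paginator.paginate(Path=path, Recursive=True, WithDecryption=True):
        for p in page.get('Parameters', []):
            v = p['Value']
            # Truncate long values to keep the listing readable
            print(f"  {p['Name']} = {v[:60]}..." if len(v) > 60 else f"  {p['Name']} = {v}")
except Exception as e:
    print('Error:', e)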

---

**Ingestion** can take a few minutes. You can check its status in the Bedrock console: **Knowledge bases** → select a knowledge base → **Data sources** → **Sync**. The agent can use the KBs once ingestion shows **Completed**.
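
The same status can be read from the notebook instead of the console. The sketch below uses the `bedrock-agent` client's `list_data_sources` and `list_ingestion_jobs` calls to report the most recent ingestion job per data source. `latest_ingestion_status` is a hypothetical helper, `"NO_JOBS"` is a placeholder invented here for data sources that have never been synced, and the knowledge base ID is assumed to come from the Parameter Store values listed in step 5. Note that the API reports a finished job as `COMPLETE`.

```python
def latest_ingestion_status(agent_client, knowledge_base_id: str) -> dict:
    """Map each data source ID to the status of its most recent ingestion job."""
    statuses = {}
    sources = agent_client.list_data_sources(knowledgeBaseId=knowledge_base_id)
    for source in sources.get("dataSourceSummaries", []):
        jobs = agent_client.list_ingestion_jobs(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=source["dataSourceId"],
            sortBy={"attribute": "STARTED_AT", "order": "DESCENDING"},
            maxResults=1,  # only the newest job is needed
        )
        summaries = jobs.get("ingestionJobSummaries", [])
        # "NO_JOBS" is a placeholder defined by this sketch, not an API status.
        statuses[source["dataSourceId"]] = summaries[0]["status"] if summaries else "NO_JOBS"
    return statuses


def print_status(knowledge_base_id: str) -> None:
    """Print the latest status for one knowledge base using current credentials."""
    import boto3  # imported lazily so the helper above can be tested offline

    client = boto3.client("bedrock-agent")
    for source_id, status in latest_ingestion_status(client, knowledge_base_id).items():
        print(f"{source_id}: {status}")
```

Re-running `print_status` with each knowledge base ID until every data source reports `COMPLETE` mirrors the console check described above.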