# Knowledge Base Setup for PR Article Generation

This notebook sets up the Amazon Bedrock Knowledge Base that will be used by the CrewAI agents for PR article generation. The knowledge base contains examples of high-quality PR articles that serve as reference material for writing style and structure.

## Purpose
- Create a Knowledge Base for Amazon Bedrock
- Upload example PR articles to S3
- Configure embeddings for semantic search
- Store knowledge base ID for use in the main workflow

## Prerequisites
- AWS credentials configured
- S3 bucket access
- Bedrock service permissions
- Example PR articles in the `good_prs` directory

## Environment Setup

Resolve the AWS account ID and region, then initialize the AWS clients needed for knowledge base creation.

In [None]:
import boto3
import os
import sys
import uuid
from typing import Optional

# Initialize AWS clients
sts_client = boto3.client('sts')
session = boto3.session.Session()

account_id = sts_client.get_caller_identity()["Account"]
region = session.region_name

# Pass the region by keyword so the intent is explicit
s3_client = boto3.client('s3', region_name=region)
bedrock_client = boto3.client('bedrock-runtime', region_name=region)
bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime', region_name=region)

print(f"Account ID: {account_id}")
print(f"Region: {region}")
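
`session.region_name` is `None` when no region is configured, and the bucket name built later would then contain the literal text `None`. A minimal guard, assuming a hypothetical helper named `require_region` (not part of the original notebook), could fail early instead:

```python
from typing import Optional


def require_region(session_region: Optional[str],
                   fallback: Optional[str] = None) -> str:
    """Return a usable region name or raise a clear error (hypothetical helper)."""
    # Prefer the region boto3 resolved; otherwise use an explicit fallback.
    resolved = session_region or fallback
    if not resolved:
        raise RuntimeError(
            "No AWS region configured. Set AWS_DEFAULT_REGION or run `aws configure`."
        )
    return resolved


# In the notebook this would be: region = require_region(session.region_name)
print(require_region(None, fallback="us-east-1"))
```

The fallback value here is only an example; pick the region where the lab bucket actually lives.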

## Knowledge Base Helper Import

Import the utility class for creating and managing Bedrock Knowledge Bases.

In [None]:
# Add parent directories to path for imports
sys.path.insert(0, ".")
sys.path.insert(1, "..")

from utils.knowledge_base_helper import (
    KnowledgeBasesForAmazonBedrock, upload_directory
)

# Initialize the knowledge base helper
kb = KnowledgeBasesForAmazonBedrock()

## Knowledge Base Configuration

Define the configuration parameters for the knowledge base:
- **Name**: Unique identifier with random suffix
- **Description**: Purpose and content description
- **S3 Location**: Bucket and prefix for storing documents
- **Embeddings Model**: Amazon Titan Text Embeddings v2 (`amazon.titan-embed-text-v2:0`) for semantic search

In [None]:
# Knowledge base configuration
lab5_knowledge_base_name = f'pr-agent-kb-{str(uuid.uuid4())[:8]}'
knowledge_base_description = "Knowledge Base containing examples of pristine, high-quality PR articles for media and entertainment content"
s3_bucket_name = f"labs-bucket-{region}-{account_id}"
bucket_prefix = "data/kb/reflection/"
embedding_model = "amazon.titan-embed-text-v2:0"

print(f"Knowledge Base Name: {lab5_knowledge_base_name}")
print(f"S3 Bucket: {s3_bucket_name}")
print(f"S3 Prefix: {bucket_prefix}")
print(f"Embedding Model: {embedding_model}")
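
To see how these values compose, here is a small sketch that builds the same settings from a region and account ID. The function name `build_kb_settings` and the account ID `123456789012` are placeholders for illustration, not values from the lab environment.

```python
import uuid


def build_kb_settings(region: str, account_id: str,
                      prefix: str = "data/kb/reflection/") -> dict:
    """Compose the knowledge base name, bucket, and S3 URI (illustrative only)."""
    # Same naming scheme as the configuration cell: fixed stem plus 8 random characters.
    name = f"pr-agent-kb-{str(uuid.uuid4())[:8]}"
    bucket = f"labs-bucket-{region}-{account_id}"
    # Normalize the prefix so it always ends with a single trailing slash.
    prefix = prefix.rstrip("/") + "/"
    return {
        "name": name,
        "bucket": bucket,
        "prefix": prefix,
        "s3_uri": f"s3://{bucket}/{prefix}",
    }


settings = build_kb_settings("us-east-1", "123456789012")
print(settings["s3_uri"])
```

Because the suffix is random, re-running the configuration cell produces a different knowledge base name each time.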

## Upload PR Article Examples

Upload the example PR articles from the local `good_prs` directory to S3, under the bucket and prefix configured above. The AI agents use these articles as reference material.

In [None]:
# Check if the good_prs directory exists
pr_examples_path = "good_prs"

if os.path.exists(pr_examples_path):
    print(f"Found PR examples directory: {pr_examples_path}")
    
    # List files to be uploaded
    files_to_upload = []
    for root, dirs, files in os.walk(pr_examples_path):
        files_to_upload.extend(files)
    
    print(f"Files to upload: {files_to_upload}")
    
    # Upload the files
    uploaded_files = upload_directory(pr_examples_path, s3_bucket_name, bucket_prefix)
else:
    print(f"Warning: PR examples directory not found at {pr_examples_path}")
    print("Please ensure the good_prs directory exists with example PR articles")
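
The implementation of `upload_directory` is not shown here, so the following is only a sketch of how local files could be mapped to S3 object keys while preserving subdirectories. The name `plan_uploads` is hypothetical, and the real helper may behave differently.

```python
import os
import tempfile
from typing import List, Tuple


def plan_uploads(local_dir: str, prefix: str) -> List[Tuple[str, str]]:
    """Pair each local file with the S3 key it would be uploaded to."""
    pairs = []
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            local_path = os.path.join(root, name)
            # Keep the path relative to local_dir and use "/" as the key separator.
            rel = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            pairs.append((local_path, prefix + rel))
    return sorted(pairs, key=lambda pair: pair[1])


# Demo on a throwaway directory so the sketch runs without AWS access.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("example")
    demo_keys = [key for _path, key in plan_uploads(tmp, "data/kb/reflection/")]

print(demo_keys)
# Each pair could then be sent with: s3_client.upload_file(local_path, bucket, key)
```

Unlike the flat file list printed by the cell above, this keeps the relative path of each file, which matters if `good_prs` contains subfolders with identically named files.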

## Create Knowledge Base

Create the Bedrock Knowledge Base using the uploaded documents. This process:
1. Creates the knowledge base with specified configuration
2. Sets up the data source pointing to the S3 location
3. Configures the embedding model for semantic search
4. Returns the knowledge base ID and data source ID for later use

In [None]:
# Create or retrieve the knowledge base
print("Creating Knowledge Base...")

try:
    lab5_kb_id, lab5_ds_id = kb.create_or_retrieve_knowledge_base(
        lab5_knowledge_base_name,
        knowledge_base_description,
        s3_bucket_name,
        embedding_model,
        bucket_prefix
    )
    
    print("\n✓ Knowledge Base is ready (created or retrieved).")
    print(f"Knowledge Base ID: {lab5_kb_id}")
    print(f"Data Source ID: {lab5_ds_id}")
    
except Exception as e:
    print(f"✗ Failed to create Knowledge Base: {str(e)}")
    raise
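
The helper's internals are not shown in this notebook, so the following is only a sketch of the request fragments that the `bedrock-agent` API calls `create_knowledge_base` and `create_data_source` accept for steps 2 and 3. The function names are hypothetical, the ARNs assume the standard `aws` partition, and a real call also needs an IAM role and a vector store configuration, which the helper presumably provisions.

```python
def embedding_model_arn(region: str, model_id: str) -> str:
    """ARN of a Bedrock foundation model (assumes the standard `aws` partition)."""
    return f"arn:aws:bedrock:{region}::foundation-model/{model_id}"


def knowledge_base_configuration(region: str, model_id: str) -> dict:
    """Fragment passed as `knowledgeBaseConfiguration` to create_knowledge_base."""
    return {
        "type": "VECTOR",
        "vectorKnowledgeBaseConfiguration": {
            "embeddingModelArn": embedding_model_arn(region, model_id),
        },
    }


def s3_data_source_configuration(bucket: str, prefix: str) -> dict:
    """Fragment passed as `dataSourceConfiguration` to create_data_source."""
    return {
        "type": "S3",
        "s3Configuration": {
            "bucketArn": f"arn:aws:s3:::{bucket}",
            # Only objects under this prefix are ingested.
            "inclusionPrefixes": [prefix],
        },
    }


kb_config = knowledge_base_configuration("us-east-1", "amazon.titan-embed-text-v2:0")
ds_config = s3_data_source_configuration(
    "labs-bucket-us-east-1-123456789012", "data/kb/reflection/"
)
print(kb_config)
print(ds_config)
```

The bucket name uses a placeholder account ID; in the notebook, `s3_bucket_name` and `bucket_prefix` would supply the real values.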

## Store Knowledge Base ID

Store the knowledge base ID with IPython's `%store` magic. The values are persisted in IPython's local storage, so other notebooks can restore them with `%store -r`, even in a later session.

In [None]:
# Store the knowledge base ID for use in other notebooks
%store lab5_kb_id
%store lab5_ds_id
%store lab5_knowledge_base_name

# Labels match the variable names used with %store above
print("Stored variables:")
print(f"  lab5_kb_id = {lab5_kb_id}")
print(f"  lab5_ds_id = {lab5_ds_id}")
print(f"  lab5_knowledge_base_name = {lab5_knowledge_base_name}")
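
In a consuming notebook, the values come back with `%store -r`. A small check like the one below, assuming a hypothetical helper named `missing_stored`, can report which variables were not restored before the workflow starts.

```python
from typing import Iterable, List, Mapping

EXPECTED = ("lab5_kb_id", "lab5_ds_id", "lab5_knowledge_base_name")


def missing_stored(namespace: Mapping[str, object],
                   names: Iterable[str] = EXPECTED) -> List[str]:
    """Return the expected variable names that are absent from the namespace."""
    return [name for name in names if name not in namespace]


# In the consuming notebook, first run the IPython magic:
#   %store -r lab5_kb_id lab5_ds_id lab5_knowledge_base_name
# and then call: missing_stored(globals())
demo_namespace = {"lab5_kb_id": "EXAMPLEID"}  # placeholder value for the demo
print(missing_stored(demo_namespace))
```

An empty list means every expected variable is available.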

## Verification

Verify that the knowledge base was created successfully and is ready for use.

In [None]:
# Verify the knowledge base exists and get its status
try:
    bedrock_agent_client = boto3.client('bedrock-agent', region_name=region)
    
    response = bedrock_agent_client.get_knowledge_base(knowledgeBaseId=lab5_kb_id)
    
    kb_status = response['knowledgeBase']['status']
    kb_name = response['knowledgeBase']['name']
    
    print(f"Knowledge Base Status: {kb_status}")
    print(f"Knowledge Base Name: {kb_name}")
    
    if kb_status == 'ACTIVE':
        print("✓ Knowledge Base is ready for use!")
    else:
        print(f"⚠ Knowledge Base is in {kb_status} state. It may need time to become active.")
        
except Exception as e:
    print(f"Could not verify knowledge base status: {str(e)}")
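
If the status is not yet `ACTIVE`, a short polling loop can wait for it instead of re-running the cell by hand. This is a sketch under stated assumptions: `wait_for_status` is a hypothetical helper, the timeout and interval are arbitrary, and `FAILED` is treated as the only terminal error state.

```python
import time
from typing import Callable, Iterable


def wait_for_status(get_status: Callable[[], str],
                    target: str = "ACTIVE",
                    failed: Iterable[str] = ("FAILED",),
                    timeout_s: float = 300.0,
                    interval_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until it returns the target status."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"Knowledge base entered {status} state")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status} after {timeout_s} seconds")
        sleep(interval_s)


# Demo with a fake status sequence so the sketch runs without AWS access.
fake_statuses = iter(["CREATING", "CREATING", "ACTIVE"])
demo_status = wait_for_status(lambda: next(fake_statuses), sleep=lambda _s: None)
print(demo_status)

# In the notebook, get_status could be:
#   lambda: bedrock_agent_client.get_knowledge_base(
#       knowledgeBaseId=lab5_kb_id)["knowledgeBase"]["status"]
```

Passing the status lookup as a callable keeps the loop independent of boto3, which makes it easy to test.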

## Summary

This notebook has completed the following setup steps.

### ✓ Completed Tasks
- **Environment Setup**: Initialized AWS clients using the configured credentials
- **Data Upload**: Uploaded PR article examples to S3
- **Knowledge Base Creation**: Created Bedrock Knowledge Base with semantic search
- **Variable Storage**: Stored IDs for use in other notebooks
- **Verification**: Confirmed knowledge base status

### 📋 Next Steps
1. **Run the main workflow**: Use `reflection_agents.ipynb` to execute the multi-agent PR generation
2. **Load stored variables**: Restore `lab5_kb_id` and the other stored variables in the main notebook (for example, with `%store -r`)
3. **Monitor performance**: Check knowledge base retrieval quality and adjust if needed
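
One way to spot-check retrieval quality is to run a test query through the `retrieve` call of the `bedrock-agent-runtime` client and look at the scores and source files. The sketch below only summarizes a response; the query text, the sample response, and the helper name `summarize_results` are illustrative assumptions.

```python
from typing import Dict, List


def summarize_results(response: Dict, max_chars: int = 80) -> List[Dict]:
    """Reduce a retrieve() response to score, source URI, and a text preview."""
    rows = []
    for item in response.get("retrievalResults", []):
        text = item.get("content", {}).get("text", "")
        uri = item.get("location", {}).get("s3Location", {}).get("uri", "")
        rows.append({"score": item.get("score"), "uri": uri, "preview": text[:max_chars]})
    return rows


# In the notebook, the response would come from:
#   response = bedrock_agent_runtime_client.retrieve(
#       knowledgeBaseId=lab5_kb_id,
#       retrievalQuery={"text": "product launch announcement"},
#       retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 3}},
#   )
# A hand-written sample response keeps this sketch runnable offline.
sample_response = {
    "retrievalResults": [
        {
            "content": {"text": "Example PR article text used only for this demo."},
            "location": {"s3Location": {"uri": "s3://example-bucket/data/kb/reflection/a.txt"}},
            "score": 0.72,
        }
    ]
}
for row in summarize_results(sample_response):
    print(row)
```

Consistently low scores or irrelevant previews would suggest revisiting the example articles or the ingestion settings.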

### 🔧 Configuration Details
- **Knowledge Base ID**: Stored as `lab5_kb_id` (printed by the Create Knowledge Base cell)
- **Embedding Model**: Amazon Titan Text Embeddings v2
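- **Storage Location**: S3 bucket `labs-bucket-{region}-{account_id}` under the prefix `data/kb/reflection/`, indexed through the knowledge base data source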
- **Search Capability**: Semantic similarity search for relevant PR examples

The knowledge base is now ready to provide contextual examples to the CrewAI agents for high-quality PR article generation.