## Setup DynamoDB Table for RAG Knowledge Base

This notebook demonstrates how to:
1. Create a DynamoDB table to store glossary terms and definitions
2. Insert terms and definitions into the table
3. Query the table for glossary lookups

The glossary will be used as a knowledge base for query expansion in an agentic RAG (Retrieval-Augmented Generation) system.
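
As a rough illustration of what glossary-based query expansion can look like, the sketch below appends the definition of any single-word glossary term found in a question. The `expand_query` function and the in-memory `glossary` dict are hypothetical stand-ins; the agent in the second notebook may work differently, and multi-word terms would need phrase matching that is not shown here.

```python
import re

def expand_query(query: str, glossary: dict[str, str]) -> str:
    """Append the definition of every glossary term that appears in the query."""
    # Tokenize in lowercase so punctuation and casing do not block a match
    words = set(re.findall(r"[a-z0-9\-]+", query.lower()))
    notes = [f"{term} = {definition}"
             for term, definition in glossary.items() if term in words]
    if not notes:
        return query
    return f"{query} ({'; '.join(notes)})"

# Hypothetical in-memory glossary; the notebook stores these in DynamoDB
glossary = {"htm": "held-to-maturity", "ceo": "Chief Executive Officer"}
print(expand_query("What is the HTM portfolio?", glossary))
```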

## Install Dependencies

Install required packages for working with AWS services and data processing.

In [None]:
%pip install --force-reinstall -q -r ../../features-examples/requirements.txt

In [None]:
# Restart kernel to ensure all installed dependencies are properly loaded
from IPython.core.display import HTML
HTML("<script>Jupyter.notebook.kernel.restart()</script>")

## Set Up AWS Session

Import boto3 and set up an AWS session for interacting with DynamoDB. Also check the boto3 version.

In [None]:
import json
import boto3
from typing import List, Optional
from pydantic import BaseModel, validator

session = boto3.session.Session()
region = session.region_name


In [None]:
import boto3
import json, sys
from datetime import datetime

print('Running boto3 version:', boto3.__version__)

## Parse Glossary Text

The `convert_glossary_to_dict` function parses a multi-line glossary string into a list of dictionaries.
Each entry holds a lowercased term and its definition, ready to be stored in the DynamoDB table.
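
The input format is one `term: definition` pair per line. A minimal sketch of splitting a single line, assuming that only the first `: ` separates the term from its definition and that malformed lines should be skipped (the helper name `split_entry` is hypothetical):

```python
from typing import Optional, Tuple

def split_entry(line: str) -> Optional[Tuple[str, str]]:
    """Split 'term: definition' on the first ': ' only; return None if malformed."""
    term, separator, definition = line.strip().partition(": ")
    if not separator or not term or not definition:
        return None
    return term.lower(), definition.strip()

print(split_entry("HTM: held-to-maturity"))
print(split_entry("ratio: computed as A: B"))
print(split_entry("no separator here"))
```

Using `str.partition` keeps any later colons inside the definition, and returning `None` lets the caller decide how to handle lines that do not match the format.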

In [None]:
def convert_glossary_to_dict(glossary):
    """Parse 'term: definition' lines into a list of DynamoDB-ready dicts."""
    terms = []

    for line in glossary.split('\n'):
        line = line.strip()
        if not line:
            continue
        # Split on the first ': ' only, so definitions may contain colons
        term, separator, definition = line.partition(': ')
        if not separator or not term:
            continue  # Skip lines that are not in 'term: definition' form
        terms.append({'term': term.lower(), 'term_definition': definition.strip()})

    return terms

## Test the Glossary Parser

Create a sample glossary with financial terminology and test the parsing function. 
The output shows a list of dictionaries with term-definition pairs converted from the text format.

In [None]:
sample_glossary = """
net book value: The net book value of each asset class is calculated as the cost less the accumulated depreciation.
dcm: Disclosure Committee Members 
ceo: Chief Executive Officer
htm: held-to-maturity
DCMs: Disclosure Committee Members
"""
elements = convert_glossary_to_dict(sample_glossary)
print(elements)

## Create DynamoDB Table Helper Function

This function creates a DynamoDB table with a defined schema:
- Uses 'term' as the partition key (HASH)
- Sets up the attribute definitions and provisioned throughput
- Configures read and write capacity units

In [None]:
import boto3, botocore

def create_dynamo_table(table_name):
    """
    Creates a DynamoDB table keyed on the 'term' attribute.
    
    Args:
        table_name (str): The name of the DynamoDB table to create.
    
    Returns:
        dict: The response from the DynamoDB service when creating the table.
    """
    # Create a DynamoDB client
    dynamodb = boto3.client('dynamodb')
    
    # Define the key schema and attribute definitions
    key_schema = [
        {'AttributeName': 'term', 'KeyType': 'HASH'},  # 'term' is the partition key
    ]
    attribute_definitions = [
        {'AttributeName': 'term', 'AttributeType': 'S'},
    ]
    
    # Create the table
    table = dynamodb.create_table(
        TableName=table_name,
        KeySchema=key_schema,
        AttributeDefinitions=attribute_definitions,
        ProvisionedThroughput={
            'ReadCapacityUnits': 5,
            'WriteCapacityUnits': 5
        }
    )
    
    return table

## Create the DynamoDB Table

Create a new DynamoDB table named 'glossary-2'. The output shows the table description including:
- Table attributes and schema
- Provisioned throughput values
- Creation time and status
- ARN and other table properties
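
Table creation is asynchronous, so the status in that output is typically `CREATING` rather than `ACTIVE`. A sketch of blocking until the table is ready, using the DynamoDB client's `table_exists` waiter; the helper name `wait_for_table` and the choice to pass the client in as an argument are illustrative:

```python
def wait_for_table(client, table_name: str) -> str:
    """Block until the table exists, then return its current status."""
    # The 'table_exists' waiter polls DescribeTable until the table is ready
    client.get_waiter("table_exists").wait(TableName=table_name)
    return client.describe_table(TableName=table_name)["Table"]["TableStatus"]

# Usage (requires AWS credentials and an existing or in-progress table):
# import boto3
# print(wait_for_table(boto3.client("dynamodb"), "glossary-2"))
```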

In [None]:
# Define the table name
table_name = 'glossary-2'

try:
    # Attempt to create the table
    table_response = create_dynamo_table(table_name)
    print(table_response)
except botocore.exceptions.ClientError as e:
    # Check if the error is because the table already exists
    if e.response['Error']['Code'] == 'ResourceInUseException':
        print(f"Table {table_name} already exists. Skipping creation.")
    else:
        # Re-raise the exception if it's not the "table exists" error
        raise

## Insert Items into DynamoDB

This cell:
1. Creates a DynamoDB resource
2. References the table we just created
3. Waits for the table to finish creating before writing to it
4. Defines a function to batch write items to the table
5. Inserts the parsed glossary elements into the DynamoDB table
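
One caveat: a single batch write request cannot contain two items with the same key, so duplicate terms in the parsed glossary can cause the write to fail. A sketch of de-duplicating on `term` before writing, keeping the last definition seen (the helper name `dedupe_by_term` is hypothetical):

```python
def dedupe_by_term(elements):
    """Return one item per term; later entries replace earlier ones."""
    unique = {}
    for element in elements:
        unique[element["term"]] = element
    return list(unique.values())

items = [
    {"term": "dcm", "term_definition": "Disclosure Committee Member"},
    {"term": "dcm", "term_definition": "Disclosure Committee Members"},
    {"term": "htm", "term_definition": "held-to-maturity"},
]
print(dedupe_by_term(items))
```

boto3 can also handle this on the client side: `table.batch_writer(overwrite_by_pkeys=['term'])` drops earlier buffered items that share a key with a later one.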

In [None]:
import boto3

# Create a DynamoDB resource
dynamodb = boto3.resource('dynamodb')

table = dynamodb.Table(table_name)

# Block until the table exists instead of sleeping for a fixed time
table.wait_until_exists()

def write_items_ddb(elements):
    # Batch write items to the table
    with table.batch_writer() as batch:
        for element in elements:
            batch.put_item(Item=element)

    print("Elements inserted successfully.")
    return 0

write_items_ddb(elements)

## Helper Function: Get All Terms from DynamoDB

This function scans the DynamoDB table to retrieve all terms:
1. Creates a DynamoDB resource and references the specified table
2. Uses the scan operation with a projection expression to get only the term field
3. Handles pagination with LastEvaluatedKey, since a single scan call returns at most 1 MB of data
4. Returns a list of all terms in the table
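
The pagination pattern in step 3 can also be written as a single loop. This sketch accepts any object with a boto3-style `scan` method, so it is demonstrated with a hypothetical in-memory `FakeTable` rather than a real table. The fake uses an index as its `LastEvaluatedKey`, whereas DynamoDB returns the key attributes of the last item read.

```python
def scan_all_terms(table):
    """Collect the 'term' attribute from every page of a scan."""
    terms = []
    kwargs = {"ProjectionExpression": "term"}
    while True:
        response = table.scan(**kwargs)
        terms.extend(item["term"] for item in response["Items"])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            return terms
        # Resume the scan where the previous page stopped
        kwargs["ExclusiveStartKey"] = last_key

class FakeTable:
    """Hypothetical stand-in that returns two items per page."""
    def __init__(self, terms):
        self.terms = terms

    def scan(self, ProjectionExpression, ExclusiveStartKey=None):
        start = ExclusiveStartKey["index"] if ExclusiveStartKey else 0
        page = self.terms[start:start + 2]
        response = {"Items": [{"term": t} for t in page]}
        if start + 2 < len(self.terms):
            response["LastEvaluatedKey"] = {"index": start + 2}
        return response

print(scan_all_terms(FakeTable(["ceo", "dcm", "htm"])))
```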

In [None]:
def get_all_terms(table_name):
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table(table_name)

    response = table.scan(
        ProjectionExpression='term'
    )

    terms = [item['term'] for item in response['Items']]

    while 'LastEvaluatedKey' in response:
        response = table.scan(
            ProjectionExpression='term',
            ExclusiveStartKey=response['LastEvaluatedKey']
        )
        terms.extend([item['term'] for item in response['Items']])

    return terms


## Verify Term Storage

Test the `get_all_terms` function to retrieve and display all stored terms from the DynamoDB table.
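
Because `term` is the partition key, a single glossary entry can also be fetched directly with `get_item` rather than a scan. A sketch, assuming the lowercase-key convention used by the parser; `lookup_term` is a hypothetical helper that takes a boto3 `Table` resource:

```python
def lookup_term(table, term: str):
    """Return the definition for a term, or None if it is not in the table."""
    # Keys were stored in lowercase, so normalize before the lookup
    response = table.get_item(Key={"term": term.strip().lower()})
    item = response.get("Item")  # 'Item' is absent when no match exists
    return item["term_definition"] if item else None

# Usage (requires AWS credentials and the table created above):
# import boto3
# table = boto3.resource("dynamodb").Table("glossary-2")
# print(lookup_term(table, "CEO"))
```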

In [None]:
get_all_terms(table_name)

## Store the Table Name

Store the table name as a variable that can be accessed in other notebooks using Jupyter's `%store` magic command.
This allows the second notebook to use the same table name without hardcoding it.

In [None]:
%store table_name

## Optional Clean Up

If you don't plan to run the second notebook, `02-langgraph_agentic_rag.ipynb`, uncomment and run the last cell below once you no longer need the table, to avoid unnecessary AWS costs.

In [None]:
# Cleanup function to delete the DynamoDB table
def delete_dynamodb_table(table_name):
    """
    Deletes a DynamoDB table with the given name.
    
    Args:
        table_name (str): The name of the DynamoDB table to delete.
    
    Returns:
        dict: The response from the DynamoDB service when deleting the table.
    """
    try:
        # Create a DynamoDB client
        dynamodb = boto3.client('dynamodb')
        
        # Delete the table
        response = dynamodb.delete_table(
            TableName=table_name
        )
        
        print(f"Table {table_name} deletion initiated. Waiting for completion...")
        
        # Create a DynamoDB resource
        dynamodb_resource = boto3.resource('dynamodb')
        
        # Wait for the table to be deleted
        table = dynamodb_resource.Table(table_name)
        table.meta.client.get_waiter('table_not_exists').wait(TableName=table_name)
        
        print(f"Table {table_name} has been successfully deleted.")
        return response
        
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == 'ResourceNotFoundException':
            print(f"Table {table_name} does not exist.")
        else:
            print(f"Error deleting table: {e}")
        return None

In [None]:
# To delete the table, uncomment the line below and run this cell
# delete_dynamodb_table(table_name)