# HCL IDP Solution - DynamoDB Table Setup

This notebook sets up DynamoDB tables for the HCL IDP solution:
- `hcltech-doc-extraction` - Stores extracted document data
- `hcltech-dashboard` - Stores dashboard and summary data

## Prerequisites
- AWS credentials configured
- IAM permissions for DynamoDB operations
- boto3 library installed

## Step 1: Install Required Libraries

In [None]:
!pip install boto3 -q

## Step 2: Import Libraries and Setup

In [2]:
import boto3
import json
import os
import re
from botocore.exceptions import ClientError
from IPython.display import display, Markdown

# Configuration
REGION = 'us-east-1'  # Change this if needed
print(f"üåç Using AWS Region: {REGION}")

üåç Using AWS Region: us-east-1


## Step 3: Verify AWS Credentials

In [None]:
# Check AWS credentials
try:
    sts = boto3.client('sts')
    identity = sts.get_caller_identity()
    print("‚úÖ AWS Credentials verified")
    print(f"   Account: {identity['Account']}")
    print(f"   User/Role: {identity['Arn'].split('/')[-1]}")
except Exception as e:
    print(f"‚ùå AWS Credentials error: {e}")
    print("Please configure AWS credentials before proceeding")

## Step 4: Define Table Creation Functions

In [3]:
def create_dynamodb_table(table_name, key_schema, attribute_definitions, region=REGION):
    """
    Create a DynamoDB table with specified configuration
    """
    dynamodb = boto3.client('dynamodb', region_name=region)
    
    try:
        response = dynamodb.create_table(
            TableName=table_name,
            KeySchema=key_schema,
            AttributeDefinitions=attribute_definitions,
            BillingMode='PAY_PER_REQUEST',
            TableClass='STANDARD',
            DeletionProtectionEnabled=True,
            WarmThroughput={
                'ReadUnitsPerSecond': 12000,
                'WriteUnitsPerSecond': 4000
            }
        )
        
        print(f"‚úÖ Creating table: {table_name}")
        print(f"   Table ARN: {response['TableDescription']['TableArn']}")
        
        # Wait for table to become active
        waiter = dynamodb.get_waiter('table_exists')
        print(f"‚è≥ Waiting for table {table_name} to become active...")
        waiter.wait(TableName=table_name)
        print(f"‚úÖ Table {table_name} is now active")
        
        return True
        
    except ClientError as e:
        if e.response['Error']['Code'] == 'ResourceInUseException':
            print(f"‚ö†Ô∏è  Table {table_name} already exists")
            return True
        else:
            print(f"‚ùå Error creating table {table_name}: {e}")
            return False

## Step 5: Create HCL Tables

In [4]:
print("üöÄ Setting up HCL DynamoDB tables...")

# Common table configuration (same as NMM tables)
key_schema = [
    {
        'AttributeName': 'docid',
        'KeyType': 'HASH'
    }
]

attribute_definitions = [
    {
        'AttributeName': 'docid',
        'AttributeType': 'S'
    }
]

tables_to_create = [
    'hcltech-doc-extraction',
    'hcltech-dashboard'
]

success_count = 0

for table_name in tables_to_create:
    if create_dynamodb_table(table_name, key_schema, attribute_definitions, REGION):
        success_count += 1
    print()  # Add spacing

print(f"üìä Summary: Tables created/verified: {success_count}/{len(tables_to_create)}")

if success_count == len(tables_to_create):
    print("‚úÖ All tables are ready!")
else:
    print("‚ùå Some tables failed to create")

üöÄ Setting up HCL DynamoDB tables...
‚úÖ Creating table: hcltech-doc-extraction
   Table ARN: arn:aws:dynamodb:us-east-1:040504913362:table/hcltech-doc-extraction
‚è≥ Waiting for table hcltech-doc-extraction to become active...
‚úÖ Table hcltech-doc-extraction is now active

‚úÖ Creating table: hcltech-doc-dashboard
   Table ARN: arn:aws:dynamodb:us-east-1:040504913362:table/hcltech-doc-dashboard
‚è≥ Waiting for table hcltech-doc-dashboard to become active...
‚úÖ Table hcltech-doc-dashboard is now active

üìä Summary: Tables created/verified: 2/2
‚úÖ All tables are ready!


## Step 6: Verify Tables

In [5]:
# Verify that tables exist and are active
dynamodb = boto3.client('dynamodb', region_name=REGION)

tables_to_check = [
    'hcltech-doc-extraction',
    'hcltech-dashboard'
]

print("üîç Verifying tables...")

for table_name in tables_to_check:
    try:
        response = dynamodb.describe_table(TableName=table_name)
        status = response['Table']['TableStatus']
        item_count = response['Table']['ItemCount']
        table_size = response['Table']['TableSizeBytes']
        
        print(f"‚úÖ {table_name}:")
        print(f"   Status: {status}")
        print(f"   Items: {item_count}")
        print(f"   Size: {table_size} bytes")
        print()
        
    except ClientError as e:
        if e.response['Error']['Code'] == 'ResourceNotFoundException':
            print(f"‚ùå {table_name}: NOT FOUND")
        else:
            print(f"‚ùå {table_name}: ERROR - {e}")

üîç Verifying tables...
‚úÖ hcltech-doc-extraction:
   Status: ACTIVE
   Items: 0
   Size: 0 bytes

‚úÖ hcltech-doc-dashboard:
   Status: ACTIVE
   Items: 0
   Size: 0 bytes



## Step 7: Update Python Code to Use New Tables

In [6]:
# def update_file_table_names(file_path):
#     """
#     Update table names in a Python file
#     """
#     replacements = {
#         'nmm-doc-extraction': 'hcltech-doc-extraction',
#         'nmm-dashboard': 'hcltech-dashboard',
#         'nmm-dashboard': 'hcltech-dashboard'
#     }
    
#     try:
#         with open(file_path, 'r') as file:
#             content = file.read()
        
#         original_content = content
        
#         # Replace table names
#         for old_name, new_name in replacements.items():
#             content = content.replace(f'"{old_name}"', f'"{new_name}"')
#             content = content.replace(f"'{old_name}'", f"'{new_name}'")
        
#         # Write back if changes were made
#         if content != original_content:
#             with open(file_path, 'w') as file:
#                 file.write(content)
#             print(f"‚úÖ Updated: {file_path}")
#             return True
#         else:
#             print(f"‚ÑπÔ∏è  No changes needed: {file_path}")
#             return False
            
#     except Exception as e:
#         print(f"‚ùå Error updating {file_path}: {e}")
#         return False

# # Update all Python files
# print("üîÑ Updating table names in Python files...")

# python_files = [
#     'agent1_docextraction_agent.py',
#     'agent2_docclassification_agent.py', 
#     'agent3_doc_entity_extraction.py',
#     'orchestrator.py',
#     'orchestrator-agent.py'
# ]

# updated_count = 0

# for file_name in python_files:
#     if os.path.exists(file_name):
#         if update_file_table_names(file_name):
#             updated_count += 1
#     else:
#         print(f"‚ö†Ô∏è  File not found: {file_name}")

# print(f"\nüìä Summary: {updated_count} files updated")
# print("‚úÖ Table name update completed!")

üîÑ Updating table names in Python files...
‚úÖ Updated: agent1_docextraction_agent.py
‚úÖ Updated: agent2_docclassification_agent.py
‚úÖ Updated: agent3_doc_entity_extraction.py
‚ö†Ô∏è  File not found: orchestrator.py
‚ÑπÔ∏è  No changes needed: orchestrator-agent.py

üìä Summary: 3 files updated
‚úÖ Table name update completed!


## Step 8: Final Verification

In [7]:
# List all tables to confirm HCL tables exist
try:
    response = dynamodb.list_tables()
    all_tables = response['TableNames']
    hcl_tables = [table for table in all_tables if 'hcltech' in table]
    
    print("üéØ HCL Tables found:")
    for table in hcl_tables:
        print(f"   ‚úÖ {table}")
    
    if len(hcl_tables) >= 2:
        print("\nüéâ Setup completed successfully!")
        print("\nüìù Next steps:")
        print("   1. ‚úÖ Tables created")
        print("   2. ‚úÖ Code updated")
        print("   3. üîÑ Test the application")
        print("   4. üîß Configure IAM permissions if needed")
    else:
        print("\n‚ö†Ô∏è  Not all tables were created. Please check the errors above.")
        
except Exception as e:
    print(f"‚ùå Error listing tables: {e}")

üéØ HCL Tables found:

‚ö†Ô∏è  Not all tables were created. Please check the errors above.


## Troubleshooting

If you encounter issues:

### Permission Errors
```bash
# Check AWS credentials
aws sts get-caller-identity

# Verify DynamoDB permissions
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::ACCOUNT:user/USERNAME \
  --action-names dynamodb:CreateTable \
  --resource-arns "*"
```

### Required IAM Permissions
- `dynamodb:CreateTable`
- `dynamodb:DescribeTable`
- `dynamodb:ListTables`

### Table Configuration
- **Billing Mode**: Pay-per-request (no upfront costs)
- **Deletion Protection**: Enabled (prevents accidental deletion)
- **Warm Throughput**: 12,000 RCU, 4,000 WCU (for consistent performance)

## Cost Considerations
- Pay-per-request billing charges only for actual usage
- Warm throughput incurs additional costs for guaranteed performance
- Storage charged per GB of data stored

## Security Best Practices
1. Use IAM roles instead of access keys when possible
2. Grant minimum required permissions (least privilege)
3. Enable CloudTrail for audit logging
4. Consider VPC endpoints for private access