# Test Few-Shot Classification Implementation

This notebook tests the new `{FEW_SHOT_EXAMPLES}` placeholder functionality in the Classification service.

In [None]:
import sys
import os
import yaml
import json
from pathlib import Path

# Set ROOT_DIR to locate example images in the local directory,
# OR set CONFIGURATION_BUCKET to the S3 configuration bucket name (the bucket that contains config_library)
ROOTDIR = '../..'
os.environ['ROOT_DIR'] = f"{ROOTDIR}/"

# Add the idp_common package to the path
sys.path.insert(0, f'{ROOTDIR}/lib/idp_common_pkg')

from idp_common.classification.service import ClassificationService


## Load the Few-Shot Configuration

In [None]:
# Load the few-shot configuration
config_path = f'{ROOTDIR}/config_library/pattern-2/few_shot_example_with_multimodal_page_classification/config.yaml'
with open(config_path, 'r') as f:
    config = yaml.safe_load(f)

print("Configuration loaded successfully!")
print(f"Number of classes: {len(config.get('classes', []))}")
print(f"Classification method: {config.get('classification', {}).get('classificationMethod')}")

Configuration loaded successfully!
Number of classes: 12
Classification method: multimodalPageLevelClassification


## Examine the Task Prompt Template

In [24]:
# Look at the task prompt to see the FEW_SHOT_EXAMPLES placeholder
task_prompt = config['classification']['task_prompt']
print("Task prompt template:")
print("=" * 50)
print(task_prompt)
print("=" * 50)

# Check if it contains the placeholder
has_placeholder = "{FEW_SHOT_EXAMPLES}" in task_prompt
print(f"\nContains {{FEW_SHOT_EXAMPLES}} placeholder: {has_placeholder}")

Task prompt template:
Classify this document into exactly one of these categories:

{CLASS_NAMES_AND_DESCRIPTIONS}

Respond only with a JSON object containing the class label. For example: {{"class": "letter"}}
<few_shot_examples>
{FEW_SHOT_EXAMPLES}
</few_shot_examples>
<<CACHEPOINT>>
<document_ocr_data>
{DOCUMENT_TEXT}
</document_ocr_data>

Contains {FEW_SHOT_EXAMPLES} placeholder: True
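
The template above places `{FEW_SHOT_EXAMPLES}` between two stretches of text, and the content array printed later in this notebook starts with a text item followed by example items. The sketch below shows one way such a split could work. It is not the service's actual implementation: `split_prompt_around_examples` and the sample prompt are hypothetical names introduced only for illustration.

```python
from typing import Any, Dict, List

PLACEHOLDER = "{FEW_SHOT_EXAMPLES}"


def split_prompt_around_examples(
    prompt: str, example_items: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Return content items: text before the placeholder, the examples, text after.

    Hypothetical helper; the real service may build its content differently.
    """
    if PLACEHOLDER not in prompt:
        # No placeholder: the whole prompt becomes a single text item.
        return [{"text": prompt}]

    # Split only on the first occurrence of the placeholder.
    before, after = prompt.split(PLACEHOLDER, 1)

    content: List[Dict[str, Any]] = []
    if before.strip():
        content.append({"text": before})
    content.extend(example_items)
    if after.strip():
        content.append({"text": after})
    return content


# Illustrative prompt and example items (not taken from the real configuration).
prompt = (
    "Classify the page.\n<few_shot_examples>\n{FEW_SHOT_EXAMPLES}\n"
    "</few_shot_examples>\nText: ..."
)
examples = [
    {"text": "This is an example of the class 'letter'"},
    {"image": {"format": "jpeg", "source": {"bytes": b"\xff\xd8"}}},
]
content = split_prompt_around_examples(prompt, examples)
print([next(iter(item)) for item in content])
```

Keeping the example items as separate entries, rather than interpolating them into one string, is what allows images to sit between the text parts of the prompt.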


## Initialize Classification Service

In [25]:
# Initialize the classification service with the few-shot config
try:
    service = ClassificationService(
        config=config,
        backend="bedrock",
        region="us-east-1"  # You may need to adjust this
    )
    print("Classification service initialized successfully!")
except Exception as e:
    print(f"Error initializing service: {e}")
    print("Note: This is expected if AWS credentials are not configured for Bedrock")

Classification service initialized successfully!


## Examine Class Examples Structure

In [26]:
# Let's examine the examples in the configuration
print("Examples found in configuration:")
print("=" * 50)

classes = config.get('classes', [])
total_examples = 0

for class_obj in classes:
    class_name = class_obj.get('name', 'Unknown')
    examples = class_obj.get('examples', [])
    
    print(f"\nClass: {class_name}")
    print(f"Number of examples: {len(examples)}")
    
    for i, example in enumerate(examples):
        print(f"  Example {i+1}:")
        print(f"    Name: {example.get('name', 'N/A')}")
        print(f"    Class Prompt: {example.get('classPrompt', 'N/A')}")
        print(f"    Image Path: {example.get('imagePath', 'N/A')}")
        
        # Print the configured path; it is only resolved (locally or via S3) when the content is built
        image_path = example.get('imagePath')
        if image_path:
            print(f"    S3 URI: {image_path}")
        total_examples += 1

print(f"\nTotal examples across all classes: {total_examples}")
print("\nEnvironment variables:")
print(f"  CONFIGURATION_BUCKET: {os.environ.get('CONFIGURATION_BUCKET', 'Not set - using ROOT_DIR to resolve paths locally')}")
print(f"  ROOT_DIR: {os.environ.get('ROOT_DIR', 'Not set')}")


Examples found in configuration:

Class: letter
Number of examples: 2
  Example 1:
    Name: Letter1
    Class Prompt: This is an example of the class 'letter'
    Image Path: config_library/pattern-2/few_shot_example/example-images/letter1.jpg
    S3 URI: config_library/pattern-2/few_shot_example/example-images/letter1.jpg
  Example 2:
    Name: Letter2
    Class Prompt: This is an example of the class 'letter'
    Image Path: config_library/pattern-2/few_shot_example/example-images/letter2.png
    S3 URI: config_library/pattern-2/few_shot_example/example-images/letter2.png

Class: form
Number of examples: 0

Class: invoice
Number of examples: 0

Class: resume
Number of examples: 0

Class: scientific_publication
Number of examples: 0

Class: memo
Number of examples: 0

Class: advertisement
Number of examples: 0

Class: email
Number of examples: 1
  Example 1:
    Name: Email1
    Class Prompt: This is an example of the class 'email'
    Image Path: config_library/pattern-2/few_shot_ex
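
The loop above walks `classes` and reads `name`, `classPrompt`, and `imagePath` from each entry under `examples`. As a compact illustration of that structure, the sketch below counts examples per class on a small in-memory config. `count_examples` and `sample_config` are hypothetical, and the image path in the sample is a placeholder rather than a real file.

```python
from typing import Any, Dict


def count_examples(config: Dict[str, Any]) -> Dict[str, int]:
    """Map each class name to the number of few-shot examples it defines."""
    return {
        cls.get("name", "Unknown"): len(cls.get("examples") or [])
        for cls in config.get("classes", [])
    }


# Minimal stand-in for the loaded YAML; only the keys used in this notebook appear.
sample_config = {
    "classes": [
        {
            "name": "letter",
            "examples": [
                {
                    "name": "Letter1",
                    "classPrompt": "This is an example of the class 'letter'",
                    "imagePath": "example-images/letter1.jpg",  # placeholder path
                }
            ],
        },
        {"name": "form"},  # a class without an 'examples' key counts as zero
    ]
}

counts = count_examples(sample_config)
without_examples = [name for name, n in counts.items() if n == 0]
print(counts)
print(without_examples)
```

A summary like this makes it easy to spot classes that have no examples before building a prompt.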

## Test Complete Content Building with Examples

In [27]:
# Test the complete content building with few-shot examples
print("Testing _build_content_with_few_shot_examples method...")

# Sample document text for testing
sample_document_text = "This is a sample document for testing classification."

try:
    # Get classification config
    classification_config = service._get_classification_config()
    task_prompt_template = classification_config['task_prompt']
    
    # Build content with few-shot examples
    content = service._build_content_with_few_shot_examples(
        task_prompt_template=task_prompt_template,
        document_text=sample_document_text,
        class_names_and_descriptions=service._format_classes_list()
    )
    
    print(f"Generated content array with {len(content)} items")
    print("\nContent structure:")
    
    for i, item in enumerate(content):
        print(f"\nItem {i+1}:")
        if 'text' in item:
            print("  Type: text")
            text_preview = item['text'][:200].replace('\n', '\\n')
            print(f"  Preview: {text_preview}{'...' if len(item['text']) > 200 else ''}")
        elif 'image' in item:
            print("  Type: image")
            print(f"  Format: {item['image'].get('format', 'unknown')}")
            if 'source' in item['image'] and 'bytes' in item['image']['source']:
                print(f"  Size: {len(item['image']['source']['bytes'])} bytes")
        else:
            print("  Type: unknown")
            print(f"  Keys: {list(item.keys())}")
            
except Exception as e:
    print(f"Error building content with few-shot examples: {e}")
    import traceback
    traceback.print_exc()

Testing _build_content_with_few_shot_examples method...
Generated content array with 12 items

Content structure:

Item 1:
  Type: text
  Preview: Classify this document into exactly one of these categories:\n\nletter  	[ A formal written correspondence with sender/recipient addresses, date, salutation, body, and closing signature ]\nform  	[ A str...

Item 2:
  Type: text
  Preview: This is an example of the class 'letter'

Item 3:
  Type: image
  Format: jpeg
  Size: 106629 bytes

Item 4:
  Type: text
  Preview: This is an example of the class 'letter'

Item 5:
  Type: image
  Format: png
  Size: 83993 bytes

Item 6:
  Type: text
  Preview: This is an example of the class 'email'

Item 7:
  Type: image
  Format: jpeg
  Size: 49648 bytes

Item 8:
  Type: text
  Preview: Here are example images for each page of a 5 page 'bank-statement '

Item 9:
  Type: image
  Format: jpeg
  Size: 101188 bytes

Item 10:
  Type: image
  Format: jpeg
  Size: 153717 bytes

Item 11:
  Type: image
  Forma
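
The image items above report `jpeg` for `letter1.jpg` and `png` for `letter2.png`. How the service derives the format is not shown in this notebook. One plausible approach, sketched here as an assumption, is to map the file extension to a format name. `image_format_from_path` and the extension table are hypothetical, and the set of four formats is assumed rather than taken from the service.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to image format name.
_EXTENSION_TO_FORMAT = {
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".png": "png",
    ".gif": "gif",
    ".webp": "webp",
}


def image_format_from_path(path: str) -> str:
    """Infer an image format name from a local path or S3 key (hypothetical helper)."""
    suffix = PurePosixPath(path).suffix.lower()
    try:
        return _EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"Unsupported image extension: {suffix!r}") from None


for p in (
    "config_library/pattern-2/few_shot_example/example-images/letter1.jpg",
    "config_library/pattern-2/few_shot_example/example-images/letter2.png",
):
    print(p.rsplit("/", 1)[-1], image_format_from_path(p))
```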

## Test Path Resolution Logic

In [None]:
# Test path resolution with different environment variables
print("Testing image path resolution logic:")
print("=" * 50)

# Test 1: Without ROOT_DIR or CONFIGURATION_BUCKET
print("\n1. WITHOUT ROOT_DIR or CONFIGURATION_BUCKET:")
print("-" * 50)

if 'ROOT_DIR' in os.environ:
    del os.environ['ROOT_DIR']
if 'CONFIGURATION_BUCKET' in os.environ:
    del os.environ['CONFIGURATION_BUCKET']

try:
    # Create a new service instance without ROOT_DIR
    test_service = ClassificationService(
        config=config,
        backend="bedrock",
        region="us-east-1"
    )
    
    examples_content = test_service._build_few_shot_examples_content()
    print(f"Successfully built {len(examples_content)} content items using calculated path")
    
    # Count successful image loads
    image_items = [item for item in examples_content if 'image' in item]
    print(f"Loaded {len(image_items)} image items from local files")
    
except Exception as e:
    print(f"Error building examples without ROOT_DIR: {e}")
    print("This is normal - either ROOT_DIR or CONFIGURATION_BUCKET must be set, OR image paths must specify a full S3 URI")


# Test 2: With CONFIGURATION_BUCKET
print("\n2. WITH CONFIGURATION_BUCKET environment variable:")
print("-" * 50)

# Set a test bucket name
os.environ['CONFIGURATION_BUCKET'] = 'test-config-bucket'

try:
    test_service = ClassificationService(
        config=config,
        backend="bedrock",
        region="us-east-1"
    )
    
    print(f"CONFIGURATION_BUCKET set to: {os.environ.get('CONFIGURATION_BUCKET')}")
    print("Note: This would attempt to load images from S3, which may fail without proper setup")
    
    # This will likely fail since the S3 bucket doesn't exist, but it shows the logic
    try:
        examples_content = test_service._build_few_shot_examples_content()
        print(f"Successfully built {len(examples_content)} content items using S3")
    except Exception as e:
        print(f"Expected error when trying to access S3: {e}")
        print("This is normal - the logic correctly tries to use S3 when CONFIGURATION_BUCKET is set")

except Exception as e:
    print(f"Error with CONFIGURATION_BUCKET test: {e}")

# Restore the original environment (ROOT_DIR keeps the trailing slash used at the top of the notebook)
os.environ.pop('CONFIGURATION_BUCKET', None)
os.environ['ROOT_DIR'] = f"{ROOTDIR}/"
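
The cell above deletes and restores environment variables by hand, which leaves the environment changed if a step fails midway. A minimal alternative sketch uses the standard library's `unittest.mock.patch.dict`, which restores `os.environ` to its prior state when the block exits. The values below are stand-ins for the notebook's settings.

```python
import os
from unittest import mock

os.environ["ROOT_DIR"] = "../../"  # stand-in for the notebook's ROOT_DIR setting
before = dict(os.environ)

# Inside the block: CONFIGURATION_BUCKET is set and ROOT_DIR is removed.
with mock.patch.dict(os.environ, {"CONFIGURATION_BUCKET": "test-config-bucket"}):
    os.environ.pop("ROOT_DIR", None)
    inside = (os.environ.get("ROOT_DIR"), os.environ.get("CONFIGURATION_BUCKET"))

# After the block: patch.dict has put back every key it found on entry.
after = dict(os.environ)
print(inside)
print(after == before)
```

The same pattern would wrap each test scenario, so that no explicit restore step is needed at the end.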

Error reading binary content from s3://test-config-bucket/config_library/pattern-2/few_shot_example/example-images/letter1.jpg: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Failed to load image s3://test-config-bucket/config_library/pattern-2/few_shot_example/example-images/letter1.jpg: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Error reading binary content from s3://test-config-bucket/config_library/pattern-2/few_shot_example/example-images/letter2.png: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Failed to load image s3://test-config-bucket/config_library/pattern-2/few_shot_example/example-images/letter2.png: An error occurred (AccessDenied) when calling the GetObject operation: Access Denied
Error reading binary content from s3://test-config-bucket/config_library/pattern-2/few_shot_example/example-images/email1.jpg: An error occurred (AccessDenied) when calling t

Testing image path resolution logic:

1. WITHOUT ROOT_DIR or CONFIGURATION_BUCKET:
--------------------------------------------------
Error building examples without ROOT_DIR: Failed to load example images from config_library/pattern-2/few_shot_example/example-images/letter1.jpg: No CONFIGURATION_BUCKET or ROOT_DIR set. Cannot read example images from local filesystem.
This is normal - either ROOT_DIR or CONFIGURATION_BUCKET must be set, OR image paths must specify a full S3 URI

2. WITH CONFIGURATION_BUCKET environment variable:
--------------------------------------------------
CONFIGURATION_BUCKET set to: test-config-bucket
Note: This would attempt to load images from S3, which may fail without proper setup
Expected error when trying to access S3: Failed to load example images from config_library/pattern-2/few_shot_example/example-images/bank-statement-pages/: An error occurred (AccessDenied) when calling the ListObjectsV2 operation: Access Denied
This is normal - the logic correctly t
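
The two tests above suggest a resolution order for example image paths: a full S3 URI is used as-is, `CONFIGURATION_BUCKET` turns a relative path into an `s3://` location, `ROOT_DIR` turns it into a local path, and with neither set the load fails. The sketch below restates that logic under stated assumptions. `resolve_image_location` is a hypothetical function, not the service's code, and giving `CONFIGURATION_BUCKET` precedence over `ROOT_DIR` when both are set is a guess, since the notebook never tests that case.

```python
import os
from typing import Mapping, Optional


def resolve_image_location(
    image_path: str, env: Optional[Mapping[str, str]] = None
) -> str:
    """Resolve an example image path to an S3 URI or a local path (hypothetical)."""
    env = os.environ if env is None else env

    # A full S3 URI needs no further resolution.
    if image_path.startswith("s3://"):
        return image_path

    # Assumption: the bucket takes precedence when both variables are set.
    bucket = env.get("CONFIGURATION_BUCKET")
    if bucket:
        return f"s3://{bucket}/{image_path.lstrip('/')}"

    root_dir = env.get("ROOT_DIR")
    if root_dir:
        return os.path.join(root_dir, image_path)

    raise ValueError(
        "Neither CONFIGURATION_BUCKET nor ROOT_DIR is set; cannot resolve "
        f"{image_path!r}"
    )


path = "config_library/pattern-2/few_shot_example/example-images/letter1.jpg"
print(resolve_image_location(path, {"CONFIGURATION_BUCKET": "test-config-bucket"}))
print(resolve_image_location(path, {"ROOT_DIR": "../../"}))
```

Passing the environment as a mapping keeps the function testable without modifying `os.environ`.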