# Document Intelligence Services Test Notebook

This notebook tests each service individually and demonstrates the complete flow:
1. **Agent Service**: Start and store general conversations without documents
2. **Storage & Document Services**: Upload and process documents
3. **Agent Service with RAG**: Have conversations about uploaded documents

## Setup
First, import the modules the tests need. The application itself is initialized in the next section.

In [None]:
import sys
from pathlib import Path

# Add the src directory to the path
sys.path.insert(0, str(Path.cwd() / "src"))

# Import the application and services
from doc_intel.app import DocumentIntelligenceApp
from doc_intel.services.agent_service import AgentService
from doc_intel.services.storage_service import StorageService
from doc_intel.services.document_service import DocumentService
from doc_intel.services.database_service import DatabaseService

print("✅ All imports successful")

✅ All imports successful


## Initialize the Application

Let's start by initializing the Document Intelligence application and checking the system status.

In [2]:
# Initialize the application
app = DocumentIntelligenceApp()
print("✅ Application initialized successfully")

# Check system status
status = app.get_system_status()
print("\n📊 System Status:")
for service_name, service_status in status["services"].items():
    status_icon = "✅" if service_status["available"] else "❌"
    print(f"{status_icon} {service_name.title()}: {service_status['message']}")

print(f"\n🏥 Overall Health: {'✅ Healthy' if status['overall_health'] else '❌ Unhealthy'}")

✅ Application initialized successfully

📊 System Status:
✅ Database: Database connection successful
✅ Agent: AI agent and vector search capabilities available
✅ Storage: Databricks connection available
✅ Document: Databricks connection available

🏥 Overall Health: ✅ Healthy
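
The status dictionary printed above can also be checked programmatically, which helps when the notebook runs unattended. The sketch below assumes only the shape the status loop relies on (a `services` mapping whose values carry `available` and `message`). The helper name `unavailable_services` is hypothetical and not part of `doc_intel`, and the example dict is hand-written rather than real output.

```python
from typing import Any, Dict, List, Tuple


def unavailable_services(status: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (service_name, message) pairs for services reported as unavailable.

    Hypothetical helper: it assumes the dict shape used by the status loop in
    this notebook and tolerates missing keys rather than raising.
    """
    problems = []
    for name, info in status.get("services", {}).items():
        if not info.get("available", False):
            problems.append((name, info.get("message", "no message")))
    return problems


# Hand-written example status (not real output from the application)
example_status = {
    "services": {
        "database": {"available": True, "message": "Database connection successful"},
        "agent": {"available": False, "message": "endpoint not found"},
    },
    "overall_health": False,
}
print(unavailable_services(example_status))
```

In the notebook, `unavailable_services(status)` would give the list to act on, for example to stop early when it is non-empty.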


## Test 1: Agent Service - General Conversations

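Let's test the agent service's ability to handle general conversations without documents. The first two cells probe the vector store and the embedding endpoint directly; in this run both raised errors (a missing `langchain_pg_embedding.id` column and a 404 for the `databricks-gte-large` serving endpoint), even though the status check reported the agent's vector search as available.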

In [5]:
app.agent_service.vectorstore.similarity_search("Hello, world!")

ProgrammingError: (psycopg2.errors.UndefinedColumn) column langchain_pg_embedding.id does not exist
LINE 1: SELECT langchain_pg_embedding.id AS langchain_pg_embedding_i...
               ^

[SQL: SELECT langchain_pg_embedding.id AS langchain_pg_embedding_id, langchain_pg_embedding.collection_id AS langchain_pg_embedding_collection_id, langchain_pg_embedding.embedding AS langchain_pg_embedding_embedding, langchain_pg_embedding.document AS langchain_pg_embedding_document, langchain_pg_embedding.cmetadata AS langchain_pg_embedding_cmetadata, langchain_pg_embedding.embedding <=> %(embedding_1)s AS distance 
FROM langchain_pg_embedding JOIN langchain_pg_collection ON langchain_pg_embedding.collection_id = langchain_pg_collection.uuid 
WHERE langchain_pg_embedding.collection_id = %(collection_id_1)s::UUID ORDER BY distance ASC 
 LIMIT %(param_1)s]
[parameters: {'embedding_1': '[-0.58447265625,-0.39794921875,0.027435302734375,-0.0316162109375,-0.040374755859375,0.1781005859375,-0.324951171875,0.309814453125,-0.473876953125,- ... (14820 characters truncated) ... 375,-1.595703125,-0.2222900390625,-0.59716796875,-0.4765625,1.0,0.939453125,0.736328125,0.2452392578125,0.69873046875,-0.294189453125,-0.62744140625]', 'collection_id_1': UUID('df69593a-050d-48a4-b4e4-9243aaf99b7b'), 'param_1': 4}]
(Background on this error at: https://sqlalche.me/e/20/f405)

In [13]:
app.agent_service.embeddings.embed_query("Hello, world!")

HTTPError: 404 Client Error: Failed to find MT LLM Endpoint for requested endpoint: databricks-gte-large for url: https://adb-984752964297111.11.azuredatabricks.net/serving-endpoints/databricks-gte-large/invocations. Response text: {"error_code": "RESOURCE_DOES_NOT_EXIST", "message": "Failed to find MT LLM Endpoint for requested endpoint: databricks-gte-large"}
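
Each probe above stops at its first exception, which makes it hard to see the state of several components in one run. A minimal sketch of a wrapper that records the outcome instead of raising is shown below. `probe` is a hypothetical helper, and the stub callables stand in for calls such as `app.agent_service.embeddings.embed_query(...)`; nothing here contacts a real service.

```python
from typing import Any, Callable, Tuple


def probe(name: str, fn: Callable[[], Any]) -> Tuple[bool, str]:
    """Call fn() and report (ok, detail) instead of letting the exception propagate.

    Hypothetical helper for smoke checks. On failure, detail is the exception
    type and message; on success, it is the type name of the returned value.
    """
    try:
        value = fn()
    except Exception as exc:  # broad on purpose: this is a diagnostic wrapper
        detail = f"{type(exc).__name__}: {exc}"
        print(f"FAIL {name}: {detail}")
        return False, detail
    detail = type(value).__name__
    print(f"OK   {name}: returned {detail}")
    return True, detail


def _failing_call() -> None:
    # Stand-in for a call that raises, e.g. a missing serving endpoint
    raise RuntimeError("endpoint not found")


ok_result = probe("embedding (stub)", lambda: [0.1, 0.2, 0.3])
bad_result = probe("vector search (stub)", _failing_call)
```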

In [7]:
print("🧪 Testing Agent Service - General Conversations")
print("=" * 60)

# Test basic agent availability
print(f"Agent Service Available: {app.agent_service.is_available}")
print(f"RAG Capabilities Available: {app.agent_service.rag_available}")

if app.agent_service.is_available:
    # Test general conversation without documents
    general_messages = [
        {"role": "user", "content": "Hello! Can you tell me about yourself?"}
    ]
    
    print("\n💬 Testing general conversation...")
    success, response, metadata = app.agent_service.generate_response(
        messages=general_messages,
        context_documents=None
    )
    
    if success:
        print(f"✅ Response: {response}")
        print(f"📊 Metadata: {metadata}")
    else:
        print(f"❌ Failed: {response}")
        
    # Test conversation state management
    print("\n🔄 Testing conversation state management...")
    if app.agent_service.conversation_state_available:
        print("✅ Conversation state management available")
    else:
        print("⚠️ Conversation state management not available")
else:
    print("❌ Agent service not available")

🧪 Testing Agent Service - General Conversations
Agent Service Available: True
RAG Capabilities Available: True

💬 Testing general conversation...
✅ Response: Hello! I'm a document intelligence assistant designed to help you work with and understand documents. Here's what I can do for you:

**Document Analysis:**
- Read and analyze uploaded documents (PDFs, text files, images with text, etc.)
- Extract key information and summarize content
- Answer specific questions about document contents
- Help identify important sections, data, or themes

**General Assistance:**
- Provide explanations and clarifications about document content
- Help organize and structure information from documents
- Assist with document-related tasks like formatting guidance
- Offer insights and analysis based on the materials you share

**How to work with me:**
- Upload any documents you'd like me to analyze
- Ask specific questions about the content
- Request summaries or explanations of complex material
- Get he

## Test 2: Storage Service - Document Upload

Now let's test the storage service's ability to handle document uploads.

In [None]:
print("🧪 Testing Storage Service - Document Upload")
print("=" * 60)

# Create a sample document for testing
sample_content = """
This is a sample document for testing the storage service.
It contains multiple paragraphs to test document processing.
The document will be used to test the complete flow.

Key points:
- Document storage and retrieval
- Document processing pipeline
- Vector search capabilities
- RAG integration
""".encode('utf-8')

filename = "test_document.txt"
test_user = "test_user@example.com"

print(f"📄 Testing upload of: {filename}")
print(f"👤 Test user: {test_user}")
print(f"📏 Content length: {len(sample_content)} bytes")

# Test storage service upload
success, doc_hash, upload_path, message = app.storage_service.upload_document(
    file_content=sample_content,
    filename=filename,
    username=test_user
)

if success:
    print("✅ Upload successful!")
    print(f"🔑 Document hash: {doc_hash}")
    print(f"📁 Upload path: {upload_path}")
    print(f"💬 Message: {message}")
    
    # Store the doc_hash for later tests
    test_doc_hash = doc_hash
else:
    print(f"❌ Upload failed: {message}")
    test_doc_hash = None
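
The notebook does not show how `doc_hash` is derived. If a local fingerprint of the uploaded bytes is useful, for instance to confirm that two uploads carry identical content, a generic SHA-256 digest is one option. This is an assumption for illustration only: `content_digest` is a hypothetical helper, and its output should not be expected to match the storage service's hash.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw file bytes.

    Assumption: this is a generic content fingerprint, not necessarily the
    scheme the storage service uses for doc_hash.
    """
    return hashlib.sha256(data).hexdigest()


first = content_digest("same text".encode("utf-8"))
second = content_digest("same text".encode("utf-8"))
changed = content_digest("same text!".encode("utf-8"))

# Identical bytes give identical digests; any change gives a different one
print(first == second, first == changed)
```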

## Test 3: Document Service - Document Processing

Now let's test the document service's ability to process uploaded documents.

In [None]:
print("🧪 Testing Document Service - Document Processing")
print("=" * 60)

if test_doc_hash:
    print(f"📄 Processing document: {test_doc_hash}")
    
    # Test document processing
    result = app.document_service.process_document(
        file_content=sample_content,
        filename=filename,
        username=test_user
    )
    
    if result["success"]:
        print("✅ Document processing successful!")
        print(f"📊 Result: {result}")
        
        # Check if document was stored in database
        doc_info = app.get_document_info(test_doc_hash)
        if doc_info:
            print(f"📋 Document info: {doc_info}")
        else:
            print("⚠️ Document not found in database")
    else:
        print(f"❌ Document processing failed: {result.get('error', 'Unknown error')}")
else:
    print("⚠️ Skipping document processing test - no document hash available")

## Test 4: Agent Service with RAG - Document Conversations

Now let's test the agent service's RAG capabilities with the uploaded document.

In [None]:
print("🧪 Testing Agent Service with RAG - Document Conversations")
print("=" * 60)

if test_doc_hash and app.agent_service.rag_available:
    print(f"🔍 Testing RAG with document: {test_doc_hash}")
    
    # Test document search
    search_query = "What are the key points mentioned in the document?"
    print(f"\n🔍 Search query: {search_query}")
    
    search_success, results, metadata = app.agent_service.search_documents(
        query=search_query,
        limit=3,
        document_ids=[test_doc_hash]
    )
    
    if search_success:
        print(f"✅ Search successful! Found {len(results)} results")
        for i, result in enumerate(results):
            print(f"\n📄 Result {i+1}:")
            print(f"   Content: {result['content'][:100]}...")
            print(f"   Document ID: {result['document_id']}")
            print(f"   Chunk Index: {result['chunk_index']}")
        
        # Test conversation with document context
        print("\n💬 Testing conversation with document context...")
        
        # Create a conversation about the document
        doc_messages = [
            {"role": "user", "content": "What is this document about?"}
        ]
        
        success, response, metadata = app.agent_service.generate_response(
            messages=doc_messages,
            context_documents=results
        )
        
        if success:
            print(f"✅ Response: {response}")
            print(f"📊 Metadata: {metadata}")
        else:
            print(f"❌ Failed: {response}")
    else:
        print(f"❌ Search failed: {metadata.get('error', 'Unknown error')}")
        
else:
    print("⚠️ Skipping RAG test - document hash or RAG capabilities not available")
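
The result loop above slices `content` to 100 characters and always appends `...`, even when the chunk is shorter than that. A small sketch of a preview helper that adds the ellipsis only when text is actually cut follows. `preview` is a hypothetical name, and the sample dicts reuse the keys the loop reads (`content`, `document_id`, `chunk_index`) with made-up values.

```python
def preview(text: str, limit: int = 100) -> str:
    """Collapse whitespace and shorten text to `limit` characters.

    An ellipsis is appended only when the text was actually truncated, so the
    result is at most `limit` + 3 characters long.
    """
    flat = " ".join(text.split())
    if len(flat) <= limit:
        return flat
    return flat[:limit].rstrip() + "..."


# Made-up search results with the same keys the notebook's loop reads
sample_results = [
    {"content": "Key points:\n- Document storage and retrieval",
     "document_id": "example-hash", "chunk_index": 0},
    {"content": "x" * 150, "document_id": "example-hash", "chunk_index": 1},
]

for position, item in enumerate(sample_results, start=1):
    print(f"Result {position} (chunk {item['chunk_index']}): {preview(item['content'])}")
```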

## Test 5: Complete Flow Integration

Let's test the complete flow using the main application methods.

In [None]:
print("🧪 Testing Complete Flow Integration")
print("=" * 60)

# Step 1: Start a new conversation
print("1️⃣ Starting new conversation...")
conv_result = app.start_new_conversation(
    username=test_user,
    title="Test conversation with document"
)

if conv_result["success"]:
    conversation_id = conv_result["conversation_id"]
    print(f"✅ Conversation started: {conversation_id}")
    
    # Step 2: Upload and process document
    print("\n2️⃣ Uploading and processing document...")
    upload_result = app.upload_and_process_document(
        file_content=sample_content,
        filename=filename,
        username=test_user
    )
    
    if upload_result["success"]:
        doc_hash = upload_result["doc_hash"]
        print(f"✅ Document processed: {doc_hash}")
        
        # Step 3: Add document to conversation
        print("\n3️⃣ Adding document to conversation...")
        if app.add_documents_to_conversation(conversation_id, [doc_hash]):
            print("✅ Document added to conversation")
            
            # Step 4: Send message about the document
            print("\n4️⃣ Sending message about document...")
            message_result = app.send_chat_message(
                conversation_id=conversation_id,
                user_message="What are the main topics covered in this document?"
            )
            
            if message_result["success"]:
                print(f"✅ Response received: {message_result['response']}")
            else:
                print(f"❌ Message failed: {message_result.get('error', 'Unknown error')}")
        else:
            print("❌ Failed to add document to conversation")
    else:
        print(f"❌ Document processing failed: {upload_result.get('error', 'Unknown error')}")
else:
    print(f"❌ Failed to start conversation: {conv_result.get('error', 'Unknown error')}")
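
Each of the four steps above depends on the previous one, which is what produces the nested `if`/`else` pyramid. One possible flatter arrangement, sketched under the assumption that each step can be wrapped as a function returning `(ok, detail)`, runs the steps in order and stops at the first failure. `run_steps` and the stub steps are hypothetical and do not call the application.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], Tuple[bool, str]]]


def run_steps(steps: List[Step]) -> List[Tuple[str, bool, str]]:
    """Run steps in order, stopping after the first one that fails.

    Returns one (label, ok, detail) record per step that was attempted.
    """
    records = []
    for label, action in steps:
        ok, detail = action()
        records.append((label, ok, detail))
        if not ok:
            break  # later steps depend on this one, so skip them
    return records


# Stub steps standing in for start/upload/attach/send; the third one fails
stub_steps: List[Step] = [
    ("start conversation", lambda: (True, "conversation created")),
    ("upload document", lambda: (True, "document processed")),
    ("attach document", lambda: (False, "could not attach")),
    ("send message", lambda: (True, "never reached")),
]

for label, ok, detail in run_steps(stub_steps):
    print(f"{'OK  ' if ok else 'FAIL'} {label}: {detail}")
```

The trade-off is that values passed between steps (such as the conversation ID) would need to be shared through a closure or a small context object rather than local variables.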

## Test 6: Service Health and Status

Let's check the final status of all services and verify everything is working.

In [None]:
print("🧪 Final Service Health Check")
print("=" * 60)

# Check final system status
final_status = app.get_system_status()
print("📊 Final System Status:")
for service_name, service_status in final_status["services"].items():
    status_icon = "✅" if service_status["available"] else "❌"
    print(f"{status_icon} {service_name.title()}: {service_status['message']}")

print(f"\n🏥 Overall Health: {'✅ Healthy' if final_status['overall_health'] else '❌ Unhealthy'}")

# Check conversation history
print("\n💬 Conversation History:")
conversations = app.get_user_conversations(test_user)
if conversations:
    for conv in conversations:
        print(f"   📝 {conv['title']} (ID: {conv['conversation_id']})")
else:
    print("   No conversations found")

# Check user documents
print("\n📄 User Documents:")
documents = app.get_user_documents(test_user)
if documents:
    for doc in documents:
        print(f"   📄 {doc['filename']} (Status: {doc['status']})")
else:
    print("   No documents found")

print("\n🎉 Testing complete!")
print("\n📋 Areas exercised (check each cell's output for pass/fail):")
print("• Agent Service: General conversations and RAG capabilities")
print("• Storage Service: Document upload and management")
print("• Document Service: Document processing pipeline")
print("• Integration: Complete flow from upload to conversation")
print("• Database: Conversation and document persistence")
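
The closing summary is a fixed list and does not reflect what happened in earlier cells. A sketch of a summary driven by recorded outcomes is given below. The `outcomes` dict and `summarize` are hypothetical; in practice each test cell would have to record its own result, and the values shown are illustrative only.

```python
from typing import Dict, List


def summarize(outcomes: Dict[str, bool]) -> List[str]:
    """Build one summary line per area, marking it PASS or FAIL."""
    return [f"{'PASS' if passed else 'FAIL'}  {area}" for area, passed in outcomes.items()]


# Illustrative values only; each test cell would record its own result
outcomes = {
    "Agent Service": True,
    "Storage Service": True,
    "Document Service": False,
}

summary_lines = summarize(outcomes)
print("\n".join(summary_lines))
print(f"{sum(outcomes.values())}/{len(outcomes)} areas passed")
```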