# Document Intelligence Services Testing & Configuration

This notebook tests each service in the Document Intelligence application using the **new consistent API architecture**.

## 🏗️ Service Architecture Overview

All services now use a consistent initialization pattern with **WorkspaceClient** and **config subsections**:

```python
# Consistent pattern for all services
service = ServiceClass(
    client=workspace_client,
    config=config_subsection
)
```
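
As a rough illustration of this pattern, a service could store the client and read its settings from the config subsection with defaults. This is only a sketch: `ExampleService`, the `max_items` setting, and the `(success, message)` return shape are hypothetical and not part of the application, and `client` is treated as an opaque object so the snippet runs without Databricks.

```python
from typing import Any, Dict, Optional, Tuple


class ExampleService:
    """Hypothetical service following the (client, config) constructor pattern."""

    def __init__(self, client: Optional[Any], config: Optional[Dict[str, Any]] = None):
        self.client = client
        self.config = config or {}
        # Read a setting from the subsection, falling back to a default.
        self.max_items = int(self.config.get("max_items", 10))

    @property
    def is_available(self) -> bool:
        # Without a client there is no backend to talk to.
        return self.client is not None

    def ping(self) -> Tuple[bool, str]:
        """Return a status tuple instead of raising when the backend is missing."""
        if not self.is_available:
            return False, "backend unavailable"
        return True, "ok"


# Passing client=None simulates a missing backend.
service = ExampleService(client=None, config={"max_items": 5})
print(service.is_available, service.max_items, service.ping())
```

Keeping the constructor signature identical across services means a test harness can build each one the same way and only vary the subsection it passes in.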

### 🔧 Core Services

| Service | Purpose | Config Section |
|---------|---------|----------------|
| **StorageService** | Document upload/download via Databricks volumes | `storage.*` |
| **DocumentService** | Serverless document processing job queuing | `document.*` |
| **DatabaseService** | PostgreSQL operations with graceful degradation | `database.*` |
| **EmbeddingService** | Text chunking, embeddings, vector operations | `embedding.*` |
| **AgentService** | LLM interactions, RAG, conversation management | `agent.*` |

### 🔐 Configuration System

- **`config.yaml`**: Non-sensitive settings (chunk sizes, timeouts, etc.)
- **Databricks Secrets**: Sensitive values (tokens, passwords)
- **DotConfig**: Clean dot notation access to configuration

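Each service is designed to degrade gracefully when its backend is unavailable, for example by exposing an `is_available` flag or returning a failed status tuple, as the tests below show.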
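
To make the dot-notation idea concrete, here is a minimal sketch of a wrapper over nested dictionaries. It is not the application's `DotConfig`; the class name `DotDict` and the sample keys and values are assumptions chosen for illustration.

```python
from typing import Any, Dict


class DotDict:
    """Hypothetical read-only wrapper that exposes nested dict keys as attributes."""

    def __init__(self, data: Dict[str, Any]):
        self._data = data

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails.
        # Private names are rejected to avoid recursion if _data is not set yet.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so chained access keeps working.
        return DotDict(value) if isinstance(value, dict) else value

    def get(self, name: str, default: Any = None) -> Any:
        """Dict-style access with a default, mirroring dict.get."""
        return getattr(self, name, default)


# Illustrative values only; real settings would come from config.yaml.
settings = DotDict({"embedding": {"chunk_size": 512}, "storage": {"max_file_size_mb": 50}})
print(settings.embedding.chunk_size)
print(settings.get("database", "not configured"))
```

Raising `AttributeError` for missing keys keeps `hasattr` and `getattr` with a default working as expected, which is convenient when a section such as `database` is optional.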


In [None]:
# Import required libraries
import sys
import logging
from pathlib import Path
import json
from typing import Dict, Any, List, Optional

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s - %(name)s - %(message)s')
logger = logging.getLogger(__name__)

# Add the current working directory (the project root) to sys.path
# so that `src.doc_intelligence` imports resolve
sys.path.append(str(Path.cwd()))

print("✅ Environment setup complete")
print(f"📁 Working directory: {Path.cwd()}")


## 📋 Configuration Overview

Let's examine the configuration system and service availability.


In [None]:
from src.doc_intelligence.config import config
from src.doc_intelligence.utils import create_workspace_client

print("=== Configuration System Status ===")
print(f"📋 Configuration sections: {list(config.config.keys())}")
print(f"📱 Application: {config.config.get('application', {}).get('name', 'N/A')}")
print(f"🔐 Secrets scope: {config.config.get('application', {}).get('secrets_scope', 'N/A')}")

print("\n=== Service Availability ===")
print(f"🏢 Databricks available: {config.databricks_available}")
print(f"🗄️  Database available: {config.database_available}")

# Create workspace client
client = create_workspace_client(
    host=config.databricks_host,
    token=config.databricks_token
)
print(f"🔗 WorkspaceClient created: {client is not None}")

if client:
    from src.doc_intelligence.utils import get_current_user
    user = get_current_user(client)
    print(f"👤 Current user: {user}")

print("\n=== Configuration Sections Detail ===")
for section_name in ['storage', 'document', 'database', 'embedding', 'agent']:
    section_config = config.config.get(section_name, {})
    print(f"📦 {section_name}: {len(section_config)} settings configured")
    if section_config:
        print(f"   Keys: {list(section_config.keys())}")


## 🧪 Service Testing

Exercise the main operations of each service through the new consistent API.
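
Most service calls in the next cell return a status tuple whose first element is a success flag, and many end with a message. A small formatter along these lines could keep the per-service output uniform. It is a sketch only: `summarize` is a hypothetical helper, the sample tuples are made up, and it assumes the last element is a message, which does not hold for every method.

```python
from typing import Any, Sequence


def summarize(label: str, result: Sequence[Any]) -> str:
    """Format a status tuple: first item is the success flag, last item the message."""
    success, message = bool(result[0]), result[-1]
    mark = "✅" if success else "❌"
    return f"{mark} {label}: {message}"


# Hypothetical results shaped like the tuples unpacked in this notebook.
print(summarize("upload", (True, "abc123", "/path/test.txt", "uploaded")))
print(summarize("queue", (False, None, "no client")))
```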


In [None]:
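# Import the services explicitly rather than with a wildcard
from src.doc_intelligence.services import (
    AgentService,
    DatabaseService,
    DocumentService,
    EmbeddingService,
    StorageService,
)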

print("=== Comprehensive Service Testing ===")

# 💾 StorageService - Document upload/download
print("\n🔧 Testing StorageService...")
storage_service = StorageService(client, config.config.get("storage", {}))
print(f"   📁 Volume path: {storage_service.volume_path}")
print(f"   📏 Max file size: {storage_service.max_file_size_mb}MB")

# Test upload
test_content = b"Test document for Document Intelligence"
success, doc_hash, path, msg = storage_service.upload_document(
    test_content, "test.txt", "test@example.com"
)
print(f"   ✅ Upload test: {success} - {msg}")

# 🗄️ DatabaseService - PostgreSQL operations  
print("\n🔧 Testing DatabaseService...")
database_service = DatabaseService(client, config.config.get("database", {}))
print(f"   🗄️  Available: {database_service.is_available}")
if database_service.is_available:
    test_success, test_msg = database_service.test_connection()
    print(f"   🔌 Connection: {test_success} - {test_msg}")
    if test_success:
        user = database_service.create_user("test@example.com")
        print(f"   👤 User creation: {user is not None}")
else:
    print("   ⚠️  Database not configured - graceful degradation active")

# 🧠 EmbeddingService - Text chunking and embeddings
print("\n🔧 Testing EmbeddingService...")
agent_config = config.config.get("agent", {})
embedding_service = EmbeddingService(
    client, 
    config.config.get("embedding", {}),
    agent_config.get("embedding_endpoint")
)
print(f"   🧠 Available: {embedding_service.is_available}")
print(f"   📏 Chunk size: {embedding_service.chunk_size}")

# Test chunking
test_text = "This is a test document for chunking. " * 50  # Long text
chunk_success, chunks, chunk_msg = embedding_service.chunk_text(test_text)
print(f"   ✂️ Chunking: {chunk_success} - {len(chunks) if chunks else 0} chunks")

if chunks:
    # Test embeddings
    embed_success, embeddings, embed_msg = embedding_service.generate_embeddings(chunks[:2])
    print(f"   🧠 Embeddings: {embed_success} - {len(embeddings) if embeddings else 0} vectors")

# 🚀 DocumentService - Job queuing
print("\n🔧 Testing DocumentService...")
document_service = DocumentService(client, config.config.get("document", {}))
print(f"   ⏱️  Timeout: {document_service.timeout_minutes}min")
print(f"   🔄 Max retries: {document_service.max_retries}")

# Test job queue
queue_success, run_id, queue_msg = document_service.queue_document_processing(
    "/test/input.pdf", "/test/output", "test_hash"
)
print(f"   📋 Job queue: {queue_success} - Run ID: {run_id}")

if run_id:
    status_success, status_info, status_msg = document_service.check_job_status(run_id)
    print(f"   📊 Status check: {status_success} - State: {status_info.get('state', 'Unknown') if status_info else 'None'}")

# 🤖 AgentService - LLM and conversation management
print("\n🔧 Testing AgentService...")
agent_service = AgentService(client, config.config.get("agent", {}))
print(f"   🤖 Available: {agent_service.is_available}")
print(f"   🔗 LLM endpoint: {agent_service.chat_endpoint or 'Not configured'}")
print(f"   💬 Max tokens: {agent_service.llm_max_tokens}")
print(f"   🔍 RAG chunk size: {agent_service.rag_chunk_size}")
print(f"   🗃️  Checkpointer: {agent_service.checkpointer_type}")

# Test response generation
test_messages = [{"role": "user", "content": "Hello! Test message."}]
try:
    response_success, response, metadata = agent_service.generate_response(test_messages)
    print(f"   💬 Response generation: {response_success}")
    if response_success and response:
        preview = response[:100] + "..." if len(response) > 100 else response
        print(f"   📝 Response preview: {preview}")
        print(f"   🤖 Model: {metadata.get('model_used', 'Unknown')}")
except Exception as e:
    print(f"   💬 Response generation: Error - {e}")

print("\n✅ Service testing complete! Services share the (client, config) constructor pattern")

# Summary: report the results recorded above rather than unconditional check marks
def mark(ok):
    return "✓" if ok else "✗"

print("\n=== Testing Summary ===")
print(f"{mark(success)} StorageService: upload test")
print(f"{mark(database_service.is_available)} DatabaseService: backend available (degrades gracefully otherwise)")
print(f"{mark(chunk_success)} EmbeddingService: text chunking")
print(f"{mark(queue_success)} DocumentService: job queuing")
print(f"{mark(agent_service.is_available)} AgentService: LLM backend available")
print("🏗️  Architecture: All services use WorkspaceClient + config subsections")
print("🔧 Configuration: Moved from auth.* to application.secrets_scope")
print("📦 Modularity: Each service independently testable and configurable")
