# DTC API SDK Method Flow Guide

This notebook demonstrates the complete workflow for processing documents using the DTC API SDK's new `upload_file_to_webhook()` method.

## Overview

The document processing flow consists of 3 simple steps:
1. **Create Task** - Set up a processing task with pipeline configuration
2. **Upload File** - Process document through webhook endpoint
3. **Automatic Cleanup** - Task cleans up automatically (no manual action needed)

Let's walk through each step!


In [None]:
# Import required libraries
import json
import os
from dtc_api_sdk.client import DTCApiClient

# Initialize the DTC API client
# Note: The client automatically looks for DTC_API_KEY environment variable
# You can also pass api_key directly: DTCApiClient(api_key="your_key_here")
client = DTCApiClient()

print("✅ SDK client initialized successfully!")
print(f"Base URL: {client.base_url}")
# Show only a short prefix of the key; avoid exposing credentials in shared notebooks
print(f"API Key: {client.api_key[:4]}...")


✅ SDK client initialized successfully!
Base URL: https://eaas-dev.aparavi.com
API Key: qFz-...
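The note in the cell above says `DTCApiClient()` falls back to the `DTC_API_KEY` environment variable. The sketch below checks for that variable before the client is created and masks the key before anything is printed. `require_api_key` and `mask_secret` are hypothetical helpers and are not part of `dtc_api_sdk`.

```python
import os


def require_api_key(var_name: str = "DTC_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper: not part of dtc_api_sdk.
    """
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it or pass api_key=... to DTCApiClient"
        )
    return key


def mask_secret(secret: str, visible: int = 4) -> str:
    """Show only the first `visible` characters of a secret; hide short secrets entirely."""
    if len(secret) <= visible:
        return "***"
    return secret[:visible] + "..."
```

For example, `client = DTCApiClient(api_key=require_api_key())` stops the notebook early with a readable error when the variable is missing. Otherwise the failure would only surface at the first API call.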

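## Step 1: Create Task

Load the pipeline configuration from JSON and wrap it under the required `"pipeline"` key, using `webhook_1` as the source. Then create a task with `execute_task()`. The returned task token identifies the task when the file is uploaded.
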
In [2]:
# Load pipeline configuration from JSON file
with open('../example_pipelines/simpleparser.json', 'r') as f:
    pipeline_data = json.load(f)

# Wrap configuration with required "pipeline" key
pipeline_config = {
    "pipeline": {
        "source": "webhook_1",
        "components": pipeline_data.get("components", [])
    }
}

print("✅ Pipeline configuration loaded")
print(f"Components: {len(pipeline_config['pipeline']['components'])}")

# Create task using the pipeline configuration
task_token = client.execute_task(
    pipeline_config, 
    name="SDK Flow Demo Task"
)

print("✅ Task created successfully!")
print(f"Task Token: {task_token}")
print("Task ready for file processing")


✅ Pipeline configuration loaded
Components: 3
✅ Task created successfully!
Task Token: f1c4c7cc-6cf1-43be-b5aa-764f3b4b6121
Task ready for file processing
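The same wrapping step appears again, verbatim, in `process_document_with_sdk()` further down. The sketch below factors it into a reusable helper. `build_pipeline_config` and `load_pipeline_config` are hypothetical names. The `"webhook_1"` source and the `components` key are copied from the cell above, not taken from SDK documentation.

```python
import json
from typing import Any


def build_pipeline_config(pipeline_data: dict[str, Any], source: str = "webhook_1") -> dict[str, Any]:
    """Wrap raw pipeline JSON under the "pipeline" key passed to execute_task()."""
    components = pipeline_data.get("components", [])
    if not isinstance(components, list):
        raise ValueError("'components' must be a list")
    return {"pipeline": {"source": source, "components": components}}


def load_pipeline_config(path: str, source: str = "webhook_1") -> dict[str, Any]:
    """Read a pipeline JSON file from disk and wrap it with build_pipeline_config()."""
    with open(path, "r", encoding="utf-8") as f:
        return build_pipeline_config(json.load(f), source=source)
```

With this helper, both the Step 1 cell and `process_document_with_sdk()` could call `load_pipeline_config('../example_pipelines/simpleparser.json')` instead of repeating the dict literal.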


## Step 2: Upload & Process Document

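Now upload a document with the new `upload_file_to_webhook()` method. It takes three arguments: the task token created above, a local file path, and a `timeout` (60 in this example). It sends the file directly to the task's webhook endpoint and returns the processing result as a dict.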
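`upload_file_to_webhook()` hides the HTTP details. The HTTP 500 error recorded later in this notebook does show the URL it called: `/webhook` on the base URL, with `type=cpu`, `apikey`, and `token` query parameters. The sketch below assembles a URL of that shape with the standard library. It assumes those three parameters are all the endpoint needs. `build_webhook_url` is hypothetical, and the sketch omits the request body and headers because the notebook does not document them. Because the key travels in the query string, any logged URL or error message contains it, so treat such output as sensitive.

```python
from urllib.parse import urlencode


def build_webhook_url(base_url: str, api_key: str, token: str, task_type: str = "cpu") -> str:
    """Assemble a webhook URL in the shape seen in this notebook's HTTP 500 error.

    Assumption: the path is /webhook and these are the only query parameters.
    """
    query = urlencode({"type": task_type, "apikey": api_key, "token": token})
    return f"{base_url.rstrip('/')}/webhook?{query}"
```

Calling `build_webhook_url(client.base_url, client.api_key, task_token)` should reproduce the URL shape from the error. That makes it useful for spotting a wrong base URL or a stale token.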


In [3]:
# Check what files are available for testing
test_files = os.listdir('../test_data')
print("Available test files:")
for file in test_files:
    print(f"  - {file}")

# Choose a file for processing
file_path = "../test_data/10-MB-Test.docx"  # Using a file that we know exists
print(f"\n📄 Processing file: {file_path}")

# Upload and process the file using the new SDK method
try:
    result = client.upload_file_to_webhook(
        token=task_token,
        file_path=file_path,
        timeout=60
    )
    
    print("✅ File processed successfully!")
    print(f"Result type: {type(result)}")
    print(f"Result keys: {list(result.keys()) if isinstance(result, dict) else 'Not a dict'}")
    
    # Show a sample of the result
    if isinstance(result, dict):
        for key, value in result.items():
            if isinstance(value, str) and len(value) > 100:
                print(f"{key}: {value[:100]}...")
            else:
                print(f"{key}: {value}")
    else:
        print(f"Result: {result}")
        
except Exception as e:
    # Connection failures and server-side errors (such as HTTP 500) end up here
    print(f"❌ Error processing file: {e}")


Available test files:
  - 10-MB-Test.docx
  - 10-MB-Test.xlsx
  - 1301648458579-12786030723-ticket.pdf
  - Challenge_LeadAiDev.docx

📄 Processing file: ../test_data/10-MB-Test.docx
✅ File processed successfully!
Result type: <class 'dict'>
Result keys: ['status', 'data', 'metrics']
status: OK
data: {'objectsRequested': 1, 'objectsCompleted': 1, 'types': {}, 'objects': {'3b6cb688-1e2e-589e-9f5b-3746f869c259': {'__types': {'text': 'text'}, 'metadata': {'Content-Type': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'cp:revision': '3', 'dc:creator': 'Brian Hileman', 'dc:modifier': 'Brian Hileman', 'dc:publisher': 'bewglobal', 'dcterms:created': '2017-04-27T17:44:00Z', 'dcterms:modified': '2017-04-27T17:46:00Z', 'xmpTPg:NPages': '1'}, 'name': 'a57594cc-ffd5-441b-9da3-a33b5ef7ee42', 'path': 'a57594cc-ffd5-441b-9da3-a33b5ef7ee42', 'text': ['First and Last Name\tSSN\tCredit Card Number\tFirst and Last Name\tSSN\tCredit Card Number\tFirst and Last Name\tSSN\tCredit C
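The printed result is a dict with `status`, `data`, and `metrics` keys. Under `data`, `objects` maps each object ID to a record with `metadata`, `name`, `path`, and a `text` list. The sketch below pulls the text out of that structure. It assumes the shape shown in the output above also holds for other pipelines, which may not be the case. `extract_texts` is a hypothetical helper.

```python
from typing import Any


def extract_texts(result: dict[str, Any]) -> dict[str, str]:
    """Map each processed object ID to its extracted text, joined into one string.

    Assumes the layout printed above: result["data"]["objects"][id]["text"]
    is a list of strings. Objects without a "text" list are skipped.
    """
    if result.get("status") != "OK":
        raise ValueError(f"unexpected status: {result.get('status')!r}")
    objects = result.get("data", {}).get("objects", {})
    return {
        obj_id: "\n".join(obj["text"])
        for obj_id, obj in objects.items()
        if isinstance(obj.get("text"), list)
    }
```

It can also help to compare `result["data"]["objectsCompleted"]` with `objectsRequested` before trusting the extracted text. In the run above, both are 1.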

## Step 3: Automatic Cleanup

The task automatically cleans up after processing is complete. No manual cleanup is required!

## Complete Example Function

Here's how to combine all steps into a reusable function:


In [4]:
def process_document_with_sdk(file_path, pipeline_config_path='../example_pipelines/simpleparser.json'):
    """
    Complete document processing workflow using the DTC API SDK.
    
    Args:
        file_path: Path to the document to process
        pipeline_config_path: Path to the pipeline configuration JSON file
        
    Returns:
        Processing result from the API
    """
    # Initialize client
    client = DTCApiClient()
    
    # Load pipeline configuration
    with open(pipeline_config_path, 'r') as f:
        pipeline_data = json.load(f)
    
    # Wrap configuration with required "pipeline" key
    pipeline_config = {
        "pipeline": {
            "source": "webhook_1",
            "components": pipeline_data.get("components", [])
        }
    }
    
    # Step 1: Create task
    task_token = client.execute_task(
        pipeline_config, 
        name=f"Process {os.path.basename(file_path)}"
    )
    
    # Step 2: Upload and process file
    result = client.upload_file_to_webhook(
        token=task_token,
        file_path=file_path,
        timeout=60
    )
    
    # Step 3: Automatic cleanup (no action needed)
    return result

# Usage example: this sends the file to the live API
result = process_document_with_sdk("../test_data/Challenge_LeadAiDev.docx")
print(f"Processing completed: {result}")

print("✅ Complete workflow function defined!")
print("Use process_document_with_sdk() to process any document with just one function call.")


DTCApiError: HTTP error during file upload: 500 Server Error: Internal Server Error for url: https://eaas-dev.aparavi.com/webhook?type=cpu&apikey=<redacted>&token=2523531d-3398-4ce9-a8a8-d570fd1fcf89

## Retrying Failed Uploads

The next cell processes an invoice PDF with `send_webhook()` instead of `upload_file_to_webhook()`. It base64-encodes the file into a JSON payload and raises `client.timeout` to 120. It then retries up to three times, pausing 5 seconds between attempts. In the recorded run, every attempt failed to connect.


In [None]:
import base64
import os
import time

# Path to the invoice PDF
invoice_path = "test_data/Invoice-E6CD52F5-0002.pdf"

def process_invoice_with_retry(file_path, max_retries=3, retry_delay=5):
    """Process an invoice PDF with retry logic and error handling."""
    print(f"Processing: {file_path}")

    for attempt in range(max_retries):
        try:
            print(f"Attempt {attempt + 1}/{max_retries}")

            # Each attempt creates a fresh task from the pipeline configuration
            task_token = client.execute_task(
                pipeline_config,
                name="invoice-processing"
            )
            print(f"Task created with token: {task_token[:8]}...")

            # Read the PDF and base64-encode it for the JSON webhook payload
            with open(file_path, 'rb') as f:
                file_content = f.read()

            webhook_data = {
                "filename": os.path.basename(file_path),
                "content_type": "application/pdf",
                "size": len(file_content),
                "data": base64.b64encode(file_content).decode('utf-8')
            }

            print(f"File size: {len(file_content)} bytes")
            print(f"Base64 data length: {len(webhook_data['data'])} characters")

            # Temporarily raise the client timeout for PDF processing;
            # the finally block restores it even if send_webhook() raises
            print("Sending file for processing...")
            original_timeout = client.timeout
            client.timeout = 120
            try:
                return client.send_webhook(task_token, webhook_data)
            finally:
                client.timeout = original_timeout

        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                print(f"Retrying in {retry_delay} seconds...")
                time.sleep(retry_delay)
            else:
                print("All attempts failed!")
                raise

# Process the invoice
result = process_invoice_with_retry(invoice_path)
print("Processing completed!")


Processing: test_data/Invoice-E6CD52F5-0002.pdf
Attempt 1/3
Task created with token: 6174ca42...
File size: 39631 bytes
Base64 data length: 52844 characters
Sending file for processing...
Attempt 1 failed: All connection attempts failed
Retrying in 5 seconds...
Attempt 2/3
Task created with token: 0efc6518...
File size: 39631 bytes
Base64 data length: 52844 characters
Sending file for processing...
Attempt 2 failed: All connection attempts failed
Retrying in 5 seconds...
Attempt 3/3
Task created with token: 8285bf97...
File size: 39631 bytes
Base64 data length: 52844 characters
Sending file for processing...
Attempt 3 failed: All connection attempts failed
All attempts failed!


DTCApiError: All connection attempts failed
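The retry loop above uses a fixed 5-second pause and mixes the retry policy with task creation and payload building. The sketch below factors the policy into a generic helper and swaps in exponential backoff for the fixed delay. `retry_call` is hypothetical. Which exceptions count as retryable is an assumption, and you should narrow `retry_on` once you know the SDK's exception types (the output above shows a `DTCApiError`).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_call(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 5.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn() up to max_retries times, doubling the pause after each failure."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error unchanged
            delay = base_delay * (2 ** attempt)
            print(f"Attempt {attempt + 1} failed: {exc}; retrying in {delay:g}s")
            sleep(delay)
    raise AssertionError("unreachable")
```

As in the original loop, the call you pass in should create a fresh task on every attempt. For example: `retry_call(lambda: client.upload_file_to_webhook(token=client.execute_task(pipeline_config, name="invoice-processing"), file_path=invoice_path, timeout=120))`. The injectable `sleep` parameter lets you test the helper without real pauses.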
