### Aparavi Pipeline Demo Notebook

This notebook demonstrates how to use the Aparavi Data Toolchain SDK for document processing and PII anonymization.

![Aparavi pipeline](../images/pipeline.png)

In [1]:
%pip install -q -U -r ../requirements.txt

Note: you may need to restart the kernel to use updated packages.





In [2]:
# Load environment variables for the Aparavi client
import os

from dotenv import load_dotenv
from aparavi_dtc_sdk import AparaviClient

load_dotenv(dotenv_path="../env", override=True)

api_key = os.getenv("APARAVI_API_KEY")
base_url = os.getenv("APARAVI_BASE_URL", "https://eaas.aparavi.com/")

print("✓ Environment loaded")
print("APARAVI_BASE_URL:", repr(base_url))
print("APARAVI_API_KEY set:", bool(api_key))


✓ Environment loaded
APARAVI_BASE_URL: 'https://eaas-dev.aparavi.com/'
APARAVI_API_KEY set: True
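The cell above reports whether the API key is set without printing its value. A minimal sketch of the same idea as a reusable check follows; the helper names `summarize_env` and `missing_vars` are hypothetical and are not part of the Aparavi SDK or python-dotenv.

```python
import os


def summarize_env(required, env=None):
    """Map each required variable name to True if it is set and non-blank."""
    env = os.environ if env is None else env
    return {name: bool(env.get(name, "").strip()) for name in required}


def missing_vars(required, env=None):
    """Return the names of required variables that are unset or blank."""
    return [name for name, ok in summarize_env(required, env).items() if not ok]


# Report presence only, never the secret values themselves.
required = ["APARAVI_API_KEY", "APARAVI_BASE_URL"]
for name, ok in summarize_env(required).items():
    print(f"{name} set: {ok}")
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.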


In [None]:
# Initialize the client only if an API key is available
client = None
if not api_key:
    print("Please set APARAVI_API_KEY in ../env")
else:
    client = AparaviClient(base_url=base_url, api_key=api_key)
    print("✓ Client initialized")
    print("  Endpoint:", base_url)


✓ Client initialized
  Endpoint: https://eaas-dev.aparavi.com/
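The next cell passes a glob pattern to the pipeline call. Before submitting anything, it can help to confirm which local files that pattern matches. The sketch below uses only the standard library; `preview_matches` is a hypothetical helper name, and how the SDK itself expands `file_glob` is not shown in this notebook.

```python
import glob
import os


def preview_matches(pattern):
    """Return the sorted list of regular files matching a glob pattern."""
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


matches = preview_matches("../tests/sampleData/*.pdf")
print(f"{len(matches)} file(s) matched")
for path in matches:
    print("  ", path)
```

An empty list here usually points to a wrong working directory or a typo in the pattern.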


In [10]:
# Process the sample documents matched by the glob
result = None  # stays None if the call fails, so later cells can check it
try:
    result = client.execute_pipeline_workflow(
        pipeline="../pipelines/anonymize_pipeline.json",
        file_glob="../tests/sampleData/*.pdf",
    )

    if result:
        print("✓ Document processed successfully!")
        if isinstance(result, dict):
            print(f"  Status: {result.get('status', 'Unknown')}")
            print(f"  Completed: {result.get('completed', False)}")
except Exception as e:
    print(f"✗ Error: {e}")


[Aparavi DTC SDK] POST /pipe/validate
[Aparavi DTC SDK] Status: 200
[Aparavi DTC SDK] Pipeline validation: OK
[Aparavi DTC SDK] PUT /task
[Aparavi DTC SDK] Status: 200
[Aparavi DTC SDK] Task token: 7ba7d0f3-c3cd-41dd-89f6-4ce13de667cf
[Aparavi DTC SDK] Webhook pipeline detected. Polling until task is running...
[Aparavi DTC SDK] GET /task
[Aparavi DTC SDK] Status: 200
[Aparavi DTC SDK] Task Status: [Attempt 1] OK
[Aparavi DTC SDK] GET /task
[Aparavi DTC SDK] Status: 200
[Aparavi DTC SDK] Task Status: [Attempt 2] OK
[Aparavi DTC SDK] Task operation failed: Interrupted by user.
[Aparavi DTC SDK] DELETE /task
[Aparavi DTC SDK] Status: 200
[Aparavi DTC SDK] Task ended: OK
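The log above shows a submit, poll, and clean-up sequence: the task status is fetched repeatedly, and the task is deleted once polling is interrupted. A generic sketch of that pattern follows. It is not the SDK's implementation; `poll_until` and all of its parameters are hypothetical names chosen for illustration.

```python
import time


def poll_until(fetch_status, is_done, interval=1.0, max_attempts=30,
               cleanup=None, sleep=time.sleep):
    """Call fetch_status() until is_done(status) is true or attempts run out.

    cleanup() runs whenever polling ends without success (timeout, error,
    or KeyboardInterrupt), similar in spirit to the DELETE /task in the log.
    """
    finished = False
    try:
        for attempt in range(1, max_attempts + 1):
            status = fetch_status()
            if is_done(status):
                finished = True
                return attempt, status
            sleep(interval)
        raise TimeoutError(f"not done after {max_attempts} attempts")
    finally:
        if not finished and cleanup is not None:
            cleanup()


# Usage with a fake status source instead of a real service.
statuses = iter(["pending", "pending", "running"])
attempt, status = poll_until(
    lambda: next(statuses),
    lambda s: s == "running",
    sleep=lambda _: None,  # skip real waiting in this demo
)
print(attempt, status)  # prints "3 running"
```

Putting the clean-up in `finally` means an interrupted notebook cell still releases the remote task instead of leaving it running.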


In [None]:
import sys
sys.path.insert(0, "..")
from tools.render import render_aparavi_result

# Render the result
render_aparavi_result(result)

### 1. 1d3da009-8ff8-4ace-9dd0-b8f62febec9c

`1d3da009-8ff8-4ace-9dd0-b8f62febec9c`


# Complex Sample PDF Document

This PDF contains various elements including text, tables, forms, images, handwriting-like text, and sensitive-looking information for testing purposes.

| Name  | ███ | ███         | Credit Card         |
| ----- | --- | ----------- | ------------------- |
| █████ | ██  | ███████████ | ███████████████████ |
| ███   | ██  | ███████████ | ████████████████████ |


## Here is a generated graph:

| X-axis | Sine Wave |
| ------ | --------- |
| 0      | 0.00      |
| 1      | 0.50      |
| 2      | 1.00      |
| 3      | 0.75      |
| 4      | 0.00      |
| 5      | -1.00     |
| 6      | -0.75     |
| 7      | 0.00      |
| 8      | 1.00      |
| 9      | 0.75      |
| 10     | -0.50     |


## Handwritten Note (simulated):

*This is a sample handwritten note. Meet me at 5 pm!*

## ████████████:

| First Name: | \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ |
| ----------- | ---------------------------------- |
| Last Name:  | \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ |
| Signature:  | \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_ |





# Sensitive Information (for testing):

Password: █████████████
Bank Account Number: ██████████
Medical Record: ████████████████
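One way to spot-check anonymized output like the text above is to count the mask characters and scan for number patterns that should no longer appear. The sketch below assumes the rendered text is available as a plain string. The regular expressions are illustrative only and are not the detection rules the pipeline uses; `redaction_report` is a hypothetical helper name.

```python
import re

MASK = "\u2588"  # the full-block character seen in the rendered output

# Illustrative patterns only; real PII detection is considerably broader.
LEAK_PATTERNS = {
    "card_like_number": re.compile(r"\b(?:\d[ -]?){13,19}\b"),
    "ssn_like_number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redaction_report(text):
    """Count mask characters and list any substrings matching a leak pattern."""
    leaks = {}
    for name, pattern in LEAK_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            leaks[name] = found
    return {"masked_chars": text.count(MASK), "leaks": leaks}


sample = "Bank Account Number: " + MASK * 10
print(redaction_report(sample))
```

A non-empty `leaks` entry is a prompt for manual review, not proof of a leak, since long digit runs can be legitimate content.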

