# Structurify SDK Quickstart

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/structurify/structurify-sdk/blob/main/examples/structurify_quickstart.ipynb)

This notebook demonstrates how to use the [Structurify](https://structurify.ai) Python SDK to:

1. List available project templates
2. Create a project from a template
3. Upload documents
4. Run AI extraction
5. Export results

## Prerequisites

You'll need a Structurify API key. Get one from your [Structurify dashboard](https://app.structurify.ai).

## 1. Installation

Install the Structurify SDK from PyPI:

In [None]:
!pip install structurify -q

## 2. Setup

Import the SDK and configure your API key:

In [None]:
from structurify import Structurify

# Option 1: Use Colab secrets (recommended)
# Add your API key to Colab secrets with the name 'STRUCTURIFY_API_KEY'
try:
    # Imported here so the cell still runs outside Colab
    from google.colab import userdata
    api_key = userdata.get('STRUCTURIFY_API_KEY')
except Exception:
    # Not running in Colab, secret missing, or notebook access not granted
    api_key = None

# Option 2: Set directly (not recommended for shared notebooks)
if not api_key:
    api_key = "sk_live_your_api_key"  # Replace with your actual API key

# Initialize the client
client = Structurify(api_key=api_key)
print("Structurify client initialized!")

## 3. List Available Templates

Structurify provides pre-configured templates for common document types like invoices, receipts, and contracts.

In [None]:
# List all available project templates
templates = client.templates.list()

print(f"Found {len(templates)} templates:\n")
for template in templates:
    print(f"  - {template['name']} (ID: {template['id']})")
    if template.get('description'):
        print(f"    {template['description']}")
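The next step hardcodes `tpl_invoice`. If that ID is not in the list printed above, you could look one up by name instead. This is a minimal sketch that assumes, as the printing loop does, that each template is a dict with `'id'` and `'name'` keys; `find_template_id` is a hypothetical helper, not part of the SDK.

```python
from typing import Optional


def find_template_id(templates: list, keyword: str) -> Optional[str]:
    """Return the ID of the first template whose name contains `keyword`.

    Matching is case-insensitive. Assumes each template is a dict with
    'id' and 'name' keys. Returns None if nothing matches.
    """
    needle = keyword.lower()
    for template in templates:
        if needle in template.get("name", "").lower():
            return template["id"]
    return None
```

For example, `find_template_id(templates, "invoice") or "tpl_invoice"` falls back to the hardcoded ID when no name matches.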

## 4. Create a Project

Create a new project using one of the templates. Projects are containers for documents and define what data to extract.

In [None]:
# Create a project from the invoice template
project = client.projects.create(
    name="Colab Demo Project",
    template_id="tpl_invoice"  # Change this to match an available template
)

print(f"Created project: {project['name']}")
print(f"Project ID: {project['id']}")

## 5. Upload Documents

Upload documents to your project. Structurify supports PDFs, images (PNG, JPG), and office documents.

In [None]:
# Download a sample PDF for testing (a W3C example document, not an invoice)
import urllib.request

sample_url = "https://www.w3.org/WAI/WCAG21/Techniques/pdf/img/table-word.pdf"
sample_file = "sample_document.pdf"

urllib.request.urlretrieve(sample_url, sample_file)
print(f"Downloaded sample document: {sample_file}")
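`urlretrieve` saves whatever the server returns, so if the sample URL ever moves you may end up with an HTML error page named `sample_document.pdf`. A quick sanity check before uploading is to look for the `%PDF-` header that PDF files normally begin with. `looks_like_pdf` is a hypothetical helper for this notebook; because some readers tolerate a few leading bytes, it searches the first 1024 bytes rather than requiring the header at byte 0.

```python
def looks_like_pdf(path: str) -> bool:
    """Return True if '%PDF-' appears in the first 1024 bytes of the file."""
    with open(path, "rb") as f:
        head = f.read(1024)
    return b"%PDF-" in head
```

In the notebook you could run `assert looks_like_pdf(sample_file), "Download did not return a PDF"` before the upload cell.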

In [None]:
# Upload the document to your project
doc = client.documents.upload(
    project_id=project['id'],
    file_path=sample_file
)

print(f"Uploaded document: {doc['name']}")
print(f"Document ID: {doc['id']}")
print(f"Status: {doc['status']}")

### Upload from Google Drive (Optional)

You can also upload documents directly from your Google Drive:

In [None]:
# Uncomment to mount Google Drive and upload from there
# from google.colab import drive
# drive.mount('/content/drive')

# Upload from Drive
# doc = client.documents.upload(
#     project_id=project['id'],
#     file_path="/content/drive/MyDrive/your_document.pdf"
# )
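To upload every document in a folder (on Drive or the local Colab disk), you can call `client.documents.upload` once per file, exactly as in the single-file example. This is a sketch: the extension set below is an assumption based on the formats this notebook mentions (PDF, PNG, JPG, and Office documents), so check the Structurify documentation for the authoritative list. `upload_folder` and `SUPPORTED_EXTENSIONS` are hypothetical names.

```python
from pathlib import Path

# Assumed extensions: PDFs, PNG/JPG images, and common Office formats.
# Check the Structurify docs for the definitive list of supported types.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".docx", ".xlsx", ".pptx"}


def upload_folder(client, project_id: str, folder: str) -> list:
    """Upload every supported file directly inside `folder` (not subfolders).

    Uses the same client.documents.upload(project_id=..., file_path=...) call
    shown earlier. Returns the list of upload results in filename order.
    """
    uploaded = []
    for path in sorted(Path(folder).iterdir()):
        # Skip directories and files with unsupported extensions
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            doc = client.documents.upload(project_id=project_id, file_path=str(path))
            uploaded.append(doc)
    return uploaded
```

For example: `docs = upload_folder(client, project['id'], "/content/drive/MyDrive/invoices")`, where the folder path is a placeholder for your own.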

## 6. Run Extraction

Run AI extraction on all documents in the project. This consumes 1 credit per document.

In [None]:
# Start extraction job
job = client.extraction.run(project_id=project['id'])

print("Extraction job started!")
print(f"Job ID: {job['id']}")
print(f"Status: {job['status']}")
print(f"Total tasks: {job.get('totalTasks', 'N/A')}")

In [None]:
# Wait for extraction to complete (with progress updates)
import time

print("Waiting for extraction to complete...")

timeout_seconds = 600  # Stop polling after 10 minutes
deadline = time.monotonic() + timeout_seconds

while True:
    job_status = client.extraction.get(job['id'])
    status = job_status.get('status', '')
    progress = job_status.get('progress', 0)

    print(f"  Status: {status}, Progress: {progress}%")

    if status in ['done', 'error', 'cancelled']:
        break
    if time.monotonic() > deadline:
        print(f"  Stopped polling after {timeout_seconds}s; the job may still be running.")
        break

    time.sleep(2)

print(f"\nExtraction finished with status: {status}")
print(f"Completed tasks: {job_status.get('completedTasks', 0)}")
print(f"Failed tasks: {job_status.get('failedTasks', 0)}")
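If you poll jobs in several places, the loop can be wrapped in a reusable function with a timeout and a delay that grows between polls. This sketch takes any zero-argument function returning the job dict, so you can pass `lambda: client.extraction.get(job['id'])`; the terminal status names are the ones used above, and `wait_for_job` is a hypothetical helper, not an SDK method.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], dict],
    terminal_statuses=("done", "error", "cancelled"),
    timeout: float = 600.0,
    initial_delay: float = 2.0,
    max_delay: float = 15.0,
) -> dict:
    """Poll get_status() until the job reaches a terminal status.

    The delay between polls starts at initial_delay and grows by 1.5x per
    round, capped at max_delay. Raises TimeoutError once timeout seconds
    have passed without reaching a terminal status.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        job_status = get_status()
        if job_status.get("status") in terminal_statuses:
            return job_status
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"Job still '{job_status.get('status')}' after {timeout}s")
        # Never sleep past the deadline
        time.sleep(min(delay, remaining))
        delay = min(delay * 1.5, max_delay)
```

For example: `job_status = wait_for_job(lambda: client.extraction.get(job['id']))`. Backing off reduces the number of API calls for long-running jobs without delaying short ones much.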

## 7. Export Results

Export the extracted data as CSV or JSON.

In [None]:
# Export as CSV
export_result = client.exports.create(
    project_id=project['id'],
    format="csv"
)

print("Export created!")
print(f"Export ID: {export_result['export']['id']}")

In [None]:
# Download the export
csv_data = client.exports.download(export_result['export']['id'])

# Save to file
with open('extracted_data.csv', 'w', encoding='utf-8') as f:
    f.write(csv_data)

print("Exported data saved to extracted_data.csv")
print("\nPreview:")
print(csv_data[:500])  # First 500 characters
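The cell above assumes `client.exports.download` returns text. That return type is not documented in this notebook, so if your SDK version returns raw bytes, writing to a text-mode file raises a `TypeError`. A small helper that accepts either type keeps the save step robust; `save_export` is hypothetical, and decoding bytes as UTF-8 is an assumption.

```python
from pathlib import Path
from typing import Union


def save_export(data: Union[str, bytes], path: str) -> str:
    """Write export data to `path` and return it as a string.

    Accepts str or bytes (bytes are assumed to be UTF-8 encoded).
    """
    text = data.decode("utf-8") if isinstance(data, bytes) else data
    Path(path).write_text(text, encoding="utf-8")
    return text
```

For example: `csv_data = save_export(client.exports.download(export_result['export']['id']), 'extracted_data.csv')`, after which the pandas cell works unchanged.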

In [None]:
# Load into pandas for analysis
import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO(csv_data))
df.head()

## 8. Cleanup (Optional)

Delete the project when you're done:

In [None]:
# Uncomment to delete the project
# client.projects.delete(project['id'])
# print(f"Project {project['id']} deleted")

## Error Handling

The SDK provides specific exception types for different error scenarios:

In [None]:
from structurify import (
    AuthenticationError,
    InsufficientCreditsError,
    NotFoundError,
    RateLimitError,
    ValidationError,
)

try:
    # This will fail with NotFoundError
    client.projects.get("proj_nonexistent")
except AuthenticationError:
    print("Invalid API key")
except NotFoundError:
    print("Project not found")
except InsufficientCreditsError:
    print("Not enough credits - please top up")
except RateLimitError as e:
    print(f"Rate limited - retry after {e.retry_after}s")
except ValidationError as e:
    print(f"Invalid request: {e}")
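Since `RateLimitError` exposes `retry_after`, a natural next step is to retry automatically instead of only printing a message. This sketch wraps any call, waits `retry_after` seconds when the exception provides it (falling back to exponential backoff otherwise), and re-raises after a fixed number of attempts. `call_with_retry` is a hypothetical helper; pass `(RateLimitError,)` as the exception types to retry on.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call fn(), retrying when it raises one of the `retry_on` exception types.

    Waits exc.retry_after seconds if the exception has that attribute,
    otherwise base_delay * 2**attempt. Re-raises the last exception once
    max_attempts calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts - 1:
                raise
            wait = getattr(exc, "retry_after", None)
            if wait is None:
                wait = base_delay * 2 ** attempt
            time.sleep(wait)
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

For example: `project_info = call_with_retry(lambda: client.projects.get(project['id']), (RateLimitError,))`. Other errors such as `NotFoundError` propagate immediately, since retrying them would not help.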

## Next Steps

- Read the [full documentation](https://docs.structurify.ai)
- Explore the [API reference](https://github.com/structurify/structurify-sdk/blob/main/docs/api-reference.md)
- Check out more [examples](https://github.com/structurify/structurify-sdk/tree/main/docs/examples)
- Visit [structurify.ai](https://structurify.ai) for the web interface