# 🔒 CUBO GDPR Compliance Guide

This notebook demonstrates CUBO's GDPR compliance features.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/your-username/cubo/blob/main/examples/02_gdpr_compliance.ipynb)

## GDPR Features Covered
1. **Query Scrubbing** - Automatic PII removal from logs
2. **Document Deletion** - Right to erasure (Article 17)
3. **Audit Export** - Compliance audit trails
4. **Request Tracing** - End-to-end traceability via trace IDs

In [None]:
import requests
import json
from datetime import datetime, timedelta

API_URL = "http://localhost:8000"

def check_api():
    try:
        r = requests.get(f"{API_URL}/api/health", timeout=5)
        return r.status_code == 200
    except Exception:
        return False

api_available = check_api()
print(f"API Status: {'✅ Running' if api_available else '❌ Not running'}")

## 1️⃣ Query Scrubbing

CUBO automatically scrubs PII from query logs. The query response includes a `query_scrubbed` flag indicating whether scrubbing occurred:

In [None]:
# Send a query with a custom trace ID.
# This sample query contains no PII, so scrubbing may not be triggered.
if api_available:
    response = requests.post(
        f"{API_URL}/api/query",
        json={
            "query": "What are the vacation policies?",
            "top_k": 3
        },
        headers={"x-trace-id": "demo-trace-001"}
    )
    
    if response.status_code == 200:
        result = response.json()
        print(f"Query Scrubbed: {result.get('query_scrubbed', False)}")
        print(f"Trace ID: {result.get('trace_id')}")
        print("\nThis trace ID can be used for audit purposes.")
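
To make the idea of query scrubbing concrete, here is a minimal sketch of regex-based PII redaction. It is not CUBO's implementation: the `PII_PATTERNS` table and the `scrub_query` helper are hypothetical names chosen for illustration, and the two patterns cover only simple email and phone formats.

```python
import re
from typing import Tuple

# Hypothetical patterns for illustration only; CUBO's actual rules may differ.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def scrub_query(text: str) -> Tuple[str, bool]:
    """Replace matched PII with placeholder tags and report whether anything changed."""
    scrubbed = text
    for label, pattern in PII_PATTERNS.items():
        scrubbed = pattern.sub(f"[{label}]", scrubbed)
    return scrubbed, scrubbed != text

clean, was_scrubbed = scrub_query("Email jane.doe@example.com about vacation policies")
print(clean)
print(was_scrubbed)
```

The boolean returned alongside the text plays the same role as the `query_scrubbed` flag in the response above: the log stores only the redacted text, while the caller learns that redaction happened.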

## 2️⃣ Document Deletion (Article 17 - Right to Erasure)

CUBO provides a DELETE endpoint to remove documents from the index:

In [None]:
def delete_document(doc_id: str):
    """Delete a document from the CUBO index (GDPR Art. 17)."""
    response = requests.delete(
        f"{API_URL}/api/documents/{doc_id}",
        headers={"x-trace-id": f"deletion-{datetime.now().isoformat()}"},
        timeout=10
    )
    # 200 = deleted, 404 = document not found; any other status yields None
    return response.json() if response.status_code in [200, 404] else None

# Example deletion (will return 404 if document doesn't exist)
if api_available:
    result = delete_document("example_document.pdf")
    if result:
        print(json.dumps(result, indent=2))

### Deletion Response Structure

```json
{
  "doc_id": "example_document.pdf",
  "deleted": true,
  "chunks_removed": 15,
  "trace_id": "deletion-2024-11-30T10:30:00",
  "message": "Document example_document.pdf deleted successfully"
}
```
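
For record keeping, it can help to turn this payload into a typed object before storing it. The sketch below assumes the field names shown in the sample response; `DeletionReceipt` and `parse_deletion` are hypothetical helpers, not part of CUBO.

```python
from dataclasses import dataclass

@dataclass
class DeletionReceipt:
    """Typed view of the deletion response fields shown above."""
    doc_id: str
    deleted: bool
    chunks_removed: int
    trace_id: str

def parse_deletion(payload: dict) -> DeletionReceipt:
    """Build a receipt from a response dict, falling back to defaults for missing keys."""
    return DeletionReceipt(
        doc_id=str(payload.get("doc_id", "")),
        deleted=bool(payload.get("deleted", False)),
        chunks_removed=int(payload.get("chunks_removed", 0)),
        trace_id=str(payload.get("trace_id", "")),
    )

# Sample payload copied from the response structure above
sample = {
    "doc_id": "example_document.pdf",
    "deleted": True,
    "chunks_removed": 15,
    "trace_id": "deletion-2024-11-30T10:30:00",
    "message": "Document example_document.pdf deleted successfully",
}
receipt = parse_deletion(sample)
print(receipt)
```

Keeping the `trace_id` on the receipt makes it possible to link an erasure request to the matching audit entries later.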

## 3️⃣ Audit Export

Export audit logs for compliance reviews:

In [None]:
def export_audit(start_date=None, end_date=None, format="json"):
    """Export GDPR audit log."""
    params = {"format": format}
    if start_date:
        params["start_date"] = start_date
    if end_date:
        params["end_date"] = end_date
    
    response = requests.get(f"{API_URL}/api/export-audit", params=params, timeout=30)
    return response

# Export last 7 days of audit data
if api_available:
    start = (datetime.now() - timedelta(days=7)).strftime("%Y-%m-%d")
    end = datetime.now().strftime("%Y-%m-%d")
    
    # JSON format
    result = export_audit(start, end, "json")
    if result.status_code == 200:
        data = result.json()
        print("📊 Audit Export Summary")
        print(f"   Date Range: {start} to {end}")
        print(f"   Total Entries: {data.get('count', 0)}")
        
        # Show first few entries
        entries = data.get('audit_entries', [])[:3]
        if entries:
            print("\n   Sample Entries:")
            for e in entries:
                print(f"   - {e['timestamp']}: {e['component']} - {e['action'][:50]}...")
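
For a compliance review, a per-component tally is often more useful than raw entries. The sketch below assumes each entry is a dict with the `timestamp`, `component`, and `action` keys used above. `count_by_component` is a hypothetical helper, and the sample entries are made up for illustration, not real CUBO output.

```python
from collections import Counter

def count_by_component(entries):
    """Count audit entries per component; entries without one are grouped as 'unknown'."""
    return Counter(e.get("component", "unknown") for e in entries)

# Made-up entries for illustration only
sample_entries = [
    {"timestamp": "2024-11-30T10:30:00", "component": "api", "action": "query received"},
    {"timestamp": "2024-11-30T10:30:01", "component": "retriever", "action": "search executed"},
    {"timestamp": "2024-11-30T10:31:00", "component": "api", "action": "document deleted"},
]

counts = count_by_component(sample_entries)
for component, n in counts.most_common():
    print(f"{component}: {n}")
```

With a live API, the same function could be applied to `data.get('audit_entries', [])` from the export above.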

### Export as CSV

For compliance tools that prefer CSV:

In [None]:
if api_available:
    result = export_audit(format="csv")
    if result.status_code == 200:
        # Decode the CSV payload and preview it (nothing is written to disk here)
        csv_content = result.content.decode('utf-8')
        lines = csv_content.splitlines()
        # Exclude the header row from the count
        print(f"CSV Export: {max(len(lines) - 1, 0)} data rows")
        if lines:
            print(f"\nHeader: {lines[0]}")
        if len(lines) > 1:
            print(f"First row: {lines[1][:100]}...")
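
Splitting on newlines is enough for a quick preview, but it miscounts rows when a field contains quoted commas or line breaks. A sketch using the standard `csv` module handles those cases. The column names in `sample_csv` are an assumption that mirrors the JSON entry keys; the real export may use different headers.

```python
import csv
import io

def parse_audit_csv(csv_text: str) -> list:
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

# Assumed columns, for illustration only
sample_csv = (
    "timestamp,component,action\n"
    '2024-11-30T10:30:00,api,"document deleted, 15 chunks"\n'
)

rows = parse_audit_csv(sample_csv)
print(f"{len(rows)} data row(s)")
print(rows[0]["action"])
```

With a live API, `parse_audit_csv(result.content.decode('utf-8'))` would give one dict per audit entry.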

## 4️⃣ Request Tracing

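Each request carries a trace ID, set through the `x-trace-id` header as in the query example, which can be used to look up the events recorded for that request: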

In [None]:
def get_trace(trace_id: str):
    """Get trace events for a specific request."""
    response = requests.get(f"{API_URL}/api/traces/{trace_id}")
    return response.json() if response.status_code == 200 else None

# Look up the trace ID that was sent with the query in section 1.
# `result` now holds the audit-export Response rather than the query JSON,
# so the ID is referenced directly. This assumes the API keeps the
# client-supplied ID, as the deletion response sample suggests.
if api_available:
    trace_id = "demo-trace-001"
    trace_data = get_trace(trace_id)
    if trace_data:
        print(f"📍 Trace: {trace_id}")
        print(f"   Events: {len(trace_data.get('events', []))}")
        for event in trace_data.get('events', []):
            print(f"   - {event['timestamp']}: {event['component']}.{event['event']}")
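
Fixed IDs such as `demo-trace-001` are fine for a demo, but reusing them across requests would mix unrelated events under one trace. One possible approach, sketched below, is to generate a unique ID per request. `new_trace_id` and `trace_headers` are hypothetical helpers; only the `x-trace-id` header name comes from the examples above.

```python
import uuid

def new_trace_id(prefix: str = "demo") -> str:
    """Return a unique trace ID made of a prefix and a random UUID4 hex string."""
    return f"{prefix}-{uuid.uuid4().hex}"

def trace_headers(trace_id: str) -> dict:
    """Build the request headers that carry the trace ID."""
    return {"x-trace-id": trace_id}

trace_id = new_trace_id("deletion")
headers = trace_headers(trace_id)
print(headers)
```

The resulting `headers` dict can be passed to `requests.post(...)` or `requests.delete(...)`, and the same `trace_id` handed to `get_trace` afterwards.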

## 🔒 GDPR Compliance Checklist

| Requirement | CUBO Feature | Status |
|-------------|--------------|--------|
| Right to Access (Art. 15) | `/api/traces/{id}` | ✅ |
| Right to Erasure (Art. 17) | `DELETE /api/documents/{id}` | ✅ |
| Data Portability (Art. 20) | `/api/export-audit` | ✅ |
| Privacy by Design (Art. 25) | Query scrubbing, offline-first | ✅ |
| Audit Trail | JSONL logs with trace IDs | ✅ |

## 🎯 Next Steps

- Review [03_multimodal_ocr.ipynb](03_multimodal_ocr.ipynb) for document processing
- See the [Privacy Documentation](../docs/PRIVACY.md) for full details
- Configure scrubbing patterns in `config.json`