# 🚀 CUBO Quick Start Guide

This notebook will get you up and running with CUBO in under 5 minutes!

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/your-username/cubo/blob/main/examples/01_quick_start.ipynb)

## What You'll Learn
1. Install CUBO and dependencies
2. Ingest documents
3. Query your documents
4. Understand citations and sources

## 📦 Installation

If running on Colab, uncomment and run the cell below:

In [None]:
# Uncomment for Colab installation
# !pip install -q sentence-transformers faiss-cpu torch requests

In [None]:
import sys
from pathlib import Path

# Add CUBO to path if running locally
cubo_root = Path(".").resolve().parent
if cubo_root.exists() and str(cubo_root) not in sys.path:
    sys.path.insert(0, str(cubo_root))

print(f"CUBO root: {cubo_root}")

## 📄 Create Sample Documents

Let's create some sample documents to work with:

In [None]:
# Create sample documents
sample_docs = {
    "company_policy.txt": """Company Policy Document
    
Working Hours: Employees are expected to work 40 hours per week.
Remote work is permitted with manager approval.

Vacation Policy: Employees receive 20 days of paid vacation per year.
Vacation requests must be submitted 2 weeks in advance.

Health Benefits: Full medical, dental, and vision coverage.
Coverage begins on the first day of employment.
""",
    "product_manual.txt": """Product Manual - Model X100
    
Installation: Connect the device to a power source.
Wait for the LED to turn green before proceeding.

Troubleshooting:
- Red LED: Power issue, check connections
- Blinking LED: Firmware update in progress
- No LED: Contact support at support@example.com

Warranty: 24-month warranty from date of purchase.
Register your product at warranty.example.com
"""
}

# Create data directory and save documents
data_dir = Path("sample_data")
data_dir.mkdir(exist_ok=True)

for filename, content in sample_docs.items():
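    # Write with an explicit encoding so the result does not depend on the platform default
    (data_dir / filename).write_text(content, encoding="utf-8")
    print(f"✅ Created {filename}")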

## 🔌 Using the API

If you have the CUBO API server running, you can query it directly:

In [None]:
import requests

API_URL = "http://localhost:8000"

def check_api():
    """Check if API is running."""
    try:
        r = requests.get(f"{API_URL}/api/health", timeout=5)
        return r.status_code == 200
    except requests.RequestException:
        return False

api_available = check_api()
print(f"API Status: {'✅ Running' if api_available else '❌ Not running (start with: python start_api_server.py)'}")
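
If the server was started only moments ago, a single health check may run before it is ready. The sketch below retries a check function a few times before giving up. `wait_for_api`, `attempts`, and `delay` are hypothetical names introduced here for illustration; they are not part of CUBO.

```python
import time
from typing import Callable


def wait_for_api(check: Callable[[], bool], attempts: int = 5, delay: float = 1.0) -> bool:
    """Call `check` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        # Do not sleep after the final failed attempt
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With the `check_api` function defined above, this could be used as `api_available = wait_for_api(check_api)`.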

## 🔍 Query Your Documents

Let's ask questions about our documents:

In [None]:
def query_documents(question: str, top_k: int = 3):
    """Query CUBO API and return response with citations."""
    if not api_available:
        print("⚠️ API not available. Start the server first.")
        return None

    try:
        response = requests.post(
            f"{API_URL}/api/query",
            json={"query": question, "top_k": top_k},
            timeout=30,  # avoid hanging indefinitely if the server stalls
        )
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None

    if response.status_code == 200:
        return response.json()
    print(f"Error: {response.status_code}")
    return None

# Example query
if api_available:
    result = query_documents("What is the vacation policy?")
    if result:
        print("üìù Answer:")
        print(result.get("answer", "No answer"))
        print("\nüìö Citations:")
        for c in result.get("citations", []):
            print(f"  - {c['source_file']} (score: {c['relevance_score']:.2f})")

## 📊 Understanding Citations

Each query response includes structured citations for transparency:

In [None]:
def display_citations(result):
    """Display citations in a formatted way."""
    if not result or "citations" not in result:
        print("No citations available")
        return
    
    print("\n" + "="*50)
    print("üìö SOURCE CITATIONS")
    print("="*50)
    
    for i, citation in enumerate(result["citations"], 1):
        print(f"\n[{i}] {citation['source_file']}")
        if citation.get('page'):
            print(f"    Page: {citation['page']}")
        print(f"    Chunk: {citation['chunk_index']}")
        print(f"    Relevance: {citation['relevance_score']:.2%}")
        print(f"    Snippet: {citation['text_snippet'][:100]}...")

# Display citations from previous query
if api_available and result:
    display_citations(result)
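
To see the citation structure without a running server, the sketch below builds a response by hand. The keys are the ones read by the cells above (`answer`, `citations`, `source_file`, `page`, `chunk_index`, `relevance_score`, `text_snippet`); the values are invented for illustration, and a real response may contain additional fields. `format_citation` is a hypothetical helper, not part of CUBO.

```python
# Hypothetical response: values are made up, keys mirror those used above
sample_result = {
    "answer": "Employees receive 20 days of paid vacation per year.",
    "citations": [
        {
            "source_file": "company_policy.txt",
            "page": None,  # assumption: plain-text sources carry no page number
            "chunk_index": 0,
            "relevance_score": 0.87,
            "text_snippet": "Vacation Policy: Employees receive 20 days of paid vacation per year.",
        }
    ],
}


def format_citation(citation: dict) -> str:
    """Render one citation as a single line, tolerating missing keys."""
    source = citation.get("source_file", "unknown")
    score = citation.get("relevance_score", 0.0)
    chunk = citation.get("chunk_index", "?")
    return f"{source} [chunk {chunk}] relevance {score:.0%}"


for citation in sample_result["citations"]:
    print(format_citation(citation))
```

Using `.get()` with defaults keeps the formatter from raising `KeyError` if a citation omits a field, which the direct indexing in `display_citations` would do.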

## 🎯 Next Steps

- Check out [02_gdpr_compliance.ipynb](02_gdpr_compliance.ipynb) for GDPR features
- See [03_multimodal_ocr.ipynb](03_multimodal_ocr.ipynb) for PDF/image processing
- Read the [API Documentation](../docs/API_INTEGRATION.md) for full API reference

In [None]:
# Cleanup
import shutil

if data_dir.exists():
    shutil.rmtree(data_dir)
    print("✅ Cleaned up sample data")