# Batch demo with Colab
This notebook shows how to run the customer-service pipeline on ten sample emails inside Google Colab, display the replies, and export the batch to Excel. The steps assume the notebook lives in the `notebooks/` directory of the project (as when opened via GitHub → "Open in Colab").


## 1. Install dependencies
Install the project requirements into the Colab runtime. When the notebook is opened from the repo, the parent folder contains `requirements.txt`.


In [None]:
%%capture
%cd ../
%pip install -q -r requirements.txt
%cd notebooks/


## 2. Configure the runtime
Select the deterministic stub backend so the notebook runs without Ollama or llama.cpp, import the pipeline, and load the test emails into a DataFrame.


In [None]:
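import json
import os
import sys
from pathlib import Path

import pandas as pd

# Set the environment before importing the pipeline, in case the backend
# is read at import time. setdefault keeps any value that is already set.
os.environ.setdefault('MODEL_BACKEND', 'stub')
os.environ.setdefault('PIPELINE_LOG_PATH', '')

# Assumption: the `app` package sits in the project root, one level up.
sys.path.insert(0, str(Path('..').resolve()))
from app.pipeline import run_pipeline

data_path = Path('../data/test_emails.json')
records = json.loads(data_path.read_text(encoding='utf-8'))
df = pd.DataFrame(records)
# Replace missing values (None or NaN) with an empty list; `v or []` would keep NaN.
df['expected_keys'] = df['expected_keys'].apply(lambda v: v if isinstance(v, list) else [])
df = df.rename(columns={'body': 'email'})
batch_df = df.head(10).copy()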
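
The loading cell appears to assume that each record in `test_emails.json` has `id`, `subject`, and `body` fields plus an optional `expected_keys` list. The file itself is not shown here, so the sketch below uses hypothetical sample records and a hypothetical helper, `normalize_record`, to illustrate the same normalization without pandas:

```python
import json

# Hypothetical records; the real contents of test_emails.json are not shown here.
SAMPLE_JSON = """
[
  {"id": 1, "subject": "Refund", "body": "Where is my refund?", "expected_keys": ["refund_status"]},
  {"id": 2, "subject": "Hours", "body": "When are you open?", "expected_keys": null},
  {"id": 3, "subject": "Login", "body": "I cannot log in."}
]
"""


def normalize_record(record: dict) -> dict:
    """Return a copy with `body` renamed to `email` and `expected_keys` as a list."""
    normalized = dict(record)
    normalized["email"] = normalized.pop("body", "")
    keys = normalized.get("expected_keys")
    # Anything that is not a list or tuple (None, missing, NaN) becomes an empty list.
    normalized["expected_keys"] = list(keys) if isinstance(keys, (list, tuple)) else []
    return normalized


sample_records = [normalize_record(r) for r in json.loads(SAMPLE_JSON)]
for rec in sample_records:
    print(rec["id"], rec["email"], rec["expected_keys"])
```

Normalizing missing keys to an empty list up front means later cells can call `', '.join(...)` on the column without special-casing nulls.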


## 3. Review sample emails
Before invoking the model, preview the first ten emails alongside their expected keys, which serve as the ground truth for evaluation.


In [None]:
preview_df = batch_df[['id', 'subject', 'email', 'expected_keys']].copy()
preview_df['expected_keys'] = preview_df['expected_keys'].apply(lambda keys: ', '.join(keys))
preview_df.reset_index(drop=True)


## 4. Run the pipeline and append results
Loop over the batch DataFrame, call `run_pipeline` on each email, and collect the reply, answers, and evaluation results (score, matched keys, missing keys) into a new results DataFrame.


In [None]:
rows = []
for _, row in batch_df.iterrows():
    expected_keys = row.get('expected_keys', [])
    result = run_pipeline(row['email'], metadata={'expected_keys': expected_keys})
    # Guard against a missing or None evaluation block.
    evaluation = result.get('evaluation') or {}
    rows.append({
        'id': row['id'],
        'subject': row['subject'],
        'email': row['email'],
        'expected_keys': ', '.join(expected_keys),
        'reply': result.get('reply', ''),
        # Serialize nested answers to a JSON string so they fit in one spreadsheet cell.
        'answers': json.dumps(result.get('answers', {}), ensure_ascii=False),
        'score': evaluation.get('score'),
        'matched': ', '.join(evaluation.get('matched', [])),
        'missing': ', '.join(evaluation.get('missing', [])),
    })

results_df = pd.DataFrame(rows)
results_df
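
The loop above relies on `run_pipeline` returning a dict with `reply`, `answers`, and an `evaluation` block holding `score`, `matched`, and `missing`. That shape is inferred from the keys the cell reads, not from the pipeline source. As a rough sketch under that assumption, the flattening step can be isolated and tried against a hypothetical stand-in, `fake_run_pipeline`, whose keyword-matching logic is invented purely for illustration:

```python
import json
from typing import Optional


def fake_run_pipeline(email: str, metadata: Optional[dict] = None) -> dict:
    """Hypothetical stand-in for app.pipeline.run_pipeline (assumed return shape)."""
    expected = (metadata or {}).get("expected_keys") or []
    # Illustrative rule only: a key counts as matched if it appears in the email text.
    matched = [key for key in expected if key in email.lower()]
    missing = [key for key in expected if key not in matched]
    score = len(matched) / len(expected) if expected else None
    return {
        "reply": "Thanks for reaching out.",
        "answers": {key: "placeholder answer" for key in matched},
        "evaluation": {"score": score, "matched": matched, "missing": missing},
    }


def flatten_result(result: dict) -> dict:
    """Flatten one pipeline result into spreadsheet-friendly scalar columns."""
    evaluation = result.get("evaluation") or {}
    return {
        "reply": result.get("reply", ""),
        "answers": json.dumps(result.get("answers") or {}, ensure_ascii=False),
        "score": evaluation.get("score"),
        "matched": ", ".join(evaluation.get("matched") or []),
        "missing": ", ".join(evaluation.get("missing") or []),
    }


flat = flatten_result(
    fake_run_pipeline("Where is my refund?", {"expected_keys": ["refund", "order"]})
)
empty = flatten_result({"reply": "ok", "evaluation": None})
print(flat)
print(empty)
```

Keeping the flattening in one small function makes it easy to check how incomplete results (for example, no evaluation block) end up in the exported sheet.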


## 5. Save to Excel
Write the enriched DataFrame to an Excel workbook so the responses can be shared or audited.


In [None]:
output_path = Path('colab_batch_responses.xlsx')
results_df.to_excel(output_path, index=False)
output_path.resolve()
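
`DataFrame.to_excel` needs an Excel writer engine such as `openpyxl` or `XlsxWriter` in the runtime. If that cannot be guaranteed, one option is to check for an engine first and fall back to CSV. The helper below, `choose_output_path`, is a hypothetical addition and not part of the project:

```python
from importlib.util import find_spec
from pathlib import Path


def choose_output_path(stem: str = "colab_batch_responses") -> Path:
    """Pick .xlsx when an Excel writer engine is importable, else fall back to .csv."""
    has_excel_engine = any(
        find_spec(name) is not None for name in ("openpyxl", "xlsxwriter")
    )
    return Path(f"{stem}.xlsx" if has_excel_engine else f"{stem}.csv")


chosen_path = choose_output_path()
print(chosen_path)

# With the results DataFrame in hand, the export would then branch on the suffix:
#   results_df.to_excel(chosen_path, index=False)   when chosen_path.suffix == ".xlsx"
#   results_df.to_csv(chosen_path, index=False)     otherwise
```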


## 6. Download from Colab (optional)
If you're in Colab, run the next cell to download the Excel file to your local machine.


In [None]:
try:
    from google.colab import files
    files.download(str(output_path))
except ImportError:
    print('google.colab is unavailable outside Colab; skip download step.')
