## Prerequisites
- Python 3.9+ with the repo cloned locally
- `pip install -r requirements.txt` run at the repo root
- Optional GPU (the demo works on CPU for the toy dataset)

This notebook assumes it lives in the project root or the `docs/` folder shipped with the repo.
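
If the notebook is launched from some other working directory, the check on the folder name in the cell below will not find the repo. One possible alternative, sketched here under assumptions, is to walk up the directory tree until a marker file appears. The helper name `find_project_root` is hypothetical and not part of the repo; using `requirements.txt` as the marker is an assumption based on the prerequisites above.

```python
from pathlib import Path


def find_project_root(start=None, marker="requirements.txt"):
    """Return the nearest directory at or above `start` that contains `marker`.

    Hypothetical helper for illustration; the repo does not ship it.
    """
    start = Path(start or Path.cwd()).resolve()
    # Check the starting directory first, then each ancestor in turn.
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker!r} found at or above {start}")
```

Calling `find_project_root()` from either the project root or `docs/` should resolve to the same directory, provided `requirements.txt` sits at the repo root.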

In [None]:
# --- Environment bootstrap ---
import os, sys, json, pprint
from pathlib import Path

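# Resolve to an absolute path in both cases so later joins and prints are unambiguous.
PROJECT_ROOT = (Path('..') if Path.cwd().name == 'docs' else Path('.')).resolve()
# Make the `backend` and `frontend` packages importable from the notebook.
sys.path.append(str(PROJECT_ROOT))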

print(f'Project root set to: {PROJECT_ROOT}')

## 1. Load Sample Documents
Use the built-in `data/example` folder so the walkthrough works out of the box. The `DataProcessor` handles encoding detection, validation, and train/test splits.
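
For intuition about the splitting step, here is a minimal standard-library sketch of a seeded shuffle-and-cut split. It is not the `DataProcessor` implementation, whose internals are not shown in this notebook; the function name `split_texts` and the `test_fraction` and `seed` parameters are assumptions made for illustration.

```python
import random


def split_texts(texts, test_fraction=0.2, seed=42):
    """Shuffle a copy of `texts` and return (train, test) lists.

    Illustrative only; not the repo's splitting logic.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(texts)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)
    # Hold out at least one document whenever the input is non-empty.
    n_test = max(1, round(len(shuffled) * test_fraction)) if shuffled else 0
    return shuffled[n_test:], shuffled[:n_test]
```

Fixing the seed means repeated runs of the notebook would see the same partition, which makes later metrics comparable across runs.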

In [None]:
from backend.config import ConfigManager
from backend.data_processor import DataProcessor

config = ConfigManager(str(PROJECT_ROOT / 'config' / 'settings.yaml'))
processor = DataProcessor(config)
example_dir = PROJECT_ROOT / 'data' / 'example'
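# Keep regular files only; '*.*' would also match any directory with a dot in its name.
example_files = [str(p) for p in sorted(example_dir.glob('*.*')) if p.is_file()]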

processed_data = processor.process_uploaded_files(example_files)
train_data, test_data = processor.split_data(processed_data)

print(f'Documents loaded: {len(processed_data.texts)}')
print(f'Train/Test split: {len(train_data.texts)} / {len(test_data.texts)}')

## 2. Spin Up the Application Core
`RFTApplication` mirrors the Gradio UI logic. We reuse `process_uploaded_files` to populate in-memory state exactly the way the interface expects.

In [None]:
from frontend.app_core import RFTApplication

app = RFTApplication(str(PROJECT_ROOT / 'config' / 'settings.yaml'))
upload_summary = app.process_uploaded_files(example_files)

pprint.pprint(upload_summary)

## 3. Inspect Predictions
Grab the next document queued for labeling, check the model's guess, and visualize the metadata you would normally see inside the Gradio labeling tab.

In [None]:
document, doc_payload = app.get_next_document()

print('--- Document Preview ---')
print(document[:500], '...')

print('\n--- Model Prediction ---')
pprint.pprint(doc_payload['prediction'])
print(f