# Policy Completeness Evaluation Framework - Technical Documentation

This is the technical reference for the Policy Completeness Evaluation Framework. It covers the framework's components, a quick start, and the repository layout.


## Framework Components

The framework is organized into the following parts:


### Documentation (`docs/`)
- `beginners_guide.md` - Narrative guide for non-engineers
- `taxonomy_and_coverage.md` - Comprehensive test methodology and mapping
- `README.md` (this file) - Technical documentation


### Configuration (`config/`)
- `controls_map.yaml` - Crosswalk from tests to AIUC-1 controls


### Source Code (`src/`)
- `eval_pipeline.ipynb` - Main evaluation pipeline with multi-turn rubric checks


### Data (`data/`)
- `tests.json` - Machine-readable test specifications (AC1-AC9)
- `sample_bot_responses.json` - Sample bot responses for testing


### Output (`output/`)
- `test_results.json` - Generated test results after running the pipeline


### Web Interface (`web/`)
- `product_view.html` - Buyer-facing narrative dashboard


## Quick Start Guide

1. **Open the evaluation pipeline** (from the repository root)
   ```bash
   cd src
   jupyter notebook eval_pipeline.ipynb
   ```
2. **Configure test data**
   - Edit `../data/sample_bot_responses.json` with your bot's responses
3. **Run evaluation**
   - Execute all cells in the notebook
   - Results are saved to `../output/test_results.json`
4. **View results**
   - Open `../web/product_view.html` in a browser
   - Update the dashboard to reflect the results in `../output/test_results.json`
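
Before updating the dashboard, it can help to get a quick pass/fail summary of the results file. The sketch below is illustrative only. It assumes a hypothetical schema in which `test_results.json` holds a JSON list of objects, each with a `test_id` string and a boolean `passed` field. This README does not document the structure the notebook actually writes, so adjust the field names to match your output. The function name `summarize_results` is also made up for this example.

```python
import json
from pathlib import Path


def summarize_results(path):
    """Count passed and failed entries in a results file.

    ASSUMPTION: the file is a JSON list of objects with a "test_id"
    string and a boolean "passed" field. This schema is hypothetical;
    change the keys to match what the notebook actually writes.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    failed = [r["test_id"] for r in records if not r["passed"]]
    return {
        "total": len(records),
        "passed": len(records) - len(failed),
        "failed_ids": failed,
    }


if __name__ == "__main__":
    # Path is relative to src/, matching the quick start steps.
    results_path = Path("../output/test_results.json")
    if results_path.exists():
        print(summarize_results(results_path))
```

Returning the failed IDs rather than only a count makes it easier to see which tests to revisit in the dashboard.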


## Repository Layout

```
policy-completeness-eval-pack2/
|-- README.md                     # Root-level overview
|-- config/
|   `-- controls_map.yaml         # AIUC-1 control crosswalk
|-- data/
|   |-- sample_bot_responses.json # Sample responses for tests
|   `-- tests.json                # AC1-AC9 specifications
|-- docs/
|   |-- README.md                 # Technical documentation (this file)
|   |-- beginners_guide.md        # Non-technical walk-through
|   `-- taxonomy_and_coverage.md  # Detailed methodology
|-- output/
|   `-- test_results.json         # Generated evaluation results
|-- src/
|   `-- eval_pipeline.ipynb       # Evaluation notebook
`-- web/
    `-- product_view.html         # Buyer-facing narrative dashboard
```
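
To confirm that a checkout matches the layout above, a small script can check for each file. This is a minimal sketch, not part of the framework: the helper name `missing_files` is made up for this example, and `output/test_results.json` is left out of the list because it only exists after the pipeline has run.

```python
from pathlib import Path

# Relative paths copied from the repository layout above.
# output/test_results.json is omitted because it is generated by the pipeline.
EXPECTED_FILES = [
    "README.md",
    "config/controls_map.yaml",
    "data/sample_bot_responses.json",
    "data/tests.json",
    "docs/README.md",
    "docs/beginners_guide.md",
    "docs/taxonomy_and_coverage.md",
    "src/eval_pipeline.ipynb",
    "web/product_view.html",
]


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected relative paths that are not files under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the repository root.
    missing = missing_files(".")
    print("missing:", missing if missing else "none")
```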


## Notes
- The pack evaluates responses; it does not implement runtime safeguards.
- Tests are grounded in real airline incidents and are intended to generalize to other policy surfaces.
