| Name | Affiliation | Email |
|---|---|---|
| Sangyub Lee | Intelligence and Informatics Lab, Korea University, Seoul, South Korea & Korean National Police Agency | yubii2@korea.ac.kr |
| Heedou Kim | Data Mining and Information Systems Lab, Korea University, Seoul, South Korea & Korean National Police Agency | heedou123@korea.ac.kr |
| Hyuncheol Kim* | Intelligence and Informatics Lab, Korea University, Seoul, South Korea | harrykim@korea.ac.kr |
- *: Corresponding Author
PAS (Police Action Scenario) is a dedicated framework for evaluating Large Language Models (LLMs) in real-world policing contexts.
Modern policing requires nuanced judgment and situational awareness, so standard benchmarks alone are not sufficient. PAS introduces a scenario-based, multi-stage evaluation method designed specifically for policing tasks.
PAS defines LLM evaluation as a five-stage process:
- S (Police Action Scenarios): Situation-driven tasks reflecting real-world policing needs.
- R (Reference Responses): Expert-crafted gold answers created with input from law enforcement professionals.
- G (Response Generation): LLM-generated outputs based on the given scenarios.
- M (Core Evaluation Metrics): Task-relevant metrics and evaluation methodologies tailored for public safety applications.
- P (Policing LLM Performance Evaluation): Final assessment of the LLM’s effectiveness, accuracy, and fitness for deployment in policing.
Formally expressed as:
E_police = f(S, R, G, M, P)
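
To make the flow from S to P concrete, here is a minimal Python sketch of how the five stages could be wired together. It is an illustration under stated assumptions, not the PAS implementation: the names `Scenario`, `token_f1`, and `evaluate` are hypothetical, the token-overlap F1 is only a stand-in for the task-specific metrics in M, and the sample scenario and the `echo_model` callable are invented for demonstration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Scenario:
    """S paired with R: a scenario prompt and its expert reference response."""
    prompt: str
    reference: str


def token_f1(response: str, reference: str) -> float:
    """M (placeholder): F1 over unique lowercase tokens; not a PAS metric."""
    resp = set(response.lower().split())
    ref = set(reference.lower().split())
    overlap = len(resp & ref)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(resp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(
    scenarios: List[Scenario],
    generate: Callable[[str], str],
    metrics: Dict[str, Callable[[str, str], float]],
) -> Dict[str, float]:
    """P: average every metric in M over the responses G generated for S."""
    responses = [generate(s.prompt) for s in scenarios]  # G
    return {
        name: mean(metric(g, s.reference) for g, s in zip(responses, scenarios))
        for name, metric in metrics.items()
    }


# Invented example data; a real run would use expert-written scenarios
# and references, and `generate` would call the LLM under evaluation.
scenarios = [
    Scenario(
        prompt="A caller reports a burglary in progress.",
        reference="dispatch officers and keep the caller on the line",
    )
]
echo_model = lambda prompt: "dispatch officers and keep the caller on the line"
scores = evaluate(scenarios, echo_model, {"token_f1": token_f1})
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, and the metric dictionary lets domain-specific measures replace the placeholder without changing `evaluate`.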
PAS fills the gap in evaluating LLMs for law enforcement by combining structured scenarios, expert benchmarks, and targeted metrics.