An evaluation instrument for grimdark fiction, and a transfer protocol for tacit creative judgment.
Most evaluations of creative writing collapse into taste. A reviewer reads an output, decides whether it landed, and moves on. That works fine for a single reader. It does not produce infrastructure anyone else can run.
The Grimdark Lens is a project about closing that gap in a domain where the gap is wide. Grimdark fiction has identifiable conventions and identifiable failure modes, and current language models miss those conventions in ways a working novelist can name on sight. This project takes that practitioner-level judgment and turns it into something a machine can apply, with the rubric, the anchors, the reliability study, and the reasoning all open to inspection.
- A codebook of fifteen error codes across three dimensions: Voice Commitment, Specificity, Consequence Weight. Each code has a failure description, a textual signal, a pass-fail rule, and a paired set of anchor passages.
- A five-annotator reliability study with per-code Fleiss' kappa values, the disagreement cases, and the calibration changelog that produced the final codes.
- A side-by-side viewer for the same prompt rendered across four frontier models, with error codes highlighted inline where the lens flagged them.
- A scrollable walk through how the codebook changed across calibration rounds, with the passages that forced each revision.
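
For readers unfamiliar with the per-code agreement statistic mentioned above, here is a minimal sketch of Fleiss' kappa computed from pass/fail judgments. It uses the standard textbook formula and is not the project's analysis code. The function names `flags_to_counts` and `fleiss_kappa` and the sample flags are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def flags_to_counts(flags: Sequence[Sequence[bool]]) -> List[List[int]]:
    """Turn per-passage annotator flags into [not_flagged, flagged] counts.

    flags[i][r] is True when annotator r applied the error code to passage i.
    (Hypothetical input layout, assumed for this sketch.)
    """
    return [
        [sum(1 for f in row if not f), sum(1 for f in row if f)]
        for row in flags
    ]


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for counts[i][j] = annotators placing passage i in category j.

    Every passage must be rated by the same number of annotators.
    Returns NaN when every rating falls in one category, since chance
    agreement is then 1 and kappa is undefined.
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one passage")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two annotators per passage")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every passage needs the same number of ratings")

    n_categories = len(counts[0])

    # Share of all ratings that fell into each category.
    category_share = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Per-passage agreement: fraction of annotator pairs that agree.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    observed = sum(per_item) / n_items
    expected = sum(p * p for p in category_share)
    if expected == 1.0:
        return float("nan")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up flags for one error code: four passages, five annotators each.
    flags = [
        [True, True, True, True, True],
        [True, True, True, True, False],
        [False, False, False, False, True],
        [False, False, False, False, False],
    ]
    print(round(fleiss_kappa(flags_to_counts(flags)), 3))  # 0.6
```

Running this once per error code, over the passages annotated for that code, would give one kappa value per code. The NaN branch matters for rare codes: if no annotator ever applies a code in the sample, agreement is perfect but the statistic carries no information.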
Launching December 2026. Click "Watch" → "Custom" → "Releases" to get one notification when this ships.
Alejandro Freixes (alejandroashes.com) has twenty years of experience designing evaluation frameworks and currently does rubric work for frontier AI labs.