Code and data for the blog post Mapping Deception, a replication of the MASK benchmark for evaluating AI honesty.
| Path | Description |
|---|---|
| blog/ | Blog post source: sections, figures, and the build pipeline |
| eval_logs/ | Encrypted (.eval.enc) eval logs from the replication runs |
The eval logs are encrypted to comply with the MASK dataset access policy.
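For a rough idea of what decrypting one log involves, here is a minimal sketch. It assumes the logs are encrypted with the `age` CLI, which the key file names in this repo suggest. The helper name `decrypt_command` is hypothetical and is not part of this repo, and the actual Makefile target may work differently.

```python
from pathlib import Path


def decrypt_command(
    enc_path: Path,
    out_dir: Path = Path("eval_logs_dec"),
    key: Path = Path("age_private.key"),
) -> list[str]:
    """Build (but do not run) an `age` command that decrypts one .eval.enc file.

    Assumption: files were encrypted with `age`; this helper is illustrative only.
    """
    if enc_path.suffix != ".enc":
        raise ValueError(f"expected a .enc file, got {enc_path}")
    # Drop the trailing ".enc" so run.eval.enc becomes run.eval
    out_path = out_dir / enc_path.with_suffix("").name
    return [
        "age",
        "--decrypt",
        "--identity", str(key),      # private key file
        "--output", str(out_path),   # decrypted log destination
        str(enc_path),
    ]


cmd = decrypt_command(Path("eval_logs/run.eval.enc"))
print(" ".join(cmd))
```

The function only builds the argument list, so it can be inspected before anything touches the disk. Passing the list to `subprocess.run(cmd, check=True)` would execute it, provided `age` is installed and the output directory exists.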
To build the blog post, you need the private key (age_private.key) in the repo root. Open an issue with your email and I'll send it.
uv sync
make build # decrypts eval_logs/ → eval_logs_dec/, scans, generates build/blog_post.html
make serve # local preview at localhost:9437

Explore the raw eval logs (requires age_private.key):
make decrypt # (if not already done) decrypts to eval_logs_dec/
uv run inspect view eval_logs_dec/

Run the eval yourself against a model of your choice. See usage instructions.
Suggested extensions:
- Add additional models
- Extend the analysis (see the footnotes for some directions that I think would be interesting)
- Spot mistakes in the write-up
To contribute eval logs from your own runs, encrypt them first. The public key (age.pub) is in the repo.
make encrypt-log LOG=path/to/your.eval # produces eval_logs/your.eval.enc

Then open a PR with the .enc file.
I will update the blog post when a good number of additional models have been added.
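The encryption step above can be sketched roughly as follows. This assumes `make encrypt-log` wraps the `age` CLI and that age.pub is usable as an `age` recipients file (one public key per line). Both are assumptions, and the helper name `encrypt_command` is hypothetical. Check the Makefile for what actually runs.

```python
from pathlib import Path


def encrypt_command(
    log_path: Path,
    out_dir: Path = Path("eval_logs"),
    recipients: Path = Path("age.pub"),
) -> list[str]:
    """Build (but do not run) an `age` command that encrypts one .eval log.

    Assumption: age.pub is a recipients file; this helper is illustrative only.
    """
    if log_path.suffix != ".eval":
        raise ValueError(f"expected a .eval file, got {log_path}")
    # Append ".enc" so your.eval becomes eval_logs/your.eval.enc
    out_path = out_dir / (log_path.name + ".enc")
    return [
        "age",
        "--encrypt",
        "--recipients-file", str(recipients),  # public key(s) to encrypt to
        "--output", str(out_path),
        str(log_path),
    ]


cmd = encrypt_command(Path("path/to/your.eval"))
print(" ".join(cmd))
```

Only the public key is needed here, so anyone can produce an .enc file for a PR without having age_private.key.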
@misc{simmons2025mappingdeception,
title={Mapping Deception},
author={Scott Simmons},
year={2025},
url={https://sdsimmons.com/assets/writing/mask-blog-post/mask_eval.html},
}