This is a research report I wrote over the summer as part of the CHERI fellowship program, under the supervision of Patrick Levermore. The project examines AI alignment, focusing on a reinterpretation of the Eliciting Latent Knowledge problem through the lens of the Comprehensive AI Services (CAIS) model. I also explore how the CAIS model can be applied to ensure the safety of R&D designs and to support their certification.
Refer to the blog post for a summary of the paper.