Implementing 'AI Safety via Debate' experiments

Training classifiers via Debate

Note: This code is a work in progress. It will change, and hopefully get better, during the next few months.

This repository provides code to reproduce the experiments from AI Safety via Debate (blogpost).

On top of that, we run additional experiments on MNIST and FashionMNIST data and train classifiers from the debate results.


Install the Python dependencies by running the following in a Python 3.6 environment:

pip install -r requirements.txt


All code is located in the ai-safety-debate folder.

  • To train a judge use
  • To run individual debates use
  • To evaluate the accuracy of a judge combined with debate use
  • To use debate to train a classifier use

We use sacred for tracking experiments. The results are typically stored in the experiments and amplification_experiments folders. Scripts that use sacred declare their parameters in a config function. To override the default values from the command line, append sacred's with keyword followed by key=value pairs, e.g.

python ai-safety-debate/ with judge_path=ai-safety-debate/saved_models/mnist4
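To make the effect of a with override concrete, here is a minimal standard-library sketch of how key=value pairs could be merged into a set of defaults. It is an illustration only and not sacred's actual implementation. The function name parse_with_overrides and the n_epochs parameter are hypothetical; only judge_path and its value come from the command above.

```python
import ast


def parse_with_overrides(argv, defaults):
    """Merge `key=value` tokens that follow `with` into a copy of `defaults`.

    Illustrative only: this mimics the command-line convention, it is not
    how sacred itself is implemented.
    """
    config = dict(defaults)  # never mutate the caller's defaults
    if "with" not in argv:
        return config
    for token in argv[argv.index("with") + 1:]:
        key, sep, raw = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        if key not in config:
            raise KeyError(f"unknown config parameter: {key}")
        try:
            # Interpret numbers, booleans, lists, etc. as Python literals.
            config[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a literal (e.g. a path) stays a string.
            config[key] = raw
    return config


# Hypothetical defaults, standing in for what a config function would declare.
defaults = {"judge_path": None, "n_epochs": 1}
argv = ["with", "judge_path=ai-safety-debate/saved_models/mnist4", "n_epochs=3"]
config = parse_with_overrides(argv, defaults)
print(config)
```

Parameters that are not overridden keep their defaults, and an unknown key raises an error instead of being silently ignored, which catches typos in parameter names early.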