Website with results and examples available here.
Given a policy p, a default policy d, and some condition, this code scores states by how important it is to follow p rather than d when trying to satisfy the condition. If we choose d to be some simple default action, the scores show in which states p is actually useful over doing something obvious.
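To make the idea concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration: the corridor environment, the `step`, `swap_in_default` and `score_states` helpers, and the swap-one-state score are hypothetical and are not the scoring method or API used by this repository. The agent walks along positions 0 to 5, position 3 is a pit, p jumps over the pit from position 2, and d always steps right. The condition is reaching the goal without falling in.

```python
from typing import Callable, Dict, Iterable

State = int
Action = str
Policy = Callable[[State], Action]

GOAL, PIT = 5, 3  # hypothetical toy corridor: goal at 5, pit at 3


def step(state: State, action: Action) -> State:
    """'jump' moves two cells to the right, anything else moves one."""
    return state + (2 if action == "jump" else 1)


def p(state: State) -> Action:
    """Policy of interest: jump over the pit from cell 2."""
    return "jump" if state == 2 else "right"


def d(state: State) -> Action:
    """Default policy: always take the obvious action."""
    return "right"


def satisfies_condition(policy: Policy, start: State = 0, max_steps: int = 10) -> bool:
    """Condition: reach the goal without stepping into the pit."""
    state = start
    for _ in range(max_steps):
        if state == PIT:
            return False
        if state >= GOAL:
            return True
        state = step(state, policy(state))
    return False


def swap_in_default(swapped: State) -> Policy:
    """Follow p everywhere except in `swapped`, where d is used instead."""
    return lambda state: d(state) if state == swapped else p(state)


def score_states(states: Iterable[State]) -> Dict[State, float]:
    """Score = loss in success when d replaces p in that single state."""
    baseline = float(satisfies_condition(p))
    return {s: baseline - float(satisfies_condition(swap_in_default(s))) for s in states}


scores = score_states(range(GOAL))
print(scores)
```

In this toy setup only position 2 gets a non-zero score, because it is the only state where p and d disagree in a way that affects the condition.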
Create a new environment with Anaconda:
conda env create -f env.yml
conda activate polrank
For CartPole, that's it!
For Minigrid:
pip3 install gym-minigrid
pip3 install torch-ac
For Atari games, if you want to use the pre-trained Atari-Zoo agents (recommended), you will also need to install their package:
git clone https://github.com/uber-research/atari-model-zoo.git
cd atari-model-zoo
python3 setup.py install
To run the experiments from the paper, use quick_start.py:
python3 quick_start.py [ENV_NAME]
To list the possible games, run:
python3 quick_start.py -h
Running an experiment first downloads any models it needs. It then runs three phases: a counting phase, in which the test suite is built; a scoring phase, in which all the states are scored; and an interpolating phase, in which pruned policies are made and tested. Results are stored in the results folder.
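The three phases can be pictured with the following minimal sketch. It is an illustration under stated assumptions, not the code in this repository: the toy corridor (pit at position 3, goal at 5), the exhaustive test suite, the fail-rate difference used as a score, and all function names are hypothetical. Here p jumps over the pit from position 2 and d always steps right.

```python
from itertools import combinations

GOAL, PIT = 5, 3  # hypothetical toy corridor
STATES = list(range(GOAL))


def p(state):
    """Policy of interest: jump over the pit from cell 2."""
    return "jump" if state == 2 else "right"


def d(state):
    """Default policy: always step right."""
    return "right"


def mixed_policy(default_states):
    """Use d in `default_states` and p everywhere else."""
    return lambda s: d(s) if s in default_states else p(s)


def passes(policy, max_steps=10):
    """Condition: reach the goal without stepping into the pit."""
    state = 0
    for _ in range(max_steps):
        if state == PIT:
            return False
        if state >= GOAL:
            return True
        state += 2 if policy(state) == "jump" else 1
    return False


# Counting phase (assumed form): build a test suite of runs, each recording
# which states used d and whether the condition was met.
suite = []
for r in range(len(STATES) + 1):
    for subset in combinations(STATES, r):
        mutated = frozenset(subset)
        suite.append((mutated, passes(mixed_policy(mutated))))


def fail_rate(runs):
    return sum(not ok for _, ok in runs) / len(runs)


# Scoring phase (assumed score): how much more often runs fail when d
# replaces p in a state than when p is kept there.
scores = {}
for s in STATES:
    with_d = [run for run in suite if s in run[0]]
    with_p = [run for run in suite if s not in run[0]]
    scores[s] = fail_rate(with_d) - fail_rate(with_p)

# Interpolating phase (assumed form): pruned policies follow p only in the
# k highest-ranked states and d elsewhere; test each one.
ranking = sorted(STATES, key=lambda s: -scores[s])
results = []
for k in range(len(STATES) + 1):
    keep_p = set(ranking[:k])
    results.append(passes(mixed_policy(set(STATES) - keep_p)))

print(ranking, results)
```

In this toy setup the pruned policy already satisfies the condition once p is kept in the single top-ranked state, which is the kind of result the interpolating phase is meant to expose.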
To run the code with all the customization that is available, use:
python3 polrank
You may want to look at the commands in quick_start.py to get started.
Credit for each environment and policy-training method is given in that environment's README, under polrank/environments/