Towards Understanding How Machines Can Learn Causal Overhypotheses (Submission: NeurIPS Baselines 2022)

This repository hosts the code for the blicket-environment baselines presented in the paper Towards Understanding How Machines Can Learn Causal Overhypotheses.

Environment Details

The environment is a standard gym environment located at envs/causal_env_v0.py.
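A minimal interaction loop follows the standard Gym pattern sketched below; the class name CausalEnv_v0 and its default constructor are assumptions for illustration, so check envs/causal_env_v0.py for the actual class name and configuration options.

from envs.causal_env_v0 import CausalEnv_v0  # class name assumed; see envs/causal_env_v0.py

env = CausalEnv_v0()                        # default configuration (assumed)
obs = env.reset()                           # gym 0.21 API: reset() returns the initial observation
done = False
while not done:
    action = env.action_space.sample()      # random actions, purely for illustration
    obs, reward, done, info = env.step(action)
env.close()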

Running our Benchmarks

The following sections describe how to run the experiments from the paper.

Installation and Requirements

Running the benchmarks requires Python <= 3.7 and tensorflow/tensorflow-gpu < 2 (typically 1.15.5), due to a dependency on the original stable-baselines package (its successor, stable-baselines3, does not support LSTM policies). Dependencies can be installed with:

pip install tensorflow==1.15.5 stable-baselines gym==0.21.0 protobuf==3.20 tqdm
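The pinned packages above assume a Python 3.7 interpreter. One way to get one (a suggestion, not part of the authors' instructions) is a dedicated conda environment, followed by the pip command above:

conda create -n causal_overhypotheses python=3.7
conda activate causal_overhypotheses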

Q-Learning

To run Q-Learning experiments, use the command python models/q_learning.py with the options listed below:

usage: q_learning.py [-h] [--num NUM] [--alpha ALPHA] [--discount DISCOUNT] [--epsilon EPSILON]

Train a q-learner

optional arguments:
  -h, --help           show this help message and exit
  --num NUM            Number of times to experiment
  --alpha ALPHA        Learning rate
  --discount DISCOUNT  Discount factor
  --epsilon EPSILON    Epsilon-greedy exploration rate
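For example, a run with illustrative hyperparameter values (these are examples, not values prescribed by the paper) might look like:

python models/q_learning.py --num 100 --alpha 0.1 --discount 0.99 --epsilon 0.1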

Standard RL Models

To train standard RL models, use the command python driver.py with the options listed below:

usage: driver.py [-h] [--alg ALG] [--policy POLICY] [--lstm_units LSTM_UNITS]
                 [--num_steps NUM_STEPS]
                 [--quiz_disabled_steps QUIZ_DISABLED_STEPS]
                 [--holdout_strategy HOLDOUT_STRATEGY]
                 [--reward_structure REWARD_STRUCTURE]

Train a model

optional arguments:
  -h, --help            show this help message and exit
  --alg ALG             Algorithm to use
  --policy POLICY       Policy to use
  --lstm_units LSTM_UNITS
                        Number of LSTM units
  --num_steps NUM_STEPS
                        Number of training steps
  --quiz_disabled_steps QUIZ_DISABLED_STEPS
                        Number of quiz disabled steps (-1 for no forced
                        exploration)
  --holdout_strategy HOLDOUT_STRATEGY
                        Holdout strategy
  --reward_structure REWARD_STRUCTURE
                        Reward structure

Algorithm Choices: [a2c, ppo2]

Policy Choices: [mlp, mlp_lstm, mlp_lnlstm]

Holdout Strategy Choices: [none, disjunctive_train (Only disjunctive overhypotheses), conjunctive_train (Only conjunctive overhypotheses), disjunctive_loo (Only disjunctive, leave one out), conjunctive_loo (Only conjunctive, leave one out), both_loo (Leave one out for both)]

Reward Structure Choices: [baseline (Light up the blicket detector), quiz (Determine which are blickets), quiz-type (Determine which are blickets + Causal vs. Non-Causal), quiz-typeonly (Causal vs. Non-Causal only)]
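Putting these options together, an illustrative training command (the flag values here are examples rather than the paper's exact settings) is:

python driver.py --alg ppo2 --policy mlp_lstm --lstm_units 32 --num_steps 3000000 --quiz_disabled_steps -1 --holdout_strategy none --reward_structure quiz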

This produces a model output file {model_name}.zip whose name reflects the options chosen. Evaluation data is printed during training, and training terminates once 3M training steps are reached.

Decision Transformer / Behavior Cloning

Training the decision transformer consists of two (or three) steps. The first step is to generate trajectories for training, which can be done with scripts/collect_trajectories.py, whose usage is:

usage: collect_trajectories.py [-h] [--env ENV] [--model_path MODEL_PATH]
                               [--num_trajectories NUM_TRAJECTORIES]
                               [--max_steps MAX_STEPS]
                               [--quiz_disabled_steps QUIZ_DISABLED_STEPS]
                               [--output_path OUTPUT_PATH]

Collect Trajectories from Causal Environments

optional arguments:
  -h, --help            show this help message and exit
  --env ENV             Environment to use
  --model_path MODEL_PATH
                        Path to model
  --num_trajectories NUM_TRAJECTORIES
                        Number of trajectories to collect
  --max_steps MAX_STEPS
                        Maximum number of steps per trajectory
  --quiz_disabled_steps QUIZ_DISABLED_STEPS
                        Number of steps to disable quiz
  --output_path OUTPUT_PATH
                        Path to output file

The model file produced by training the standard RL models can be passed to this script via --model_path to collect samples from that pre-trained model. The script writes a file trajectories.pkl, which should be renamed to the form causal-<name>-v2.pkl and moved to the models/decision-transformer/data folder.
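For example, assuming the RL step produced a model file named model.zip and the dataset is to be called mlp (both names, the flag values, and whether --env must be set explicitly are assumptions to check against the script's defaults):

python scripts/collect_trajectories.py --model_path model.zip --num_trajectories 1000 --max_steps 30 --output_path trajectories.pkl
mv trajectories.pkl models/decision-transformer/data/causal-mlp-v2.pkl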

To train the decision transformer, follow the instructions in models/decision-transformer to set up the conda environment, then run python experiment.py with --env causal and a --dataset option corresponding to the <name> chosen above. For example, if the trajectories are named causal-mlp-v2.pkl, the command would be python experiment.py --env causal --dataset mlp --batch_size 128 --K 30 --model dt. The batch size, context length K, and model type (dt: decision transformer, bc: behavior cloning) can be adjusted as desired.

License Information

This code is licensed under the MIT license by the Regents of the University of California. The code we use for the Decision Transformer benchmark is likewise MIT-licensed by the authors of the Decision Transformer paper (https://arxiv.org/abs/2106.01345).
