This repository contains code for performing temporal credit assignment in reinforcement learning using trajectory-aware eligibility traces ("etraces").
The code requires Python 3.6+ to run. To get started, `cd` into the root directory and use `pip` to install the required dependencies:

```
cd trajectory-aware-etraces/
pip install -r requirements.txt
```
The scripts needed to reproduce the experiments in the paper are contained in `trajectory_aware_etraces/experiments/control`.
From the root directory:

```
cd trajectory_aware_etraces/experiments/control
```

- (Optional) Edit `config.yml` to change the environment, algorithms, or any other experiment settings. Note: the default number of trials for all experiments is 1000, which may take a long time to run; when experimenting, start with a smaller value (e.g., 5 or 10) for faster iteration. (See the config sketch after this list.)
- Run `python run_experiments.py` to generate the data (saved in `data/` by default).
- (Optional) Run `python grid_search.py` to print the AUC (mean and 95% confidence interval) for all of the tested hyperparameter combinations; a sketch of this computation also follows the list.
- Identify the best α-value for each λ-value.
- Edit the `lambda_sweep_alphas` key in `config.yml` to set the identified values for each method. (The default values were used to generate the plots in the paper.)
- Run `python lambda_sweep.py` to generate the λ-sweep plot.
- Run `python learning_curves.py` to generate the learning curves.
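If you prefer to script the trial-count change rather than edit `config.yml` by hand, here is a minimal sketch using PyYAML. The top-level key name `num_trials` is a hypothetical placeholder for illustration; check `config.yml` for the key the experiments actually use.

```python
# Minimal sketch: lower the trial count in config.yml for a quick dry run.
# Assumes PyYAML is installed; `num_trials` is a hypothetical key name, so
# check config.yml for the actual key before running this.
import yaml

CONFIG_PATH = "config.yml"

with open(CONFIG_PATH) as f:
    config = yaml.safe_load(f)

config["num_trials"] = 10  # down from the default of 1000

with open(CONFIG_PATH, "w") as f:
    yaml.safe_dump(config, f, default_flow_style=False)
```

Note that `yaml.safe_dump` drops any comments in the file, so hand-editing may be preferable if `config.yml` is heavily commented.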
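For reference, the mean and 95% confidence interval that `grid_search.py` reports are standard statistics over per-trial AUC values. The sketch below shows one common way to compute them; the script's own aggregation lives in this repository and may differ in detail.

```python
# Minimal sketch of a mean and 95% confidence interval over per-trial AUCs,
# using the normal approximation (1.96 standard errors). The random data here
# stands in for the per-trial AUC values that the experiments produce.
import numpy as np

rng = np.random.default_rng(0)
aucs = rng.normal(loc=0.75, scale=0.05, size=1000)  # fake per-trial AUCs

mean = aucs.mean()
sem = aucs.std(ddof=1) / np.sqrt(len(aucs))  # standard error of the mean
half_width = 1.96 * sem

print(f"AUC = {mean:.4f} +/- {half_width:.4f} (95% CI)")
```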
Violating Condition 5.1 can sometimes cause divergence, as we demonstrated in the paper with two counterexamples. The code for these counterexamples, which calculates the Z matrix and its norm for the specified hyperparameter settings, can be run from the root directory:
Counterexample 5.7 (Off-Policy Truncated IS):

```
python counterexamples/offpolicy_truncated_is.py
```

Counterexample 5.8 (On-Policy Binary Traces):

```
python counterexamples/onpolicy_binary.py
```
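To give a sense of what these scripts check: divergence hinges on whether the computed Z matrix is a contraction. Below is a minimal sketch of the norm test only, with a made-up 2x2 matrix standing in for Z; the actual construction of Z from the MDP and hyperparameter settings follows the paper and is implemented in the scripts above.

```python
# Minimal sketch of the norm/spectral-radius check the counterexample scripts
# perform on Z. The matrix below is made up for illustration; the real Z is
# built from the MDP and hyperparameter settings inside the scripts.
import numpy as np

Z = np.array([[0.9, 0.4],
              [0.3, 0.8]])  # hypothetical stand-in for the paper's Z matrix

spectral_norm = np.linalg.norm(Z, ord=2)                # largest singular value
spectral_radius = np.max(np.abs(np.linalg.eigvals(Z)))  # largest |eigenvalue|

print(f"||Z||_2 = {spectral_norm:.4f}")
print(f"rho(Z)  = {spectral_radius:.4f}")

# If rho(Z) >= 1, the fixed-point iteration x <- Zx + b need not converge,
# which is how the counterexamples exhibit divergence.
if spectral_radius >= 1.0:
    print("Z is not a contraction: divergence is possible.")
else:
    print("Z is a contraction: the expected update converges.")
```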