This project is the code for "Finding Interpretable and Transferable Meta-Knowledge for Reinforcement Learning". The algorithms here are intended to make the original RL algorithms more interpretable and transferable. We combine MKRL with Asynchronous Advantage Actor-Critic (A3C) and Proximal Policy Optimization (PPO) to form MKA3C and MKPPO, respectively. The available environments include CartPole, MountainCar, Acrobot, Pendulum, InvertedPendulum, Swimmer, Hopper, and BipedalWalker.
Requires Python 3 (>= 3.5). To install the requirements, run:
conda create --name dtrl python=3.5
conda activate dtrl
pip install -r requirements.txt
A3C, implemented for discrete action spaces:
python A3C_Original.py --mode 'train' --env "Acrobot-v1" --render False --load False --getting_data True
PPO, implemented for continuous action spaces in Gym environments:
python DPPO_Original.py --mode 'train' --env "Pendulum-v1" --render False --load False --getting_data True
PPO, implemented for MuJoCo environments:
python DPPO_MUJOCO_Original.py --mode 'train' --env "BipedalWalke-v1" --render False --load False --getting_data True
--mode: 'train' or 'test'.
--load: if True, the trained network is loaded for retraining.
--getting_data: if True, data are collected during testing to generate the tree model.
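The commands above pass booleans as the literal strings True and False. A minimal sketch of how such flags can be parsed with argparse follows. The helper name str2bool, the function build_parser, and all default values are assumptions made for illustration; the scripts in this repository may handle these flags differently.

```python
import argparse


def str2bool(value):
    """Convert 'True'/'False' style strings to a bool (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text in ("true", "t", "yes", "1"):
        return True
    if text in ("false", "f", "no", "0"):
        return False
    raise argparse.ArgumentTypeError("expected True or False, got %r" % value)


def build_parser():
    """Build a parser whose flag names mirror the commands above.

    The defaults are assumptions, not values taken from the repository.
    """
    parser = argparse.ArgumentParser(description="Train or test an agent")
    parser.add_argument("--mode", choices=["train", "test"], default="train")
    parser.add_argument("--env", type=str, default="Acrobot-v1")
    parser.add_argument("--render", type=str2bool, default=False)
    parser.add_argument("--load", type=str2bool, default=False)
    parser.add_argument("--getting_data", type=str2bool, default=False)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
demo_args = build_parser().parse_args(
    ["--mode", "test", "--render", "False", "--getting_data", "True"]
)
print(demo_args)
```

An explicit converter is used because argparse with type=bool would treat the string "False" as True: any non-empty string is truthy in Python.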
Classification decision tree, used for the discrete-action version:
python DecisionTree.py --env "Acrobot-v1"
Regression decision tree, used for the continuous-action version:
python DRegressionDecisionTree.py --env "BipedalWalke-v2"
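The idea of fitting a classification tree to collected data can be illustrated with a depth-1 tree (a decision stump) in pure Python. This is a sketch under stated assumptions, not the code in DecisionTree.py: it assumes the collected data are (state, action) pairs, and the function names, the Gini criterion, and the toy data are all hypothetical. The repository script may rely on a library implementation and a deeper tree.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of discrete action labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(states, actions):
    """Find the single (feature, threshold) split with the lowest weighted Gini.

    Returns (feature_index, threshold, left_action, right_action),
    or None when no feature has two distinct values to split on.
    """
    best = None
    best_score = float("inf")
    n_features = len(states[0])
    for j in range(n_features):
        values = sorted(set(s[j] for s in states))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [a for s, a in zip(states, actions) if s[j] <= thr]
            right = [a for s, a in zip(states, actions) if s[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(actions)
            if score < best_score:
                best_score = score
                best = (j, thr, majority(left), majority(right))
    return best


def predict(stump, state):
    """Return the action the stump assigns to a state."""
    j, thr, left_action, right_action = stump
    return left_action if state[j] <= thr else right_action


# Toy (state, action) pairs standing in for data collected with --getting_data True.
states = [[-0.8, 0.1], [-0.3, 0.9], [-0.1, 0.4], [0.2, 0.2], [0.5, 0.8], [0.9, 0.5]]
actions = [0, 0, 0, 1, 1, 1]

stump = fit_stump(states, actions)
print("split on feature", stump[0], "at threshold", stump[1])
print("action for [0.7, 0.3]:", predict(stump, [0.7, 0.3]))
```

The learned rule reads as a single human-checkable condition ("if feature j <= threshold, take the left action, else the right one"), which is the kind of interpretability a tree model offers over a neural policy.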
3. MKRL, which combines the nonlinear fitting ability of the RL algorithm with the interpretability of the decision tree.
MKA3C, implemented for discrete action spaces:
python A3C_transfer.py --mode 'train' --env "Acrobot-v1" --render False --load False --mixed_version True
MKPPO, implemented for continuous action spaces in Gym environments:
python DPPO_transfer.py --mode 'train' --env "Pendulum-v1" --render False --load False --mixed_version True
MKPPO, implemented for MuJoCo environments:
python DPPO_MUJOCO_Transfer.py --mode 'test' --env "BipedalWalke-v2" --render False --load False --mixed_version True
--mode: 'train' or 'test'.
--load: if True, the trained network is loaded for retraining.
--mixed_version: if True, use MKRL; otherwise, use the decision tree (DT).