Explainability of Deep Reinforcement Learning Algorithms in Robotic Domains by using Layer-wise Relevance Propagation
Our modified versions of the robotic environments are under the ./CustomGymEnvs directory. It contains a changed_envs directory with FetchReach-v2, a version of the FetchReach-v1 environment with a changed action space: the actions are joint torques rather than the x, y, and z velocities of the end-effector. The envs directory contains the original environments and environments with occluded entities. The faulty_envs directory contains environments with blocked joints, and the graph_envs directory contains environments with a graph representation of the robots.
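As a minimal usage sketch (assuming that importing the CustomGymEnvs package registers the custom environments; check the package for the actual mechanism):

```python
# Hedged sketch: load the torque-based FetchReach-v2 environment.
# Assumes importing CustomGymEnvs registers the modified environments.
import gym
import CustomGymEnvs  # assumed to register the custom environments

env = gym.make("FetchReach-v2")
obs = env.reset()
print(env.action_space)  # torque action space instead of end-effector velocities
```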
To parse the robot's XML model and convert the representation into a graph, we developed the RobotGraphModel package. Under this directory, the model_parser.py file parses the XML model of the environment. The robot_graph.py file first parses the model of the robot, identifying its nodes (<body> elements in the XML) and edges (<joint> elements in the XML). Two nested <body> elements are connected through the <joint> defined in the inner body.
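A minimal, self-contained sketch of this traversal (a simplified illustration, not the repository's model_parser.py):

```python
# Minimal sketch of the body/joint traversal: <body> elements become
# nodes, and the <joint> defined inside a nested body becomes an edge
# to its parent body. Bodies attached directly to the world are skipped
# as edge endpoints for simplicity.
import xml.etree.ElementTree as ET

def parse_robot_graph(xml_path):
    root = ET.parse(xml_path).getroot()
    nodes, edges = [], []

    def visit(body, parent_name):
        name = body.get("name", "unnamed")
        nodes.append(name)
        for joint in body.findall("joint"):
            if parent_name is not None:
                edges.append((parent_name, name, joint.get("name")))
        for child in body.findall("body"):
            visit(child, name)

    for worldbody in root.iter("worldbody"):
        for body in worldbody.findall("body"):
            visit(body, None)
    return nodes, edges
```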
For each environment, we developed an environment-specific class that inherits from the RobotGraph class within the robot_graph.py file. Each of these subclasses defines the set of node and edge features for its environment and is used by the OpenAI Gym wrappers under CustomGymEnvs/graph_envs.
Our algorithm is Soft Actor-Critic (SAC). The version with graph representation is under ./Graph_SAC, and the original version with a fully-connected network is under ./SAC. For the Graph Neural Network architecture, we use the torchgraph implementation developed for the paper Explainability Techniques for Graph Convolutional Networks. For the LRP implementation, we use the repository developed for the same paper.
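For intuition, the core LRP step for a single fully-connected layer (the epsilon-rule) can be sketched as follows. This is a generic illustration, not the repository's implementation:

```python
# Generic LRP epsilon-rule for one linear layer, for illustration only.
import torch

def lrp_linear(a, w, b, relevance_out, eps=1e-6):
    """Redistribute the output relevance of a linear layer to its inputs.

    a: input activations, shape (n_in,)
    w: weight matrix, shape (n_out, n_in)
    b: bias, shape (n_out,)
    relevance_out: relevance of the layer's outputs, shape (n_out,)
    """
    z = w @ a + b                                  # pre-activations
    s = relevance_out / (z + eps * torch.sign(z))  # stabilized ratios
    return a * (w.t() @ s)                         # input relevance
```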
The Python version is 3.8.10.
The first step before running the project is to install MuJoCo 2.1:
$ wget https://github.com/deepmind/mujoco/releases/download/2.1.0/mujoco210-linux-x86_64.tar.gz
$ tar -xvf mujoco210-linux-x86_64.tar.gz
$ mv mujoco210 ~/.mujoco/
$ pip3 install -U 'mujoco-py<2.2,>=2.1'
Download the project into the $HOME/Documents folder (so that the repository root is $HOME/Documents/SAC_GCN, as assumed by the PYTHONPATH below). Then add the following lines to the ~/.bashrc file:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin
export PYTHONPATH=$PYTHONPATH:$HOME/Documents/SAC_GCN
Then install the requirements of the project:
$ pip3 install -r requirements.txt
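As a quick sanity check that mujoco-py can find the MuJoCo binaries (the first import compiles the bindings and can take a while):

```python
# Verify the MuJoCo installation; the first import triggers compilation.
import mujoco_py
print(mujoco_py.__version__)
```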
To run the experiments with graph representation of the robot, run the following command:
$ python $MAIN_FILE --env-name {ENV-NAME} --exp-type graph
where $MAIN_FILE is the absolute path to the ./Controller/graph/main.py file. For a complete set of arguments, please check the main.py file. ENV-NAME can be one of the following:
- FetchReach-v2
- Walker2d-v2
- HalfCheetah-v2
- Hopper-v2
After training the agent using graph networks, Layer-wise Relevance Propagation (LRP) is applied to highlight the contribution of each part of the robot to the decision making. The data for these experiments are saved under ./Data/{ENV-NAME}/graph.
After the policy converges, LRP is applied to the learned policy to calculate the relevance scores given by each action to each entity across time-steps. To run LRP for the ENV-NAME environment, run the following:
$ python $EVALUATE --env-name {ENV-NAME} --exp-type graph
where $EVALUATE is the absolute path to the ./Evaluate/evaluate.py file. The results are stored under ./Data/{ENV-NAME}/graph/edge_relevance.pkl and ./Data/{ENV-NAME}/graph/global_relevance.pkl, which contain the relevance scores given to the edge and global units of the input graph, respectively.
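The saved files can be inspected with a short script; the internal structure of the pickled objects is not specified here, so treat this as a starting point:

```python
# Load the saved relevance scores for inspection. The path assumes
# FetchReach-v2 was evaluated; adjust ENV-NAME accordingly.
import pickle

with open("Data/FetchReach-v2/graph/edge_relevance.pkl", "rb") as f:
    edge_relevance = pickle.load(f)
print(type(edge_relevance))
```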
In this phase, the results of the first phase are evaluated by either of the following experiments:
- Occluding an entity's features in the observation space, which validates its relevance score.
- Blocking a joint, which validates the importance of that joint in the action space.
In each case, the relevance scores are validated by the size of the resulting performance drop. For more information, please refer to the paper.
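Conceptually, both interventions reduce to zeroing part of the agent's input or output. A hedged sketch (the indices are hypothetical; the actual logic is implemented by the wrappers under ./CustomGymEnvs):

```python
# Illustrative only: the actual occlusion/blockage logic lives in the
# CustomGymEnvs wrappers; the indices here are hypothetical.
import numpy as np

def occlude_entity(obs, feature_slice):
    """Zero an entity's features in the observation vector."""
    obs = np.array(obs, copy=True)
    obs[feature_slice] = 0.0
    return obs

def block_joint(action, joint_index):
    """Zero the action component of a blocked joint."""
    action = np.array(action, copy=True)
    action[joint_index] = 0.0
    return action

# Example: occlude features 3..5, block the first joint.
obs = occlude_entity(np.ones(10), slice(3, 6))
act = block_joint(np.ones(4), 0)
```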
For all the following commands, $MAIN_FILE is the absolute path to the ./Controller/basic/main.py file.
For running experiments in the standard setting, just run the following:
$ python $MAIN_FILE --env-name {ENV-NAME} --exp-type standard
To run experiments for the occlusion case, use the following command:
$ python $MAIN_FILE --env-name {ENV-NAME} --exp-type {ENTITY-NAME}
where ENTITY-NAME is the name of the entity we want to occlude. The list of possible ENTITY-NAMEs for each environment is as follows:
FetchReach-v2
- goal
- shoulder_pan_joint
- shoulder_lift_joint
- upperarm_roll_joint
- wrist_flex_joint
- forearm_roll_joint
- wrist_roll_joint
- elbow_flex_joint
Walker2d-v2
- torso
- foot_joint
- leg_joint
- thigh_joint
- foot_left_joint
- leg_left_joint
- thigh_left_joint
HalfCheetah-v2
- torso
- bfoot
- bshin
- bthigh
- ffoot
- fshin
- fthigh
Hopper-v2
- torso
- foot_joint
- leg_joint
- thigh_joint
To run experiments for the blockage case, use the following command:
$ python $MAIN_FILE --env-name {BROKEN-ENV-NAME} --exp-type {JOINT-NAME}
where BROKEN-ENV-NAME is the name of an environment with a broken joint, as listed below:
- FetchReachBroken-v2
- Walker2dBroken-v2
- HalfCheetahBroken-v2
- HopperBroken-v2
and JOINT-NAME is the name of the joint we want to block. The list of possible JOINT-NAMEs for each environment is as follows:
FetchReachBroken-v2
- shoulder_pan_joint
- shoulder_lift_joint
- upperarm_roll_joint
- wrist_flex_joint
- forearm_roll_joint
- wrist_roll_joint
- elbow_flex_joint
Walker2dBroken-v2
- foot_joint
- leg_joint
- thigh_joint
- foot_left_joint
- leg_left_joint
- thigh_left_joint
HalfCheetahBroken-v2
- bfoot
- bshin
- bthigh
- ffoot
- fshin
- fthigh
HopperBroken-v2
- foot_joint
- leg_joint
- thigh_joint
Note that these experiments use the original SAC algorithm with fully-connected networks under the ./SAC directory. For each environment, the resulting data is stored under the following directories:
- For the occlusion case: ./Data/{ENV-NAME}/{ENTITY-NAME}
- For the blockage case: ./Data/{BROKEN-ENV-NAME}/{JOINT-NAME}
To plot the results of the experiments, run the following command:
$ python $PLOT --env-name {ENV-NAME}
where $PLOT is the absolute path to the ./Plots/plot.py file. The result is stored under ./Result/{ENV-NAME}.jpg.
For further information about the method and results, please refer to our paper:
@article{taghian2024explainability,
title={Explainability of deep reinforcement learning algorithms in robotic domains by using Layer-wise Relevance Propagation},
author={Taghian, Mehran and Miwa, Shotaro and Mitsuka, Yoshihiro and G{\"u}nther, Johannes and Golestan, Shadan and Zaiane, Osmar},
journal={Engineering Applications of Artificial Intelligence},
volume={137},
pages={109131},
year={2024},
publisher={Elsevier}
}