An OpenAI Gym environment for the Jaco2 robotic arm by Kinova. The environment is implemented both for the real arm and the Gazebo simulator. The goal is to bring the arm's end effector as close as possible to the target green ball. The target object position is initialised randomly at the beginning of each episode.
- Install ROS.
- ROS Melodic on Ubuntu 18.04
- ROS Kinetic on Ubuntu 16.04
To use ROS with Python 3, run:
sudo apt-get install python3-pip
sudo pip3 install rospkg catkin_pkg
Install and configure your Catkin workspace.
Install dependencies for the Kinova-ros package, as indicated here.
sudo apt-get install ros-<distro>-gazebo-ros-control
sudo apt-get install ros-<distro>-ros-controllers*
sudo apt-get install ros-<distro>-trac-ik-kinematics-plugin
sudo apt-get install ros-<distro>-effort-controllers
sudo apt-get install ros-<distro>-joint-state-controller
sudo apt-get install ros-<distro>-joint-trajectory-controller
sudo apt-get install ros-<distro>-controller-*
(Replace <distro> with your ROS distribution, for example kinetic or melodic.)
- Install Gym.
pip3 install gym
- Install jaco-gym.
git clone
cd jaco-gym
pip3 install -e .
- Install the ROS packages and build.
cp -r ROS_packages/sphere_description ~/catkin_ws/src
cp -r ROS_packages/kinova-ros ~/catkin_ws/src
cd ~/catkin_ws
Note: the kinova-ros package used here was adapted from the official package.
- Install the RL library Stable-baselines.
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev
pip3 install stable-baselines[mpi]
- Install the dependencies for RL Baselines Zoo.
sudo apt-get install swig ffmpeg
pip3 install box2d box2d-kengz pyyaml optuna pytablewriter
- Install TensorFlow 1.14. Stable-baselines does not yet support TensorFlow 2.
pip3 install tensorflow-gpu==1.14
In terminal 1:
roslaunch kinova_bringup kinova_robot.launch kinova_robotType:=j2n6s300
In terminal 2:
python3 scripts/
In terminal 1:
roslaunch kinova_gazebo robot_launch_render.launch kinova_robotType:=j2n6s300 # enable graphic rendering
# OR
roslaunch kinova_gazebo robot_launch_noRender_noSphere.launch kinova_robotType:=j2n6s300 # disable graphic rendering
In terminal 2:
python3 scripts/
In terminal 1:
roslaunch kinova_gazebo robot_launch_noRender_noSphere.launch kinova_robotType:=j2n6s300
In terminal 2:
python3 scripts/
In terminal 1:
roslaunch kinova_gazebo robot_launch.launch kinova_robotType:=j2n6s300
Uncomment this line in jaco_gym/envs/
In terminal 2:
python3 scripts/
python3 scripts/
In terminal 1:
roslaunch kinova_gazebo robot_launch_noRender_noSphere.launch kinova_robotType:=j2n6s300
In terminal 2:
cd stable-baselines-zoo/
python3 --algo ppo2 --env JacoGazebo-v1 -n 100000 --seed 0 --log-folder logs/ppo2/JacoGazebo-v1_100000/ &> submission_log/
python3 --algo sac --env JacoGazebo-v1 -n 100000 --seed 0 --log-folder logs/sac/JacoGazebo-v1_100000/
python3 --algo td3 --env JacoGazebo-v1 -n 100000 --seed 0 --log-folder logs/td3/JacoGazebo-v1_100000/
In terminal 1:
roslaunch kinova_gazebo robot_launch.launch kinova_robotType:=j2n6s300
Uncomment this line in jaco_gym/envs/
In terminal 2:
cd stable-baselines-zoo/
python3 --algo ppo2 --env JacoGazebo-v1 -f logs/ --exp-id 0 -n 2000
python3 -f logs/ppo2/JacoGazebo-v1_1/
If reading the full state:
Type: Box(36)
Num | Observation | Min | Max |
0 | joint_1 angle (rad) | -inf | inf |
1 | joint_2 angle (rad) | -inf | inf |
2 | joint_3 angle (rad) | -inf | inf |
3 | joint_4 angle (rad) | -inf | inf |
4 | joint_5 angle (rad) | -inf | inf |
5 | joint_6 angle (rad) | -inf | inf |
6 | joint_finger_1 angle (rad) | -inf | inf |
7 | joint_finger_2 angle (rad) | -inf | inf |
8 | joint_finger_3 angle (rad) | -inf | inf |
9 | joint_finger_tip_1 angle (rad) | -inf | inf |
10 | joint_finger_tip_2 angle (rad) | -inf | inf |
11 | joint_finger_tip_3 angle (rad) | -inf | inf |
12 | joint_1 velocity (rad/s) | -inf | inf |
13 | joint_2 velocity (rad/s) | -inf | inf |
14 | joint_3 velocity (rad/s) | -inf | inf |
15 | joint_4 velocity (rad/s) | -inf | inf |
16 | joint_5 velocity (rad/s) | -inf | inf |
17 | joint_6 velocity (rad/s) | -inf | inf |
18 | joint_finger_1 velocity (rad/s) | -inf | inf |
19 | joint_finger_2 velocity (rad/s) | -inf | inf |
20 | joint_finger_3 velocity (rad/s) | -inf | inf |
21 | joint_finger_tip_1 velocity (rad/s) | -inf | inf |
22 | joint_finger_tip_2 velocity (rad/s) | -inf | inf |
23 | joint_finger_tip_3 velocity (rad/s) | -inf | inf |
24 | joint_1 effort (N.m) | -inf | inf |
25 | joint_2 effort (N.m) | -inf | inf |
26 | joint_3 effort (N.m) | -inf | inf |
27 | joint_4 effort (N.m) | -inf | inf |
28 | joint_5 effort (N.m) | -inf | inf |
29 | joint_6 effort (N.m) | -inf | inf |
30 | joint_finger_1 effort (N.m) | -inf | inf |
31 | joint_finger_2 effort (N.m) | -inf | inf |
32 | joint_finger_3 effort (N.m) | -inf | inf |
33 | joint_finger_tip_1 effort (N.m) | -inf | inf |
34 | joint_finger_tip_2 effort (N.m) | -inf | inf |
35 | joint_finger_tip_3 effort (N.m) | -inf | inf |
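The layout in the table above (12 angles, then 12 velocities, then 12 efforts) can be unpacked with a small helper. This is a minimal sketch that assumes the observation arrives as a flat array of length 36 in exactly that order; the function name `split_full_state` is hypothetical and not part of jaco-gym.

```python
import numpy as np

N_JOINTS = 12  # 6 arm joints + 3 fingers + 3 finger tips, as listed in the table


def split_full_state(obs):
    """Split a flat Box(36) observation into angle, velocity and effort blocks."""
    obs = np.asarray(obs, dtype=float)
    if obs.shape != (3 * N_JOINTS,):
        raise ValueError(f"expected shape (36,), got {obs.shape}")
    # Rows of the reshaped array follow the table order: angles, velocities, efforts.
    angles, velocities, efforts = obs.reshape(3, N_JOINTS)
    return {"angles": angles, "velocities": velocities, "efforts": efforts}
```

For example, `split_full_state(obs)["velocities"][:6]` would give the six arm joint velocities.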
If reading the simplified state:
Type: Box(12)
Num | Observation | Min | Max |
0 | joint_1 angle (rad) | -inf | inf |
1 | joint_2 angle (rad) | -inf | inf |
2 | joint_3 angle (rad) | -inf | inf |
3 | joint_4 angle (rad) | -inf | inf |
4 | joint_5 angle (rad) | -inf | inf |
5 | joint_6 angle (rad) | -inf | inf |
6 | joint_1 velocity (rad/s) | -inf | inf |
7 | joint_2 velocity (rad/s) | -inf | inf |
8 | joint_3 velocity (rad/s) | -inf | inf |
9 | joint_4 velocity (rad/s) | -inf | inf |
10 | joint_5 velocity (rad/s) | -inf | inf |
11 | joint_6 velocity (rad/s) | -inf | inf |
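Comparing the two tables, the simplified state looks like the arm-joint angles and velocities taken from the full state (indices 0 to 5 and 12 to 17). The sketch below relies on that reading, which is an assumption drawn from the tables and not a confirmed description of how the environment builds the state; `simplify_state` is a hypothetical name.

```python
import numpy as np

ARM_JOINTS = 6           # joint_1 .. joint_6
FULL_VELOCITY_START = 12  # index of joint_1 velocity in the full state table


def simplify_state(full_obs):
    """Build a Box(12)-shaped vector from a Box(36)-shaped one (assumed mapping)."""
    full_obs = np.asarray(full_obs, dtype=float)
    if full_obs.shape != (36,):
        raise ValueError(f"expected shape (36,), got {full_obs.shape}")
    angles = full_obs[:ARM_JOINTS]
    velocities = full_obs[FULL_VELOCITY_START:FULL_VELOCITY_START + ARM_JOINTS]
    return np.concatenate([angles, velocities])
```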
Type: Box(6)
Num | Action | Min | Max |
0 | joint_1 angle (scaled) | -1 | 1 |
1 | joint_2 angle (scaled) | -1 | 1 |
2 | joint_3 angle (scaled) | -1 | 1 |
3 | joint_4 angle (scaled) | -1 | 1 |
4 | joint_5 angle (scaled) | -1 | 1 |
5 | joint_6 angle (scaled) | -1 | 1 |
Note: joint_2 is currently restricted to 180 deg and joint_3 to the interval [90, 270] deg in order to reduce the arm's range of motion.
The reward is incremented at each time step by the negative of the distance between the target object position and the end effector position (joint_6).
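To illustrate what a scaled action could mean in joint space, here is a sketch of a linear mapping from [-1, 1] to joint angles in degrees. The linear form is an assumption, as are the [0, 360] limits used for joints 1, 4, 5 and 6; only the joint_2 and joint_3 limits come from the note above, with "restricted to 180 deg" read as a fixed angle. The names `JOINT_LIMITS_DEG` and `scaled_action_to_degrees` are hypothetical.

```python
import numpy as np

# Per-joint [low, high] limits in degrees.
# joint_2 and joint_3 follow the note above; the other rows are placeholders.
JOINT_LIMITS_DEG = np.array([
    [0.0, 360.0],    # joint_1 (assumed)
    [180.0, 180.0],  # joint_2 (fixed at 180 deg)
    [90.0, 270.0],   # joint_3
    [0.0, 360.0],    # joint_4 (assumed)
    [0.0, 360.0],    # joint_5 (assumed)
    [0.0, 360.0],    # joint_6 (assumed)
])


def scaled_action_to_degrees(action):
    """Map a Box(6) action in [-1, 1] linearly onto the joint limits."""
    action = np.asarray(action, dtype=float)
    if action.shape != (6,):
        raise ValueError(f"expected shape (6,), got {action.shape}")
    action = np.clip(action, -1.0, 1.0)  # keep out-of-range commands inside the box
    low, high = JOINT_LIMITS_DEG[:, 0], JOINT_LIMITS_DEG[:, 1]
    return low + (action + 1.0) * 0.5 * (high - low)
```

Under these assumed limits, an action of all zeros maps to the midpoint of every range, and any value sent for joint_2 collapses to 180 deg.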
The arm is initialised with its joint angles as follows (in degrees): [0, 180, 180, 0, 0, 0]. The target object is initialised to a random location within the arm's reach.
An episode terminates if more than 50 time steps are completed.
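The reward and termination rules can be sketched in a few lines. The example assumes a Euclidean distance and reads the termination rule as a cap of 50 steps per episode; both are interpretations of the text, and `step_reward` and `episode_return` are hypothetical helpers, not functions from jaco-gym.

```python
import numpy as np

MAX_STEPS = 50  # episode length cap, as read from the termination rule


def step_reward(tip_xyz, target_xyz):
    """Negative Euclidean distance between the end effector and the target."""
    tip = np.asarray(tip_xyz, dtype=float)
    target = np.asarray(target_xyz, dtype=float)
    return -float(np.linalg.norm(target - tip))


def episode_return(tip_positions, target_xyz, max_steps=MAX_STEPS):
    """Sum the per-step rewards over at most `max_steps` tip positions."""
    total = 0.0
    for tip in tip_positions[:max_steps]:
        total += step_reward(tip, target_xyz)
    return total
```

Because every step contributes a non-positive value, the return is highest (closest to zero) when the end effector reaches the target quickly and stays there.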
The info dictionary returned by the env.step function is structured as follows:
info = {'tip coordinates': [x, y, z], 'target coordinates': array([x, y, z])}
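Since the tip coordinates are a list and the target coordinates a NumPy array, the dictionary is not directly JSON-serialisable. The sketch below converts it into a plain record for logging and adds the tip-to-target distance; it assumes only the two keys shown above, and `info_to_record` is a hypothetical helper.

```python
import json

import numpy as np


def info_to_record(info):
    """Convert the env.step info dictionary into a JSON-friendly record."""
    tip = np.asarray(info["tip coordinates"], dtype=float)
    target = np.asarray(info["target coordinates"], dtype=float)
    return {
        "tip": tip.tolist(),
        "target": target.tolist(),
        "distance": float(np.linalg.norm(target - tip)),
    }


if __name__ == "__main__":
    example = {
        "tip coordinates": [0.0, 0.0, 0.0],
        "target coordinates": np.array([1.0, 2.0, 2.0]),
    }
    print(json.dumps(info_to_record(example)))
```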
You can use line_profiler to measure how long individual lines of code take to execute and so monitor the code's performance.
pip install line-profiler
For example:
vim scripts/
@profile  # decorator provided by kernprof at run time
def main():
    for episode in range(3):
        obs = env.reset()
kernprof -l
python -m line_profiler > profiling_result_test.txt
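kernprof -l makes a `profile` decorator available as a builtin, so a script decorated with `@profile` raises a NameError when run with plain python. One common workaround is a no-op fallback, sketched below. `fake_reset` is a hypothetical stand-in for `env.reset()` so that the example runs without ROS or Gazebo.

```python
import builtins

# Under `kernprof -l`, builtins.profile exists; otherwise use a no-op decorator.
profile = getattr(builtins, "profile", lambda func: func)


def fake_reset():
    """Hypothetical stand-in for env.reset(); returns a dummy 12-value state."""
    return [0.0] * 12


@profile
def main(n_episodes=3):
    observations = []
    for episode in range(n_episodes):
        observations.append(fake_reset())
    return observations


if __name__ == "__main__":
    main()
```

With this pattern the same file can be run normally during development and under kernprof when timings are needed.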
Tested on:
- Ubuntu 18.04 and 16.04
- Python 3.6.9
- Gym 0.15.4