
Prioritized Hindsight with Dual Buffer for meta-reinforcement learning

This repository contains the implementation of prioritized hindsight with dual buffer, developed by Sofanit Wubeshet Beyene and Ji-Hyeong Han at Seoul National University of Science and Technology.

Abstract

Sharing prior knowledge across multiple robotic manipulation tasks is a challenging research topic. Although state-of-the-art deep reinforcement learning (DRL) algorithms have shown immense success on single robotic tasks, it remains challenging to extend them directly to multi-task manipulation problems. This is mostly due to the difficulty of efficient exploration in high-dimensional state spaces and continuous action spaces. Furthermore, in multi-task scenarios, the sparse-reward and sample-inefficiency problems of DRL algorithms are exacerbated. Therefore, we propose a method that increases the sample efficiency of the soft actor-critic (SAC) algorithm and extends it to a multi-task setting. The agent learns a prior policy from two structurally similar tasks and adapts the policy to a target task.
We propose prioritized hindsight with dual experience replay to improve the data storage and sampling technique, which in turn helps the agent perform structured exploration and improves sample efficiency. The proposed method separates the experience replay into two buffers, one for real trajectories and one for hindsight trajectories, to reduce the bias that hindsight trajectories introduce. Moreover, we utilize high-reward transitions from previous tasks to help the network adapt to the new task more easily. We demonstrate the proposed method on several manipulation tasks with a 7-DoF robotic arm in RLBench. The experimental results show that the proposed method outperforms vanilla SAC in both single-task and multi-task settings.
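
As a rough sketch of the dual-buffer idea (illustrative only, not the code in algo/; the class name, transition format, and fixed mixing ratio are assumptions, and the priority weighting is omitted for brevity):

import random
from collections import deque

class DualReplayBuffer:
    """Illustrative sketch: real and hindsight transitions are stored in
    separate buffers so relabelled (hindsight) data cannot dominate a batch."""

    def __init__(self, capacity=100_000, hindsight_ratio=0.5):
        self.real = deque(maxlen=capacity)       # transitions with the original goal
        self.hindsight = deque(maxlen=capacity)  # transitions with a relabelled goal
        self.hindsight_ratio = hindsight_ratio   # fraction of each batch drawn from hindsight data

    def add_real(self, transition):
        self.real.append(transition)

    def add_hindsight(self, transition):
        self.hindsight.append(transition)

    def sample(self, batch_size):
        # Draw a fixed fraction from each buffer, then mix the two parts.
        n_hind = min(int(batch_size * self.hindsight_ratio), len(self.hindsight))
        n_real = min(batch_size - n_hind, len(self.real))
        batch = random.sample(list(self.real), n_real) + random.sample(list(self.hindsight), n_hind)
        random.shuffle(batch)
        return batch

A SAC-style update would then call sample(batch_size) at each gradient step and treat both kinds of transitions identically.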

(Demo clips: reach target, close box, close microwave.)

Installation

Requirements

Environment
Python 3.8
torch 1.9.1

Install RLBench

https://github.com/stepjam/RLBench
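
RLBench itself depends on CoppeliaSim and PyRep, so follow the installation steps in the RLBench README first. At the time of writing, RLBench can then be installed directly from GitHub, for example:

pip install git+https://github.com/stepjam/RLBench.git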

How to run

Replace environment files

The env/conda_environment.yml file contains all the packages required to run the experiments.
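
For example, the environment can be created and activated with conda (the environment name is whatever is defined inside the YAML file):

conda env create -f env/conda_environment.yml
conda activate <environment-name-from-yml>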


Replace the corresponding task files in the RLBench tasks directory with the versions in this repository's env folder (see the example after this list):

env/reach_target.py → tasks/reach_target.py
env/close_box.py → tasks/close_box.py
env/close_microwave.py → tasks/close_microwave.py
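
For example, assuming RLBench was installed from a local clone, the files can be copied over as follows (the exact path to RLBench's rlbench/tasks directory depends on your setup):

cp env/reach_target.py <path-to-RLBench>/rlbench/tasks/reach_target.py
cp env/close_box.py <path-to-RLBench>/rlbench/tasks/close_box.py
cp env/close_microwave.py <path-to-RLBench>/rlbench/tasks/close_microwave.py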

Run

To run HER

python algo/check_multi_main.py 

To run the proposed method (it uses the same sac.py)

python algo/final/final_multi_main.py
