Bayesian Soft Actor Critic (BSAC)

Adopting reasonable strategies is challenging but crucial for an intelligent agent with limited resources working in hazardous, unstructured, and dynamically changing environments: good strategies improve system utility, decrease overall cost, and increase mission success probability. Deep Reinforcement Learning (DRL) helps organize an agent's behaviors and actions based on its state and can represent complex strategies (compositions of actions). This project proposes a novel hierarchical strategy decomposition approach based on Bayesian chaining that separates an intricate policy into several simple sub-policies and organizes their relationships as Bayesian strategy networks (BSN). We integrate this approach into the state-of-the-art DRL method, soft actor-critic (SAC), and build the corresponding Bayesian soft actor-critic (BSAC) model by organizing the sub-policies as a joint policy.
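The core idea, a joint policy factored into chained sub-policies, can be sketched as follows. This is a hypothetical toy illustration, not the repository's actual implementation: the Gaussian sub-policies and the `sub_policy` function are assumptions made for clarity. Each sub-policy conditions on the state and on its parent's action in the Bayesian chain, and the joint log-probability is the sum of the chained log-probabilities.

```python
import numpy as np

# Toy sketch of a Bayesian Strategy Network (BSN) factorization:
# a joint policy pi(a|s) decomposed into chained sub-policies
#   pi(a|s) = pi_1(a1|s) * pi_2(a2|s, a1) * pi_3(a3|s, a2).
rng = np.random.default_rng(0)

def sub_policy(state, parent_action=None):
    """Toy Gaussian sub-policy; its mean depends on the state and,
    if present, on the parent sub-policy's action (hypothetical form)."""
    mean = state.sum() + (0.0 if parent_action is None else 0.5 * parent_action)
    action = rng.normal(mean, 1.0)
    # Log-density of a unit-variance Gaussian at the sampled action.
    log_prob = -0.5 * (action - mean) ** 2 - 0.5 * np.log(2 * np.pi)
    return action, log_prob

state = np.array([0.1, -0.2])
a1, lp1 = sub_policy(state)          # root sub-policy
a2, lp2 = sub_policy(state, a1)      # conditioned on a1
a3, lp3 = sub_policy(state, a2)      # conditioned on a2
joint_log_prob = lp1 + lp2 + lp3     # log pi(a|s) = sum of chained log-probs
```

In BSAC, each factor would be a learned sub-policy network trained jointly under the SAC objective; the chain above simply shows how their samples and log-probabilities compose into one joint policy.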

Example

[Image: Hopper-V2 3BSAC demonstration and training video]

Experiments Setup

This implementation requires Anaconda, OpenAI Gym, MuJoCo, PyTorch, and rl-plotter.

Getting Started

1. Install OpenAI Gym:

```shell
pip install gym
```
2. Install MuJoCo:

```shell
mkdir -p ~/.mujoco && cd ~/.mujoco
wget -P . https://www.roboti.us/download/mujoco200_linux.zip
unzip mujoco200_linux.zip
mv mujoco200_linux mujoco200
```

  • Copy your MuJoCo license key (mjkey.txt) into place:

```shell
cp mjkey.txt ~/.mujoco
cp mjkey.txt ~/.mujoco/mujoco200/bin
```
  • Add environment variables:

```shell
export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export MUJOCO_KEY_PATH=~/.mujoco${MUJOCO_KEY_PATH}
```
  • Download mujoco-py and create a conda environment:

```shell
cd ~
git clone https://github.com/openai/mujoco-py.git
conda create -n myenv python=3.6
source activate myenv
sudo apt-get install build-essential
```
  • Install dependencies:

```shell
cd ~/mujoco-py
pip install -r requirements.txt
pip install -r requirements.dev.txt
python setup.py install
```
3. Install the reinforcement learning (RL) plotter rl-plotter:

```shell
pip install rl_plotter
```

Examples for Training Agent

1. Hopper-V2 with 3-factor BSAC:

```shell
cd ~/hopper-v2_3bsac
python3 main_bsac.py
```

2. Walker2d-V2 with 5-factor BSAC:

```shell
cd ~/walker2d-v2_5bsac
python3 main_bsac.py
```

3. Humanoid-V2:

  • 3-factor BSAC:

```shell
cd ~/humanoid-v2_3bsac
python3 main_bsac.py
```

  • 5-factor BSAC:

```shell
cd ~/humanoid-v2_5bsac
python3 main_bsac.py
```

  • 9-factor BSAC:

```shell
cd ~/humanoid-v2_9bsac
python3 main_bsac.py
```

Note: Before running the code, set the data output directory in main_bsac.py and networks.py.

Evaluation

[Images: Hopper-V2 3BSAC evaluation plots and demo videos]

Conclusion

From theoretical derivation, we formulate the training process of BSAC and implement it in OpenAI's MuJoCo standard continuous control benchmark domains, such as Hopper, Walker2d, and Humanoid. The results demonstrate that the proposed architecture scales to domains with high-dimensional action spaces and achieves higher performance than state-of-the-art RL methods. Furthermore, we believe that the potential generality and practicability of BSAC invite further theoretical and empirical investigation. In particular, implementing BSAC on real robots is not only a challenging problem but will also help us develop robust computational models for multi-agent/robot systems, such as robot locomotion control, multi-robot planning and navigation, and robot-aided search and rescue missions.
