Bayesian Soft Actor Critic (BSAC)

Adopting reasonable strategies is challenging but crucial for an intelligent agent with limited resources working in hazardous, unstructured, and dynamically changing environments: good strategies improve system utility, decrease overall cost, and increase mission success probability. Deep Reinforcement Learning (DRL) helps organize an agent's behaviors and actions based on its state and can represent complex strategies (compositions of actions). This project proposes a novel hierarchical strategy decomposition approach based on Bayesian chaining that separates an intricate policy into several simple sub-policies and organizes their relationships as Bayesian strategy networks (BSN). We integrate this approach into the state-of-the-art DRL method, soft actor-critic (SAC), and build the corresponding Bayesian soft actor-critic (BSAC) model by organizing the sub-policies as a joint policy.
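The core idea, a joint policy factored into chained sub-policies, can be sketched as follows. This is a hypothetical toy illustration, not the repository's actual implementation: the Gaussian sub-policies and the `sub_policy` function are assumptions made for clarity. Each sub-policy conditions on the state and on its parent's action in the Bayesian chain, and the joint log-probability is the sum of the chained log-probabilities.

```python
import numpy as np

# Toy sketch of a Bayesian Strategy Network (BSN) factorization:
# a joint policy pi(a|s) decomposed into chained sub-policies
#   pi(a|s) = pi_1(a1|s) * pi_2(a2|s, a1) * pi_3(a3|s, a2).
rng = np.random.default_rng(0)

def sub_policy(state, parent_action=None):
    """Toy Gaussian sub-policy; its mean depends on the state and,
    if present, on the parent sub-policy's action (hypothetical form)."""
    mean = state.sum() + (0.0 if parent_action is None else 0.5 * parent_action)
    action = rng.normal(mean, 1.0)
    # Log-density of a unit-variance Gaussian at the sampled action.
    log_prob = -0.5 * (action - mean) ** 2 - 0.5 * np.log(2 * np.pi)
    return action, log_prob

state = np.array([0.1, -0.2])
a1, lp1 = sub_policy(state)          # root sub-policy
a2, lp2 = sub_policy(state, a1)      # conditioned on a1
a3, lp3 = sub_policy(state, a2)      # conditioned on a2
joint_log_prob = lp1 + lp2 + lp3     # log pi(a|s) = sum of chained log-probs
```

In BSAC, each factor would be a learned sub-policy network trained jointly under the SAC objective; the chain above simply shows how their samples and log-probabilities compose into one joint policy.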

Example

[Image: Hopper-V2 3BSAC demonstration and training video]

Experiments Setup

This implementation requires Anaconda, OpenAI Gym, MuJoCo, PyTorch, and rl-plotter.

Getting Started

1. Install OpenAI Gym:

```shell
pip install gym
```
2. Install MuJoCo:

```shell
mkdir -p ~/.mujoco && cd ~/.mujoco
wget -P . https://www.roboti.us/download/mujoco200_linux.zip
unzip mujoco200_linux.zip
mv mujoco200_linux mujoco200
```

  • Copy your MuJoCo license key (mjkey.txt) into place:

```shell
cp mjkey.txt ~/.mujoco
cp mjkey.txt ~/.mujoco/mujoco200/bin
```
  • Add environment variables:

```shell
export LD_LIBRARY_PATH=~/.mujoco/mujoco200/bin${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export MUJOCO_KEY_PATH=~/.mujoco${MUJOCO_KEY_PATH}
```
  • Download mujoco-py and create a conda environment:

```shell
cd ~
git clone https://github.com/openai/mujoco-py.git
conda create -n myenv python=3.6
source activate myenv
sudo apt-get install build-essential
```
  • Install dependencies:

```shell
cd ~/mujoco-py
pip install -r requirements.txt
pip install -r requirements.dev.txt
python setup.py install
```
3. Install the reinforcement learning (RL) plotter rl-plotter:

```shell
pip install rl_plotter
```

Examples for Training Agent

1. Hopper-V2 with 3-factor BSAC:

```shell
cd ~/hopper-v2_3bsac
python3 main_bsac.py
```

2. Walker2d-V2 with 5-factor BSAC:

```shell
cd ~/walker2d-v2_5bsac
python3 main_bsac.py
```

3. Humanoid-V2:

  • 3-factor BSAC:

```shell
cd ~/humanoid-v2_3bsac
python3 main_bsac.py
```

  • 5-factor BSAC:

```shell
cd ~/humanoid-v2_5bsac
python3 main_bsac.py
```

  • 9-factor BSAC:

```shell
cd ~/humanoid-v2_9bsac
python3 main_bsac.py
```

Note: Before running the code, set the data output directory in main_bsac.py and networks.py.

Evaluation

[Images: Hopper-V2 3BSAC evaluation plots and demo videos]

Conclusion

From theoretical derivation, we formulate the training process of BSAC and implement it in OpenAI's MuJoCo standard continuous control benchmark domains, such as Hopper, Walker2d, and Humanoid. The results demonstrate that the proposed architecture scales to domains with high-dimensional action spaces and achieves higher performance than state-of-the-art RL methods. Furthermore, we believe that the potential generality and practicability of BSAC invite further theoretical and empirical investigation. In particular, implementing BSAC on real robots is not only a challenging problem but will also help us develop robust computational models for multi-agent/robot systems, such as robot locomotion control, multi-robot planning and navigation, and robot-aided search and rescue missions.
