FaisalAhmed0/SLUSD

Abstract

Unsupervised skill learning methods are a form of unsupervised pre-training for reinforcement learning (RL) that has the potential to improve the sample efficiency of solving downstream tasks. Prior work has proposed several methods for unsupervised skill discovery based on mutual information (MI) objectives, with different methods varying in how this mutual information is estimated and optimized. This paper studies how different design decisions in skill learning algorithms affect the sample efficiency of solving downstream tasks. Our key findings are: downstream adaptation is more sample-efficient with off-policy backbones than with their on-policy counterparts, whereas on-policy backbones achieve better state coverage; regularizing the discriminator improves downstream results; a careful choice of the mutual information lower bound and the discriminator architecture yields significant improvements in downstream returns; and the representations learned during pre-training correspond to the controllable aspects of the environment.
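
For context, the MI objectives above are variational lower bounds on the mutual information $I(S;Z)$ between visited states $S$ and a latent skill $Z$. As a rough sketch (illustrative notation, not necessarily the paper's exact formulation), the Barber-Agakov bound $I_{BA}$ replaces the intractable posterior $p(z \mid s)$ with a learned discriminator $q_\phi(z \mid s)$:

$$I(S;Z) = H(Z) - H(Z \mid S) \ge H(Z) + \mathbb{E}_{z \sim p(z),\, s \sim \pi(\cdot \mid z)}\left[\log q_\phi(z \mid s)\right]$$

The other supported bounds ($I_{NCE}$, $I_{NWJ}$, $I_{\alpha}$) are alternative estimators of the same quantity and can be selected via the lb argument described below.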

Getting Started

Clone the repository

git clone https://github.com/FaisalAhmed0/SLUSD

Create a new conda environment and activate it

cd SLUSD
conda create -n slusd python=3.8
conda activate slusd

Install a custom version of Stable-Baselines3

cd ..
git clone https://github.com/FaisalAhmed0/stable-baselines3.git
cd stable-baselines3
pip install -e .

Install MBRL-Lib

cd ..
git clone https://github.com/facebookresearch/mbrl-lib.git
cd mbrl-lib
pip install -e .

Make sure that MuJoCo is installed by following the instructions at https://github.com/openai/mujoco-py

Install the paper-specific code

cd ../SLUSD
pip install -r requirements.txt
pip3 install -e .
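
As an optional sanity check (assuming the editable installs above succeeded and MuJoCo is configured), verify that the main dependencies import cleanly:

python -c "import stable_baselines3, mbrl, mujoco_py; print('imports OK')"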

To run the same adaptation experiment as in the paper.

python src/finetune.py --run_all True

When run_all is True, all random seeds run in parallel.

To run a specific environment, the command-line arguments are as follows:

| Arg | Description | Supported values | Default value |
| --- | --- | --- | --- |
| env | Environment name | Any OpenAI Gym environment with continuous actions and vector-valued states | "MountainCarContinuous-v0" |
| alg | Deep RL algorithm | "sac" for Soft Actor-Critic, "ppo" for Proximal Policy Optimization | "ppo" |
| skills | Number of skills to learn | Positive integers | 6 |
| presteps | Number of pre-training steps | Positive integers | 1000000 |
| lb | Mutual information lower bound | "ba" for $I_{BA}$, "nce" for $I_{NCE}$, "nwj" for $I_{NWJ}$, and "interpolate" for $I_{\alpha}$ | "ba" |
| pm | Discriminator parameterization | "MLP" for a feed-forward neural network, "Separable" for the separable architecture, "Concat" for the concatenation architecture, and "linear" for the linear parameterization | "MLP" |
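
For example, an illustrative invocation assembled from the flags above (SAC backbone, the default environment, the $I_{NCE}$ bound, and an MLP discriminator; exact argument parsing may differ slightly in the script):

python src/finetune.py --env "MountainCarContinuous-v0" --alg sac --skills 6 --presteps 1000000 --lb nce --pm MLP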

To run the same scalability experiment as in the paper.

python src/experiments/scalability_exper.py --run_all True

For the regularization experiment

python src/experiments/regularization_exper.py --run_all True

To observe the training dynamics, run TensorBoard inside the SLUSD folder

tensorboard --logdir ./logs_finetune

To record a video for all skills

python record.py --env <env_name> --stamp <timestamp> --skills <no. skills> --cls <pm> --lb <mi lower bound>

Here, stamp is the experiment's timestamp; you can copy it from the experiment's folder name. If you are running this code on a headless server, make sure xvfb is installed.

sudo apt-get install xvfb

Then run the recording script through xvfb-run.

xvfb-run -a python record.py --env <env_name> --stamp <timestamp> --skills <no. skills> --cls <pm> --lb <mi lower bound>
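
For example, with the default environment and values from the table above (the timestamp is left as a placeholder; copy the real one from your experiment's folder name):

xvfb-run -a python record.py --env "MountainCarContinuous-v0" --stamp <timestamp> --skills 6 --cls MLP --lb ba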

About

Official Implementation of the paper "An Empirical Investigation of Mutual Information Skill Learning"
