Learning Transferrable and Adaptive Control Policies

This repository contains the code for the following papers:

Policy Transfer with Strategy Optimization

Prepare for the Unknown: Learning a Universal Policy with Online System Identification

Prerequisites

To use this code you need to install OpenAI Baselines, Dart and PyDart2.

You can find detailed instructions for installing OpenAI Baselines here. For installing Dart and PyDart2, you can follow the installation instructions here.

Note that the environments also depend on OpenAI Gym, which should be installed together with Baselines.
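
Before installing this package, it can help to confirm that the prerequisites are importable. The following is a minimal sketch, assuming the packages expose the module names baselines, pydart2 and gym; the helper missing_modules is hypothetical and not part of this repository.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for OpenAI Baselines, PyDart2 and OpenAI Gym.
    missing = missing_modules(["baselines", "pydart2", "gym"])
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

If any name is reported as missing, follow the installation instructions for that package before continuing.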

Installation

Run the following command from the project directory:

pip install -e .

How to use

SO-CMA

SO-CMA has two stages: training a universal policy and strategy optimization.

To train a universal policy, use the code in ppo. For the strategy optimization part, use the code in test_socma.

An example that transfers a Dart hopper policy to the MuJoCo hopper can be found in examples. To train the universal policy, run:

examples/socma_hopper_5d_train.sh

The training results will be saved to data/.

To perform strategy optimization, run:

examples/socma_hopper_5d_test.sh

You can also use test_policy.py to test individual policies.
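
The idea behind the second stage can be illustrated with a toy sketch. It is not the code in test_socma: universal_policy stands in for the trained network, rollout_return stands in for a rollout in the target environment, and a simple Gaussian random search replaces CMA-ES so that the example needs no extra dependencies. All names and the 1-D dynamics are assumptions made for illustration.

```python
import random


def universal_policy(state, strategy):
    """Stand-in for a trained universal policy: the action depends on the
    state and on the strategy vector the policy is conditioned on."""
    return -strategy[0] * state


def rollout_return(strategy, target_gain=0.7, steps=20):
    """Toy target environment: a 1-D system x <- x + gain * action.
    Returns the negative accumulated squared state (higher is better)."""
    x, total = 1.0, 0.0
    for _ in range(steps):
        action = universal_policy(x, strategy)
        x = x + target_gain * action
        total -= x * x
    return total


def optimize_strategy(initial, iterations=50, population=8, sigma=0.3, seed=0):
    """Search over the strategy only; the policy itself stays fixed.
    Simple random search is used here in place of CMA-ES."""
    rng = random.Random(seed)
    best, best_return = list(initial), rollout_return(initial)
    for _ in range(iterations):
        for _ in range(population):
            candidate = [m + rng.gauss(0.0, sigma) for m in best]
            ret = rollout_return(candidate)
            if ret > best_return:
                best, best_return = candidate, ret
    return best, best_return


if __name__ == "__main__":
    strategy, ret = optimize_strategy([0.1])
    print("best strategy:", strategy, "return:", ret)
```

The point of the sketch is that only the low-dimensional strategy input is searched in the target environment, while the policy trained in the first stage is left unchanged.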

UP-OSI

Training UP-OSI involves two steps: training a universal policy and training an online system identification model.

To train a universal policy, use the code in ppo. To train the online system identification model, use the code in train_osi.

An example training script for the hopper environment is available in examples. To run it, use the following command:

examples/uposi_hopper_2d_train.sh

The training results will be saved to data/.

To test the resulting controller, run:

examples/uposi_hopper_2d_test.sh

and follow the prompt in the terminal. After each rollout, a plot of the estimated and true model parameters is shown.
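
The interplay between the two components can be illustrated with a toy sketch. It is not the code in train_osi: universal_policy stands in for the trained policy, and a least-squares fit replaces the learned online system identification model. The 1-D dynamics, the function names and the default estimate are all assumptions made for illustration, and the true gain is assumed to be nonzero.

```python
def universal_policy(state, estimated_gain):
    """Stand-in for a trained universal policy conditioned on the
    estimated model parameter (assumed nonzero)."""
    return -state / estimated_gain


def identify_gain(history, default=1.0):
    """Stand-in for the OSI model: least-squares fit of `gain` in
    next_state - state = gain * action over the history so far."""
    num = sum(a * (x_next - x) for x, a, x_next in history)
    den = sum(a * a for _, a, _ in history)
    return num / den if den > 1e-12 else default


def run_episode(true_gain, steps=10):
    """Alternate between identifying the parameter and acting on it."""
    x, history, estimates = 1.0, [], []
    for _ in range(steps):
        estimate = identify_gain(history)
        estimates.append(estimate)
        action = universal_policy(x, estimate)
        x_next = x + true_gain * action
        history.append((x, action, x_next))
        x = x_next
    return estimates, x


if __name__ == "__main__":
    estimates, final_state = run_episode(true_gain=0.5)
    print("estimates:", estimates, "final state:", final_state)
```

Comparing the returned estimates with true_gain plays the same role as the estimated-versus-true parameter plot shown after each rollout.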

ODE Internal Error

If you see errors like: ODE INTERNAL ERROR 1: assertion "d[i] != dReal(0.0)" failed in _dLDLTRemove(), try downloading lcp.cpp and replacing the one in dart/external/odelcpsolver/ with it. Recompile Dart and PyDart2 afterward and the issue should be resolved.

Additional feedback

Please contact Wenhao Yu (wenhaoyu@gatech.edu) if you have any feedback or questions about this work.
