
Add an option to run SQIL with various off-policy algorithms #778

Merged
11 commits merged into master from 767-sqil-other-algos on Sep 8, 2023

Conversation

michalzajac-ml (Contributor)

Description

This PR adds the option to combine SQIL with off-policy algorithms other than DQN, such as SAC, TD3, and DDPG, as requested in #767.
A tutorial training SQIL+SAC on the HalfCheetah environment is also provided. A random policy scores below 0 and the expert demonstrations average ~3400; SQIL+SAC reaches 1400.7 +/- 254.1 after 300K steps (mean +/- std over 5 runs).
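For readers skimming the PR, here is a rough sketch of how the new option might be used, based on the tutorial; the exact argument names (e.g. rl_algo_class, rl_kwargs) are assumptions that should be checked against the merged code.

```python
import stable_baselines3 as sb3
from imitation.algorithms import sqil


def train_sqil_sac(venv, rollouts, total_timesteps: int = 300_000):
    """Sketch of SQIL+SAC training.

    `venv` is a vectorized continuous-control environment (e.g. HalfCheetah)
    and `rollouts` are expert demonstrations, prepared as in the other
    imitation tutorials.
    """
    trainer = sqil.SQIL(
        venv=venv,
        demonstrations=rollouts,
        policy="MlpPolicy",
        rl_algo_class=sb3.SAC,  # previously DQN was the only supported choice
        rl_kwargs=dict(seed=42),
    )
    trainer.train(total_timesteps=total_timesteps)
    return trainer
```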

Testing

pytest tests/algorithms/test_sqil.py (the relevant tests were adapted to work with the new base algorithms).
One can also run the provided tutorial.

Base automatically changed from dependency_fixes to master September 7, 2023 22:57
@AdamGleave (Member) left a comment:


Thanks for the implementation! Overall looks strong, just a few relatively minor changes.

    cache = pytestconfig.cache
    assert cache is not None
    return expert_trajectories.make_expert_transition_loader(
-       cache_dir=cache.mkdir("experts"),
+       cache_dir=cache.mkdir(env_name.replace("/", "_")),
@AdamGleave (Member) commented:

Why do we need the environment name in the cache directory? It should already be included in the environment path in https://github.com/HumanCompatibleAI/imitation/blob/master/src/imitation/testing/expert_trajectories.py#L74

michalzajac-ml (Contributor, Author) replied:

Yes, indeed. I was confused by the implementation of this function that uses the cache and was not sure whether I needed to make the directory unique. Now I see that this root cache dir can be shared.
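(For context, a tiny illustration of why a shared root is enough: the loader is assumed, based on the code linked above, to namespace its on-disk cache by environment internally, so callers do not need environment-specific roots. The helper below is illustrative only, not imitation's actual implementation.)

```python
from pathlib import Path


def expert_cache_path(cache_root: Path, env_name: str) -> Path:
    """Illustrative only: derive a per-environment subdirectory under a
    shared cache root, so the root itself can be shared across tests."""
    return cache_root / env_name.replace("/", "_")


# e.g. expert_cache_path(Path(".pytest_cache/experts"), "seals/HalfCheetah-v1")
# yields .pytest_cache/experts/seals_HalfCheetah-v1
```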

tests/algorithms/test_sqil.py (outdated comment thread, resolved)
docs/tutorials/8a_train_sqil_sac.ipynb (outdated comment thread, resolved)
"cell_type": "markdown",
"metadata": {},
"source": [
"After we collected our expert trajectories, it's time to set up our behavior cloning algorithm."
@AdamGleave (Member) commented:

I know this was just copied from the original tutorial, but I find the reference to behavior cloning potentially ambiguous: it usually refers to supervised learning on expert trajectories (and we have a BC class that does exactly that), while SQIL does something conceptually similar but quite different in the details (RL rather than supervised learning).

Would suggest rephrasing this (and the original tutorial); we could just call it an imitation algorithm rather than a supervised learning algorithm.

"cell_type": "markdown",
"metadata": {},
"source": [
"After training, we can observe that agent is quite improved (> 1000), although it does not reach the expert performance in this case."
@AdamGleave (Member) commented:

If you have time to do more tuning, great, but not a priority; this is enough to illustrate the algorithm.
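(For reference, return figures like the "> 1000" above are typically computed with Stable Baselines3's evaluate_policy; a minimal sketch, assuming the trainer from the earlier snippet exposes its learned policy as trainer.policy.)

```python
from stable_baselines3.common.evaluation import evaluate_policy

# Evaluate the SQIL+SAC policy on the evaluation environment.
mean_return, std_return = evaluate_policy(trainer.policy, venv, n_eval_episodes=10)
print(f"Mean return over 10 episodes: {mean_return:.1f} +/- {std_return:.1f}")
```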

@AdamGleave (Member) left a comment:

LGTM

@AdamGleave merged commit 5c85ebf into master on Sep 8, 2023.
7 of 9 checks passed.
@AdamGleave deleted the 767-sqil-other-algos branch on September 8, 2023 at 16:26.
lukasberglund pushed a commit to lukasberglund/imitation that referenced this pull request Sep 12, 2023
…mpatibleAI#778)

* Pin huggingface_sb3 version.

* Properly specify the compatible seals version so it does not auto-upgrade to 0.2.

* Make random_mdp test deterministic by seeding the environment.

* Add an option to run SQIL with various off-policy algorithms

* Add 8a_train_sqil_sac to toctree

* Fix performance tests for SQIL

* fix

* Update docs/tutorials/8a_train_sqil_sac.ipynb

Co-authored-by: Adam Gleave <adam@gleave.me>

* minor fixes

* Bring back performance tests for SQIL

---------

Co-authored-by: Maximilian Ernestus <maximilian@ernestus.de>
Co-authored-by: Adam Gleave <adam@gleave.me>