
Tune hyperparameters in tutorials for GAIL and AIRL #772

Merged
merged 7 commits into master from 763-tune-gail-airl on Sep 7, 2023

Conversation

michalzajac-ml (Contributor)

Description

This PR tunes hyperparameters for the GAIL and AIRL tutorials.
For GAIL, expert performance is reached (~500 on CartPole) with 800K PPO steps (~2 min run time on a MacBook Air M1). For AIRL, the default is the "fast" version, which improves over random but does not reach expert performance (800K steps, ~2 min run time); if we switch off "fast", expert performance is reached (2M steps, ~5 min run time).
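For illustration, the fast/full switch described above might look roughly like the following in the AIRL notebook. The FAST flag and total_timesteps names are assumptions made for this sketch, not taken verbatim from the tutorial, and airl_trainer is assumed to be constructed in earlier cells.

FAST = True  # assumed flag name; the notebook's actual switch may differ

if FAST:
    total_timesteps = 800_000    # improves over random, ~2 min on a MacBook Air M1
else:
    total_timesteps = 2_000_000  # reaches expert performance, ~5 min

airl_trainer.train(total_timesteps)  # airl_trainer assumed from earlier cells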

The hyperparameters were inspired by the half-cheetah configs from the benchmarking directory, plus a bit of manual tuning. Also, for GAIL I needed to switch from BasicShapedRewardNet to BasicRewardNet to make it work (not exactly sure why, but it affected performance a lot!).
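For reference, a minimal sketch of what the reward-net swap could look like with the usual imitation API. The hyperparameter values below are illustrative placeholders rather than the tuned values from this PR, and venv, learner (the PPO generator), and rollouts (expert demonstrations) are assumed to come from earlier tutorial cells.

from imitation.algorithms.adversarial.gail import GAIL
from imitation.rewards.reward_nets import BasicRewardNet  # was BasicShapedRewardNet
from imitation.util.networks import RunningNorm

# venv, learner, and rollouts are assumed to be defined in earlier notebook cells.
reward_net = BasicRewardNet(
    venv.observation_space, venv.action_space, normalize_input_layer=RunningNorm
)
gail_trainer = GAIL(
    demonstrations=rollouts,
    demo_batch_size=1024,            # illustrative, not the tuned value
    gen_replay_buffer_capacity=512,  # illustrative, not the tuned value
    n_disc_updates_per_round=8,      # illustrative, not the tuned value
    venv=venv,
    gen_algo=learner,
    reward_net=reward_net,
)
gail_trainer.train(800_000)  # ~2 min on a MacBook Air M1 per the description above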

Testing

Just ran the notebooks, and also tested with a few different seeds to make sure results are stable.

@michalzajac-ml michalzajac-ml added the docs Documentation missing, incorrect or unclear label Sep 5, 2023
@michalzajac-ml michalzajac-ml linked an issue Sep 5, 2023 that may be closed by this pull request
@ernestum ernestum (Collaborator) left a comment

Thanks for the contribution! I've been meaning to do this for quite some time!
The changes themselves LGTM.
I think the pipeline fails because we pick up the newest seals version (0.2), which is made for gymnasium. If we change our seals version specifier in setup.py to seals~=0.1.5, this should be fixed.
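In setup.py this would amount to something like the following sketch, assuming a standard setuptools layout; only the seals pin is shown, with everything else left unchanged.

from setuptools import setup, find_packages

setup(
    name="imitation",
    packages=find_packages(),
    install_requires=[
        "seals~=0.1.5",  # stay on the 0.1.x line; seals 0.2 targets gymnasium
        # other dependencies unchanged
    ],
)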

@@ -84,7 +88,7 @@ Detailed example notebook: :doc:`../tutorials/4_train_airl`
 learner_rewards_before_training, _ = evaluate_policy(
     learner, env, 100, return_episode_rewards=True,
 )
-airl_trainer.train(20000)
+airl_trainer.train(20000)  # Train for 2_000_000 steps to match expert.
Member

2 million timesteps is a lot for something as simple as CartPole; I expect we can do better, but this seems fine for the purposes of this PR, and at least the environment runs quickly.

Contributor Author

I'd keep it for now (it's already an improvement) and possibly revisit in another PR.

"print(\"mean reward after training:\", np.mean(learner_rewards_after_training))\n",
"print(\"mean reward before training:\", np.mean(learner_rewards_before_training))\n",
"\n",
"plt.hist(\n",
Member

Why are you removing the histogram (here and in AIRL)? It's fine to remove if it's not informative, but perhaps we should report the SD as well as the means?

Contributor Author

Yeah, the reason was that I thought it was not very informative (especially when we reach expert performance). Good suggestion on the SD though, will add!
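A sketch of what reporting the standard deviation alongside the mean could look like in the tutorial; learner_rewards_before_training and learner_rewards_after_training are the per-episode returns already produced by evaluate_policy(..., return_episode_rewards=True) in the existing cells.

import numpy as np

print(
    "mean reward after training:",
    np.mean(learner_rewards_after_training),
    "+/-",
    np.std(learner_rewards_after_training),
)
print(
    "mean reward before training:",
    np.mean(learner_rewards_before_training),
    "+/-",
    np.std(learner_rewards_before_training),
)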

@ernestum ernestum (Collaborator) Sep 6, 2023

Shameless plug: this would be a nice application for my newly released data-samples-printer:

import data_samples_printer as dsp
dsp.pprint(
    before_training=learner_rewards_before_training,
    after_training=learner_rewards_after_training
)

prints something like:

▁  ▁      ▁▄  ▄▄▄█▇▄▄▇▄▇█▄█▃▃▇▄▇ ▇▁▃▄▁▃ ▄▃▁ ▁▁   ▁ -0.00 ±1.08 before_training
                      ▂▃▇█▄▄▂▁                     -0.01 ±0.20 after_training

Contributor Author

@ernestum, thanks for this, the lib looks quite cool! I'll keep it in mind for the future. For this PR, though, I decided not to introduce an additional dependency.

@michalzajac-ml michalzajac-ml changed the base branch from master to dependency_fixes September 6, 2023 08:03
@AdamGleave AdamGleave (Member) left a comment

LGTM

Base automatically changed from dependency_fixes to master September 7, 2023 22:56
@AdamGleave AdamGleave merged commit 74b63ff into master Sep 7, 2023
7 of 9 checks passed
@AdamGleave AdamGleave deleted the 763-tune-gail-airl branch September 7, 2023 23:33
lukasberglund pushed a commit to lukasberglund/imitation that referenced this pull request Sep 12, 2023
…I#772)

* Pin huggingface_sb3 version.

* Properly specify the compatible seals version so it does not auto-upgrade to 0.2.

* Make random_mdp test deterministic by seeding the environment.

* Tune hyperparameters in tutorials for GAIL and AIRL

* Modify .rst docs for GAIL and AIRL to match tutorials

* GAIL and AIRL tutorials: report also std in results

---------

Co-authored-by: Maximilian Ernestus <maximilian@ernestus.de>
Co-authored-by: Adam Gleave <adam@gleave.me>
Labels
docs Documentation missing, incorrect or unclear
Development

Successfully merging this pull request may close these issues.

Ensure all tutorials work as expected
3 participants