RecurrentPPO: 9x speedup - whole sequence batching #118

Draft
wants to merge 28 commits into base: master

Conversation

@b-vm commented Nov 28, 2022

Description

Moving from 2D batches to 3D batches of whole sequences gives a 5-9x speedup in FPS while keeping results similar. Proof.
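Roughly, the change means padding episodes to a common length and running the LSTM over whole sequences in one call, instead of stepping through flattened 2D minibatches. A minimal sketch of the shape change (the dimensions and variable names here are illustrative, not the PR's actual code):

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

# Illustrative dimensions only; the real values come from the env and the policy.
obs_dim, hidden_dim, n_envs = 3, 64, 4
lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)

# Old style: one LSTM call per time step on a 2D (batch, features) slice,
# carrying the hidden state manually -> many small sequential calls.
# New style: batch whole episodes as a single 3D tensor (batch, seq_len, features)
# and process all time steps in one forward pass.
episodes = [torch.randn(torch.randint(5, 20, (1,)).item(), obs_dim) for _ in range(n_envs)]
padded = pad_sequence(episodes, batch_first=True)  # (n_envs, max_len, obs_dim)
latent, (h_n, c_n) = lstm(padded)                  # single call over the whole sequences
print(latent.shape)                                # (n_envs, max_len, hidden_dim)

The padded steps still have to be masked out of the loss; the speedup comes from replacing the per-step LSTM calls with one batched call.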

Context

  • I have raised an issue to propose this change (required)

Types of changes

It's currently implemented as an additional feature, but it would probably be better to have it replace the original implementation.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist:

  • I've read the CONTRIBUTION guide (required)
  • The functionality/performance matches that of the source (required for new training algorithms or training-related features).
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have included an example of using the feature (required for new features).
  • I have included baseline results (required for new training algorithms or training-related features).
  • I have updated the documentation accordingly.
  • I have updated the changelog accordingly (required).
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)

Note: we are using a maximum length of 127 characters per line

@b-vm (Author) commented Dec 16, 2022

@araffin have you been able to take a look at this yet? I am very curious what you think about it.

@araffin (Member) commented Dec 16, 2022

have you been able to take a look at this yet? I am very curious what you think about it.

No, not yet, it's still on my stack... and I'm going on holidays soon, so I'll probably take a look next week or in January.

@b-vm (Author) commented Dec 29, 2022

Cool. Let me know if you need any help running experiments/coding

@araffin araffin mentioned this pull request Feb 18, 2023
@b-vm b-vm changed the title from "RecurrentPPO: Whole sequence batching" to "RecurrentPPO: 9x speedup - whole sequence batching" Mar 1, 2023
@araffin araffin self-requested a review March 7, 2023 23:00
@araffin (Member) commented Apr 3, 2023

Hello,
I tried to test the PR but couldn't: I got an error (before my changes) with both Pendulum and BipedalWalker:

Traceback (most recent call last):
  File "sb3_contrib/whole_sequence_speed_test.py", line 167, in <module>
    model.learn(2e5, tb_log_name=f"PendulumNoVel-v1_whole_sequences_batch_size{batch_size}")
  File "sb3_contrib/sb3_contrib/ppo_recurrent/ppo_recurrent.py", line 505, in learn
    self.train()
  File "sb3_contrib/sb3_contrib/ppo_recurrent/ppo_recurrent.py", line 361, in train
    values, log_prob, entropy = self.policy.evaluate_actions_whole_sequence(
  File "sb3_contrib/sb3_contrib/common/recurrent/policies.py", line 372, in evaluate_actions_whole_sequence
    latent_pi, _ = self.lstm_actor(features)
  File "mambaforge/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "mambaforge/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 810, in forward
    self.check_forward_args(input, hx, batch_sizes)
  File "mambaforge/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 730, in check_forward_args
    self.check_input(input, batch_sizes)
  File "mambaforge/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 218, in check_input
    raise RuntimeError(
RuntimeError: input.size(-1) must be equal to input_size. Expected 3, got 6

@b-vm (Author) commented Apr 12, 2023

My bad. The bug is fixed now!

@araffin (Member) commented Apr 27, 2023

I had to set drop_last=False sometimes, otherwise I was getting an error because nothing was sampled:
UnboundLocalError: local variable 'loss' referenced before assignment

To reproduce:

python -m rl_zoo3.train --algo ppo_lstm --env PendulumNoVel-v1 -params whole_sequences:True use_sde:False
python -m rl_zoo3.train --algo ppo_lstm --env CartPoleNoVel-v1 -params whole_sequences:True
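For reference, that UnboundLocalError is the classic symptom of a loop body that never runs: with drop_last=True and fewer whole sequences than one full batch, the sampler yields nothing and loss is never assigned. A hypothetical reduction of the pattern (not the PR's actual train() loop):

def train(minibatches):
    # Stand-in for the PPO update loop; the loss below is not the real objective.
    for advantages in minibatches:
        loss = sum(a * a for a in advantages) / len(advantages)
    # If the sampler yielded no minibatch, `loss` was never assigned:
    return loss  # UnboundLocalError: local variable 'loss' referenced before assignment

train([])  # an empty sampler reproduces the error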

On CartPole, I have another error:

Traceback (most recent call last):
  File "torchy-zoo/train.py", line 4, in <module>
    train()
  File "torchy-zoo/rl_zoo3/train.py", line 267, in train
    exp_manager.learn(model)
  File "torchy-zoo/rl_zoo3/exp_manager.py", line 236, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "sb3_contrib/sb3_contrib/ppo_recurrent/ppo_recurrent.py", line 521, in learn
    self.train()
  File "sb3_contrib/sb3_contrib/ppo_recurrent/ppo_recurrent.py", line 377, in train
    values, log_prob, entropy = self.policy.evaluate_actions_whole_sequence(
  File "sb3_contrib/sb3_contrib/common/recurrent/policies.py", line 387, in evaluate_actions_whole_sequence
    log_prob = distribution.distribution.log_prob(actions).sum(dim=-1)
  File "mambaforge/lib/python3.10/site-packages/torch/distributions/categorical.py", line 123, in log_prob
    self._validate_sample(value)
  File "mambaforge/lib/python3.10/site-packages/torch/distributions/distribution.py", line 288, in _validate_sample
    raise ValueError('Value is not broadcastable with batch_shape+event_shape: {} vs {}.'.
ValueError: Value is not broadcastable with batch_shape+event_shape: torch.Size([32, 15, 1]) vs torch.Size([32, 15]).
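For what it's worth, the shapes in that error look like the usual Discrete-action pitfall: the stored actions keep a trailing singleton dimension, while Categorical.log_prob expects the value to match the batch shape exactly. A small sketch of the mismatch (shapes taken from the traceback; the squeeze is only a guess at the fix, not the PR's code):

import torch
from torch.distributions import Categorical

batch, seq_len, n_actions = 32, 15, 2
dist = Categorical(logits=torch.randn(batch, seq_len, n_actions))

actions = torch.zeros(batch, seq_len, 1, dtype=torch.long)  # trailing dim kept by the buffer
# dist.log_prob(actions)             # ValueError: (32, 15, 1) vs (32, 15), as in the traceback
log_prob = dist.log_prob(actions.squeeze(-1))  # works once the extra dim is dropped
print(log_prob.shape)                          # torch.Size([32, 15])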

Also, SDE seems not to be supported (that's ok, but it needs to be checked at runtime).

Finally, I experienced some NaN issues from time to time when drop_last=False (I fixed that by deactivating advantage normalization):

ValueError: Expected parameter loc (Tensor of shape (4, 1)) of distribution Normal(loc: torch.Size([4, 1]), scale: torch.Size([4, 1])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan],
        [nan],
        [nan],
        [nan]])
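Those NaNs are consistent with normalizing an advantage tensor that has too few samples: the std of a single-element (or constant) batch is NaN or 0, which then propagates into the policy mean. A possible guard, just a sketch of how the normalization could be made robust instead of disabled (the size/std checks are a suggestion, not part of the PR):

import torch

def normalize_advantages(advantages: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # With a single sample the std is NaN, and with constant advantages it is 0;
    # either way normalizing would produce NaN/inf, so skip it in that case.
    if advantages.numel() > 1 and advantages.std() > eps:
        return (advantages - advantages.mean()) / (advantages.std() + eps)
    return advantages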

@araffin (Member) commented Apr 27, 2023

Also an error when using CNN:

python train.py --algo ppo_lstm --env CarRacing-v2 -P --n-eval-envs 5 --eval-episodes 20 -params batch_size:8 whole_sequences:True
    self.train()
  File "/home/antonin/Documents/rl/sb3-contrib/sb3_contrib/ppo_recurrent/ppo_recurrent.py", line 377, in train
    values, log_prob, entropy = self.policy.evaluate_actions_whole_sequence(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/antonin/Documents/rl/sb3-contrib/sb3_contrib/common/recurrent/policies.py", line 371, in evaluate_actions_whole_sequence
    features = self.extract_features(obs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/antonin/Documents/rl/stable-baselines3/stable_baselines3/common/policies.py", line 640, in extract_features
    return super().extract_features(obs, self.features_extractor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/antonin/Documents/rl/stable-baselines3/stable_baselines3/common/policies.py", line 131, in extract_features
    return features_extractor(preprocessed_obs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/antonin/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/antonin/Documents/rl/stable-baselines3/stable_baselines3/common/torch_layers.py", line 106, in forward
    return self.linear(self.cnn(observations))
                       ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/antonin/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/antonin/miniconda3/lib/python3.11/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
            ^^^^^^^^^^^^^
  File "/home/antonin/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1482, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/antonin/miniconda3/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/antonin/miniconda3/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [281, 8, 2, 64, 64]
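For reference, a 5D input means the CNN extractor is receiving the raw (batch, seq_len, C, H, W) tensor, e.g. [281, 8, 2, 64, 64] above. The usual workaround is to fold the time dimension into the batch before the extractor and unfold it afterwards; a rough sketch (the function name and layout are assumptions, not the PR's code):

import torch

def extract_features_whole_sequence(obs: torch.Tensor, features_extractor) -> torch.Tensor:
    # obs: (batch, seq_len, C, H, W) (or time-major, depending on the buffer layout)
    batch, seq_len = obs.shape[:2]
    flat = obs.reshape(batch * seq_len, *obs.shape[2:])  # conv2d-friendly 4D tensor
    features = features_extractor(flat)                  # (batch * seq_len, feature_dim)
    return features.reshape(batch, seq_len, -1)          # back to 3D for the LSTM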

@araffin (Member) commented Oct 6, 2023

On CartPole, I have another error:

The error for CartPole seems to be still there...

@b-vm (Author) commented Oct 15, 2023

Yes, it has only been implemented for Box action spaces, so that might be it.

I don't have much time to work on this anymore, so feel free to take it over.
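Since the whole-sequence path currently only handles Box action spaces (and not gSDE, per the review above), a cheap runtime guard along these lines could make the limitation explicit; purely a suggestion, with illustrative names:

from gymnasium import spaces  # or `gym`, depending on the SB3 version in use

def check_whole_sequence_support(action_space: spaces.Space, use_sde: bool) -> None:
    # Hypothetical guard, e.g. called from RecurrentPPO.__init__ when whole_sequences=True.
    if not isinstance(action_space, spaces.Box):
        raise NotImplementedError("whole_sequences=True currently supports only Box action spaces.")
    if use_sde:
        raise NotImplementedError("whole_sequences=True is not compatible with gSDE yet.")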
