[enhancement] Polyak Averaging could be done faster #93

Closed
m-rph opened this issue Jul 9, 2020 · 5 comments · Fixed by #106
Labels
enhancement New feature or request

Comments

m-rph (Contributor) commented Jul 9, 2020

This is rather minor, but Polyak averaging in DQN/SAC/TD3 could be done faster, with far fewer intermediate tensors, using torch.addcmul_ (https://pytorch.org/docs/stable/torch.html#torch.addcmul).
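
For reference, addcmul_ computes self += value * tensor1 * tensor2 in place, so combined with an in-place mul_ it gives the soft update without allocating intermediate tensors. A minimal standalone sketch (the function and argument names here are just placeholders, not the actual DQN/SAC/TD3 code):

import torch as th

def polyak_update_sketch(params, target_params, tau):
  # soft update: target <- (1 - tau) * target + tau * param, done fully in place
  one = th.ones(1, requires_grad=False)
  for param, target_param in zip(params, target_params):
    target_param.data.mul_(1 - tau)
    target_param.data.addcmul_(param.data, one, value=tau)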

araffin added the enhancement (New feature or request) label on Jul 9, 2020
araffin (Member) commented Jul 9, 2020

Why not? Do you have a benchmark for that?

I would first run a line profiler and see where the bottlenecks are before optimizing things without knowing whether they have a real impact.
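
For example, something along these lines in IPython (a rough sketch; it assumes line_profiler is installed and that SAC.train is the method of interest):

# rough profiling sketch, assuming `pip install line_profiler`
%load_ext line_profiler

from stable_baselines3 import SAC

agent = SAC("MlpPolicy", "MountainCarContinuous-v0")
# profile SAC.train line by line while running a short learn() call
%lprun -f SAC.train agent.learn(2000)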

m-rph (Contributor, Author) commented Jul 9, 2020

Sure, since it's a small change I will benchmark both and see if it makes any difference.

araffin (Member) commented Jul 9, 2020

After a quick run of the line profiler, here are the results for SAC:
9% of the time is spent collecting rollouts and 91% training.
In collecting rollouts, these two lines take the most time:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   359                                                           # Select action randomly or according to policy
   360      2000    1017804.0    508.9     71.8                  action, buffer_action = self._sample_action(learning_starts, action_noise)
   361                                           
   362                                                           # Rescale and perform action
   363      2000     294299.0    147.1     20.8                  new_obs, reward, done, infos = env.step(action)

One of them, env.step(), is hard to reduce.

In the train method, the time is spread more evenly; the most time-consuming operations (each around 10%) are:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   177      1900    1149457.0    605.0      7.9              actions_pi, log_prob = self.actor.action_log_prob(replay_data.observations)

   202      1900    1017249.0    535.4      7.0                  next_actions, next_log_prob = self.actor.action_log_prob(replay_data.next_observations)

   220      1900    1013137.0    533.2      7.0              critic_loss.backward()
   221      1900    1819593.0    957.7     12.6              self.critic.optimizer.step()

   232      1900    2293261.0   1207.0     15.8              actor_loss.backward()
   233      1900    1254770.0    660.4      8.7              self.actor.optimizer.step()

   237     24700     410007.0     16.6      2.8                  for param, target_param in zip(self.critic.parameters(), self.critic_target.parameters()):
   238     22800     929857.0     40.8      6.4                      target_param.data.copy_(self.tau * param.data + (1 - self.tau) * target_param.data)

So optimizing the Polyak average sounds good ;)

m-rph (Contributor, Author) commented Jul 9, 2020

I tested these on SAC; there is a good 1.5-1.8x speedup here, more on the GPU than on the CPU because of the data transfers.

import torch as th
from stable_baselines3 import SAC

def fast_polyak(agent):
  # proposed version: in-place update via addcmul_, target <- (1 - tau) * target + tau * param
  one = th.ones(1, requires_grad=False).to(agent.device)
  for param, target_param in zip(agent.critic.parameters(), agent.critic_target.parameters()):
    target_param.data.mul_(1 - agent.tau)
    target_param.data.addcmul_(param.data, one, value=agent.tau)

def slow_polyak(agent):
  # current implementation: builds intermediate tensors, then copies the result into the target
  for param, target_param in zip(agent.critic.parameters(), agent.critic_target.parameters()):
    target_param.data.copy_((1 - agent.tau) * target_param.data + agent.tau * param.data)

# how OpenAI does it in their codebase
def openai_polyak(agent):
  # in-place scale, but add_ still allocates one intermediate tensor per parameter
  for param, target_param in zip(agent.critic.parameters(), agent.critic_target.parameters()):
    target_param.data.mul_(1 - agent.tau)
    target_param.data.add_(agent.tau * param.data)

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32,32,32])).learn(1000)
%timeit for _ in range(10): fast_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32,32,32])).learn(1000)
%timeit for _ in range(10): slow_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32,32,32])).learn(1000)
%timeit for _ in range(10): openai_polyak(agent)

# 100 loops, best of 3: 9.61 ms per loop
# 100 loops, best of 3: 17.4 ms per loop
# 100 loops, best of 3: 12.1 ms per loop

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[512,512,512])).learn(1000)
%timeit for _ in range(10): fast_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[512,512,512])).learn(1000)
%timeit for _ in range(10): slow_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[512,512,512])).learn(1000)
%timeit for _ in range(10): openai_polyak(agent)

# 100 loops, best of 3: 9.55 ms per loop
# 100 loops, best of 3: 17.4 ms per loop
# 100 loops, best of 3: 11.9 ms per loop

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32,32]), device='cpu').learn(1000)
%timeit  for _ in range(10): fast_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32,32]), device='cpu').learn(1000)
%timeit for _ in range(10):  slow_polyak(agent)
agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32,32]), device='cpu').learn(1000)
%timeit for _ in range(10):  openai_polyak(agent) 

# 100 loops, best of 3: 2.91 ms per loop
# 100 loops, best of 3: 4.58 ms per loop
# 100 loops, best of 3: 3.82 ms per loop

This is actually quite significant: at 1 million Polyak updates, this shaves off 28 minutes on CPU and 2 hours 11 minutes on GPU.
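
(Roughly: (17.4 - 9.6) ms x 10^6 ≈ 7,800 s ≈ 2.2 hours on GPU, and (4.58 - 2.91) ms x 10^6 ≈ 1,670 s ≈ 28 minutes on CPU.)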

araffin (Member) commented Jul 16, 2020

@partiallytyped Could you quickly try on CPU but with num_threads=1?

That's the only case where I did not see an improvement yet.
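
Something like this should be enough (a sketch, reusing the benchmark functions from above and limiting PyTorch to a single thread via torch.set_num_threads):

import torch as th

# restrict PyTorch to one intra-op thread before running the CPU benchmark
th.set_num_threads(1)

agent = SAC("MlpPolicy", "MountainCarContinuous-v0", policy_kwargs=dict(net_arch=[32,32]), device='cpu').learn(1000)
%timeit for _ in range(10): fast_polyak(agent)
%timeit for _ in range(10): slow_polyak(agent)
%timeit for _ in range(10): openai_polyak(agent)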
